This file is the synthesised source-read for the kwin_overlay_subsurface campaign. It opens with the Phase 1 leading question answer (per worklist), follows with the architectural diagram of the per-frame path, and ends with file-level findings ordered by the priority list in worklist.
Discipline guard: no patches are written before this file is committed. Re-scoping is documented honestly with the deferral target named.
Status: LOCKED 2026-05-02. This section is the Phase 1 deliverable. Future Phase 2 / Phase 3 measurements may modify the file-level findings below this section but must not silently move the Phase 1 answer; if reality after measurement contradicts this, the contradiction is documented as a new section, not by editing this one.
Question (from worklist):
On what condition does KWin promote a wp_linux_dmabuf_v1 surface to direct scanout versus falling back to GPU composite, and does the hantro NV12 DRM_FORMAT_MOD_LINEAR output satisfy those conditions on this DRM driver (rockchip-drm on RK3568, Mesa 26.0.5)?
Neither of KWin v6.6.4's two scanout-promotion paths can place the hantro NV12 LINEAR buffer on a DRM plane on this hardware in the windowed Brave case, for two distinct structural reasons. The Phase 4 design space narrows to the import-caching hypothesis only. This aligns with — does not contradict — the architect's prior from the A2 trajectory hint.
KWin v6.6.4 has two distinct paths that could in principle promote a wp_linux_dmabuf_v1 surface to scanout. Both pass through the same per-layer feasibility check (OutputLayer::importScanoutBuffer) but differ in how the candidate is chosen.
Entry: Compositor::prepareDirectScanout (src/compositor.cpp:379, prepareDirectScanout(view, logicalOutput, backendOutput, frame)).
1. view->scanoutCandidates(1) (compositor.cpp:385). On WorkspaceScene this calls WorkspaceScene::scanoutCandidates (src/scene/workspacescene.cpp:281), which walks m_containerItem->sortedChildItems() top-to-bottom via the recursive helper addCandidates (workspacescene.cpp:197).
2. addCandidates produces up to maxCount + 1 = 2 candidate SurfaceItems. The walk requires every traversed item to have opacity == 1.0 and no effects (workspacescene.cpp:202-203).
3. If the walk yields maxCount + 1 candidates, the last one must pass checkForBlackBackground (workspacescene.cpp:263-279). Otherwise scanoutCandidates returns {} (workspacescene.cpp:306-308).
4. For the surviving candidate, prepareDirectScanout requires it to be a SurfaceItemWayland with a valid surface, valid buffer, and dmabuf attributes (compositor.cpp:391-402). It then takes the format/modifier intersection: layer->supportedDrmFormats() for non-tearing or supportedAsyncDrmFormats() for tearing must contain attrs->format AND attrs->modifier (compositor.cpp:404-409).
5. If the intersection holds, layer->importScanoutBuffer(buffer, frame) is invoked (compositor.cpp:416). For the DRM backend that resolves to EglGbmLayer::importScanoutBuffer (src/backends/drm/drm_egl_layer.cpp:81).
Entry: Compositor::repaint per-layer loop (compositor.cpp:680+), through Scene::overlayCandidates and assignOverlays. Candidate filtering for an overlay is findOverlayCandidates (workspacescene.cpp:335), which accepts an item iff:
- it is a SurfaceItem,
- frameTimeEstimation < 50 ms (source frame cadence faster than 20 fps),
- surfaceItem->buffer()->dmabufAttributes() is non-null,
- opacity == 1.0 (TODO comment on line 381 says item-opacity is not yet handled).
Per-layer feasibility is then the same EglGbmLayer::importScanoutBuffer gate, with the layer being a non-Primary OutputLayerType.
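The overlay filter listed above can be read as a pure predicate over per-item state. The following is a minimal sketch, assuming a hypothetical `OverlayCandidateInfo` struct; its field names are illustrative stand-ins, not KWin's, and only the four conditions named above are modelled.

```cpp
#include <chrono>

// Hypothetical stand-in for the per-item state the filter inspects.
// These field names are NOT KWin's; they mirror the listed conditions only.
struct OverlayCandidateInfo {
    bool isSurfaceItem;
    std::chrono::milliseconds frameTimeEstimation;
    bool hasDmabufAttributes;
    double opacity;
};

// Returns true iff every modelled condition holds.
inline bool passesOverlayFilter(const OverlayCandidateInfo &item)
{
    return item.isSurfaceItem
        && item.frameTimeEstimation < std::chrono::milliseconds(50)
        && item.hasDmabufAttributes
        && item.opacity == 1.0;
}
```

A 30 fps video subsurface (about 33 ms per frame) with a dmabuf and full opacity passes this model; a source at exactly 50 ms per frame does not, because the comparison is strict.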
(src/backends/drm/drm_egl_layer.cpp:81-127, top-to-bottom)
1. KWIN_DRM_NO_DIRECT_SCANOUT unset.
2. drmOutput()->shouldDisableNonPrimaryPlanes() is false. shouldDisableNonPrimaryPlanes() is only true in PresentationMode::Async or AdaptiveAsync (drm_output.cpp:112-117), i.e. tearing modes, so this conjunct is inactive for Brave's default 30 fps playback.
3. gpu()->needsModeset() is false (no pending modeset).
4. drmOutput()->needsShadowBuffer() is false (no display-side shadow buffer required, e.g. for HDR/colour conversion).
5. gpu() == gpu()->platform()->primaryGpu() (no cross-GPU scanout).
6. colorPowerTradeoff != PreferAccuracy.
7. sourceRect() == sourceRect().toRect(): the source rect must be integer-aligned. Sub-pixel cropping is rejected. The comment cites the kernel doc note that “devices that don't support subpixel plane coordinates can ignore the fractional part.”
8. If offloadTransform() is non-identity, the plane must support that transform via m_plane->supportsTransformation.
9. gpu()->importBuffer(buffer, FileDescriptor{}) returns non-null (gbm import succeeds for this dmabuf format/modifier/stride).
The doc comment on OutputLayer::importScanoutBuffer (src/core/outputlayer.h:101-106) notes that even when this returns true, “a presentation request on the output must however be used afterwards to find out if it's actually successful” — i.e. the final filter is the kernel's DRM atomic-test.
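To make the ordering of the conjuncts above concrete, here is a minimal sketch that reports the first failing conjunct. The `ScanoutGateState` struct, the `Conjunct` enum and all their member names are hypothetical; they model the described checks and are not KWin's types.

```cpp
// Conjuncts in the top-to-bottom order described above. Names are illustrative.
enum class Conjunct {
    None, // every modelled conjunct passed
    DirectScanoutDisabledByEnv,
    NonPrimaryPlanesDisabled,
    PendingModeset,
    ShadowBufferRequired,
    NotPrimaryGpu,
    PrefersAccuracy,
    FractionalSourceRect,
    UnsupportedTransform,
    ImportFailed,
};

// Hypothetical snapshot of the state each conjunct reads.
struct ScanoutGateState {
    bool envDisablesDirectScanout;
    bool nonPrimaryPlanesDisabled;
    bool needsModeset;
    bool needsShadowBuffer;
    bool onPrimaryGpu;
    bool prefersAccuracy;
    bool sourceRectIsIntegral;
    bool transformIsIdentity;
    bool planeSupportsTransform;
    bool importSucceeds;
};

// A state in which every modelled conjunct passes.
inline ScanoutGateState allPassing()
{
    return ScanoutGateState{false, false, false, false, true,
                            false, true, true, true, true};
}

// Returns the first conjunct that fails, or Conjunct::None.
inline Conjunct firstFailingConjunct(const ScanoutGateState &s)
{
    if (s.envDisablesDirectScanout) return Conjunct::DirectScanoutDisabledByEnv;
    if (s.nonPrimaryPlanesDisabled) return Conjunct::NonPrimaryPlanesDisabled;
    if (s.needsModeset) return Conjunct::PendingModeset;
    if (s.needsShadowBuffer) return Conjunct::ShadowBufferRequired;
    if (!s.onPrimaryGpu) return Conjunct::NotPrimaryGpu;
    if (s.prefersAccuracy) return Conjunct::PrefersAccuracy;
    if (!s.sourceRectIsIntegral) return Conjunct::FractionalSourceRect;
    if (!s.transformIsIdentity && !s.planeSupportsTransform) return Conjunct::UnsupportedTransform;
    if (!s.importSucceeds) return Conjunct::ImportFailed;
    return Conjunct::None;
}
```

Even a `Conjunct::None` result in this model only means the userspace gate passed; the kernel's atomic test remains the final filter.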
(src/backends/drm/drm_plane.cpp:84-142)
DrmPlane::updateProperties() reads the kernel's IN_FORMATS blob via drmModeFormatModifierBlobIterNext. Each (fmt, mod) pair the kernel advertises is added to m_supportedFormats. EglGbmLayer returns this dictionary verbatim from supportedDrmFormats().
So whatever the kernel's rockchip-drm driver advertises in IN_FORMATS for a given DRM plane is what KWin treats as scanout-eligible for that layer. There is no further KWin-side filter on top.
Raw evidence: ohm_drm_info_2026-05-02.json (inlined), ohm_modetest_planes_2026-05-02.txt (inlined).
DRM driver: rockchip-drm (RockChip Soc DRM, 1.0.0). Active connector: DSI-1 (the PineTab2's internal panel), 800×1280 mode preferred. Two CRTCs visible (51 inactive, 52 active, fb=60).
Three planes (full set on the SoC):
| Plane ID | DRM type | possible_crtcs | KWin OutputLayerType | NV12 LINEAR? | Notes |
|---|---|---|---|---|---|
| 33 | Primary | 0x01 (CRTC 51 only — inactive) | Primary | No (RGB-only LINEAR) | This CRTC has no display attached |
| 39 | Primary | 0x02 (CRTC 52 only — active) | Primary | YES (LINEAR(0x0)) | Currently driving fb=60 (the GL framebuffer) |
| 45 | Overlay | 0x03 (either CRTC) | GenericLayer | No | XR30/XB30/XR/XB/AR/AB 24/RG/BG 24/16, YU08/YU10/YUYV/Y210, all in AFBC modifiers (ARM_BLOCK_SIZE=16×16 family). No NV12 in any modifier. |
CRTC index mapping is positional: CRTC ID 51 = index 0 (bit 0), CRTC ID 52 = index 1 (bit 1). Plane 39 is restricted to CRTC 52; Plane 45 can drive either CRTC. KWin's planeToLayerType (drm_layer.cpp:34-49) maps DRM Primary→OutputLayerType::Primary and DRM Overlay→OutputLayerType::GenericLayer directly.
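The positional bitmask reading can be checked mechanically. A minimal sketch, assuming a hypothetical helper name `planeCanDriveCrtc`; the mask values are the ones from the table above.

```cpp
#include <cstdint>

// True if a plane whose possible_crtcs bitmask is `possibleCrtcs` can be
// attached to the CRTC at positional index `crtcIndex` (bit N = index N).
constexpr bool planeCanDriveCrtc(std::uint32_t possibleCrtcs, unsigned crtcIndex)
{
    return (possibleCrtcs & (std::uint32_t{1} << crtcIndex)) != 0;
}

// CRTC ID 51 is index 0, CRTC ID 52 is index 1.
static_assert(planeCanDriveCrtc(0x01, 0) && !planeCanDriveCrtc(0x01, 1),
              "Plane 33: CRTC 51 only");
static_assert(!planeCanDriveCrtc(0x02, 0) && planeCanDriveCrtc(0x02, 1),
              "Plane 39: CRTC 52 only");
static_assert(planeCanDriveCrtc(0x03, 0) && planeCanDriveCrtc(0x03, 1),
              "Plane 45: either CRTC");
```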
So on the active CRTC 52, the OutputLayer set KWin sees is:
- OutputLayerType::Primary from Plane 39: supports NV12 LINEAR.
- OutputLayerType::GenericLayer from Plane 45: does not support NV12 in any modifier.

For Brave's windowed parent + wp_subsurface case:
addCandidates (workspacescene.cpp:197-261) walks the Brave window top-to-bottom. The walk would produce two candidates: the wp_subsurface (video) — added first because it has higher z than its parent — and the parent surface (chrome UI). With maxCount=1, WorkspaceScene::scanoutCandidates calls addCandidates with maxCount + 1 = 2, so two candidates are gathered before the inner size check rejects.
After the walk, workspacescene.cpp:306-308 checks ret.size() == maxCount + 1 && !checkForBlackBackground(ret.back()). The back of the list is the parent surface (Brave UI). It is not a 1×1 single-pixel SHM/single-pixel buffer. Therefore checkForBlackBackground returns false, and the function returns {}. Path A returns empty for windowed Brave by construction.
The “black background” idiom is from 8473b90a20 (Xaver Hugl, 2025-09-03, “compositor: move the 'black background' check to workspacescene”) which moved the check from compositor.cpp into the scene. The check exists for fullscreen-on-black-window patterns (some games / video players render a 1×1 black parent window with their actual content as a child surface, to bypass compositor work) — Brave does not use that pattern.
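The Path A rejection for windowed Brave reduces to one size-and-background condition. A minimal sketch, assuming a hypothetical `Candidate` struct whose `isBlackBackground` flag stands in for the result of checkForBlackBackground; it models only the condition quoted above, not what the function does with accepted candidates.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical candidate record; `isBlackBackground` stands in for the
// result of checkForBlackBackground() on that item.
struct Candidate {
    const char *name;
    bool isBlackBackground;
};

// Models the quoted check: the walk result is discarded when it holds
// maxCount + 1 candidates and the last one is not a black background.
inline bool walkIsRejected(const std::vector<Candidate> &ret, std::size_t maxCount)
{
    return ret.size() == maxCount + 1 && !ret.back().isBlackBackground;
}
```

With `maxCount = 1`, a list of `{video subsurface, Brave UI parent}` is rejected because the parent is ordinary UI content, while `{video surface, 1×1 black parent}` would not be.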
The wp_subsurface (video) clears every findOverlayCandidates filter at 30 fps with NV12 LINEAR dmabufs. The candidate makes it to prepareDirectScanout for a non-Primary OutputLayer. On CRTC 52, the only non-Primary OutputLayer is Plane 45 (GenericLayer). Plane 45 advertises no NV12 modifier in its IN_FORMATS blob.
Therefore compositor.cpp:404-409:

    const auto formats = ... layer->supportedDrmFormats();
    if (auto it = formats.find(attrs->format); it == formats.end() || !it->contains(attrs->modifier)) {
        layer->setScanoutCandidate(candidate);
        candidate->setScanoutHint(layer->scanoutDevice(), formats);
        return false;
    }

returns false: formats.find(DRM_FORMAT_NV12) == formats.end() for Plane 45 → reject. Path B is rejected at the format/modifier intersection. No further conjunct in EglGbmLayer::importScanoutBuffer is even evaluated.
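The rejection above is a plain map lookup. The following sketch reproduces it with standard containers; `FormatMap`, `scanoutFormatAccepted` and the constant names are hypothetical, and the AFBC modifier value is an illustrative placeholder rather than a value read from the plane's IN_FORMATS blob.

```cpp
#include <cstdint>
#include <map>
#include <set>

// Little-endian fourcc packing, as used by DRM format codes.
constexpr std::uint32_t fourcc(char a, char b, char c, char d)
{
    return static_cast<std::uint32_t>(static_cast<unsigned char>(a))
        | (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8)
        | (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16)
        | (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

constexpr std::uint32_t kFormatNV12 = fourcc('N', 'V', '1', '2'); // DRM_FORMAT_NV12
constexpr std::uint32_t kFormatXR24 = fourcc('X', 'R', '2', '4'); // DRM_FORMAT_XRGB8888
constexpr std::uint64_t kModLinear = 0;                           // DRM_FORMAT_MOD_LINEAR
// Placeholder for "some AFBC modifier"; illustrative only.
constexpr std::uint64_t kModAfbcPlaceholder = 0x0800000000000001ULL;

static_assert(kFormatNV12 == 0x3231564E, "NV12 fourcc");

// format -> set of modifiers the plane advertises
using FormatMap = std::map<std::uint32_t, std::set<std::uint64_t>>;

// Both the format and the modifier must be advertised by the plane.
inline bool scanoutFormatAccepted(const FormatMap &formats,
                                  std::uint32_t format, std::uint64_t modifier)
{
    const auto it = formats.find(format);
    return it != formats.end() && it->second.count(modifier) != 0;
}
```

A map modelled on Plane 45 (RGB formats under AFBC only) rejects NV12 LINEAR at the `find` step, whereas one modelled on Plane 39 accepts it.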
The Primary plane (39) does support NV12 LINEAR, but it is in use as the GL framebuffer surface (OutputLayerType::Primary is the single-framebuffer canonical role in KWin). KWin v6.6.4 does not have logic to swap plane roles dynamically (move the GL framebuffer to Plane 45 in AFBC, free Plane 39 for video). That would be a substantial KWin design change.
Per worklist Phase 1 contract — “yes/no plus a paragraph naming the specific conjunct(s) that pass or fail”:
src/wayland/linuxdmabufv1clientbuffer.cpp, src/scene/surfaceitem_wayland.cpp, src/scene/itemrenderer_opengl.cpp.

wp_drm_lease_v1. STRUCTURALLY UNREACHABLE on this hardware/driver combo. Two reasons:

- wp_drm_lease_v1 is the wrong protocol for this case: it leases an entire connector/output to a client (typical consumer: VR HMDs). It is not the mechanism for putting a subsurface on its own DRM plane within a managed Plasma session. The protocol-correct mechanism would be KWin's existing multi-overlay path (Path B above), which fails at the format/modifier intersection on rockchip-drm.

Per worklist: “Either answer also informs the bug-report shape … Different messages, different audiences.”
findOverlayCandidates, opacity == 1.0 is required, not entirely-covered is required). The deeper question (whether Brave's parent renders content behind the video subsurface region) is deferred to Phase 2 source-read per the proposal accepted on 2026-05-02. It does not change Phase 1's answer because Path B is already disqualified at the format intersection upstream of any geometric considerations.

drm_egl_layer.cpp:117) is noted but not load-bearing for Phase 1's answer. It would be a load-bearing conjunct if Path B reached importScanoutBuffer, which it does not on this hardware. Banked for Phase 2.

End-to-end per-frame path for a Brave wp_subsurface presenting an NV12 LINEAR dmabuf at 1080p30 on the windowed parent (Brave UI), Plasma 6.6.4 + EglBackend + panfrost.
- wp_linux_buffer_params.create_immed per V4L2 capture buffer slot. KWin instantiates a LinuxDmaBufV1ClientBuffer (src/wayland/linuxdmabufv1clientbuffer.cpp:216), which IS-A GraphicsBuffer storing DmaBufAttributes (fd, offset, pitch per plane, modifier, format, width, height), :354-358.
- RenderBackend::testImportBuffer(clientBuffer) (:217) validates that the rendering backend can import this dmabuf at all. Successful → wl_buffer.created sent, the wl_buffer enters Brave's reuse pool. No GL texture exists yet.
- :339). For Chromium with a V4L2 capture pool of N buffers, this means N stable LinuxDmaBufV1ClientBuffer / GraphicsBuffer* identities, reused round-robin across frames.
- SurfaceInterface::bufferChanged fires → SurfaceItemWayland::handleBufferChanged (src/scene/surfaceitem_wayland.cpp:103-106) → setBuffer(…). KWin 6.6.4 negotiates wp_linux_drm_syncobj_v1 explicit sync with Chromium-class clients, so the buffer commit goes through Transaction::watchSyncObj (src/wayland/transaction.cpp:244-249), NOT watchDmaBuf. (Source: kwin-fourier MR body, zero DMA_BUF_IOCTL_EXPORT_SYNC_FILE over 60 s playback.) Fourier patches only touch watchDmaBuf, so they are confirmed irrelevant.

SurfaceItem::preprocess() (src/scene/surfaceitem.cpp:187-208): if m_texture exists and size matches, call m_texture->update(damageRegion); else call m_texture->create(). Brave's video buffers are all the same size (1920×1080), so after the first frame the same OpenGLSurfaceTexture is re-used; update() is the steady-state path, not create().
OpenGLSurfaceTexture::updateDmabufTexture(buffer) (src/scene/surfaceitem.cpp:472-501):

    // for NV12 (s_drmConversions match in src/utils/drm_format_helper.h:35-44):
    // plane 0 = R8 full-size (Y), plane 1 = GR88 half-size (CbCr)
    for (uint plane = 0; plane < itConv->plane.count(); ++plane) {
        ...
        m_texture.planes[plane]->bind(); // glBindTexture, cheap
        glEGLImageTargetTexture2DOES(GL_TEXTURE_2D,
            m_backend->importBufferAsImage(buffer, plane, currentPlane.format, size)); // *** suspect ***
        m_texture.planes[plane]->unbind();
    }

EglBackend::importBufferAsImage(buffer, plane, format, size) (src/opengl/eglbackend.cpp:279-299):

    std::pair key(buffer, plane);
    auto it = m_importedBuffers.constFind(key);
    if (Q_LIKELY(it != m_importedBuffers.constEnd())) {
        return *it; // CACHE HIT after warmup
    }
    // MISS: create fresh EGLImage from DmaBufAttributes via EglDisplay::importDmaBufAsImage
    // (the genuinely expensive cold-path EGL_LINUX_DMA_BUF_EXT import)

The cache key is (GraphicsBuffer *, plane_index). After warmup, every frame for every plane is a cache hit on EGLImage but glEGLImageTargetTexture2DOES is still called every frame to re-target the persistent GLTexture to whatever EGLImage corresponds to the current frame's buffer.
For NV12 video, this re-target happens 2× per frame (Y plane R8 + CbCr plane GR88). For RGBA single-surface (e.g. cage's fullscreen output, or any non-YUV client), it happens 1×.
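The difference between cold imports and per-frame re-targets can be counted with a small simulation. This is a sketch under stated assumptions: `ImportStats` and `simulatePlayback` are hypothetical names, buffers are modelled as integer ids cycling round-robin, and no GL or EGL call is made.

```cpp
#include <cstddef>
#include <map>
#include <utility>

struct ImportStats {
    std::size_t coldImports; // cache misses: a fresh image would be created
    std::size_t retargets;   // per-frame texture re-target calls
};

// Simulates `frames` frames drawn from a round-robin pool of `poolSize`
// buffers, each with `planes` planes, against a (buffer, plane) keyed cache.
inline ImportStats simulatePlayback(std::size_t poolSize, std::size_t frames,
                                    std::size_t planes)
{
    ImportStats stats = {0, 0};
    if (poolSize == 0) {
        return stats; // nothing to present
    }
    std::map<std::pair<std::size_t, std::size_t>, std::size_t> cache;
    for (std::size_t frame = 0; frame < frames; ++frame) {
        const std::size_t buffer = frame % poolSize; // round-robin reuse
        for (std::size_t plane = 0; plane < planes; ++plane) {
            const std::pair<std::size_t, std::size_t> key(buffer, plane);
            if (cache.find(key) == cache.end()) {
                cache[key] = stats.coldImports; // first sight: cold import
                ++stats.coldImports;
            }
            ++stats.retargets; // issued on hit and miss alike
        }
    }
    return stats;
}
```

For an illustrative pool of 4 buffers over 900 frames (30 s at 30 fps), the NV12 case (`planes = 2`) gives 8 cold imports and 1800 re-targets, while a single-plane surface gives 4 and 900.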
ItemRendererOpenGL::renderItem (src/scene/itemrenderer_opengl.cpp:334): standard quad render. vbo->bindArrays, glActiveTexture(GL_TEXTURE0+i) + texture[i]->bind() per plane (:473-474), draw, unbind. No suspicious work; the texture binds here are plain glBindTexture, not glEGLImageTargetTexture2DOES.
- src/wayland/linuxdmabufv1clientbuffer.cpp: protocol-only. Creates LinuxDmaBufV1ClientBuffer once per wl_buffer (:165, :216). renderBackend->testImportBuffer validates at creation time (:166, :217). NO GL texture import here; that lives in the EglBackend / surface-texture code. Lifetime tied to wl_buffer; for Chromium's V4L2 capture pool this is N stable buffers reused round-robin.
- src/scene/surfaceitem_wayland.cpp: slot-driven. handleBufferChanged (:103) just stores the new buffer pointer and emits damage. Subsurface tree built in handleChildSubSurfacesChanged (:142) once per surface tree change, not per frame. No per-frame slow path here. The actual texture work is in surfaceitem.cpp::OpenGLSurfaceTexture::updateDmabufTexture.
- src/scene/itemrenderer_opengl.cpp: renderItem (:334-499) does standard quad rendering with glBindTexture on the already-imported GLTextures. Not the cost site. Per-plane texture binds at :473-474 are plain glBindTexture. No special-case for parent+subsurface vs single-surface.
- src/scene/composite.cpp + scene scheduling: promotion predicate. Done in Phase 1.
- src/backends/drm/: DRM atomic plane-probe, format/modifier acceptance per output. Done in Phase 1 to the depth needed for the leading question.

SUPERSEDED 2026-05-02 by Phase 3 measurement. Verdict: H1 rejected at N=1 across C0/C1/C2 + exploratory C3 stock-Brave. The symbol's self-time peaks at 0.15 % vs the 20 % threshold. See phase3_findings. The hypothesis text below is preserved as the “what we believed before measurement” record per the discipline rule (feedback_phase_discipline.md); do not edit it. New working hypothesis H1' (per-frame Wayland-protocol dispatch dominates) emerges from Phase 3 and gets its own Phase 2-prime source-read.
Per-frame cost in KWin's parent + wp_subsurface composite path on Mali-G52 panfrost lives in OpenGLSurfaceTexture::updateDmabufTexture (src/scene/surfaceitem.cpp:472-501), specifically the glEGLImageTargetTexture2DOES call at line 490 (multi-plane YUV) / line 496 (single-plane).
The EglBackend::m_importedBuffers cache (src/opengl/eglbackend.h:116, src/opengl/eglbackend.cpp:279-321) does cache the EGLImage per (GraphicsBuffer *, plane), so after warmup the EGLImage lookup is a hash hit. But the EGLImage and GLTexture are decoupled: a single per-surface m_texture.planes[plane] GLTexture is re-targeted to a different EGLImage every frame via glEGLImageTargetTexture2DOES, because OpenGLSurfaceTexture::updateDmabufTexture is unconditional — it calls the function on every update(), regardless of whether the underlying EGLImage actually changed.
For Brave's V4L2 capture pool of N buffers cycling round-robin:
- Warmup: each buffer's first appearance misses the cache and goes through EglDisplay::importDmaBufAsImage (kernel-side dmabuf-to-EGLImage import). 6-9 expensive first-imports correlate with the three drop-bursts in the A2 trajectory (ohm_gl_fix/phase3_remeasure_2026-05-02/A2_brave_drops_findings.md) at t ≈ 0–5 / 10–12 / 20–30 s. Pool grows in response to scene complexity (B-frame depth, motion-vector load), explaining the discrete bursts.
- Steady state: every frame issues 2× glEGLImageTargetTexture2DOES for NV12 (Y plane + CbCr plane). On panfrost, this rebind has non-trivial cost even when the (texture, image) pair is unchanged or when the new image was previously bound to the same texture in a recent frame.
cage's parity here is informative: cage composites a single fullscreen RGBA surface, so its OpenGLSurfaceTexture::updateDmabufTexture runs the single-plane branch (:493-498) — 1× rebind per frame. KWin direct on the same workload runs the multi-plane branch — 2× rebind per frame, plus the warmup re-import bursts that cage does not exhibit (cage's surfaces are GL-rendered framebuffers KWin imports once, not a V4L2-cycled video pool).
Cache the GLTexture alongside the EGLImage in EglBackend::m_importedBuffers, keyed by (GraphicsBuffer *, plane). On updateDmabufTexture, look up the per-(buffer, plane) GLTexture and re-target the per-surface m_texture.planes[plane] to that GLTexture's name (or, more invasively, swap the GLTexture pointer entirely). Eliminates per-frame glEGLImageTargetTexture2DOES after warmup. Concrete edit site: src/scene/surfaceitem.cpp:472-501 (updateDmabufTexture) plus the cache extension in src/opengl/eglbackend.cpp:279-321 and its header.
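The caching direction described above can be modelled without any GL calls. This is only a sketch of the idea, not KWin code: `PerBufferTextureCache`, its methods and the integer buffer ids are all hypothetical, and the real ownership and lifetime model is left open.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

// Models a per-(buffer, plane) texture cache in which the image-to-texture
// bind is counted only on first sight of a key.
class PerBufferTextureCache
{
public:
    PerBufferTextureCache() : m_nextId(0), m_imageBinds(0) {}

    // Returns a stable texture id for (bufferId, plane).
    std::size_t acquire(std::uint64_t bufferId, std::size_t plane)
    {
        const Key key(bufferId, plane);
        const std::map<Key, std::size_t>::const_iterator it = m_textures.find(key);
        if (it != m_textures.end()) {
            return it->second; // steady state: no re-target needed
        }
        const std::size_t id = m_nextId++;
        m_textures[key] = id;
        ++m_imageBinds; // the one-time bind for this key
        return id;
    }

    // Drops every plane cached for a buffer, e.g. when it is destroyed.
    void release(std::uint64_t bufferId)
    {
        for (std::map<Key, std::size_t>::iterator it = m_textures.begin();
             it != m_textures.end();) {
            if (it->first.first == bufferId) {
                it = m_textures.erase(it);
            } else {
                ++it;
            }
        }
    }

    std::size_t imageBinds() const { return m_imageBinds; }

private:
    typedef std::pair<std::uint64_t, std::size_t> Key;
    std::map<Key, std::size_t> m_textures;
    std::size_t m_nextId;
    std::size_t m_imageBinds;
};
```

In this model, cycling 4 buffers with 2 planes each over any number of frames performs 8 binds in total, and a buffer that is released and later reappears pays the bind again.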
The rebind pattern was introduced in the original NV12 Wayland dmabuf support (commit 3568829216 opengl: Add support for NV12 on Wayland dmabufs, pre-2024); no commit message documents a defensive rationale. The merge commit 8c37d1926a (BasicEGLSurfaceTextureWayland → OpenGLSurfaceTexture) and refactor cf8ee656a9 (move surface-texture business to scene/) preserved the pattern unchanged. Phase 5 patch description must explain the mechanism (glEGLImageTargetTexture2DOES is idempotent for an unchanged image binding, and the buffer's contents change doesn't require a re-bind because the texture is already backed by the dmabuf via the EGLImage) — not just cite the symptom.
perf record -p $(pgrep kwin_wayland) during 70 s playback under the locked phase1_lock protocol. Expectation: hot symbols include glEGLImageTargetTexture2DOES (or its panfrost-side implementation, e.g. panvk_* / panfrost_resource_setup) at a non-trivial fraction of kwin_wayland self-time during steady-state. If hot, hypothesis confirmed at the file:line. If cold (i.e. glEGLImageTargetTexture2DOES doesn't show up), the cost is elsewhere and Phase 2 must re-open. Cage perf record under the same workload provides the differential — cage should NOT show the same symbol at the same heat.
- If glEGLImageTargetTexture2DOES happens to short-circuit identical re-binds, the steady-state cost is elsewhere (possibly in m_texture.planes[plane]->bind() GL state churn or further upstream in damage tracking).
- OpenGLSurfaceContents (the m_texture field, surfaceitem.h:174) is per-OpenGLSurfaceTexture, not per-buffer. Caching the GLTexture per-buffer requires a different ownership model. This is a Phase 4 design decision, not a Phase 2 fact.