====== Phase 2 — KWin source archaeology ======

This file is the synthesised source-read for the ''kwin_overlay_subsurface'' campaign. It opens with the **Phase 1 leading question answer** (per [[kwin_overlay_subsurface:worklist|worklist]]), continues with the architectural map of the per-frame path and the file-level findings ordered by the priority list in [[kwin_overlay_subsurface:worklist|worklist]], and closes with the Phase 2 hypothesis.

> Discipline guard: no patches are written before this file is committed. Re-scoping is documented honestly with the deferral target named.

----

===== Phase 1 — leading question answer =====

**Status: LOCKED 2026-05-02.** This section is the Phase 1 deliverable. Future Phase 2 / Phase 3 measurements may modify the file-level findings below this section but must not silently move the Phase 1 answer; if reality after measurement contradicts this, the contradiction is documented as a new section, not by editing this one.

**Question** (from [[kwin_overlay_subsurface:worklist|worklist]]):

> On what condition does KWin promote a ''wp_linux_dmabuf_v1'' surface to direct scanout versus falling back to GPU composite, and does the hantro NV12 ''DRM_FORMAT_MOD_LINEAR'' output satisfy those conditions on this DRM driver (rockchip-drm on RK3568, Mesa 26.0.5)?

==== Short answer — NO ====

Neither of KWin v6.6.4's two scanout-promotion paths can place the hantro NV12 LINEAR buffer on a DRM plane on this hardware in the windowed Brave case, for two distinct structural reasons. **The Phase 4 design space narrows to the import-caching hypothesis only.** This aligns with — does not contradict — the architect's prior from the A2 trajectory hint.

==== KWin's two scanout-promotion paths ====

KWin v6.6.4 has two distinct paths that could in principle promote a ''wp_linux_dmabuf_v1'' surface to scanout. Both pass through the same per-layer feasibility check (''OutputLayer::importScanoutBuffer'') but differ in how the candidate is chosen.

=== Path A — single-plane direct scanout ===

Entry: ''Compositor::prepareDirectScanout'' (''src/compositor.cpp:379'', ''prepareDirectScanout(view, logicalOutput, backendOutput, frame)'').

  - ''view->scanoutCandidates(1)'' (compositor.cpp:385). On ''WorkspaceScene'' this calls ''WorkspaceScene::scanoutCandidates'' (''src/scene/workspacescene.cpp:281''), which walks ''m_containerItem->sortedChildItems()'' top-to-bottom via the recursive helper ''addCandidates'' (''workspacescene.cpp:197'').
  - ''addCandidates'' produces up to ''maxCount + 1 = 2'' candidate ''SurfaceItem''s. The walk requires every traversed item to have ''opacity == 1.0'' and no effects (workspacescene.cpp:202-203).
  - After the walk, the back element of the candidate list must be either absent OR a 1×1 single-pixel black-background buffer per ''checkForBlackBackground'' (''workspacescene.cpp:263-279''). Otherwise ''scanoutCandidates'' returns ''{}'' (workspacescene.cpp:306-308).
  - If a candidate is returned, ''prepareDirectScanout'' requires it to be a ''SurfaceItemWayland'' with a valid surface, valid buffer, and //dmabuf attributes// (compositor.cpp:391-402). It then takes the format/modifier intersection: ''layer->supportedDrmFormats()'' for non-tearing or ''supportedAsyncDrmFormats()'' for tearing must contain ''attrs->format'' AND ''attrs->modifier'' (compositor.cpp:404-409).
  - If the intersection passes, ''layer->importScanoutBuffer(buffer, frame)'' is invoked (compositor.cpp:416). For the DRM backend that resolves to ''EglGbmLayer::importScanoutBuffer'' (''src/backends/drm/drm_egl_layer.cpp:81'').

=== Path B — multi-overlay plane scanout ===

Entry: ''Compositor::repaint'' per-layer loop (compositor.cpp:680+), through ''Scene::overlayCandidates'' and ''assignOverlays''. Candidate filtering for an overlay is ''findOverlayCandidates'' (''workspacescene.cpp:335''), which accepts an item iff:

  * it is a ''SurfaceItem'',
  * non-empty rect,
  * ''frameTimeEstimation < 50 ms'' (source frame cadence above 20 fps),
  * ''surfaceItem->buffer()->dmabufAttributes()'' is non-null,
  * ''opacity == 1.0'' (TODO comment on line 381 says item-opacity is not yet handled),
  * not entirely covered by other opaque windows,
  * if the region is occupied or rounded-corner-clipped, the item must be fully opaque to qualify as an underlay (workspacescene.cpp:386-389).

Per-layer feasibility is then the same ''EglGbmLayer::importScanoutBuffer'' gate, with the layer being a non-Primary ''OutputLayerType''.
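
The filter list above can be read as one short-circuiting predicate. The following is a minimal sketch under stated assumptions, not KWin's code: ''OverlayItemModel'', its field names, and ''braveVideoItemModel'' are hypothetical stand-ins for the state ''findOverlayCandidates'' inspects, and the evaluation order simply follows the list.

<code cpp>
#include <cassert>
#include <chrono>

// Hypothetical, simplified stand-in for the per-item state the filter reads.
// Field names are illustrative; they are not KWin's.
struct OverlayItemModel {
    bool isSurfaceItem = false;
    bool rectEmpty = true;
    std::chrono::milliseconds frameTimeEstimation{0};
    bool hasDmabufAttributes = false;
    double opacity = 1.0;
    bool fullyCoveredByOpaqueWindows = false;
    bool mustBeUnderlay = false; // region occupied or rounded-corner-clipped
    bool fullyOpaque = false;
};

// True when the item passes every filter listed above, in the listed order.
inline bool passesOverlayFilters(const OverlayItemModel &item)
{
    if (!item.isSurfaceItem || item.rectEmpty) {
        return false;
    }
    if (item.frameTimeEstimation >= std::chrono::milliseconds(50)) {
        return false;
    }
    if (!item.hasDmabufAttributes || item.opacity != 1.0) {
        return false;
    }
    if (item.fullyCoveredByOpaqueWindows) {
        return false;
    }
    return !item.mustBeUnderlay || item.fullyOpaque;
}

// A 30 fps opaque dmabuf video subsurface, as in the Brave case.
inline OverlayItemModel braveVideoItemModel()
{
    OverlayItemModel item;
    item.isSurfaceItem = true;
    item.rectEmpty = false;
    item.frameTimeEstimation = std::chrono::milliseconds(33);
    item.hasDmabufAttributes = true;
    item.fullyOpaque = true;
    return item;
}
</code>

In this model the Brave video item passes, which matches the later observation that the rejection happens at the format/modifier stage rather than in the candidate filter.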

=== EglGbmLayer::importScanoutBuffer — the conjunct list ===

(''src/backends/drm/drm_egl_layer.cpp:81-127'', top-to-bottom)

  - Env var ''KWIN_DRM_NO_DIRECT_SCANOUT'' unset.
  - Layer is Primary OR ''drmOutput()->shouldDisableNonPrimaryPlanes()'' is false. The latter returns true only in ''PresentationMode::Async'' or ''AdaptiveAsync'' (drm_output.cpp:112-117), i.e. the tearing modes, so this conjunct **never rejects during Brave's default 30 fps playback**.
  - ''gpu()->needsModeset()'' is false (no pending modeset).
  - ''drmOutput()->needsShadowBuffer()'' is false (no display-side shadow buffer required, e.g. for HDR/colour conversion).
  - ''gpu() == gpu()->platform()->primaryGpu()'' (no cross-GPU scanout).
  - Color pipeline is identity OR ''colorPowerTradeoff != PreferAccuracy''.
  - **''sourceRect() == sourceRect().toRect()''** — the source rect must be integer-aligned. Sub-pixel cropping → reject. Comment cites the kernel doc note that "devices that don't support subpixel plane coordinates can ignore the fractional part."
  - If ''offloadTransform()'' is non-identity, the plane must support that transform via ''m_plane->supportsTransformation''.
  - ''gpu()->importBuffer(buffer, FileDescriptor{})'' returns non-null (gbm import succeeds for this dmabuf format/modifier/stride).

The doc comment on ''OutputLayer::importScanoutBuffer'' (''src/core/outputlayer.h:101-106'') notes that even when this returns true, "a presentation request on the output must however be used afterwards to find out if it's actually successful" — i.e. the final filter is the kernel's DRM atomic-test.
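
Because the conjuncts are evaluated top-to-bottom, it can help to name the first one that fails. The sketch below models that ordering only. ''ScanoutGateModel'', its fields, and the returned labels are hypothetical; the integer-alignment test is a plain-C++ approximation of ''sourceRect() == sourceRect().toRect()'', and the kernel atomic-test that follows a pass is not modelled.

<code cpp>
#include <cassert>
#include <cmath>
#include <string>

// Hypothetical flattened view of the state the gate reads; not KWin's types.
struct ScanoutGateModel {
    bool noDirectScanoutEnvSet = false;
    bool layerIsPrimary = false;
    bool disableNonPrimaryPlanes = false;
    bool needsModeset = false;
    bool needsShadowBuffer = false;
    bool onPrimaryGpu = true;
    bool colorPipelineIsIdentity = true;
    bool preferAccuracy = false;
    double srcX = 0, srcY = 0, srcW = 0, srcH = 0;
    bool transformIsIdentity = true;
    bool planeSupportsTransform = false;
    bool bufferImportSucceeds = true;
};

inline bool isWhole(double v)
{
    return std::floor(v) == v;
}

// Returns a label for the first failing conjunct, or an empty string if all pass.
inline std::string firstFailingConjunct(const ScanoutGateModel &g)
{
    if (g.noDirectScanoutEnvSet) return "KWIN_DRM_NO_DIRECT_SCANOUT";
    if (!g.layerIsPrimary && g.disableNonPrimaryPlanes) return "non-primary planes disabled";
    if (g.needsModeset) return "pending modeset";
    if (g.needsShadowBuffer) return "shadow buffer required";
    if (!g.onPrimaryGpu) return "cross-GPU scanout";
    if (!g.colorPipelineIsIdentity && g.preferAccuracy) return "color pipeline";
    if (!(isWhole(g.srcX) && isWhole(g.srcY) && isWhole(g.srcW) && isWhole(g.srcH))) {
        return "fractional source rect";
    }
    if (!g.transformIsIdentity && !g.planeSupportsTransform) return "unsupported transform";
    if (!g.bufferImportSucceeds) return "buffer import failed";
    return "";
}
</code>

Returning a label rather than a bool is a choice made for this page: it lets a finding name the failing conjunct directly.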

=== Where supportedDrmFormats() comes from ===

(''src/backends/drm/drm_plane.cpp:84-142'')

''DrmPlane::updateProperties()'' reads the kernel's ''IN_FORMATS'' blob via ''drmModeFormatModifierBlobIterNext''. Each ''(fmt, mod)'' pair the kernel advertises is added to ''m_supportedFormats''. ''EglGbmLayer'' returns this dictionary verbatim from ''supportedDrmFormats()''.

So whatever the kernel's rockchip-drm driver advertises in ''IN_FORMATS'' for a given DRM plane //is// what KWin treats as scanout-eligible for that layer. There is no further KWin-side filter on top.

==== Hardware: rockchip-drm plane format/modifier table on ohm ====

Raw evidence: [[kwin_overlay_subsurface:evidence:ohm_drm_info_2026-05-02|ohm_drm_info_2026-05-02.json (inlined)]], [[kwin_overlay_subsurface:evidence:ohm_modetest_planes_2026-05-02|ohm_modetest_planes_2026-05-02.txt (inlined)]].

DRM driver: ''rockchip-drm'' (RockChip Soc DRM, 1.0.0). Active connector: DSI-1 (the PineTab2's internal panel), 800×1280 mode preferred. Two CRTCs visible (51 inactive, 52 active, fb=60).

Three planes (full set on the SoC):

^ Plane ID ^ DRM type ^ possible_crtcs ^ KWin OutputLayerType ^ NV12 LINEAR? ^ Notes ^
| 33 | Primary | ''0x01'' (CRTC 51 only — inactive) | Primary | No (RGB-only LINEAR) | This CRTC has no display attached |
| 39 | Primary | ''0x02'' (CRTC 52 only — active) | Primary | **YES** (LINEAR(0x0)) | Currently driving fb=60 (the GL framebuffer) |
| 45 | Overlay | ''0x03'' (either CRTC) | GenericLayer | **No** | XR30/XB30/XR/XB/AR/AB 24/RG/BG 24/16, YU08/YU10/YUYV/Y210, all in AFBC modifiers (''ARM_BLOCK_SIZE=16x16'' family). No NV12 in any modifier. |

CRTC index mapping is positional: CRTC ID 51 = index 0 (bit 0), CRTC ID 52 = index 1 (bit 1). Plane 39 is restricted to CRTC 52; Plane 45 can drive either CRTC. KWin's ''planeToLayerType'' (''drm_layer.cpp:34-49'') maps DRM Primary→''OutputLayerType::Primary'' and DRM Overlay→''OutputLayerType::GenericLayer'' directly.

So on the active CRTC 52, the OutputLayer set KWin sees is:

  * 1 × ''OutputLayerType::Primary'' from Plane 39 — supports NV12 LINEAR.
  * 1 × ''OutputLayerType::GenericLayer'' from Plane 45 — does not support NV12 in any modifier.
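
The positional ''possible_crtcs'' reading used above can be checked mechanically. A minimal sketch, assuming only the mask values and CRTC ordering from the plane table; ''planeCanDriveCrtc'' and ''kActiveCrtcIndex'' are names invented for this page.

<code cpp>
#include <cassert>
#include <cstdint>

// Bit i of possible_crtcs is set when the plane can be attached to the CRTC at
// index i in the device's CRTC list (index, not object ID).
constexpr bool planeCanDriveCrtc(std::uint32_t possibleCrtcs, unsigned crtcIndex)
{
    return crtcIndex < 32 && ((possibleCrtcs >> crtcIndex) & 1u) != 0;
}

// Values from the plane table above; CRTC 51 is index 0 and CRTC 52 is index 1.
constexpr unsigned kActiveCrtcIndex = 1;
static_assert(!planeCanDriveCrtc(0x01, kActiveCrtcIndex), "Plane 33 is tied to CRTC 51");
static_assert(planeCanDriveCrtc(0x02, kActiveCrtcIndex), "Plane 39 drives CRTC 52");
static_assert(planeCanDriveCrtc(0x03, kActiveCrtcIndex), "Plane 45 can drive either CRTC");
</code>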

==== Why the answer is NO — the failing conjunct, named ====

For Brave's //windowed// parent + wp_subsurface case:

=== Path A is rejected at the scene-walk stage ===

''addCandidates'' (workspacescene.cpp:197-261) walks the Brave window top-to-bottom. The walk would produce two candidates: the wp_subsurface (video) — added first because it has higher z than its parent — and the parent surface (chrome UI). With ''maxCount=1'', ''WorkspaceScene::scanoutCandidates'' calls ''addCandidates'' with ''maxCount + 1 = 2'', so two candidates are gathered before the inner size check rejects.

After the walk, ''workspacescene.cpp:306-308'' checks ''ret.size() == maxCount + 1 && !checkForBlackBackground(ret.back())''. The back of the list is the parent surface (Brave UI). It is //not// a 1×1 single-pixel black buffer. Therefore ''checkForBlackBackground'' returns false, the condition is true, and the function returns ''{}''. **Path A returns empty for windowed Brave by construction.**
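
The gate condition quoted above reduces to a two-term boolean. The sketch below models only that condition; ''CandidateModel'' and ''survivesBlackBackgroundGate'' are hypothetical, the flag stands in for the result of ''checkForBlackBackground'', and what KWin does with the list when the gate passes is deliberately not modelled.

<code cpp>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one walked candidate. The flag models the result
// of checkForBlackBackground() for that item.
struct CandidateModel {
    bool isSinglePixelBlack = false;
};

// Mirrors the quoted condition: the list is rejected only when the walk found
// the extra (maxCount + 1)-th candidate and that candidate is not the 1x1
// black background.
inline bool survivesBlackBackgroundGate(const std::vector<CandidateModel> &walked,
                                        std::size_t maxCount)
{
    const bool hasExtraCandidate = walked.size() == maxCount + 1;
    return !hasExtraCandidate || walked.back().isSinglePixelBlack;
}
</code>

For windowed Brave the walked list is (video subsurface, parent UI) with ''maxCount = 1'', so the extra candidate exists and is not black, and the gate rejects.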

The "black background" idiom is from [[https://invent.kde.org/plasma/kwin/-/commit/8473b90a20|8473b90a20]] (Xaver Hugl, 2025-09-03, "compositor: move the 'black background' check to workspacescene") which moved the check from compositor.cpp into the scene. The check exists for fullscreen-on-black-window patterns (some games / video players render a 1×1 black parent window with their actual content as a child surface, to bypass compositor work) — Brave does not use that pattern.

=== Path B is rejected at the format/modifier intersection ===

The wp_subsurface (video) clears every ''findOverlayCandidates'' filter at 30 fps with NV12 LINEAR dmabufs. The candidate makes it to ''prepareDirectScanout'' for a non-Primary OutputLayer. On CRTC 52, the only non-Primary OutputLayer is Plane 45 (GenericLayer). Plane 45 advertises **no NV12 modifier** in its ''IN_FORMATS'' blob.

Therefore ''compositor.cpp:404-409'':

<code cpp>
const auto formats = ... layer->supportedDrmFormats();
if (auto it = formats.find(attrs->format); it == formats.end() || !it->contains(attrs->modifier)) {
    layer->setScanoutCandidate(candidate);
    candidate->setScanoutHint(layer->scanoutDevice(), formats);
    return false;
}
</code>

returns false: ''formats.find(DRM_FORMAT_NV12) == formats.end()'' for Plane 45 → reject. **Path B is rejected at the format/modifier intersection.** No further conjunct in ''EglGbmLayer::importScanoutBuffer'' is even evaluated.
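
The same lookup can be reproduced outside KWin against abbreviated plane tables. This is a sketch under stated assumptions: ''FormatTable'' is a plain ''std::map'' rather than KWin's container, ''plane39Model'' and ''plane45Model'' list only one entry each instead of the full ''IN_FORMATS'' contents, and ''kAfbcPlaceholder'' is an illustrative stand-in for an AFBC modifier, not a value taken from the evidence files.

<code cpp>
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Little-endian fourcc packing, as used for DRM format codes.
constexpr std::uint32_t fourcc(char a, char b, char c, char d)
{
    return static_cast<std::uint32_t>(static_cast<unsigned char>(a))
        | (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8)
        | (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16)
        | (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

constexpr std::uint32_t kNV12 = fourcc('N', 'V', '1', '2');    // same packing as DRM_FORMAT_NV12
constexpr std::uint32_t kXR30 = fourcc('X', 'R', '3', '0');
constexpr std::uint64_t kLinear = 0;                            // DRM_FORMAT_MOD_LINEAR
constexpr std::uint64_t kAfbcPlaceholder = 0x0800000000000001;  // illustrative stand-in

// format -> set of modifiers advertised for that format
using FormatTable = std::map<std::uint32_t, std::set<std::uint64_t>>;

// The intersection test: the format must be present AND list the modifier.
inline bool planeAccepts(const FormatTable &formats, std::uint32_t format, std::uint64_t modifier)
{
    const auto it = formats.find(format);
    return it != formats.end() && it->second.count(modifier) != 0;
}

// Abbreviated models of the two planes on CRTC 52 (one entry each).
inline FormatTable plane39Model() { return {{kNV12, {kLinear}}}; }
inline FormatTable plane45Model() { return {{kXR30, {kAfbcPlaceholder}}}; }
</code>

With these tables, NV12 LINEAR is accepted by the Plane 39 model and rejected by the Plane 45 model at the ''find'' step, before any modifier comparison.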

The Primary plane (39) does support NV12 LINEAR, but it is in use as the GL framebuffer surface (''OutputLayerType::Primary'' is the single-framebuffer canonical role in KWin). KWin v6.6.4 does not have logic to swap plane roles dynamically (move the GL framebuffer to Plane 45 in AFBC, free Plane 39 for video). That would be a substantial KWin design change.

==== Implications for Phase 4 design space ====

Per [[kwin_overlay_subsurface:worklist|worklist]] Phase 1 contract — "yes/no plus a paragraph naming the specific conjunct(s) that pass or fail":

  * **Architect's hypothesis (a) — cache the dmabuf-to-GL-texture import.** Remains the primary candidate. Aligns with the A2 trajectory data (drops in three bursts during ~30 s warmup, then steady 0/sec). Phase 2 source-read prioritises: ''src/wayland/linuxdmabufv1clientbuffer.cpp'', ''src/scene/surfaceitem_wayland.cpp'', ''src/scene/itemrenderer_opengl.cpp''.
  * **Architect's hypothesis (b) — promote single-color-plane subsurface video to direct scanout via ''wp_drm_lease_v1''.** STRUCTURALLY UNREACHABLE on this hardware/driver combo. Two reasons:
    - ''wp_drm_lease_v1'' is the wrong protocol for this case — it leases an //entire connector/output// to a client (typical consumer: VR HMDs). It is not the mechanism for putting a subsurface on its own DRM plane within a managed Plasma session. The protocol-correct mechanism would be KWin's existing multi-overlay path (Path B above), which fails at the format/modifier intersection on rockchip-drm.
    - Even if KWin gained dynamic plane-role swapping, the rockchip-drm overlay plane (Plane 45) does not advertise NV12 in any modifier — that is a kernel-side gap, **out of this campaign's scope** per [[kwin_overlay_subsurface:readme|README]].

==== Bug-report shape — narrowed ====

Per [[kwin_overlay_subsurface:worklist|worklist]]: "Either answer also informs the bug-report shape … Different messages, different audiences."

  * The "missed scanout-promotion" framing has two possible audiences, neither well-suited to this campaign:
    * KWin maintainers: would require a design-discussion patch (dynamic plane-role swap). Out of scope.
    * rockchip-drm maintainers: kernel patch to expose NV12 on the overlay plane (if the VOP2 hardware actually supports it on the overlay window — needs separate VOP2 archaeology). Out of scope per README.
  * The "your subsurface composite is slow" framing (Phase 4 hypothesis a — import-caching) has one audience: **KWin maintainers, with a measurement-grounded patch description**. This is the Phase 5 bug-report shape this campaign should pursue.

==== Caveats and Phase 1-step-3 deferral ====

  * This answer rests on the assumption that "windowed Brave with chrome UI visible" is the in-scope case (per Phase 1 lock). //Fullscreen// Brave (F11) would change Path A's outcome — the parent surface might fill the viewport with no second candidate, in which case Path A could potentially succeed //if// Plane 39 is available. Not measured in Phase 0 / not in scope.
  * Phase 1 step 3 from [[kwin_overlay_subsurface:worklist|worklist]] ("does KWin require the subsurface to be the only damageable region of a given plane") is partially answered by the conjunct list above (rounded-corner clipping is in ''findOverlayCandidates'', opacity == 1.0 is required, not entirely-covered is required). The deeper question — whether Brave's parent renders content //behind// the video subsurface region — is **deferred to Phase 2 source-read** per the proposal accepted on 2026-05-02. It does not change Phase 1's answer because Path B is already disqualified at the format intersection upstream of any geometric considerations.
  * The integer-source-rect requirement (''drm_egl_layer.cpp:117'') is noted but not load-bearing for Phase 1's answer. It would be a load-bearing conjunct //if// Path B reached ''importScanoutBuffer'', which it does not on this hardware. Banked for Phase 2.

----

===== Architectural map =====

End-to-end per-frame path for a Brave wp_subsurface presenting an NV12 LINEAR dmabuf at 1080p30 on the windowed parent (Brave UI), Plasma 6.6.4 + EglBackend + panfrost.

==== Buffer ingress (one-time per wl_buffer) ====

  * Brave issues ''zwp_linux_buffer_params_v1.create_immed'' per V4L2 capture buffer slot. KWin instantiates a ''LinuxDmaBufV1ClientBuffer'' (''src/wayland/linuxdmabufv1clientbuffer.cpp:216''), which IS-A ''GraphicsBuffer'' storing ''DmaBufAttributes'' (fd, offset, pitch per plane, modifier, format, width, height); see '':354-358''.
  * ''RenderBackend::testImportBuffer(clientBuffer)'' ('':217'') validates that the rendering backend can import this dmabuf at all. On success the wl_buffer is usable and enters Brave's reuse pool (only the non-immediate ''create'' request is answered with a ''zwp_linux_buffer_params_v1.created'' event; ''create_immed'' has no success event). **No GL texture exists yet.**
  * Lifetime: until Brave destroys the wl_buffer ('':339''). For Chromium with a V4L2 capture pool of N buffers, this means N stable ''LinuxDmaBufV1ClientBuffer'' / ''GraphicsBuffer*'' identities, reused round-robin across frames.

==== Per-attach (every wl_surface.commit with a buffer attached) ====

  * ''SurfaceInterface::bufferChanged'' fires → ''SurfaceItemWayland::handleBufferChanged'' (''src/scene/surfaceitem_wayland.cpp:103-106'') → ''setBuffer(...)''. KWin 6.6.4 negotiates ''wp_linux_drm_syncobj_v1'' explicit sync with Chromium-class clients, so the buffer commit goes through ''Transaction::watchSyncObj'' (''src/wayland/transaction.cpp:244-249''), NOT ''watchDmaBuf''. (Source: the kwin-fourier MR body, which reports zero ''DMA_BUF_IOCTL_EXPORT_SYNC_FILE'' calls over 60 s of playback.) The Fourier patches only touch ''watchDmaBuf'', so they are irrelevant to this path.
  * Damage region updated; per-frame compositor wakes.

==== Per-frame texture-update (the hot path) ====

''SurfaceItem::preprocess()'' (''src/scene/surfaceitem.cpp:187-208''): if ''m_texture'' exists and size matches, call ''m_texture->update(damageRegion)''; else call ''m_texture->create()''. Brave's video buffers are all the same size (1920×1080), so after the first frame the same ''OpenGLSurfaceTexture'' is re-used — **''update()'' is the steady-state path, not ''create()''**.

''OpenGLSurfaceTexture::updateDmabufTexture(buffer)'' (''src/scene/surfaceitem.cpp:472-501''):

<code cpp>
// for NV12 (s_drmConversions match in src/utils/drm_format_helper.h:35-44):
//   plane 0 = R8 full-size (Y), plane 1 = GR88 half-size (CbCr)
for (uint plane = 0; plane < itConv->plane.count(); ++plane) {
    ...
    m_texture.planes[plane]->bind();                                           // glBindTexture, cheap
    glEGLImageTargetTexture2DOES(GL_TEXTURE_2D,
        m_backend->importBufferAsImage(buffer, plane, currentPlane.format, size));  // *** suspect ***
    m_texture.planes[plane]->unbind();
}
</code>
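
For scale, the two NV12 planes named in the comment above work out as follows at 1920×1080. A minimal sketch, assuming tightly packed planes (real dmabufs carry a per-plane pitch that may be larger) and rounding up for odd dimensions; ''PlaneLayout'', ''nv12Plane'' and ''tightBytes'' are names invented for this page.

<code cpp>
#include <cassert>
#include <cstdint>

struct PlaneLayout {
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t bytesPerPixel;
};

// Plane 0 is full-size single-channel luma (R8); plane 1 is half-size in both
// dimensions with two channels per texel (GR88, interleaved Cb/Cr).
constexpr PlaneLayout nv12Plane(std::uint32_t w, std::uint32_t h, unsigned plane)
{
    return plane == 0 ? PlaneLayout{w, h, 1} : PlaneLayout{(w + 1) / 2, (h + 1) / 2, 2};
}

// Byte size of a plane with no row padding.
constexpr std::uint64_t tightBytes(PlaneLayout p)
{
    return static_cast<std::uint64_t>(p.width) * p.height * p.bytesPerPixel;
}

static_assert(tightBytes(nv12Plane(1920, 1080, 0)) == 2073600, "Y plane");
static_assert(tightBytes(nv12Plane(1920, 1080, 1)) == 1036800, "CbCr plane");
</code>

That is 3110400 bytes per frame across both planes, i.e. 1.5 bytes per pixel.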

''EglBackend::importBufferAsImage(buffer, plane, format, size)'' (''src/opengl/eglbackend.cpp:279-299''):

<code cpp>
std::pair key(buffer, plane);
auto it = m_importedBuffers.constFind(key);
if (Q_LIKELY(it != m_importedBuffers.constEnd())) {
    return *it;                                                                // CACHE HIT after warmup
}
// MISS: create fresh EGLImage from DmaBufAttributes via EglDisplay::importDmaBufAsImage
// — the genuinely expensive cold-path EGL_LINUX_DMA_BUF_EXT import
</code>

The cache key is ''(GraphicsBuffer *, plane_index)''. After warmup, every frame for every plane is a **cache hit on EGLImage** but **''glEGLImageTargetTexture2DOES'' is still called every frame to re-target the persistent ''GLTexture'' to whatever EGLImage corresponds to the current frame's buffer**.

For NV12 video, this re-target happens **2× per frame** (Y plane R8 + CbCr plane GR88). For RGBA single-surface (e.g. cage's fullscreen output, or any non-YUV client), it happens 1×.
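
The difference between "imports" and "re-targets" can be made concrete with a small counting model. This is a sketch, not KWin code: ''simulatePlayback'' and ''ImportStats'' are hypothetical, buffers are plain indices cycled round-robin, and the pool size is held fixed, which simplifies away any pool growth during playback.

<code cpp>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

struct ImportStats {
    std::size_t coldImports = 0; // cache misses: a fresh EGLImage would be created
    std::size_t cacheHits = 0;   // (buffer, plane) already imported
    std::size_t retargets = 0;   // one per plane per frame, hit or miss
};

// Counts what a (buffer, plane)-keyed image cache sees when a pool of
// poolSize buffers is presented round-robin for the given number of frames.
inline ImportStats simulatePlayback(std::size_t poolSize, std::size_t frames, std::size_t planes)
{
    ImportStats stats;
    if (poolSize == 0) {
        return stats;
    }
    std::map<std::pair<std::size_t, std::size_t>, int> imported; // value: fake image handle
    int nextHandle = 1;
    for (std::size_t frame = 0; frame < frames; ++frame) {
        const std::size_t buffer = frame % poolSize;
        for (std::size_t plane = 0; plane < planes; ++plane) {
            const auto key = std::make_pair(buffer, plane);
            if (imported.count(key) != 0) {
                ++stats.cacheHits;
            } else {
                imported.emplace(key, nextHandle++);
                ++stats.coldImports;
            }
            ++stats.retargets; // the texture is re-pointed regardless of cache state
        }
    }
    return stats;
}
</code>

For example, a pool of 6 NV12 buffers over 900 frames (30 s at 30 fps) gives 12 cold imports, 1788 cache hits, and 1800 re-targets in this model: the import cost is bounded by the pool, the re-target count scales with playback time.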

==== Per-frame rendering (also hot but well-understood) ====

''ItemRendererOpenGL::renderItem'' (''src/scene/itemrenderer_opengl.cpp:334''): standard quad render. ''vbo->bindArrays'', ''glActiveTexture(GL_TEXTURE0+i) + texture[i]->bind()'' per plane ('':473-474''), draw, unbind. No suspicious work; the texture binds here are plain ''glBindTexture'', not ''glEGLImageTargetTexture2DOES''.

----

===== File-level findings — Phase 2 reading list =====

  * [x] ''src/wayland/linuxdmabufv1clientbuffer.cpp'' — protocol-only. Creates ''LinuxDmaBufV1ClientBuffer'' once per wl_buffer ('':165, :216''). ''renderBackend->testImportBuffer'' validates at creation time ('':166, :217''). NO GL texture import here; that lives in the EglBackend / surface-texture code. Lifetime tied to wl_buffer; for Chromium's V4L2 capture pool this is N stable buffers reused round-robin.
  * [x] ''src/scene/surfaceitem_wayland.cpp'' — slot-driven: ''handleBufferChanged'' ('':103'') just stores the new buffer pointer and emits damage. Subsurface tree built in ''handleChildSubSurfacesChanged'' ('':142'') once per surface tree change — not per frame. **No per-frame slow path here.** The actual texture work is in ''surfaceitem.cpp::OpenGLSurfaceTexture::updateDmabufTexture''.
  * [x] ''src/scene/itemrenderer_opengl.cpp'' — ''renderItem'' ('':334-499'') does standard quad rendering with ''glBindTexture'' on the already-imported GLTextures. **Not the cost site.** Per-plane texture binds at '':473-474'' are plain ''glBindTexture''. No special-case for parent+subsurface vs single-surface.
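  * [x] ''src/scene/composite.cpp'' + scene scheduling — promotion predicate. Done in Phase 1; the code actually read for it is ''src/compositor.cpp'' and ''src/scene/workspacescene.cpp'', as cited in the Phase 1 answer above.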
  * [x] ''src/backends/drm/'' — DRM atomic plane-probe, format/modifier acceptance per output. Done in Phase 1 to the depth needed for the leading question.

----

===== Phase 2 hypothesis — concrete, file:line =====

> **SUPERSEDED 2026-05-02 by Phase 3 measurement.** Verdict: H1 rejected at N=1 across C0/C1/C2 + exploratory C3 stock-Brave. The symbol's self-time peaks at 0.15 % vs the 20 % threshold. See [[kwin_overlay_subsurface:phase3_findings|phase3_findings]]. The hypothesis text below is preserved as the "what we believed before measurement" record per the discipline rule (''feedback_phase_discipline.md''); do not edit it. New working hypothesis H1' (per-frame Wayland-protocol dispatch dominates) emerges from Phase 3 and gets its own Phase 2-prime source-read.

**Per-frame cost in KWin's parent + wp_subsurface composite path on Mali-G52 panfrost lives in ''OpenGLSurfaceTexture::updateDmabufTexture'' (''src/scene/surfaceitem.cpp:472-501''), specifically the ''glEGLImageTargetTexture2DOES'' call at line 490 (multi-plane YUV) / line 496 (single-plane).**

==== Mechanism ====

The ''EglBackend::m_importedBuffers'' cache (''src/opengl/eglbackend.h:116'', ''src/opengl/eglbackend.cpp:279-321'') does cache the EGLImage per ''(GraphicsBuffer *, plane)'', so after warmup the EGLImage lookup is a hash hit. But the EGLImage and GLTexture are decoupled: a single per-surface ''m_texture.planes[plane]'' GLTexture is **re-targeted to a different EGLImage every frame** via ''glEGLImageTargetTexture2DOES'', because ''OpenGLSurfaceTexture::updateDmabufTexture'' is unconditional — it calls the function on every ''update()'', regardless of whether the underlying EGLImage actually changed.

For Brave's V4L2 capture pool of N buffers cycling round-robin:

  * **Warmup (≤ ~30 s on ohm)**: each new ''GraphicsBuffer *'' that misses the cache → fresh ''EglDisplay::importDmaBufAsImage'' (kernel-side dmabuf-to-EGLImage import). 6-9 expensive first-imports correlate with the three drop-bursts in the A2 trajectory (''ohm_gl_fix/phase3_remeasure_2026-05-02/A2_brave_drops_findings.md'') at t ≈ 0–5 / 10–12 / 20–30 s. Pool grows in response to scene complexity (B-frame depth, motion-vector load), explaining the discrete bursts.
  * **Steady state (post-warmup)**: every frame pays 2× ''glEGLImageTargetTexture2DOES'' for NV12 (Y plane + CbCr plane). On panfrost, this rebind has non-trivial cost even when the (texture, image) pair is unchanged or when the new image was previously bound to the same texture in a recent frame.

cage's parity here is informative: cage composites a single fullscreen RGBA surface, so its ''OpenGLSurfaceTexture::updateDmabufTexture'' runs the **single-plane** branch ('':493-498'') — 1× rebind per frame. KWin direct on the same workload runs the **multi-plane** branch — 2× rebind per frame, plus the warmup re-import bursts that cage does not exhibit (cage's surfaces are GL-rendered framebuffers KWin imports once, not a V4L2-cycled video pool).

==== Predicted Phase 4 patch shape ====

Cache the GLTexture alongside the EGLImage in ''EglBackend::m_importedBuffers'', keyed by ''(GraphicsBuffer *, plane)''. On ''updateDmabufTexture'', look up the per-(buffer, plane) GLTexture and re-target the per-surface ''m_texture.planes[plane]'' to that GLTexture's name (or, more invasively, swap the GLTexture pointer entirely). Eliminates per-frame ''glEGLImageTargetTexture2DOES'' after warmup. Concrete edit site: ''src/scene/surfaceitem.cpp:472-501'' (''updateDmabufTexture'') plus the cache extension in ''src/opengl/eglbackend.cpp:279-321'' and its header.

The rebind pattern was introduced in the original NV12 Wayland dmabuf support (commit ''3568829216 opengl: Add support for NV12 on Wayland dmabufs'', pre-2024); no commit message documents a defensive rationale. The merge commit ''8c37d1926a'' (BasicEGLSurfaceTextureWayland → OpenGLSurfaceTexture) and refactor ''cf8ee656a9'' (move surface-texture business to scene/) preserved the pattern unchanged. Phase 5 patch description must explain the mechanism (''glEGLImageTargetTexture2DOES'' is idempotent for an unchanged image binding, and the buffer's //contents// change doesn't require a re-bind because the texture is already backed by the dmabuf via the EGLImage) — not just cite the symptom.

==== Phase 3 measurement that validates this hypothesis ====

''perf record -p $(pgrep kwin_wayland)'' during 70 s playback under the locked [[kwin_overlay_subsurface:phase1_lock|phase1_lock]] protocol. Expectation: hot symbols include ''glEGLImageTargetTexture2DOES'' (or its panfrost-side implementation, e.g. ''panvk_*'' / ''panfrost_resource_setup'') at a non-trivial fraction of ''kwin_wayland'' self-time during steady-state. If hot, hypothesis confirmed at the file:line. If cold (i.e. ''glEGLImageTargetTexture2DOES'' doesn't show up), the cost is elsewhere and Phase 2 must re-open. Cage perf record under the same workload provides the differential — cage should NOT show the same symbol at the same heat.

==== Caveats ====

  * The hypothesis is consistent with A2 trajectory (warmup bursts + steady-state CPU) but is **not yet validated by hot-path data**. Phase 3 perf record is the highest-value remaining measurement (per architect, see [[kwin_overlay_subsurface:phase0_findings|phase0_findings]]).
  * The rebind cost on panfrost specifically is asserted from first principles. If the panfrost implementation of ''glEGLImageTargetTexture2DOES'' happens to short-circuit identical re-binds, the steady-state cost is elsewhere (possibly in ''m_texture.planes[plane]->bind()'' GL state churn or further upstream in damage tracking).
  * ''OpenGLSurfaceContents'' (the ''m_texture'' field, surfaceitem.h:174) is per-''OpenGLSurfaceTexture'', not per-buffer. Caching the GLTexture per-buffer requires a different ownership model. This is a Phase 4 design decision, not a Phase 2 fact.