| Next revision | Previous revision |
| kwin_overlay_subsurface:phase2_source_findings [2026/05/02 17:51] – Phase 1 deliverables — initial upload from kwin_overlay_subsurface dossier markus_fritsche | kwin_overlay_subsurface:phase2_source_findings [2026/05/02 20:16] (current) – SUPERSEDED block-quote at top of Phase 2 hypothesis section, pointing at phase3_findings (prior preserved verbatim) markus_fritsche |
|---|
| |
| ===== Phase 1 — leading question answer ===== | ===== Phase 1 — leading question answer ===== |
| | |
| | **Status: LOCKED 2026-05-02.** This section is the Phase 1 deliverable. Future Phase 2 / Phase 3 measurements may modify the file-level findings below this section but must not silently move the Phase 1 answer; if reality after measurement contradicts this, the contradiction is documented as a new section, not by editing this one. |
| |
| **Question** (from [[kwin_overlay_subsurface:worklist|worklist]]): | **Question** (from [[kwin_overlay_subsurface:worklist|worklist]]): |
| ==== Hardware: rockchip-drm plane format/modifier table on ohm ==== | ==== Hardware: rockchip-drm plane format/modifier table on ohm ==== |
| |
| Raw evidence: {{:kwin_overlay_subsurface:ohm_drm_info_2026-05-02.json|ohm_drm_info_2026-05-02.json}}, {{:kwin_overlay_subsurface:ohm_modetest_planes_2026-05-02.txt|ohm_modetest_planes_2026-05-02.txt}}. | Raw evidence: [[kwin_overlay_subsurface:evidence:ohm_drm_info_2026-05-02|ohm_drm_info_2026-05-02.json (inlined)]], [[kwin_overlay_subsurface:evidence:ohm_modetest_planes_2026-05-02|ohm_modetest_planes_2026-05-02.txt (inlined)]]. |
| |
| DRM driver: ''rockchip-drm'' (RockChip Soc DRM, 1.0.0). Active connector: DSI-1 (the PineTab2's internal panel), 800×1280 mode preferred. Two CRTCs visible (51 inactive, 52 active, fb=60). | DRM driver: ''rockchip-drm'' (RockChip Soc DRM, 1.0.0). Active connector: DSI-1 (the PineTab2's internal panel), 800×1280 mode preferred. Two CRTCs visible (51 inactive, 52 active, fb=60). |
| ---- | ---- |
| |
| ===== Architectural map — to be written during Phase 2 file-level read ===== | ===== Architectural map ===== |
| |
| (Stub. Phase 2 reading list per [[kwin_overlay_subsurface:worklist|worklist]]: ''src/wayland/linuxdmabufv1clientbuffer.cpp'', ''src/scene/surfaceitem_wayland.cpp'', ''src/scene/itemrenderer_opengl.cpp'', scanout entry points already read for Phase 1, ''src/backends/drm/''.) | End-to-end per-frame path for a Brave wp_subsurface presenting an NV12 LINEAR dmabuf at 1080p30 on the windowed parent (Brave UI), Plasma 6.6.4 + EglBackend + panfrost. |
| |
| Architectural inputs already established: | ==== Buffer ingress (one-time per wl_buffer) ==== |
| |
| * KWin 6.6.4 negotiates ''wp_linux_drm_syncobj_v1'' **explicit sync** with Chromium-class clients. Brave's dmabuf attaches go through ''Transaction::watchSyncObj'' (''src/wayland/transaction.cpp:244-249''), not ''Transaction::watchDmaBuf''. (Source: kwin-fourier MR body in ''~/src/marfrit-packages/upstream-submissions/kwin-fourier/kde-mr-body.md'', zero-EXPORT_SYNC_FILE strace finding over 60 s playback.) | * Brave commits ''wp_linux_buffer_params.create_immed'' per V4L2 capture buffer slot. KWin instantiates a ''LinuxDmaBufV1ClientBuffer'' (''src/wayland/linuxdmabufv1clientbuffer.cpp:216''), which IS-A ''GraphicsBuffer'' storing ''DmaBufAttributes'' (fd, offset, pitch per plane, modifier, format, width, height) — '':354-358''. |
| * Fourier patches ''0001'' and ''0002'' only touch ''watchDmaBuf'' — irrelevant to Phase 1's scanout-promotion path. | * ''RenderBackend::testImportBuffer(clientBuffer)'' ('':217'') validates that the rendering backend can import this dmabuf at all. Successful → ''wl_buffer.created'' sent, the wl_buffer enters Brave's reuse pool. **No GL texture exists yet.** |
| | * Lifetime: until Brave destroys the wl_buffer ('':339''). For Chromium with a V4L2 capture pool of N buffers, this means N stable ''LinuxDmaBufV1ClientBuffer'' / ''GraphicsBuffer*'' identities, reused round-robin across frames. |
| | |
| | ==== Per-attach (every wl_surface.commit with a buffer attached) ==== |
| | |
| | * ''SurfaceInterface::bufferChanged'' fires → ''SurfaceItemWayland::handleBufferChanged'' (''src/scene/surfaceitem_wayland.cpp:103-106'') → ''setBuffer(...)''. KWin 6.6.4 negotiates ''wp_linux_drm_syncobj_v1'' explicit sync with Chromium-class clients, so the buffer commit goes through ''Transaction::watchSyncObj'' (''src/wayland/transaction.cpp:244-249''), NOT ''watchDmaBuf''. (Source: kwin-fourier MR body, zero ''DMA_BUF_IOCTL_EXPORT_SYNC_FILE'' over 60 s playback.) Fourier patches only touch ''watchDmaBuf'' — confirmed irrelevant. |
| | * Damage region updated; per-frame compositor wakes. |
| | |
| | ==== Per-frame texture-update (the hot path) ==== |
| | |
| | ''SurfaceItem::preprocess()'' (''src/scene/surfaceitem.cpp:187-208''): if ''m_texture'' exists and size matches, call ''m_texture->update(damageRegion)''; else call ''m_texture->create()''. Brave's video buffers are all the same size (1920×1080), so after the first frame the same ''OpenGLSurfaceTexture'' is re-used — **''update()'' is the steady-state path, not ''create()''**. |
| | |
| | ''OpenGLSurfaceTexture::updateDmabufTexture(buffer)'' (''src/scene/surfaceitem.cpp:472-501''): |
| | |
| | <code cpp> |
| | // for NV12 (s_drmConversions match in src/utils/drm_format_helper.h:35-44): |
| | // plane 0 = R8 full-size (Y), plane 1 = GR88 half-size (CbCr) |
| | for (uint plane = 0; plane < itConv->plane.count(); ++plane) { |
| | ... |
| | m_texture.planes[plane]->bind(); // glBindTexture, cheap |
| | glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, |
| | m_backend->importBufferAsImage(buffer, plane, currentPlane.format, size)); // *** suspect *** |
| | m_texture.planes[plane]->unbind(); |
| | } |
| | </code> |
| | |
| | ''EglBackend::importBufferAsImage(buffer, plane, format, size)'' (''src/opengl/eglbackend.cpp:279-299''): |
| | |
| | <code cpp> |
| | std::pair key(buffer, plane); |
| | auto it = m_importedBuffers.constFind(key); |
| | if (Q_LIKELY(it != m_importedBuffers.constEnd())) { |
| | return *it; // CACHE HIT after warmup |
| | } |
| | // MISS: create fresh EGLImage from DmaBufAttributes via EglDisplay::importDmaBufAsImage |
| | // — the genuinely expensive cold-path EGL_LINUX_DMA_BUF_EXT import |
| | </code> |
| | |
| | The cache key is ''(GraphicsBuffer *, plane_index)''. After warmup, every frame for every plane is a **cache hit on EGLImage** but **''glEGLImageTargetTexture2DOES'' is still called every frame to re-target the persistent ''GLTexture'' to whatever EGLImage corresponds to the current frame's buffer**. |
| | |
| | For NV12 video, this re-target happens **2× per frame** (Y plane R8 + CbCr plane GR88). For RGBA single-surface (e.g. cage's fullscreen output, or any non-YUV client), it happens 1×. |
| | |
| | ==== Per-frame rendering (also hot but well-understood) ==== |
| | |
| | ''ItemRendererOpenGL::renderItem'' (''src/scene/itemrenderer_opengl.cpp:334''): standard quad render. ''vbo->bindArrays'', ''glActiveTexture(GL_TEXTURE0+i) + texture[i]->bind()'' per plane ('':473-474''), draw, unbind. No suspicious work; the texture binds here are plain ''glBindTexture'', not ''glEGLImageTargetTexture2DOES''. |
| |
| ---- | ---- |
| ===== File-level findings — Phase 2 reading list ===== | ===== File-level findings — Phase 2 reading list ===== |
| |
| (To be filled in during Phase 2 source-read.) | * [x] ''src/wayland/linuxdmabufv1clientbuffer.cpp'' — protocol-only. Creates ''LinuxDmaBufV1ClientBuffer'' once per wl_buffer ('':165, :216''). ''renderBackend->testImportBuffer'' validates at creation time ('':166, :217''). NO GL texture import here; that lives in the EglBackend / surface-texture code. Lifetime tied to wl_buffer; for Chromium's V4L2 capture pool this is N stable buffers reused round-robin. |
| | * [x] ''src/scene/surfaceitem_wayland.cpp'' — slot-driven: ''handleBufferChanged'' ('':103'') just stores the new buffer pointer and emits damage. Subsurface tree built in ''handleChildSubSurfacesChanged'' ('':142'') once per surface tree change — not per frame. **No per-frame slow path here.** The actual texture work is in ''surfaceitem.cpp::OpenGLSurfaceTexture::updateDmabufTexture''. |
| * [ ] ''src/wayland/linuxdmabufv1clientbuffer.cpp'' — dmabuf import, fd → GL texture lifetime, cache hit/miss surface. | * [x] ''src/scene/itemrenderer_opengl.cpp'' — ''renderItem'' ('':334-499'') does standard quad rendering with ''glBindTexture'' on the already-imported GLTextures. **Not the cost site.** Per-plane texture binds at '':473-474'' are plain ''glBindTexture''. No special-case for parent+subsurface vs single-surface. |
| * [ ] ''src/scene/surfaceitem_wayland.cpp'' — per-subsurface state, slow paths on first sight or fd cycle. | |
| * [ ] ''src/scene/itemrenderer_opengl.cpp'' — parent+subsurface render path special cases. | |
| * [x] ''src/scene/composite.cpp'' + scene scheduling — promotion predicate. Done in Phase 1. | * [x] ''src/scene/composite.cpp'' + scene scheduling — promotion predicate. Done in Phase 1. |
| * [x] ''src/backends/drm/'' — DRM atomic plane-probe, format/modifier acceptance per output. Done in Phase 1 to the depth needed for the leading question. | * [x] ''src/backends/drm/'' — DRM atomic plane-probe, format/modifier acceptance per output. Done in Phase 1 to the depth needed for the leading question. |
| | |
| | ---- |
| | |
| | ===== Phase 2 hypothesis — concrete, file:line ===== |
| | |
| | > **SUPERSEDED 2026-05-02 by Phase 3 measurement.** Verdict: H1 rejected at N=1 across C0/C1/C2 + exploratory C3 stock-Brave. The symbol's self-time peaks at 0.15 % vs the 20 % threshold. See [[kwin_overlay_subsurface:phase3_findings|phase3_findings]]. The hypothesis text below is preserved as the "what we believed before measurement" record per the discipline rule (''feedback_phase_discipline.md''); do not edit it. New working hypothesis H1' (per-frame Wayland-protocol dispatch dominates) emerges from Phase 3 and gets its own Phase 2-prime source-read. |
| | |
| | **Per-frame cost in KWin's parent + wp_subsurface composite path on Mali-G52 panfrost lives in ''OpenGLSurfaceTexture::updateDmabufTexture'' (''src/scene/surfaceitem.cpp:472-501''), specifically the ''glEGLImageTargetTexture2DOES'' call at line 490 (multi-plane YUV) / line 496 (single-plane).** |
| | |
| | ==== Mechanism ==== |
| | |
| | The ''EglBackend::m_importedBuffers'' cache (''src/opengl/eglbackend.h:116'', ''src/opengl/eglbackend.cpp:279-321'') does cache the EGLImage per ''(GraphicsBuffer *, plane)'', so after warmup the EGLImage lookup is a hash hit. But the EGLImage and GLTexture are decoupled: a single per-surface ''m_texture.planes[plane]'' GLTexture is **re-targeted to a different EGLImage every frame** via ''glEGLImageTargetTexture2DOES'', because ''OpenGLSurfaceTexture::updateDmabufTexture'' is unconditional — it calls the function on every ''update()'', regardless of whether the underlying EGLImage actually changed. |
| | |
| | For Brave's V4L2 capture pool of N buffers cycling round-robin: |
| | |
| | * **Warmup (≤ ~30 s on ohm)**: each new GraphicsBuffer\* miss in the cache → fresh ''EglDisplay::importDmaBufAsImage'' (kernel-side dmabuf-to-EGLImage import). 6-9 expensive first-imports correlate with the three drop-bursts in the A2 trajectory (''ohm_gl_fix/phase3_remeasure_2026-05-02/A2_brave_drops_findings.md'') at t ≈ 0–5 / 10–12 / 20–30 s. Pool grows in response to scene complexity (B-frame depth, motion-vector load), explaining the discrete bursts. |
| | * **Steady state (post-warmup)**: every frame pays 2× ''glEGLImageTargetTexture2DOES'' for NV12 (Y plane + CbCr plane). On panfrost, this rebind has non-trivial cost even when the (texture, image) pair is unchanged or when the new image was previously bound to the same texture in a recent frame. |
| | |
| | cage's parity here is informative: cage composites a single fullscreen RGBA surface, so its ''OpenGLSurfaceTexture::updateDmabufTexture'' runs the **single-plane** branch ('':493-498'') — 1× rebind per frame. KWin direct on the same workload runs the **multi-plane** branch — 2× rebind per frame, plus the warmup re-import bursts that cage does not exhibit (cage's surfaces are GL-rendered framebuffers KWin imports once, not a V4L2-cycled video pool). |
| | |
| | ==== Predicted Phase 4 patch shape ==== |
| | |
| | Cache the GLTexture alongside the EGLImage in ''EglBackend::m_importedBuffers'', keyed by ''(GraphicsBuffer *, plane)''. On ''updateDmabufTexture'', look up the per-(buffer, plane) GLTexture and re-target the per-surface ''m_texture.planes[plane]'' to that GLTexture's name (or, more invasively, swap the GLTexture pointer entirely). Eliminates per-frame ''glEGLImageTargetTexture2DOES'' after warmup. Concrete edit site: ''src/scene/surfaceitem.cpp:472-501'' (''updateDmabufTexture'') plus the cache extension in ''src/opengl/eglbackend.cpp:279-321'' and its header. |
| | |
| | The rebind pattern was introduced in the original NV12 Wayland dmabuf support (commit ''3568829216 opengl: Add support for NV12 on Wayland dmabufs'', pre-2024); no commit message documents a defensive rationale. The merge commit ''8c37d1926a'' (BasicEGLSurfaceTextureWayland → OpenGLSurfaceTexture) and refactor ''cf8ee656a9'' (move surface-texture business to scene/) preserved the pattern unchanged. Phase 5 patch description must explain the mechanism (''glEGLImageTargetTexture2DOES'' is idempotent for an unchanged image binding, and the buffer's //contents// change doesn't require a re-bind because the texture is already backed by the dmabuf via the EGLImage) — not just cite the symptom. |
| | |
| | ==== Phase 3 measurement that validates this hypothesis ==== |
| | |
| | ''perf record -p $(pgrep kwin_wayland)'' during 70 s playback under the locked [[kwin_overlay_subsurface:phase1_lock|phase1_lock]] protocol. Expectation: hot symbols include ''glEGLImageTargetTexture2DOES'' (or its panfrost-side implementation, e.g. ''panvk_*'' / ''panfrost_resource_setup'') at a non-trivial fraction of ''kwin_wayland'' self-time during steady-state. If hot, hypothesis confirmed at the file:line. If cold (i.e. ''glEGLImageTargetTexture2DOES'' doesn't show up), the cost is elsewhere and Phase 2 must re-open. Cage perf record under the same workload provides the differential — cage should NOT show the same symbol at the same heat. |
| | |
| | ==== Caveats ==== |
| | |
| | * The hypothesis is consistent with A2 trajectory (warmup bursts + steady-state CPU) but is **not yet validated by hot-path data**. Phase 3 perf record is the highest-value remaining measurement (per architect, see [[kwin_overlay_subsurface:phase0_findings|phase0_findings]]). |
| | * The rebind cost on panfrost specifically is asserted from first principles. If the panfrost implementation of ''glEGLImageTargetTexture2DOES'' happens to short-circuit identical re-binds, the steady-state cost is elsewhere (possibly in ''m_texture.planes[plane]->bind()'' GL state churn or further upstream in damage tracking). |
| | * ''OpenGLSurfaceContents'' (the ''m_texture'' field, surfaceitem.h:174) is per-''OpenGLSurfaceTexture'', not per-buffer. Caching the GLTexture per-buffer requires a different ownership model. This is a Phase 4 design decision, not a Phase 2 fact. |
| |