User Tools

Site Tools


kwin_overlay_subsurface:phase2_source_findings

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Next revision
Previous revision
kwin_overlay_subsurface:phase2_source_findings [2026/05/02 17:51] – Phase 1 deliverables — initial upload from kwin_overlay_subsurface dossier markus_fritschekwin_overlay_subsurface:phase2_source_findings [2026/05/02 20:16] (current) – SUPERSEDED block-quote at top of Phase 2 hypothesis section, pointing at phase3_findings (prior preserved verbatim) markus_fritsche
Line 8: Line 8:
  
 ===== Phase 1 — leading question answer ===== ===== Phase 1 — leading question answer =====
 +
 +**Status: LOCKED 2026-05-02.** This section is the Phase 1 deliverable. Future Phase 2 / Phase 3 measurements may modify the file-level findings below this section but must not silently move the Phase 1 answer; if reality after measurement contradicts this, the contradiction is documented as a new section, not by editing this one.
  
 **Question** (from [[kwin_overlay_subsurface:worklist|worklist]]): **Question** (from [[kwin_overlay_subsurface:worklist|worklist]]):
Line 71: Line 73:
 ==== Hardware: rockchip-drm plane format/modifier table on ohm ==== ==== Hardware: rockchip-drm plane format/modifier table on ohm ====
  
-Raw evidence: {{:kwin_overlay_subsurface:ohm_drm_info_2026-05-02.json|ohm_drm_info_2026-05-02.json}}{{:kwin_overlay_subsurface:ohm_modetest_planes_2026-05-02.txt|ohm_modetest_planes_2026-05-02.txt}}.+Raw evidence: [[kwin_overlay_subsurface:evidence:ohm_drm_info_2026-05-02|ohm_drm_info_2026-05-02.json (inlined)]][[kwin_overlay_subsurface:evidence:ohm_modetest_planes_2026-05-02|ohm_modetest_planes_2026-05-02.txt (inlined)]].
  
 DRM driver: ''rockchip-drm'' (RockChip Soc DRM, 1.0.0). Active connector: DSI-1 (the PineTab2's internal panel), 800×1280 mode preferred. Two CRTCs visible (51 inactive, 52 active, fb=60). DRM driver: ''rockchip-drm'' (RockChip Soc DRM, 1.0.0). Active connector: DSI-1 (the PineTab2's internal panel), 800×1280 mode preferred. Two CRTCs visible (51 inactive, 52 active, fb=60).
Line 146: Line 148:
 ---- ----
  
-===== Architectural map — to be written during Phase 2 file-level read =====+===== Architectural map =====
  
-(Stub. Phase 2 reading list per [[kwin_overlay_subsurface:worklist|worklist]]: ''src/wayland/linuxdmabufv1clientbuffer.cpp''''src/scene/surfaceitem_wayland.cpp'', ''src/scene/itemrenderer_opengl.cpp'', scanout entry points already read for Phase 1, ''src/backends/drm/''.)+End-to-end per-frame path for a Brave wp_subsurface presenting an NV12 LINEAR dmabuf at 1080p30 on the windowed parent (Brave UI)Plasma 6.6.4 + EglBackend + panfrost.
  
-Architectural inputs already established:+==== Buffer ingress (one-time per wl_buffer) ====
  
-  * KWin 6.6.4 negotiates ''wp_linux_drm_syncobj_v1'' **explicit sync** with Chromium-class clients. Brave's dmabuf attaches go through ''Transaction::watchSyncObj'' (''src/wayland/transaction.cpp:244-249''), not ''Transaction::watchDmaBuf''. (Source: kwin-fourier MR body in ''~/src/marfrit-packages/upstream-submissions/kwin-fourier/kde-mr-body.md'', zero-EXPORT_SYNC_FILE strace finding over 60 s playback.) +  * Brave commits ''wp_linux_buffer_params.create_immed'' per V4L2 capture buffer slot. KWin instantiates a ''LinuxDmaBufV1ClientBuffer'' (''src/wayland/linuxdmabufv1clientbuffer.cpp:216''), which IS-A ''GraphicsBuffer'' storing ''DmaBufAttributes'' (fd, offset, pitch per plane, modifier, format, width, height) — '':354-358''
-  Fourier patches ''0001'' and ''0002'' only touch ''watchDmaBuf'' — irrelevant to Phase 1'scanout-promotion path.+  * ''RenderBackend::testImportBuffer(clientBuffer)'' ('':217'') validates that the rendering backend can import this dmabuf at all. Successful → ''wl_buffer.created'' sent, the wl_buffer enters Brave's reuse pool. **No GL texture exists yet.** 
 +  * Lifetime: until Brave destroys the wl_buffer ('':339''). For Chromium with a V4L2 capture pool of N buffers, this means N stable ''LinuxDmaBufV1ClientBuffer'' / ''GraphicsBuffer*'' identities, reused round-robin across frames. 
 + 
 +==== Per-attach (every wl_surface.commit with a buffer attached) ==== 
 + 
 +  * ''SurfaceInterface::bufferChanged'' fires → ''SurfaceItemWayland::handleBufferChanged'' (''src/scene/surfaceitem_wayland.cpp:103-106'') → ''setBuffer(...)''KWin 6.6.4 negotiates ''wp_linux_drm_syncobj_v1'' explicit sync with Chromium-class clients, so the buffer commit goes through ''Transaction::watchSyncObj'' (''src/wayland/transaction.cpp:244-249''), NOT ''watchDmaBuf''. (Source: kwin-fourier MR body, zero ''DMA_BUF_IOCTL_EXPORT_SYNC_FILE'' over 60 s playback.) Fourier patches only touch ''watchDmaBuf'' — confirmed irrelevant. 
 +  * Damage region updated; per-frame compositor wakes. 
 + 
 +==== Per-frame texture-update (the hot path) ==== 
 + 
 +''SurfaceItem::preprocess()'' (''src/scene/surfaceitem.cpp:187-208''): if ''m_texture'' exists and size matches, call ''m_texture->update(damageRegion)''; else call ''m_texture->create()''. Brave's video buffers are all the same size (1920×1080), so after the first frame the same ''OpenGLSurfaceTexture'' is re-used — **''update()'' is the steady-state path, not ''create()''**. 
 + 
 +''OpenGLSurfaceTexture::updateDmabufTexture(buffer)'' (''src/scene/surfaceitem.cpp:472-501''): 
 + 
 +<code cpp> 
 +// for NV12 (s_drmConversions match in src/utils/drm_format_helper.h:35-44): 
 +//   plane 0 = R8 full-size (Y), plane 1 = GR88 half-size (CbCr) 
 +for (uint plane = 0; plane < itConv->plane.count(); ++plane) { 
 +    ... 
 +    m_texture.planes[plane]->bind();                                           // glBindTexture, cheap 
 +    glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, 
 +        m_backend->importBufferAsImage(buffer, plane, currentPlane.format, size));  // *** suspect *** 
 +    m_texture.planes[plane]->unbind(); 
 +
 +</code> 
 + 
 +''EglBackend::importBufferAsImage(bufferplane, format, size)'' (''src/opengl/eglbackend.cpp:279-299''): 
 + 
 +<code cpp> 
 +std::pair key(buffer, plane); 
 +auto it = m_importedBuffers.constFind(key); 
 +if (Q_LIKELY(it != m_importedBuffers.constEnd())) { 
 +    return *it;                                                                // CACHE HIT after warmup 
 +
 +// MISS: create fresh EGLImage from DmaBufAttributes via EglDisplay::importDmaBufAsImage 
 +// — the genuinely expensive cold-path EGL_LINUX_DMA_BUF_EXT import 
 +</code> 
 + 
 +The cache key is ''(GraphicsBuffer *, plane_index)''. After warmup, every frame for every plane is a **cache hit on EGLImage** but **''glEGLImageTargetTexture2DOES'' is still called every frame to re-target the persistent ''GLTexture'' to whatever EGLImage corresponds to the current frame'buffer**. 
 + 
 +For NV12 video, this re-target happens **2× per frame** (Y plane R8 + CbCr plane GR88). For RGBA single-surface (e.g. cage's fullscreen output, or any non-YUV client), it happens 1×. 
 + 
 +==== Per-frame rendering (also hot but well-understood) ==== 
 + 
 +''ItemRendererOpenGL::renderItem'' (''src/scene/itemrenderer_opengl.cpp:334''): standard quad render. ''vbo->bindArrays'', ''glActiveTexture(GL_TEXTURE0+i) + texture[i]->bind()'' per plane ('':473-474''), draw, unbind. No suspicious work; the texture binds here are plain ''glBindTexture'', not ''glEGLImageTargetTexture2DOES''.
  
 ---- ----
Line 159: Line 205:
 ===== File-level findings — Phase 2 reading list ===== ===== File-level findings — Phase 2 reading list =====
  
-(To be filled in during Phase 2 source-read.) +  * [x] ''src/wayland/linuxdmabufv1clientbuffer.cpp'' — protocol-only. Creates ''LinuxDmaBufV1ClientBuffer'' once per wl_buffer ('':165:216''). ''renderBackend->testImportBuffer'' validates at creation time ('':166, :217''). NO GL texture import here; that lives in the EglBackend / surface-texture code. Lifetime tied to wl_buffer; for Chromium's V4L2 capture pool this is N stable buffers reused round-robin
- +  * [x] ''src/scene/surfaceitem_wayland.cpp'' — slot-driven: ''handleBufferChanged'' ('':103'') just stores the new buffer pointer and emits damage. Subsurface tree built in ''handleChildSubSurfacesChanged'' ('':142'') once per surface tree change — not per frame. **No per-frame slow path here.** The actual texture work is in ''surfaceitem.cpp::OpenGLSurfaceTexture::updateDmabufTexture''
-  * [ ] ''src/wayland/linuxdmabufv1clientbuffer.cpp'' — dmabuf importfd → GL texture lifetime, cache hit/miss surface. +  * [x] ''src/scene/itemrenderer_opengl.cpp'' — ''renderItem'' ('':334-499'') does standard quad rendering with ''glBindTexture'' on the already-imported GLTextures. **Not the cost site.** Per-plane texture binds at '':473-474'' are plain ''glBindTexture''. No special-case for parent+subsurface vs single-surface.
-  * [ ] ''src/scene/surfaceitem_wayland.cpp'' — per-subsurface state, slow paths on first sight or fd cycle+
-  * [ ] ''src/scene/itemrenderer_opengl.cpp'' — parent+subsurface render path special cases.+
   * [x] ''src/scene/composite.cpp'' + scene scheduling — promotion predicate. Done in Phase 1.   * [x] ''src/scene/composite.cpp'' + scene scheduling — promotion predicate. Done in Phase 1.
   * [x] ''src/backends/drm/'' — DRM atomic plane-probe, format/modifier acceptance per output. Done in Phase 1 to the depth needed for the leading question.   * [x] ''src/backends/drm/'' — DRM atomic plane-probe, format/modifier acceptance per output. Done in Phase 1 to the depth needed for the leading question.
 +
 +----
 +
 +===== Phase 2 hypothesis — concrete, file:line =====
 +
 +> **SUPERSEDED 2026-05-02 by Phase 3 measurement.** Verdict: H1 rejected at N=1 across C0/C1/C2 + exploratory C3 stock-Brave. The symbol's self-time peaks at 0.15 % vs the 20 % threshold. See [[kwin_overlay_subsurface:phase3_findings|phase3_findings]]. The hypothesis text below is preserved as the "what we believed before measurement" record per the discipline rule (''feedback_phase_discipline.md''); do not edit it. New working hypothesis H1' (per-frame Wayland-protocol dispatch dominates) emerges from Phase 3 and gets its own Phase 2-prime source-read.
 +
 +**Per-frame cost in KWin's parent + wp_subsurface composite path on Mali-G52 panfrost lives in ''OpenGLSurfaceTexture::updateDmabufTexture'' (''src/scene/surfaceitem.cpp:472-501''), specifically the ''glEGLImageTargetTexture2DOES'' call at line 490 (multi-plane YUV) / line 496 (single-plane).**
 +
 +==== Mechanism ====
 +
 +The ''EglBackend::m_importedBuffers'' cache (''src/opengl/eglbackend.h:116'', ''src/opengl/eglbackend.cpp:279-321'') does cache the EGLImage per ''(GraphicsBuffer *, plane)'', so after warmup the EGLImage lookup is a hash hit. But the EGLImage and GLTexture are decoupled: a single per-surface ''m_texture.planes[plane]'' GLTexture is **re-targeted to a different EGLImage every frame** via ''glEGLImageTargetTexture2DOES'', because ''OpenGLSurfaceTexture::updateDmabufTexture'' is unconditional — it calls the function on every ''update()'', regardless of whether the underlying EGLImage actually changed.
 +
 +For Brave's V4L2 capture pool of N buffers cycling round-robin:
 +
 +  * **Warmup (≤ ~30 s on ohm)**: each new GraphicsBuffer\* miss in the cache → fresh ''EglDisplay::importDmaBufAsImage'' (kernel-side dmabuf-to-EGLImage import). 6-9 expensive first-imports correlate with the three drop-bursts in the A2 trajectory (''ohm_gl_fix/phase3_remeasure_2026-05-02/A2_brave_drops_findings.md'') at t ≈ 0–5 / 10–12 / 20–30 s. Pool grows in response to scene complexity (B-frame depth, motion-vector load), explaining the discrete bursts.
 +  * **Steady state (post-warmup)**: every frame pays 2× ''glEGLImageTargetTexture2DOES'' for NV12 (Y plane + CbCr plane). On panfrost, this rebind has non-trivial cost even when the (texture, image) pair is unchanged or when the new image was previously bound to the same texture in a recent frame.
 +
 +cage's parity here is informative: cage composites a single fullscreen RGBA surface, so its ''OpenGLSurfaceTexture::updateDmabufTexture'' runs the **single-plane** branch ('':493-498'') — 1× rebind per frame. KWin direct on the same workload runs the **multi-plane** branch — 2× rebind per frame, plus the warmup re-import bursts that cage does not exhibit (cage's surfaces are GL-rendered framebuffers KWin imports once, not a V4L2-cycled video pool).
 +
 +==== Predicted Phase 4 patch shape ====
 +
 +Cache the GLTexture alongside the EGLImage in ''EglBackend::m_importedBuffers'', keyed by ''(GraphicsBuffer *, plane)''. On ''updateDmabufTexture'', look up the per-(buffer, plane) GLTexture and re-target the per-surface ''m_texture.planes[plane]'' to that GLTexture's name (or, more invasively, swap the GLTexture pointer entirely). Eliminates per-frame ''glEGLImageTargetTexture2DOES'' after warmup. Concrete edit site: ''src/scene/surfaceitem.cpp:472-501'' (''updateDmabufTexture'') plus the cache extension in ''src/opengl/eglbackend.cpp:279-321'' and its header.
 +
 +The rebind pattern was introduced in the original NV12 Wayland dmabuf support (commit ''3568829216 opengl: Add support for NV12 on Wayland dmabufs'', pre-2024); no commit message documents a defensive rationale. The merge commit ''8c37d1926a'' (BasicEGLSurfaceTextureWayland → OpenGLSurfaceTexture) and refactor ''cf8ee656a9'' (move surface-texture business to scene/) preserved the pattern unchanged. Phase 5 patch description must explain the mechanism (''glEGLImageTargetTexture2DOES'' is idempotent for an unchanged image binding, and the buffer's //contents// change doesn't require a re-bind because the texture is already backed by the dmabuf via the EGLImage) — not just cite the symptom.
 +
 +==== Phase 3 measurement that validates this hypothesis ====
 +
 +''perf record -p $(pgrep kwin_wayland)'' during 70 s playback under the locked [[kwin_overlay_subsurface:phase1_lock|phase1_lock]] protocol. Expectation: hot symbols include ''glEGLImageTargetTexture2DOES'' (or its panfrost-side implementation, e.g. ''panvk_*'' / ''panfrost_resource_setup'') at a non-trivial fraction of ''kwin_wayland'' self-time during steady-state. If hot, hypothesis confirmed at the file:line. If cold (i.e. ''glEGLImageTargetTexture2DOES'' doesn't show up), the cost is elsewhere and Phase 2 must re-open. Cage perf record under the same workload provides the differential — cage should NOT show the same symbol at the same heat.
 +
 +==== Caveats ====
 +
 +  * The hypothesis is consistent with A2 trajectory (warmup bursts + steady-state CPU) but is **not yet validated by hot-path data**. Phase 3 perf record is the highest-value remaining measurement (per architect, see [[kwin_overlay_subsurface:phase0_findings|phase0_findings]]).
 +  * The rebind cost on panfrost specifically is asserted from first principles. If the panfrost implementation of ''glEGLImageTargetTexture2DOES'' happens to short-circuit identical re-binds, the steady-state cost is elsewhere (possibly in ''m_texture.planes[plane]->bind()'' GL state churn or further upstream in damage tracking).
 +  * ''OpenGLSurfaceContents'' (the ''m_texture'' field, surfaceitem.h:174) is per-''OpenGLSurfaceTexture'', not per-buffer. Caching the GLTexture per-buffer requires a different ownership model. This is a Phase 4 design decision, not a Phase 2 fact.
  
kwin_overlay_subsurface/phase2_source_findings.1777744279.txt.gz · Last modified: by markus_fritsche