====== Phase 0: handover findings from ohm_gl_fix ======

This is the campaign's Phase 0 substrate: what we already know, what's been measured, what's open. All evidence comes from the [[ohm_gl_fix:start|ohm_gl_fix]] campaign (''~/src/ohm_gl_fix/''), which closed 2026-05-02 with the explicit re-scope into this campaign.

===== Decisive A/B that motivates this campaign =====

Source: ''ohm_gl_fix/phase3_remeasure_2026-05-02/task25_cage_vs_kwin_decisive.md''. Same Brave binary (Chromium 149 with ohm_gl_fix Step 1 + Step 2 patches), same hardware (PineTab2 RK3568 + Mali-G52 panfrost + Mesa 26.0.5), same clip (''bbb_1080p30_h264.mp4''), same qt6-base-fourier + kwin-fourier installed:

^ Compositor ^ Frames ^ Drops ^ drops_post_warmup ^ drops/total ^
| KWin (default Plasma session) | 1498 | 58 | 29 | 3.87 % |
| cage (nested kiosk inside KWin) | 1686 | 7 | 0 | 0.42 % |

The Phase 1r [[kwin_overlay_subsurface:metrics|metrics.csv]] binding cell ''drops_post_warmup == 0'' with sanity cap ''drops <= 10'' is **MET under cage and not met under KWin** on the same hardware with the same Brave binary.

===== What this isolates =====

cage runs //nested inside// KWin in this measurement (cage's output is a single fullscreen Wayland surface that KWin then composites). That means:

  * KWin's GL composite of //one regular surface//: cheap.
  * KWin's GL composite of //one parent + one subsurface//: expensive (51-drop delta in our 70 s window).

The bug is a **subsurface-specific cost path inside KWin**, not "KWin is slow on ARM" generally. That framing is critical for the Phase 5 bug report: the reception is very different for "your subsurface composite path has a regression" vs "your compositor is slow on ARM."
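The ''drops/total'' column, the 51-drop delta and the binding-cell verdict above are simple arithmetic on the table values. A minimal sketch that re-derives them follows; the ''Run'' class and its method names are illustrative assumptions and are not part of any ohm_gl_fix tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One row of the A/B table (values copied from the table above)."""
    compositor: str
    frames: int
    drops: int
    drops_post_warmup: int

    @property
    def drop_pct(self) -> float:
        # drops/total expressed as a percentage
        return 100.0 * self.drops / self.frames

    def meets_binding_cell(self, sanity_cap: int = 10) -> bool:
        # Binding cell: drops_post_warmup == 0, with sanity cap drops <= 10
        return self.drops_post_warmup == 0 and self.drops <= sanity_cap


RUNS = [
    Run("KWin", frames=1498, drops=58, drops_post_warmup=29),
    Run("cage", frames=1686, drops=7, drops_post_warmup=0),
]


def report(runs):
    """Return one human-readable summary line per run."""
    lines = []
    for r in runs:
        verdict = "MET" if r.meets_binding_cell() else "not met"
        lines.append(
            f"{r.compositor}: {r.drop_pct:.2f} % drops, binding cell {verdict}"
        )
    return lines


if __name__ == "__main__":
    for line in report(RUNS):
        print(line)
    print("drop delta:", RUNS[0].drops - RUNS[1].drops)
```

Keeping the gate as a method makes it easy to re-run the same check against any later row added to metrics.csv.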
===== What is NOT the cause (ruled out by ohm_gl_fix Phase 3 remeasure) =====

  * **Brave-side overlay route.** Per ''ohm_gl_fix/phase3_remeasure_2026-05-02/task23_per_frame_route.md'', Brave correctly engages ''WaylandBufferManagerHost::CommitOverlays'' + ''GbmSurfacelessWayland::QueueWaylandOverlayConfig'' on 87 of 88 Viz frames during steady-state playback. The ''zwp_linux_dmabuf_v1'' subsurface IS being committed correctly per frame. The Step 2 ohm_gl_fix patch did its job at the stage-1 gate level and the stage-2 OverlayCandidate filter accepts the candidates.
  * **Mali-G52 / panfrost as a hardware overlay scanout floor.** cage runs on the SAME panfrost + Mesa 26.0.5 stack and gets 7 drops. The architect's earlier hypothesis ("panfrost has no hardware overlay scanout, so KWin GL-composites the subsurface, and the cost is hardware-bounded") is partially refuted: cage GL-composites too, and gets 7 drops. The cost is in KWin-specific per-frame work for the parent+subsurface case.
  * **The KWin GL_ALPHA stall** (separate fourier issue). Already patched out via ''kwin-fourier 1:6.6.4-3''. Confirmed gone via ''journalctl --user -u plasma-kwin_wayland'': 0 GL_INVALID errors during both runs. The ''kwin-fourier'' patch is necessary but not sufficient: under the patched KWin we still get 58 drops on direct Brave.
  * **Step 1 libva-v4l2-request port.** Brave doesn't use libva (per ''ohm_gl_fix/phase3_remeasure_2026-05-02/B3_decoder_discovery.md'': Brave uses Chromium's own ''V4L2VideoDecoder'' in ''media/gpu/v4l2/'', opens ''/dev/video1'' directly, no libva libs in ''/proc/PID/maps''). Step 1 patches are off this campaign's critical path.

===== A2 trajectory pattern that hints at mechanism =====

''ohm_gl_fix/phase3_remeasure_2026-05-02/A2_brave_drops_findings.md'' shows that drops accumulate on KWin in three discrete burst events at t≈0-5 s, t≈10-12 s and t≈20-30 s, then hold steady at 0/sec from t≈30 s on. Cage settles to 0/sec from t≈4 s.
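
The burst-then-settle shape described above can be extracted mechanically from a per-second cumulative drop counter. The sketch below is one hedged way to do it: the function names are hypothetical, and ''SAMPLE'' is **synthetic** data shaped like the described pattern, not the measured A2 series.

```python
from itertools import accumulate


def drop_bursts(cumulative, max_gap=1):
    """Group seconds in which the drop counter rose into (start, end) bursts.

    cumulative[t] is the cumulative drop count at second t.
    Seconds separated by at most max_gap quiet seconds join the same burst.
    """
    bursts = []
    for t in range(1, len(cumulative)):
        if cumulative[t] > cumulative[t - 1]:
            if bursts and t - bursts[-1][1] <= max_gap + 1:
                bursts[-1] = (bursts[-1][0], t)  # extend the current burst
            else:
                bursts.append((t, t))  # start a new burst
    return bursts


def last_drop_second(cumulative):
    """Return the last second in which a drop occurred, or None if none did."""
    bursts = drop_bursts(cumulative)
    return bursts[-1][1] if bursts else None


# Synthetic per-second drop increments (NOT measured data): three bursts,
# then a quiet tail, mimicking the shape reported for KWin.
_increments = [0] * 40
for _t in range(1, 5):
    _increments[_t] = 2
for _t in range(10, 12):
    _increments[_t] = 3
for _t in range(20, 30):
    _increments[_t] = 1
SAMPLE = list(accumulate(_increments))

if __name__ == "__main__":
    print("bursts:", drop_bursts(SAMPLE))
    print("last drop at t =", last_drop_second(SAMPLE))
```

Run against the real KWin and cage counters, the number of bursts and the last drop second give a compact way to compare warm-up behaviour between the two compositors.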
The architect's 2026-05-02 second consultation flagged this trajectory as evidence that **KWin's EGL/dmabuf import takes a slow cold path the first time it sees a particular dmabuf fd**. As the V4L2 capture-buffer ring recycles its fds (the Phase 2 substrate of ohm_gl_fix found 9 distinct dmabuf fds from hantro), KWin warms up over the first ~30 seconds and then reaches steady state. cage sees the same fds and warms up in 4 seconds.

This points specifically at the **dmabuf-import-caching hypothesis** (KWin re-imports dmabufs as GL textures every frame, or on every fd cycle) and **away from the wp_drm_lease / direct-scanout hypothesis**. The Phase 2 source read should prioritise the import-caching path.

===== Per-process CPU split during steady-state =====

From ''ohm_gl_fix/phase3_remeasure_2026-05-02/A2_brave_drops_findings.md'':

^ Process ^ %CPU ^
| chrome gpu-process | 33.5 |
| chrome renderer | 5.9 |
| **kwin_wayland** | **20.5** |
| chrome browser | 8.0 |
| chrome audio service | 8.0 |
| **combined** | **~76** |

The same combined-CPU breakdown has not yet been captured under cage; that is the first Phase 3 measurement of this campaign (see [[kwin_overlay_subsurface:worklist|worklist]]). If ''kwin_wayland'' %CPU under cage is significantly lower than 20.5 %, the work IS being avoided (the compositor is genuinely doing less). If it is similar, the work moved laterally (maybe cage absorbs some of it before submitting one buffer to KWin), which would point at where the patch needs to land.

The architect's recommended highest-value remaining measurement is ''perf record -p $(pgrep kwin_wayland)'' during BOTH the cage and direct-Brave runs, which converts the hypothesis into hot-path evidence. That's a Phase 3 measurement of this campaign.

===== Software stack at handover time =====

  * **chromium-builder-x86 LXC on data**: holds the chromium 149 build with ohm_gl_fix Step 1 + Step 2 patches. Reach via ''boltzmann → ssh -J hertz root@data → pct exec 220 --''.
  * **chromium-builder LXD on boltzmann**: holds the qt6-base-fourier + kwin-fourier source/build context. Used during the qt6/kwin builds that were installed on ohm.
  * **ohm**: PineTab2 with current install state: Step 1 + Step 2 chrome at ''/tmp/chromium-ohm-gl-fix-step2/chrome'', qt6-base-fourier 1:6.11.0-3, kwin-fourier 1:6.6.4-3, cage 0.x, Mesa 26.0.5, kernel 6.19.10.

===== Architect's recommended Phase 2 reading list =====

Quoted from the second consultation:

> "Phase 2 / Phase 3 explicit work: read the relevant KWin source (''/usr/src'' if you have it, or the upstream tree at ''invent.kde.org/plasma/kwin'': ''src/scene/surfaceitem_wayland.cpp'', ''src/scene/itemrenderer_opengl.cpp'', ''src/wayland/linuxdmabufv1clientbuffer.cpp'', ''src/backends/drm/''). Determine whether the per-frame cost is dmabuf re-import to GL texture, full-frame GL composite, or missed scanout-promotion via the DRM atomic path. **Do not write any patch before that read is documented.** This is the discipline ohm_gl_fix lacked early."