Phase 0 — handover findings from ohm_gl_fix

This is the campaign's Phase 0 substrate: what we already know, what's been measured, what's open. All evidence comes from the ohm_gl_fix campaign (~/src/ohm_gl_fix/), which closed on 2026-05-02 and was explicitly re-scoped into this campaign.

Decisive A/B that motivates this campaign

Source: ohm_gl_fix/phase3_remeasure_2026-05-02/task25_cage_vs_kwin_decisive.md. Both runs used the same Brave binary (Chromium 149 with ohm_gl_fix Step 1 + Step 2 patches), the same hardware (PineTab2 RK3568 + Mali-G52 panfrost + Mesa 26.0.5), the same clip (bbb_1080p30_h264.mp4), and the same installed qt6-base-fourier + kwin-fourier:

Compositor	Frames	Drops	drops_post_warmup	drops/total
KWin (default Plasma session)	1498	58	29	3.87 %
cage (nested kiosk inside KWin)	1686	7	0	0.42 %

The Phase 1r metrics.csv binding cell (drops_post_warmup == 0, with sanity cap drops ≤ 10) is MET under cage and not met under KWin, on the same hardware with the same Brave binary.
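The binding-cell check can be written out as a small sketch. The Run type and the function names are hypothetical and only the four table values come from the measurement; the real metrics.csv layout is not shown here, so nothing below assumes it.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One row of the A/B table above (hypothetical container type)."""
    compositor: str
    frames: int
    drops: int
    drops_post_warmup: int


def binding_cell_met(run: Run, sanity_cap: int = 10) -> bool:
    """Binding cell: no post-warmup drops, and total drops within the sanity cap."""
    return run.drops_post_warmup == 0 and run.drops <= sanity_cap


def drop_ratio_pct(run: Run) -> float:
    """Total drops as a percentage of total frames, rounded to two decimals."""
    return round(100.0 * run.drops / run.frames, 2)


# Values copied from the table above.
RUNS = [
    Run("KWin", frames=1498, drops=58, drops_post_warmup=29),
    Run("cage", frames=1686, drops=7, drops_post_warmup=0),
]

for run in RUNS:
    verdict = "MET" if binding_cell_met(run) else "not met"
    print(f"{run.compositor}: {drop_ratio_pct(run)} % drops, binding cell {verdict}")
```

Both conditions are needed: the sanity cap alone would not distinguish a run with a clean warm-up from one that keeps dropping at a low rate after it.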

What this isolates

cage runs nested inside KWin in this measurement (cage's output is a single fullscreen wayland surface that KWin then composites). That means:

KWin's GL composite of one regular surface: cheap.
KWin's GL composite of one parent + one subsurface: expensive (51-drop delta in our 70 s window).

The bug is a subsurface-specific cost path inside KWin, not “KWin is slow on ARM” generally. That framing is critical for the Phase 5 bug report: the reception is very different for “your subsurface composite path has a regression” vs “your compositor is slow on ARM.”

What is NOT the cause (ruled out by ohm_gl_fix Phase 3 remeasure)

Brave-side overlay route. Per ohm_gl_fix/phase3_remeasure_2026-05-02/task23_per_frame_route.md, Brave correctly engages WaylandBufferManagerHost::CommitOverlays + GbmSurfacelessWayland::QueueWaylandOverlayConfig 87 / 88 Viz frames during steady-state playback. The zwp_linux_dmabuf_v1 subsurface IS being committed correctly per frame. The Step 2 ohm_gl_fix patch did its job at the stage-1 gate level and the stage-2 OverlayCandidate filter accepts the candidates.
Mali-G52 / panfrost as a hardware overlay scanout floor. cage runs on the SAME panfrost + Mesa 26.0.5 stack and gets 7 drops. The architect's earlier hypothesis (“panfrost has no hardware overlay scanout, so KWin GL-composites the subsurface, and the cost is hardware-bounded”) is partially refuted — cage GL-composites too, and gets 7 drops. The cost is in KWin-specific per-frame work for the parent+subsurface case.
The KWin GL_ALPHA stall (separate fourier issue). Already patched out via kwin-fourier 1:6.6.4-3. Confirmed gone via journalctl --user -u plasma-kwin_wayland: 0 GL_INVALID errors during both runs. The kwin-fourier patch is necessary but not sufficient: under the patched KWin we still get 58 drops on direct Brave.
Step 1 libva-v4l2-request port. Brave doesn't use libva (per ohm_gl_fix/phase3_remeasure_2026-05-02/B3_decoder_discovery.md: Brave uses Chromium's own V4L2VideoDecoder in media/gpu/v4l2/, opens /dev/video1 directly, no libva libs in /proc/PID/maps). Step 1 patches are off this campaign's critical path.
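The "no libva libs in /proc/PID/maps" check from the last item is easy to repeat. A minimal sketch, assuming the standard six-column /proc/PID/maps line layout; the helper names are hypothetical and the PID has to be supplied by the caller.

```python
from pathlib import Path


def mapped_libs(maps_text: str, needle: str = "libva") -> list[str]:
    """Return the sorted, unique mapped file paths whose basename contains `needle`."""
    hits = set()
    for line in maps_text.splitlines():
        # Columns: address, perms, offset, dev, inode, pathname (pathname may be absent).
        parts = line.split(None, 5)
        if len(parts) == 6:
            path = parts[5].strip()
            if needle in Path(path).name:
                hits.add(path)
    return sorted(hits)


def process_maps(pid: int) -> str:
    """Read the memory map of a running process (needs permission to read it)."""
    return Path(f"/proc/{pid}/maps").read_text()
```

Calling mapped_libs(process_maps(pid)) for each Brave process should return an empty list if the finding still holds. An empty result only shows that nothing is mapped at that moment, so it is best sampled during playback.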

A2 trajectory pattern that hints at mechanism

ohm_gl_fix/phase3_remeasure_2026-05-02/A2_brave_drops_findings.md shows drops accumulate on KWin in three discrete burst events at t≈0-5 s, t≈10-12 s, t≈20-30 s, then steady at 0/sec from t≈30 s on. Cage settles to 0/sec from t≈4 s.
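To make "burst events" and "settles to 0/sec" reproducible across runs, the trajectory can be reduced to two numbers per run. This is a sketch only: it assumes a per-second drop-count series is available, which is an assumption about the logging, and the function names are hypothetical.

```python
def burst_windows(drops_per_sec):
    """Return (start, end) second indices, inclusive, of each run of consecutive seconds with drops."""
    windows = []
    start = None
    for t, drops in enumerate(drops_per_sec):
        if drops > 0 and start is None:
            start = t                       # a burst begins
        elif drops == 0 and start is not None:
            windows.append((start, t - 1))  # the burst ended one second ago
            start = None
    if start is not None:                   # series ended while still dropping
        windows.append((start, len(drops_per_sec) - 1))
    return windows


def settle_time(drops_per_sec):
    """First second from which every remaining sample is zero, or None if the series never settles."""
    last_nonzero = max((t for t, d in enumerate(drops_per_sec) if d > 0), default=-1)
    settle = last_nonzero + 1
    return settle if settle < len(drops_per_sec) else None
```

Applied to the KWin and cage series, burst_windows should recover the three KWin bursts and settle_time the roughly 30 s vs 4 s settling points reported above. Adjacent bursts separated by a single quiet second would be reported separately, so a real analysis may want to merge near neighbours.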

The architect's second consultation (2026-05-02) flagged this trajectory as evidence that KWin takes a slow cold path for EGL/dmabuf re-import the first time it sees a particular dmabuf fd. As the V4L2 capture-buffer ring recycles its fds (the Phase 2 substrate of ohm_gl_fix found 9 distinct dmabuf fds from hantro), KWin warms up over the first ~30 seconds and then reaches steady state. cage sees the same fds and warms up in 4 seconds.

This points specifically at the dmabuf-import-caching hypothesis (KWin re-imports dmabufs as GL textures every frame, or every-fd-cycle) and away from the wp_drm_lease / direct-scanout hypothesis. Phase 2 source read should prioritise the import-caching path.
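The difference between the import policies named in the hypothesis can be illustrated with a toy model. It does not reflect KWin's or cage's actual code; the only number taken from the source is the 9-buffer ring, and the frame count is an arbitrary placeholder.

```python
def count_imports(frame_buffers, cache_enabled):
    """Count how many frames trigger a (simulated) dmabuf-to-texture import.

    frame_buffers: one buffer identity per presented frame.
    cache_enabled: True  -> import once per buffer, reuse the texture afterwards;
                   False -> import again on every frame.
    """
    cache = set()
    imports = 0
    for buf in frame_buffers:
        if cache_enabled and buf in cache:
            continue            # cache hit: no import cost on this frame
        imports += 1            # cold path: pay the import cost
        if cache_enabled:
            cache.add(buf)
    return imports


RING_SIZE = 9    # distinct dmabuf fds seen from hantro (from the source)
FRAMES = 300     # placeholder: 10 s of playback at 30 fps

# The decoder cycles through its capture-buffer ring in order.
frames = [i % RING_SIZE for i in range(FRAMES)]

print("imports with cache:   ", count_imports(frames, cache_enabled=True))
print("imports without cache:", count_imports(frames, cache_enabled=False))
```

With a working cache the import cost is paid only while the ring is first being seen, which matches a short warm-up followed by zero drops. A policy that re-imports per frame would pay it continuously, so the observed ~30 s settling on KWin suggests something in between, and that is what the Phase 2 source read needs to pin down. A real cache would also have to key on buffer identity and not on the raw fd number, since fd numbers can be reused.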

Per-process CPU split during steady-state

From ohm_gl_fix/phase3_remeasure_2026-05-02/A2_brave_drops_findings.md:

Process	%CPU
chrome gpu-process	33.5
chrome renderer	5.9
kwin_wayland	20.5
chrome browser	8.0
chrome audio service	8.0
combined	~76

The same per-process CPU breakdown has not yet been captured under cage; that is the first Phase 3 measurement of this campaign (see worklist). If kwin_wayland %CPU under cage is significantly lower than 20.5 %, the work IS being avoided (the compositor is genuinely doing less). If it is similar, the work has moved laterally (maybe cage absorbs some of it before submitting one buffer to KWin), which would point at where the patch needs to land.
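The decision rule can be fixed before the cage numbers arrive, so the reading is not adjusted after the fact. In this sketch the 25 % relative threshold is a placeholder for "significantly lower", which the notes do not define, and the function names are hypothetical; the per-process figures are the ones from the table above.

```python
# Steady-state %CPU under KWin with direct Brave, copied from the table above.
DIRECT_BRAVE_CPU = {
    "chrome gpu-process": 33.5,
    "chrome renderer": 5.9,
    "kwin_wayland": 20.5,
    "chrome browser": 8.0,
    "chrome audio service": 8.0,
}


def combined_cpu(cpu_by_process):
    """Sum of per-process %CPU, rounded to one decimal."""
    return round(sum(cpu_by_process.values()), 1)


def classify_kwin_cpu(kwin_direct, kwin_under_cage, rel_threshold=0.25):
    """Classify the cage measurement against the direct-Brave baseline.

    rel_threshold is a placeholder: the fraction by which kwin_wayland %CPU must
    fall before the work is treated as avoided and not merely relocated.
    """
    if kwin_under_cage <= kwin_direct * (1.0 - rel_threshold):
        return "avoided"
    return "similar"


print("combined:", combined_cpu(DIRECT_BRAVE_CPU))
```

A "similar" result would need a second look at the cage process's own %CPU, since that is where laterally moved work would show up.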

The architect's recommended highest-value remaining measurement is perf record -p $(pgrep kwin_wayland) during BOTH cage and direct-Brave runs — converts the hypothesis into hot-path evidence. That's a Phase 3 measurement of this campaign.
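For the two perf runs to be comparable, they should use identical options and durations. A minimal sketch of a command builder follows; the -g (call-graph) flag, the output file names, and the 60-second duration are assumptions added here and are not part of the architect's recommendation.

```python
import shlex


def perf_record_cmd(pid, out_path, seconds):
    """Build a `perf record` invocation that samples one PID for a fixed duration.

    `-- sleep N` bounds the recording: perf stops when the sleep exits.
    """
    return [
        "perf", "record",
        "-g",                 # assumption: call graphs, to see which callers are hot
        "-p", str(pid),
        "-o", out_path,
        "--", "sleep", str(seconds),
    ]


def plan(kwin_pid, seconds=60):
    """Return one shell-quoted command line per scenario (file names are hypothetical)."""
    scenarios = {"cage": "perf_cage.data", "direct_brave": "perf_direct_brave.data"}
    return {
        name: shlex.join(perf_record_cmd(kwin_pid, out, seconds))
        for name, out in scenarios.items()
    }
```

The PID is passed in explicitly because pgrep kwin_wayland can return more than one match, and the command lists can be handed to subprocess.run unchanged once playback has started.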

Software stack at handover time

chromium-builder-x86 LXC on data: holds the chromium 149 build with ohm_gl_fix Step 1 + Step 2 patches. Reach via boltzmann → ssh -J hertz root@data → pct exec 220 --.
chromium-builder LXD on boltzmann: holds the qt6-base-fourier + kwin-fourier source/build context. Used during the qt6/kwin builds that were installed on ohm.
ohm: PineTab2 with current install state — Step 1 + Step 2 chrome at /tmp/chromium-ohm-gl-fix-step2/chrome, qt6-base-fourier 1:6.11.0-3, kwin-fourier 1:6.6.4-3, cage 0.x, Mesa 26.0.5, kernel 6.19.10.

Architect's recommended Phase 2 reading list

Quoted from the second consultation:

“Phase 2 / Phase 3 explicit work: read the relevant KWin source (/usr/src if you have it, or the upstream tree at invent.kde.org/plasma/kwin: src/scene/surfaceitem_wayland.cpp, src/scene/itemrenderer_opengl.cpp, src/wayland/linuxdmabufv1clientbuffer.cpp, src/backends/drm/). Determine whether the per-frame cost is dmabuf re-import to GL texture, full-frame GL composite, or missed scanout-promotion via the DRM atomic path. Do not write any patch before that read is documented. This is the discipline ohm_gl_fix lacked early.”
