
Phase 0 — handover findings from ohm_gl_fix

This is the campaign's Phase 0 substrate: what we already know, what has been measured, and what is still open. All evidence comes from the ohm_gl_fix campaign (~/src/ohm_gl_fix/), which closed on 2026-05-02 with an explicit re-scope into this campaign.

Decisive A/B that motivates this campaign

Source: ohm_gl_fix/phase3_remeasure_2026-05-02/task25_cage_vs_kwin_decisive.md. Both runs used the same Brave binary (Chromium 149 with ohm_gl_fix Step 1 + Step 2 patches), the same hardware (PineTab2 RK3568 + Mali-G52 panfrost + Mesa 26.0.5), the same clip (bbb_1080p30_h264.mp4), and the same installed qt6-base-fourier + kwin-fourier packages:

| Compositor | Frames | Drops | drops_post_warmup | drops/total |
|---|---|---|---|---|
| KWin (default Plasma session) | 1498 | 58 | 29 | 3.87 % |
| cage (nested kiosk inside KWin) | 1686 | 7 | 0 | 0.42 % |

The Phase 1r metrics.csv binding cell (drops_post_warmup == 0, with sanity cap drops ≤ 10) is MET under cage and not met under KWin, on the same hardware with the same Brave binary.
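The binding check can be written down as a small predicate. This is a minimal sketch, assuming the two counters are available as integers; the names binding_cell_met, drop_ratio_percent and sanity_cap are hypothetical and not taken from the ohm_gl_fix tooling. The numbers are copied from the A/B table.

```python
def binding_cell_met(drops: int, drops_post_warmup: int, sanity_cap: int = 10) -> bool:
    """Return True when the binding cell holds as stated in the text:
    no drops after warmup, and total drops within the sanity cap."""
    return drops_post_warmup == 0 and drops <= sanity_cap


def drop_ratio_percent(drops: int, frames: int) -> float:
    """Drops as a percentage of total frames, rounded to two decimals."""
    return round(100.0 * drops / frames, 2)


# Values from the cage-vs-KWin A/B table.
runs = {
    "kwin": {"frames": 1498, "drops": 58, "drops_post_warmup": 29},
    "cage": {"frames": 1686, "drops": 7, "drops_post_warmup": 0},
}

for name, r in runs.items():
    status = "MET" if binding_cell_met(r["drops"], r["drops_post_warmup"]) else "not met"
    print(name, drop_ratio_percent(r["drops"], r["frames"]), "%", status)
```

Keeping the sanity cap as a parameter makes it explicit that a run with zero post-warmup drops can still fail the cell if warmup itself dropped too many frames.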

What this isolates

cage runs nested inside KWin in this measurement (cage's output is a single fullscreen wayland surface that KWin then composites). That means:

The bug is a subsurface-specific cost path inside KWin, not “KWin is slow on ARM” generally. That framing is critical for the Phase 5 bug report: the reception is very different for “your subsurface composite path has a regression” vs “your compositor is slow on ARM.”

What is NOT the cause (ruled out by ohm_gl_fix Phase 3 remeasure)

A2 trajectory pattern that hints at mechanism

ohm_gl_fix/phase3_remeasure_2026-05-02/A2_brave_drops_findings.md shows drops accumulate on KWin in three discrete burst events at t≈0-5 s, t≈10-12 s, t≈20-30 s, then steady at 0/sec from t≈30 s on. Cage settles to 0/sec from t≈4 s.

The architect's second consultation (2026-05-02) flagged this trajectory as evidence that KWin takes a slow, cold-path EGL/dmabuf import the first time it sees a particular dmabuf fd. As the V4L2 capture-buffer ring recycles its fds (the Phase 2 substrate of ohm_gl_fix found 9 distinct dmabuf fds from hantro), KWin warms up over the first ~30 seconds and then reaches steady state. cage sees the same fds and warms up in about 4 seconds.

This points specifically at the dmabuf-import-caching hypothesis (KWin re-imports dmabufs as GL textures on every frame, or on every fd cycle) and away from the wp_drm_lease / direct-scanout hypothesis. The Phase 2 source read should prioritise the import-caching path.
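To make the import-caching hypothesis concrete, here is a toy model of what "re-import every frame" versus "import once per buffer" would mean for a 9-buffer ring. It is a sketch of the hypothesis only and does not describe KWin's actual code: buffer identity is modelled as a plain integer, and what KWin really keys its imports on is exactly what the Phase 2 source read has to establish. The function name and the 30 fps x 30 s frame count are illustrative assumptions.

```python
from itertools import cycle, islice


def count_slow_imports(frame_buffers, cache_imports: bool) -> int:
    """Count frames that take the (assumed) slow import path."""
    cache = set()
    slow = 0
    for buf in frame_buffers:
        if cache_imports and buf in cache:
            continue  # previously imported buffer: reuse, no slow path
        slow += 1
        if cache_imports:
            cache.add(buf)
    return slow


# 9 distinct buffers recycled round-robin (count taken from the hantro finding),
# over 900 frames, i.e. 30 s of a 30 fps clip.
ring = range(9)
frames = list(islice(cycle(ring), 900))

print(count_slow_imports(frames, cache_imports=True))   # 9
print(count_slow_imports(frames, cache_imports=False))  # 900
```

With a working cache the slow path is hit once per buffer and then never again, which matches the shape of a warmup followed by a 0/sec steady state. It does not by itself explain why that warmup would take ~30 s on KWin and ~4 s on cage.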

Per-process CPU split during steady-state

From ohm_gl_fix/phase3_remeasure_2026-05-02/A2_brave_drops_findings.md:

| Process | %CPU |
|---|---|
| chrome gpu-process | 33.5 |
| chrome renderer | 5.9 |
| kwin_wayland | 20.5 |
| chrome browser | 8.0 |
| chrome audio service | 8.0 |
| combined | ~76 |

Under cage the same per-process CPU breakdown has not yet been captured; that is the first Phase 3 measurement of this campaign (see worklist). If kwin_wayland's %CPU under cage is significantly lower than 20.5 %, the work is genuinely being avoided (the compositor is doing less). If it is similar, the work has moved laterally (cage may absorb some of it before submitting one buffer to KWin), which would point at where the patch needs to land.
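The decision rule for that measurement can be sketched as follows. The 20.5 % baseline is the kwin_wayland figure from the steady-state table; the similar_within margin of 3 percentage points is a hypothetical placeholder for "significantly lower" and should be replaced by whatever run-to-run noise the measurement actually shows.

```python
KWIN_DIRECT_CPU = 20.5  # kwin_wayland %CPU in the direct-Brave run (from the table)


def interpret_kwin_cpu(kwin_cpu_under_cage: float, similar_within: float = 3.0) -> str:
    """Classify the cage-run reading against the direct-run baseline.

    similar_within is an assumed noise margin in percentage points.
    """
    if kwin_cpu_under_cage < KWIN_DIRECT_CPU - similar_within:
        return "avoided"      # compositor is genuinely doing less work
    return "not avoided"      # similar (or higher): work moved or unchanged


print(interpret_kwin_cpu(5.0))
print(interpret_kwin_cpu(19.0))
```

A "not avoided" result is the case where the cage process's own %CPU should be inspected next, since that is where laterally moved work would show up.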

The architect's recommended highest-value remaining measurement is perf record -p $(pgrep kwin_wayland) during both the cage run and the direct-Brave run, which converts the hypothesis into hot-path evidence. That is also a Phase 3 measurement of this campaign.
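One way to make the two perf captures repeatable is a small wrapper that builds the same command line for both runs. This is a sketch under assumptions: the -g (call graph) flag, the fixed duration via "-- sleep N", the output file names, and the helper names are all illustrative additions and not part of the architect's recommendation, which only specifies perf record -p on the kwin_wayland PID.

```python
import subprocess


def perf_record_argv(pid: int, out_path: str, seconds: int) -> list[str]:
    """Build a perf record command that samples one PID for a fixed duration."""
    return ["perf", "record", "-g", "-p", str(pid), "-o", out_path,
            "--", "sleep", str(seconds)]


def find_kwin_pid() -> int:
    """Return the PID of the oldest process named exactly kwin_wayland."""
    out = subprocess.run(["pgrep", "-o", "-x", "kwin_wayland"],
                         capture_output=True, text=True, check=True).stdout
    return int(out.split()[0])


def record_kwin(label: str, seconds: int = 60) -> str:
    """Record kwin_wayland while a playback run is in progress.

    label is a free-form tag such as "cage" or "direct" (hypothetical naming).
    """
    out_path = f"kwin_{label}.perf.data"
    subprocess.run(perf_record_argv(find_kwin_pid(), out_path, seconds), check=True)
    return out_path
```

Start playback first, then call record_kwin("direct") or record_kwin("cage") from another terminal. Because cage runs nested inside KWin in this setup, the same kwin_wayland process is the target in both runs, and the two resulting files can then be compared with perf report or perf diff.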

Software stack at handover time

Quoted from the second consultation:

“Phase 2 / Phase 3 explicit work: read the relevant KWin source (/usr/src if you have it, or the upstream tree at invent.kde.org/plasma/kwin: src/scene/surfaceitem_wayland.cpp, src/scene/itemrenderer_opengl.cpp, src/wayland/linuxdmabufv1clientbuffer.cpp, src/backends/drm/). Determine whether the per-frame cost is dmabuf re-import to GL texture, full-frame GL composite, or missed scanout-promotion via the DRM atomic path. Do not write any patch before that read is documented. This is the discipline ohm_gl_fix lacked early.”