This is an old revision of the document!

ohm_gl_fix — Phase 1 (revised), 2026-05-01

This page replaces the original Phase 1 lock at phase1_2026-04-30. The original locked an mpv-specific quantitative target (“drops from 1039/1440 → within transient-startup floor of gst→waylandsink”) on a single test invocation. By the time Phase 3 revised landed, the campaign had been reframed twice (Markus 2026-04-30: “drops post warmup, not drops total”; Markus 2026-04-30 evening: “I do not seek to optimize mpv. I seek to identify the structural gap”). This refinement folds those corrections plus the empirical evidence from Phase 3 revised into a single load-bearing Phase 1.

The original Phase 1 page stands as audit trail; Phase 3 revised + this Phase 1 revised are the live driver going forward.

1. Goal (essence)

Buffer-to-display achieves zero-copy for libavcodec / libva consumers
on Mali-G52 + KWin Wayland, such that in-scope workloads run with the
same memory-subsystem pressure profile as the GStreamer +
linux-dmabuf-v1 reference path.

“Same memory-subsystem pressure profile” is what makes the goal measurable below. The reference path is named in §3.

2. In-scope use cases

YouTube / HTML5 <video> in Brave — the highest-traffic

video-decode workload on this device class.

Web browsing in Brave — compositor-side video + animation

surfaces; same Chromium GPU-process pipeline as YouTube.

VS Code (Electron + Chromium under the hood) — same

pipeline as Brave for any embedded video / animation rendering.

Workloads outside this list are not the campaign's subject. This is deliberate scope-tightening — see §6.

3. Reference baseline (zero-copy benchmark)

gst-launch-1.0 -q filesrc location=bbb_1080p30_h264.mp4 \! qtdemux \! h264parse \! v4l2slh264dec \! waylandsink sync=true on ohm (scenario S1 in Phase 3 revised). Empirical numbers, 60 s steady-state, current stack:

Metric	S1 (zero-copy reference)
CPU%	≈7 % paced
LLC-load-misses	3.0 M / 10 s
cache-misses	2.1 M / 10 s
DRM_IOCTL_* / sec	0 (client-side; compositor presents)
`VIDIOC_EXPBUF`	13 dmabuf fds (9 capture + 4 control)
Wayland fd-passing	11 `SCM_RIGHTS` events to compositor
post-warmup drops	0 (fourier 2026-04-24); ~0.3/s today (stack drift, see §6)

This is the empirical operating point a successful campaign aims to reach for libavcodec / libva consumers on the in-scope workloads.

4. Measurable success criteria (all must hold)

Measurements taken on the in-scope workload (e.g. Brave + bbb-class H.264 video over a 60 s steady-state window, with strace + perf-stat instrumentation per phase3_revised_2026-05-01 §3).

C1 — Drops. Post-warmup drops ≤ 10 over 60 s. Warmup = first

10 s. Drops in warmup may be up to ~10; drops after warmup must be 0.

  Sanity cap on total drops across the full 60 s = 10.
- **C2 — Memory-subsystem pressure.** LLC-load-misses ≤ **3 ×
  baseline** over 10 s steady-state (i.e. ≤ ~9 M). cache-misses
  ≤ ~6 M as a leading indicator.
- **C3 — Display-path activity.** DRM_IOCTL_* per second ≤ **100**.
  Current libavcodec-using contenders sit at 800-1 050 per sec;
  target is the baseline rate (0) plus tolerance.
- **C4 — Boundary fd-passing.** At least one of:
  * (a) ''VIDIOC_EXPBUF'' count > 0 from V4L2 hantro AND the resulting
    fd flowing to the compositor via ''SCM_RIGHTS'' over the Wayland
    socket, OR
  * (b) ''PRIME_FD_TO_HANDLE'' count > 0 from a V4L2-produced dmabuf
    flowing into the GPU process / browser compositor.

C1 is the user-visible criterion (“the video plays smoothly”). C2 and C3 are the physical-layer criteria for “no CPU memcpy of frame data, no per-frame Mesa GL+DRM round-trips”. C4 is the structural criterion (“the path actually exists, not just is fast”).

5. Loopback edges

C1 ✓ + C2 ✗ + C3 ✓ → not possible without something else

creating cache pressure; flag for re-investigation.

C1 ✓ + C2 ✓ + C3 ✗ → Level-1 zero-copy fixed, Level-2 still

missing (decoder produces dmabuf but display path goes through

  Mesa GL). **Re-enter Phase 4 fix-surface choice.**
* **C1 ✗** at Phase 7 verification → re-enter Phase 4 with new perf
  evidence per [[ohm_gl_fix:phase4_2026-04-30|Phase 4]] §6
  loopback condition.
* **C4 ✗ but C1 ✓ + C2 ✓ + C3 ✓** → the path is not what we
  expected (e.g. compositor copies despite client passing fd; or
  a hidden zero-copy mechanism exists we haven't characterised) —
  treat as a measurement-classification problem, surface to Phase 5
  review.

6. What this Phase 1 deliberately does NOT lock

A specific patch site. That's Phase 4's job; this Phase 1 only sets the success criteria the

patch must meet.

An absolute CPU% target. cycles/cache-misses are the binding

physical-layer metrics; CPU% is a leading indicator that varies

  with kernel scheduling decisions and is not directly comparable
  across single- vs multi-process pipelines (Brave's CPU% is
  spread across renderer + GPU process; mpv's is one process).
* **Out-of-scope workload performance.** 3D games, Proton/DXVK,
  general-purpose Vulkan applications may regress arbitrarily — for
  example, a ''panvk-1.2-fakeshim'' (Phase 4 §6 row C2) would
  crash Vulkan workloads other than the in-scope video presentation
  path. That is the explicit trade.
* **The S5 regression (gst-launch waylandsink ~0.3 drops/sec on
  today's stack)** — separate iteration. Stack drift between
  fourier 2026-04-24 (0/62) and ohm_gl_fix 2026-04-30 (~16/60).
  Likely candidates within marfrit-packages' custom mesa / ffmpeg /
  alsa / libdrm-pinebookpro builds (per Markus 2026-04-30). A
  separate ohm-gl-fix-companion campaign would bisect via
  ''pacman.log''.
* **Per-application HW-decode engagement.** Different consumers may
  take different fix surfaces (e.g. browser via libva-v4l2-request
  multiplanar — fix surface A; mpv via libavcodec drm_prime →
  linux-dmabuf-v1 — fix surface B). Phase 1 does not pre-select
  which.

7. Differences from the original Phase 1 (2026-04-30)

Aspect	Original Phase 1	Phase 1 (revised)
Workload	mpv `–hwdec=v4l2request –vo=gpu-next` on bbb_1080p30_h264.mp4	Brave / Chromium-based browsers + VS Code on in-scope content
Metric	drops_post_warmup == 0 (single-criterion)	C1 (drops) + C2 (LLC-misses) + C3 (DRM_IOCTL/sec) + C4 (boundary fd-passing) — all must hold
Reference baseline	“transient-startup floor of gst→waylandsink” (qualitative)	S1 = gst-launch v4l2slh264dec → waylandsink, with explicit numeric anchors for cache-misses + DRM_IOCTL + EXPBUF + SCM_RIGHTS
Out-of-scope	implicit	explicit (§6): gaming, Proton/DXVK, general Vulkan; SW-emulation acceptable for those
Loopback granularity	“if drops > 10, re-enter Phase 4”	per-criterion failure modes (§5) — Level-1 vs Level-2 distinction now diagnostic

8. Live data anchors

The CSV schema captures these criteria as machine-readable rows. As of 2026-05-01:

metrics.csv — extended with llc_load_misses,

drm_ioctl_per_sec, boundary_fd_passed columns and refined

  ''phase1r_*'' rows.
* ''phase3/io_cache_2026-05-01/boundary_counts.csv'' — per-scenario
  EXPBUF/DQBUF/PRIME_*/SCM_RIGHTS/anon-mmap counts.
* ''phase3/io_cache_2026-05-01/perfstat.csv'' — per-scenario
  cache-misses, LLC-load-misses, cycles, instructions, IPC.

C1 / C2 / C3 / C4 are computable from these CSVs at any future measurement point.

9. References

Phase 3 revised — empirical

bucket-attribution + boundary characterisation across six contenders.

Phase 4 — the gap, fix-surface

options, ranked. This Phase 1 refines the success criteria

  against which Phase 4's chosen fix surface will be measured.
* [[ohm_gl_fix:phase2_2026-04-30|Phase 2]] — substrate (versions,
  V4L2 9-fd buffer pool, panfrost capability surface).
* [[ohm_gl_fix:phase1_2026-04-30|Original Phase 1]] — superseded by
  this page; preserved for audit trail.
* [[ohm_gl_fix:start|Namespace landing]].

Phase 1 (revised) ends here. Phase 4 fix-surface choice + Phase 6 implementation will deliver against C1-C4. Phase 7 verification re-runs the strace + perf-stat instrumentation from Phase 3 revised on the in-scope workload and writes the resulting numbers to metrics.csv as phase7_verify_* rows.

Zettelkasten

Table of Contents