ohm_gl_fix — Phase 4, 2026-05-01
This page replaces both prior Phase 4 drafts: the original
libplacebo fd-cache plan (retracted
after perf record showed libplacebo at 0.41 % of CPU and the patched
code path not on the hot path) and its in-place revision into a
“documentation of the gap” page. Phase 4 is now a plan, not an
enumeration. It picks one fix surface, names the implementation, states
what gets measured at Phase 7, and identifies the loopback edges.
The driver of this rewrite: Phase 1 was refined on 2026-05-01 with machine-readable criteria (Phase 1 revised §4 — C1 drops, C2 LLC-load-misses, C3 DRM_IOCTL/sec, C4 boundary fd-passing) and Phase 3 was rebuilt on the same day with empirically-grounded boundary characterisation (Phase 3 revised §3, §4). With both anchors in place, Phase 4 can commit.
1. What this Phase 4 is targeting
Phase 1 revised §2 named the in-scope workloads:
- YouTube / HTML5 <video> in Brave
- Web browsing in Brave (compositor-side video + animation)
- VS Code (Electron + Chromium under the hood)
All three traverse the Chromium video pipeline:
VaapiVideoDecoder → libva → libva-v4l2-request → V4L2 stateless
This is not the libavcodec hwaccel chain that mpv, ffplay, and VLC
use. Browsers vendor their own ffmpeg fork and gate hardware video
decode through libva. Therefore: the fix surfaces from the prior
Phase 4 enumeration that touch libavcodec (B “libavcodec drm_prime →
linux-dmabuf-v1”) or libplacebo (C2 “panvk-1.2-fakeshim”) do
not lift the in-scope use cases, however structurally clean they
look in isolation. The empirical entrypoint for Brave is libva, and
libva on this hardware fails at vaInitialize
(Phase 3 revised §1, §8;
also fourier README L236-281).
Phase 4 commits to fix surface A: libva-v4l2-request multiplanar port as the primary direction, with an explicit pre-implementation research step (Step 0) that may discover the campaign needs a follow-up Chromium-side patch.
2. Decision rationale
Three reasons to commit to A specifically:
- It is the only fix surface that touches Brave's actual chain.
B (libavcodec) and C2 (libplacebo Vulkan layer) target consumers
Markus does not use. D (compositor DRM-shim) is a Wayland-protocol
proposal that does not exist upstream and would not survive a
Phase 5 review.
- **Substantial groundwork exists.** fourier's local
[[https://github.com/bootlin/libva-v4l2-request|libva-v4l2-request]]
patches (on ohm at ''~/fourier-test/libva-patches/fourier-local.patch'')
already get the bootlin source past format enumeration on the
multiplanar hantro device (fourier ''README'' L240-256). The
starting point is not "from zero" — it is "from probe-passing,
multiplanar buffer setup still single-plane".
- **It addresses the structural gap, not the symptom.** Phase 1
revised's criteria all hold globally for libva consumers once A
is delivered, not just for one application. fourier already
flagged this as the right axis ("//browser HW video decode on
ohm is parked until a multiplanar libva-v4l2-request rework
exists, either ours or someone else's//", fourier ''README''
L276-281).
Note explicitly: A alone may not suffice. Once the libva chain
produces a NV12 dmabuf for Brave's VaapiVideoDecoder, the
display side — Chromium's GPU-process compositor — still has to
present that dmabuf without per-frame Mesa GL+DRM round-trips
(Phase 1 revised's C3, ≤100 DRM_IOCTL/sec). Whether Chromium does
this on Wayland today, or needs an additional patch, is the open
question Step 0 below answers before code is written.
3. Implementation plan
Step 0 — Research: characterise Chromium's Wayland video presentation path
Duration: 3–7 days. Output: decision document attached to this Phase 4 plan, naming whether Step 2 is required.
Question to answer: when VaapiVideoDecoder produces a
NativePixmap (= dmabuf-backed VA-API surface) on
chrome --ozone-platform=wayland, does Chromium's GPU process
present it via zwp_linux_dmabuf_v1 subsurface (Wayland direct
overlay) or via Skia GL composite onto the page's main surface?
Concrete sub-tasks:
- Source archaeology in Chromium (current Brave-bin's underlying
Chromium version, likely M138-class):
  * ui/ozone/platform/wayland/host/wayland_buffer_manager_host.cc
and surrounding files: Wayland buffer attachment.
  * components/viz/service/display_embedder/: overlay candidate
surface processing.
  * media/gpu/vaapi/: VA-API surface to native-pixmap conversion.
  * gpu/ipc/service/gpu_video_decode_accelerator_helpers.cc:
dmabuf flow from decoder to compositor.
- Empirical synthesis test: with current Brave (libva broken),
can we coax Chromium into the dmabuf-overlay path using a
different content source, e.g. a WebGL canvas, or a video element
with software decode where the decoded YUV is uploaded once to a
GL texture, and then observe whether composite uses the texture
via a Wayland subsurface or via Skia main-surface compositing?
Look at ''DRM_IOCTL_*'' rate and ''SCM_RIGHTS'' fd-passing on the
GPU process (already instrumented in
[[ohm_gl_fix:phase3_revised_2026-05-01|Phase 3 revised]] §3).
- **Feature flag inventory:** check ''chrome://flags'' and
''--enable-features='' for relevant entries:
''VaapiVideoDecoder'', ''VaapiVideoDecodeLinuxGL'',
''UseChromeOSDirectVideoDecoder'', ''UseDelegatedCompositing'',
''DelegatedCompositingLimitToUi'', ''AcceleratedVideoDecodeLinuxGL'',
''wayland-screen-coordinates'', ''ozone-overlay-priority-hint''.
Output gate: decision document records whether Chromium's GPU
process under default flags will route a working VA-API dmabuf to
zwp_linux_dmabuf_v1 (Step 2 not needed) or composite via Skia GL
(Step 2 needed). The decision document attaches to this Phase 4
page after Step 0 completes.
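The gate rests on two observations from the GPU process: the DRM_IOCTL rate and whether fds cross the process boundary via SCM_RIGHTS. A minimal sketch of the counting step follows. It assumes the log was captured with epoch timestamps (for example ''strace -f -ttt'' restricted to ioctl, sendmsg and recvmsg) and that strace decodes the DRM request names; the function name and the sample lines in the usage note are illustrative, not output from ohm.

```python
import re

# Epoch timestamp (strace -ttt) followed by the syscall name.
# A leading pid column, if present, is skipped by search().
_TS_CALL = re.compile(r"(\d+\.\d+)\s+(\w+)\(")

def boundary_counts(lines):
    """Return (drm_ioctls_per_sec, scm_rights_msgs) for a strace -ttt log."""
    drm = 0
    scm = 0
    first = last = None
    for line in lines:
        m = _TS_CALL.search(line)
        if not m:
            continue  # "resumed" lines, signals, exit notices
        ts = float(m.group(1))
        if first is None:
            first = ts
        last = ts
        syscall = m.group(2)
        if syscall == "ioctl" and "DRM_IOCTL_" in line:
            drm += 1
        elif syscall in ("sendmsg", "recvmsg") and "SCM_RIGHTS" in line:
            scm += 1
    # The window is the span between first and last timestamped call.
    window = (last - first) if first is not None and last > first else 0.0
    rate = drm / window if window else 0.0
    return rate, scm
```

Fed three DRM ioctls spread over a two-second log, the function returns a rate of 1.5 per second. A rate near the 800 to 1 050 range reported for S2 to S4 would point at the Skia GL path, while a rate under 100 combined with a non-zero SCM_RIGHTS count would be consistent with the overlay path.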
Step 1 — libva-v4l2-request multiplanar port
Duration: 4–8 weeks of focused work; the lower end if fourier's
local patches and Phase 2 §3 substrate (9-fd capture pool, NV12
single-plane 1920×1088 sizeimage = 3 655 712) generalise. The
upper end if hantro's request-API control set turns out to need
additional reverse-engineering against the kernel driver
(drivers/staging/media/rkvdec/ / drivers/staging/media/hantro/).
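The sizeimage figure is larger than a bare NV12 frame at 1920×1088, which is worth keeping in mind when sizing the multiplanar pool. The sketch below shows one decomposition that reproduces the number exactly. The motion-vector term (64 bytes per 16×16 macroblock plus 32 bytes) is an assumption about how the decoder driver pads its capture buffers, not something Phase 2 recorded; confirm it against the kernel on ohm before relying on it.

```python
def nv12_bytes(width, height):
    """Luma plane plus half-size interleaved chroma plane (4:2:0)."""
    return width * height * 3 // 2

def h264_mv_bytes(width, height):
    """ASSUMPTION: 64 bytes of motion-vector storage per 16x16 macroblock,
    plus 32 bytes, appended to the capture buffer by the driver."""
    macroblocks = (width // 16) * (height // 16)
    return 64 * macroblocks + 32

W, H = 1920, 1088  # 1080 rounded up to a multiple of 16
sizeimage = nv12_bytes(W, H) + h264_mv_bytes(W, H)
print(sizeimage)  # 3655712
```

The pixel payload alone is 3 133 440 bytes, so about 522 KB of each capture buffer is not picture data. Any consumer that derives plane offsets from sizeimage rather than from width, height and pitch would get them wrong.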
Source basis:
- Upstream fork: https://github.com/bootlin/libva-v4l2-request
(last meaningful commit ~years ago per fourier; confirm at
Step 1 start).
* fourier local patches: ''~/fourier-test/libva-patches/fourier-local.patch'':
HEVC stripped (RK3566 has no HEVC HW), missing ''#include "utils.h"''
in ''src/h264.c'' restored, ''src/config.c'' format-enumeration
extended to try both ''V4L2_BUF_TYPE_VIDEO_OUTPUT'' and
''V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE'' (fourier ''README'' L240-256).
Concrete work surface, in order:
- Fork + import groundwork. Set up
marfrit-packages/libva-v4l2-request-ohm-gl-fix/. Apply
fourier's patches as the patch-zero baseline. ''pkgname=
libva-v4l2-request-ohm-gl-fix'', ''provides+conflicts+replaces=
libva-v4l2-request''. Build via fermi (Gitea Actions runner
archlinuxarm aarch64).
- **Multiplanar buffer setup in ''src/v4l2.c''.** Replace
single-plane ''v4l2_buffer'' / ''v4l2_format'' usage with
MPLANE variants (''VIDIOC_S_FMT'' on
''V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE'' for bitstream input,
''V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE'' for NV12 output;
''VIDIOC_QBUF'' / ''VIDIOC_DQBUF'' with ''planes[]'' arrays).
The Phase 2 §3 strace evidence (''ffmpeg -hwaccel
v4l2request -hwaccel_output_format drm_prime'' producing
9 ''VIDIOC_EXPBUF''s with NV12 single-plane ''sizeimage =
3 655 712'') is the per-buffer template.
- **Multiplanar context lifecycle in ''src/context.c''.**
Replace ''vaCreateContext'' single-plane buffer-pool setup
with multiplanar pool that mirrors the
''VIDIOC_REQBUFS+CREATE_BUFS, count=1''-loop pattern Phase 2
captured. Capture ring depth = 9 (per Phase 2 §3). Output
ring (bitstream input) depth = 4.
- **Multiplanar slice submission in ''src/picture.c'' and
''src/h264.c''.** Adapt request-API frame submission: build
''V4L2_CTRL_*_HEADER'' control payloads (SPS, PPS, decode
params, slice params, scaling matrix) attached to the request
fd, ''VIDIOC_QBUF'' the bitstream input MPLANE buffer with the
request fd, ''VIDIOC_DQBUF'' the capture MPLANE NV12 buffer
after decode. The kernel UAPI is in
''include/uapi/linux/v4l2-controls.h'' as
''V4L2_CID_STATELESS_H264_*'' (note: these names replaced the
older staging-era ''V4L2_CID_MPEG_VIDEO_H264_*'' controls; the
HEVC controls were later renamed to ''V4L2_CID_STATELESS_HEVC_*''
in the same way).
- **NativePixmap export.** Ensure each capture-side dmabuf fd
flows out of libva to the caller (Chromium's
''VaapiPicture'') as a NativePixmap with the right DRM format
(''DRM_FORMAT_NV12'') and modifier (''DRM_FORMAT_MOD_LINEAR''
per Phase 3 Finding 1). Verify the modifier matches what
Chromium will accept.
- **Test corpus.** Run against:
* ''bbb_1080p30_h264.mp4'' (the campaign's reference clip).
* ''vainfo'' (libva self-test) on
''/dev/dri/renderD128'' equivalent.
* Any failure cases noted by fourier (''README'' L319-340,
"test corpus" — pull list at Step 1 start).
- **Package + publish.** PKGBUILD finalised, builds on fermi,
pushes to marfrit-packages pacman repo.
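One point in the NativePixmap export item is easy to trip over: "single-plane" in the V4L2 MPLANE sense means one memory plane (one dmabuf fd), while the DRM description of the same NV12 buffer still has two colour planes inside that one object. A minimal sketch of the descriptor the export step would need to produce follows. The helper name and dictionary shape are illustrative, and the default pitch (equal to width, no row padding) is an assumption; the real pitch comes from the driver's bytesperline.

```python
def fourcc(code):
    """Pack a 4-character code the way drm_fourcc.h's fourcc_code() does."""
    a, b, c, d = (ord(ch) for ch in code)
    return a | (b << 8) | (c << 16) | (d << 24)

DRM_FORMAT_NV12 = fourcc("NV12")
DRM_FORMAT_MOD_LINEAR = 0

def nv12_linear_layout(width, height, pitch=None):
    """Describe one dmabuf holding both NV12 colour planes back to back.

    ASSUMPTION: pitch defaults to width and the chroma plane starts
    directly after the luma plane.
    """
    pitch = width if pitch is None else pitch
    return {
        "fourcc": DRM_FORMAT_NV12,
        "modifier": DRM_FORMAT_MOD_LINEAR,
        "num_objects": 1,  # one fd, i.e. one V4L2 memory plane
        "planes": [
            {"object": 0, "offset": 0, "pitch": pitch},               # Y
            {"object": 0, "offset": pitch * height, "pitch": pitch},  # interleaved CbCr
        ],
    }
```

For 1920×1088 the chroma plane offset comes out at 2 088 960 bytes. Whether Chromium accepts a two-plane, one-object descriptor with a linear modifier is exactly what the last sentence of the NativePixmap item asks to verify.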
Step 2 (conditional) — Chromium display-side patch
Trigger: Step 0 finds Chromium does not auto-route VA-API
NativePixmaps through zwp_linux_dmabuf_v1 on Wayland under the
default feature flags — i.e. it composites via Skia GL and Phase 1
revised's C3 (≤ 100 DRM_IOCTL/sec) cannot be reached from Step 1
alone.
Shape (deferred — exact scope set by Step 0): patch Chromium
to route VAAPI NativePixmaps as Wayland subsurfaces for video
elements; or enable a feature flag set that does this. Build as
chromium-ohm-gl-fix (or brave-ohm-gl-fix) on
marfrit-packages.
If Step 0 finds Step 2 is not needed, Phase 4 implementation ends at Step 1 + Step 3.
Step 3 — Verification (Phase 7 prep)
After Step 1 (and conditionally Step 2) lands on ohm:
- Reinstall:
sudo pacman -U libva-v4l2-request-ohm-gl-fix-*.pkg.tar.zst
(and conditionally chromium-ohm-gl-fix-*).
- Re-run Phase 3 revised
§3 v2 strace (ioctl,mmap,munmap,sendmsg,recvmsg) and §4
perf-stat (''cache-misses,LLC-load-misses,cycles,instructions'')
on Brave + ''bbb_1080p30_h264.mp4'' over a 60 s steady-state
window. Capture renderer + GPU-process targets.
- Check Phase 1 revised C1-C4:
* **C1** drops ≤ 10 over 60 s, drops_post_warmup = 0
* **C2** LLC-load-misses ≤ 9 M / 10 s
* **C3** DRM_IOCTL/sec ≤ 100
* **C4** at least one of (a) ''VIDIOC_EXPBUF'' + ''SCM_RIGHTS''
OR (b) ''PRIME_FD_TO_HANDLE'' from V4L2 dmabuf observed
- Append result row(s) to ''metrics.csv'' as ''phase7_verify_*''.
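The C1-C4 thresholds above are simple enough to evaluate mechanically once the Phase 7 numbers are in hand. A minimal sketch, assuming one record per verification run; the field names are hypothetical and would need mapping to the real metrics.csv columns.

```python
from dataclasses import dataclass

@dataclass
class Phase7Sample:
    # Hypothetical field names, not the metrics.csv schema.
    drops_total: int             # over the 60 s window
    drops_post_warmup: int
    llc_load_misses_10s: int     # per 10 s slice
    drm_ioctl_per_sec: float
    expbuf_and_scm_rights: bool  # C4 (a)
    prime_fd_to_handle: bool     # C4 (b)

def check_criteria(s):
    """Return {criterion: passed} using the thresholds listed above."""
    return {
        "C1": s.drops_total <= 10 and s.drops_post_warmup == 0,
        "C2": s.llc_load_misses_10s <= 9_000_000,
        "C3": s.drm_ioctl_per_sec <= 100,
        "C4": s.expbuf_and_scm_rights or s.prime_fd_to_handle,
    }
```

Keeping the four results separate, rather than collapsing them into a single pass/fail, preserves the information needed to pick a loopback edge when only some criteria hold.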
4. What's touched, what's not
Touched:
- libva-v4l2-request — substantial multiplanar rewrite of
src/v4l2.c, src/context.c, src/picture.c,
''src/h264.c''. Public ABI preserved (libva-driver entrypoints unchanged); internal restructuring only.
* marfrit-packages: new ''libva-v4l2-request-ohm-gl-fix/'' tree. Conditionally: ''chromium-ohm-gl-fix/'' (Step 2 only).
* ohm system: ''pacman -U'' replaces stock libva-v4l2-request (and conditionally Chromium/Brave) with the campaign packages.
Not touched:
- mpv, ffplay, VLC, gst-* — these remain on their current paths.
Their users will not benefit from Phase 4. Out of campaign scope.
- Mesa / panfrost / panvk / libplacebo — their state is unchanged.
The panvk-1.2-fakeshim option from prior Phase 4 drafts is
not pursued in this iteration.
* libavcodec / ffmpeg: Chromium statically vendors its own; the system ''ffmpeg-v4l2-request-git'' package is unchanged.
* Kernel drivers (hantro-vpu, panfrost). Step 1 builds against the existing UAPI surface; no kernel work.
* KWin / Wayland protocol. Step 1 produces dmabuf fds; existing KWin ''zwp_linux_dmabuf_v1'' implementation consumes them. No compositor work.
* The S5 regression ([[ohm_gl_fix:phase3_revised_2026-05-01|Phase 3 revised]] §6 / §8: gst-launch waylandsink ~0.3 drops/sec on today's stack vs. fourier 2026-04-24's 0/62). Separate iteration if pursued.
5. Predicted outcome (against Phase 1 revised C1-C4)
If Step 0 + Step 1 deliver and Step 2 turns out unnecessary (optimistic case):
| Criterion | Current (Brave SW path) | Predicted (Phase 4 delivered) | How verified |
|---|---|---|---|
| C1 drops ≤ 10 / 60 s, 0 post-warmup | not measured (estimated in the hundreds, based on Brave's CPU footprint) | 0 drops post-warmup; total drops ≤ 5 (Vulkan-init blip equivalent) | Phase 7 strace/perf-stat re-run |
| C2 LLC-load-misses ≤ 9 M / 10 s | Brave GPU process has heavy memcpy traffic (Phase 3 revised §2 — 12.92 % memcpy on GPU process) | ≤ 9 M / 10 s for GPU process (no per-frame dmabuf-to-shmem CPU copy) | perf-stat re-run |
| C3 DRM_IOCTL/sec ≤ 100 | not measured for Brave (S2/S3/S4 sit at 800–1 050; S5 at 1 046) | ≤ 100 if Chromium routes the dmabuf via zwp_linux_dmabuf_v1 overlay; otherwise Step 2 needed | strace v2 + boundary_counts.csv extension |
| C4 boundary fd-passing | NO (libva fails, no V4L2 path engaged) | YES — VIDIOC_EXPBUF from libva, then either SCM_RIGHTS to KWin or PRIME_FD_TO_HANDLE to GL (depending on Step 2 outcome) | strace v2 boundary inspection |
If Step 2 is required, the same outcome is reached via Step 1 + Step 2 in sequence. Step 1's standalone result is then C1+C2 met and C3+C4 partially met (Level 1 zero-copy at the decode boundary; Level 2 still missing at the compositor boundary).
6. Risks and mitigations
- R1 — Multiplanar port takes longer than 8 weeks. V4L2
stateless API + request-API + hantro-specific control set is
intricate. //Mitigation:// scope to H.264 only initially. HEVC
is moot (RK3566 hantro has no HEVC HW), and VP9 / AV1 stay on
software decode on this hardware (see §9). VP8 follows only if
H.264 lands cleanly. If a single sub-task slips by more than 3
weeks, surface to Markus for re-scoping.
- **R2 — Chromium routes VA-API NativePixmap through Skia GL on
Wayland by default** (Step 0 negative finding). //Mitigation://
Step 2 patches Chromium. Engineering cost goes up materially
but campaign scope still tractable. If Step 2 itself looks
>2 months, reconsider whether to ship Step 1 alone with C1+C2
met and document C3 as still missing.
- **R3 — hantro's H.264 conformance is incomplete.** Some streams
(interlaced, certain profile/level combinations,
Hi10P) may fail. //Mitigation://
cross-check against fourier's ''gst v4l2slh264dec'' working
output on the same clip — that path uses the same kernel
driver and is a known-good reference. Use the test corpus from
fourier ''README'' L319-340 once
enumerated.
- **R4 — KWin's ''zwp_linux_dmabuf_v1'' modifier handling on the
NV12 ''DRM_FORMAT_MOD_LINEAR'' that hantro produces.** Phase 3
Finding 1 already showed all panvk modifiers carry
''external_only=1''; that's a panvk-side property, but KWin's
own modifier acceptance for NV12 is independent. //Mitigation://
cross-check by running ''gst-launch v4l2slh264dec → waylandsink''
on today's stack — that path produces the same modifier and is
accepted by KWin (the S1 zero-copy reference). If S1 still
works, KWin's acceptance is fine for the Step 1 output.
- **R5 — fourier's libva-v4l2-request local patches were against
an older bootlin tree.** May not apply cleanly to current
upstream. //Mitigation:// start by rebasing fourier's patches
on current upstream as the first sub-task of Step 1. If
upstream has moved more than expected, fall back to
fourier's snapshot.
- **R6 — Chromium's VAAPI gating** (''VaapiVideoDecoder'',
''VaapiIgnoreDriverChecks''). The driver-check path inspects
the libva driver's reported profile set. fourier already saw
''vainfo'' enumerate H.264 profiles successfully with the
probe patch; the multiplanar Step 1 should preserve that.
//Mitigation:// after Step 1, re-run
''LIBVA_DRIVER_NAME=v4l2_request
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 vainfo'' to confirm profile
enumeration still passes. Then Brave's
''--enable-features=VaapiVideoDecoder,VaapiIgnoreDriverChecks''
invocation should engage.
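The R6 mitigation boils down to "does vainfo still list H.264 decode profiles after the port". A minimal sketch of that check, assuming vainfo's usual "profile : entrypoint" listing; the function names are illustrative, and ''run_vainfo'' is only a convenience wrapper that sets the two environment variables named above.

```python
import os
import re
import subprocess

_H264_VLD = re.compile(r"\s*(VAProfileH264\w+)\s*:\s*VAEntrypointVLD\b")

def h264_decode_profiles(vainfo_text):
    """Return H.264 profiles listed with the VLD (decode) entrypoint."""
    found = []
    for line in vainfo_text.splitlines():
        m = _H264_VLD.match(line)
        if m:
            found.append(m.group(1))
    return found

def run_vainfo(video_path="/dev/video1"):
    """Run vainfo against the v4l2_request driver and return its stdout."""
    env = dict(os.environ,
               LIBVA_DRIVER_NAME="v4l2_request",
               LIBVA_V4L2_REQUEST_VIDEO_PATH=video_path)
    proc = subprocess.run(["vainfo"], env=env, capture_output=True, text=True)
    return proc.stdout
```

An empty list from ''h264_decode_profiles(run_vainfo())'' after Step 1 would mean the multiplanar work regressed the probe patch, and Brave's driver check has nothing to accept.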
7. Phase 5 hand-over
Per ~/.claude/projects/-home-mfritsche-src/memory/feedback_dev_process.md,
Phase 5 is second-model review of all Phase 1-4 artefacts. Markus
pastes the materials uncurated:
- This Phase 4 page
- Companion CSVs:
metrics.csv,
phase3/io_cache_2026-05-01/boundary_counts.csv,
''phase3/io_cache_2026-05-01/perfstat.csv''
Specific questions for the second-model reviewer to challenge:
- Is fix surface A actually the right pick given Phase 1
revised's use-case priority? In particular: does the reviewer
see a path Phase 4 missed where Brave's chain could be lifted without rewriting libva-v4l2-request multiplanar?
- **Is Step 0's research scope sufficient** to commit to or rule out Step 2 with confidence, or does Step 0 itself need a Phase 4-internal sub-plan?
- **Risk R1 (slip) and R2 (Step 2 needed): is the mitigation realistic** given a single-engineer-with-Claude-assistance capacity?
- **Test corpus from fourier README L319-340: is it adequate** for declaring Step 1 complete, or should we extend it?
8. Phase 6 (implementation) and Phase 7 (verification) order
Phase 6 = “execute Step 0 → Step 1 → conditionally Step 2”.
Phase 7 = “Step 3” above. metrics.csv rows
phase7_verify_brave_* will hold the binding numbers.
Phase 6 is long (weeks-to-months in elapsed wall time, not full-time). Sub-step boundaries inside Phase 6 are Phase-4-internal; no need to re-enter Phase 4 unless a step-level surprise demands re-planning (e.g. Step 0 turns up something that invalidates Step 1's direction).
The three loopback edges (Phase 1 revised §5):
- C1 ✓ + C2 ✗ + C3 ✓ → flag, investigate. Surfaces a measurement
classification issue.
- C1 ✓ + C2 ✓ + C3 ✗ → Level-1 fixed, Level-2 missing. This is the expected post-Step-1 state if Step 0 said Step 2 is needed. Re-enter Phase 4 with Step 2 spec'd.
- C1 ✗ at Phase 7 → drops still happen. Re-enter Phase 4 with
new perf evidence.
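The three edges can be written down as a small decision function, which also makes the uncovered combinations visible. A minimal sketch; the returned strings paraphrase the list above, and the "no loopback" and "unclassified" outcomes are additions for completeness that the list itself does not name.

```python
def loopback_edge(c1, c2, c3):
    """Map C1-C3 pass/fail to the next action named in the list above."""
    if not c1:
        return "re-enter Phase 4 with new perf evidence"
    if c2 and not c3:
        return "re-enter Phase 4 with Step 2 spec'd"
    if c3 and not c2:
        return "flag and investigate measurement classification"
    if c2 and c3:
        return "no loopback"
    # C1 passes while C2 and C3 both fail: not covered by the three edges.
    return "unclassified"
```

The last branch is the case to decide on before Phase 7: drops are gone, but neither the cache nor the ioctl criterion holds.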
9. Deferred / out of scope
- Other libva consumers (mpv-via-vaapi, VLC-via-vaapi) — same
Step 1 lifts them indirectly. Verification is Brave-only; gains
on other libva consumers are documented at Phase 7 but not
required for closure.
* **libavcodec hwaccel consumers** (mpv ''gpu-next'', ffplay,
VLC ''qt'') — fix surface B from prior Phase 4 enumeration.
Separate campaign.
* **Vulkan-anchored consumers** (libplacebo Vulkan backend on
Mali-G52). Fix surface C2 (''panvk-1.2-fakeshim''). Separate
campaign.
* **HEVC, VP8, VP9, AV1.** RK3566 hantro has H.264 + MPEG2 + VP8
HW only. AV1 / VP9 / HEVC are SW even after Step 1. Out of scope
for this campaign's verification.
* **The S5 zero-drop regression**
([[ohm_gl_fix:phase3_revised_2026-05-01|Phase 3 revised]] §6 +
§8). Side investigation if pursued.
* **Other Mali-Bifrost-v7 hardware** (G31 / G51 / G76 — same
panvk arch, different SBC stacks). Out of scope; Phase 1's
"Mali-G52" framing is hardware-specific.
* **General-purpose Vulkan workloads.** Phase 1 revised §6
explicit out-of-scope. SW-emulated mandatory-1.2 entry points
in any future panvk-fakeshim are tolerated.
10. References
* Phase 1 revised: measurable success criteria.
* Phase 2: versions, V4L2 9-fd buffer pool, panvk gates, panfrost modifier surface.
* [[ohm_gl_fix:phase3_revised_2026-05-01|Phase 3 revised]]: six-contender empirical bucket-attribution + boundary characterisation; the basis for §1's "Brave is libva, not libavcodec" pivot.
* [[ohm_gl_fix:phase4_2026-04-30|Original Phase 4]]: superseded by this page; preserved for audit trail.
* fourier ''README'' L236-281: prior libva-v4l2-request investigation and partial multiplanar probe patches that form Step 1's starting point.
* Bootlin libva-v4l2-request: [[https://github.com/bootlin/libva-v4l2-request]]
* Local artefact: ''~/fourier-test/libva-patches/fourier-local.patch'' (HEVC-stripped, missing-include fixed, format-enumeration extended for MPLANE).
* marfrit-packages parallel: ''ffmpeg-v4l2-request-git/'' is the template for the new ''libva-v4l2-request-ohm-gl-fix/'' package layout.
Phase 4 ends here. Phase 6 (implementation) begins with Step 0,
which produces a small attached decision document on this page. The
first pacman -U on ohm marks Phase 6's first deliverable.
Phase 7 is the metrics.csv phase7_verify_* row(s).
