2026-05-01: This page is superseded by phase4_2026-05-01. The 2026-04-30 page is preserved below for audit trail; the live Phase 4 plan is the new page.
Phase 4 — The Gap (ohm_gl_fix iteration 1)
Status: replaces the SUPERSEDED 2026-04-30 libplacebo-cache draft. The earlier plan is in git history. It was retracted after `perf record` showed that libplacebo accounted for 0.41 % of CPU and that the patched code path was not being executed.
This page is the campaign's deliverable: a structural-gap identification, traced from the symptoms to the gap, with a fix-surface assessment. It is not a player recommendation, not a per-player patch plan, and not a workaround proposal. Per Markus's reframe 2026-04-30 — “I am interested to find the gap which makes (a) VLC unusable and (b) facilitates all programs outputting video to run efficiently, i.e. web browsers” — this campaign's job is to name the gap, not to fill it.
1. Goal (essence) and use-case scope
The Phase 1 measurable target (“on bbb_1080p30_h264.mp4 with `mpv --hwdec=v4l2request --vo=gpu-next` over 60 s, drops_post_warmup == 0”) was the symptom whose investigation led here. The campaign's actual goal, as clarified during Phase 3, is:
Identify the structural gap such that filling it would lift VLC out of unusability and would let any video-displaying program on Mali-G52 + KWin Wayland (web browsers especially) run with efficient HW-accelerated playback against stock libraries.
The deeper framing, accepted 2026-04-30: the predicament is the buffer-to-display path without CPU copy. Decode is not the issue. hantro-VPU on RK3566 (and rkvdec on RK3588) can decode H.264 1080p with substantial headroom, and libva can produce ≈300 fps of decoded buffers with display=null. The breakage is in the post-decode handoff: when the decoded dmabuf needs to land on the screen and there is no zero-copy path, every consumer in the chain breaks.
In-scope use cases (this informs how fix-surface rows are ranked in §6):
- YouTube in Brave (Chromium-based browser video, the highest-traffic workload on this device class).
- General web browsing in Brave (compositor-side video / animations / WebGL).
- VS Code (Electron + Chromium under the hood; same compositor pipeline as Brave).
Explicitly out of scope:
- 3D games / Tux Racer / Doom / GTA / Proton/DXVK / general-purpose Vulkan workloads. Software-emulated implementations of mandatory Vulkan 1.2 entry points with poor performance characteristics are acceptable as long as they do not degrade the in-scope use cases.
- mpv / ffplay / VLC as primary daily players. They appear as symptoms in §4 because they're representative test instruments that exercise the same libavcodec hwaccel chain; they are not the workloads that motivate the fix.
Phase 1 metric, refined: a single gap is named; every observed symptom (mpv, ffplay, VLC, Chromium-via-VAAPI, plus the gst-launch regression) is traced to it; a fix-surface assessment names the shape of work that would actually close it, ranked by impact on the in-scope use cases. Anything that closes fewer than all listed symptoms is a workaround for the out-of-scope symptoms. For the in-scope use cases, a partial fix may still be a proper fix.
2. The gap, in one paragraph
There is no completed integration of “V4L2 stateless decode → GPU-displayable surface” in the stock Linux video stack on aarch64 SBCs running mainline Wayland — outside the GStreamer v4l2codecs plugin + linux-dmabuf-v1 Wayland-protocol path. Every other client of the V4L2 stateless decoder (libavcodec hwaccels, libva-v4l2-request, libplacebo's drm_prime importer, mpv's drmprime hwdec) inherits a different incompleteness on its specific chain, and each chain's gap manifests as a different specific failure. The absence of any one completed end-to-end path through libavcodec or libva is the structural gap. There is no single missing function or typo'd condition that, if fixed, would lift every symptom — what is missing is one completed integration story that the libavcodec and libva ecosystems can both navigate without depending on infrastructure (Vulkan, DRM master) that aarch64 + Wayland clients do not have.
3. Why this is one gap and not N independent bugs
A naive read says “four players failed for four different reasons”. That is true at the file:line level — see the symptom inventory below — but every chain, when traced upward, terminates at the same architectural decision: the assumption that a hardware video-decode pipeline ends in either Vulkan or DRM-master access. Both assumptions match desktop-class hardware (Intel/AMD/NVIDIA, on a TTY-anchored X11 or KMS direct-scanout client). Both assumptions break on:
- Mali-G52 / Bifrost gen 2: a gap in the panvk Vulkan implementation. The gap is not a missing `VK_KHR_video_decode_*`. That codec extension is irrelevant to this campaign, because hantro already does the decode and the Vulkan side only needs to present the resulting dmabuf (the NVIDIA NVDEC→Vulkan model). The actual gap is the API version envelope. panvk on Bifrost v7 carries every dmabuf-import-and-present extension required (verified 2026-04-30, see phase3/research_2026-04-30_panvk_brokenness.md). However, the device advertises Vulkan 1.0, while libplacebo's Vulkan consumers gate on ≥ 1.2 (ffplay's renderer) or ≥ 1.3 (mpv's gpu-next). Mesa promoted Valhall v10 to Vulkan 1.1+ (Mesa 25.0) and then to 1.2 conformance on G610; the same promotion has not been done on Bifrost v6/v7. In addition, the driver is gated default-off behind `PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1`. The gate's stated reason in Mesa MR !13016 (2021, Vizoso) is “we miss a lot of functionality, which would cause so many crashes that the runs aren't practical”. That describes generic 2021 panvk incompleteness, not v7-specific hardware behaviour. The CI runs only `dEQP-VK.api.copy_and_blit.*` + `fill_and_update_buffer.*` to this day. What modern panvk-v7 actually crashes on, within the use-case-relevant subset of Vulkan (image import, swapchain, sampler ycbcr, present), is currently uncharacterised in this campaign. It should be tested before committing to a fix-surface direction that depends on it.
- KWin Wayland (and Mutter, sway, river: every Wayland compositor, by Wayland-spec design): clients do not get DRM master. Anything in the stack that reaches for `drm_params_v2` or an equivalent fails. mpv's drmprime-overlay loader is one example; any libavcodec hwaccel that wants direct KMS scanout is another.
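The version wall in the first bullet reduces to plain integer arithmetic once the packing is spelled out. The sketch below re-creates the bit layout of Vulkan's `VK_MAKE_API_VERSION` macro without pulling in the Vulkan headers:

- variant in bits 31–29
- major in bits 28–22
- minor in bits 21–12
- patch in bits 11–0

The macro, constant and function names are local to the sketch. It does not reproduce the exact check inside ffplay or libplacebo. It only shows why no patch level on a 1.0 device can satisfy a 1.2 or 1.3 minimum.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as VK_MAKE_API_VERSION in vulkan_core.h:
 * variant in bits 31-29, major 28-22, minor 21-12, patch 11-0. */
#define API_VERSION(variant, major, minor, patch)                    \
    ((((uint32_t)(variant)) << 29U) | (((uint32_t)(major)) << 22U) |  \
     (((uint32_t)(minor)) << 12U) | ((uint32_t)(patch)))

#define API_VERSION_MAJOR(v) (((uint32_t)(v) >> 22U) & 0x7FU)
#define API_VERSION_MINOR(v) (((uint32_t)(v) >> 12U) & 0x3FFU)
#define API_VERSION_PATCH(v) ((uint32_t)(v) & 0xFFFU)

/* Fields are packed most-significant first, so an integer comparison
 * orders versions correctly: a large patch number (335) can never make
 * up for a smaller minor number. */
static int meets_minimum(uint32_t advertised, uint32_t required)
{
    return advertised >= required;
}

/* What this page reports for panvk on Mali-G52 with the opt-in flag set. */
static const uint32_t panvk_bifrost_v7 = API_VERSION(0, 1, 0, 335);
/* Minimums this page reports for ffplay's renderer and mpv's gpu-next. */
static const uint32_t ffplay_minimum = API_VERSION(0, 1, 2, 0);
static const uint32_t gpu_next_minimum = API_VERSION(0, 1, 3, 0);
```

Both `meets_minimum(panvk_bifrost_v7, ffplay_minimum)` and `meets_minimum(panvk_bifrost_v7, gpu_next_minimum)` are false. This matches the “1.0.335 is lower than the minimum required version of 1.2.0” rejection recorded in S2.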
The “one path that works” — gst v4l2codecs → linux-dmabuf-v1 waylandsink — works because GStreamer has its own pipeline-level dmabuf negotiation (the `caps = video/x-raw(memory:DMABuf), format=DMA_DRM` capability we observed during Finding 6 probing) that bypasses both libavcodec hwaccels and libva entirely. The compositor accepts the dmabuf via the linux-dmabuf-v1 protocol. No Vulkan, no DRM master, no library chain involving libavcodec hwaccel display. That path was designed for this hardware class. The libavcodec/libva paths were not.
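The argument of this section can be restated as a capability check. In the toy model below:

- each display path from §4 lists the platform capabilities it depends on;
- ohm's platform is described by the capabilities it actually provides to a Wayland client.

The enum names and per-path requirement sets are simplifications drawn from this page, not from any library. S3 is left out because its blocker is a packaging decision rather than a platform capability. Under these assumptions, only the GStreamer path has nothing missing.

```c
#include <assert.h>
#include <stddef.h>

/* Platform capabilities a display path may depend on (illustrative names). */
enum cap {
    CAP_LAVC_V4L2REQUEST = 1 << 0, /* libavcodec with the v4l2request hwaccel */
    CAP_VULKAN_1_2       = 1 << 1, /* Vulkan >= 1.2 device for presentation    */
    CAP_DRM_MASTER       = 1 << 2, /* client access to KMS (drm_params_v2)     */
    CAP_LIBVA_MPLANE     = 1 << 3, /* multiplanar libva-v4l2-request           */
    CAP_DMABUF_PROTOCOL  = 1 << 4, /* compositor accepts linux-dmabuf-v1       */
};

struct display_path {
    const char *name;
    unsigned needs; /* bitmask of enum cap */
};

static const struct display_path paths[] = {
    {"S1 mpv drmprime hwdec", CAP_LAVC_V4L2REQUEST | CAP_DRM_MASTER},
    {"S2 ffplay Vulkan renderer", CAP_LAVC_V4L2REQUEST | CAP_VULKAN_1_2},
    {"S4 Chromium via libva", CAP_LIBVA_MPLANE},
    {"S5 gst v4l2codecs -> waylandsink", CAP_DMABUF_PROTOCOL},
};

/* ohm as seen by a Wayland client: a v4l2request-capable libavcodec and a
 * dmabuf-accepting compositor, but no Vulkan >= 1.2, no DRM master and no
 * multiplanar libva backend. */
static const unsigned ohm_caps = CAP_LAVC_V4L2REQUEST | CAP_DMABUF_PROTOCOL;

/* Capabilities a path needs but the platform lacks; 0 means viable. */
static unsigned missing_caps(const struct display_path *p, unsigned platform)
{
    return p->needs & ~platform;
}
```

Each failing path misses a different bit, which is the “four different reasons” reading. The one viable path needs only what a Wayland client on this hardware actually has.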
4. Symptom inventory
All measured 2026-04-30 on ohm (PineTab2, RK3566, Mali-G52, hantro VPU, kernel 6.19.10, mesa 26.0.5, KWin 6.6.4, Plasma 6.6.4) playing bbb_1080p30_h264.mp4 (60 s, 1440 frames @ 24 fps).
| # | Client | Decode path attempted | What broke | Evidence |
|---|---|---|---|---|
| S1 | mpv 0.41.0 + gpu-next | libavcodec → v4l2request hwaccel → drm_prime → libplacebo GL backend | drmprime-overlay loader's init() calls ra_get_native_resource(“drm_params_v2”) which returns NULL under Wayland; vd_lavc bails to SW; 134 % CPU, 70 % drops | Findings 4 & 5; phase3/baseline_2026-04-30_mpv_verbose.log; mpv_v0.41.0_video_out_hwdec_hwdec_drmprime_overlay.c:290 |
| S2 | ffplay (FFmpeg n8.1) | libavcodec → v4l2request hwaccel → drm_prime → libplacebo Vulkan renderer | (1) without PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1: panvk gates Bifrost-v7 default-off, no Vulkan device → “Enable vulkan renderer to support hwaccel v4l2request” + VK_ERROR_INITIALIZATION_FAILED. (2) with the flag set: panvk enumerates Mali-G52 with all dmabuf-import extensions present, but advertises apiVersion = 1.0.335; ffplay rejects with “Device API version 1.0.335 is lower than the minimum required version of 1.2.0, cannot proceed!” Both cases fall to SW. | phase3/research_2026-04-30_ffplay_with_broken_flag.txt; phase3/research_2026-04-30_vulkaninfo_panvk_v7_with_broken_flag.txt; phase3/research_2026-04-30_panvk_brokenness.md |
| S3 | VLC 3.0.22 | bundled libavcodec 58.134 (ffmpeg 4.4) → vdpau / vaapi_vld / yuv420p only | bundled libavcodec predates the v4l2request hwaccel; no path even attempted; 235 % CPU, slow-motion playback (110 s wall for 60 s media), late-picture drops | phase3/cross_player_2026-04-30_vlc_vout_and_gst_idle.txt; phase3/cross_player_2026-04-30_vlc_qt_and_gst_drops_trajectory.txt; VLC's PKGBUILD `--disable-libplacebo` line |
| S4 | Chromium / Brave (browser HW decode) | Chromium VaapiVideoDecoder → libva → libva-v4l2-request → V4L2 ioctls | libva-v4l2-request hardcodes single-plane (sunxi-cedrus) buffer setup; RK3568 hantro is multiplanar; vaCreateContext fails after format enumeration succeeds; falls to libavcodec SW | fourier README L236-281 (prior investigation) |
| S5 | gst v4l2slh264dec → waylandsink (the “working path” reference) | GStreamer v4l2codecs → linux-dmabuf-v1 protocol direct | regression vs fourier 2026-04-24's 0/62 drops; today reports ~0.3 drops/sec on the same pipeline. Stack drift in 6 days. | Finding 6; phase3/cross_player_2026-04-30_vlc_qt_and_gst_drops_trajectory.txt |
S5 is included not as a failure of the “working path” but as evidence that even that path is fragile under the marfrit-packages custom-stack drift Markus already maintains (mesa, ffmpeg, alsa, libdrm-pinebookpro). The gap analysis below does not attempt to explain S5; it is recorded here as a known follow-up.
5. Trace from each symptom to the gap
- S1 (mpv). mpv assumed that “if libavcodec produces drm_prime frames, the VO can ingest them via the drmprime hwdec interop, whose loader can get a DRM fd from the native display.” On Wayland, the native display is the compositor, and the compositor does not give clients DRM master. Without DRM master there is no drm_params_v2 and no drmprime-overlay, so the hwdec group never completes and vd_lavc bails. The integration assumption “the VO can reach the KMS layer” breaks under Wayland.
- S2 (ffplay). In FFmpeg n8.x, ffplay's v4l2request hwaccel path is wired to require its libplacebo Vulkan renderer. The integration assumption is not “the consumer can use Vulkan as the codec engine” (`VK_KHR_video_decode_*`, irrelevant because hantro decodes). It is “the consumer can use Vulkan ≥ 1.2 as the presentation backend” (import the dmabuf, sample it, present via swapchain). panvk on Bifrost v7 has every required extension, but it is stuck at Vulkan 1.0 and gated default-off. Mesa upstream lifted Valhall v10 to ≥ 1.2; the v6/v7 promotion has not been done. This was verified 2026-04-30 by setting `PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1` and running ffplay. The device enumerates and the dmabuf-import extensions are present, but ffplay rejects it with “API version 1.0.335 is lower than the minimum required version of 1.2.0”.
- S3 (VLC). VLC's Arch package was built with `--disable-libplacebo` against a bundled ffmpeg 4.4. The integration assumption “distros will ship a libavcodec new enough to have v4l2request” breaks where downstream packaging decisions favour stability or legacy compatibility over hwaccel currency. (Newer VLC 4.x changes this picture; on this stack, the legacy dependency wins.)
- S4 (Chromium). libva-v4l2-request was written for sunxi-cedrus. It never completed a multiplanar port for the mainline-kernel V4L2 stateless decoders that ship on Rockchip (RK35xx) and NXP hardware. The integration assumption “libva-v4l2-request will eventually have a multiplanar implementation” has been blocked on maintainership for years (per fourier README L267-274).
The four assumptions are independent at the code level; they are unified at the integration-story level. None of the upstream projects involved (libavcodec, libplacebo, mpv, libva-v4l2-request, VLC) carries primary responsibility for the integration as a whole. That is the gap.
6. Fix surface candidates
Each row below describes a direction in which a fix could live. None is proposed by this campaign — Markus's “no upstreaming unless specifically tasked” policy applies, and even within that policy, this Phase 4 documents rather than picks. Tractability assessments are rough.
| Direction | What it would lift | Tractability | Where the work lives |
|---|---|---|---|
| A. Complete libva-v4l2-request multiplanar port | S4 (browsers via libva); S3 partial (VLC if it migrates to libva-vaapi for HW decode); S2 partial (ffplay via vaapi backend) | Hard. fourier started this with local patches; the upstream is “effectively unmaintained” (fourier L267-274). A multiplanar rewrite of context.c / picture.c / v4l2.c is months of work. | bootlin / Collabora / community fork of libva-v4l2-request |
| B. Add a non-Vulkan, non-DRM-master path in libavcodec drm_prime hwaccel | S1 (mpv); S2 (ffplay); plus future libavcodec consumers on this hardware class | Medium. The path would be: drm_prime → linux-dmabuf-v1 protocol export → compositor consumes via dmabuf-direct, like GStreamer's waylandsink does today. Requires libavcodec to learn Wayland-protocol negotiation (or to delegate it cleanly to consumers). | FFmpeg upstream; libplacebo's GL backend; mpv's drmprime hwdec |
| C1. Promote panvk on Bifrost v7 from Vulkan 1.0 to Vulkan ≥ 1.2 (upstream Mesa) | S1 (mpv via gpu-next-vulkan) and S2 (ffplay's v4l2request hwaccel). Markus 2026-04-30 corrected an earlier framing here: the campaign needs Vulkan as the presentation backend for an externally-decoded NV12 dmabuf (NVIDIA's NVDEC→Vulkan model), not Vulkan as a codec engine. With PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1, panvk on G52 advertises all the dmabuf-import-and-present extensions needed (VK_EXT_external_memory_dma_buf, VK_KHR_external_memory_fd, VK_EXT_image_drm_format_modifier, VK_KHR_sampler_ycbcr_conversion, VK_KHR_swapchain, VK_KHR_external_semaphore_fd, VK_KHR_wayland_surface). The proximate wall is apiVersion = 1.0.335; ffplay rejects on 1.0 < 1.2, libplacebo gpu-next on 1.0 < 1.3. | Tractable but maintainership-bound. Mesa already promoted Valhall v10 to ≥ 1.2 (Mesa 25.0); the same work on Bifrost v7 has not been done because active panvk effort focuses Valhall + 5th-gen. Bringing panvk/v7 to Vulkan 1.1 then 1.2 means implementing or stubbing each mandatory feature added in those versions. Whether modern panvk/v7 actually crashes on use-case-relevant test paths (image-import, swapchain, sampler ycbcr, present) is uncharacterised — see §3 note. | Mesa upstream src/panfrost/vulkan/. |
| C2. Out-of-tree Vulkan layer (panvk-1.2-fakeshim) that lies about apiVersion | Same as C1 (S1, S2). Same scope on the in-scope use cases (the browser does not use this libplacebo path). | Lower upfront cost than C1. A Vulkan layer (not an ICD wrapper) sits between the application and the stock Mesa panvk ICD. It is explicitly opt-in via the VK_INSTANCE_LAYERS= env var, with no system-wide effect on Chromium or general-purpose Vulkan apps. Three categories of work: (i) intercept vkGetPhysicalDeviceProperties2 / vkEnumerateInstanceVersion to advertise apiVersion 1.2.0; (ii) pass-through redirects from core-1.2 entry points to KHR-extension equivalents already in panvk (VK_KHR_sampler_ycbcr_conversion, VK_KHR_descriptor_update_template, VK_KHR_maintenance1/2, VK_KHR_dedicated_allocation); (iii) stubs or software-emulated implementations for genuinely mandatory 1.2 entry points that panvk lacks (timeline semaphores via binary-sem + condvar; render-passes-2.0 lowered to V1; vkResetQueryPool; assert-abort on rare paths). Software emulation costs are acceptable per the §1 use-case scope, which excludes gaming and Proton/DXVK. The layer's correctness only has to hold for the dmabuf-import-and-present hot path. | New territory. Claude could find no existing art for a Vulkan layer that lies about apiVersion and stubs mandatory entry points; the closest are property-only simulators (VK_LAYER_KHRONOS_profiles). The patch surface lives outside Mesa, in a small standalone repo. Probably a few KLOC of layer C, with an ongoing tail as new Vulkan minor versions add mandatory entry points. |
| D. Compositor-level DRM-shim for Wayland clients | S1 (mpv specifically — drmprime-overlay would get its drm_params_v2) | Medium-low. Would need a Wayland protocol extension that grants clients enough KMS view to satisfy drmprime-overlay without granting full DRM master. KWin or wlroots would have to participate, and the protocol is not on either roadmap. Brittle. | Wayland-protocols + KWin / wlroots upstream |
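Categories (i) and (ii) of row C2 can be pictured in a few dozen lines. The sketch below models them without the Vulkan headers or the loader's layer-chaining structures:

- `struct phys_props` stands in for `VkPhysicalDeviceProperties`.
- Plain function pointers stand in for the dispatch table a real layer fills at `vkCreateInstance` / `vkCreateDevice` time.
- `fake_panvk_*` is a hypothetical ICD that reports 1.0.335 and exports only some KHR-suffixed names.

Every identifier is hypothetical except the Vulkan entry-point name strings. A real layer would also have to handle `vkGetPhysicalDeviceProperties2` and `vkEnumerateInstanceVersion`. It would also have to answer the `VkPhysicalDeviceVulkan11Features` / `VkPhysicalDeviceVulkan12Features` queries that a 1.2 application may chain. Category (iii) is only marked here, not implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define API_VERSION(major, minor, patch) \
    (((uint32_t)(major) << 22U) | ((uint32_t)(minor) << 12U) | (uint32_t)(patch))

/* Stand-in for the VkPhysicalDeviceProperties fields the shim touches. */
struct phys_props {
    uint32_t apiVersion;
    uint32_t driverVersion;
};

typedef void (*get_props_fn)(void *phys_dev, struct phys_props *out);
typedef void *(*get_proc_fn)(const char *name);

/* Next link in the chain (the ICD); a real layer stores these per object. */
static get_props_fn next_get_props;
static get_proc_fn next_get_proc;

/* (i) Forward the query, then raise the advertised version to 1.2 while
 * keeping the patch level the driver reported. */
static void shim_get_props(void *phys_dev, struct phys_props *out)
{
    next_get_props(phys_dev, out);
    if (out->apiVersion < API_VERSION(1, 2, 0))
        out->apiVersion = API_VERSION(1, 2, out->apiVersion & 0xFFFU);
}

/* (ii) Core names whose implementation the driver already exports as KHR. */
static const char *const core_to_khr[][2] = {
    {"vkCreateSamplerYcbcrConversion", "vkCreateSamplerYcbcrConversionKHR"},
    {"vkDestroySamplerYcbcrConversion", "vkDestroySamplerYcbcrConversionKHR"},
    {"vkCreateDescriptorUpdateTemplate", "vkCreateDescriptorUpdateTemplateKHR"},
    {"vkDestroyDescriptorUpdateTemplate", "vkDestroyDescriptorUpdateTemplateKHR"},
    {"vkUpdateDescriptorSetWithTemplate", "vkUpdateDescriptorSetWithTemplateKHR"},
    {"vkTrimCommandPool", "vkTrimCommandPoolKHR"},
};

static void *shim_get_proc(const char *name)
{
    void *fn = next_get_proc(name);
    if (fn != NULL)
        return fn;
    for (size_t i = 0; i < sizeof core_to_khr / sizeof core_to_khr[0]; i++)
        if (strcmp(name, core_to_khr[i][0]) == 0)
            return next_get_proc(core_to_khr[i][1]);
    return NULL; /* (iii) territory: needs a stub or a software emulation */
}

/* Hypothetical stand-in for the panvk ICD on Bifrost v7. */
static void fake_panvk_get_props(void *phys_dev, struct phys_props *out)
{
    (void)phys_dev;
    out->apiVersion = API_VERSION(1, 0, 335);
    out->driverVersion = 0;
}

static void *fake_panvk_get_proc(const char *name)
{
    static const char *const exported[] = {
        "vkCreateSamplerYcbcrConversionKHR", "vkTrimCommandPoolKHR",
    };
    for (size_t i = 0; i < sizeof exported / sizeof exported[0]; i++)
        if (strcmp(name, exported[i]) == 0)
            return (void *)exported[i]; /* the string doubles as a unique token */
    return NULL;
}
```

With `next_get_props` / `next_get_proc` pointed at the `fake_panvk_*` functions, `shim_get_props` reports 1.2.335 and `vkTrimCommandPool` resolves to the KHR export. `shim_get_proc("vkResetQueryPool")` still returns NULL; that remaining NULL is the category (iii) work.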
No row above lifts every listed symptom. A fix that lifts S3 specifically requires a downstream packaging change at the distro level (rebuild VLC against current ffmpeg with libplacebo enabled, or wait for VLC 4.x to land in stable Arch) — not something any of A-D upstream projects would deliver. This is part of the gap's shape: the symptom set is not uniformly fixable from any single location, because the integration that's missing was always going to require coordination across libavcodec, libva, the libplacebo chain, the compositor, and downstream packagers.
Ranking against the §1 in-scope use cases
The four rows lift different symptoms; “most symptoms” is not the right ranking metric for the in-scope use cases. Brave / Chromium video decode goes through `Chromium VaapiVideoDecoder → libva → libva-v4l2-request`, not through libavcodec hwaccel + libplacebo + Vulkan/GL. So the libplacebo-chain fixes (B, C1, C2) lift mpv and ffplay, which Markus does not use, and do not touch Brave's video decode pipeline.
Use-case-ranked:
- Row A (libva-v4l2-request multiplanar port) is the only row that lifts S4 (browser HW video decode). YouTube in Brave is in S4. Without A, browser HW decode does not engage; Brave / VS Code / Chromium fall to libavcodec SW decode, which defeats the buffer-to-display predicament for the highest-traffic workload on this device class.
- Row C2 (Vulkan layer in its own repo) is a smaller, self-contained engineering effort that lifts S1+S2. It is worth a feasibility test (build the layer, see whether ffplay completes a 60 s playback), because the cost is bounded and the result shows whether C-class fixes are tractable in general. It does not address the in-scope use cases directly, since browsers don't traverse this chain. It is still useful as a vehicle for characterising what panvk-v7 actually does and doesn't crash on, which is currently unknown.
- Row B (libavcodec drm_prime → linux-dmabuf-v1 path) is architecturally cleanest and would generalise across consumers, but the work lives in FFmpeg upstream and would not be in-scope-impactful unless Brave eventually changed its decode pipeline to consume libavcodec hwaccels (it currently does not).
- Row C1 (Mesa upstream panvk/v7 promotion) is the same scope as C2 but at higher cost and with longer wall-clock until it lands in stock packages. Lower priority than C2 for the same symptom set.
- Row D (compositor DRM-shim) is brittle, narrow, and lifts only S1.
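For category (iii) of row C2, the host-visible half of a timeline semaphore is small enough to sketch. The code below models only the semantics of `vkSignalSemaphore`, `vkWaitSemaphores` (without timeout) and `vkGetSemaphoreCounterValue`: a 64-bit counter that only increases, with waiters blocking until it reaches a target. It uses a POSIX mutex and condition variable, and all names are hypothetical. It does not cover GPU-side waits and signals in `vkQueueSubmit`, which is where the binary-semaphore half of the emulation described in row C2 would come in.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side model of a timeline semaphore: a monotonic counter plus a condvar. */
struct timeline {
    pthread_mutex_t lock;
    pthread_cond_t advanced;
    uint64_t value;
};

static void timeline_init(struct timeline *t, uint64_t initial)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->advanced, NULL);
    t->value = initial;
}

/* Signal a new value; like vkSignalSemaphore it must exceed the current
 * value. Returns 0 on success, -1 if the value would not increase. */
static int timeline_signal(struct timeline *t, uint64_t value)
{
    int rc = -1;
    pthread_mutex_lock(&t->lock);
    if (value > t->value) {
        t->value = value;
        pthread_cond_broadcast(&t->advanced); /* wake every waiter to re-check */
        rc = 0;
    }
    pthread_mutex_unlock(&t->lock);
    return rc;
}

/* Block until the counter is at least `value` (no timeout in this sketch). */
static void timeline_wait(struct timeline *t, uint64_t value)
{
    pthread_mutex_lock(&t->lock);
    while (t->value < value)
        pthread_cond_wait(&t->advanced, &t->lock);
    pthread_mutex_unlock(&t->lock);
}

static uint64_t timeline_get(struct timeline *t)
{
    pthread_mutex_lock(&t->lock);
    uint64_t v = t->value;
    pthread_mutex_unlock(&t->lock);
    return v;
}

/* Demo producer: signals 1, 2, 3 from another thread. */
static void *demo_producer(void *arg)
{
    struct timeline *t = arg;
    for (uint64_t v = 1; v <= 3; v++)
        timeline_signal(t, v);
    return NULL;
}
```

A consumer thread calling `timeline_wait(&t, 3)` while `demo_producer` runs returns once the counter reaches 3. A later `timeline_signal(&t, 2)` is refused, because timeline values may only move forward.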
The campaign as documented does not pick. This ranking informs which row(s) would be worth the next engineering investment if a fix is to be enacted; that decision is Markus's, not this document's.
7. What this campaign deliberately does NOT do
- Does not pick or propose a patch. A and B in §6 are both reasonable; neither is enacted here.
- Does not recommend a player. “Use gst-play-1.0” would be a workaround, not a fix to the gap. (gst-play-1.0 also exhibits the S5 regression on its own; the working path is fragile.)
- Does not patch any single player as `*-ohm-gl-fix`. The Phase 1 / Phase 4 drafts originally suggested `mpv-ohm-gl-fix` / `libplacebo-ohm-gl-fix` packages on marfrit-packages; both were retracted. A per-player fix lifts one symptom and leaves the gap.
- Does not investigate the S5 regression. Stack drift between fourier 2026-04-24 (0/62) and ohm_gl_fix 2026-04-30 (~0.3 drops/sec) is a separate concern. Likely candidates lie within marfrit-packages' custom mesa / ffmpeg / alsa / libdrm-pinebookpro builds, per Markus 2026-04-30. A separate iteration would bisect via `pacman.log`.
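The `pacman.log` bisection in the last bullet starts with a list of what was upgraded between the two measurements. The sketch below assumes the ISO-timestamp line format that recent pacman versions write: `[YYYY-MM-DDTHH:MM:SS+ZZZZ] [ALPM] upgraded <pkg> (<old> -> <new>)`. The function name, buffer sizes and inclusive date window are assumptions of the sketch. It has not been run against ohm's actual log.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define UPGRADE_TAG "] [ALPM] upgraded "

/* Parse one pacman.log line, assumed to look like
 *   [YYYY-MM-DDTHH:MM:SS+ZZZZ] [ALPM] upgraded <pkg> (<old> -> <new>)
 * Returns 1 and fills pkg/from/to when the line is an upgrade dated within
 * [since, until] (both "YYYY-MM-DD", inclusive); returns 0 otherwise. */
static int upgrade_in_window(const char *line, const char *since,
                             const char *until, char pkg[64], char from[64],
                             char to[64])
{
    char date[11];
    const char *body;

    if (line[0] != '[' || strlen(line) < 11)
        return 0;
    memcpy(date, line + 1, 10); /* ISO dates compare correctly as strings */
    date[10] = '\0';
    if (strcmp(date, since) < 0 || strcmp(date, until) > 0)
        return 0;

    body = strstr(line, UPGRADE_TAG);
    if (body == NULL)
        return 0; /* installed / removed / non-ALPM lines are ignored */
    body += strlen(UPGRADE_TAG);
    return sscanf(body, "%63s (%63s -> %63[^)])", pkg, from, to) == 3;
}
```

Feeding it each line of `/var/log/pacman.log` with the window 2026-04-24 to 2026-04-30 would list the upgrades between the fourier and ohm_gl_fix measurements. Intersecting that list with the marfrit-packages builds named above gives the bisection candidates.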
8. Phase 1 metric — refined
- Original (locked 2026-04-30 morning): “on `bbb_1080p30_h264.mp4` with `mpv --hwdec=v4l2request --vo=gpu-next` over a 60 s window, `drops_post_warmup == 0`.”
- Refined (locked 2026-04-30 evening, after the perf invalidation and Markus's reframe): “the structural gap is named; every Phase 6 symptom (mpv, ffplay, VLC, browser HW decode, the gst regression) is traced to the gap with file:line evidence; a fix-surface assessment names what work would actually close it; the campaign ships documentation, not a patch.”
metrics.csv's phase1_baseline row remains valid as the symptom that opened the campaign. phase1_goal_target is kept for historical context but no longer drives the campaign's success criterion. The success criterion is now qualitative (the gap identification) and is verified at Phase 7 by a second reviewer checking this document.
9. References
- `metrics.csv`: original quantitative anchor (Phase 1 / Phase 3 baselines).
- `phase2.md`: substrate (versions, V4L2 buffer pool, KWin and panfrost capability surveys).
- `phase3/findings.md`: Findings 1–6, the perf-grounded symptom inventory.
- `phase3/INDEX.md`: durable evidence file index per finding.
- `phase3/source_archaeology/`: upstream source files at exact campaign-relevant tags, for independent verification of file:line citations.
- fourier `README.md` L236-281: prior investigation of S4 (browser HW decode via libva-v4l2-request).
- DokuWiki: `ohm_gl_fix:phase1_2026-04-30`, `ohm_gl_fix:phase2_2026-04-30`.
- Dev process: `~/.claude/projects/-home-mfritsche-src/memory/feedback_dev_process.md`.
