> **2026-05-01:** This page is superseded by [[ohm_gl_fix:phase4_2026-05-01]]. The 2026-04-30 page is preserved below for audit trail; the live Phase 4 plan is the new page.

====== Phase 4 — The Gap (ohm_gl_fix iteration 1) ======

> **Status: replaces the SUPERSEDED 2026-04-30 libplacebo-cache draft. The earlier plan is in git history; it was retracted after ''perf record'' showed libplacebo accounted for 0.41 % of CPU and the patched code path was not being executed.**

This page is the campaign's deliverable: a structural-gap identification, traced from the symptoms to the gap, with a fix-surface assessment. It is **not** a player recommendation, not a per-player patch plan, and not a workaround proposal. Per Markus's reframe of 2026-04-30 ("I am interested to find the gap which makes (a) VLC unusable and (b) facilitates all programs outputting video to run efficiently, i.e. web browsers"), this campaign's job is to name the gap, not to fill it.

----

===== 1. Goal (essence) and use-case scope =====

The Phase 1 measurable target ("on ''bbb_1080p30_h264.mp4'' with ''mpv --hwdec=v4l2request --vo=gpu-next'' over 60 s, ''drops_post_warmup == 0''") was the symptom whose investigation led here. The campaign's actual goal, as clarified during Phase 3, is:

> Identify the structural gap such that filling it would lift VLC out of unusability **and** would let any video-displaying program on Mali-G52 + KWin Wayland (web browsers especially) run with efficient HW-accelerated playback against stock libraries.

The deeper framing, accepted 2026-04-30: **the predicament is the buffer-to-display path without CPU copy.** Decode is not the issue: hantro-VPU on RK3566 (and rkvdec on RK3588) can decode H.264 1080p with substantial headroom, and libva can produce ≈300 fps of decoded buffers with ''display=null''. The breakage is on post-decode handoff: when the decoded dmabuf needs to land on the screen and there is no zero-copy path, every consumer in the chain breaks.

**In-scope use cases** (this informs how fix-surface rows are ranked in §6):

  * YouTube in Brave (Chromium-based browser video, the highest-traffic workload on this device class).
  * General web browsing in Brave (compositor-side video / animations / WebGL).
  * VS Code (Electron + Chromium under the hood; same compositor pipeline as Brave).

**Explicitly out of scope:**

  * 3D games / Tux Racer / Doom / GTA / Proton/DXVK / general-purpose Vulkan workloads. Software-emulated mandatory-1.2 entry points with poor performance characteristics are acceptable as long as they don't degrade the in-scope use cases.
  * mpv / ffplay / VLC as primary daily players. They appear as symptoms in §4 because they're representative test instruments that exercise the same libavcodec hwaccel chain; they are not the workloads that motivate the fix.

Phase 1 metric, refined: a single gap is named; every observed symptom (mpv, ffplay, VLC, Chromium-via-VAAPI, plus the gst-launch regression) is traced to it; a fix-surface assessment names the shape of work that would actually close it, ranked by impact on the in-scope use cases. Anything that closes fewer than all listed symptoms is only a workaround as far as the out-of-scope symptoms are concerned; //for the in-scope use cases// a partial fix may still be a proper fix.

===== 2. The gap, in one paragraph =====

There is no completed integration of "V4L2 stateless decode → GPU-displayable surface" in the stock Linux video stack on aarch64 SBCs running mainline Wayland, //outside// the GStreamer ''v4l2codecs'' plugin + ''linux-dmabuf-v1'' Wayland-protocol path. Every other client of the V4L2 stateless decoder (libavcodec hwaccels, libva-v4l2-request, libplacebo's drm_prime importer, mpv's drmprime hwdec) inherits a different incompleteness on its specific chain, and each chain's gap manifests as a different specific failure. The absence of //any one// completed end-to-end path through libavcodec or libva is the structural gap.
There is no single missing function or typo'd condition that, if fixed, would lift every symptom; what is missing is one **completed integration story** that the libavcodec and libva ecosystems can both navigate without depending on infrastructure (Vulkan, DRM master) that aarch64 + Wayland clients do not have.

===== 3. Why this is one gap and not N independent bugs =====

A naive read says "four players failed for four different reasons". That is true at the file:line level (see the symptom inventory below), but every chain, when traced upward, terminates at the same architectural decision: //the assumption that a hardware video-decode pipeline ends in either Vulkan or DRM-master access.//

Both assumptions match desktop-class hardware (Intel/AMD/NVIDIA, on a TTY-anchored X11 or KMS direct-scanout client). Both assumptions break on:

  * **Mali-G52 / Bifrost gen 2**: panvk Vulkan implementation gap. The gap is not "no ''VK_KHR_video_decode_*''"; that codec extension is irrelevant to this campaign because hantro already does the decode and the Vulkan side only needs to //present// the resulting dmabuf (the NVIDIA NVDEC→Vulkan model). The actual gap is the **API version envelope**: panvk on Bifrost v7 carries every dmabuf-import-and-present extension required (verified 2026-04-30, see ''phase3/research_2026-04-30_panvk_brokenness.md''), but the device advertises Vulkan 1.0 while libplacebo's Vulkan consumers gate on ≥1.2 (ffplay's renderer) or ≥1.3 (mpv's gpu-next). Mesa promoted Valhall v10 to Vulkan 1.1+ (Mesa 25.0) and then 1.2 conformant on G610; the same promotion has not been done on Bifrost v6/v7. In addition, the device is gated default-off behind ''PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1''. The gate's stated reason in Mesa MR ''!13016'' (2021, Vizoso) is: "we miss a lot of functionality, which would cause so many crashes that the runs aren't practical". That describes generic 2021 panvk incompleteness, not a v7-hardware-specific characterisation. The CI runs only ''dEQP-VK.api.copy_and_blit.*'' + ''fill_and_update_buffer.*'' to this day. **What the modern panvk-v7 actually crashes on, in the use-case-relevant subset of Vulkan (image-import, swapchain, sampler ycbcr, present), is currently uncharacterised in this campaign and would be the right thing to test before committing to a fix-surface direction that depends on it.**
  * **KWin Wayland** (and Mutter, sway, river: every Wayland compositor, by Wayland-spec design): clients do not get DRM master. Anything in the stack that reaches for ''drm_params_v2'' or equivalent fails. mpv's ''drmprime-overlay'' loader is one example; any libavcodec hwaccel that wants direct KMS scanout is another.

The "one path that works" (''gst v4l2codecs'' → ''linux-dmabuf-v1'' waylandsink) works because GStreamer has its own pipeline-level dmabuf negotiation (the ''caps = video/x-raw(memory:DMABuf), format=DMA_DRM'' capability we observed during Finding 6 probing) that bypasses both libavcodec hwaccels //and// libva entirely. The compositor accepts the dmabuf via the linux-dmabuf-v1 protocol. No Vulkan, no DRM master, no library chain involving libavcodec hwaccel display. **That path was designed for this hardware class. The libavcodec/libva paths were not.**

===== 4. Symptom inventory =====

All measured 2026-04-30 on ohm (PineTab2, RK3566, Mali-G52, hantro VPU, kernel 6.19.10, mesa 26.0.5, KWin 6.6.4, Plasma 6.6.4) playing ''bbb_1080p30_h264.mp4'' (60 s, 1440 frames @ 24 fps).

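The S2 symptom turns on Vulkan's packed ''apiVersion'' field: panvk reports 1.0.335, ffplay requires 1.2.0, and mpv's gpu-next requires 1.3. The following is a minimal sketch of that comparison, assuming the standard ''VK_MAKE_API_VERSION'' bit layout from the Vulkan headers. The helper names are illustrative and are not taken from ffplay or libplacebo, and comparing only major.minor is a simplification of whatever check those projects actually perform.

```python
def make_api_version(major: int, minor: int, patch: int, variant: int = 0) -> int:
    """Pack a version using the VK_MAKE_API_VERSION bit layout."""
    return (variant << 29) | (major << 22) | (minor << 12) | patch


def decode_api_version(packed: int) -> tuple:
    """Unpack (major, minor, patch), mirroring VK_API_VERSION_MAJOR/MINOR/PATCH."""
    return ((packed >> 22) & 0x7F, (packed >> 12) & 0x3FF, packed & 0xFFF)


def meets_minimum(device: int, required: int) -> bool:
    """Assumption: gate on major.minor only and ignore the patch field."""
    return decode_api_version(device)[:2] >= decode_api_version(required)[:2]


PANVK_V7 = make_api_version(1, 0, 335)    # value reported in the S2 evidence
FFPLAY_MIN = make_api_version(1, 2, 0)    # minimum quoted in ffplay's error
GPU_NEXT_MIN = make_api_version(1, 3, 0)  # minimum stated for mpv's gpu-next

for name, minimum in (("ffplay", FFPLAY_MIN), ("mpv gpu-next", GPU_NEXT_MIN)):
    verdict = "accepted" if meets_minimum(PANVK_V7, minimum) else "rejected"
    print(f"{name}: device {decode_api_version(PANVK_V7)} {verdict}")
```

Both consumers reject the device under this check even though the extension list is complete, which is why the version number, not the extension set, is the proximate wall.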
^ # ^ Client ^ Decode path attempted ^ What broke ^ Evidence ^
| S1 | mpv 0.41.0 + gpu-next | libavcodec → v4l2request hwaccel → drm_prime → libplacebo GL backend | ''drmprime-overlay'' loader's ''init()'' calls ''ra_get_native_resource("drm_params_v2")'' which returns NULL under Wayland; vd_lavc bails to SW; 134 % CPU, 70 % drops | Findings 4 & 5; ''phase3/baseline_2026-04-30_mpv_verbose.log''; ''mpv_v0.41.0_video_out_hwdec_hwdec_drmprime_overlay.c:290'' |
| S2 | ffplay (FFmpeg n8.1) | libavcodec → v4l2request hwaccel → drm_prime → libplacebo Vulkan renderer | (1) without ''PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1'': panvk gates Bifrost-v7 default-off, no Vulkan device → "Enable vulkan renderer to support hwaccel v4l2request" + ''VK_ERROR_INITIALIZATION_FAILED''. (2) with the flag set: panvk enumerates Mali-G52 with all dmabuf-import extensions present, but advertises ''apiVersion = 1.0.335''; ffplay rejects with //"Device API version 1.0.335 is lower than the minimum required version of 1.2.0, cannot proceed!"// Both cases fall to SW. | ''phase3/research_2026-04-30_ffplay_with_broken_flag.txt''; ''phase3/research_2026-04-30_vulkaninfo_panvk_v7_with_broken_flag.txt''; ''phase3/research_2026-04-30_panvk_brokenness.md'' |
| S3 | VLC 3.0.22 | bundled libavcodec 58.134 (ffmpeg 4.4) → vdpau / vaapi_vld / yuv420p only | bundled libavcodec predates ''v4l2request'' hwaccel landing; no path even attempted; 235 % CPU, slow-motion playback (110 s wall for 60 s media), late-picture drops | ''phase3/cross_player_2026-04-30_vlc_vout_and_gst_idle.txt''; ''phase3/cross_player_2026-04-30_vlc_qt_and_gst_drops_trajectory.txt''; VLC's PKGBUILD ''--disable-libplacebo'' line |
| S4 | Chromium / Brave (browser HW decode) | Chromium VaapiVideoDecoder → libva → libva-v4l2-request → V4L2 ioctls | libva-v4l2-request hardcodes single-plane (sunxi-cedrus) buffer setup; RK3568 hantro is multiplanar; ''vaCreateContext'' fails after format enumeration succeeds; falls to libavcodec SW | fourier README L236-281 (prior investigation) |
| S5 | gst v4l2slh264dec → waylandsink (the "working path" reference) | GStreamer v4l2codecs → linux-dmabuf-v1 protocol direct | regression vs fourier 2026-04-24's 0/62 drops; today reports ~0.3 drops/sec on the same pipeline. Stack drift in 6 days. | Finding 6; ''phase3/cross_player_2026-04-30_vlc_qt_and_gst_drops_trajectory.txt'' |

S5 is included not as a failure of the "working path" but as evidence that even that path is fragile under the marfrit-packages custom-stack drift Markus already maintains (mesa, ffmpeg, alsa, libdrm-pinebookpro). The gap analysis below does not attempt to explain S5; it is recorded here as a known follow-up.

===== 5. Trace from each symptom to the gap =====

  * **S1 (mpv).** mpv assumed "if libavcodec produces drm_prime frames, the VO can ingest them via the drmprime hwdec interop, whose loader can get a DRM fd from the native display." On Wayland, the native display is the compositor; the compositor does not give clients DRM master. Without DRM-master, no drm_params_v2, no drmprime-overlay, no completed hwdec group → vd_lavc bails. The integration assumption //"the VO can reach the KMS layer"// breaks under Wayland.
  * **S2 (ffplay).** libavcodec n8.x's v4l2request hwaccel was wired to require libplacebo's Vulkan renderer. The integration assumption isn't //"the consumer can use Vulkan-the-codec-engine"// (''VK_KHR_video_decode_*'', irrelevant because hantro decodes); it is //"the consumer can use Vulkan ≥ 1.2 as the presentation backend"// (import dmabuf, sample, swapchain present). panvk on Bifrost v7 has every required extension but is stuck at Vulkan 1.0 and default-off-gated. Mesa upstream lifted Valhall v10 to ≥ 1.2; v6/v7 promotion has not been done. Verified 2026-04-30 by setting ''PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1'' and running ffplay: device enumerates, dmabuf-import extensions present, ffplay rejects on //"API version 1.0.335 is lower than the minimum required version of 1.2.0"//.
  * **S3 (VLC).** VLC's Arch package was built ''--disable-libplacebo'' against a bundled ffmpeg 4.4. The integration assumption //"distros will ship a libavcodec new enough to have v4l2request"// breaks where downstream package decisions favour stability or legacy compatibility over hwaccel currency. (Newer VLC 4.x changes this picture; on this stack, the legacy dependency wins.)
  * **S4 (Chromium).** libva-v4l2-request was written for sunxi-cedrus and never completed a multiplanar port for the kernel-mainline V4L2 stateless decoders that ship on Rockchip / NXP / RK35xx hardware. The integration assumption //"libva-v4l2-request will eventually have a multiplanar implementation"// has been blocked on maintainership for years (per fourier README L267-274).

The four assumptions are independent at the code level; they are unified at the //integration-story// level.
None of the upstream projects involved (libavcodec, libplacebo, mpv, libva-v4l2-request, VLC) carries primary responsibility for the integration as a whole. That is the gap.

===== 6. Fix surface candidates =====

Each row below describes a //direction in which a fix could live//. None is proposed by this campaign: Markus's "no upstreaming unless specifically tasked" policy applies, and even within that policy, this Phase 4 documents rather than picks. Tractability assessments are rough.

^ Direction ^ What it would lift ^ Tractability ^ Where the work lives ^
| **A. Complete libva-v4l2-request multiplanar port** | S4 (browsers via libva); S3 partial (VLC if it migrates to libva-vaapi for HW decode); S2 partial (ffplay via vaapi backend) | Hard. fourier started this with local patches; the upstream is "effectively unmaintained" (fourier L267-274). A multiplanar rewrite of context.c / picture.c / v4l2.c is months of work. | bootlin / Collabora / community fork of ''libva-v4l2-request'' |
| **B. Add a non-Vulkan, non-DRM-master path in libavcodec drm_prime hwaccel** | S1 (mpv); S2 (ffplay); plus future libavcodec consumers on this hardware class | Medium. The path would be: drm_prime → linux-dmabuf-v1 protocol export → compositor consumes via dmabuf-direct, like GStreamer's waylandsink does today. Requires libavcodec to learn Wayland-protocol negotiation (or to delegate it cleanly to consumers). | FFmpeg upstream; libplacebo's GL backend; mpv's drmprime hwdec |
| **C1. Promote panvk on Bifrost v7 from Vulkan 1.0 to Vulkan ≥ 1.2 (upstream Mesa)** | S1 (mpv via gpu-next-vulkan) and S2 (ffplay's v4l2request hwaccel). Markus 2026-04-30 corrected an earlier framing here: the campaign needs Vulkan as the //presentation backend// for an externally-decoded NV12 dmabuf (NVIDIA's NVDEC→Vulkan model), not Vulkan as a codec engine. With ''PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1'', panvk on G52 advertises **all** the dmabuf-import-and-present extensions needed (''VK_EXT_external_memory_dma_buf'', ''VK_KHR_external_memory_fd'', ''VK_EXT_image_drm_format_modifier'', ''VK_KHR_sampler_ycbcr_conversion'', ''VK_KHR_swapchain'', ''VK_KHR_external_semaphore_fd'', ''VK_KHR_wayland_surface''). The proximate wall is ''apiVersion = 1.0.335''; ffplay rejects on ''1.0 < 1.2'', libplacebo gpu-next on ''1.0 < 1.3''. | Tractable but maintainership-bound. Mesa already promoted Valhall v10 to ≥ 1.2 (Mesa 25.0); the same work on Bifrost v7 has not been done because active panvk effort focuses on Valhall and 5th-gen. Bringing panvk/v7 to Vulkan 1.1 then 1.2 means implementing or stubbing each mandatory feature added in those versions. Whether modern panvk/v7 actually crashes on use-case-relevant test paths (image-import, swapchain, sampler ycbcr, present) is uncharacterised (see the note in §3). | Mesa upstream ''src/panfrost/vulkan/''. |
| **C2. Out-of-tree Vulkan layer (''panvk-1.2-fakeshim'') that lies about apiVersion** | Same as C1 (S1, S2). Same scope on the in-scope use cases (browser does not use this libplacebo path). | Lower upfront cost than C1. A Vulkan layer (not ICD wrapper) sits between application and the stock Mesa panvk ICD; explicitly opt-in via ''VK_INSTANCE_LAYERS='' env var, no system-wide effect on Chromium / general-purpose Vulkan apps. Three categories of work: (i) intercept ''vkGetPhysicalDeviceProperties2'' / ''vkEnumerateInstanceVersion'' to advertise apiVersion ''1.2.0''; (ii) pass-through redirects from core-1.2 entry points to KHR-extension equivalents already in panvk (''VK_KHR_sampler_ycbcr_conversion'', ''VK_KHR_descriptor_update_template'', ''VK_KHR_maintenance1/2'', ''VK_KHR_dedicated_allocation''); (iii) stubs or software-emulated implementations for genuinely-mandatory-1.2 entry points panvk doesn't have (timeline semaphores via binary-sem + condvar; render-passes-2.0 lowered to V1; ''vkResetQueryPool''; assert-abort on rare paths). Software emulation costs are acceptable per the §1 use-case scope: not gaming, not Proton/DXVK. The layer's correctness only has to hold for the dmabuf-import-and-present hot path. | New territory. No existing art for a "lie about apiVersion + stub mandatory entry points" Vulkan layer that Claude could find; closest are property-only simulators (''VK_LAYER_KHRONOS_profiles''). Patch surface lives outside Mesa, in a small standalone repo. Probably a few KLOC of C for the layer, with an ongoing tail as Vulkan minors land new mandatory entry points. |
| **D. Compositor-level DRM-shim for Wayland clients** | S1 (mpv specifically: drmprime-overlay would get its drm_params_v2) | Medium-low. Would need a Wayland protocol extension that grants clients enough KMS view to satisfy drmprime-overlay without granting full DRM master. KWin or wlroots would have to participate, and the protocol is not on either roadmap. Brittle. | Wayland-protocols + KWin / wlroots upstream |

**No row above lifts every listed symptom.** A fix that lifts S3 specifically requires a downstream packaging change at the distro level (rebuild VLC against current ffmpeg with libplacebo enabled, or wait for VLC 4.x to land in stable Arch), which is not something any of the upstream projects in rows A to D would deliver.
This is part of the gap's shape: the symptom set is //not// uniformly fixable from any single location, because the integration that's missing was always going to require coordination across libavcodec, libva, the libplacebo chain, the compositor, and downstream packagers.

==== Ranking against the §1 in-scope use cases ====

The five rows lift different symptoms; "most symptoms" is not the right ranking metric for the in-scope use cases. Brave / Chromium video decode goes through ''Chromium VaapiVideoDecoder → libva → libva-v4l2-request'', **not** through libavcodec hwaccel + libplacebo + Vulkan/GL. So the libplacebo-chain fixes (B, C1, C2) lift mpv and ffplay, //which Markus does not use//, and do not touch Brave's video decode pipeline.

Use-case-ranked:

  - **Row A (libva-v4l2-request multiplanar port)** is the only row that lifts S4 (browser HW video decode). YouTube in Brave is in S4. Without A, browser HW decode does not engage; Brave / VS Code / Chromium fall to libavcodec SW decode, which leaves the buffer-to-display predicament unresolved for the highest-traffic workload on this device class.
  - **Row C2 (Vulkan layer, in its own standalone repo)** is a smaller, self-contained engineering effort that lifts S1+S2. Worth a feasibility test (build the layer, see whether ffplay completes a 60 s playback) because the cost is bounded and the result informs whether C-class fixes are tractable in general. **Does not address the in-scope use cases directly**, since browsers don't traverse this chain, but it is useful as a vehicle for characterising what panvk-v7 actually does and doesn't crash on, which is currently unknown.
  - **Row B (libavcodec drm_prime → linux-dmabuf-v1 path)** is architecturally cleanest and would generalise across consumers, but the work lives in FFmpeg upstream and would not be in-scope-impactful unless Brave eventually changed its decode pipeline to consume libavcodec hwaccels (it currently does not).
  - **Row C1 (Mesa upstream panvk/v7 promotion)** is the same scope as C2 but at higher cost and with longer wall-clock until it lands in stock packages. Lower priority than C2 for the same symptom set.
  - **Row D (compositor DRM-shim)** is brittle, narrow, and lifts only S1.

**The campaign as documented does not pick.** This ranking informs which row(s) would be worth the next engineering investment //if// a fix is to be enacted; that decision is Markus's, not this document's.

===== 7. What this campaign deliberately does NOT do =====

  * **Does not pick or propose a patch.** A and B in §6 are both reasonable; neither is enacted here.
  * **Does not recommend a player.** "Use gst-play-1.0" would be a workaround, not a fix to the gap. (gst-play-1.0 also exhibits the S5 regression on its own; the working path is fragile.)
  * **Does not patch any single player as ''*-ohm-gl-fix''.** Phase 1 and the earlier Phase 4 draft originally suggested ''mpv-ohm-gl-fix'' / ''libplacebo-ohm-gl-fix'' packages on marfrit-packages; both were retracted. A per-player fix lifts one symptom and leaves the gap.
  * **Does not investigate the S5 regression.** Stack drift between fourier 2026-04-24 (0/62) and ohm_gl_fix 2026-04-30 (~0.3 drops/sec) is a separate concern. The likely candidates lie within marfrit-packages' custom mesa / ffmpeg / alsa / libdrm-pinebookpro builds, per Markus 2026-04-30. A separate iteration would bisect via ''pacman.log''.

===== 8. Phase 1 metric — refined =====

  * **Original** (locked 2026-04-30 morning): "on ''bbb_1080p30_h264.mp4'' with ''mpv --hwdec=v4l2request --vo=gpu-next'' over a 60 s window, ''drops_post_warmup == 0''."
  * **Refined** (locked 2026-04-30 evening, after the perf invalidation and Markus reframe): "the structural gap is named; every Phase 6 symptom (mpv, ffplay, VLC, browser HW decode, the gst regression) is traced to the gap with file:line evidence; a fix-surface assessment names what work would actually close it; the campaign ships documentation, not a patch."

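To make the original Phase 1 criterion concrete, here is a minimal sketch of how ''drops_post_warmup'' could be derived from periodic samples of a player's cumulative dropped-frame counter. The sample format, the 5 s warm-up, the function name, and the two example runs are all assumptions for illustration; this page does not show how the campaign actually extracted the value.

```python
from typing import Iterable, Tuple

# (seconds since playback start, cumulative dropped frames) -- hypothetical format
Sample = Tuple[float, int]


def drops_post_warmup(samples: Iterable[Sample], warmup_s: float = 5.0) -> int:
    """Frames dropped after the warm-up, computed from a cumulative counter."""
    ordered = sorted(samples)
    post = [drops for t, drops in ordered if t >= warmup_s]
    if not post:
        raise ValueError("no samples after the warm-up window")
    # Baseline is the last counter value seen before the warm-up ended (0 if none).
    pre = [drops for t, drops in ordered if t < warmup_s]
    baseline = pre[-1] if pre else 0
    return post[-1] - baseline


# Hypothetical run: 3 drops during start-up, none afterwards.
clean = [(0.0, 0), (2.0, 3), (5.0, 3), (30.0, 3), (60.0, 3)]
# Hypothetical run: drops keep accumulating after start-up.
lossy = [(0.0, 0), (2.0, 3), (5.0, 4), (30.0, 11), (60.0, 20)]

print(drops_post_warmup(clean))  # 0: would meet the original target
print(drops_post_warmup(lossy))  # 17: would fail it
```

Subtracting the pre-warm-up baseline keeps start-up drops from counting against the target, which is the point of the "post warm-up" qualifier in the metric name.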
The ''phase1_baseline'' row in ''metrics.csv'' remains valid as the symptom that opened the campaign. ''phase1_goal_target'' is left for historical context but no longer drives the campaign's success criterion. The success criterion is now qualitative (the gap identification) and is verified at Phase 7 by a second pair of eyes reviewing this document.

===== 9. References =====

  * ''metrics.csv'': original quantitative anchor (Phase 1 / Phase 3 baselines).
  * ''phase2.md'': substrate (versions, V4L2 buffer-pool, KWin and panfrost capability surveys).
  * ''phase3/findings.md'': Findings 1–6, the perf-grounded symptom inventory.
  * ''phase3/INDEX.md'': durable evidence file index per finding.
  * ''phase3/source_archaeology/'': upstream source files at exact campaign-relevant tags, for independent verification of file:line citations.
  * fourier ''README.md'' L236-281: prior investigation of S4 (browser HW decode via libva-v4l2-request).
  * DokuWiki: ''ohm_gl_fix:phase1_2026-04-30'', ''ohm_gl_fix:phase2_2026-04-30''.
  * Dev process: ''~/.claude/projects/-home-mfritsche-src/memory/feedback_dev_process.md''.