ohm_gl_fix — Phase 4, 2026-05-01

This page replaces both prior Phase 4 drafts: the original libplacebo fd-cache plan (retracted after perf record showed libplacebo at 0.41 % of CPU and the patched code path not on the hot path) and its in-place revision into a “documentation of the gap” page. Phase 4 is now a plan, not an enumeration. It picks one fix surface, names the implementation, states what gets measured at Phase 7, and identifies the loopback edges.

The driver of this rewrite: Phase 1 was refined on 2026-05-01 with machine-readable criteria (Phase 1 revised §4 — C1 drops, C2 LLC-load-misses, C3 DRM_IOCTL/sec, C4 boundary fd-passing) and Phase 3 was rebuilt on the same day with empirically-grounded boundary characterisation (Phase 3 revised §3, §4). With both anchors in place, Phase 4 can commit.
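The criteria named above can be expressed as a small pass/fail check. The following is a minimal sketch, not campaign tooling: the thresholds are the ones listed in §3 Step 3 of this page, the loopback mapping follows §8, and the `Phase7Sample` field names are hypothetical (the metrics.csv column layout is not specified here).

```python
from dataclasses import dataclass


@dataclass
class Phase7Sample:
    # Hypothetical field names; not the real metrics.csv columns.
    drops_total_60s: int
    drops_post_warmup: int
    llc_load_misses_per_10s: float
    drm_ioctl_per_sec: float
    saw_expbuf_and_scm_rights: bool   # C4 variant (a)
    saw_prime_fd_to_handle: bool      # C4 variant (b)


def check_criteria(s: Phase7Sample) -> dict:
    """Evaluate C1-C4 with the thresholds quoted in Step 3."""
    return {
        "C1": s.drops_total_60s <= 10 and s.drops_post_warmup == 0,
        "C2": s.llc_load_misses_per_10s <= 9_000_000,
        "C3": s.drm_ioctl_per_sec <= 100,
        "C4": s.saw_expbuf_and_scm_rights or s.saw_prime_fd_to_handle,
    }


def loopback_edge(result: dict):
    """Map a result onto the three loopback edges of section 8, else None."""
    if not result["C1"]:
        return "C1 failed: re-enter Phase 4 with new perf evidence"
    if result["C2"] and not result["C3"]:
        return "Level-1 fixed, Level-2 missing: re-enter Phase 4 with Step 2"
    if not result["C2"] and result["C3"]:
        return "flag and investigate measurement classification"
    return None  # all met, or a combination section 8 does not name
```

With the S5 figure of 1 046 DRM_IOCTL/sec from §5 and otherwise passing values, `check_criteria` reports C3 as failed and `loopback_edge` returns the Level-2 edge.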

2026-05-01 amendment (post-Phase 5 review): Q1 (Brave's V4L2VideoDecoder reachability) closed by the strings /opt/brave-bin/brave deep-dive. UseChromeOSDirectVideoDecoder / V4L2FlatStatelessVideoDecoder / V4L2StatelessVideoDecoder / V4L2H264Decoder all return 0 matches in this Brave build (Arch Linux ARM brave-bin, 2026-04-30). The single V4L2VideoDecoder string match is vestigial; all actual V4L2 source-line strings in the binary are camera-capture (v4l2_capture_delegate.cc, libtegrav4l2.so), not video-decode. The V4L2 direct-decode path is not compiled in for this build, so fix surface A (libva-v4l2-request multiplanar) stands. Q2 (Step 0 methodology fix) and Q3 (Step 0.5 kernel UAPI surface audit + R1 trigger revision) are folded into §3 and §6 below. Q4 (test corpus extension) lives in phase4_step1_test_corpus_2026-05-01.
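The zero-match result can be re-checked without the strings utility. This is a minimal sketch, assuming the binary fits in memory. The function names are illustrative, and the code counts raw byte-substring occurrences, which is close to but not identical to piping strings through grep -c.

```python
from pathlib import Path

# Identifiers reported above as returning 0 matches in this Brave build.
DECODER_NEEDLES = (
    "UseChromeOSDirectVideoDecoder",
    "V4L2FlatStatelessVideoDecoder",
    "V4L2StatelessVideoDecoder",
    "V4L2H264Decoder",
)


def count_needles(blob: bytes, needles=DECODER_NEEDLES) -> dict:
    """Return {needle: number of non-overlapping occurrences in blob}."""
    return {n: blob.count(n.encode("ascii")) for n in needles}


def count_needles_in_file(path, needles=DECODER_NEEDLES) -> dict:
    """Convenience wrapper that reads the whole file, e.g. the Brave binary."""
    return count_needles(Path(path).read_bytes(), needles)
```

Calling `count_needles_in_file("/opt/brave-bin/brave")` on ohm should give all-zero counts if the finding above still holds for the installed build.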
2026-05-01 Step 0 finding (Phase 6 dipping): Step 2 is confirmed needed, not conditional. Chromium M138's overlay-delegation gate at ui/ozone/platform/wayland/host/wayland_connection.cc ShouldUseOverlayDelegation() lines 495-509 includes the predicate !fractional_scale_manager_v1() — KWin advertises wp_fractional_scale_manager_v1 (verified empirically in mpv's verbose log), so the predicate returns false unconditionally on KWin Wayland regardless of feature flag. Step 2 patch site is now named with file:line. Step 0 details: phase6/step0_chromium_wayland_routing_2026-05-01 (companion: phase6/step0_5_uapi_audit_2026-05-01).
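The effect of that predicate can be reproduced as a plain boolean model. The sketch below mirrors the function as quoted in §3 Step 2. The Python names are illustrative, and `relaxed_gate` is only one possible reading of the surface-state-gated patch shape proposed there, not the actual patch.

```python
def should_use_overlay_delegation(feature_enabled: bool,
                                  has_fractional_scale_manager: bool,
                                  has_single_pixel_buffer: bool) -> bool:
    """Boolean model of the current gate in wayland_connection.cc."""
    return (feature_enabled
            and not has_fractional_scale_manager
            and has_single_pixel_buffer)


def relaxed_gate(feature_enabled: bool,
                 surface_scale: float,
                 has_single_pixel_buffer: bool) -> bool:
    """Hypothetical relaxation: only an actively fractional scale blocks."""
    scale_is_integer = float(surface_scale).is_integer()
    return feature_enabled and scale_is_integer and has_single_pixel_buffer


# KWin advertises the fractional-scale protocol, so the current gate is
# false whether or not the feature flag is set.
for flag in (True, False):
    assert not should_use_overlay_delegation(flag, True, True)
```

Under the relaxed model, a surface at scale 1.0 or 2.0 passes the gate while a surface at 1.25 is still rejected, which is the correctness property the recommended patch shape is meant to keep.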

1. What this Phase 4 is targeting

Phase 1 revised §2 named the in-scope workloads:

  • YouTube / HTML5 <video> in Brave
  • Web browsing in Brave (compositor-side video + animation)
  • VS Code (Electron + Chromium under the hood)

All three traverse the Chromium video pipeline:

VaapiVideoDecoder → libva → libva-v4l2-request → V4L2 stateless

This is not the libavcodec hwaccel chain that mpv, ffplay, and VLC use. Browsers vendor their own ffmpeg fork and gate hardware video decode through libva. Therefore: the fix surfaces from the prior Phase 4 enumeration that touch libavcodec (B “libavcodec drm_prime → linux-dmabuf-v1”) or libplacebo (C2 “panvk-1.2-fakeshim”) do not lift the in-scope use cases, however structurally clean they look in isolation. The empirical entrypoint for Brave is libva, and libva on this hardware fails at vaInitialize (Phase 3 revised §1, §8; also fourier README L236-281).

Phase 4 commits to fix surface A: libva-v4l2-request multiplanar port as the primary direction, with an explicit pre-implementation research step (Step 0) that may discover the campaign needs a follow-up Chromium-side patch.

2. Decision rationale

Three reasons to commit to A specifically:

  1. It is the only fix surface that touches Brave's actual chain. B (libavcodec) and C2 (libplacebo Vulkan layer) target consumers Markus does not use. D (compositor DRM-shim) is a Wayland-protocol proposal that does not exist upstream and would not survive a Phase 5 review.
  2. Substantial groundwork exists. fourier's local libva-v4l2-request patches (on ohm at ~/fourier-test/libva-patches/fourier-local.patch) already get the bootlin source past format enumeration on the multiplanar hantro device (fourier README L240-256). The starting point is not “from zero” — it is “from probe-passing, multiplanar buffer setup still single-plane”.
  3. It addresses the structural gap, not the symptom. Phase 1 revised's criteria all hold globally for libva consumers once A is delivered, not just for one application. fourier already flagged this as the right axis (“browser HW video decode on ohm is parked until a multiplanar libva-v4l2-request rework exists, either ours or someone else's”, fourier README L276-281).
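For orientation on the buffer geometry this groundwork starts from: the Phase 2 figure quoted in §3 (NV12 single-plane 1920×1088, sizeimage = 3 655 712) is larger than a bare NV12 frame. The sketch below shows one decomposition that reproduces it exactly. The extra term, 64 bytes per 16×16 macroblock plus 32 bytes of motion-vector scratch appended by the hantro driver, is an assumption that should be confirmed against the kernel source during Step 0.5. The function names are illustrative.

```python
def macroblocks(width: int, height: int) -> int:
    """Number of 16x16 macroblocks covering the frame."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def nv12_bytes(width: int, height: int) -> int:
    """Luma plane plus half-size interleaved chroma plane (8-bit 4:2:0)."""
    return width * height * 3 // 2


def assumed_mv_bytes(width: int, height: int) -> int:
    """ASSUMPTION: 64 bytes per macroblock plus a fixed 32 bytes."""
    return 64 * macroblocks(width, height) + 32


def assumed_sizeimage(width: int, height: int) -> int:
    return nv12_bytes(width, height) + assumed_mv_bytes(width, height)


# 1920x1088: 3 133 440 bytes of NV12 + 522 272 assumed extra bytes.
assert assumed_sizeimage(1920, 1088) == 3_655_712
```

If the assumption holds, a multiplanar buffer setup has to carry the same per-buffer size rather than the bare NV12 size, otherwise the driver would reject or overrun the capture buffer.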

Note explicitly: A alone may not suffice. Once the libva chain produces an NV12 dmabuf for Brave's VaapiVideoDecoder, the display side (Chromium's GPU-process compositor) still has to present that dmabuf without per-frame Mesa GL+DRM round-trips (Phase 1 revised's C3, ≤100 DRM_IOCTL/sec). Whether Chromium does this on Wayland today, or needs an additional patch, is the open question Step 0 below answers before code is written.
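The C3 rate can be derived from an strace capture with a few lines of code. This is a minimal sketch under stated assumptions: the input is strace text output (with or without -f pid prefixes and -y fd annotations), DRM calls appear with a symbolic name starting with DRM_IO, and the capture window length is known. Calls that strace prints in raw _IOC(...) form would be missed. The function names are illustrative.

```python
import re

# Matches the entry line of a DRM ioctl, e.g. "ioctl(7, DRM_IOCTL_...".
# "<... ioctl resumed>" continuation lines deliberately do not match.
_DRM_CALL = re.compile(r"\bioctl\(\d+(?:<[^>]*>)?, DRM_IO")


def drm_ioctl_rate(strace_lines, window_seconds: float) -> float:
    """Return DRM ioctl calls per second over the capture window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    count = sum(1 for line in strace_lines if _DRM_CALL.search(line))
    return count / window_seconds


def meets_c3(strace_lines, window_seconds: float, limit: float = 100.0) -> bool:
    """C3 as stated above: at most 100 DRM_IOCTL/sec."""
    return drm_ioctl_rate(strace_lines, window_seconds) <= limit
```

For a 60 s steady-state capture, pass the open file object as `strace_lines` and 60 as `window_seconds`; renderer and GPU-process captures would be evaluated separately.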

3. Implementation plan

Step 0 — Research: characterise Chromium's Wayland video presentation path

Duration: 3–7 days. Output: decision document attached to this Phase 4 plan, naming whether Step 2 is required.

Question to answer: when VaapiVideoDecoder produces a NativePixmap (= dmabuf-backed VA-API surface) on chrome --ozone-platform=wayland, does Chromium's GPU process present it via a zwp_linux_dmabuf_v1 subsurface (Wayland direct overlay) or via Skia GL composite onto the page's main surface?

Concrete sub-tasks:

  1. Source archaeology in Chromium (current Brave-bin's underlying Chromium version, likely M138-class):
    • ui/ozone/platform/wayland/host/wayland_buffer_manager_host.cc and surrounding files — Wayland buffer attachment.
    • components/viz/service/display_embedder/ — overlay candidate surface processing.
    • media/gpu/vaapi/ — VA-API surface to native-pixmap conversion.
    • gpu/ipc/service/gpu_video_decode_accelerator_helpers.cc — dmabuf flow from decoder to compositor.
  2. Static source trace (replaces the SW-decode synthesis test that was here in the pre-Phase-5 draft — Phase 5 reviewer flagged it as broken-by-design: SW-decode produces shmem buffers not NativePixmap dmabufs, so the test cannot validate whether a hardware-decode NativePixmap would be routed via zwp_linux_dmabuf_v1). Trace the path VaapiPicture / VaapiPictureNativePixmapOzone → NativePixmap → GpuMemoryBuffer → SharedImageBacking → wayland_buffer_manager_host in Chromium M138-class. Determine statically whether the subsurface path is gated on GpuMemoryBufferType == NATIVE_PIXMAP or on some other condition. Cite source file:line in the decision document.
  3. Stub libva driver test (optional, only if static analysis is inconclusive). Build a stub libva backend that returns a valid NativePixmap backed by a linear dma-heap allocation (no hantro needed). Run Brave with LIBVA_DRIVER_NAME pointing at the stub. Observe whether the GPU process emits PRIME_FD_TO_HANDLE or SCM_RIGHTS on the Wayland socket. This isolates the compositor routing question from the decode question.
  4. Feature flag inventory: check chrome://flags and --enable-features= for relevant entries: VaapiVideoDecoder, VaapiVideoDecodeLinuxGL, UseChromeOSDirectVideoDecoder, UseDelegatedCompositing, DelegatedCompositingLimitToUi, AcceleratedVideoDecodeLinuxGL, wayland-screen-coordinates, ozone-overlay-priority-hint.

Output gate: decision document records whether Chromium's GPU process under default flags will route a working VA-API dmabuf to zwp_linux_dmabuf_v1 (Step 2 not needed) or composite via Skia GL (Step 2 needed), with the source file:line that creates the Wayland buffer for a VA-API NativePixmap explicitly cited (per Phase 5 review Q2 output gate). The decision document attaches to this Phase 4 page after Step 0 completes.

Step 0.5 — Kernel UAPI surface audit

Duration: 1–2 days. Output: documented control-structure layout that the hantro driver actually consumes.

Inserted post-Phase-5-review per Q3: the V4L2 stateless request-API control payload format on hantro G1/G2 (RK3566) is poorly documented in UAPI headers alone, and a control-payload mismatch produces silent black-frame failures rather than EINVAL. fourier's local libva-v4l2-request patches were validated against the GStreamer codepath's buffer-management model, not libva's allocation model, so they don't pre-empt the question.

Concrete sub-tasks:

  1. strace -f -e trace=ioctl -e signal=none -o /tmp/gst_h264.strace gst-launch-1.0 -q filesrc location=bbb_1080p30_h264.mp4 \! qtdemux \! h264parse \! v4l2slh264dec \! fakesink. If strace truncates the embedded payload-data field, fall back to ftrace tracepoints on vidioc_* for fuller capture.
  2. Extract the exact byte payload of VIDIOC_S_EXT_CTRLS calls for one I-frame and one P-frame.
  3. Compare byte-for-byte against the kernel header include/uapi/linux/v4l2-controls.h V4L2_CID_STATELESS_H264_* structs (specifically V4L2_CID_STATELESS_H264_DECODE_PARAMS, V4L2_CID_STATELESS_H264_SLICE_PARAMS, V4L2_CID_STATELESS_H264_PRED_WEIGHTS, V4L2_CID_STATELESS_H264_SCALING_MATRIX, V4L2_CID_STATELESS_H264_DECODE_MODE, V4L2_CID_STATELESS_H264_START_CODE).
  4. Document the actual hantro driver control-structure layout: field ordering, padding, reference-frame DPB array conventions, VIDIOC_STREAMON sequencing relative to request fd lifecycle.

Output gate: the documented control-structure layout serves as the per-byte template for Step 1 src/picture.c / src/h264.c work. If the layout diverges from a naive reading of the kernel header (highly likely on hantro), Step 1 starts with the actual layout, not the header layout.

Step 1 — libva-v4l2-request multiplanar port

Duration: 4–8 weeks of focused work. The lower end applies if fourier's local patches and Phase 2 §3 substrate (9-fd capture pool, NV12 single-plane 1920×1088 sizeimage = 3 655 712) generalise. The upper end applies if hantro's request-API control set turns out to need additional reverse-engineering against the kernel driver (drivers/staging/media/rkvdec/ / drivers/staging/media/hantro/).

Source basis:

  • Upstream fork: https://github.com/bootlin/libva-v4l2-request (last meaningful commit ~years ago per fourier; confirm at Step 1 start).
  • fourier local patches: ~/fourier-test/libva-patches/fourier-local.patch. HEVC stripped (RK3566 has no HEVC HW), missing #include "utils.h" in src/h264.c restored, src/config.c format-enumeration extended to try both V4L2_BUF_TYPE_VIDEO_OUTPUT and V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE (fourier README L240-256).

Concrete work surface, in order:

  1. Fork + import groundwork. Set up marfrit-packages/libva-v4l2-request-ohm-gl-fix/. Apply fourier's patches as the patch-zero baseline. pkgname=libva-v4l2-request-ohm-gl-fix, provides+conflicts+replaces=libva-v4l2-request.
     Build via fermi (Gitea Actions runner archlinuxarm aarch64).
  2. Multiplanar buffer setup in src/v4l2.c. Replace single-plane v4l2_buffer / v4l2_format usage with MPLANE variants (VIDIOC_S_FMT on V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE for bitstream input, V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE for NV12 output; VIDIOC_QBUF / VIDIOC_DQBUF with planes[] arrays). The Phase 2 §3 strace evidence (ffmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime producing 9 VIDIOC_EXPBUFs with NV12 single-plane sizeimage = 3 655 712) is the per-buffer template.
  3. Multiplanar context lifecycle in src/context.c. Replace vaCreateContext single-plane buffer-pool setup with multiplanar pool that mirrors the VIDIOC_REQBUFS+CREATE_BUFS, count=1-loop pattern Phase 2 captured. Capture ring depth = 9 (per Phase 2 §3). Output ring (bitstream input) depth = 4.
  4. Multiplanar slice submission in src/picture.c and src/h264.c. Adapt request-API frame submission: build V4L2_CTRL_*_HEADER control payloads (SPS, PPS, decode params, slice params, scaling matrix) attached to the request fd, VIDIOC_QBUF the bitstream input MPLANE buffer with the request fd, VIDIOC_DQBUF the capture MPLANE NV12 buffer after decode. The kernel UAPI is in include/uapi/linux/v4l2-controls.h V4L2_CID_STATELESS_H264_* (note: the older V4L2_CID_MPEG_VIDEO_HEVC_* was renamed; H264 was renamed to V4L2_CID_STATELESS_H264_* on the same wave).
  5. NativePixmap export. Ensure each capture-side dmabuf fd flows out of libva to the caller (Chromium's VaapiPicture) as a NativePixmap with the right DRM format (DRM_FORMAT_NV12) and modifier (DRM_FORMAT_MOD_LINEAR per Phase 3 Finding 1). Verify the modifier matches what Chromium will accept.
  6. Test corpus. Run against:
    • bbb_1080p30_h264.mp4 (the campaign's reference clip).
    • vainfo (libva self-test) on /dev/dri/renderD128 equivalent.
    • Any failure cases noted by fourier (README L319-340, “test corpus”; pull list at Step 1 start).
  7. Package + publish.
     PKGBUILD finalised, builds on fermi, pushes to marfrit-packages pacman repo.

Step 2 — Chromium display-side patch (confirmed needed by Step 0 finding 2026-05-01)

Status: Step 0 found that Chromium M138's overlay-delegation system is force-disabled on KWin Wayland by a single predicate. Step 2 is no longer conditional. Trigger met.

Patch site: chromium/ui/ozone/platform/wayland/host/wayland_connection.cc WaylandConnection::ShouldUseOverlayDelegation() lines 495-509:

<code c>
bool WaylandConnection::ShouldUseOverlayDelegation() const {
  bool should_use_overlay_delegation =
      IsWaylandOverlayDelegationEnabled() && !fractional_scale_manager_v1();
  should_use_overlay_delegation &= !!single_pixel_buffer();
  return should_use_overlay_delegation;
}
</code>

The !fractional_scale_manager_v1() conjunct is the load-bearing fail. KWin advertises wp_fractional_scale_manager_v1; the predicate is false; overlay delegation is force-disabled regardless of feature flag.

Patch shape (recommended, minimal blast radius): surface-state-gated relaxation. Replace !fractional_scale_manager_v1() with a check that returns true when the surface's currently-applied scale is integer (1.0, 2.0, etc.). The protocol is allowed to be advertised; we just require the relevant surface isn't *using* fractional scale right now. Preserves correctness when fractional scale IS in fact active for the surface.

Two alternative shapes considered and parked: drop the gate entirely and let Viz `OverlayCandidate` validators reject candidates needing viewport-subpixel destinations (bigger refactor, touches Viz code); add a feature flag bypass (crudest, relies on user to know the trade-off). See Step 0 doc §"Patch shape" for full reasoning.

Open Step 2 sub-task: characterise the Viz-side per-buffer filtering (`OverlayCandidate` validation in components/viz/service/display/overlay_processor*.cc) that becomes the next-level gate once stage-1 is lifted.
Not blocking Step 2 implementation; needed before Phase 7 can predict whether C3 is met by patch alone or also requires a Viz tweak.

Build target: chromium-ohm-gl-fix or brave-ohm-gl-fix on marfrit-packages. ABI-compatible patch (small change to one .cc); no soname change. Substantial build cost (Chromium full rebuild on aarch64 takes hours-to-days; consider building on a beefier ARM host or distcc).

Step 3 — Verification (Phase 7 prep)

After Step 1 (and conditionally Step 2) lands on ohm:

  1. Reinstall: sudo pacman -U libva-v4l2-request-ohm-gl-fix-*.pkg.tar.zst (and conditionally chromium-ohm-gl-fix-*).
  2. Re-run Phase 3 revised §3 v2 strace (ioctl,mmap,munmap,sendmsg,recvmsg) and §4 perf-stat (cache-misses,LLC-load-misses,cycles,instructions) on Brave + bbb_1080p30_h264.mp4 over a 60 s steady-state window. Capture renderer + GPU-process targets.
  3. Check Phase 1 revised C1-C4:
    • C1 drops ≤ 10 over 60 s, drops_post_warmup = 0
    • C2 LLC-load-misses ≤ 9 M / 10 s
    • C3 DRM_IOCTL/sec ≤ 100
    • C4 at least one of (a) VIDIOC_EXPBUF + SCM_RIGHTS OR (b) PRIME_FD_TO_HANDLE from V4L2 dmabuf observed
  4. Append result row(s) to metrics.csv as phase7_verify_*.

4. What's touched, what's not

Touched:

  • libva-v4l2-request: substantial multiplanar rewrite of src/v4l2.c, src/context.c, src/picture.c, src/h264.c. Public ABI preserved (libva-driver entrypoints unchanged); internal restructuring only.
  • marfrit-packages: new libva-v4l2-request-ohm-gl-fix/ tree. Conditionally: chromium-ohm-gl-fix/ (Step 2 only).
  • ohm system: pacman -U replaces stock libva-v4l2-request (and conditionally Chromium/Brave) with the campaign packages.

Not touched:

  • mpv, ffplay, VLC, gst-*: these remain on their current paths. Their users will not benefit from Phase 4. Out of campaign scope.
  • Mesa / panfrost / panvk / libplacebo: their state is unchanged. The panvk-1.2-fakeshim option from prior Phase 4 drafts is not pursued in this iteration.
  • libavcodec / ffmpeg: Chromium statically vendors its own; the system ffmpeg-v4l2-request-git package is unchanged.
  • Kernel drivers (hantro-vpu, panfrost). Step 1 builds against the existing UAPI surface; no kernel work.
  • KWin / Wayland protocol. Step 1 produces dmabuf fds; existing KWin zwp_linux_dmabuf_v1 implementation consumes them. No compositor work.
  • The S5 regression (Phase 3 revised §6 / §8: gst-launch waylandsink ~0.3 drops/sec on today's stack vs. fourier 2026-04-24's 0/62). Separate iteration if pursued.

5. Predicted outcome (against Phase 1 revised C1-C4)

If Step 0 + Step 1 deliver and Step 2 turns out unnecessary (optimistic case):

^ Criterion ^ Current (Brave SW path) ^ Predicted (Phase 4 delivered) ^ How verified ^
| C1 drops post-warmup ≤ 10 / 60 s | not measured (estimated 100s+ based on Brave's CPU footprint) | 0 drops post-warmup; total drops ≤ 5 (Vulkan-init blip equivalent) | Phase 7 strace/perf-stat re-run |
| C2 LLC-load-misses ≤ 9 M / 10 s | Brave GPU process has heavy memcpy traffic (Phase 3 revised §2: 12.92 % memcpy on GPU process) | ≤ 9 M / 10 s for GPU process (no per-frame dmabuf-to-shmem CPU copy) | perf-stat re-run |
| C3 DRM_IOCTL/sec ≤ 100 | not measured for Brave (S2/S3/S4 sit at 800–1 050; S5 at 1 046) | ≤ 100 if Chromium routes the dmabuf via zwp_linux_dmabuf_v1 overlay; otherwise Step 2 needed | strace v2 + boundary_counts.csv extension |
| C4 boundary fd-passing | NO (libva fails, no V4L2 path engaged) | YES: VIDIOC_EXPBUF from libva, then either SCM_RIGHTS to KWin or PRIME_FD_TO_HANDLE to GL (depending on Step 2 outcome) | strace v2 boundary inspection |

If Step 2 is required, the same outcome is reached via Step 1 + Step 2 in sequence, with Step 1's standalone result being C1+C2 met and C3+C4 partially met (Level 1 zero-copy at the decode boundary; Level 2 still not at the compositor boundary).

6. Risks and mitigations

  1. R1: Multiplanar port takes longer than 8 weeks.
     V4L2 stateless API + request-API + hantro-specific control set is intricate. Mitigation: scope to H.264 only initially. HEVC is moot (RK3566 hantro has no HEVC HW). VP8 / VP9 / AV1 follow only if H.264 lands cleanly. Slip trigger (revised post-Phase 5 review Q3): any sub-task in Step 1 produces silent black frames or no decoder output for >3 days. That is the observable early signal of a control-payload mismatch (the most likely failure mode), and it is materially earlier than calendar-slip detection. Calendar slip alone (>3 weeks) is insufficient as a trigger because silent corruption can disguise itself as a build/integration problem for a long time. Surface either trigger to Markus for re-scoping.
  2. R2: Chromium routes VA-API NativePixmap through Skia GL on Wayland by default. Realised, not just risked. Step 0 (2026-05-01) found the gating predicate at WaylandConnection::ShouldUseOverlayDelegation() lines 495-509 is force-false on KWin because KWin advertises wp_fractional_scale_manager_v1. Step 2 is now in scope unconditionally; see §3 Step 2 above for patch site + shape. Mitigation status: activated. If Step 2 itself looks >2 months (Chromium build cost dominates), reconsider whether to ship Step 1 alone with C1+C2 met and document C3 as still missing.
  3. R3: hantro's H.264 conformance is incomplete. Some streams (interlaced, certain profile/level combinations, Hi10P) may fail. Mitigation: cross-check against fourier's gst v4l2slh264dec working output on the same clip; that path uses the same kernel driver and is a known-good reference. Use the test corpus from fourier README L319-340 once enumerated.
  4. R4: KWin's zwp_linux_dmabuf_v1 modifier handling on the NV12 DRM_FORMAT_MOD_LINEAR that hantro produces. Phase 3 Finding 1 already showed all panvk modifiers carry external_only=1; that's a panvk-side property, but KWin's own modifier acceptance for NV12 is independent.
     Mitigation: cross-check by running gst-launch v4l2slh264dec → waylandsink on today's stack; that path produces the same modifier and is accepted by KWin (the S1 zero-copy reference). If S1 still works, KWin's acceptance is fine for the Step 1 output.
  5. R5: fourier's libva-v4l2-request local patches were against an older bootlin tree. May not apply cleanly to current upstream. Mitigation: start by rebasing fourier's patches on current upstream as the first sub-task of Step 1. If upstream has moved more than expected, fall back to fourier's snapshot.
  6. R6: Chromium's VAAPI gating (VaapiVideoDecoder, VaapiIgnoreDriverChecks). The driver-check path inspects the libva driver's reported profile set. fourier already saw vainfo enumerate H.264 profiles successfully with the probe patch; the multiplanar Step 1 should preserve that. Mitigation: after Step 1, re-run vainfo with the driver variables set in its environment (LIBVA_DRIVER_NAME=v4l2_request LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 vainfo) to confirm profile enumeration still passes. Then Brave's --enable-features=VaapiVideoDecoder,VaapiIgnoreDriverChecks invocation should engage.

7. Phase 5 hand-over

Per ~/.claude/projects/-home-mfritsche-src/memory/feedback_dev_process.md, Phase 5 is second-model review of all Phase 1-4 artefacts. Markus pastes the materials uncurated:

  • Phase 1 revised
  • Phase 2 (substrate)
  • Phase 3 revised
  • This Phase 4 page
  • Companion CSVs: metrics.csv, phase3/io_cache_2026-05-01/boundary_counts.csv, phase3/io_cache_2026-05-01/perfstat.csv

Specific questions for the second-model reviewer to challenge:

  1. Is fix surface A actually the right pick given Phase 1 revised's use-case priority? In particular: does the reviewer see a path Phase 4 missed where Brave's chain could be lifted without rewriting libva-v4l2-request multiplanar?
  2. Is Step 0's research scope sufficient to commit to or rule out Step 2 with confidence, or does Step 0 itself need a Phase 4-internal sub-plan?
  3. Risk R1 (slip) and R2 (Step 2 needed): is the mitigation realistic given a single-engineer-with-Claude-assistance capacity?
  4. Test corpus from fourier README L319-340: is it adequate for declaring Step 1 complete, or should we extend it?

8. Phase 6 (implementation) and Phase 7 (verification) order

Phase 6 = “execute Step 0 → Step 1 → conditionally Step 2”. Phase 7 = “Step 3” above. metrics.csv rows phase7_verify_brave_* will hold the binding numbers.

Phase 6 is long (weeks-to-months in elapsed wall time, not full-time). Sub-step boundaries inside Phase 6 are Phase-4-internal; no need to re-enter Phase 4 unless a step-level surprise demands re-planning (e.g. Step 0 turns up something that invalidates Step 1's direction).

The three loopback edges (Phase 1 revised §5):

  • C1 ✓ + C2 ✗ + C3 ✓ → flag, investigate. Surfaces a measurement classification issue.
  • C1 ✓ + C2 ✓ + C3 ✗ → Level-1 fixed, Level-2 missing. This is the expected post-Step-1 state if Step 0 said Step 2 is needed. Re-enter Phase 4 with Step 2 spec'd.
  • C1 ✗ at Phase 7 → drops still happen. Re-enter Phase 4 with new perf evidence.

9. Deferred / out of scope

  • Other libva consumers (mpv-via-vaapi, VLC-via-vaapi): same Step 1 lifts them indirectly. Verification is Brave-only; gains on other libva consumers are documented at Phase 7 but not required for closure.
  • libavcodec hwaccel consumers (mpv gpu-next, ffplay, VLC qt): fix surface B from prior Phase 4 enumeration. Separate campaign.
  • Vulkan-anchored consumers (libplacebo Vulkan backend on Mali-G52). Fix surface C2 (panvk-1.2-fakeshim). Separate campaign.
  • HEVC, VP8, VP9, AV1. RK3566 hantro has H.264 + MPEG2 + VP8 HW only. AV1 / VP9 / HEVC are SW even after Step 1. Out of scope for this campaign's verification.
  • The S5 zero-drop regression (Phase 3 revised §6 + §8). Side investigation if pursued.
  • Other Mali-Bifrost-v7 hardware (G31 / G51 / G76; same panvk arch, different SBC stacks).
    Out of scope; Phase 1's “Mali-G52” framing is hardware-specific.
  • General-purpose Vulkan workloads. Phase 1 revised §6 explicit out-of-scope. SW-emulated mandatory-1.2 entry points in any future panvk-fakeshim are tolerated.

10. References

  • Phase 1 revised: measurable success criteria.
  • Phase 2 (substrate): versions, V4L2 9-fd buffer pool, panvk gates, panfrost modifier surface.
  • Phase 3 revised: six-contender empirical bucket-attribution + boundary characterisation; the basis for §1's “Brave is libva, not libavcodec” pivot.
  • Original Phase 4: superseded by this page; preserved for audit trail.
  • fourier README L236-281: prior libva-v4l2-request investigation and partial multiplanar probe patches that form Step 1's starting point.
  • Bootlin libva-v4l2-request: https://github.com/bootlin/libva-v4l2-request
  • Local artefact: ~/fourier-test/libva-patches/fourier-local.patch (HEVC-stripped, missing-include fixed, format-enumeration extended for MPLANE).
  • marfrit-packages parallel: ffmpeg-v4l2-request-git/ is the template for the new libva-v4l2-request-ohm-gl-fix/ package layout.

----

Phase 4 ends here. Phase 6 (implementation) begins with Step 0, which produces a small attached decision document on this page. The first pacman -U on ohm marks Phase 6's first deliverable. Phase 7 is the metrics.csv phase7_verify_* row(s).
ohm_gl_fix/phase4_2026-05-01.txt · Last modified: by markus_fritsche