ohm_gl_fix — Phase 5 Second-Model Review, 2026-05-01

Per the dev-process Phase 5 step, the campaign artefacts (Phase 1 revised, Phase 2, Phase 3 revised, Phase 4 from-scratch + companion CSVs) were handed to a fresh Sonnet 4.6 instance with challenging the four Phase 4 §7 hand-over questions as its primary task — not summarising, not balanced verdict. Markus's instruction: “make challenging those four questions its primary task, not optional context.”

The reviewer was instructed to default to “the plan's answer is wrong” and find the strongest case, only conceding if honest effort failed to find a credible angle.

This page records the unredacted reviewer response. Per dev-process, the next step is Markus reviewing and redacting before incorporating findings into a Phase 4 amendment or a fresh Phase 4 iteration.

Q1 — Is fix surface A actually the right pick?

Steelman against

The plan commits to libva-v4l2-request multiplanar rewrite as the primary path because “Brave is libva.” But that conclusion rests entirely on one Phase 3 data point: “libva failed at vaInitialize” for S6 Brave, producing 0 VIDIOC_* in the strace boundary count. The plan interprets this as “Brave tries libva but libva is broken, so fix libva.” The stronger reading of the same evidence is: libva is broken on this device, so Brave has silently fallen back to its non-hardware path already. That path is not libva at all — it's whatever Chromium does when VaapiVideoDecoder fails its capability check. Chromium's fallback on Linux/Wayland for devices without working VAAPI is typically the V4L2VideoDecoder (the stateless V4L2 interface exposed via the //media/gpu/v4l2/ stack, distinct from the libva path). The feature flag UseChromeOSDirectVideoDecoder controls exactly this: on ChromeOS and some Linux builds it bypasses libva entirely and drives V4L2 stateless directly from the GPU process. If Brave's packaged Chromium (M138-class on Arch ARM) has this path compiled in and enabled, then the right fix surface is not libva-v4l2-request at all — it is either (a) verifying that Chromium's built-in V4L2VideoDecoder is capable of addressing /dev/video1 on RK3566, or (b) enabling the correct feature flags to activate it. This would completely bypass libva and make the 4-8 week multiplanar rewrite irrelevant to Brave's actual path.

What would have to be true

Brave's Chromium build (packaged in Arch Linux ARM's brave-bin) must have //media/gpu/v4l2/v4l2_video_decode_accelerator.cc or //media/gpu/v4l2/v4l2_stateless_video_decoder.cc compiled and linked, and the UseChromeOSDirectVideoDecoder or V4L2FlatStatelessVideoDecoder flag must be reachable via command-line feature flags. A secondary condition: the RK3566 V4L2 stateless interface must be enumerable as a /dev/videoN node with the correct capabilities from Chromium's GPU sandbox.

Cheapest test

strings /usr/bin/brave | grep -i “v4l2stateless\|V4L2VideoDecoder\|DirectVideoDecoder\|v4l2_video_decode” takes 10 seconds and immediately shows whether these code paths exist in the binary. Follow with brave –enable-features=UseChromeOSDirectVideoDecoder,V4L2FlatStatelessVideoDecoder –use-gl=egl –ozone-platform=wayland –enable-logging –v=1 2>&1 | grep -i “v4l2\|vaapi\|decoder” under a dummy video page to see which decoder the GPU process selects. If V4L2VideoDecoder initialises and queries /dev/video1 successfully, fix surface A is wrong.

Recommendation

Phase 4 Step 0 must include this check as its first sub-task, before any Chromium source archaeology. If the direct V4L2 path is present and activatable, Phase 4 should be revised to pivot to fix surface “A-prime”: characterise and fix whatever is preventing Chromium's built-in V4L2VideoDecoder from producing dmabufs that flow to zwp_linux_dmabuf_v1 on Wayland. The libva-v4l2-request multiplanar port would still be correct for the non-Brave libva consumers (if any are in scope), but the framing of fix surface A as “the path Brave actually uses” is not confirmed by evidence — it is inferred from a failure mode.

Q2 — Is Step 0's research scope sufficient?

Steelman against

The plan's empirical synthesis test is broken by design. It proposes to “coax Chromium into the dmabuf-overlay path using a different content source — e.g. WebGL canvas, or a video element with software decode.” The problem: software decode in Chromium produces decoded YUV frames in renderer-process shared memory (or in a GPU-process staging buffer), not in a V4L2 capture dmabuf. When this YUV reaches the compositor, it goes through a GL texture upload — not through zwp_linux_dmabuf_v1. So a positive result from this test (SW decode video element → Wayland subsurface) would not actually demonstrate that Chromium's compositor is capable of taking a libva-produced NativePixmap and routing it as a Wayland dmabuf. The two paths diverge at the point where the frame buffer type (dmabuf-backed NativePixmap vs. shared-memory-backed SHMBuffer) is determined, which is upstream of any compositor routing logic. A negative result is equally uninformative: if SW decode video doesn't get a Wayland subsurface, that tells you nothing about whether a working VA-API NativePixmap would. The synthesis test as written cannot distinguish between “Chromium supports dmabuf-subsurface routing but only for NativePixmaps, not for GL textures” and “Chromium never uses dmabuf subsurfaces on this platform.” If the test gives the wrong answer and Step 0 concludes “Step 2 not needed,” Step 1 ships, vainfo passes, Brave gets a working libva path, but video still goes through Skia GL and C3 fails at Phase 7 — with 8 weeks of Step 1 work already spent.

What would have to be true

Chromium's Wayland compositor routing decision must be made at or after the point where buffer type is known. If Chromium decides whether to use a Wayland overlay/subsurface based on the buffer descriptor type (dmabuf vs. SHMBuffer vs. GL texture), then no SW-decode or WebGL proxy can simulate a hardware-decode NativePixmap path. The relevant decision code is in ui/ozone/platform/wayland/host/wayland_buffer_manager_host.cc and gpu/command_buffer/service/shared_image/ — the exact question is whether the subsurface path is gated on GpuMemoryBufferType == NATIVE_PIXMAP or on some other condition.

Cheapest test

Source archaeology in the specified files is already in Step 0, but the synthesis test should be replaced. A better Step 0 empirical test: install a known-working libva backend (e.g. a stub libva driver that reports success and returns a valid NativePixmap backed by a linear dma-heap allocation, without needing hantro to work), run Brave with LIBVA_DRIVER_NAME pointing to the stub, and observe whether the GPU process calls PRIME_FD_TO_HANDLE or passes SCM_RIGHTS via the Wayland socket. This isolates the compositor routing question from the decode question. Alternatively: check if gst-play (S5), which does produce a dmabuf (22 PRIME_FD_TO_HANDLE observed), is using Chromium's compositor at all — it is not, so that comparison is also uninformative. The best zero-cost proxy is reading the Chromium source for the VaapiPicture::DownloadFromTexture / VaapiPictureNativePixmapOzone code path and following NativePixmap → GpuMemoryBuffer → SharedImageBacking → wayland_buffer_manager_host to determine statically whether a zwp_linux_dmabuf_v1 buffer is created.

Recommendation

Delete the synthesis test as written from Step 0. Replace it with: (1) static source trace of NativePixmap → Wayland buffer creation in Chromium M138-class, and (2) a stub libva driver test if static analysis is inconclusive. Explicitly add this to Step 0's output gate: the decision document must state which code path in Chromium creates the Wayland buffer for a VA-API NativePixmap, with a source file and line number cited. “Empirical synthesis with SW decode” should be removed from the plan to prevent a false confidence outcome.

Q3 — Risk register and single-engineer capacity

Steelman against

The plan's work decomposition (v4l2.c → context.c → picture.c → h264.c) reads like a refactoring list ordered by file, not by discovery cost. The actual cost structure for this kind of V4L2 stateless driver work is front-loaded by a large and poorly-documented discovery phase that the plan treats as implicit background. Specifically: the hantro kernel driver's request-API control payload format is not fully specified in the V4L2 UAPI headers alone. The V4L2_CID_STATELESS_H264_* controls have known quirks on Hantro (the G1/G2 decoder variants in RK3566) — in particular, the field ordering in V4L2_CID_STATELESS_H264_DECODE_PARAMS, the handling of reference frame lists in multi-slice frames, and the interaction between VIDIOC_STREAMON sequencing and request fd lifecycle. These are not documented; they are known only by reading the kernel driver source (drivers/media/platform/rockchip/vpu/rockchip_vpu_hw_h264_dec.c and the underlying hantro driver files) and cross-referencing against working GStreamer V4L2SLH264Dec source code. The plan assumes fourier's local patches generalise; but fourier's patches were validated only against the GStreamer codepath's buffer management model, which differs from libva's. The MPLANE conversion in src/v4l2.c is probably not the hardest part — the hardest part is getting the request-API control structure layout exactly right for H.264 on this specific SoC. A single engineer who hits a wrong payload format will spend days on silent decoder failures (no error returned, black frames) before identifying the control struct mismatch. The risk register lists this as R3 with mitigation “cross-check against fourier's gst v4l2slh264dec working output,” but that cross-check is non-trivial: it requires instrumenting the GStreamer pipeline at the kernel driver level, which is its own 2-3 day subtask.

What would have to be true for the mitigation to be inadequate

The fourier patches' buffer layout assumptions must differ from what libva's VaapiPicture allocation model produces. This is likely if fourier patched the GStreamer path (which uses GstV4l2BufferPool with its own MPLANE mapping) and not the libva path. The kernel driver's tolerance for control payload mismatches must be low — i.e., it silently produces garbage frames rather than returning EINVAL. The “R1 slip >3 weeks → surface to Markus” trigger is the mitigation, but it assumes Markus can detect the slip. Silent black-frame failures in early Step 1 work could look like a build/integration problem rather than a slip, and 3 weeks could pass before the root cause is identified as a control payload format issue.

Cheapest test / evidence

Before Step 1 begins, add one explicit pre-step: run strace on GStreamer's v4l2slh264dec pipeline and dump the exact VIDIOC_S_EXT_CTRLS call payloads for a single I-frame and a single P-frame decode. Compare the control structure layout byte-for-byte against the V4L2_CID_STATELESS_H264_* structs in the kernel headers. This is a 1-day task and will either confirm the UAPI is clean or expose driver-specific quirks upfront. The result directly sets the lower bound on Step 1 effort.

Recommendation

Add an explicit Step 0.5 (1-2 days) between Step 0 and Step 1: “Kernel UAPI surface audit.” Extract actual VIDIOC_S_EXT_CTRLS payloads from S1's GStreamer pipeline via strace or ftrace. Document the control structure layout the hantro driver actually consumes. Only then begin Step 1 src/v4l2.c work. Revise R1 mitigation: the slip trigger should be “any sub-task in Step 1 produces silent black frames or no output for >3 days” rather than “calendar slip >3 weeks,” because the former is the observable early signal of the specific failure mode most likely to cause the slip.

Q4 — Test corpus adequacy for Step 1 complete gate

Steelman against

The proposed corpus (bbb_1080p30_h264.mp4 + vainfo + fourier README L319-340 failure cases) is almost certainly insufficient as a “Step 1 complete” gate for the stated goal of handling YouTube/HTML5 video in Brave. Big Buck Bunny at 1080p30 is a synthetically clean encode: progressive, 4:2:0, High Profile Level 4.0, AVCC bitstream as packaged in MP4, no field coding, well-behaved GOP structure, no scaling lists beyond the default, no leading B-frames in awkward positions. It represents perhaps 30% of what YouTube actually delivers. The other 70% includes:

Baseline Profile streams from older devices/uploads — these use different slice types and lack the cabac entropy coding that H.264 High Profile assumes.
Constrained Baseline and Main Profile — different deblocking filter semantics.
MBAFF (macroblock-adaptive frame-field) — interlaced-within-progressive encoding used by some broadcast-originated content on YouTube. This is the single most likely case where the hantro H.264 decoder either silently produces wrong output or returns an error, because MBAFF changes the reference frame addressing model significantly.
Long GOPs with no IDR for 10+ seconds — tests the reference frame list management in the stateless API. V4L2_CID_STATELESS_H264_DECODE_PARAMS includes dpb[] reference picture list; getting the dpb entries wrong in long-GOP content produces corruption only after ~1s, not immediately.
Slices not aligned to tile boundaries — some encoders produce slices that split mid-row. The stateless API's slice handling for these is implementation-defined per driver.
Annex B bitstream inside an MP4 container — some content has in-band parameter sets using the Annex B start-code format even inside MP4 (technically non-conformant but common). libva typically expects AVCC; Annex B would cause a parse failure in src/h264.c’s SPS/PPS extraction.
4:2:2 or 4:4:4 chroma — High 4:2:2 and High 4:4:4 Predictive profiles exist on YouTube for some content. hantro likely does not support these, but the failure mode (rejected by V4L2 format negotiation vs. silent corruption) should be characterised.

The fourier README L319-340 “failure cases” are not available in the provided materials — that reference is opaque in the plan. If those lines cover only the specific bugs fourier encountered (likely HEVC-related, given Phase 4 notes “HEVC stripped”), they will not cover the H.264 edge cases above.

What would have to be true for the corpus to be adequate

YouTube's H.264 stream characteristics must be bounded to progressive, High Profile, AVCC, moderate GOP length, and single-slice-per-frame structure. That is not true — YouTube encodes adaptively across devices and upload source quality.

Cheapest test

Download 5 YouTube videos spanning: (1) a recent 1080p upload, (2) an older SD-resolution upload from a mobile device, (3) a broadcast news clip, (4) a gaming stream, (5) a user-uploaded 4K downscaled to 1080p. Run mediainfo –Inform=“Video;%Format_Profile%,%Height%,%ScanType%,%Format_Settings_GOP%” on each. This takes 20 minutes and will likely surface at least two profile/scan-type combinations not covered by bbb.

Recommendation

Extend the test corpus to include: one Baseline Profile clip, one Main Profile clip, one MBAFF / interlaced-flagged clip (any broadcast-sourced upload), one long-GOP clip (GOP > 60 frames), and one clip with explicit scaling list. These can be sourced from publicly available ITU H.264 conformance bitstreams (the JVT conformance suite is freely downloadable and covers exactly these edge cases). Add a requirement that vainfo must report the decoder capable of the relevant profile/level for each clip before that clip is considered a pass/fail gate. Add “MBAFF decode produces correct output or returns explicit ENOTSUP” as an explicit criterion, because silent corruption on MBAFF is worse than a clean reject.

Reviewer metadata

Model: Sonnet 4.6, dispatched 2026-05-01 from this campaign's Claude Code session.
Brief shape: four challenge questions presented as primary task, with bias toward “the plan's answer is wrong” unless honest effort failed. Materials (Phase 1-4 + CSVs) provided as background, not as the task.
Token usage: 19945 total, 0 tool calls (review was pure-text reasoning over the pasted artefacts).

Action items surfaced

The review surfaces four pre-Phase-6 actions that, if taken, could materially redirect Phase 4. Markus's call which to enact:

Q1 cheapest test — runs in 10 seconds: strings /usr/bin/brave | grep -i “v4l2stateless\|V4L2VideoDecoder\|DirectVideoDecoder\|v4l2_video_decode”. If positive, the entire 4-8 week libva multiplanar rewrite may be the wrong direction.
Q2 fix: drop the SW-decode synthesis test from Step 0; replace with static Chromium source trace of NativePixmap → Wayland buffer creation, plus optional stub libva driver if static analysis inconclusive.
Q3 fix: add Step 0.5 (1-2 days) — VIDIOC_S_EXT_CTRLS payload capture from working GStreamer pipeline. Revise R1 slip trigger from “calendar slip >3 weeks” to “silent black frames >3 days.”
Q4 fix: extend test corpus to include Baseline / Main / MBAFF / long-GOP / scaling-list cases, sourced from JVT H.264 conformance suite.

Q1 is the structural challenge — it could obviate the whole plan. Q2-Q4 sharpen the plan if Q1's cheap test confirms direction A is right.

Phase 5 ends here. Markus reviews and redacts; the next step is either a Phase 4 amendment incorporating the surfaced action items (and possibly pivoting away from libva-v4l2-request if Q1's cheap test fires) or a fresh Phase 4 iteration. Phase 6 implementation does not begin until Phase 4 is locked.

Table of Contents

ohm_gl_fix — Phase 5 Second-Model Review, 2026-05-01

Q1 — Is fix surface A actually the right pick?

Steelman against

What would have to be true

Cheapest test

Recommendation

Q2 — Is Step 0's research scope sufficient?

Steelman against

What would have to be true

Cheapest test

Recommendation

Q3 — Risk register and single-engineer capacity

Steelman against

What would have to be true for the mitigation to be inadequate

Cheapest test / evidence

Recommendation

Q4 — Test corpus adequacy for Step 1 complete gate

Steelman against

What would have to be true for the corpus to be adequate

Cheapest test

Recommendation

Reviewer metadata

Action items surfaced