ohm_gl_fix:phase4_2026-05-01

Differences

This shows you the differences between two versions of the page.

ohm_gl_fix:phase4_2026-05-01 [2026/05/01 13:08] – rewrap paragraphs (DokuWiki single-newline fix) markus_fritsche
ohm_gl_fix:phase4_2026-05-01 [2026/05/01 18:00] (current) – Step 0 finding folded in: Step 2 confirmed needed (KWin advertises wp_fractional_scale_manager_v1 → ShouldUseOverlayDelegation forced false) markus_fritsche
Line 4: Line 4:
  
 The driver of this rewrite: Phase 1 was refined on 2026-05-01 with machine-readable criteria ([[ohm_gl_fix:phase1_revised_2026-05-01|Phase 1 revised]] §4 — C1 drops, C2 LLC-load-misses, C3 DRM_IOCTL/sec, C4 boundary fd-passing) and Phase 3 was rebuilt on the same day with empirically-grounded boundary characterisation ([[ohm_gl_fix:phase3_revised_2026-05-01|Phase 3 revised]] §3, §4). With both anchors in place, Phase 4 can commit.
 +
 +> **2026-05-01 amendment** (post-[[ohm_gl_fix:phase5_review_2026-05-01|Phase 5 review]]): Q1 (Brave's V4L2VideoDecoder reachability) closed by the ''strings /opt/brave-bin/brave'' deep-dive. ''UseChromeOSDirectVideoDecoder'' / ''V4L2FlatStatelessVideoDecoder'' / ''V4L2StatelessVideoDecoder'' / ''V4L2H264Decoder'' all return **0 matches** in this Brave build (Arch Linux ARM brave-bin, 2026-04-30). The single ''V4L2VideoDecoder'' string match is vestigial; all actual V4L2 source-line strings in the binary are camera-capture (''v4l2_capture_delegate.cc'', ''libtegrav4l2.so''), not video-decode. The V4L2 direct-decode path is **not compiled in** for this build, so fix surface A (libva-v4l2-request multiplanar) stands. Q2 (Step 0 methodology fix) and Q3 (Step 0.5 kernel UAPI surface audit + R1 trigger revision) are folded into §3 and §6 below. Q4 (test corpus extension) lives in [[ohm_gl_fix:phase4_step1_test_corpus_2026-05-01]].
 +
 +> **2026-05-01 Step 0 finding** (Phase 6 dipping): Step 2 is **confirmed needed**, not conditional. Chromium M138's overlay-delegation gate at ''ui/ozone/platform/wayland/host/wayland_connection.cc'' ''ShouldUseOverlayDelegation()'' lines 495-509 includes the predicate ''!fractional_scale_manager_v1()'' — KWin advertises ''wp_fractional_scale_manager_v1'' (verified empirically in mpv's verbose log), so the predicate returns false unconditionally on KWin Wayland regardless of feature flag. Step 2 patch site is now named with file:line. Step 0 details: [[ohm_gl_fix:phase6_step0_chromium_wayland_routing_2026-05-01|phase6/step0_chromium_wayland_routing_2026-05-01]] (companion: [[ohm_gl_fix:phase6_step0_5_uapi_audit_2026-05-01|phase6/step0_5_uapi_audit_2026-05-01]]).
  
 ===== 1. What this Phase 4 is targeting =====
Line 46: Line 50:
     * ''media/gpu/vaapi/'' — VA-API surface to native-pixmap conversion.
     * ''gpu/ipc/service/gpu_video_decode_accelerator_helpers.cc'' — dmabuf flow from decoder to compositor.
-  - **Empirical synthesis test:** with current Brave (libva broken), can we coax Chromium into the dmabuf-overlay path using a different content source — e.g. a WebGL canvas, or a video element with software decode where the decoded YUV is uploaded once to a GL texture and we observe whether composite uses the texture via Wayland subsurface or via Skia main-surface compositing? Look at ''DRM_IOCTL_*'' rate and ''SCM_RIGHTS'' fd-passing on the GPU process (already instrumented in [[ohm_gl_fix:phase3_revised_2026-05-01|Phase 3 revised]] §3).
 +  - **Static source trace** (replaces the SW-decode synthesis test that was here in the pre-Phase-5 draft — Phase 5 reviewer flagged it as broken-by-design: SW-decode produces shmem buffers, not ''NativePixmap'' dmabufs, so the test cannot validate whether hardware-decode ''NativePixmap'' would be routed via ''zwp_linux_dmabuf_v1''). Trace the path ''VaapiPicture / VaapiPictureNativePixmapOzone → NativePixmap → GpuMemoryBuffer → SharedImageBacking → wayland_buffer_manager_host'' in Chromium M138-class. Determine **statically** whether the subsurface path is gated on ''GpuMemoryBufferType == NATIVE_PIXMAP'' or on some other condition. Cite source ''file:line'' in the decision document.
 +  - **Stub libva driver test (optional, only if static analysis is inconclusive).** Build a stub libva backend that returns a valid ''NativePixmap'' backed by a linear dma-heap allocation (no hantro needed). Run Brave with ''LIBVA_DRIVER_NAME'' pointing at the stub. Observe whether the GPU process emits ''PRIME_FD_TO_HANDLE'' or ''SCM_RIGHTS'' on the Wayland socket. This isolates the compositor routing question from the decode question.
   - **Feature flag inventory:** check ''chrome://flags'' and ''--enable-features='' for relevant entries: ''VaapiVideoDecoder'', ''VaapiVideoDecodeLinuxGL'', ''UseChromeOSDirectVideoDecoder'', ''UseDelegatedCompositing'', ''DelegatedCompositingLimitToUi'', ''AcceleratedVideoDecodeLinuxGL'', ''wayland-screen-coordinates'', ''ozone-overlay-priority-hint''.
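For the optional stub-driver test, the observation step can be reduced to scanning a syscall trace of the GPU process for the two markers named above. The following is a minimal sketch only: it assumes the trace was captured as plain strace text (for example with ''strace -f -e trace=ioctl,sendmsg -o gpu.strace -p <pid>''), and the function name, marker list and sample lines are illustrative, hand-written stand-ins rather than real captures.

```python
from collections import Counter

# Substrings strace prints for the events of interest (assumed trace format).
MARKERS = (
    "DRM_IOCTL_PRIME_FD_TO_HANDLE",
    "SCM_RIGHTS",
)


def count_markers(lines):
    """Count how many trace lines mention each marker."""
    counts = Counter({marker: 0 for marker in MARKERS})
    for line in lines:
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


# Hand-written sample lines in strace's usual shape (not a real capture).
SAMPLE_LINES = [
    "4242 ioctl(17, DRM_IOCTL_PRIME_FD_TO_HANDLE, 0x7ffd1c2a) = 0",
    "4242 sendmsg(9, {msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, "
    "cmsg_type=SCM_RIGHTS, cmsg_data=[31]}]}, MSG_NOSIGNAL) = 24",
    "4242 ioctl(17, DRM_IOCTL_MODE_GETRESOURCES, 0x7ffd1c00) = 0",
]

if __name__ == "__main__":
    for marker, count in count_markers(SAMPLE_LINES).items():
        print(marker, count)
```

A count of zero for both markers while video is playing would suggest the stub's buffers are not leaving the GPU process as dmabufs. The trace alone does not identify which socket an fd was passed on, so the ''SCM_RIGHTS'' hits still need matching against the Wayland socket fd.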
  
-**Output gate:** decision document records whether Chromium's GPU process under default flags will route a working VA-API dmabuf to ''zwp_linux_dmabuf_v1'' (Step 2 not needed) or composite via Skia GL (Step 2 needed). The decision document attaches to this Phase 4 page after Step 0 completes.
 +**Output gate:** decision document records whether Chromium's GPU process under default flags will route a working VA-API dmabuf to ''zwp_linux_dmabuf_v1'' (Step 2 not needed) or composite via Skia GL (Step 2 needed) — **with the source ''file:line''** that creates the Wayland buffer for a VA-API ''NativePixmap'' explicitly cited (per Phase 5 review Q2 output gate). The decision document attaches to this Phase 4 page after Step 0 completes.
 + 
 +==== Step 0.5 — Kernel UAPI surface audit ==== 
 + 
 +**Duration:** 1–2 days. **Output:** documented control-structure layout that the hantro driver actually consumes. Inserted post-Phase-5-review per Q3 — the V4L2 stateless request-API control payload format on hantro G1/G2 (RK3566) is poorly documented in UAPI headers alone, and a control-payload mismatch produces silent black-frame failures rather than ''EINVAL''. fourier's local libva-v4l2-request patches were validated against the GStreamer codepath's buffer-management model, not libva's allocation model, so they don't pre-empt the question. 
 + 
 +Concrete sub-tasks: 
 + 
 +  - ''strace -f -e trace=ioctl -e signal=none -o /tmp/gst_h264.strace gst-launch-1.0 -q filesrc location=bbb_1080p30_h264.mp4 \! qtdemux \! h264parse \! v4l2slh264dec \! fakesink''. If strace truncates the embedded payload-data field, fall back to ''ftrace'' tracepoints on ''vidioc_*'' for fuller capture. 
 +  - Extract the exact byte payload of ''VIDIOC_S_EXT_CTRLS'' calls for one I-frame and one P-frame. 
 +  - Compare byte-for-byte against the kernel header ''include/uapi/linux/v4l2-controls.h'' ''V4L2_CID_STATELESS_H264_*'' structs (specifically ''V4L2_CID_STATELESS_H264_DECODE_PARAMS'', ''V4L2_CID_STATELESS_H264_SLICE_PARAMS'', ''V4L2_CID_STATELESS_H264_PRED_WEIGHTS'', ''V4L2_CID_STATELESS_H264_SCALING_MATRIX'', ''V4L2_CID_STATELESS_H264_DECODE_MODE'', ''V4L2_CID_STATELESS_H264_START_CODE''). 
 +  - Document the actual hantro driver control-structure layout: field ordering, padding, reference-frame DPB array conventions, ''VIDIOC_STREAMON'' sequencing relative to request fd lifecycle. 
 + 
 +**Output gate:** the documented control-structure layout serves as the per-byte template for Step 1 ''src/picture.c'' / ''src/h264.c'' work. If the layout diverges from a naive reading of the kernel header (highly likely on hantro), Step 1 starts with the actual layout, not the header layout.
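The byte-for-byte comparison in the sub-tasks above can be sketched as a small offset-level diff. This is an illustration under stated assumptions: ''diff_payloads'' is a hypothetical helper, and the two-field layout packed in the example is invented to show how padding shifts offsets. It is not any real ''V4L2_CID_STATELESS_H264_*'' struct.

```python
import struct


def diff_payloads(captured: bytes, expected: bytes):
    """Return (offset, captured_byte, expected_byte) for every mismatch.

    None stands in for a byte that is missing on the shorter side.
    """
    mismatches = []
    for offset in range(max(len(captured), len(expected))):
        got = captured[offset] if offset < len(captured) else None
        want = expected[offset] if offset < len(expected) else None
        if got != want:
            mismatches.append((offset, got, want))
    return mismatches


# Invented layout: a 16-bit field followed by a 32-bit field.
# "expected" packs them back to back; "captured" has two padding bytes
# between them, as an aligned C struct would.
EXPECTED = struct.pack("<HI", 1, 2)
CAPTURED = struct.pack("<HxxI", 1, 2)

if __name__ == "__main__":
    for offset, got, want in diff_payloads(CAPTURED, EXPECTED):
        print(f"offset {offset}: captured={got} expected={want}")
```

The first reported offset is the useful one: everything after an unexpected padding gap is shifted, so later mismatches are usually consequences rather than independent errors.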
  
 ==== Step 1 — libva-v4l2-request multiplanar port ====
Line 73: Line 91:
   - **Package + publish.** PKGBUILD finalised, builds on fermi, pushes to marfrit-packages pacman repo.
  
-==== Step 2 (conditional) — Chromium display-side patch ====
 +==== Step 2 — Chromium display-side patch (confirmed needed by Step 0 finding 2026-05-01) ====
 + 
 +**Status:** Step 0 found that Chromium M138's overlay-delegation system is force-disabled on KWin Wayland by a single predicate. Step 2 is no longer conditional. Trigger met. 
 + 
 +**Patch site:** ''chromium/ui/ozone/platform/wayland/host/wayland_connection.cc'' ''WaylandConnection::ShouldUseOverlayDelegation()'' lines 495-509: 
 + 
 +<code cpp>
 +bool WaylandConnection::ShouldUseOverlayDelegation() const { 
 +  bool should_use_overlay_delegation = 
 +      IsWaylandOverlayDelegationEnabled() && !fractional_scale_manager_v1(); 
 +  should_use_overlay_delegation &= !!single_pixel_buffer(); 
 +  return should_use_overlay_delegation; 
 +}
 +</code> 
 + 
 +The ''!fractional_scale_manager_v1()'' conjunct is the load-bearing failure. KWin advertises ''wp_fractional_scale_manager_v1'', so the conjunct evaluates to false and overlay delegation is force-disabled regardless of feature flag.
 + 
 +**Patch shape (recommended — minimal blast radius):** surface-state-gated relaxation. Replace ''!fractional_scale_manager_v1()'' with a check that returns true when the surface's currently-applied scale is an integer (1.0, 2.0, etc.). The protocol is allowed to be advertised; we just require that the relevant surface isn't //using// fractional scale right now. This preserves correctness when fractional scale //is// in fact active for the surface.
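To make the difference between the current gate and the recommended relaxation concrete, here is a small Python model of the two predicates. It is a truth-table sketch, not Chromium code: ''current_gate'', ''relaxed_gate'', ''is_integer_scale'' and the tolerance value are illustrative assumptions, and how the patch would obtain a surface's applied scale is left open.

```python
def is_integer_scale(scale: float, tolerance: float = 1e-6) -> bool:
    """True when the applied scale is a whole number such as 1.0 or 2.0."""
    return scale >= 1.0 and abs(scale - round(scale)) <= tolerance


def current_gate(flag_enabled: bool,
                 fractional_scale_advertised: bool,
                 has_single_pixel_buffer: bool) -> bool:
    """Model of the gate as quoted above: protocol presence alone disables it."""
    return (flag_enabled
            and not fractional_scale_advertised
            and has_single_pixel_buffer)


def relaxed_gate(flag_enabled: bool,
                 applied_scale: float,
                 has_single_pixel_buffer: bool) -> bool:
    """Model of the proposed shape: only an active fractional scale disables it."""
    return (flag_enabled
            and is_integer_scale(applied_scale)
            and has_single_pixel_buffer)


if __name__ == "__main__":
    # KWin case: protocol advertised, surface actually at scale 1.0.
    print("current:", current_gate(True, True, True))
    print("relaxed at 1.0:", relaxed_gate(True, 1.0, True))
    print("relaxed at 1.25:", relaxed_gate(True, 1.25, True))
```

The model keeps the other two conjuncts unchanged, which is the point of the minimal-blast-radius shape: only the fractional-scale term is swapped.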
  
-**Trigger:** Step 0 finds Chromium does not auto-route VA-API NativePixmaps through ''zwp_linux_dmabuf_v1'' on Wayland under the default feature flags — i.e. it composites via Skia GL and Phase 1 revised's C3 (≤ 100 DRM_IOCTL/sec) cannot be reached from Step 1 alone.
 +Two alternative shapes considered and parked: drop the gate entirely and let Viz ''OverlayCandidate'' validators reject candidates needing viewport-subpixel destinations (bigger refactor, touches Viz code); add a feature flag bypass (crudest, relies on user to know the trade-off). See [[ohm_gl_fix:phase6_step0_chromium_wayland_routing_2026-05-01|Step 0 doc §"Patch shape"]] for full reasoning.
  
-**Shape (deferred — exact scope set by Step 0):** patch Chromium to route VAAPI NativePixmaps as Wayland subsurfaces for video elements; or enable a feature flag set that does this. Build as ''chromium-ohm-gl-fix'' (or ''brave-ohm-gl-fix'') on marfrit-packages.
 +**Open Step 2 sub-task:** characterise the Viz-side per-buffer filtering (''OverlayCandidate'' validation in ''components/viz/service/display/overlay_processor*.cc'') that becomes the next-level gate once stage-1 is lifted. Not blocking Step 2 implementation; needed before Phase 7 can predict whether C3 is met by patch alone or also requires a Viz tweak.
  
-If Step 0 finds Step 2 is //not// needed, Phase 4 implementation ends at Step 1 + Step 3.
 +**Build target:** ''chromium-ohm-gl-fix'' or ''brave-ohm-gl-fix'' on marfrit-packages. ABI-compatible patch (small change to one .cc); no soname change. Substantial build cost (Chromium full rebuild on aarch64 takes hours-to-days; consider building on a beefier ARM host or distcc).
  
 ==== Step 3 — Verification (Phase 7 prep) ====
Line 125: Line 160:
 ===== 6. Risks and mitigations =====
  
-  - **R1 — Multiplanar port takes longer than 8 weeks.** V4L2 stateless API + request-API + hantro-specific control set is intricate. //Mitigation:// scope to H.264 only initially. HEVC is moot (RK3566 hantro has no HEVC HW). VP8 / VP9 / AV1 follow only if H.264 lands cleanly. If a single sub-task slips by >3 weeks, surface to Markus for re-scoping.
 +  - **R1 — Multiplanar port takes longer than 8 weeks.** V4L2 stateless API + request-API + hantro-specific control set is intricate. //Mitigation:// scope to H.264 only initially. HEVC is moot (RK3566 hantro has no HEVC HW). VP8 / VP9 / AV1 follow only if H.264 lands cleanly. **Slip trigger (revised post-Phase 5 review Q3):** any sub-task in Step 1 produces silent black frames or no decoder output for **>3 days** — that is the observable early signal of a control-payload mismatch (the most likely failure mode), and it is materially earlier than calendar-slip detection. Calendar slip alone (>3 weeks) is insufficient as a trigger because silent corruption can disguise itself as a build/integration problem for a long time. Surface either trigger to Markus for re-scoping.
-  - **R2 — Chromium routes VA-API NativePixmap through Skia GL on Wayland by default** (Step 0 negative finding). //Mitigation:// Step 2 patches Chromium. Engineering cost goes up materially but campaign scope still tractable. If Step 2 itself looks >2 months, reconsider whether to ship Step 1 alone with C1+C2 met and document C3 as still missing.
 +  - **R2 — Chromium routes VA-API NativePixmap through Skia GL on Wayland by default** — **realised, not just risked.** Step 0 (2026-05-01) found the gating predicate at ''WaylandConnection::ShouldUseOverlayDelegation()'' lines 495-509 is force-false on KWin because KWin advertises ''wp_fractional_scale_manager_v1''. Step 2 is now in scope unconditionally; see §3 Step 2 above for patch site + shape. //Mitigation status:// activated. If Step 2 itself looks >2 months (Chromium build cost dominates), reconsider whether to ship Step 1 alone with C1+C2 met and document C3 as still missing.
   - **R3 — hantro's H.264 conformance is incomplete.** Some streams (interlaced, certain profile/level combinations, Hi10P) may fail. //Mitigation:// cross-check against fourier's ''gst v4l2slh264dec'' working output on the same clip — that path uses the same kernel driver and is a known-good reference. Use the test corpus from fourier ''README'' L319-340 once enumerated.
   - **R4 — KWin's ''zwp_linux_dmabuf_v1'' modifier handling on the NV12 ''DRM_FORMAT_MOD_LINEAR'' that hantro produces.** Phase 3 Finding 1 already showed all panvk modifiers carry ''external_only=1''; that's a panvk-side property, but KWin's own modifier acceptance for NV12 is independent. //Mitigation:// cross-check by running ''gst-launch v4l2slh264dec → waylandsink'' on today's stack — that path produces the same modifier and is accepted by KWin (the S1 zero-copy reference). If S1 still works, KWin's acceptance is fine for the Step 1 output.
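The R1 slip trigger depends on noticing silent black frames early, which is easier with a mechanical check than by eye. Below is a rough sketch of such a check for a single dumped NV12 frame. It rests on assumptions not stated in this page: the frame is tightly packed (stride equals width, no row padding), and a luma value at or below 16 counts as black, which covers both limited-range black and a zero-filled buffer. ''is_black_nv12'' and the threshold are hypothetical.

```python
def is_black_nv12(frame: bytes, width: int, height: int,
                  luma_max: int = 16) -> bool:
    """Return True when every luma sample of an NV12 frame is near black.

    NV12 stores a full-resolution Y plane first, followed by an
    interleaved half-resolution UV plane; only the Y plane is inspected.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    luma_size = width * height
    if len(frame) < luma_size:
        raise ValueError("frame is shorter than its luma plane")
    return max(frame[:luma_size]) <= luma_max


if __name__ == "__main__":
    width, height = 4, 2
    zero_filled = bytes(width * height * 3 // 2)
    one_bright_pixel = bytearray(zero_filled)
    one_bright_pixel[3] = 200
    print(is_black_nv12(zero_filled, width, height))
    print(is_black_nv12(bytes(one_bright_pixel), width, height))
```

A genuinely dark scene would also pass this check, so it is only meaningful on a clip whose frames are known to carry visible content, such as the reference clip already used for the GStreamer cross-check.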
ohm_gl_fix/phase4_2026-05-01.1777640911.txt.gz · Last modified: by markus_fritsche