This is an old revision of the document!
Table of Contents
fourier — Phase 5 review handover, 2026-04-30
Second-model review artifact. Goal/situation/measurements/plan; uncurated. Reviewer reads top-to-bottom, redacts as needed, then hands to a fresh model instance. Index: fourier.
Origin: prior Phase 1 metric (“max FPS / decode rate”) was invalidated by ad-hoc Phase 3 measurements collected 2026-04-24 on ohm. Loop edge 3→1 fired retroactively. This page is the re-entry at Phase 2 with the existing data treated as Phase 3 output, plus a fresh reproduction on the danctnix/archarm baseline.
Phase 1 — Goal (revised)
Move ohm/RK3566 to “1080p30 YouTube playback at <30% total CPU, zero dropped frames over a 60 s window” by way of the chromium-fourier package. Same gate Markus already wrote into the README; this page is the metric-definition for the next iteration of the loop.
Why the metric changed
Original: “max H.264 decode FPS.” Phase 3 showed that decode-FPS is not the binding constraint. ohm's hantro VPU caps at ~95 fps for 1080p H.264. Source rate (24 fps) is reached at 14 % CPU when the decoder output is discarded (drm_prime + null muxer). The cost shows up downstream of the decoder, in whatever consumer takes the frame.
Replacement — variant (C)
Two bands measured on every run:
- Coarse band (ship-gate). Per-clip realtime run, fixed length,
fixed clip, fixed framerate target. Capture: time-averaged CPU%
over the run, decoder + sink drops out of total presented frames, and a path-discipline note (which surface negotiated; dmabuf-direct or GL-imported; what was actually established at both ends). "Hard facts > somewhat hard facts" — average CPU% is the hard fact, not a peak-watching estimate. * **Latency band** (diagnosis). Per-frame distribution over the chain compressed-bytes-arrival → presented-pixel-on-display, with sub-bands queue-fill / decode / fence-publish / dmabuf-export / GL-import (when applicable) / composite / scanout. Used when (B) fails and we don't know which sub-band ate the latency.
Surface-variant clarification
The metric does not require a GLES-textured terminus. The winning
ohm path (gst → waylandsink) terminates at a VOP2 scanout plane
without ever touching a GL texture. Phase 1 target = “presented pixel
on display.” Each measurement records its surface variant
(dmabuf-scanout, GLES-imported) so latency/CPU comparisons
across paths stay honest.
Phase 2 — Situation
Test target
ohm. PineTab2, RK3566, hantro-vpu on /dev/video1. Mali-G52
panfrost. Kernel 6.19.10-danctnix1-1-pinetab2. Plasma Wayland
session running on DSI-1, KWin compositor. Wayland socket
/run/user/1001/wayland-0.
Stack (after revert; this is the danctnix/archarm baseline)
ffmpeg-v4l2-request 2:8.1-3from danctnix. Listsv4l2request
in -hwaccels. Built from same Kwiboo branch the marfrit
stripped variant tracks. * ''mpv 0.41.0''. ''gst-launch-1.0 1.28.2''. * ''libva 2.23.0'', ''libva-utils 2.22.0'', no v4l2_request VA driver installed (the bootlin meson-stranded ''.so'' was removed). * ''[marfrit]'' commented out in ''/etc/pacman.conf'' for the duration of this iteration.
Pre-revert installed from [marfrit]:
ffmpeg-v4l2-request-git 2:8.1.r123329.b57fbbe-1— replaced by
danctnix's ffmpeg-v4l2-request 2:8.1-3.
brave-bin 1:1.89.143-1(AUR, installed 2026-04-24 for the
Brave-HW-decode probe) — removed.
- Stranded
/usr/lib/dri/v4l2_request_drv_video.so(266 KB,
no package owner) from a meson-direct install of bootlin
''libva-v4l2-request-git'' — removed.
Snapshots before/after at /tmp/ohm-pre-revert-pacman-Q.txt,
/tmp/ohm-post-revert-pacman-Q.txt, /tmp/ohm-pacman.conf.pre-revert.
Constraints already known
- Producer-side
dma_resvis empty on V4L2 stateless decoders.
KWin's watchDmaBuf fence wait would either spin or skip.
Mitigated downstream by the kwin-fourier 0001-skip-wait patch
(shipped); kernel-side fix in flight as the vb2-dma-resv RFC
(v1 sent, two reviews received, v2 sketch backlogged — see
[[fourier:vb2_dma_resv_rfc]]).
* GL/EGL upload from a V4L2 dmabuf is the bottleneck on Mali-class
silicon. Quantified twice on ohm prior to this iteration (mpv
SW gpu-next: 127 % CPU, 973 drops; mpv HW v4l2request gpu-next:
122 % CPU, 930 drops). Same shape under stock danctnix today
(R6 below).
* Browsers cannot use ''waylandsink''. Firefox renders into its
own GLES viewport; Chromium uses Skia/SkiaGanesh on GLES. The
browser path is structurally GL-bound; the dmabuf-direct
win does not transfer.
* ''mpv --vo=dmabuf-wayland'' fails format negotiation on
''yuv420p → drm_prime''. Hwupload filter rejects the output
format ("hardware format not supported"). Re-confirmed in R7.
What will not be touched in this iteration
- fresnel (RK3399), ampere/boltzmann (RK3588): out of scope.
- Kernel side. The vb2-dma-resv v2 sketch is backlogged separately.
- Firefox. The campaign-shipped firefox-fourier 150.0.1-16 result
on fresnel (RDD 78 % → 5 %) stands; reproducing it on ohm is
interesting but is path (3) in the Phase 4 ranking, not (1).
Phase 3 — Baseline measurements
Corpus clip: ~/fourier-test/bbb_1080p30_h264.mp4, SHA-16
dcf8a7170fbd49bb. Full Big Buck Bunny (725 MB, 9:56, 24 fps source
rate). All paced runs cap at 60 s of source via -t 60 /
–end=60 / timeout 62 -INT.
CPU% column is GNU /usr/bin/time -v “Percent of CPU this job
got” — a time-average over the whole run, not a peak.
2026-04-30 reproduction (stock danctnix/archarm)
| Run | Path | CPU% | Wall | Drops/Total | Path discipline |
|---|---|---|---|---|---|
| R1 | ffmpeg -re -hwaccel none -f null - (SW) | 89 % | 1:00.21 | n/a | no decode hwaccel |
| R2 | ffmpeg -re -hwaccel v4l2request -f null - (no drm_prime) | 64 % | 1:00.25 | n/a | hwaccel engaged + CPU readback |
| R3 | ffmpeg -re -hwaccel v4l2request -hwaccel_output_format drm_prime -f null - | 15 % | 1:00.23 | n/a | hantro-vpu engaged on S264; drm_prime export only; no consumer |
| R4 | gst v4l2slh264dec ! fakesink sync=false (unpaced, 1800 frames) | 89 % | 0:45.11 | n/a | decoder emits video/x-raw(memory:DMABuf), format=DMA_DRM, drm-format=NV12; fakesink discards |
| R5 | gst v4l2slh264dec ! waylandsink sync=true (paced, 60 s wall) | 8 % | 1:02.27 | 0 | waylandsink advertises DMA_DRM caps; linux-dmabuf-v1 negotiated; KWin → VOP2 plane; no GL upload |
| R6 | mpv –hwdec=v4l2request –vo=gpu-next | 138 % | 1:02.63 | 1039 / 1440 (72 %) | mpv selects h264-v4l2request; gpu-next loads drmprime hwdec driver; drm_prime → GLES import → libplacebo composite; bottleneck |
| R7 | mpv –hwdec=v4l2request –vo=dmabuf-wayland | 17 % | 1:01.35 | broken | hwupload filter: hardware format not supported; audio plays, no video |
Logs: ~/fourier-test/baseline-runs/20260430-1145/ on ohm
(stdouts, stderrs, mpv terminal captures, GST_DEBUG traces).
Prior Phase 3 dataset (README dossier, marfrit-modified ohm)
For comparison; ad-hoc, kernel same, ffmpeg-v4l2-request-git (marfrit Kwiboo build) instead of danctnix ffmpeg-v4l2-request:
| Run | Path | CPU% (prior) | Drops (prior) |
|---|---|---|---|
| R1 | SW realtime | 90 % | n/a |
| R2 | HW no drm_prime | 67 % | n/a |
| R3 | HW + drm_prime | 14 % | n/a |
| R4 | gst → fakesink | 89 % | n/a (1800 / 48.9 s = 36.8 fps unpaced) |
| R5 | gst → waylandsink | 6–7 % | 0 (1488 / 62 s, progressreport 1:1) |
| R6 | mpv gpu-next | 122 % | 930 / 1440 (65 %) |
| R7 | mpv dmabuf-wayland | n/a | format-negotiation failure |
Reproduction commentary
- R1–R5 reproduce within 1–3 percentage points / single-frame
counts. The danctnix → marfrit swap on ffmpeg is not visible
in the data — same Kwiboo branch, different build flags. * R6 worsened from 122 % / 930 to 138 % / 1039. Same shape; delta plausibly compositor / clip-content variance. The qualitative finding holds: GL composition is the bottleneck, decode is essentially free, ~70 % of frames don't reach the panel. * R7's failure mode is identical.
Path-optimization notes (per Markus, 2026-04-30)
Definition: “zero copy if possible, no double verification of buffers if already established at both ends.” Captured per run:
- R3: drm_prime export-only, no consumer. Zero-copy on the
producer side; no second-end exists.
- R5: caps negotiation runs once at preroll
(video/x-raw(memory:DMABuf), format=DMA_DRM, drm-format=NV12
on both ends); per-frame the dmabuf fd flows decoder → KWin → VOP2 with no caps re-validation. * R6: ''drm_prime → EGLImage → GL texture'' import per frame. Each frame allocates a fresh EGLImage. This is the per-frame cost the metric needed; it's not a buffer-double-verification problem, it's a fresh-import-per-frame problem. * R7: format-negotiation fails, never reaches per-frame phase.
Phase 4 — Plan
Sub-target
ohm + chromium-fourier + H.264 + GL surface (the browser-side analogue of R6, but inside Chromium's V4L2VideoDecoder + Skia/SkiaGanesh GL composition). Resumes task #46. No fleet broadening this round.
What will be touched
- data:CT220 chromium-builder. Unblock the
gnconfigure check
that rejects empty clang_revision. Standard Arch chromium
PKGBUILD patches solve this; CachyOS chromium-cachy fork has the equivalent. Result: a built ''chromium-fourier'' aarch64 package landing on packages.reauktion.de. * ohm: install ''chromium-fourier'' from ''[marfrit]'' (re-enabled only after the Phase 3 baseline has been frozen on this page). Run a single 60 s YouTube-equivalent clip-playback measurement in the same shape as R5/R6. * No kernel changes. No userspace patches beyond the build unblock.
What will not be touched
- fresnel, ampere, boltzmann.
- Firefox (firefox-fourier already shipped, RDD result valid).
- Kernel (vb2-dma-resv v2 backlogged independently).
- mpv
dmabuf-waylanddiagnosis (path 2 in the earlier ranking;
deferred).
- libva-v4l2-request multiplanar port. Browser path will use
Chromium's V4L2VideoDecoder, not VA-API.
Expected outcome (Phase 7 will compare against this)
Two predictions, in the same units as Phase 3 R5 + R6:
chromium-fourieron ohm, 1080p30 H.264 corpus clip, 60 s wall:
avg CPU% in the renderer + GPU process combined, 80–150 %.
Drops 200–800 of 1440 (14–56 %). Path: V4L2VideoDecoder → drm_prime → Chromium ''GLImage'' GLES texture → Skia composite. Bottleneck location: the GL import, same shape as R6. - Stretch: with ''--enable-features=VaapiVideoDecodeLinuxGL'' or equivalent dmabuf-direct compositing flag (the LinuxOzone overlay path), CPU% drops to 20–40 % and drop-count to single digits. This is the chromium-fourier-specific carrier patch bet. If it works, ohm clears the ship-gate.
If both numbers come back inside their ranges, Phase 7 closes the loop. If they're outside (especially if (1) is much worse), the plan re-enters Phase 4 with the diagnosis from the latency band.
Open Phase 4 questions surfaced for review
- Build unblock approach. Patch
gnto skip
clang_revision check (lighter), or do a full toolchain
swap to ''clang_base_path="/usr"'' + ''compiler-rt-adjust-paths''
(heavier, what Arch chromium PKGBUILD does). Either is a one-line
PKGBUILD change for the gn-side; the second has more long-term
upgrade implications.
- **Latency-band instrumentation depth.** ''perf trace -e syscalls''
+ KMS vblank timestamps (''/sys/kernel/debug/dri/0/state'')
+ Chromium's UMA media-pipeline traces (if exposed in this
build) is the proposal. Do we add ''chrome://tracing'' captures
on top, or is that overkill for one sub-target?
- **Test corpus extension.** Stay on the bbb 1080p H.264 clip,
or add a real YouTube DASH H.264 segment (different keyframe
cadence, different bitrate envelope) for the browser-relevant
case? Real YouTube would test the fragmented-MP4 path; bbb
tests the static-MP4 path.
- **Acceptance for "stretch" outcome.** Is a Chromium build flag
sufficient as a campaign artifact, or do we want a carrier
patch in chromium-fourier that flips the default?
Phase 5 routing
This page is the second-model review handover. Reviewer assesses:
- Whether the (C) metric is right, or whether the latency band
needs a different definition.
- Whether path (1) is the right narrowing or whether path (2)
or (3) buys more information first.
- Whether the Phase 4 prediction range is too wide or too narrow.
- What's missing.
After redaction, Markus hands to a fresh model instance.
Source artefacts
- Run logs (ohm):
~/fourier-test/baseline-runs/20260430-1145/ - Pre/post-revert snapshots (ohm):
/tmp/ohm-pre-revert-pacman-Q.txt,
/tmp/ohm-post-revert-pacman-Q.txt,
''/tmp/ohm-pacman.conf.pre-revert'' * Project-state memory: ''project_fourier_campaign_state.md'' * Process spec: ''feedback_dev_process.md'' (the 8-phase loop) * Existing campaign README: ''~/src/fourier/README.md'' * Chromium-fourier workspace: ''marfrit-packages/arch/chromium-fourier/''
