ohm_gl_fix:phase1_revised_2026-05-01
Current revision: 2026/05/01 13:08 by markus_fritsche (rewrap paragraphs, DokuWiki single-newline fix). Previous revision: 2026/05/01 11:23 by markus_fritsche (Phase 1 revised — measurable success criteria across in-scope use cases).
====== ohm_gl_fix — Phase 1 (revised), 2026-05-01 ======

This page replaces the original Phase 1 lock at [[ohm_gl_fix:… quantitative target ("… floor of gst→waylandsink"… [[ohm_gl_fix:… campaign had been reframed twice (Markus 2026-04-30: "drops post warmup, not drops total"; … mpv. I seek to identify the structural gap"). This refinement folds those corrections plus the empirical evidence from Phase 3 revised into a single load-bearing Phase 1.

The original Phase 1 page stands as the audit trail; Phase 3 revised + this Phase 1 revised are the live driver going forward.
===== 1. Goal (essence) =====

> Buffer-to-display achieves zero-copy for libavcodec / libva consumers on Mali-G52 + KWin Wayland, such that in-scope workloads run with the same memory-subsystem pressure profile as the GStreamer + ''…

"Same memory-subsystem pressure profile" … measurable below. The reference path is named in §3.
===== 2. In-scope use cases =====

  * **YouTube / HTML5 ''<…
  * **Web browsing in Brave** — compositor-side video + animation surfaces; same Chromium GPU-process pipeline as YouTube.
  * **VS Code** (Electron + Chromium under the hood) — same pipeline as Brave for any embedded video / animation rendering.

Workloads outside this list are not the campaign'… deliberate scope-tightening — see §6.
===== 3. Reference baseline (zero-copy benchmark) =====

''… \! h264parse \! v4l2slh264dec \! waylandsink sync=true'' (scenario S1 in [[ohm_gl_fix:… revised]]). Empirical numbers, 60 s steady-state, …

^ Metric …
| post-warmup drops | 0 (fourier 2026-04-24); … |

This is the empirical operating point a successful campaign aims to reach for libavcodec / libva consumers on the in-scope workloads.
===== 4. Measurable success criteria (all must hold) =====

Measurements taken on the in-scope workload (e.g. Brave + bbb-class H.264 video over a 60 s steady-state window, with strace + perf-stat instrumentation per [[ohm_gl_fix:…
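The perf-stat side of this instrumentation reduces to a handful of counter values per run. Below is a minimal sketch for extracting them. It assumes the counters are collected with something like ''perf stat -x, -e LLC-load-misses,cache-misses -a -- sleep 10''; the exact Phase 3 revised invocation is not given on this page. With ''-x,'', perf writes one comma-separated line per event to stderr, with the count in the first field and the event name in the third. The function name ''parse_perf_stat_csv'' is hypothetical.

```python
from __future__ import annotations

import csv
import io


def parse_perf_stat_csv(text: str) -> dict[str, int | None]:
    """Map event name -> count from `perf stat -x,` output.

    Field 1 is the counter value and field 3 the event name. Counters
    perf could not read ("<not supported>", "<not counted>") map to None
    so a missing counter is not mistaken for a zero count.
    """
    counts: dict[str, int | None] = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3 or row[0].startswith("#"):
            continue  # blank lines, comments, or lines that are not counters
        value, event = row[0].strip(), row[2].strip()
        if not event:
            continue
        try:
            counts[event] = int(float(value))
        except ValueError:
            counts[event] = None  # e.g. "<not supported>"
    return counts
```

A ''None'' for ''LLC-load-misses'' means perf reported the event as unavailable on that machine. In that case C2 cannot be judged from the run, and it should not be treated as passing by default.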
  - **C1 — Drops.** Post-warmup drops ≤ **10** over 60 s. Warmup = first 10 s. Drops in warmup may be up to ~10; drops after warmup must be 0. Sanity cap on total drops across the full 60 s = 10.
  - **C2 — Memory-subsystem pressure.** LLC-load-misses ≤ **3 × baseline** over 10 s steady-state (i.e. ≤ ~9 M). cache-misses ≤ ~6 M as a leading indicator.
  - **C3 — Display-path activity.** DRM_IOCTL_* per second ≤ **100**. Current libavcodec-using contenders sit at 800-1 050 per sec; target is the baseline rate (0) plus tolerance.
  - **C4 — Boundary fd-passing.** At least one of:
    * (a) ''…
    * (b) ''…
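C1 to C3 are plain thresholds, so a verification run can be scored mechanically. Below is a minimal sketch. It assumes per-frame drop timestamps are available. It reads C1 as "no drops after the 10 s warmup and at most 10 in total", the stricter of the limits stated in C1. C4 is passed in as a boolean, on the assumption that it is decided separately from conditions (a) and (b). ''Measurement'', ''evaluate'' and the ''baseline_llc_10s'' argument are hypothetical. From C2's "≤ 3 × baseline (i.e. ≤ ~9 M)", the baseline is roughly 3 M LLC-load-misses per 10 s.

```python
from __future__ import annotations

from dataclasses import dataclass

WARMUP_S = 10.0              # C1: the first 10 s of the 60 s window are warmup
MAX_POST_WARMUP_DROPS = 0    # C1: no drops after warmup
MAX_TOTAL_DROPS = 10         # C1: sanity cap on drops over the full 60 s
LLC_FACTOR = 3.0             # C2: LLC-load-misses <= 3 x baseline (10 s window)
MAX_DRM_IOCTL_PER_S = 100.0  # C3: DRM_IOCTL_* calls per second


@dataclass
class Measurement:
    drop_times_s: list[float]   # seconds from start of the 60 s window
    llc_load_misses_10s: int    # LLC-load-misses over 10 s steady-state
    drm_ioctls_per_s: float     # DRM_IOCTL_* rate from strace
    fd_passing_observed: bool   # C4 (a) or (b), decided outside this sketch


def evaluate(m: Measurement, baseline_llc_10s: int) -> dict[str, bool]:
    """Return pass/fail per criterion; all four must be True."""
    post_warmup = sum(1 for t in m.drop_times_s if t >= WARMUP_S)
    total = len(m.drop_times_s)
    return {
        "C1": post_warmup <= MAX_POST_WARMUP_DROPS and total <= MAX_TOTAL_DROPS,
        "C2": m.llc_load_misses_10s <= LLC_FACTOR * baseline_llc_10s,
        "C3": m.drm_ioctls_per_s <= MAX_DRM_IOCTL_PER_S,
        "C4": m.fd_passing_observed,
    }
```

For example, ''evaluate(Measurement([1.2, 3.4], 8_500_000, 40.0, True), 3_000_000)'' is ''True'' for all four keys:
  * two drops, both inside warmup;
  * 8.5 M ≤ 9 M LLC-load-misses;
  * 40 ≤ 100 DRM ioctls per second.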
C1 is the user-visible criterion ("the video plays smoothly"… and C3 are the physical-layer criteria for "no CPU memcpy of frame data, no per-frame Mesa GL+DRM round-trips"… ("the path actually exists, not just is fast"…
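For C3, the DRM_IOCTL_* rate can be read from an strace log of the consuming process. Below is a minimal sketch. It assumes the log was captured over the measurement window with something like ''strace -f -tt -e trace=ioctl -o trace.txt -p <pid>''. It also assumes strace decodes the DRM request names. Requests that strace prints numerically (as ''_IOC(...)'') are not counted by this pattern. The function name is hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Matches the decoded request name in strace lines such as
#   12:00:00.000100 ioctl(9, DRM_IOCTL_MODE_ATOMIC, 0xffff0000) = 0
# "<... ioctl resumed>" lines carry no request name, so a call that strace
# splits across two lines is counted once, on its "<unfinished ...>" line.
DRM_IOCTL_RE = re.compile(r"\bioctl\(\d+, (DRM_IOCTL_[A-Z0-9_]+)")


def drm_ioctl_rate(strace_text: str, window_s: float) -> tuple[float, Counter[str]]:
    """Return (DRM_IOCTL_* calls per second, per-request breakdown)."""
    per_name = Counter(m.group(1) for m in DRM_IOCTL_RE.finditer(strace_text))
    return sum(per_name.values()) / window_s, per_name
```

A C3 pass is ''rate <= 100''. When it fails, the per-request ''Counter'' shows which request names make up the rate.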
===== 5. Loopback edges =====

  * **C1 ✓ + C2 ✗ + C3 ✓** → not possible without something else creating cache pressure; flag for re-investigation.
  * **C1 ✓ + C2 ✓ + C3 ✗** → Level-1 zero-copy fixed, Level-2 still missing (decoder produces dmabuf but display path goes through Mesa GL). **Re-enter Phase 4 fix-surface choice.**
  * **C1 ✗** at Phase 7 verification → re-enter Phase 4 with new perf evidence per [[ohm_gl_fix:…
  * **C4 ✗ but C1 ✓ + C2 ✓ + C3 ✓** → the path is not what we expected (e.g. compositor copies despite client passing fd; or a hidden zero-copy mechanism exists we haven'…
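The edges above form a small decision table, which can be written as a function for scripting Phase 7 runs. In this sketch the helper name and return strings are hypothetical. The C4 edge returns only the diagnosis stated above. The combination C1 ✓ + C2 ✗ + C3 ✗ is not a listed edge, so it is reported as unlisted rather than guessed.

```python
def loopback_edge(c1: bool, c2: bool, c3: bool, c4: bool) -> str:
    """Map a C1-C4 outcome to the loopback edge listed in section 5."""
    if not c1:
        # Stated for Phase 7 verification.
        return "re-enter Phase 4 with new perf evidence"
    if c2 and c3 and c4:
        return "all criteria hold"
    if not c2 and c3:
        return "flag for re-investigation (cache pressure from elsewhere)"
    if c2 and not c3:
        return "re-enter Phase 4 fix-surface choice (Level-2 still missing)"
    if c2 and c3 and not c4:
        return "path is not what we expected"
    return "no edge listed in §5"
```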
===== 6. What this Phase 1 deliberately does NOT lock =====

  * **A specific patch site.** That's [[ohm_gl_fix:…
  * **An absolute CPU% target.** cycles/…
  * **Out-of-scope workload performance.** 3D games, Proton/…
  * **The S5 regression (gst-launch waylandsink ~0.3 drops/sec on today'…
  * **Per-application HW-decode engagement.** Different consumers may take different fix surfaces (e.g. browser via libva-v4l2-request multiplanar — fix surface A; mpv via libavcodec drm_prime → linux-dmabuf-v1 — fix surface B). Phase 1 does not pre-select which.
===== 7. Differences from the original Phase 1 (2026-04-30) =====

…

===== 8. Live data anchors =====
The CSV schema captures these criteria as machine-readable rows. As of 2026-05-01:

  * ''…
  * ''…
  * ''…

C1 / C2 / C3 / C4 are computable from these CSVs at any future measurement point.
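This page does not spell out the CSV columns, so the sketch below assumes a hypothetical long-format layout: ''run_id,metric,value'', with made-up metric names. It shows how C1 to C3 could be recomputed per run from stored rows. C4 is left out because it is not a single number.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict

# Hypothetical long-format schema (one row per run and metric):
#   run_id,metric,value
# metric name (hypothetical) -> (criterion, inclusive upper bound)
UPPER_BOUNDS = {
    "post_warmup_drops": ("C1", 0),
    "total_drops": ("C1", 10),
    "drm_ioctls_per_s": ("C3", 100),
}


def criteria_by_run(csv_text: str, llc_limit: float) -> dict[str, dict[str, bool]]:
    """Pivot metric rows per run, then check every computable criterion.

    llc_limit is 3 x the baseline LLC-load-misses (C2). A criterion with no
    matching metric rows for a run is left out instead of assumed to pass.
    """
    runs: dict[str, dict[str, float]] = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        runs[row["run_id"]][row["metric"]] = float(row["value"])

    results: dict[str, dict[str, bool]] = {}
    for run_id, vals in runs.items():
        ok: dict[str, bool] = {}
        for metric, (criterion, bound) in UPPER_BOUNDS.items():
            if metric in vals:
                ok[criterion] = ok.get(criterion, True) and vals[metric] <= bound
        if "llc_load_misses_10s" in vals:
            ok["C2"] = vals["llc_load_misses_10s"] <= llc_limit
        results[run_id] = ok
    return results
```

A long layout lets new metrics be added without changing the CSV header. The cost is a pivot step when reading.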
===== 9. References =====

  * [[ohm_gl_fix:…
  * [[ohm_gl_fix:…
  * [[ohm_gl_fix:…
  * [[ohm_gl_fix:…
  * [[ohm_gl_fix:…
----

//Phase 1 (revised) ends here. Phase 4 fix-surface choice + Phase 6 implementation will deliver against C1-C4. Phase 7 verification re-runs the strace + perf-stat instrumentation from Phase 3 revised on the in-scope workload and writes the resulting numbers to ''… as ''…
