ohm_gl_fix:phase1_revised_2026-05-01

Differences

This shows you the differences between two versions of the page.

ohm_gl_fix:phase1_revised_2026-05-01 [2026/05/01 11:23] – Phase 1 revised — measurable success criteria across in-scope use cases markus_fritsche
ohm_gl_fix:phase1_revised_2026-05-01 [2026/05/01 13:08] (current) – rewrap paragraphs (DokuWiki single-newline fix) markus_fritsche
Line 1: Line 1:
 ====== ohm_gl_fix — Phase 1 (revised), 2026-05-01 ====== ====== ohm_gl_fix — Phase 1 (revised), 2026-05-01 ======
  
-This page replaces the original Phase 1 lock at +This page replaces the original Phase 1 lock at [[ohm_gl_fix:phase1_2026-04-30]]. The original locked an mpv-specific quantitative target ("drops from 1039/1440 → within transient-startup floor of gst→waylandsink") on a single test invocation. By the time [[ohm_gl_fix:phase3_revised_2026-05-01|Phase 3 revised]] landed, the campaign had been reframed twice (Markus 2026-04-30: "drops post warmup, not drops total"; Markus 2026-04-30 evening: "I do not seek to optimize mpv. I seek to identify the structural gap"). This refinement folds those corrections plus the empirical evidence from Phase 3 revised into a single load-bearing Phase 1.
-[[ohm_gl_fix:phase1_2026-04-30]]. The original locked an mpv-specific +
-quantitative target ("drops from 1039/1440 → within transient-startup +
-floor of gst→waylandsink") on a single test invocation. By the time +
-[[ohm_gl_fix:phase3_revised_2026-05-01|Phase 3 revised]] landed, the +
-campaign had been reframed twice (Markus 2026-04-30: "drops post warmup, +
-not drops total"; Markus 2026-04-30 evening: "I do not seek to optimize +
-mpv. I seek to identify the structural gap"). This refinement folds those +
-corrections plus the empirical evidence from Phase 3 revised into a single +
-load-bearing Phase 1.+
  
-The original Phase 1 page stands as audit trail; Phase 3 revised + this +The original Phase 1 page stands as audit trail; Phase 3 revised + this Phase 1 revised are the live driver going forward.
-Phase 1 revised are the live driver going forward.+
  
 ===== 1. Goal (essence) ===== ===== 1. Goal (essence) =====
  
-> Buffer-to-display achieves zero-copy for libavcodec / libva consumers +> Buffer-to-display achieves zero-copy for libavcodec / libva consumers on Mali-G52 + KWin Wayland, such that in-scope workloads run with the same memory-subsystem pressure profile as the GStreamer + ''linux-dmabuf-v1'' reference path.
-on Mali-G52 + KWin Wayland, such that in-scope workloads run with the +
-same memory-subsystem pressure profile as the GStreamer + +
-''linux-dmabuf-v1'' reference path.+
  
-"Same memory-subsystem pressure profile" is what makes the goal +"Same memory-subsystem pressure profile" is what makes the goal measurable below. The reference path is named in §3.
-measurable below. The reference path is named in §3.+
  
 ===== 2. In-scope use cases ===== ===== 2. In-scope use cases =====
  
-  * **YouTube / HTML5 ''<video>'' in Brave** — the highest-traffic +  * **YouTube / HTML5 ''<video>'' in Brave** — the highest-traffic video-decode workload on this device class. 
-    video-decode workload on this device class. +  * **Web browsing in Brave** — compositor-side video + animation surfaces; same Chromium GPU-process pipeline as YouTube. 
-  * **Web browsing in Brave** — compositor-side video + animation +  * **VS Code** (Electron + Chromium under the hood) — same pipeline as Brave for any embedded video / animation rendering.
-    surfaces; same Chromium GPU-process pipeline as YouTube. +
-  * **VS Code** (Electron + Chromium under the hood) — same +
-    pipeline as Brave for any embedded video / animation rendering.+
  
-Workloads outside this list are not the campaign's subject. This is +Workloads outside this list are not the campaign's subject. This is deliberate scope-tightening — see §6.
-deliberate scope-tightening — see §6.+
  
 ===== 3. Reference baseline (zero-copy benchmark) ===== ===== 3. Reference baseline (zero-copy benchmark) =====
  
-''gst-launch-1.0 -q filesrc location=bbb_1080p30_h264.mp4 \! qtdemux +''gst-launch-1.0 -q filesrc location=bbb_1080p30_h264.mp4 \! qtdemux \! h264parse \! v4l2slh264dec \! waylandsink sync=true'' on //ohm// (scenario S1 in [[ohm_gl_fix:phase3_revised_2026-05-01|Phase 3 revised]]). Empirical numbers, 60 s steady-state, current stack:
-\! h264parse \! v4l2slh264dec \! waylandsink sync=true'' on //ohm// +
-(scenario S1 in [[ohm_gl_fix:phase3_revised_2026-05-01|Phase 3 +
-revised]]). Empirical numbers, 60 s steady-state, current stack:+
  
 ^ Metric              ^ S1 (zero-copy reference) ^ ^ Metric              ^ S1 (zero-copy reference) ^
Line 53: Line 32:
 | post-warmup drops   | 0 (fourier 2026-04-24); ~0.3/s today (stack drift, see §6) | | post-warmup drops   | 0 (fourier 2026-04-24); ~0.3/s today (stack drift, see §6) |
  
-This is the empirical operating point a successful campaign aims to +This is the empirical operating point a successful campaign aims to reach for libavcodec / libva consumers on the in-scope workloads.
-reach for libavcodec / libva consumers on the in-scope workloads.+
  
 ===== 4. Measurable success criteria (all must hold) ===== ===== 4. Measurable success criteria (all must hold) =====
  
-Measurements taken on the in-scope workload (e.g. Brave + bbb-class +Measurements taken on the in-scope workload (e.g. Brave + bbb-class H.264 video over a 60 s steady-state window, with strace + perf-stat instrumentation per [[ohm_gl_fix:phase3_revised_2026-05-01]] §3).
-H.264 video over a 60 s steady-state window, with strace + perf-stat +
-instrumentation per [[ohm_gl_fix:phase3_revised_2026-05-01]] §3).+
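
The strace half of that instrumentation can be reduced to per-ioctl counts and a DRM ioctl rate. The sketch below is one way to do it, not the campaign's actual tooling. It assumes strace decodes ioctl request names (''DRM_IOCTL_*'', ''VIDIOC_*'') in its output; the function names and the sample log lines are invented for illustration.

```python
import re

# Matches the request name in strace lines such as:
#   [pid 4242] ioctl(7, DRM_IOCTL_MODE_ADDFB2, 0x7ffd3000) = 0
# "[^,]*" tolerates fd annotations like 7</dev/dri/card0> (strace -y).
IOCTL_RE = re.compile(r"ioctl\(\d+[^,]*, (DRM_IOCTL_\w+|VIDIOC_\w+)")


def count_ioctls(strace_lines):
    """Count DRM and V4L2 ioctl requests by name."""
    counts = {}
    for line in strace_lines:
        match = IOCTL_RE.search(line)
        if match:
            name = match.group(1)
            counts[name] = counts.get(name, 0) + 1
    return counts


def drm_ioctl_rate(counts, window_s):
    """Average DRM_IOCTL_* calls per second over the measurement window."""
    total = sum(n for name, n in counts.items() if name.startswith("DRM_IOCTL_"))
    return total / window_s


# Hypothetical log excerpt; real logs come from the Phase 3 revised setup.
sample = [
    "[pid 4242] ioctl(7, DRM_IOCTL_PRIME_FD_TO_HANDLE, 0x7ffd1000) = 0",
    "[pid 4242] ioctl(12, VIDIOC_EXPBUF, 0x7ffd2000) = 0",
    "[pid 4242] ioctl(7</dev/dri/card0>, DRM_IOCTL_MODE_ADDFB2, 0x7ffd3000) = 0",
    "[pid 4242] read(3, 0x55d0, 4096) = 4096",
]
counts = count_ioctls(sample)
rate = drm_ioctl_rate(counts, window_s=60.0)
```

The same counts give the ''VIDIOC_EXPBUF'' and ''PRIME_FD_TO_HANDLE'' totals, so one pass over the log is enough for both the rate and the boundary counts.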
  
-  - **C1 — Drops.** Post-warmup drops ≤ **10** over 60 s. Warmup = first +  - **C1 — Drops.** Post-warmup drops ≤ **10** over 60 s. Warmup = first 10 s. Drops in warmup may be up to ~10; drops after warmup must be 0. Sanity cap on total drops across the full 60 s = 10. 
-    10 s. Drops in warmup may be up to ~10; drops after warmup must be 0. +  - **C2 — Memory-subsystem pressure.** LLC-load-misses ≤ **3 × baseline** over 10 s steady-state (i.e. ≤ ~9 M). cache-misses ≤ ~6 M as a leading indicator. 
-    Sanity cap on total drops across the full 60 s = 10. +  - **C3 — Display-path activity.** DRM_IOCTL_* per second ≤ **100**. Current libavcodec-using contenders sit at 800-1 050 per sec; target is the baseline rate (0) plus tolerance.
-  - **C2 — Memory-subsystem pressure.** LLC-load-misses ≤ **3 × +
-    baseline** over 10 s steady-state (i.e. ≤ ~9 M). cache-misses +
-    ≤ ~6 M as a leading indicator. +
-  - **C3 — Display-path activity.** DRM_IOCTL_* per second ≤ **100**. +
-    Current libavcodec-using contenders sit at 800-1 050 per sec; +
-    target is the baseline rate (0) plus tolerance.+
   - **C4 — Boundary fd-passing.** At least one of:   - **C4 — Boundary fd-passing.** At least one of:
-    * (a) ''VIDIOC_EXPBUF'' count > 0 from V4L2 hantro AND the resulting +    * (a) ''VIDIOC_EXPBUF'' count > 0 from V4L2 hantro AND the resulting fd flowing to the compositor via ''SCM_RIGHTS'' over the Wayland socket, OR 
-      fd flowing to the compositor via ''SCM_RIGHTS'' over the Wayland +    * (b) ''PRIME_FD_TO_HANDLE'' count > 0 from a V4L2-produced dmabuf flowing into the GPU process / browser compositor.
-      socket, OR +
-    * (b) ''PRIME_FD_TO_HANDLE'' count > 0 from a V4L2-produced dmabuf +
-      flowing into the GPU process / browser compositor.+
  
-C1 is the user-visible criterion ("the video plays smoothly"). C2 +C1 is the user-visible criterion ("the video plays smoothly"). C2 and C3 are the physical-layer criteria for "no CPU memcpy of frame data, no per-frame Mesa GL+DRM round-trips". C4 is the structural criterion ("the path actually exists, not just is fast").
-and C3 are the physical-layer criteria for "no CPU memcpy of frame data, +
-no per-frame Mesa GL+DRM round-trips". C4 is the structural criterion +
-("the path actually exists, not just is fast").+
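
The four criteria above can be written as a single check. The sketch below is an illustration only. It reads C1 in its stricter form (zero drops after warmup, at most 10 drops in total), which is an assumption about how the C1 wording is meant. The ''Measurement'' type, its field names and the example values are hypothetical; the thresholds are the ones stated in the list.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    warmup_drops: int          # drops in the first 10 s
    post_warmup_drops: int     # drops in the remaining 50 s
    llc_load_misses: float     # over 10 s steady state
    drm_ioctl_per_sec: float
    boundary_fd_passed: bool   # C4 (a) or (b) observed


TOTAL_DROP_CAP = 10      # C1 sanity cap across the full 60 s
LLC_LIMIT = 9e6          # C2: 3 x baseline, i.e. about 9 M
DRM_IOCTL_LIMIT = 100    # C3: DRM_IOCTL_* per second


def evaluate(m):
    """Return a pass/fail flag per criterion; all four must hold."""
    total_drops = m.warmup_drops + m.post_warmup_drops
    return {
        "C1": m.post_warmup_drops == 0 and total_drops <= TOTAL_DROP_CAP,
        "C2": m.llc_load_misses <= LLC_LIMIT,
        "C3": m.drm_ioctl_per_sec <= DRM_IOCTL_LIMIT,
        "C4": m.boundary_fd_passed,
    }


# Made-up numbers: smooth playback, but the display path is still busy.
example = Measurement(4, 0, 5.2e6, 900.0, True)
result = evaluate(example)
success = all(result.values())
```

Returning a flag per criterion rather than one boolean keeps the result usable for the loopback edges in §5, which depend on which criterion failed.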
  
 ===== 5. Loopback edges ===== ===== 5. Loopback edges =====
  
-  * **C1 ✓ + C2 ✗ + C3 ✓** → not possible without something else +  * **C1 ✓ + C2 ✗ + C3 ✓** → not possible without something else creating cache pressure; flag for re-investigation. 
-    creating cache pressure; flag for re-investigation. +  * **C1 ✓ + C2 ✓ + C3 ✗** → Level-1 zero-copy fixed, Level-2 still missing (decoder produces dmabuf but display path goes through Mesa GL). **Re-enter Phase 4 fix-surface choice.** 
-  * **C1 ✓ + C2 ✓ + C3 ✗** → Level-1 zero-copy fixed, Level-2 still +  * **C1 ✗** at Phase 7 verification → re-enter Phase 4 with new perf evidence per [[ohm_gl_fix:phase4_2026-04-30|Phase 4]] §6 loopback condition. 
-    missing (decoder produces dmabuf but display path goes through +  * **C4 ✗ but C1 ✓ + C2 ✓ + C3 ✓** → the path is not what we expected (e.g. compositor copies despite client passing fd; or a hidden zero-copy mechanism exists we haven't characterised) — treat as a measurement-classification problem, surface to Phase 5 review.
-    Mesa GL). **Re-enter Phase 4 fix-surface choice.** +
-  * **C1 ✗** at Phase 7 verification → re-enter Phase 4 with new perf +
-    evidence per [[ohm_gl_fix:phase4_2026-04-30|Phase 4]] §6 +
-    loopback condition. +
-  * **C4 ✗ but C1 ✓ + C2 ✓ + C3 ✓** → the path is not what we +
-    expected (e.g. compositor copies despite client passing fd; or +
-    a hidden zero-copy mechanism exists we haven't characterised) — +
-    treat as a measurement-classification problem, surface to Phase 5 +
-    review.+
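
The edges above amount to a small decision table. A minimal sketch of that table follows; the function name and the return strings are illustrative, and combinations the list does not mention are reported as uncovered rather than guessed. The C1 ✗ edge applies at Phase 7 verification.

```python
def loopback_edge(c1, c2, c3, c4):
    """Map the C1..C4 pass/fail flags to the loopback action from section 5."""
    if not c1:
        return "re-enter Phase 4 with new perf evidence"
    if not c2 and c3:
        return "flag for re-investigation"
    if c2 and not c3:
        return "re-enter Phase 4 fix-surface choice"
    if c2 and c3 and not c4:
        return "surface to Phase 5 review"
    if c2 and c3 and c4:
        return "all criteria hold"
    # C1 pass with both C2 and C3 failing is not listed in section 5.
    return "not covered by section 5"


action = loopback_edge(c1=True, c2=True, c3=False, c4=True)
```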
  
 ===== 6. What this Phase 1 deliberately does NOT lock ===== ===== 6. What this Phase 1 deliberately does NOT lock =====
  
-  * **A specific patch site.** That's [[ohm_gl_fix:phase4_2026-04-30| +  * **A specific patch site.** That's [[ohm_gl_fix:phase4_2026-04-30| Phase 4]]'s job; this Phase 1 only sets the success criteria the patch must meet. 
-    Phase 4]]'s job; this Phase 1 only sets the success criteria the +  * **An absolute CPU% target.** cycles/cache-misses are the binding physical-layer metrics; CPU% is a leading indicator that varies with kernel scheduling decisions and is not directly comparable across single- vs multi-process pipelines (Brave's CPU% is spread across renderer + GPU process; mpv's is one process). 
-    patch must meet. +  * **Out-of-scope workload performance.** 3D games, Proton/DXVK, general-purpose Vulkan applications may regress arbitrarily — for example, a ''panvk-1.2-fakeshim'' (Phase 4 §6 row C2) would crash Vulkan workloads other than the in-scope video presentation path. That is the explicit trade. 
-  * **An absolute CPU% target.** cycles/cache-misses are the binding +  * **The S5 regression (gst-launch waylandsink ~0.3 drops/sec on today's stack)** — separate iteration. Stack drift between fourier 2026-04-24 (0/62) and ohm_gl_fix 2026-04-30 (~16/60). Likely candidates within marfrit-packages' custom mesa / ffmpeg / alsa / libdrm-pinebookpro builds (per Markus 2026-04-30). A separate ohm-gl-fix-companion campaign would bisect via ''pacman.log''.
-    physical-layer metrics; CPU% is a leading indicator that varies +  * **Per-application HW-decode engagement.** Different consumers may take different fix surfaces (e.g. browser via libva-v4l2-request multiplanar — fix surface A; mpv via libavcodec drm_prime → linux-dmabuf-v1 — fix surface B). Phase 1 does not pre-select which.
-    with kernel scheduling decisions and is not directly comparable +
-    across single- vs multi-process pipelines (Brave's CPU% is +
-    spread across renderer + GPU process; mpv's is one process). +
-  * **Out-of-scope workload performance.** 3D games, Proton/DXVK, +
-    general-purpose Vulkan applications may regress arbitrarily — for +
-    example, a ''panvk-1.2-fakeshim'' (Phase 4 §6 row C2) would +
-    crash Vulkan workloads other than the in-scope video presentation +
-    path. That is the explicit trade. +
-  * **The S5 regression (gst-launch waylandsink ~0.3 drops/sec on +
-    today's stack)** — separate iteration. Stack drift between +
-    fourier 2026-04-24 (0/62) and ohm_gl_fix 2026-04-30 (~16/60). +
-    Likely candidates within marfrit-packages' custom mesa / ffmpeg / +
-    alsa / libdrm-pinebookpro builds (per Markus 2026-04-30). A +
-    separate ohm-gl-fix-companion campaign would bisect via +
-    ''pacman.log''+
-  * **Per-application HW-decode engagement.** Different consumers may +
-    take different fix surfaces (e.g. browser via libva-v4l2-request +
-    multiplanar — fix surface A; mpv via libavcodec drm_prime → +
-    linux-dmabuf-v1 — fix surface B). Phase 1 does not pre-select +
-    which.+
  
 ===== 7. Differences from the original Phase 1 (2026-04-30) ===== ===== 7. Differences from the original Phase 1 (2026-04-30) =====
Line 138: Line 73:
 ===== 8. Live data anchors ===== ===== 8. Live data anchors =====
  
-The CSV schema captures these criteria as machine-readable rows. As of +The CSV schema captures these criteria as machine-readable rows. As of 2026-05-01:
-2026-05-01:+
  
-  * ''metrics.csv'' — extended with ''llc_load_misses'', +  * ''metrics.csv'' — extended with ''llc_load_misses'', ''drm_ioctl_per_sec'', ''boundary_fd_passed'' columns and refined ''phase1r_*'' rows. 
-    ''drm_ioctl_per_sec'', ''boundary_fd_passed'' columns and refined +  * ''phase3/io_cache_2026-05-01/boundary_counts.csv'' — per-scenario EXPBUF/DQBUF/PRIME_*/SCM_RIGHTS/anon-mmap counts. 
-    ''phase1r_*'' rows. +  * ''phase3/io_cache_2026-05-01/perfstat.csv'' — per-scenario cache-misses, LLC-load-misses, cycles, instructions, IPC.
-  * ''phase3/io_cache_2026-05-01/boundary_counts.csv'' — per-scenario +
-    EXPBUF/DQBUF/PRIME_*/SCM_RIGHTS/anon-mmap counts. +
-  * ''phase3/io_cache_2026-05-01/perfstat.csv'' — per-scenario +
-    cache-misses, LLC-load-misses, cycles, instructions, IPC.+
  
-C1 / C2 / C3 / C4 are computable from these CSVs at any future +C1 / C2 / C3 / C4 are computable from these CSVs at any future measurement point.
-measurement point.+
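
As a rough illustration of how these CSVs could be consumed, the sketch below loads rows by name prefix and checks the C2 and C3 limits. The column names ''llc_load_misses'', ''drm_ioctl_per_sec'' and ''boundary_fd_passed'' are the ones listed above. The ''row'' column, the 0/1 encoding of ''boundary_fd_passed'', the row names and all values are assumptions made up for the example, since the full schema is not shown here.

```python
import csv
import io

# Hypothetical excerpt in the spirit of metrics.csv; values are invented.
SAMPLE = """row,llc_load_misses,drm_ioctl_per_sec,boundary_fd_passed
phase1r_example_a,2900000,0,1
phase1r_example_b,14000000,950,0
"""


def load_rows(text, prefix):
    """Parse CSV text and keep rows whose name starts with the given prefix."""
    rows = {}
    for rec in csv.DictReader(io.StringIO(text)):
        if rec["row"].startswith(prefix):
            rows[rec["row"]] = {
                "llc_load_misses": float(rec["llc_load_misses"]),
                "drm_ioctl_per_sec": float(rec["drm_ioctl_per_sec"]),
                "boundary_fd_passed": rec["boundary_fd_passed"] == "1",
            }
    return rows


def within_limits(row, llc_limit=9e6, drm_limit=100):
    """C2 and C3 thresholds from section 4 applied to one parsed row."""
    return row["llc_load_misses"] <= llc_limit and row["drm_ioctl_per_sec"] <= drm_limit


rows = load_rows(SAMPLE, "phase1r_")
verdicts = {name: within_limits(row) for name, row in rows.items()}
```

Filtering by prefix means the same loader can be pointed at ''phase7_verify_*'' rows once they exist, without changing the threshold logic.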
  
 ===== 9. References ===== ===== 9. References =====
  
-  * [[ohm_gl_fix:phase3_revised_2026-05-01|Phase 3 revised]] — empirical +  * [[ohm_gl_fix:phase3_revised_2026-05-01|Phase 3 revised]] — empirical bucket-attribution + boundary characterisation across six contenders. 
-    bucket-attribution + boundary characterisation across six contenders. +  * [[ohm_gl_fix:phase4_2026-04-30|Phase 4]] — the gap, fix-surface options, ranked. This Phase 1 refines the success criteria against which Phase 4's chosen fix surface will be measured. 
-  * [[ohm_gl_fix:phase4_2026-04-30|Phase 4]] — the gap, fix-surface +  * [[ohm_gl_fix:phase2_2026-04-30|Phase 2]] — substrate (versions, V4L2 9-fd buffer pool, panfrost capability surface). 
-    options, ranked. This Phase 1 refines the success criteria +  * [[ohm_gl_fix:phase1_2026-04-30|Original Phase 1]] — superseded by this page; preserved for audit trail.
-    against which Phase 4's chosen fix surface will be measured. +
-  * [[ohm_gl_fix:phase2_2026-04-30|Phase 2]] — substrate (versions, +
-    V4L2 9-fd buffer pool, panfrost capability surface). +
-  * [[ohm_gl_fix:phase1_2026-04-30|Original Phase 1]] — superseded by +
-    this page; preserved for audit trail.+
   * [[ohm_gl_fix:start|Namespace landing]].   * [[ohm_gl_fix:start|Namespace landing]].
  
 ---- ----
  
-//Phase 1 (revised) ends here. Phase 4 fix-surface choice + Phase 6 +//Phase 1 (revised) ends here. Phase 4 fix-surface choice + Phase 6 implementation will deliver against C1-C4. Phase 7 verification re-runs the strace + perf-stat instrumentation from Phase 3 revised on the in-scope workload and writes the resulting numbers to ''metrics.csv'' as ''phase7_verify_*'' rows.//
-implementation will deliver against C1-C4. Phase 7 verification re-runs +
-the strace + perf-stat instrumentation from Phase 3 revised on the +
-in-scope workload and writes the resulting numbers to ''metrics.csv'' +
-as ''phase7_verify_*'' rows.//+
  
ohm_gl_fix/phase1_revised_2026-05-01.1777634592.txt.gz · Last modified: by markus_fritsche