ohm_gl_fix:phase2_2026-04-30

Differences

This shows you the differences between two versions of the page.

ohm_gl_fix:phase2_2026-04-30 [2026/04/30 17:56] – [1. Mesa version + panfrost build flags] markus_fritsche
ohm_gl_fix:phase2_2026-04-30 [2026/05/01 13:08] (current) – rewrap paragraphs (DokuWiki single-newline fix) markus_fritsche
Line 1: Line 1:
 ====== Phase 2 — Situation Analysis (ohm_gl_fix iteration 1) ====== ====== Phase 2 — Situation Analysis (ohm_gl_fix iteration 1) ======
  
-Substrate enumeration on ''ohm'' (PineTab2, RK3566, Mali-G52 MP2, +Substrate enumeration on ''ohm'' (PineTab2, RK3566, Mali-G52 MP2, hantro-vpu, kernel ''6.19.10-danctnix1-1-pinetab2''), 2026-04-30.
-hantro-vpu, kernel ''6.19.10-danctnix1-1-pinetab2''), 2026-04-30.+
  
-Phase 1 lock: [[ohm_gl_fix:phase1_2026-04-30|Phase 1 goal formulation, 2026-04-30]]. +Phase 1 lock: [[ohm_gl_fix:phase1_2026-04-30|Phase 1 goal formulation, 2026-04-30]]. Goal target: on ''bbb_1080p30_h264.mp4'' with ''mpv --hwdec=v4l2request --vo=gpu-next'' over a 60 s steady-state window, frame drops fall from the baseline 1039/1440 (72 %) into the ''gst v4l2slh264dec → waylandsink'' transient-startup floor; equivalently, EGLImage-import ceases to be the binding constraint.
-Goal target: on ''bbb_1080p30_h264.mp4'' with +
-''mpv --hwdec=v4l2request --vo=gpu-next'' over a 60 s steady-state window, +
-frame drops fall from the baseline 1039/1440 (72 %) into the +
-''gst v4l2slh264dec → waylandsink'' transient-startup floor; equivalently, +
-EGLImage-import ceases to be the binding constraint.+
  
-This page is descriptive, not prescriptive. No plan here; the plan is +This page is descriptive, not prescriptive. No plan here; the plan is Phase 4. If something below turns out to be the wrong thing to measure once Phase 3 baselines run, that's a Phase 3→1 loopback, not a Phase 2 revision.
-Phase 4. If something below turns out to be the wrong thing to measure +
-once Phase 3 baselines run, that's a Phase 3→1 loopback, not a Phase 2 +
-revision.+
  
 ---- ----
Line 33: Line 24:
 ===== 2. EGL/GLES extensions advertised ===== ===== 2. EGL/GLES extensions advertised =====
  
-EGL client extensions: +EGL client extensions: ''EGL_EXT_client_extensions, EGL_EXT_device_*, EGL_EXT_explicit_device, EGL_EXT_platform_base, EGL_EXT_platform_{wayland,x11,xcb,device}, EGL_KHR_debug, EGL_KHR_platform_{gbm,wayland,x11}, EGL_MESA_platform_{gbm,surfaceless}''.
-''EGL_EXT_client_extensions, EGL_EXT_device_*, EGL_EXT_explicit_device, +
-EGL_EXT_platform_base, EGL_EXT_platform_{wayland,x11,xcb,device}, +
-EGL_KHR_debug, EGL_KHR_platform_{gbm,wayland,x11}, +
-EGL_MESA_platform_{gbm,surfaceless}''.+
  
-EGL display extensions (panfrost, both GBM and Wayland) — selecting the +EGL display extensions (panfrost, both GBM and Wayland) — selecting the ones load-bearing for this campaign:
-ones load-bearing for this campaign:+
  
   * **''EGL_EXT_image_dma_buf_import''** ✓ — base import path.   * **''EGL_EXT_image_dma_buf_import''** ✓ — base import path.
-  * **''EGL_EXT_image_dma_buf_import_modifiers''** ✓ — needed for any +  * **''EGL_EXT_image_dma_buf_import_modifiers''** ✓ — needed for any Rockchip-side AFBC / linear-tile NV12 modifier handling.
-    Rockchip-side AFBC / linear-tile NV12 modifier handling.+
   * **''EGL_KHR_image_base''**, ''EGL_KHR_image'' ✓.   * **''EGL_KHR_image_base''**, ''EGL_KHR_image'' ✓.
-  * **''EGL_MESA_image_dma_buf_export''** ✓ — bidirectional, of interest +  * **''EGL_MESA_image_dma_buf_export''** ✓ — bidirectional, of interest later for Mali → KWin handoff scenarios. 
-    later for Mali → KWin handoff scenarios. +  * **''EGL_KHR_fence_sync''**, **''EGL_KHR_wait_sync''** ✓ — explicit fencing, prerequisite for not blocking the GL queue on each new import. 
-  * **''EGL_KHR_fence_sync''**, **''EGL_KHR_wait_sync''** ✓ — explicit fencing, +  * ''EGL_ANDROID_native_fence_sync'' ✓ — sync-fd interop with V4L2 out-fences.
-    prerequisite for not blocking the GL queue on each new import. +
-  * ''EGL_ANDROID_native_fence_sync'' ✓ — sync-fd interop with V4L2 +
-    out-fences.+
   * ''EGL_KHR_partial_update'' ✓.   * ''EGL_KHR_partial_update'' ✓.
-  * Context flags: ''EGL_KHR_no_config_context'', ''EGL_KHR_create_context'', +  * Context flags: ''EGL_KHR_no_config_context'', ''EGL_KHR_create_context'', ''EGL_KHR_create_context_no_error'', ''EGL_IMG_context_priority''.
-    ''EGL_KHR_create_context_no_error'', ''EGL_IMG_context_priority''.+
  
-What ''eglinfo'' does **not** report, and what we therefore still need +What ''eglinfo'' does **not** report, and what we therefore still need before Phase 4 (so this is a Phase 3 baseline action, not a Phase 2 hole): the per-format modifier list returned by ''eglQueryDmaBufModifiersEXT(DRM_FORMAT_NV12)'' and the ''external_only'' flag for each modifier. That's the data point that decides whether a kernel-allocated dmabuf can flow into a regular 2D sampler in libplacebo's GL path or only into ''samplerExternalOES''.
-before Phase 4 (so this is a Phase 3 baseline action, not a Phase 2 +
-hole): the per-format modifier list returned by +
-''eglQueryDmaBufModifiersEXT(DRM_FORMAT_NV12)'' and the +
-''external_only'' flag for each modifier. That's the data point that +
-decides whether a kernel-allocated dmabuf can flow into a regular 2D +
-sampler in libplacebo's GL path or only into ''samplerExternalOES''.+
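The decision that this data point feeds can be sketched as a simple lookup. The Python below is a minimal illustration under stated assumptions, not libplacebo or Mesa code: ''ModifierEntry'' and ''sampler_for'' are hypothetical names, and the entries passed in stand for whatever ''eglQueryDmaBufModifiersEXT'' reports in Phase 3. No values here are measured.

```python
from typing import Iterable, NamedTuple, Optional


class ModifierEntry(NamedTuple):
    """One (modifier, external_only) pair as the EGL query would report it."""
    modifier: int        # 64-bit DRM format modifier
    external_only: bool  # True: sampleable only through samplerExternalOES


DRM_FORMAT_MOD_LINEAR = 0  # value defined in drm_fourcc.h


def sampler_for(frame_modifier: int,
                entries: Iterable[ModifierEntry]) -> Optional[str]:
    """Return the GLSL sampler type usable for a frame with this modifier.

    Returns None when the modifier is not in the advertised list at all.
    """
    for entry in entries:
        if entry.modifier == frame_modifier:
            return "samplerExternalOES" if entry.external_only else "sampler2D"
    return None


# Placeholder table for illustration only; the real list comes from Phase 3.
example = [ModifierEntry(DRM_FORMAT_MOD_LINEAR, external_only=True)]
print(sampler_for(DRM_FORMAT_MOD_LINEAR, example))
```

A ''None'' result would mean the frame's modifier is not advertised for NV12 at all, which is a separate finding from the ''external_only'' question.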
  
-GL ES extensions are advertised at GLES 3.1 level — full extension +GL ES extensions are advertised at GLES 3.1 level — full extension string lives in ''eglinfo'''s GBM block; nothing surprising for a Mali-G52 panfrost build. ''es2_info'' cannot run from an SSH session without a display; the Wayland ''eglinfo'' GLES profile already covers the same ground.
-string lives in ''eglinfo'''s GBM block; nothing surprising for a Mali-G52 +
-panfrost build. ''es2_info'' cannot run from an SSH session without a +
-display; the Wayland ''eglinfo'' GLES profile already covers the same +
-ground.+
  
 ===== 3. V4L2 buffer-pool size on the hantro path ===== ===== 3. V4L2 buffer-pool size on the hantro path =====
  
-Probe: ''strace -f -e ioctl ffmpeg -hwaccel v4l2request +Probe: ''strace -f -e ioctl ffmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime -i bbb_1080p30_h264.mp4 -frames:v 60 -f null -'' against ''/dev/video1'' (hantro-vpu, mainline ''rockchip,rk3568-vpu-dec'').
--hwaccel_output_format drm_prime -i bbb_1080p30_h264.mp4 -frames:v 60 +
--f null -'' against ''/dev/video1'' (hantro-vpu, mainline +
-''rockchip,rk3568-vpu-dec'').+
  
 Result: Result:
  
-  * **Capture (decoded-frame) ring: 9 distinct dmabuf fds** — indices 0..8, +  * **Capture (decoded-frame) ring: 9 distinct dmabuf fds** — indices 0..8, each created by a separate ''VIDIOC_CREATE_BUFS count=1'' call (the v4l2-request hwaccel grows the pool one buffer at a time, never calls ''VIDIOC_REQBUFS'' with a target count) and exported once via ''VIDIOC_EXPBUF'' with ''O_RDONLY''
-    each created by a separate ''VIDIOC_CREATE_BUFS count=1'' call (the +  * Format: **NV12 single-plane**, 1920×1088 (height aligned up from 1080 to the next 16-pixel macroblock boundary), ''sizeimage=3,655,712'' B, ''bytesperline=1920''. 
-    v4l2-request hwaccel grows the pool one buffer at a time, never calls +  * Output (bitstream) ring: 4 buffers of ''V4L2_PIX_FMT_H264_SLICE'' ('S264'), 1920×1088 with ''sizeimage=3,133,440'' B. Not relevant to GL import cost — included for completeness. 
-    ''VIDIOC_REQBUFS'' with a target count) and exported once via +  * Reuse density in the 60-frame headless trace: **462 DQBUF events** across the 9 capture fds, i.e. each fd dequeued ≈51 times in 60 decoded frames. This is steady-state frame churn, not capture-buffer rotation — the hantro-VPU DPB just holds the frames longer than mpv's consumer would.
-    ''VIDIOC_EXPBUF'' with ''O_RDONLY''. +
-  * Format: **NV12 single-plane**, 1920×1088 (height aligned up from 1080 +
-    to the next 16-pixel macroblock boundary), ''sizeimage=3,655,712'' B, ''bytesperline=1920''. +
-  * Output (bitstream) ring: 4 buffers of ''V4L2_PIX_FMT_H264_SLICE'' +
-    ('S264'), 1920×1088 with ''sizeimage=3,133,440'' B. Not relevant to GL +
-    import cost — included for completeness. +
-  * Reuse density in the 60-frame headless trace: **462 DQBUF events** +
-    across the 9 capture fds, i.e. each fd dequeued ≈51 times in 60 +
-    decoded frames. This is steady-state frame churn, not capture-buffer +
-    rotation — the hantro-VPU DPB just holds the frames longer than mpv's +
-    consumer would.+
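The geometry and reuse figures in this list can be cross-checked with a few lines of arithmetic. This is a sketch, assuming plain NV12 (full-resolution luma plus half-size interleaved chroma) and 16-pixel height alignment; ''align_up'' and ''nv12_size'' are illustrative helper names, not driver functions.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def nv12_size(width: int, height: int, alignment: int = 16) -> tuple[int, int]:
    """Return (aligned_height, bytes) for an NV12 frame.

    NV12 stores one byte per luma sample and a half-size interleaved
    CbCr plane, i.e. 1.5 bytes per pixel in total.
    """
    aligned_height = align_up(height, alignment)
    luma = width * aligned_height
    chroma = luma // 2
    return aligned_height, luma + chroma


aligned_height, size = nv12_size(1920, 1080)
print(aligned_height, size)   # 1088 3133440
print(round(462 / 9, 1))      # 51.3 dequeues per capture fd in the trace
```

The computed 3,133,440 B equals the ''sizeimage'' reported for the output ring. The capture ring's ''sizeimage=3,655,712'' is 522,272 B larger than plain NV12 at 1920×1088; this sketch does not account for that difference.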
  
-**Implication for the campaign.** A 60 s / 1440-frame run cycles each of +**Implication for the campaign.** A 60 s / 1440-frame run cycles each of the 9 dmabuf fds **≈160 times**. The current gpu-next path treats every DQBUF as a new fd-into-EGLImage import; a cache keyed on fd identity collapses 1440 imports into 9. This is the upper bound on what fd-identity caching can save on this clip — useful as the Phase 3 prediction sanity-check.
-the 9 dmabuf fds **≈160 times**. The current gpu-next path treats every +
-DQBUF as a new fd-into-EGLImage import; a cache keyed on fd identity +
-collapses 1440 imports into 9. This is the upper bound on what +
-fd-identity caching can save on this clip — useful as the Phase 3 +
-prediction sanity-check.+
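The upper bound stated above can be reproduced with a small model. The sketch below assumes a strict round-robin over the 9 capture fds and that each fd stays open for the whole run (consistent with the single ''VIDIOC_EXPBUF'' per buffer seen in the trace); ''count_imports'' is a hypothetical helper, not mpv or libplacebo code.

```python
def count_imports(fd_sequence, cache: bool) -> int:
    """Count EGLImage imports for a sequence of dequeued dmabuf fds.

    Without a cache every dequeue triggers an import; with a cache keyed
    on fd identity only the first sighting of each fd does.
    """
    seen = set()
    imports = 0
    for fd in fd_sequence:
        if cache and fd in seen:
            continue  # cache hit: reuse the existing EGLImage
        imports += 1
        seen.add(fd)
    return imports


FRAMES = 1440  # 60 s run from the Phase 1 baseline
RING = 9       # capture ring size from the strace probe

# Assumption: buffers are handed out round-robin; the real order may differ,
# but the cached import count depends only on the number of distinct fds.
sequence = [frame % RING for frame in range(FRAMES)]

print(count_imports(sequence, cache=False))  # 1440
print(count_imports(sequence, cache=True))   # 9
print(FRAMES // RING)                        # 160 reuses per fd
```

A real cache would also have to invalidate an entry when its fd is closed and the number is reused for a different buffer; the model ignores that case.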
  
 ===== 4. mpv + libplacebo + which hwdec gpu-next loads ===== ===== 4. mpv + libplacebo + which hwdec gpu-next loads =====
Line 106: Line 59:
   * ''mpv 1:0.41.0-3'', built 2026-02-14.   * ''mpv 1:0.41.0-3'', built 2026-02-14.
     * ''libplacebo 7.360.1-1''.     * ''libplacebo 7.360.1-1''.
-    * ''ffmpeg n8.0.1'' (built) / ''n8.1'' (runtime). libavcodec 62.11.100 / +    * ''ffmpeg n8.0.1'' (built) / ''n8.1'' (runtime). libavcodec 62.11.100 / 62.28.100. The runtime drift is from the marfrit ''ffmpeg-v4l2-request-git'' install rebased onto 8.1. 
-      62.28.100. The runtime drift is from the marfrit +  * ''mpv --hwdec=help'' lists the **''v4l2request''** family for h264, hevc, mpeg2video, vp8, vp9, av1; same set under ''v4l2request-copy'' (CPU readback variant). No standalone ''drmprime'' entry — that's an mpv-internal label for the ''--vo=gpu-next'' path that consumes ''AV_PIX_FMT_DRM_PRIME'' frames from any hwaccel, not a ''--hwdec='' selectable. 
-      ''ffmpeg-v4l2-request-git'' install rebased onto 8.1. +  * Active configuration for the campaign baseline: ''mpv --hwdec=v4l2request --vo=gpu-next bbb_1080p30_h264.mp4''. Path: hantro-vpu produces NV12 dmabuf → ffmpeg wraps as ''AVDRMFrameDescriptor'' → mpv's gpu-next path hands the descriptor to libplacebo → libplacebo calls ''eglCreateImageKHR(EGL_LINUX_DMA_BUF_EXT)'' per frame → samples into a GL texture → composites → KWin presents. The Phase 3 baseline (1039 drops / 1440, 138 % CPU) is on this exact invocation.
-  * ''mpv --hwdec=help'' lists the **''v4l2request''** family for h264, hevc, +
-    mpeg2video, vp8, vp9, av1; same set under ''v4l2request-copy'' (CPU +
-    readback variant). No standalone ''drmprime'' entry — that's an +
-    mpv-internal label for the ''--vo=gpu-next'' path that consumes +
-    ''AV_PIX_FMT_DRM_PRIME'' frames from any hwaccel, not a ''--hwdec='' +
-    selectable. +
-  * Active configuration for the campaign baseline: +
-    ''mpv --hwdec=v4l2request --vo=gpu-next bbb_1080p30_h264.mp4''. Path: +
-    hantro-vpu produces NV12 dmabuf → ffmpeg wraps as +
-    ''AVDRMFrameDescriptor'' → mpv's gpu-next path hands the descriptor to +
-    libplacebo → libplacebo calls ''eglCreateImageKHR(EGL_LINUX_DMA_BUF_EXT)'' +
-    per frame → samples into a GL texture → composites → KWin presents. +
-    The Phase 3 baseline (1039 drops / 1440, 138 % CPU) is on this exact +
-    invocation.+
  
 ===== 5. KWin / kwin_wayland version + startup GL spam ===== ===== 5. KWin / kwin_wayland version + startup GL spam =====
Line 128: Line 67:
   * ''kwin 1:6.6.4-1'', ''plasma-workspace 6.6.4-1'', ''plasma-desktop 6.6.4-1''.   * ''kwin 1:6.6.4-1'', ''plasma-workspace 6.6.4-1'', ''plasma-desktop 6.6.4-1''.
   * ''kwin_wayland --version'' → ''kwin 6.6.4''.   * ''kwin_wayland --version'' → ''kwin 6.6.4''.
-  * Live process tree: ''kwin_wayland --wayland-fd 7 --socket wayland-0 +  * Live process tree: ''kwin_wayland --wayland-fd 7 --socket wayland-0 --xwayland-fd 8 --xwayland-fd 9 --xwayland-display :0 ... --xwayland'', launched by ''startplasma-wayland'' under sddm session. 
-    --xwayland-fd 8 --xwayland-fd 9 --xwayland-display :0 ... --xwayland'', +  * The ''GL_POINT_SPRITE'', ''GL_ALPHA'' ''glTexSubImage2D'' cascade observed in the 2026-04-30 startup logs is **labelled background noise**: it's KWin's compatibility-profile feature probe at compositor init, fires once at session start, and is not on the per-frame video critical path. Phase 3 instrumentation must filter or clearly attribute these events so they don't pollute per-frame import attribution.
-    launched by ''startplasma-wayland'' under sddm session. +
-  * The ''GL_POINT_SPRITE'', ''GL_ALPHA'' ''glTexSubImage2D'' cascade observed +
-    in the 2026-04-30 startup logs is **labelled background noise**: it's +
-    KWin's compatibility-profile feature probe at compositor init, fires +
-    once at session start, and is not on the per-frame video critical +
-    path. Phase 3 instrumentation must filter or clearly attribute these +
-    events so they don't pollute per-frame import attribution.+
  
 ===== 6. Known failure modes — consolidated ===== ===== 6. Known failure modes — consolidated =====
  
-Inherited from the ohm_gl_fix README and the fourier Phase 5 page. +Inherited from the ohm_gl_fix README and the fourier Phase 5 page. Listed here so they don't get re-discovered later.
-Listed here so they don't get re-discovered later.+
  
-  * **R6 — Per-frame fresh EGLImage allocation (the campaign's primary +  * **R6 — Per-frame fresh EGLImage allocation (the campaign's primary lever).** mpv ''--vo=gpu-next'' re-imports each NV12 dmabuf fd into a fresh EGLImage on every decoded frame instead of caching by fd identity. Combined with the §3 result (9-fd capture ring, ~160× reuse over 60 s) this is where the cost is concentrated. 
-    lever).** mpv ''--vo=gpu-next'' re-imports each NV12 dmabuf fd into a +  * **R7 — ''--vo=dmabuf-wayland'' format-negotiation break.** mpv's hwdec → dmabuf-wayland path fails with "hardware format not supported" (''yuv420p → drm_prime'' upload fails). Would otherwise be the zero-copy answer; out of scope for ohm_gl_fix (it's the //workaround//). Its absence is the reason gpu-next is the realistic per-frame path on mpv and therefore the reason this campaign exists. 
-    fresh EGLImage on every decoded frame instead of caching by fd +  * **R-modifiers — ''external_only'' handling on panfrost.** Some Rockchip NV12 modifiers are advertised as external-image-only (sampleable only via ''samplerExternalOES'', not via regular 2D samplers). gpu-next / libplacebo's GLSL paths assume regular 2D samplers. If panfrost reports ''external_only=true'' for the kernel-allocated NV12 modifier on hantro, the import succeeds but composition either miscomposites or silently falls back to a slow path. **To be verified in Phase 3** via ''eglQueryDmaBufModifiersEXT'' against the actual fd's modifier from ''AVDRMFrameDescriptor''. 
-    identity. Combined with the §3 result (9-fd capture ring, ~160× +  * **R-compositor-bound (gotcha inherited from fourier).** "Compositor-bound ≠ decode-bound": at 138 % CPU and 72 % drops on 1080p24 with v4l2request decode, the binding constraint is not the hantro VPU. Sister-path proof: ''gst v4l2slh264dec → waylandsink'' (zero-copy dmabuf-direct) lands at 6–7 % CPU / 0 drops on identical hardware. Always re-verify which of {decode, import, composite, scanout} is binding with ''mpv --vo=null'' / ''top -H'' / ''perf top'' before attributing CPU.
-    reuse over 60 s) this is where the cost is concentrated. +
-  * **R7 — ''--vo=dmabuf-wayland'' format-negotiation break.** mpv's +
-    hwdec → dmabuf-wayland path fails with "hardware format not +
-    supported" (''yuv420p → drm_prime'' upload fails). Would otherwise be +
-    the zero-copy answer; out of scope for ohm_gl_fix (it's the +
-    //workaround//). Its absence is the reason gpu-next is the realistic +
-    per-frame path on mpv and therefore the reason this campaign exists. +
-  * **R-modifiers — ''external_only'' handling on panfrost.** Some +
-    Rockchip NV12 modifiers are advertised as +
-    external-image-only (sampleable only via ''samplerExternalOES'', not +
-    via regular 2D samplers). gpu-next / libplacebo's GLSL paths assume +
-    regular 2D samplers. If panfrost reports ''external_only=true'' for the +
-    kernel-allocated NV12 modifier on hantro, the import succeeds but +
-    composition either miscomposites or silently falls back to a slow +
-    path. **To be verified in Phase 3** via ''eglQueryDmaBufModifiersEXT'' against +
-    the actual fd's modifier from ''AVDRMFrameDescriptor''+
-  * **R-compositor-bound (gotcha inherited from fourier).** +
-    "Compositor-bound ≠ decode-bound": at 138 % CPU and 72 % drops on +
-    1080p24 with v4l2request decode, the binding constraint is not the +
-    hantro VPU. Sister-path proof: ''gst v4l2slh264dec → waylandsink'' +
-    (zero-copy dmabuf-direct) lands at 6–7 % CPU / 0 drops on identical +
-    hardware. Always re-verify which of {decode, import, composite, +
-    scanout} is binding with ''mpv --vo=null'' / ''top -H'' / ''perf top'' +
-    before attributing CPU.+
  
 ===== Success-transition lock-in ===== ===== Success-transition lock-in =====
  
-Numbers locking the journey from baseline to goal live in +Numbers locking the journey from baseline to goal live in ''metrics.csv'' (sibling file). Four rows: ''phase1_baseline'' (138 % CPU, 1039/1440 drops, 72 %), ''phase1_reference'' and ''phase1_reference_fs'' (the gst→waylandsink floor: 6–7 % CPU, 0/1488 drops), and ''phase1_goal_target'' (post-warmup drops = 0; warmup = first 10 s with ≤ 10 drops tolerated). Binding cell is ''phase1_goal_target.drops_post_warmup''; the ''drops'' cell carries the warmup sanity cap (10) so the path can't trivially satisfy "0 post-warmup" by stretching warmup. Phase 3 must decompose the baseline 1039 drops into warmup vs post-warmup before any Phase 4 prediction is made. Phase 3 and Phase 7 append rows; the ''role'' column distinguishes the metric from references. Loopback Phase 3 → Phase 1 (per dev-process) edits the goal_target row's binding cell rather than rewriting prose.
-''metrics.csv'' (sibling file). Four rows: ''phase1_baseline'' (138 % CPU, +
-1039/1440 drops, 72 %), ''phase1_reference'' and ''phase1_reference_fs'' +
-(the gst→waylandsink floor: 6–7 % CPU, 0/1488 drops), and +
-''phase1_goal_target'' (post-warmup drops = 0; warmup = first 10 s with +
-≤ 10 drops tolerated). Binding cell is +
-''phase1_goal_target.drops_post_warmup''; the ''drops'' cell carries the +
-warmup sanity cap (10) so the path can't trivially satisfy +
-"0 post-warmup" by stretching warmup. Phase 3 must decompose the +
-baseline 1039 drops into warmup vs post-warmup before any Phase 4 +
-prediction is made. Phase 3 and Phase 7 append rows; the ''role'' column +
-distinguishes the metric from references. Loopback Phase 3 → Phase 1 +
-(per dev-process) edits the goal_target row's binding cell rather +
-than rewriting prose.+
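The goal-target rule described above (zero post-warmup drops, with a cap on warmup drops so the warmup window cannot absorb the problem) can be written as a predicate. This is a minimal sketch: ''meets_goal'' and its input, a list of drop timestamps in seconds from playback start, are assumptions for illustration and not the layout of ''metrics.csv''.

```python
def meets_goal(drop_times_s, warmup_s: float = 10.0, warmup_cap: int = 10) -> bool:
    """Check the phase1_goal_target rule against a list of drop timestamps.

    Passes only if no drop occurs at or after warmup_s AND the number of
    drops inside the warmup window does not exceed warmup_cap.
    """
    warmup_drops = sum(1 for t in drop_times_s if t < warmup_s)
    post_warmup_drops = sum(1 for t in drop_times_s if t >= warmup_s)
    return post_warmup_drops == 0 and warmup_drops <= warmup_cap


print(meets_goal([0.5, 2.0]))                       # True: 2 warmup drops, 0 after
print(meets_goal([0.5, 12.0]))                      # False: one post-warmup drop
print(meets_goal([0.1 * i for i in range(11)]))     # False: 11 warmup drops > cap
```

Applying the same split to the baseline run's drop timestamps yields the warmup vs post-warmup decomposition that Phase 3 is asked to produce.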
  
 ===== Locked metrics (success-transition) ===== ===== Locked metrics (success-transition) =====
Line 237: Line 131:
  
   * Plan / approach. That's Phase 4.   * Plan / approach. That's Phase 4.
-  * Baseline numbers beyond Phase 1's 1039/1440 reference. Phase 3 does +  * Baseline numbers beyond Phase 1's 1039/1440 reference. Phase 3 does the new measurements, including the modifier list and the fd-identity reuse count. 
-    the new measurements, including the modifier list and the +  * Patches, diffs, code reads of panfrost / libplacebo / mpv. Those enter at Phase 4 once we know which import call to replace.
-    fd-identity reuse count. +
-  * Patches, diffs, code reads of panfrost / libplacebo / mpv. Those +
-    enter at Phase 4 once we know which import call to replace.+
  
 ===== References used in this enumeration ===== ===== References used in this enumeration =====
  
   * ''~/src/ohm_gl_fix/README.md''   * ''~/src/ohm_gl_fix/README.md''
-  * ''~/src/fourier/README.md'' — baseline table at L173–234, gotchas at +  * ''~/src/fourier/README.md'' — baseline table at L173–234, gotchas at L357–384.
-    L357–384.+
   * ''~/.claude/projects/-home-mfritsche-src/memory/feedback_dev_process.md''   * ''~/.claude/projects/-home-mfritsche-src/memory/feedback_dev_process.md''
-  * DokuWiki: [[ohm_gl_fix:phase1_2026-04-30]], [[ohm_gl_fix:start]], +  * DokuWiki: [[ohm_gl_fix:phase1_2026-04-30]], [[ohm_gl_fix:start]], [[fourier:phase5_2026-04-30]] (latter currently access-controlled in the wiki — local mirror in ''~/src/fourier/README.md'').
-    [[fourier:phase5_2026-04-30]] (latter currently access-controlled in the +
-    wiki — local mirror in ''~/src/fourier/README.md'').+
  
ohm_gl_fix/phase2_2026-04-30.1777571809.txt.gz · Last modified: by markus_fritsche