ohm_gl_fix:phase2_2026-04-30



Phase 2 — Situation Analysis (ohm_gl_fix iteration 1)

Substrate enumeration on ohm (PineTab2, RK3566, Mali-G52 MP2, hantro-vpu, kernel 6.19.10-danctnix1-1-pinetab2), 2026-04-30.

Phase 1 lock: Phase 1 goal formulation, 2026-04-30. Goal target: on bbb_1080p30_h264.mp4 with mpv --hwdec=v4l2request --vo=gpu-next, over a 60 s steady-state window, dropped frames fall from the baseline 1039/1440 (72 %) to the transient-startup floor of gst v4l2slh264dec → waylandsink; equivalently, EGLImage import ceases to be the binding constraint.

This page is descriptive, not prescriptive. No plan here; the plan is Phase 4. If something below turns out to be the wrong thing to measure once Phase 3 baselines run, that's a Phase 3→1 loopback, not a Phase 2 revision.


1. Mesa version + panfrost build flags

  • mesa 1:26.0.5-1, Arch Linux ARM stock, build 2026-04-20, install 2026-04-21. aarch64.
  • Provides: libva-mesa-driver, mesa-libgl, opengl-driver.
  • Live eglinfo (run inside the active KWin Wayland session of user mfritsche, WAYLAND_DISPLAY=wayland-0) returns identical capabilities for both GBM and Wayland EGL platforms:
    • EGL 1.5, vendor “Mesa Project”, driver name panfrost.
    • GL core 3.1, GLSL 1.40.
    • GL compatibility 3.1, GLSL 1.40.
    • GLES 3.1, GLSL ES 3.10.
    • Renderer string: “Mali-G52 r1 MC1 (Panfrost)”.
  • Gallium driver layout: /usr/lib/dri/panfrost_dri.so → libdril_dri.so (the unified-loader stub since Mesa 25.x). Real gallium code is in libgallium-26.0.5-arch1.1.so. Every other gallium driver (iris_dri.so, nouveau_dri.so, …) is a sibling symlink to the same stub. Build flags are not introspectable from the installed binary alone — the Arch package is built from archlinuxarm/PKGBUILDs with gallium-drivers=panfrost,… (full set in the upstream PKGBUILD, pull when needed).
  • This Mesa version carries Collabora's mid-2025 panfrost dmabuf-import rework but predates the 26.1 cycle landings; specific YUV-modifier behaviour is what we'll have to characterise in Phase 3, not assume from version number alone.

2. EGL/GLES extensions advertised

EGL client extensions: EGL_EXT_client_extensions, EGL_EXT_device_*, EGL_EXT_explicit_device, EGL_EXT_platform_base, EGL_EXT_platform_{wayland,x11,xcb,device}, EGL_KHR_debug, EGL_KHR_platform_{gbm,wayland,x11}, EGL_MESA_platform_{gbm,surfaceless}.

EGL display extensions (panfrost, both GBM and Wayland) — selecting the ones load-bearing for this campaign:

  • EGL_EXT_image_dma_buf_import ✓ — base import path.
  • EGL_EXT_image_dma_buf_import_modifiers ✓ — needed for any Rockchip-side AFBC / linear-tile NV12 modifier handling.
  • EGL_KHR_image_base, EGL_KHR_image ✓.
  • EGL_MESA_image_dma_buf_export ✓ — bidirectional, of interest later for Mali → KWin handoff scenarios.
  • EGL_KHR_fence_sync, EGL_KHR_wait_sync ✓ — explicit fencing, prerequisite for not blocking the GL queue on each new import.
  • EGL_ANDROID_native_fence_sync ✓ — sync-fd interop with V4L2 out-fences.
  • EGL_KHR_partial_update ✓.
  • Context flags: EGL_KHR_no_config_context, EGL_KHR_create_context, EGL_KHR_create_context_no_error, EGL_IMG_context_priority.
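A small Phase 3 pre-flight sketch, assuming the comma-separated, line-wrapped extension list that eglinfo prints: confirm that the load-bearing extensions above are all present before a baseline run. The helper name and the required set are ours; they are not part of eglinfo or Mesa.

```python
# Load-bearing EGL display extensions listed on this page.
REQUIRED_EGL_EXTENSIONS = frozenset({
    "EGL_EXT_image_dma_buf_import",
    "EGL_EXT_image_dma_buf_import_modifiers",
    "EGL_KHR_image_base",
    "EGL_KHR_fence_sync",
    "EGL_KHR_wait_sync",
    "EGL_ANDROID_native_fence_sync",
})


def missing_extensions(extension_list: str,
                       required=REQUIRED_EGL_EXTENSIONS) -> list:
    """Return the required extensions absent from an eglinfo-style list."""
    # eglinfo separates names with commas and wraps over several lines;
    # treat commas and any whitespace as separators.
    advertised = set(extension_list.replace(",", " ").split())
    return sorted(required - advertised)


print(missing_extensions("EGL_KHR_image_base, EGL_KHR_fence_sync,\n    EGL_KHR_wait_sync"))
# → ['EGL_ANDROID_native_fence_sync', 'EGL_EXT_image_dma_buf_import', 'EGL_EXT_image_dma_buf_import_modifiers']
```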

What eglinfo does not report, and what we therefore still need before Phase 4 (so this is a Phase 3 baseline action, not a Phase 2 hole): the per-format modifier list returned by eglQueryDmaBufModifiersEXT(DRM_FORMAT_NV12) and the external_only flag for each modifier. That's the data point that decides whether a kernel-allocated dmabuf can flow into a regular 2D sampler in libplacebo's GL path or only into samplerExternalOES.
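The decision that this modifier list feeds can be sketched as below. eglQueryDmaBufModifiersEXT fills two parallel arrays (modifiers and external_only) for one DRM format; the sketch assumes they have already been copied out and models only the lookup. fourcc() mirrors the fourcc_code() macro from drm_fourcc.h; sampler_for() is a hypothetical name, and the example values are invented, not measured on ohm.

```python
def fourcc(a: str, b: str, c: str, d: str) -> int:
    """Python mirror of the fourcc_code() macro in drm_fourcc.h."""
    return ord(a) | (ord(b) << 8) | (ord(c) << 16) | (ord(d) << 24)


DRM_FORMAT_NV12 = fourcc("N", "V", "1", "2")  # 0x3231564E
DRM_FORMAT_MOD_LINEAR = 0


def sampler_for(modifier: int, modifiers: list, external_only: list):
    """Return the GLSL sampler a dmabuf with `modifier` can be read through.

    `modifiers` / `external_only` are the parallel arrays returned by
    eglQueryDmaBufModifiersEXT for one format (here: NV12).
    """
    table = dict(zip(modifiers, external_only))
    if modifier not in table:
        return None  # not advertised for this format: treat import as unsupported
    return "samplerExternalOES" if table[modifier] else "sampler2D"


# Hypothetical values: linear NV12 advertised, but only for external sampling.
print(sampler_for(DRM_FORMAT_MOD_LINEAR, [DRM_FORMAT_MOD_LINEAR], [True]))
# → samplerExternalOES
```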

GLES extensions are advertised at GLES 3.1 level; the full extension string lives in eglinfo's GBM block, and nothing in it is surprising for a Mali-G52 panfrost build. es2_info cannot run from an SSH session without a display; the Wayland eglinfo GLES profile already covers the same ground.

3. V4L2 buffer-pool size on the hantro path

Probe: strace -f -e ioctl ffmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime -i bbb_1080p30_h264.mp4 -frames:v 60 -f null - against /dev/video1 (hantro-vpu, mainline rockchip,rk3568-vpu-dec).
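A sketch for tallying capture-side DQBUFs in that strace log per buffer index. The line shape it matches is an assumption about how strace decodes VIDIOC_DQBUF (check against the actual log); lines that -f splits into `<unfinished ...>` / `<... resumed>` pairs are not reassembled here, and only successful returns are counted so that EAGAIN polls don't inflate the per-buffer numbers.

```python
import re
from collections import Counter

# Assumed strace line shape (not verified against every strace version):
#   [pid  4242] ioctl(9, VIDIOC_DQBUF, {type=V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE, index=3, ...}) = 0
DQBUF_CAPTURE = re.compile(
    r"VIDIOC_DQBUF, \{type=V4L2_BUF_TYPE_VIDEO_CAPTURE(?:_MPLANE)?, index=(\d+)"
)


def capture_dqbuf_per_index(lines):
    """Count successful capture-queue DQBUFs per buffer index."""
    counts = Counter()
    for line in lines:
        # Failed polls (= -1 EAGAIN) are not frames; keep only successful returns.
        if not line.rstrip().endswith("= 0"):
            continue
        match = DQBUF_CAPTURE.search(line)
        if match:  # OUTPUT-queue (bitstream) DQBUFs do not match the pattern
            counts[int(match.group(1))] += 1
    return counts
```

Indices map one-to-one onto the exported fds (one VIDIOC_EXPBUF per index), so the Counter doubles as a per-fd reuse count.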

Result:

  • Capture (decoded-frame) ring: 9 distinct dmabuf fds — indices 0..8, each created by a separate VIDIOC_CREATE_BUFS count=1 call (the v4l2-request hwaccel grows the pool one buffer at a time, never calls VIDIOC_REQBUFS with a target count) and exported once via VIDIOC_EXPBUF with O_RDONLY.
  • Format: NV12 single-plane, 1920×1088 (height aligned up from 1080 to the 16-pixel macroblock boundary), sizeimage=3,655,712 B, bytesperline=1920.
  • Output (bitstream) ring: 4 buffers of V4L2_PIX_FMT_H264_SLICE ('S264'), 1920×1088 with sizeimage=3,133,440 B. Not relevant to GL import cost — included for completeness.
  • Reuse density in the 60-frame headless trace: 462 DQBUF events across the 9 capture fds, i.e. each fd dequeued ≈51 times in 60 decoded frames. This is steady-state frame churn, not capture-buffer rotation: the hantro-VPU DPB simply holds frames longer than mpv's consumer would.
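The capture sizeimage decomposes as sketched below. Plain 1920×1088 NV12 is 3,133,440 B (the same figure the bitstream ring reports); the remaining 522,272 B match 64 B per macroblock plus 32 B, which is our reading of the H.264 motion-vector allowance in the mainline hantro driver (hantro_h264_mv_size()). That attribution is an assumption to confirm against the running kernel's source.

```python
MB = 16  # H.264 macroblock edge in pixels


def align_up(value: int, alignment: int) -> int:
    return (value + alignment - 1) // alignment * alignment


def nv12_payload(width: int, height: int) -> int:
    """Luma plane plus half-height interleaved CbCr plane, no row padding."""
    return width * height * 3 // 2


def h264_mv_allowance(width: int, height: int) -> int:
    """64 B per macroblock plus 32 B (assumed hantro_h264_mv_size() layout)."""
    return 64 * (align_up(width, MB) // MB) * (align_up(height, MB) // MB) + 32


W, H = 1920, align_up(1080, MB)
print(H, nv12_payload(W, H), h264_mv_allowance(W, H),
      nv12_payload(W, H) + h264_mv_allowance(W, H))
# → 1088 3133440 522272 3655712
```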

Implication for the campaign. A 60 s / 1440-frame run cycles each of the 9 dmabuf fds ≈160 times. The current gpu-next path treats every DQBUF as a new fd-into-EGLImage import; a cache keyed on fd identity collapses 1440 imports into 9. This is the upper bound on what fd-identity caching can save on this clip — useful as the Phase 3 prediction sanity-check.
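The 1440 → 9 bound is simply the ring size; a toy model makes the arithmetic explicit. This is not a proposed implementation (that is Phase 4 material): ImportCache and the fd numbers are hypothetical, and import_fn stands in for the eglCreateImageKHR call.

```python
from typing import Callable, Dict, Hashable


class ImportCache:
    """Toy fd-identity cache; import_fn stands in for eglCreateImageKHR."""

    def __init__(self, import_fn: Callable[[Hashable], object]):
        self._import_fn = import_fn
        self._images: Dict[Hashable, object] = {}
        self.imports = 0  # number of real (uncached) imports

    def get(self, key: Hashable) -> object:
        if key not in self._images:
            self._images[key] = self._import_fn(key)
            self.imports += 1
        return self._images[key]

    def invalidate(self, key: Hashable) -> None:
        # Needed when the fd is closed or the capture pool is reallocated;
        # otherwise a recycled fd number would return a stale image.
        self._images.pop(key, None)


# 1440 frames over the 9-buffer capture ring. Round-robin order is an
# assumption; the import count only depends on how many distinct keys appear.
cache = ImportCache(lambda fd: ("EGLImage", fd))
ring = [20 + i for i in range(9)]  # hypothetical fd numbers
for frame in range(1440):
    cache.get(ring[frame % len(ring)])
print(cache.imports)  # → 9
```

Keying on the raw fd number is the naive form of fd identity, hence invalidate(). A sturdier key is probably the dmabuf's inode from fstat(), which we believe is unique per buffer; that is unverified here.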

4. mpv + libplacebo + which hwdec gpu-next loads

  • mpv 1:0.41.0-3, built 2026-02-14.
    • libplacebo 7.360.1-1.
    • ffmpeg n8.0.1 (built) / n8.1 (runtime). libavcodec 62.11.100 / 62.28.100. The runtime drift is from the marfrit ffmpeg-v4l2-request-git install rebased onto 8.1.
  • mpv --hwdec=help lists the v4l2request family for h264, hevc, mpeg2video, vp8, vp9, av1; the same set appears under v4l2request-copy (CPU readback variant). There is no standalone drmprime entry: that's an mpv-internal label for the --vo=gpu-next path that consumes AV_PIX_FMT_DRM_PRIME frames from any hwaccel, not a --hwdec= selectable.
  • Active configuration for the campaign baseline: mpv --hwdec=v4l2request --vo=gpu-next bbb_1080p30_h264.mp4. Path: hantro-vpu produces NV12 dmabuf → ffmpeg wraps it as AVDRMFrameDescriptor → mpv's gpu-next path hands the descriptor to libplacebo → libplacebo calls eglCreateImageKHR(EGL_LINUX_DMA_BUF_EXT) per frame → samples into a GL texture → composites → KWin presents. The Phase 1 baseline (1039 drops / 1440, 138 % CPU) was taken on this exact invocation.
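For reference, the per-frame import descriptor for one capture buffer, sketched as attribute pairs. The token names are those defined by EGL_EXT_image_dma_buf_import and its _modifiers companion; values are kept as Python pairs purely for readability (a real caller passes the enum values and terminates the list with EGL_NONE). The CbCr offset is bytesperline × coded height, assuming the single-buffer NV12 layout from §3; whether libplacebo builds exactly this list is not checked here.

```python
def nv12_dmabuf_attribs(fd, width, height, coded_height, bytesperline, modifier=None):
    """Attribute pairs describing a single-buffer NV12 frame as two planes of one dmabuf."""
    cbcr_offset = bytesperline * coded_height  # CbCr starts right after the padded Y plane
    attribs = [
        ("EGL_WIDTH", width),
        ("EGL_HEIGHT", height),  # visible height; padding rows stay unsampled
        ("EGL_LINUX_DRM_FOURCC_EXT", "DRM_FORMAT_NV12"),
        ("EGL_DMA_BUF_PLANE0_FD_EXT", fd),
        ("EGL_DMA_BUF_PLANE0_OFFSET_EXT", 0),
        ("EGL_DMA_BUF_PLANE0_PITCH_EXT", bytesperline),
        ("EGL_DMA_BUF_PLANE1_FD_EXT", fd),  # same fd: both planes live in one buffer
        ("EGL_DMA_BUF_PLANE1_OFFSET_EXT", cbcr_offset),
        ("EGL_DMA_BUF_PLANE1_PITCH_EXT", bytesperline),
    ]
    if modifier is not None:
        # The 64-bit modifier is passed as LO/HI 32-bit halves, same value per plane.
        for plane in (0, 1):
            attribs.append((f"EGL_DMA_BUF_PLANE{plane}_MODIFIER_LO_EXT", modifier & 0xFFFFFFFF))
            attribs.append((f"EGL_DMA_BUF_PLANE{plane}_MODIFIER_HI_EXT", modifier >> 32))
    return attribs
```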

5. KWin / kwin_wayland version + startup GL spam

  • kwin 1:6.6.4-1, plasma-workspace 6.6.4-1, plasma-desktop 6.6.4-1.
  • kwin_wayland --version → kwin 6.6.4.
  • Live process tree: kwin_wayland --wayland-fd 7 --socket wayland-0 --xwayland-fd 8 --xwayland-fd 9 --xwayland-display :0 … --xwayland, launched by startplasma-wayland under an sddm session.

  • The GL_POINT_SPRITE, GL_ALPHA glTexSubImage2D cascade observed in the 2026-04-30 startup logs is labelled background noise: it's KWin's compatibility-profile feature probe at compositor init, fires once at session start, and is not on the per-frame video critical path. Phase 3 instrumentation must filter or clearly attribute these events so they don't pollute per-frame import attribution.
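One way Phase 3 instrumentation could attribute these events rather than silently drop them, sketched over a hypothetical (timestamp, message) event shape. The two tokens come from the startup cascade above; a token match is only treated as the startup probe before the first video frame, so a recurrence during playback surfaces as per-frame work instead of being hidden.

```python
from typing import Iterable, List, Tuple

# Tokens from the 2026-04-30 KWin startup cascade described above.
STARTUP_PROBE_TOKENS = ("GL_POINT_SPRITE", "GL_ALPHA")

# Hypothetical event shape: (seconds since capture start, GL call / message).
Event = Tuple[float, str]


def attribute(events: Iterable[Event], first_frame_ts: float) -> List[Tuple[str, Event]]:
    """Label GL events instead of dropping them, so the noise stays auditable."""
    labelled = []
    for ts, msg in events:
        before_playback = ts < first_frame_ts
        if before_playback and any(tok in msg for tok in STARTUP_PROBE_TOKENS):
            label = "kwin_startup_probe"
        elif before_playback:
            label = "pre_playback"
        else:
            label = "per_frame"
        labelled.append((label, (ts, msg)))
    return labelled
```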

6. Known failure modes — consolidated

Inherited from the ohm_gl_fix README and the fourier Phase 5 page. Listed here so they don't get re-discovered later.

  • R6 — Per-frame fresh EGLImage allocation (the campaign's primary lever). mpv --vo=gpu-next re-imports each NV12 dmabuf fd into a fresh EGLImage on every decoded frame instead of caching by fd identity. Combined with the §3 result (9-fd capture ring, ~160× reuse over 60 s), this is where the cost is concentrated.
  • R7 — --vo=dmabuf-wayland format-negotiation break. mpv's hwdec → dmabuf-wayland path fails with “hardware format not supported” (yuv420p → drm_prime upload fails). It would otherwise be the zero-copy answer, and it is out of scope for ohm_gl_fix (it's the workaround). Its absence is why gpu-next is the realistic per-frame path on mpv, and therefore why this campaign exists.
  • R-modifiers — external_only handling on panfrost. Some Rockchip NV12 modifiers are advertised as external-image-only (sampleable only via samplerExternalOES, not via regular 2D samplers). gpu-next / libplacebo's GLSL paths assume regular 2D samplers. If panfrost reports external_only=true for the kernel-allocated NV12 modifier on hantro, the import succeeds but composition either miscomposites or silently falls back to a slow path. To be verified in Phase 3 via eglQueryDmaBufModifiersEXT against the actual fd's modifier from AVDRMFrameDescriptor.
  • R-compositor-bound (gotcha inherited from fourier). “Compositor-bound ≠ decode-bound”: at 138 % CPU and 72 % drops on 1080p24 with v4l2request decode, the binding constraint is not the hantro VPU. Sister-path proof: gst v4l2slh264dec → waylandsink (zero-copy dmabuf-direct) lands at 6–7 % CPU / 0 drops on identical hardware. Always re-verify which of {decode, import, composite, scanout} is binding with mpv --vo=null / top -H / perf top before attributing CPU.

Success-transition lock-in

Numbers locking the journey from baseline to goal live in metrics.csv (sibling file). Four rows: phase1_baseline (138 % CPU, 1039/1440 drops, 72 %), phase1_reference and phase1_reference_fs (the gst→waylandsink floor: 6–7 % CPU, 0/1488 drops), and phase1_goal_target (post-warmup drops = 0; warmup = first 10 s with ≤ 10 drops tolerated). Binding cell is phase1_goal_target.drops_post_warmup; the drops cell carries the warmup sanity cap (10) so the path can't trivially satisfy “0 post-warmup” by stretching warmup. Phase 3 must decompose the baseline 1039 drops into warmup vs post-warmup before any Phase 4 prediction is made. Phase 3 and Phase 7 append rows; the role column distinguishes the metric from references. Loopback Phase 3 → Phase 1 (per dev-process) edits the goal_target row's binding cell rather than rewriting prose.
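The warmup/post-warmup split and the goal check, sketched over per-drop timestamps measured from the start of the window. Where those timestamps come from (mpv stats, a frame-timing log) is left open; the constants mirror the phase1_goal_target row, with warmup as [0, warmup_s] and post-warmup as (warmup_s, window_s] per the schema.

```python
from typing import Iterable, Tuple

WARMUP_S = 10.0       # phase1_goal_target.warmup_s
WINDOW_S = 60.0       # phase1_goal_target.window_s
WARMUP_DROP_CAP = 10  # phase1_goal_target.drops (sanity guard)


def split_drops(drop_times: Iterable[float],
                warmup_s: float = WARMUP_S,
                window_s: float = WINDOW_S) -> Tuple[int, int]:
    """Split drop timestamps into [0, warmup_s] and (warmup_s, window_s] counts."""
    warmup = post = 0
    for t in drop_times:
        if t < 0 or t > window_s:
            continue  # outside the measurement window
        if t <= warmup_s:
            warmup += 1
        else:
            post += 1
    return warmup, post


def meets_goal(drop_times: Iterable[float]) -> bool:
    warmup, post = split_drops(drop_times)
    # Binding cell: zero post-warmup drops. Guard: total drops within the cap,
    # so a run can't pass by pushing its drops into a longer warmup.
    return post == 0 and warmup + post <= WARMUP_DROP_CAP
```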

Locked metrics (success-transition)

Machine-readable lock — see metrics.csv in the repo for the editable source.

# ohm_gl_fix metrics — success-transition lock-in
# Phase 1 anchor + reference + goal target. Phase 3/7 add rows as they run.
#
# Schema:
#   phase             phase1_baseline | phase1_reference[_*] | phase1_goal_target
#                     | phase3_* | phase7_*
#   path_label        descriptive playback configuration
#   clip              source media file (sha16: dcf8a7170fbd49bb for bbb_1080p30_h264.mp4)
#   decoder           hantro-vpu | sw
#   vo_sink           mpv VO or GStreamer sink
#   surface_protocol  how decoded frames reach the compositor
#   cpu_pct           total %CPU (top -p style); empty = not the locked metric
#   drops             total dropped frames over window_s (warmup + steady-state)
#   frames_total      total frames considered (delivered + dropped) over window_s
#   drop_pct          100 * drops / frames_total
#   window_s          full measurement window in seconds
#   warmup_s          duration of pipeline warmup at window start; drops inside
#                     this sub-window are tolerated and are NOT a goal failure
#   drops_post_warmup dropped frames in the (warmup_s, window_s] sub-window;
#                     THIS is the binding metric for phase1_goal_target
#   effective_fps     delivered frames per second
#   role              metric=THE success criterion; reference=floor/control;
#                     context=informational
#   source            where this row's number came from
#   date              ISO date the number was taken
#
# Phase 1 prose goal (refined 2026-04-30): "0 drops after pipeline warmup —
# warmup = first 10 s; ~10 dropped frames during warmup is acceptable".
# Binding cell: phase1_goal_target.drops_post_warmup == 0.
# Sanity guard: phase1_goal_target.drops <= 10 (caps `drops` so the path
# can't satisfy `drops_post_warmup == 0` by extending the warmup forever).
# Phase 3 baseline must split phase1_baseline.drops into warmup vs post-warmup
# (the 1039 figure is total over 60 s — not yet decomposed).
phase,path_label,clip,decoder,vo_sink,surface_protocol,cpu_pct,drops,frames_total,drop_pct,window_s,warmup_s,drops_post_warmup,effective_fps,role,source,date
phase1_baseline,mpv_gpu_next_v4l2request,bbb_1080p30_h264.mp4,hantro-vpu,gpu-next,EGLImage_per_frame,138,1039,1440,72.0,60,10,,8.5,metric,ohm_gl_fix:README_L19,2026-04-30
phase1_reference,gst_v4l2slh264dec_waylandsink,bbb_1080p30_h264.mp4,hantro-vpu,waylandsink,linux-dmabuf-v1_direct,7,0,1488,0.0,62,10,0,24.0,reference,fourier:README_L189,2026-04-24
phase1_reference_fs,gst_v4l2slh264dec_waylandsink_fullscreen,bbb_1080p30_h264.mp4,hantro-vpu,waylandsink_fs,linux-dmabuf-v1_direct+VOP2_scale,6,0,1488,0.0,62,10,0,24.0,reference,fourier:README_L190,2026-04-24
phase1_goal_target,mpv_gpu_next_v4l2request_cached,bbb_1080p30_h264.mp4,hantro-vpu,gpu-next,EGLImage_cached_by_fd,,10,1440,0.69,60,10,0,24.0,metric,ohm_gl_fix:phase1_2026-04-30,2026-04-30
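A sketch of checking a future measured row against this lock, reading metrics.csv as laid out above (comment lines start with #). load_metrics() and passes_goal() are hypothetical helper names, as is the phase7_run1 row in the example; a row whose drops_post_warmup is still empty, like phase1_baseline today, is treated as not passing because it hasn't been decomposed yet.

```python
import csv
from typing import Dict


def load_metrics(text: str) -> Dict[str, dict]:
    """Parse metrics.csv, skipping '#' comment lines; index rows by `phase`."""
    data_lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    return {row["phase"]: row for row in csv.DictReader(data_lines)}


def passes_goal(candidate: dict, target: dict) -> bool:
    """Check a row against phase1_goal_target's binding cell and sanity cap."""
    post = candidate["drops_post_warmup"]
    if post == "":
        return False  # not yet decomposed into warmup vs post-warmup
    return (int(post) <= int(target["drops_post_warmup"])
            and int(candidate["drops"]) <= int(target["drops"]))


sample = """# trimmed example, not the full schema
phase,drops,drops_post_warmup
phase1_baseline,1039,
phase1_goal_target,10,0
phase7_run1,4,0
"""
metrics = load_metrics(sample)
print(passes_goal(metrics["phase7_run1"], metrics["phase1_goal_target"]))  # → True
```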

What's deliberately not in this page

  • Plan / approach. That's Phase 4.
  • Baseline numbers beyond Phase 1's 1039/1440 reference. Phase 3 does the new measurements, including the modifier list and the fd-identity reuse count.
  • Patches, diffs, code reads of panfrost / libplacebo / mpv. Those enter at Phase 4 once we know which import call to replace.

References used in this enumeration

  • ~/src/ohm_gl_fix/README.md
  • ~/src/fourier/README.md — baseline table at L173–234, gotchas at L357–384.
  • ~/.claude/projects/-home-mfritsche-src/memory/feedback_dev_process.md
  • DokuWiki: phase1_2026-04-30, ohm_gl_fix, phase5_2026-04-30 (latter currently access-controlled in the wiki — local mirror in ~/src/fourier/README.md).
ohm_gl_fix/phase2_2026-04-30.1777572958.txt.gz · Last modified: by markus_fritsche