ohm_gl_fix:phase3_revised_2026-05-01

Differences

This shows you the differences between two versions of the page.

ohm_gl_fix:phase3_revised_2026-05-01 [2026/05/01 09:27] – Phase 3 revised — empirical bucket-attribution + boundary characterisation markus_fritsche
ohm_gl_fix:phase3_revised_2026-05-01 [2026/05/01 13:08] (current) – rewrap paragraphs (DokuWiki single-newline fix) markus_fritsche
Line 1: Line 1:
 ====== ohm_gl_fix — Phase 3 (revised), 2026-05-01 ====== ====== ohm_gl_fix — Phase 3 (revised), 2026-05-01 ======
  
-This page replaces the original Phase 3 narrative after a methodological +This page replaces the original Phase 3 narrative after a methodological correction by Markus on 2026-04-30 ("did you actually trace, or take cheap stdout statistics?"). The original Phase 3 sat on mpv ''--term-status-msg'' counters and ''/usr/bin/time -v'' totals. The revised Phase 3 — captured 2026-05-01 — is grounded in ''perf record --call-graph=dwarf'', ''perf stat -e cache-misses,LLC-load-misses'', and ''strace -e trace=ioctl,mmap,munmap,sendmsg,recvmsg'' across six contender playback paths.
-correction by Markus on 2026-04-30 ("did you actually trace, or take cheap +
-stdout statistics?"). The original Phase 3 sat on mpv ''--term-status-msg'' +
-counters and ''/usr/bin/time -v'' totals. The revised Phase 3 — captured +
-2026-05-01 — is grounded in ''perf record --call-graph=dwarf'', +
-''perf stat -e cache-misses,LLC-load-misses'', and +
-''strace -e trace=ioctl,mmap,munmap,sendmsg,recvmsg'' across six contender +
-playback paths.+
  
-Validity criterion (Markus 2026-05-01): a measurement is valid only if it +Validity criterion (Markus 2026-05-01): a measurement is valid only if it identifies, **at each handoff boundary**, whether a //dmabuf fd was passed// or a //new anonymous mapping appeared//. Stdout playback logs are not measurements. Five of six scenarios pass the criterion; the sixth (Brave) has perf record + DSO/symbol attribution but no v2 strace (see §8).
-identifies, **at each handoff boundary**, whether a //dmabuf fd was passed// +
-or a //new anonymous mapping appeared//. Stdout playback logs are not +
-measurements. Five of six scenarios pass the criterion; the sixth (Brave) +
-has perf record + DSO/symbol attribution but no v2 strace — see §6.+
  
-Source clip across all scenarios: ''bbb_1080p30_h264.mp4'' (1920×1080 +Source clip across all scenarios: ''bbb_1080p30_h264.mp4'' (1920×1080 H.264 Main, 24 fps, sha-16 ''dcf8a7170fbd49bb''). Hardware: //ohm// (PineTab2, RK3566, Mali-G52 MP2, hantro VPU, kernel ''6.19.10-danctnix1-1-pinetab2'', mesa 26.0.5, mpv 0.41.0, libplacebo 7.360.1, ffmpeg n8.1 runtime, KWin 6.6.4, Plasma 6.6.4). Compositor: KWin Wayland, ozone-platform=wayland for Brave.
-H.264 Main, 24 fps, sha-16 ''dcf8a7170fbd49bb''). Hardware: //ohm// +
-(PineTab2, RK3566, Mali-G52 MP2, hantro VPU, kernel +
-''6.19.10-danctnix1-1-pinetab2'', mesa 26.0.5, mpv 0.41.0, +
-libplacebo 7.360.1, ffmpeg n8.1 runtime, KWin 6.6.4, Plasma 6.6.4). +
-Compositor: KWin Wayland, ozone-platform=wayland for Brave.+
  
 ===== 1. Contenders ===== ===== 1. Contenders =====
  
-  - **S1 gst-launch v4l2slh264dec → waylandsink** — the fourier reference +  - **S1 gst-launch v4l2slh264dec → waylandsink** — the fourier reference path. HW decode via GStreamer's ''v4l2codecs'' plugin, present via ''linux-dmabuf-v1'' Wayland protocol. 
-    path. HW decode via GStreamer's ''v4l2codecs'' plugin, present via +  - **S2 mpv 0.41 ''--hwdec=v4l2request --vo=gpu-next''** — libavcodec + libplacebo. Falls to SW decode because mpv's drmprime-overlay loader fails on Wayland (see §6 finding 5 of original phase3/findings.md). 
-    ''linux-dmabuf-v1'' Wayland protocol. +  - **S3 ffplay default SW** — libavcodec + SDL3 vout. ''-hwaccel v4l2request'' refuses to play because ffplay's required Vulkan renderer fails to initialise on Mali-G52 (panvk default-off gate). 
-  - **S2 mpv 0.41 ''--hwdec=v4l2request --vo=gpu-next''** — libavcodec + +  - **S4 VLC 3.0.22 ''--intf=qt''** — bundled libavcodec 58.x (ffmpeg 4.4 packaged at ''/usr/lib/ffmpeg4.4/'') + Qt vout. The bundled libavcodec predates the v4l2request hwaccel landing. 
-    libplacebo. Falls to SW decode because mpv's drmprime-overlay loader +  - **S5 gst-play-1.0 (GstPlayBin3)** — GStreamer auto-pipeline: ''v4l2slh264dec0'' → ''glimagesink''. HW decode + GL composite. 
-    fails on Wayland (see §6 finding 5 of original phase3/findings.md). +  - **S6 Brave (Chromium)** with autoplay file-URL — VAAPI initialisation fails (''vaInitialize failed: unknown libva error''); falls to Chromium's static-linked SW decode in renderer process; renderer→GPU IPC; GPU process composes via Mesa.
-  - **S3 ffplay default SW** — libavcodec + SDL3 vout. ''-hwaccel +
-    v4l2request'' refuses to play because ffplay's required Vulkan +
-    renderer fails to initialise on Mali-G52 (panvk default-off gate). +
-  - **S4 VLC 3.0.22 ''--intf=qt''** — bundled libavcodec 58.x (ffmpeg +
-    4.4 packaged at ''/usr/lib/ffmpeg4.4/'') + Qt vout. The bundled +
-    libavcodec predates the v4l2request hwaccel landing. +
-  - **S5 gst-play-1.0 (GstPlayBin3)** — GStreamer auto-pipeline: +
-    ''v4l2slh264dec0'' → ''glimagesink''. HW decode + GL composite. +
-  - **S6 Brave (Chromium)** with autoplay file-URL — VAAPI initialisation +
-    fails (''vaInitialize failed: unknown libva error''); falls to +
-    Chromium's static-linked SW decode in renderer process; renderer→GPU +
-    IPC; GPU process composes via Mesa.+
  
 ===== 2. Bucket attribution (load stream / decode / display handoff) ===== ===== 2. Bucket attribution (load stream / decode / display handoff) =====
  
-DSO-level CPU attribution from ''perf record --call-graph=dwarf'' over +DSO-level CPU attribution from ''perf record --call-graph=dwarf'' over 15 s steady-state per scenario.
-15 s steady-state per scenario.+
  
 ^ Scenario          ^ Load stream ^ Decode (libavcodec or static)         ^ Display handoff (memcpy + GL/Mesa)                ^ Other ^ ^ Scenario          ^ Load stream ^ Decode (libavcodec or static)         ^ Display handoff (memcpy + GL/Mesa)                ^ Other ^
Line 58: Line 29:
 | S6 Brave GPU      | — | — | **17.26%** libc (12.92% memcpy on renderer→GPU IPC + GL upload), 9.25% libgallium, 3.45% libGLESv2, 0.45% [panfrost] | 44.21% brave (Chromium GPU code), kernel 23.41% | | S6 Brave GPU      | — | — | **17.26%** libc (12.92% memcpy on renderer→GPU IPC + GL upload), 9.25% libgallium, 3.45% libGLESv2, 0.45% [panfrost] | 44.21% brave (Chromium GPU code), kernel 23.41% |
  
-Headline: load-stream is ≤0.1% on every scenario; decode is the cost +Headline: load-stream is ≤0.1% on every scenario; decode is the cost sink for SW paths (70-87% libavcodec); display handoff is 12-17% memcpy on every libavcodec-using path AND on Brave's GPU process. The two HW-decode paths (S1, S5) split: S1 has near-zero memcpy, S5 has minimal memcpy (1.23%) but significant libgallium/GL work.
-sink for SW paths (70-87% libavcodec); display handoff is 12-17% memcpy +
-on every libavcodec-using path AND on Brave's GPU process. The two +
-HW-decode paths (S1, S5) split: S1 has near-zero memcpy, S5 has minimal +
-memcpy (1.23%) but significant libgallium/GL work.+
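
The bucket totals can be re-derived from the per-DSO percentages. The following is a minimal sketch, not the script used for this page: the ''DSO_BUCKET'' mapping and the ''bucket_shares'' helper are hypothetical names, and the mapping simply follows the column layout of the table above (libc, libgallium, libGLESv2 and [panfrost] under display handoff; brave and kernel under other). The input values are the S6 Brave GPU row.

```python
from collections import defaultdict

# Hypothetical DSO -> bucket mapping, mirroring the table's column layout.
DSO_BUCKET = {
    "libc": "display_handoff",
    "libgallium": "display_handoff",
    "libGLESv2": "display_handoff",
    "[panfrost]": "display_handoff",
    "brave": "other",
    "kernel": "other",
}


def bucket_shares(dso_percent):
    """Sum per-DSO CPU percentages into attribution buckets."""
    totals = defaultdict(float)
    for dso, pct in dso_percent.items():
        # DSOs without a mapping are kept visible instead of being dropped.
        totals[DSO_BUCKET.get(dso, "unmapped")] += pct
    return {bucket: round(total, 2) for bucket, total in totals.items()}


# Per-DSO percentages from the S6 Brave GPU row of the table.
S6_GPU = {
    "libc": 17.26,
    "libgallium": 9.25,
    "libGLESv2": 3.45,
    "[panfrost]": 0.45,
    "brave": 44.21,
    "kernel": 23.41,
}

shares = bucket_shares(S6_GPU)
for bucket, pct in sorted(shares.items()):
    print(f"{bucket:16s} {pct:6.2f}%")
```

Keeping an explicit ''unmapped'' bucket makes it obvious when a DSO appears in a profile that the mapping does not yet cover.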
  
 ===== 3. Boundary characterisation (validity-passing) ===== ===== 3. Boundary characterisation (validity-passing) =====
  
-''strace -f -e trace=ioctl,mmap,munmap,sendmsg,recvmsg'' over the +''strace -f -e trace=ioctl,mmap,munmap,sendmsg,recvmsg'' over the ~25 s playback lifetime, post-processed for boundary signals.
-~25 s playback lifetime, post-processed for boundary signals.+
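
The post-processing for boundary signals can be sketched as a line classifier over the strace output. This is an illustrative sketch under stated assumptions, not the tooling behind the archived traces: the function names are hypothetical, the 3 000 000-byte threshold stands in for the "≥3 MB" frame-sized mapping signal, and the sample lines are hand-written in strace's output style rather than copied from a capture.

```python
import re
from collections import Counter

# Assumed threshold for a frame-sized anonymous mapping ("≥3 MB").
MIN_FRAME_BYTES = 3_000_000

# Matches mmap(addr, length, ...MAP_ANONYMOUS...) and captures the length.
MMAP_RE = re.compile(r"\bmmap\(\s*\w+,\s*(\d+),[^)]*MAP_ANONYMOUS")


def classify(line):
    """Return the boundary signal a strace line represents, or None."""
    if "VIDIOC_EXPBUF" in line:
        return "expbuf"              # decoder buffer exported as dmabuf fd
    if "sendmsg(" in line and "SCM_RIGHTS" in line:
        return "fd_passed"           # fd handed over a Unix socket
    if "DRM_IOCTL_" in line:
        return "drm_ioctl"           # GL/DRM path activity
    match = MMAP_RE.search(line)
    if match and int(match.group(1)) >= MIN_FRAME_BYTES:
        return "anon_frame_mmap"     # CPU-side frame-sized allocation
    return None


def count_signals(lines):
    """Count boundary signals over an iterable of strace lines."""
    counts = Counter()
    for line in lines:
        kind = classify(line)
        if kind is not None:
            counts[kind] += 1
    return counts


# Hand-written sample lines in strace's format (illustrative only).
SAMPLE = [
    "[pid 4101] ioctl(9, VIDIOC_EXPBUF, {type=V4L2_BUF_TYPE_VIDEO_CAPTURE, index=0, fd=14}) = 0",
    "[pid 4101] sendmsg(5, {msg_iov=[...], msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[14]}]}, MSG_NOSIGNAL) = 64",
    "[pid 4102] mmap(NULL, 3133440, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff8c000000",
    "[pid 4102] mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff8d000000",
    "[pid 4103] ioctl(7, DRM_IOCTL_SYNCOBJ_CREATE, 0xffffd2a0) = 0",
]

counts = count_signals(SAMPLE)
print(dict(counts))
```

With ''strace -f'', calls split into ''<unfinished ...>'' and ''<... resumed>'' halves would need to be rejoined before classification; the sketch ignores that case.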
  
 ==== 3.1 Decode → presentation buffer boundary ==== ==== 3.1 Decode → presentation buffer boundary ====
Line 79: Line 45:
 | S6 Brave (renderer + gpu) | 0 (no V4L2 path) | 0 | not captured (v2 strace not run) | — | inferred: shmem-IPC from renderer to GPU (no V4L2 dmabuf at the decode boundary; libva failed earlier) | | S6 Brave (renderer + gpu) | 0 (no V4L2 path) | 0 | not captured (v2 strace not run) | — | inferred: shmem-IPC from renderer to GPU (no V4L2 dmabuf at the decode boundary; libva failed earlier) |
  
-The 13 EXPBUFs in S1/S5 correspond to 9 V4L2 capture buffers (NV12 +The 13 EXPBUFs in S1/S5 correspond to 9 V4L2 capture buffers (NV12 1920×1088 single-plane, ''sizeimage = 3 655 712'') plus 4 bitstream-input buffers — matches the Phase 2 §3 substrate finding.
-1920×1088 single-plane, ''sizeimage = 3 655 712'') plus 4 bitstream-input +
-buffers — matches the Phase 2 §3 substrate finding.+
  
 ==== 3.2 Presentation buffer → compositor boundary ==== ==== 3.2 Presentation buffer → compositor boundary ====
Line 103: Line 67:
 ===== 4. Memory subsystem pressure (perf stat) ===== ===== 4. Memory subsystem pressure (perf stat) =====
  
-''perf stat -e cache-misses,LLC-load-misses,cycles,instructions -p $PID +''perf stat -e cache-misses,LLC-load-misses,cycles,instructions -p $PID sleep 10'' (no strace overhead).
-sleep 10'' (no strace overhead).+
  
 ^ Scenario          ^ cache-misses ^ LLC-load-misses ^ cycles    ^ instructions ^ IPC  ^ ^ Scenario          ^ cache-misses ^ LLC-load-misses ^ cycles    ^ instructions ^ IPC  ^
Line 122: Line 85:
   * S5 gst-play: 5×   * S5 gst-play: 5×
  
-VLC's 4-core saturation also shows in cycles (62.6 G in 10 s ≈ 6.26 GHz +VLC's 4-core saturation also shows in cycles (62.6 G in 10 s ≈ 6.26 GHz aggregate, i.e. ≈ 1.57 GHz averaged per core across all 4 cores; that exceeds the quoted 1.4 GHz per-core figure, so the cores must have been clocked above 1.4 GHz and close to fully busy) and in an IPC of 0.09 (severely cache-bound). A memcpy of 1080p NV12 frames at 24 fps amounts to ~72 MB/s of memory traffic per copy pass, exactly the kind of workload that generates LLC-class miss pressure.
-aggregate, near 100% across all 4 cores at 1.4 GHz per core) and IPC of +
-0.09 (severely cache-bound). Memcpy of 1080p NV12 frames at 24 fps = +
-~72 MB/sec memory traffic, exactly the workload generating LLC-class +
-miss pressure.+
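
The arithmetic behind the cycle and memcpy figures can be written out as follows. This is a sketch using only numbers quoted on this page (62.6 G cycles, 10 s window, 4 cores, 1920×1080 at 24 fps); the one added fact is that NV12 stores 12 bits per pixel, i.e. 1.5 bytes. The variable names are illustrative.

```python
# Figures quoted in the text for S4 (VLC).
CYCLES = 62.6e9      # cycles counted by perf stat
WINDOW_S = 10        # measurement window in seconds
CORES = 4            # CPU cores on the SoC

aggregate_hz = CYCLES / WINDOW_S       # cycles per second, all cores
per_core_hz = aggregate_hz / CORES     # average cycles per second per core

# One NV12 frame: full-resolution luma plane plus half-size chroma plane.
WIDTH, HEIGHT, FPS = 1920, 1080, 24
frame_bytes = WIDTH * HEIGHT * 3 // 2
bytes_per_s = frame_bytes * FPS        # traffic of one full-frame copy pass

print(f"aggregate: {aggregate_hz / 1e9:.2f} GHz")
print(f"per core:  {per_core_hz / 1e9:.3f} GHz")
print(f"frame:     {frame_bytes} bytes")
print(f"one copy:  {bytes_per_s / 1e6:.1f} MB/s ({bytes_per_s / 2**20:.1f} MiB/s)")
```

The per-copy figure uses the visible 1080 lines; the decoder's padded 1088-line buffers would come out slightly higher, and every additional copy pass multiplies the traffic accordingly.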
  
 ===== 5. Kernel-side path attribution ===== ===== 5. Kernel-side path attribution =====
  
-''perf report --dsos=[kernel.kallsyms]'' on the existing perf data +''perf report --dsos=[kernel.kallsyms]'' on the existing perf data (kernel symbols resolve via ''/proc/kallsyms'').
-(kernel symbols resolve via ''/proc/kallsyms'').+
  
-  * **No unfavorable paths detected** across any scenario. Specifically: +  * **No unfavorable paths detected** across any scenario. Specifically: no ''rk_iommu_irq'' / ''rockchip_iommu_*'' (no iommu fault traffic), no ''panfrost_*'' software-fallback path symbols, no excessive page-table churn beyond what dmabuf allocation-and-free implies. 
-    no ''rk_iommu_irq'' / ''rockchip_iommu_*'' (no iommu fault traffic), +  * S1, S5 (HW decode + dmabuf): kernel time is V4L2 ioctl serving + DRM ioctl serving + ''__arch_copy_to_user'' for ioctl returns + ''_raw_spin_unlock_irqrestore'' for HW decoder completion interrupts + ''dma-buf'' allocation/destruction. **All "doing the work" symbols.** 
-    no ''panfrost_*'' software-fallback path symbols, no excessive +  * S2, S3, S4 (SW decode): kernel time is just scheduler + page-table fixups + softirqs. Light kernel work as expected. 
-    page-table churn beyond what dmabuf allocation-and-free implies. +  * **S6 Brave GPU process (23.41% kernel)**: notable share in ''kmem_cache_alloc_noprof'' (0.22%) + ''vma_interval_tree_insert'' (0.26%) + ''objects_lookup'' (0.29%). Together these suggest **per-frame DRM object allocation** rather than buffer reuse on Chromium's GPU side. That is a Chromium-side decision, and it adds visible kernel cost.
-  * S1, S5 (HW decode + dmabuf): kernel time is V4L2 ioctl serving + +
-    DRM ioctl serving + ''__arch_copy_to_user'' for ioctl returns + +
-    ''_raw_spin_unlock_irqrestore'' for HW decoder completion interrupts + +
-    ''dma-buf'' allocation/destruction. **All "doing the work" symbols.** +
-  * S2, S3, S4 (SW decode): kernel time is just scheduler + +
-    page-table fixups + softirqs. Light kernel work as expected. +
-  * **S6 Brave GPU process (23.41% kernel)**: notable share in +
-    ''kmem_cache_alloc_noprof 0.22%'' + ''vma_interval_tree_insert +
-    0.26%'' + ''objects_lookup 0.29%''. Together suggests **per-frame +
-    DRM object allocation** rather than buffer reuse on Chromium's GPU +
-    side. That's a Chromium-side decision and adds visible kernel cost.+
  
 ===== 6. Two-level zero-copy structure ===== ===== 6. Two-level zero-copy structure =====
  
-The validity-passing data exposes a structural distinction the +The validity-passing data exposes a structural distinction the Phase 3 (original) narrative missed:
-Phase 3 (original) narrative missed:+
  
 ==== Level 1 — decode → presentation buffer ==== ==== Level 1 — decode → presentation buffer ====
  
-The decoder either produces a //dmabuf fd// (visible as +The decoder either produces a //dmabuf fd// (visible as ''VIDIOC_EXPBUF'' followed by the fd flowing into the consumer) or it produces frames into //CPU-side anonymous memory// (visible as ''mmap(MAP_ANONYMOUS, ≥3 MB, ...)''). S1 and S5 do the former (**13 EXPBUFs** each). S2, S3, S4 do the latter (**1.3-2.5 GB total anon allocation**, of which the steady-state per-frame share is amortised across libavcodec's reused frame-buffer pool).
-''VIDIOC_EXPBUF'' followed by the fd flowing into the consumer) +
-or it produces frames into //CPU-side anonymous memory// (visible +
-as ''mmap(MAP_ANONYMOUS, ≥3 MB, ...)''). S1 and S5 do the former +
-(**13 EXPBUFs** each). S2, S3, S4 do the latter (**1.3-2.5 GB total +
-anon allocation**, of which the steady-state per-frame share is +
-amortised across libavcodec's reused frame-buffer pool).+
  
 ==== Level 2 — presentation buffer → compositor ==== ==== Level 2 — presentation buffer → compositor ====
  
-Once the presentation buffer exists, the consumer either passes a +Once the presentation buffer exists, the consumer either passes a //dmabuf fd to the compositor via the Wayland ''linux-dmabuf-v1'' protocol// (visible as ''sendmsg + SCM_RIGHTS'' on the Wayland socket, **0 DRM_IOCTL_***) or it walks the buffer through //Mesa GL+DRM// to build a GL texture and present that (visible as **thousands of DRM_IOCTL_***/sec).
-//dmabuf fd to the compositor via the Wayland ''linux-dmabuf-v1'' +
-protocol// (visible as ''sendmsg + SCM_RIGHTS'' on the Wayland socket, +
-**0 DRM_IOCTL_***) or it walks the buffer through //Mesa GL+DRM// to +
-build a GL texture and present that (visible as **thousands of +
-DRM_IOCTL_***/sec).+
  
-S1 is the only scenario reaching Level 2. Every other libavcodec or +S1 is the only scenario reaching Level 2. Every other path either stops at Level 1 (S5, the best case) or fails Level 1 entirely (S2, S3, S4, S6).
-libva-using path stops at Level 1 (best case, S5) or fails Level 1 +
-entirely (S2, S3, S4, S6).+
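
The two-level distinction can be expressed as a small decision rule over the strace-derived counts. This is a hedged sketch: ''zero_copy_level'' is a hypothetical helper, and apart from the 13 EXPBUFs reported for S1 and S5, the counts in the example calls are placeholders chosen to show each branch, not measured values.

```python
def zero_copy_level(expbuf, scm_rights_sends, drm_ioctls):
    """Classify a playback path by the boundary signals seen in its trace.

    0: no dmabuf export at the decode boundary (frames land in CPU memory)
    1: dmabuf export at decode, but presentation goes through GL/DRM
    2: dmabuf export at decode and fd handed to the compositor directly
    """
    if expbuf == 0:
        return 0
    if scm_rights_sends > 0 and drm_ioctls == 0:
        return 2
    return 1


# 13 EXPBUFs is the count reported for S1 and S5; other numbers are placeholders.
s1_like = zero_copy_level(expbuf=13, scm_rights_sends=40, drm_ioctls=0)
s5_like = zero_copy_level(expbuf=13, scm_rights_sends=40, drm_ioctls=50_000)
s2_like = zero_copy_level(expbuf=0, scm_rights_sends=40, drm_ioctls=50_000)

print(s1_like, s5_like, s2_like)
```

The rule requires zero DRM ioctls for Level 2 because fd passing alone does not discriminate: a GL-based consumer may also send fds over the Wayland socket while still routing frames through Mesa.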
  
 ==== Why this matters ==== ==== Why this matters ====
  
-Same decode (HW), same source clip, same compositor — but +Same decode (HW), same source clip, same compositor — but **S1 vs. S5 differ by**:
-**S1 vs. S5 differ by**:+
  
   * **3 800× fewer cache-misses** (2.1 M vs. 14.3 M)   * **3 800× fewer cache-misses** (2.1 M vs. 14.3 M)
Line 187: Line 119:
   * **5× lower CPU footprint** (7% vs. 38% paced)   * **5× lower CPU footprint** (7% vs. 38% paced)
  
-Going through Mesa's GL+DRM path for compositing **alone** costs ~30% +Going through Mesa's GL+DRM path for compositing **alone** costs ~30 percentage points of CPU (38% vs. 7%) on this hardware class compared with going through Wayland's ''linux-dmabuf-v1'' protocol directly. That gap exists even when Level 1 is solved. The "buffer-to-display without CPU copy" predicament Markus has been naming is //specifically about Level 2//.
-of CPU on this hardware class compared with going through Wayland's +
-''linux-dmabuf-v1'' protocol directly. That gap exists even when +
-Level 1 is solved. The "buffer-to-display without CPU copy" predicament +
-Markus has been naming is //specifically about Level 2//.+
  
 ===== 7. What this implies for the fix surface ===== ===== 7. What this implies for the fix surface =====
Line 197: Line 125:
 Re-stating the Phase 4 fix surfaces, ranked against this evidence: Re-stating the Phase 4 fix surfaces, ranked against this evidence:
  
-  - **A. Complete libva-v4l2-request multiplanar port** — lifts S6 +  - **A. Complete libva-v4l2-request multiplanar port** — lifts S6 (Brave) Level 1 only. Browser still composes via Chromium's GPU-process Mesa GL path; Level 2 stays. 
-    (Brave) Level 1 only. Browser still composes via Chromium's +  - **B. ''libavcodec drm_prime'' export to ''linux-dmabuf-v1''** — would lift Level 1 //and// Level 2 for libavcodec consumers (mpv, ffplay, VLC if linked against current libavcodec). Highest leverage on the libavcodec ecosystem. 
-    GPU-process Mesa GL path; Level 2 stays. +  - **C2. ''panvk-1.2-fakeshim'' Vulkan layer** — unblocks Vulkan-side consumers for Level 2 //if they use Vulkan-direct dmabuf-import + swapchain present// instead of GL. Doesn't help GL-anchored consumers (libplacebo's GL backend, SDL3 GL, Qt GL).
-  - **B. ''libavcodec drm_prime'' export to ''linux-dmabuf-v1''** — +
-    would lift Level 1 //and// Level 2 for libavcodec consumers +
-    (mpv, ffplay, VLC if linked against current libavcodec). Highest +
-    leverage on the libavcodec ecosystem. +
-  - **C2. ''panvk-1.2-fakeshim'' Vulkan layer** — unblocks Vulkan-side +
-    consumers for Level 2 //if they use Vulkan-direct dmabuf-import +
-    + swapchain present// instead of GL. Doesn't help GL-anchored +
-    consumers (libplacebo's GL backend, SDL3 GL, Qt GL).+
  
-The empirical rank: //B > A > C2// for Markus's stated use cases (Brave, +The empirical rank: //B > A > C2// for Markus's stated use cases (Brave, VS Code, web browsing). A lifts the highest-traffic individual workload (browser video decode); B lifts the most consumers across the libavcodec ecosystem with a single change at the right layer. C2 has a narrower lift but is the smallest engineering footprint and would serve as a feasibility vehicle.
-VS Code, web browsing). A lifts the highest-traffic individual workload +
-(browser video decode); B lifts the most consumers across the +
-libavcodec ecosystem with a single change at the right layer. C2 has a +
-narrower lift but is the smallest engineering footprint and would +
-serve as a feasibility vehicle.+
  
 ===== 8. Brave-specific gap acknowledgement ===== ===== 8. Brave-specific gap acknowledgement =====
  
-Five of six scenarios have validity-passing v2 strace + perf-stat data. +Five of six scenarios have validity-passing v2 strace + perf-stat data. Brave (S6) has only the earlier perf record (DSO/symbol attribution, 35 992 renderer + 9 289 GPU samples) plus the Brave subprocess CPU distribution captured 2026-05-01.
-Brave (S6) has only the earlier perf record (DSO/symbol attribution, +
-35 992 renderer + 9 289 GPU samples) plus the Brave subprocess CPU +
-distribution captured 2026-05-01.+
  
 What we //do// know about Brave: What we //do// know about Brave:
  
-  * Renderer: 71.5% of one core, 91.98% in ''brave'' (Chromium +  * Renderer: 71.5% of one core, 91.98% in ''brave'' (Chromium statically links libavcodec — invisible at DSO level but inside that 92%), 0.86% memcpy in renderer. 
-    statically links libavcodec — invisible at DSO level but inside +  * GPU process: 21.5% of one core, **17.26% libc (of which 12.92% memcpy)**, 9.25% libgallium, 23.41% kernel (DRM/dma-buf object-table churn). 
-    that 92%), 0.86% memcpy in renderer. +  * Per-frame DRM object allocation pattern in the GPU process (kernel-side ''kmem_cache_alloc_noprof'', ''objects_lookup'', ''vma_interval_tree_insert''). 
-  * GPU process: 21.5% of one core, **17.26% libc (of which 12.92% +  * VAAPI initialisation //fails// (''vaInitialize failed: unknown libva error''), confirming fourier's S4 finding — libva-v4l2-request is the chokepoint for browser HW decode.
-    memcpy)**, 9.25% libgallium, 23.41% kernel (DRM/dma-buf +
-    object-table churn). +
-  * Per-frame DRM object allocation pattern in the GPU process +
-    (kernel-side ''kmem_cache_alloc_noprof'', ''objects_lookup'', +
-    ''vma_interval_tree_insert''). +
-  * VAAPI initialisation //fails// (''vaInitialize failed: unknown +
-    libva error''), confirming fourier's S4 finding — libva-v4l2-request +
-    is the chokepoint for browser HW decode.+
  
 What we //don't// know (gap): What we //don't// know (gap):
  
-  * No v2 strace capture: the Brave-specific automation (fresh +  * No v2 strace capture: the Brave-specific automation (fresh isolated profile, autoplay file-URL) didn't reach video-decode steady state within the 12 s settle window in three retry attempts. Manual measurement (Markus opens the video) yielded perf record but not strace-from-start.
-    isolated profile, autoplay file-URL) didn't reach video-decode +
-    steady state within the 12 s settle window in three retry +
-    attempts. Manual measurement (Markus opens the video) yielded +
-    perf record but not strace-from-start.+
  
-The architectural picture for Brave is consistent with what perf shows +The architectural picture for Brave is consistent with what perf shows and with what fourier documented earlier (see fourier README L236-281): no HW decode (libva-v4l2-request multiplanar gap), SW decode in renderer's static ffmpeg, IPC via shared memory to GPU process, GPU process uploads to GL texture and composites. Both Level 1 and Level 2 are CPU-copy.
-and with what fourier documented earlier (see fourier README L236-281): +
-no HW decode (libva-v4l2-request multiplanar gap), SW decode in +
-renderer's static ffmpeg, IPC via shared memory to GPU process, GPU +
-process uploads to GL texture and composites. Both Level 1 and Level 2 +
-are CPU-copy.+
  
 ===== 9. Artefact references ===== ===== 9. Artefact references =====
  
-  * ''phase3/cross_player_perf_2026-04-30/'' — original perf record +  * ''phase3/cross_player_perf_2026-04-30/'' — original perf record DSO/symbol/callgraph for S1-S5 + Brave renderer/gpu (samples-based). 
-    DSO/symbol/callgraph for S1-S5 + Brave renderer/gpu (samples-based). +  * ''phase3/io_cache_2026-05-01/'' — v2 strace traces (full lifetime, widened filter) + perf-stat ''.perfstat'' files for S1-S5. 
-  * ''phase3/io_cache_2026-05-01/'' — v2 strace traces (full lifetime, +  * ''phase3/findings.md'' — the original Phase 3 narrative (Findings 1-6) plus the methodology corrections that led here. 
-    widened filter) + perf-stat ''.perfstat'' files for S1-S5. +  * ''phase3/research_2026-04-30_panvk_brokenness.md'' — the panvk/v7 Vulkan-API-version analysis (''PAN_I_WANT_A_BROKEN_VULKAN_DRIVER'' gate, ''apiVersion = 1.0.335'' wall against libplacebo's ≥1.2 minimum).
-  * ''phase3/findings.md'' — the original Phase 3 narrative (Findings +
-    1-6) plus the methodology corrections that led here. +
-  * ''phase3/research_2026-04-30_panvk_brokenness.md'' — the panvk/v7 +
-    Vulkan-API-version analysis (''PAN_I_WANT_A_BROKEN_VULKAN_DRIVER'' +
-    gate, ''apiVersion = 1.0.335'' wall against libplacebo's ≥1.2 +
-    minimum).+
   * ''phase3/INDEX.md'' — full evidence-file map per finding.   * ''phase3/INDEX.md'' — full evidence-file map per finding.
  
Line 271: Line 161:
 Saved to project memory (''~/.claude/projects/-home-mfritsche-src-ohm-gl-fix/memory/''): Saved to project memory (''~/.claude/projects/-home-mfritsche-src-ohm-gl-fix/memory/''):
  
-  * ''feedback_profile_dont_proxy.md'' — when locating cycles, run +  * ''feedback_profile_dont_proxy.md'' — when locating cycles, run ''perf''/''strace'', don't infer from program-self counters. 
-    ''perf''/''strace'', don't infer from program-self counters. +  * ''feedback_kpi_vs_detail_knowledge.md'' — before producing an artefact, check whether the facts in reach mandate the content. 
-  * ''feedback_kpi_vs_detail_knowledge.md'' — before producing an +  * ''feedback_measurement_archival.md'' — every probe writes to a named file in the campaign repo at run time. 
-    artefact, check whether the facts in reach mandate the content. +  * ''feedback_outscoping.md'' — for "find the gap" goals, the deliverable is the gap, never a workaround. 
-  * ''feedback_measurement_archival.md'' — every probe writes to a +  * ''feedback_pre_think_problem_space.md'' — slow-down requests are for territory mapping, not solution selection. 
-    named file in the campaign repo at run time. +  * ''feedback_ask_before_user_visible.md'' — when automation fails on shared user state, asking the user is cheaper than retrying.
-  * ''feedback_outscoping.md'' — for "find the gap" goals, the +
-    deliverable is the gap, never a workaround. +
-  * ''feedback_pre_think_problem_space.md'' — slow-down requests are +
-    for territory mapping, not solution selection. +
-  * ''feedback_ask_before_user_visible.md'' — when automation fails on +
-    shared user state, asking the user is cheaper than retrying.+
  
 ---- ----
  
-//Phase 3 (revised) ends here. Phase 4 ("the gap" structural +//Phase 3 (revised) ends here. Phase 4 ("the gap" structural documentation, with use-case scoping) is in [[ohm_gl_fix:phase4_2026-04-30]]; it predates this revised data but its fix-surface ranking is reinforced by §7 above.//
-documentation, with use-case scoping) is in +
-[[ohm_gl_fix:phase4_2026-04-30]]; it predates this revised data but its +
-fix-surface ranking is reinforced by §7 above.//+
  
ohm_gl_fix/phase3_revised_2026-05-01.1777627671.txt.gz · Last modified: by markus_fritsche