Revision comparison: bin [2026/04/18 21:02] by markus_fritsche ("2026-04-18 evening: tripwire session, register trajectories, 3 candidate bugs") vs. bin [2026/04/20 12:55] (current; external edit from 127.0.0.1).
====== Bin: Boot it Nicely (GenBook u-boot eDP upstream) ======

Goal: mainline u-boot on the CoolPi CM5 GenBook (RK3588) with a working eDP boot
display and internal keyboard, as the first RK3588 eDP bring-up in upstream
u-boot (not the Rockchip downstream fork).

**Target hardware:** ampere (CoolPi CM5 GenBook, RK3588 + 32 GB LPDDR5)
**Panel:** CSOT T9 SNE001BS2-2,
**Status 2026-04-17 evening:** v10 mainline u-boot trains the link at HBR×2, the panel reports IN_SYNC, and BIST bars display on the panel, so the DP TX and eDP panel are proven healthy. Pixels from our own framebuffer are still absent; the fault is narrowed to the VOP2 pixel-output chain or the content format upstream of the DP TX. Vendor ''

**For next session:** [[bin_debug_strategy|Dual-agent debug strategy]]: two independent AI agents drafted strategies with different top-3 bets. Overlap is a high-confidence signal; disagreement is where the most learning happens. **Start with the ''

===== Why =====

The vendor u-boot binary ("
its boot logo. Upstream u-boot has no RK3588 VOP2/eDP driver path at all.
The goal of project Bin is to produce a clean, upstream-submittable u-boot
that paints something on the eDP panel during boot, without pulling in
the vendor'

Secondary goals:
  * Prove that the upstream-clean boot chain (mainline u-boot + Collabora TF-A + upstream OP-TEE) can drive eDP.
  * Produce a patch series acceptable to u-boot-custodians (no "

===== What actually works as of 2026-04-16 =====

  * Full **DDR / TPL / SPL / BL31 / OP-TEE** chain, using stock Rockchip blob ''
  * **VOP2 probe** (clocks, PMU power domain, GRF) and full register-level init of VP2 + Cluster1 window + overlay mixer.
  * **eDP controller probe**: EDID read via AUX, DPCD capabilities read (rev 1.1, 2.7 Gbps HBR, 2 lanes).
  * **Link training** succeeds (CR + CE pass), and the training pattern is properly disabled at the end (was the critical "no pixels"
  * **Backlight on** (PWM6 + GPIO4_A3, via panel-simple).
  * **VOP2 state matches kernel live-dump perfectly**:
  * **eDP controller state matches kernel** after applying the config_video delta (VIDEO_CTL_2=0x10,
  * **vidconsole path live**: proven by instrumenting ''

===== What's still broken =====

**No visible pixels on the panel.** Everything upstream of the physical
signal chain is correct. The bug lives in territory that our direct-MMIO tools
don't reach:

  - **HDPTX1 PHY state**: 0xFED70000 reads back mostly zero via ''/
  - **CRU dclk_vp2 source mux**: for eDP on VP2, dclk_vp2 must be parented on the HDPTX PHY's recovered clock, not a CRU PLL. u-boot'
  - **VO1_GRF** (0xFD5AC000) has DP mux / HDPTX routing bits; the kernel writes them, we write none.
  - ''

===== The cfg_done dance: biggest recurring gotcha =====

**If a VOP2 register write reads back zero or doesn'
cfg_done.** This has burned us repeatedly across multiple sessions. Save
yourself the reboot cycles:

VOP2 has shadow (staging) and active (hardware-driving) register banks.
Writes land in the shadow bank; they commit to the active bank only when cfg_done fires.

  * ''
  * Required write: ''
  * Kernel has ''
  * Writes to **non-shadow** regs (CTRL1, DSP_ST, clock, mux) commit immediately and read back cleanly. Writes to **shadow** regs need cfg_done.
  * Debugging: add a readback right after the write; if it reads zero, add cfg_done and read back again.
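
The shadow/active behaviour and the debugging recipe above can be modelled in a few lines. This is a hypothetical toy, not VOP2 driver code: the class name, the offsets, and the assumption that a read always returns the active bank (as an MMIO readback does) are all illustrative.

<code python>
class ShadowedRegisterFile:
    """Toy model of a register block with shadow (staging) and active banks.

    Hypothetical: offsets listed in `shadowed` behave like VOP2 shadow
    registers; every other offset commits immediately. Reads always return
    the active bank, which is what a readback over MMIO shows.
    """

    def __init__(self, shadowed):
        self.shadowed = set(shadowed)
        self.shadow = {}
        self.active = {}

    def write(self, offset, value):
        if offset in self.shadowed:
            self.shadow[offset] = value  # staged until cfg_done
        else:
            self.active[offset] = value  # non-shadow: commits immediately

    def read(self, offset):
        return self.active.get(offset, 0)  # assume a reset default of 0

    def cfg_done(self):
        self.active.update(self.shadow)  # commit every staged write at once
        self.shadow.clear()


def write_and_verify(regs, offset, value):
    """Write, read back, and fire cfg_done only if the readback is wrong.

    Returns (landed, needed_cfg_done). A write of 0 cannot be told apart
    from "not committed yet" this way, just as with a real zero readback.
    """
    regs.write(offset, value)
    if regs.read(offset) == value:
        return True, False
    regs.cfg_done()
    return regs.read(offset) == value, True
</code>

With ''regs = ShadowedRegisterFile(shadowed={0x1300})'', a write to 0x1300 only lands after cfg_done, while a write to any other offset lands immediately; that asymmetry is exactly the "reads back zero" symptom described above.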
| + | |||
| + | ===== Critical bugs caught this session (2026-04-16) ===== | ||
| + | |||
| + | Seven fixes, all of which are **prerequisites** for any future display work | ||
| + | — any upstream u-boot VOP2/eDP driver will need these: | ||
| + | |||
| + | - **Explicit per-VP cfg_done write.** CFG_DONE_IMD does NOT latch window shadow registers. Must write '' | ||
| + | - **DP_TRAINING_PATTERN_DISABLE at end of channel equalization.** '' | ||
| + | - **DPCD setup before training.** Missing spec-required writes: ML_CH_CODING_SET=1 (ANSI 8B/10B), DOWNSPREAD_CTRL to match SSC capability, enhanced-frame bit in LANE_COUNT_SET when sink supports it. | ||
| + | - **eDP config_video delta matching kernel.** Our config_video wasn't writing VIDEO_CTL_2=0x10, | ||
| + | - **VP2_DSP_CTRL=0x1000000f** — kernel uses OUT_MODE=AAAA (0xf) + bit 28 set. Briefly tried S888 (0x8), wrong tree; stay with AAAA + bit 28. | ||
| + | - **OVL_CTRL bit 31 for immediate latch.** See cfg_done dance callout. | ||
| + | - **VOP_GRF_CON2 bit layout corrected.** Our original '' | ||
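
The DPCD-related fixes in this list can be sketched as a pure function that computes the (address, value) pairs to write. The addresses and bit values are the standard DisplayPort DPCD ones (Linux uses the same names in drm_dp.h); the function names and the ''caps'' dict are hypothetical, and nothing here talks to a real AUX channel.

<code python>
# Standard DPCD addresses / bits (DisplayPort spec; same names as Linux drm_dp.h).
DP_MAX_LANE_COUNT = 0x002
DP_ENHANCED_FRAME_CAP = 1 << 7
DP_MAX_DOWNSPREAD = 0x003
DP_MAX_DOWNSPREAD_0_5 = 1 << 0
DP_LINK_BW_SET = 0x100
DP_LANE_COUNT_SET = 0x101
DP_LANE_COUNT_ENHANCED_FRAME_EN = 1 << 7
DP_TRAINING_PATTERN_SET = 0x102
DP_TRAINING_PATTERN_DISABLE = 0x00
DP_DOWNSPREAD_CTRL = 0x107
DP_SPREAD_AMP_0_5 = 1 << 4
DP_MAIN_LINK_CHANNEL_CODING_SET = 0x108
DP_SET_ANSI_8B10B = 1 << 0

DP_LINK_BW_2_7 = 0x0A  # HBR, 2.7 Gbps per lane


def pre_training_writes(caps, link_bw, lanes):
    """(address, value) pairs to write before clock recovery starts.

    `caps` maps DPCD receiver-capability addresses to bytes read from the sink.
    """
    lane_set = lanes & 0x1F
    if caps[DP_MAX_LANE_COUNT] & DP_ENHANCED_FRAME_CAP:
        lane_set |= DP_LANE_COUNT_ENHANCED_FRAME_EN
    # Request 0.5% down-spread only if the sink supports it (and, on real
    # hardware, only if the source PHY actually runs with SSC enabled).
    spread = DP_SPREAD_AMP_0_5 if caps[DP_MAX_DOWNSPREAD] & DP_MAX_DOWNSPREAD_0_5 else 0
    return [
        (DP_LINK_BW_SET, link_bw),
        (DP_LANE_COUNT_SET, lane_set),
        (DP_DOWNSPREAD_CTRL, spread),
        (DP_MAIN_LINK_CHANNEL_CODING_SET, DP_SET_ANSI_8B10B),
    ]


def post_training_writes():
    """After channel equalization passes: stop sending training patterns."""
    return [(DP_TRAINING_PATTERN_SET, DP_TRAINING_PATTERN_DISABLE)]
</code>

For the 2.7 Gbps / 2-lane sink described in "What actually works", ''pre_training_writes(caps, DP_LINK_BW_2_7, 2)'' yields LANE_COUNT_SET = 0x82 when the sink advertises enhanced framing. Keeping the training-pattern-disable write in its own function mirrors the fact that it must happen after channel equalization, not before.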
| + | |||
| + | ===== Method: kernel live-dump as oracle ===== | ||
| + | |||
| + | The productive move of this session was switching from reading downstream | ||
| + | vendor u-boot source (which was " | ||
| + | explanations) to **reading live register state from the running kernel** | ||
| + | while SDDM was displaying. That's ground truth — whatever bits are set | ||
| + | when pixels reach the panel, those are the bits you need. | ||
| + | |||
| + | <code bash> | ||
| + | ssh ampere 'sudo python3 <<EOF | ||
| + | import mmap, struct, os | ||
| + | fd = os.open("/ | ||
| + | mm = mmap.mmap(fd, | ||
| + | print(hex(struct.unpack("< | ||
| + | EOF' | ||
| + | </ | ||
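
A self-contained sketch of the same live-dump idea: map the page that contains a physical address and unpack one little-endian 32-bit word. It assumes root, a 4-byte-aligned address, and a kernel that permits ''/dev/mem'' access to the MMIO range (CONFIG_STRICT_DEVMEM / CONFIG_IO_STRICT_DEVMEM can refuse it); ''read32'' is a name chosen here, not an existing tool.

<code python>
import mmap
import os
import struct


def read32(path, phys_addr):
    """Return the 32-bit little-endian word at `phys_addr` in `path`.

    With path="/dev/mem" this reads physical memory / MMIO (root needed).
    mmap offsets must be page-aligned, so map the containing page and
    index into it.
    """
    page = mmap.PAGESIZE
    base = phys_addr & ~(page - 1)
    offset = phys_addr - base
    # O_SYNC is the flag devmem-style tools conventionally pass for MMIO.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_SYNC", 0))
    try:
        with mmap.mmap(fd, page, mmap.MAP_SHARED, mmap.PROT_READ, offset=base) as mm:
            return struct.unpack_from("<I", mm, offset)[0]
    finally:
        os.close(fd)
</code>

On the board this would be called as ''hex(read32("/dev/mem", 0xFDD90000 + reg_offset))'' for VOP2 or with the eDP base 0xFDED0000, where ''reg_offset'' is whatever register you are checking.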
| + | |||
| + | Works for VOP2 (0xFDD90000) and the eDP controller (0xFDED0000). Does **not** | ||
| + | work for the HDPTX PHY (0xFED70000, | ||
| + | those show mostly zeros even when active. Next session: kernel module to | ||
| + | dump those properly. | ||
| + | |||
| + | ===== Artifacts ===== | ||
| + | |||
| + | ^ Where ^ What ^ | ||
| + | | '' | ||
| + | | '' | ||
| + | | '' | ||
| + | | '' | ||
| + | | '' | ||
| + | | '' | ||
| + | |||
===== Flash pipeline =====

The board is alive, so flash from running Linux (fastest):

<code bash>
scp boltzmann:
ssh ampere 'sudo dd if=/
sudo flashcp --partition /
# Reboot when YOU are ready.
</code>
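
A cheap post-flash check is to read the flashed region back and compare digests with the image; the same helper produces the sha256 values whose prefixes the reference-image table records. This is a hedged sketch: it assumes the region has already been dumped to a file (for example with dd from the MTD device), and the helper names are hypothetical.

<code python>
import hashlib
import os


def sha256_of(path, limit=None, chunk_size=1 << 20):
    """Hex sha256 of a file, or of only its first `limit` bytes."""
    digest = hashlib.sha256()
    remaining = limit
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            size = chunk_size if remaining is None else min(chunk_size, remaining)
            block = f.read(size)
            if not block:
                break
            digest.update(block)
            if remaining is not None:
                remaining -= len(block)
    return digest.hexdigest()


def flash_matches(image_path, readback_path):
    """True if the readback dump starts with exactly the bytes of the image."""
    n = os.path.getsize(image_path)
    if os.path.getsize(readback_path) < n:
        return False
    return sha256_of(image_path) == sha256_of(readback_path, limit=n)
</code>

Hashing only the first ''len(image)'' bytes of the dump matters because a partition readback is usually larger than the image, and the tail is erased flash.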
| + | |||
| + | If bricked: maskrom-mode recovery via meitner + rkdeveloptool ('' | ||
| + | → '' | ||
| + | |||
| + | ===== Build recipe (stock fast DDR blob — mandatory) ===== | ||
| + | |||
| + | <code bash> | ||
| + | cd ~/ | ||
| + | BL31=/ | ||
| + | TEE=/ | ||
| + | ROCKCHIP_TPL=/ | ||
| + | CC=" | ||
| + | </ | ||
| + | |||
| + | **Do not** use the decompiled-and-patched '' | ||
| + | — it bricks ampere (DDR training fails all channels/ | ||
| + | stock rkbin blob is known-good. | ||
| + | |||
| + | ===== Next session ===== | ||
| + | |||
| + | - Write a kernel module that dumps HDPTX1 PHY + CRU dclk_vp2 mux + VO1_GRF full state while display works. ''/ | ||
| + | - Hook '' | ||
| + | - Diff kernel vs u-boot for VO1_GRF + CRU dclk_vp2; write our init to match. | ||
| + | - Fallback: HDMI output path (simpler protocol) to prove VOP2 itself works, then come back to eDP. | ||
| + | |||
| + | |||
===== 2026-04-17 evening: empirical disambiguation =====

Sessions 1–4 were heuristic register matching against the kernel live-dump. Tonight
was empirical disambiguation:
what it actually sees.

==== u-boot eDP campaign: v10 ====

Branch ''
fixes on top of the earlier VOP2/eDP work:

  * ''
  * Double-beat ''
  * ''
  * DPCD enhanced-frame + downspread + training-pattern-disable writes completed.
  * DPCD ''
  * Alpha=0 bug in the u-boot stripe paint: vendor clusters blend incoming pixels
against ''
field regardless of colour. Fixed in v10.
  * ''
fixed by pre-selecting ''
clock driver takes the retune path instead of picking the nearest matching divider
on the default ancestor.
  * VOP MMU disable hypothesis explored; the MMU was already bypassed by reset. The commit was
reverted; only the diagnostic code remains.

==== What this unlocked ====

  * Link trains at HBR×2 without errors.
  * Panel DPCD ''
  * Backlight on.
  * **BIST colour bars display** on the panel (PHY-internal pattern generator driving
the main link). DP TX, PHY, cable, panel, and backlight are all healthy.
  * **Own framebuffer still produces no visible pixels.** Whatever is wrong is
upstream of DP TX: either the VOP2 pixel-output chain or the content format
we hand to the eDP controller.

==== Vendor u-boot detour (closes vendor-knows-how) ====

To rule out that we were missing a vendor-secret step, we built vendor ''
source:

  * ''
  * Source: vendor ''
  * Extlinux.conf auto-rewrite hack removed in both ''
''
the runtime distro ''/
  * Built + flashed ''
HDMI.** Same symptom as our own u-boot.
  * A web-research agent surfaced the likely reason: vendor u-boot uses
''
''
include that partition. The Rockchip wiki explicitly states that logo support is
Android-only:
vendor image does not light eDP on this reference board either.
  * Rebuilt on ''
fixes over Jan-2026 ''
FIT version mismatch between the vendor-baseline idblock and the rkr5 u-boot.img.
  * Panel SKU fork noted: BOE NV140FHM-N42 / N61 / N66 across hardware revisions.
Vendor u-boot panel timings may not match our specific SKU even with a working
logo path.

**Conclusion:
vendor-knows-how assumption is dead.

==== Flash protocol confirmed ====

  * ''
userland is reliable across iterations. Default path.
  * ''
Multiple consecutive ''
failures. Recovery path validated: ''
→ wl 0 <
  * SPI layout on RK3588 GenBook (correction from earlier notes):
    * idblock at 0x8000 (~208 KB)
    * u-boot FIT at 0x60000
    * The earlier 0x200000 value was wrong; that is
''
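
Given the corrected layout, a full-flash image can be assembled by placing the idblock and the u-boot FIT at their fixed offsets in an erased (0xFF) buffer, refusing to proceed if the idblock would run into the FIT. A sketch only: the function name is hypothetical, and the flash size and file I/O are left to the caller.

<code python>
IDBLOCK_OFFSET = 0x8000     # from the SPI layout above
UBOOT_FIT_OFFSET = 0x60000  # from the SPI layout above


def build_spi_image(idblock: bytes, fit: bytes, flash_size: int) -> bytes:
    """Place idblock and u-boot FIT in an erased (0xFF) flash-sized buffer."""
    if IDBLOCK_OFFSET + len(idblock) > UBOOT_FIT_OFFSET:
        raise ValueError("idblock would overlap the u-boot FIT")
    if UBOOT_FIT_OFFSET + len(fit) > flash_size:
        raise ValueError("u-boot FIT does not fit in the flash")
    image = bytearray(b"\xff" * flash_size)  # erased NOR reads back as 0xFF
    image[IDBLOCK_OFFSET:IDBLOCK_OFFSET + len(idblock)] = idblock
    image[UBOOT_FIT_OFFSET:UBOOT_FIT_OFFSET + len(fit)] = fit
    return bytes(image)
</code>

With a ~208 KiB idblock, its end lands at about 0x3C000, comfortably below 0x60000; the overlap check is what would have flagged a wrong FIT offset before flashing.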
| + | |||
| + | ==== Reference images stashed ==== | ||
| + | |||
| + | ^ Where ^ What ^ sha256 (prefix) ^ | ||
| + | | '' | ||
| + | | '' | ||
| + | | '' | ||
| + | |||
| + | ==== Updates to the Next-session list ==== | ||
| + | |||
| + | Supersedes the earlier clk_summary / HDPTX trace checklist on this page — those | ||
| + | checks have been run; see [[bin_debug_strategy|the debug-strategy page]] for the | ||
| + | Closed / Reopened update. | ||
| + | |||
| + | * Maybe extract the separate '' | ||
| + | builds alongside the 492 KB download-boot loader, then rebuild the rkr5 stack | ||
| + | end-to-end so idblock + u-boot FIT versions line up. | ||
| + | * Try '' | ||
| + | '' | ||
| + | that by surprise. | ||
| + | * Alternatively accept that pixels-in-u-boot is not a today problem; the kernel | ||
| + | stack works cleanly on v10 for development use. | ||
| + | |||
===== 2026-04-18 evening: tripwire session, register trajectories captured =====

Bench session. We stopped heuristic matching and built an actual instrument: every
<
shared DDR region. Diff the two traces and the bugs fall out.

==== Infrastructure built ====

  * <
kernel (<
DDR region at phys <
maps it as general memory.
  * Every <
  * Offline C dumper at <
  * Bench runbook at <
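
A sketch of what an offline dumper for such a ring might look like, in Python rather than C. The 24-byte record layout (sequence number, physical address, value, read/write flag) is an assumption made for illustration; it is not the format the real C dumper uses.

<code python>
import struct

# Hypothetical record layout (assumption, not the real tripwire format):
# u64 seq, u64 phys_addr, u32 value, u8 op (0 = read, 1 = write), 3 pad bytes.
RECORD = struct.Struct("<QQIB3x")


def parse_ring(buf):
    """Yield (seq, phys_addr, value, "R"/"W"), stopping at the first empty slot."""
    usable = len(buf) - len(buf) % RECORD.size
    for seq, addr, value, op in RECORD.iter_unpack(memoryview(buf)[:usable]):
        if seq == 0:  # assumption: seq starts at 1 and unused slots are zeroed
            break
        yield seq, addr, value, "W" if op else "R"


def format_record(rec, base=0):
    """One line per access, with the address shown relative to a block base."""
    seq, addr, value, op = rec
    return f"{seq:8d} {op} +0x{addr - base:05x} = 0x{value:08x}"
</code>

Printing the u-boot and kernel rings with ''base=0xFDD90000'' gives two VOP2-relative listings that a plain text diff can compare line by line.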
| + | |||
| + | ==== Three phases of the evening ==== | ||
| + | |||
| + | - **Phase 1:** < | ||
| + | - **Phase 2:** same u-boot, armed tripwire at runtime, reloaded < | ||
| + | - **Phase 3:** u-boot with full VP2+eDP init AND tripwire armed from first < | ||
| + | |||
| + | ==== Concrete register divergences found ==== | ||
| + | |||
| + | Three bit-level diffs between the u-boot writes and the kernel writes: | ||
| + | |||
| + | ^ Register ^ Our u-boot ^ Kernel ^ Analysis ^ | ||
| + | | < | ||
| + | | < | ||
| + | | < | ||
| + | |||
| + | ==== Confirmations (values that match kernel — NOT bugs) ==== | ||
| + | |||
| + | * < | ||
| + | * VP2 post-config block < | ||
| + | |||
| + | ==== Secondary observation ==== | ||
| + | |||
| + | With Phase 2 u-boot (no VP2 init), the **" | ||
| + | |||
| + | ==== Next steps ==== | ||
| + | |||
| + | Fix the three divergences in u-boot, reflash, observe. If they close the "no pixels" | ||
| + | |||
| + | ==== Artifacts ==== | ||
| + | |||
| + | ^ Where ^ What ^ | ||
| + | | < | ||
| + | | < | ||
| + | | < | ||
| + | |||
===== 2026-04-19: Phases 4-20, false trails and the Phase 20 verdict =====

Bench session: four hours of chained hypotheses against a 20-phase reboot
cycle. The key takeaway, now in memory: **observation beats theory in register-level
reverse engineering**.
memory]].

==== The bit-31 detour ====

Phase 4 theorised that <
flag and cleared it. No readback verified the theory; no tripwire recorded
the bit's effect. The theory was wrong: per TRM Part 2 Chapter 7 §VOP2_OVERLAY_PORT_SEL,
<
field (pick which VP cfg_done commits the LAYER_SEL register). Value
<

Result: Cluster + LAYER_SEL writes were silently dropped for phases 5
through 12, and every subsequent register tweak appeared to fail for
"
bench time. Phase 13 caught the regression only because the tripwire captured
the actual readback (<
defaults) alongside the intended writes; the discrepancy was the smoking
gun.

**Rule now in memory**: before committing a register fix, write down the
expected post-fix readback value. If you cannot name one, the hypothesis
is not falsifiable. After the fix, read back. If the readback does not match
the expected value, the fix did not land; do not move on.
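
The rule can be made mechanical: record the expected readback before the fix, then compare it with the state dump after reflashing. A minimal sketch; the ''{offset: value}'' dict shape is an assumption, and the example value reuses CLUSTER1_CTRL(0x1300) from the Phase 20 dump on this page.

<code python>
def check_expectations(expected, readback):
    """Compare expected post-fix register values with an actual readback.

    expected / readback: {offset: value}. Returns human-readable mismatches;
    an empty list means every fix landed.
    """
    if not expected:
        raise ValueError("no expected readback written down: hypothesis is not falsifiable")
    problems = []
    for offset, want in sorted(expected.items()):
        got = readback.get(offset)
        if got is None:
            problems.append(f"0x{offset:04x}: not in readback")
        elif got != want:
            problems.append(f"0x{offset:04x}: expected 0x{want:08x}, read 0x{got:08x}")
    return problems
</code>

Refusing an empty expectation set is the code form of "if you cannot name one, the hypothesis is not falsifiable".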
| + | |||
| + | ==== Phase 19 to Phase 20: the cluster swap ==== | ||
| + | |||
| + | Phases 13 through 19 refixed OVL_CTRL bit 31 and iterated on Cluster0 | ||
| + | plane writes (CLUSTER0_CTRL at 0x1100 = < | ||
| + | < | ||
| + | registers never latched (readback stayed at reset defaults) no matter | ||
| + | what load-enable or frame-delay we tried. | ||
| + | |||
| + | The wall: **decoding PORT_SEL**. Readback value < | ||
| + | decodes via TRM Part 2 Chapter 7 §VOP2_OVERLAY_PORT_SEL as: | ||
| + | |||
| + | ^ Bit range ^ Field ^ Value ^ Meaning ^ | ||
| + | | 17:16 | cluster0_sel_port | 00 | VP0 (no active output) | | ||
| + | | 19:18 | cluster1_sel_port | 10 | VP2 (eDP, active) | | ||
| + | |||
| + | Cluster0 shadow commits on VP0 vsync which never fires — hence writes | ||
| + | never land. Cluster1 shadow commits on VP2 vsync, which IS firing. The | ||
| + | kernel uses Cluster1 for the VP2 primary plane; we had been writing | ||
| + | Cluster0 for phases 12 through 19. | ||
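
Decoding such a readback is plain bit slicing. A sketch that decodes only the two fields in the table above; treating the 2-bit field as a VP index for the codes the table does not list (01, 11) is an assumption, and the function names are hypothetical.

<code python>
def bits(value, hi, lo):
    """Extract bit field value[hi:lo] (inclusive)."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_port_sel(value):
    # 2-bit field read as a VP index: 00 = VP0, 10 = VP2 (per the table);
    # 01 = VP1 and 11 = VP3 are assumptions.
    return {
        "cluster0_sel_port": f"VP{bits(value, 17, 16)}",
        "cluster1_sel_port": f"VP{bits(value, 19, 18)}",
    }
</code>

''decode_port_sel(0b10 << 18)'' reports Cluster0 on VP0 and Cluster1 on VP2, the combination in the table. Running this on the live readback before choosing a cluster is the kind of check that points at the cluster whose commit VP is actually producing vsyncs.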
| + | |||
| + | Phase 20 reverts: base < | ||
| + | < | ||
| + | |||
| + | ==== Phase 20 verdict — writes finally land ==== | ||
| + | |||
| + | UART state dump after Phase 20 trace-replay init: | ||
| + | |||
| + | < | ||
| + | CLUSTER1 CTRL0 = 0x00000001 | ||
| + | CLUSTER1 YRGB = 0xef700000 | ||
| + | CLUSTER1 VIR = 0x00000780 | ||
| + | CLUSTER1 ACT = 0x0437077f | ||
| + | CLUSTER1_CTRL(0x1300) = 0x80004001 | ||
| + | </ | ||
| + | |||
| + | Every plane register holds our value on readback — plane is committed, | ||
| + | VP2 is running. Prior phases had all these same offsets reading reset | ||
| + | defaults. | ||
| + | |||
| + | But panel stays dark (see webcam frames below). Kernel dmesg shows | ||
| + | < | ||
| + | (486k callbacks suppressed per 5 seconds). The kernel uses Cluster1 too, | ||
| + | so this is NOT a u-boot-only bug: the VP2 post-scaler is starved on a | ||
| + | path shared with kernel-side DRM. | ||
| + | |||
| + | ===== The real bug: runtime PM underflow at handover ===== | ||
| + | |||
| + | < | ||
| + | |||
| + | < | ||
| + | 23:26:43 platform fdd90000.vop: | ||
| + | 23:26:48 rockchip-vop2 fdd90000.vop: | ||
| + | (x17 in immediate succession) | ||
| + | 23:26:48 rockchip-drm display-subsystem: | ||
| + | irq err at vp2 (then vop2_isr: 486859 callbacks suppressed) | ||
| + | </ | ||
| + | |||
| + | Mechanism: u-boot leaves VP2 powered + clocked, but the kernel PM | ||
| + | framework starts every device at < | ||
| + | When supplier links (IOMMU group 5, power-domain) run their init, they | ||
| + | call < | ||
| + | (was 0, went negative 17 times). Warning is non-fatal. | ||
| + | |||
| + | Later, < | ||
| + | < | ||
| + | is already resumed (counter not zero after the underflow math settles), | ||
| + | < | ||
| + | is where the driver does its full clock re-gate, reset-toggle, | ||
| + | re-program sequence. Modeset proceeds with VP2 in the state u-boot left | ||
| + | it — one or more sub-steps of the resume path skipped — post-scaler | ||
| + | starves and POST_BUF_EMPTY fires at every vblank. | ||
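
The crux of this mechanism is that a runtime-PM "get" only runs the driver's resume callback when the PM core believes the device is suspended; Linux's rpm_resume() returns early for a device already marked active. A toy model of that property (class and field names are hypothetical, and the underflow handling here simply refuses to drop below zero and counts warnings; the real kernel logic differs in detail):

<code python>
class ToyRuntimePM:
    """Toy runtime-PM bookkeeping for one device (not kernel code)."""

    def __init__(self, status):
        self.status = status  # "active" or "suspended", as the PM core sees it
        self.usage = 0
        self.underflow_warnings = 0
        self.resume_calls = 0

    def put(self):
        if self.usage == 0:
            self.underflow_warnings += 1  # non-fatal warning; count unchanged
            return
        self.usage -= 1

    def get_sync(self):
        self.usage += 1
        if self.status == "active":
            return  # resume callback skipped: hardware keeps its old state
        self.resume_calls += 1  # clock re-gate, reset toggle, re-program
        self.status = "active"
</code>

A device created as ''"suspended"'' gets exactly one resume on its first ''get_sync()''. One created as ''"active"'' after 17 unbalanced ''put()'' calls logs 17 warnings and never runs resume, which is the handover shape this section hypothesises.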
| + | |||
| + | The physical fb probe confirms pixels did reach memory: | ||
| + | |||
| + | * fb_peek.ko kernel module, native-built on boltzmann, memremap of | ||
| + | < | ||
| + | * Rows 0 / 300 / 540 / 800 read **all zeros** — u-boot stripe paint | ||
| + | for the top two thirds was overwritten by kernel takeover (fbcon or | ||
| + | DRM alloc, since we reserve no memory for the fb). | ||
| + | * Row 1070 reads < | ||
| + | (alpha=0xFF, | ||
| + | bottom third. | ||
| + | |||
| + | So the panel IS displaying what u-boot wrote (mostly black with a blue | ||
| + | band at the bottom), latched by the panel internal scanout memory at | ||
| + | the moment VP2 got PM-locked. eDP panels cache their last received | ||
| + | frame; that is what the webcam sees. | ||
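
The row readings can be interpreted with a little arithmetic. A sketch assuming a packed 1920x1080 buffer at 4 bytes per pixel with no row padding (stride 7680 bytes) and XRGB/ARGB8888 words stored little-endian, as DRM's XR24 format does (bytes B, G, R, X in memory). Reading the Phase 20 VIR value 0x780 = 1920 as a stride in 32-bit words is consistent with this, but it is an assumption.

<code python>
import struct

WIDTH, HEIGHT, BPP = 1920, 1080, 4
STRIDE = WIDTH * BPP  # assumption: no row padding


def pixel_word(fb, row, col, stride=STRIDE):
    """32-bit pixel at (row, col) from a raw little-endian XRGB/ARGB8888 buffer."""
    return struct.unpack_from("<I", fb, row * stride + col * BPP)[0]


def argb(word):
    """Split a 0xAARRGGBB word into (A, R, G, B)."""
    return (word >> 24) & 0xFF, (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF
</code>

''argb(0xFF0000FF)'' is ''(255, 0, 0, 255)'': fully opaque, no red, no green, full blue, matching the blue band. Under these assumptions, row 1070 starts 1070 × 7680 = 8,217,600 bytes into the buffer.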
| + | |||
| + | DRM state at quiescence: < | ||
| + | shows < | ||
| + | with an Xorg-allocated XR24 framebuffer (fb=90, 1920x1080), yet | ||
| + | < | ||
| + | < | ||
| + | POST_BUF_EMPTY cascade. | ||
| + | |||
| + | ==== The three paths forward ==== | ||
| + | |||
| + | * **(A) Tear-down path** — before < | ||
| + | returns, walk VP2 all the way back down: STANDBY bit, CRU re-gate, | ||
| + | PMU bus-idle, PMU power-off, release u-boot PD. Kernel probes a | ||
| + | clean SUSPENDED device, resume callback runs normally. | ||
| + | likely will NOT persist visually (eDP blanks within a frame of | ||
| + | signal loss, unless the panel supports PSR) but handover is clean. | ||
| + | * **(B) State-match path** — leave VP2 running in exactly the state | ||
| + | kernel early-probe expects. Map every register the kernel reads at | ||
| + | probe, diff vs what u-boot leaves, fix discrepancies. Probably a | ||
| + | dead end. | ||
| + | * **(C) NOINIT + text splash** — stay on | ||
| + | < | ||
| + | splash, kernel does all VP2 setup cleanly. Lowest risk, least | ||
| + | impressive outcome. | ||
| + | |||
| + | Phase 21 pursues (A). Kconfig gate: < | ||
| + | lives at < | ||
| + | < | ||
| + | sequence in reverse. Verdict metric: < | ||
| + | | grep underflow</ | ||
| + | |||
===== Webcam setup: visual verification rig =====

Eyedot USB camera on meitner, pointed at the ampere panel. Framing
convention:

  * **One third desk, two thirds screen**: camera tilted so that the bottom
third of the frame shows the keyboard / desk surface (reference for
"
  * Default capture script: <
/
1 fps into H.264, auto-stops 5 seconds after UART shows
<
  * ffmpeg signalstats is unreliable on JPEGs from this camera; use a
Python one-liner on raw pixel data for luma histograms.
  * Dim content on a dark panel is hard to see raw. Standard enhance
pipeline: <
eq=brightness=0.1:
<
focus on panel content.
  * Trap: camera auto-exposure skews dark and bright regions. The luma
average over the whole frame is dominated by the bright environment;
always crop to the panel before averaging.
  * Another trap: a stationary "dark blob" in the top-middle of frames
is the camera head's shadow on the panel, not displayed content. If it
shows up in the same position across multiple frames, it is not pixels.
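
The "crop first, then average" rule and the luma histograms can be done without ffmpeg. A sketch over an already-decoded RGB frame (a list of rows of (R, G, B) tuples; JPEG decoding is left out), using BT.601 luma weights as a conventional choice. The crop box would be the panel rectangle, measured once for the fixed camera position.

<code python>
def luma(r, g, b):
    """BT.601 luma (0..255) from 8-bit RGB."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def crop(frame, box):
    """frame: list of rows of (r, g, b); box: (x0, y0, x1, y1), half-open."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in frame[y0:y1]]


def mean_luma(frame, box):
    pixels = [p for row in crop(frame, box) for p in row]
    if not pixels:
        raise ValueError("empty crop box")
    return sum(luma(*p) for p in pixels) / len(pixels)


def luma_histogram(frame, box):
    """256-bin histogram of rounded luma values inside the crop box."""
    hist = [0] * 256
    for row in crop(frame, box):
        for p in row:
            hist[min(round(luma(*p)), 255)] += 1
    return hist
</code>

Averaging only inside the panel box keeps the bright desk and room from dominating the number, which is the auto-exposure trap described above.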
| + | |||
| + | Artifacts: frames at < | ||
| + | videos at < | ||
| + | yielded 303 frames over around 5 minutes (reboot to sddm). Phase 20 | ||
| + | panel-view crops show dim navy blue bottom two thirds, lighter top | ||
| + | third — consistent with "top 2/3 black (zeros) + bottom 1/3 blue | ||
| + | (0xFF0000FF)" | ||
| + | |||
| + | ===== Related memory files ===== | ||
| + | |||
| + | * < | ||
| + | findings for 2026-04-18 tripwire plus 2026-04-19 Phase 20 plus | ||
| + | Phase 21 direction | ||
| + | * < | ||
| + | detour memorialised | ||
| + | * < | ||
| + | during visual test memorialised | ||
| + | * < | ||
| + | 42C3 talk proposal | ||
| + | |||
| + | ===== 2026-04-20 — Phase 21 failed, trace-diff, VP0 theory reinstated ===== | ||
| + | |||
| + | **Phase 21 (A) failed on both axes**: the BIN_VP2_TEARDOWN code ran | ||
| + | correctly (UART: < | ||
| + | PD_VOP and PD_VO1 gated off), but the kernel still got exactly 17 | ||
| + | < | ||
| + | with init SIGSEGV because < | ||
| + | VP2 registers on a powered-off domain and AXI hangs. | ||
| + | a boot loop; recovered via maskrom + < | ||
| + | → cs 9 → wl 0 phase20.bin → rd</ | ||
| + | |||
| + | Two implications: | ||
| + | |||
| + | * The 17 underflows are **not** caused by u-boot leaving VP2 active. | ||
| + | They fire regardless of PMU state. | ||
| + | target. | ||
| + | * You cannot power-gate VP2 before kernel boot. Kernel expects | ||
| + | register readability at probe. | ||
| + | or never turn it on (Phase 1 NOINIT). | ||
| + | |||
| + | ==== Trace diff on < | ||
| + | |||
| + | 4.3 M-record capture from 2026-04-18. | ||
| + | (< | ||
| + | aggressive dedup. | ||
| + | |||
| + | ^ Metric ^ u-boot ^ kernel ^ | ||
| + | | unique offsets touched | 54 | 1118 | | ||
| + | | total accesses | 87 | 24 614 | | ||
| + | |||
| + | Kernel-only offsets (we never write): 1072. Most are per-vblank | ||
| + | IRQ status/ | ||
| + | < | ||
| + | expected maintenance traffic, not missing init. | ||
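
The numbers in this table come from comparing offset sets between the two traces. A sketch of that comparison, assuming each trace has already been reduced to (op, offset, value) tuples relative to the VOP2 base; the "final written value differs" check is one simple way to surface value divergences, not necessarily how the ranking that follows was produced.

<code python>
from collections import Counter


def trace_summary(trace):
    """trace: iterable of (op, offset, value) with op in {"R", "W"}."""
    trace = list(trace)
    return {"unique_offsets": len({off for _, off, _ in trace}), "accesses": len(trace)}


def final_writes(trace):
    last = {}
    for op, off, value in trace:
        if op == "W":
            last[off] = value
    return last


def diff_traces(ours, theirs):
    ours, theirs = list(ours), list(theirs)
    our_offs = {off for _, off, _ in ours}
    their_offs = {off for _, off, _ in theirs}
    ow, tw = final_writes(ours), final_writes(theirs)
    value_diffs = {off: (ow[off], tw[off]) for off in ow.keys() & tw.keys() if ow[off] != tw[off]}
    return {
        "theirs_only": sorted(their_offs - our_offs),
        "ours_only": sorted(our_offs - their_offs),
        "value_diffs": dict(sorted(value_diffs.items())),
    }


def hottest(trace, n=5):
    """Most-accessed offsets: typically per-vblank IRQ/status maintenance traffic."""
    return Counter(off for _, off, _ in trace).most_common(n)
</code>

Filtering the ''theirs_only'' list through ''hottest()'' separates the high-frequency maintenance offsets from the one-shot writes worth chasing.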
| + | |||
| + | The **one-shot divergences that actually matter**, ranked by likely | ||
| + | impact on post-scaler starvation: | ||
| + | |||
| + | - < | ||
| + | < | ||
| + | and bit 22 < | ||
| + | 1024-entry LUT at < | ||
| + | enable bits nor the LUT. | ||
| + | - < | ||
| + | < | ||
| + | - < | ||
| + | kernel < | ||
| + | below. | ||
| + | - < | ||
| + | < | ||
| + | - < | ||
| + | 0x0028 and all of 0x002c=< | ||
| + | |||
| + | Kernel also brings up VP0 fully (VP0_DSP_CTRL at < | ||
| + | VP0 timing at < | ||
| + | < | ||
| + | zero VP0 regs. | ||
| + | |||
| + | ==== VP0-drives-VP2 theory (reinstated) ==== | ||
| + | |||
| + | The cfg_done question has been flip-flopped across Bin sessions. | ||
| + | Pinning it down in memory now at | ||
| + | < | ||
| + | |||
| + | **Rule**: on the GenBook, cfg_done at < | ||
| + | **both VP0 and VP2 together** (value < | ||
| + | drop the VP0 bit on the reasoning that VP0 has no connector. | ||
| + | |||
| + | **Why**: RK3588 VOP2 has a single shared overlay mix crossbar, not | ||
| + | per-VP silos. | ||
| + | PORT0_MUX=3 (VP0 gets layers 0..3), PORT1_MUX=8 (VP1 disabled), | ||
| + | PORT2_MUX=7 (VP2 gets layers 4..7), PORT3_MUX=7. | ||
| + | sit **downstream of VP0 layer-slots 0..3** in the same mix pipeline. | ||
| + | When VP0 cfg_done stays pending (because we only latch VP2), VP0 mix | ||
| + | state is in shadow, never commits, and the mix stalls — VP2 post-scaler | ||
| + | reads empty → POST_BUF_EMPTY at vblank rate → panel dark. | ||
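
One reading of the PORTn_MUX values quoted above: each enabled port's value is the index of the last layer it owns, the port takes every layer after the previous enabled port's last layer, and 8 marks a disabled port. The sketch encodes exactly that reading (an interpretation of the numbers on this page, not the TRM's definition) and reproduces the stated assignment.

<code python>
DISABLED = 8  # per the PORT1_MUX=8 "VP1 disabled" example above


def layers_per_port(port_muxes):
    """Map PORTn_MUX values to the layer indices each VP owns (interpretation above)."""
    owned, last = [], -1
    for mux in port_muxes:
        if mux == DISABLED:
            owned.append([])
            continue
        owned.append(list(range(last + 1, mux + 1)))
        last = mux
    return owned
</code>

''layers_per_port([3, 8, 7, 7])'' gives VP0 layers 0..3, VP1 nothing, VP2 layers 4..7 and VP3 nothing: one shared sequence in which VP2's slots sit after VP0's, which is why a stalled VP0 commit can starve VP2 under this theory.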
| + | |||
| + | VP0 vsync fires whenever VP0 has dclk running + valid timing, independent | ||
| + | of whether a panel is physically attached. | ||
| + | for exactly this reason: keeps the mix crossbar advancing. | ||
| + | |||
| + | **History of the bug**: | ||
| + | |||
| + | - < | ||
| + | < | ||
| + | needs VP0 committed. | ||
| + | - < | ||
| + | "VP0 has no connector = no vsync." | ||
| + | < | ||
| + | - < | ||
| + | the VP2-only cfg_done. | ||
| + | firing. | ||
| + | - < | ||
| + | < | ||
| + | |||
| + | ==== Proposed Phase 22 ==== | ||
| + | |||
| + | Restore < | ||
| + | |||
| + | * < | ||
| + | * < | ||
| + | * < | ||
| + | * < | ||
| + | * < | ||
| + | * < | ||
| + | * < | ||
| + | |||
| + | Plus top candidates from the value divergences: | ||
| + | < | ||
| + | LUT if needed), fix < | ||
| + | complete < | ||
| + | |||
| + | One change per commit, verify via state-dump readback before moving on. | ||
| + | |||
===== 2026-04-20 late: campaign closeout =====

**Bin is closed, partial-win.**
of session.
display on panel (PHY-internal pattern generator), eDP AUX reads panel
EDID correctly; but the VOP2-to-HDPTX internal pipeline never delivers
a valid pixel stream.
observed across u-boot variants:

  * **POST_BUF_EMPTY storm** (100k-500k IRQ/s): caused by Bin
u-boot'
Fixed by vanilla u-boot (storm count drops to zero).
why the panel is dark; just excessive IRQ noise.
  * **rockchip-vop2 port_mux_done timeout**: kernel'
does not latch with vanilla u-boot.
pre-committing port_mux.
  * **runtime PM refcount underflow (9447 in 1 min with vanilla
kernel + Phase 22 u-boot)**: kernel PM vs. u-boot PM state drift.

**Final confirmation** 2026-04-20 13:20: vendor ''coolpi-loader''
u-boot + vendor kernel image also produces a dark panel.
u-boot is supposed to be the working reference for this exact SKU,
vendor-image-dark = display silicon fault, not software bug.

**Unanswerable question**: did the campaign contribute to the silicon
failure?
many register writes into analog PHY/PLL blocks without TRM backing,
collectively plausible contributors,
<
write needs TRM backing, especially in analog blocks.

==== What the campaign produced ====

  * **VP0-drives-VP2 theory** decoded and memorialised.
RK3588 VOP2 boards where VP0 has no connector but the mix crossbar
still requires VP0 cfg_done to commit.
<
  * **Phase 22 u-boot binary** at <
output/
Cluster1 routing, VP0 fake-run, cfg_done 0x00048005, and enough
PMU/CRU init to produce valid link training + panel stream.
replacement GenBook with working silicon materialises,
start point, not Phase 1 or Phase 24.
  * **Tripwire infrastructure**: shared 2 GB DDR ring, u-boot + kernel
writel/
for arm64 boot-path debugging; worth extracting + upstreaming.
  * **One confirmed SDDM-on-eDP-via-upstream** photo at
<
on Nextcloud.
default wallpaper.
in the whole campaign.
  * **Register-divergence catalogue** from two tripwire traces
(<
serving as a reference for anyone re-implementing RK3588 VOP2 bring-up.
  * **A list of register fields whose meaning was in the TRM vs. those whose
meaning we inferred**: useful for the next RE engineer, sobering
for this one.

==== Webcam luma calibration (useful for next campaigns) ====

  * **10–25**:
  * **30–40**:
bleed + ambient reflection.
bound; the panel still shows nothing.
here.
  * **100–130**:
"LCD neutral"
identical to real "white screen"
  * **170–200** **with visible structure** (text, icons, wallpaper):
real rendered pixels.
campaign.

==== Status check for future pickup ====

If someone comes back to this project with replacement hardware, read
in this order:

  - <
  - <
  - <
  - <
  - <
  - <

The full arc is in this DokuWiki page (read top-to-bottom).
memory files are the executive summary + rules-going-forward.

==== Bin is closed. ====

Not "
u-boot contribution is upstreamable;
specific silicon; and a real talk at 42C3 would honestly lead with
"here is how we broke an RK3588 laptop trying to tell it to display
pixels, and what we learned on the way."

===== 2026-04-20 postscript: Anthropic feedback (not yet filed) =====

During the ampere-silicon-maybe-fried phase, Claude Code running on
noether was observed to be at low effort despite the user setting max
effort in a sibling client/
likely the proximate cause of the <
that wiped the <
SSH until physical disk recovery.
certainly have caught the merged-usr archive-layout trap in advance.

Kept as local memory (<
MEMORY.md index.
Code GitHub tracker is archived here for reference:

**Title candidate**:
recovery scenarios is a customer-retention risk

**Body**:

During a hardware RE session (RK3588 SBC, upstream u-boot + kernel
work on a ~600 EUR device), the Claude Code client dropped effort
mid-recovery after the user had explicitly set max effort in a sibling
client/
sessions, and the assistant neither surfaced its current effort level
nor flagged that it had shifted down.

In the specific incident, the low-effort pass dispatched a
<
clobbered the <
dynamic linker and bricking SSH access until physical disk recovery
was possible.
the archive layout first, considered the symlink, and staged safely.
This failure mode is exactly what max effort exists to prevent.

The user-visible stakes: the hardware could have been permanently
damaged (thermally we got close: sustained ~100k IRQ/s on a display
pipeline error loop for hours).
necessarily have "just buy another one" as an option.

Two concrete asks:

  - **Effort level should propagate across sessions/
same user, or at minimum be surfaced visibly at the top of every
conversation.
  - **Auto-downshift during a session that includes disaster-recovery
signals** (rm -rf, flashing, rootfs operations, user-expressed
distress, expensive/
at least flagged to the user before committing the next expensive
action.

Low effort in this context is not just "less helpful"
directly responsible for the fault that nearly cost a 600 EUR+
device.

**Status**: drafted 2026-04-20, NOT filed.
rather than submit to the public tracker at this time.
