Bin — Boot it Nicely (GenBook u-boot eDP upstream)
Mainline u-boot on the CoolPi CM5 GenBook (RK3588) with working eDP boot display and internal keyboard. First-ever RK3588 eDP bring-up in upstream u-boot (not the Rockchip downstream fork).
Target hardware: ampere (CoolPi CM5 GenBook, RK3588 + 32 GB LPDDR5)
Panel: CSOT T9 SNE001BS2-2, 1920×1080@60 Hz, DPCD 1.1, 2.7 Gbps HBR, 2 lanes
Status 2026-04-17 evening: v10 mainline u-boot trains link at HBR×2, panel reports IN_SYNC, BIST bars display on the panel — DP TX + eDP panel proven healthy. Pixels from our own framebuffer still absent; fault narrowed to VOP2 pixel-output chain or content-format upstream of DP TX. Vendor coolpi-loader (factory genbook_spi.img) also shows no logo on eDP — closes the “vendor knows how” assumption.
For next session: Dual-agent debug strategy — two independent AI agents drafted strategies with different top-3 bets. Overlap = high-confidence signal, disagreement = where the most learning happens. Start with the clk_summary check at the top of that page.
Why
The vendor u-boot binary (“coolpi-loader”) runs the vendor DRM stack for its boot logo. Upstream u-boot has no RK3588 VOP2/eDP driver path at all. The goal of project Bin is to produce a clean, upstream-submittable u-boot that paints something on the eDP panel during boot — without pulling in the vendor's “display-cmd dance” from the downstream tree.
Secondary goals:
- Prove the upstream-clean boot chain (mainline u-boot + Collabora TF-A + upstream OP-TEE) can drive eDP.
- Produce a patch series acceptable to u-boot-custodians (no “vendor secret” compensation hacks).
What actually works as of 2026-04-16
- Full DDR / TPL / SPL / BL31 / OP-TEE chain, using the stock Rockchip blob `rk3588_ddr_lp4_2112MHz_lp5_2400MHz_v1.19.bin`; the faster bin was unlocked after the compute-module reseat (see DDR RE project MVP1 section).
- VOP2 probe (clocks, PMU power domain, GRF) and full register-level init of VP2 + Cluster1 window + overlay mixer.
- eDP controller probe: EDID read via AUX, DPCD capabilities read (rev 1.1, 2.7 Gbps HBR, 2 lanes).
- Link training succeeds (CR + CE pass), training pattern properly disabled at end (was the critical “no pixels” bug #2).
- Backlight on (PWM6 + GPIO4_A3, via panel-simple).
- VOP2 state matches kernel live-dump perfectly: every register (SYS_PD, CFG_DONE, DSP_IF_EN, OVL_CTRL, VP2_DSP_CTRL, CLUSTER1 CTRL0/YRGB/VIR/ACT/DSP_INFO/DSP_ST, OVL PORT_SEL/LAYER_SEL, timing regs) reads back the same bits as the running kernel while SDDM is displaying.
- eDP controller state matches kernel after applying the config_video delta (VIDEO_CTL_2=0x10, VIDEO_CTL_3=0x80, VIDEO_CTL_10=0x02, FUNC_EN_2=0x80, SYS_CTL_4=0x08).
- vidconsole path live: proven by instrumenting `vidconsole_putc_xy()`; u-boot is drawing its “U-Boo…” banner into the framebuffer.
What's still broken
No visible pixels on the panel. Everything upstream of the physical signal chain is correct. The bug lives in territory our direct-MMIO tools don't reach:
- HDPTX1 PHY state: 0xFED70000 reads back mostly zero via `/dev/mem` even while the kernel is driving the panel. That suggests the PHY's real state is behind `regmap`/syscon indirection (not MMIO-readable without a kernel module).
- CRU dclk_vp2 source mux: for eDP on VP2, dclk_vp2 must be parented on the HDPTX PHY's recovered clock, not a CRU PLL. u-boot's `clk_set_rate(dclk_vp2, 147.84 MHz)` may be hitting the wrong mux.
- VO1_GRF (0xFD5AC000) has DP mux / HDPTX routing bits; the kernel writes them, we write none.
- `regmap_update_bits` traffic that our `vop2trace.ko` kprobe never caught (it only hooked `regmap_write`).
The cfg_done dance — biggest recurring gotcha
If a VOP2 register write reads back zero or doesn't take effect, it's cfg_done. This has burned us repeatedly across multiple sessions. Save yourself the reboot cycles:
VOP2 has shadow (staging) and active (hardware-driving) register banks. Writes land in shadow; they commit to active only when cfg_done fires.
- `CFG_DONE_IMD` (bit 28 at offset 0x030) latches VP-level config immediately but does not cover window shadow registers (CLUSTERx CTRL0, YRGB_MST, VIR, ACT, DSP_INFO). Those need an explicit per-VP cfg_done write.
- Required write: `CFG_DONE_EN | BIT(vp_id) | (BIT(vp_id) << 16)` → offset 0x000.
- Kernel has `OVL_CTRL=0` because it does a periodic `regmap_update_bits` cfg_done on every atomic commit. Our one-shot u-boot init needs `OVL_CTRL` bit 31 set (0x80000000) or PORT_SEL/LAYER_SEL/CLUSTER writes never latch.
- Writes to non-shadow regs (CTRL1, DSP_ST, clock, mux) commit immediately and read back cleanly. Writes to shadow regs need cfg_done.
- Debugging: add readback right after the write; if zero, add cfg_done; readback again.
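The required write above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch, not the driver code: `cfg_done_value` is a hypothetical helper, and the position of `CFG_DONE_EN` (bit 15) is an assumption inferred from the VP2-only value `0x00048004` quoted elsewhere on this page.

```python
# Assumption: CFG_DONE_EN is bit 15, inferred from the 0x00048004 value
# quoted on this page for a VP2-only latch.
CFG_DONE_EN = 1 << 15


def cfg_done_value(*vp_ids):
    """Build CFG_DONE_EN | BIT(vp) | (BIT(vp) << 16) for each given VP."""
    value = CFG_DONE_EN
    for vp in vp_ids:
        value |= (1 << vp) | ((1 << vp) << 16)
    return value


if __name__ == "__main__":
    # Value to write to VOP2 offset 0x000 after a batch of VP2 window writes.
    print(hex(cfg_done_value(2)))
```

Writing the expected value down before the write makes the "readback, cfg_done, readback again" loop falsifiable: the number either matches or it does not.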
Critical bugs caught this session (2026-04-16)
Seven fixes, all of which are prerequisites for any future display work — any upstream u-boot VOP2/eDP driver will need these:
- Explicit per-VP cfg_done write. CFG_DONE_IMD does NOT latch window shadow registers. Must write `CFG_DONE_EN | BIT(vp_id) | (BIT(vp_id) << 16)` to the CFG_DONE reg after every batch of window writes. See cfg_done dance callout above.
- DP_TRAINING_PATTERN_DISABLE at end of channel equalization. `rk3588_edp_link_train_ce()` was returning after CR/CE success without writing 0 to ADP_TRAINING_PTN_SET or DPCD 0x102. The PHY was streaming training symbols indefinitely: link “trained” but carrying no real video. Panel showed black despite all register state looking correct.
- DPCD setup before training. Missing spec-required writes: ML_CH_CODING_SET=1 (ANSI 8B/10B), DOWNSPREAD_CTRL to match SSC capability, enhanced-frame bit in LANE_COUNT_SET when sink supports it.
- eDP config_video delta matching kernel. Our config_video wasn't writing VIDEO_CTL_2=0x10, VIDEO_CTL_3=0x80, VIDEO_CTL_10 bit 1, FUNC_EN_2=0x80, SYS_CTL_4=0x08. All of these are needed; the kernel's analogix_dp driver sets them but the u-boot one doesn't.
- VP2_DSP_CTRL=0x1000000f: kernel uses OUT_MODE=AAAA (0xf) + bit 28 set. Briefly tried S888 (0x8), wrong tree; stay with AAAA + bit 28.
- OVL_CTRL bit 31 for immediate latch. See cfg_done dance callout.
- VOP_GRF_CON2 bit layout corrected. Our original `EDP1_ENABLE_SHIFT=1` was wrong: the kernel's live register value has bit 3 set, not bit 1. The naive EDP0/EDP1/HDMI0/HDMI1 = bits 0/1/2/3 mapping is not what RK3588 uses.
Method: kernel live-dump as oracle
The productive move of this session was switching from reading downstream vendor u-boot source (which was “vendor secret” register soup without explanations) to reading live register state from the running kernel while SDDM was displaying. That's ground truth — whatever bits are set when pixels reach the panel, those are the bits you need.
ssh ampere 'sudo python3 <<EOF
import mmap, struct, os
fd = os.open("/dev/mem", os.O_RDONLY|os.O_SYNC)
mm = mmap.mmap(fd, 4096, mmap.MAP_SHARED, mmap.PROT_READ, offset=<BASE>)
print(hex(struct.unpack("<I", mm[<OFFSET>:<OFFSET>+4])[0]))
EOF'
Works for VOP2 (0xFDD90000) and the eDP controller (0xFDED0000). Does not work for the HDPTX PHY (0xFED70000, syscon-wrapped) or the CRU (0xFD7C0000) — those show mostly zeros even when active. Next session: kernel module to dump those properly.
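One detail the one-liner leaves implicit is that the `offset` passed to `mmap` has to be page-aligned, so `<BASE>` must be a page boundary and `<OFFSET>` the remainder. A minimal sketch of that split follows; `split_phys` and `read32` are hypothetical helper names, and `read32` assumes root access to `/dev/mem` on the target.

```python
import mmap
import os
import struct


def split_phys(addr, page=mmap.PAGESIZE):
    """Split a physical address into (page-aligned base, offset in page)."""
    base = addr & ~(page - 1)
    return base, addr - base


def read32(phys):
    """Read one little-endian 32-bit word via /dev/mem (needs root)."""
    base, off = split_phys(phys)
    fd = os.open("/dev/mem", os.O_RDONLY | os.O_SYNC)
    try:
        with mmap.mmap(fd, mmap.PAGESIZE, mmap.MAP_SHARED,
                       mmap.PROT_READ, offset=base) as mm:
            return struct.unpack_from("<I", mm, off)[0]
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Example: VOP2 base from this page plus a register offset.
    print([hex(x) for x in split_phys(0xFDD90000 + 0xE00, 4096)])
```

Using `mmap.PAGESIZE` rather than a literal 4096 keeps the split correct if the running kernel uses a larger page size.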
Artifacts
| Where | What |
|---|---|
| `boltzmann:~/src/u-boot/` | u-boot source with rk3588_vop2, rk_edp, DTSI patches |
| `boltzmann:~/src/u-boot/drivers/video/rockchip/rk3588_vop2.c` | VOP2 driver with kernel-trace-replay init and live register STATE dump |
| `boltzmann:~/src/u-boot/drivers/video/rockchip/rk_edp.c` | eDP driver with spec-complete DPCD setup, pattern-disable, kernel-matched config_video |
| `ampere:/root/uboot-backups/` | Timestamped SPI backups across session |
| `meitner:/tmp/uart.log` | All UART traffic during boot iterations (systemd-run uart-cap.service + uart-follow.service mirror on /dev/tty8) |
| `noether:~/claude/vop2_harness/vop2trace/` | LKM that traces regmap_write + writel_relaxed during kernel DRM module load. Dumps to /proc/vop2trace. |
Flash pipeline
Board is alive, so flash from running Linux (fastest):
scp boltzmann:~/src/u-boot/u-boot-rockchip-spi.bin ampere:/tmp/
ssh ampere 'sudo dd if=/dev/mtd0 of=/root/uboot-backups/spi-pre-$(date +%H%M%S).bin bs=1M; \
  sudo flashcp --partition /tmp/u-boot-rockchip-spi.bin /dev/mtd0'
# Reboot when YOU are ready.
If bricked: maskrom-mode recovery via meitner + rkdeveloptool (db loader
→ cs 9 SPI NOR → wl 0 → rd). Validated at ~60 s per cycle.
Build recipe (stock fast DDR blob — mandatory)
cd ~/src/u-boot && make ARCH=arm \
  BL31=/home/mfritsche/src/tf-a/build/rk3588/release/bl31/bl31.elf \
  TEE=/home/mfritsche/src/optee_os/out/arm-plat-rockchip/core/tee.bin \
  ROCKCHIP_TPL=/home/mfritsche/src/rkbin/bin/rk35/rk3588_ddr_lp4_2112MHz_lp5_2400MHz_v1.19.bin \
  CC="distcc gcc" -j80
Do not use the decompiled-and-patched rk3588_ddr_v1.19_patched_v2.bin
— it bricks ampere (DDR training fails all channels/all lanes). Only the
stock rkbin blob is known-good.
Next session
- Write a kernel module that dumps HDPTX1 PHY + CRU dclk_vp2 mux + VO1_GRF full state while display works. `/dev/mem`-direct reads hit zero zones for some of these.
- Hook `regmap_update_bits` in `vop2trace.ko`; the current kprobe only catches `regmap_write`.
- Diff kernel vs u-boot for VO1_GRF + CRU dclk_vp2; write our init to match.
- Fallback: HDMI output path (simpler protocol) to prove VOP2 itself works, then come back to eDP.
2026-04-17 evening — empirical disambiguation
Sessions 1–4 were heuristic register matching against the kernel live-dump. Tonight was empirical disambiguation: stopped assuming and started asking the hardware what it actually sees.
u-boot eDP campaign — v10
Branch `bin/wip-2026-04-17` on `boltzmann:~/src/u-boot` carries all vendor-sourced fixes on top of the earlier VOP2/eDP work:
- `hclk` DT + driver wire-up.
- Double-beat `adp_write` for the eDP controller (was a single-beat quirk).
- `init_video` + `set_video_format` paths added.
- DPCD enhanced-frame + downspread + training-pattern-disable writes completed.
- DPCD `SINK_STATUS` (0x205) read after commit.
- Alpha=0 bug in the u-boot stripe paint: vendor clusters blend incoming pixels against VP2_DSP_BG=black with the source alpha. Alpha-0 = source×0 = black field regardless of colour. Fixed in v10.
- `dclk_vop2` rate was wrong in u-boot (~136 MHz vs the kernel's 147.69 MHz); fixed by pre-selecting `V0PLL` as parent before `clk_set_rate()` so the u-boot clock driver takes the retune path instead of picking the nearest matching divider on the default ancestor.
- VOP MMU disable hypothesis explored; MMU was already bypassed by reset. Commit reverted, only the diagnostic code remains.
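The alpha=0 failure mode is easy to reproduce on paper. The sketch below models it as a plain (non-premultiplied) source-over blend against a background colour; that blend model and the helper name `blend_over_bg` are assumptions for illustration, not taken from the TRM.

```python
def blend_over_bg(src_rgb, alpha, bg_rgb=(0, 0, 0)):
    """Blend an 8-bit RGB source over a background using an 8-bit alpha.

    Assumed model: out = src * a + bg * (1 - a), with a = alpha / 255.
    """
    a = alpha / 255
    return tuple(round(s * a + b * (1 - a)) for s, b in zip(src_rgb, bg_rgb))


if __name__ == "__main__":
    red = (255, 0, 0)
    print(blend_over_bg(red, 0x00))  # alpha 0: background wins
    print(blend_over_bg(red, 0xFF))  # alpha 0xFF: source wins
```

Under this model any stripe colour painted with alpha 0 collapses to the black background, which matches the symptom described in the list above.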
What this unlocked
- Link trains at HBR×2 without errors.
- Panel DPCD `SINK_STATUS` reports `IN_SYNC`: panel sees a valid stream.
- Backlight on.
- BIST colour bars display on the panel (PHY-internal pattern generator driving the main link). DP TX, PHY, cable, panel, backlight are all healthy.
- Own framebuffer still produces no visible pixels. Whatever is wrong is upstream of DP TX: either the VOP2 pixel-output chain or the content format we hand to the eDP controller.
Vendor u-boot detour (closes vendor-knows-how)
To rule out that we are missing a vendor-secret step, built vendor coolpi-loader from source:
- `ranke` = CT171 on `data`, Debian 12 x86_64, ephemeral historian-named build host.
- Source: vendor `coolpi-loader`, branch `linux-6.1-stan`.
- Extlinux.conf auto-rewrite hack removed in both `linux-6.1-stan` (`b718d7b1f9`) and `linux-6.1-stan-rkr5` (`90083ce217`) variants; vendor u-boot otherwise edits the runtime distro `/boot/extlinux/extlinux.conf` on every boot.
- Built + flashed `genbook_spi.img`. Kernel boots cleanly. **No logo on eDP or HDMI.** Same symptom as our own u-boot.
- Web-research agent surfaced the likely reason: vendor u-boot uses `rockchip_show_logo()`, which loads a `logo` partition defined in `parameter.txt`. `genbook_spi.img` is SPL-region only (8 MB) and does not include that partition. The Rockchip wiki explicitly states logo support is Android-only: not implemented for Linux. That probably explains why the vendor image does not light eDP on this reference board either.
- Rebuilt on `linux-6.1-stan-rkr5` (Apr-2026, significant display/analogix_dp fixes over Jan-2026 `stan`). Resulting image did not boot: idblock / u-boot FIT version mismatch between the vendor-baseline idblock and the rkr5 u-boot.img.
- Panel SKU fork noted: BOE NV140FHM-N42 / N61 / N66 across hardware revisions. Vendor u-boot panel timings may not match our specific SKU even with a working logo path.
Conclusion: vendor u-boot is not a working reference for eDP-logo. The vendor-knows-how assumption is dead.
Flash protocol confirmed
- `flashcp --partition <8mb-img> /dev/mtd0` via SSH into the running Arch userland is reliable across iterations. Default path.
- `rkdeveloptool` maskrom: one `wl 0 <8mb-img>` per fresh maskrom session. Multiple consecutive `wl` writes in the same session cause comm-object failures. Recovery path validated: `db rk3588_spl_loader_v1.19.113.bin → cs 9 → wl 0 <image> → rd`.
- SPI layout on RK3588 GenBook (correction from earlier notes):
  - idblock at 0x8000 (~208 KB)
  - u-boot FIT at 0x60000
  - The earlier 0x200000 value was wrong: that is `CONFIG_MTD_BLK_U_BOOT_OFFS` for eMMC, not SPI.
Reference images stashed
| Where | What | sha256 (prefix) |
|---|---|---|
| `boltzmann:~/projects/AMPere/output/u-boot-rockchip-spi-bin-wip-20260417-v10-8mb.bin` | v10 mainline, known-good | 925137b923af… |
| `meitner:/tmp/genbook_spi_vendor_prebuilt.img` | Vendor factory (no logo) | 7202caf7ca54… |
| `data:/rpool/nas/home/mfritsche/gbook/coolpi_rk3588_gbook_nor_upgrade.img` | Vendor 7 MB upgrade image with logo partition, not tested | (none) |
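Since only sha256 prefixes are recorded in the table, a pre-flash check has to be a prefix comparison. A minimal sketch, assuming the image has been copied locally; `sha256_prefix_matches` is a hypothetical helper and the path in the usage line is a placeholder.

```python
import hashlib


def sha256_prefix_matches(path, expected_prefix, chunk=1 << 20):
    """Return True if the file's sha256 hex digest starts with the prefix."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so an 8 MB SPI image never sits fully in RAM.
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest().startswith(expected_prefix.lower())


if __name__ == "__main__":
    import sys
    # Usage: python3 check_sha.py <image> <prefix-from-table>
    ok = sha256_prefix_matches(sys.argv[1], sys.argv[2])
    print("match" if ok else "MISMATCH, do not flash")
```

A twelve-hex-digit prefix is enough to catch a wrong or truncated file, though it is not a substitute for the full digest.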
Updates to the Next-session list
Supersedes the earlier clk_summary / HDPTX trace checklist on this page — those checks have been run; see the debug-strategy page for the Closed / Reopened update.
- Maybe extract the separate `idblock.bin` (208 KB) that vendor `make.sh` builds alongside the 492 KB download-boot loader, then rebuild the rkr5 stack end-to-end so idblock + u-boot FIT versions line up.
- Try `coolpi_rk3588_gbook_nor_upgrade.img`: hesitant, it might rewrite `extlinux.conf` through some other mechanism and we would rather not discover that by surprise.
- Alternatively accept that pixels-in-u-boot is not a today problem; the kernel stack works cleanly on v10 for development use.
2026-04-18 evening — tripwire session: register trajectories captured
Bench session. Stopped heuristic matching, built an actual instrument: every `writel` / `readl` in u-boot and kernel now records a timestamped trace into a shared DDR region. Diff the two traces, the bugs fall out.
Infrastructure built
- `CONFIG_RK_TRIPWIRE` feature in both u-boot (branch `bin/wip-2026-04-17`) and kernel (`linux-rk3588-marfrit` → `bin/tripwire` branch). Shared 2 GB no-map DDR region at phys `0x780000000`, reserved via DT on both sides so neither side maps it as general memory.
- Every `writel`/`readl` records a 32-byte record: `(cntvct_el0 tick, caller PC, phys addr, value, flags)`. Phys resolved via page-table walk in the kernel record fn; native on the u-boot side.
- Offline C dumper at `boltzmann:~/src/u-boot/tools/rk_tw_dump/rk_tw_dump.c` emits CSV; `resolve.py` sidecar does symbol lookup via kallsyms bisect.
- Bench runbook at `noether:~/claude/bin_bench_plan.md`.
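For orientation, a 32-byte record with those five fields could be unpacked as sketched below. The field widths (three 64-bit values followed by two 32-bit values, little-endian) are an assumption chosen only because they add up to 32 bytes; the authoritative layout is whatever `rk_tw_dump.c` reads.

```python
import struct

# Assumed layout: u64 tick, u64 caller PC, u64 phys addr, u32 value, u32 flags.
RECORD = struct.Struct("<QQQII")


def parse_records(buf):
    """Yield one dict per complete 32-byte record in buf."""
    usable = len(buf) - len(buf) % RECORD.size
    for off in range(0, usable, RECORD.size):
        tick, pc, phys, value, flags = RECORD.unpack_from(buf, off)
        yield {"tick": tick, "pc": pc, "phys": phys,
               "value": value, "flags": flags}


if __name__ == "__main__":
    # Synthetic record: a write of 0x00048004 to VOP2 offset 0x000.
    sample = RECORD.pack(1234, 0xFFFF800010000000, 0xFDD90000, 0x00048004, 1)
    for rec in parse_records(sample):
        print({k: hex(v) for k, v in rec.items()})
```

Trailing bytes that do not fill a whole record are ignored rather than raising, which suits a ring buffer captured mid-write.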
Three phases of the evening
- Phase 1: `CONFIG_BIN_PHASE1_NOINIT` u-boot (zero VP2/eDP register writes) + Phase 2 kernel. Result: kernel DRM can cold-init the display from scratch after `modprobe panel_edp`, and SDDM displays. Conclusion: point-3 hypothesis (kernel depends on u-boot half-init) is disarmed.
- Phase 2: same u-boot, armed tripwire at runtime, reloaded `panel_edp` to re-trigger modeset. Captured 77 K kernel writes, decoded full `atomic_commit` sequence.
- Phase 3: u-boot with full VP2+eDP init AND tripwire armed from first `vop2_probe` entry. Captured 4.3 M records (2.08 M u-boot + 2.24 M kernel). Zero lost.
Concrete register divergences found
Three bit-level diffs between the u-boot writes and the kernel writes:
| Register | Our u-boot | Kernel | Analysis |
|---|---|---|---|
| VOP2 +0x0000 cfg_done (VP2 latch) | 0x00048004 | 0x00048005 | Kernel latches VP0+VP2 together. We latch VP2 alone. Source at `drivers/video/rockchip/rk3588_vop2.c:287-292`: `vop2_cfg_done(priv, 2)` writes `CFG_DONE_EN \| BIT(2) \| (BIT(2) << 16)`. Candidate fix: also OR in `BIT(0) \| (BIT(0) << 16)` so the DSP_IF crossbar can synchronize VP0+VP2 in one latch. |
| VOP2 +0x0600 | 0x80000000 (bit 31 set) | 0x00000000 | Likely STANDBY/bypass bit on VP1. Need to trace where we set it (likely in the vendor-dump-mirror block from early RE) and either clear or never set. |
| VOP2 +0x06f0 | 0x04040000 | 0x04040404 | Per-byte lane-phase register. We set only the upper 16 bits; kernel sets all 4 bytes to `0x04`. Trivial value fix. |
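The three rows can be reduced to exact bit positions with an XOR, which is a quick way to check a table like this for transcription slips. A minimal sketch; `differing_bits` is a hypothetical helper and the values are the ones quoted in the table.

```python
def differing_bits(ours, kernel):
    """Return the bit positions (0..31) where two 32-bit values differ."""
    delta = ours ^ kernel
    return [bit for bit in range(32) if (delta >> bit) & 1]


DIVERGENCES = {
    0x0000: (0x00048004, 0x00048005),
    0x0600: (0x80000000, 0x00000000),
    0x06F0: (0x04040000, 0x04040404),
}

if __name__ == "__main__":
    for offset, (ours, kernel) in DIVERGENCES.items():
        print(f"VOP2 +0x{offset:04x}: bits {differing_bits(ours, kernel)}")
```

One thing this arithmetic makes visible: the kernel's `0x00048005` differs from ours only in bit 0, whereas OR-ing in both `BIT(0)` and `BIT(0) << 16` as the candidate fix suggests would give `0x00058005`. Which of the two the hardware actually needs is not settled by the arithmetic, so a readback after the change is the deciding check.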
Confirmations (values that match kernel — NOT bugs)
- `VP2_POST_SCL_CTRL 0x0e3c = 0x10001000`: rkr5 analysis flagged this as “cargo-cult worth decoding”; tripwire now proves the kernel writes the same value, so the value is correct.
- VP2 post-config block `0x0e30..0x0e40`, output mux `0x06e8 = 0x34000000`, cluster1 window, `VP2_DSP_CTRL = 0x0000000f`.
Secondary observation
With Phase 2 u-boot (no VP2 init), the “brown text flash” during early kernel boot DISAPPEARED. With Phase 3 u-boot (full VP2 init), it came back. This proves `simplefb` is scanning the raster that u-boot sets up: the flash is the u-boot VP2 output, and `simplefb` just overlays kernel console text on it. Panel stays physically lit whenever u-boot does VP2 init, regardless of whether our own stripe content reaches it.
Next steps
Fix the three divergences in u-boot, reflash, observe. If they close the “no pixels” wall, campaign is done. If not, mine the 4.3 M-record Phase 3 capture for the next divergence — we now have a reproducible capture.
Artifacts
| Where | What |
|---|---|
| `boltzmann:~/bin-phase3-full.csv` | Full trace, 225 MB, 4.3 M records |
| `boltzmann:~/bin-phase2-modeset-v2.csv` | Kernel modeset only, 77 K records |
| `boltzmann:~/projects/AMPere/output/u-boot-rockchip-spi-phase3-genbook-8mb.bin` | Phase 3 u-boot SPI (sha `ac461a2195…`) |
2026-04-19 — Phases 4-20: false trails and Phase 20 verdict
Bench session, four hours of chained hypotheses against a 20-phase reboot cycle. The memory key takeaway: observation beats theory in register-level reverse engineering. See feedback memory.
The bit-31 detour
Phase 4 theorised that `OVL_CTRL` bit 31 was a STANDBY/bypass flag and cleared it. No readback verified the theory; no tripwire recorded the bit's effect. The theory was wrong: per TRM Part 2 Chapter 7 §VOP2_OVERLAY_PORT_SEL, `OVL_CTRL` bits 31:30 are the `LAYERSEL_REGDONE_SEL` field (it picks which VP's cfg_done commits the LAYER_SEL register). Value `10` selects VP2, which is exactly what we wanted and exactly what Phase 4 cleared.
Result: Cluster + LAYER_SEL writes were silently dropped for phases 5 through 12, and every subsequent register tweak appeared to fail for “unknown reasons.” Cost: ten phases of misdirection, around six hours of bench time. Phase 13 caught the regression only because tripwire captured the actual readback (`OVL_CTRL, PORT_SEL, LAYER_SEL` at reset defaults) alongside the intended writes; the discrepancy was the smoking gun.
Rule now in memory: before committing a register fix, write down the expected post-fix readback value. If you cannot name one, the hypothesis is not falsifiable. After the fix, read back. If readback does not match expected, the fix did not land — do not move on.
Phase 19 to Phase 20: the cluster swap
Phases 13 through 19 refixed OVL_CTRL bit 31 and iterated on Cluster0 plane writes (CLUSTER0_CTRL at 0x1100 = `0x80004001` = `FRM_RESETN_EN | MMU_BYPASS | CLUSTER_ENABLE`). Cluster0 registers never latched (readback stayed at reset defaults) no matter what load-enable or frame-delay we tried.
The wall: decoding PORT_SEL. Readback value `0xa0587783` decodes via TRM Part 2 Chapter 7 §VOP2_OVERLAY_PORT_SEL as:
| Bit range | Field | Value | Meaning |
|---|---|---|---|
| 17:16 | cluster0_sel_port | 00 | VP0 (no active output) |
| 19:18 | cluster1_sel_port | 10 | VP2 (eDP, active) |
Cluster0 shadow commits on VP0 vsync which never fires — hence writes never land. Cluster1 shadow commits on VP2 vsync, which IS firing. The kernel uses Cluster1 for the VP2 primary plane; we had been writing Cluster0 for phases 12 through 19.
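The field extraction behind that table is mechanical and worth scripting, so the same readback can be re-decoded after every phase. A minimal sketch: `field` is a hypothetical helper, the cluster bit ranges are the ones from the table, and the 4-bit `PORTn_MUX` positions (bits 3:0 upward) are an assumption that happens to reproduce the PORT0..3 values quoted elsewhere on this page.

```python
def field(value, hi, lo):
    """Extract bits hi:lo (inclusive) from a 32-bit register value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


PORT_SEL = 0xA0587783

decoded = {
    "cluster0_sel_port": field(PORT_SEL, 17, 16),
    "cluster1_sel_port": field(PORT_SEL, 19, 18),
    # Assumed nibble layout, one 4-bit mux per video port:
    "port0_mux": field(PORT_SEL, 3, 0),
    "port1_mux": field(PORT_SEL, 7, 4),
    "port2_mux": field(PORT_SEL, 11, 8),
    "port3_mux": field(PORT_SEL, 15, 12),
}

if __name__ == "__main__":
    for name, val in decoded.items():
        print(f"{name} = {val}")
```

With the cluster fields decoded this way, Cluster0 routes to port 0 (VP0) and Cluster1 to port 2 (VP2), which is the asymmetry the paragraph above describes.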
Phase 20 reverts: base `0x1000 -> 0x1200`, CLUSTER_CTRL `0x1100 -> 0x1300`, WIN_REG_CFG_DONE load bit 0 to bit 1.
Phase 20 verdict — writes finally land
UART state dump after Phase 20 trace-replay init:
CLUSTER1 CTRL0 = 0x00000001 (WIN_ENABLE=1)
CLUSTER1 YRGB = 0xef700000 (our fbbase)
CLUSTER1 VIR = 0x00000780 (stride 1920)
CLUSTER1 ACT = 0x0437077f (1920x1080)
CLUSTER1_CTRL(0x1300) = 0x80004001 (FRM_RESETN | MMU_BYPASS | ENABLE)
Every plane register holds our value on readback — plane is committed, VP2 is running. Prior phases had all these same offsets reading reset defaults.
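The annotations in that dump can be checked rather than trusted. A minimal sketch of the ACT decode, assuming the common "size minus one" encoding with width in the low half-word and height in the high half-word; that encoding is inferred from the value matching the 1920x1080 annotation, and `decode_act` is a hypothetical helper.

```python
def decode_act(act):
    """Decode a window ACT register into (width, height) in pixels.

    Assumed encoding: low 16 bits = width - 1, high 16 bits = height - 1.
    """
    width = (act & 0xFFFF) + 1
    height = (act >> 16) + 1
    return width, height


if __name__ == "__main__":
    print(decode_act(0x0437077F))
    # VIR readback from the dump, as a plain integer for comparison.
    print(0x00000780)
```

Decoding readbacks this way turns "looks plausible" into an equality check, in line with the readback rule adopted after the bit-31 detour.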
But panel stays dark (see webcam frames below). Kernel dmesg shows `*ERROR* POST_BUF_EMPTY irq err at vp2` firing at vblank rate (486k callbacks suppressed per 5 seconds). The kernel uses Cluster1 too, so this is NOT a u-boot-only bug: the VP2 post-scaler is starved on a path shared with kernel-side DRM.
The real bug: runtime PM underflow at handover
`sudo journalctl -k -b 0` on the Phase 20 boot, chronological:
23:26:43 platform fdd90000.vop: Adding to iommu group 5
23:26:48 rockchip-vop2 fdd90000.vop: Runtime PM usage count underflow!
(x17 in immediate succession)
23:26:48 rockchip-drm display-subsystem: [drm] *ERROR* POST_BUF_EMPTY
irq err at vp2 (then vop2_isr: 486859 callbacks suppressed)
Mechanism: u-boot leaves VP2 powered + clocked, but the kernel PM framework starts every device at `runtime_status = SUSPENDED`. When supplier links (IOMMU group 5, power-domain) run their init, they call `pm_runtime_put()` on fdd90000.vop. Counter underflows (was 0, went negative 17 times). Warning is non-fatal.
Later, the `rockchip-vop2` modeset path calls `pm_runtime_get()`, but since the framework thinks the device is already resumed (counter not zero after the underflow math settles), `rockchip_vop2_runtime_resume()` never runs. That callback is where the driver does its full clock re-gate, reset-toggle, and IOMMU re-program sequence. Modeset proceeds with VP2 in the state u-boot left it, with one or more sub-steps of the resume path skipped; the post-scaler starves and POST_BUF_EMPTY fires at every vblank.
The physical fb probe confirms pixels did reach memory:
- `fb_peek.ko` kernel module, native-built on boltzmann, memremap of `0xef700000` with `MEMREMAP_WB`.
- Rows 0 / 300 / 540 / 800 read all zeros: u-boot stripe paint for the top two thirds was overwritten by kernel takeover (fbcon or DRM alloc, since we reserve no memory for the fb).
- Row 1070 reads `ff0000ff ff0000ff ...`: the blue band (alpha=0xFF, B=0xFF) from u-boot stripe paint **survives** in the bottom third.
So the panel IS displaying what u-boot wrote (mostly black with a blue band at the bottom), latched by the panel internal scanout memory at the moment VP2 got PM-locked. eDP panels cache their last received frame; that is what the webcam sees.
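The row readings above can be interpreted with two small helpers. This is a sketch under stated assumptions: pixels are packed as 32-bit ARGB8888 words (alpha in the top byte, blue in the bottom), and rows are tightly packed at 1920 pixels of 4 bytes each. Both function names are hypothetical.

```python
def decode_argb8888(word):
    """Split a 32-bit ARGB8888 word into its four 8-bit channels."""
    return {
        "a": (word >> 24) & 0xFF,
        "r": (word >> 16) & 0xFF,
        "g": (word >> 8) & 0xFF,
        "b": word & 0xFF,
    }


def row_address(fb_base, row, width=1920, bytes_per_pixel=4):
    """Physical address of the first pixel of a row (tight stride assumed)."""
    return fb_base + row * width * bytes_per_pixel


if __name__ == "__main__":
    print(decode_argb8888(0xFF0000FF))
    print(hex(row_address(0xEF700000, 1070)))
```

Under that packing `0xff0000ff` is opaque pure blue, consistent with the band described in the list.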
DRM state at quiescence: `/sys/kernel/debug/dri/1/state` shows `Cluster1-win0` bound to `video_port2` with an Xorg-allocated XR24 framebuffer (fb=90, 1920×1080), yet `/sys/kernel/debug/dri/1/vop2/summary` reports `Video Port2: DISABLED`. VP2 is force-idle after the POST_BUF_EMPTY cascade.
The three paths forward
- **(A) Tear-down path**: before `rk3588_vop2_display_init` returns, walk VP2 all the way back down: STANDBY bit, CRU re-gate, PMU bus-idle, PMU power-off, release u-boot PD. Kernel probes a clean SUSPENDED device, resume callback runs normally. Splash likely will NOT persist visually (eDP blanks within a frame of signal loss, unless the panel supports PSR) but handover is clean.
- **(B) State-match path**: leave VP2 running in exactly the state kernel early-probe expects. Map every register the kernel reads at probe, diff vs what u-boot leaves, fix discrepancies. Probably a dead end.
- **(C) NOINIT + text splash**: stay on `BIN_PHASE1_NOINIT=y` (already works), no graphical splash, kernel does all VP2 setup cleanly. Lowest risk, least impressive outcome.
Phase 21 pursues (A). Kconfig gate: `BIN_VP2_TEARDOWN`. Code lives in `drivers/video/rockchip/rk3588_vop2.c` at the end of `rk3588_vop2_display_init`, mirroring the PMU+CRU setup sequence in reverse. Verdict metric: `sudo journalctl -k -b 0 | grep underflow` should return zero matches after a clean boot.
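If the verdict metric is going to be compared across phases, a count is more useful than a bare grep. A minimal sketch; both function names are hypothetical, and `kernel_journal` assumes it runs on the target with enough privilege to read the kernel journal.

```python
import subprocess

MARKER = "Runtime PM usage count underflow"


def count_underflows(journal_text):
    """Count kernel log lines that report a runtime PM underflow."""
    return sum(1 for line in journal_text.splitlines() if MARKER in line)


def kernel_journal():
    """Return the current boot's kernel log (journalctl -k -b 0)."""
    result = subprocess.run(["journalctl", "-k", "-b", "0"],
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    n = count_underflows(kernel_journal())
    print(f"{n} underflow warnings; verdict: {'PASS' if n == 0 else 'FAIL'}")
```

Logging the count per phase would also show whether a change moves the number at all, not just whether it reaches zero.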
Webcam setup — visual verification rig
Eyedot USB camera on meitner, pointed at the ampere panel. Framing convention:
- One third desk, two thirds screen: camera tilted so the bottom third of the frame shows keyboard / desk surface (reference for "backlight off" baseline) and the top two thirds covers the panel.
- Default capture script: `meitner:/tmp/eyedot-cap.sh /tmp/bin-phaseN.mkv`. Records at 10 fps raw, downsamples to 1 fps into H.264, auto-stops 5 seconds after UART shows `login:`.
- ffmpeg signalstats is unreliable on JPEGs from this camera; use a Python one-liner on raw pixel data for luma histograms.
- Dim content on a dark panel is hard to see raw. Standard enhance pipeline: `ffmpeg -i frame.jpg -vf eq=brightness=0.1:contrast=2.5 -y out.jpg`, then `crop=iw*2/3:ih:iw/3:0` to strip the desk portion and focus on panel content.
- Trap: camera auto-exposure skews dark and bright regions. Luma average over the whole frame is dominated by the bright environment; always crop to panel before averaging.
- Another trap: a stationary "dark blob" in the top-middle of frames is the camera head shadow on the panel, not displayed content. If it shows across multiple frames in the same position, it is not pixels.
Artifacts: frames at `meitner:/tmp/bin-phaseN-frames/`, videos at `meitner:/tmp/bin-phaseN.mkv`. Phase 20 capture yielded 303 frames over around 5 minutes (reboot to sddm). Phase 20 panel-view crops show dim navy blue bottom two thirds, lighter top third, consistent with “top 2/3 black (zeros) + bottom 1/3 blue (0xFF0000FF)” latched on the panel.
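The "crop to panel before averaging" rule from the webcam notes can be expressed in a few lines. This is a dependency-free sketch rather than the one-liner actually used on the bench: it assumes the frame has already been decoded into rows of `(r, g, b)` tuples, uses Rec. 601 luma weights as an assumption, and `mean_luma` is a hypothetical name.

```python
def mean_luma(pixels, crop):
    """Average luma over a crop rectangle of a decoded RGB frame.

    pixels: list of rows, each row a list of (r, g, b) 8-bit tuples.
    crop:   (x0, y0, x1, y1), end-exclusive, in pixel coordinates.
    """
    x0, y0, x1, y1 = crop
    total = 0.0
    count = 0
    for row in pixels[y0:y1]:
        for r, g, b in row[x0:x1]:
            total += 0.299 * r + 0.587 * g + 0.114 * b  # Rec. 601 weights
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Tiny synthetic frame: dark "panel" on top, bright "desk" row below.
    frame = [[(0, 0, 40)] * 4, [(0, 0, 40)] * 4, [(220, 220, 220)] * 4]
    print(mean_luma(frame, (0, 0, 4, 3)))  # whole frame, skewed by the desk
    print(mean_luma(frame, (0, 0, 4, 2)))  # panel rows only
```

The synthetic frame shows the trap directly: the whole-frame average is dominated by the bright row, while the cropped average reflects the panel content.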
Related memory files
- `project_bin_tripwire_findings.md`: case study, full findings for 2026-04-18 tripwire plus 2026-04-19 Phase 20 plus Phase 21 direction
- `feedback_observation_over_theory.md`: the bit-31 detour memorialised
- `feedback_observer_first.md`: Phase 7 backlight-off during visual test memorialised
- `project_bin_42c3_timeline.md`: narrative arc for the 42C3 talk proposal
2026-04-20 — Phase 21 failed, trace-diff, VP0 theory reinstated
Phase 21 (A) failed on both axes: the BIN_VP2_TEARDOWN code ran correctly (UART: `VOP2: teardown done, PMU pwr=0x1b idle=0x37fff`, PD_VOP and PD_VO1 gated off), but the kernel still got exactly 17 `Runtime PM usage count underflow` warnings AND now crashed with init SIGSEGV because the `rockchip-drm` probe tries to read VP2 registers on a powered-off domain and AXI hangs. Ampere went into a boot loop; recovered via maskrom + `db rk3588_spl_loader_v1.19.113.bin → cs 9 → wl 0 phase20.bin → rd` on meitner.
Two implications:
- The 17 underflows are not caused by u-boot leaving VP2 active. They fire regardless of PMU state. Path-A was attacking the wrong target.
- You cannot power-gate VP2 before kernel boot. Kernel expects register readability at probe. Either leave it on (Phase 20 state) or never turn it on (Phase 1 NOINIT).
Trace diff on `bin-phase3-full.csv`
4.3 M-record capture from 2026-04-18. Filtered to VOP2 region (`0xfdd90000..0xfdd95fff`), split by stage, compared without aggressive dedup.
| Metric | u-boot | kernel |
|---|---|---|
| unique offsets touched | 54 | 1118 |
| total accesses | 87 | 24 614 |
Kernel-only offsets (we never write): 1072. Most are per-vblank IRQ status/clear (`0x0084 written 900x`, `0x0094 written 900x`, `0x00c4 written 900x`): expected maintenance traffic, not missing init.
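The filter-and-split step behind those numbers can be sketched as follows. The CSV column names `stage` and `phys` are hypothetical (the real header is whatever `rk_tw_dump` emits), as is the helper name; the VOP2 address window is the one given above.

```python
import csv
import io

VOP2_BASE, VOP2_END = 0xFDD90000, 0xFDD95FFF


def offsets_by_stage(rows):
    """Map stage name -> set of VOP2 register offsets that stage touched."""
    seen = {}
    for row in rows:
        phys = int(row["phys"], 16)
        if VOP2_BASE <= phys <= VOP2_END:
            seen.setdefault(row["stage"], set()).add(phys - VOP2_BASE)
    return seen


if __name__ == "__main__":
    sample = io.StringIO(
        "stage,phys,value\n"
        "uboot,0xfdd90000,0x00048004\n"
        "kernel,0xfdd90000,0x00048005\n"
        "kernel,0xfdd90084,0x00000001\n"
        "kernel,0xfded0000,0x00000000\n"  # eDP controller: outside the window
    )
    seen = offsets_by_stage(csv.DictReader(sample))
    kernel_only = seen["kernel"] - seen["uboot"]
    print(sorted(hex(o) for o in kernel_only))
```

Set difference gives the kernel-only offsets directly; counting writes per offset on top of that is what separates one-shot init from per-vblank maintenance traffic.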
The one-shot divergences that actually matter, ranked by likely impact on post-scaler starvation:
- `0x0e00 VP2_DSP_CTRL`: we `0x0000000f`, kernel `0x1040000f`. Kernel sets bit 28 `DSP_LUT_EN` and bit 22 `GAMMA_UPDATE_EN`, and programs the full 1024-entry LUT at `0x5000+`. We write neither the enable bits nor the LUT.
- `0x0e0c VP2_CLK_CTRL`: we `0xe`, kernel `0x2`. Different internal clock-divider config.
- `0x0000 cfg_done`: we `0x00048004` (VP2 only), kernel `0x00048005` (VP0+VP2 together). See VP0 theory below.
- `0x06f0`: we `0x04040000`, kernel `0x04040404`. Trivial missing two bytes.
- `0x0028, 0x002c` DSP_IF_EN block: we miss bits 3/4 in 0x0028 and all of 0x002c=`0x00060000`.
Kernel also brings up VP0 fully (VP0_DSP_CTRL at `0x0c00`, VP0 timing at `0x0c48..0x0c54`, VP0 LINE_FLAG at `0x0070`, VP0 INT_EN at `0x00a0`); we write zero VP0 regs.
VP0-drives-VP2 theory (reinstated)
The cfg_done question has been flip-flopped across Bin sessions. Pinning it down in memory now at `project_bin_vp0_theory.md`.
Rule: on the GenBook, cfg_done at `0x0000` must latch both VP0 and VP2 together (value `0x00048005`). Do not drop the VP0 bit on the reasoning that VP0 has no connector.
Why: RK3588 VOP2 has a single shared overlay mix crossbar, not per-VP silos. PORT_SEL readback `0xa0587783` decodes to: PORT0_MUX=3 (VP0 gets layers 0..3), PORT1_MUX=8 (VP1 disabled), PORT2_MUX=7 (VP2 gets layers 4..7), PORT3_MUX=7. VP2 layer-slots 4..7 sit downstream of VP0 layer-slots 0..3 in the same mix pipeline. When VP0 cfg_done stays pending (because we only latch VP2), VP0 mix state is in shadow, never commits, and the mix stalls: VP2 post-scaler reads empty → POST_BUF_EMPTY at vblank rate → panel dark.
VP0 vsync fires whenever VP0 has dclk running + valid timing, independent of whether a panel is physically attached. Kernel brings up VP0 fully for exactly this reason: keeps the mix crossbar advancing.
History of the bug:
- `ddefc154` (pre-Phase 12): tripwire diff correctly found the `0x00048005` two-phase latch, hypothesised DSP_IF crossbar needs VP0 committed. Right theory.
- `e05c0915` (Phase 15-16): reversed on the false reasoning "VP0 has no connector = no vsync." **Wrong.** Value stayed at `0x00048004` through Phase 20.
- `7bd68b59` (Phase 20): fixed Cluster routing but kept the VP2-only cfg_done. Panel still dark, POST_BUF_EMPTY still firing.
- `2026-04-20` trace-diff: confirmed kernel terminal `0x00048005`, plus VP0 init writes we never replicate.
Proposed Phase 22
Restore `0x00048005` in cfg_done AND add VP0 fake-run init:
- `VP0_DSP_CTRL = 0x1040000f` at `0x0c00`
- `VP0 HTOTAL_HS_END = 0x0898002c` at `0x0c48`
- `VP0 HACT_ST_END = 0x00c00840` at `0x0c4c`
- `VP0 VTOTAL_VS_END = 0x04650005` at `0x0c50`
- `VP0 VACT_ST_END = 0x00290461` at `0x0c54`
- `VP0 LINE_FLAG = 0x04610461` at `0x0070`
- `VP0 INT_EN = 0x00200020` at `0x00a0`
Plus top candidates from the value divergences: flip `VP2_DSP_CTRL += DSP_LUT_EN + GAMMA_UPDATE_EN` (and populate LUT if needed), fix `VP2_CLK_CTRL` to `0x2`, complete `0x06f0 = 0x04040404`.
One change per commit, verify via state-dump readback before moving on.
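Before replaying the VP0 timing words it helps to decode them and confirm they describe a sane mode. A minimal sketch, assuming each register packs the first-named field in the high half-word and the second-named field in the low half-word; that split is inferred from the register names and from the results being self-consistent. `hi_lo` is a hypothetical helper.

```python
def hi_lo(value):
    """Split a 32-bit register into (high 16 bits, low 16 bits)."""
    return value >> 16, value & 0xFFFF


htotal, hs_end = hi_lo(0x0898002C)
hact_st, hact_end = hi_lo(0x00C00840)
vtotal, vs_end = hi_lo(0x04650005)
vact_st, vact_end = hi_lo(0x00290461)

h_active = hact_end - hact_st
v_active = vact_end - vact_st

if __name__ == "__main__":
    print(f"total  {htotal} x {vtotal}")
    print(f"active {h_active} x {v_active}")
    # Pixel clock this raster would need at 60 Hz, in Hz.
    print(htotal * vtotal * 60)
```

The decode yields a 1920x1080 active area inside a 2200x1125 raster, so the values are at least internally consistent with a 1080p mode. Note that 2200 x 1125 x 60 is 148.5 MHz, slightly above the 147.69 MHz and 147.84 MHz dclk figures quoted earlier on this page; whether that difference matters for a VP0 that drives no connector is an open question.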
2026-04-20 late — campaign closeout
Bin is closed, partial-win. Ampere's display silicon failed by end of session. Symptoms: link trains, backlight comes on, BIST bars display on panel (PHY-internal pattern generator), eDP AUX reads panel EDID correctly — but the VOP2-to-HDPTX-internal-pipeline never delivers a valid pixel stream. Three different kernel-side failure modes observed across u-boot variants:
- **POST_BUF_EMPTY storm (100k-500k IRQ/s)**: caused by Bin u-boot's VP2 half-init leaving the mix pipeline in an error-reassert loop. Fixed by vanilla u-boot (storm count drops to zero). Storm is NOT why the panel is dark, just excessive IRQ noise.
- **rockchip-vop2 port_mux_done timeout**: kernel's PORT_SEL commit does not latch with vanilla u-boot. Fixed by Phase 22 u-boot pre-committing port_mux. Trade-off, not solution.
- **runtime PM refcount underflow (9447 in 1 min with vanilla kernel + Phase 22 u-boot)**: kernel PM vs u-boot PM state drift.
Final confirmation 2026-04-20 13:20: vendor `coolpi-loader` u-boot + vendor kernel image also produces dark panel. Since vendor u-boot is supposed to be the working reference for this exact SKU, vendor-image-dark = display silicon fault, not software bug.
Unanswerable question: did the campaign contribute to the silicon failure? 25+ reboot cycles, 100k IRQ/s storms across many hours, and many register writes into analog PHY/PLL blocks without TRM backing: collectively plausible contributors, individually unprovable. See feedback_trm_or_nothing.md for the forward rule: every register write needs TRM backing, especially in analog blocks.
What the campaign produced
- VP0-drives-VP2 theory decoded and memorialised. Applies to all RK3588 VOP2 boards where VP0 has no connector but the mix crossbar still requires VP0 cfg_done to commit. Memory file <code>project_bin_vp0_theory.md</code> documents it with receipts.
- Phase 22 u-boot binary at <code>boltzmann:~/projects/AMPere/output/u-boot-rockchip-spi-phase22-genbook-8mb.bin</code>. Correct Cluster1 routing, VP0 fake-run, cfg_done 0x00048005, and enough PMU/CRU init to produce valid link training + panel stream. If a replacement GenBook with working silicon materialises, this is the start point, not Phase 1 or Phase 24.
- Tripwire infrastructure: shared 2 GB DDR ring, u-boot + kernel writel/readl recording, offline CSV dumper. Useful generic tool for arm64 boot-path debugging; worth extracting + upstreaming.
- One confirmed SDDM-on-eDP-via-upstream photo at <code>Documents/Markus_And_Claude/bin-phase22-sddm-20260420-0806.jpg</code> on Nextcloud. Clock showing 08:06:37, Monday 20 April, KDE/Arch default wallpaper. Never reproduced. Only confirmed-pixels frame in the whole campaign.
- Register-divergence catalogue from two tripwire traces (<code>bin-phase3-full.csv</code> and <code>phase24-trace.csv</code>): reference for anyone re-implementing RK3588 VOP2 bring-up.
- A list of register fields whose meaning was in the TRM vs whose meaning we inferred: useful for the next RE engineer, sobering for this one.
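A register-divergence catalogue of the kind listed above can be derived from two write traces by comparing the last value written to each address. The sketch below assumes a hypothetical CSV layout with `op,addr,value` columns. The real tripwire CSV format is not documented on this page, and the sample rows are illustrative, not taken from the actual traces.

```python
import csv
import io


def final_writes(csv_text: str) -> dict[int, int]:
    """Map each address to the last value written to it in the trace."""
    last: dict[int, int] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["op"] == "w":
            last[int(row["addr"], 16)] = int(row["value"], 16)
    return last


def divergences(ours: dict[int, int], reference: dict[int, int]) -> dict:
    """Return {addr: (our_value, reference_value)} where the two traces disagree."""
    out = {}
    for addr in sorted(set(ours) | set(reference)):
        a, b = ours.get(addr), reference.get(addr)
        if a != b:
            out[addr] = (a, b)  # None marks a write missing from one side
    return out


# Illustrative rows only (hypothetical format and values).
OURS = "op,addr,value\nw,0x0c48,0x0898002c\nw,0x06f0,0x00000404\n"
REFERENCE = (
    "op,addr,value\nw,0x0c00,0x1040000f\n"
    "w,0x0c48,0x0898002c\nw,0x06f0,0x04040404\n"
)

diff = divergences(final_writes(OURS), final_writes(REFERENCE))
for addr, (a, b) in diff.items():
    ours_txt = "missing" if a is None else f"{a:#010x}"
    print(f"{addr:#06x}: ours={ours_txt} reference={b:#010x}")
```

Comparing only terminal values hides ordering differences, so a catalogue built this way is a starting point for a sequence-level diff, not a replacement for one.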
Webcam luma calibration (useful for next campaigns)
- 10–25: backlight off (cold boot, DPMS, hardware gate).
- 30–40: backlight ON, panel blank: LCD rest state + backlight bleed + ambient reflection. Xorg can be running, CRTC active, fb bound; panel still shows nothing. Most of what we captured fell here.
- 100–130: panel backlit + receiving a signal that hits the "LCD neutral" state (uniform bright with no structure). Looks identical to real "white screen" content in the camera. Trap.
- 170–200 with visible structure (text, icons, wallpaper): real rendered pixels. The 08:06 shot is the only one in the campaign.
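The luma bands above can be turned into a rough frame classifier for automated webcam checks. This is a sketch under assumptions: frames arrive as a flat list of 8-bit luma samples, and "visible structure" is approximated by the standard deviation of those samples exceeding a threshold. The threshold of 20 is a made-up placeholder that would need calibrating against real captures.

```python
from statistics import mean, pstdev


def classify_frame(luma: list[int], structure_threshold: float = 20.0) -> str:
    """Classify a webcam frame by mean luma and a crude structure measure."""
    avg = mean(luma)
    has_structure = pstdev(luma) >= structure_threshold
    if 10 <= avg <= 25:
        return "backlight off"
    if 30 <= avg <= 40:
        return "backlight on, panel blank"
    if 100 <= avg <= 130 and not has_structure:
        return "LCD neutral (trap)"
    if 170 <= avg <= 200 and has_structure:
        return "real pixels"
    return "unclassified"


print(classify_frame([15] * 100))
print(classify_frame([35] * 100))
print(classify_frame([115] * 100))
print(classify_frame([120, 250] * 50))
```

Anything outside the four calibrated bands is reported as unclassified instead of being forced into the nearest band, since the gaps between bands were not characterised.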
Status check for future pickup
If someone comes back to this project with replacement hardware, read in this order:
- project_bin_closeout.md (this file)
- project_bin_phase22_notes.md
- project_bin_vp0_theory.md
- project_bin_tripwire_findings.md
- feedback_trm_or_nothing.md
- feedback_observation_over_theory.md
The full arc is in this DokuWiki page (read top-to-bottom). The memory files are the executive summary + rules-going-forward.
Bin is closed.
Not “done” — more like “won the insights, lost the hardware.” The u-boot contribution is upstreamable; the display never lit up on this specific silicon; and a real talk at 42C3 would honestly lead with “here is how we broke a RK3588 laptop trying to tell it to display pixels, and what we learned on the way.”
2026-04-20 — postscript: Anthropic feedback (not yet filed)
During the ampere-silicon-maybe-fried phase, Claude Code running on noether was observed to be at low effort despite the user setting max effort in a sibling client/session. The low-effort setting was very likely the proximate cause of the tar -xzf -C / footgun that wiped the /lib -> usr/lib symlink on ampere, bricking SSH until physical disk recovery. A max-effort pass would almost certainly have caught the merged-usr archive-layout trap in advance.
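The archive-layout trap can be checked for before extracting. A minimal sketch, assuming the failure mode is an archive that stores a real directory entry (such as lib/) at a path that is a symlink in the destination tree, which GNU tar by default replaces unless --keep-directory-symlink is given. The function name and the demo tree are hypothetical. This is not a reconstruction of the exact archive involved in the incident.

```python
import os
import tarfile
import tempfile


def directories_over_symlinks(archive_path: str, dest_root: str) -> list[str]:
    """List archive directory entries that would land on an existing symlink."""
    hits = []
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            rel = os.path.normpath(member.name).lstrip(os.sep)
            if member.isdir() and os.path.islink(os.path.join(dest_root, rel)):
                hits.append(rel)
    return hits


# Demo on a throwaway tree that mimics a merged-/usr layout.
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "root")
    os.makedirs(os.path.join(root, "usr", "lib"))
    os.symlink("usr/lib", os.path.join(root, "lib"))  # lib -> usr/lib

    staging = os.path.join(tmp, "staging")
    os.makedirs(os.path.join(staging, "lib"))
    with open(os.path.join(staging, "lib", "libfoo.so"), "w") as fh:
        fh.write("stub")

    archive = os.path.join(tmp, "payload.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(os.path.join(staging, "lib"), arcname="lib")

    risky = directories_over_symlinks(archive, root)

print(risky)
```

A non-empty result is a signal to stop and stage the extraction somewhere other than the live root.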
Kept as local memory (feedback_effort_stakes.md) and MEMORY.md index. Draft of a possible future issue against the Claude Code GitHub tracker is archived here for reference:
Title candidate: Effort level auto-downshifting during disaster-recovery scenarios is a customer-retention risk
Body:
During a hardware RE session (RK3588 SBC, upstream u-boot + kernel work on a ~600 EUR device) the Claude Code client dropped effort mid-recovery after the user had explicitly set max effort in a sibling client/session. The effort setting did not propagate between sessions, and the assistant did not surface its current effort level nor flag that it had shifted down.
In the specific incident, the low-effort pass dispatched a tar -xzf -C / against a merged-/usr Arch rootfs and clobbered the /lib -> usr/lib symlink, breaking the dynamic linker and bricking SSH access until physical disk recovery was possible. A max-effort pass would almost certainly have read the archive layout first, considered the symlink, and staged safely. The fault-mode is exactly what max-effort exists to prevent.
The user-visible stakes: the hardware could have been permanently damaged (thermally we got close — sustained ~100k IRQ/s on a display pipeline error loop for hours). In that scenario, the user does not necessarily have “just buy another one” as an option.
Two concrete asks:
- Effort level should propagate across sessions/clients for the same user, or at minimum be surfaced visibly at the top of every conversation.
- Auto-downshift during a session that includes disaster-recovery signals (rm -rf, flashing, rootfs operations, user-expressed distress, expensive/irreversible steps) should be suppressed, or at least flagged to the user before committing the next expensive action.
Low effort in this context is not just “less helpful” — it is directly responsible for the fault that nearly cost a 600 EUR+ device. That is a churn-level outcome if it hits the wrong customer.
Status: drafted 2026-04-20, NOT filed. User chose to keep local rather than submit to the public tracker at this time.
