Mainline u-boot on the CoolPi CM5 GenBook (RK3588) with working eDP boot display and internal keyboard. First-ever RK3588 eDP bring-up in upstream u-boot (not the Rockchip downstream fork).
Target hardware: ampere (CoolPi CM5 GenBook, RK3588 + 32 GB LPDDR5)
Panel: CSOT T9 SNE001BS2-2, 1920×1080@60 Hz, DPCD 1.1, 2.7 Gbps HBR, 2 lanes
Status 2026-04-17 evening: v10 mainline u-boot trains link at HBR×2, panel reports IN_SYNC, BIST bars display on the panel — DP TX + eDP panel proven healthy. Pixels from our own framebuffer still absent; fault narrowed to VOP2 pixel-output chain or content-format upstream of DP TX. Vendor coolpi-loader (factory genbook_spi.img) also shows no logo on eDP — closes the “vendor knows how” assumption.
For next session: Dual-agent debug strategy — two independent AI agents drafted strategies with different top-3 bets. Overlap = high-confidence signal, disagreement = where the most learning happens. Start with the clk_summary check at the top of that page.
The vendor u-boot binary (“coolpi-loader”) runs the vendor DRM stack for its boot logo. Upstream u-boot has no RK3588 VOP2/eDP driver path at all. The goal of project Bin is to produce a clean, upstream-submittable u-boot that paints something on the eDP panel during boot — without pulling in the vendor's “display-cmd dance” from the downstream tree.
Secondary goals:
  * rk3588_ddr_lp4_2112MHz_lp5_2400MHz_v1.19.bin: faster bin unlocked after the compute-module reseat (see DDR RE project MVP1 section).
  * vidconsole_putc_xy(); u-boot is drawing its “U-Boo…” banner into framebuffer.

No visible pixels on the panel. Everything upstream of the physical signal chain is correct. The bug lives in territory our direct-MMIO tools don't reach:
  * /dev/mem even while the kernel is driving the panel. That suggests the PHY's real state is behind regmap/syscon indirection (not MMIO-readable without a kernel module).
  * clk_set_rate(dclk_vp2, 147.84 MHz) may be hitting the wrong mux.
  * regmap_update_bits traffic that our vop2trace.ko kprobe never caught (it only hooked regmap_write).

If a VOP2 register write reads back zero or doesn't take effect, it's cfg_done. This has burned us repeatedly across multiple sessions. Save yourself the reboot cycles:
VOP2 has shadow (staging) and active (hardware-driving) register banks. Writes land in shadow; they commit to active only when cfg_done fires.
  * CFG_DONE_IMD (bit 28 at offset 0x030) latches VP-level config immediately but does not cover window shadow registers (CLUSTERx CTRL0, YRGB_MST, VIR, ACT, DSP_INFO). Those need an explicit per-VP cfg_done write.
  * CFG_DONE_EN | BIT(vp_id) | (BIT(vp_id) << 16) → offset 0x000.
  * OVL_CTRL=0 because it does periodic regmap_update_bits cfg_done every atomic commit. Our one-shot u-boot init needs OVL_CTRL bit 31 set (0x80000000) or PORT_SEL/LAYER_SEL/CLUSTER writes never latch.

Seven fixes, all of which are prerequisites for any future display work; any upstream u-boot VOP2/eDP driver will need these:
  * CFG_DONE_EN | BIT(vp_id) | (BIT(vp_id) << 16) to the CFG_DONE reg after every batch of window writes. See cfg_done dance callout above.
  * rk3588_edp_link_train_ce() was returning after CR/CE success without writing 0 to ADP_TRAINING_PTN_SET or DPCD 0x102. The PHY was streaming training symbols indefinitely: link “trained” but carrying no real video. Panel showed black despite all register state looking correct.
  * EDP1_ENABLE_SHIFT=1 was wrong: kernel's live register value has bit 3 set, not bit 1. The naive EDP0/EDP1/HDMI0/HDMI1 = bits 0/1/2/3 mapping is not what RK3588 uses.

The productive move of this session was switching from reading downstream vendor u-boot source (which was “vendor secret” register soup without explanations) to reading live register state from the running kernel while SDDM was displaying. That's ground truth: whatever bits are set when pixels reach the panel, those are the bits you need.
ssh ampere 'sudo python3 <<EOF
import mmap, struct, os
fd = os.open("/dev/mem", os.O_RDONLY|os.O_SYNC)
mm = mmap.mmap(fd, 4096, mmap.MAP_SHARED, mmap.PROT_READ, offset=<BASE>)
print(hex(struct.unpack("<I", mm[<OFFSET>:<OFFSET>+4])[0]))
EOF'
Works for VOP2 (0xFDD90000) and the eDP controller (0xFDED0000). Does not work for the HDPTX PHY (0xFED70000, syscon-wrapped) or the CRU (0xFD7C0000) — those show mostly zeros even when active. Next session: kernel module to dump those properly.
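The one-liner above can be wrapped into a small helper. This is a minimal sketch, not the tool used in the session: ''read_reg32'' and its ''path'' parameter are hypothetical names, and the only behaviour it adds is aligning the mmap offset for addresses that are not page-aligned. Python does not guarantee that the copy is a single 32-bit bus access, which may matter for some MMIO blocks.

```python
import mmap
import os
import struct


def read_reg32(phys_addr, path="/dev/mem"):
    """Read one little-endian 32-bit word at phys_addr through an mmap window."""
    if phys_addr % 4:
        raise ValueError("register address must be 4-byte aligned")
    gran = mmap.ALLOCATIONGRANULARITY
    page_base = phys_addr & ~(gran - 1)  # mmap offsets must be granularity-aligned
    page_off = phys_addr - page_base
    fd = os.open(path, os.O_RDONLY | os.O_SYNC)
    try:
        mm = mmap.mmap(fd, gran, mmap.MAP_SHARED, mmap.PROT_READ, offset=page_base)
        try:
            return struct.unpack_from("<I", mm, page_off)[0]
        finally:
            mm.close()
    finally:
        os.close(fd)
```

On ampere this would be called as root, for example ''hex(read_reg32(0xFDD90000 + 0x0E00))'' for VP2_DSP_CTRL. The ''path'' argument exists only so the helper can be exercised against an ordinary file.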
| Where | What |
|---|---|
| boltzmann:~/src/u-boot/ | u-boot source with rk3588_vop2, rk_edp, DTSI patches |
| boltzmann:~/src/u-boot/drivers/video/rockchip/rk3588_vop2.c | VOP2 driver with kernel-trace-replay init and live register STATE dump |
| boltzmann:~/src/u-boot/drivers/video/rockchip/rk_edp.c | eDP driver with spec-complete DPCD setup, pattern-disable, kernel-matched config_video |
| ampere:/root/uboot-backups/ | Timestamped SPI backups across session |
| meitner:/tmp/uart.log | All UART traffic during boot iterations (systemd-run uart-cap.service + uart-follow.service mirror on /dev/tty8) |
| noether:~/claude/vop2_harness/vop2trace/ | LKM that traces regmap_write + writel_relaxed during kernel DRM module load. Dumps to /proc/vop2trace. |
Board is alive, so flash from running Linux (fastest):
scp boltzmann:~/src/u-boot/u-boot-rockchip-spi.bin ampere:/tmp/
ssh ampere 'sudo dd if=/dev/mtd0 of=/root/uboot-backups/spi-pre-$(date +%H%M%S).bin bs=1M; \
  sudo flashcp --partition /tmp/u-boot-rockchip-spi.bin /dev/mtd0'
# Reboot when YOU are ready.

If bricked: maskrom-mode recovery via meitner + rkdeveloptool (db loader → cs 9 SPI NOR → wl 0 → rd). Validated at ~60 s per cycle.
cd ~/src/u-boot && make ARCH=arm \
  BL31=/home/mfritsche/src/tf-a/build/rk3588/release/bl31/bl31.elf \
  TEE=/home/mfritsche/src/optee_os/out/arm-plat-rockchip/core/tee.bin \
  ROCKCHIP_TPL=/home/mfritsche/src/rkbin/bin/rk35/rk3588_ddr_lp4_2112MHz_lp5_2400MHz_v1.19.bin \
  CC="distcc gcc" -j80
Do not use the decompiled-and-patched rk3588_ddr_v1.19_patched_v2.bin: it bricks ampere (DDR training fails all channels/all lanes). Only the stock rkbin blob is known-good.
  * /dev/mem-direct reads hit zero zones for some of these.
  * regmap_update_bits in vop2trace.ko: current kprobe only catches regmap_write.

Sessions 1–4 were heuristic register matching against the kernel live-dump. Tonight was empirical disambiguation: stopped assuming and started asking the hardware what it actually sees.
Branch bin/wip-2026-04-17 on boltzmann:~/src/u-boot carries all vendor-sourced
fixes on top of the earlier VOP2/eDP work:
  * hclk DT + driver wire-up.
  * adp_write for the eDP controller (was a single-beat quirk).
  * init_video + set_video_format paths added.
  * SINK_STATUS (0x205) read after commit.
  * against VP2_DSP_BG=black with the source alpha. Alpha-0 = source×0 = black field regardless of colour. Fixed in v10.
  * ''dclk_vop2'' rate was wrong in u-boot (~136 MHz vs the kernel 147.69 MHz); fixed by pre-selecting ''V0PLL'' as parent before ''clk_set_rate()'' so the u-boot clock driver takes the retune path instead of picking the nearest matching divider on the default ancestor.
  * VOP MMU disable hypothesis explored; MMU was already bypassed by reset. Commit reverted, only the diagnostic code remains.
  * SINK_STATUS reports IN_SYNC: panel sees a valid stream.
  * the main link). DP TX, PHY, cable, panel, backlight are all healthy.
  * upstream of DP TX: either VOP2 pixel-output chain or the content format we hand to the eDP controller.
To rule out that we are missing a vendor-secret step, built vendor coolpi-loader from source:
  * ranke = CT171 on data, Debian 12 x86_64, ephemeral historian-named build host.
  * coolpi-loader, branch linux-6.1-stan.
  * linux-6.1-stan (b718d7b1f9) and linux-6.1-stan-rkr5 (90083ce217) variants; vendor u-boot otherwise edits the runtime distro ''/boot/extlinux/extlinux.conf'' on every boot.
  * Built + flashed ''genbook_spi.img''. Kernel boots cleanly. **No logo on eDP or HDMI.** Same symptom as our own u-boot.
  * Web-research agent surfaced the likely reason: vendor u-boot uses ''rockchip_show_logo()'', which loads a ''logo'' partition defined in ''parameter.txt''. ''genbook_spi.img'' is SPL-region only (8 MB) and does not include that partition. The Rockchip wiki explicitly states logo support is Android-only: not implemented for Linux. That probably explains why the vendor image does not light eDP on this reference board either.
  * Rebuilt on ''linux-6.1-stan-rkr5'' (Apr-2026, significant display/analogix_dp fixes over Jan-2026 ''stan''). Resulting image did not boot: idblock / u-boot FIT version mismatch between the vendor-baseline idblock and the rkr5 u-boot.img.
  * Panel SKU fork noted: BOE NV140FHM-N42 / N61 / N66 across hardware revisions. Vendor u-boot panel timings may not match our specific SKU even with a working logo path.
Conclusion: vendor u-boot is not a working reference for eDP-logo. The vendor-knows-how assumption is dead.
  * ''flashcp --partition <8mb-img> /dev/mtd0'' via SSH into the running Arch userland is reliable across iterations. Default path.
  * rkdeveloptool maskrom: one ''wl 0 <8mb-img>'' per fresh maskrom session. Multiple consecutive ''wl'' writes in the same session cause comm-object failures. Recovery path validated: ''db rk3588_spl_loader_v1.19.113.bin → cs 9 → wl 0 <image> → rd''.
  * SPI layout on RK3588 GenBook (correction from earlier notes):
    * idblock at 0x8000 (~208 KB)
    * u-boot FIT at 0x60000
    * The earlier 0x200000 value was wrong: that is ''CONFIG_MTD_BLK_U_BOOT_OFFS'' for eMMC, not SPI.
| Where | What | sha256 (prefix) |
|---|---|---|
| boltzmann:~/projects/AMPere/output/u-boot-rockchip-spi-bin-wip-20260417-v10-8mb.bin | v10 mainline, known-good | 925137b923af… |
| meitner:/tmp/genbook_spi_vendor_prebuilt.img | Vendor factory (no logo) | 7202caf7ca54… |
| data:/rpool/nas/home/mfritsche/gbook/coolpi_rk3588_gbook_nor_upgrade.img | Vendor 7 MB upgrade image with logo partition, not tested | — |
Supersedes the earlier clk_summary / HDPTX trace checklist on this page — those checks have been run; see the debug-strategy page for the Closed / Reopened update.
  * idblock.bin (208 KB) that vendor make.sh builds alongside the 492 KB download-boot loader, then rebuild the rkr5 stack end-to-end so idblock + u-boot FIT versions line up.
  * Try ''coolpi_rk3588_gbook_nor_upgrade.img'': hesitant, it might rewrite ''extlinux.conf'' through some other mechanism and we would rather not discover that by surprise.
  * Alternatively accept that pixels-in-u-boot is not a today problem; the kernel stack works cleanly on v10 for development use.
Bench session. Stopped heuristic matching, built an actual instrument: every ''writel'' / ''readl'' in u-boot and kernel now records a timestamped trace into a shared DDR region. Diff the two traces, the bugs fall out.
  * ''CONFIG_RK_TRIPWIRE'' feature in both u-boot (branch ''bin/wip-2026-04-17'') and kernel (''linux-rk3588-marfrit'' → ''bin/tripwire'' branch). Shared 2 GB no-map DDR region at phys <code>0x780000000</code>, reserved via DT on both sides so neither side maps it as general memory.
  * Every <code>writel</code>/<code>readl</code> records a 32-byte record: <code>(cntvct_el0 tick, caller PC, phys addr, value, flags)</code>. Phys resolved via page-table walk in the kernel record fn; native on the u-boot side.
  * Offline C dumper at <code>boltzmann:~/src/u-boot/tools/rk_tw_dump/rk_tw_dump.c</code> emits CSV; <code>resolve.py</code> sidecar does symbol lookup via kallsyms bisect.
  * Bench runbook at <code>noether:~/claude/bin_bench_plan.md</code>.
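To make the 32-byte record concrete, here is a minimal Python sketch of a parser. The field widths are an assumption: three 64-bit fields (tick, caller PC, phys addr) followed by two 32-bit fields (value, flags), little-endian, which happens to add up to 32 bytes. The real layout lives in ''rk_tw_dump.c'' and may differ; ''TripwireRecord'' and ''parse_records'' are hypothetical names.

```python
import struct
from collections import namedtuple

# Assumed layout: tick, pc, phys as u64; value, flags as u32 (8+8+8+4+4 = 32 bytes).
RECORD = struct.Struct("<QQQII")
TripwireRecord = namedtuple("TripwireRecord", "tick pc phys value flags")


def parse_records(blob):
    """Yield TripwireRecord tuples from a raw dump; a trailing partial record is ignored."""
    usable = len(blob) - len(blob) % RECORD.size
    for fields in RECORD.iter_unpack(blob[:usable]):
        yield TripwireRecord(*fields)
```

A dump read from the reserved region could then be filtered with something like ''[r for r in parse_records(raw) if r.phys >> 16 == 0xFDD9]'' to keep only VOP2 accesses.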
  * ''CONFIG_BIN_PHASE1_NOINIT'' u-boot (zero VP2/eDP register writes) + Phase 2 kernel. Result: kernel DRM can cold-init the display from scratch after ''modprobe panel_edp''; SDDM displays. Conclusion: point-3 hypothesis (kernel depends on u-boot half-init) is disarmed.
  * ''panel_edp'' to re-trigger modeset. Captured 77 K kernel writes, decoded full ''atomic_commit'' sequence.
  * ''vop2_probe'' entry. Captured 4.3 M records (2.08 M u-boot + 2.24 M kernel). Zero lost.
Three bit-level diffs between the u-boot writes and the kernel writes:
| Register | Our u-boot | Kernel | Analysis |
|---|---|---|---|
| VOP2 +0x0000 cfg_done (VP2 latch) | 0x00048004 | 0x00048005 | Kernel latches VP0+VP2 together. We latch VP2 alone. Source at drivers/video/rockchip/rk3588_vop2.c:287-292: vop2_cfg_done(priv, 2) writes the bitwise OR of CFG_DONE_EN, BIT(2) and (BIT(2) << 16). Candidate fix: also OR in BIT(0) and (BIT(0) << 16) so the DSP_IF crossbar can synchronize VP0+VP2 in one latch. |
| VOP2 +0x0600 | 0x80000000 (bit 31 set) | 0x00000000 | Likely STANDBY/bypass bit on VP1. Need to trace where we set it (likely in the vendor-dump-mirror block from early RE) and either clear or never set. |
| VOP2 +0x06f0 | 0x04040000 | 0x04040404 | Per-byte lane-phase register. We set only the upper 16 bits; kernel sets all 4 bytes to 0x04. Trivial value fix. |
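The cfg_done arithmetic in the first row can be checked with a short Python sketch. ''CFG_DONE_EN'' as bit 15 is inferred from the traced value (0x00048004 minus BIT(2) and BIT(2) << 16 leaves 0x8000), not taken from the TRM; the function names are hypothetical.

```python
CFG_DONE_EN = 1 << 15  # inferred from 0x00048004 = EN | BIT(2) | (BIT(2) << 16)


def cfg_done_value(vp_ids):
    """Compose the cfg_done word for the given video-port ids using the formula above."""
    value = CFG_DONE_EN
    for vp in vp_ids:
        value |= (1 << vp) | ((1 << vp) << 16)
    return value


def describe(value):
    """Split a cfg_done word into the enable flag, low per-VP bits and the high half."""
    return {
        "en": bool(value & CFG_DONE_EN),
        "vp_bits": value & 0xF,
        "high_bits": (value >> 16) & 0xF,
    }
```

''cfg_done_value([2])'' reproduces our 0x00048004. Applying the same formula to VP0 and VP2 gives 0x00058005, whereas the traced kernel value 0x00048005 has BIT(0) only in the low half (''describe'' reports ''vp_bits'' 5, ''high_bits'' 4). Which of the two to write is a question for the TRM or a readback, not for this sketch.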
  * ''VP2_POST_SCL_CTRL 0x0e3c = 0x10001000'': rkr5 analysis flagged this as “cargo-cult worth decoding”; tripwire now proves kernel writes the same value, so the value is correct.
  * ''0x0e30..0x0e40'', output mux ''0x06e8 = 0x34000000'', cluster1 window, ''VP2_DSP_CTRL = 0x0000000f''.
With Phase 2 u-boot (no VP2 init), the “brown text flash” during early kernel boot DISAPPEARED. With Phase 3 u-boot (full VP2 init), it came back. This proves ''simplefb'' is scanning the raster the u-boot sets up: the flash is the u-boot VP2 output, ''simplefb'' just overlays kernel console text on it. Panel stays physically lit whenever u-boot does VP2 init, regardless of whether our own stripe content reaches it.
Fix the three divergences in u-boot, reflash, observe. If they close the “no pixels” wall, campaign is done. If not, mine the 4.3 M-record Phase 3 capture for the next divergence — we now have a reproducible capture.
| Where | What |
|---|---|
| boltzmann:~/bin-phase3-full.csv | Full trace, 225 MB, 4.3 M records |
| boltzmann:~/bin-phase2-modeset-v2.csv | Kernel modeset only, 77 K records |
| boltzmann:~/projects/AMPere/output/u-boot-rockchip-spi-phase3-genbook-8mb.bin | Phase 3 u-boot SPI (sha ac461a2195…) |
Bench session, four hours of chained hypotheses against a 20-phase reboot cycle. The memory key takeaway: observation beats theory in register-level reverse engineering. See feedback memory.
Phase 4 theorised that ''OVL_CTRL'' bit 31 was a STANDBY/bypass flag and cleared it. No readback verified the theory; no tripwire recorded the bit effect. Theory was wrong: per TRM Part 2 Chapter 7 §VOP2_OVERLAY_PORT_SEL, ''OVL_CTRL'' bits 31:30 are the ''LAYERSEL_REGDONE_SEL'' field (pick which VP cfg_done commits the LAYER_SEL register). Value ''10'' selects VP2, which is exactly what we wanted and what Phase 4 cleared.

Result: Cluster + LAYER_SEL writes were silently dropped for phases 5 through 12, and every subsequent register tweak appeared to fail for “unknown reasons.” Cost: ten phases of misdirection, around six hours of bench time. Phase 13 caught the regression only because tripwire captured the actual readback (''OVL_CTRL, PORT_SEL, LAYER_SEL'' at reset defaults) alongside the intended writes; the discrepancy was the smoking gun.
Rule now in memory: before committing a register fix, write down the expected post-fix readback value. If you cannot name one, the hypothesis is not falsifiable. After the fix, read back. If readback does not match expected, the fix did not land — do not move on.
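The rule can be sketched as a tiny harness. The ''ShadowedRegs'' class below is a toy model of the shadow/active behaviour described earlier on this page (writes land in shadow, readback sees the active bank), not the u-boot driver; all names are hypothetical.

```python
class ShadowedRegs:
    """Toy model: writes land in a shadow bank and reach the active bank on commit()."""

    def __init__(self):
        self.shadow = {}
        self.active = {}

    def write(self, offset, value):
        self.shadow[offset] = value

    def commit(self):
        # Stand-in for a cfg_done latch on a VP whose vsync is actually firing.
        self.active.update(self.shadow)

    def read(self, offset):
        return self.active.get(offset, 0)


def write_and_verify(regs, offset, value, expected, commit=True):
    """Write, optionally commit, then compare the readback with the pre-declared value."""
    regs.write(offset, value)
    if commit:
        regs.commit()
    got = regs.read(offset)
    return got == expected, got
```

The point of the ''expected'' argument is that it has to be written down before the write happens. A call that returns ''(False, 0)'' is the "fix did not land" case and should stop the session from moving on.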
Phases 13 through 19 refixed OVL_CTRL bit 31 and iterated on Cluster0 plane writes (CLUSTER0_CTRL at 0x1100 = ''0x80004001'' = ''FRM_RESETN_EN | MMU_BYPASS | CLUSTER_ENABLE''). Cluster0 registers never latched (readback stayed at reset defaults) no matter what load-enable or frame-delay we tried.

The wall: decoding PORT_SEL. Readback value ''0xa0587783'' decodes via TRM Part 2 Chapter 7 §VOP2_OVERLAY_PORT_SEL as:
| Bit range | Field | Value | Meaning |
|---|---|---|---|
| 17:16 | cluster0_sel_port | 00 | VP0 (no active output) |
| 19:18 | cluster1_sel_port | 10 | VP2 (eDP, active) |
Cluster0 shadow commits on VP0 vsync which never fires — hence writes never land. Cluster1 shadow commits on VP2 vsync, which IS firing. The kernel uses Cluster1 for the VP2 primary plane; we had been writing Cluster0 for phases 12 through 19.
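The decode can be reproduced with a few lines of Python. The two cluster fields follow the table above; the PORTn_MUX nibble positions are an assumption inferred from the PORT0..PORT3 values quoted for the same readback elsewhere on this page. ''bits'' and ''decode_port_sel'' are hypothetical helper names.

```python
def bits(value, hi, lo):
    """Extract the inclusive bit field value[hi:lo]."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


# (hi, lo) per field. Cluster fields are from the table above; the PORTn_MUX
# positions are assumed (one nibble each, starting at bit 0).
PORT_SEL_FIELDS = {
    "port0_mux": (3, 0),
    "port1_mux": (7, 4),
    "port2_mux": (11, 8),
    "port3_mux": (15, 12),
    "cluster0_sel_port": (17, 16),
    "cluster1_sel_port": (19, 18),
}


def decode_port_sel(value):
    """Return a dict of field name -> field value for a PORT_SEL readback."""
    return {name: bits(value, hi, lo) for name, (hi, lo) in PORT_SEL_FIELDS.items()}
```

''decode_port_sel(0xa0587783)'' yields cluster0 on port 0 and cluster1 on port 2, matching the table. The same ''bits'' helper applied to OVL_CTRL, ''bits(0x80000000, 31, 30)'', gives binary 10, the VP2 selection of LAYERSEL_REGDONE_SEL.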
Phase 20 reverts: base ''0x1000 -> 0x1200'', CLUSTER_CTRL ''0x1100 -> 0x1300'', WIN_REG_CFG_DONE load bit 0 to bit 1.
UART state dump after Phase 20 trace-replay init:
CLUSTER1 CTRL0 = 0x00000001 (WIN_ENABLE=1)
CLUSTER1 YRGB = 0xef700000 (our fbbase)
CLUSTER1 VIR = 0x00000780 (stride 1920)
CLUSTER1 ACT = 0x0437077f (1920x1080)
CLUSTER1_CTRL(0x1300) = 0x80004001 (FRM_RESETN | MMU_BYPASS | ENABLE)
Every plane register holds our value on readback — plane is committed, VP2 is running. Prior phases had all these same offsets reading reset defaults.
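As a sanity check on the dump, the geometry fields can be decoded in Python. Two assumptions, both consistent with the annotations in the dump but not confirmed against the TRM here: ACT packs (height - 1) in the upper half and (width - 1) in the lower half, and VIR counts 32-bit words per line. The function names are hypothetical.

```python
def decode_act(value):
    """Return (width, height), assuming ACT = (height - 1) << 16 | (width - 1)."""
    return (value & 0xFFFF) + 1, (value >> 16) + 1


def row_address(fb_base, stride_words, row, bytes_per_word=4):
    """Physical address of the first pixel of a row, assuming VIR counts 32-bit words."""
    return fb_base + row * stride_words * bytes_per_word
```

''decode_act(0x0437077f)'' gives (1920, 1080). With the dumped base and stride, ''row_address(0xef700000, 0x780, 1070)'' is 0xefed6400, which is where a physical-memory probe of row 1070 would have to look.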
But panel stays dark (see webcam frames below). Kernel dmesg shows ''*ERROR* POST_BUF_EMPTY irq err at vp2'' firing at vblank rate (486k callbacks suppressed per 5 seconds). The kernel uses Cluster1 too, so this is NOT a u-boot-only bug: the VP2 post-scaler is starved on a path shared with kernel-side DRM.

''sudo journalctl -k -b 0'' on the Phase 20 boot, chronological:
23:26:43 platform fdd90000.vop: Adding to iommu group 5
23:26:48 rockchip-vop2 fdd90000.vop: Runtime PM usage count underflow!
(x17 in immediate succession)
23:26:48 rockchip-drm display-subsystem: [drm] *ERROR* POST_BUF_EMPTY
irq err at vp2 (then vop2_isr: 486859 callbacks suppressed)
Mechanism: u-boot leaves VP2 powered + clocked, but the kernel PM framework starts every device at ''runtime_status = SUSPENDED''. When supplier links (IOMMU group 5, power-domain) run their init, they call ''pm_runtime_put()'' on fdd90000.vop. Counter underflows (was 0, went negative 17 times). Warning is non-fatal.

Later, ''rockchip-vop2'' modeset path calls ''pm_runtime_get()'', but since the framework thinks the device is already resumed (counter not zero after the underflow math settles), ''rockchip_vop2_runtime_resume()'' never runs. That callback is where the driver does its full clock re-gate, reset-toggle, and IOMMU re-program sequence. Modeset proceeds with VP2 in the state u-boot left it, with one or more sub-steps of the resume path skipped; the post-scaler starves and POST_BUF_EMPTY fires at every vblank.
The physical fb probe confirms pixels did reach memory:
  * ''0xef700000'' with ''MEMREMAP_WB''.
  * for the top two thirds was overwritten by kernel takeover (fbcon or DRM alloc, since we reserve no memory for the fb).
  * Row 1070 reads <code>ff0000ff ff0000ff ...</code>: the blue band (alpha=0xFF, B=0xFF) from u-boot stripe paint **survives** in the bottom third.
So the panel IS displaying what u-boot wrote (mostly black with a blue band at the bottom), latched by the panel internal scanout memory at the moment VP2 got PM-locked. eDP panels cache their last received frame; that is what the webcam sees.
DRM state at quiescence: ''/sys/kernel/debug/dri/1/state'' shows ''Cluster1-win0'' bound to ''video_port2'' with an Xorg-allocated XR24 framebuffer (fb=90, 1920×1080), yet ''/sys/kernel/debug/dri/1/vop2/summary'' reports ''Video Port2: DISABLED''. VP2 is force-idle after the POST_BUF_EMPTY cascade.
  * ''rk3588_vop2_display_init'' returns, walk VP2 all the way back down: STANDBY bit, CRU re-gate, PMU bus-idle, PMU power-off, release u-boot PD. Kernel probes a clean SUSPENDED device, resume callback runs normally. Splash likely will NOT persist visually (eDP blanks within a frame of signal loss, unless the panel supports PSR) but handover is clean.
  * **(B) State-match path**: leave VP2 running in exactly the state kernel early-probe expects. Map every register the kernel reads at probe, diff vs what u-boot leaves, fix discrepancies. Probably a dead end.
  * **(C) NOINIT + text splash**: stay on <code>BIN_PHASE1_NOINIT=y</code> (already works), no graphical splash, kernel does all VP2 setup cleanly. Lowest risk, least impressive outcome.
Phase 21 pursues (A). Kconfig gate: ''BIN_VP2_TEARDOWN''. Code lives in ''drivers/video/rockchip/rk3588_vop2.c'' at the end of ''rk3588_vop2_display_init'', mirroring the PMU+CRU setup sequence in reverse. Verdict metric: ''sudo journalctl -k -b 0 | grep underflow'' should return zero matches after a clean boot.
Eyedot USB camera on meitner, pointed at the ampere panel. Framing convention:
  * third of the frame shows keyboard / desk surface (reference for "backlight off" baseline) and the top two thirds covers the panel.
  * Default capture script: <code>meitner:/tmp/eyedot-cap.sh /tmp/bin-phaseN.mkv</code>. Records at 10 fps raw, downsamples to 1 fps into H.264, auto-stops 5 seconds after UART shows <code>login:</code>.
  * ffmpeg signalstats is unreliable on JPEGs from this camera; use a Python one-liner on raw pixel data for luma histograms.
  * Dim content on a dark panel is hard to see raw. Standard enhance pipeline: <code>ffmpeg -i frame.jpg -vf eq=brightness=0.1:contrast=2.5 -y out.jpg</code>, then <code>crop=iw*2/3:ih:iw/3:0</code> to strip the desk portion and focus on panel content.
  * Trap: camera auto-exposure skews dark and bright regions. Luma average over the whole frame is dominated by the bright environment; always crop to panel before averaging.
  * Another trap: a stationary "dark blob" in the top-middle of frames is the camera head shadow on the panel, not displayed content. If it shows across multiple frames in the same position, it is not pixels.
Artifacts: frames at ''meitner:/tmp/bin-phaseN-frames/'', videos at ''meitner:/tmp/bin-phaseN.mkv''. Phase 20 capture yielded 303 frames over around 5 minutes (reboot to sddm). Phase 20 panel-view crops show dim navy blue bottom two thirds, lighter top third, consistent with “top 2/3 black (zeros) + bottom 1/3 blue (0xFF0000FF)” latched on the panel.
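The "crop before averaging" rule from the camera notes can be sketched in plain Python on raw RGB888 data. This is an illustration, not the one-liner used in the session: the Rec.601 luma weights are my choice, ''mean_luma'' and ''panel_crop'' are hypothetical names, and the crop mirrors the ffmpeg expression ''crop=iw*2/3:ih:iw/3:0''.

```python
def mean_luma(rgb, width, crop):
    """Average Rec.601 luma over crop=(x0, y0, w, h) of a packed RGB888 frame."""
    x0, y0, w, h = crop
    total = 0.0
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            i = (y * width + x) * 3
            r, g, b = rgb[i], rgb[i + 1], rgb[i + 2]
            total += 0.299 * r + 0.587 * g + 0.114 * b
    return total / (w * h)


def panel_crop(width, height):
    """Same region as ffmpeg crop=iw*2/3:ih:iw/3:0 (width 2/3, full height, x offset 1/3)."""
    return width // 3, 0, width * 2 // 3, height
```

Comparing ''mean_luma(frame, w, panel_crop(w, h))'' with the whole-frame value ''mean_luma(frame, w, (0, 0, w, h))'' shows directly how much a bright surround inflates the uncropped average.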
  * <code>project_bin_tripwire_findings.md</code>: case study, full findings for 2026-04-18 tripwire plus 2026-04-19 Phase 20 plus Phase 21 direction
  * <code>feedback_observation_over_theory.md</code>: the bit-31 detour memorialised
  * <code>feedback_observer_first.md</code>: Phase 7 backlight-off during visual test memorialised
  * <code>project_bin_42c3_timeline.md</code>: narrative arc for the 42C3 talk proposal
Phase 21 (A) failed on both axes: the BIN_VP2_TEARDOWN code ran correctly (UART: ''VOP2: teardown done, PMU pwr=0x1b idle=0x37fff'', PD_VOP and PD_VO1 gated off), but the kernel still got exactly 17 ''Runtime PM usage count underflow'' warnings AND now crashed with init SIGSEGV because ''rockchip-drm'' probe tries to read VP2 registers on a powered-off domain and AXI hangs. Ampere went into a boot loop; recovered via maskrom + ''db rk3588_spl_loader_v1.19.113.bin → cs 9 → wl 0 phase20.bin → rd'' on meitner.
Two implications:
  * They fire regardless of PMU state. Path-A was attacking the wrong target.
  * You cannot power-gate VP2 before kernel boot. Kernel expects register readability at probe. Either leave it on (Phase 20 state) or never turn it on (Phase 1 NOINIT).
  * 4.3 M-record capture from 2026-04-18. Filtered to VOP2 region (''0xfdd90000..0xfdd95fff''), split by stage, compared without aggressive dedup.
| Metric | u-boot | kernel |
|---|---|---|
| unique offsets touched | 54 | 1118 |
| total accesses | 87 | 24 614 |
Kernel-only offsets (we never write): 1072. Most are per-vblank IRQ status/clear (''0x0084 written 900x'', ''0x0094 written 900x'', ''0x00c4 written 900x''), which is expected maintenance traffic, not missing init.
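The filter-and-compare step can be sketched in Python. The input shape is an assumption: an iterable of ''(stage, phys_addr)'' pairs, where ''stage'' is a label such as "u-boot" or "kernel". The real CSV columns come from ''rk_tw_dump'' and are not reproduced here; the function names are hypothetical.

```python
from collections import Counter

VOP2_BASE = 0xFDD90000
VOP2_END = 0xFDD95FFF


def vop2_offsets(records):
    """Count accesses per VOP2 offset for each stage from (stage, phys_addr) pairs."""
    per_stage = {}
    for stage, phys in records:
        if VOP2_BASE <= phys <= VOP2_END:
            per_stage.setdefault(stage, Counter())[phys - VOP2_BASE] += 1
    return per_stage


def only_in(per_stage, stage, other):
    """Sorted offsets touched by `stage` that `other` never touches."""
    return sorted(set(per_stage.get(stage, ())) - set(per_stage.get(other, ())))
```

''len(per_stage[stage])'' corresponds to the "unique offsets touched" row and ''sum(per_stage[stage].values())'' to "total accesses". High-count entries in the kernel's counter are the per-vblank maintenance traffic and can be set aside before looking for one-shot divergences.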
The one-shot divergences that actually matter, ranked by likely impact on post-scaler starvation:
  - ''0x0e00 VP2_DSP_CTRL'': we ''0x0000000f'', kernel ''0x1040000f''. Kernel sets bit 28 ''DSP_LUT_EN'' and bit 22 <code>GAMMA_UPDATE_EN</code>, and programs the full 1024-entry LUT at <code>0x5000+</code>. We write neither the enable bits nor the LUT.
  - <code>0x0e0c VP2_CLK_CTRL</code>: we <code>0xe</code>, kernel <code>0x2</code>. Different internal clock-divider config.
  - <code>0x0000 cfg_done</code>: we <code>0x00048004</code> (VP2 only), kernel <code>0x00048005</code> (VP0+VP2 together). See VP0 theory below.
  - <code>0x06f0</code>: we <code>0x04040000</code>, kernel <code>0x04040404</code>. Trivial missing two bytes.
  - <code>0x0028, 0x002c</code> DSP_IF_EN block: we miss bits 3/4 in 0x0028 and all of 0x002c=<code>0x00060000</code>.
Kernel also brings up VP0 fully (VP0_DSP_CTRL at ''0x0c00'', VP0 timing at ''0x0c48..0x0c54'', VP0 LINE_FLAG at ''0x0070'', VP0 INT_EN at ''0x00a0''); we write zero VP0 regs.
The cfg_done question has been flip-flopped across Bin sessions. Pinning it down in memory now at ''project_bin_vp0_theory.md''.

Rule: on the GenBook, cfg_done at ''0x0000'' must latch both VP0 and VP2 together (value ''0x00048005''). Do not drop the VP0 bit on the reasoning that VP0 has no connector.

Why: RK3588 VOP2 has a single shared overlay mix crossbar, not per-VP silos. PORT_SEL readback ''0xa0587783'' decodes to: PORT0_MUX=3 (VP0 gets layers 0..3), PORT1_MUX=8 (VP1 disabled), PORT2_MUX=7 (VP2 gets layers 4..7), PORT3_MUX=7. VP2 layer-slots 4..7 sit downstream of VP0 layer-slots 0..3 in the same mix pipeline. When VP0 cfg_done stays pending (because we only latch VP2), VP0 mix state is in shadow, never commits, and the mix stalls: VP2 post-scaler reads empty → POST_BUF_EMPTY at vblank rate → panel dark.
VP0 vsync fires whenever VP0 has dclk running + valid timing, independent of whether a panel is physically attached. Kernel brings up VP0 fully for exactly this reason: keeps the mix crossbar advancing.
History of the bug:
  - ''ddefc154'' (pre-Phase 12): tripwire diff correctly found ''0x00048005'' two-phase latch, hypothesised DSP_IF crossbar needs VP0 committed. Right theory.
  - <code>e05c0915</code> (Phase 15-16): reversed on the false reasoning "VP0 has no connector = no vsync." **Wrong.** Value stayed at <code>0x00048004</code> through Phase 20.
  - <code>7bd68b59</code> (Phase 20): fixed Cluster routing but kept the VP2-only cfg_done. Panel still dark, POST_BUF_EMPTY still firing.
  - <code>2026-04-20</code> trace-diff: confirmed kernel terminal <code>0x00048005</code>, plus VP0 init writes we never replicate.
Restore ''0x00048005'' in cfg_done AND add VP0 fake-run init:
  * ''VP0_DSP_CTRL = 0x1040000f'' at ''0x0c00''
  * ''VP0 HTOTAL_HS_END = 0x0898002c'' at ''0x0c48''
  * ''VP0 HACT_ST_END = 0x00c00840'' at ''0x0c4c''
  * ''VP0 VTOTAL_VS_END = 0x04650005'' at ''0x0c50''
  * ''VP0 VACT_ST_END = 0x00290461'' at ''0x0c54''
  * ''VP0 LINE_FLAG = 0x04610461'' at ''0x0070''
  * ''VP0 INT_EN = 0x00200020'' at ''0x00a0''
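The four VP0 timing words above can be unpacked to see what mode the fake-run actually programs. A minimal Python sketch, assuming each register holds the first-named quantity in its upper 16 bits and the second in its lower 16 bits (an assumption based on the register names, supported by the fact that it yields a 1920x1080 active area); ''split16'' and ''decode_timing'' are hypothetical names.

```python
def split16(value):
    """Return (upper 16 bits, lower 16 bits) of a packed timing register."""
    return value >> 16, value & 0xFFFF


def decode_timing(htotal_hs_end, hact_st_end, vtotal_vs_end, vact_st_end):
    """Decode the four packed timing words into totals, sync ends and active sizes."""
    htotal, hs_end = split16(htotal_hs_end)
    hact_st, hact_end = split16(hact_st_end)
    vtotal, vs_end = split16(vtotal_vs_end)
    vact_st, vact_end = split16(vact_st_end)
    return {
        "htotal": htotal, "hsync_end": hs_end, "hactive": hact_end - hact_st,
        "vtotal": vtotal, "vsync_end": vs_end, "vactive": vact_end - vact_st,
    }
```

For the values listed above this gives htotal 2200, vtotal 1125, and a 1920x1080 active window. A 2200 x 1125 raster refreshed at exactly 60 Hz would need a 148.5 MHz pixel clock (2200 * 1125 * 60).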
Plus top candidates from the value divergences: flip ''VP2_DSP_CTRL += DSP_LUT_EN + GAMMA_UPDATE_EN'' (and populate LUT if needed), fix ''VP2_CLK_CTRL'' to ''0x2'', complete ''0x06f0 = 0x04040404''.
One change per commit, verify via state-dump readback before moving on.
Bin is closed, partial-win. Ampere's display silicon failed by end of session. Symptoms: link trains, backlight comes on, BIST bars display on panel (PHY-internal pattern generator), eDP AUX reads panel EDID correctly — but the VOP2-to-HDPTX-internal-pipeline never delivers a valid pixel stream. Three different kernel-side failure modes observed across u-boot variants:
  * u-boot's VP2 half-init leaving mix pipeline in error-reassert loop. Fixed by vanilla u-boot (storm count drops to zero). Storm is NOT why the panel is dark, just excessive IRQ noise.
  * **rockchip-vop2 port_mux_done timeout**: kernel's PORT_SEL commit does not latch with vanilla u-boot. Fixed by Phase 22 u-boot pre-committing port_mux. Trade-off, not solution.
  * **runtime PM refcount underflow (9447 in 1 min with vanilla kernel + Phase 22 u-boot)**: kernel PM vs u-boot PM state drift.
Final confirmation 2026-04-20 13:20: vendor `coolpi-loader` u-boot + vendor kernel image also produces dark panel. Since vendor u-boot is supposed to be the working reference for this exact SKU, vendor-image-dark = display silicon fault, not software bug.
Unanswerable question: did the campaign contribute to the silicon failure? 25+ reboot cycles, 100k IRQ/s storms across many hours, and many register writes into analog PHY/PLL blocks without TRM backing: collectively plausible contributors, individually unprovable. See ''feedback_trm_or_nothing.md'' for the forward rule: every register write needs TRM backing, especially in analog blocks.
  * RK3588 VOP2 boards where VP0 has no connector but the mix crossbar still requires VP0 cfg_done to commit. Memory file <code>project_bin_vp0_theory.md</code> documents it with receipts.
  * **Phase 22 u-boot binary** at <code>boltzmann:~/projects/AMPere/output/u-boot-rockchip-spi-phase22-genbook-8mb.bin</code>. Correct Cluster1 routing, VP0 fake-run, cfg_done 0x00048005, and enough PMU/CRU init to produce valid link training + panel stream. If a replacement GenBook with working silicon materialises, this is the start point, not Phase 1 or Phase 24.
  * **Tripwire infrastructure**: shared 2 GB DDR ring, u-boot + kernel writel/readl recording, offline CSV dumper. Useful generic tool for arm64 boot-path debugging; worth extracting + upstreaming.
  * **One confirmed SDDM-on-eDP-via-upstream** photo at <code>Documents/Markus_And_Claude/bin-phase22-sddm-20260420-0806.jpg</code> on Nextcloud. Clock showing 08:06:37, Monday 20 April, KDE/Arch default wallpaper. Never reproduced. Only confirmed-pixels frame in the whole campaign.
  * **Register-divergence catalogue** from two tripwire traces (<code>bin-phase3-full.csv</code> and <code>phase24-trace.csv</code>): reference for anyone re-implementing RK3588 VOP2 bring-up.
  * **A list of register fields whose meaning was in the TRM vs whose meaning we inferred**: useful for the next RE engineer, sobering for this one.
  * bleed + ambient reflection. Xorg can be running, CRTC active, fb bound; panel still shows nothing. Most of what we captured fell here.
  * **100–130**: panel backlit + receiving a signal that hits the "LCD neutral" state (uniform bright with no structure). Looks identical to real "white screen" content in the camera. Trap.
  * **170–200** **with visible structure** (text, icons, wallpaper): real rendered pixels. The 08:06 shot is the only one in the campaign.
If someone comes back to this project with replacement hardware, read in this order:
  - project_bin_closeout.md (this file)
  - project_bin_phase22_notes.md
  - project_bin_vp0_theory.md
  - project_bin_tripwire_findings.md
  - feedback_trm_or_nothing.md
  - feedback_observation_over_theory.md
The full arc is in this DokuWiki page (read top-to-bottom). The memory files are the executive summary + rules-going-forward.
Not “done” — more like “won the insights, lost the hardware.” The u-boot contribution is upstreamable; the display never lit up on this specific silicon; and a real talk at 42C3 would honestly lead with “here is how we broke a RK3588 laptop trying to tell it to display pixels, and what we learned on the way.”
During the ampere-silicon-maybe-fried phase, Claude Code running on noether was observed to be at low effort despite the user setting max effort in a sibling client/session. The low-effort setting was very likely the proximate cause of the ''tar -xzf -C /'' footgun that wiped the ''/lib -> usr/lib'' symlink on ampere, bricking SSH until physical disk recovery. A max-effort pass would almost certainly have caught the merged-usr archive-layout trap in advance.

Kept as local memory (''feedback_effort_stakes.md'') and MEMORY.md index. Draft of a possible future issue against the Claude Code GitHub tracker is archived here for reference:
Title candidate: Effort level auto-downshifting during disaster-recovery scenarios is a customer-retention risk
Body:
During a hardware RE session (RK3588 SBC, upstream u-boot + kernel work on a ~600 EUR device) the Claude Code client dropped effort mid-recovery after the user had explicitly set max effort in a sibling client/session. The effort setting did not propagate between sessions, and the assistant did not surface its current effort level nor flag that it had shifted down.
In the specific incident, the low-effort pass dispatched a ''tar -xzf -C /'' against a merged-/usr Arch rootfs and clobbered the ''/lib -> usr/lib'' symlink, breaking the dynamic linker and bricking SSH access until physical disk recovery was possible. A max-effort pass would almost certainly have read the archive layout first, considered the symlink, and staged safely. The fault-mode is exactly what max-effort exists to prevent.
The user-visible stakes: the hardware could have been permanently damaged (thermally we got close — sustained ~100k IRQ/s on a display pipeline error loop for hours). In that scenario, the user does not necessarily have “just buy another one” as an option.
Two concrete asks:
  - same user, or at minimum be surfaced visibly at the top of every conversation.
  - **Auto-downshift during a session that includes disaster-recovery signals** (rm -rf, flashing, rootfs operations, user-expressed distress, expensive/irreversible steps) should be suppressed, or at least flagged to the user before committing the next expensive action.
Low effort in this context is not just “less helpful” — it is directly responsible for the fault that nearly cost a 600 EUR+ device. That is a churn-level outcome if it hits the wrong customer.
Status: drafted 2026-04-20, NOT filed. User chose to keep local rather than submit to the public tracker at this time.