Bin — Boot it Nicely (GenBook u-boot eDP upstream)

Mainline u-boot on the CoolPi CM5 GenBook (RK3588) with working eDP boot display and internal keyboard. First-ever RK3588 eDP bring-up in upstream u-boot (not the Rockchip downstream fork).

Target hardware: ampere (CoolPi CM5 GenBook, RK3588 + 32 GB LPDDR5)

Panel: CSOT T9 SNE001BS2-2, 1920×1080@60 Hz, DPCD 1.1, 2.7 Gbps HBR, 2 lanes

Status 2026-04-17 evening: v10 mainline u-boot trains the link at HBR×2, the panel reports IN_SYNC, and BIST bars display on the panel, so DP TX and the eDP panel are proven healthy. Pixels from our own framebuffer are still absent; the fault is narrowed to the VOP2 pixel-output chain or the content format upstream of DP TX. Vendor coolpi-loader (factory genbook_spi.img) also shows no logo on eDP, which closes the “vendor knows how” assumption.

For next session: Dual-agent debug strategy — two independent AI agents drafted strategies with different top-3 bets. Overlap = high-confidence signal, disagreement = where the most learning happens. Start with the clk_summary check at the top of that page.

Why

The vendor u-boot binary (“coolpi-loader”) runs the vendor DRM stack for its boot logo. Upstream u-boot has no RK3588 VOP2/eDP driver path at all. The goal of project Bin is to produce a clean, upstream-submittable u-boot that paints something on the eDP panel during boot — without pulling in the vendor's “display-cmd dance” from the downstream tree.

Secondary goals:

  • Prove the upstream-clean boot chain (mainline u-boot + Collabora TF-A + upstream OP-TEE) can drive eDP.
  • Produce a patch series acceptable to u-boot-custodians (no “vendor secret” compensation hacks).

What actually works as of 2026-04-16

  • Full DDR / TPL / SPL / BL31 / OP-TEE chain, using stock Rockchip blob rk3588_ddr_lp4_2112MHz_lp5_2400MHz_v1.19.bin — faster bin unlocked after the compute-module reseat (see DDR RE project MVP1 section).
  • VOP2 probe (clocks, PMU power domain, GRF) and full register-level init of VP2 + Cluster1 window + overlay mixer.
  • eDP controller probe: EDID read via AUX, DPCD capabilities read (rev 1.1, 2.7 Gbps HBR, 2 lanes).
  • Link training succeeds (CR + CE pass), training pattern properly disabled at end (was the critical “no pixels” bug #2).
  • Backlight on (PWM6 + GPIO4_A3, via panel-simple).
  • VOP2 state matches kernel live-dump perfectly: every register (SYS_PD, CFG_DONE, DSP_IF_EN, OVL_CTRL, VP2_DSP_CTRL, CLUSTER1 CTRL0/YRGB/VIR/ACT/DSP_INFO/DSP_ST, OVL PORT_SEL/LAYER_SEL, timing regs) reads back the same bits as the running kernel while SDDM is displaying.
  • eDP controller state matches kernel after applying the config_video delta (VIDEO_CTL_2=0x10, VIDEO_CTL_3=0x80, VIDEO_CTL_10=0x02, FUNC_EN_2=0x80, SYS_CTL_4=0x08).
  • vidconsole path live — proven by instrumenting vidconsole_putc_xy(); u-boot is drawing its “U-Boo…” banner into framebuffer.

What's still broken

No visible pixels on the panel. Everything upstream of the physical signal chain is correct. The bug lives in territory our direct-MMIO tools don't reach:

  1. HDPTX1 PHY state — 0xFED70000 reads back mostly zero via /dev/mem even while the kernel is driving the panel. That suggests the PHY's real state is behind regmap/syscon indirection (not MMIO-readable without a kernel module).
  2. CRU dclk_vp2 source mux — for eDP on VP2, dclk_vp2 must be parented on HDPTX PHY's recovered clock, not a CRU PLL. u-boot's clk_set_rate(dclk_vp2, 147.84 MHz) may be hitting the wrong mux.
  3. VO1_GRF (0xFD5AC000) has DP mux / HDPTX routing bits; the kernel writes them, we write none.
  4. regmap_update_bits traffic that our vop2trace.ko kprobe never caught (it only hooked regmap_write).

The cfg_done dance — biggest recurring gotcha

If a VOP2 register write reads back zero or doesn't take effect, it's cfg_done. This has burned us repeatedly across multiple sessions. Save yourself the reboot cycles:

VOP2 has shadow (staging) and active (hardware-driving) register banks. Writes land in shadow; they commit to active only when cfg_done fires.

  • CFG_DONE_IMD (bit 28 at offset 0x030) latches VP-level config immediately but does not cover window shadow registers (CLUSTERx CTRL0, YRGB_MST, VIR, ACT, DSP_INFO). Those need an explicit per-VP cfg_done write.
  • Required write: CFG_DONE_EN | BIT(vp_id) | (BIT(vp_id) << 16) → offset 0x000.
  • Kernel has OVL_CTRL=0 because it does periodic regmap_update_bits cfg_done every atomic commit. Our one-shot u-boot init needs OVL_CTRL bit 31 set (0x80000000) or PORT_SEL/LAYER_SEL/CLUSTER writes never latch.
  • Writes to non-shadow regs (CTRL1, DSP_ST, clock, mux) commit immediately and read back cleanly. Writes to shadow regs need cfg_done.
  • Debugging: add readback right after the write; if zero, add cfg_done; readback again.
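The per-VP cfg_done value described above can be worked out by hand. The sketch below is only an illustration: the position of CFG_DONE_EN (bit 15) is an assumption, inferred from the VP2-only value 0x00048004 recorded later on this page, and the helper names are hypothetical.

```python
def bit(n: int) -> int:
    """Return a mask with only bit n set."""
    return 1 << n

# Assumption: CFG_DONE_EN is bit 15. This is inferred from the value
# 0x00048004 that this page records for vop2_cfg_done(priv, 2).
CFG_DONE_EN = bit(15)

def cfg_done_value(vp_id: int) -> int:
    """Value written to VOP2 offset 0x000 to latch one video port."""
    return CFG_DONE_EN | bit(vp_id) | (bit(vp_id) << 16)

if __name__ == "__main__":
    for vp in range(4):
        print(f"VP{vp}: 0x{cfg_done_value(vp):08x}")
```

With that assumption the VP2 value comes out as 0x00048004, which matches what the u-boot driver is reported to write.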

Critical bugs caught this session (2026-04-16)

Seven fixes, all of which are prerequisites for any future display work — any upstream u-boot VOP2/eDP driver will need these:

  1. Explicit per-VP cfg_done write. CFG_DONE_IMD does NOT latch window shadow registers. Must write CFG_DONE_EN | BIT(vp_id) | (BIT(vp_id) << 16) to the CFG_DONE reg after every batch of window writes. See cfg_done dance callout above.
  2. DP_TRAINING_PATTERN_DISABLE at end of channel equalization. rk3588_edp_link_train_ce() was returning after CR/CE success without writing 0 to ADP_TRAINING_PTN_SET or DPCD 0x102. The PHY was streaming training symbols indefinitely — link “trained” but carrying no real video. Panel showed black despite all register state looking correct.
  3. DPCD setup before training. Missing spec-required writes: ML_CH_CODING_SET=1 (ANSI 8B/10B), DOWNSPREAD_CTRL to match SSC capability, enhanced-frame bit in LANE_COUNT_SET when sink supports it.
  4. eDP config_video delta matching kernel. Our config_video wasn't writing VIDEO_CTL_2=0x10, VIDEO_CTL_3=0x80, VIDEO_CTL_10 bit 1, FUNC_EN_2=0x80, SYS_CTL_4=0x08. All of these are needed; the kernel's analogix_dp driver sets them but the u-boot one doesn't.
  5. VP2_DSP_CTRL=0x1000000f — kernel uses OUT_MODE=AAAA (0xf) + bit 28 set. Briefly tried S888 (0x8), wrong tree; stay with AAAA + bit 28.
  6. OVL_CTRL bit 31 for immediate latch. See cfg_done dance callout.
  7. VOP_GRF_CON2 bit layout corrected. Our original EDP1_ENABLE_SHIFT=1 was wrong — kernel's live register value has bit 3 set, not bit 1. The naive EDP0/EDP1/HDMI0/HDMI1 = bits 0/1/2/3 mapping is not what RK3588 uses.

Method: kernel live-dump as oracle

The productive move of this session was switching from reading downstream vendor u-boot source (which was “vendor secret” register soup without explanations) to reading live register state from the running kernel while SDDM was displaying. That's ground truth — whatever bits are set when pixels reach the panel, those are the bits you need.

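ssh ampere 'sudo python3 <<EOF
import mmap, struct, os
BASE = 0xFDD90000    # page-aligned block base (example: VOP2)
OFFSET = 0x0E00      # register offset inside the 4 KiB page (example: VP2_DSP_CTRL)
fd = os.open("/dev/mem", os.O_RDONLY | os.O_SYNC)
mm = mmap.mmap(fd, 4096, mmap.MAP_SHARED, mmap.PROT_READ, offset=BASE)
print(hex(struct.unpack("<I", mm[OFFSET:OFFSET+4])[0]))  # one little-endian 32-bit read
mm.close(); os.close(fd)
EOF'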

Works for VOP2 (0xFDD90000) and the eDP controller (0xFDED0000). Does not work for the HDPTX PHY (0xFED70000, syscon-wrapped) or the CRU (0xFD7C0000) — those show mostly zeros even when active. Next session: kernel module to dump those properly.

Artifacts

Where | What
boltzmann:~/src/u-boot/ | u-boot source with rk3588_vop2, rk_edp, DTSI patches
boltzmann:~/src/u-boot/drivers/video/rockchip/rk3588_vop2.c | VOP2 driver with kernel-trace-replay init and live register STATE dump
boltzmann:~/src/u-boot/drivers/video/rockchip/rk_edp.c | eDP driver with spec-complete DPCD setup, pattern-disable, kernel-matched config_video
ampere:/root/uboot-backups/ | Timestamped SPI backups across session
meitner:/tmp/uart.log | All UART traffic during boot iterations (systemd-run uart-cap.service + uart-follow.service mirror on /dev/tty8)
noether:~/claude/vop2_harness/vop2trace/ | LKM that traces regmap_write + writel_relaxed during kernel DRM module load. Dumps to /proc/vop2trace.

Flash pipeline

Board is alive, so flash from running Linux (fastest):

scp boltzmann:~/src/u-boot/u-boot-rockchip-spi.bin ampere:/tmp/
ssh ampere 'sudo dd if=/dev/mtd0 of=/root/uboot-backups/spi-pre-$(date +%H%M%S).bin bs=1M; \
  sudo flashcp --partition /tmp/u-boot-rockchip-spi.bin /dev/mtd0'
# Reboot when YOU are ready.

If bricked: maskrom-mode recovery via meitner + rkdeveloptool (db loader → cs 9 SPI NOR → wl 0 <image> → rd). Validated at ~60 s per cycle.

Build recipe (stock fast DDR blob — mandatory)

cd ~/src/u-boot && make ARCH=arm \
  BL31=/home/mfritsche/src/tf-a/build/rk3588/release/bl31/bl31.elf \
  TEE=/home/mfritsche/src/optee_os/out/arm-plat-rockchip/core/tee.bin \
  ROCKCHIP_TPL=/home/mfritsche/src/rkbin/bin/rk35/rk3588_ddr_lp4_2112MHz_lp5_2400MHz_v1.19.bin \
  CC="distcc gcc" -j80

Do not use the decompiled-and-patched rk3588_ddr_v1.19_patched_v2.bin — it bricks ampere (DDR training fails all channels/all lanes). Only the stock rkbin blob is known-good.

Next session

  1. Write a kernel module that dumps HDPTX1 PHY + CRU dclk_vp2 mux + VO1_GRF full state while display works. /dev/mem-direct reads hit zero zones for some of these.
  2. Hook regmap_update_bits in vop2trace.ko — current kprobe only catches regmap_write.
  3. Diff kernel vs u-boot for VO1_GRF + CRU dclk_vp2; write our init to match.
  4. Fallback: HDMI output path (simpler protocol) to prove VOP2 itself works, then come back to eDP.

2026-04-17 evening — empirical disambiguation

Sessions 1–4 were heuristic register matching against the kernel live-dump. Tonight was empirical disambiguation: stopped assuming and started asking the hardware what it actually sees.

u-boot eDP campaign — v10

Branch bin/wip-2026-04-17 on boltzmann:~/src/u-boot carries all vendor-sourced fixes on top of the earlier VOP2/eDP work:

  • hclk DT + driver wire-up.
  • Double-beat adp_write for the eDP controller (was a single-beat quirk).
  • init_video + set_video_format paths added.
  • DPCD enhanced-frame + downspread + training-pattern-disable writes completed.
  • DPCD SINK_STATUS (0x205) read after commit.
  • Alpha=0 bug in the u-boot stripe paint: vendor clusters blend incoming pixels against VP2_DSP_BG=black with the source alpha. Alpha-0 = source×0 = black field regardless of colour. Fixed in v10.
  • dclk_vop2 rate was wrong in u-boot (~136 MHz vs the kernel 147.69 MHz); fixed by pre-selecting V0PLL as parent before clk_set_rate() so the u-boot clock driver takes the retune path instead of picking the nearest matching divider on the default ancestor.
  • VOP MMU disable hypothesis explored; MMU was already bypassed by reset, so the commit was reverted and only the diagnostic code remains.
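The alpha=0 failure mode in the list above is easy to reproduce on paper. The following is a minimal sketch, assuming straight (non-premultiplied) source alpha and a 0xAARRGGBB pixel layout; the exact blend equation used by the VOP2 cluster is not documented on this page, and both function names are made up for illustration.

```python
def blend_channel(src: int, bg: int, alpha: int) -> int:
    """Blend one 8-bit channel: out = src*a + bg*(1-a), with a = alpha/255."""
    return (src * alpha + bg * (255 - alpha) + 127) // 255

def blend_argb(pixel: int, bg_rgb=(0, 0, 0)) -> tuple:
    """Blend a 0xAARRGGBB pixel over an opaque background colour.

    The default background is black, standing in for VP2_DSP_BG=black.
    """
    alpha = (pixel >> 24) & 0xFF
    src = ((pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF)
    return tuple(blend_channel(s, b, alpha) for s, b in zip(src, bg_rgb))

if __name__ == "__main__":
    print(blend_argb(0x00FF0000))  # red with alpha 0
    print(blend_argb(0xFFFF0000))  # red with alpha 0xFF
```

Under these assumptions any colour with alpha 0 collapses to the background, which is why the stripe paint looked like a black field until the alpha byte was set.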

What this unlocked

  • Link trains at HBR×2 without errors.
  • Panel DPCD SINK_STATUS reports IN_SYNC — panel sees a valid stream.
  • Backlight on.
  • BIST colour bars display on the panel (PHY-internal pattern generator driving the main link). DP TX, PHY, cable, panel, backlight are all healthy.
  • Own framebuffer still produces no visible pixels. Whatever is wrong is upstream of DP TX: either the VOP2 pixel-output chain or the content format we hand to the eDP controller.

Vendor u-boot detour (closes vendor-knows-how)

To rule out that we are missing a vendor-secret step, we built the vendor coolpi-loader from source:

  • ranke = CT171 on data, Debian 12 x86_64, ephemeral historian-named build host.
  • Source: vendor coolpi-loader, branch linux-6.1-stan.
  • Extlinux.conf auto-rewrite hack removed in both linux-6.1-stan (b718d7b1f9) and linux-6.1-stan-rkr5 (90083ce217) variants; vendor u-boot otherwise edits the runtime distro /boot/extlinux/extlinux.conf on every boot.
  • Built + flashed genbook_spi.img. Kernel boots cleanly. No logo on eDP or HDMI. Same symptom as our own u-boot.
  • Web-research agent surfaced the likely reason: vendor u-boot uses rockchip_show_logo(), which loads a logo partition defined in parameter.txt. genbook_spi.img is SPL-region only (8 MB) and does not include that partition. The Rockchip wiki explicitly states logo support is Android-only: not implemented for Linux. That probably explains why the vendor image does not light eDP on this reference board either.
  • Rebuilt on linux-6.1-stan-rkr5 (Apr-2026, significant display/analogix_dp fixes over Jan-2026 stan). Resulting image did not boot: idblock / u-boot FIT version mismatch between the vendor-baseline idblock and the rkr5 u-boot.img.
  • Panel SKU fork noted: BOE NV140FHM-N42 / N61 / N66 across hardware revisions. Vendor u-boot panel timings may not match our specific SKU even with a working logo path.

Conclusion: vendor u-boot is not a working reference for eDP-logo. The vendor-knows-how assumption is dead.

Flash protocol confirmed

  • flashcp --partition <8mb-img> /dev/mtd0 via SSH into the running Arch userland is reliable across iterations. Default path.
  • rkdeveloptool maskrom: one wl 0 <8mb-img> per fresh maskrom session. Multiple consecutive wl writes in the same session cause comm-object failures. Recovery path validated: db rk3588_spl_loader_v1.19.113.bin → cs 9 → wl 0 <image> → rd.
  • SPI layout on RK3588 GenBook (correction from earlier notes):
      • idblock at 0x8000 (~208 KB)
      • u-boot FIT at 0x60000
      • The earlier 0x200000 value was wrong; that is CONFIG_MTD_BLK_U_BOOT_OFFS for eMMC, not SPI.

Reference images stashed

Where | What | sha256 (prefix)
boltzmann:~/projects/AMPere/output/u-boot-rockchip-spi-bin-wip-20260417-v10-8mb.bin | v10 mainline, known-good | 925137b923af…
meitner:/tmp/genbook_spi_vendor_prebuilt.img | Vendor factory (no logo) | 7202caf7ca54…
data:/rpool/nas/home/mfritsche/gbook/coolpi_rk3588_gbook_nor_upgrade.img | Vendor 7 MB upgrade image with logo partition, not tested |

Updates to the Next-session list

Supersedes the earlier clk_summary / HDPTX trace checklist on this page — those checks have been run; see the debug-strategy page for the Closed / Reopened update.

  • Maybe extract the separate idblock.bin (208 KB) that vendor make.sh builds alongside the 492 KB download-boot loader, then rebuild the rkr5 stack end-to-end so idblock + u-boot FIT versions line up.
  • Try coolpi_rk3588_gbook_nor_upgrade.img. Hesitant: it might rewrite extlinux.conf through some other mechanism and we would rather not discover that by surprise.
  • Alternatively accept that pixels-in-u-boot is not a today problem; the kernel stack works cleanly on v10 for development use.

2026-04-18 evening — tripwire session: register trajectories captured

Bench session. Stopped heuristic matching, built an actual instrument: every writel/readl in u-boot and kernel now records a timestamped trace into a shared DDR region. Diff the two traces, the bugs fall out.

Infrastructure built

  • CONFIG_RK_TRIPWIRE feature in both u-boot (branch bin/wip-2026-04-17) and kernel (linux-rk3588-marfrit, bin/tripwire branch). Shared 2 GB no-map DDR region at phys 0x780000000, reserved via DT on both sides so neither side maps it as general memory.
  • Every writel/readl emits a 32-byte record: (cntvct_el0 tick, caller PC, phys addr, value, flags). Phys resolved via page-table walk in the kernel record fn; native on the u-boot side.
  • Offline C dumper at boltzmann:~/src/u-boot/tools/rk_tw_dump/rk_tw_dump.c emits CSV; resolve.py sidecar does symbol lookup via kallsyms bisect.
  • Bench runbook at noether:~/claude/bin_bench_plan.md.
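To make the record format concrete, here is a small parsing sketch. It is not the rk_tw_dump implementation: the page only says a record is 32 bytes and names its fields, so the field widths and order below (three 64-bit values followed by two 32-bit values, little-endian) are a guess that happens to add up to 32 bytes. `TwRecord`, `parse_records` and `in_vop2` are hypothetical names.

```python
import struct
from typing import Iterator, NamedTuple

# Hypothetical layout: tick, caller PC, phys addr as u64; value, flags as u32.
RECORD = struct.Struct("<QQQII")

class TwRecord(NamedTuple):
    tick: int
    pc: int
    phys: int
    value: int
    flags: int

def parse_records(blob: bytes) -> Iterator[TwRecord]:
    """Yield records from a raw dump, ignoring a trailing partial record."""
    usable = len(blob) - len(blob) % RECORD.size
    for fields in RECORD.iter_unpack(blob[:usable]):
        yield TwRecord(*fields)

def in_vop2(rec: TwRecord) -> bool:
    """True if the access falls inside the VOP2 window 0xfdd90000..0xfdd95fff."""
    return 0xFDD90000 <= rec.phys <= 0xFDD95FFF

if __name__ == "__main__":
    demo = RECORD.pack(1, 0x1000, 0xFDD90000, 0x00048004, 0)
    for rec in parse_records(demo):
        print(f"{rec.tick} pc=0x{rec.pc:x} 0x{rec.phys:08x} = 0x{rec.value:08x}")
```

Filtering with `in_vop2` before diffing keeps the comparison to display registers; the real flag encoding (read vs write, u-boot vs kernel) would have to come from the tripwire source.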

Three phases of the evening

  1. Phase 1: CONFIG_BIN_PHASE1_NOINIT u-boot (zero VP2/eDP register writes) + Phase 2 kernel. Result: kernel DRM can cold-init the display from scratch after modprobe panel_edp, and SDDM displays. Conclusion: point-3 hypothesis (kernel depends on u-boot half-init) is disarmed.
  2. Phase 2: same u-boot, armed tripwire at runtime, reloaded panel_edp to re-trigger modeset. Captured 77 K kernel writes, decoded full atomic_commit sequence.
  3. Phase 3: u-boot with full VP2+eDP init AND tripwire armed from first vop2_probe entry. Captured 4.3 M records (2.08 M u-boot + 2.24 M kernel). Zero lost.

Concrete register divergences found

Three bit-level diffs between the u-boot writes and the kernel writes:

Register | Our u-boot | Kernel | Analysis
VOP2 +0x0000 cfg_done (VP2 latch) | 0x00048004 | 0x00048005 | Kernel latches VP0+VP2 together. We latch VP2 alone. Source at drivers/video/rockchip/rk3588_vop2.c:287-292: vop2_cfg_done(priv, 2) writes CFG_DONE_EN | BIT(2) | (BIT(2) << 16). Candidate fix: also OR in BIT(0) | (BIT(0) << 16) so the DSP_IF crossbar can synchronize VP0+VP2 in one latch.
VOP2 +0x0600 | 0x80000000 (bit 31 set) | 0x00000000 | Likely STANDBY/bypass bit on VP1. Need to trace where we set it (likely in the vendor-dump-mirror block from early RE) and either clear or never set.
VOP2 +0x06f0 | 0x04040000 | 0x04040404 | Per-byte lane-phase register. We set only the upper 16 bits; kernel sets all 4 bytes to 0x04. Trivial value fix.
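The three divergences can be reduced to bit positions with a plain XOR. This is only a worked-arithmetic sketch over the values quoted in the table; `differing_bits` is a hypothetical helper, not part of the tripwire tooling.

```python
DIVERGENCES = {
    0x0000: (0x00048004, 0x00048005),  # cfg_done: (ours, kernel)
    0x0600: (0x80000000, 0x00000000),
    0x06F0: (0x04040000, 0x04040404),
}

def differing_bits(ours: int, kernel: int) -> list:
    """Return the bit positions (LSB = 0) where two 32-bit values differ."""
    delta = (ours ^ kernel) & 0xFFFFFFFF
    return [n for n in range(32) if (delta >> n) & 1]

# Candidate fix as worded in the table: OR BIT(0) | (BIT(0) << 16) into ours.
candidate = 0x00048004 | (1 << 0) | (1 << 16)

if __name__ == "__main__":
    for offset, (ours, kernel) in DIVERGENCES.items():
        print(f"+0x{offset:04x}: differing bits {differing_bits(ours, kernel)}")
    print(f"candidate cfg_done = 0x{candidate:08x}")
```

One thing the arithmetic shows: the candidate fix as worded yields 0x00058005, which differs from the traced kernel value 0x00048005 in bit 16. Matching the trace exactly would mean ORing in BIT(0) alone; which of the two the hardware needs is not established here.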

Confirmations (values that match kernel — NOT bugs)

  • VP2_POST_SCL_CTRL 0x0e3c = 0x10001000: rkr5 analysis flagged this as “cargo-cult worth decoding”; tripwire now proves kernel writes the same value, so the value is correct.
  • VP2 post-config block 0x0e30..0x0e40, output mux 0x06e8 = 0x34000000, cluster1 window, VP2_DSP_CTRL = 0x0000000f.

Secondary observation

With Phase 2 u-boot (no VP2 init), the “brown text flash” during early kernel boot DISAPPEARED. With Phase 3 u-boot (full VP2 init), it came back. This proves simplefb is scanning the raster the u-boot sets up: the flash is the u-boot VP2 output, and simplefb just overlays kernel console text on it. Panel stays physically lit whenever u-boot does VP2 init, regardless of whether our own stripe content reaches it.

Next steps

Fix the three divergences in u-boot, reflash, observe. If they close the “no pixels” wall, campaign is done. If not, mine the 4.3 M-record Phase 3 capture for the next divergence — we now have a reproducible capture.

Artifacts

Where | What
boltzmann:~/bin-phase3-full.csv | Full trace, 225 MB, 4.3 M records
boltzmann:~/bin-phase2-modeset-v2.csv | Kernel modeset only, 77 K records
boltzmann:~/projects/AMPere/output/u-boot-rockchip-spi-phase3-genbook-8mb.bin | Phase 3 u-boot SPI (sha ac461a2195…)

2026-04-19 — Phases 4-20: false trails and Phase 20 verdict

Bench session, four hours of chained hypotheses against a 20-phase reboot cycle. The key takeaway, now recorded in memory: observation beats theory in register-level reverse engineering. See feedback memory.

The bit-31 detour

Phase 4 theorised that OVL_CTRL bit 31 was a STANDBY/bypass flag and cleared it. No readback verified the theory; no tripwire recorded the bit effect. Theory was wrong: per TRM Part 2 Chapter 7 §VOP2_OVERLAY_PORT_SEL, OVL_CTRL bits 31:30 are the LAYERSEL_REGDONE_SEL field (pick which VP cfg_done commits the LAYER_SEL register). Value 10 selects VP2, which is exactly what we wanted and exactly what Phase 4 cleared.

Result: Cluster + LAYER_SEL writes were silently dropped for phases 5 through 12, and every subsequent register tweak appeared to fail for “unknown reasons.” Cost: ten phases of misdirection, around six hours of bench time. Phase 13 caught the regression only because tripwire captured the actual readback (OVL_CTRL, PORT_SEL, LAYER_SEL at reset defaults) alongside the intended writes. The discrepancy was the smoking gun.

Rule now in memory: before committing a register fix, write down the expected post-fix readback value. If you cannot name one, the hypothesis is not falsifiable. After the fix, read back. If readback does not match expected, the fix did not land — do not move on.

Phase 19 to Phase 20: the cluster swap

Phases 13 through 19 re-fixed OVL_CTRL bit 31 and iterated on Cluster0 plane writes (CLUSTER0_CTRL at 0x1100 = 0x80004001 = FRM_RESETN_EN | MMU_BYPASS | CLUSTER_ENABLE). Cluster0 registers never latched (readback stayed at reset defaults) no matter what load-enable or frame-delay we tried.

The wall: decoding PORT_SEL. Readback value 0xa0587783 decodes via TRM Part 2 Chapter 7 §VOP2_OVERLAY_PORT_SEL as:

Bit range | Field | Value | Meaning
17:16 | cluster0_sel_port | 00 | VP0 (no active output)
19:18 | cluster1_sel_port | 10 | VP2 (eDP, active)

Cluster0 shadow commits on VP0 vsync which never fires — hence writes never land. Cluster1 shadow commits on VP2 vsync, which IS firing. The kernel uses Cluster1 for the VP2 primary plane; we had been writing Cluster0 for phases 12 through 19.
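The PORT_SEL decode can be checked mechanically. In the sketch below the two cluster fields use the bit ranges from the table above; the four port-mux nibble positions (bits 3:0 up to 15:12) are an assumption, chosen because they reproduce the PORT0..PORT3 mux values quoted elsewhere on this page. `field` is a hypothetical helper.

```python
PORT_SEL = 0xA0587783  # readback value quoted above

def field(value: int, hi: int, lo: int) -> int:
    """Extract bits hi:lo (inclusive) from a 32-bit register value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

decoded = {
    # Assumed nibble layout for the port muxes.
    "port0_mux": field(PORT_SEL, 3, 0),
    "port1_mux": field(PORT_SEL, 7, 4),
    "port2_mux": field(PORT_SEL, 11, 8),
    "port3_mux": field(PORT_SEL, 15, 12),
    # Bit ranges taken from the table above.
    "cluster0_sel_port": field(PORT_SEL, 17, 16),
    "cluster1_sel_port": field(PORT_SEL, 19, 18),
}

if __name__ == "__main__":
    for name, val in decoded.items():
        print(f"{name:18s} = {val}")
```

Running it gives cluster0_sel_port = 0 (VP0) and cluster1_sel_port = 2 (VP2), the same routing the table states.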

Phase 20 reverts: base 0x1000 -> 0x1200, CLUSTER_CTRL 0x1100 -> 0x1300, WIN_REG_CFG_DONE load bit 0 to bit 1.

Phase 20 verdict — writes finally land

UART state dump after Phase 20 trace-replay init:

CLUSTER1 CTRL0 = 0x00000001   (WIN_ENABLE=1)
CLUSTER1 YRGB  = 0xef700000   (our fbbase)
CLUSTER1 VIR   = 0x00000780   (stride 1920)
CLUSTER1 ACT   = 0x0437077f   (1920x1080)
CLUSTER1_CTRL(0x1300) = 0x80004001  (FRM_RESETN | MMU_BYPASS | ENABLE)

Every plane register holds our value on readback — plane is committed, VP2 is running. Prior phases had all these same offsets reading reset defaults.
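As a sanity check on the dump, the values can be decoded back into the geometry they are annotated with. This sketch assumes ACT packs (height - 1) in the upper half and (width - 1) in the lower half, each treated as a 16-bit field; the real field widths may be narrower. Both helper names are hypothetical.

```python
def decode_act(act: int) -> tuple:
    """Split an ACT-style value into (width, height); fields are stored minus one."""
    width = (act & 0xFFFF) + 1
    height = ((act >> 16) & 0xFFFF) + 1
    return width, height

def set_bits(value: int) -> list:
    """List the bit positions (LSB = 0) that are set in a 32-bit value."""
    return [n for n in range(32) if (value >> n) & 1]

if __name__ == "__main__":
    print(decode_act(0x0437077F))   # CLUSTER1 ACT
    print(0x00000780)               # CLUSTER1 VIR as a decimal stride
    print(set_bits(0x80004001))     # CLUSTER1_CTRL flag bits
```

The ACT value decodes to 1920 by 1080 and the CLUSTER1_CTRL value has exactly three bits set (31, 14 and 0), consistent with the three flag names in the dump annotation.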

But panel stays dark (see webcam frames below). Kernel dmesg shows *ERROR* POST_BUF_EMPTY irq err at vp2 firing at vblank rate (486k callbacks suppressed per 5 seconds). The kernel uses Cluster1 too, so this is NOT a u-boot-only bug: the VP2 post-scaler is starved on a path shared with kernel-side DRM.

The real bug: runtime PM underflow at handover

sudo journalctl -k -b 0 on the Phase 20 boot, chronological:

23:26:43 platform fdd90000.vop: Adding to iommu group 5
23:26:48 rockchip-vop2 fdd90000.vop: Runtime PM usage count underflow!
         (x17 in immediate succession)
23:26:48 rockchip-drm display-subsystem: [drm] *ERROR* POST_BUF_EMPTY
         irq err at vp2   (then vop2_isr: 486859 callbacks suppressed)

Mechanism: u-boot leaves VP2 powered + clocked, but the kernel PM framework starts every device at runtime_status = SUSPENDED. When supplier links (IOMMU group 5, power-domain) run their init, they call pm_runtime_put() on fdd90000.vop. Counter underflows (was 0, went negative 17 times). Warning is non-fatal.

Later, the rockchip-vop2 modeset path calls pm_runtime_get(), but since the framework thinks the device is already resumed (counter not zero after the underflow math settles), rockchip_vop2_runtime_resume() never runs. That callback is where the driver does its full clock re-gate, reset-toggle, and IOMMU re-program sequence. Modeset proceeds with VP2 in the state u-boot left it, with one or more sub-steps of the resume path skipped; the post-scaler starves and POST_BUF_EMPTY fires at every vblank.

The physical fb probe confirms pixels did reach memory:

  • fb_peek.ko kernel module, native-built on boltzmann, memremap of 0xef700000 with MEMREMAP_WB.
  • Rows 0 / 300 / 540 / 800 read all zeros: the u-boot stripe paint for the top two thirds was overwritten by kernel takeover (fbcon or DRM alloc, since we reserve no memory for the fb).
  • Row 1070 reads ff0000ff ff0000ff ... : the blue band (alpha=0xFF, B=0xFF) from the u-boot stripe paint survives in the bottom third.

So the panel IS displaying what u-boot wrote (mostly black with a blue band at the bottom), latched by the panel internal scanout memory at the moment VP2 got PM-locked. eDP panels cache their last received frame; that is what the webcam sees.

DRM state at quiescence: /sys/kernel/debug/dri/1/state shows Cluster1-win0 bound to video_port2 with an Xorg-allocated XR24 framebuffer (fb=90, 1920×1080), yet /sys/kernel/debug/dri/1/vop2/summary reports Video Port2: DISABLED. VP2 is force-idle after the POST_BUF_EMPTY cascade.

The three paths forward

  • (A) Tear-down path: before rk3588_vop2_display_init returns, walk VP2 all the way back down: STANDBY bit, CRU re-gate, PMU bus-idle, PMU power-off, release u-boot PD. Kernel probes a clean SUSPENDED device, resume callback runs normally. Splash likely will NOT persist visually (eDP blanks within a frame of signal loss, unless the panel supports PSR) but handover is clean.
  • (B) State-match path: leave VP2 running in exactly the state kernel early-probe expects. Map every register the kernel reads at probe, diff vs what u-boot leaves, fix discrepancies. Probably a dead end.
  • (C) NOINIT + text splash: stay on BIN_PHASE1_NOINIT=y (already works), no graphical splash, kernel does all VP2 setup cleanly. Lowest risk, least impressive outcome.

Phase 21 pursues (A). Kconfig gate: BIN_VP2_TEARDOWN. Code lives in drivers/video/rockchip/rk3588_vop2.c at the end of rk3588_vop2_display_init, mirroring the PMU+CRU setup sequence in reverse. Verdict metric: sudo journalctl -k -b 0 | grep underflow should return zero matches after a clean boot.

Webcam setup — visual verification rig

Eyedot USB camera on meitner, pointed at the ampere panel. Framing convention:

  • One third desk, two thirds screen: camera tilted so the bottom third of the frame shows keyboard / desk surface (reference for "backlight off" baseline) and the top two thirds covers the panel.
  • Default capture script: meitner:/tmp/eyedot-cap.sh /tmp/bin-phaseN.mkv. Records at 10 fps raw, downsamples to 1 fps into H.264, auto-stops 5 seconds after UART shows login:.
  • ffmpeg signalstats is unreliable on JPEGs from this camera; use a Python one-liner on raw pixel data for luma histograms.
  • Dim content on a dark panel is hard to see raw. Standard enhance pipeline: ffmpeg -i frame.jpg -vf eq=brightness=0.1:contrast=2.5 -y out.jpg, then crop=iw*2/3:ih:iw/3:0 to strip the desk portion and focus on panel content.
  • Trap: camera auto-exposure skews dark and bright regions. Luma average over the whole frame is dominated by the bright environment; always crop to panel before averaging.
  • Another trap: a stationary "dark blob" in the top-middle of frames is the camera head shadow on the panel, not displayed content. If it shows across multiple frames in the same position, it is not pixels.
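The crop-before-averaging rule can be illustrated with a few lines of pure Python. This is a sketch, not the one-liner actually used on the bench: it assumes the frame has already been decoded to packed RGB24 bytes by some other tool, uses Rec. 601 luma weights, and takes the crop as (x, y, w, h) in the same sense as the ffmpeg crop filter above. `mean_luma` is a hypothetical name.

```python
def mean_luma(rgb: bytes, width: int, height: int, crop: tuple) -> float:
    """Mean Rec. 601 luma over crop=(x, y, w, h) of a packed RGB24 frame."""
    x0, y0, w, h = crop
    if len(rgb) != width * height * 3:
        raise ValueError("buffer size does not match width*height*3")
    total = 0.0
    for y in range(y0, y0 + h):
        row = y * width * 3
        for x in range(x0, x0 + w):
            r, g, b = rgb[row + 3 * x: row + 3 * x + 3]
            total += 0.299 * r + 0.587 * g + 0.114 * b
    return total / (w * h)

if __name__ == "__main__":
    # Synthetic 6x2 frame: bright left third, dim blue right two thirds.
    frame = (bytes([255, 255, 255]) * 2 + bytes([0, 0, 255]) * 4) * 2
    print(mean_luma(frame, 6, 2, (0, 0, 6, 2)))   # whole frame
    print(mean_luma(frame, 6, 2, (2, 0, 4, 2)))   # cropped region only
```

On the synthetic frame the whole-frame average is several times higher than the cropped one, which is the auto-exposure trap described in the list.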

Artifacts: frames at meitner:/tmp/bin-phaseN-frames/, videos at meitner:/tmp/bin-phaseN.mkv. Phase 20 capture yielded 303 frames over around 5 minutes (reboot to sddm). Phase 20 panel-view crops show dim navy blue bottom two thirds, lighter top third, consistent with “top 2/3 black (zeros) + bottom 1/3 blue (0xFF0000FF)” latched on the panel.

  • project_bin_tripwire_findings.md: case study, full findings for 2026-04-18 tripwire plus 2026-04-19 Phase 20 plus Phase 21 direction
  • feedback_observation_over_theory.md: the bit-31 detour memorialised
  • feedback_observer_first.md: Phase 7 backlight-off during visual test memorialised
  • project_bin_42c3_timeline.md: narrative arc for the 42C3 talk proposal

2026-04-20 — Phase 21 failed, trace-diff, VP0 theory reinstated

Phase 21 (A) failed on both axes: the BIN_VP2_TEARDOWN code ran correctly (UART: VOP2: teardown done, PMU pwr=0x1b idle=0x37fff, PD_VOP and PD_VO1 gated off), but the kernel still got exactly 17 Runtime PM usage count underflow warnings AND now crashed with init SIGSEGV because rockchip-drm probe tries to read VP2 registers on a powered-off domain and AXI hangs. Ampere went into a boot loop; recovered via maskrom + db rk3588_spl_loader_v1.19.113.bin → cs 9 → wl 0 phase20.bin → rd on meitner.

Two implications:

  • The 17 underflows are not caused by u-boot leaving VP2 active. They fire regardless of PMU state. Path-A was attacking the wrong target.
  • You cannot power-gate VP2 before kernel boot. Kernel expects register readability at probe. Either leave it on (Phase 20 state) or never turn it on (Phase 1 NOINIT).

Trace diff on bin-phase3-full.csv

4.3 M-record capture from 2026-04-18. Filtered to VOP2 region (0xfdd90000..0xfdd95fff), split by stage, compared without aggressive dedup.

Metric | u-boot | kernel
unique offsets touched | 54 | 1118
total accesses | 87 | 24 614

Kernel-only offsets (we never write): 1072. Most are per-vblank IRQ status/clear (0x0084 written 900x, 0x0094 written 900x, 0x00c4 written 900x). That is expected maintenance traffic, not missing init.

The one-shot divergences that actually matter, ranked by likely impact on post-scaler starvation:

  1. 0x0e00 VP2_DSP_CTRL: we 0x0000000f, kernel 0x1040000f. Kernel sets bit 28 DSP_LUT_EN and bit 22 GAMMA_UPDATE_EN, and programs the full 1024-entry LUT at 0x5000+. We write neither the enable bits nor the LUT.
  2. 0x0e0c VP2_CLK_CTRL: we 0xe, kernel 0x2. Different internal clock-divider config.
  3. 0x0000 cfg_done: we 0x00048004 (VP2 only), kernel 0x00048005 (VP0+VP2 together). See VP0 theory below.
  4. 0x06f0: we 0x04040000, kernel 0x04040404. Trivial missing two bytes.
  5. 0x0028, 0x002c DSP_IF_EN block: we miss bits 3/4 in 0x0028 and all of 0x002c=0x00060000.

Kernel also brings up VP0 fully (VP0_DSP_CTRL at 0x0c00, VP0 timing at 0x0c48..0x0c54, VP0 LINE_FLAG at 0x0070, VP0 INT_EN at 0x00a0). We write zero VP0 regs.

VP0-drives-VP2 theory (reinstated)

The cfg_done question has been flip-flopped across Bin sessions. Pinning it down in memory now at project_bin_vp0_theory.md.

Rule: on the GenBook, cfg_done at 0x0000 must latch both VP0 and VP2 together (value 0x00048005). Do not drop the VP0 bit on the reasoning that VP0 has no connector.

Why: RK3588 VOP2 has a single shared overlay mix crossbar, not per-VP silos. PORT_SEL readback 0xa0587783 decodes to: PORT0_MUX=3 (VP0 gets layers 0..3), PORT1_MUX=8 (VP1 disabled), PORT2_MUX=7 (VP2 gets layers 4..7), PORT3_MUX=7. VP2 layer-slots 4..7 sit downstream of VP0 layer-slots 0..3 in the same mix pipeline. When VP0 cfg_done stays pending (because we only latch VP2), VP0 mix state is in shadow, never commits, and the mix stalls: VP2 post-scaler reads empty → POST_BUF_EMPTY at vblank rate → panel dark.

VP0 vsync fires whenever VP0 has dclk running + valid timing, independent of whether a panel is physically attached. Kernel brings up VP0 fully for exactly this reason: keeps the mix crossbar advancing.

History of the bug:

  1. ddefc154 (pre-Phase 12): tripwire diff correctly found the 0x00048005 two-phase latch, hypothesised DSP_IF crossbar needs VP0 committed. Right theory.
  2. e05c0915 (Phase 15-16): reversed on the false reasoning "VP0 has no connector = no vsync." Wrong. Value stayed at 0x00048004 through Phase 20.
  3. 7bd68b59 (Phase 20): fixed Cluster routing but kept the VP2-only cfg_done. Panel still dark, POST_BUF_EMPTY still firing.
  4. 2026-04-20 trace-diff: confirmed kernel terminal 0x00048005, plus VP0 init writes we never replicate.

Proposed Phase 22

Restore 0x00048005 in cfg_done AND add VP0 fake-run init:

  • VP0_DSP_CTRL = 0x1040000f at 0x0c00
  • VP0 HTOTAL_HS_END = 0x0898002c at 0x0c48
  • VP0 HACT_ST_END = 0x00c00840 at 0x0c4c
  • VP0 VTOTAL_VS_END = 0x04650005 at 0x0c50
  • VP0 VACT_ST_END = 0x00290461 at 0x0c54
  • VP0 LINE_FLAG = 0x04610461 at 0x0070
  • VP0 INT_EN = 0x00200020 at 0x00a0
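The four VP0 timing words in the list above can be unpacked to see what mode the fake run would scan. The sketch assumes each register holds two 16-bit fields, with the first-named field in the upper half (inferred from the register names, not from the TRM); `decode_timing` is a hypothetical helper.

```python
REGS = {
    "HTOTAL_HS_END": 0x0898002C,
    "HACT_ST_END": 0x00C00840,
    "VTOTAL_VS_END": 0x04650005,
    "VACT_ST_END": 0x00290461,
}

def halves(value: int) -> tuple:
    """Split a 32-bit word into (upper 16 bits, lower 16 bits)."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF

def decode_timing(regs: dict) -> dict:
    """Derive totals and active size from the four timing registers."""
    htotal, hs_end = halves(regs["HTOTAL_HS_END"])
    hact_st, hact_end = halves(regs["HACT_ST_END"])
    vtotal, vs_end = halves(regs["VTOTAL_VS_END"])
    vact_st, vact_end = halves(regs["VACT_ST_END"])
    return {
        "htotal": htotal, "hs_end": hs_end,
        "vtotal": vtotal, "vs_end": vs_end,
        "width": hact_end - hact_st,
        "height": vact_end - vact_st,
        # Pixel clock that would give exactly 60 frames per second.
        "clock_for_60hz": htotal * vtotal * 60,
    }

if __name__ == "__main__":
    for key, val in decode_timing(REGS).items():
        print(f"{key:15s} = {val}")
```

Under that assumption the values describe a 1920 by 1080 active area inside a 2200 by 1125 total raster, so the VP0 fake run would use ordinary 1080p timing.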

Plus top candidates from the value divergences: flip VP2_DSP_CTRL += DSP_LUT_EN + GAMMA_UPDATE_EN (and populate LUT if needed), fix VP2_CLK_CTRL to 0x2, complete 0x06f0 = 0x04040404.

One change per commit, verify via state-dump readback before moving on.

2026-04-20 late — campaign closeout

Bin is closed, partial-win. Ampere's display silicon failed by end of session. Symptoms: link trains, backlight comes on, BIST bars display on panel (PHY-internal pattern generator), eDP AUX reads panel EDID correctly — but the VOP2-to-HDPTX-internal-pipeline never delivers a valid pixel stream. Three different kernel-side failure modes observed across u-boot variants:

  • POST_BUF_EMPTY storm (100k-500k IRQ/s): caused by Bin u-boot's VP2 half-init leaving the mix pipeline in an error-reassert loop. Fixed by vanilla u-boot (storm count drops to zero). The storm is NOT why the panel is dark, just excessive IRQ noise.
  • rockchip-vop2 port_mux_done timeout: the kernel's PORT_SEL commit does not latch with vanilla u-boot. Fixed by Phase 22 u-boot pre-committing port_mux. Trade-off, not solution.
  • runtime PM refcount underflow (9447 in 1 min with vanilla kernel + Phase 22 u-boot): kernel PM vs u-boot PM state drift.

Final confirmation 2026-04-20 13:20: vendor `coolpi-loader` u-boot + vendor kernel image also produces dark panel. Since vendor u-boot is supposed to be the working reference for this exact SKU, vendor-image-dark = display silicon fault, not software bug.

Unanswerable question: did the campaign contribute to the silicon failure? 25+ reboot cycles, 100k IRQ/s storms across many hours, and many register writes into analog PHY/PLL blocks without TRM backing are collectively plausible contributors, individually unprovable. See feedback_trm_or_nothing.md for the forward rule: every register write needs TRM backing, especially in analog blocks.

What the campaign produced

  • VP0-drives-VP2 theory decoded and memorialised. Applies to all RK3588 VOP2 boards where VP0 has no connector but the mix crossbar still requires VP0 cfg_done to commit. Memory file project_bin_vp0_theory.md documents it with receipts.
  • Phase 22 u-boot binary at boltzmann:~/projects/AMPere/output/u-boot-rockchip-spi-phase22-genbook-8mb.bin. Correct Cluster1 routing, VP0 fake-run, cfg_done 0x00048005, and enough PMU/CRU init to produce valid link training + panel stream. If a replacement GenBook with working silicon materialises, this is the start point, not Phase 1 or Phase 24.
  • Tripwire infrastructure: shared 2 GB DDR ring, u-boot + kernel writel/readl recording, offline CSV dumper. Useful generic tool for arm64 boot-path debugging; worth extracting + upstreaming.
  • One confirmed SDDM-on-eDP-via-upstream photo at Documents/Markus_And_Claude/bin-phase22-sddm-20260420-0806.jpg on Nextcloud. Clock showing 08:06:37, Monday 20 April, KDE/Arch default wallpaper. Never reproduced. Only confirmed-pixels frame in the whole campaign.
  • Register-divergence catalogue from two tripwire traces (bin-phase3-full.csv and phase24-trace.csv): reference for anyone re-implementing RK3588 VOP2 bring-up.
  • A list of register fields whose meaning was in the TRM vs whose meaning we inferred: useful for the next RE engineer, sobering for this one.
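The register-divergence idea behind the tripwire traces can be sketched as "compare the last value each side wrote to every address". The CSV layout below (columns `op,addr,value`) is hypothetical, since this page does not document the real dumper format, and the `0x0000` offset used for cfg_done is a placeholder rather than a value from this page. The register values themselves are the ones quoted earlier.

```python
import csv
import io


def last_writes(csv_text):
    """Map address -> last value written (hypothetical op,addr,value CSV)."""
    state = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["op"] == "w":
            state[int(row["addr"], 16)] = int(row["value"], 16)
    return state


def divergences(ours, reference):
    """Addresses whose terminal value differs, or that only one side wrote."""
    result = {}
    for addr in sorted(set(ours) | set(reference)):
        mine, theirs = ours.get(addr), reference.get(addr)
        if mine != theirs:
            result[addr] = (mine, theirs)
    return result


# Illustrative traces: cfg_done at a placeholder offset, plus the
# VP0_DSP_CTRL write (0x0c00) that only the reference side performs.
OURS = "op,addr,value\nw,0x0000,0x00048004\n"
REFERENCE = "op,addr,value\nw,0x0000,0x00048005\nw,0x0c00,0x1040000f\n"

report = divergences(last_writes(OURS), last_writes(REFERENCE))
for addr, (mine, theirs) in report.items():
    print(hex(addr), mine, theirs)
```

A `None` on one side marks a register that only the other side ever wrote, which is how "VP0 init writes we never replicate" would show up in such a diff.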

Webcam luma calibration (useful for next campaigns)

  • 10–25: backlight off (cold boot, DPMS, hardware gate).
  • 30–40: backlight ON, panel blank. LCD rest state + backlight bleed + ambient reflection. Xorg can be running, CRTC active, fb bound, and the panel still shows nothing. Most of what we captured fell here.
  • 100–130: panel backlit + receiving a signal that hits the "LCD neutral" state (uniform bright with no structure). Looks identical to real "white screen" content in the camera. Trap.
  • 170–200 with visible structure (text, icons, wallpaper): real rendered pixels. The 08:06 shot is the only one in the campaign.
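The calibration bands above map directly onto a small classifier for webcam captures. This is a sketch under stated assumptions: the input is taken to be a mean frame luma on the same scale as the numbers above, the function name and labels are hypothetical, and "visible structure" is passed in as a flag because luma alone cannot establish it.

```python
# (low, high, label) bands copied from the calibration list.
LUMA_BANDS = [
    (10, 25, "backlight off"),
    (30, 40, "backlight on, panel blank"),
    (100, 130, "LCD neutral (uniform bright, no structure)"),
]
RENDERED_BAND = (170, 200)


def classify_luma(mean_luma, has_structure=False):
    """Classify a capture by mean luma; gaps between bands stay unclassified."""
    for low, high, label in LUMA_BANDS:
        if low <= mean_luma <= high:
            return label
    low, high = RENDERED_BAND
    if low <= mean_luma <= high:
        # Brightness alone is not proof: the list requires visible structure.
        if has_structure:
            return "real rendered pixels"
        return "bright, structure unconfirmed"
    return "unclassified"


for sample in (15, 35, 115, 60):
    print(sample, classify_luma(sample))
print(185, classify_luma(185, has_structure=True))
```

Values that fall between the bands are deliberately reported as unclassified rather than rounded into a neighbouring band, since the list gives no calibration for them.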

Status check for future pickup

If someone comes back to this project with replacement hardware, read in this order:

  1. project_bin_closeout.md (this file)
  2. project_bin_phase22_notes.md
  3. project_bin_vp0_theory.md
  4. project_bin_tripwire_findings.md
  5. feedback_trm_or_nothing.md
  6. feedback_observation_over_theory.md

The full arc is in this DokuWiki page (read top-to-bottom). The memory files are the executive summary + rules-going-forward.

Bin is closed.

Not “done” — more like “won the insights, lost the hardware.” The u-boot contribution is upstreamable; the display never lit up on this specific silicon; and a real talk at 42C3 would honestly lead with “here is how we broke a RK3588 laptop trying to tell it to display pixels, and what we learned on the way.”

2026-04-20 — postscript: Anthropic feedback (not yet filed)

During the ampere-silicon-maybe-fried phase, Claude Code running on noether was observed to be at low effort despite the user setting max effort in a sibling client/session. The low-effort setting was very likely the proximate cause of the tar -xzf -C / footgun that wiped the /lib -> usr/lib symlink on ampere, bricking SSH until physical disk recovery. A max-effort pass would almost certainly have caught the merged-usr archive-layout trap in advance.

Kept as local memory (feedback_effort_stakes.md) and MEMORY.md index. Draft of a possible future issue against the Claude Code GitHub tracker is archived here for reference:

Title candidate: Effort level auto-downshifting during disaster-recovery scenarios is a customer-retention risk

Body:

During a hardware RE session (RK3588 SBC, upstream u-boot + kernel work on a ~600 EUR device) the Claude Code client dropped effort mid-recovery after the user had explicitly set max effort in a sibling client/session. The effort setting did not propagate between sessions, and the assistant did not surface its current effort level nor flag that it had shifted down.

In the specific incident, the low-effort pass dispatched a tar -xzf -C / against a merged-/usr Arch rootfs and clobbered the /lib -> usr/lib symlink, breaking the dynamic linker and bricking SSH access until physical disk recovery was possible. A max-effort pass would almost certainly have read the archive layout first, considered the symlink, and staged safely. The fault-mode is exactly what max-effort exists to prevent.

The user-visible stakes: the hardware could have been permanently damaged (thermally we got close — sustained ~100k IRQ/s on a display pipeline error loop for hours). In that scenario, the user does not necessarily have “just buy another one” as an option.

Two concrete asks:

  1. Effort level should propagate across sessions/clients for the same user, or at minimum be surfaced visibly at the top of every conversation.
  2. Auto-downshift during a session that includes disaster-recovery signals (rm -rf, flashing, rootfs operations, user-expressed distress, expensive/irreversible steps) should be suppressed, or at least flagged to the user before committing the next expensive action.

Low effort in this context is not just “less helpful” — it is directly responsible for the fault that nearly cost a 600 EUR+ device. That is a churn-level outcome if it hits the wrong customer.

Status: drafted 2026-04-20, NOT filed. User chose to keep local rather than submit to the public tracker at this time.
