RK3588 DDR Init Blob — Reverse Engineering & Patching
Running log of the RK3588 DDR init blob project: what's been tried, what worked, what bricked the board, and what the current state is.
Source: https://git.reauktion.de/marfrit/rk3588-ddr-analysis
Target hardware: ampere (CoolPi CM5 GenBook, RK3588 + LPDDR5)
Status 2026-04-15: v3fb patcher staged, waiting on UART cable and ampere SPI recovery to bisection-test.
Why we're doing this
The RK3588 ships with a closed-source binary blob that initialises LPDDR4/5 memory during early boot. Rockchip provides no source. The blob contains at least 20 “timeout-less” hardware poll loops — `do while` with no iteration cap — which is the community-accepted explanation for sporadic cold-boot failures on otherwise-stable hardware.
Long-term goal: produce a compileable, well-structured C version of the blob that we can fix bugs in. Short-term goal: add timeouts to the poll loops so the board fails fast instead of hanging silently.
Timeline
2026-04-02 .. 04-11: decompilation + first patcher
- Decompiled v1.19 blob with Ghidra 11.3 on oppenheimer (CT131, x86 PVE container on data). 118 functions, ~12 kLOC.
- Verified Synopsys DWC LPDDR5 multiPHY heritage. Most registers map to the DWC PUB databook (CalBusy, DfiStatus, MicroReset, etc.).
- Identified 20 timeout-less polls, documented in BUG_ANALYSIS.md.
- v1 patcher (patch_prod.py): NOP'd the backward branches of each poll. Tested in ddr_emu2 (Unicorn emulator); looked good.
2026-04-11: v1 bricked the board
Flashed NOP-patched blob to the GenBook's SPI flash. Cold-boot failed
to bring DRAM up, entered maskrom. Required battery disconnect +
rkdeveloptool-based SPI reflash with stock blob to recover.
Lesson: NOPping hardware polls on real silicon removes necessary wait time. The PHY genuinely needs those iterations to settle. A second opinion from a DDR-focused expert agent (“Mr. Claude Subagent”) confirmed the diagnosis independently.
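The difference between a NOP'd poll and a bounded poll can be modelled in a few lines. This is a toy sketch, not blob code: `FakeStatusReg` and its `ready_after` parameter are hypothetical stand-ins for a PHY status register that only reports ready after a number of reads.

```python
class FakeStatusReg:
    """Hypothetical status register: reads 0 until `ready_after` reads have happened."""

    def __init__(self, ready_after):
        self.ready_after = ready_after
        self.reads = 0

    def read(self):
        self.reads += 1
        return 1 if self.reads > self.ready_after else 0


def poll_nopped(reg):
    """v1 behaviour: backward branch NOP'd, so the register is sampled exactly once."""
    return reg.read() == 1


def poll_bounded(reg, max_iters=16384):
    """Bounded retry: keep sampling until ready, give up after max_iters reads."""
    for _ in range(max_iters):
        if reg.read() == 1:
            return True
    return False  # caller takes the error path instead of hanging


print(poll_nopped(FakeStatusReg(ready_after=100)))      # False: fell through too early
print(poll_bounded(FakeStatusReg(ready_after=100)))     # True: waited long enough
print(poll_bounded(FakeStatusReg(ready_after=10**9)))   # False: timed out, no hang
```

The NOP'd variant never hangs, but it also never waits, which is the v1 failure mode in miniature.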
2026-04-11 .. 04-14: v2 counted-loop trampolines
- Rewrote the patcher (patch_timeouts.py, commit 05d0d8e): each poll site now jumps to a per-site trampoline appended at the end of the blob. The trampoline counts 16384 iterations (~91 µs at 1.8 GHz), returns to the original error path on timeout.
- Output: ''rk3588_ddr_v1.19_counted_v2.bin''.
- Design reviewed by Mr. Claude Subagent; no objections.
- U-Boot image built, flashed to ampere's SPI.
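For reference, the ~91 µs figure can be reproduced with simple arithmetic. The log does not state the per-iteration cost; the value of 10 cycles per iteration below is an inference chosen because it matches the quoted number, not a measurement.

```python
def timeout_us(iterations, cycles_per_iter, clock_hz):
    """Worst-case time spent in a counted poll loop, in microseconds."""
    return iterations * cycles_per_iter / clock_hz * 1e6


# cycles_per_iter=10 is an assumption, picked to match the ~91 µs quoted above.
print(round(timeout_us(16384, 10, 1.8e9), 1))  # 91.0
```

If the real loop body costs more or fewer cycles (MMIO loads are rarely single-cycle), the timeout scales linearly with it.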
2026-04-14: v2 ALSO bricked the board
This time worse: power LED did not even come on, implying the CPU crashed before the bootrom's LED-setup code ran. No UART banner, no diagnostics, nothing. Full battery disconnect + maskrom recovery needed.
At this point design review had twice approved a broken implementation. The design was correct; the implementation was not. Something about the actual encoded trampoline bytes had to be wrong.
2026-04-15: the thorough check that unearthed the bug
Rather than guess what was wrong, we went back to the bytes:
- For each of the 16 patch sites, pulled the original loop body from ddr_conservative_asm.s with surrounding context.
- Hand-disassembled each trampoline from rk3588_ddr_v1.19_counted_v2.bin (raw little-endian uint32 decode, not Ghidra).
- Cross-compared: does the trampoline execute the same instructions as the original loop, in the same order, producing the same CPU flags before the branch?
The answer for 9 of 16 sites: no.
The original poll pattern on those sites was:
  LDR    Wx, [Xbase, #off]
  AND    Wx, Wx, #mask        ; no flag update
  CMP    Wx, #expected        ; sets NZCV
  B.cond .retry
The v2 patcher had logic like:
  test_inst = None
  for off, w in site['body']:
      if off != site['load_offset']:
          test_inst = w
          break
It copied exactly one non-load instruction into the trampoline.
For body=2 sites (LDR + TST; B.cond), fine — the TST was copied and
the condition was valid. For body=3 sites (LDR + AND + CMP; B.cond),
the AND was copied but the CMP was silently dropped.
An AND without S-suffix doesn't update flags. The trampoline's B.cond
therefore tested whatever NZCV happened to be set by whatever instruction
last executed before the trampoline was entered → random branch decision
→ CPU jumped to arbitrary offsets → crash before the bootrom LED stage.
This is a class of bug that design review cannot catch. Design review validates “is the algorithm correct?”. The algorithm WAS correct (run the poll body in a bounded loop). The bug was in the encoder: a wrong bound on how many instructions constitute “the poll body”. Only byte-level hand-verification against the source disassembly surfaces that kind of off-by-something.
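A cheap mechanical version of that hand check is to ask whether the copied body contains any flag-setting instruction at all before the `B.cond`. The sketch below is a best-effort lint and not part of the patcher: `sets_flags` and `body_has_flag_setter` are hypothetical helpers, and the decoder only covers the add/sub and logical instruction classes that appear in the poll pattern above.

```python
def sets_flags(word):
    """Best-effort check: does this A64 instruction word update NZCV?

    Only add/sub and logical data-processing forms are recognised.
    This is not a full decoder.
    """
    op28_24 = (word >> 24) & 0x1F
    if op28_24 in (0b10001, 0b01011):           # add/sub, immediate or register
        return bool((word >> 29) & 1)           # S bit set: ADDS/SUBS/CMP/CMN
    if ((word >> 23) & 0x3F) == 0b100100:       # logical, immediate
        return ((word >> 29) & 0b11) == 0b11    # opc == 11: ANDS/TST
    if op28_24 == 0b01010:                      # logical, shifted register
        return ((word >> 29) & 0b11) == 0b11    # opc == 11: ANDS/BICS
    return False


def body_has_flag_setter(words):
    """True if at least one instruction in the copied body updates NZCV."""
    return any(sets_flags(w) for w in words)


LDR = 0xB9400020  # ldr w0, [x1]
AND = 0x12000000  # and w0, w0, #1   (no S suffix, flags untouched)
CMP = 0x7100041F  # cmp w0, #1
TST = 0x7200001F  # tst w0, #1

print(body_has_flag_setter([LDR, TST]))        # True: body=2 site, fine in v2
print(body_has_flag_setter([LDR, AND]))        # False: v2 copy of a body=3 site
print(body_has_flag_setter([LDR, AND, CMP]))   # True: full body copied
```

A lint like this would not prove a trampoline correct, but it would have flagged the nine body=3 sites before anything was flashed.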
2026-04-15: v3fb (full-body) + bisection harness
patch_timeouts_v3.py (commit 694be88) copies the entire loop body into each trampoline, not just one instruction. Per-site size becomes ''4 * (N + 6)'' bytes where ''N'' is body length (32 bytes for body=2, 36 for body=3).
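As a quick cross-check of the size formula: evaluating ''4 * (N + 6)'' gives 32 bytes for N=2 and 36 bytes for N=3. The split of 9 body=3 sites and 7 body=2 sites used below is an assumption derived from the "9 of 16 sites" finding above; with that split the total comes to 548 bytes, which agrees with the blob growth from 76,704 to 77,252 bytes recorded later in this log.

```python
def trampoline_bytes(body_len):
    """Per-site trampoline size from the formula above: 4 * (N + 6)."""
    return 4 * (body_len + 6)


# Assumed split: 9 sites with the LDR+AND+CMP body (N=3), the other 7 with N=2.
sites = [3] * 9 + [2] * 7
total = sum(trampoline_bytes(n) for n in sites)
print(trampoline_bytes(2), trampoline_bytes(3), total)  # 32 36 548
```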
* New ''--sites'' flag: ''all'', ''early'', ''mid'', ''late'', ''none'', or index list like ''0,3,5-7''. Site indices stable:
  * ''early'' = sites 0-7, blob offsets 0x07b78..0x07f08: SGRF + PHY firmware state machine. Brick-suspect cluster.
  * ''mid'' = sites 8-10, 0x09124..0x0aaf8: DfiStatus / training start.
  * ''late'' = sites 11-15, 0x0d154..0x0d378: UctWriteProt / CalBusy.
* Three U-Boot SPI images built on boltzmann (''~/projects/AMPere/output/''):
  * ''u-boot-rockchip-spi-midlate-fb-8mb.bin'': patches sites 8-15. **First flash candidate** once ampere recovers. If it boots, the v2 bug was concentrated in the early cluster (expected).
  * ''u-boot-rockchip-spi-all-fb-8mb.bin'': patches all 16. The production candidate once midlate-fb is validated.
  * ''u-boot-rockchip-spi-early-fb-8mb.bin'': patches sites 0-7 only. Used if mid+late boots but all bricks.
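The ''--sites'' selector grammar is small enough to sketch. The following is an illustration of how such a spec could be parsed, not the code in patch_timeouts_v3.py; `parse_sites` and `CLUSTERS` are hypothetical names, and the cluster ranges are taken from the list above.

```python
# Cluster names and index ranges as listed above (16 sites, indices 0..15).
CLUSTERS = {
    "all": range(0, 16),
    "early": range(0, 8),
    "mid": range(8, 11),
    "late": range(11, 16),
    "none": range(0),
}


def parse_sites(spec, n_sites=16):
    """Turn 'early', 'none' or an index list like '0,3,5-7' into sorted site indices."""
    spec = spec.strip()
    if spec in CLUSTERS:
        return sorted(CLUSTERS[spec])
    picked = set()
    for part in spec.split(","):
        lo, sep, hi = part.partition("-")
        first = int(lo)
        last = int(hi) if sep else first
        if not (0 <= first <= last < n_sites):
            raise ValueError(f"site index out of range: {part!r}")
        picked.update(range(first, last + 1))
    return sorted(picked)


print(parse_sites("0,3,5-7"))  # [0, 3, 5, 6, 7]
print(parse_sites("mid"))      # [8, 9, 10]
```

Rejecting out-of-range indices loudly matters here: a silently ignored index would produce an image that patches fewer sites than its filename claims.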
2026-04-15: pre-flash verification
Sanity checks before the next flash attempt:
- Emulator trace diff (ddr_emu2): stock, midlate-fb, and all-fb produce byte-identical execution traces for the first 106 instructions (the reach of the emulator before it bails on unmodeled MMIO). Confirms the trampoline append does not perturb pre-site code.
- **Hand-decoded trampolines for sites 0, 1, 2:** all three preserve the full original body, correctly invert the condition for ''B.cond .done'', decrement ''W16'' correctly, and encode the right relative branch offsets back to the original return point. No encoder bugs.
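Two of the hand-checked properties, condition inversion and relative branch offsets, follow directly from the A64 `B.cond` encoding (opcode `0x54`, a signed 19-bit word offset in bits 23:5, condition in bits 3:0). The helpers below are a sketch for re-doing that check in a script; the function names are ours and nothing here is taken from the patcher source.

```python
def invert_cond(cond):
    """Invert an A64 condition code (EQ<->NE, CS<->CC, ...) by flipping bit 0."""
    if cond >= 0b1110:
        raise ValueError("AL/NV have no meaningful inverse")
    return cond ^ 1


def encode_b_cond(pc, target, cond):
    """Encode a B.cond located at address `pc` that branches to `target`."""
    delta = target - pc
    if delta % 4:
        raise ValueError("branch offset must be a multiple of 4")
    imm19 = delta >> 2
    if not -(1 << 18) <= imm19 < (1 << 18):
        raise ValueError("target out of B.cond range (+/-1 MiB)")
    return 0x54000000 | ((imm19 & 0x7FFFF) << 5) | cond


def decode_b_cond(pc, word):
    """Return (target, cond) for a B.cond word: the hand-decode direction."""
    if word & 0xFF000010 != 0x54000000:
        raise ValueError("not a B.cond instruction")
    imm19 = (word >> 5) & 0x7FFFF
    if imm19 & (1 << 18):          # sign-extend the 19-bit field
        imm19 -= 1 << 19
    return pc + (imm19 << 2), word & 0xF


EQ, NE = 0b0000, 0b0001
print(hex(encode_b_cond(0x1000, 0x1008, EQ)))  # 0x54000040
print(hex(encode_b_cond(0x1000, 0x0FFC, NE)))  # 0x54ffffe1
print(decode_b_cond(0x1000, 0x54FFFFE1))       # (4092, 1)
```

Decoding each trampoline branch with the trampoline's own blob offset as `pc` and comparing the target against the expected return point is the scripted equivalent of the hand check described above.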
Pending: UART bisection flash plan
Once ampere is recovered from its current brick (battery disconnect +
stock SPI reflash via Ohm running rkdeveloptool) and the UART
cable is plugged in on ampere's debug header:
- Flash stock → capture UART trace (baseline).
- Flash midlate-fb → capture. If boots, v2 bug was in early cluster.
- Flash all-fb → capture. This is the production candidate.
- Per-cluster bisection only if needed.
UART wiring: ampere debug header → USB-UART cable → Ohm (PineTab2)
USB-A port → picocom -b 1500000 /dev/ttyUSB0.
SPI recovery ladder on Ohm (requires rkdeveloptool Rockchip original,
not Pine64 fork):
  rkdeveloptool ld     # confirm maskrom device
  rkdeveloptool db ~/projects/AMPere/rk3588_spl_loader_v1.19.113.bin
  rkdeveloptool cs 9   # select SPI NOR: do NOT skip
  rkdeveloptool ef     # erase flash
  rkdeveloptool wl 0 ~/projects/AMPere/u-boot-rockchip-spi-stock-8mb.bin
  rkdeveloptool rd     # reboot
Lessons learned
- NOPping real-hardware polls = brick. Bounded retries only.
- Expert design review is necessary but not sufficient. A second opinion validates the algorithm, not the implementation.
- Byte-level verification against source disassembly is the cheapest intervention that catches encoder bugs. It takes an hour, costs nothing, and would have caught v2 before flashing.
- **UART is the only signal source** that's worth iterating against. Without it, each flash attempt is a 1-bit oracle that costs a screwdriver to read. The moment we have UART the iteration cycle goes from hours (brick → disconnect battery → reflash → retry) to minutes (flash → read UART → tweak → flash).
Files of interest
- boltzmann:~/projects/AMPere/ - full build tree (TF-A, OP-TEE, u-boot, rkbin)
- boltzmann:~/src/rk3588-ddr-decompiled/ - analysis artifacts, patchers, emu
- ohm:~/projects/AMPere/ - recovery kit (rkdeveloptool + stock SPI image + loader)
- https://git.reauktion.de/marfrit/rk3588-ddr-analysis - public source of truth
2026-04-15 evening: UART connected, three bricks, one silent build bug
Long session. Meitner was commissioned as a dedicated x86 flasher workbench
(ThinkPad T430, Debian 13 trixie, XFCE, aarch64 cross-toolchain, rkbin, lmcp
service on :8080) and brought online as the first real consumer of the
marfrit-packages Debian repo.
With a flasher in place the brick-recover cycle drops to ~60 s:
  sudo rkdeveloptool ld
  sudo rkdeveloptool db rk3588_spl_loader_v1.19.113.bin
  sudo rkdeveloptool cs 9   # SELECT SPI NOR: forgetting = writes eMMC
  sudo rkdeveloptool ef
  sudo rkdeveloptool wl 0 <image>
  sudo rkdeveloptool rd
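Since skipping `cs 9` sends the erase and write to eMMC instead of SPI NOR, it may be worth scripting the ladder so the order cannot be fumbled. The wrapper below is a sketch, not something in the repo: `recovery_commands` and `run_ladder` are hypothetical names, it uses only the rkdeveloptool subcommands shown above, and it defaults to a dry run that just prints what would be executed.

```python
import shlex
import subprocess


def recovery_commands(loader, image, sudo=True):
    """Return the maskrom recovery ladder as argv lists, in flashing order."""
    prefix = ["sudo"] if sudo else []
    steps = [
        ["rkdeveloptool", "ld"],               # confirm maskrom device
        ["rkdeveloptool", "db", loader],       # download the loader
        ["rkdeveloptool", "cs", "9"],          # select SPI NOR (skipping = writes eMMC)
        ["rkdeveloptool", "ef"],               # erase flash
        ["rkdeveloptool", "wl", "0", image],   # write image starting at sector 0
        ["rkdeveloptool", "rd"],               # reboot
    ]
    return [prefix + step for step in steps]


def run_ladder(commands, dry_run=True):
    """Print each step; when dry_run is False, execute and stop at the first failure."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


run_ladder(recovery_commands("rk3588_spl_loader_v1.19.113.bin", "stock-8mb.bin"))
```

`check=True` makes a failed `ld` or `db` abort the sequence before anything is erased.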
Bonus observation: when SPI holds a non-empty but non-bootable image,
the RK3588 bootrom falls back to maskrom on the next power cycle — no
pinhole button needed. Cleanly erased SPI (rkdeveloptool ef with nothing
written) instead falls through to eMMC, which still has a working u-boot
+ Debian — effectively a “two strikes before you're really bricked” safety net.
The UART rig
The GenBook debug header turned out to be a 4-pin 1.0 mm Chinese-brand connector, NOT JST SH. Amazon's “JST SH” cables are too tall (2.1 mm housing vs. the header's ~1.3 mm depth). Happily, the x86 GenBook variant's internal fan cable uses the same connector shell — one sacrificed fan cable = one working UART pigtail. Cable design gripe: V+ and GND were crimped next to each other, so one loose dupont sleeve could short 3.3 V into GND.
Pin voltages (measured on a running stock GenBook):
| Silkscreen | Idle voltage | Function | Wire colour (this donor cable) |
|---|---|---|---|
| GND | 0 V | GND | Black |
| V+ | 3.3 V | VCC-out rail (SKIP, not a signal) | Purple |
| TX | 1.8 V | GenBook TX → Tigard RX | Grey |
| RX | ~0 V floating | GenBook RX ← Tigard TX | White |
That's asymmetric-voltage UART: TX is raw 1.8 V PMUIO, RX has a board-side level shifter to 3.3 V. Tigard at 1.8 V reads the 1.8 V TX cleanly; driving RX may need 3.3 V — we didn't need to drive in this session so 1.8 V stayed.
Tigard UART lives on Channel A → /dev/ttyUSB0, not B. Also, set
echo 1 > /sys/bus/usb-serial/devices/ttyUSB0/latency_timer and use
dd if=… bs=1 — cat > file silently block-buffers at 4 KB and
will lose a short boot banner.
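The same unbuffered capture can be done from Python. This is a minimal Linux-only sketch under a few assumptions: the device path and 1500000 baud come from the wiring notes above, `open_uart` and `capture` are hypothetical helper names, and the 2-second idle timeout is an arbitrary choice.

```python
import os
import select
import termios
import tty


def open_uart(path="/dev/ttyUSB0"):
    """Open the serial device in raw mode at 1500000 baud (Linux-only constant)."""
    fd = os.open(path, os.O_RDONLY | os.O_NOCTTY)
    tty.setraw(fd)
    attrs = termios.tcgetattr(fd)
    attrs[4] = attrs[5] = termios.B1500000   # input speed, output speed
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    return fd


def capture(fd, out_path, idle_timeout=2.0):
    """Copy bytes from fd to out_path one at a time, with no userspace buffering.

    Stops at end-of-file or after idle_timeout seconds without data.
    Returns the number of bytes written.
    """
    total = 0
    with open(out_path, "wb", buffering=0) as out:
        while True:
            ready, _, _ = select.select([fd], [], [], idle_timeout)
            if not ready:
                break
            chunk = os.read(fd, 1)
            if not chunk:
                break
            out.write(chunk)
            total += 1
    return total
```

Usage would be along the lines of `capture(open_uart(), "boot.log")`, started before the board is powered on. Because every byte hits the file immediately, a 5-byte noise burst is recorded just as reliably as a full banner.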
Known-good boot captured from stock:
  DDR ff1a08bde6 typ 25/04/21-14:31.26,fwver: v1.19
  ch0 ttot6
  ch1 ttot6
  ch2 ttot6
  ch3 ttot6
  LPDDR5, 2112MHz
  channel[0] BW=16 Col=10 Bk=16 CS0 Row=17 CS1 Row=17 CS=2 Die BW=8 Size=8192MB
  (×4 channels = 32 GB)
That banner is the oracle: if patched variants produce it, DDR trained; if silent, TPL hung.
The three-brick bisection
With UART and fast reflash in place we tested the v3fb variants back-to-back:
| Image | Sites patched | Boot LED | UART |
|---|---|---|---|
| stock-8mb | none | on | full banner, SDDM |
| all-fb-8mb | 0..15 | OFF | 5 B noise |
| midlate-fb-8mb | 8..15 | OFF | 6 B noise |
| early-fb-8mb | 0..7 | OFF | 6 B noise |
Every patched variant failed with the same symptom, regardless of which cluster of poll sites was patched. That rules out site-specific encoder bugs — it's systemic.
The real root cause: u-boot built a blank idbloader
Byte-diff of stock vs. patched SPI images revealed the smoking gun:
- stock SPI at offset 0x8000 contains the RKNS wrapper magic (52 4b 4e 53), then ~57 % non-0xFF content through 0x60000: real SPL, TPL, DTB.
- patched SPI at 0x8000 is 0xFF FF FF FF. The entire idbloader region (0x8000..0x60000, 352 KB) is pure erase pattern. Zero content.
So when the v3 patcher appended 548 bytes of trampolines (DDR blob grew
76,704 → 77,252 bytes), u-boot's mkimage -T rkspi silently failed
to produce an idbloader, and binman padded the empty slot with 0xFF
without flagging an error. Build “succeeded” but produced a brick-ready
image. The final SPI had u-boot proper at 0x60000 but no loader
in front of it — bootrom reads garbage at 0x8000, can't find a valid
boot path, never gets far enough to light the power LED. It's not an
eMMC-fallback scenario either because the SPI isn't cleanly erased
(there's valid-looking content further in).
Bottom line: the v3 trampoline bytes were probably fine. We just never got to execute them.
Pre-flash gate: spi_check.py
Committed to the gitea repo: rk3588-ddr-analysis, commit 3a90236.
spi_check.py statically parses the RKNS wrapper at 0x8000 and the
payload region's non-0xFF content. No emulation, purely byte-level.
$ python3 spi_check.py u-boot-rockchip-spi-stock-8mb.bin
OK RKNS wrapper present at 0x8000
payload region 0x8000..0x60000: 205151/360448 non-0xFF bytes (56.9%)
PASS: image looks structurally sound. Safe to flash.
$ python3 spi_check.py u-boot-rockchip-spi-all-fb-8mb.bin
FAIL: no RKNS wrapper at 0x8000: got 0xffffffff. idbloader was not
produced — silently-failed mkimage during u-boot build.
Wired into build_uboot_stock.sh and build_uboot_rock5itx.sh as the
final post-build action. Any build that silently fails mkimage now exits
non-zero instead of producing a brick-ready file. Phase 1 of the broader
“test harness” task.
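For readers without the repo at hand, the core of such a gate fits in a dozen lines. This is a re-sketch of the idea from the description above, not the contents of spi_check.py; `check_spi_image` is a hypothetical name, and treating "magic present" as the sole pass criterion is a simplification.

```python
RKNS_MAGIC = b"RKNS"                   # 52 4b 4e 53
IDB_START, IDB_END = 0x8000, 0x60000   # idbloader region as described above


def check_spi_image(data):
    """Return (ok, message) for a raw SPI image given as bytes."""
    if len(data) < IDB_END:
        return False, f"image too short: {len(data)} bytes"
    magic = data[IDB_START:IDB_START + 4]
    if magic != RKNS_MAGIC:
        return False, f"no RKNS wrapper at {IDB_START:#x}: got 0x{magic.hex()}"
    region = data[IDB_START:IDB_END]
    non_ff = len(region) - region.count(0xFF)
    pct = 100.0 * non_ff / len(region)
    return True, f"RKNS present, {non_ff}/{len(region)} non-0xFF bytes ({pct:.1f}%)"


blank = bytes([0xFF]) * IDB_END        # what the patched builds looked like
print(check_spi_image(blank))          # (False, 'no RKNS wrapper at 0x8000: got 0xffffffff')
```

In a build script the boolean would be mapped to the exit status, so a blank idbloader stops the pipeline instead of reaching the flasher.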
Phase 2 queued: bootrom-level QEMU emulation
The user's observation during the post-mortem: a QEMU run of the full SPI
image from bootrom entry, with stubbed MMIO (return 0 / return 0xFFFF /
per-address lookup) would have caught both today's empty-idbloader bug
and the earlier v2 counted_v2 CMP-drop brick without touching hardware.
Extending ddr_emu2.c to accept an SPI image, parse the idbloader header,
and execute the TPL with stubbed MMIO is queued as the next harness layer.
Every real-hardware flash should be gated behind “bootrom emu says it loads”
before it ever reaches rkdeveloptool.
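The three stub policies mentioned above (return 0, return 0xFFFF, per-address lookup) behave very differently against a poll loop, which is the main design question for the emulator layer. The model below is a plain-Python sketch, independent of ddr_emu2; `make_mmio_stub`, `poll_terminates` and the `STATUS` address are hypothetical.

```python
def make_mmio_stub(default=0, overrides=None):
    """Build a read(addr) function: per-address lookup first, then a default value."""
    table = dict(overrides or {})

    def read(addr):
        return table.get(addr, default)

    return read


def poll_terminates(read, addr, mask, expected, max_iters=16384):
    """Run a bounded poll against the stub; True if the exit condition is ever met."""
    for _ in range(max_iters):
        if (read(addr) & mask) == expected:
            return True
    return False


STATUS = 0x1000  # hypothetical register address, for illustration only

print(poll_terminates(make_mmio_stub(default=0), STATUS, 0x1, 0x1))        # False
print(poll_terminates(make_mmio_stub(default=0xFFFF), STATUS, 0x1, 0x1))   # True
print(poll_terminates(make_mmio_stub(0, {STATUS: 0x1}), STATUS, 0x1, 0x1)) # True
```

No single default satisfies both "wait for bit set" and "wait for bit clear" polls, so a per-address table seeded from the documented poll sites is probably unavoidable.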
Next steps
- Rebuild a patched variant with verbose build logging; identify the exact mkimage -T rkspi rejection reason (size limit? validation check? alignment?). Two fix paths: (a) grow whatever size limit rejects the patched TPL, (b) compress trampolines into blob dead-space so the blob stays ≤ stock size and sidesteps the build pipeline entirely.
- Extend ''ddr_emu2.c'' per above.
- Pretty-print GenBook UART trace so the DDR-phase output becomes comparable across variants (offset-aligned, timestamp-normalised).
Updated files of interest
- boltzmann:~/projects/AMPere/ - build tree (TF-A, OP-TEE, u-boot, rkbin); build_uboot_*.sh now gated by spi_check.
- boltzmann:~/src/rk3588-ddr-decompiled/ - analysis artifacts, patchers, emu, spi_check.py (new).
- boltzmann:~/boltzmann-spi-backup-16M.bin - known-good UEFI dump of boltzmann's own SPI before we touch it. Mirrors at hertz:~/saving_private_boltzmann/ and meitner:~/boltzmann-spi/. SHA-256 d7a58743….
- meitner:~/ampere/ - all four GenBook SPI images (stock + 3 v3fb variants).
- meitner:~/rkbin/ - full rkbin tree + built rk3588_spl_loader_v1.19.113.bin for maskrom db. rkdeveloptool v1.32 built from github.com/rockchip-linux/rkdeveloptool installed at /usr/local/bin/rkdeveloptool (the Rockchip stock one doesn't recognise 350b PID and lacks cs).
- ohm: mothballed; meitner is the new flasher workbench.
- https://git.reauktion.de/marfrit/rk3588-ddr-analysis - source of truth (pushed over HTTPS+token; boltzmann's SSH key is mfritsche@hawking, fingerprint SHA256:LaXfAhn9IH4Hm/MF4BSCW/bxRESeijNybfdL9lNiyKc, needs to be added in Gitea Settings to enable SSH push).
Last updated: 2026-04-15 evening
