

RK3588 DDR Init Blob — Reverse Engineering & Patching

Running log of the RK3588 DDR init blob project: what's been tried, what worked, what bricked the board, and what the current state is.

Source: https://git.reauktion.de/marfrit/rk3588-ddr-analysis

Target hardware: ampere (CoolPi CM5 GenBook, RK3588 + LPDDR5)

Status 2026-04-15: v3fb patcher staged, waiting on UART cable and ampere SPI recovery to bisection-test.

Why we're doing this

The RK3588 ships with a closed-source binary blob that initialises LPDDR4/5 memory during early boot. Rockchip provides no source. The blob contains at least 20 “timeout-less” hardware poll loops — `do while` with no iteration cap — which is the community-accepted explanation for sporadic cold-boot failures on otherwise-stable hardware.

Long-term goal: produce a compilable, well-structured C version of the blob that we can fix bugs in. Short-term goal: add timeouts to the poll loops so the board fails fast instead of hanging silently.

Timeline

2026-04-02 .. 04-11: decompilation + first patcher

  • Decompiled v1.19 blob with Ghidra 11.3 on oppenheimer (CT131, x86 PVE container on data). 118 functions, ~12 kLOC.
  • Verified Synopsys DWC LPDDR5 multiPHY heritage. Most registers map to the DWC PUB databook (CalBusy, DfiStatus, MicroReset, etc.).
  • Identified 20 timeout-less polls, documented in BUG_ANALYSIS.md.
  • v1 patcher (patch_prod.py): NOP'd the backward branches of each poll. Tested in ddr_emu2 (Unicorn emulator), where it looked good.

2026-04-11: v1 bricked the board

Flashed NOP-patched blob to the GenBook's SPI flash. Cold-boot failed to bring DRAM up, entered maskrom. Required battery disconnect + rkdeveloptool-based SPI reflash with stock blob to recover.

Lesson: NOPping hardware polls on real silicon removes necessary wait time. The PHY genuinely needs those iterations to settle. A second opinion from a DDR-focused expert agent (“Mr. Claude Subagent”) confirmed the diagnosis independently.

2026-04-11 .. 04-14: v2 counted-loop trampolines

  • Rewrote the patcher (patch_timeouts.py, commit 05d0d8e): each poll site now jumps to a per-site trampoline appended at the end of the blob. The trampoline counts 16384 iterations (~91 µs at 1.8 GHz) and returns to the original error path on timeout.
  • Output: rk3588_ddr_v1.19_counted_v2.bin.
  • Design reviewed by Mr. Claude Subagent, with no objections.
  • U-Boot image built, flashed to ampere's SPI.
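
The ~91 µs figure above can be cross-checked with a few lines of arithmetic. This is only a sketch: the cost of 10 cycles per iteration is an assumption back-derived from the quoted numbers (16384 iterations, 1.8 GHz), not a measurement, and uncached MMIO loads on real silicon may well take longer.

```python
def timeout_us(iterations: int, cycles_per_iter: float, freq_hz: float) -> float:
    """Wall-clock budget of a counted poll loop, in microseconds."""
    return iterations * cycles_per_iter / freq_hz * 1e6

# ASSUMPTION: ~10 CPU cycles per poll iteration (load + test + decrement + branches).
budget = timeout_us(iterations=16384, cycles_per_iter=10, freq_hz=1.8e9)
print(f"{budget:.1f} us")
```

If the real per-iteration cost is higher, the same iteration count simply yields a longer timeout, which errs on the safe side for a PHY that needs settle time.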

2026-04-14: v2 ALSO bricked the board

This time worse: power LED did not even come on, implying the CPU crashed before the bootrom's LED-setup code ran. No UART banner, no diagnostics, nothing. Full battery disconnect + maskrom recovery needed.

At this point design review had twice approved a broken implementation. The design was correct; the implementation was not. Something about the actual encoded trampoline bytes had to be wrong.

2026-04-15: the thorough check that unearthed the bug

Rather than guess what was wrong, we went back to the bytes:

  1. For each of the 16 patch sites, pulled the original loop body from ddr_conservative_asm.s with surrounding context.
  2. Hand-disassembled each trampoline from rk3588_ddr_v1.19_counted_v2.bin (raw little-endian uint32 decode, not Ghidra).
  3. Cross-compared: does the trampoline execute the same instructions as the original loop, in the same order, producing the same CPU flags before the branch?

The answer for 9 of 16 sites: no.
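
The raw decode in step 2 can be approximated in a few lines of Python. This is a minimal sketch, not the project's actual tooling: the helper names `words` and `decode_bcond` are made up for illustration, and the four instruction words in the usage example are hand-assembled stand-ins for a poll loop, not bytes taken from the blob.

```python
import struct

COND_NAMES = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"]

def words(blob: bytes, start: int, count: int) -> list[int]:
    """Decode `count` little-endian uint32 instruction words starting at `start`."""
    return list(struct.unpack_from(f"<{count}I", blob, start))

def decode_bcond(word: int, pc: int):
    """Return (condition name, branch target) for an A64 B.cond word, else None."""
    # B.cond: bits 31:24 == 0b01010100 and bit 4 == 0
    if (word & 0xFF000010) != 0x54000000:
        return None
    imm19 = (word >> 5) & 0x7FFFF
    if imm19 & 0x40000:              # sign-extend the 19-bit offset field
        imm19 -= 0x80000
    return COND_NAMES[word & 0xF], pc + 4 * imm19

# Illustrative loop: LDR W0,[X1]; AND W0,W0,#1; CMP W0,#1; B.NE back to the LDR
blob = struct.pack("<4I", 0xB9400020, 0x12000000, 0x7100041F, 0x54FFFFA1)
loop = words(blob, 0, 4)
print(decode_bcond(loop[3], pc=0x0C))
```

Decoding the branch target by hand this way makes it easy to confirm that a trampoline's branches land where the original loop's branches did.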

The original poll pattern on those sites was:

LDR   Wx, [Xbase, #off]
AND   Wx, Wx, #mask        ; no flag update
CMP   Wx, #expected        ; sets NZCV
B.cond .retry

The v2 patcher had logic like:

test_inst = None
for off, w in site['body']:
    if off != site['load_offset']:
        test_inst = w
        break

It copied exactly one non-load instruction into the trampoline. For body=2 sites (LDR + TST; B.cond), fine — the TST was copied and the condition was valid. For body=3 sites (LDR + AND + CMP; B.cond), the AND was copied but the CMP was silently dropped.

An AND without S-suffix doesn't update flags. The trampoline's B.cond therefore tested whatever NZCV happened to be set by whatever instruction last executed before the trampoline was entered → random branch decision → CPU jumped to arbitrary offsets → crash before the bootrom LED stage.

This is a class of bug that design review cannot catch. Design review validates “is the algorithm correct?”. The algorithm WAS correct (run the poll body in a bounded loop). The bug was in the encoder: a wrong bound on how many instructions constitute “the poll body”. Only byte-level hand-verification against the source disassembly surfaces that kind of off-by-something.
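
A check of this kind can be automated. The sketch below models the v2 selection rule next to a full-body rule and asserts that the instruction right before `B.cond` sets NZCV. It makes several assumptions: the `site` dict shape is inferred from the patcher fragment quoted above, the function names are hypothetical, the instruction words are hand-assembled examples rather than blob contents, and `sets_nzcv` recognises only a few common flag-setting encodings (ADDS/SUBS/CMP/CMN, ANDS/TST/BICS), so it is not a complete decoder.

```python
def sets_nzcv(word: int) -> bool:
    """True for a few common A64 flag-setting forms (not exhaustive)."""
    group = (word >> 24) & 0x1F            # bits 28:24 select the encoding group
    s_bit = (word >> 29) & 1               # S bit of add/subtract forms
    opc = (word >> 29) & 0x3               # logical forms: 0b11 means ANDS/BICS
    if group == 0b10001 and not (word >> 23) & 1:   # add/sub (immediate)
        return bool(s_bit)
    if group == 0b01011:                            # add/sub (register forms)
        return bool(s_bit)
    if group == 0b10010 and not (word >> 23) & 1:   # logical (immediate)
        return opc == 0b11
    if group == 0b01010:                            # logical (shifted register)
        return opc == 0b11
    return False

def v2_copied(site):
    """The v2 rule: copy only the first non-load instruction."""
    for off, w in site["body"]:
        if off != site["load_offset"]:
            return [w]
    return []

def full_body_copied(site):
    """Full-body rule: copy every instruction of the loop body."""
    return [w for _, w in site["body"]]

def flags_valid(copied):
    """The last instruction before B.cond must be a flag-setting one."""
    return bool(copied) and sets_nzcv(copied[-1])

LDR, AND, CMP, TST = 0xB9400020, 0x12000000, 0x7100041F, 0x7200001F
body3 = {"load_offset": 0, "body": [(0, LDR), (4, AND), (8, CMP)]}
body2 = {"load_offset": 0, "body": [(0, LDR), (4, TST)]}
print(flags_valid(v2_copied(body2)), flags_valid(v2_copied(body3)),
      flags_valid(full_body_copied(body3)))
```

On the body=3 example the v2 rule ends on the AND, which leaves NZCV untouched, while the full-body rule ends on the CMP.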

2026-04-15: v3fb (full-body) + bisection harness

  • patch_timeouts_v3.py (commit 694be88) copies the entire loop body into each trampoline, not just one instruction. Per-site size becomes `4 * (N + 6)` bytes where `N` is body length (32 bytes for body=2, 36 for body=3).
  • New `--sites` flag: `all`, `early`, `mid`, `late`, `none`, or index list like `0,3,5-7`. Site indices stable:
      • `early` = sites 0-7, blob offsets 0x07b78..0x07f08: SGRF + PHY firmware state machine. Brick-suspect cluster.
      • `mid` = sites 8-10, 0x09124..0x0aaf8: DfiStatus / training start.
      • `late` = sites 11-15, 0x0d154..0x0d378: UctWriteProt / CalBusy.
  • Three U-Boot SPI images built on boltzmann (~/projects/AMPere/output/):
      • u-boot-rockchip-spi-midlate-fb-8mb.bin: patches sites 8-15. First flash candidate once ampere recovers. If it boots, the v2 bug was concentrated in the early cluster (expected).
      • u-boot-rockchip-spi-all-fb-8mb.bin: patches all 16. The production candidate once midlate-fb is validated.
      • u-boot-rockchip-spi-early-fb-8mb.bin: patches sites 0-7 only. Used if mid+late boots but all bricks.
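
For reference, a site selector with the syntax described above could be parsed roughly as follows. This is a sketch under assumptions: how patch_timeouts_v3.py actually parses `--sites` is not shown here, and the names `CLUSTERS` and `parse_sites` are invented for the example. Only the cluster boundaries (0-7, 8-10, 11-15) come from the list above.

```python
CLUSTERS = {
    "all": range(0, 16),
    "early": range(0, 8),
    "mid": range(8, 11),
    "late": range(11, 16),
    "none": range(0),
}

def parse_sites(spec: str, total: int = 16) -> list[int]:
    """Turn 'all', 'early', ..., or '0,3,5-7' into a sorted list of site indices."""
    spec = spec.strip()
    if spec in CLUSTERS:
        return list(CLUSTERS[spec])
    picked = set()
    for part in spec.split(","):
        lo, sep, hi = part.strip().partition("-")
        first = int(lo)
        last = int(hi) if sep else first
        if not 0 <= first <= last < total:
            raise ValueError(f"site range out of bounds: {part!r}")
        picked.update(range(first, last + 1))
    return sorted(picked)

print(parse_sites("0,3,5-7"))
```

Rejecting out-of-range indices up front matters here, since a silently ignored site would make a bisection result misleading.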

2026-04-15: pre-flash verification

Sanity checks before the next flash attempt:

  • Emulator trace diff (ddr_emu2): stock, midlate-fb, and all-fb produce byte-identical execution traces for the first 106 instructions (the reach of the emulator before it bails on unmodeled MMIO). Confirms the trampoline append does not perturb pre-site code.
  • Hand-decoded trampolines for sites 0, 1, 2: all three preserve the full original body, correctly invert the condition for `B.cond .done`, decrement `W16` correctly, and encode the right relative branch offsets back to the original return point. No encoder bugs.
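
The trace comparison itself is simple to script. The sketch below assumes each trace is a list of `(pc, instruction_word)` tuples; that format is hypothetical, since the actual output format of ddr_emu2 is not described here, and `first_divergence` is an illustrative name.

```python
from itertools import zip_longest

def first_divergence(trace_a, trace_b):
    """Index of the first differing entry, or None if the traces are identical.

    A trace that ends early counts as diverging at the shorter length.
    """
    for i, (a, b) in enumerate(zip_longest(trace_a, trace_b)):
        if a != b:
            return i
    return None

# Illustrative traces only, not real emulator output.
stock = [(0x0, 0xB9400020), (0x4, 0x12000000), (0x8, 0x7100041F)]
patched = [(0x0, 0xB9400020), (0x4, 0x12000000), (0x8, 0x14000100)]
print(first_divergence(stock, stock), first_divergence(stock, patched))
```

A result of `None` across all image pairs corresponds to the "byte-identical" outcome reported above.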

Pending: UART bisection flash plan

Once ampere is recovered from its current brick (battery disconnect + stock SPI reflash via Ohm running rkdeveloptool) and the UART cable is plugged in on ampere's debug header:

  1. Flash stock → capture UART trace (baseline).
  2. Flash midlate-fb → capture. If boots, v2 bug was in early cluster.
  3. Flash all-fb → capture. This is the production candidate.
  4. Per-cluster bisection only if needed.
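
If step 4 becomes necessary, the bisection over site indices could follow the usual halving scheme. The sketch below rests on strong assumptions: exactly one site is responsible for the brick, the failure is deterministic, and `boots` is a hypothetical callback standing in for the manual cycle of building an image with the given sites patched, flashing it, and reading the UART.

```python
def bisect_sites(sites, boots):
    """Find the single bad site, assuming exactly one site causes the brick.

    `boots(subset)` must return True if an image patched at `subset` boots.
    """
    candidates = list(sites)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if boots(half):
            # The patched half is clean, so the culprit is in the other half.
            candidates = candidates[len(candidates) // 2:]
        else:
            candidates = half
    return candidates[0]

# Simulated oracle: pretend site 5 is the one that bricks the board.
flashes = []
def fake_boots(subset):
    flashes.append(list(subset))
    return 5 not in subset

print(bisect_sites(range(8), fake_boots), len(flashes))
```

For the 8-site early cluster this needs three flashes under the single-culprit assumption; if several sites are bad, the result identifies only one of them.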

UART wiring: ampere debug header → USB-UART cable → Ohm (PineTab2) USB-A port → picocom -b 1500000 /dev/ttyUSB0.

SPI recovery ladder on Ohm (requires the original Rockchip rkdeveloptool, not the Pine64 fork):

rkdeveloptool ld                                                # confirm maskrom device
rkdeveloptool db ~/projects/AMPere/rk3588_spl_loader_v1.19.113.bin
rkdeveloptool cs 9                                              # select SPI NOR — do NOT skip
rkdeveloptool ef                                                # erase flash
rkdeveloptool wl 0 ~/projects/AMPere/u-boot-rockchip-spi-stock-8mb.bin
rkdeveloptool rd                                                # reboot

Lessons learned

  1. NOPping real-hardware polls = brick. Bounded retries only.
  2. Expert design review is necessary but not sufficient. A second opinion validates the algorithm, not the implementation.
  3. Byte-level verification against source disassembly is the cheapest intervention that catches encoder bugs. It takes an hour, costs nothing, and would have caught v2 before flashing.
  4. UART is the only signal source that's worth iterating against. Without it, each flash attempt is a 1-bit oracle that costs a screwdriver to read. The moment we have UART the iteration cycle goes from hours (brick → disconnect battery → reflash → retry) to minutes (flash → read UART → tweak → flash).

Files of interest

  • boltzmann:~/projects/AMPere/ — full build tree (TF-A, OP-TEE, u-boot, rkbin)
  • boltzmann:~/src/rk3588-ddr-decompiled/ — analysis artifacts, patchers, emu
  • ohm:~/projects/AMPere/ — recovery kit (rkdeveloptool + stock SPI image + loader)

Last updated: 2026-04-15
