RK3588 DDR Init Blob Reverse Engineering
Analysis of the closed-source Rockchip RK3588 DDR initialization binary blobs, decompiled with Ghidra on oppenheimer (Proxmox CT131 on data).
Overview
- Blob: rk3588_ddr_lp4_2112MHz_lp5_2400MHz_v1.19.bin (76,704 bytes)
- Architecture: AArch64 (64-bit ARM), runs on A76/A55 cores during early boot
- Functions: 118 decompiled, 17,308 assembly instructions
- Tools: Ghidra 11.3.2 headless, hbiyik/rkddr
- Source files: boltzmann:~/src/rk3588-ddr-decompiled/
Key Finding: 6 Bytes Control Frequency
The “fast” (2112/2400 MHz) and “conservative” (1848/2112 MHz) blobs have identical code. Only 6 bytes of timing data and 8 bytes of version string differ:
| Offset | Fast (2112/2400) | Conservative (1848/2112) | Purpose |
|---|---|---|---|
| 0x11B8C | 0x0840 | 0x0738 | LP4 frequency parameter |
| 0x11BC0 | 0x0840 | 0x0738 | LP4 frequency (channel 2) |
| 0x11BF4 | 0x6960 | 0x6840 | LP5 frequency parameter |
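This suggests that custom DDR frequencies can be set by patching just these three 16-bit values. The LP4 values are the frequency in MHz written in hex (0x0840 = 2112, 0x0738 = 1848), and the low 12 bits of the LP5 values likewise decode to 2400 (0x960) and 2112 (0x840).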
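The patch described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a tested flashing procedure: it assumes the three values are stored as little-endian 16-bit words (the table does not state a byte order), the helper names `read_fields`, `patch_fields`, and `lp5_mhz` are hypothetical, and the meaning of the top nibble of the LP5 word is unknown.

```python
import struct

# Offsets copied from the table above; the key names are illustrative only.
FREQ_OFFSETS = {
    "lp4": 0x11B8C,
    "lp4_ch2": 0x11BC0,
    "lp5": 0x11BF4,
}


def read_fields(blob: bytes) -> dict:
    """Return the raw 16-bit value at each offset (little-endian is assumed)."""
    return {name: struct.unpack_from("<H", blob, off)[0]
            for name, off in FREQ_OFFSETS.items()}


def patch_fields(blob: bytes, values: dict) -> bytes:
    """Return a copy of the blob with the given 16-bit values written in place."""
    out = bytearray(blob)
    for name, value in values.items():
        struct.pack_into("<H", out, FREQ_OFFSETS[name], value)
    return bytes(out)


def lp5_mhz(raw: int) -> int:
    """Low 12 bits of the LP5 word; the top nibble is left uninterpreted."""
    return raw & 0x0FFF


if __name__ == "__main__":
    # Synthetic stand-in with the same size as the real blob (76,704 bytes).
    fast = patch_fields(bytes(76704),
                        {"lp4": 0x0840, "lp4_ch2": 0x0840, "lp5": 0x6960})
    conservative = patch_fields(fast,
                                {"lp4": 0x0738, "lp4_ch2": 0x0738, "lp5": 0x6840})
    print(read_fields(conservative))
    print(sum(a != b for a, b in zip(fast, conservative)), "bytes differ")
```

On a real blob, `read_fields` should be run first to confirm that the offsets hold the expected values for that blob version before anything is written.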
LPDDR5 Frequency Table
Official Rockchip Blobs
| Version | LP4 Freq | LP5 Freq | LP5 Data Rate | Status |
|---|---|---|---|---|
| v1.09 - v1.15 | 2112 MHz | 2736 MHz | 5472 MT/s | Dropped after v1.15 |
| v1.16 - v1.19 | 2112 MHz | 2400 MHz | 4800 MT/s | Current default |
| v1.19 (conservative) | 1848 MHz | 2112 MHz | 4224 MT/s | Safe/stable |
Community-Achieved Frequencies
| LP5 Clock | Data Rate | BW/channel | Source | Stability |
|---|---|---|---|---|
| 2112 MHz | 4224 MT/s | 8.4 GB/s | Official conservative | Rock solid |
| 2400 MHz | 4800 MT/s | 9.6 GB/s | Official default | Stable |
| 2736 MHz | 5472 MT/s | 10.9 GB/s | Old official (v1.15) | Dropped by Rockchip, works on good modules |
| 3200 MHz | 6400 MT/s | 12.8 GB/s | Community (rkddr tool) | Requires SK Hynix rated modules |
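The Data Rate and BW/channel columns follow from the clock by simple arithmetic. The sketch below reproduces them, assuming two transfers per cycle of the listed clock and a 16-bit (2-byte) channel. The channel width is inferred from the table's figures rather than stated in the source, and the function names are illustrative.

```python
def data_rate_mts(clock_mhz: float) -> float:
    """Two transfers per cycle of the listed clock."""
    return clock_mhz * 2


def channel_bandwidth_gbs(clock_mhz: float, channel_bytes: int = 2) -> float:
    """Peak bandwidth per channel in GB/s (decimal), assuming a 16-bit channel."""
    return data_rate_mts(clock_mhz) * channel_bytes / 1000


for clock in (2112, 2400, 2736, 3200):
    print(clock, data_rate_mts(clock), round(channel_bandwidth_gbs(clock), 1))
```

These are theoretical peaks; sustained bandwidth is lower.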
JEDEC LPDDR5 Speed Grades
| Speed Grade | Data Rate | Clock | Notes |
|---|---|---|---|
| LPDDR5-3200 | 3200 MT/s | 1600 MHz | Minimum spec |
| LPDDR5-4267 | 4267 MT/s | 2133 MHz | ≈ conservative blob |
| LPDDR5-4800 | 4800 MT/s | 2400 MHz | = default blob |
| LPDDR5-5500 | 5500 MT/s | 2750 MHz | ≈ 2736 blob, TRM “optimized” |
| LPDDR5-6400 | 6400 MT/s | 3200 MHz | Max JEDEC, community OC |
MMIO Register Map
The blob accesses 79 unique hardware registers across 9 blocks:
| Address Range | Block | Registers | Purpose |
|---|---|---|---|
| 0xFD588xxx | PMU1_GRF | 1 | DDR training status |
| 0xFD598xxx | DDR_GRF_CH2 | 1 | Channel 2 config |
| 0xFD5F4/8xxx | BUS_GRF | 27 | DDR bus interconnect, AXI routing, QoS |
| 0xFD8C8xxx | SCRU | 4 | DDR PLL (DPLL) clock gate/reset/config |
| 0xFE010xxx | DDRC_CH0 | 4 | Synopsys UMCTL2 controller |
| 0xFE030xxx | FIREWALL_DDR | 1 | Memory access control |
| 0xFE050xxx | SGRF | 9 | Security - DDR region permissions |
| 0xFECC0xxx | Unknown | 4 | Possibly DDR scramble/ECC |
| 0xFF000xxx | SRAM | 1 | Boot mailbox |
Base addresses verified against RK3588 TRM Part 2 and Linux kernel DT sources.
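When reading the decompiled C, it helps to map a raw address such as `_DAT_fe030040` back to its block. The lookup below is a small sketch built only from the table above; it assumes each "xxx" range is a 4 KiB window, and the name `block_for` is hypothetical.

```python
from typing import Optional

# (base address, block name) pairs copied from the table above.
BLOCKS = [
    (0xFD588000, "PMU1_GRF"),
    (0xFD598000, "DDR_GRF_CH2"),
    (0xFD5F4000, "BUS_GRF"),
    (0xFD5F8000, "BUS_GRF"),
    (0xFD8C8000, "SCRU"),
    (0xFE010000, "DDRC_CH0"),
    (0xFE030000, "FIREWALL_DDR"),
    (0xFE050000, "SGRF"),
    (0xFECC0000, "Unknown"),
    (0xFF000000, "SRAM"),
]
WINDOW = 0x1000  # assumption: every "xxx" range spans 4 KiB


def block_for(address: int) -> Optional[str]:
    """Return the block whose window contains the address, or None."""
    for base, name in BLOCKS:
        if base <= address < base + WINDOW:
            return name
    return None


for addr in (0xFE0500E0, 0xFE030040):
    print(f"{addr:#010x} -> {block_for(addr)}")
```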
Potential Bugs
- No timeout on hardware polls: FUN_000000e4 polls SGRF status (0xFE0500E0) in a tight loop. If SGRF doesn't respond, the system hangs permanently during boot.
- Firewall opened wide: _DAT_fe030040 |= 0xffff opens all DDR firewall masters during init and never re-restricts them.
- Single-channel direct access: Only DDRC CH0 (0xFE01xxxx) is accessed directly. Channels 1-3 are configured via broadcast through BUS_GRF.
DDR Training Flow
- DDR blob runs as the TPL stage, loaded into SRAM by the BootROM during early boot
- Configures DPLL via SCRU registers (0xFD7D0000)
- Opens DDR firewall and SGRF for access
- Configures BUS_GRF (27 registers — DDR bus interconnect)
- Runs PHY training at configured frequency
- Trains 6 frequency steps (main + 5 alternatives) for DVFS
- Writes results to PMU GRF OS registers
- Linux devfreq (rockchip-dfi driver) reads these for runtime frequency scaling
Tools
- rkddr — TUI tool to edit DDR blob parameters directly on the board. Supports any frequency + ODT/drive strength. Saves to eMMC/SPI flash IDB.
- ddrbin_tool (in rkbin/tools/) — Rockchip's official blob configuration tool.
- Manual patching — Change 6 bytes in data section as documented above.
- Device tree overlay: rockchip-rk3588-dmc-oc-3500mhz enables frequency steps up to 3200 MHz for devfreq.
Practical: Overclocking DDR on Rock 5 ITX+
Check your DRAM module:
cat /sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc/available_frequencies
- SK Hynix LPDDR5 modules are rated for 6400 MT/s — safe to try 2736 or 3200
- Samsung varies — some 5500, some 6400
- Micron — typically 5500 MT/s max
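The sysfs file read by the `cat` command above can also be parsed from a script, for example to log the available steps before and after swapping blobs. This sketch assumes the devfreq node lists frequencies in Hz separated by whitespace; the path is the one from the command above, and `parse_frequencies_mhz` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Path taken from the command above.
DMC_FREQS = Path(
    "/sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc/available_frequencies"
)


def parse_frequencies_mhz(text: str) -> List[int]:
    """Convert a whitespace-separated list of Hz values to sorted MHz values."""
    return sorted(int(token) // 1_000_000 for token in text.split())


if __name__ == "__main__":
    if DMC_FREQS.exists():
        print(parse_frequencies_mhz(DMC_FREQS.read_text()))
    else:
        print("devfreq node not found; is the DMC driver loaded?")
```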
Recommendation: try the old v1.15 blob (2736 MHz) first. If stable, use rkddr for 3200 MHz with stress testing (stressapptest).
Impact for LLM inference: raising the LP5 clock from 2400 to 3200 MHz gives about 33% more peak memory bandwidth, so memory-bound workloads can gain up to a proportional amount in tok/s.
Files
All analysis files on boltzmann:~/src/rk3588-ddr-decompiled/:
- ddr_decompiled.c: Decompiled C (fast blob, 118 functions, 11,923 lines)
- ddr_conservative_decompiled.c: Decompiled C (conservative blob)
- ddr_diff.txt: Diff between fast and conservative
- ddr_fast_asm.s / ddr_conservative_asm.s: Full disassembly (17,308 lines each)
- rk3588_ddr.h: Register definitions header (TRM-verified)
- rk3588_regs_annotated.h: All 79 MMIO registers with block annotations
- DDR_FREQUENCY_TABLE.md: Complete frequency table
- ANALYSIS.md: Full analysis report
- Ghidra project on oppenheimer (CT131 on data): /opt/work/ghidra_project/
Generated by Claude Code, 2026-04-03
What is DDR Training?
DDR training is the calibration process where the memory controller and PHY find the optimal timing window to reliably communicate with DRAM chips. At 2400-3200 MHz (4800-6400 MT/s), signal integrity is the primary challenge.
Why Training is Needed
Electrical signals on PCB traces experience:
- Propagation delay: different trace lengths mean different arrival times
- Crosstalk: adjacent signals interfere with each other
- ISI (inter-symbol interference): previous bit values affect the shape of the current bit
- PVT variation: process, voltage, and temperature shift timing
- Impedance mismatch: causes reflections that distort signals
Training compensates by finding the “eye” — the timing/voltage window where data is reliably captured — for each signal individually.
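The idea of "finding the eye" can be shown with a one-dimensional sketch: sweep a delay setting, record pass/fail at each tap, and pick the center of the widest passing run. This is a generic illustration, not the algorithm used by the blob or the PHY firmware (which also scans voltage); `find_eye` and the sample sweep are hypothetical.

```python
def find_eye(passes):
    """Return (start, end, center) of the longest run of passing taps, or None."""
    best = None
    run_start = None
    # A trailing False acts as a sentinel so a run reaching the last tap is closed.
    for i, ok in enumerate(list(passes) + [False]):
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            run = (run_start, i - 1)
            if best is None or run[1] - run[0] > best[1] - best[0]:
                best = run
            run_start = None
    if best is None:
        return None
    start, end = best
    return start, end, (start + end) // 2


# Made-up sweep: taps 3..11 pass, plus a short spurious window at taps 14..15.
sweep = [False] * 3 + [True] * 9 + [False] * 2 + [True] * 2 + [False]
print(find_eye(sweep))
```

Centering in the widest window, rather than taking the first passing tap, leaves margin on both sides for later drift.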
Training Stages (from decompiled code)
The RK3588 uses a Synopsys DWC (DesignWare Core) LPDDR5/4X multiPHY. The training sequence in the blob:
- ZQ Calibration: Calibrates output driver impedance. Polls CalBusy (PHY offset 0x684, 11 uses in code).
- Write Leveling: Aligns DQS strobe with clock at the DRAM. Loops over 16 DQ bits.
- Read Gate Training: Finds correct time to capture read data. Polls DfiStatus (offset 0xA24, 65 uses, the most-used register).
- Read/Write DQ Training: Per-bit timing adjustment using patterns 0xAA55AA55/0x55AA55AA (written to PHY offsets 0x93C-0x970).
- Eye Training: Scans delay+voltage range for maximum margin. The “eyescan” blob variant does extended analysis.
- VREF Training: Finds optimal voltage threshold. Uses PHY offsets 0x600/0x608/0x60C (67 combined uses).
- CA Training: Calibrates command/address bus timing.
Why Training Runs Every Boot
Results depend on temperature (shifts ~1-2 ps/°C), voltage, DRAM internal state, and component aging. Results are stored in SRAM (0x001FE000) and passed to the kernel via PMU GRF for DVFS.
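To put the ~1-2 ps/°C figure in context, it can be compared with the duration of one bit (the unit interval) at a given data rate. The sketch below does that arithmetic; the 40 °C temperature swing is a hypothetical value chosen for illustration, and the function names are not from the source.

```python
def unit_interval_ps(data_rate_mts: float) -> float:
    """Duration of one bit on a data line, in picoseconds."""
    return 1e6 / data_rate_mts


def drift_fraction(data_rate_mts: float, drift_ps_per_c: float, delta_c: float) -> float:
    """Timing drift over a temperature change, as a fraction of one unit interval."""
    return drift_ps_per_c * delta_c / unit_interval_ps(data_rate_mts)


for rate in (4800, 6400):
    ui = unit_interval_ps(rate)
    low = drift_fraction(rate, 1.0, 40.0)   # hypothetical 40 degC swing
    high = drift_fraction(rate, 2.0, 40.0)
    print(f"{rate} MT/s: UI = {ui:.2f} ps, drift = {low:.0%} to {high:.0%} of UI")
```

The same absolute drift consumes a larger share of the bit time as the data rate rises, which is one reason stale training results become less usable at higher frequencies.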
Bug Analysis
CRITICAL: 20 Timeout-less Hardware Polls
The most serious bug class: do {} while loops polling hardware registers indefinitely. If hardware doesn't respond, the system hangs permanently during boot.
| Register | Address / PHY Offset | Polls | Waits For |
|---|---|---|---|
| SGRF_DDR_STATUS | 0xFE0500E0 | 1 | Security GRF ready |
| SGRF_DDR_CON21 | 0xFE050054 | 2 | SGRF config done |
| DfiStatus | +0xA24 | 4 | DFI interface ready |
| MicroContMuxSel | +0x10090 | 4 | PHY firmware mailbox |
| MicroReset | +0x10080 | 2 | PHY firmware reset |
| UctWriteProtShadow | +0x10514 | 5 | Training status |
| CalBusy | +0x684 | 1 | ZQ calibration |
Impact: If a polled condition never arrives (for example on a cold boot, at extreme temperatures, or with power supply issues during training), the board hangs with no error reported.
Fix: Add timeout counters. The code already has error return paths (23 instances of return 0xFFFFFFFF) — the polls just don't use them.
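The shape of such a fix can be modeled in a few lines. This is a Python model of the control flow only, since the actual change would have to be made in the blob's AArch64 code; `poll_until`, the `read_reg` callback, and the poll budget are hypothetical, while 0xFFFFFFFF is the error value the existing return paths already use.

```python
POLL_TIMEOUT = 0xFFFFFFFF  # error value already used by the blob's error paths


def poll_until(read_reg, mask, expected, max_polls=100_000):
    """Poll read_reg() until (value & mask) == expected.

    Returns 0 on success, or POLL_TIMEOUT once the poll budget is exhausted,
    instead of spinning forever like a bare do {} while loop.
    """
    for _ in range(max_polls):
        if (read_reg() & mask) == expected:
            return 0
    return POLL_TIMEOUT


# Simulated register that reports ready on the fourth read.
reads = iter([0, 0, 0, 1])
print(poll_until(lambda: next(reads), 0x1, 0x1))

# Simulated register that never becomes ready.
print(hex(poll_until(lambda: 0, 0x1, 0x1, max_polls=10)))
```

A real implementation would more likely bound the wait by elapsed time than by iteration count, so the limit does not change with CPU clock speed.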
WARNING: Firewall Left Open on Error
ddr_open_firewall() grants all bus masters DDR access (FW_DDR |= 0xFFFF). The matching close may not be called on all error paths.
WARNING: No Selective Retry
Training failure restarts the entire sequence from scratch. There is no selective retry (e.g., “only redo read gate training”), so each failure costs a full retrain (~100 ms).
Code Metrics
| Metric | Value |
|---|---|
| Total lines | 11,977 |
| Functions | 118 |
| Loops | ~341 |
| Branches | ~1,725 |
| MMIO registers | 79 |
| Error returns | 23 / 1,405 checks (1.6%) |
| Most-used PHY register | DfiStatus (0xA24): 65 uses |
Synopsys DWC PHY Training Sequence
The RK3588 uses a Synopsys DWC LPDDR5/4X multiPHY (DWC_LPDDR54_PHY). The training stages map to specific register offsets found in the decompiled code:
| PHY Offset | Synopsys Name | Stage | Uses in Code |
|---|---|---|---|
| +0x684 | CalBusy | ZQ Calibration | 11 |
| +0xA24 | DfiStatus | DFI ready / gate training | 65 |
| +0x600/608/60C | VrefDAC | VREF training | 67 |
| +0x10080 | MicroReset | PHY firmware control | 13 |
| +0x10090 | MicroContMuxSel | Firmware ↔ APB mux | many |
| +0x10180 | AcsmPlayback | CA training | 26 |
| +0x10514 | UctWriteProtShadow | Training complete status | 28 |
The 0xAA55AA55 Training Pattern
Written to PHY offsets 0x93C-0x970, this alternating bit pattern maximizes switching noise and crosstalk — the worst-case scenario for signal integrity testing. Variations (0xAAAA5555, 0x55AA55AA) stress different inter-bit coupling scenarios on the PCB.
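Two properties of these patterns are easy to check numerically: how densely neighbouring bits alternate within a word, and that 0x55AA55AA is the bitwise inverse of 0xAA55AA55. The sketch below counts adjacent-bit toggles inside each 32-bit word; how a word maps onto DQ lanes and burst beats is not known from the source, so this only describes the bit patterns themselves. The helper names are hypothetical.

```python
def adjacent_toggles(word: int, width: int = 32) -> int:
    """Count positions i where bit i differs from bit i+1 within the word."""
    mask = (1 << (width - 1)) - 1  # only pairs (i, i+1) that lie inside the word
    return bin((word ^ (word >> 1)) & mask).count("1")


def complement32(word: int) -> int:
    """Bitwise inverse restricted to 32 bits."""
    return ~word & 0xFFFFFFFF


for pattern in (0xAA55AA55, 0x55AA55AA, 0xAAAA5555):
    print(f"{pattern:#010x} {pattern:032b} toggles={adjacent_toggles(pattern)}")

print(hex(complement32(0xAA55AA55)))
```

Using a pattern together with its inverse exercises every bit position in both the 0 and 1 state.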
Community Research
- Why 2736 MHz was dropped: Narrow PHY eye margins across varying DRAM batches (SK Hynix vs Samsung vs Micron)
- v1.18 single-rank LPDDR5 crash: Incorrect derate timing for MR4 on single-rank configs caused DVFS hangs
- Cold boot failures: Consistent with 20 timeout-less polls found in this analysis
- LPDDR5 bandwidth paradox: LPDDR5 showed worse latency than LPDDR4X at same data rates due to WCK synchronization overhead
- No open-source DDR init planned: Collabora confirmed Rockchip has “no plan” for open-sourcing DDR training
Full research with 40+ sources in boltzmann:~/src/rk3588-ddr-decompiled/COMMUNITY_RESEARCH.md
