RK3588 DDR Init Blob Reverse Engineering

Analysis of the closed-source Rockchip RK3588 DDR initialization binary blobs, decompiled with Ghidra on oppenheimer (Proxmox CT131 on data).

Overview

  • Blob: rk3588_ddr_lp4_2112MHz_lp5_2400MHz_v1.19.bin (76,704 bytes)
  • Architecture: AArch64 (64-bit ARM), runs on A76/A55 cores during early boot
  • Functions: 118 decompiled, 17,308 assembly instructions
  • Tools: Ghidra 11.3.2 headless, hbiyik/rkddr
  • Source files: boltzmann:~/src/rk3588-ddr-decompiled/

Key Finding: 6 Bytes Control Frequency

The “fast” (2112/2400 MHz) and “conservative” (1848/2112 MHz) blobs have identical code. Only 6 bytes of timing data (three 16-bit values) and 8 bytes of version string differ:

Offset | Fast (2112/2400) | Conservative (1848/2112) | Purpose
0x11B8C | 0x0840 | 0x0738 | LP4 frequency parameter
0x11BC0 | 0x0840 | 0x0738 | LP4 frequency (channel 2)
0x11BF4 | 0x6960 | 0x6840 | LP5 frequency parameter

This means custom DDR frequencies can be set by patching just these bytes.
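A minimal sketch of such a patch, under two assumptions that the table suggests but does not confirm: the values are little-endian 16-bit words holding the clock in MHz (0x0840 = 2112, 0x0738 = 1848), and the top nibble of the LP5 word (0x6 in both blobs) is a separate field that must be preserved. `patch_frequencies` is a hypothetical helper, not part of rkddr or ddrbin_tool. It leaves the version string alone, and any integrity check applied later in the boot chain is outside its scope.

```python
import struct

# Offsets from the table above (v1.19 blob, 76,704 bytes).
LP4_OFFSETS = (0x11B8C, 0x11BC0)
LP5_OFFSET = 0x11BF4
BLOB_SIZE = 76_704


def patch_frequencies(blob: bytes, lp4_mhz: int, lp5_mhz: int) -> bytes:
    """Return a copy of the blob with new LP4/LP5 frequency words."""
    if len(blob) != BLOB_SIZE:
        raise ValueError(f"unexpected blob size {len(blob)}; offsets may not apply")
    if not (0 < lp4_mhz <= 0xFFFF) or not (0 < lp5_mhz <= 0x0FFF):
        raise ValueError("frequency does not fit the assumed field width")

    out = bytearray(blob)
    # ASSUMPTION: little-endian 16-bit values holding the clock in MHz.
    for offset in LP4_OFFSETS:
        struct.pack_into("<H", out, offset, lp4_mhz)

    # ASSUMPTION: low 12 bits are MHz; the top nibble is kept unchanged.
    (old_lp5,) = struct.unpack_from("<H", out, LP5_OFFSET)
    struct.pack_into("<H", out, LP5_OFFSET, (old_lp5 & 0xF000) | lp5_mhz)
    return bytes(out)
```

Calling `patch_frequencies(fast_blob, 1848, 2112)` on the fast blob should reproduce the conservative column of the table if the assumptions hold, which makes the two official blobs a convenient sanity check before trying other values.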

LPDDR5 Frequency Table

Official Rockchip Blobs

Version | LP4 Freq | LP5 Freq | LP5 Data Rate | Status
v1.09 - v1.15 | 2112 MHz | 2736 MHz | 5472 MT/s | Dropped after v1.15
v1.16 - v1.19 | 2112 MHz | 2400 MHz | 4800 MT/s | Current default
v1.19 (conservative) | 1848 MHz | 2112 MHz | 4224 MT/s | Safe/stable

Community-Achieved Frequencies

LP5 Clock | Data Rate | BW/channel | Source | Stability
2112 MHz | 4224 MT/s | 8.4 GB/s | Official conservative | Rock solid
2400 MHz | 4800 MT/s | 9.6 GB/s | Official default | Stable
2736 MHz | 5472 MT/s | 10.9 GB/s | Old official (v1.15) | Dropped by Rockchip, works on good modules
3200 MHz | 6400 MT/s | 12.8 GB/s | Community (rkddr tool) | Requires SK Hynix rated modules
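The data rate and bandwidth columns follow from the clock by simple arithmetic. The sketch below assumes a 16-bit (2-byte) channel, which is what the table's ratio of 9.6 GB/s to 4800 MT/s implies, and uses the table's own convention that the data rate is twice the listed clock.

```python
def lp5_bandwidth_gb_s(clock_mhz: float, bus_bytes: int = 2) -> float:
    """Peak per-channel bandwidth in GB/s for a listed LP5 clock in MHz."""
    data_rate_mt_s = 2 * clock_mhz            # double data rate: two transfers per clock
    return data_rate_mt_s * bus_bytes / 1000  # MB/s -> GB/s


for clock in (2112, 2400, 2736, 3200):
    print(f"{clock} MHz -> {lp5_bandwidth_gb_s(clock):.1f} GB/s per channel")
```

These are theoretical peaks; sustained bandwidth will be lower because of refresh, bank conflicts and read/write turnaround.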

JEDEC LPDDR5 Speed Grades

Speed Grade | Data Rate | Clock | Notes
LPDDR5-3200 | 3200 MT/s | 1600 MHz | Minimum spec
LPDDR5-4267 | 4267 MT/s | 2133 MHz | ≈ conservative blob
LPDDR5-4800 | 4800 MT/s | 2400 MHz | = default blob
LPDDR5-5500 | 5500 MT/s | 2750 MHz | ≈ 2736 blob, TRM “optimized”
LPDDR5-6400 | 6400 MT/s | 3200 MHz | Max JEDEC, community OC

MMIO Register Map

The blob accesses 79 unique hardware registers across 9 blocks:

Address Range | Block | Registers | Purpose
0xFD588xxx | PMU1_GRF | 1 | DDR training status
0xFD598xxx | DDR_GRF_CH2 | 1 | Channel 2 config
0xFD5F4/8xxx | BUS_GRF | 27 | DDR bus interconnect, AXI routing, QoS
0xFD8C8xxx | SCRU | 4 | DDR PLL (DPLL) clock gate/reset/config
0xFE010xxx | DDRC_CH0 | 4 | Synopsys UMCTL2 controller
0xFE030xxx | FIREWALL_DDR | 1 | Memory access control
0xFE050xxx | SGRF | 9 | Security - DDR region permissions
0xFECC0xxx | Unknown | 4 | Possibly DDR scramble/ECC
0xFF000xxx | SRAM | 1 | Boot mailbox

Base addresses verified against RK3588 TRM Part 2 and Linux kernel DT sources.
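When reading the decompiled C, it helps to map Ghidra's absolute-address symbols (such as `_DAT_fe030040`) back to a block in this table. The lookup below is a small sketch. It assumes each "xxx" range is a 4 KiB window and that "0xFD5F4/8xxx" means the two windows 0xFD5F4xxx and 0xFD5F8xxx. The helper names are made up for illustration.

```python
from typing import Optional

WINDOW = 0x1000  # ASSUMPTION: "xxx" in the table denotes a 4 KiB window

# Base addresses and block names copied from the register map above.
BLOCKS = [
    (0xFD588000, "PMU1_GRF"),
    (0xFD598000, "DDR_GRF_CH2"),
    (0xFD5F4000, "BUS_GRF"),
    (0xFD5F8000, "BUS_GRF"),
    (0xFD8C8000, "SCRU"),
    (0xFE010000, "DDRC_CH0"),
    (0xFE030000, "FIREWALL_DDR"),
    (0xFE050000, "SGRF"),
    (0xFECC0000, "Unknown"),
    (0xFF000000, "SRAM"),
]


def block_for(addr: int) -> Optional[str]:
    """Return the block name for an MMIO address, or None if unmapped."""
    for base, name in BLOCKS:
        if base <= addr < base + WINDOW:
            return name
    return None


def block_for_symbol(symbol: str) -> Optional[str]:
    """Map a Ghidra-style symbol like '_DAT_fe030040' to its block."""
    return block_for(int(symbol.rsplit("_", 1)[1], 16))


print(block_for_symbol("_DAT_fe030040"))
```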

Potential Bugs

  1. No timeout on hardware polls: FUN_000000e4 polls SGRF status (0xFE0500E0) in a tight loop. If SGRF doesn't respond, the system hangs permanently during boot.
  2. Firewall opened wide: _DAT_fe030040 |= 0xffff opens all DDR firewall masters during init and never re-restricts them.
  3. Single-channel direct access: Only DDRC CH0 (0xFE01xxxx) is accessed directly. Channels 1-3 are configured via broadcast through BUS_GRF.

DDR Training Flow

  1. DDR blob is loaded by the BootROM and runs as the TPL stage during early boot
  2. Configures DPLL via SCRU registers (0xFD7D0000)
  3. Opens DDR firewall and SGRF for access
  4. Configures BUS_GRF (27 registers — DDR bus interconnect)
  5. Runs PHY training at configured frequency
  6. Trains 6 frequency steps (main + 5 alternatives) for DVFS
  7. Writes results to PMU GRF OS registers
  8. Linux devfreq (rockchip-dfi driver) reads these for runtime frequency scaling

Tools

  • rkddr — TUI tool to edit DDR blob parameters directly on the board. Supports any frequency + ODT/drive strength. Saves to eMMC/SPI flash IDB.
  • ddrbin_tool (in rkbin/tools/) — Rockchip's official blob configuration tool.
  • Manual patching — Change 6 bytes in data section as documented above.
  • Device tree overlay rockchip-rk3588-dmc-oc-3500mhz enables frequency steps up to 3200 MHz for devfreq.

Practical: Overclocking DDR on Rock 5 ITX+

List the DMC frequencies the running kernel exposes, then check your DRAM vendor against the notes below:

cat /sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc/available_frequencies
  • SK Hynix LPDDR5 modules are rated for 6400 MT/s — safe to try 2736 or 3200
  • Samsung varies — some 5500, some 6400
  • Micron — typically 5500 MT/s max
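The sysfs file read by the command above holds whitespace-separated frequencies in Hz, as devfreq does for all devices. A short sketch that converts them to MHz follows. The path is the one quoted above and may differ between kernels, so the script only reads it if it exists.

```python
from pathlib import Path

# Path taken from the command above; it depends on the kernel in use.
FREQ_FILE = Path(
    "/sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc/available_frequencies"
)


def parse_frequencies_mhz(text: str) -> list[int]:
    """Convert devfreq's whitespace-separated Hz values to sorted MHz."""
    return sorted(int(hz) // 1_000_000 for hz in text.split())


if FREQ_FILE.exists():
    print(parse_frequencies_mhz(FREQ_FILE.read_text()))
```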

Recommendation: try the old v1.15 blob (2736 MHz) first. If stable, use rkddr for 3200 MHz with stress testing (stressapptest).

Impact for LLM inference: 2400→3200 MHz is ~33% more memory bandwidth, which gives up to a proportional tok/s improvement on memory-bound workloads.
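A back-of-envelope estimate of that effect: in memory-bound token generation, each token needs roughly one full pass over the model weights, so tokens per second is bounded by bandwidth divided by model size. The sketch assumes four 16-bit channels (matching channels 0-3 mentioned earlier) and a hypothetical 4 GB quantized model. Both numbers are illustrative, not measurements.

```python
def peak_tokens_per_s(clock_mhz: float, model_gb: float, channels: int = 4) -> float:
    """Upper bound on tok/s when every token reads all weights once."""
    # 2 transfers per clock, 2 bytes per transfer per channel (ASSUMPTION: 16-bit channels)
    bandwidth_gb_s = 2 * clock_mhz * 2 * channels / 1000
    return bandwidth_gb_s / model_gb


MODEL_GB = 4.0  # hypothetical model size, for illustration only
base = peak_tokens_per_s(2400, MODEL_GB)
oc = peak_tokens_per_s(3200, MODEL_GB)
print(f"{base:.1f} -> {oc:.1f} tok/s upper bound ({oc / base - 1:.0%} gain)")
```

Real throughput sits below this bound, but the ratio between the two clocks is what carries over.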

Files

All analysis files on boltzmann:~/src/rk3588-ddr-decompiled/:

  • ddr_decompiled.c — Decompiled C (fast blob, 118 functions, 11,923 lines)
  • ddr_conservative_decompiled.c — Decompiled C (conservative blob)
  • ddr_diff.txt — Diff between fast and conservative
  • ddr_fast_asm.s / ddr_conservative_asm.s — Full disassembly (17,308 lines each)
  • rk3588_ddr.h — Register definitions header (TRM-verified)
  • rk3588_regs_annotated.h — All 79 MMIO registers with block annotations
  • DDR_FREQUENCY_TABLE.md — Complete frequency table
  • ANALYSIS.md — Full analysis report
  • Ghidra project on oppenheimer (CT131 on data): /opt/work/ghidra_project/

Generated by Claude Code, 2026-04-03

What is DDR Training?

DDR training is the calibration process where the memory controller and PHY find the optimal timing window to reliably communicate with DRAM chips. At 2400-3200 MHz (4800-6400 MT/s), signal integrity is the primary challenge.

Why Training is Needed

Electrical signals on PCB traces experience:

  • Propagation delay — different trace lengths = different arrival times
  • Crosstalk — adjacent signals interfere
  • ISI (inter-symbol interference) — previous bit values affect the shape of the current bit
  • PVT variation — process, voltage, temperature shift timing
  • Impedance mismatch — causes reflections that distort signals

Training compensates by finding the “eye” — the timing/voltage window where data is reliably captured — for each signal individually.
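Conceptually, finding the eye in one dimension means sweeping a delay setting, recording pass/fail at each tap, and centring on the widest passing run. The function below is a generic sketch of that idea, not the algorithm used by the blob or the PHY firmware.

```python
def widest_passing_window(results):
    """Return (start, end, center) of the longest run of passing taps, or None.

    `results` is a sequence of booleans, one per delay tap (True = data
    was captured correctly at that tap).
    """
    best = None
    start = None
    # A trailing False acts as a sentinel that closes a run ending at the last tap.
    for i, ok in enumerate(list(results) + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if best is None or i - start > best[1] - best[0] + 1:
                best = (start, i - 1)
            start = None
    if best is None:
        return None
    return best[0], best[1], (best[0] + best[1]) // 2


sweep = [False, False, True, True, True, True, True, False, True, False]
print(widest_passing_window(sweep))
```

Picking the centre rather than the first passing tap leaves equal margin on both sides, which is what lets the setting survive later temperature and voltage drift.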

Training Stages (from decompiled code)

The RK3588 uses a Synopsys DWC (DesignWare Core) LPDDR5/4X multiPHY. The training sequence in the blob:

  1. ZQ Calibration — Calibrates output driver impedance. Polls CalBusy (PHY offset 0x684, 11 uses in code).
  2. Write Leveling — Aligns DQS strobe with clock at the DRAM. Loops over 16 DQ bits.
  3. Read Gate Training — Finds correct time to capture read data. Polls DfiStatus (offset 0xA24, 65 uses — most-used register).
  4. Read/Write DQ Training — Per-bit timing adjustment using patterns 0xAA55AA55 / 0x55AA55AA (written to PHY offsets 0x93C-0x970).
  5. Eye Training — Scans delay+voltage range for maximum margin. The “eyescan” blob variant does extended analysis.
  6. VREF Training — Finds optimal voltage threshold. Uses PHY offsets 0x600/0x608/0x60C (67 combined uses).
  7. CA Training — Calibrates command/address bus timing.

Why Training Runs Every Boot

Results depend on temperature (shifts ~1-2 ps/°C), voltage, DRAM internal state, and component aging. Results are stored in SRAM (0x001FE000) and passed to the kernel via PMU GRF for DVFS.
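To see why a drift of 1-2 ps/°C matters, compare it with the unit interval (the time one bit occupies on the wire). The worked example below uses the drift figure quoted above and a hypothetical 40 °C swing between a cold boot and a warm system.

```python
def unit_interval_ps(data_rate_mt_s: float) -> float:
    """Duration of one transfer in picoseconds."""
    return 1e6 / data_rate_mt_s


def drift_fraction(data_rate_mt_s: float, delta_t_c: float, ps_per_c: float) -> float:
    """Timing drift over a temperature swing, as a fraction of the unit interval."""
    return delta_t_c * ps_per_c / unit_interval_ps(data_rate_mt_s)


DELTA_T = 40  # hypothetical temperature swing in degrees C
for rate in (4800, 6400):
    low = drift_fraction(rate, DELTA_T, 1.0)
    high = drift_fraction(rate, DELTA_T, 2.0)
    print(f"{rate} MT/s: UI = {unit_interval_ps(rate):.2f} ps, drift = {low:.0%}-{high:.0%} of UI")
```

At 6400 MT/s the unit interval is 156.25 ps, so a drift of a few tens of picoseconds is a large share of the available window.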

Bug Analysis

CRITICAL: 20 Timeout-less Hardware Polls

The most serious bug class: do {} while loops polling hardware registers indefinitely. If hardware doesn't respond, the system hangs permanently during boot.

Register | Address / PHY Offset | Polls | Waits For
SGRF_DDR_STATUS | 0xFE0500E0 | 1 | Security GRF ready
SGRF_DDR_CON21 | 0xFE050054 | 2 | SGRF config done
DfiStatus | +0xA24 | 4 | DFI interface ready
MicroContMuxSel | +0x10090 | 4 | PHY firmware mailbox
MicroReset | +0x10080 | 2 | PHY firmware reset
UctWriteProtShadow | +0x10514 | 5 | Training status
CalBusy | +0x684 | 1 | ZQ calibration

Impact: Cold boot failures, hangs at extreme temperatures, power supply issues during training.

Fix: Add timeout counters. The code already has error return paths (23 instances of return 0xFFFFFFFF) — the polls just don't use them.
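The shape of that fix, sketched in Python for readability (the real change would be made in the blob's AArch64 code): bound the loop and return the existing error value when the bound is hit. `read_reg` stands in for an MMIO read, and the iteration limit is an arbitrary placeholder, not a value derived from the hardware.

```python
DDR_ERR = 0xFFFFFFFF  # error value the blob already returns elsewhere


def poll_with_timeout(read_reg, mask, expected, max_iterations=100_000):
    """Poll until (read_reg() & mask) == expected, giving up after a bound.

    Returns 0 on success and DDR_ERR on timeout, so callers can reuse
    the error paths that already exist.
    """
    for _ in range(max_iterations):
        if (read_reg() & mask) == expected:
            return 0
    return DDR_ERR


# Simulated register that reports "ready" (bit 0 set) on the third read.
reads = iter([0x0, 0x0, 0x1])
print(poll_with_timeout(lambda: next(reads), mask=0x1, expected=0x1))

# Simulated register that never becomes ready.
print(hex(poll_with_timeout(lambda: 0x0, mask=0x1, expected=0x1, max_iterations=1000)))
```

In firmware the bound would normally be tied to a timer rather than an iteration count, so that the timeout does not change with CPU clock speed.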

WARNING: Firewall Left Open on Error

ddr_open_firewall() grants all bus masters DDR access (FW_DDR |= 0xFFFF). The matching close may not be called on all error paths.
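One way to make the close unconditional is to pair open and restore in a single scope that runs on every exit path. The Python sketch below illustrates the pattern with a dictionary standing in for the register; in C the equivalent is a single cleanup label that every error path jumps to. The names `firewall_open` and `regs` are invented for illustration.

```python
from contextlib import contextmanager

FW_OPEN_MASK = 0xFFFF  # mask used by the blob when opening the firewall


@contextmanager
def firewall_open(regs, key="FW_DDR"):
    """Open the firewall for the duration of a block, then restore it."""
    saved = regs[key]
    regs[key] = saved | FW_OPEN_MASK
    try:
        yield
    finally:
        # Runs on normal exit and on every error path.
        regs[key] = saved


regs = {"FW_DDR": 0x0000}
try:
    with firewall_open(regs):
        raise RuntimeError("simulated training failure")
except RuntimeError:
    pass
print(hex(regs["FW_DDR"]))
```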

WARNING: No Selective Retry

Training failure restarts the entire sequence from scratch. There is no selective retry (e.g., “only redo read gate training”), so each failure costs a full retrain (~100 ms).
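A selective retry could look like the sketch below: each stage is retried on its own a bounded number of times before the whole run is declared failed. This is a generic illustration with made-up stage names. It also assumes a stage can be re-run in isolation, which may not hold if a stage depends on state that a failed attempt has disturbed.

```python
def run_training(stages, max_retries=3):
    """Run stages in order, retrying only the stage that failed.

    `stages` is a list of (name, callable) pairs; each callable returns
    True on success. Returns (ok, attempts) where attempts maps each
    stage name to the number of tries it needed.
    """
    attempts = {}
    for name, stage in stages:
        for attempt in range(1, max_retries + 1):
            attempts[name] = attempt
            if stage():
                break
        else:
            return False, attempts  # stage never passed within the retry budget
    return True, attempts


# Simulated stages: read gate training fails once, then passes.
gate_results = iter([False, True])
stages = [
    ("zq_calibration", lambda: True),
    ("read_gate", lambda: next(gate_results)),
    ("vref", lambda: True),
]
print(run_training(stages))
```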

Code Metrics

Metric | Value
Total lines | 11,977
Functions | 118
Loops | ~341
Branches | ~1,725
MMIO registers | 79
Error returns | 23 / 1,405 checks (1.6%)
PHY register uses | DfiStatus (0xA24): 65 uses (most frequent)

Synopsys DWC PHY Training Sequence

The RK3588 uses a Synopsys DWC LPDDR5/4X multiPHY (DWC_LPDDR54_PHY). The training stages map to specific register offsets found in the decompiled code:

PHY Offset | Synopsys Name | Stage | Uses in Code
+0x684 | CalBusy | ZQ Calibration | 11
+0xA24 | DfiStatus | DFI ready / gate training | 65
+0x600/608/60C | VrefDAC | VREF training | 67
+0x10080 | MicroReset | PHY firmware control | 13
+0x10090 | MicroContMuxSel | Firmware ↔ APB mux | many
+0x10180 | AcsmPlayback | CA training | 26
+0x10514 | UctWriteProtShadow | Training complete status | 28
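Use counts like these can be reproduced by scanning the decompiled C for the offsets. The sketch below assumes accesses appear in the form `base + 0x...`, which is a common shape in Ghidra output but is not guaranteed for every access in ddr_decompiled.c. The sample text is synthetic.

```python
import re
from collections import Counter

# Offsets and names from the table above (single-offset rows only).
PHY_OFFSETS = {
    0x684: "CalBusy",
    0xA24: "DfiStatus",
    0x10080: "MicroReset",
    0x10090: "MicroContMuxSel",
    0x10180: "AcsmPlayback",
    0x10514: "UctWriteProtShadow",
}


def count_offset_uses(source: str) -> Counter:
    """Count '+ 0x<offset>' occurrences for the known PHY offsets."""
    counts = Counter()
    # ASSUMPTION: offsets show up as an addition of a hex literal.
    for match in re.finditer(r"\+\s*0x([0-9a-fA-F]+)\b", source):
        name = PHY_OFFSETS.get(int(match.group(1), 16))
        if name:
            counts[name] += 1
    return counts


sample = """
x = *(uint *)(base + 0xa24);
do { } while ((*(uint *)(base + 0x684) & 1) != 0);
y = *(int *)(base + 0xA24) + 0x10;
"""
print(count_offset_uses(sample))
```

A plain text match also counts offsets that happen to coincide with unrelated constants, so results are best treated as an upper bound.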

The 0xAA55AA55 Training Pattern

Written to PHY offsets 0x93C-0x970, this alternating bit pattern maximizes switching noise and crosstalk — the worst-case scenario for signal integrity testing. Variations (0xAAAA5555, 0x55AA55AA) stress different inter-bit coupling scenarios on the PCB.
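The structure of these words can be checked directly: 0x55AA55AA is the bitwise complement of 0xAA55AA55, and all three patterns alternate on almost every adjacent bit. The sketch below only inspects the 32-bit words. How their bits map onto DQ lanes and burst beats is not known from the decompiled code, so it says nothing about the actual on-wire sequence.

```python
PATTERNS = (0xAA55AA55, 0x55AA55AA, 0xAAAA5555)


def adjacent_toggles(word: int, width: int = 32) -> int:
    """Count neighbouring bit positions in the word that hold opposite values."""
    # Bit i of (word ^ word >> 1) is set when bits i and i+1 differ;
    # the mask drops the top bit, which has no upper neighbour.
    mask = (1 << (width - 1)) - 1
    return bin((word ^ (word >> 1)) & mask).count("1")


for p in PATTERNS:
    print(f"0x{p:08X}: {adjacent_toggles(p)} of 31 adjacent bit pairs differ")

print(hex(0xAA55AA55 ^ 0xFFFFFFFF))
```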

Community Research

  • Why 2736 MHz was dropped: Narrow PHY eye margins across varying DRAM batches (SK Hynix vs Samsung vs Micron)
  • v1.18 single-rank LPDDR5 crash: Incorrect derate timing for MR4 on single-rank configs caused DVFS hangs
  • Cold boot failures: Consistent with 20 timeout-less polls found in this analysis
  • LPDDR5 bandwidth paradox: LPDDR5 showed worse latency than LPDDR4X at same data rates due to WCK synchronization overhead
  • No open-source DDR init planned: Collabora confirmed Rockchip has “no plan” for open-sourcing DDR training

Full research with 40+ sources in boltzmann:~/src/rk3588-ddr-decompiled/COMMUNITY_RESEARCH.md


