====== RK3588 DDR Init Blob Reverse Engineering ======

Analysis of the closed-source Rockchip RK3588 DDR initialization binary blobs, decompiled with Ghidra on oppenheimer (Proxmox CT131 on data).

===== Overview =====

  * **Blob:** ''rk3588_ddr_lp4_2112MHz_lp5_2400MHz_v1.19.bin'' (76,704 bytes)
  * **Architecture:** AArch64 (64-bit ARM), runs on A76/A55 cores during early boot
  * **Functions:** 118 decompiled, 17,308 assembly instructions
  * **Tools:** Ghidra 11.3.2 headless, hbiyik/rkddr
  * **Source files:** boltzmann:~/src/rk3588-ddr-decompiled/

===== Key Finding: 6 Bytes Control Frequency =====

The "fast" (2112/2400 MHz) and "conservative" (1848/2112 MHz) blobs have **identical code**. Only 6 bytes of timing data and 8 bytes of version string differ:

^ Offset ^ Fast (2112/2400) ^ Conservative (1848/2112) ^ Purpose ^
| 0x11B8C | 0x0840 | 0x0738 | LP4 frequency parameter |
| 0x11BC0 | 0x0840 | 0x0738 | LP4 frequency (channel 2) |
| 0x11BF4 | 0x6960 | 0x6840 | LP5 frequency parameter |

The LP4 values are simply the frequency in MHz written in hex (0x0840 = 2112, 0x0738 = 1848). The low 12 bits of the LP5 values encode it the same way (0x960 = 2400, 0x840 = 2112).

This means custom DDR frequencies can be set by patching just these bytes.
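A minimal sketch of the patch described above. It assumes the three values are stored as little-endian 16-bit words (the usual layout for AArch64 data, but not confirmed by this analysis) and that the upper nibble of the LP5 word (0x6 in both official blobs) is a separate field that should be preserved. The function name ''patch_frequencies'' and the ''.patched'' output suffix are hypothetical. The sketch does not handle any checksum or repackaging the boot chain may require.

```python
import struct
import sys

# Offsets from the table above; both LP4 words receive the same value.
LP4_OFFSETS = (0x11B8C, 0x11BC0)
LP5_OFFSET = 0x11BF4


def patch_frequencies(blob: bytes, lp4_mhz: int, lp5_mhz: int) -> bytes:
    """Return a copy of blob with the LP4/LP5 frequency words replaced."""
    if not 0 <= lp4_mhz <= 0xFFFF:
        raise ValueError("LP4 frequency does not fit in 16 bits")
    if not 0 <= lp5_mhz <= 0x0FFF:
        raise ValueError("LP5 frequency does not fit in the low 12 bits")
    if len(blob) < LP5_OFFSET + 2:
        raise ValueError("blob is too short to contain the frequency words")

    out = bytearray(blob)
    for offset in LP4_OFFSETS:
        # Assumption: little-endian 16-bit storage.
        struct.pack_into("<H", out, offset, lp4_mhz)

    # Assumption: the upper nibble (0x6 in both official blobs) is kept as-is.
    (old_lp5,) = struct.unpack_from("<H", out, LP5_OFFSET)
    struct.pack_into("<H", out, LP5_OFFSET, (old_lp5 & 0xF000) | lp5_mhz)
    return bytes(out)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python patch_ddr.py <blob.bin> <lp4_mhz> <lp5_mhz>
    path, lp4, lp5 = sys.argv[1], int(sys.argv[2]), int(sys.argv[3])
    with open(path, "rb") as f:
        patched = patch_frequencies(f.read(), lp4, lp5)
    with open(path + ".patched", "wb") as f:
        f.write(patched)
```

Under these assumptions, patching a buffer that holds the fast values with 1848 and 2112 yields the conservative values from the table (0x0738, 0x0738, 0x6840) and changes exactly six bytes.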
===== LPDDR5 Frequency Table =====

==== Official Rockchip Blobs ====

^ Version ^ LP4 Freq ^ LP5 Freq ^ LP5 Data Rate ^ Status ^
| v1.09 - v1.15 | 2112 MHz | **2736 MHz** | 5472 MT/s | Dropped after v1.15 |
| v1.16 - v1.19 | 2112 MHz | **2400 MHz** | 4800 MT/s | Current default |
| v1.19 (conservative) | 1848 MHz | **2112 MHz** | 4224 MT/s | Safe/stable |

==== Community-Achieved Frequencies ====

^ LP5 Clock ^ Data Rate ^ BW/channel ^ Source ^ Stability ^
| 2112 MHz | 4224 MT/s | 8.4 GB/s | Official conservative | Rock solid |
| 2400 MHz | 4800 MT/s | 9.6 GB/s | Official default | Stable |
| 2736 MHz | 5472 MT/s | 10.9 GB/s | Old official (v1.15) | Dropped by Rockchip, works on good modules |
| 3200 MHz | 6400 MT/s | 12.8 GB/s | Community (rkddr tool) | Requires SK Hynix rated modules |

==== JEDEC LPDDR5 Speed Grades ====

^ Speed Grade ^ Data Rate ^ Clock ^ Notes ^
| LPDDR5-3200 | 3200 MT/s | 1600 MHz | Minimum spec |
| LPDDR5-4267 | 4267 MT/s | 2133 MHz | ≈ conservative blob |
| LPDDR5-4800 | 4800 MT/s | 2400 MHz | = default blob |
| LPDDR5-5500 | 5500 MT/s | 2750 MHz | ≈ 2736 blob, TRM "optimized" |
| LPDDR5-6400 | 6400 MT/s | 3200 MHz | Max JEDEC, community OC |

===== MMIO Register Map =====

The blob accesses 79 unique hardware registers across 9 blocks:

^ Address Range ^ Block ^ Registers ^ Purpose ^
| 0xFD588xxx | PMU1_GRF | 1 | DDR training status |
| 0xFD598xxx | DDR_GRF_CH2 | 1 | Channel 2 config |
| 0xFD5F4/8xxx | BUS_GRF | 27 | DDR bus interconnect, AXI routing, QoS |
| 0xFD8C8xxx | SCRU | 4 | DDR PLL (DPLL) clock gate/reset/config |
| 0xFE010xxx | DDRC_CH0 | 4 | Synopsys UMCTL2 controller |
| 0xFE030xxx | FIREWALL_DDR | 1 | Memory access control |
| 0xFE050xxx | SGRF | 9 | Security: DDR region permissions |
| 0xFECC0xxx | Unknown | 4 | Possibly DDR scramble/ECC |
| 0xFF000xxx | SRAM | 1 | Boot mailbox |

Base addresses verified against RK3588 TRM Part 2 and Linux kernel DT sources.
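The register map can be turned into a small lookup for labelling Ghidra's auto-generated ''_DAT_xxxxxxxx'' symbols in the decompiled C. The sketch below assumes that the ''xxx'' in each address range stands for the low 12 bits, so each block is identified by ''address >> 12''. The names ''MMIO_BLOCKS'' and ''annotate'' are hypothetical helpers, not part of the analysis files.

```python
import re

# 4 KiB page number (address >> 12) -> block name, taken from the table above.
MMIO_BLOCKS = {
    0xFD588: "PMU1_GRF",
    0xFD598: "DDR_GRF_CH2",
    0xFD5F4: "BUS_GRF",
    0xFD5F8: "BUS_GRF",
    0xFD8C8: "SCRU",
    0xFE010: "DDRC_CH0",
    0xFE030: "FIREWALL_DDR",
    0xFE050: "SGRF",
    0xFECC0: "Unknown",
    0xFF000: "SRAM",
}

# Ghidra-style data symbol: "_DAT_" followed by eight hex digits.
_DAT_SYMBOL = re.compile(r"_DAT_([0-9a-fA-F]{8})\b")


def annotate(symbol: str) -> str:
    """Map a _DAT_ symbol to 'BLOCK+0xOFF'; return it unchanged if unknown."""
    match = _DAT_SYMBOL.search(symbol)
    if match is None:
        return symbol
    address = int(match.group(1), 16)
    block = MMIO_BLOCKS.get(address >> 12)
    if block is None:
        return symbol
    return f"{block}+0x{address & 0xFFF:03X}"


if __name__ == "__main__":
    for name in ("_DAT_fe030040", "_DAT_fe0500e0"):
        print(name, "->", annotate(name))
```

With this table, ''_DAT_fe030040'' resolves to ''FIREWALL_DDR+0x040'' and ''_DAT_fe0500e0'' to ''SGRF+0x0E0''.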
===== Potential Bugs =====

  - **No timeout on hardware polls:** ''FUN_000000e4'' polls SGRF status (0xFE0500E0) in a tight loop. If SGRF never responds, the system hangs permanently during boot.
  - **Firewall opened wide:** ''_DAT_fe030040 |= 0xffff'' opens all DDR firewall masters during init and never re-restricts them.
  - **Single-channel direct access:** Only DDRC CH0 (0xFE01xxxx) is accessed directly. Channels 1-3 are configured via broadcast through BUS_GRF.

===== DDR Training Flow =====

  - DDR blob loaded by BL2 (TPL) during early boot
  - Configures DPLL via SCRU registers (0xFD7D0000)
  - Opens DDR firewall and SGRF for access
  - Configures BUS_GRF (27 registers, DDR bus interconnect)
  - Runs PHY training at configured frequency
  - Trains 6 frequency steps (main + 5 alternatives) for DVFS
  - Writes results to PMU GRF OS registers
  - Linux devfreq (rockchip-dfi driver) reads these for runtime frequency scaling

===== Tools =====

  * **[[https://github.com/hbiyik/rkddr|rkddr]]**: TUI tool to edit DDR blob parameters directly on the board. Supports any frequency plus ODT/drive strength. Saves to eMMC/SPI flash IDB.
  * **ddrbin_tool** (in rkbin/tools/): Rockchip's official blob configuration tool.
  * **Manual patching**: Change 6 bytes in the data section as documented above.
  * **Device tree overlay**: ''rockchip-rk3588-dmc-oc-3500mhz'' enables frequency steps up to 3200 MHz for devfreq.

===== Practical: Overclocking DDR on Rock 5 ITX+ =====

First list the DMC frequencies the running kernel exposes:

  cat /sys/bus/platform/drivers/rockchip-dmc/dmc/devfreq/dmc/available_frequencies

How far the memory can be pushed depends on the DRAM vendor:

  * **SK Hynix** LPDDR5 modules are rated for 6400 MT/s, so 2736 or 3200 MHz is safe to try
  * **Samsung** varies: some are rated 5500 MT/s, some 6400 MT/s
  * **Micron** is typically rated 5500 MT/s max

Recommendation: try the old v1.15 blob (2736 MHz) first. If it is stable, use rkddr to move to 3200 MHz and stress test with ''stressapptest''.

Impact for LLM inference: going from 2400 to 3200 MHz gives ~33% more memory bandwidth, and tok/s on memory-bound workloads should improve roughly in proportion.
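The bandwidth figures used in this page follow from simple arithmetic, sketched below. The data rate is twice the clock as listed in the frequency tables, and the 2 bytes per transfer (a 16-bit channel) is an assumption inferred from the BW/channel column rather than taken from the blob. The function names are hypothetical.

```python
# Assumption: 16-bit channel, inferred from the BW/channel column (4800 MT/s -> 9.6 GB/s).
BYTES_PER_TRANSFER = 2


def data_rate_mts(clock_mhz: float) -> float:
    """Data rate in MT/s for a clock given in MHz, as listed in the tables."""
    return 2 * clock_mhz


def channel_bandwidth_gbs(clock_mhz: float) -> float:
    """Peak bandwidth of one channel in GB/s (decimal gigabytes)."""
    return data_rate_mts(clock_mhz) * BYTES_PER_TRANSFER / 1000


def relative_gain_percent(old_mhz: float, new_mhz: float) -> float:
    """Percentage increase in peak bandwidth when moving between two clocks."""
    return (new_mhz / old_mhz - 1) * 100


if __name__ == "__main__":
    for clock in (2112, 2400, 2736, 3200):
        print(f"{clock} MHz: {data_rate_mts(clock):.0f} MT/s, "
              f"{channel_bandwidth_gbs(clock):.1f} GB/s per channel")
    print(f"2400 -> 3200 MHz: +{relative_gain_percent(2400, 3200):.1f}%")
```

These are peak numbers. The real tok/s gain depends on how memory-bound the workload is.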
===== Files =====

All analysis files on ''boltzmann:~/src/rk3588-ddr-decompiled/'':

  * ''ddr_decompiled.c'': Decompiled C (fast blob, 118 functions, 11,923 lines)
  * ''ddr_conservative_decompiled.c'': Decompiled C (conservative blob)
  * ''ddr_diff.txt'': Diff between fast and conservative
  * ''ddr_fast_asm.s'' / ''ddr_conservative_asm.s'': Full disassembly (17,308 lines each)
  * ''rk3588_ddr.h'': Register definitions header (TRM-verified)
  * ''rk3588_regs_annotated.h'': All 79 MMIO registers with block annotations
  * ''DDR_FREQUENCY_TABLE.md'': Complete frequency table
  * ''ANALYSIS.md'': Full analysis report
  * Ghidra project on ''oppenheimer'' (CT131 on data): ''/opt/work/ghidra_project/''

----

//Generated by Claude Code, 2026-04-03//

===== What is DDR Training? =====

DDR training is the calibration process where the memory controller and PHY find the **optimal timing window** to reliably communicate with DRAM chips. At 2400-3200 MHz (4800-6400 MT/s), signal integrity is the primary challenge.

==== Why Training is Needed ====

Electrical signals on PCB traces experience:

  * **Propagation delay**: different trace lengths mean different arrival times
  * **Crosstalk**: adjacent signals interfere
  * **ISI** (inter-symbol interference): previous bit values affect the shape of the current bit
  * **PVT variation**: process, voltage, and temperature shift timing
  * **Impedance mismatch**: causes reflections that distort signals

Training compensates by finding the "**eye**" (the timing/voltage window where data is reliably captured) for each signal individually.

==== Training Stages (from decompiled code) ====

The RK3588 uses a **Synopsys DWC (DesignWare Core) LPDDR5/4X multiPHY**. The training sequence in the blob:

  - **ZQ Calibration**: Calibrates output driver impedance. Polls ''CalBusy'' (PHY offset 0x684, 11 uses in code).
  - **Write Leveling**: Aligns DQS strobe with clock at the DRAM. Loops over 16 DQ bits.
  - **Read Gate Training**: Finds the correct time to capture read data. Polls ''DfiStatus'' (offset 0xA24, 65 uses, the most-used register).
  - **Read/Write DQ Training**: Per-bit timing adjustment using patterns ''0xAA55AA55'' / ''0x55AA55AA'' (written to PHY offsets 0x93C-0x970).
  - **Eye Training**: Scans the delay and voltage range for maximum margin. The "eyescan" blob variant does extended analysis.
  - **VREF Training**: Finds the optimal voltage threshold. Uses PHY offsets 0x600/0x608/0x60C (67 combined uses).
  - **CA Training**: Calibrates command/address bus timing.

==== Why Training Runs Every Boot ====

Training results depend on temperature (timing shifts ~1-2 ps/°C), voltage, DRAM internal state, and component aging. They are stored in SRAM (0x001FE000) and passed to the kernel via PMU GRF for DVFS.

===== Bug Analysis =====

==== CRITICAL: 20 Timeout-less Hardware Polls ====

The most serious bug class: ''do {} while'' loops polling hardware registers **indefinitely**. If hardware doesn't respond, the system **hangs permanently** during boot.

^ Register ^ Address / PHY Offset ^ Polls ^ Waits For ^
| SGRF_DDR_STATUS | 0xFE0500E0 | 1 | Security GRF ready |
| SGRF_DDR_CON21 | 0xFE050054 | 2 | SGRF config done |
| DfiStatus | +0xA24 | 4 | DFI interface ready |
| MicroContMuxSel | +0x10090 | 4 | PHY firmware mailbox |
| MicroReset | +0x10080 | 2 | PHY firmware reset |
| UctWriteProtShadow | +0x10514 | 5 | Training status |
| CalBusy | +0x684 | 1 | ZQ calibration |

**Impact:** Cold boot failures, hangs at extreme temperatures, and hangs when the power supply misbehaves during training.

**Fix:** Add timeout counters. The code already has error return paths (23 instances of ''return 0xFFFFFFFF''); the polls just don't use them.

==== WARNING: Firewall Left Open on Error ====

''ddr_open_firewall()'' grants all bus masters DDR access (''FW_DDR |= 0xFFFF''). The matching close may not be called on all error paths.

==== WARNING: No Selective Retry ====

Training failure restarts the **entire sequence** from scratch. There is no selective retry (e.g., "only redo read gate training").
Each failure costs a full retrain (~100 ms).

==== Code Metrics ====

^ Metric ^ Value ^
| Total lines | 11,977 |
| Functions | 118 |
| Loops | ~341 |
| Branches | ~1,725 |
| MMIO registers | 79 |
| Error returns | 23 / 1,405 checks (1.6%) |
| Most-used PHY register | DfiStatus (0xA24), 65 uses |

===== Synopsys DWC PHY Training Sequence =====

The RK3588 uses a **Synopsys DWC LPDDR5/4X multiPHY** (DWC_LPDDR54_PHY). The training stages map to specific register offsets found in the decompiled code:

^ PHY Offset ^ Synopsys Name ^ Stage ^ Uses in Code ^
| +0x684 | CalBusy | ZQ Calibration | 11 |
| +0xA24 | DfiStatus | DFI ready / gate training | 65 |
| +0x600/608/60C | VrefDAC | VREF training | 67 |
| +0x10080 | MicroReset | PHY firmware control | 13 |
| +0x10090 | MicroContMuxSel | Firmware ↔ APB mux | many |
| +0x10180 | AcsmPlayback | CA training | 26 |
| +0x10514 | UctWriteProtShadow | Training complete status | 28 |

==== The 0xAA55AA55 Training Pattern ====

Written to PHY offsets 0x93C-0x970, this alternating bit pattern maximizes switching noise and crosstalk, the worst-case scenario for signal integrity testing. Variations (0xAAAA5555, 0x55AA55AA) stress different inter-bit coupling scenarios on the PCB.

===== Community Research =====

  * **Why 2736 MHz was dropped:** Narrow PHY eye margins across varying DRAM batches (SK Hynix vs Samsung vs Micron)
  * **v1.18 single-rank LPDDR5 crash:** Incorrect derate timing for MR4 on single-rank configs caused DVFS hangs
  * **Cold boot failures:** Consistent with the 20 timeout-less polls found in this analysis
  * **LPDDR5 bandwidth paradox:** LPDDR5 showed worse latency than LPDDR4X at the same data rates due to WCK synchronization overhead
  * **No open-source DDR init planned:** Collabora confirmed Rockchip has "no plan" for open-sourcing DDR training

Full research with 40+ sources in ''boltzmann:~/src/rk3588-ddr-decompiled/COMMUNITY_RESEARCH.md''

----

//Generated by Claude Code, 2026-04-03. Analysis performed on oppenheimer (Proxmox CT131).//