rk3588_ddr
| Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
| rk3588_ddr [2026/04/15 15:32] – Append 2026-04-15 late evening: bootrom emu + gitea SSH + PineBuds PR #122 markus_fritsche | rk3588_ddr [2026/04/20 21:58] (current) – MVP2 session 2026-04-20 recap (matching-decomp blitz, 33/118, 15/16 poll sites) markus_fritsche | ||
|---|---|---|---|
| Line 438: | Line 438: | ||
| //Last updated: 2026-04-15 late evening// | //Last updated: 2026-04-15 late evening// | ||
| + | |||
| + | |||
| + | ===== 2026-04-15 late night: counted-loop v3 is cold-boot-broken ===== | ||
| + | |||
| + | **Project-defining finding.** The counted-loop trampoline approach (any counter | ||
| + | value we tested — 16 Ki, 1 Mi, 16 Mi iterations) **cannot** replace the stock | ||
| + | blob's infinite polls for the PHY firmware handshake that fires during F1 | ||
| + | frequency retrain on the GenBook RK3588. The all-evening bisection turned out to be a | ||
| + | warm-PHY illusion; cold-boot control experiments at the end revealed that only | ||
| + | the stock blob cold-boots reliably. | ||
| + | |||
| + | ==== The warm-PHY trap ==== | ||
| + | |||
| + | Every " | ||
| + | '' | ||
| + | which only fires after '' | ||
| + | SPL into SRAM and **run a full DDR init at 2400 MHz** (visible in UART captures | ||
| + | as the '' | ||
| + | blob's '' | ||
| + | a trained PHY state where the F1-retrain code path that kills cold boots either | ||
| + | never fires or side-steps site 1. | ||
| + | |||
| + | Cold-tested '' | ||
| + | **same** '' | ||
| + | full boot. Bisection was theatre. | ||
| + | |||
| + | ==== Diagnostic chain ==== | ||
| + | |||
| + | The UART trace rewriter ended up being the tool that cracked it. Each trampoline | ||
| + | emits a unique byte to UART2 ('' | ||
| + | a colon on success exit, an exclamation on timeout exit. Typical cold-boot hang tail: | ||
| + | |||
| + | change to F1: 534MHz | ||
| + | 0:1!2:3:4: | ||
| + | (hang) | ||
| + | |||
| + | Reads: site 0 succeeded, site 1 **timed out**, sites 2-4 succeeded, then hang | ||
| + | somewhere after site 4 (no trampoline → no marker). | ||
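
The marker tail can be decoded mechanically. The following is a minimal sketch, assuming every site marker is a single character followed by `:` (success) or `!` (timeout). The names `parse_trace` and `first_timeout` are hypothetical and not part of the project's tooling.

```python
import re

# Outcome characters taken from the trace convention described above.
OUTCOMES = {":": "ok", "!": "timeout"}


def parse_trace(tail):
    """Decode a marker tail such as '0:1!2:3:4:' into (site, outcome) pairs.

    Assumption: each site marker is exactly one character.
    """
    return [(site, OUTCOMES[flag]) for site, flag in re.findall(r"(.)([:!])", tail)]


def first_timeout(tail):
    """Return the marker of the first site that timed out, or None."""
    for site, outcome in parse_trace(tail):
        if outcome == "timeout":
            return site
    return None


print(parse_trace("0:1!2:3:4:"))
print(first_timeout("0:1!2:3:4:"))
```

Note that a hang after the last marker is not visible to this decoder; it only reports what the trampolines managed to emit.
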
| + | |||
| + | **Site 1 context** (blob offset '' | ||
| + | |||
| + | 7b90: orr w0, w0, #0x2 | ||
| + | 7b94: str w0, [x26, # | ||
| + | 7b98: mov w0, # | ||
| + | 7b9c: ldr w1, [x26, # | ||
| + | 7ba0: bics wzr, w0, w1 ; body[1]: flags | ||
| + | 7ba4: b.ne 0x7b9c | ||
| + | |||
| + | Register '' | ||
| + | uMCTL2 territory. Stock infinite-poll always succeeds cold; our 1 Mi and 16 Mi | ||
| + | counted loops both time out every time. | ||
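
The exit condition of the loop at 7b9c..7ba4 can be modelled in a few lines: `bics wzr, w0, w1` sets the flags from `w0 & ~w1`, and `b.ne` repeats the read while that result is non-zero, so the loop ends once every mask bit is set in the value read. The sketch below models both the stock infinite poll and a counted variant. The mask value `0x2`, the helper names, and the fake register are illustrative assumptions (the real immediate is not visible in the listing), and the model covers only the exit logic, not hardware timing.

```python
def poll_until_set(read_reg, mask, max_iters=None):
    """Model of the site-1 poll: loop until all bits in `mask` are set.

    max_iters=None mimics the stock infinite poll; an integer mimics a
    counted-loop trampoline. Returns (succeeded, number_of_reads).
    """
    reads = 0
    while True:
        value = read_reg()
        reads += 1
        # bics wzr, w0, w1 -> flags from (mask & ~value); exit when zero
        if (mask & ~value & 0xFFFFFFFF) == 0:
            return True, reads
        if max_iters is not None and reads >= max_iters:
            return False, reads


def make_fake_register(ready_after, ready_value):
    """Hypothetical register that reads 0 until `ready_after` reads have happened."""
    state = {"reads": 0}

    def read():
        state["reads"] += 1
        return ready_value if state["reads"] >= ready_after else 0

    return read


MASK = 0x2  # placeholder; the real immediate is not shown in the listing

print(poll_until_set(make_fake_register(5, MASK), MASK))               # unbounded poll
print(poll_until_set(make_fake_register(5, MASK), MASK, max_iters=3))  # counted poll
```
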
| + | |||
| + | ==== Likely root cause ==== | ||
| + | |||
| + | The PHY firmware state machine is sensitive to either the polling cadence or | ||
| + | the CPU-cycle count before the first LDR. Our trampoline adds a 3-instruction | ||
| + | UART-marker prolog + 1-instruction counter init ≈ 10 cycles of extra latency | ||
| + | before the first read. Stock has zero extra cycles between the '' | ||
| + | caller and the '' | ||
| + | reads arrive inside a specific window, our prolog pushes the first read outside | ||
| + | that window and the handshake silently aborts — no subsequent polling recovers. | ||
| + | |||
| + | This is not proven (there was no time tonight to build a non-trace counter-bump variant | ||
| + | and cold-test it, which would isolate UART-marker latency from counter-logic latency), | ||
| + | but the evidence pattern fits: stock works, trace-enabled variants fail, counter | ||
| + | size doesn' | ||
| + | count before first read is. | ||
| + | |||
| + | ==== Shipping deliverables ==== | ||
| + | |||
| + | Tonight we built working tooling. A working **fix** is future work. | ||
| + | |||
| + | * '' | ||
| + | * '' | ||
| + | skip and DW_apb_uart shim; prints byte-identical DDR banner to real hardware. | ||
| + | * '' | ||
| + | '' | ||
| + | * '' | ||
| + | * Meitner '' | ||
| + | |||
| + | ==== Methodology lessons (captured in memory) ==== | ||
| + | |||
| + | * **Warm-PHY illusion** — '' | ||
| + | baseline BEFORE bisecting any hardware init bug. '' | ||
| + | warm boot, not a cold boot — results are not portable to cold deployment. | ||
| + | * Linear bisection that looks "too clean for a hard problem" | ||
| + | methodology leak. Tonight' | ||
| + | boots, 0-12 hangs'' | ||
| + | |||
| + | ==== Next session direction ==== | ||
| + | |||
| + | Re-scope from "patch all 16 timeout-less polls" to "patch only the safe subset": | ||
| + | |||
| + | - Read each site's body + base register, cross-reference with TRM §2.4 + | ||
| + | Synopsys DWC uMCTL2 docs. | ||
| + | - Classify: PHY-firmware handshake polls (DO NOT patch) vs SGRF/ | ||
| + | BUS_GRF polls (safe to patch). | ||
| + | - Rebuild subset patcher, cold-test. If a non-empty safe subset exists, ship that. | ||
| + | |||
| + | Stock stays on the GenBook SPI as the reliable cold-boot variant. The board is | ||
| + | currently running Arch from the stock blob. | ||
| + | |||
| + | //Last updated: 2026-04-15 23:51// | ||
| + | |||
| + | |||
| + | ===== 2026-04-16: MVP1 delivered — root cause was reseating ===== | ||
| + | |||
| + | The original "board craps out at 2400 MHz" problem that started the entire | ||
| + | MegabitChip project was **hardware, not firmware**. Two physical interventions | ||
| + | resolved it: | ||
| + | |||
| + | - **Reseating the CM5 module** in its PCIe-style socket → restored LPDDR5 | ||
| + | signal integrity at 2400 MT/s. User confirmed: " | ||
| + | - **Copperfield copper-shim cooling mod** → improved thermal margin at | ||
| + | elevated temps. | ||
| + | |||
| + | After reseating + swapping to the stock 2400 MHz DDR blob | ||
| + | ('' | ||
| + | reliably at 2400 MHz, survives full kernel compiles at 84 °C avg core temp, | ||
| + | and passes '' | ||
| + | |||
| + | ==== MVP1 shipped deliverables ==== | ||
| + | |||
| + | ^ Deliverable ^ Location ^ Status ^ | ||
| + | | Unicorn blob emulator | '' | ||
| + | | SPI pre-flash validator | '' | ||
| + | | UART trace rewriter | '' | ||
| + | | Configurable counted-loop patcher | '' | ||
| + | | GenBook flash pipeline | '' | ||
| + | | Ghidra LLM auto-renamer | '' | ||
| + | | Cold-boot methodology | '' | ||
| + | | UART capture archive | '' | ||
| + | | 2400 MHz stock GenBook SPI | '' | ||
| + | |||
| + | ==== MVP2 goal ==== | ||
| + | |||
| + | Boot from **source-regenerated blob**: matching-decomp all 118 functions → | ||
| + | clang recompile → byte-identical binary → then **modify**. Currently at 1/118 | ||
| + | functions matched ('' | ||
| + | community can rewrite training algorithms, expose OC knobs, and do things | ||
| + | Rockchip never intended. Question of principle. | ||
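
The "byte-identical" criterion is easy to check mechanically. The following is a minimal sketch of such a check, assuming the vendor bytes and the recompiled bytes of one function are already available as `bytes` objects. The function names and the demo data are hypothetical; how the bytes are extracted from the blob or the object file is outside this sketch.

```python
def first_mismatch(vendor, candidate):
    """Return the offset of the first differing byte, or None if byte-identical.

    A length difference counts as a mismatch at the end of the shorter input.
    """
    for offset, (a, b) in enumerate(zip(vendor, candidate)):
        if a != b:
            return offset
    if len(vendor) != len(candidate):
        return min(len(vendor), len(candidate))
    return None


def match_report(name, vendor, candidate):
    """Format a one-line verdict for a single function."""
    offset = first_mismatch(vendor, candidate)
    if offset is None:
        return f"{name}: MATCH ({len(vendor)} B)"
    return f"{name}: first difference at +0x{offset:x}"


# Made-up demo bytes, not taken from the blob.
print(match_report("demo_a", b"\x01\x02\x03\x04", b"\x01\x02\x03\x04"))
print(match_report("demo_b", b"\x01\x02\x03\x04", b"\x01\x02\xff\x04"))
```
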
| + | |||
| + | //Last updated: 2026-04-16 00:xx// | ||
| + | |||
| + | ====== MVP2 session 2026-04-20 — matching-decomp blitz ====== | ||
| + | |||
| + | Single session: **1/118 → 33/118 functions matching-decomped**. | ||
| + | The canonical compile line was settled and poll-site coverage jumped from 4/16 to 15/16. | ||
| + | |||
| + | ===== Canonical compile line ===== | ||
| + | |||
| + | <code bash> | ||
| + | clang -O2 -ffreestanding -mgeneral-regs-only \ | ||
| + | [-fno-pic] | ||
| + | [-fno-builtin] | ||
| + | [-fno-unroll-loops] # for small fixed-count loops | ||
| + | </code> | ||
| + | |||
| + | * **Hard required:** '' | ||
| + | FPU/NEON enabled; any '' | ||
| + | Without the flag, clang' | ||
| + | with 128-bit NEON ldp/stp (observed on FUN_00000ac8: | ||
| + | NEON vs 112 B scalar vendor). | ||
| + | * '' | ||
| + | helpers (FUN_000027e0) gcc byte-matches vendor where clang | ||
| + | picks different register allocation. | ||
| + | |||
| + | ===== Workspace ===== | ||
| + | |||
| + | All lifts live in '' | ||
| + | with 5 files each: | ||
| + | |||
| + | * '' | ||
| + | '' | ||
| + | * '' | ||
| + | * '' | ||
| + | * '' | ||
| + | * '' | ||
| + | |||
| + | ===== Poll-site coverage: 4/16 → 15/16 ===== | ||
| + | |||
| + | ^ site ^ containing fn ^ benchmark dir ^ semantic role ^ | ||
| + | | 0 | FUN_00007730 | 15_site0_block | PHY train interlock disable | | ||
| + | | 1 | FUN_00007730 | 14_site1_block | DFI shadow handshake (bit 1 / 4-lane ack) | | ||
| + | | 2 | FUN_00007730 | 07_site2_block | Enter Normal operating-mode | | ||
| + | | 3 | FUN_00007730 | 11_site3_block | DDRCTL_DFISTAT bits[2:1] clear | | ||
| + | | 4 | FUN_00007730 | 18_site4_block | Enter Self-refresh | | ||
| + | | 5 | FUN_00007730 | 19_site5_block | Wait selfref_type == auto | | ||
| + | | 6 | FUN_00007730 | 20_site6_block | DFI shadow handshake (bit 0 / 2-lane ack) | | ||
| + | | 7 | FUN_00007730 | 21_site7_block | Exit Self-refresh | | ||
| + | | 8 | FUN_00008b40 | 35_site8_block | Enable auto-ctrlupd + wait Normal | | ||
| + | | 9 | FUN_00009a90 | 40_site9_block | Exit SREF, 2-bit variant | | ||
| + | | 10 | FUN_00009a90 | **pending** | absolute 0xff000024 access — SRAM mirror? | | ||
| + | | 11 | FUN_0000d10c | 05_prep_freq_change | wait PHY state 1 | | ||
| + | | 12-15 | FUN_0000d328 | 04_train_phy_block | PHY training step | | ||
| + | |||
| + | Only **site 10** remains. It sits in the 9044-byte FUN_00009a90 monster and | ||
| + | uses an absolute address (not a ch_base + offset), so it needs wider | ||
| + | context before extraction. | ||
| + | |||
| + | ===== Highlights — what landed this session ===== | ||
| + | |||
| + | * **FUN_00002340** — MR-submit (TRM-verified DDRCTL_MRCTRL0/ | ||
| + | registers). Highest-leverage dispatcher callee; every MR write | ||
| + | in FUN_6c8c (LP4/x) and FUN_6d90 (LP5) goes through this. | ||
| + | * **FUN_0000337c** — freq→timing LUT. LP5 thresholds 533/ | ||
| + | 2133 MHz, LP4 thresholds 400/ | ||
| + | into the blob's 0x11C78/ | ||
| + | * **FUN_00006c8c** (LP4/x) + **FUN_00006d90** (LP5) — MR dispatch. | ||
| + | 6d90 compiled to **exactly 364 B** matching vendor (size-exact). | ||
| + | Together: 16 MR writes per per-channel-per-rank iteration. | ||
| + | * **FUN_00000ac8** — memcpy_aligned with same-ptr shortcut and | ||
| + | 8-byte fast path. | ||
| + | * **FUN_00000b38** — xorshift-seeded buffer hash, seed 0x47C6A7E6 | ||
| + | (DJB-variant with XOR fold). | ||
| + | * **FUN_00000b88** — ATAGS magic validator, accepts {0, | ||
| + | 0x54410001} ∪ [0x54410050, | ||
| + | * **FUN_00000bd8** — SRAM_BOOT range + overflow validator for | ||
| + | ATAGS reads (SRAM window 0x1FE000..0x200000, | ||
| + | * **Print chain closed:** | ||
| + | - '' | ||
| + | - '' | ||
| + | - '' | ||
| + | * **Timer chain closed:** | ||
| + | - '' | ||
| + | - '' | ||
| + | * **Prep/ | ||
| + | restore, with matching save-area offsets 0x238/ | ||
| + | 0x248/ | ||
| + | * **FUN_0000cb44** (1088 B training-timing pack) — **full port** | ||
| + | from Ghidra decompile. Compiles clean with -Wall -Wextra at | ||
| + | 944 B. The −13 memory-op delta vs vendor is clang' | ||
| + | RAM-access coalescing. **Cross-validation under blob_emu.py | ||
| + | still pending — backlog item #36.** | ||
| + | |||
| + | ===== Context-map decoded ===== | ||
| + | |||
| + | '' | ||
| + | 208-byte ctx struct — decoded as the blob's RK3588 physical-address | ||
| + | dictionary: | ||
| + | |||
| + | ^ ctx offset ^ value ^ role ^ | ||
| + | | 0x00..0x60 (stride 0x20) | 0xF7..0xFA000000 | 4-ch DDR channel bases | | ||
| + | | 0x08..0x68 | 0xFE0C..0x0F0000 | 4-ch CRU-DDR | | ||
| + | | 0x10..0x70 | 0xFD80..0x0C000 | 4-ch DDRPHY (16K stride) | | ||
| + | | 0x18..0x78 | 0xFE00..0x06000 | 4-ch DDRCTL (8K stride) | | ||
| + | | 0x80 | 0xFD58A000 | GRF sideband | | ||
| + | | 0x88 | 0xFD7C0000 | CRU | | ||
| + | | 0x90 | 0xFD59E000 | GRF alt | | ||
| + | | 0x98 | 0xFD586000 | GRF (3rd) | | ||
| + | | 0xA0 | 0xFD587000 | GRF (4th) | | ||
| + | | 0xB8 | 0xFD8D0000 | GRF DDR | | ||
| + | | 0xC0 | 0xFD588000 | GRF (5th) | | ||
| + | | **0xC8** | **0xFD59C000** | **DMC sec_a** (prep/ | ||
| + | | **0xD0** | **0xFD59D000** | **DMC sec_b** | | ||
| + | |||
| + | Confirms: the secondary-table pointers used in prep_freq_change, | ||
| + | restore_freq_change, | ||
| + | Memory Controller) timing-register regions at 0xFD59C000/ | ||
| + | — Rockchip-vendor register islands separate from the uMCTL2 DDRCTL | ||
| + | block. | ||
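
The per-channel part of the table follows a regular layout: four 8-byte slots per channel, repeated every 0x20 bytes. The following is a small sketch of the offset arithmetic, assuming the 0x20 stride applies to all four per-channel rows (the offset ranges in the table are consistent with that). The field names are made up for illustration.

```python
CHANNEL_STRIDE = 0x20  # per-channel group size, from the table
NUM_CHANNELS = 4

# Offset of each slot inside one channel group (names are hypothetical).
FIELD_OFFSET = {
    "ddr_base": 0x00,
    "cru_ddr": 0x08,
    "ddrphy": 0x10,
    "ddrctl": 0x18,
}


def ctx_offset(channel, field):
    """Byte offset into the ctx struct of a per-channel base pointer."""
    if not 0 <= channel < NUM_CHANNELS:
        raise ValueError(f"channel out of range: {channel}")
    return channel * CHANNEL_STRIDE + FIELD_OFFSET[field]


for ch in range(NUM_CHANNELS):
    print(ch, {name: hex(ctx_offset(ch, name)) for name in FIELD_OFFSET})
```
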
| + | |||
| + | ===== Strings decoded ===== | ||
| + | |||
| + | | offset | content | | ||
| + | | 0x10C36 | ''" | ||
| + | | 0x10C4C | ''" | ||
| + | | 0x10DA4 | ''" | ||
| + | | 0x113D1 | ''", | ||
| + | | 0x11491 | ''" | ||
| + | | 0x114E9 | ''" | ||
| + | | 0x114F2 | ''" | ||
| + | |||
| + | ===== Caveat — to validate before relying on ===== | ||
| + | |||
| + | '' | ||
| + | full port of the Ghidra decompile. Compiles clean at 944 B. The | ||
| + | −13 memory-op delta vs vendor is clang' | ||
| + | coalescing for a non-volatile struct — post-function RAM state | ||
| + | should match, but **hasn' | ||
| + | |||
| + | **Backlog item #36** = "Run both vendor and candidate under | ||
| + | blob_emu.py with identical input state (ctx, ch_idx, ch_array_base) | ||
| + | and compare post-function RAM state at ctx+ch_idx*0x6C and | ||
| + | target+0x10..0x24." | ||
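
The comparison step of that backlog item could look roughly like the sketch below. It works on two plain RAM snapshots (`bytes` or `bytearray`) and deliberately leaves out how they are captured from the emulator. Two assumptions are baked in and may be wrong: the per-channel record at `ctx + ch_idx*0x6C` is taken to be 0x6C bytes long, and `target+0x10..0x24` is read as the half-open range [0x10, 0x24). All function names are hypothetical.

```python
def ranges_of_interest(ctx, ch_idx, target):
    """Return (start, length) pairs to compare after the function has run.

    Assumptions: record size 0x6C, and 0x10..0x24 treated as half-open.
    """
    return [
        (ctx + ch_idx * 0x6C, 0x6C),
        (target + 0x10, 0x24 - 0x10),
    ]


def diff_ranges(ram_a, ram_b, ranges):
    """List (address, byte_a, byte_b) for every differing byte in the ranges."""
    diffs = []
    for start, length in ranges:
        for addr in range(start, start + length):
            if ram_a[addr] != ram_b[addr]:
                diffs.append((addr, ram_a[addr], ram_b[addr]))
    return diffs


# Demo with synthetic snapshots; addresses are arbitrary small offsets.
vendor_ram = bytearray(0x400)
candidate_ram = bytearray(0x400)
candidate_ram[0x170] = 0xAA  # inside the per-channel record for ch_idx=1
candidate_ram[0x050] = 0xFF  # outside both ranges, so it is ignored

ranges = ranges_of_interest(ctx=0x100, ch_idx=1, target=0x300)
print(diff_ranges(vendor_ram, candidate_ram, ranges))
```

An empty result would mean the candidate leaves the same RAM state as the vendor code for that one input state; it says nothing about other inputs.
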
| + | |||
| + | ===== Backlog staged ===== | ||
| + | |||
| + | Next 10 units (tasks #37–46 in session state, of which tasks 37–43 | ||
| + | are **complete as of EOD 2026-04-20**): | ||
| + | |||
| + | * 37 FUN_000104b8 puts ✔ | ||
| + | * 38 FUN_000104f8 print_decimal ✔ | ||
| + | * 39 FUN_00010a38 udelay ✔ | ||
| + | * 40 site-9 poll block ✔ | ||
| + | * 41 FUN_00000e5c freq_log ✔ | ||
| + | * 42 FUN_00010a70 system_timer_init ✔ | ||
| + | * 43 FUN_00002110 dram_type → timing base ✔ | ||
| + | * 44 FUN_0000bf7c (tiny thunk) | ||
| + | * 45 FUN_000016bc | ||
| + | * 46 FUN_00002e88 | ||
| + | |||
| + | After those, the larger targets still on the shelf: | ||
| + | |||
| + | * site 10 extraction (FUN_00009a90 body) | ||
| + | * FUN_000027f8 (508 B, 7730-callee) | ||
| + | * FUN_00005540 (2636 B monster) | ||
| + | * FUN_00009a90 non-site-9/ | ||
| + | * FUN_00008b40 non-site-8 body (~2100 B) | ||
| + | |||
| + | ===== Numbers ===== | ||
| + | |||
| + | | metric | start of session | end | | ||
| + | | matching-decomp units | 1 | 33 (7 more in-flight tonight) | | ||
| + | | poll-sites covered | 4/16 | 15/16 | | ||
| + | | benchmark directories | 5 | 36+ | | ||
| + | | cumulative bytes of vendor asm lifted | ~104 B | ~6.0 KB | | ||
