
MVP2 session 2026-04-20 — matching-decomp blitz

One session took matching-decomp coverage from 1/118 to 33/118 functions. The canonical compile line is now settled, and poll-site coverage rose from 4/16 to 15/16.

Canonical compile line

clang -O2 -ffreestanding -mgeneral-regs-only \
      [-fno-pic]          # when referencing extern data symbols
      [-fno-builtin]      # when lifting memcpy/memset
      [-fno-unroll-loops] # for small fixed-count loops

* ''-mgeneral-regs-only'' is required: the blob runs without
  FPU/NEON enabled, so any q0/q1 vector insn would fault.
  Without the flag, clang's vectorizer replaces byte/word loops
  with 128-bit NEON ldp/stp (observed on FUN_00000ac8: 428 B of
  NEON vs 112 B scalar vendor).
* ''gcc -O2 -ffreestanding'' stays acceptable; on some small
  helpers (FUN_000027e0) gcc byte-matches vendor where clang
  picks different register allocation.
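The optional flags above depend on what each lift touches, so a build script can derive them per unit. The following is a minimal sketch: the helper name `compile_flags` and its keyword arguments are hypothetical and not part of the workspace tooling. Only the flags themselves are taken from the compile line.

```python
def compile_flags(extern_data=False, lifts_mem_builtins=False, small_fixed_loops=False):
    """Return the clang argument list for one lift (hypothetical helper)."""
    # Flags that every lift uses, per the canonical compile line.
    flags = ["clang", "-O2", "-ffreestanding", "-mgeneral-regs-only"]
    if extern_data:
        # The unit references extern data symbols.
        flags.append("-fno-pic")
    if lifts_mem_builtins:
        # The unit is itself a memcpy/memset lift.
        flags.append("-fno-builtin")
    if small_fixed_loops:
        # The unit contains small fixed-count loops.
        flags.append("-fno-unroll-loops")
    return flags
```

For example, `compile_flags(lifts_mem_builtins=True)` yields the base line plus `-fno-builtin`, which is the combination a memcpy lift such as FUN_00000ac8 would need under these rules.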

Workspace

All lifts live in boltzmann:~/projects/AMPere/benchmark/NN_<name>/ with 5 files each:

rkbin/bin/rk35/rk3588_ddr_lp4_1848MHz_lp5_2112MHz_v1.19.bin

Poll-site coverage: 4/16 → 15/16

^ site ^ containing fn ^ benchmark dir ^ semantic role ^
| 0 | FUN_00007730 | 15_site0_block | PHY train interlock disable |
| 1 | FUN_00007730 | 14_site1_block | DFI shadow handshake (bit 1 / 4-lane ack) |
| 2 | FUN_00007730 | 07_site2_block | Enter Normal operating-mode |
| 3 | FUN_00007730 | 11_site3_block | DDRCTL_DFISTAT bits[2:1] clear |
| 4 | FUN_00007730 | 18_site4_block | Enter Self-refresh |
| 5 | FUN_00007730 | 19_site5_block | Wait selfref_type == auto |
| 6 | FUN_00007730 | 20_site6_block | DFI shadow handshake (bit 0 / 2-lane ack) |
| 7 | FUN_00007730 | 21_site7_block | Exit Self-refresh |
| 8 | FUN_00008b40 | 35_site8_block | Enable auto-ctrlupd + wait Normal |
| 9 | FUN_00009a90 | 40_site9_block | Exit SREF, 2-bit variant |
| 10 | FUN_00009a90 | pending | absolute 0xff000024 access (SRAM mirror?) |
| 11 | FUN_0000d10c | 05_prep_freq_change | wait PHY state 1 |
| 12-15 | FUN_0000d328 | 04_train_phy_block | PHY training step |

Only site 10 remains. It sits in the 9044-byte FUN_00009a90 monster and uses an absolute address rather than ch_base + offset, so it needs wider context before extraction.
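Each poll site reduces to re-reading a register until a masked field reaches an expected value. The sketch below models that shape for offline reasoning. The names `poll_until` and `fake_reg` and the `max_polls` bound are assumptions made for illustration. Whether the vendor loops carry any timeout is not established here.

```python
def poll_until(read_reg, mask, want, max_polls=1000):
    """Re-read a register until (value & mask) == want.

    Returns the number of reads used, or None if max_polls is exhausted.
    The bound is a modelling convenience, not a claim about the blob.
    """
    for n in range(1, max_polls + 1):
        if (read_reg() & mask) == want:
            return n
    return None


def fake_reg(values):
    """Hypothetical stand-in for an MMIO read: yields canned values in order."""
    it = iter(values)
    return lambda: next(it)


# Site 3 condition from the table: DDRCTL_DFISTAT bits[2:1] clear.
DFISTAT_BITS_2_1 = 0b110

reads = poll_until(fake_reg([0b110, 0b010, 0b001]), DFISTAT_BITS_2_1, 0)
```

Here the third canned value has bits 2 and 1 clear, so the loop finishes on the third read. Bit 0 is outside the mask and is ignored.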

Highlights — what landed this session

registers). Highest-leverage dispatcher callee; every MR write
  in FUN_6c8c (LP4/x) and FUN_6d90 (LP5) goes through this.
* **FUN_0000337c** — freq→timing LUT. LP5 thresholds 533/800/1600/
  2133 MHz, LP4 thresholds 400/613/1066 MHz. Returns a pointer
  into the blob's 0x11C78/0x11CE0 data-region timing tables.
* **FUN_00006c8c** (LP4/x) + **FUN_00006d90** (LP5) — MR dispatch.
  6d90 compiled to **exactly 364 B** matching vendor (size-exact).
  Together: 16 MR writes per per-channel-per-rank iteration.
* **FUN_00000ac8** — memcpy_aligned with same-ptr shortcut and
  8-byte fast path.
* **FUN_00000b38** — xorshift-seeded buffer hash, seed 0x47C6A7E6
  (DJB-variant with XOR fold).
* **FUN_00000b88** — ATAGS magic validator, accepts {0,
  0x54410001} ∪ [0x54410050, 0x544100FF].
* **FUN_00000bd8** — SRAM_BOOT range + overflow validator for
  ATAGS reads (SRAM window 0x1FE000..0x200000, 8 KB).
* **Print chain closed:**
  - ''FUN_000104b8'' puts (CRLF-expanding)
  - ''FUN_000104f8'' recursive decimal print
  - ''FUN_00001194'' "channel[N] " dispatcher (tail-calls FUN_f60)
* **Timer chain closed:**
  - ''FUN_00010a38'' udelay via CNTPCT_EL0 + CNTFRQ_EL0
  - ''FUN_00010a70'' system_timer_init (STIMER @ 0xFD8C8000)
* **Prep/restore freq-change pair** — FUN_d10c save + FUN_d1d0
  restore, with matching save-area offsets 0x238/0x240/0x244/
  0x248/0x24C.
* **FUN_0000cb44** (1088 B training-timing pack) — **full port**
  from Ghidra decompile. Compiles clean with -Wall -Wextra at
  944 B. The −13 memory-op delta vs vendor is clang's legitimate
  RAM-access coalescing. **Cross-validation under blob_emu.py
  still pending — backlog item #36.**
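The two ATAGS validators in the list above (FUN_00000b88 and FUN_00000bd8) are simple enough to model as reference predicates for checking a lift. This is a sketch under stated assumptions: the function and constant names are hypothetical, the SRAM window end 0x200000 is treated as exclusive, and the overflow check is modelled as a 32-bit unsigned wrap of address plus size. The accepted magic values and the window bounds come from the descriptions above.

```python
MAGIC_SINGLETONS = (0x00000000, 0x54410001)
MAGIC_RANGE_LO = 0x54410050
MAGIC_RANGE_HI = 0x544100FF  # inclusive upper bound

SRAM_BOOT_START = 0x001FE000
SRAM_BOOT_END = 0x00200000   # assumed exclusive; window is 8 KB

U32_MAX = 0xFFFFFFFF


def atags_magic_ok(magic):
    """Accept {0, 0x54410001} or anything in [0x54410050, 0x544100FF]."""
    magic &= U32_MAX
    return magic in MAGIC_SINGLETONS or MAGIC_RANGE_LO <= magic <= MAGIC_RANGE_HI


def sram_range_ok(addr, size):
    """Accept [addr, addr + size) only if it lies inside the SRAM_BOOT window."""
    addr &= U32_MAX
    size &= U32_MAX
    end = addr + size
    # Python integers do not wrap, so the 32-bit overflow case is explicit.
    if end > U32_MAX:
        return False
    return SRAM_BOOT_START <= addr and end <= SRAM_BOOT_END
```

A read of the full window, `sram_range_ok(0x1FE000, 0x2000)`, passes, while one byte more fails. If the vendor code turns out to treat the end bound as inclusive, only the final comparison changes.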

Context-map decoded

FUN_0000d390 (init_ctx_pointers) writes 25 constants to the 208-byte ctx struct — decoded as the blob's RK3588 physical-address dictionary:

^ ctx offset ^ value ^ role ^
| 0x00..0x60 (stride 0x20) | 0xF7..0xFA000000 | 4-ch DDR channel bases |
| 0x08..0x68 | 0xFE0C..0x0F0000 | 4-ch CRU-DDR |
| 0x10..0x70 | 0xFD80..0x0C000 | 4-ch DDRPHY (16K stride) |
| 0x18..0x78 | 0xFE00..0x06000 | 4-ch DDRCTL (8K stride) |
| 0x80 | 0xFD58A000 | GRF sideband |
| 0x88 | 0xFD7C0000 | CRU |
| 0x90 | 0xFD59E000 | GRF alt |
| 0x98 | 0xFD586000 | GRF (3rd) |
| 0xA0 | 0xFD587000 | GRF (4th) |
| 0xB8 | 0xFD8D0000 | GRF DDR |
| 0xC0 | 0xFD588000 | GRF (5th) |
| 0xC8 | 0xFD59C000 | DMC sec_a (prep/restore + setup sec_table) |
| 0xD0 | 0xFD59D000 | DMC sec_b |

Confirms: the secondary-table pointers used in prep_freq_change, restore_freq_change, and setup_channels point into DMC (Dynamic Memory Controller) timing-register regions at 0xFD59C000/0xFD59D000 — Rockchip-vendor register islands separate from the uMCTL2 DDRCTL block.
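The context map can be rebuilt programmatically as a cross-check on the count of 25 constants. The sketch below expands the abbreviated per-channel ranges into full 32-bit addresses. That expansion is an interpretation, chosen because it agrees with the stated 16K and 8K strides. The names `PER_CHANNEL`, `SINGLETONS` and `build_ctx_map` are hypothetical.

```python
CHANNELS = 4
CTX_STRIDE = 0x20  # ctx offset step between channels

# (first ctx offset, channel-0 address, address stride, role)
PER_CHANNEL = [
    (0x00, 0xF7000000, 0x01000000, "DDR channel base"),
    (0x08, 0xFE0C0000, 0x00010000, "CRU-DDR"),
    (0x10, 0xFD800000, 0x00004000, "DDRPHY"),   # 16K stride
    (0x18, 0xFE000000, 0x00002000, "DDRCTL"),   # 8K stride
]

SINGLETONS = {
    0x80: (0xFD58A000, "GRF sideband"),
    0x88: (0xFD7C0000, "CRU"),
    0x90: (0xFD59E000, "GRF alt"),
    0x98: (0xFD586000, "GRF (3rd)"),
    0xA0: (0xFD587000, "GRF (4th)"),
    0xB8: (0xFD8D0000, "GRF DDR"),
    0xC0: (0xFD588000, "GRF (5th)"),
    0xC8: (0xFD59C000, "DMC sec_a"),
    0xD0: (0xFD59D000, "DMC sec_b"),
}


def build_ctx_map():
    """Return {ctx offset: (physical address, role)} for every decoded slot."""
    ctx = {}
    for first_off, base, stride, role in PER_CHANNEL:
        for ch in range(CHANNELS):
            ctx[first_off + ch * CTX_STRIDE] = (base + ch * stride, f"{role} ch{ch}")
    ctx.update(SINGLETONS)
    return ctx
```

Four per-channel groups of four plus nine single entries gives 25 slots, matching the number of constants FUN_0000d390 writes.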

Strings decoded

^ offset ^ content ^
| 0x10C36 | “Magic is not support\n” |
| 0x10C4C | “Tag is overflow\n” |
| 0x10DA4 | “unsupported dram type\n” |
| 0x113D1 | “, ” |
| 0x11491 | “MHz\n” |
| 0x114E9 | “channel[” |
| 0x114F2 | “] ” |

Caveat — to validate before relying on

FUN_0000cb44 (1088 B, per-channel training-timing pack) is a full port of the Ghidra decompile. Compiles clean at 944 B. The −13 memory-op delta vs vendor is clang's legitimate RAM-access coalescing for a non-volatile struct — post-function RAM state should match, but hasn't been cross-validated under blob_emu.py.

Backlog item #36 = “Run both vendor and candidate under blob_emu.py with identical input state (ctx, ch_idx, ch_array_base) and compare post-function RAM state at ctx+ch_idx*0x6C and target+0x10..0x24.”
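The comparison in backlog item #36 amounts to a byte-level diff of two post-run RAM snapshots, restricted to the two regions named. The sketch below shows that diff in isolation. It does not use the blob_emu.py interface, which is not described here. Snapshots are modelled as plain dicts of address to byte, the per-channel region is assumed to span one full 0x6C stride, and target+0x10..0x24 is assumed to be half-open. All function names are hypothetical.

```python
CH_STRIDE = 0x6C


def regions_for(ctx, ch_idx, target):
    """Half-open [start, end) regions to compare for one channel."""
    return [
        (ctx + ch_idx * CH_STRIDE, ctx + (ch_idx + 1) * CH_STRIDE),
        (target + 0x10, target + 0x24),
    ]


def diff_regions(ram_vendor, ram_candidate, regions):
    """Return [(addr, vendor_byte, candidate_byte)] for every mismatch.

    Addresses missing from a snapshot are treated as 0 (an assumption).
    """
    mismatches = []
    for start, end in regions:
        for addr in range(start, end):
            a = ram_vendor.get(addr, 0)
            b = ram_candidate.get(addr, 0)
            if a != b:
                mismatches.append((addr, a, b))
    return mismatches


# Usage with made-up snapshots: one difference inside a region, one outside.
regions = regions_for(ctx=0x1000, ch_idx=1, target=0x2000)
vendor = {0x1070: 0xAA, 0x3000: 0x01}
candidate = {0x1070: 0xAB, 0x3000: 0x02}
result = diff_regions(vendor, candidate, regions)
```

An empty result over both regions would support the claim that the coalesced stores leave the same RAM state. The difference at 0x3000 in the usage example is ignored because it falls outside the compared regions.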

Backlog staged

Next 10 units (tasks #37–46 in session state, of which tasks 37–43 are complete as of EOD 2026-04-20):

After those, the larger targets still on the shelf:

Numbers

^ metric ^ start of session ^ end ^
| matching-decomp units | 1 | 33 (7 more in-flight tonight) |
| poll-sites covered | 4/16 | 15/16 |
| benchmark directories | 5 | 36+ |
| cumulative bytes of vendor asm lifted | ~104 B | ~6.0 KB |