Continues the 2026-04-21 reloc-splice session. Goal: close the 47/54 reach gap with simulation tooling.
lockstep.py, mmio_diff.py,
check_asm.sh.
divergence — concrete proof the early boot path is source-rebuilt.
reg-alloc noise obscured real signal).
Structural asm-diff gate. Classifies each benchmark dir as
ASM_EXACT / ORDER_DIFF / REG_DIFF / BRANCH_DIFF / STRUCTURAL_DIFF.
Current: 100 tested, 3 EXACT, 18 REG_DIFF (same mnemonics + offsets,
different regs — the useful audit signal), 57 BRANCH_DIFF (inflated by
partial-port stubs), 22 STRUCTURAL_DIFF.
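The five-way classification can be sketched as below. Only the REG_DIFF rule (same mnemonics and offsets, different registers) comes from these notes; the rules used here for ORDER_DIFF, BRANCH_DIFF and STRUCTURAL_DIFF are assumptions for illustration, and the real check_asm.sh may decide differently. The helper names (`mask_regs`, `is_branch`, `classify`) are hypothetical.

```python
import re
from collections import Counter

# AArch64 general-purpose register names (x0-x30, w0-w30, sp, zero regs).
_REG = re.compile(r"\b(?:[xw](?:30|[12]\d|\d)|sp|wsp|xzr|wzr)\b")
_BRANCHES = {"b", "bl", "br", "blr", "ret", "cbz", "cbnz", "tbz", "tbnz"}


def mask_regs(insn):
    """Replace every register name with a placeholder, keeping offsets."""
    return _REG.sub("R", insn)


def is_branch(insn):
    """True for branch-like mnemonics (assumes a non-empty instruction)."""
    m = insn.split(None, 1)[0]
    return m in _BRANCHES or m.startswith("b.")


def classify(vendor, rebuilt):
    """Classify two normalized instruction lists (one string per insn)."""
    if vendor == rebuilt:
        return "ASM_EXACT"
    # Assumption: same instructions as a multiset, different order.
    if Counter(vendor) == Counter(rebuilt):
        return "ORDER_DIFF"
    mv = [mask_regs(i) for i in vendor]
    mr = [mask_regs(i) for i in rebuilt]
    if mv == mr:
        return "REG_DIFF"
    # Assumption: identical once branch instructions are ignored.
    if [i for i in mv if not is_branch(i)] == [i for i in mr if not is_branch(i)]:
        return "BRANCH_DIFF"
    return "STRUCTURAL_DIFF"
```

For example, `classify(["ldr w0, [x1, #8]", "ret"], ["ldr w8, [x1, #8]", "ret"])` lands in REG_DIFF, the bucket these notes treat as the useful audit signal.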
Two Unicorn instances (vendor + rebuilt), step one insn at a time,
diff all x-regs + sp + pc + nzcv. Supports --fn-entry-only to
suppress inside-function noise.
Verdict: works mechanically but fires false alarms on clang register allocation (“vendor puts intermediate in x0, our port puts it in x8”). Even filtered to function entries, caller-saved-but-stale regs trigger. Use for targeted investigation, not as a gate.
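A minimal sketch of the per-step comparison, to show why the register-allocation case fires. It is not the real lockstep.py: register snapshots are plain dicts here (reading them out of the two Unicorn instances is omitted), and `diff_regs` and `first_divergence` are hypothetical names.

```python
# Registers compared at every step: x0..x30, sp, pc, nzcv.
REGS = [f"x{i}" for i in range(31)] + ["sp", "pc", "nzcv"]


def diff_regs(vendor, rebuilt, regs=REGS):
    """Return (name, vendor_value, rebuilt_value) for each differing register."""
    return [(r, vendor[r], rebuilt[r]) for r in regs if vendor[r] != rebuilt[r]]


def first_divergence(vendor_steps, rebuilt_steps):
    """Return (step_index, diffs) for the first differing step, else None."""
    for step, (v, r) in enumerate(zip(vendor_steps, rebuilt_steps)):
        diffs = diff_regs(v, r)
        if diffs:
            return step, diffs
    return None


# Same intermediate value, different register: reported as a divergence
# even though both builds behave identically.
base = {r: 0 for r in REGS}
vendor = dict(base, x0=0xFE000000)
rebuilt = dict(base, x8=0xFE000000)
print(first_divergence([base, vendor], [base, rebuilt]))
```

The comparison has no notion of which registers are live, so a stale caller-saved register is indistinguishable from a real bug.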
Log MMIO writes (addr, size, val, caller_pc) from each run; diff sequences in order. MMIO is the silicon-observable behavior — if the write sequence matches, rebuild is behaviorally equivalent regardless of how clang chose registers.
First divergent write pinpoints the exact bug in one line of output. This replaces reach-bisection + lockstep as the primary validation gate going forward.
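The ordered write-sequence diff can be sketched as follows. This is an illustration, not the actual mmio_diff.py: the `MmioWrite` record mirrors the logged tuple (addr, size, val, caller_pc), `first_divergent_write` is a hypothetical name, and leaving `caller_pc` out of the comparison is an assumption (the two binaries may place the same code at different addresses).

```python
from typing import NamedTuple


class MmioWrite(NamedTuple):
    addr: int
    size: int
    val: int
    caller_pc: int  # kept for reporting only; not compared (assumption)


def first_divergent_write(vendor, rebuilt):
    """Return (1-based index, vendor_write, rebuilt_write) or None if equal."""
    for i, (v, r) in enumerate(zip(vendor, rebuilt), start=1):
        if (v.addr, v.size, v.val) != (r.addr, r.size, r.val):
            return i, v, r
    if len(vendor) != len(rebuilt):
        # One trace is a strict prefix of the other: diverge at the
        # first write that exists on only one side.
        n = min(len(vendor), len(rebuilt))
        v = vendor[n] if n < len(vendor) else None
        r = rebuilt[n] if n < len(rebuilt) else None
        return n + 1, v, r
    return None


# Illustrative traces; the values are made up for the example.
vendor_trace = [MmioWrite(0xFD5F4000, 4, 1, 0x100), MmioWrite(0xFD5F800C, 4, 2, 0x104)]
rebuilt_trace = [MmioWrite(0xFD5F4000, 4, 1, 0x200), MmioWrite(0xFD5F0004, 4, 2, 0x204)]
print(first_divergent_write(vendor_trace, rebuilt_trace))
```

Reporting the `caller_pc` of the first mismatching write on each side is what points at the function to audit.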
*arg* → *(DAT); our port had it reversed. Surfaced by lockstep (first divergence
at step 35, mid-copy).
* **46_FUN_2e88** (MR read helper) — args ''mr_addr''/''byte_index''
were swapped in the port's C signature. Vendor's asm uses w2 for
shift amount (byte_index) and w3<<8 for MRCTRL1 (mr_addr). Our
port had them reversed.
* **17_FUN_2340** (MR submit) — vendor ends with ''mov w0, #0; ret'',
explicitly returning 0. Our ''void'' port preserved whatever
clang left in x0 (often the ch_base ptr = 0xfe000000). Same class
as ''fn_27e0''. Change to ''int mr_submit(...) { ...; return 0; }''.
* **113_FUN_4f8 case-2 sub-0** — Ghidra mis-decompiled the BUS_GRF
register addresses. Vendor writes ''0xFD5F4000'' and ''0xFD5F800C''
for (grp=2, sub=0); our port wrote ''BUS_GRF_BASE_CFG'' (0xFD5F0000)
and ''BUS_GRF_DDR_ROUTE'' (0xFD5F0004). Fixed by hardcoding the
actual vendor addresses.
MMIO-level divergence; reach-count is no longer a discriminating
signal.
* **MMIO parity gate** (new primary): writes 1..82 byte-identical; write 83 reveals a ''fn_62d8'' variant=1-vs-0 discrepancy (likely the caller passing the wrong arg). Keeps peeling bug-by-bug.
* Vendor total MMIO writes: **1007** in a 500k-insn budget. Rebuilt total: **253** (it stops short because of the divergence, not because 253 is the complete write count).
write surfaces one Ghidra-decompile error or one ABI mismatch.
sequence byte-for-byte.
meitner rkdeveloptool). That's the final ground-truth check.
* ~/src/rk3588-ddr-decompiled/lockstep.py
* ~/src/rk3588-ddr-decompiled/mmio_diff.py
* ~/projects/AMPere/benchmark/check_asm.sh
* ~/projects/AMPere/benchmark/reloc_bisect.sh
* feedback_megabitchip_reloc_splicer.md: added a section on MMIO-diff as the superior gate, plus the two new bug-class examples (fn_9a68 direction, fn_62d8 address confusion).
The iteration pace felt like peeling an onion — each bug fixed revealed the next. But that IS the correct shape for matching-decomp with semantic tests: the MMIO sequence is the contract, each mismatch is a localized bug, and the tools converge us toward the vendor spec. Much more principled than register-level lockstep, which is too noisy for compiler-portable C ports.
A fully verified MMIO trace becomes a permanent regression oracle
— useful both for this project and for any future Rockchip DDR
reverse-engineering work. The .mmio-trace file is the real deliverable.