MegabitChip — Session 2026-04-21 Part 2 (Simulation & MMIO-Diff)
Continues the 2026-04-21 reloc-splice session. Goal: close the 47/54 reach gap with simulation tooling.
Result
- Three new validation tools built: lockstep.py, mmio_diff.py,
check_asm.sh.
- Four more bugs found and fixed.
- MMIO-write parity: the first 82 of the vendor's 1007 writes match
byte-for-byte before the first divergence, which is concrete proof
that the early boot path is source-rebuilt.
- Superseded register-level lockstep with MMIO-trace diff (clang
reg-alloc noise obscured real signal).
Tools added
check_asm.sh
Structural asm-diff gate. Classifies each benchmark dir as
ASM_EXACT / ORDER_DIFF / REG_DIFF / BRANCH_DIFF / STRUCTURAL_DIFF.
Current: 100 tested, 3 EXACT, 18 REG_DIFF (same mnemonics + offsets,
different regs — the useful audit signal), 57 BRANCH_DIFF (inflated by
partial-port stubs), 22 STRUCTURAL_DIFF.
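The REG_DIFF class (same mnemonics and offsets, different registers) can be illustrated with a short Python sketch. This is not how check_asm.sh itself is implemented. The helper names normalize and is_reg_diff and the instruction strings are hypothetical, and the sketch assumes one AArch64 instruction per string.

```python
import re

# Matches AArch64 general-purpose register names such as x0..x30 / w0..w30.
# sp and the zero registers are deliberately left untouched (assumption).
GP_REG = re.compile(r"\b([xw])\d{1,2}\b")

def normalize(insn: str) -> str:
    """Replace each register number with 'N', keeping the x/w width."""
    return GP_REG.sub(r"\1N", insn)

def is_reg_diff(vendor: list[str], rebuilt: list[str]) -> bool:
    """True when the listings differ only in which registers are used."""
    if vendor == rebuilt:
        return False  # identical text would be ASM_EXACT instead
    return [normalize(i) for i in vendor] == [normalize(i) for i in rebuilt]

# Hypothetical listings: same mnemonics and offsets, w0 versus w8.
vendor = ["ldr w0, [x1, #8]", "lsl w0, w0, w2", "ret"]
rebuilt = ["ldr w8, [x1, #8]", "lsl w8, w8, w2", "ret"]
print(is_reg_diff(vendor, rebuilt))
```

The normalization does not check that registers are renamed consistently, so it is a coarse filter. That is in keeping with REG_DIFF being an audit signal and not a proof of equivalence.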
lockstep.py
Two Unicorn instances (vendor + rebuilt), step one insn at a time,
diff all x-regs + sp + pc + nzcv. Supports --fn-entry-only to
suppress inside-function noise.
Verdict: works mechanically but fires false alarms on clang register allocation (“vendor puts intermediate in x0, our port puts it in x8”). Even filtered to function entries, caller-saved-but-stale regs trigger. Use for targeted investigation, not as a gate.
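The false-alarm mode described above can be reproduced with plain dictionaries standing in for register snapshots. This is a minimal sketch, not code from lockstep.py. The function name diff_regs and all register values are hypothetical, and both snapshots are assumed to hold the same register names.

```python
def diff_regs(vendor: dict, rebuilt: dict) -> dict:
    """Map each differing register name to its (vendor, rebuilt) values."""
    return {name: (vendor[name], rebuilt[name])
            for name in vendor
            if vendor[name] != rebuilt[name]}

# Both builds computed the same intermediate (0x40), but the compiler placed
# it in a different scratch register, so a raw diff reports a mismatch.
vendor_regs = {"x0": 0x40, "x8": 0x0, "sp": 0xFF000, "pc": 0x1000}
rebuilt_regs = {"x0": 0x0, "x8": 0x40, "sp": 0xFF000, "pc": 0x1000}
print(diff_regs(vendor_regs, rebuilt_regs))
```

Here x0 and x8 are both flagged even though the two builds are doing the same work, which is the noise that makes a register-level diff unsuitable as a gate.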
mmio_diff.py
Log MMIO writes (addr, size, val, caller_pc) from each run; diff sequences in order. MMIO is the silicon-observable behavior — if the write sequence matches, rebuild is behaviorally equivalent regardless of how clang chose registers.
First divergent write pinpoints the exact bug in one line of output. This replaces reach-bisection + lockstep as the primary validation gate going forward.
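The in-order comparison can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the contents of mmio_diff.py: the names MmioWrite and first_divergence are hypothetical, the trace values are made up, and caller_pc is excluded from the comparison on the assumption that code layout may differ between the two builds.

```python
from typing import NamedTuple, Optional, Sequence

class MmioWrite(NamedTuple):
    addr: int       # MMIO address written
    size: int       # access width in bytes
    val: int        # value written
    caller_pc: int  # PC of the writing instruction (diagnostic only)

def first_divergence(vendor: Sequence[MmioWrite],
                     rebuilt: Sequence[MmioWrite]) -> Optional[int]:
    """Return the 1-based index of the first mismatching write, or None."""
    for i, (v, r) in enumerate(zip(vendor, rebuilt), start=1):
        # caller_pc is ignored here (assumption: PCs may legitimately differ).
        if (v.addr, v.size, v.val) != (r.addr, r.size, r.val):
            return i
    if len(vendor) != len(rebuilt):
        # One trace is a strict prefix of the other.
        return min(len(vendor), len(rebuilt)) + 1
    return None

# Illustrative traces: the second write goes to a different address.
vendor = [MmioWrite(0xFD5F4000, 4, 0x1, 0x100),
          MmioWrite(0xFD5F800C, 4, 0x2, 0x104)]
rebuilt = [MmioWrite(0xFD5F4000, 4, 0x1, 0x200),
           MmioWrite(0xFD5F0004, 4, 0x2, 0x204)]
print(first_divergence(vendor, rebuilt))
```

Keeping caller_pc in the record but out of the equality check means the report can still point at the offending instruction in each build.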
Bugs fixed
- 54_FUN_9a68: dst/src swap. Vendor copies *arg → *(DAT);
our port had it reversed. Surfaced by lockstep (first divergence
at step 35, mid-copy).
- 46_FUN_2e88 (MR read helper): args mr_addr/byte_index
were swapped in the port's C signature. Vendor's asm uses w2 for
shift amount (byte_index) and w3<<8 for MRCTRL1 (mr_addr). Our
port had them reversed.
- 17_FUN_2340 (MR submit): vendor ends "mov w0, #0; ret",
explicitly returning 0. Our void port preserved whatever
clang left in x0 (often the ch_base ptr = 0xfe000000). Same class
as fn_27e0. Change to "int mr_submit(...) { ...; return 0; }".
- 113_FUN_4f8 case-2 sub-0: Ghidra mis-decompiled the BUS_GRF
register addresses. Vendor writes 0xFD5F4000 and 0xFD5F800C
for (grp=2, sub=0); our port wrote BUS_GRF_BASE_CFG (0xFD5F0000)
and BUS_GRF_DDR_ROUTE (0xFD5F0004). Fixed by hardcoding the
actual vendor addresses.
Status
- Reach gate: still 47/54. Each of the 4 fixes reveals the next
MMIO-level divergence; reach-count is no longer a discriminating
signal.
- MMIO parity gate (new primary): writes 1..82 byte-identical,
write 83 reveals fn_62d8 variant=1-vs-0 discrepancy (likely caller
passing wrong arg). Keeps peeling bug-by-bug.
- Vendor total MMIO writes: 1007 in 500k insn budget. Rebuilt
total: 253 (stops short because of divergence, not because 253
writes are all).
Next steps
- Continue mmio_diff-driven debug from write 83 onward. Each divergent
write surfaces one Ghidra-decompile error or one ABI mismatch.
- Consider dumping the FULL 1007-write vendor trace and using it as a
spec file: every future rebuild must reproduce this exact
sequence byte-for-byte.
- When 1007/1007 match: move to Phase 4 (bare-metal on ampere via
meitner rkdeveloptool). That's the final ground-truth check.
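The spec-file idea from the steps above could look roughly like the following Python sketch. The on-disk format (one "addr size val" line per write) and the helper names dump_trace, load_trace and matched_prefix are assumptions made for illustration. The notes do not define a trace format, and the values are made up.

```python
import io

def dump_trace(writes, fp):
    """Write one 'addr size val' line per MMIO write (hypothetical format)."""
    for addr, size, val in writes:
        fp.write(f"{addr:#010x} {size} {val:#x}\n")

def load_trace(fp):
    """Parse the format produced by dump_trace back into tuples."""
    trace = []
    for line in fp:
        if line.strip():
            addr, size, val = line.split()
            trace.append((int(addr, 16), int(size), int(val, 16)))
    return trace

def matched_prefix(spec, candidate):
    """Count how many leading writes of candidate agree with spec."""
    count = 0
    for expected, actual in zip(spec, candidate):
        if expected != actual:
            break
        count += 1
    return count

# Illustrative values only; an in-memory buffer stands in for the spec file.
vendor = [(0xFD5F4000, 4, 0x1), (0xFD5F800C, 4, 0x2), (0xFE000000, 4, 0x3)]
buf = io.StringIO()
dump_trace(vendor, buf)
buf.seek(0)
spec = load_trace(buf)

rebuilt = [(0xFD5F4000, 4, 0x1), (0xFD5F800C, 4, 0x2), (0xFE000000, 4, 0x7)]
print(f"{matched_prefix(spec, rebuilt)}/{len(spec)} writes match")
```

A rebuild would pass such a gate only when the matched prefix equals the spec length and the candidate trace has no extra writes. A plain-text format also keeps the spec reviewable in version control.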
Files added
- ~/src/rk3588-ddr-decompiled/lockstep.py
- ~/src/rk3588-ddr-decompiled/mmio_diff.py
- ~/projects/AMPere/benchmark/check_asm.sh
- ~/projects/AMPere/benchmark/reloc_bisect.sh
Memories updated
feedback_megabitchip_reloc_splicer.md: added section on
MMIO-diff as the superior gate, plus the two new bug-class
examples (fn_9a68 direction, fn_62d8 address confusion).
Observation
The iteration pace felt like peeling an onion: each bug fixed revealed the next. But that IS the correct shape for matching-decomp with semantic tests: the MMIO sequence is the contract, each mismatch is a localized bug, and the tools steer the port toward the vendor spec. This is much more principled than register-level lockstep, which is too noisy for compiler-portable C ports.
A fully verified MMIO trace becomes a permanent regression oracle
— useful both for this project and for any future Rockchip DDR
reverse-engineering work. The .mmio-trace file is the real deliverable.
