====== MegabitChip: Session 2026-04-21 Part 2 (Simulation & MMIO-Diff) ======

Continues the 2026-04-21 reloc-splice session. Goal: close the 47/54 reach gap with simulation tooling.

===== Result =====

  * Three new validation tools built: ''lockstep.py'', ''mmio_diff.py'', ''check_asm.sh''.
  * Four more bugs found and fixed.
  * MMIO-write parity: **the first 82 of the vendor's 1007 writes match byte-for-byte** before the first divergence. This is concrete evidence that the early boot path is correctly rebuilt from source.
  * Replaced register-level lockstep with MMIO-trace diff as the primary gate (clang register-allocation noise obscured the real signal).

===== Tools added =====

==== check_asm.sh ====

Structural asm-diff gate. Classifies each benchmark dir as ''ASM_EXACT'', ''ORDER_DIFF'', ''REG_DIFF'', ''BRANCH_DIFF'', or ''STRUCTURAL_DIFF''. Current results (100 dirs tested):

  * ''ASM_EXACT'': 3
  * ''REG_DIFF'': 18 (same mnemonics and offsets, different registers; the useful audit signal)
  * ''BRANCH_DIFF'': 57 (inflated by partial-port stubs)
  * ''STRUCTURAL_DIFF'': 22
  * ''ORDER_DIFF'': 0

==== lockstep.py ====

Runs two Unicorn instances (vendor and rebuilt), steps one instruction at a time, and diffs all x-registers plus sp, pc, and nzcv. ''--fn-entry-only'' restricts the comparison to function entries to suppress noise inside function bodies.

**Verdict:** works mechanically but raises false alarms on clang register allocation ("vendor puts the intermediate in x0, our port puts it in x8"). Even when filtered to function entries, stale caller-saved registers still trigger mismatches. Use for targeted investigation, not as a gate.

==== mmio_diff.py ====

Logs every MMIO write (addr, size, val, caller_pc) from each run, then diffs the two sequences in order. MMIO is the silicon-observable behavior: if the write sequence matches, the rebuild is behaviorally equivalent as far as the hardware can see, regardless of how clang chose registers. The first divergent write, together with its caller_pc, pinpoints the bug in one line of output.

This replaces reach-bisection and lockstep as the primary validation gate going forward.

===== Bugs fixed =====

  * **54_FUN_9a68**: dst/src swap. Vendor copies ''*arg*'' → ''*(DAT)''; our port copied in the opposite direction. Surfaced by lockstep (first divergence at step 35, mid-copy).
  * **46_FUN_2e88** (MR read helper): ''mr_addr'' and ''byte_index'' were swapped in the port's C signature. Vendor asm uses w2 as the shift amount (''byte_index'') and ''w3<<8'' for MRCTRL1 (''mr_addr'').
  * **17_FUN_2340** (MR submit): vendor ends with ''mov w0, #0; ret'', explicitly returning 0. Our ''void'' port returned whatever clang left in x0 (often the ch_base pointer, 0xfe000000). Same bug class as ''fn_27e0''. Fixed by changing the signature to ''int mr_submit(...) { ...; return 0; }''.
  * **113_FUN_4f8 case-2 sub-0**: Ghidra mis-decompiled the BUS_GRF register addresses. For (grp=2, sub=0) the vendor writes ''0xFD5F4000'' and ''0xFD5F800C''; our port wrote ''BUS_GRF_BASE_CFG'' (0xFD5F0000) and ''BUS_GRF_DDR_ROUTE'' (0xFD5F0004). Fixed by hardcoding the actual vendor addresses.

===== Status =====

  * **Reach gate**: still 47/54. Each of the four fixes only exposes the next MMIO-level divergence, so reach count no longer discriminates progress.
  * **MMIO parity gate** (new primary): writes 1..82 are byte-identical; write 83 reveals a ''fn_62d8'' variant=1 vs variant=0 discrepancy (likely the caller passing the wrong arg). We keep peeling bug by bug.
  * Vendor total MMIO writes: **1007** within a 500k-instruction budget. Rebuilt total: **253** (it stops short because execution diverges, not because 253 is the complete sequence).

===== Next steps =====

  * Continue mmio_diff-driven debugging from write 83 onward. So far, each divergent write has traced back to either one Ghidra decompilation error or one ABI mismatch.
  * Consider dumping the FULL 1007-write vendor trace and using it as a **spec file**: every future rebuild must reproduce this exact sequence byte-for-byte.
  * When 1007/1007 match: move to Phase 4 (bare-metal on ampere via meitner rkdeveloptool). That is the final ground-truth check.
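A minimal sketch of the comparison at the heart of the mmio_diff gate, assuming each run has already been reduced to an ordered list of logged writes. ''MmioWrite'', ''Divergence'', ''first_divergence'' and ''describe'' are hypothetical names, not the real ''mmio_diff.py'' API. The sketch compares only (addr, size, val) and carries ''caller_pc'' along for the report, on the assumption that code addresses need not line up between the vendor and rebuilt binaries. A trace that ends early (like the rebuilt run's 253 writes) counts as a divergence at the first missing write.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MmioWrite:
    """One logged MMIO store (hypothetical record layout)."""
    addr: int
    size: int       # access width in bytes
    val: int
    caller_pc: int  # where the store came from; reported, not compared

    def key(self) -> tuple:
        # Only the hardware-visible part of the write is compared.
        return (self.addr, self.size, self.val)


@dataclass(frozen=True)
class Divergence:
    index: int                    # 1-based write number
    vendor: Optional[MmioWrite]   # None if the vendor trace ended first
    rebuilt: Optional[MmioWrite]  # None if the rebuilt trace ended first


def first_divergence(vendor: Sequence[MmioWrite],
                     rebuilt: Sequence[MmioWrite]) -> Optional[Divergence]:
    """Return the first write where the traces differ, or None if identical."""
    for i, (v, r) in enumerate(zip(vendor, rebuilt)):
        if v.key() != r.key():
            return Divergence(i + 1, v, r)
    if len(vendor) != len(rebuilt):
        # One trace is a strict prefix of the other: diverge at the first
        # write that exists in only one of them.
        n = min(len(vendor), len(rebuilt))
        v = vendor[n] if n < len(vendor) else None
        r = rebuilt[n] if n < len(rebuilt) else None
        return Divergence(n + 1, v, r)
    return None


def describe(d: Optional[Divergence]) -> str:
    """Format a divergence as a single report line."""
    if d is None:
        return "traces identical"

    def fmt(w: Optional[MmioWrite]) -> str:
        if w is None:
            return "<end of trace>"
        return f"{w.addr:#010x}/{w.size} <- {w.val:#x} @pc {w.caller_pc:#x}"

    return f"write {d.index}: vendor {fmt(d.vendor)} | rebuilt {fmt(d.rebuilt)}"


if __name__ == "__main__":
    # Invented values loosely modelled on the 113_FUN_4f8 bug: the rebuilt
    # port writes to the wrong BUS_GRF addresses. Values and PCs are made up.
    vendor = [MmioWrite(0xFD5F4000, 4, 0x1, 0x500),
              MmioWrite(0xFD5F800C, 4, 0x3, 0x508)]
    rebuilt = [MmioWrite(0xFD5F0000, 4, 0x1, 0x1500),
               MmioWrite(0xFD5F0004, 4, 0x3, 0x1508)]
    print(describe(first_divergence(vendor, rebuilt)))
```

Reporting a 1-based write number matches how this page counts writes (1..82 identical, 83 divergent), so the report line can be pasted straight into the status notes.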
===== Files added =====

  * ''~/src/rk3588-ddr-decompiled/lockstep.py''
  * ''~/src/rk3588-ddr-decompiled/mmio_diff.py''
  * ''~/projects/AMPere/benchmark/check_asm.sh''
  * ''~/projects/AMPere/benchmark/reloc_bisect.sh''

===== Memories updated =====

  * ''feedback_megabitchip_reloc_splicer.md'': added a section on MMIO-diff as the superior gate, plus the two new bug-class examples (fn_9a68 direction, fn_62d8 address confusion).

===== Observation =====

The iteration pace felt like peeling an onion: each fix revealed the next bug. But that IS the correct shape for matching decompilation with semantic tests. The MMIO sequence is the contract, each mismatch is a localized bug, and the tools converge us toward the vendor spec. This is far more principled than register-level lockstep, which is too noisy for compiler-portable C ports.

A fully verified MMIO trace becomes a permanent regression oracle, useful both for this project and for any future Rockchip DDR reverse-engineering work. The ''.mmio-trace'' file is the real deliverable.
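One hypothetical shape for the ''.mmio-trace'' file as a regression oracle: plain text, one write per line in fixed-width hex, so the spec diffs cleanly under ordinary text tools and can be checked byte-for-byte. The field order, the ''#'' comment convention and the names ''Write'', ''dump_trace'', ''load_trace'' and ''matches_spec'' are all assumptions for illustration, not the project's actual format. ''matches_spec'' ignores ''caller_pc'' for the same reason a diff gate would: it helps locate bugs but is not hardware-visible.

```python
import io
from typing import Iterable, List, NamedTuple, TextIO


class Write(NamedTuple):
    addr: int
    size: int       # access width in bytes
    val: int
    caller_pc: int


HEADER = "# addr size val caller_pc (all hex)\n"


def dump_trace(writes: Iterable[Write], out: TextIO) -> None:
    """Write one record per line; fixed-width hex keeps diffs aligned."""
    out.write(HEADER)
    for w in writes:
        # The value is zero-padded to the access width (2 hex digits per byte).
        out.write(f"{w.addr:08x} {w.size:x} {w.val:0{w.size * 2}x} {w.caller_pc:08x}\n")


def load_trace(src: TextIO) -> List[Write]:
    """Parse a trace, skipping blank lines and '#' comments."""
    writes = []
    for lineno, line in enumerate(src, 1):
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(fields)}")
        writes.append(Write(*(int(f, 16) for f in fields)))
    return writes


def matches_spec(spec: List[Write], rebuilt: List[Write]) -> bool:
    """Regression check: identical (addr, size, val) sequence, caller_pc ignored."""
    return [w[:3] for w in spec] == [w[:3] for w in rebuilt]


if __name__ == "__main__":
    # Round-trip an invented one-write trace through the text format.
    trace = [Write(0xFD5F4000, 4, 0x1, 0x500)]
    buf = io.StringIO()
    dump_trace(trace, buf)
    assert load_trace(io.StringIO(buf.getvalue())) == trace
    print(buf.getvalue(), end="")
```

Keeping ''caller_pc'' in the file costs little and preserves the one-line bug localization if a future rebuild regresses against the spec.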