====== MegabitChip: Session 2026-04-21 (Reloc-Splicer Pass) ======

**Goal:** prove source-rebuilt parity with vendor DDR blob by splicing each matching-decomp ''candidate.o'' into the vendor binary (with full relocation resolution) and running the reachability trace. Baseline vendor reaches **54** functions at 2M-instruction budget. Rebuild target: match.

===== Result =====

  * reached 17 → **47** (+30 functions) through two bug-class fixes
  * 16 → 0 unresolved linker failures
  * Bisection infrastructure in place for the remaining 7-function gap

===== What was built =====

==== reloc_splice.py (~/projects/AMPere/benchmark/) ====

Reloc-resolving splicer. Steps per ''candidate.o'':

  * Links via GNU ''ld'' with ''--section-start=.text=''
  * Resolves every external symbol via ''--defsym=NAME=ADDR'' from a 484-entry symbol table.
  * ''objcopy -O binary -j .text'' to extract final bytes.
  * Splices into the vendor image at the function's blob offset.
  * NOP-pads any remainder (candidate smaller than vendor).

Symbol table composition:

  * **fun_table** (338): every ''FUN_xxx'' / ''fn_xxx'' from the full Ghidra disassembly.
  * **port_syms** (100): named C exports from each ''candidate.o'' via ''nm --defined-only'' (e.g. ''memcpy_aligned'', ''ddr_bus_grf_init'', ''emit_all_atags'').
  * **port_overrides** (11): semantic-rename helpers that candidates declare extern but no other port defines (''fn_recurse''→0x29f4, ''ddr_read_timing_param''→0xde34, ''fn_mr_read_helper''→0x2e88, ''fn_warn_fmt''→0x1053c, ''fn_phy_write''→0x636c, ''fn_inner_train''→0x27e0, ''ddrctl_vendor_commit''→0x186c, ''ddr_phy_training_0000''→0xc3d8, ''fn_apply_cur''→0xf170, ''fn_train_inner''→0x9508, ''fn_f60''→0xf60). Each cross-referenced by pattern-matching vendor's BL sequence.
  * **data_syms** (62): manually mapped data addresses (''uart_ptr_store'', ''log_head'', ''magic_header_table'', 30 string literals, etc).
  * **mmio_syms** (29): absolute BUS_GRF register addresses.
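
The splicer steps listed above can be sketched roughly as follows. This is a minimal illustration, not the contents of ''reloc_splice.py'': the function names (''build_ld_cmd'', ''build_objcopy_cmd'', ''splice''), the default ''ld'' executable name, and the use of the function's vendor address as the ''.text'' start are all assumptions. The NOP padding uses the standard AArch64 ''NOP'' encoding (0xD503201F).

```python
import struct

# AArch64 NOP instruction, stored little-endian as it appears in the image.
AARCH64_NOP = struct.pack("<I", 0xD503201F)


def build_ld_cmd(obj, out_elf, text_addr, symbols, ld="ld"):
    """Build the argv that links one candidate at a fixed .text address.

    `symbols` maps external symbol names to absolute addresses; each one
    becomes a --defsym so ld can resolve the candidate's relocations.
    `text_addr` is assumed here to be the function's vendor address.
    """
    cmd = [ld, f"--section-start=.text={text_addr:#x}"]
    # Sorted so the command line is reproducible between runs.
    cmd += [f"--defsym={name}={addr:#x}" for name, addr in sorted(symbols.items())]
    return cmd + ["-o", out_elf, obj]


def build_objcopy_cmd(elf, out_bin, objcopy="objcopy"):
    """Build the argv that extracts only the linked .text bytes."""
    return [objcopy, "-O", "binary", "-j", ".text", elf, out_bin]


def splice(image, offset, code, vendor_size):
    """Return a copy of `image` with `code` written over the vendor function.

    Any remainder of the vendor function's slot is filled with NOPs.
    A candidate larger than the vendor function is rejected.
    """
    if len(code) > vendor_size:
        raise ValueError("candidate larger than vendor function")
    pad = vendor_size - len(code)
    if pad % 4:
        raise ValueError("padding is not a whole number of instructions")
    patched = code + AARCH64_NOP * (pad // 4)
    return image[:offset] + patched + image[offset + vendor_size:]
```

In a real run the two argv lists would be passed to ''subprocess.run(..., check=True)'' before calling ''splice'' on the extracted bytes. Keeping command construction separate from execution makes the link line easy to log and compare between candidates.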
==== reloc_bisect.sh / pair_4f8.sh ====

Hot-path and pairwise bisection harnesses that splice subsets of candidates and run the trace to isolate which candidate(s) regress reachability.

===== Bug classes found =====

==== 1. Jump-table rodata silently dropped ====

**Symptom:** ''113_FUN_4f8'' (ddr_bus_grf_init) and ''47_FUN_1033c'' (UART putchar) spliced together collapse reached from 54 to 17. Spliced singly, each is fine.

**Cause:** clang -O2 lowers the 10-arm switch in fn_4f8 to a jump table in ''.rodata''. ''objcopy -j .text'' drops ''.rodata''. The linked binary has an ''adrp+add'' pointing at a missing section, so the table lookup reads junk bytes from .text and the computed ''br'' branches to a bogus arm that returns ''0xFD890000'' (case 0) instead of ''0xFEB50000'' (case 2). The BUS_GRF mux ends up wrong; UART putchar then polls a non-existent UART and busy-loops forever.

**Fix:** add ''-fno-jump-tables'' to the canonical compile line. Discovered by memory-write instrumentation that showed the wrong ''uart_ptr_store'' value (0xfd890000 instead of 0xfeb50000).

**Canonical compile line** (updated):

  clang -O2 -ffreestanding -mgeneral-regs-only -fno-pic \
        -fno-stack-protector -fno-jump-tables \
        --target=aarch64-none-elf -c candidate.c -o candidate.o

Follow-up sweep identified these candidates with ''.rodata'': 43_FUN_2110, 49_FUN_dcc, 94_FUN_217c, 112_FUN_72d8, 113_FUN_4f8.

==== 2. void signature drops vendor's x0 mutation ====

**Symptom:** ''06_FUN_27e0'' (ddrctl_vendor_commit) spliced into a hot-path build drops reached from 54 to 43.

**Cause:** vendor's fn_27e0 ends with ''add x0, x0, #0x10000; ret'', implicitly returning ''ch_base+0x10000''. Our candidate was declared ''void'' and computed the same address in a scratch register (x8), leaving x0 unchanged. The caller relied on the mutated x0 for downstream MMIO access, so its writes landed in the wrong region and the DDR init path silently died.

**Fix:** return a ''uint8_t *'' pointing at ddrctl (= ch_base+0x10000).

This is a **class of bugs**.
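
The x0 mismatch described above lends itself to a mechanical check. The sketch below is a rough heuristic under stated assumptions: it takes plain disassembly text, one instruction per string, and treats the first operand of any non-store, non-branch, non-compare instruction as the destination register. The helper names (''dest_reg'', ''tail_x0_writes'', ''needs_review'') are hypothetical. Apart from the ''add x0, x0, #0x10000; ret'' tail, which is taken from the vendor function, the instruction sequences are illustrative only.

```python
# Mnemonics whose first operand is not a destination register (simplified).
NON_WRITING = {"cmp", "cmn", "tst", "b", "bl", "br", "blr", "ret", "nop"}
NON_WRITING_PREFIXES = ("st", "b.", "cb", "tb")  # stores, conditional branches


def dest_reg(insn):
    """Return the destination register of one instruction, or None."""
    parts = insn.strip().split(None, 1)
    if len(parts) < 2:
        return None
    mnemonic, operands = parts[0].lower(), parts[1]
    if mnemonic in NON_WRITING or mnemonic.startswith(NON_WRITING_PREFIXES):
        return None
    return operands.split(",")[0].strip().lower()


def tail_x0_writes(insns, window=4):
    """List instructions writing x0/w0 in the last `window` before the final ret."""
    body = [i for i in insns if i.strip()]
    if body and body[-1].split()[0].lower() == "ret":
        body = body[:-1]
    return [i for i in body[-window:] if dest_reg(i) in ("x0", "w0")]


def needs_review(vendor_insns, candidate_insns, window=4):
    """Flag a port whose tail leaves x0 alone where the vendor tail mutates it."""
    return bool(tail_x0_writes(vendor_insns, window)) and not tail_x0_writes(
        candidate_insns, window
    )


# Vendor tail from fn_27e0; the candidate tail is an illustrative stand-in.
vendor = ["str w9, [x8]", "add x0, x0, #0x10000", "ret"]
candidate = ["add x8, x0, #0x10000", "str w9, [x8]", "ret"]
flagged = needs_review(vendor, candidate)
```

The check is deliberately conservative in one direction only: it ignores calls (which also clobber x0) and writes earlier than the window, so a clean result is not proof that the port's return value matches.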
Rule: before finalizing any void port, diff the vendor's last 2-4 instructions before ''ret'' against the candidate's; flag any x0/x1 arithmetic the vendor does that the port doesn't.

===== Scoreboard =====

  * **Link gate**: 79 of 100 candidates splice clean, 21 skipped (candidate larger than the vendor function; mostly ''-fno-jump-tables'' rebuilds pushed a few over).
  * **Reach gate**: rebuilt blob reaches 47/54 functions (was 17 before the two fixes). Missing 7: 0x174c, 0x1770, 0x29f4, 0x2e88, 0x3268, 0x430c, 0x6d90, all early DDR-setup helpers.
  * **Reloc types handled**: R_AARCH64_CALL26, R_AARCH64_JUMP26, R_AARCH64_ADR_PREL_PG_HI21, R_AARCH64_ADD_ABS_LO12_NC, R_AARCH64_LDST32_ABS_LO12_NC.

===== Next steps =====

  * Finish bisection on the remaining non-hot-path regressions (hot+A drops to 47, hot+B to 49).
  * Sweep all candidates under ''-fno-jump-tables'': 4 already rebuilt, ~10 more likely have switches.
  * Audit void-return ports for vendor-side x0 mutation (heuristic: vendor's last non-ret instruction touches x0).
  * **Simulation escalation**: QEMU + gdb for step-level vendor↔rebuilt comparison at divergence points; likely the only way to find the remaining subtle bugs (micro-ABI, struct layout, compiler fold differences).

===== Files added / touched =====

  * ''reloc_splice.py'': the splicer
  * ''reloc_bisect.sh'': single-splice hot-path bisector
  * ''06_FUN_27e0/candidate.c'': void → returning pointer
  * ''113_FUN_4f8/candidate.o'': rebuilt with ''-fno-jump-tables''
  * ''74_FUN_9508'' and ''90_FUN_c2c'': rebuilt with ''-fno-stack-protector'' to drop ''__stack_chk_fail'' externs

===== Memories added =====

  * ''feedback_megabitchip_reloc_splicer.md'': captures both bug classes with //why// and //how to apply// fields for future sessions.