MegabitChip — Session 2026-04-21 (Reloc-Splicer Pass)
Goal: prove source-rebuilt parity with vendor DDR blob by splicing each matching-decomp candidate.o into the vendor binary (with full relocation resolution) and running the reachability trace. Baseline vendor reaches 54 functions at 2M-instruction budget. Rebuild target: match.
Result
- reached 17 → 47 (+30 functions) through two bug-class fixes
- 16 → 0 unresolved linker failures
- Bisection infrastructure in place for the remaining 7-function gap
What was built
reloc_splice.py (~/projects/AMPere/benchmark/)
Reloc-resolving splicer. Steps per candidate.o:
- Links via GNU ld with --section-start=.text=<blob_addr>.
- Resolves every external symbol via --defsym=NAME=ADDR from a 484-entry symbol table.
- Runs objcopy -O binary -j .text to extract the final bytes.
- Splices into the vendor image at the function's blob offset.
- NOP-pads any remainder (candidate smaller than vendor).
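The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of reloc_splice.py: the toolchain prefix (aarch64-none-elf-), the helper names, and the output file names are assumptions; only the ld/objcopy flags come from the notes.

```python
import struct

# AArch64 NOP (0xD503201F), stored little-endian as it appears in the image.
AARCH64_NOP = struct.pack("<I", 0xD503201F)


def ld_command(obj, out_elf, blob_addr, symbols, ld="aarch64-none-elf-ld"):
    """Build the GNU ld argv that pins .text and defines every external symbol.

    The `ld` binary name and `out_elf` path are assumptions for illustration.
    """
    cmd = [ld, f"--section-start=.text={blob_addr:#x}"]
    cmd += [f"--defsym={name}={addr:#x}" for name, addr in sorted(symbols.items())]
    return cmd + [obj, "-o", out_elf]


def objcopy_command(in_elf, out_bin, objcopy="aarch64-none-elf-objcopy"):
    """Build the objcopy argv that extracts only the linked .text bytes."""
    return [objcopy, "-O", "binary", "-j", ".text", in_elf, out_bin]


def splice(image, offset, vendor_size, code):
    """Overwrite one function slot in the image, NOP-padding the remainder."""
    if len(code) > vendor_size:
        raise ValueError("candidate larger than vendor function; skip it")
    if vendor_size % 4 or len(code) % 4:
        raise ValueError("AArch64 code must be a multiple of 4 bytes")
    pad = AARCH64_NOP * ((vendor_size - len(code)) // 4)
    return image[:offset] + code + pad + image[offset + vendor_size:]
```

The two argv lists would be passed to subprocess.run(cmd, check=True) in sequence; splice() then works purely on bytes, so it can be checked without a toolchain installed.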
Symbol table composition:
- fun_table (338): every FUN_xxx/fn_xxx from the full Ghidra disassembly.
- port_syms (100): named C exports from each candidate.o via nm --defined-only (e.g. memcpy_aligned, ddr_bus_grf_init, emit_all_atags).
- port_overrides (11): semantic-rename helpers that candidates declare extern but no other port defines (fn_recurse→0x29f4, ddr_read_timing_param→0xde34, fn_mr_read_helper→0x2e88, fn_warn_fmt→0x1053c, fn_phy_write→0x636c, fn_inner_train→0x27e0, ddrctl_vendor_commit→0x186c, ddr_phy_training_0000→0xc3d8, fn_apply_cur→0xf170, fn_train_inner→0x9508, fn_f60→0xf60). Each cross-referenced by pattern-matching vendor's BL sequence.
- data_syms (62): manually mapped data addresses (uart_ptr_store, log_head, magic_header_table, 30 string literals, etc.).
- mmio_syms (29): absolute BUS_GRF register addresses.
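One way the groups could be flattened into a single table before being turned into --defsym flags is sketched below. How the real script orders or de-duplicates the groups is not recorded here, so the conflict policy (identical duplicates allowed, disagreeing addresses rejected) and the function name are assumptions.

```python
def merge_symbols(groups):
    """Merge (group_name, {symbol: address}) pairs into one flat table.

    A symbol may appear in several groups only if every group agrees on its
    address; a disagreement raises instead of silently picking a winner.
    """
    merged = {}
    origin = {}
    for group_name, table in groups:
        for sym, addr in table.items():
            if sym in merged and merged[sym] != addr:
                raise ValueError(
                    f"{sym}: {origin[sym]} says {merged[sym]:#x}, "
                    f"{group_name} says {addr:#x}"
                )
            merged.setdefault(sym, addr)
            origin.setdefault(sym, group_name)
    return merged


# Two entries taken from the port_overrides list; the second group is a
# hypothetical overlap added only to show that agreeing duplicates are fine.
table = merge_symbols([
    ("port_overrides", {"fn_f60": 0xF60, "fn_phy_write": 0x636C}),
    ("fun_table", {"fn_f60": 0xF60}),
])
```

Failing loudly on a disagreement is a deliberate choice in this sketch: a wrong --defsym address links cleanly and only shows up later as a reachability drop.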
reloc_bisect.sh / pair_4f8.sh
Hot-path and pairwise bisection harnesses that splice subsets of candidates and run the trace to isolate which candidate(s) regress reachability.
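The bisection idea can be sketched as below. The shell harnesses themselves are not reproduced; reached() is a stand-in for "splice this subset, run the trace, return the function count", and fake_trace only mimics numbers reported elsewhere in these notes.

```python
from itertools import combinations


def find_regressions(candidates, reached, baseline):
    """Return (singles, pairs) whose splice drops reachability below baseline.

    Candidates that already regress on their own are excluded from the
    pairwise pass, so every reported pair is a genuine interaction.
    """
    singles = [c for c in candidates if reached(frozenset([c])) < baseline]
    clean = [c for c in candidates if c not in singles]
    pairs = [
        (a, b)
        for a, b in combinations(clean, 2)
        if reached(frozenset([a, b])) < baseline
    ]
    return singles, pairs


def fake_trace(subset):
    """Hypothetical trace result used only to exercise the search."""
    if "06_FUN_27e0" in subset:
        return 43
    if {"113_FUN_4f8", "47_FUN_1033c"} <= subset:
        return 17
    return 54


singles, pairs = find_regressions(
    ["06_FUN_27e0", "113_FUN_4f8", "47_FUN_1033c", "94_FUN_217c"],
    fake_trace,
    baseline=54,
)
```

The pairwise pass is quadratic in the number of clean candidates, so in practice it would likely be limited to the hot-path set.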
Bug classes found
1. Jump-table rodata silently dropped
Symptom: 113_FUN_4f8 (ddr_bus_grf_init) + 47_FUN_1033c
(UART putchar) spliced together → reached collapses from 54 to 17.
Singly each splices fine.
Cause: clang -O2 lowers the 10-arm switch in fn_4f8 to a jump table in
.rodata. objcopy -j .text drops .rodata. Linked binary has
adrp+add pointing at a missing section → computed br reads junk
bytes from .text → branches to a bogus arm that returns
0xFD890000 (case 0) instead of 0xFEB50000 (case 2). BUS_GRF mux
ends up wrong; UART putchar then polls a non-existent UART and busy-loops
forever.
Fix: add -fno-jump-tables to the canonical compile line. Discovered
by memory-write instrumentation that showed wrong uart_ptr_store value
(0xfd890000 instead of 0xfeb50000).
Canonical compile line (updated):
clang -O2 -ffreestanding -mgeneral-regs-only -fno-pic \
-fno-stack-protector -fno-jump-tables \
--target=aarch64-none-elf -c candidate.c -o candidate.o
Follow-up sweep identified more candidates with .rodata: 43_FUN_2110,
49_FUN_dcc, 94_FUN_217c, 112_FUN_72d8, 113_FUN_4f8.
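A sweep like this can be approximated by parsing objdump -h output for non-empty .rodata sections, as in the sketch below. The objdump binary name and the helper names are assumptions; the parser only relies on section rows starting with an index, a name, and a hex size.

```python
import subprocess


def section_sizes(objdump_h_text):
    """Parse `objdump -h` output into {section_name: size_in_bytes}."""
    sizes = {}
    for line in objdump_h_text.splitlines():
        fields = line.split()
        # Section rows start with a decimal index, then name, then hex size.
        if len(fields) >= 3 and fields[0].isdigit():
            try:
                sizes[fields[1]] = int(fields[2], 16)
            except ValueError:
                continue
    return sizes


def dropped_sections(objdump_h_text):
    """Non-empty read-only data sections that `objcopy -j .text` would discard."""
    return sorted(
        name
        for name, size in section_sizes(objdump_h_text).items()
        if size > 0 and name.startswith(".rodata")
    )


def check_object(path, objdump="aarch64-none-elf-objdump"):
    """Run objdump on one candidate.o (binary name is an assumption)."""
    result = subprocess.run(
        [objdump, "-h", path], check=True, capture_output=True, text=True
    )
    return dropped_sections(result.stdout)
```

Any candidate for which check_object() returns a non-empty list would need either a rebuild with -fno-jump-tables or a splicer that also places .rodata.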
2. void signature drops vendor's x0 mutation
Symptom: 06_FUN_27e0 (ddrctl_vendor_commit) spliced into a
hot-path build drops reached 54 → 43.
Cause: vendor's fn_27e0 final instructions are
add x0, x0, #0x10000; ret — implicitly returning ch_base+0x10000.
Our candidate was declared void and computed the same address in a
scratch reg (x8), leaving x0 unchanged. Caller relied on the mutated x0
for downstream MMIO access → writes landed in the wrong region → DDR
init path silently died.
Fix: return uint8_t * pointing at ddrctl (= ch_base+0x10000).
This is a class of bugs. Rule: before finalizing any void port, diff vendor's last 2-4 insns before RET against the candidate's; flag any x0/x1 arithmetic vendor does that the port doesn't.
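The rule can be turned into a rough automated check over disassembly text. The sketch below is a heuristic under stated assumptions: it treats the first operand as the destination, treats all st* mnemonics as non-writing, and ignores writeback addressing; the function names and the candidate instruction lines are hypothetical.

```python
import re

# Mnemonics whose first operand is not a destination register.
NO_DEST = re.compile(
    r"^(b(\.\w+)?|bl|br|blr|cbn?z|tbn?z|ret\w*|nop|cmp|cmn|tst|ccmp|ccmn)$"
)
PAIR_LOADS = {"ldp", "ldnp", "ldpsw", "ldxp", "ldaxp"}


def _canon(operand):
    """Map w0/x0 style operands to the 64-bit register name, else None."""
    m = re.fullmatch(r"[wx](\d+)", operand.strip().lower())
    return f"x{m.group(1)}" if m else None


def dest_regs(insn):
    """Best-effort set of general registers written by one instruction."""
    parts = insn.strip().split(None, 1)
    if len(parts) < 2:
        return set()
    mnem = parts[0].lower()
    if mnem.startswith("st") or NO_DEST.match(mnem):
        return set()
    operands = parts[1].split(",")
    count = 2 if mnem in PAIR_LOADS else 1
    return {r for r in map(_canon, operands[:count]) if r}


def return_reg_writes(insns, window=4):
    """Instructions in the last `window` before the final ret that write x0/x1."""
    rets = [i for i, s in enumerate(insns) if s.strip().lower().startswith("ret")]
    if not rets:
        return []
    end = rets[-1]
    return [s for s in insns[max(0, end - window):end] if dest_regs(s) & {"x0", "x1"}]


def needs_review(vendor, candidate, window=4):
    """True when vendor mutates x0/x1 before ret and the candidate does not."""
    return bool(return_reg_writes(vendor, window)) and not return_reg_writes(
        candidate, window
    )
```

For the fn_27e0 case, a vendor tail ending in `add x0, x0, #0x10000` followed by `ret` would be flagged against a candidate that only computes the address into x8.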
Scoreboard
- Link gate: 79 of 100 candidates splice clean, 21 skipped (candidate larger than vendor func; mostly -fno-jump-tables rebuilds pushed a few over).
- Reach gate: rebuilt blob reaches 47/54 functions (was 17 before the two fixes). Missing 7: 0x174c, 0x1770, 0x29f4, 0x2e88, 0x3268, 0x430c, 0x6d90, all early DDR-setup helpers.
- Reloc types handled: R_AARCH64_CALL26, R_AARCH64_JUMP26, R_AARCH64_ADR_PREL_PG_HI21, R_AARCH64_ADD_ABS_LO12_NC, R_AARCH64_LDST32_ABS_LO12_NC.
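For reference, R_AARCH64_CALL26 patches the 26-bit immediate of a BL with (target - pc) >> 2, and reading the same field back is how vendor BL targets can be cross-checked against symbol addresses. The sketch below shows that arithmetic; ld does the real resolution, the function names are illustrative, and the two addresses are sample values from these notes with no claim that one function calls the other.

```python
BL_OPCODE = 0x94000000   # BL: top six bits 100101
OPCODE_MASK = 0xFC000000
IMM26_MASK = 0x03FFFFFF


def encode_bl(pc, target):
    """Encode `bl target` for an instruction located at address `pc`."""
    offset = target - pc
    if offset % 4:
        raise ValueError("pc and target must differ by a multiple of 4")
    if not -(1 << 27) <= offset < (1 << 27):
        raise ValueError("target out of BL range (+/-128 MiB)")
    return BL_OPCODE | ((offset >> 2) & IMM26_MASK)


def decode_bl(word, pc):
    """Return the branch target of a BL word at `pc`, or None if not a BL."""
    if (word & OPCODE_MASK) != BL_OPCODE:
        return None
    imm26 = word & IMM26_MASK
    if imm26 & (1 << 25):      # sign-extend the 26-bit field
        imm26 -= 1 << 26
    return pc + (imm26 << 2)


word = encode_bl(0x4F8, 0x1033C)
target = decode_bl(word, 0x4F8)
```

R_AARCH64_JUMP26 uses the same immediate field on a plain B instruction, whose top six bits are 000101 instead.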
Next steps
- Finish bisection on the remaining non-hot-path regressions (hot+A
drops to 47, hot+B to 49).
- Sweep all candidates under -fno-jump-tables: 4 already rebuilt, ~10 more likely have switches.
- Audit void-return ports for vendor-side x0 mutation (heuristic:
vendor's last non-ret insn touches x0).
- Simulation escalation: QEMU + gdb for step-level vendor↔rebuilt
compare at divergence points; likely the only way to find the
remaining subtle bugs (micro-ABI, struct layout, compiler fold differences).
Files added / touched
- reloc_splice.py: the splicer
- reloc_bisect.sh: single-splice hot-path bisector
- 06_FUN_27e0/candidate.c: void → returning pointer
- 113_FUN_4f8/candidate.o: rebuilt with -fno-jump-tables
- 74_FUN_9508 + 90_FUN_c2c: rebuilt with -fno-stack-protector to drop __stack_chk_fail externs
Memories added
- feedback_megabitchip_reloc_splicer.md: captures both bug classes with "why" and "how to apply" fields for future sessions.
