
MegabitChip — Session 2026-04-21 (Reloc-Splicer Pass)

Goal: prove that the source-rebuilt binary reaches parity with the vendor DDR blob by splicing each matching-decomp candidate.o into the vendor binary (with full relocation resolution) and running the reachability trace. Baseline: the vendor blob reaches 54 functions at a 2M-instruction budget. Rebuild target: match that count.

Result

What was built

reloc_splice.py (~/projects/AMPere/benchmark/)

Reloc-resolving splicer. Steps per candidate.o:

484-entry symbol table.

Symbol table composition:

Ghidra disassembly.

nm --defined-only (e.g. ''memcpy_aligned'', ''ddr_bus_grf_init'',

  ''emit_all_atags'').
* **port_overrides** (11): semantic-rename helpers that candidates
  declare extern but no other port defines
  (''fn_recurse''→0x29f4, ''ddr_read_timing_param''→0xde34,
  ''fn_mr_read_helper''→0x2e88, ''fn_warn_fmt''→0x1053c,
  ''fn_phy_write''→0x636c, ''fn_inner_train''→0x27e0,
  ''ddrctl_vendor_commit''→0x186c, ''ddr_phy_training_0000''→0xc3d8,
  ''fn_apply_cur''→0xf170, ''fn_train_inner''→0x9508, ''fn_f60''→0xf60).
  Each address was cross-referenced by pattern-matching the vendor's BL sequence.
* **data_syms** (62): manually mapped data addresses
  (''uart_ptr_store'', ''log_head'', ''magic_header_table'', 30 string
  literals, etc.).
* **mmio_syms** (29): absolute BUS_GRF register addresses.
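
A minimal sketch of how the groups above could be merged into one table, assuming each group is a plain name-to-address mapping. The ''PORT_OVERRIDES'' values are copied from the list above; ''merge_symbol_groups'' and its conflict check are illustrative assumptions, not the actual code in reloc_splice.py.

```python
# Addresses copied from the port_overrides bullet above.
PORT_OVERRIDES = {
    "fn_recurse": 0x29F4,
    "ddr_read_timing_param": 0xDE34,
    "fn_mr_read_helper": 0x2E88,
    "fn_warn_fmt": 0x1053C,
    "fn_phy_write": 0x636C,
    "fn_inner_train": 0x27E0,
    "ddrctl_vendor_commit": 0x186C,
    "ddr_phy_training_0000": 0xC3D8,
    "fn_apply_cur": 0xF170,
    "fn_train_inner": 0x9508,
    "fn_f60": 0xF60,
}


def merge_symbol_groups(groups):
    """Merge {group_name: {symbol: address}} into one symbol table.

    Hypothetical helper: raises ValueError when two groups give the
    same symbol different addresses, so a bad manual mapping fails
    loudly instead of silently winning by insertion order.
    """
    table = {}
    origin = {}
    for group, syms in groups.items():
        for name, addr in syms.items():
            if name in table and table[name] != addr:
                raise ValueError(
                    f"{name}: {origin[name]} says {table[name]:#x}, "
                    f"{group} says {addr:#x}"
                )
            table[name] = addr
            origin.setdefault(name, group)
    return table
```

Failing on a conflict rather than letting the last group win keeps a wrong manual address from masking a correct one taken from the disassembly.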

reloc_bisect.sh / pair_4f8.sh

Hot-path and pairwise bisection harnesses that splice subsets of candidates and run the trace to isolate which candidate(s) regress reachability.
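
The pairwise idea can be sketched as follows. This is an illustration in Python rather than the shell harness itself; ''reach'' is a hypothetical callback that splices the given set of candidates, runs the trace, and returns the number of functions reached.

```python
from itertools import combinations


def regressing_pairs(candidates, reach, baseline):
    """Return pairs that regress reachability only when spliced together.

    candidates: list of candidate names
    reach:      hypothetical callback, frozenset of names -> reached count
    baseline:   reached count of the unmodified vendor blob
    """
    # Candidates that already regress on their own are a different bug
    # class; exclude them so each reported pair is a true interaction.
    solo_ok = [c for c in candidates if reach(frozenset([c])) >= baseline]
    return [
        (a, b)
        for a, b in combinations(solo_ok, 2)
        if reach(frozenset([a, b])) < baseline
    ]
```

The number of trace runs grows quadratically with the candidate count, so this is only practical after hot-path bisection has narrowed the set.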

Bug classes found

1. Jump-table rodata silently dropped

Symptom: 113_FUN_4f8 (ddr_bus_grf_init) and 47_FUN_1033c (UART putchar) spliced together → the reached count collapses from 54 to 17. Each splices fine on its own.

Cause: clang -O2 lowers the 10-arm switch in fn_4f8 to a jump table in .rodata, and objcopy -j .text drops .rodata. The linked binary's adrp+add then points at a missing section, so the jump-table lookup reads junk bytes from .text and the computed br lands on a bogus arm that returns 0xFD890000 (case 0) instead of 0xFEB50000 (case 2). The BUS_GRF mux ends up wrong; UART putchar then polls a non-existent UART and busy-loops forever.

Fix: add -fno-jump-tables to the canonical compile line. Discovered through memory-write instrumentation, which showed the wrong uart_ptr_store value (0xFD890000 instead of 0xFEB50000).

Canonical compile line (updated):

clang -O2 -ffreestanding -mgeneral-regs-only -fno-pic \
      -fno-stack-protector -fno-jump-tables \
      --target=aarch64-none-elf -c candidate.c -o candidate.o

A follow-up sweep found five candidates that emit .rodata: 43_FUN_2110, 49_FUN_dcc, 94_FUN_217c, 112_FUN_72d8, 113_FUN_4f8.
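
A sweep of this kind can be sketched as a filter over section-header listings. The helper below is an assumption about how such a check might look, not the script used in this session: it takes the text printed by objdump -h (or llvm-objdump -h) for one candidate.o and reports any non-empty .rodata* section. It relies only on the first three columns (index, name, hex size), which both tools print.

```python
def nonempty_rodata(section_listing):
    """Return [(section_name, size_in_bytes)] for non-empty .rodata* sections.

    section_listing: text output of `objdump -h candidate.o`.
    Hypothetical helper; header and flag lines are skipped because
    their first field is not a section index.
    """
    hits = []
    for line in section_listing.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        try:
            size = int(fields[2], 16)  # size column is hexadecimal
        except ValueError:
            continue
        if fields[1].startswith(".rodata") and size > 0:
            hits.append((fields[1], size))
    return hits
```

Any candidate for which this returns a non-empty list would lose data under objcopy -j .text and is worth rebuilding or inspecting.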

2. void signature drops vendor's x0 mutation

Symptom: 06_FUN_27e0 (ddrctl_vendor_commit) spliced into a hot-path build drops the reached count from 54 to 43.

Cause: the vendor's fn_27e0 ends with add x0, x0, #0x10000; ret, implicitly returning ch_base+0x10000. Our candidate was declared void and computed the same address in a scratch register (x8), leaving x0 unchanged. The caller relied on the mutated x0 for downstream MMIO access, so writes landed in the wrong region and the DDR init path silently died.

Fix: declare the port as returning uint8_t * and return the ddrctl pointer (= ch_base+0x10000).

This is a class of bugs. Rule: before finalizing any void port, diff the vendor's last 2-4 instructions before RET against the candidate's, and flag any x0/x1 arithmetic the vendor does that the port doesn't.
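
The rule above can be approximated mechanically. The sketch below is a heuristic under stated assumptions, not an existing tool: it takes each function as a list of disassembly strings such as "add x0, x0, #0x10000", treats the first operand as the destination, and skips stores, compares and branches. Multi-destination forms such as ldp are only checked on their first operand.

```python
RESULT_REGS = {"0", "1"}  # x0/w0 and x1/w1
NO_DEST = {"cmp", "cmn", "tst", "b", "bl", "br", "blr",
           "cbz", "cbnz", "tbz", "tbnz", "nop", "ret"}


def tail_result_writes(insns, window=4):
    """Result-register numbers written in the last `window` insns before the final ret."""
    rets = [i for i, s in enumerate(insns) if s.split()[:1] == ["ret"]]
    if not rets:
        return {}
    end = rets[-1]
    writes = {}
    for text in insns[max(0, end - window):end]:
        parts = text.replace(",", " ").split()
        if len(parts) < 2:
            continue
        mnem, dest = parts[0], parts[1]
        # Stores read their first operand; conditional branches have none.
        if mnem.startswith("st") or mnem.startswith("b.") or mnem in NO_DEST:
            continue
        if dest[:1] in ("x", "w") and dest[1:] in RESULT_REGS:
            writes[dest[1:]] = text
    return writes


def missing_result_writes(vendor, candidate, window=4):
    """Vendor tail instructions that write x0/x1 with no counterpart in the candidate."""
    v = tail_result_writes(vendor, window)
    c = tail_result_writes(candidate, window)
    return [text for reg, text in v.items() if reg not in c]
```

A non-empty result does not prove a bug, since the vendor write may be dead, but it marks the port for a manual look before its void signature is accepted.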

Scoreboard

(candidate larger than vendor func — mostly -fno-jump-tables

  rebuilds pushed a few over).
* **Reach gate**: rebuilt blob reaches 47/54 functions (was 17 before
  the two fixes). Missing 7: 0x174c, 0x1770, 0x29f4, 0x2e88, 0x3268,
  0x430c, 0x6d90 — all early DDR-setup helpers.
* **Reloc types handled**: R_AARCH64_CALL26, R_AARCH64_JUMP26,
  R_AARCH64_ADR_PREL_PG_HI21, R_AARCH64_ADD_ABS_LO12_NC,
  R_AARCH64_LDST32_ABS_LO12_NC.
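
For reference, the five relocation types listed above patch instruction words as sketched below. The field positions follow the AArch64 ELF ABI; the function name and the error handling are assumptions for illustration, not the code in reloc_splice.py. S is the resolved symbol address, A the addend, and P the address of the instruction being patched.

```python
def patch_insn(insn, rtype, S, A, P):
    """Return the 32-bit instruction word with the relocation applied."""
    X = S + A
    if rtype in ("R_AARCH64_CALL26", "R_AARCH64_JUMP26"):
        off = X - P
        if off % 4 or not -(1 << 27) <= off < (1 << 27):
            raise ValueError(f"{rtype}: offset {off:#x} out of range")
        # imm26 holds the word offset in bits [25:0].
        return (insn & 0xFC000000) | ((off >> 2) & 0x03FFFFFF)
    if rtype == "R_AARCH64_ADR_PREL_PG_HI21":
        pages = ((X & ~0xFFF) - (P & ~0xFFF)) >> 12
        if not -(1 << 20) <= pages < (1 << 20):
            raise ValueError(f"{rtype}: page delta {pages} out of range")
        immlo = pages & 0x3             # bits [30:29]
        immhi = (pages >> 2) & 0x7FFFF  # bits [23:5]
        return (insn & 0x9F00001F) | (immlo << 29) | (immhi << 5)
    if rtype == "R_AARCH64_ADD_ABS_LO12_NC":
        # imm12 in bits [21:10]; _NC means no overflow check.
        return (insn & 0xFFC003FF) | ((X & 0xFFF) << 10)
    if rtype == "R_AARCH64_LDST32_ABS_LO12_NC":
        lo12 = X & 0xFFF
        if lo12 % 4:
            # Choice made for this sketch: reject targets a 32-bit
            # scaled load/store cannot address exactly.
            raise ValueError(f"{rtype}: target {X:#x} not 4-byte aligned")
        return (insn & 0xFFC003FF) | ((lo12 >> 2) << 10)
    raise NotImplementedError(rtype)
```

Raising on an unknown type makes a candidate that introduces a sixth relocation kind fail at splice time instead of producing a silently unpatched instruction.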

Next steps

drops to 47, hot+B to 49).

~10 more likely have switches.

vendor's last non-ret insn touches x0).

compare at divergence points; likely the only way to find the

  remaining subtle bugs (micro-ABI, struct layout, compiler fold
  differences).

Files added / touched

to drop __stack_chk_fail externs

Memories added

with why and how to apply fields for future sessions.