Reference page for the pre-silicon simulation + audit tooling used
to guard the MegabitChip rebuild. All tools assume the canonical
compile line from the blitz session
and operate on blob images produced by reloc_splice.py.
Locations:
* ~/src/rk3588-ddr-decompiled/ on boltzmann: simulators and diff tools
* ~/projects/AMPere/benchmark/ on boltzmann: ports, audits, splicer
==== training_sim.py ====
Unicorn-based DDR training simulator with two modes:
* --mode pass (default, used by mmio_diff): every training-status register returns its “done/OK/trained” stub on every read.
* --mode bitflip --flip-count N --flip-mask 0xFFFFFFFF: the first N reads of each training-status address return an XOR'd (bad) value, then revert to the pass value. Exercises PHY retry / error-recovery paths.
Training-status addresses defined in is_training_status():
* 0x080, 0x090, 0x0B4, 0x3CC, 0x514, 0x684, 0xA24
* 0x10014, 0x10090, 0x10C84, 0x10514
Other addresses keep pass values even in bitflip mode, so signal stays focused on training retry behaviour.
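The bitflip semantics described above can be sketched as a small read-side stub. This is a minimal illustration, not the code in training_sim.py: the class name StatusStub and its read() method are hypothetical, and the real is_training_status() may match on more context than a bare offset.

```python
from collections import defaultdict

# Offsets copied from the list above. Assumption: a bare offset is enough
# to identify a training-status register in this sketch.
TRAINING_STATUS = {
    0x080, 0x090, 0x0B4, 0x3CC, 0x514, 0x684, 0xA24,
    0x10014, 0x10090, 0x10C84, 0x10514,
}


class StatusStub:
    """Hypothetical read stub: pass value, flipped for the first N reads."""

    def __init__(self, mode="pass", flip_count=0, flip_mask=0xFFFFFFFF):
        self.mode = mode
        self.flip_count = flip_count
        self.flip_mask = flip_mask
        self.reads = defaultdict(int)  # per-address read counter

    def read(self, addr, pass_value):
        # Pass mode, or any non-status address, always sees the pass value.
        if self.mode != "bitflip" or addr not in TRAINING_STATUS:
            return pass_value
        self.reads[addr] += 1
        if self.reads[addr] <= self.flip_count:
            return (pass_value ^ self.flip_mask) & 0xFFFFFFFF
        return pass_value
```

With flip_count=2, the first two reads of 0x080 return the inverted value and the third returns the pass value again, which is the point at which the retry loop under test should converge.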
==== sim_tripwire.py ====
Bin-style per-access capture on Unicorn. Per-access record:
(seq_idx, insn_tick, pc, addr, size, rw, val, region_tag, fn_name)
fn_name comes from PCResolver, which bisects the vendor function
table parsed from ddr_conservative_asm.s (============ FUN_<hex> @ <off>
headers, 115 entries). BLOB_BASE = 0xFF001000.
sim_tripwire.load_csv rehydrates a saved capture for offline analysis.
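A minimal sketch of the record and the PC-to-function lookup follows. It assumes the header offset is hexadecimal and relative to BLOB_BASE, and that a PC belongs to the nearest preceding header; the regex and the class body are illustrative, not the actual PCResolver.

```python
import bisect
import re
from collections import namedtuple

BLOB_BASE = 0xFF001000

# Field order taken from the per-access record described above.
Access = namedtuple(
    "Access",
    "seq_idx insn_tick pc addr size rw val region_tag fn_name",
)

# Assumed header shape: "============ FUN_<hex> @ <off>", offset in hex.
HEADER_RE = re.compile(r"=+\s*(FUN_[0-9a-fA-F]+)\s*@\s*(?:0x)?([0-9a-fA-F]+)")


class PCResolver:
    def __init__(self, asm_text, base=BLOB_BASE):
        entries = sorted(
            (base + int(off, 16), name)
            for name, off in HEADER_RE.findall(asm_text)
        )
        self.starts = [start for start, _ in entries]
        self.names = [name for _, name in entries]

    def resolve(self, pc):
        # Rightmost function whose start address is <= pc.
        i = bisect.bisect_right(self.starts, pc) - 1
        return self.names[i] if i >= 0 else None
```

Bisecting a sorted start-address list keeps each lookup logarithmic, which matters when every emulated memory access triggers one.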
CLI integration:
<code bash>
# training_sim: single blob, one CSV (--mode is pass or bitflip)
python3 training_sim.py <blob> --mode pass \
    --tripwire-out /tmp/tw.csv
# mmio_diff: vendor + rebuilt in one run, two CSVs
python3 mmio_diff.py --ignore-pc <vendor.bin> <rebuilt.bin> \
    --tripwire-out-vendor /tmp/tw-v.csv \
    --tripwire-out-rebuilt /tmp/tw-r.csv
</code>
==== tripwire_diff.py ====
PC-bucketed difflib.SequenceMatcher diff of two tripwire CSVs.
Records are grouped into buckets by fn_name (resolved from PC), and
within a bucket each record is compared by the key
(region, addr, rw, val, size). The key excludes PC (codegen
reg-alloc shifts it within a function) as well as seq_idx and
tick (both drift with path diffs). Bucketing by fn_name (not
seq_idx) lets the diff survive control-flow divergence in
bitflip mode.
Tiers from ratio():
* OK: byte-identical key sequences, suppressed unless --show-identical
* minor-diff: ratio ≥ suspect_threshold (default 0.9)
* SUSPECT: ratio < suspect_threshold, printed first with side-by-side sub-sequence
Fast path: quick_ratio() (multiset-intersection upper bound on ratio())
short-circuits buckets that share almost nothing.
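The bucketing and tiering can be sketched with the standard library alone. The function names bucket_keys and classify_bucket and the dict-shaped records are assumptions made for illustration; only the key fields, the tier names, and the 0.9 default come from the description above.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def bucket_keys(records):
    """Group comparison keys by function name, preserving access order."""
    buckets = defaultdict(list)
    for rec in records:
        key = (rec["region_tag"], rec["addr"], rec["rw"], rec["val"], rec["size"])
        buckets[rec["fn_name"]].append(key)
    return buckets


def classify_bucket(a, b, suspect_threshold=0.9):
    """Tier one function's key sequence from two captures."""
    if a == b:
        return "OK"
    sm = SequenceMatcher(None, a, b, autojunk=False)
    # quick_ratio() never underestimates ratio(), so a low value is
    # conclusive and the expensive alignment can be skipped.
    if sm.quick_ratio() < suspect_threshold:
        return "SUSPECT"
    return "minor-diff" if sm.ratio() >= suspect_threshold else "SUSPECT"
```

Passing autojunk=False is a deliberate choice in this sketch: MMIO traces repeat the same key many times, and the default junk heuristic would discount exactly those popular elements.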
==== bitflip_sweep.py ====
Per-status-address retry convergence test. Flips ONE training-status
register's first read at a time, checks whether rebuilt retry logic
writes different downstream register values than the pass baseline.
Uses BITFLIP_ONLY env var to narrow is_training_status() in
training_sim.py to a single address per run. 23 targets total.
2026-04-21 result on vendor blob LP5-2400: 18 convergent, 3 retry
(STAT CH1/CH2/CH3 → fn_2340 MRCTRL0 = 0x60), 2 not-exercised.
==== mmio_regions.py ====
Shared address → region classifier. classify(addr) → str returns
one of: DDR_MEM, DDRCTL, DDRCTL:SW, DDRCTL:MR,
DDRPHY, DDRPHY:TR, CRU, DDR_CRU, SCRU, GRF,
BUS_GRF, SGRF, PMU, FW_DDR, fn_9fcOTP (fixed from
“SCRAMBLE” after TRM cross-check on ), UART, SRAM,
PMU_SRAM, OTHER.
Imported by every trace/diff tool. mmio_diff --show-regions prints
a histogram of vendor write counts on success. On divergence, the
diverging write and the last 3 context writes are all tagged with
their regions. On length mismatch, the region histogram of the tail
is printed so you can see which subsystem the rebuild is missing
(or adding).
==== mmio_diff.py ====
Primary write-sequence gate. Vendor total MMIO writes: 3173 at
500k insn budget, LP5-2400 happy-path cold boot. Rebuilt total after
the whole campaign: 3173. Byte-identical.
===== Audits =====
All wired into make audit in ~/projects/AMPere/benchmark/.
==== audit_data_syms.py ====
Scans every candidate.c for DAT_/s_/BLOB_DATA_ extern
declarations, cross-checks case-insensitively against
DATA_SYMS | PORT_OVERRIDES | MMIO_SYMS in reloc_splice.py.
Flags missing or case-mismatched entries before the link step.
Closes bug-class 1 (ld --unresolved-symbols=ignore-all silently
zeroing undefined externs) as a static check.
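A simplified version of the cross-check is sketched below. The regex handles only single-declarator extern lines, and audit_externs is a hypothetical name; the known_syms argument stands in for the union DATA_SYMS | PORT_OVERRIDES | MMIO_SYMS.

```python
import re

# Matches e.g. "extern uint32_t DAT_ff00a000;" or "extern char s_msg[];".
# Assumption: one declarator per extern statement.
EXTERN_RE = re.compile(
    r"^\s*extern\b[^;]*?\b((?:DAT_|s_|BLOB_DATA_)\w+)\s*(?:\[[^\]]*\])?\s*;",
    re.MULTILINE,
)


def audit_externs(c_source, known_syms):
    """Return (missing, case_mismatched) extern names for one candidate.c."""
    by_lower = {name.lower(): name for name in known_syms}
    missing, mismatched = [], []
    for name in EXTERN_RE.findall(c_source):
        if name in known_syms:
            continue  # exact match, nothing to report
        canonical = by_lower.get(name.lower())
        if canonical is None:
            missing.append(name)
        else:
            mismatched.append((name, canonical))
    return missing, mismatched
```

Reporting case mismatches separately from missing names keeps the fix obvious: a mismatch needs a rename, a missing name needs a new table entry.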
==== audit_early_return_tail.py ====
Static ARM64 asm scanner looking for cond_br → short block with
mov #const → b INTO_TAIL_WITH_STR patterns — the shape that
corresponds to “vendor's branch-into-shared-tail” that a naive C
port lowers as an early-return skipping mandatory side-effects.
Flagged 15 STRONG candidates across all ports (2026-04-21 sweep):
* 1 real bug: fn_3268 0x208 RMW pair skipped on bit-31 path → fixed.
* 1 different-class bug: fn_1c14 vendor writes via str wzr where the port only reads → fixed.
* 13 false positives.
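Hit rate is about 7 % (1 of 15) for the targeted bug class, or 13 % (2 of 15) counting the different-class bug. That is low, but the hits are silicon-hostile, so the audit is worth running.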
==== Triage heuristic ====
Functions with returns > 1 AND gotos == 0 in their C source
are highest risk for class-2 bugs — multiple returns without explicit
goto to a shared tail means the port author likely wrote
independent return paths that diverge from vendor's single shared-tail
asm. returns == 1 && gotos == 0 is typically safe;
returns >= 2 && gotos > 0 usually means the port author was
aware of the shared-tail pattern.
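The heuristic is easy to approximate with keyword counts. The sketch below is an assumption about how one might compute it, not the project's script: it strips comments and string literals with regexes rather than parsing C, and it labels combinations the heuristic does not mention as "unclassified".

```python
import re


def triage(c_source):
    """Return (returns, gotos, risk) for one function's C source."""
    # Drop comments and string literals so keywords inside them don't count.
    code = re.sub(r"/\*.*?\*/|//[^\n]*", " ", c_source, flags=re.DOTALL)
    code = re.sub(r'"(?:\\.|[^"\\])*"', '""', code)
    returns = len(re.findall(r"\breturn\b", code))
    gotos = len(re.findall(r"\bgoto\b", code))
    if returns > 1 and gotos == 0:
        risk = "high"
    elif returns >= 2:
        risk = "shared-tail aware"  # multiple returns, but gotos present
    elif returns == 1 and gotos == 0:
        risk = "typically safe"
    else:
        risk = "unclassified"
    return returns, gotos, risk
```

Sorting ports by this label gives a cheap review order before spending time on the asm-level scanner's candidates.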
===== Splicer =====
==== reloc_splice.py ====
Reloc-resolving splicer: links each candidate.o via GNU ld
with --section-start=.text=<blob_addr>, resolves every external
symbol via --defsym=NAME=ADDR from a ~484-entry symbol table
(fun_table, port_syms, port_overrides, data_syms,
mmio_syms), runs objcopy -O binary -j .text to extract the code,
splices it into the vendor image at the function's blob offset, and
NOP-pads any remainder.
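The link and splice steps can be sketched as two small helpers. The function names, the default toolchain name aarch64-linux-gnu-ld, and the explicit budget parameter are assumptions for illustration; the ld flags and the NOP padding follow the description above.

```python
NOP = bytes.fromhex("1f2003d5")  # AArch64 NOP (0xD503201F), little-endian


def link_argv(obj, out, blob_addr, symbols, ld="aarch64-linux-gnu-ld"):
    """Build the ld command line; the `ld` default is an assumed name."""
    argv = [ld, f"--section-start=.text={blob_addr:#x}"]
    argv += [f"--defsym={name}={addr:#x}" for name, addr in sorted(symbols.items())]
    return argv + ["-o", out, obj]


def splice(image, offset, code, budget):
    """Overwrite `budget` bytes at `offset` with `code`, NOP-padding the rest."""
    if len(code) > budget:
        raise ValueError(f"port is {len(code) - budget} bytes over budget")
    pad = budget - len(code)
    if pad % 4:
        raise ValueError("padding must be a whole number of instructions")
    patched = bytearray(image)
    patched[offset:offset + budget] = code + NOP * (pad // 4)
    return bytes(patched)
```

Returning a new bytes object instead of mutating the input keeps the vendor image intact, so several candidate splices can be compared against the same baseline.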
Post-link ADRP-to-NULL guard (added 2026-04-21): scans each
linked .text for any ADRP that resolves to page 0x0
(instruction_page + encoded offset == 0) and emits
WARN <port>+<off>: adrp xN resolves to page 0x0 — likely unresolved
defsym. Same-page ADRP (imm=0) resolves to the blob base
0xff001000 and is legitimate, so it's not flagged. This closes
bug-class 1 at link time.
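The check reduces to decoding the ADRP immediate and adding it to the instruction's own page. The sketch below follows the A64 encoding (immlo in bits 30:29, immhi in bits 23:5, a signed 21-bit page count); the function names are hypothetical and the warning text is left to the caller.

```python
import struct


def adrp_target_page(insn, pc):
    """Return the page an ADRP resolves to, or None if insn is not ADRP."""
    if (insn & 0x9F000000) != 0x90000000:
        return None
    immlo = (insn >> 29) & 0x3
    immhi = (insn >> 5) & 0x7FFFF
    imm = (immhi << 2) | immlo
    if imm & (1 << 20):  # sign-extend the 21-bit page offset
        imm -= 1 << 21
    return ((pc & ~0xFFF) + (imm << 12)) & 0xFFFFFFFFFFFFFFFF


def find_null_adrp(text, base):
    """Yield-free scan: list (offset, Rd) for every ADRP landing on page 0."""
    hits = []
    for off in range(0, len(text) - 3, 4):
        (insn,) = struct.unpack_from("<I", text, off)
        if adrp_target_page(insn, base + off) == 0:
            hits.append((off, insn & 0x1F))
    return hits
```

An ADRP with imm=0 returns the instruction's own page from adrp_target_page, so it is never reported, matching the same-page exemption described above.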
==== splicer_skip.txt ====
Explicit port-directory skip list for reloc_splice.py. Named dirs
are removed from the candidate set entirely: their candidate.o
is never spliced and vendor bytes remain at the function's offset.
mmio_diff is unaffected because vendor bytes run vendor behaviour.
Use for ports that are deliberately incomplete, where our stub would
replace vendor work with a ret, causing silicon-boot divergence
when silicon hits a code path our emulator doesn't.
Current skip list: 154_FUN_de40 (parked behind internal task #198,
1-bit tp[0x4f] divergence under investigation).
To finish a skipped port: work from func.s (vendor disassembly),
write a clean hand-port, and compile. If it fits under the vendor byte
budget, remove the skip entry. If it's over budget, remove the entry
anyway: skip-larger takes over naturally and does the same job more
cleanly.
===== Why this tooling matters =====
* mmio_diff is blind to MMIO reads and to DDR_MEM / SRAM memory accesses. sim_tripwire + tripwire_diff surface read-side divergence with per-function attribution.
* training_sim bitflip + bitflip_sweep exercise PHY retry / error-recovery paths that a happy-path trace never enters.
* audit_data_syms closes the bug-class that ld --unresolved-symbols=ignore-all keeps opening every time a port lands new externs.
* audit_early_return_tail statically screens for the “C port skips vendor shared-tail” shape that only matters on control-flow paths the emulator doesn't exercise.
* mmio_regions makes every tool's output scannable, saving minutes of address-range lookup per divergence investigation.