====== MegabitChip — Simulation & Verification Stack ======
Reference page for the pre-silicon simulation + audit tooling used
to guard the MegabitChip rebuild. All tools assume the canonical
compile line from [[megabitchip:2026-04-20_dokuwiki|the blitz session]]
and operate on blob images produced by ''reloc_splice.py''.
Locations:
* ''~/src/rk3588-ddr-decompiled/'' on **boltzmann** — simulators and diff tools
* ''~/projects/AMPere/benchmark/'' on **boltzmann** — ports, audits, splicer
===== Simulators =====
==== training_sim.py ====
Unicorn-based DDR training simulator with two modes:
* ''--mode pass'' (default, used by ''mmio_diff'') — every training-status register returns its "done/OK/trained" stub on every read.
* ''--mode bitflip --flip-count N --flip-mask 0xFFFFFFFF'' — the first N reads of each training-status address return an XOR'd (bad) value, then revert to the pass value. Exercises PHY retry / error-recovery paths.
Training-status addresses defined in ''is_training_status()'':
* DDRPHY offsets: ''0x080, 0x090, 0x0B4, 0x3CC, 0x514, 0x684, 0xA24''
* DDRCTL per-channel offsets: ''0x10014, 0x10090, 0x10C84, 0x10514''
Other addresses keep their pass values even in bitflip mode, so the
signal stays focused on training-retry behaviour.
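The bitflip behaviour described above can be sketched as a small per-address read counter. This is a minimal illustration, not the code in ''training_sim.py'': the class name ''StatusStub'' and its ''read()'' signature are hypothetical, and the caller is assumed to supply the pass value and the result of ''is_training_status()''.

```python
from collections import defaultdict


class StatusStub:
    """Hypothetical helper: corrupt the first N reads of each status address."""

    def __init__(self, flip_count=0, flip_mask=0xFFFFFFFF):
        self.flip_count = flip_count        # N from --flip-count
        self.flip_mask = flip_mask          # mask from --flip-mask
        self.reads_seen = defaultdict(int)  # per-address read counter

    def read(self, addr, pass_value, is_status):
        """Return the value handed back to the emulated load at addr."""
        if not is_status:
            return pass_value               # non-status registers always pass
        seen = self.reads_seen[addr]
        self.reads_seen[addr] += 1
        if seen < self.flip_count:
            # XOR'd (bad) value, truncated to a 32-bit register width
            return (pass_value ^ self.flip_mask) & 0xFFFFFFFF
        return pass_value                   # revert to the pass value


stub = StatusStub(flip_count=2)
values = [stub.read(0x080, 0x1, True) for _ in range(3)]
```

With ''flip_count=0'' the stub degenerates to pass mode, so one code path can serve both modes.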
==== sim_tripwire.py ====
Bin-style per-access capture on Unicorn. Per-access record:
(seq_idx, insn_tick, pc, addr, size, rw, val, region_tag, fn_name)
''fn_name'' comes from ''PCResolver'', which bisects the vendor function
table parsed from ''ddr_conservative_asm.s'' (''// ============
FUN_ @ '' headers, 115 entries). ''BLOB_BASE = 0xFF001000''.
''sim_tripwire.load_csv'' rehydrates a saved capture for offline analysis.
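A bisect-based PC lookup of the kind described above might look like the following sketch. The constructor input (a list of ''(blob_offset, name)'' pairs) and the two sample entries are assumptions for illustration; the real table is parsed from ''ddr_conservative_asm.s''.

```python
import bisect

BLOB_BASE = 0xFF001000  # value stated in the text above


class PCResolver:
    """Sketch: map a PC to the function whose start offset precedes it."""

    def __init__(self, funcs):
        # funcs: iterable of (blob_offset, name); sorted once for bisecting
        self._funcs = sorted(funcs)
        self._starts = [off for off, _ in self._funcs]

    def resolve(self, pc):
        off = pc - BLOB_BASE
        # index of the last function starting at or before off
        i = bisect.bisect_right(self._starts, off) - 1
        if i < 0:
            return "?"  # PC lies before the first known function
        return self._funcs[i][1]


# Illustrative entries only; the offsets are assumed, not taken from the table.
resolver = PCResolver([(0x2340, "fn_2340"), (0x3268, "fn_3268")])
name = resolver.resolve(BLOB_BASE + 0x2344)
```

This sketch has no function-end information, so a PC past the last entry resolves to the last function. A real resolver would likely also bound each entry by its size.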
CLI integration:
  # training_sim: single blob, one CSV
  python3 training_sim.py --mode pass|bitflip \
      --tripwire-out /tmp/tw.csv
  # mmio_diff: vendor + rebuilt in one run, two CSVs
  python3 mmio_diff.py --ignore-pc \
      --tripwire-out-vendor /tmp/tw-v.csv \
      --tripwire-out-rebuilt /tmp/tw-r.csv
==== tripwire_diff.py ====
Per-function ''difflib.SequenceMatcher'' diff of two tripwire CSVs.
Records are bucketed by ''fn_name'' (resolved from PC) and compared on
the key ''(region, addr, rw, val, size)''. The key excludes PC
(codegen reg-alloc shifts it within a function) as well as ''seq_idx''
and ''insn_tick'' (both drift with path diffs). Bucketing by ''fn_name''
rather than ''seq_idx'' lets the diff survive control-flow divergence in
bitflip mode.
Tiers from ''ratio()'':
* **OK** — byte-identical key sequences, suppressed unless ''--show-identical''
* **minor-diff** — ratio ≥ suspect_threshold (default 0.9)
* **SUSPECT** — ratio < threshold, printed first with side-by-side sub-sequence
Fast path: ''quick_ratio()'' (a cheap upper bound on ''ratio()'' computed
from multiset intersection) short-circuits buckets that share almost nothing.
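The tiering above can be sketched with the standard library alone. The record layout (dicts keyed by column name) and the ''floor'' cut-off for the fast path are assumptions; only ''suspect_threshold'' and its default come from the text.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def bucket(records):
    """Group comparison keys by fn_name, preserving access order."""
    out = defaultdict(list)
    for r in records:
        out[r["fn_name"]].append(
            (r["region"], r["addr"], r["rw"], r["val"], r["size"]))
    return out


def tier(vendor_keys, rebuilt_keys, suspect_threshold=0.9, floor=0.1):
    """Classify one bucket; floor is a hypothetical fast-path cut-off."""
    if vendor_keys == rebuilt_keys:
        return "OK"
    sm = SequenceMatcher(None, vendor_keys, rebuilt_keys, autojunk=False)
    # quick_ratio() never underestimates ratio(), so a value below the
    # floor (which is below the threshold) is already a SUSPECT.
    if sm.quick_ratio() < floor:
        return "SUSPECT"
    return "minor-diff" if sm.ratio() >= suspect_threshold else "SUSPECT"


keys = [("DDRPHY", 0x80 + 4 * i, "w", i, 4) for i in range(20)]
near = keys[:19] + [("DDRPHY", 0xFFF, "w", 0, 4)]
result = tier(keys, near)
```

''autojunk=False'' is a deliberate choice in this sketch: long traces repeat the same key often, and the default junk heuristic would treat those popular keys as noise.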
==== bitflip_sweep.py ====
Per-status-address retry convergence test. Flips the first read of ONE
training-status register at a time and checks whether the rebuilt retry
logic writes different downstream register values than the pass baseline.
Uses ''BITFLIP_ONLY'' env var to narrow ''is_training_status()'' in
''training_sim.py'' to a single address per run. 23 targets total.
2026-04-21 result on vendor blob LP5-2400: 18 convergent, 3 retry
(STAT CH1/CH2/CH3 → ''fn_2340 MRCTRL0 = 0x60''), 2 not-exercised.
==== mmio_regions.py ====
Shared address → region classifier. ''classify(addr) -> str'' returns
one of: ''DDR_MEM'', ''DDRCTL'', ''DDRCTL:SW'', ''DDRCTL:MR'',
''DDRPHY'', ''DDRPHY:TR'', ''CRU'', ''DDR_CRU'', ''SCRU'', ''GRF'',
''BUS_GRF'', ''SGRF'', ''PMU'', ''FW_DDR'', **''OTP''** (fixed from
"SCRAMBLE" after TRM cross-check on ''fn_9fc''), ''UART'', ''SRAM'',
''PMU_SRAM'', ''OTHER''.
Imported by every trace/diff tool. ''mmio_diff --show-regions'' prints
a histogram of vendor write counts on success; on divergence, the
diverging write and the last 3 context writes all get tagged; on
length mismatch the tail's region histogram prints so you can see
//which subsystem// the rebuild is missing (or adding).
==== mmio_diff.py ====
Primary write-sequence gate. Vendor total MMIO writes: **3173** at
500k insn budget, LP5-2400 happy-path cold boot. Rebuilt total after
the whole campaign: **3173**. Byte-identical.
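A write-sequence gate of this kind reduces to finding the first index where two ordered write lists disagree. The sketch below assumes each write is an ''(addr, val)'' pair with PC already dropped, in the spirit of ''--ignore-pc''. The function name and the returned dict layout are hypothetical.

```python
def first_divergence(vendor, rebuilt, context=3):
    """Compare two ordered lists of (addr, val) MMIO writes.

    Returns None when the sequences are identical, otherwise a dict
    describing the first mismatch or the unmatched tail.
    """
    for i, (v, r) in enumerate(zip(vendor, rebuilt)):
        if v != r:
            return {
                "index": i,
                "vendor": v,
                "rebuilt": r,
                # the last few matching writes before the divergence
                "context": vendor[max(0, i - context):i],
            }
    if len(vendor) != len(rebuilt):
        n = min(len(vendor), len(rebuilt))
        side = "vendor" if len(vendor) > len(rebuilt) else "rebuilt"
        longer = vendor if side == "vendor" else rebuilt
        # common prefix matched; one side has extra writes
        return {"index": n, "side": side, "tail": longer[n:]}
    return None


v = [(0x10, 1), (0x14, 2), (0x18, 3), (0x1C, 4), (0x20, 5)]
r = list(v)
r[4] = (0x20, 9)
report = first_divergence(v, r)
```

The ''tail'' list is what a region histogram would be computed over in the length-mismatch case.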
===== Audits =====
All wired into ''make audit'' in ''~/projects/AMPere/benchmark/''.
==== audit_data_syms.py ====
Scans every ''candidate.c'' for ''DAT_/s_/BLOB_DATA_'' extern
declarations, cross-checks case-insensitively against
''DATA_SYMS | PORT_OVERRIDES | MMIO_SYMS'' in ''reloc_splice.py''.
Flags missing or case-mismatched entries //before the link step//.
Closes bug-class 1 (''ld --unresolved-symbols=ignore-all'' silently
zeroing undefined externs) as a static check.
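The cross-check can be sketched as a regex scan plus a case-folded lookup. The regex is a simplification (single-declarator ''extern'' lines only), and ''known_syms'' stands in for the union of the three tables in ''reloc_splice.py''. Both are assumptions for illustration.

```python
import re

# Matches e.g. "extern uint32_t DAT_0000abcd;" or "extern char s_msg[];"
EXTERN_RE = re.compile(
    r"^\s*extern\b[^;]*?\b((?:DAT_|s_|BLOB_DATA_)\w+)\s*(?:\[[^\]]*\])?\s*;",
    re.M,
)


def audit(c_source, known_syms):
    """Return (name, problem) pairs for externs the symbol table lacks."""
    by_lower = {s.lower(): s for s in known_syms}
    problems = []
    for name in EXTERN_RE.findall(c_source):
        if name in known_syms:
            continue  # exact match, nothing to report
        near = by_lower.get(name.lower())
        if near is not None:
            problems.append((name, "case-mismatch: table has " + near))
        else:
            problems.append((name, "missing"))
    return problems


src = (
    "extern uint32_t DAT_0000abcd;\n"
    "extern char s_hello[];\n"
    "extern int BLOB_DATA_x;\n"
)
found = audit(src, {"DAT_0000ABCD", "s_hello"})
```

Reporting the near-miss spelling alongside the mismatch makes the fix a one-line edit rather than a table search.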
==== audit_early_return_tail.py ====
Static ARM64 asm scanner looking for ''cond_br → short block with
mov #const → b INTO_TAIL_WITH_STR'' patterns — the shape that
corresponds to "vendor's branch-into-shared-tail" that a naive C
port lowers as an early-return skipping mandatory side-effects.
Flagged 15 STRONG candidates across all ports (2026-04-21 sweep):
* 1 real bug: ''fn_3268'' 0x208 RMW pair skipped on bit-31 path → fixed.
* 1 different-class bug: ''fn_1c14'' vendor writes via ''str wzr'' where the port only reads → fixed.
* 13 false positives.
Hit rate for the targeted bug class is about 7 % (1 of 15), but the hits are silicon-hostile, so the audit is worth running.
==== Triage heuristic ====
Functions with ''returns > 1'' AND ''gotos == 0'' in their C source
are highest risk for class-2 bugs — multiple returns without explicit
''goto'' to a shared tail means the port author likely wrote
independent return paths that diverge from vendor's single shared-tail
asm. ''returns == 1 && gotos == 0'' is typically safe;
''returns >= 2 && gotos > 0'' usually means the port author was
aware of the shared-tail pattern.
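The heuristic is easy to automate as a keyword count over each function body. The sketch below is an assumption about how one might do it: the function name and tier labels are hypothetical, and comments and string literals are stripped first so keywords inside them are not counted.

```python
import re

# Block comments, line comments, and double-quoted string literals
_NOISE_RE = re.compile(r'/\*.*?\*/|//[^\n]*|"(?:\\.|[^"\\])*"', re.S)


def triage(c_body):
    """Classify one C function body by its return/goto counts."""
    code = _NOISE_RE.sub(" ", c_body)
    returns = len(re.findall(r"\breturn\b", code))
    gotos = len(re.findall(r"\bgoto\b", code))
    if returns > 1 and gotos == 0:
        return "high-risk"        # independent return paths, no shared tail
    if returns >= 2 and gotos > 0:
        return "likely-aware"     # author used goto alongside the returns
    if returns == 1 and gotos == 0:
        return "typically-safe"
    return "unclassified"         # e.g. void functions with no return


label = triage("int f(int x){ if (x) return 1; /* return */ return 0; }")
```

Sorting ports by this label gives a review order for the candidates that ''audit_early_return_tail.py'' flags.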
===== Splicer =====
==== reloc_splice.py ====
Reloc-resolving splicer. It links each ''candidate.o'' via GNU ''ld''
with ''--section-start=.text='', resolves every external
symbol via ''--defsym=NAME=ADDR'' from a ~484-entry symbol table
(''fun_table'', ''port_syms'', ''port_overrides'', ''data_syms'',
''mmio_syms''), extracts the code with ''objcopy -O binary -j .text'',
splices it into the vendor image at the function's blob offset, and
NOP-pads any remainder.
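The final splice-and-pad step might look like the sketch below. The function name and arguments are hypothetical. The sketch raises on an over-budget port, whereas the real splicer is described elsewhere on this page as skipping such ports.

```python
import struct

NOP = struct.pack("<I", 0xD503201F)  # AArch64 NOP, little-endian


def splice(image, offset, code, budget):
    """Overwrite budget bytes at offset with code, NOP-padding the rest."""
    if len(code) > budget:
        raise ValueError("port exceeds vendor byte budget")
    if len(code) % 4 or budget % 4:
        raise ValueError("sizes must be multiples of the 4-byte insn width")
    if offset + budget > len(image):
        raise ValueError("function slot runs past the end of the image")
    pad = NOP * ((budget - len(code)) // 4)
    out = bytearray(image)
    out[offset:offset + budget] = code + pad  # same length, image size kept
    return bytes(out)


ret = struct.pack("<I", 0xD65F03C0)  # AArch64 RET
patched = splice(bytes(16), 4, ret, 8)
```

Padding with NOPs rather than zeros keeps the slot decodable if execution ever falls through the shorter port.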
**Post-link ADRP-to-NULL guard (added 2026-04-21):** scans each
linked ''.text'' for any ADRP whose resolved page is 0x0
(''instruction_page + encoded offset == 0''), emits
''WARN +: adrp xN resolves to page 0x0 — likely unresolved
defsym''. Same-page ADRP (imm=0) resolves to the blob base
''0xff001000'' and is legitimate, so it's not flagged. This closes
bug-class 1 at link time.
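The guard's check can be sketched from the A64 ADRP encoding alone (bit 31 set, bits 28:24 equal to ''10000'', a 21-bit signed page immediate split into ''immlo'' and ''immhi''). The function names and the hit format below are hypothetical and do not reproduce the splicer's actual warning output.

```python
import struct


def adrp_target_page(insn, pc):
    """Return the page an ADRP resolves to, or None if insn is not ADRP."""
    if (insn & 0x9F000000) != 0x90000000:
        return None
    immlo = (insn >> 29) & 0x3
    immhi = (insn >> 5) & 0x7FFFF
    imm = (immhi << 2) | immlo
    if imm & (1 << 20):          # sign-extend the 21-bit page count
        imm -= 1 << 21
    # instruction page + encoded offset, wrapped to 64 bits
    return ((pc & ~0xFFF) + (imm << 12)) & 0xFFFFFFFFFFFFFFFF


def scan_text(text, base):
    """Yield (offset, Rd) for every ADRP in text that resolves to page 0."""
    hits = []
    for off in range(0, len(text) - 3, 4):
        (insn,) = struct.unpack_from("<I", text, off)
        if adrp_target_page(insn, base + off) == 0:
            hits.append((off, insn & 0x1F))
    return hits


def encode_adrp(pages, rd):
    """Test helper: build an ADRP with a signed page delta."""
    imm = pages & 0x1FFFFF
    return 0x90000000 | ((imm & 0x3) << 29) | ((imm >> 2) << 5) | rd


base = 0xFF001000
text = struct.pack("<II", encode_adrp(0, 1), encode_adrp(-0xFF001, 3))
hits = scan_text(text, base)
```

Here the first instruction is a same-page ADRP and is left alone, while the second resolves to page 0x0 and is reported.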
==== splicer_skip.txt ====
Explicit port-directory skip list for ''reloc_splice.py''. Named dirs
are removed from the candidate set entirely: their ''candidate.o''
is never spliced and vendor bytes remain at the function's offset.
''mmio_diff'' is unaffected because vendor bytes run vendor behaviour.
Use for ports that are deliberately incomplete — where our stub would
replace vendor work with a ''ret'', causing silicon-boot divergence
when silicon hits a code path our emulator doesn't.
Current skip list: ''154_FUN_de40'' (parked behind internal task #198,
1-bit ''tp[0x4f]'' divergence under investigation).
To finish a skipped port: work from ''func.s'' (vendor disassembly),
write a clean hand-port, and compile. If it fits under the vendor byte
budget, remove the skip entry. If it is over budget, remove the entry
anyway: ''skip-larger'' takes over and does the same job more cleanly.
===== Why this tooling matters =====
* ''mmio_diff'' is blind to MMIO //reads// and to DDR_MEM / SRAM memory accesses. ''sim_tripwire + tripwire_diff'' surface read-side divergence with per-function attribution.
* ''training_sim'' bitflip + ''bitflip_sweep'' exercise PHY retry / error-recovery paths that a happy-path trace never enters.
* ''audit_data_syms'' closes the bug-class that ''ld --unresolved-symbols=ignore-all'' keeps opening every time a port lands new externs.
* ''audit_early_return_tail'' statically screens for the "C port skips vendor shared-tail" shape that only matters on control-flow paths the emulator doesn't exercise.
* ''mmio_regions'' makes every tool's output scannable. Saves minutes of address-range lookup per divergence investigation.