====== MegabitChip — Simulation & Verification Stack ======

Reference page for the pre-silicon simulation + audit tooling used to guard the MegabitChip rebuild. All tools assume the canonical compile line from [[megabitchip:2026-04-20_dokuwiki|the blitz session]] and operate on blob images produced by ''reloc_splice.py''.

Locations:

  * ''~/src/rk3588-ddr-decompiled/'' on **boltzmann**: simulators and diff tools
  * ''~/projects/AMPere/benchmark/'' on **boltzmann**: ports, audits, splicer

===== Simulators =====

==== training_sim.py ====

Unicorn-based DDR training simulator with two modes:

  * ''--mode pass'' (default, used by ''mmio_diff''): every training-status register returns its "done/OK/trained" stub on every read.
  * ''--mode bitflip --flip-count N --flip-mask 0xFFFFFFFF'': the first N reads of each training-status address return an XOR'd (bad) value, then revert to the pass value. Exercises PHY retry / error-recovery paths.

Training-status addresses defined in ''is_training_status()'':

  * DDRPHY offsets: ''0x080, 0x090, 0x0B4, 0x3CC, 0x514, 0x684, 0xA24''
  * DDRCTL per-channel offsets: ''0x10014, 0x10090, 0x10C84, 0x10514''

Other addresses keep pass values even in bitflip mode, so signal stays focused on training retry behaviour.

==== sim_tripwire.py ====

Bin-style per-access capture on Unicorn. Per-access record:

<code>
(seq_idx, insn_tick, pc, addr, size, rw, val, region_tag, fn_name)
</code>

''fn_name'' comes from ''PCResolver'' which bisects the vendor funs table parsed from ''ddr_conservative_asm.s'' (''// ============ FUN_ @ '' headers, 115 entries). ''BLOB_BASE = 0xFF001000''.

''sim_tripwire.load_csv'' rehydrates a saved capture for offline analysis.
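
The bisect lookup behind ''fn_name'' can be pictured with a minimal sketch. This is not the real ''PCResolver'': the constructor input shape, the ''resolve'' method name, the ''"?"'' fallback and the example offsets (guessed from the function names) are all assumptions made for illustration.

```python
import bisect

BLOB_BASE = 0xFF001000  # value quoted on this page


class PCResolver:
    """Resolve a PC to the enclosing function by bisecting sorted start offsets."""

    def __init__(self, funcs):
        # funcs: iterable of (blob_offset, name) pairs; this input shape is assumed.
        self._table = sorted(funcs)
        self._starts = [off for off, _ in self._table]

    def resolve(self, pc):
        off = pc - BLOB_BASE
        # Index of the last function whose start offset is <= off.
        i = bisect.bisect_right(self._starts, off) - 1
        if i < 0:
            return "?"  # PC lies before the first known function
        return self._table[i][1]


# Illustrative table only: the offsets are guessed from the function names.
resolver = PCResolver([(0x3268, "fn_3268"), (0x1C14, "fn_1c14"), (0x2340, "fn_2340")])
print(resolver.resolve(BLOB_BASE + 0x2344))  # prints fn_2340
```

The sketch has no end bound, so any PC past the last start offset is attributed to the last function. A real table would likely also carry function sizes.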
CLI integration:

<code>
# training_sim: single blob, one CSV
# (--mode is either pass, the default, or bitflip)
python3 training_sim.py --mode bitflip \
    --tripwire-out /tmp/tw.csv

# mmio_diff: vendor + rebuilt in one run, two CSVs
python3 mmio_diff.py --ignore-pc \
    --tripwire-out-vendor /tmp/tw-v.csv \
    --tripwire-out-rebuilt /tmp/tw-r.csv
</code>

==== tripwire_diff.py ====

Per-function ''difflib.SequenceMatcher'' diff of two tripwire CSVs. Each access is bucketed by the ''fn_name'' resolved from its PC (not by ''seq_idx''), which lets the diff survive control-flow divergence in bitflip mode. Within a bucket, records are compared by the key ''(region, addr, rw, val, size)''. The key excludes PC (codegen reg-alloc shifts it within a function) as well as ''seq_idx'' and ''tick'' (both drift with path diffs).

Tiers from ''ratio()'':

  * **OK**: byte-identical key sequences, suppressed unless ''--show-identical''
  * **minor-diff**: ratio ≥ suspect_threshold (default 0.9)
  * **SUSPECT**: ratio < threshold, printed first with side-by-side sub-sequence

Fast path: ''quick_ratio()'' (a multiset-intersection upper bound on ''ratio()'') short-circuits buckets that share almost nothing.

==== bitflip_sweep.py ====

Per-status-address retry convergence test. Flips ONE training-status register's first read at a time and checks whether the rebuilt retry logic writes different downstream register values than the pass baseline. Uses the ''BITFLIP_ONLY'' env var to narrow ''is_training_status()'' in ''training_sim.py'' to a single address per run. 23 targets total.

2026-04-21 result on vendor blob LP5-2400: 18 convergent, 3 retry (STAT CH1/CH2/CH3 → ''fn_2340 MRCTRL0 = 0x60''), 2 not-exercised.

==== mmio_regions.py ====

Shared address → region classifier. ''classify(addr) -> str'' returns one of: ''DDR_MEM'', ''DDRCTL'', ''DDRCTL:SW'', ''DDRCTL:MR'', ''DDRPHY'', ''DDRPHY:TR'', ''CRU'', ''DDR_CRU'', ''SCRU'', ''GRF'', ''BUS_GRF'', ''SGRF'', ''PMU'', ''FW_DDR'', **''OTP''** (fixed from "SCRAMBLE" after TRM cross-check on ''fn_9fc''), ''UART'', ''SRAM'', ''PMU_SRAM'', ''OTHER''. Imported by every trace/diff tool.
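
One way such a classifier could be structured is an ordered window table, sketched below. The address windows are placeholders invented for this example and are NOT the real RK3588 map. Only the tag names and the ''classify(addr) -> str'' shape come from this page, and ''region_histogram'' is a hypothetical helper.

```python
from collections import Counter

# Placeholder windows, NOT real RK3588 addresses: (start, end_exclusive, tag).
# A narrower sub-window must be listed before the block that contains it.
REGIONS = [
    (0x1000_0100, 0x1000_0200, "DDRCTL:MR"),
    (0x1000_0000, 0x1001_0000, "DDRCTL"),
    (0x2000_0000, 0x2001_0000, "DDRPHY"),
]


def classify(addr):
    """Return the tag of the first window containing addr, else "OTHER"."""
    for start, end, tag in REGIONS:
        if start <= addr < end:
            return tag
    return "OTHER"


def region_histogram(addrs):
    """Count accesses per region tag (hypothetical helper)."""
    return Counter(classify(a) for a in addrs)


print(region_histogram([0x1000_0104, 0x1000_0010, 0x2000_0080, 0x0]))
```

Listing sub-windows first keeps tags such as ''DDRCTL:MR'' from being swallowed by their parent block. A linear scan is enough for a table of this size.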
''mmio_diff --show-regions'' prints a histogram of vendor write counts on success. On divergence, the diverging write and the last 3 context writes are all tagged. On length mismatch, the tail's region histogram is printed so you can see //which subsystem// the rebuild is missing (or adding).

==== mmio_diff.py ====

Primary write-sequence gate. Vendor total MMIO writes: **3173** at 500k insn budget, LP5-2400 happy-path cold boot. Rebuilt total after the whole campaign: **3173**. Byte-identical.

===== Audits =====

All wired into ''make audit'' in ''~/projects/AMPere/benchmark/''.

==== audit_data_syms.py ====

Scans every ''candidate.c'' for ''DAT_/s_/BLOB_DATA_'' extern declarations and cross-checks them case-insensitively against ''DATA_SYMS | PORT_OVERRIDES | MMIO_SYMS'' in ''reloc_splice.py''. Flags missing or case-mismatched entries //before the link step//. Closes bug-class 1 (''ld --unresolved-symbols=ignore-all'' silently zeroing undefined externs) as a static check.

==== audit_early_return_tail.py ====

Static ARM64 asm scanner looking for ''cond_br → short block with mov #const → b INTO_TAIL_WITH_STR'' patterns. This is the shape of the vendor's branch-into-shared-tail, which a naive C port lowers as an early return that skips mandatory side effects.

Flagged 15 STRONG candidates across all ports (2026-04-21 sweep):

  * 1 real bug: ''fn_3268'' 0x208 RMW pair skipped on bit-31 path → fixed.
  * 1 different-class bug: ''fn_1c14'' vendor writes via ''str wzr'' where the port only reads → fixed.
  * 13 false positives.

The hit rate is about 7 % (1 of 15) for the targeted bug class, or about 13 % (2 of 15) counting the different-class find. The hits are silicon-hostile, so the audit is worth running.

==== Triage heuristic ====

Functions with ''returns > 1'' AND ''gotos == 0'' in their C source are the highest risk for class-2 bugs: multiple returns without an explicit ''goto'' to a shared tail mean the port author likely wrote independent return paths that diverge from the vendor's single shared-tail asm.
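
The counting behind this heuristic can be approximated with a few lines of regex. This is a rough sketch and not part of the audit tooling: ''triage'' and ''tail_side_effect'' are hypothetical names, the input is assumed to be the source text of a single function, and string literals containing the keywords are not handled.

```python
import re

# Strips /* ... */ and // ... comments; string literals are not handled.
_COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def triage(c_function_source):
    """Return (returns, gotos, high_risk) for the source text of one C function."""
    code = _COMMENT.sub(" ", c_function_source)
    returns = len(re.findall(r"\breturn\b", code))
    gotos = len(re.findall(r"\bgoto\b", code))
    return returns, gotos, (returns > 1 and gotos == 0)


sample = """
int port(unsigned v) {
    if (v & 0x80000000u)
        return 1;   /* early return: skips the tail below */
    tail_side_effect();
    return 0;
}
"""
print(triage(sample))  # prints (2, 0, True)
```
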
''returns == 1 && gotos == 0'' is typically safe; ''returns >= 2 && gotos > 0'' usually means the port author was aware of the shared-tail pattern.

===== Splicer =====

==== reloc_splice.py ====

Reloc-resolving splicer. It links each ''candidate.o'' via GNU ''ld'' with ''--section-start=.text='', resolves every external symbol via ''--defsym=NAME=ADDR'' from a ~484-entry symbol table (''fun_table'', ''port_syms'', ''port_overrides'', ''data_syms'', ''mmio_syms''), extracts the code with ''objcopy -O binary -j .text'', splices it into the vendor image at the function's blob offset, and NOP-pads any remainder.

**Post-link ADRP-to-NULL guard (added 2026-04-21):** scans each linked ''.text'' for any ADRP whose resolved page is 0x0 (''instruction_page + encoded offset == 0'') and emits ''WARN +: adrp xN resolves to page 0x0 — likely unresolved defsym''. A same-page ADRP (imm=0) resolves to the instruction's own page, e.g. the blob base ''0xff001000'', and is legitimate, so it is not flagged. This closes bug-class 1 at link time.

==== splicer_skip.txt ====

Explicit port-directory skip list for ''reloc_splice.py''. Named dirs are removed from the candidate set entirely: their ''candidate.o'' is never spliced and vendor bytes remain at the function's offset. ''mmio_diff'' is unaffected because vendor bytes run vendor behaviour.

Use it for ports that are deliberately incomplete, where our stub would replace vendor work with a ''ret'' and cause silicon-boot divergence when silicon hits a code path our emulator doesn't.

Current skip list: ''154_FUN_de40'' (parked behind internal task #198, 1-bit ''tp[0x4f]'' divergence under investigation).

To finish a skipped port: work from ''func.s'' (vendor disassembly), write a clean hand-port, compile. If it fits under the vendor byte budget, remove the skip entry. If it is over budget, also remove the entry: ''skip-larger'' takes over naturally and does the same job more cleanly.

===== Why this tooling matters =====

  * ''mmio_diff'' is blind to MMIO //reads// and to DDR_MEM / SRAM memory accesses. ''sim_tripwire + tripwire_diff'' surface read-side divergence with per-function attribution.
  * ''training_sim'' bitflip + ''bitflip_sweep'' exercise PHY retry / error-recovery paths that a happy-path trace never enters.
  * ''audit_data_syms'' closes the bug-class that ''ld --unresolved-symbols=ignore-all'' keeps opening every time a port lands new externs.
  * ''audit_early_return_tail'' statically screens for the "C port skips vendor shared-tail" shape that only matters on control-flow paths the emulator doesn't exercise.
  * ''mmio_regions'' makes every tool's output scannable. Saves minutes of address-range lookup per divergence investigation.
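
As a concrete picture of the per-function attribution in the first bullet, the sketch below buckets access records by ''fn_name'' and tiers each bucket with ''difflib.SequenceMatcher''. It is an approximation under stated assumptions, not the ''tripwire_diff.py'' source: the records are plain dicts using the field names from the per-access record, ''bucket'' and ''tier'' are hypothetical helper names, and the sample data is invented.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def bucket(records):
    """Group comparison keys by fn_name, preserving access order."""
    buckets = defaultdict(list)
    for r in records:
        key = (r["region_tag"], r["addr"], r["rw"], r["val"], r["size"])
        buckets[r["fn_name"]].append(key)  # pc, seq_idx and tick are left out
    return buckets


def tier(vendor_keys, rebuilt_keys, suspect_threshold=0.9):
    """Classify one bucket as OK, minor-diff or SUSPECT."""
    if vendor_keys == rebuilt_keys:
        return "OK"
    sm = SequenceMatcher(None, vendor_keys, rebuilt_keys, autojunk=False)
    # quick_ratio() is an upper bound on ratio(), so a low value settles the tier early.
    if sm.quick_ratio() < suspect_threshold:
        return "SUSPECT"
    return "minor-diff" if sm.ratio() >= suspect_threshold else "SUSPECT"


# Invented sample: 20 writes in one function, replayed with shifted PCs.
vendor = [{"fn_name": "fn_2340", "region_tag": "DDRCTL", "addr": 0x10 + 4 * i,
           "rw": "W", "val": i, "size": 4, "pc": 0x100 + 4 * i} for i in range(20)]
rebuilt = [dict(r, pc=r["pc"] + 8) for r in vendor]
rebuilt[5] = dict(rebuilt[5], val=0x60)  # one differing write value

vb, rb = bucket(vendor), bucket(rebuilt)
for fn in sorted(vb.keys() | rb.keys()):
    print(fn, tier(vb.get(fn, []), rb.get(fn, [])))  # prints fn_2340 minor-diff
```

Because the key omits ''pc'', the shifted PCs in the rebuilt capture do not register as differences. Only the changed write value does, and one change in 20 accesses gives a ratio of 0.95.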