MegabitChip — Simulation & Verification Stack

Reference page for the pre-silicon simulation + audit tooling used to guard the MegabitChip rebuild. All tools assume the canonical compile line from the blitz session and operate on blob images produced by reloc_splice.py.

Locations:

  • ~/src/rk3588-ddr-decompiled/ on boltzmann — simulators and diff tools
  • ~/projects/AMPere/benchmark/ on boltzmann — ports, audits, splicer

Simulators

training_sim.py

Unicorn-based DDR training simulator with two modes:

  • --mode pass (default, used by mmio_diff): every training-status register returns its “done/OK/trained” stub on every read.
  • --mode bitflip --flip-count N --flip-mask 0xFFFFFFFF: the first N reads of each training-status address return an XOR'd (bad) value, then reads revert to the pass value. This exercises the PHY retry / error-recovery paths.

Training-status addresses defined in is_training_status():

  • DDRPHY offsets: 0x080, 0x090, 0x0B4, 0x3CC, 0x514, 0x684, 0xA24
  • DDRCTL per-channel offsets: 0x10014, 0x10090, 0x10C84, 0x10514

All other addresses keep their pass values even in bitflip mode, so the signal stays focused on training-retry behaviour.
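The bitflip behaviour described above can be sketched independently of Unicorn. This is a minimal illustration, not the code in training_sim.py: the BitflipStub class and its read() method are hypothetical names, addresses are treated as plain DDRPHY offsets, and the assumption is that the bad value is the pass value XORed with the flip mask.

```python
from collections import defaultdict

# DDRPHY training-status offsets as listed on this page (DDRCTL omitted for brevity).
DDRPHY_STATUS_OFFSETS = {0x080, 0x090, 0x0B4, 0x3CC, 0x514, 0x684, 0xA24}


def is_training_status(offset):
    """Stand-in for the real predicate; checks DDRPHY offsets only."""
    return offset in DDRPHY_STATUS_OFFSETS


class BitflipStub:
    """Hypothetical read stub: corrupt the first N reads of each status address."""

    def __init__(self, flip_count=1, flip_mask=0xFFFFFFFF):
        self.flip_count = flip_count
        self.flip_mask = flip_mask
        self.reads_seen = defaultdict(int)  # per-address read counter

    def read(self, offset, pass_value):
        # Non-status addresses always return the pass value.
        if not is_training_status(offset):
            return pass_value
        seen = self.reads_seen[offset]
        self.reads_seen[offset] += 1
        if seen < self.flip_count:
            # Assumed semantics: bad value = pass value XOR flip mask (32-bit).
            return (pass_value ^ self.flip_mask) & 0xFFFFFFFF
        return pass_value
```

With flip_count=2, the first two reads of offset 0x080 come back corrupted and the third returns the pass value, which is the point at which a retry loop would be expected to converge.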

sim_tripwire.py

Bin-style per-access capture on Unicorn. Per-access record:

(seq_idx, insn_tick, pc, addr, size, rw, val, region_tag, fn_name)

fn_name comes from PCResolver, which bisects the vendor function table parsed from ddr_conservative_asm.s (============ FUN_<hex> @ <off> headers, 115 entries). BLOB_BASE = 0xFF001000. sim_tripwire.load_csv rehydrates a saved capture for offline analysis.

CLI integration:

<code bash>
# training_sim: single blob, one CSV
python3 training_sim.py <blob> --mode pass|bitflip \
    --tripwire-out /tmp/tw.csv

# mmio_diff: vendor + rebuilt in one run, two CSVs
python3 mmio_diff.py --ignore-pc <vendor.bin> <rebuilt.bin> \
    --tripwire-out-vendor /tmp/tw-v.csv \
    --tripwire-out-rebuilt /tmp/tw-r.csv
</code>

tripwire_diff.py

PC-bucketed difflib.SequenceMatcher diff of two tripwire CSVs. Bucket key: (region, addr, rw, val, size). The key excludes PC (codegen register allocation shifts it within a function) as well as seq_idx and tick (both drift when paths differ). Bucketing by fn_name (not seq_idx) lets the diff survive control-flow divergence in bitflip mode.

Tiers from ratio():

  • OK: byte-identical key sequences, suppressed unless --show-identical
  • minor-diff: ratio ≥ suspect_threshold (default 0.9)
  • SUSPECT: ratio < threshold, printed first with side-by-side sub-sequence

Fast path: quick_ratio() (set-intersection upper bound) short-circuits buckets that share almost nothing.

bitflip_sweep.py

Per-status-address retry convergence test. Flips the first read of ONE training-status register at a time and checks whether the rebuilt retry logic writes different downstream register values than the pass baseline. Uses the BITFLIP_ONLY env var to narrow is_training_status() in training_sim.py to a single address per run. 23 targets total.

2026-04-21 result on vendor blob, LP5-2400: 18 convergent, 3 retry (STAT CH1/CH2/CH3 → fn_2340 MRCTRL0 = 0x60), 2 not-exercised.

mmio_regions.py

Shared address → region classifier.
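A classifier of this kind is often built as a sorted range table searched with bisect. The sketch below only illustrates that idea and is not the contents of mmio_regions.py: the REGIONS table is a hypothetical name, and its base addresses and sizes are placeholders rather than the RK3588 memory map. Only the classify(addr) signature and the tag names are taken from this page.

```python
import bisect

# (start, end_exclusive, tag). PLACEHOLDER ranges for illustration only;
# these are not real RK3588 addresses.
REGIONS = sorted([
    (0x0000_0000, 0x0010_0000, "SRAM"),
    (0x1000_0000, 0x1001_0000, "DDRCTL"),
    (0x1001_0000, 0x1002_0000, "DDRPHY"),
    (0x2000_0000, 0x2000_1000, "UART"),
])
_STARTS = [start for start, _end, _tag in REGIONS]


def classify(addr):
    """Return the region tag covering addr, or "OTHER" if none does."""
    # Find the last region whose start is <= addr, then bounds-check its end.
    i = bisect.bisect_right(_STARTS, addr) - 1
    if i >= 0:
        _start, end, tag = REGIONS[i]
        if addr < end:
            return tag
    return "OTHER"
```

Keeping the table sorted by start address makes each lookup O(log n), which matters when every traced access is tagged.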
classify(addr) → str returns one of: DDR_MEM, DDRCTL, DDRCTL:SW, DDRCTL:MR, DDRPHY, DDRPHY:TR, CRU, DDR_CRU, SCRU, GRF, BUS_GRF, SGRF, PMU, FW_DDR, OTP (fixed from “SCRAMBLE” after TRM cross-check on fn_9fc), UART, SRAM, PMU_SRAM, OTHER. Imported by every trace/diff tool.

mmio_diff --show-regions prints a histogram of vendor write counts on success. On divergence, the diverging write and the last 3 context writes are all tagged. On length mismatch, the region histogram of the tail is printed, so you can see which subsystem the rebuild is missing (or adding).

mmio_diff.py

Primary write-sequence gate. Vendor total MMIO writes: 3173 at a 500k-instruction budget, LP5-2400 happy-path cold boot. Rebuilt total after the whole campaign: 3173. Byte-identical.

Audits

All wired into make audit in ~/projects/AMPere/benchmark/.

audit_data_syms.py

Scans every candidate.c for DAT_/s_/BLOB_DATA_ extern declarations and cross-checks them case-insensitively against DATA_SYMS | PORT_OVERRIDES | MMIO_SYMS in reloc_splice.py. Flags missing or case-mismatched entries before the link step. Closes bug-class 1 (ld --unresolved-symbols=ignore-all silently zeroing undefined externs) as a static check.

audit_early_return_tail.py

Static ARM64 asm scanner that looks for cond_br → short block with mov #const → b INTO_TAIL_WITH_STR patterns. This is the shape that corresponds to the “vendor's branch-into-shared-tail”, which a naive C port lowers as an early return that skips mandatory side effects.

Flagged 15 STRONG candidates across all ports (2026-04-21 sweep):

  • 1 real bug: fn_3268 0x208 RMW pair skipped on bit-31 path → fixed.
  • 1 different-class bug: fn_1c14 vendor writes via str wzr where the port only reads → fixed.
  • 13 false positives.

Signal:noise is about 7 % (1 targeted-class hit in 15), but the hits are silicon-hostile, so it is worth running.
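The extern cross-check described for audit_data_syms.py can be sketched in a few lines. This is an assumption-laden illustration, not the real audit: audit_externs and EXTERN_RE are hypothetical names, the regex only handles simple one-line extern declarations, and the known-symbol set stands in for DATA_SYMS | PORT_OVERRIDES | MMIO_SYMS.

```python
import re

# Matches the first DAT_/s_/BLOB_DATA_ identifier in a simple extern declaration.
EXTERN_RE = re.compile(r"\bextern\b[^;]*?\b((?:DAT_|s_|BLOB_DATA_)\w+)")


def audit_externs(c_source, known_syms):
    """Return (missing, case_mismatch) for extern symbols found in c_source.

    missing: names with no entry at all, even ignoring case.
    case_mismatch: (name_in_source, name_in_table) pairs that differ only by case.
    """
    known_lower = {sym.lower(): sym for sym in known_syms}
    missing, case_mismatch = [], []
    for name in EXTERN_RE.findall(c_source):
        if name in known_syms:
            continue  # exact match, nothing to report
        canonical = known_lower.get(name.lower())
        if canonical is None:
            missing.append(name)
        else:
            case_mismatch.append((name, canonical))
    return missing, case_mismatch
```

Either list being non-empty would be a reason to stop before the link step, because ld --unresolved-symbols=ignore-all would otherwise accept the object silently.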

Triage heuristic

Functions with returns > 1 AND gotos == 0 in their C source are the highest risk for class-2 bugs: multiple returns without an explicit goto to a shared tail mean the port author likely wrote independent return paths that diverge from the vendor's single shared-tail asm. returns == 1 && gotos == 0 is typically safe; returns >= 2 && gotos > 0 usually means the port author was aware of the shared-tail pattern.

Splicer

reloc_splice.py

Reloc-resolving splicer. For each port it:

  • links candidate.o via GNU ld with --section-start=.text=<blob_addr>
  • resolves every external symbol via --defsym=NAME=ADDR from a ~484-entry symbol table (fun_table, port_syms, port_overrides, data_syms, mmio_syms)
  • extracts the code with objcopy -O binary -j .text
  • splices it into the vendor image at the function's blob offset and NOP-pads any remainder

Post-link ADRP-to-NULL guard (added 2026-04-21): scans each linked .text for any ADRP whose resolved page ends at 0x0 (instruction_page + encoded offset == 0) and emits WARN <port>+<off>: adrp xN resolves to page 0x0 — likely unresolved defsym. A same-page ADRP (imm=0) resolves to the blob base 0xff001000 and is legitimate, so it is not flagged. This closes bug-class 1 at link time.

splicer_skip.txt

Explicit port-directory skip list for reloc_splice.py. Named dirs are removed from the candidate set entirely: their candidate.o is never spliced and vendor bytes remain at the function's offset. mmio_diff is unaffected, because vendor bytes run vendor behaviour.

Use it for ports that are deliberately incomplete, where our stub would replace vendor work with a ret and cause silicon-boot divergence when silicon hits a code path our emulator doesn't.

Current skip list: 154_FUN_de40 (parked behind internal task #198, 1-bit tp[0x4f] divergence under investigation).

To finish a skipped port: work from func.s (vendor disassembly), write a clean hand-port, compile. If it fits under the vendor byte budget, remove the skip entry.
If it's over budget, skip-larger takes over naturally: remove the entry and skip-larger does the same job more cleanly.

Why this tooling matters

  • mmio_diff is blind to MMIO reads and to DDR_MEM / SRAM memory accesses. sim_tripwire + tripwire_diff surface read-side divergence with per-function attribution.
  • training_sim bitflip + bitflip_sweep exercise PHY retry / error-recovery paths that a happy-path trace never enters.
  • audit_data_syms closes the bug-class that ld --unresolved-symbols=ignore-all keeps opening every time a port lands new externs.
  • audit_early_return_tail statically screens for the “C port skips vendor shared-tail” shape that only matters on control-flow paths the emulator doesn't exercise.
  • mmio_regions makes every tool's output scannable. It saves minutes of address-range lookup per divergence investigation.
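To make the read-side point concrete, the per-function comparison described for tripwire_diff.py can be sketched with difflib. This is a rough illustration under stated assumptions, not the tool itself: bucket_by_fn, tier and diff_captures are hypothetical names, records are assumed to be dicts keyed by the capture field names, and the fast path here reuses suspect_threshold as its cutoff, which may differ from the real short-circuit.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def bucket_by_fn(records):
    """Group records by fn_name, keeping only the PC-independent key per access."""
    buckets = defaultdict(list)
    for r in records:
        key = (r["region_tag"], r["addr"], r["rw"], r["val"], r["size"])
        buckets[r["fn_name"]].append(key)
    return buckets


def tier(keys_a, keys_b, suspect_threshold=0.9):
    """Classify one bucket pair as OK, minor-diff or SUSPECT."""
    if keys_a == keys_b:
        return "OK"
    sm = SequenceMatcher(None, keys_a, keys_b, autojunk=False)
    # quick_ratio() is an upper bound on ratio(), so a low value is conclusive.
    if sm.quick_ratio() < suspect_threshold:
        return "SUSPECT"
    return "minor-diff" if sm.ratio() >= suspect_threshold else "SUSPECT"


def diff_captures(vendor, rebuilt, suspect_threshold=0.9):
    """Return {fn_name: tier} over every function seen in either capture."""
    a, b = bucket_by_fn(vendor), bucket_by_fn(rebuilt)
    return {
        fn: tier(a.get(fn, []), b.get(fn, []), suspect_threshold)
        for fn in sorted(set(a) | set(b))
    }
```

Because PC, seq_idx and tick are left out of the key, two captures that differ only in instruction addresses or timing land in the OK tier, while a changed value or a missing access lowers the ratio for that function alone.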
