
MegabitChip — Session 2026-04-21 (Extended)

Extended session on top of the reloc-splice + mmio-diff work landed earlier the same day. Focus: close the gap between write-sequence equality (mmio_diff green) and actually running on silicon without bricking. Tools, audits, and three new monster ports shipped; six silicon-hostile bugs caught pre-flash across three bug classes.

TL;DR

  • mmio_diff baseline held at 3173 / 3173 across the whole session.
  • Three bug classes, six concrete bugs, all found and fixed without touching silicon.
  • Three remaining “monster” functions ported (fn_fcc4, fn_1c14, fn_de40).
  • Bitflip sweep: pre-silicon evidence the rebuild's retry logic converges under all plausible transient status faults.

Six silicon-hostile bugs caught pre-flash

#  | Class                                  | Case
1  | ld unresolved → 0 NULL deref           | fn_9a68 DAT_00012B70 case-mismatch
2  | same                                   | fn_7730 DAT_00010ba8 missing from DATA_SYMS
3  | same                                   | fn_7730 DAT_00010c2c missing from DATA_SYMS
4  | same                                   | fn_7730 DAT_00012b50 missing from DATA_SYMS
5  | C early-return skips shared tail       | fn_3268 0x208 RMW pair skipped when bit-31 set
6  | Port is read-only where vendor writes  | fn_1c14 rebuilt as no-op; vendor save/restore DDRPHY training bank

Class 1: ld --unresolved-symbols=ignore-all silently resolves undefined externs to 0. A case-mismatched or missing DATA_SYMS entry becomes an adrp resolving to page 0x0, and the following ldr returns whatever lives at address zero. mmio_diff is blind to this because the downstream MMIO writes still match vendor.
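A minimal sketch of how an ADRP-to-NULL scan over a linked .text blob could look. This is not the reloc_splice.py implementation; the function name `adrp_null_hits` and the `(offset, rd)` return shape are assumptions made for illustration. Only the ADRP encoding itself (op bit 31 set, bits 28:24 = 0b10000, 21-bit signed page immediate split into immlo/immhi) comes from the A64 instruction set.

```python
import struct

def adrp_null_hits(text, base):
    """Return (offset, rd) for each ADRP in `text` whose target page is 0x0.

    `text` is the raw little-endian .text bytes, `base` its load address.
    Hypothetical helper, not the real reloc_splice.py guard.
    """
    hits = []
    for off in range(0, len(text) - 3, 4):
        (insn,) = struct.unpack_from("<I", text, off)
        # ADRP: bit 31 = 1, bits 28..24 = 0b10000 (ADR has bit 31 = 0).
        if (insn & 0x9F000000) != 0x90000000:
            continue
        immlo = (insn >> 29) & 0x3
        immhi = (insn >> 5) & 0x7FFFF
        imm = (immhi << 2) | immlo
        if imm & (1 << 20):            # sign-extend the 21-bit page delta
            imm -= 1 << 21
        pc_page = (base + off) & ~0xFFF
        page = (pc_page + (imm << 12)) & 0xFFFFFFFFFFFFFFFF
        if page == 0:
            hits.append((off, insn & 0x1F))
    return hits
```

Each hit maps directly onto a `WARN <port>+<off>` line. One caveat under this sketch's assumptions: an image that legitimately addresses page 0x0 would need an allowlist.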

Class 2: the C port uses an early return where vendor's asm has the conditional branch jump into a shared tail. Two 0x208 read-modify-writes that vendor always executes were skipped on one control-flow path in the rebuild. The emulator didn't exercise the bit-31-set entry state, so the missing writes never showed up in the trace. On silicon where that bit is live, the skipped writes are silicon-hostile.
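The control-flow difference can be modelled in a few lines. The sketch below is a toy Python model, not the fn_3268 source: only the 0x208 offset and the bit-31 condition come from the notes above, while the `Mmio` class, the 0x100 body write, and the RMW masks are hypothetical stand-ins.

```python
class Mmio:
    """Toy register file that records every write (hypothetical model)."""
    def __init__(self):
        self.mem = {}
        self.writes = []

    def read(self, addr):
        return self.mem.get(addr, 0)

    def write(self, addr, val):
        self.mem[addr] = val
        self.writes.append((addr, val))

def rmw(mmio, addr, clear, set_):
    mmio.write(addr, ((mmio.read(addr) & ~clear) | set_) & 0xFFFFFFFF)

def shared_tail(mmio):
    # The RMW pair at 0x208; the masks are made up for the example.
    rmw(mmio, 0x208, 0x1, 0x0)
    rmw(mmio, 0x208, 0x0, 0x1)

def vendor_shape(mmio, status):
    """Branch-into-tail: both paths reach the shared tail."""
    if status & (1 << 31):
        result = 1                 # short block: mov #const, b tail
    else:
        mmio.write(0x100, 0x5)     # hypothetical main body
        result = 0
    shared_tail(mmio)
    return result

def early_return_port(mmio, status):
    """Buggy C shape: the early return skips the tail."""
    if status & (1 << 31):
        return 1
    mmio.write(0x100, 0x5)
    shared_tail(mmio)
    return 0
```

With bit 31 clear the two write traces are identical, which is the only state the emulator trace covered. With bit 31 set, the vendor shape still emits both 0x208 writes and the early-return shape emits none.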

Class 3: the port implemented a DDRPHY training-bank save/restore routine as read-and-discard. Vendor writes via str wzr; our port only issued ldr. The caller (fn_9a90) is never reached under the happy-path LP5-2400 cold-boot trace, so mmio_diff didn't fire. On silicon with the caller active, training coefficients leak between phases.

All six would have bricked or mis-trained silicon. All six were invisible to write-sequence diff.

Tooling shipped this session

See Simulation stack for the full reference. New or hardened:

  • sim_tripwire.py — Bin-style per-access tracer on Unicorn; (seq, tick, pc, addr, size, rw, val, region, fn_name) records with PC→fn resolution
  • tripwire_diff.py — PC-bucketed SequenceMatcher diff; bucket by fn_name to survive bitflip-path control-flow divergence
  • training_sim.py — two-mode DDR training simulator (pass / bitflip-first-N-reads)
  • bitflip_sweep.py — per-address retry convergence test over all training-status addresses
  • mmio_regions.py — shared address → region tag classifier (DDRCTL, DDRPHY, OTP, SRAM, CRU, …); fixed SCRAMBLE→OTP at 0xFECC0000 after TRM cross-check
  • audit_data_syms.py — scans every candidate.c for DAT_/s_/BLOB_DATA_ externs, cross-checks against DATA_SYMS | PORT_OVERRIDES | MMIO_SYMS (case-insensitive)
  • audit_early_return_tail.py — static ARM64 asm scanner for cond_br → short block with mov #const → b INTO_TAIL_WITH_STR patterns; flagged 15 candidates, 1 real bug (fn_3268), 1 different-class bug (fn_1c14), 13 false positives
  • reloc_splice.py gained a post-link ADRP-to-NULL guard — scans each linked .text for any ADRP whose resolved page is 0x0 and emits WARN <port>+<off>. Closes bug-class 1 at build time.

All wired into make audit.
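For orientation, a sketch of the kind of cross-check audit_data_syms.py performs. This is not the tool's code: the regex, the function name `audit_externs`, and the choice to report case mismatches separately from missing symbols are all assumptions. The prefix list (DAT_, s_, BLOB_DATA_) and the idea of comparing against the union of known symbol tables come from the description above.

```python
import re

# Prefixes named in the tool description; the exact pattern is an assumption.
EXTERN_RE = re.compile(r"\b(?:DAT_|s_|BLOB_DATA_)\w+")

def audit_externs(source, known):
    """Split referenced symbols into (missing, case_mismatch).

    `known` stands in for DATA_SYMS | PORT_OVERRIDES | MMIO_SYMS.
    A symbol that only matches after lower-casing is reported with the
    spelling it collides with, since ld would still leave it unresolved.
    """
    by_lower = {name.lower(): name for name in known}
    missing, case_mismatch = [], []
    for ref in sorted(set(EXTERN_RE.findall(source))):
        if ref in known:
            continue
        if ref.lower() in by_lower:
            case_mismatch.append((ref, by_lower[ref.lower()]))
        else:
            missing.append(ref)
    return missing, case_mismatch
```

A bare s_ prefix will also match unrelated local identifiers, so a real scanner would presumably restrict itself to extern declarations. That filtering is left out here.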

Monster ports

See Port matrix for the full table.

  • fn_fcc4 — source-complete full port, 1684 B. Natural skip-larger. Documented source.
  • fn_1c14 — full port, 656 B ≤ 740 B vendor. Replaces the broken read-only stub. Vendor writes via str wzr; port now does the same.
  • fn_3268 — bug fix: C restructured so the 0x208 RMW pair runs on both control-flow paths, matching vendor's branch-into-tail shape.
  • fn_de40 — source-scaffold, 4888 B ≤ 4912 B vendor budget. Faithful ~700-line port from ddr_annotated.c:9695–10640 (LPDDR5 frequency-band timing programmer). 27 callees resolved via fun_table. 24 new DAT_00011ff0..DAT_000127c0 defsyms added to DATA_SYMS. Currently parked in splicer_skip.txt pending investigation of a 1-bit divergence at tp[0x4f] — see internal task #198.

Bitflip sweep

23 training-status addresses flipped one-at-a-time on vendor LP5-2400:

  • 18 of 23: single-read retry, all downstream writes unchanged — clean convergence.
  • 3 of 23 (STAT CH1/CH2/CH3): fn_2340 writes MRCTRL0 = 0x60 instead of 0x10 — vendor's intended mr_type retry strategy, replicated correctly by the rebuild.
  • 2 of 23 (MicroReset, MicroContMux): no retry fires on the LP5-2400 happy path — flip window isn't polled.
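A toy model of the bitflip-first-N-reads mode, to make "single-read retry" concrete. It is a sketch under assumptions, not training_sim.py: the class `FlippyStatus`, the helper `poll_until`, and all register values are hypothetical.

```python
class FlippyStatus:
    """Status register whose first `flips` reads return one bit inverted."""
    def __init__(self, good, bit, flips):
        self.good = good
        self.mask = 1 << bit
        self.left = flips
        self.reads = 0

    def read(self):
        self.reads += 1
        if self.left > 0:
            self.left -= 1
            return self.good ^ self.mask   # transient fault
        return self.good

def poll_until(reg, expect_mask, max_tries):
    """Return the attempt number on which the expected bits were seen, else None."""
    for attempt in range(1, max_tries + 1):
        if (reg.read() & expect_mask) == expect_mask:
            return attempt
    return None
```

With `flips=1` the poll succeeds on the second read, which is the clean-convergence case. An address that is never polled on a given path, like the two unpolled entries above, never reaches a loop like this, so no retry can fire.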

The sweep is the pre-silicon evidence that the rebuild's retry logic converges across all plausible transient status faults. Bitflip mode doesn't degrade tripwire_diff because the buckets key on fn_name not seq_idx, so control-flow divergence just reshapes buckets.
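Why keying on fn_name survives reordering can be shown with difflib directly. The sketch below is illustrative only, not tripwire_diff.py: records are assumed to be dicts carrying the fn_name, rw, addr, and val fields from the tripwire record layout, and the names `bucket` and `bucketed_diff` are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def bucket(records):
    """Group accesses by function, dropping global sequence position."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec["fn_name"]].append((rec["rw"], rec["addr"], rec["val"]))
    return buckets

def bucketed_diff(vendor, rebuild):
    """Return {fn_name: non-equal opcodes} for buckets that differ."""
    a, b = bucket(vendor), bucket(rebuild)
    report = {}
    for fn in sorted(set(a) | set(b)):
        sm = SequenceMatcher(None, a.get(fn, []), b.get(fn, []), autojunk=False)
        ops = [op for op in sm.get_opcodes() if op[0] != "equal"]
        if ops:
            report[fn] = ops
    return report
```

If a bitflip path makes one function run earlier or later than in the vendor trace, every bucket still compares equal and the report stays empty. Only a change inside a function's own access sequence shows up.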

Baseline state at session end

  • mmio_diff 3173/3173 green
  • make audit green on data-symbol coverage + early-return-tail
  • Splicer: 104 candidates / 85 spliced / 19 skip-larger / 0 failed
  • splicer_skip.txt: one entry (154_FUN_de40 until #198 closes)
  • tripwire_diff finds 1 SUSPECT (fn_ac8 vendor early memcpy, unrelated) and 3 minor-diffs, all explained (SWSTAT toggle, SCRAMBLE→OTP off-by-one, fn_8b40 extra polls)

Next-session quick-start

cd ~/projects/AMPere/benchmark && make verify   # expects 3173/3173 green

If green, pick task #198 or any pending. Task #198 investigates the 1-bit tp[0x4f] divergence in fn_de40's install trial — details in the internal task board.

Observations

“Markus' insistence on simulation before flashing paid off. Big time. Again.” — 2026-04-21.

The tripwire + PC-bucketed diff caught 3 silent NULL-derefs that were hiding under mmio_diff 3173/3173 green. ld --unresolved-symbols=ignore-all resolved undefined DATA_SYMS externs to page 0x0; the emulator happily returned 0 for reads there, so the bug stayed hidden behind write-sequence equality. Silicon would have bricked.

mmio_diff was the gate we trusted. The gate was passing. The simulator layer — with a tripwire-style per-access capture, not just write-order comparison — is not optional, even late in a campaign that feels “done”.
