====== CIRD: Can It Run Doom ======

//"Four Cortex-M0 cores. One SoC. One very old FPS. How hard could it be."//

Status: **design draft, parked** (2026-04-22). Umbrella: [[:coulomb|Coulomb]] (RK3588 stack); adjacent, not a prerequisite.

===== The question =====

RK3588 has (at least) four in-SoC Cortex-M0 cores:

  * **PMU0_MCU**: always-on, ~8 KB SRAM, PMU0-local peripherals only
  * **PMU1_MCU**: always-on, ~64 KB SRAM, PMU peripherals + bridge into main bus
  * **DDR_MCU**: nannies DDR PHY/CTRL, ~32 KB SRAM, narrow outside view
  * **BUS_MCU**: ~32 KB SRAM, full AXI interconnect view, general-purpose offload

Rules of the game:

  - The **AP** does display: a VOP2 overlay plane DMA'd from a DDR carveout.
  - The **AP** relays input via mailbox.
  - Everything else (game tick, BSP traversal, software rasterization) happens on the M0.
  - "Cheating" allowed: use DDR for code + state, as long as the M0 does the significant work.

===== Candidate scoring =====

^ Core ^ SRAM ^ DDR access ^ Mailbox to AP ^ Background duties ^ Verdict ^
| PMU0_MCU | 8 KB | none | indirect | sleep-state | pause button only |
| PMU1_MCU | 64 KB | narrow bridge | ✓ | PMIC / thermal / S2R | runner-up (jitter risk) |
| DDR_MCU | 32 KB | direct (but owns it) | no | DDR training + DFS | disqualified (stutters on DFS) |
| BUS_MCU | 32 KB | full AXI | ✓ | none | **winner** |

===== Architecture (straw draft) =====

  * **BUS_MCU** runs Doom: game tick, renderer, all logic.
  * DDR carveout: code (~1 MB), WAD (~4 MB shareware), two framebuffers (320×200×1B palette = 64 KB each).
  * **AP** sets up VOP2 overlay once to scan out from ''ddr_fb[idx]''. Flip ''idx'' on mailbox doorbell. One MMIO per frame.
  * **Input:** AP forwards keyboard/gamepad events via a second mailbox channel (ring buffer in shared SRAM).
  * **PMU0_MCU** (optional cheek): watchdogs BUS_MCU. If it stops kicking, display "you died" and reset. Completely unnecessary and therefore mandatory.
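To make the carveout in the straw draft concrete, here is a rough layout calculator. It is a sketch, not a decided layout: the 4 KiB alignment, the region order, the exact 1 MiB / 4 MiB sizes and the names ''Region'', ''carveout_layout'' and ''scanout_offset'' are all assumptions made for illustration. Only the 320×200×1B framebuffer geometry comes from the draft above.

```python
from dataclasses import dataclass

# From the draft: 320x200, one byte per pixel (palette index).
FB_WIDTH, FB_HEIGHT, BYTES_PER_PIXEL = 320, 200, 1
ALIGN = 4096  # assumption: page-align every region inside the carveout


@dataclass(frozen=True)
class Region:
    name: str
    offset: int  # bytes from the carveout base
    size: int    # bytes actually used by the region


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def carveout_layout(code_size: int = 1 << 20, wad_size: int = 4 << 20):
    """Return (regions, total_bytes) for code, WAD and two framebuffers."""
    fb_size = FB_WIDTH * FB_HEIGHT * BYTES_PER_PIXEL
    regions, offset = [], 0
    for name, size in (("code", code_size), ("wad", wad_size),
                       ("fb0", fb_size), ("fb1", fb_size)):
        regions.append(Region(name, offset, size))
        offset = align_up(offset + size, ALIGN)
    return regions, offset


def scanout_offset(regions, idx: int) -> int:
    """Offset the AP would program into the overlay plane for buffer idx."""
    return next(r.offset for r in regions if r.name == f"fb{idx}")


if __name__ == "__main__":
    regions, total = carveout_layout()
    for r in regions:
        print(f"{r.name:5s} offset=0x{r.offset:06x} size={r.size}")
    print(f"total=0x{total:06x}")
```

With these assumptions each framebuffer uses 64,000 bytes and occupies one 64 KiB aligned slot, so the whole carveout fits in a little over 5 MiB. The M0 would render into buffer ''idx ^ 1'' while the AP scans out ''idx'', then ring the doorbell to swap.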
===== Ramp-up: what to verify before writing code =====

  - **Reachability of BUS_MCU's SRAM and reset-vector latch from AP.** Mainline Linux has drivers for the Rockchip remoteproc; confirm BUS_MCU is one of the supported instances. The TRM chapter on "MCU Subsystem" is the source of truth.
  - **Mailbox channels not already claimed** by ATF / BL31 / PSCI. Pick one bidirectional pair.
  - **DDR carveout reservation.** A ''memory-region'' in the DT with ''no-map'', handed to the M0 via a known base address.
  - **Cache coherence.** BUS_MCU is almost certainly non-coherent to the AP L3. Either use a non-cacheable mapping on the AP side for the framebuffer, or do an explicit clean/invalidate around every flip.
  - **VOP2 overlay setup.** One plane, 8-bit indexed color, scan-out from our carveout. Drop into KMS as an overlay plane; let the kernel composite (or take the CRTC outright).
  - **Doom port.** Chocolate Doom or the older id release. Strip SDL. Replace the video backend with "write to framebuffer + signal mailbox". Replace the input backend with "read from mailbox ring". No sound (or mailbox-to-AP-PCM later).

===== Chicken-and-egg notes =====

  * The AP bringing up BUS_MCU is fine; the reverse would require PMU1/DDR_MCU to load it, which is silly.
  * No deep-sleep support. When the AP sleeps, BUS_MCU loses power → game over (literal).
  * DFS on the AP is fine; it doesn't touch BUS_MCU. DFS on DDR, however, stalls everyone reading from DDR, including BUS_MCU, for the duration of retraining. Doom will stutter during aggressive DDR frequency scaling. Pin DDR to a single OPP while playing.

===== Cheek options (for later) =====

  * **PMU1_MCU variant**: Doom that survives an AP kernel panic. ''echo c > /proc/sysrq-trigger'' mid-frag, keep playing. Novelty only.
  * **Multiplayer M0**: BUS_MCU and PMU1_MCU as two networked players, mailbox as network. Splitscreen via two VOP2 overlay planes.
  * **Render on DDR_MCU during idle training windows**: do not attempt.
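The "read from mailbox ring" input backend from the ramp-up list could follow the usual single-producer / single-consumer index protocol. The model below only illustrates that protocol in Python; ''InputRing'', the slot count and the use of plain integers as events are hypothetical, and real firmware would additionally need memory barriers and a fixed layout in shared SRAM.

```python
class InputRing:
    """Model of a single-producer (AP) / single-consumer (M0) event ring.

    One slot is always left empty so that head == tail means "empty"
    and (head + 1) == tail means "full" without a separate counter.
    """

    def __init__(self, slots: int = 16):
        # Power-of-two size lets the indices wrap with a cheap bit mask.
        if slots < 2 or slots & (slots - 1):
            raise ValueError("slots must be a power of two >= 2")
        self.buf = [0] * slots
        self.mask = slots - 1
        self.head = 0  # advanced only by the producer (AP side)
        self.tail = 0  # advanced only by the consumer (M0 side)

    def push(self, event: int) -> bool:
        """Producer side: store one event, or return False if the ring is full."""
        nxt = (self.head + 1) & self.mask
        if nxt == self.tail:
            return False  # full: the caller decides whether to drop or retry
        self.buf[self.head] = event
        self.head = nxt  # publish only after the slot has been written
        return True

    def pop(self):
        """Consumer side: return the oldest event, or None if the ring is empty."""
        if self.tail == self.head:
            return None
        event = self.buf[self.tail]
        self.tail = (self.tail + 1) & self.mask
        return event


if __name__ == "__main__":
    ring = InputRing(slots=4)
    for key in (0x11, 0x22, 0x33, 0x44):
        print(hex(key), "queued" if ring.push(key) else "dropped")
    while (ev := ring.pop()) is not None:
        print("M0 got", hex(ev))
```

Because each index has exactly one writer, no lock is needed; the cost is that a ring with N slots holds at most N - 1 events.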
===== Open questions =====

  * Is BUS_MCU clocked high enough (a few hundred MHz) to sustain playable Doom? Cortex-M0 at 200 MHz rendering 320×200 software-rasterized: rough math says "probably single-digit FPS, acceptable for the bit, not for actual play". Need a cycle estimate before committing.
  * Is the DDR latency from BUS_MCU's master port comparable to AP's, or is it routed through a throttled path? TRM will have a block diagram; actual numbers need measurement.
  * Does mainline Linux already expose BUS_MCU as a remoteproc node, or is a DT patch required?

===== Status / next step =====

Parked. Pre-req to even starting: finish **MegabitChip** (DDR blob RE) and at least one clean boot on ampere with our own TPL. Then this becomes "write an M0 firmware and a small kernel driver", which is a weekend.

Linked from [[:start|start page]].
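As a starting point for the cycle estimate the open questions ask for, the arithmetic can be written down directly. The 200 MHz clock and the 320×200 resolution come from the text above; the 500 cycles-per-pixel figure is a placeholder guess (game logic, rasterization and DDR stalls lumped together), not a measurement, and the 35 Hz target is Doom's native tick rate.

```python
PIXELS_PER_FRAME = 320 * 200  # 64,000 pixels, from the draft


def fps_estimate(clock_hz: float, cycles_per_pixel: float) -> float:
    """Frames per second if every pixel costs cycles_per_pixel, all-in."""
    return clock_hz / (PIXELS_PER_FRAME * cycles_per_pixel)


def cycles_per_pixel_budget(clock_hz: float, target_fps: float) -> float:
    """Cycles available per pixel to hit target_fps at the given clock."""
    return clock_hz / (PIXELS_PER_FRAME * target_fps)


if __name__ == "__main__":
    clock = 200_000_000  # 200 MHz, the figure used in the open question
    guess = 500          # placeholder cost per pixel, to be replaced by measurement
    print(f"{fps_estimate(clock, guess):.2f} fps at {guess} cycles/pixel")
    print(f"{cycles_per_pixel_budget(clock, 35):.1f} cycles/pixel for 35 fps")
```

Under these assumptions the result is 6.25 fps, which agrees with the "single-digit FPS" hunch, and full-rate play would leave roughly 89 cycles per pixel for everything. Swapping the placeholder for a measured per-pixel cost is the actual work.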