CIRD — Can It Run Doom
“Four Cortex-M0 cores. One SoC. One very old FPS. How hard could it be.”
Status: design draft, parked (2026-04-22).
Umbrella: Coulomb (RK3588 stack) — adjacent, not a prerequisite.
The question
RK3588 has (at least) four in-SoC Cortex-M0 cores:
- PMU0_MCU — always-on, ~8 KB SRAM, PMU0-local peripherals only
- PMU1_MCU — always-on, ~64 KB SRAM, PMU peripherals + bridge into main bus
- DDR_MCU — nannies DDR PHY/CTRL, ~32 KB SRAM, narrow outside view
- BUS_MCU — ~32 KB SRAM, full AXI interconnect view, general-purpose offload
Rules of the game:
- The AP does display: a VOP2 overlay plane DMA'd from a DDR carveout.
- The AP relays input via mailbox.
- Everything else — game tick, BSP traversal, software rasterization — happens on the M0.
- “Cheating” is allowed: DDR may hold code and state, as long as the M0 does the significant work.
Candidate scoring
| Core | SRAM | DDR access | Mailbox to AP | Background duties | Verdict |
|---|---|---|---|---|---|
| PMU0_MCU | 8 KB | none | indirect | sleep-state | pause button only |
| PMU1_MCU | 64 KB | narrow bridge | ✓ | PMIC / thermal / S2R | runner-up — jitter risk |
| DDR_MCU | 32 KB | direct (but owns it) | no | DDR training + DFS | disqualified — stutters on DFS |
| BUS_MCU | 32 KB | full AXI | ✓ | none | winner |
Architecture (straw draft)
- BUS_MCU runs Doom: game tick, renderer, all logic.
- DDR carveout: code (~1 MB), WAD (~4 MB shareware), two framebuffers (320×200 × 1 B palette index = 64,000 B each, ~62.5 KiB).
- AP sets up VOP2 overlay once to scan out from `ddr_fb[idx]`. Flip `idx` on mailbox doorbell. One MMIO per frame.
- Input: AP forwards keyboard/gamepad events via a second mailbox channel (ring buffer in shared SRAM).
- PMU0_MCU (optional cheek): watchdogs BUS_MCU. If it stops kicking, display “you died” and reset. Completely unnecessary and therefore mandatory.
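The double-buffer flip described in this list could look roughly like the sketch below on the M0 side. Everything named here is hypothetical: `struct cird_display`, `cird_back_buffer`, `cird_flip`, and the 64 KiB slot stride are illustration only, and the real doorbell register address and carveout base have to come from the TRM and the DT. It also assumes the doorbell payload is simply the index of the finished buffer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: sizes and stride are assumptions, not TRM values. */
#define FB_WIDTH   320u
#define FB_HEIGHT  200u
#define FB_BYTES   (FB_WIDTH * FB_HEIGHT)   /* 64,000 B of 8-bit palette indices */
#define FB_STRIDE  0x10000u                 /* each buffer padded to a 64 KiB slot */

static_assert(FB_BYTES <= FB_STRIDE, "framebuffer must fit in its slot");

struct cird_display {
    uint8_t *fb_base;             /* start of the two framebuffer slots in the carveout */
    volatile uint32_t *doorbell;  /* mailbox doorbell register (address is an assumption) */
    uint32_t back;                /* index (0 or 1) of the buffer being drawn into */
};

/* Buffer the renderer should draw the next frame into. */
static inline uint8_t *cird_back_buffer(const struct cird_display *d)
{
    return d->fb_base + (uintptr_t)d->back * FB_STRIDE;
}

/* Publish the finished frame with a single MMIO write, then swap buffers. */
static inline void cird_flip(struct cird_display *d)
{
    /* Order the pixel writes before the doorbell write. This does not
     * replace cache maintenance on the AP side. */
    atomic_thread_fence(memory_order_release);
    *d->doorbell = d->back;   /* AP repoints the overlay plane at this index */
    d->back ^= 1u;            /* next frame goes into the other buffer */
}
```

The renderer would call `cird_back_buffer()` at the start of each frame and `cird_flip()` at the end. Padding each buffer to a power-of-two slot is a convenience for address arithmetic, not a requirement.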
Ramp-up — what to verify before writing code
- Reachability of BUS_MCU's SRAM and reset-vector latch from AP. Mainline Linux has drivers for the Rockchip remoteproc; confirm BUS_MCU is one of the supported instances. TRM chapter on “MCU Subsystem” is the source of truth.
- Mailbox channels not already claimed by ATF / BL31 / PSCI. Pick one bidirectional pair.
- DDR carveout reservation. `memory-region` in the DT with `no-map`, handed to the M0 via a known base address.
- Cache coherence. BUS_MCU is almost certainly non-coherent to the AP L3. Either use a non-cacheable mapping on the AP side for the framebuffer, or explicit clean/invalidate around every flip.
- VOP2 overlay setup. One plane, 8-bit indexed color, scan-out from our carveout. Drop into KMS as an overlay plane; let the kernel composite (or take the CRTC outright).
- Doom port. Chocolate-Doom or the older id release. Strip SDL. Replace the video backend with “write to framebuffer + signal mailbox”. Replace input backend with “read from mailbox ring”. No sound (or mailbox-to-AP-PCM later).
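For the “read from mailbox ring” input backend, a single-producer/single-consumer ring is probably enough: the AP only ever writes `head`, the M0 only ever writes `tail`. The sketch below is an assumption about how that could be laid out. `struct cird_event`, `struct cird_ring`, the event encoding, and the 16-slot depth are all hypothetical, and it ignores cache maintenance, which only matters if the ring ends up in cacheable memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 16u   /* depth is an assumption; must be a power of two */
static_assert((RING_SLOTS & (RING_SLOTS - 1u)) == 0u,
              "RING_SLOTS must be a power of two");

struct cird_event {
    uint8_t type;   /* hypothetical encoding: 0 = key up, 1 = key down */
    uint8_t code;   /* key code as the Doom input layer expects it */
};

struct cird_ring {
    volatile uint32_t head;   /* free-running count, written by the AP only */
    volatile uint32_t tail;   /* free-running count, written by the M0 only */
    struct cird_event slot[RING_SLOTS];
};

/* AP side: queue one event. Returns false if the ring is full. */
static inline bool cird_ring_push(struct cird_ring *r, struct cird_event ev)
{
    uint32_t head = r->head;
    if (head - r->tail == RING_SLOTS)
        return false;
    r->slot[head & (RING_SLOTS - 1u)] = ev;
    atomic_thread_fence(memory_order_release);  /* slot visible before head */
    r->head = head + 1u;
    return true;
}

/* M0 side: fetch one event. Returns false if the ring is empty. */
static inline bool cird_ring_pop(struct cird_ring *r, struct cird_event *out)
{
    uint32_t tail = r->tail;
    if (tail == r->head)
        return false;
    atomic_thread_fence(memory_order_acquire);  /* head read before slot read */
    *out = r->slot[tail & (RING_SLOTS - 1u)];
    atomic_thread_fence(memory_order_release);  /* slot read before tail update */
    r->tail = tail + 1u;
    return true;
}
```

The M0 would drain the ring with `cird_ring_pop()` once per game tick. Because the counters are free-running and unsigned, `head - tail` stays correct across wraparound.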
Chicken-and-egg notes
- The AP bringing up BUS_MCU is fine — the reverse would require PMU1/DDR_MCU to load it, which is silly.
- No deep-sleep support. When the AP sleeps, BUS_MCU loses power → game over (literal).
- DFS on the AP cores is fine; it doesn't touch BUS_MCU. DFS on DDR, however, stalls every master reading from DDR (BUS_MCU included) for the duration of retraining, so Doom will stutter whenever DDR changes frequency. Pin DDR to a single OPP while playing.
Cheek options (for later)
- PMU1_MCU variant — Doom that survives an AP kernel panic. `echo c > /proc/sysrq-trigger` mid-frag, keep playing. Novelty only.
- Multiplayer M0 — BUS_MCU and PMU1_MCU as two networked players, mailbox as network. Splitscreen via two VOP2 overlay planes.
- Render on DDR_MCU during idle training windows — do not attempt.
Open questions
- Is BUS_MCU clocked high enough (a few hundred MHz) to sustain playable Doom? Cortex-M0 at 200 MHz rendering 320×200 software-rasterized — rough math says “probably single-digit FPS, acceptable for the bit, not for actual play”. Need a cycle estimate before committing.
- Is the DDR latency from BUS_MCU's master port comparable to AP's, or is it routed through a throttled path? TRM will have a block diagram; actual numbers need measurement.
- Does mainline Linux already expose BUS_MCU as a remoteproc node, or is a DT patch required?
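As a first pass at the cycle estimate asked for above, the per-pixel budget is just clock divided by pixels per second. The helper below is a hypothetical sketch (`cird_cycles_per_pixel` is not from any existing code). It uses the 200 MHz figure from the question, which is itself a guess, and 35 FPS as the upper reference point because that is Doom's native tic rate.

```c
#include <assert.h>
#include <stdint.h>

#define DOOM_W 320u
#define DOOM_H 200u
static_assert(DOOM_W * DOOM_H == 64000u, "one frame is 64,000 pixels");

/* Core cycles available per output pixel at a given clock and frame rate.
 * Returns 0 for a degenerate (zero) frame rate or resolution. */
static inline uint32_t cird_cycles_per_pixel(uint32_t clock_hz, uint32_t fps,
                                             uint32_t width, uint32_t height)
{
    uint32_t pixels_per_second = fps * width * height;
    if (pixels_per_second == 0u)
        return 0u;
    return clock_hz / pixels_per_second;
}
```

At an assumed 200 MHz this gives 89 cycles per pixel at 35 FPS, 312 at 10 FPS, and 625 at 5 FPS. That budget has to cover game logic, BSP traversal, and every DDR stall as well as the actual pixel writes, so it is an upper bound rather than a prediction, and a measurement on hardware still decides the question.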
Status / next step
Parked. Pre-req to even starting: finish MegabitChip (DDR blob RE) and at least one clean boot on ampere with our own TPL. Then this becomes “write an M0 firmware and a small kernel driver”, which is a weekend.
Linked from start page.
