This is an old revision of the document!
Table of Contents
Claude-Assisted Development Process — 8-Phase Loop
Markus's standardized loop for our implementation work. Apply by default whenever a task is bigger than a one-liner. Skipping phases is a deliberate choice that should be flagged, not a default.
Mirror of claude-his-agent memory entry feedback_dev_process.md. This page is the canonical artifact for Phase 5 second-model handovers — paste measurements/plan in here, redact, then hand the URL to the reviewer.
Phase 1 — Goal Formulation
Define the objective in measurable terms. State what success looks like before touching anything. The chosen metric is a hypothesis about what to measure, not an axiom — Phase 3 may invalidate it.
Phase 2 — Situation Analysis
Document current state. Identify constraints, dependencies, known failure modes. Reset context here — do not carry assumptions from prior sessions; re-read CLAUDE.md, relevant memory files, run git status, re-verify reachability.
Phase 3 — Baseline Measurements
Take concrete measurements before any changes. Paste raw output into DokuWiki at capture time — verbatim, not paraphrased. The Phase 5 artifact is the raw data, not Claude's summary.
Real data, not theatre. Phase 3 exists to use AI capacity for absorbing wide, low-level instrumentation a human reader would skim past. Attaching strace / perf / ftrace / eBPF / custom tripwires to the process under test is real Phase 3; scraping mpv's stdout dropped-frame counter is not. Discriminator: if a human with bash and grep could produce the same baseline, it isn't Phase 3 yet — go down to the syscall / call-path / MMIO / register layer.
Anti-fabrication:
- Every cited value traces to a visible tool invocation or verbatim paste-in. If a measurement wasn't taken, write “not measured” — never an estimate, inference, or recall from training / prior sessions / sibling-host memory.
- Raw before derived. A derived number (FPS, p99, error rate) appears alongside the raw stream it came from, never alone.
- Rig failure is the finding. Empty strace, dead UART, perf counter that didn't increment → that is the Phase 3 result. Loop back to Phase 2 to fix the rig; do not synthesize plausible-looking baseline data to keep momentum.
- If baseline reveals the Phase 1 metric was tracking the wrong thing → loop back to Phase 1 with the corrected target.
- Example: “max H.264 FPS” Phase 1 metric, but baseline shows DMA-setup + sync overhead dwarfs decode → real metric is bytes-copied-per-second / EGL surface-import time, not FPS.
Phase 4 — Plan
Formulate the approach. Identify what will and will not be touched. State expected outcome of implementation in the same measurable terms used in Phase 1/3.
Phase 5 — Second Model Review
Goal, situation, measurements, plan get pasted into DokuWiki (this namespace, claude: subpages). Markus reviews and redacts, then initiates the handover to a fresh model instance. Claude does not curate the artifact going to the reviewer — that would re-introduce the blind-spot accumulation the review is meant to escape. Do not summarize when handing over; paste the actual artifacts.
Phase 6 — Implementation
Execute the plan. Scope strictly to what was planned — resist feature creep, refactor-creep, “while I'm here” cleanups, and chainsaw enthusiasm. If a plan revision is needed mid-implementation, surface it explicitly and re-enter Phase 4.
Phase 7 — Verification Measurements
Repeat measurements from Phase 3. Compare explicitly against baseline.
- If the delta does not match Phase 4's prediction → loop back to Phase 4 (re-plan). Do not declare success when the numbers say otherwise; an unexplained delta is a finding, not a footnote.
Phase 8 — Memory Update
Loop terminates here. Distill the lesson into a memory entry — what was the mistake the loop caught, what's the rule that would shorten the next cycle. Do not let the lesson rot in chat history.
Loopback edges (summary)
- Phase 3 → Phase 1 (metric was wrong)
- Phase 7 → Phase 4 (plan didn't deliver predicted delta)
- Phase 8 closes the loop
Why this exists
Several recurring failures in prior work codify into individual rules — observer-first, simulate-before-flash, three-strikes-then-verify, “trust eyes not vibes,” scope-strictly-to-plan, no-fake-dry-run. Those are all symptoms; this loop is the structural fix. Use it as the spine and let those rules show up as rejection patterns inside the appropriate phases.
“The seven-year-old with the chainsaw can do real work. Watch the blade, not the smile.”
