Markus's standardized loop for our implementation work. Apply by default whenever a task is bigger than a one-liner. Skipping phases is a deliberate choice that should be flagged, not a default.
Mirror of claude-his-agent memory entry feedback_dev_process.md. This page is the canonical artifact for Phase 5 second-model handovers — paste measurements/plan in here, redact, then hand the URL to the reviewer.
Define the objective in measurable terms. State what success looks like before touching anything. The chosen metric is a hypothesis about what to measure, not an axiom — Phase 3 may invalidate it.
Document current state. Identify constraints, dependencies, known failure modes. Reset context here — do not carry assumptions from prior sessions; re-read CLAUDE.md, relevant memory files, run git status, re-verify reachability.
Take concrete measurements before any changes. Paste raw output into DokuWiki at capture time — verbatim, not paraphrased. The Phase 5 artifact is the raw data, not Claude's summary.
Real data, not theatre. Phase 3 exists to use AI capacity for absorbing wide, low-level instrumentation a human reader would skim past. Attaching strace / perf / ftrace / eBPF / custom tripwires to the process under test is real Phase 3; scraping mpv's stdout dropped-frame counter is not. Discriminator: if a human with bash and grep could produce the same baseline, it isn't Phase 3 yet — go down to the syscall / call-path / MMIO / register layer.
Anti-fabrication:
Measurements describe what the system does, not what it should do. Baseline data is evidence, not a specification. Do NOT derive API call sequences, struct layouts, or parameter values from observed behaviour (strace, perf, example output). Observable behaviour may reflect bugs, workarounds, or implementation accidents — anything you copy from it inherits those.
Formulate the approach. Identify what will and will not be touched. State expected outcome of implementation in the same measurable terms used in Phase 1/3.
Goal, situation, measurements, plan get pasted into DokuWiki (this namespace, claude: subpages). Markus reviews and redacts, then initiates the handover to a fresh model instance. Claude does not curate the artifact going to the reviewer — that would re-introduce the blind-spot accumulation the review is meant to escape. Do not summarize when handing over; paste the actual artifacts.
Execute the plan. Scope strictly to what was planned — resist feature creep, refactor-creep, “while I'm here” cleanups, and over-eager scope expansion. If a plan revision is needed mid-implementation, surface it explicitly and re-enter Phase 4.
Contract before code. Before writing or modifying any call site:
Copying from baseline measurements is not implementation. It is transcription of potentially broken behaviour. A deliverable that matches baseline bytes but violates the API contract is not a deliverable — it is a deferred bug.
Worked example: 0012-h264-omit-scaling-matrix-frame-based.patch in ~/src/ohm_gl_fix/phase6/step1/. The commit message opens with the contract before any code:
VAAPI signals "explicit scaling lists are present in the bitstream" implicitly: the consumer (ffmpeg-vaapi, mpv, etc.) sends a VAIQMatrixBufferH264 alongside RenderPicture iff sps_scaling_matrix_present_flag || pps_scaling_matrix_present_flag. When the bitstream uses default (flat) scaling, no IQMatrixBuffer arrives [...] Earlier draft of this patch unconditionally omitted SCALING_MATRIX in FRAME_BASED. That's corpus-correct (bbb has no explicit scaling lists) but the wrong predicate: the kernel-side gating is by "matrix-supplied vs. not," not by decode mode. Streams that signal explicit scaling lists must submit SCALING_MATRIX in either mode. Contract verification (audit_0008_decode_params_2026-05-01.md + hantro_h264.c::assemble_scaling_list): the kernel uses the supplied matrix when SCALING_MATRIX is in the control batch and falls back to spec-defined defaults when absent. Mode-independent.
What this gets right:
ext-ctrls-codec-stateless.rst:752), kernel driver (hantro_h264.c::assemble_scaling_list), and sibling implementation (gst-plugins-bad commit 9e3e775) before any patch hunks.matrix_set, SLICE_PARAMS iff SLICE_BASED, PRED_WEIGHTS iff SLICE_BASED + V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.Mirror format anywhere reviewable: PR description, commit message body, plan section, or a header comment block. The shape is contract clauses with citations → code that maps 1:1 to those clauses.
Repeat measurements from Phase 3. Compare explicitly against baseline.
Loop terminates here. Distill the lesson into a memory entry — what was the mistake the loop caught, what's the rule that would shorten the next cycle. Do not let the lesson rot in chat history.
Several recurring failures in prior work codify into individual rules — observer-first, simulate-before-flash, three-strikes-then-verify, “trust eyes not vibes,” scope-strictly-to-plan, no-fake-dry-run. Those are all symptoms; this loop is the structural fix. Use it as the spine and let those rules show up as rejection patterns inside the appropriate phases.