====== Claude-Assisted Development Process — 8-Phase Loop ======

Markus's standardized loop for our implementation work. Apply by default whenever a task is bigger than a one-liner. Skipping phases is a deliberate choice that should be flagged, not a default.

Mirror of [[https://git.reauktion.de/marfrit/claude-his-agent|claude-his-agent]] memory entry ''feedback_dev_process.md''. This page is the canonical artifact for **Phase 5 second-model handovers** — paste measurements/plan in here, redact, then hand the URL to the reviewer.

===== Phase 1 — Goal Formulation =====

Define the objective in measurable terms. State what success looks like //before// touching anything. The chosen metric is a **hypothesis** about what to measure, not an axiom — Phase 3 may invalidate it.

===== Phase 2 — Situation Analysis =====

Document current state. Identify constraints, dependencies, known failure modes. **Reset context here** — do not carry assumptions from prior sessions; re-read CLAUDE.md, relevant memory files, run ''git status'', re-verify reachability.

===== Phase 3 — Baseline Measurements =====

Take concrete measurements //before// any changes. Paste raw output into DokuWiki at capture time — verbatim, not paraphrased. The Phase 5 artifact is the raw data, not Claude's summary.

**Real data, not theatre.** Phase 3 exists to use AI capacity for absorbing wide, low-level instrumentation a human reader would skim past. Attaching strace / perf / ftrace / eBPF / custom tripwires to the process under test is real Phase 3; scraping mpv's stdout dropped-frame counter is not. Discriminator: if a human with bash and grep could produce the same baseline, it isn't Phase 3 yet — go down to the syscall / call-path / MMIO / register layer.

**Anti-fabrication:**

  * Every cited value traces to a visible tool invocation or verbatim paste-in. If a measurement wasn't taken, write "not measured" — never an estimate, inference, or recall from training / prior sessions / sibling-host memory.
  * Raw before derived. A derived number (FPS, p99, error rate) appears alongside the raw stream it came from, never alone.
  * Rig failure is the finding. Empty strace, dead UART, perf counter that didn't increment → that //is// the Phase 3 result. Loop back to Phase 2 to fix the rig; do not synthesize plausible-looking baseline data to keep momentum.

  * **If baseline reveals the Phase 1 metric was tracking the wrong thing → loop back to Phase 1** with the corrected target.
  * Example: "max H.264 FPS" Phase 1 metric, but baseline shows DMA-setup + sync overhead dwarfs decode → real metric is bytes-copied-per-second / EGL surface-import time, not FPS.

**Measurements describe what the system //does//, not what it //should do//.** Baseline data is evidence, not a specification. Do NOT derive API call sequences, struct layouts, or parameter values from observed behaviour (strace, perf, example output). Observable behaviour may reflect bugs, workarounds, or implementation accidents — anything you copy from it inherits those.

===== Phase 4 — Plan =====

Formulate the approach. Identify what will and will not be touched. State expected outcome of implementation in the //same// measurable terms used in Phase 1/3.

===== Phase 5 — Second Model Review =====

Goal, situation, measurements, plan get pasted into **DokuWiki** (this namespace, ''claude:'' subpages). Markus reviews and redacts, then initiates the handover to a fresh model instance. **Claude does not curate the artifact going to the reviewer** — that would re-introduce the blind-spot accumulation the review is meant to escape. Do not summarize when handing over; paste the actual artifacts.

===== Phase 6 — Implementation =====

Execute the plan. Scope strictly to what was planned — resist feature creep, refactor-creep, "while I'm here" cleanups, and over-eager scope expansion. If a plan revision is needed mid-implementation, surface it explicitly and re-enter Phase 4.

**Contract before code.** Before writing or modifying any call site:

  * Read the API contract — kernel docs, header comments, and upstream source for every call touched.
  * State the contract explicitly before implementing against it (in the plan, the commit message, or a comment — somewhere reviewable).
  * If the contract cannot be found: stop and surface the gap. Don't infer it from baseline behaviour or sibling code.

**Copying from baseline measurements is not implementation. It is transcription of potentially broken behaviour.** A deliverable that matches baseline bytes but violates the API contract is not a deliverable — it is a deferred bug.

==== What "state the contract explicitly" looks like ====

Worked example: ''0012-h264-omit-scaling-matrix-frame-based.patch'' in ''~/src/ohm_gl_fix/phase6/step1/''. The commit message opens with the contract before any code:

<code>
VAAPI signals "explicit scaling lists are present in the bitstream"
implicitly: the consumer (ffmpeg-vaapi, mpv, etc.) sends a
VAIQMatrixBufferH264 alongside RenderPicture iff
sps_scaling_matrix_present_flag || pps_scaling_matrix_present_flag.
When the bitstream uses default (flat) scaling, no IQMatrixBuffer
arrives [...]

Earlier draft of this patch unconditionally omitted SCALING_MATRIX in
FRAME_BASED. That's corpus-correct (bbb has no explicit scaling
lists) but the wrong predicate: the kernel-side gating is by
"matrix-supplied vs. not," not by decode mode. Streams that signal
explicit scaling lists must submit SCALING_MATRIX in either mode.

Contract verification (audit_0008_decode_params_2026-05-01.md +
hantro_h264.c::assemble_scaling_list): the kernel uses the supplied
matrix when SCALING_MATRIX is in the control batch and falls back
to spec-defined defaults when absent. Mode-independent.
</code>

What this gets right:

  * **Contract first** — per-control rules cited from kernel doc (''ext-ctrls-codec-stateless.rst:752''), kernel driver (''hantro_h264.c::assemble_scaling_list''), and sibling implementation (gst-plugins-bad commit 9e3e775) //before// any patch hunks.
  * **Corpus-correct ≠ spec-correct, called out by name** — the rejected predicate ("omit SCALING_MATRIX in FRAME_BASED") //did// match the BBB baseline. It still got rejected, because the contract said the gate is "matrix-supplied vs. not," not "decode mode." This is exactly the Phase 3-derived-implementation trap.
  * **Then** the diff implements one branch per contract clause: SPS/PPS/DECODE_PARAMS always, SCALING_MATRIX iff ''matrix_set'', SLICE_PARAMS iff SLICE_BASED, PRED_WEIGHTS iff SLICE_BASED + ''V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED''.

Mirror format anywhere reviewable: PR description, commit message body, plan section, or a header comment block. The shape is //contract clauses with citations → code that maps 1:1 to those clauses//.

===== Phase 7 — Verification Measurements =====

Repeat measurements from Phase 3. Compare explicitly against baseline.

  * **If the delta does not match Phase 4's prediction → loop back to Phase 4** (re-plan). Do not declare success when the numbers say otherwise; an unexplained delta is a finding, not a footnote.

===== Phase 8 — Memory Update =====

Loop terminates here. Distill the lesson into a memory entry — what was the mistake the loop caught, what's the rule that would shorten the next cycle. Do not let the lesson rot in chat history.

===== Loopback edges (summary) =====

  * Phase 3 → Phase 1 (metric was wrong)
  * Phase 7 → Phase 4 (plan didn't deliver predicted delta)
  * Phase 8 closes the loop

===== Why this exists =====

Several recurring failures in prior work codify into individual rules — observer-first, simulate-before-flash, three-strikes-then-verify, "trust eyes not vibes," scope-strictly-to-plan, no-fake-dry-run. Those are all symptoms; this loop is the structural fix. Use it as the spine and let those rules show up as rejection patterns inside the appropriate phases.