
VOP2 + eDP u-boot Debug Strategy — dual-agent dossier

Drafted overnight 2026-04-16 by two independent AI agents (Sonnet 4.6 + Opus 4.7) given identical context. They rank their top-3 suspects differently; the overlap is the high-confidence signal, and the disagreement is where you'll learn the most tomorrow. This page exists because the user wanted something readable from their phone during the commute.

TL;DR — what both agents agree on

  • The bug is somewhere between VOP2's DSP_IF output and the HDPTX PHY's high-speed lanes. VOP2 scans correctly (RAM reads happen), link training passes (AUX works), panel backlight is on (power/HPD works), but the main-link pixel stream never lands.
  • dclk_vp2 clock parent / CRU mux is a top-3 suspect in both strategies. The first test of tomorrow's session should be cat /sys/kernel/debug/clk/clk_summary | grep dclk_vp2 on the running kernel — costs 30 seconds, potentially decisive.
  • The existing vop2trace.ko only hooks regmap_write — both agents independently flag this as the biggest blind spot. Extend to regmap_update_bits_base + clk_set_parent + arm_smccc_smc before the next deep dive.
  • No physical probing is needed — everything can be answered from software. The wait is short.

TL;DR — where they disagree

|                  | Sonnet's #1 bet | Opus's #1 bet |
| Suspect          | dclk_vp2 on the wrong CRU mux parent | HDPTX PHY not producing a recovered pixel clock (partially owned by BL31 via SMC) |
| Why              | Register diff is byte-identical everywhere except the mux trees, which don't appear in MMIO reads; classic “frequency right, source wrong” | /dev/mem zeros at 0xFED70000 suggest secure-world / syscon indirection; BL31 may own PHY init |
| First experiment | clk_summary diff, then mw.l the right CRU mux word from the u-boot console as a zero-cost test | Upgrade vop2trace.ko to catch regmap_update_bits + arm_smccc_smc, capture full PHY init trace |
| Time cost        | ~30 minutes | ~2–3 hours |

Resolution strategy: they're not actually incompatible. Sonnet's path is cheap-and-definitive if the mux IS wrong; if it isn't, Opus's deeper instrumentation becomes necessary. Run Sonnet's test first, escalate to Opus's plan if it comes back matching.


2026-04-17 evening — Closed / Reopened

Appended after tonight's session: resolution of the open questions from the top of this page.

Sessions 1–4 were heuristic register matching. Tonight was empirical disambiguation — targeted experiments that either closed a hypothesis or reopened it more sharply.

Closed

  • Is the DP path working? Closed, yes. PHY-internal BIST colour bars drive the panel correctly, so DP TX, PHY, cable, panel and backlight are all healthy.
  • Does the panel see our stream? Closed, yes. DPCD SINK_STATUS (0x205) reads IN_SYNC after commit. Link trains at HBR×2 without errors.
  • Alpha=0 stripe-paint bug. Closed, fixed in v10. Vendor clusters blend incoming pixels against VP2_DSP_BG=black using the source alpha; with alpha 0, every pixel was multiplied to black before it ever reached the DSP_IF. A real content bug, unrelated to VOP2 vs DP.
  • dclk_vop2 parent selection. Closed, the last concrete u-boot-side win of the night. The rate was ~136 MHz instead of the kernel's 147.69 MHz; fixed by pre-selecting V0PLL before clk_set_rate() so that the u-boot clock driver takes the retune path rather than the nearest matching divider on the default ancestor.
  • Vendor u-boot as eDP-logo reference. Closed, negative. Built the vendor coolpi-loader from source (branch linux-6.1-stan, on the new CT ranke). The factory genbook_spi.img boots the kernel cleanly but shows no logo on eDP or on HDMI. The Rockchip wiki explicitly states that rockchip_show_logo() is Android-only and not implemented for Linux. The vendor-knows-how assumption is dead.
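The ~136 MHz symptom is what an integer divider on a fixed parent tends to produce. A minimal sketch of that arithmetic, assuming (hypothetically; these notes do not record which PLL the default ancestor was) a 1500 MHz parent and a driver that picks the smallest divider whose output does not exceed the request. The helper name is ours, not the u-boot clock driver's:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the u-boot clock driver: rate produced by an
 * integer divider on a fixed parent when the divider is rounded up, i.e.
 * the smallest divider whose output does not exceed the requested rate. */
static uint64_t divided_rate(uint64_t parent_hz, uint64_t target_hz,
                             unsigned int *div_out)
{
    assert(parent_hz > 0 && target_hz > 0);
    uint64_t div = (parent_hz + target_hz - 1) / target_hz; /* ceil(parent / target) */
    if (div_out)
        *div_out = (unsigned int)div;
    return parent_hz / div;
}
```

With a 1500 MHz parent and the 147.69 MHz target this picks divider 11 and yields about 136.36 MHz, in line with the ~136 MHz seen tonight (suggestive, not proof). Retuning the parent to an exact multiple such as 8 × 147.69 MHz = 1181.52 MHz makes the same arithmetic land exactly on target, which is the point of pre-selecting V0PLL.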

Reopened / still open

  • Whatever in the VOP2 pixel-output path produces zero or invalid pixels despite the stream being received. This is where the remaining bug lives. It is not concrete without more instrumentation: we know the fault is upstream of DP TX but cannot yet pinpoint whether it is the VOP2 pixel-output chain, the content format handed to the eDP controller, or something in between.
  • The original TL;DR framing here (dclk_vp2 mux parent, HDPTX PHY owned by BL31) has been overtaken by the BIST and IN_SYNC results: the PHY is producing valid output, so those hypotheses no longer match the data. Keep them for the archive, but do not bet on them tomorrow.

Note on method

The conclusion is that eDP-logo-in-u-boot is not cleanly solvable without a bigger investment. See Project Bin for tonight's full notes, the vendor-u-boot detour on CT171 ranke, and the open list for tomorrow (idblock extract, coolpi_rk3588_gbook_nor_upgrade.img test, or accepting that pixels-in-u-boot is not a today problem).


Sonnet 4.6's strategy

For: upstreamable u-boot patch series. No register soup. No cargo-culting.


1. Where the Fault Probably Lives — Top 3 Bets

Bet #1: dclk_vp2 is on the wrong clock parent (HIGH confidence)

This is the most likely culprit and it's structurally invisible to register diffing. The symptom — every VOP2 and eDP register matches, link training passes, panel backlight is on, but no pixels — is classically consistent with “wrong pixel clock source.”

For eDP on RK3588, the pixel clock for VP2 (dclk_vp2) must come from the HDPTX PHY's recovered/divided clock output, not from a CRU PLL. The CRU has a mux for this, and if u-boot's clk_set_rate(dclk_vp2, 147.84 MHz) walks the wrong mux tree, it configures a PLL-derived clock at the right frequency but the VOP2 and eDP TX are running on unrelated oscillators. They're never synchronized. No pixels. The registers all “look right” because frequency ≠ source.

The kernel's phy-rockchip-samsung-hdptx driver does a two-step: it programs the PHY PLL to produce the desired pixel rate, then reconfigures the CRU dclk_vp2 mux to select clk_hdptx1_pixel_io as the parent before enabling the VOP. u-boot almost certainly skips the mux reassignment.

Why u-boot misses it: u-boot's CCF support for RK3588 is partial. Many mux nodes are present but “fixed” or default to the PLL path. The clk_set_rate call may succeed (hitting a PLL), return the right rate, and never touch the mux.

Bet #2: VO1_GRF DSP_IF routing not written (MEDIUM-HIGH confidence)

The VO1_GRF (0xFD5AC000) contains mux bits that route VP2's parallel output bus to the eDP1 TX data lines rather than HDMI/other sinks. If these aren't written, the VOP2 scans out into a dead bus while eDP is receiving nothing. Your register diffing confirmed you haven't touched VO1_GRF at all in u-boot. The kernel writes it during rockchip_vop2_bind() setup, and it's write-once-per-boot.

This one is slightly lower confidence than the clock because link training passing suggests the PHY did get initialized somehow. But link training is independent of the video data path: the DP TX controller generates the training patterns itself and the sink reports lock back over AUX, so VOP2 is not involved at all. You can have a fully trained DP link and zero video pixels if the parallel bus from VOP2 is unrouted.

Bet #3: HDPTX PHY not fully initialized by u-boot (MEDIUM confidence)

The /dev/mem zeros at 0xFED70000 are suspicious and need resolving before you can rule this out. The HDPTX combo PHY has a large internal state machine. A passing link training proves the AUX channel works and that the lanes carried the controller's training patterns at the trained rate, but it says nothing about PHY state that only matters once video starts (a pixel-clock output, for example). The zeros could mean: (a) the registers are banked/indirect and /dev/mem isn't hitting the real state, (b) the kernel has handed the PHY to a power domain and it's invisible to user-space MMIO, or (c) u-boot's PHY init is genuinely incomplete.

The reason this is #3 rather than #1: if the PHY high-speed lanes were truly dead, you'd expect link training to fail or DPCD to be unreadable. Since both work, the main PHY init probably ran. But there may be a pixel clock enable step or lane swap configuration that's separate from link training init.


2. Concrete Experiments

Experiment A: Clock parent dump (rules out / confirms Bet #1, ~30 min)

What to do: On the running kernel (display working), read the clock tree for dclk_vp2:

cat /sys/kernel/debug/clk/dclk_vp2/clk_parent
# or
cat /sys/kernel/debug/clk/clk_summary | grep -A3 dclk_vp2

Also capture the CRU clock-select register holding the VP2 mux, plus its neighbours, while the display is live:

devmem 0xFD7C0180 32   # adjust offset per TRM: CRU_CLKSEL_CON for dclk_vp2
# read the surrounding 16 registers to catch the mux bank

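Then: Boot with u-boot's eDP init, break into the u-boot shell, and use clk dump (if CONFIG_CMD_CLK is enabled) or read the same CRU registers via md.l 0xFD7C0180 0x10. u-boot parses the count as hex, so a bare 16 would dump 0x16 = 22 words.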

Confirms Bet #1 if: The kernel shows parent: clk_hdptx1_pixel_io (or similar HDPTX-sourced name), and u-boot shows parent: vpll or cpll or any CRU-internal PLL. That's the bug. Write a 4-line CRU mux fixup in your eDP probe function, re-flash, done.

Rejects Bet #1 if: Both show the same parent name.
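clk_summary prints the clock tree as indentation, so the parent of a clock is the nearest line above it that is indented less. A small sketch for pulling that out of a saved dump (with the header rows stripped first) instead of eyeballing it; the function names are ours, and the sample lines in the test are illustrative, not captured output:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count leading spaces: clk_summary indents each child deeper than its parent. */
static size_t indent_of(const char *line)
{
    size_t n = 0;
    while (line[n] == ' ')
        n++;
    return n;
}

/* Copy the first whitespace-delimited token (the clock name) into out. */
static void name_of(const char *line, char *out, size_t out_len)
{
    assert(out_len > 0);
    const char *p = line + indent_of(line);
    size_t n = strcspn(p, " \t\n");
    if (n >= out_len)
        n = out_len - 1;
    memcpy(out, p, n);
    out[n] = '\0';
}

/* Find the parent of `clk`: the nearest preceding line indented less than
 * the clock's own line. Returns 1 and fills `parent` on success, 0 if the
 * clock is missing or is a root. */
static int clk_summary_parent(const char *const *lines, size_t nlines,
                              const char *clk, char *parent, size_t parent_len)
{
    char name[64];
    for (size_t i = 0; i < nlines; i++) {
        name_of(lines[i], name, sizeof(name));
        if (strcmp(name, clk) != 0)
            continue;
        size_t ind = indent_of(lines[i]);
        for (size_t j = i; j-- > 0;) {
            if (indent_of(lines[j]) < ind) {
                name_of(lines[j], parent, parent_len);
                return 1;
            }
        }
        return 0; /* root clock */
    }
    return 0;
}
```

Running it on the kernel-side dump gives the parent name Experiment A compares against u-boot.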

Experiment B: VO1_GRF capture and comparison (rules out / confirms Bet #2, ~20 min)

What to do: While kernel display is running, dump the VO1_GRF region:

for offset in $(seq 0 4 124); do   # 32 words, matching md.l 0xFD5AC000 0x20 below
  printf "VO1_GRF+0x%03x = 0x%08x\n" $offset $(devmem $((0xFD5AC000 + offset)) 32)
done

Save this. Boot into u-boot, dump the same region via md.l 0xFD5AC000 0x20.

Then: Diff the two. Any bit that's 0 in u-boot and non-zero in the kernel is a candidate missing write.
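The diff rule above (set in the kernel, clear in u-boot) is one mask per word: kernel & ~uboot. A sketch that applies it to two dumps already loaded into arrays (parsing the devmem/md.l text output is left out); it flags candidates only, and the function names are ours:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bits set in the kernel's value but clear in u-boot's: candidate missing writes. */
static uint32_t missing_bits(uint32_t kernel_val, uint32_t uboot_val)
{
    return kernel_val & ~uboot_val;
}

/* Print every word of two same-base dumps that has candidate bits and return
 * how many such words there are. Bits u-boot sets but the kernel clears are
 * not reported; swap the arguments to see those. */
static size_t report_missing_bits(uint32_t base, const uint32_t *kernel,
                                  const uint32_t *uboot, size_t nwords)
{
    size_t hits = 0;
    assert(nwords == 0 || (kernel != NULL && uboot != NULL));
    for (size_t i = 0; i < nwords; i++) {
        uint32_t m = missing_bits(kernel[i], uboot[i]);
        if (m == 0)
            continue;
        printf("0x%08lx: kernel=0x%08lx uboot=0x%08lx missing=0x%08lx\n",
               (unsigned long)(base + 4u * i), (unsigned long)kernel[i],
               (unsigned long)uboot[i], (unsigned long)m);
        hits++;
    }
    return hits;
}
```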

Cross-reference against TRM Part 2 “VO1_GRF” register map to identify which bits are vp2_dsp_if_mux vs. other noise.

Confirms Bet #2 if: There's a mux control register that the kernel sets to route VP2→eDP1 and u-boot leaves at reset value (typically 0 = routed to HDMI or first default sink).

Experiment C: Extend vop2trace.ko to catch regmap_update_bits (confirms/rejects Bet #3, ~2 hours)

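Your existing vop2trace.ko hooks regmap_write. Extend it to also hook regmap_update_bits, which is what you're missing for the HDPTX PHY driver. Since regmap_update_bits() is a static inline wrapper in include/linux/regmap.h, the probe has to go on regmap_update_bits_base(), the exported function it calls.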

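// In vop2trace.ko, add (arm64). regmap_update_bits() is a static inline
// wrapper, so probe the exported regmap_update_bits_base() it calls.
#include <linux/kprobes.h>

static int handler_update_bits(struct kprobe *p, struct pt_regs *regs)
{
    // pre-handler runs at entry, so AAPCS64 argument registers are intact:
    // x0 = regmap*, x1 = reg offset, x2 = mask, x3 = val
    unsigned int reg  = (unsigned int)regs->regs[1];
    unsigned int mask = (unsigned int)regs->regs[2];
    unsigned int val  = (unsigned int)regs->regs[3];
    pr_info("regmap_update_bits: reg=0x%x mask=0x%x val=0x%x\n", reg, mask, val);
    return 0;   // 0 = let regmap_update_bits_base() run normally
}

static struct kprobe kp_update_bits = {
    .symbol_name = "regmap_update_bits_base",
    .pre_handler = handler_update_bits,
};
// module init: register_kprobe(&kp_update_bits); module exit: unregister_kprobe(&kp_update_bits);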
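To turn those log lines into register state, they can be replayed through the same read-modify-write rule regmap_update_bits() applies: new = (old & ~mask) | (val & mask). A sketch assuming a 32-bit register stride, offsets below 4 KiB, and a trace from a single regmap (the handler does not log which regmap a write belongs to, so mixing devices would alias offsets); the function name is ours:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SHADOW_WORDS 1024u /* assumption: register offsets below 4 KiB */

static unsigned int shadow[SHADOW_WORDS];

/* Apply one "regmap_update_bits: reg=... mask=... val=..." log line to the
 * shadow register file. A dmesg timestamp prefix is skipped by searching
 * for the tag. Returns 1 if the line was applied, 0 otherwise. */
static int apply_update_bits_line(const char *line)
{
    unsigned int reg, mask, val;
    const char *p = strstr(line, "regmap_update_bits:");

    if (!p || sscanf(p, "regmap_update_bits: reg=0x%x mask=0x%x val=0x%x",
                     &reg, &mask, &val) != 3)
        return 0;
    if (reg % 4u != 0 || reg / 4u >= SHADOW_WORDS)
        return 0; /* outside the assumed stride or window */

    unsigned int *r = &shadow[reg / 4u];
    *r = (*r & ~mask) | (val & mask); /* regmap_update_bits() semantics */
    return 1;
}
```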

Also: use ftrace to capture the full call sequence for phy-rockchip-samsung-hdptx:

echo '*:mod:phy_rockchip_samsung_hdptx' > /sys/kernel/debug/tracing/set_ftrace_filter   # every function in the module; if the driver is built in, filter on its function-name prefix instead
echo function > /sys/kernel/debug/tracing/current_tracer
echo 1 > /sys/kernel/debug/tracing/tracing_on
# trigger a display modeset (blank/unblank or dpms cycle)
echo 0 > /sys/kernel/debug/tracing/tracing_on
cat /sys/kernel/debug/tracing/trace > /tmp/hdptx_trace.txt

This gives you the exact function call sequence the kernel uses during PHY init. Map that against what u-boot calls.


3. Creative Angles

Clock rate injection in u-boot

Before flashing a full fix, test your dclk_vp2 mux theory cheaply: from the u-boot console, write the CRU mux select bits with mw.l before the VOP enable sequence runs. No recompile is needed for the first test. Rockchip CRU registers only change bits whose write-enable bit (in the upper 16 bits) is set in the same write, so the mw.l value must include that mask. If pixels appear, you have your answer in 2 minutes.
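Rockchip CRU (and GRF) registers use a write-enable mask in bits 31:16: a bit in 15:0 only changes if its enable bit is set in the same write. A sketch that builds the mw.l value for one field; the helper name is ours, and the shift and width in the usage below are placeholders, since the real dclk_vp2 mux position has to come from the TRM:

```c
#include <assert.h>
#include <stdint.h>

/* Build a Rockchip hiword-mask write for a field occupying bits
 * [shift, shift + width) of the low 16 bits: the field value goes in the
 * low half, the matching write-enable bits in the high half. Bits whose
 * enable bit is 0 are left unchanged by the register. */
static uint32_t rk_hiword_update(uint32_t value, unsigned int shift,
                                 unsigned int width)
{
    assert(width > 0 && shift + width <= 16);
    uint32_t field_mask = ((1u << width) - 1u) << shift;
    return (field_mask << 16) | ((value << shift) & field_mask);
}
```

For a 2-bit field at bit 11 set to 1 the word is 0x18000800, so the console line would be mw.l <CRU_CLKSEL_CON address from the TRM> 0x18000800. No read-modify-write is needed, because fields whose enable bits are 0 are not touched.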

DPCD readback as "panel saw video" probe

After u-boot's eDP init, read DPCD register 0x00205 (DP_SINK_STATUS). Bit 0 = port 0 in sync, bit 1 = port 1 in sync. If the panel's DP receiver actually saw valid video symbols on the lanes, these bits will be set. If they're clear, the panel received nothing usable on the high-speed lanes despite link training passing.

Read it via AUX in u-boot:

u8 sink_status = 0;
analogix_dp_read_byte_from_dpcd(dp, DP_SINK_STATUS, &sink_status);
printf("DP_SINK_STATUS=0x%02x\n", sink_status);

This is a zero-cost probe that works without probing hardware and definitively answers “did the panel see video data.”
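For reading that printf output, a sketch of the SINK_STATUS bit layout. The macro names follow Linux's drm_dp.h (the DisplayPort spec defines the register); if your u-boot tree already defines them, its definitions win via the #ifndef guards, and the helper names are ours:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* DPCD SINK_STATUS and its receive-port bits (names as in Linux drm_dp.h). */
#ifndef DP_SINK_STATUS
#define DP_SINK_STATUS           0x205
#endif
#ifndef DP_RECEIVE_PORT_0_STATUS
#define DP_RECEIVE_PORT_0_STATUS (1u << 0) /* receive port 0 in sync */
#define DP_RECEIVE_PORT_1_STATUS (1u << 1) /* receive port 1 in sync */
#endif

/* True if the sink reports receive port 0 (the main stream) in sync. */
static bool sink_port0_in_sync(uint8_t sink_status)
{
    return (sink_status & DP_RECEIVE_PORT_0_STATUS) != 0;
}

/* One-line verdict to print next to the raw DP_SINK_STATUS value. */
static const char *sink_status_verdict(uint8_t sink_status)
{
    return sink_port0_in_sync(sink_status)
        ? "in sync: the panel locked onto the stream"
        : "out of sync: nothing usable arrived on the main link";
}
```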

Reverse-bisect: break the kernel intentionally

Load a kernel module that writes a known-wrong value to the CRU dclk_vp2 parent mux while the display is running. If the display dies immediately, you've confirmed the mux is load-bearing live. More importantly: write the value that matches what u-boot currently writes. If that breaks the kernel display, that's your bug replicated in software with zero ambiguity.

// kill-dclk.ko: write the reset/PLL-default mux value
// (Rockchip CRU: WRONG_MUX_VAL must also set the field's write-enable bits in 31:16)
iowrite32(WRONG_MUX_VAL, cru_base + CLK_SEL_CON_FOR_DCLK_VP2);

sysfs clock forced-reparent test

Before any of the above, try the cheap software version:

echo "cpll" > /sys/kernel/debug/clk/dclk_vp2/clk_parent  # or whatever PLL name

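If the display dies, reparent it back. This tells you whether the parent is actually meaningful without writing a driver. Caveat: the debugfs clk_parent file is read-only on a stock kernel. Writing it at all requires a kernel built with CLOCK_ALLOW_WRITE_DEBUGFS defined in drivers/clk/clk.c, so check your kernel's clk.c for a clk_parent write handler, and the input it expects, before relying on this.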

BL31 SMC introspection

Check whether the kernel makes any ROCKCHIP_SIP_* SMC calls during eDP bring-up that u-boot doesn't. On the running kernel:

# Check if BL31 mediates any display init
dmesg | grep -i "sip\|bl31\|atf\|smc\|psci" | grep -i "vop\|edp\|hdptx\|disp"

Also check /sys/kernel/debug/rockchip_sip/ if it exists. u-boot's BL31 interface is minimal; if the kernel depends on a BL31 call to configure something (clock, power domain, memory bandwidth reservation), u-boot might silently succeed without it.


4. Minimum-Viable Simulation

Honest assessment: Full u-boot simulation of this specific path is expensive. QEMU's RK3588 board model doesn't exist in mainline and building one is a multi-week project.

What's actually practical:

  • User-space register mock: Extract u-boot's rk3588_edp_enable() sequence into a standalone C program that uses a mmap'd anonymous buffer instead of MMIO. Run it, then compare the buffer contents (as if they were registers) against the expected kernel values. This catches logic errors (wrong offset, wrong shift, wrong mask) without hardware. Cost: ~4 hours. Benefit: catches 80% of “wrote the wrong register” bugs without a flash cycle.
  • ftrace replay: Capture the kernel's full MMIO write sequence via a perf-based MMIO tracer, then write a Python script that replays those writes into /dev/mem on a freshly-booted system before the kernel has had a chance to configure anything. If that makes u-boot's framebuffer appear, you've proven the sequence is sufficient. This is a creative but legitimate way to test “is the kernel sequence necessary and sufficient” without writing driver code.
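A minimal sketch of the first bullet's register mock, assuming the extracted driver code can be pointed at mock_writel()/mock_readl() instead of writel()/readl() (for example through a macro in the extracted file). All names here are hypothetical; the argument order mirrors writel(value, address):

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical shadow "MMIO" window backed by anonymous memory. */
struct mmio_mock {
    uintptr_t phys_base; /* physical base the driver code believes it writes to */
    size_t size;         /* window size in bytes */
    uint32_t *mem;       /* zero-filled anonymous mapping standing in for registers */
};

static int mmio_mock_init(struct mmio_mock *m, uintptr_t phys_base, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    m->phys_base = phys_base;
    m->size = size;
    m->mem = p;
    return 0;
}

static void mmio_mock_free(struct mmio_mock *m)
{
    munmap(m->mem, m->size);
    m->mem = NULL;
}

/* Translate a physical address into a word index in the shadow buffer. */
static size_t mock_index(const struct mmio_mock *m, uintptr_t addr)
{
    assert(addr >= m->phys_base && addr % 4 == 0);
    assert(addr - m->phys_base + 4 <= m->size);
    return (addr - m->phys_base) / 4;
}

/* Same argument order as writel(value, address). */
static void mock_writel(struct mmio_mock *m, uint32_t val, uintptr_t addr)
{
    m->mem[mock_index(m, addr)] = val;
}

static uint32_t mock_readl(const struct mmio_mock *m, uintptr_t addr)
{
    return m->mem[mock_index(m, addr)];
}
```

Design note: this is a plain store. Rockchip GRF/CRU registers honour the write-enable mask in bits 31:16, so a mock meant to catch mask bugs in those blocks should apply the mask instead of storing the raw word.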

Flash cycles take 30–60 s, which is fast enough that the user-space mock is only borderline worth it. My recommendation: skip the mock and use the ftrace replay approach. It produces a reproducible script, which is also a useful artifact for the upstream patch description.


5. What the Kernel Does That We Probably Don't

In rough order of likelihood that u-boot is missing it:

  • Clock mux reparent — kernel's clk_set_parent(dclk_vp2, hdptx_pixel_clk) after PHY PLL lock. u-boot's CCF clk_set_rate call likely takes the default PLL path and never touches the mux.
  • VO1_GRF routing writes: regmap_write(vo1_grf, VO1_GRF_DP_DSP_IF_MUX, …) during rockchip_vop2_bind. The kernel has the GRF as a syscon regmap; u-boot has to do this explicitly.
  • HDPTX PHY pixel clock enable step, separate from lane bring-up. The PHY has a phy_power_on() vs. phy_configure() distinction; u-boot may call configure but not the final pixel-clock-output-enable step.
  • Power domain sequencing delays: pm_runtime_get_sync() with real delays baked into the power domain driver. u-boot's power domain enable is faster and may not respect the settling time in the HDPTX PHY datasheet.
  • Pinctrl for DP lanes: pinctrl_select_state() into the “active” state for the eDP data pins. u-boot pinctrl is minimal; if the kernel sets DP-specific pin functions that differ from reset defaults, u-boot is left with reset-default pin functions, which may be wrong.
  • regulator enable ordering — vccio_edp, avdd_0v9, vcc_panel come up in a specific sequence with delays between them. u-boot may have all regulators enabled but in the wrong order or without inter-regulator delays.
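The regulator-ordering item can be checked mechanically once enable timestamps are available (from bootstage markers or printk timestamps, for instance). A sketch with a hypothetical rule table: the rail names come from the item above, but the order and minimum gaps used in the test are placeholders for the panel datasheet's real values, and the function name is ours:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Observed rail enable: rail name and timestamp in microseconds. */
struct rail_event {
    const char *name;
    unsigned long t_us;
};

/* Required step: rail name and minimum delay after the previous rail. */
struct rail_rule {
    const char *name;
    unsigned long min_gap_us;
};

/* Check observed enables against the required order and gaps.
 * Returns -1 if every rule is met, else the index of the first rule broken
 * (a missing event counts as breaking the rule at that index). */
static int check_power_sequence(const struct rail_event *ev, size_t nev,
                                const struct rail_rule *rule, size_t nrule)
{
    assert(ev != NULL || nev == 0);
    for (size_t i = 0; i < nrule; i++) {
        if (i >= nev || strcmp(ev[i].name, rule[i].name) != 0)
            return (int)i;
        if (i > 0 && (ev[i].t_us < ev[i - 1].t_us ||
                      ev[i].t_us - ev[i - 1].t_us < rule[i].min_gap_us))
            return (int)i;
    }
    return -1;
}
```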

6. Upstream-Safe Fixes

These are debugging tools only (don't go upstream):

  • kill-dclk.ko and reverse-bisect modules
  • Any printk/pr_debug spam added to kernel drivers temporarily
  • The ftrace replay script

These produce upstream-safe artifacts:

  • The exact CRU mux write for dclk_vp2 parent — goes in rk3588_edp_enable() or the clock driver, with a comment citing TRM section and explaining why eDP requires HDPTX-sourced pixel clock
  • The VO1_GRF routing write — goes in the VOP2 driver's DSP_IF configuration section, properly guarded by if (endpoint == VOP2_EP_EDP1)
  • Any regulator sequencing fix — goes in the panel/bridge probe with usleep_range() calls matching the panel datasheet minimum settling times
  • The DPCD sink-status readback after video enable — useful as a diagnostic in the driver's error path, legitimate upstream contribution

For the upstream patch description: the ftrace replay is gold. You can say “captured kernel MMIO sequence via ftrace, identified missing CRU mux write at offset X, verified by replicating in isolation” — that's a credible, reviewable rationale.


7. Order of Attack — Tomorrow's Session

  • (10 min) Read the clock tree. cat /sys/kernel/debug/clk/clk_summary | grep dclk_vp2. If the kernel's parent is HDPTX-sourced while u-boot's is a CRU PLL, you found Bet #1. Immediately test the mux fix via mw.l in the u-boot console before recompiling anything.
  • (20 min) Dump VO1_GRF while kernel display is running. Compare against u-boot state. Any delta = candidate write to add.
  • (15 min) Add DPCD DP_SINK_STATUS readback to u-boot's eDP init. Flash and check output. Clear = no video symbols reached the panel. This disambiguates PHY vs. VOP2 vs. routing.
  • (2 hrs) Extend vop2trace.ko to catch regmap_update_bits. Trigger a display modeset cycle and capture the full HDPTX PHY + VOP2 write sequence. Cross-reference with what u-boot does.
  • (30 min) Check for SMC/BL31 calls during display init. If there are any, add them to u-boot's path.
  • (remaining) Implement and test fixes in priority order: clock mux first, then VO1_GRF, then PHY enable step.

I'd start with step 1. The clock parent check costs 30 seconds of typing and if it's wrong, everything else is moot. The “every register matches but no pixels” symptom combined with a PHY-derived clock requirement is so classically clock-domain mismatch that I'd bet a round of drinks on it.


Written 2026-04-16. Strategy assumes kernel 6.x+ on custom rk3588-marfrit tree.


Opus 4.7's strategy

You've done the obvious work. Registers match. Link trains. Backlight glows. Something the kernel does between regmap_update_bits() calls — or before your trace window even opens — is missing. Here's where I'd put my money.

1. Where the fault probably lives

My top three, ranked by “worth burning tomorrow on”:

(a) HDPTX PHY is not actually producing a recovered pixel clock that VP2 is using

Probability: high. You see register-byte-identity between u-boot and kernel, and link training succeeds. But training only proves that the AUX handshake works (side-band, ~1 Mbps) and that the lanes can carry the controller's own training patterns; it never involves the pixel clock. The main-link symbols sent after training are what carry pixels. If HDPTX's TX PLL isn't (re)locked for video, or dclk_vp2 is still being fed from a CRU PLL instead of the PHY's link_clk output, VP2 scans out into a void: AXI reads happen and pixels get formatted, but the serializer at the PHY is fed either garbage or nothing. The panel sees idle symbols forever and never transitions to the video stream. The backlight stays on because HPD and power sequencing are independent.

The fact that HDPTX registers at 0xFED70000 read back as mostly zeros via /dev/mem is a giant red flag. Either (i) the region is syscon/regmap-indirected (kernel writes go through a different aperture), (ii) it's clock-gated when you read (PHY APB off in idle), or (iii) there's a secure-world filter and BL31 owns it. Any of those three means u-boot's direct-register approach silently no-ops.

(b) CRU mux + dclk_vp2 parent is wrong

Probability: high, and partially overlaps with (a). Even if the HDPTX PHY is up, if dclk_vp2 is running from a CRU PLL at 147.84 MHz while VP2 hands its pixels off expecting the PHY's link_clk/N domain, you get an async FIFO underrun at the DSP_IF boundary. Symptoms: VP2 sees valid timing, shovels pixels, AXI reads RAM, but downstream the eDP MAC never sees a valid video stream. No pixels.

This is a classic u-boot CCF gap: the driver calls clk_set_rate(dclk_vp2, X) and trusts the framework, but the mux-parent reassignment that the kernel does via clk_set_parent() in rockchip_phy_ops.init() isn't modeled.

(c) VO1_GRF routing / DP1 lane muxing is stale

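Probability: medium. VO1_GRF at 0xFD5AC000 has ~16 bits that determine whether HDPTX1 drives DP0 or DP1, which VP maps to which DSP_IF, and the lane swap. If u-boot's BL31 handoff left these in a maskrom-default state (or ATF wrote a different routing for HDMI testing), the eDP symbols may be emitted from the PHY onto the wrong lane pair, and the panel never locks video. One caution: AUX is its own pair, but clock-recovery and equalization training run on the main-link lanes themselves, so a main link going to unconnected pads would normally fail training. For this hypothesis to fit a passing training, the mis-routing has to be in something training does not exercise, such as the VP-to-DSP_IF mapping.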

2. Concrete experiments

Experiment 1: Capture the PHY RMW traffic we're missing (the big one)

Our vop2trace.ko only hooks regmap_write. Upgrade it now:

  • Add kprobes on regmap_update_bits_base, regmap_bulk_write, regmap_multi_reg_write, regmap_noinc_write, and rockchip_grf_write.
  • Also hook clk_set_rate, clk_set_parent, clk_prepare_enable, clk_core_set_rate_nolock — we need the entire clock tree mutation timeline, not just register touches.
  • Hook arm_smccc_smc to catch any SMC calls the PHY driver or CRU driver makes to BL31. Log: caller PC (_RET_IP_), x0..x7.
  • Output to a ring buffer, read via debugfs.

Measure: Boot kernel with display disabled at DTS level, then load analogix_dp + phy-rockchip-samsung-hdptx at runtime with our trace armed. We capture every write that brings the hardware from cold-dead to driving-SDDM.

Confirms: If we see arm_smccc_smc(FUNCID=RK_SIP_ACCESS_REG, …) writing to 0xFED70000 range — bingo, PHY is secure-world. If we see clk_set_parent(dclk_vp2, hdptx1_link_clk) — that's our clock bug (b). If we see massive 200+ RMW sequences on HDPTX during PHY init — we're under-initializing the PHY and regmap_write only logged a fraction of real traffic.
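When the extended trace comes back, bucketing each logged physical address (or SMC address argument) by block turns "did anything touch 0xFED70000?" into a lookup. A sketch with names of our own: the HDPTX window is the 0xFED70000..0xFED80000 range used in step 3 of the order of attack below, the VO1_GRF base is from this page, and the CRU base plus all window sizes are assumptions to check against the TRM:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mmio_block {
    const char *name;
    uint32_t base;
    uint32_t size;
};

/* Bases from this page; sizes (and the CRU base) are assumptions. */
static const struct mmio_block blocks[] = {
    { "VO1_GRF", 0xFD5AC000u, 0x1000u  }, /* size assumed */
    { "CRU",     0xFD7C0000u, 0x10000u }, /* base and size assumed */
    { "HDPTX",   0xFED70000u, 0x10000u }, /* 0xFED70000..0xFED80000 */
};

/* Return the block containing addr, or NULL if none matches. */
static const struct mmio_block *classify_addr(uint32_t addr)
{
    for (size_t i = 0; i < sizeof(blocks) / sizeof(blocks[0]); i++) {
        /* Unsigned wrap-around makes this a single-comparison range check. */
        if ((uint32_t)(addr - blocks[i].base) < blocks[i].size)
            return &blocks[i];
    }
    return NULL;
}
```

A caller would print b ? b->name : "other" next to each logged write.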

Experiment 2: dclk_vp2 parent audit (cheap, do it tomorrow morning first)

In Linux: cat /sys/kernel/debug/clk/dclk_vp2/clk_parent and walk the tree upward, or cat /sys/kernel/debug/clk/clk_summary | grep -E 'dclk_vp2|hdptx|link'. Compare to u-boot's clock driver default parent.

Confirms: If kernel shows hdptx1_phy_pll_link_clk as an ancestor and u-boot has gpll or v0pll, that's hypothesis (b) confirmed without any further work.

Experiment 3: Reverse-bisect — break the kernel on purpose

Patch the kernel (live kprobe-based write-suppression) to skip specific register writes during display init, one subsystem at a time:

  • Suppress all writes to VO1_GRF → does SDDM still come up?
  • Suppress HDPTX writes above 0xFED70000+0x100 → still up?
  • Suppress clk_set_parent for dclk_vp2 → still up?

Measure: Each suppressed class that breaks the kernel tells us “u-boot must do this too.” Each that doesn't break tells us “safe to ignore.” Converts “what must we port?” from guesswork to data. Do this with HPD already up so we're only testing video-path writes, not power sequencing.

Experiment 4: DPCD as a pixel-arrival probe

From u-boot, after you think you've committed, poll DPCD 0x200 (SINK_COUNT), 0x204 (LANE_ALIGN_STATUS_UPDATED) and 0x205 (SINK_STATUS) via AUX. Then, more telling, read DPCD 0x201 (DEVICE_SERVICE_IRQ_VECTOR) and check for CP_IRQ or AUTOMATED_TEST_REQUEST. Also poll 0x202/0x203 (LANE0_1_STATUS / LANE2_3_STATUS) for per-lane symbol lock during video transmission, not just post-training.

Confirms: If lanes show symbol-lock lost after training completes and MSA should be flowing, the PHY is emitting invalid symbols: hypothesis (a) or (c). If lanes stay locked, the pixel pipeline upstream of the PHY is the problem, not the PHY itself. This is a binary search across the chain without a logic analyzer.
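Decoding those reads is mechanical. A sketch of the per-lane check, with register addresses and bit names as in the DisplayPort spec and Linux's drm_dp.h (0x202/0x203 pack two lanes per byte, one nibble each); the helper names are ours:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* DPCD link-status registers and bits (names as in Linux drm_dp.h). */
#define DP_LANE0_1_STATUS            0x202 /* lane 0 in bits 3:0, lane 1 in 7:4 */
#define DP_LANE2_3_STATUS            0x203 /* lanes 2 and 3, same layout */
#define DP_LANE_ALIGN_STATUS_UPDATED 0x204
#define DP_LANE_CR_DONE              (1u << 0)
#define DP_LANE_CHANNEL_EQ_DONE      (1u << 1)
#define DP_LANE_SYMBOL_LOCKED        (1u << 2)
#define DP_INTERLANE_ALIGN_DONE      (1u << 0) /* bit in 0x204 */

/* Status nibble for one lane, given the bytes read from 0x202 and 0x203. */
static uint8_t dp_lane_status(const uint8_t lane_status[2], unsigned int lane)
{
    assert(lane < 4);
    return (uint8_t)((lane_status[lane / 2] >> ((lane & 1u) * 4u)) & 0xfu);
}

/* True if the first `lanes` lanes all report clock recovery, channel
 * equalization and symbol lock, and 0x204 reports inter-lane alignment. */
static bool dp_link_ok(const uint8_t lane_status[2], uint8_t align_status,
                       unsigned int lanes)
{
    const uint8_t need = DP_LANE_CR_DONE | DP_LANE_CHANNEL_EQ_DONE |
                         DP_LANE_SYMBOL_LOCKED;
    for (unsigned int l = 0; l < lanes; l++)
        if ((dp_lane_status(lane_status, l) & need) != need)
            return false;
    return (align_status & DP_INTERLANE_ALIGN_DONE) != 0;
}
```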

Experiment 5: Framebuffer content probe

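From the kernel, once up, devmem2 0xef700000 w reads one word of what u-boot left in the framebuffer; dump a larger range to see more. Is it actually the vidconsole glyph pattern, or is it zeros / garbage? This only tells you something if the kernel has not reused that memory since boot. Note what the answer can and cannot show: the buffer holds what u-boot drew, not what VP2 fetched. If glyphs are there, u-boot's rendering worked and the fault lies in how VP2 fetches or forwards the buffer (possibly the PD_VOP power domain). If the buffer is blank, u-boot never drew into it (or drew somewhere else), a different failure mode upstream of VP2.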
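Once a larger range has been dumped into a buffer, a quick classification beats eyeballing hex. A sketch assuming 32-bit ARGB8888 pixels with alpha in bits 31:24 (an assumption about the u-boot framebuffer format); besides finding the first non-zero pixel it counts coloured pixels with alpha 0, the signature of the v10 stripe-paint bug recorded at the top of this page. Names are ours:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct fb_stats {
    size_t nonzero;        /* pixels with any bit set */
    size_t first_nonzero;  /* index of the first such pixel, or n if none */
    size_t alpha0_colored; /* non-zero pixels whose alpha byte is 0 */
};

static struct fb_stats fb_scan(const uint32_t *px, size_t n)
{
    struct fb_stats s = { 0, n, 0 };
    assert(px != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (px[i] == 0)
            continue;
        if (s.nonzero == 0)
            s.first_nonzero = i;
        s.nonzero++;
        if ((px[i] >> 24) == 0) /* colour present but fully transparent */
            s.alpha0_colored++;
    }
    return s;
}
```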

3. Creative angles

  • Breadcrumb LED on GPIO: the GenBook has a caps-lock LED somewhere accessible via the pinctrl. Wire it to toggle at key points in the u-boot probe (phy_init_done, link_train_complete, commit_done). Free low-bandwidth debug channel. Zero instrumentation cost. Side-steps the “is u-boot even getting there” question.
  • SMC sniff from kernel side: ftrace the arm_smccc_* calls during phy-rockchip-samsung-hdptx probe. If BL31 owns PHY registers, a u-boot PHY driver that bangs MMIO directly is writing into /dev/null and the kernel's init via SMC is what actually programs it.
  • Inject kernel failures: Use echo N > /sys/module/phy_rockchip_samsung_hdptx/parameters/… if any module params exist, or live-patch the kernel's HDPTX driver to skip its post-link-training PHY PLL re-lock sequence. If skipping it breaks the kernel's display, u-boot is missing that exact sequence.
  • clk_summary diff: Dump it once when kernel has display up, dump it in u-boot (we can print CRU register state from u-boot), diff. Every row that differs is a candidate gap. More productive than comparing MMIO registers one-by-one.
  • Panel self-test via DPCD 0x010a (EDP_CONFIGURATION_SET, where bit 7 is PANEL_SELF_TEST_ENABLE): if the panel supports it, ask the panel to display its built-in self-test pattern via DPCD. If that works, the panel itself is fine (training already vouches for the link) and the entire fault is upstream of the eDP MAC. Very surgical.
  • Bootstage markers: u-boot's CONFIG_BOOTSTAGE can emit timestamped markers over UART. Drop one before probe(), one after each major init step. Latency skew between kernel-first-working and u-boot-seems-done tells us whether we're racing a slow PLL lock.

4. Minimum-viable simulation

Yes, and cheap: user-space mock-MMIO harness. Take the u-boot driver files verbatim. Provide shims for readl/writel/regmap_*/clk_* that log to stdout and maintain an in-memory shadow of register state. Drive the probe from a main(). Cost: ~4 hours of shim plumbing. Payoff: you can run the driver under valgrind, gdb, rr, bisect logic bugs at C-level speed, and diff the shadow against our Linux-side capture. This catches every “I forgot to set bit 3 in OVL_CTRL” class of bug before you ever flash.
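Diffing only the final shadow state misses ordering bugs (a PLL enabled before its dividers are set can end in the same register values). A sketch of the simplest order-aware comparison between the harness's recorded writes and the Linux-side capture, with both assumed to be already parsed into (address, value) pairs; expect legitimate reordering between the two stacks, so treat the first divergence as a place to start reading, not a verdict. Names are ours:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One register write as logged by either side. */
struct reg_write {
    uint32_t addr;
    uint32_t val;
};

/* Index of the first differing write, or -1 if both sequences are identical.
 * If one sequence is a prefix of the other, the shorter length is returned. */
static long first_divergence(const struct reg_write *a, size_t na,
                             const struct reg_write *b, size_t nb)
{
    size_t n = na < nb ? na : nb;
    assert(n == 0 || (a != NULL && b != NULL));
    for (size_t i = 0; i < n; i++)
        if (a[i].addr != b[i].addr || a[i].val != b[i].val)
            return (long)i;
    return na == nb ? -1 : (long)n;
}
```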

Not worth it: full QEMU with fake PHY. The RK3588 QEMU model doesn't exist in any useful form for VOP2, and writing one is a side-quest bigger than the original project.

Recommended simulation scope: just MMIO + clk tree + regmap. Skip PHY modeling. The harness only needs to validate “did the driver attempt the right register sequence”, not “did the pixels come out”. That's the logical half of the bug, and the cheap half to catch.

5. What the kernel does that we might not

  • SMC calls to BL31 for secure register access on HDPTX / PMU-coupled domains.
  • Clock parent reassignment via clk_set_parent() — CCF magic the kernel does that u-boot's clock driver may not model.
  • Regulator enable ordering with msleep delays between rails — panel power-on sequencing from the DT regulator-boot-on + startup-delay-us properties. u-boot typically just enables GPIOs; kernel honors the delays.
  • Pinctrl states — kernel switches eDP pins into a non-default function state. If u-boot left pins in reset-default (which may be GPIO or HDMI mux), signals never leave the chip.
  • PD_VOP / PD_VO1 power domain ordering: u-boot may enable PD_VOP but not PD_VO1, leaving the DSP_IF routing block unpowered. Check /sys/kernel/debug/pm_genpd/pm_genpd_summary on the running kernel.
  • AUX retries + sink-wake: kernel retries DPCD 0x600 (SET_POWER) with full D0 transition + 20ms delay. u-boot often shortcuts this.
  • Post-training PHY re-lock: the Samsung HDPTX combo PHY has a TX PLL that needs a second lock cycle after link training at final rate. Kernel does it; u-boot driver ports tend to skip it.

6. Upstream-safe fixes

Two separate hats:

  • Debug scaffolding (vop2trace++, kprobes, LED breadcrumbs, userspace mock harness): ours, stays local, goes in a Documentation/rockchip-vop2-debug.md at best.
  • u-boot patches: must look like real drivers. If we find a missing SMC call, the upstream fix is a clean rockchip,secure-regs DT binding + SMC helper, not a hardcoded smc #0x82000012 with no binding. If we find a missing clock parent change, the fix goes in the Rockchip CRU driver's clock-mux tables, not as a hack in the VOP2 driver. Keep the architectural layer discipline; maintainers will reject “kitchen sink in vop2.c.”

Mock harness is particularly valuable as upstream artifact: package it as a unit test under test/dm/vop2_mmio.c. First u-boot VOP2 unit test. Maintainer will love it.

7. Order of attack — tomorrow's session

  1. 15 min: clk_summary diff (kernel live vs u-boot state). Cheapest info/effort ratio on the table. This alone may crack hypothesis (b).
  2. 90 min: Extend vop2trace.ko with regmap RMW + clk_set_parent + arm_smccc_smc hooks. Capture clean kernel-brings-display-up trace. This is your ground truth.
  3. 30 min: Reverse-bisect — suppress writes to HDPTX region via kprobe-return-0 on regmap_write calls with address in 0xFED70000..0xFED80000. If display survives, PHY is owned by BL31 and we've been writing to void. If display breaks, we need every bit.
  4. 45 min: DPCD symbol-lock probe from u-boot. Binary-search which half of the chain is broken.
  5. Afternoon: Based on 1-4, one of three paths — SMC-wrap the PHY writes in u-boot, fix the clock parent, or fix VO1_GRF routing. Expect 5-8 flash cycles.
  6. Evening if time: Start the userspace mock harness regardless. You'll want it for the next bug, and the next, and the next.

I'd start with clk_summary. Not because it's the most likely answer, but because fifteen minutes of work either eliminates or confirms an entire hypothesis branch, and you go into the hard work with less unknown. The HDPTX-secure-world question is bigger but requires the extended kprobe module first, which is the afternoon's work.

If clk_summary shows HDPTX is in the tree and dclk_vp2 parent is right, drop straight into hypothesis (a) via the extended trace module. The PHY-via-SMC question is, I think, your actual bug. But prove it, don't assume it.


