ohm_gl_fix — Phase 6 Step 0.5: Kernel UAPI surface audit, 2026-05-01
Per Phase 4 §3 Step 0.5 (the post-Phase-5-review amendment): document the actual V4L2 stateless H.264 control structure layout that the hantro driver consumes on RK3566, so that Step 1's src/picture.c / src/h264.c work has a per-byte template instead of a naive UAPI-header interpretation.
Method
- strace -f -e trace=ioctl -tt -X verbose -s 4096 on the working gst-launch-1.0 … ! v4l2slh264dec ! fakesink pipeline (the S1 reference path that is empirically correct end-to-end on hantro). Captures live VIDIOC_S_EXT_CTRLS calls including their argument structures. Output: /home/mfritsche/ohm_gl_fix_artefacts/step0_5/gst_v4l2slh264_xverbose.strace (738 KB).
- Cross-check captured control IDs and payload sizes against the kernel UAPI header include/uapi/linux/v4l2-controls.h.
- The working GStreamer path matches the kernel driver's expectations by construction, so the UAPI header is the canonical layout for the working path.
Capture summary
| Probe | Count |
|---|---|
| VIDIOC_S_EXT_CTRLS calls during ~3 s | 145 |
| VIDIOC_G_EXT_CTRLS (driver capability probe) | 1 |
| VIDIOC_TRY_EXT_CTRLS | 0 |
| Strace file size | 738 KB |
A representative VIDIOC_S_EXT_CTRLS line, decoded from the strace:
    ioctl(7, 0xc0205648 /* VIDIOC_S_EXT_CTRLS */, {
        ctrl_class = 0,
        count = 1,
        controls = [{
            id = 0xa40902, /* V4L2_CID_STATELESS_H264_SPS */
            size = 1048,
            string = "<1048-byte v4l2_ctrl_h264_sps payload>"
        }]
    })
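The request number 0xc0205648 in the line above can be unpacked to confirm which ioctl was captured and how large the kernel expects its argument to be. The following is a minimal sketch, assuming the asm-generic ioctl encoding (8-bit number, 8-bit type, 14-bit size, 2-bit direction) that arm64 uses; the audit_ helpers are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* asm-generic ioctl encoding (assumed here):
 * bits 0-7 number, 8-15 type, 16-29 argument size, 30-31 direction. */
static inline unsigned audit_ioc_nr(uint32_t req)   { return req & 0xffu; }
static inline unsigned audit_ioc_type(uint32_t req) { return (req >> 8) & 0xffu; }
static inline unsigned audit_ioc_size(uint32_t req) { return (req >> 16) & 0x3fffu; }
static inline unsigned audit_ioc_dir(uint32_t req)  { return (req >> 30) & 0x3u; }
```

For 0xc0205648 this yields type 'V', number 72, direction read+write, and a 32-byte argument, which is the size of struct v4l2_ext_controls on a 64-bit ABI.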
Control ID decode (canonical from v4l2-controls.h)
V4L2_CTRL_CLASS_CODEC_STATELESS = 0x00a40000. V4L2_CID_CODEC_STATELESS_BASE = V4L2_CTRL_CLASS_CODEC_STATELESS | 0x900 = 0x00a40900.
The H.264 stateless controls follow as BASE + offset:
| ID (hex) | Offset | Symbol |
|---|---|---|
| 0xa40900 | +0 | V4L2_CID_STATELESS_H264_DECODE_MODE |
| 0xa40901 | +1 | V4L2_CID_STATELESS_H264_START_CODE |
| 0xa40902 | +2 | V4L2_CID_STATELESS_H264_SPS |
| 0xa40903 | +3 | V4L2_CID_STATELESS_H264_PPS |
| 0xa40904 | +4 | V4L2_CID_STATELESS_H264_SCALING_MATRIX |
| 0xa40905 | +5 | V4L2_CID_STATELESS_H264_PRED_WEIGHTS |
| 0xa40906 | +6 | V4L2_CID_STATELESS_H264_SLICE_PARAMS |
| 0xa40907 | +7 | V4L2_CID_STATELESS_H264_DECODE_PARAMS |
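When reading the strace it helps to map a raw control ID back to its symbol. The sketch below rebuilds the table from the base value and offsets listed above; the AUDIT_ and audit_ names are hypothetical, and real code would normally include the UAPI header instead of redefining the constants.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Constants copied from the table above (not taken from a kernel header). */
#define AUDIT_CTRL_CLASS_CODEC_STATELESS 0x00a40000u
#define AUDIT_CID_CODEC_STATELESS_BASE   (AUDIT_CTRL_CLASS_CODEC_STATELESS | 0x900u)

static const char *const audit_h264_cid_names[] = {
    "V4L2_CID_STATELESS_H264_DECODE_MODE",    /* +0 */
    "V4L2_CID_STATELESS_H264_START_CODE",     /* +1 */
    "V4L2_CID_STATELESS_H264_SPS",            /* +2 */
    "V4L2_CID_STATELESS_H264_PPS",            /* +3 */
    "V4L2_CID_STATELESS_H264_SCALING_MATRIX", /* +4 */
    "V4L2_CID_STATELESS_H264_PRED_WEIGHTS",   /* +5 */
    "V4L2_CID_STATELESS_H264_SLICE_PARAMS",   /* +6 */
    "V4L2_CID_STATELESS_H264_DECODE_PARAMS",  /* +7 */
};

/* Return the symbol for a control ID seen in strace, or NULL if it is not
 * one of the eight H.264 stateless controls. */
static inline const char *audit_h264_cid_name(uint32_t id)
{
    /* Unsigned subtraction wraps for ids below the base, so one bound
     * check rejects both too-small and too-large values. */
    uint32_t off = id - AUDIT_CID_CODEC_STATELESS_BASE;
    if (off >= sizeof(audit_h264_cid_names) / sizeof(audit_h264_cid_names[0]))
        return NULL;
    return audit_h264_cid_names[off];
}
```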
Strace confirms 0xa40902 is the first control submitted per stream init. That is the SPS, and its payload size of 1048 equals sizeof(struct v4l2_ctrl_h264_sps) exactly: 12 B of u8 fields + 1020 B for offset_for_ref_frame[255] (which dominates the size) + 8 B for the two trailing s32 offsets + 4 B for the two u16 dimensions + 4 B of flags. No alignment padding is involved.
Struct layouts the kernel driver expects
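These are the layouts, byte for byte, that Step 1's libva-v4l2-request port must produce when calling VIDIOC_S_EXT_CTRLS. Each size below should equal the v4l2_ext_control.size field that strace captured for the corresponding control; the SPS capture above (size = 1048) confirms this for the SPS.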
v4l2_ctrl_h264_sps (1048 B, as observed; no padding)

    struct v4l2_ctrl_h264_sps {
        __u8 profile_idc;
        __u8 constraint_set_flags;
        __u8 level_idc;
        __u8 seq_parameter_set_id;
        __u8 chroma_format_idc;
        __u8 bit_depth_luma_minus8;
        __u8 bit_depth_chroma_minus8;
        __u8 log2_max_frame_num_minus4;
        __u8 pic_order_cnt_type;
        __u8 log2_max_pic_order_cnt_lsb_minus4;
        __u8 max_num_ref_frames;
        __u8 num_ref_frames_in_pic_order_cnt_cycle;
        __s32 offset_for_ref_frame[255];    // 1020 B
        __s32 offset_for_non_ref_pic;
        __s32 offset_for_top_to_bottom_field;
        __u16 pic_width_in_mbs_minus1;
        __u16 pic_height_in_map_units_minus1;
        __u32 flags;
    };
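The SPS layout can be pinned down at compile time so that a port fails to build if a field is misplaced. This is a sketch using a mirror struct with fixed-width types; struct audit_h264_sps is a hypothetical stand-in for the kernel definition, which real code would take from the UAPI header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the SPS layout listed above (hypothetical name). */
struct audit_h264_sps {
    uint8_t  profile_idc;
    uint8_t  constraint_set_flags;
    uint8_t  level_idc;
    uint8_t  seq_parameter_set_id;
    uint8_t  chroma_format_idc;
    uint8_t  bit_depth_luma_minus8;
    uint8_t  bit_depth_chroma_minus8;
    uint8_t  log2_max_frame_num_minus4;
    uint8_t  pic_order_cnt_type;
    uint8_t  log2_max_pic_order_cnt_lsb_minus4;
    uint8_t  max_num_ref_frames;
    uint8_t  num_ref_frames_in_pic_order_cnt_cycle;
    int32_t  offset_for_ref_frame[255];
    int32_t  offset_for_non_ref_pic;
    int32_t  offset_for_top_to_bottom_field;
    uint16_t pic_width_in_mbs_minus1;
    uint16_t pic_height_in_map_units_minus1;
    uint32_t flags;
};

/* C11 compile-time checks: the twelve u8 fields end on a 4-byte boundary,
 * so the s32 array starts at offset 12 without padding. */
static_assert(offsetof(struct audit_h264_sps, offset_for_ref_frame) == 12,
              "s32 array must start right after the u8 fields");
static_assert(offsetof(struct audit_h264_sps, flags) == 1044,
              "flags is the last 4 bytes");
static_assert(sizeof(struct audit_h264_sps) == 1048,
              "must equal the size strace captured");
```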
v4l2_ctrl_h264_pps (12 B: ten 1-byte fields + 2 B flags)

    struct v4l2_ctrl_h264_pps {
        __u8 pic_parameter_set_id;
        __u8 seq_parameter_set_id;
        __u8 num_slice_groups_minus1;
        __u8 num_ref_idx_l0_default_active_minus1;
        __u8 num_ref_idx_l1_default_active_minus1;
        __u8 weighted_bipred_idc;
        __s8 pic_init_qp_minus26;
        __s8 pic_init_qs_minus26;
        __s8 chroma_qp_index_offset;
        __s8 second_chroma_qp_index_offset;
        __u16 flags;
    };
v4l2_ctrl_h264_scaling_matrix (480 B)

    struct v4l2_ctrl_h264_scaling_matrix {
        __u8 scaling_list_4x4[6][16];    // 96 B
        __u8 scaling_list_8x8[6][64];    // 384 B (note: 6 lists, not the 2 used for non-4:4:4 content)
    };
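For streams that carry no scaling lists, the H.264 specification falls back to flat lists in which every entry is 16 (Flat_4x4_16 and Flat_8x8_16). A minimal sketch of filling the control that way follows; the struct is a hypothetical mirror of the layout above, and whether hantro requires the control at all in that case is not established by this audit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirror of the scaling-matrix layout listed above (hypothetical name). */
struct audit_h264_scaling_matrix {
    uint8_t scaling_list_4x4[6][16];  /* 6 * 16 =  96 B */
    uint8_t scaling_list_8x8[6][64];  /* 6 * 64 = 384 B */
};

static_assert(sizeof(struct audit_h264_scaling_matrix) == 480,
              "96 B + 384 B, byte arrays need no padding");

/* Set every entry of every list to the flat default value 16. */
static inline void audit_scaling_matrix_flat(struct audit_h264_scaling_matrix *m)
{
    memset(m, 16, sizeof(*m));
}
```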
v4l2_ctrl_h264_pred_weights (772 B)

    struct v4l2_ctrl_h264_pred_weights {
        __u16 luma_log2_weight_denom;
        __u16 chroma_log2_weight_denom;
        struct v4l2_h264_weight_factors weight_factors[2];
    };

v4l2_h264_weight_factors: luma_weight[32], luma_offset[32], chroma_weight[32][2], chroma_offset[32][2] as signed 16-bit. That is 64 + 64 + 128 + 128 = 384 B per factor; 384 B × 2 + 4 B header = 772 B.
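The per-field arithmetic for the prediction-weight table can be checked the same way. In this sketch the mirror structs are hypothetical names; summing the signed 16-bit arrays gives 384 B per weight_factors entry and 772 B for the whole control.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors of the layouts described above (hypothetical names). */
struct audit_h264_weight_factors {
    int16_t luma_weight[32];       /*  64 B */
    int16_t luma_offset[32];       /*  64 B */
    int16_t chroma_weight[32][2];  /* 128 B */
    int16_t chroma_offset[32][2];  /* 128 B */
};

struct audit_h264_pred_weights {
    uint16_t luma_log2_weight_denom;
    uint16_t chroma_log2_weight_denom;
    struct audit_h264_weight_factors weight_factors[2];
};

static_assert(sizeof(struct audit_h264_weight_factors) == 384,
              "64 + 64 + 128 + 128");
static_assert(sizeof(struct audit_h264_pred_weights) == 772,
              "4 B header + 2 * 384 B");
```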
v4l2_ctrl_h264_slice_params (152 B per slice)

    struct v4l2_ctrl_h264_slice_params {
        __u32 header_bit_size;
        __u32 first_mb_in_slice;
        __u8 slice_type;
        __u8 colour_plane_id;
        __u8 redundant_pic_cnt;
        __u8 cabac_init_idc;
        __s8 slice_qp_delta;
        __s8 slice_qs_delta;
        __u8 disable_deblocking_filter_idc;
        __s8 slice_alpha_c0_offset_div2;
        __s8 slice_beta_offset_div2;
        __u8 num_ref_idx_l0_active_minus1;
        __u8 num_ref_idx_l1_active_minus1;
        __u8 reserved;
        struct v4l2_h264_reference ref_pic_list0[V4L2_H264_REF_LIST_LEN];    // 32 entries × 2 B
        struct v4l2_h264_reference ref_pic_list1[V4L2_H264_REF_LIST_LEN];
        __u32 flags;
    };

V4L2_H264_REF_LIST_LEN = 32. Each v4l2_h264_reference is 2 bytes (u8 fields; u8 index). Total: 20 B of scalar fields + 2 × 64 B of reference lists + 4 B flags = 152 B.
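A compile-time check of the slice-parameter layout, again as a sketch with hypothetical mirror types. With 32-entry reference lists of 2-byte entries, the flags field lands at offset 148 and the struct is 152 B.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AUDIT_H264_REF_LIST_LEN 32  /* value stated in the text above */

/* Mirrors of the layouts listed above (hypothetical names). */
struct audit_h264_reference {
    uint8_t fields;
    uint8_t index;
};

struct audit_h264_slice_params {
    uint32_t header_bit_size;
    uint32_t first_mb_in_slice;
    uint8_t  slice_type;
    uint8_t  colour_plane_id;
    uint8_t  redundant_pic_cnt;
    uint8_t  cabac_init_idc;
    int8_t   slice_qp_delta;
    int8_t   slice_qs_delta;
    uint8_t  disable_deblocking_filter_idc;
    int8_t   slice_alpha_c0_offset_div2;
    int8_t   slice_beta_offset_div2;
    uint8_t  num_ref_idx_l0_active_minus1;
    uint8_t  num_ref_idx_l1_active_minus1;
    uint8_t  reserved;
    struct audit_h264_reference ref_pic_list0[AUDIT_H264_REF_LIST_LEN];
    struct audit_h264_reference ref_pic_list1[AUDIT_H264_REF_LIST_LEN];
    uint32_t flags;
};

static_assert(sizeof(struct audit_h264_reference) == 2, "two u8 fields");
static_assert(offsetof(struct audit_h264_slice_params, ref_pic_list0) == 20,
              "8 B of u32 + 12 one-byte fields");
static_assert(offsetof(struct audit_h264_slice_params, flags) == 148,
              "20 B + 2 * 64 B");
static_assert(sizeof(struct audit_h264_slice_params) == 152, "148 B + flags");
```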
v4l2_ctrl_h264_decode_params (560 B)

    struct v4l2_ctrl_h264_decode_params {
        struct v4l2_h264_dpb_entry dpb[V4L2_H264_NUM_DPB_ENTRIES];    // 16 entries × 32 B
        __u16 nal_ref_idc;
        __u16 frame_num;
        __s32 top_field_order_cnt;
        __s32 bottom_field_order_cnt;
        __u16 idr_pic_id;
        __u16 pic_order_cnt_lsb;
        __s32 delta_pic_order_cnt_bottom;
        __s32 delta_pic_order_cnt0;
        __s32 delta_pic_order_cnt1;
        __u32 dec_ref_pic_marking_bit_size;
        __u32 pic_order_cnt_bit_size;
        __u32 slice_group_change_cycle;
        __u32 reserved;
        __u32 flags;
    };

V4L2_H264_NUM_DPB_ENTRIES = 16. Each v4l2_h264_dpb_entry is 32 B (u64 reference_ts; u32 pic_num; u16 frame_num; u8 fields; u8 reserved[5]; s32 top_field_order_cnt; s32 bottom_field_order_cnt; u32 flags). DPB total: 512 B. The frame-level fields after the DPB add 48 B (four u16 + ten 32-bit fields), giving 560 B.
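The DPB layout and the reference_ts convention are easy to get subtly wrong, so a sketch of both follows. The mirror structs are hypothetical names, and the 5-byte reserved array in the DPB entry is how the 32 B entry size is reached without implicit padding (worth confirming against the header in use). reference_ts is the CAPTURE buffer's timestamp expressed in nanoseconds; audit_timeval_to_ns is a hypothetical helper that mirrors what the UAPI's v4l2_timeval_to_ns() is understood to compute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors of the layouts listed above (hypothetical names). */
struct audit_h264_dpb_entry {
    uint64_t reference_ts;
    uint32_t pic_num;
    uint16_t frame_num;
    uint8_t  fields;
    uint8_t  reserved[5];
    int32_t  top_field_order_cnt;
    int32_t  bottom_field_order_cnt;
    uint32_t flags;
};

struct audit_h264_decode_params {
    struct audit_h264_dpb_entry dpb[16];
    uint16_t nal_ref_idc;
    uint16_t frame_num;
    int32_t  top_field_order_cnt;
    int32_t  bottom_field_order_cnt;
    uint16_t idr_pic_id;
    uint16_t pic_order_cnt_lsb;
    int32_t  delta_pic_order_cnt_bottom;
    int32_t  delta_pic_order_cnt0;
    int32_t  delta_pic_order_cnt1;
    uint32_t dec_ref_pic_marking_bit_size;
    uint32_t pic_order_cnt_bit_size;
    uint32_t slice_group_change_cycle;
    uint32_t reserved;
    uint32_t flags;
};

static_assert(sizeof(struct audit_h264_dpb_entry) == 32, "one DPB entry");
static_assert(offsetof(struct audit_h264_decode_params, nal_ref_idc) == 512,
              "16 entries * 32 B");
static_assert(sizeof(struct audit_h264_decode_params) == 560, "512 B + 48 B");

/* Convert a buffer timestamp (seconds + microseconds) to the nanosecond
 * value that a DPB entry's reference_ts must carry. */
static inline uint64_t audit_timeval_to_ns(int64_t tv_sec, int64_t tv_usec)
{
    return (uint64_t)tv_sec * 1000000000ULL + (uint64_t)tv_usec * 1000ULL;
}
```

A DPB entry refers to a previously decoded picture only if its reference_ts equals the converted timestamp of the buffer that produced that picture, so the same conversion has to be used on both sides.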
Sequencing observations from the strace
The 145 VIDIOC_S_EXT_CTRLS calls in ~3 s of decode break down roughly as follows:
- At stream init (once): 1× SPS + 1× PPS + 1× SCALING_MATRIX (3 one-time controls).
- Per decoded frame: 1× DECODE_PARAMS, 1× SLICE_PARAMS (or one per slice if the frame has multiple slices), optionally 1× PRED_WEIGHTS for slices that use explicit weighted prediction.
- For ~24 fps × 3 s = 72 frames at 2 submissions each, ≈ 144 per-frame submissions, which is roughly consistent with the observed 145 (adding the 3 one-time controls would bring the estimate to ≈ 147).
The H.264 SPS is submitted once at session start, before VIDIOC_STREAMON. The decoder caches it across the stream until a new SPS arrives (e.g. mid-stream resolution change). The PPS and SCALING_MATRIX are handled the same way. DECODE_PARAMS and SLICE_PARAMS are submitted per frame, attached to the same request fd as the bitstream-input MPLANE buffer that carries the slice bytes.
Loose ends for Step 1 to verify empirically
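- Per-frame request-fd lifecycle. Each frame's request fd is created via ioctl(media_fd, MEDIA_IOC_REQUEST_ALLOC), populated with controls + bitstream buffer, then ioctl(req_fd, MEDIA_REQUEST_IOC_QUEUE) submits, and MEDIA_REQUEST_IOC_REINIT resets it for reuse once the request has completed. Step 1 must mirror this lifecycle exactly; skipping REINIT between requests causes the kernel to reject the subsequent MEDIA_REQUEST_IOC_QUEUE with EBUSY.
- Slice multi-submission. In slice-based mode, multi-slice frames submit each slice's SLICE_PARAMS + bitstream payload as a SEPARATE request. How the kernel collates them is unverified: the working assumption is the frame_num in DECODE_PARAMS, but the UAPI also defines V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF for this purpose. The strace should show this on multi-slice clips; bbb is mostly single-slice so doesn't exercise it. In frame-based mode all slices of a frame travel in one OUTPUT buffer and this item does not apply.
- V4L2_CID_STATELESS_H264_DECODE_MODE and _START_CODE. Set once at session init. The UAPI defines V4L2_STATELESS_H264_DECODE_MODE_SLICE_BASED and _FRAME_BASED, and V4L2_STATELESS_H264_START_CODE_NONE and _ANNEX_B; there is no ANNEX_B_3B_4B value. The working assumption was DECODE_MODE = SLICE_BASED, but mainline hantro is understood to advertise only FRAME_BASED and ANNEX_B, so read the values GStreamer actually sets from the strace (or query the allowed range with VIDIOC_QUERY_EXT_CTRL) before hard-coding either. libva slice data typically arrives without start codes, so if the driver requires ANNEX_B the port must prepend them rather than strip them.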
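The request lifecycle from the first item can be sketched with the media request ioctls. This assumes a Linux build with <linux/media.h> from kernel 4.20 or later and that completion is signalled as POLLPRI on the request fd; the audit_ function names and the timeout handling are hypothetical, and attaching controls and buffers to the request is left out.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <linux/media.h>

/* Allocate a request on an open /dev/mediaN fd.
 * Returns the new request fd, or -1 with errno set. */
static inline int audit_request_alloc(int media_fd)
{
    int req_fd = -1;
    if (ioctl(media_fd, MEDIA_IOC_REQUEST_ALLOC, &req_fd) < 0)
        return -1;
    return req_fd;
}

/* Queue a populated request, wait for completion, then re-initialise it so
 * the same fd can carry the next frame. Returns 0 on success, -1 on error. */
static inline int audit_request_run(int req_fd, int timeout_ms)
{
    if (ioctl(req_fd, MEDIA_REQUEST_IOC_QUEUE) < 0)
        return -1;

    struct pollfd p = { .fd = req_fd, .events = POLLPRI };
    int n = poll(&p, 1, timeout_ms);
    if (n < 0)
        return -1;
    if (n == 0) {               /* request did not complete in time */
        errno = ETIMEDOUT;
        return -1;
    }
    /* Without this step the request is not idle and cannot be queued again. */
    return ioctl(req_fd, MEDIA_REQUEST_IOC_REINIT);
}
```

A per-frame loop would call audit_request_alloc once per request slot, attach controls and the OUTPUT buffer using the request fd, and then call audit_request_run for each frame.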
Implication for Phase 4 R1 mitigation
The R1 mitigation (revised post-Phase-5 review) flagged “silent black frames > 3 days” as the slip trigger. The most likely causes of that failure mode, per Step 0.5, are:
- Wrong flags field in v4l2_ctrl_h264_sps (subset of the bitfield flags that govern frame-mbs-only, mb-adaptive-frame-field, etc.).
- Wrong dpb[] entry order or reference_ts mismatch in DECODE_PARAMS.
- Wrong cabac_init_idc or disable_deblocking_filter_idc in SLICE_PARAMS.
- Sending PRED_WEIGHTS when not needed, or NOT sending it when needed. It is needed for P/SP slices when the PPS enables weighted prediction and for B slices when weighted_bipred_idc == 1.
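The condition for sending PRED_WEIGHTS follows the H.264 rule for when a slice header carries a pred_weight_table: P or SP slices with weighted_pred_flag set, or B slices with weighted_bipred_idc equal to 1. The sketch below encodes that rule with hypothetical names; the UAPI header is understood to provide a V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED macro for the same test, which real code should prefer if it is available.

```c
#include <assert.h>
#include <stdbool.h>

/* slice_type values as coded in H.264 slice headers (taken modulo 5). */
enum audit_slice_type {
    AUDIT_SLICE_P  = 0,
    AUDIT_SLICE_B  = 1,
    AUDIT_SLICE_I  = 2,
    AUDIT_SLICE_SP = 3,
    AUDIT_SLICE_SI = 4,
};

/* True when the slice uses explicit weighted prediction and therefore needs
 * the PRED_WEIGHTS control to accompany it. */
static inline bool audit_pred_weights_required(bool weighted_pred_flag,
                                               unsigned weighted_bipred_idc,
                                               enum audit_slice_type type)
{
    if (type == AUDIT_SLICE_P || type == AUDIT_SLICE_SP)
        return weighted_pred_flag;
    if (type == AUDIT_SLICE_B)
        return weighted_bipred_idc == 1;
    return false;  /* I and SI slices never carry prediction weights */
}
```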
Step 1 diagnostic technique: when a Step 1 build produces black frames on bbb, capture another strace under Step 1's libva and do a field-level diff against the GStreamer baseline strace from this audit. The first divergent field is the bug.
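The field-level diff can be sketched for the SPS as a byte comparison plus an offset-to-field lookup. This assumes the two payloads have already been extracted from the strace files into byte arrays (that parsing step is not shown), and the offset table is derived from the SPS layout in this document; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct audit_field { size_t offset; const char *name; };

/* SPS field start offsets, sorted ascending. */
static const struct audit_field audit_sps_fields[] = {
    {0, "profile_idc"}, {1, "constraint_set_flags"}, {2, "level_idc"},
    {3, "seq_parameter_set_id"}, {4, "chroma_format_idc"},
    {5, "bit_depth_luma_minus8"}, {6, "bit_depth_chroma_minus8"},
    {7, "log2_max_frame_num_minus4"}, {8, "pic_order_cnt_type"},
    {9, "log2_max_pic_order_cnt_lsb_minus4"}, {10, "max_num_ref_frames"},
    {11, "num_ref_frames_in_pic_order_cnt_cycle"},
    {12, "offset_for_ref_frame"}, {1032, "offset_for_non_ref_pic"},
    {1036, "offset_for_top_to_bottom_field"},
    {1040, "pic_width_in_mbs_minus1"},
    {1042, "pic_height_in_map_units_minus1"}, {1044, "flags"},
};

/* Name of the field containing the first differing byte, or NULL when the
 * two payloads are identical over `len` bytes. */
static inline const char *audit_first_divergent_field(const unsigned char *good,
                                                      const unsigned char *bad,
                                                      size_t len)
{
    size_t n = sizeof(audit_sps_fields) / sizeof(audit_sps_fields[0]);
    size_t i = 0, f;

    while (i < len && good[i] == bad[i])
        i++;
    if (i == len)
        return NULL;

    /* Pick the last field whose start offset is not beyond byte i. */
    const char *name = audit_sps_fields[0].name;
    for (f = 0; f < n && audit_sps_fields[f].offset <= i; f++)
        name = audit_sps_fields[f].name;
    return name;
}
```

The same approach extends to the other controls by adding one offset table per struct.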
Artefacts
- /home/mfritsche/ohm_gl_fix_artefacts/step0_5/gst_v4l2slh264_xverbose.strace: the source strace.
- This document: the layout reference.
Step 1 begins from this point with full UAPI confidence.
