ohm_gl_fix — Phase 6 Step 0.5: Kernel UAPI surface audit, 2026-05-01

Per Phase 4 §3 Step 0.5 (the post-Phase-5-review amendment): document the actual V4L2 stateless H.264 control structure layout that the hantro driver consumes on RK3566, so that Step 1's src/picture.c / src/h264.c work has a per-byte template instead of a naive UAPI-header interpretation.

Method

  1. strace -f -e trace=ioctl -tt -X verbose -s 4096 on the working gst-launch-1.0 … ! v4l2slh264dec ! fakesink pipeline (the S1 reference path that is empirically correct end-to-end on hantro). Captures live VIDIOC_S_EXT_CTRLS calls including their argument structures. Output: /home/mfritsche/ohm_gl_fix_artefacts/step0_5/gst_v4l2slh264_xverbose.strace (738 KB).
  2. Cross-check captured control IDs and payload sizes against the kernel UAPI header include/uapi/linux/v4l2-controls.h.
  3. Because the GStreamer path works end-to-end, its control payloads match the kernel driver's expectations by construction; the UAPI header is therefore taken as the canonical layout for the working path.

Capture summary

Probe                                          Count
VIDIOC_S_EXT_CTRLS calls during ~3 s           145
VIDIOC_G_EXT_CTRLS (driver capability probe)   1
VIDIOC_TRY_EXT_CTRLS                           0
Strace file size                               738 KB

A representative VIDIOC_S_EXT_CTRLS line, decoded from the strace:

ioctl(7, 0xc0205648 /* VIDIOC_S_EXT_CTRLS */, {
    ctrl_class = 0,
    count = 1,
    controls = [{
        id = 0xa40902,    /* V4L2_CID_STATELESS_H264_SPS */
        size = 1048,
        string = "<1048-byte v4l2_ctrl_h264_sps payload>"
    }]
})
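
The request number in that line can be sanity-checked by splitting it into its fields. The sketch below assumes the generic Linux ioctl encoding (8-bit number, 8-bit type, 14-bit size, 2-bit direction), which is what arm64 uses; decode_ioctl and struct ioctl_fields are illustrative names, not part of any kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a generic-encoding Linux ioctl request number. */
struct ioctl_fields {
    unsigned dir;   /* bits 30-31: 1 = write, 2 = read, 3 = both */
    unsigned size;  /* bits 16-29: size of the argument struct   */
    unsigned type;  /* bits 8-15:  subsystem magic, 'V' for V4L2 */
    unsigned nr;    /* bits 0-7:   command number                */
};

/* Split a request number into its four fields. */
static struct ioctl_fields decode_ioctl(uint32_t req)
{
    struct ioctl_fields f;

    f.nr   = req & 0xffu;
    f.type = (req >> 8) & 0xffu;
    f.size = (req >> 16) & 0x3fffu;
    f.dir  = (req >> 30) & 0x3u;
    return f;
}
```

For 0xc0205648 this gives direction 3 (read and write), size 32, type 'V' and number 72, which is consistent with VIDIOC_S_EXT_CTRLS taking a 32-byte struct v4l2_ext_controls on a 64-bit kernel.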

Control ID decode (canonical from v4l2-controls.h)

V4L2_CTRL_CLASS_CODEC_STATELESS = 0x00a40000. V4L2_CID_CODEC_STATELESS_BASE = V4L2_CTRL_CLASS_CODEC_STATELESS | 0x900 = 0x00a40900.

The H.264 stateless controls follow as BASE + offset:

ID (hex)   Offset   Symbol
0xa40900   +0       V4L2_CID_STATELESS_H264_DECODE_MODE
0xa40901   +1       V4L2_CID_STATELESS_H264_START_CODE
0xa40902   +2       V4L2_CID_STATELESS_H264_SPS
0xa40903   +3       V4L2_CID_STATELESS_H264_PPS
0xa40904   +4       V4L2_CID_STATELESS_H264_SCALING_MATRIX
0xa40905   +5       V4L2_CID_STATELESS_H264_PRED_WEIGHTS
0xa40906   +6       V4L2_CID_STATELESS_H264_SLICE_PARAMS
0xa40907   +7       V4L2_CID_STATELESS_H264_DECODE_PARAMS

Strace confirms 0xa40902 is the first control submitted per stream init: that is the SPS, with payload size 1048 = sizeof(struct v4l2_ctrl_h264_sps). The fields sum to exactly 1048 B (12 × 1 B + 1020 B + 2 × 4 B + 2 × 2 B + 4 B), so no alignment padding is involved; the offset_for_ref_frame[255] array is 1020 B and dominates the size.
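
When reading further strace lines, the BASE + offset rule can be turned into a small lookup. This is a minimal sketch: the base value and symbol names are the ones quoted in this audit, while h264_ctrl_name is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Base value as quoted in this audit from v4l2-controls.h. */
#define CID_CODEC_STATELESS_BASE (0x00a40000u | 0x900u)

/* Symbols in table order; the array index is the offset from the base. */
static const char *const h264_ctrl_names[] = {
    "V4L2_CID_STATELESS_H264_DECODE_MODE",
    "V4L2_CID_STATELESS_H264_START_CODE",
    "V4L2_CID_STATELESS_H264_SPS",
    "V4L2_CID_STATELESS_H264_PPS",
    "V4L2_CID_STATELESS_H264_SCALING_MATRIX",
    "V4L2_CID_STATELESS_H264_PRED_WEIGHTS",
    "V4L2_CID_STATELESS_H264_SLICE_PARAMS",
    "V4L2_CID_STATELESS_H264_DECODE_PARAMS",
};

/* Map a control ID seen in the strace to its symbol, or NULL if it is
 * not one of the eight H.264 stateless controls listed above. */
static const char *h264_ctrl_name(uint32_t id)
{
    /* Unsigned subtraction wraps for id < base, which then fails the
     * range check below. */
    uint32_t off = id - CID_CODEC_STATELESS_BASE;
    size_t count = sizeof(h264_ctrl_names) / sizeof(h264_ctrl_names[0]);

    return off < count ? h264_ctrl_names[off] : NULL;
}
```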

Struct layouts the kernel driver expects

These are the byte-level layouts Step 1's libva-v4l2-request port must produce when calling VIDIOC_S_EXT_CTRLS. The sizes in the headings are the sizeof values that follow from the field lists on a natural-alignment ABI; for the SPS this matches the size = 1048 in the strace excerpt above.

v4l2_ctrl_h264_sps (1048 B, no padding)

struct v4l2_ctrl_h264_sps {
    __u8  profile_idc;
    __u8  constraint_set_flags;
    __u8  level_idc;
    __u8  seq_parameter_set_id;
    __u8  chroma_format_idc;
    __u8  bit_depth_luma_minus8;
    __u8  bit_depth_chroma_minus8;
    __u8  log2_max_frame_num_minus4;
    __u8  pic_order_cnt_type;
    __u8  log2_max_pic_order_cnt_lsb_minus4;
    __u8  max_num_ref_frames;
    __u8  num_ref_frames_in_pic_order_cnt_cycle;
    __s32 offset_for_ref_frame[255];      // 1020 B
    __s32 offset_for_non_ref_pic;
    __s32 offset_for_top_to_bottom_field;
    __u16 pic_width_in_mbs_minus1;
    __u16 pic_height_in_map_units_minus1;
    __u32 flags;
};
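
The SPS offsets can be pinned down at compile time. The sketch below is a local mirror of the listing using <stdint.h> types (struct sps_mirror is an illustrative name, not the kernel header), and it assumes a natural-alignment ABI such as aarch64 or x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the SPS listing above. */
struct sps_mirror {
    uint8_t  profile_idc;
    uint8_t  constraint_set_flags;
    uint8_t  level_idc;
    uint8_t  seq_parameter_set_id;
    uint8_t  chroma_format_idc;
    uint8_t  bit_depth_luma_minus8;
    uint8_t  bit_depth_chroma_minus8;
    uint8_t  log2_max_frame_num_minus4;
    uint8_t  pic_order_cnt_type;
    uint8_t  log2_max_pic_order_cnt_lsb_minus4;
    uint8_t  max_num_ref_frames;
    uint8_t  num_ref_frames_in_pic_order_cnt_cycle;
    int32_t  offset_for_ref_frame[255];
    int32_t  offset_for_non_ref_pic;
    int32_t  offset_for_top_to_bottom_field;
    uint16_t pic_width_in_mbs_minus1;
    uint16_t pic_height_in_map_units_minus1;
    uint32_t flags;
};

/* The twelve leading bytes end exactly on a 4-byte boundary, so the
 * s32 array starts at offset 12 without padding. */
static_assert(offsetof(struct sps_mirror, offset_for_ref_frame) == 12,
              "array follows the 12 byte-sized fields directly");
static_assert(offsetof(struct sps_mirror, offset_for_non_ref_pic) == 1032,
              "12 B + 255 * 4 B");
static_assert(offsetof(struct sps_mirror, pic_width_in_mbs_minus1) == 1040,
              "two s32 fields after the array");
static_assert(offsetof(struct sps_mirror, flags) == 1044,
              "two u16 fields before flags");
static_assert(sizeof(struct sps_mirror) == 1048,
              "matches the size field captured by strace");
```

If any assertion fails on the build target, the port is being compiled with a packing or alignment setting that differs from what this audit assumes.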

v4l2_ctrl_h264_pps (12 B)

struct v4l2_ctrl_h264_pps {
    __u8  pic_parameter_set_id;
    __u8  seq_parameter_set_id;
    __u8  num_slice_groups_minus1;
    __u8  num_ref_idx_l0_default_active_minus1;
    __u8  num_ref_idx_l1_default_active_minus1;
    __u8  weighted_bipred_idc;
    __s8  pic_init_qp_minus26;
    __s8  pic_init_qs_minus26;
    __s8  chroma_qp_index_offset;
    __s8  second_chroma_qp_index_offset;
    __u16 flags;
};

v4l2_ctrl_h264_scaling_matrix (480 B)

struct v4l2_ctrl_h264_scaling_matrix {
    __u8 scaling_list_4x4[6][16];   // 96 B
    __u8 scaling_list_8x8[6][64];   // 384 B (note: 6 lists, not the 2 in baseline)
};
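
The PPS and scaling-matrix sizes follow directly from the field lists: ten byte-sized fields plus a 2-byte flags word, and 6 × 16 plus 6 × 64 bytes. A minimal check, again with local mirror structs (illustrative names) rather than the kernel header:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the PPS listing above. */
struct pps_mirror {
    uint8_t  pic_parameter_set_id;
    uint8_t  seq_parameter_set_id;
    uint8_t  num_slice_groups_minus1;
    uint8_t  num_ref_idx_l0_default_active_minus1;
    uint8_t  num_ref_idx_l1_default_active_minus1;
    uint8_t  weighted_bipred_idc;
    int8_t   pic_init_qp_minus26;
    int8_t   pic_init_qs_minus26;
    int8_t   chroma_qp_index_offset;
    int8_t   second_chroma_qp_index_offset;
    uint16_t flags;
};

/* Local mirror of the scaling-matrix listing above. */
struct scaling_matrix_mirror {
    uint8_t scaling_list_4x4[6][16];
    uint8_t scaling_list_8x8[6][64];
};

static_assert(offsetof(struct pps_mirror, flags) == 10,
              "flags follows ten byte-sized fields");
static_assert(sizeof(struct pps_mirror) == 12, "10 B + 2 B");
static_assert(sizeof(struct scaling_matrix_mirror) == 96 + 384,
              "6 * 16 B + 6 * 64 B");
```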

v4l2_ctrl_h264_pred_weights (772 B)

struct v4l2_ctrl_h264_pred_weights {
    __u16 luma_log2_weight_denom;
    __u16 chroma_log2_weight_denom;
    struct v4l2_h264_weight_factors weight_factors[2];
};

v4l2_h264_weight_factors: luma_weight[32], luma_offset[32], chroma_weight[32][2], chroma_offset[32][2], all signed 16-bit. That is 64 + 64 + 128 + 128 = 384 B per factor; 2 × 384 B + 4 B header = 772 B.
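
The weight-factor arithmetic is easy to get wrong by hand, so here is the same sum expressed as a compile-time check. The structs are local mirrors of the description above with illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of v4l2_h264_weight_factors as described above. */
struct weight_factors_mirror {
    int16_t luma_weight[32];        /*  64 B */
    int16_t luma_offset[32];        /*  64 B */
    int16_t chroma_weight[32][2];   /* 128 B */
    int16_t chroma_offset[32][2];   /* 128 B */
};

/* Local mirror of the pred-weights listing above. */
struct pred_weights_mirror {
    uint16_t luma_log2_weight_denom;
    uint16_t chroma_log2_weight_denom;
    struct weight_factors_mirror weight_factors[2];  /* list 0 and list 1 */
};

static_assert(sizeof(struct weight_factors_mirror) == 384,
              "64 + 64 + 128 + 128");
static_assert(sizeof(struct pred_weights_mirror) == 772,
              "4 B header + 2 * 384 B");
```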

v4l2_ctrl_h264_slice_params (152 B, fixed size: 20 B of scalar fields + 2 × 64 B reference lists + 4 B flags)

struct v4l2_ctrl_h264_slice_params {
    __u32 header_bit_size;
    __u32 first_mb_in_slice;
    __u8  slice_type;
    __u8  colour_plane_id;
    __u8  redundant_pic_cnt;
    __u8  cabac_init_idc;
    __s8  slice_qp_delta;
    __s8  slice_qs_delta;
    __u8  disable_deblocking_filter_idc;
    __s8  slice_alpha_c0_offset_div2;
    __s8  slice_beta_offset_div2;
    __u8  num_ref_idx_l0_active_minus1;
    __u8  num_ref_idx_l1_active_minus1;
    __u8  reserved;
    struct v4l2_h264_reference ref_pic_list0[V4L2_H264_REF_LIST_LEN];  // 32 entries × 2 B
    struct v4l2_h264_reference ref_pic_list1[V4L2_H264_REF_LIST_LEN];
    __u32 flags;
};

V4L2_H264_REF_LIST_LEN = 32. Each v4l2_h264_reference is 2 bytes (u8 fields; u8 index).
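
For the slice parameters, the interesting offsets are where the two reference lists start and where flags lands after them. The sketch below mirrors the listing with local types (illustrative names) and assumes natural alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REF_LIST_LEN 32  /* V4L2_H264_REF_LIST_LEN as stated above */

/* Local mirror of v4l2_h264_reference: two byte-sized fields. */
struct reference_mirror {
    uint8_t fields;
    uint8_t index;
};

/* Local mirror of the slice-params listing above. */
struct slice_params_mirror {
    uint32_t header_bit_size;
    uint32_t first_mb_in_slice;
    uint8_t  slice_type;
    uint8_t  colour_plane_id;
    uint8_t  redundant_pic_cnt;
    uint8_t  cabac_init_idc;
    int8_t   slice_qp_delta;
    int8_t   slice_qs_delta;
    uint8_t  disable_deblocking_filter_idc;
    int8_t   slice_alpha_c0_offset_div2;
    int8_t   slice_beta_offset_div2;
    uint8_t  num_ref_idx_l0_active_minus1;
    uint8_t  num_ref_idx_l1_active_minus1;
    uint8_t  reserved;
    struct reference_mirror ref_pic_list0[REF_LIST_LEN];
    struct reference_mirror ref_pic_list1[REF_LIST_LEN];
    uint32_t flags;
};

static_assert(sizeof(struct reference_mirror) == 2, "u8 fields + u8 index");
static_assert(offsetof(struct slice_params_mirror, ref_pic_list0) == 20,
              "8 B of u32 + 12 byte-sized fields");
static_assert(offsetof(struct slice_params_mirror, ref_pic_list1) == 84,
              "20 B + 32 * 2 B");
static_assert(offsetof(struct slice_params_mirror, flags) == 148,
              "84 B + 32 * 2 B, already 4-byte aligned");
static_assert(sizeof(struct slice_params_mirror) == 152, "148 B + 4 B");
```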

v4l2_ctrl_h264_decode_params (560 B)

struct v4l2_ctrl_h264_decode_params {
    struct v4l2_h264_dpb_entry dpb[V4L2_H264_NUM_DPB_ENTRIES];   // 16 entries × 32 B
    __u16 nal_ref_idc;
    __u16 frame_num;
    __s32 top_field_order_cnt;
    __s32 bottom_field_order_cnt;
    __u16 idr_pic_id;
    __u16 pic_order_cnt_lsb;
    __s32 delta_pic_order_cnt_bottom;
    __s32 delta_pic_order_cnt0;
    __s32 delta_pic_order_cnt1;
    __u32 dec_ref_pic_marking_bit_size;
    __u32 pic_order_cnt_bit_size;
    __u32 slice_group_change_cycle;
    __u32 reserved;
    __u32 flags;
};

V4L2_H264_NUM_DPB_ENTRIES = 16. Each v4l2_h264_dpb_entry is 32 B (u64 reference_ts; u32 pic_num; u16 frame_num; u8 fields; u8 reserved[5]; s32 top_field_order_cnt; s32 bottom_field_order_cnt; u32 flags). DPB total: 512 B. The frame-level fields after the DPB add 4 × 2 B + 10 × 4 B = 48 B, for a total of 560 B.
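
The DPB entry is the one struct here with a 64-bit member, so it sets the alignment of the whole control. The sketch below mirrors both structs with local types (illustrative names); the five reserved bytes in the entry are written out explicitly as an assumption about the header layout, chosen so that the 32-byte size holds without implicit padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_DPB_ENTRIES 16  /* V4L2_H264_NUM_DPB_ENTRIES as stated above */

/* Local mirror of v4l2_h264_dpb_entry. */
struct dpb_entry_mirror {
    uint64_t reference_ts;
    uint32_t pic_num;
    uint16_t frame_num;
    uint8_t  fields;
    uint8_t  reserved[5];   /* assumption: explicit padding up to offset 20 */
    int32_t  top_field_order_cnt;
    int32_t  bottom_field_order_cnt;
    uint32_t flags;
};

/* Local mirror of the decode-params listing above. */
struct decode_params_mirror {
    struct dpb_entry_mirror dpb[NUM_DPB_ENTRIES];
    uint16_t nal_ref_idc;
    uint16_t frame_num;
    int32_t  top_field_order_cnt;
    int32_t  bottom_field_order_cnt;
    uint16_t idr_pic_id;
    uint16_t pic_order_cnt_lsb;
    int32_t  delta_pic_order_cnt_bottom;
    int32_t  delta_pic_order_cnt0;
    int32_t  delta_pic_order_cnt1;
    uint32_t dec_ref_pic_marking_bit_size;
    uint32_t pic_order_cnt_bit_size;
    uint32_t slice_group_change_cycle;
    uint32_t reserved;
    uint32_t flags;
};

static_assert(sizeof(struct dpb_entry_mirror) == 32, "one DPB entry");
static_assert(offsetof(struct decode_params_mirror, nal_ref_idc) == 512,
              "16 * 32 B of DPB first");
static_assert(offsetof(struct decode_params_mirror, flags) == 556,
              "last frame-level field");
static_assert(sizeof(struct decode_params_mirror) == 560,
              "512 B DPB + 48 B frame-level fields");
```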

Sequencing observations from the strace

The 145 VIDIOC_S_EXT_CTRLS calls per ~3 s of decode break down roughly per-frame:

  • At stream init (once): 1× SPS + 1× PPS + 1× SCALING_MATRIX (3 one-time controls).
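  • Per decoded frame: 1× DECODE_PARAMS, 1× SLICE_PARAMS (or one per slice if the frame has multiple slices), optionally 1× PRED_WEIGHTS for slices that use explicit weighted prediction (P/SP slices with weighted_pred_flag set, or B slices with weighted_bipred_idc = 1).
  • For ~24 fps × 3 s = 72 frames at 2 controls each, that is ≈ 144 per-frame submissions; together with the 3 init controls this gives ≈ 147, in line with the observed 145 given that the frame count is only approximate.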

The H.264 SPS is submitted once at session start, before VIDIOC_STREAMON. The decoder caches it across the stream until a new SPS arrives (e.g. mid-stream resolution change). PPS and scaling_matrix similarly. DECODE_PARAMS and SLICE_PARAMS are submitted per frame, attached to the same request fd as the bitstream-input MPLANE buffer that carries the slice bytes.

Loose ends for Step 1 to verify empirically

  1. Per-frame request-fd lifecycle. Each frame's request fd is created via ioctl(media_fd, MEDIA_IOC_REQUEST_ALLOC), populated with controls + bitstream buffer, then ioctl(req_fd, MEDIA_REQUEST_IOC_QUEUE) submits, and MEDIA_REQUEST_IOC_REINIT resets for reuse. Step 1 must mirror this lifecycle exactly; skipping REINIT between requests causes the kernel to reject subsequent IOC_QUEUE with EBUSY.
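  2. Slice multi-submission. The working assumption is that multi-slice frames submit each slice's SLICE_PARAMS + bitstream payload as a SEPARATE request and that the kernel collates them into one frame. Which key the kernel uses for that is unverified: this audit assumed frame_num in DECODE_PARAMS, whereas the V4L2 stateless API generally ties the slices of one frame together through a shared OUTPUT buffer timestamp. The strace should show this on multi-slice clips; bbb is mostly single-slice so does not exercise it.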
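  3. V4L2_CID_STATELESS_H264_DECODE_MODE and _START_CODE. Set once at session init. The UAPI defines two values for each: DECODE_MODE is SLICE_BASED or FRAME_BASED, and START_CODE is NONE or ANNEX_B (there is no ANNEX_B_3B_4B value). The working assumption so far was DECODE_MODE = SLICE_BASED, but mainline hantro appears to advertise only FRAME_BASED and ANNEX_B, so Step 1 should read the supported values back from the driver (VIDIOC_QUERY_EXT_CTRL) instead of hard-coding them. libva slice data typically arrives without start codes, so if the driver requires ANNEX_B the port has to prepend a start code to each slice before submitting the bitstream bytes.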
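
The request-fd lifecycle from item 1 can be sketched with the three media ioctls it names. This is a minimal outline under assumptions: request_alloc and request_run_once are hypothetical helper names, the controls and the bitstream buffer are expected to have been attached to the request before request_run_once is called, and completion is awaited with poll() for POLLPRI on the request fd.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <linux/media.h>

/* Allocate a request on the media device.
 * Returns the new request fd, or -1 with errno set. */
static int request_alloc(int media_fd)
{
    int req_fd = -1;

    if (ioctl(media_fd, MEDIA_IOC_REQUEST_ALLOC, &req_fd) < 0)
        return -1;
    return req_fd;
}

/* Queue a populated request, wait for it to complete, then reset it so
 * the same fd can carry the next frame.
 * Returns 0 on success, -1 with errno set otherwise. */
static int request_run_once(int req_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = req_fd, .events = POLLPRI };
    int n;

    if (ioctl(req_fd, MEDIA_REQUEST_IOC_QUEUE) < 0)
        return -1;

    /* Completion of a request is signalled as POLLPRI on its fd. */
    n = poll(&pfd, 1, timeout_ms);
    if (n == 0)
        errno = ETIMEDOUT;
    if (n <= 0)
        return -1;

    /* REINIT is only accepted once the request has completed. */
    return ioctl(req_fd, MEDIA_REQUEST_IOC_REINIT);
}
```

Keeping REINIT inside the same helper as QUEUE makes it hard to forget, which is the EBUSY failure mode item 1 warns about.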

Implication for Phase 4 R1 mitigation

The R1 mitigation (revised post-Phase-5 review) flagged “silent black frames > 3 days” as the slip trigger. The most likely causes of that failure mode, per Step 0.5, are:

  • Wrong flags field in v4l2_ctrl_h264_sps (subset of the bitfield flags that govern frame-mbs-only, mb-adaptive-frame-field, etc.).
  • Wrong dpb[] entry order or reference_ts mismatch in DECODE_PARAMS.
  • Wrong cabac_init_idc or disable_deblocking_filter_idc in SLICE_PARAMS.
  • Sending PRED_WEIGHTS when not needed, or NOT sending it when needed. It is needed only for explicit weighted prediction: P/SP slices with weighted_pred_flag set, or B slices with weighted_bipred_idc = 1.
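
The PRED_WEIGHTS condition in the last bullet is small enough to write down as a predicate. The sketch follows the H.264 rule for explicit weighted prediction (P/SP slices with weighted_pred_flag, B slices with weighted_bipred_idc equal to 1); the enum and function names are local to this example and its enum values are not the kernel's numeric slice-type codes.

```c
#include <assert.h>
#include <stdbool.h>

/* Local slice-type enum for illustration only. */
enum slice_kind { SLICE_P, SLICE_B, SLICE_I, SLICE_SP, SLICE_SI };

/* True when the slice carries an explicit prediction weight table and
 * the PRED_WEIGHTS control therefore has to be submitted. */
static bool pred_weights_required(bool weighted_pred_flag,
                                  unsigned weighted_bipred_idc,
                                  enum slice_kind kind)
{
    if (kind == SLICE_P || kind == SLICE_SP)
        return weighted_pred_flag;
    if (kind == SLICE_B)
        return weighted_bipred_idc == 1;  /* 2 is implicit: no table sent */
    return false;                         /* I and SI slices never need it */
}
```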

Step 1 diagnostic technique: when a Step 1 build produces black frames on bbb, capture another strace under Step 1's libva and do a field-level diff against the GStreamer baseline strace from this audit. The first divergent field is the prime suspect.
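
One way to make the field-level diff mechanical is to compare the two payloads byte by byte and map the first differing offset back to a field name. The sketch below does this for the PPS, whose offsets follow from its listing (ten byte-sized fields, then flags at offset 10). It assumes the payload bytes have already been extracted from both strace files; first_divergent_field and the table are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct field_span {
    size_t      offset;  /* byte offset where the field starts */
    const char *name;
};

/* Field start offsets of v4l2_ctrl_h264_pps, in ascending order. */
static const struct field_span pps_fields[] = {
    { 0,  "pic_parameter_set_id" },
    { 1,  "seq_parameter_set_id" },
    { 2,  "num_slice_groups_minus1" },
    { 3,  "num_ref_idx_l0_default_active_minus1" },
    { 4,  "num_ref_idx_l1_default_active_minus1" },
    { 5,  "weighted_bipred_idc" },
    { 6,  "pic_init_qp_minus26" },
    { 7,  "pic_init_qs_minus26" },
    { 8,  "chroma_qp_index_offset" },
    { 9,  "second_chroma_qp_index_offset" },
    { 10, "flags" },
};
#define PPS_FIELD_COUNT (sizeof(pps_fields) / sizeof(pps_fields[0]))

/* Return the name of the field containing the first byte that differs
 * between the baseline and the candidate payload, or NULL if the two
 * payloads are identical over len bytes. */
static const char *first_divergent_field(const unsigned char *baseline,
                                         const unsigned char *candidate,
                                         size_t len,
                                         const struct field_span *fields,
                                         size_t nfields)
{
    for (size_t i = 0; i < len; i++) {
        const char *name = NULL;

        if (baseline[i] == candidate[i])
            continue;
        /* The owning field is the last one that starts at or before i. */
        for (size_t f = 0; f < nfields && fields[f].offset <= i; f++)
            name = fields[f].name;
        return name;
    }
    return NULL;
}
```

The same function works for the other controls once an offset table has been written for them.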

Artefacts

  • /home/mfritsche/ohm_gl_fix_artefacts/step0_5/gst_v4l2slh264_xverbose.strace — the source strace.
  • This document — the layout reference.

Step 1 begins from this point with full UAPI confidence.

ohm_gl_fix/phase6_step0_5_uapi_audit_2026-05-01.txt · Last modified: by markus_fritsche