mediagrab — Kids Show Downloader

Automated downloader for kids' TV shows, running in the doppler LXD container on hertz. Downloads are stored in /moviedata/video/ and are picked up automatically by the Gerbera UPnP/DLNA server.

Location

Script: /opt/mediagrab/mediagrab.py (in doppler)
Config: /opt/mediagrab/shows.json
Archives: /opt/mediagrab/archives/*.txt (yt-dlp download tracking)
Log: /opt/mediagrab/mediagrab.log
Videos: /moviedata/video/<show_dir>/

Cron Schedule

Saturday 10:00 — mediagrab.py weekly — checks all weekly shows for new episodes
Daily 11:00 — mediagrab.py archive — downloads N backlog episodes per archive show

Commands

# List all shows and status
python3 /opt/mediagrab/mediagrab.py list

# Run weekly check manually
python3 /opt/mediagrab/mediagrab.py weekly

# Run archive backfill manually
python3 /opt/mediagrab/mediagrab.py archive

# Test a URL (list episodes without downloading)
python3 /opt/mediagrab/mediagrab.py test "<url>"

# Seed archive from existing files (run after manual downloads)
python3 /opt/mediagrab/mediagrab.py seed

All commands can also be run via Claude Code MCP:

hertz-tools → mediagrab → command: "list"

Adding a New Show

Step 1: Find the show URL

Supported sources (via yt-dlp):

ARD Mediathek: https://www.ardmediathek.de/sendung/<show-name>/<base64-id> — best for ARD/WDR/KiKA shows
KiKA: https://www.kika.de/<show-slug>/<page-id> — only works if the page has a videoSubchannel in the API
archive.org: https://archive.org/details/<collection> — good for bulk backlog
YouTube: standard playlist/channel URLs

To find a show URL:

Go to ardmediathek.de, search for the show, click the “Sendung” tab, and copy the URL
Or search for the show on kika.de and copy the URL of its show page

Step 2: Test the URL

python3 /opt/mediagrab/mediagrab.py test "https://www.ardmediathek.de/sendung/..."

Check: are episodes listed? Are durations correct? Are there unwanted variants (Gebärdensprache, Audiodeskription)?

Step 3: Build the JSON config

{
  "name": "Show Name",
  "url": "https://...",
  "dir": "show_directory_name",
  "mode": "weekly",
  "min_duration": 1200,
  "max_duration": 2100,
  "title_exclude": ["Gebärdensprache", "Audiodeskription", "Hörfassung"]
}

Fields:

Field	Required	Description
`name`	yes	Display name
`url`	yes	yt-dlp compatible URL
`dir`	yes	Subdirectory under `/moviedata/video/`
`mode`	yes	`weekly` (download all new episodes) or `archive` (download `per_run` backlog episodes per day)
`per_run`	archive only	Episodes per daily run (default: 3)
`max_total`	archive only	Stop after this many total files
`min_duration`	no	Minimum duration in seconds
`max_duration`	no	Maximum duration in seconds
`title_exclude`	no	List of regex patterns to skip by title
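
The three optional filter fields can be read as one keep-or-skip decision per episode. The sketch below shows one way that decision might be written. It is not taken from mediagrab.py: the function name should_skip is hypothetical, and matching title_exclude with re.search is an assumption based on the "regex patterns" description above.

```python
import re


def should_skip(title, duration, show):
    """Return True if an episode should be skipped under a show's filters.

    `show` is a dict shaped like one shows.json entry. Hypothetical helper:
    the real mediagrab.py may implement its filtering differently.
    """
    min_d = show.get("min_duration")
    max_d = show.get("max_duration")
    # Durations are in seconds, as in the field table.
    if min_d is not None and duration < min_d:
        return True
    if max_d is not None and duration > max_d:
        return True
    # Each title_exclude entry is treated as a regex (assumed: re.search).
    return any(re.search(p, title) for p in show.get("title_exclude", []))


show = {
    "min_duration": 1200,
    "max_duration": 2100,
    "title_exclude": ["Gebärdensprache", "Audiodeskription"],
}
print(should_skip("Folge 12", 1500, show))
print(should_skip("Folge 12 (mit Gebärdensprache)", 1500, show))
```

Because the patterns are regexes, characters such as parentheses or dots in a title fragment would need escaping if they are meant literally.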

Step 4: Add it

python3 /opt/mediagrab/mediagrab.py add '{"name": "Show Name", "url": "https://...", "dir": "show_name", "mode": "weekly"}'
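
Since the add command takes raw JSON on the command line, it can help to sanity-check the entry first. The following is a minimal sketch under the assumption that the required fields and modes are exactly those in the field table; validate_show is a hypothetical helper and not part of mediagrab.py, which may do its own checks.

```python
import json

REQUIRED = ("name", "url", "dir", "mode")
MODES = ("weekly", "archive")


def validate_show(raw):
    """Parse a JSON show entry and return a list of problems (empty if none)."""
    try:
        show = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(show, dict):
        return ["entry must be a JSON object"]
    problems = [f"missing field: {key}" for key in REQUIRED if key not in show]
    if "mode" in show and show["mode"] not in MODES:
        problems.append(f"unknown mode: {show['mode']}")
    # per_run and max_total are listed as "archive only" in the field table.
    if show.get("mode") == "weekly":
        for key in ("per_run", "max_total"):
            if key in show:
                problems.append(f"{key} is only used in archive mode")
    return problems


entry = '{"name": "Show Name", "url": "https://...", "dir": "show_name", "mode": "weekly"}'
print(validate_show(entry))
```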

Step 5: Test download

Run mediagrab.py weekly or mediagrab.py archive to verify it downloads correctly. Check:

Files appear in /moviedata/video/<dir>/
Owner and group are gerbera:gerbera, and the file mode is 664
Gerbera picks them up (may take up to 20 minutes for autoscan, or restart gerbera)
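
The permission check above can be scripted. The sketch below only inspects the mode bits and reports files that are not 664; the function name files_with_wrong_mode is hypothetical and nothing here is part of mediagrab.py.

```python
import stat
from pathlib import Path


def files_with_wrong_mode(directory, expected=0o664):
    """Return regular files in `directory` whose permission bits differ from `expected`."""
    wrong = []
    for path in sorted(Path(directory).iterdir()):
        # S_IMODE strips the file-type bits and keeps only the permission bits.
        if path.is_file() and stat.S_IMODE(path.stat().st_mode) != expected:
            wrong.append(path)
    return wrong
```

A call such as files_with_wrong_mode("/moviedata/video/woozle_goozle") would list the offending files for one show directory. Ownership could be checked in the same loop by comparing path.owner() and path.group() with "gerbera" on a POSIX system.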

Examples

Weekly show (ARD, filtered):

python3 /opt/mediagrab/mediagrab.py add '{
  "name": "Die Sendung mit der Maus",
  "url": "https://www.ardmediathek.de/sendung/die-sendung-mit-der-maus/Y3JpZDovL2Rhc2Vyc3RlLmRlL3NlbmR1bmcgbWl0IGRlciBtYXVz",
  "dir": "sendung_mit_der_maus",
  "mode": "weekly",
  "min_duration": 1500,
  "max_duration": 2100,
  "title_exclude": ["Gebärdensprache", "Audiodeskription", "Hörfassung"]
}'

Archive backfill (with cap):

python3 /opt/mediagrab/mediagrab.py add '{
  "name": "Woozle Goozle",
  "url": "https://archive.org/details/woozle-goozle",
  "dir": "woozle_goozle",
  "mode": "archive",
  "per_run": 3,
  "max_total": 50,
  "min_duration": 1200,
  "max_duration": 1500
}'
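
How per_run and max_total interact in this example can be worked through with a few lines of arithmetic. This is a sketch of the presumed behaviour ("N per day" until max_total files exist), not code from mediagrab.py; the function name episodes_this_run is hypothetical.

```python
def episodes_this_run(existing_files, per_run=3, max_total=None):
    """How many episodes an archive show may still download in one daily run."""
    if max_total is None:
        # No cap configured: every run may fetch the full per_run batch.
        return per_run
    remaining = max(max_total - existing_files, 0)
    return min(per_run, remaining)


# Values from the Woozle Goozle example: per_run 3, max_total 50.
for existing in (0, 48, 50):
    print(existing, episodes_this_run(existing, per_run=3, max_total=50))
```

With these values the cap is reached after 17 daily runs at the earliest (16 runs of 3 episodes plus one final run of 2), assuming each run finds enough episodes that pass the duration filter.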

Troubleshooting

yt-dlp errors: Update with pip install --break-system-packages -U yt-dlp in doppler
Permission denied creating dirs: Run chmod 777 /sparfuxdata/media/video/ on hertz host
Gerbera not picking up files: Wait for the next autoscan (the interval is 20 minutes) or restart Gerbera with systemctl restart gerbera in doppler
Duplicate downloads: Run mediagrab.py seed to sync archive files with existing files
Check log: cat /opt/mediagrab/mediagrab.log
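
The idea behind the seed step for duplicate downloads can be illustrated as follows. yt-dlp's download archive is a text file with one "<extractor> <id>" line per finished video, and the files here are named Title [id].mp4, so the archive lines can be rebuilt from the filenames. This is a sketch and not the actual seed implementation: archive_lines is a hypothetical helper, and the extractor name has to be supplied by the caller because it cannot be read from the filename.

```python
import re
from pathlib import Path

# Matches the last "[id]" directly before the file extension.
ID_PATTERN = re.compile(r"\[([^\[\]]+)\]\.[^.]+$")


def archive_lines(video_dir, extractor):
    """Build yt-dlp archive lines ("<extractor> <id>") from "Title [id].ext" files."""
    lines = []
    for path in sorted(Path(video_dir).iterdir()):
        match = ID_PATTERN.search(path.name)
        if path.is_file() and match:
            lines.append(f"{extractor} {match.group(1)}")
    return lines
```

Files without a bracketed ID are ignored, so manually renamed downloads would not be covered by this approach.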

Architecture

Uses yt-dlp's --download-archive for dedup — reliable across re-runs
720p cap to save disk space
Files named Title [id].mp4 — Gerbera indexes by title
No database, no daemon — just cron + yt-dlp + a Python script
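
To make the points above concrete, here is a sketch of how a yt-dlp command line with these properties might be assembled. It is an illustration and not the actual mediagrab.py code: build_command is a hypothetical name, the per-show archive filename <dir>.txt is an assumption (the source only says archives/*.txt), and the format string is one common way to express a 720p cap.

```python
from pathlib import PurePosixPath


def build_command(show, archive_dir="/opt/mediagrab/archives",
                  video_root="/moviedata/video"):
    """Assemble a yt-dlp argument list for one show entry (sketch only)."""
    out_dir = PurePosixPath(video_root) / show["dir"]
    # Assumed naming: one archive file per show directory.
    archive = PurePosixPath(archive_dir) / f"{show['dir']}.txt"
    return [
        "yt-dlp",
        # Record finished video IDs so later runs skip them.
        "--download-archive", str(archive),
        # Prefer streams no taller than 720p.
        "-f", "bestvideo[height<=720]+bestaudio/best[height<=720]",
        "--merge-output-format", "mp4",
        # Output template that yields names like "Title [id].mp4".
        "-o", str(out_dir / "%(title)s [%(id)s].%(ext)s"),
        show["url"],
    ]


show = {"dir": "woozle_goozle", "url": "https://archive.org/details/woozle-goozle"}
print(" ".join(build_command(show)))
```

The resulting list could be passed to subprocess.run from a cron-invoked script, which matches the "no database, no daemon" design: all state lives in the archive text files and the video directories.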
