Automated downloader for kids' TV shows, running in the doppler LXD container on hertz. Downloads are stored in /moviedata/video/ and picked up automatically by the Gerbera UPnP/DLNA server.
- /opt/mediagrab/mediagrab.py (in doppler)
- /opt/mediagrab/shows.json
- /opt/mediagrab/archives/*.txt (yt-dlp download tracking)
- /opt/mediagrab/mediagrab.log
- /moviedata/video/<show_dir>/

- mediagrab.py weekly - checks all weekly shows for new episodes
- mediagrab.py archive - downloads N backlog episodes per archive show

# List all shows and status
python3 /opt/mediagrab/mediagrab.py list

# Run weekly check manually
python3 /opt/mediagrab/mediagrab.py weekly

# Run archive backfill manually
python3 /opt/mediagrab/mediagrab.py archive

# Test a URL (list episodes without downloading)
python3 /opt/mediagrab/mediagrab.py test "<url>"

# Seed archive from existing files (run after manual downloads)
python3 /opt/mediagrab/mediagrab.py seed
All commands can also be run via Claude Code MCP:
hertz-tools → mediagrab → command: "list"
Supported sources (via yt-dlp):
- https://www.ardmediathek.de/sendung/<show-name>/<base64-id> - best for ARD/WDR/KiKA shows
- https://www.kika.de/<show-slug>/<page-id> - only works if the page has a videoSubchannel in the API
- https://archive.org/details/<collection> - good for bulk backlog

To find a show URL:
python3 /opt/mediagrab/mediagrab.py test "https://www.ardmediathek.de/sendung/..."
Check: are episodes listed? Are durations correct? Are there unwanted variants (Gebärdensprache, Audiodeskription)?
{
  "name": "Show Name",
  "url": "https://...",
  "dir": "show_directory_name",
  "mode": "weekly",
  "min_duration": 1200,
  "max_duration": 2100,
  "title_exclude": ["Gebärdensprache", "Audiodeskription", "Hörfassung"]
}
Fields:
| Field | Required | Description |
|---|---|---|
| name | yes | Display name |
| url | yes | yt-dlp compatible URL |
| dir | yes | Subdirectory under /moviedata/video/ |
| mode | yes | weekly (all new) or archive (N per day) |
| per_run | archive only | Episodes per daily run (default: 3) |
| max_total | archive only | Stop after this many total files |
| min_duration | no | Minimum duration in seconds |
| max_duration | no | Maximum duration in seconds |
| title_exclude | no | List of regex patterns to skip by title |
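
As a rough illustration of the field table, an entry could be checked before it is handed to the add command. This is a minimal sketch, not part of mediagrab.py: the helper name validate_show and its messages are hypothetical, and treating archive-only fields on a weekly show as a problem is an assumption rather than documented behavior.

```python
# Hypothetical pre-check for a show entry, based only on the field table.
REQUIRED = ("name", "url", "dir", "mode")
ARCHIVE_ONLY = ("per_run", "max_total")
OPTIONAL = ("min_duration", "max_duration", "title_exclude")


def validate_show(entry):
    """Return a list of problems; an empty list means the entry looks fine."""
    problems = [f"missing required field: {key}" for key in REQUIRED if key not in entry]

    mode = entry.get("mode")
    if mode is not None and mode not in ("weekly", "archive"):
        problems.append(f"unknown mode: {mode!r}")

    # Assumption: per_run and max_total are only meaningful for archive shows.
    if mode != "archive":
        problems += [f"{key} only applies to archive mode" for key in ARCHIVE_ONLY if key in entry]

    unknown = set(entry) - set(REQUIRED + ARCHIVE_ONLY + OPTIONAL)
    problems += [f"unknown field: {key}" for key in sorted(unknown)]

    lo, hi = entry.get("min_duration"), entry.get("max_duration")
    if lo is not None and hi is not None and lo > hi:
        problems.append("min_duration is greater than max_duration")
    return problems
```

An empty result means the entry matches the table. The check does not verify that the URL is something yt-dlp can handle; the test command remains the way to confirm that.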
python3 /opt/mediagrab/mediagrab.py add '{"name": "Show Name", "url": "https://...", "dir": "show_name", "mode": "weekly"}'
Run mediagrab.py weekly or mediagrab.py archive to verify it downloads correctly. Check:
- files appear in /moviedata/video/<dir>/
- ownership is gerbera:gerbera
- permissions are 664

Weekly show (ARD, filtered):
python3 /opt/mediagrab/mediagrab.py add '{
"name": "Die Sendung mit der Maus",
"url": "https://www.ardmediathek.de/sendung/die-sendung-mit-der-maus/Y3JpZDovL2Rhc2Vyc3RlLmRlL3NlbmR1bmcgbWl0IGRlciBtYXVz",
"dir": "sendung_mit_der_maus",
"mode": "weekly",
"min_duration": 1500,
"max_duration": 2100,
"title_exclude": ["Gebärdensprache", "Audiodeskription", "Hörfassung"]
}'
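
The duration and title filters in the example above could be applied along these lines. This is only a sketch of the idea, not the code in mediagrab.py: the function name keep_episode is hypothetical, and it assumes inclusive duration bounds and case-sensitive regex matching anywhere in the title.

```python
import re


def keep_episode(title, duration, show):
    """Decide whether an episode passes a show's duration and title filters."""
    lo = show.get("min_duration")
    hi = show.get("max_duration")
    # Assumption: bounds are inclusive and given in seconds.
    if lo is not None and duration < lo:
        return False
    if hi is not None and duration > hi:
        return False
    # title_exclude entries are regex patterns; any match skips the episode.
    return not any(re.search(pattern, title) for pattern in show.get("title_exclude", []))


maus = {
    "min_duration": 1500,
    "max_duration": 2100,
    "title_exclude": ["Gebärdensprache", "Audiodeskription", "Hörfassung"],
}
```

With this configuration, a 30-minute episode with a plain title passes, while a sign-language variant of the same length or a five-minute clip does not.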
Archive backfill (with cap):
python3 /opt/mediagrab/mediagrab.py add '{
"name": "Woozle Goozle",
"url": "https://archive.org/details/woozle-goozle",
"dir": "woozle_goozle",
"mode": "archive",
"per_run": 3,
"max_total": 50,
"min_duration": 1200,
"max_duration": 1500
}'
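
To make the interaction of per_run and max_total concrete, here is a small sketch of how many episodes one archive run might still fetch. The helper episodes_this_run is hypothetical, and it assumes max_total is compared against the number of files already in the show directory.

```python
from typing import Optional


def episodes_this_run(existing_files: int, per_run: int = 3,
                      max_total: Optional[int] = None) -> int:
    """How many backlog episodes one archive run may still download."""
    if max_total is None:
        # No cap configured: every run fetches per_run episodes.
        return per_run
    remaining = max(0, max_total - existing_files)
    return min(per_run, remaining)
```

For the Woozle Goozle entry above (per_run 3, max_total 50), a directory holding 49 files would leave room for one more episode, and a directory with 50 or more files would download nothing.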
- Update yt-dlp: pip install --break-system-packages -U yt-dlp in doppler
- Fix directory permissions: chmod 777 /sparfuxdata/media/video/ on the hertz host
- Restart Gerbera (systemctl restart gerbera in doppler); the autoscan interval is 20 min
- Run mediagrab.py seed to sync archive files with existing files
- View the log: cat /opt/mediagrab/mediagrab.log
- yt-dlp's --download-archive is used for dedup - reliable across re-runs
- Files are named Title [id].mp4 - Gerbera indexes by title
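
Seeding the archive from existing files can be pictured as follows. yt-dlp's download archive is a text file with one `<extractor> <id>` line per downloaded video, and the files here are named Title [id].mp4, so the IDs can be recovered from the filenames. This is a sketch under those assumptions and not the actual seed implementation: the function archive_lines and the extractor name used with it are placeholders.

```python
import re

# Matches the trailing "[id].mp4" part of names like "Title [id].mp4".
ID_PATTERN = re.compile(r"\[([^\[\]]+)\]\.mp4$")


def archive_lines(filenames, extractor):
    """Build yt-dlp archive entries ('<extractor> <id>') from existing filenames."""
    lines = []
    for name in filenames:
        match = ID_PATTERN.search(name)
        if match:  # files without a trailing [id] are skipped
            lines.append(f"{extractor.lower()} {match.group(1)}")
    return lines
```

The extractor part has to match what yt-dlp itself would write for that source, so in practice it is safest to copy it from a line yt-dlp has already added to the same archive file.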