mediagrab — Kids Show Downloader
Automated downloader for kids' TV shows, running in the doppler LXD container on hertz. Downloads are stored in /moviedata/video/ and automatically picked up by Gerbera UPnP/DLNA server.
Location
- Script: /opt/mediagrab/mediagrab.py (in doppler)
- Config: /opt/mediagrab/shows.json
- Archives: /opt/mediagrab/archives/*.txt (yt-dlp download tracking)
- Log: /opt/mediagrab/mediagrab.log
- Videos: /moviedata/video/<show_dir>/
Cron Schedule
- Saturday 10:00: mediagrab.py weekly (checks all weekly shows for new episodes)
- Daily 11:00: mediagrab.py archive (downloads N backlog episodes per archive show)
Commands
# List all shows and status
python3 /opt/mediagrab/mediagrab.py list

# Run weekly check manually
python3 /opt/mediagrab/mediagrab.py weekly

# Run archive backfill manually
python3 /opt/mediagrab/mediagrab.py archive

# Test a URL (list episodes without downloading)
python3 /opt/mediagrab/mediagrab.py test "<url>"

# Seed archive from existing files (run after manual downloads)
python3 /opt/mediagrab/mediagrab.py seed
All commands can also be run via Claude Code MCP:
hertz-tools → mediagrab → command: "list"
Adding a New Show
Step 1: Find the show URL
Supported sources (via yt-dlp):
- ARD Mediathek: https://www.ardmediathek.de/sendung/<show-name>/<base64-id> (best for ARD/WDR/KiKA shows)
- KiKA: https://www.kika.de/<show-slug>/<page-id> (only works if the page has a videoSubchannel in the API)
- archive.org: https://archive.org/details/<collection> (good for bulk backlog)
- YouTube: standard playlist/channel URLs
To find a show URL:
- Go to ardmediathek.de, search for the show, click the “Sendung” tab, and copy the URL
- Or search on kika.de
Step 2: Test the URL
python3 /opt/mediagrab/mediagrab.py test "https://www.ardmediathek.de/sendung/..."
Check: are episodes listed? Are durations correct? Are there unwanted variants (Gebärdensprache, Audiodeskription)?
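The duration and variant checks above can be pictured as a small filter. The following is only a sketch, not the script's actual code: the episode dicts with title and duration keys are hypothetical, the parameter names mirror the min_duration, max_duration and title_exclude config fields, and keeping episodes with an unknown duration is an assumption.

```python
import re

def keep_episode(episode, min_duration=None, max_duration=None, title_exclude=()):
    """Return True if a (hypothetical) episode dict passes duration and title checks."""
    duration = episode.get("duration")
    title = episode.get("title", "")
    # Assumption: an episode with unknown duration is kept rather than dropped.
    if duration is not None:
        if min_duration is not None and duration < min_duration:
            return False
        if max_duration is not None and duration > max_duration:
            return False
    # Patterns are treated as regular expressions searched anywhere in the title.
    return not any(re.search(pattern, title) for pattern in title_exclude)

episodes = [
    {"title": "Folge 1", "duration": 1740},
    {"title": "Folge 1 (Gebärdensprache)", "duration": 1740},
    {"title": "Trailer", "duration": 45},
]
kept = [e["title"] for e in episodes
        if keep_episode(e, 1200, 2100, ["Gebärdensprache", "Audiodeskription"])]
print(kept)  # ['Folge 1']
```

If the test output shows many short clips or sign-language variants, tightening these three values is usually the first thing to try.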
Step 3: Build the JSON config
{
  "name": "Show Name",
  "url": "https://...",
  "dir": "show_directory_name",
  "mode": "weekly",
  "min_duration": 1200,
  "max_duration": 2100,
  "title_exclude": ["Gebärdensprache", "Audiodeskription", "Hörfassung"]
}
Fields:
| Field | Required | Description |
|---|---|---|
| name | yes | Display name |
| url | yes | yt-dlp compatible URL |
| dir | yes | Subdirectory under /moviedata/video/ |
| mode | yes | weekly (all new) or archive (N per day) |
| per_run | archive only | Episodes per daily run (default: 3) |
| max_total | archive only | Stop after this many total files |
| min_duration | no | Minimum duration in seconds |
| max_duration | no | Maximum duration in seconds |
| title_exclude | no | List of regex patterns to skip by title |
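Before adding a show, the rules in the table can be checked by hand or with a few lines of Python. The function below is a hypothetical helper, not part of mediagrab.py, and whether the real script rejects such configs is not documented here.

```python
REQUIRED = ("name", "url", "dir", "mode")
ARCHIVE_ONLY = ("per_run", "max_total")

def validate_show(show):
    """Return a list of problems found in a show config dict (empty if none)."""
    problems = [f"missing required field: {key}" for key in REQUIRED if not show.get(key)]
    mode = show.get("mode")
    if mode is not None and mode not in ("weekly", "archive"):
        problems.append(f"unknown mode: {mode!r}")
    if mode == "weekly":
        # per_run and max_total are documented as archive-only fields.
        problems += [f"{key} only applies to archive mode"
                     for key in ARCHIVE_ONLY if key in show]
    low, high = show.get("min_duration"), show.get("max_duration")
    if low is not None and high is not None and low > high:
        problems.append("min_duration is greater than max_duration")
    return problems

print(validate_show({"name": "Show Name", "url": "https://...",
                     "dir": "show_name", "mode": "weekly"}))  # []
```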
Step 4: Add it
python3 /opt/mediagrab/mediagrab.py add '{"name": "Show Name", "url": "https://...", "dir": "show_name", "mode": "weekly"}'
Step 5: Test download
Run mediagrab.py weekly or mediagrab.py archive to verify it downloads correctly. Check:
- Files appear in /moviedata/video/<dir>/
- Permissions are gerbera:gerbera 664
- Gerbera picks them up (may take up to 20 minutes for autoscan, or restart gerbera)
Examples
Weekly show (ARD, filtered):
python3 /opt/mediagrab/mediagrab.py add '{
"name": "Die Sendung mit der Maus",
"url": "https://www.ardmediathek.de/sendung/die-sendung-mit-der-maus/Y3JpZDovL2Rhc2Vyc3RlLmRlL3NlbmR1bmcgbWl0IGRlciBtYXVz",
"dir": "sendung_mit_der_maus",
"mode": "weekly",
"min_duration": 1500,
"max_duration": 2100,
"title_exclude": ["Gebärdensprache", "Audiodeskription", "Hörfassung"]
}'
Archive backfill (with cap):
python3 /opt/mediagrab/mediagrab.py add '{
"name": "Woozle Goozle",
"url": "https://archive.org/details/woozle-goozle",
"dir": "woozle_goozle",
"mode": "archive",
"per_run": 3,
"max_total": 50,
"min_duration": 1200,
"max_duration": 1500
}'
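How per_run and max_total interact in the archive example can be shown with a little arithmetic. This sketch assumes that max_total is compared against the number of files already in the show directory; the function name is hypothetical and the real script may count differently.

```python
def episodes_for_this_run(existing_files, per_run=3, max_total=None):
    """Number of backlog episodes to fetch in one daily archive run."""
    if max_total is None:
        return per_run  # no cap configured
    remaining = max(0, max_total - existing_files)
    return min(per_run, remaining)

print(episodes_for_this_run(0, per_run=3, max_total=50))   # 3
print(episodes_for_this_run(49, per_run=3, max_total=50))  # 1
print(episodes_for_this_run(50, per_run=3, max_total=50))  # 0
```

Under these assumptions, the Woozle Goozle config above would reach its cap of 50 files after 17 daily runs (16 runs of 3, then one run of 2).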
Troubleshooting
- yt-dlp errors: Update with pip install --break-system-packages -U yt-dlp in doppler
- Permission denied creating dirs: Run chmod 777 /sparfuxdata/media/video/ on the hertz host
- Gerbera not picking up files: Restart gerbera (systemctl restart gerbera in doppler); the autoscan interval is 20 min
- Duplicate downloads: Run mediagrab.py seed to sync archive files with existing files
- Check log: cat /opt/mediagrab/mediagrab.log
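For the duplicate-download case, it helps to know what seeding roughly involves. yt-dlp's download archive is a text file with one "<extractor> <id>" line per downloaded video. The sketch below rebuilds such lines from file names of the form Title [id].mp4. It is not the actual seed implementation: seed_archive is a hypothetical name, and the extractor value must be supplied by the caller because it cannot be recovered from the file name.

```python
import re
import tempfile
from pathlib import Path

ID_PATTERN = re.compile(r"\[([^\[\]]+)\]\.mp4$")

def seed_archive(video_dir, archive_file, extractor):
    """Append '<extractor> <id>' lines for videos not yet listed in archive_file."""
    archive_file = Path(archive_file)
    existing = archive_file.read_text() if archive_file.exists() else ""
    known = set(existing.splitlines())
    added = []
    for video in sorted(Path(video_dir).glob("*.mp4")):
        match = ID_PATTERN.search(video.name)
        if not match:
            continue  # file name does not follow 'Title [id].mp4'
        entry = f"{extractor} {match.group(1)}"
        if entry not in known:
            known.add(entry)
            added.append(entry)
    if added:
        with archive_file.open("a") as fh:
            if existing and not existing.endswith("\n"):
                fh.write("\n")  # avoid gluing onto an unterminated last line
            fh.writelines(line + "\n" for line in added)
    return added

# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Folge 1 [abc123].mp4", "Folge 2 [xyz789].mp4", "no_id.mp4"):
        (Path(tmp) / name).touch()
    first = seed_archive(tmp, Path(tmp) / "show.txt", "youtube")
    second = seed_archive(tmp, Path(tmp) / "show.txt", "youtube")
print(first)   # ['youtube abc123', 'youtube xyz789']
print(second)  # []
```

The second call adds nothing, which is the property that makes re-running a seed step safe.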
Architecture
- Uses yt-dlp's --download-archive for dedup (reliable across re-runs)
- 720p cap to save disk space
- Files named Title [id].mp4 (Gerbera indexes by title)
- No database, no daemon: just cron + yt-dlp + a Python script
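The design points above map onto a handful of yt-dlp options. The following is a minimal sketch using yt-dlp's Python API, not the script's actual code: build_ydl_opts and download are hypothetical names, one archive file per show named <dir>.txt is an assumption, and the exact format selector used by mediagrab is not documented here.

```python
def build_ydl_opts(show, video_root="/moviedata/video",
                   archive_dir="/opt/mediagrab/archives"):
    """Build a yt-dlp options dict for one show config (illustrative only)."""
    return {
        # Skip anything already recorded in the archive file (dedup across runs).
        # Assumption: one archive file per show, named after its dir.
        "download_archive": f"{archive_dir}/{show['dir']}.txt",
        # Cap resolution at 720p; fall back to a combined format within the cap.
        "format": "bestvideo[height<=720]+bestaudio/best[height<=720]",
        "merge_output_format": "mp4",
        # Yields 'Title [id].mp4' inside the show's directory.
        "outtmpl": f"{video_root}/{show['dir']}/%(title)s [%(id)s].%(ext)s",
    }

def download(show):
    """Download all new episodes for one show config."""
    import yt_dlp  # imported lazily so the options can be inspected without yt-dlp
    with yt_dlp.YoutubeDL(build_ydl_opts(show)) as ydl:
        return ydl.download([show["url"]])

opts = build_ydl_opts({"dir": "woozle_goozle",
                       "url": "https://archive.org/details/woozle-goozle"})
print(opts["download_archive"])  # /opt/mediagrab/archives/woozle_goozle.txt
```

Because the archive file is the only state, deleting a line from it is enough to make yt-dlp fetch that episode again on the next run.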
