mediagrab — Kids Show Downloader

Automated downloader for kids' TV shows, running in the doppler LXD container on hertz. Downloads are stored in /moviedata/video/ and are picked up automatically by the Gerbera UPnP/DLNA server.

Location

  • Script: /opt/mediagrab/mediagrab.py (in doppler)
  • Config: /opt/mediagrab/shows.json
  • Archives: /opt/mediagrab/archives/*.txt (yt-dlp download tracking)
  • Log: /opt/mediagrab/mediagrab.log
  • Videos: /moviedata/video/<show_dir>/

Cron Schedule

  • Saturday 10:00: mediagrab.py weekly (checks all weekly shows for new episodes)
  • Daily 11:00: mediagrab.py archive (downloads N backlog episodes per archive show)

Commands

# List all shows and status
python3 /opt/mediagrab/mediagrab.py list

# Run weekly check manually
python3 /opt/mediagrab/mediagrab.py weekly

# Run archive backfill manually
python3 /opt/mediagrab/mediagrab.py archive

# Test a URL (list episodes without downloading)
python3 /opt/mediagrab/mediagrab.py test "<url>"

# Seed archive from existing files (run after manual downloads)
python3 /opt/mediagrab/mediagrab.py seed

All commands can also be run via Claude Code MCP:

hertz-tools → mediagrab → command: "list"

Adding a New Show

Step 1: Find the show URL

Supported sources (via yt-dlp):

  • ARD Mediathek: https://www.ardmediathek.de/sendung/<show-name>/<base64-id> — best for ARD/WDR/KiKA shows
  • KiKA: https://www.kika.de/<show-slug>/<page-id> — only works if the page has a videoSubchannel in the API
  • archive.org: https://archive.org/details/<collection> — good for bulk backlog
  • YouTube: standard playlist/channel URLs

To find a show URL:

  1. Go to ardmediathek.de, search for the show, click the “Sendung” tab, and copy the URL
  2. Or search on kika.de

Step 2: Test the URL

python3 /opt/mediagrab/mediagrab.py test "https://www.ardmediathek.de/sendung/..."

Check: are episodes listed? Are durations correct? Are there unwanted variants (Gebärdensprache, Audiodeskription)?

Step 3: Build the JSON config

{
  "name": "Show Name",
  "url": "https://...",
  "dir": "show_directory_name",
  "mode": "weekly",
  "min_duration": 1200,
  "max_duration": 2100,
  "title_exclude": ["Gebärdensprache", "Audiodeskription", "Hörfassung"]
}

Fields:

Field          Required       Description
name           yes            Display name
url            yes            yt-dlp compatible URL
dir            yes            Subdirectory under /moviedata/video/
mode           yes            weekly (all new) or archive (N per day)
per_run        archive only   Episodes per daily run (default: 3)
max_total      archive only   Stop after this many total files
min_duration   no             Minimum duration in seconds
max_duration   no             Maximum duration in seconds
title_exclude  no             List of regex patterns to skip by title
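
The field rules above can be checked before a show is added. The following is a minimal sketch, not the validation that mediagrab.py itself performs; the function name validate_show and the exact messages are hypothetical, and rejecting per_run/max_total on weekly shows is an assumption drawn from the "archive only" column.

```python
from typing import List

REQUIRED = ("name", "url", "dir", "mode")
MODES = ("weekly", "archive")
ARCHIVE_ONLY = ("per_run", "max_total")


def validate_show(show: dict) -> List[str]:
    """Return a list of problems found in one show entry (empty if none)."""
    problems = []
    for field in REQUIRED:
        if not show.get(field):
            problems.append(f"missing required field: {field}")
    mode = show.get("mode")
    if mode and mode not in MODES:
        problems.append(f"unknown mode: {mode!r}")
    # Assumption: archive-only fields are treated as an error on weekly shows.
    if mode == "weekly":
        for field in ARCHIVE_ONLY:
            if field in show:
                problems.append(f"{field} only applies to archive mode")
    lo, hi = show.get("min_duration"), show.get("max_duration")
    if lo is not None and hi is not None and lo > hi:
        problems.append("min_duration is greater than max_duration")
    return problems
```

An empty list means the entry is consistent with the table; anything else is worth fixing before running the add command.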

Step 4: Add it

python3 /opt/mediagrab/mediagrab.py add '{"name": "Show Name", "url": "https://...", "dir": "show_name", "mode": "weekly"}'

Step 5: Test download

Run mediagrab.py weekly or mediagrab.py archive to verify it downloads correctly. Check:

  • Files appear in /moviedata/video/<dir>/
  • Ownership is gerbera:gerbera and the file mode is 664
  • Gerbera picks them up (may take up to 20 minutes for autoscan, or restart gerbera)

Examples

Weekly show (ARD, filtered):

python3 /opt/mediagrab/mediagrab.py add '{
  "name": "Die Sendung mit der Maus",
  "url": "https://www.ardmediathek.de/sendung/die-sendung-mit-der-maus/Y3JpZDovL2Rhc2Vyc3RlLmRlL3NlbmR1bmcgbWl0IGRlciBtYXVz",
  "dir": "sendung_mit_der_maus",
  "mode": "weekly",
  "min_duration": 1500,
  "max_duration": 2100,
  "title_exclude": ["Gebärdensprache", "Audiodeskription", "Hörfassung"]
}'
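
As an illustration of how the filter fields in this example could be applied to one episode, here is a small sketch. It assumes the duration bounds are inclusive and that title_exclude patterns are matched case-sensitively with re.search; mediagrab.py may implement either detail differently, and keep_episode is a hypothetical name.

```python
import re


def keep_episode(title: str, duration: float, show: dict) -> bool:
    """Return True if an episode passes the show's duration and title filters."""
    lo = show.get("min_duration")
    hi = show.get("max_duration")
    # Assumption: both bounds are inclusive.
    if lo is not None and duration < lo:
        return False
    if hi is not None and duration > hi:
        return False
    # title_exclude entries are regex patterns; any match rejects the episode.
    for pattern in show.get("title_exclude", []):
        if re.search(pattern, title):
            return False
    return True


show = {
    "min_duration": 1500,
    "max_duration": 2100,
    "title_exclude": ["Gebärdensprache", "Audiodeskription", "Hörfassung"],
}
```

With this config, a 30-minute regular episode is kept, while a sign-language variant of the same length or a 5-minute clip is skipped.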

Archive backfill (with cap):

python3 /opt/mediagrab/mediagrab.py add '{
  "name": "Woozle Goozle",
  "url": "https://archive.org/details/woozle-goozle",
  "dir": "woozle_goozle",
  "mode": "archive",
  "per_run": 3,
  "max_total": 50,
  "min_duration": 1200,
  "max_duration": 1500
}'
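
The interaction of per_run and max_total can be sketched as a small calculation. This assumes max_total is compared against the number of files already in the show directory, which the fields table suggests but does not spell out; episodes_this_run is a hypothetical helper.

```python
from typing import Optional


def episodes_this_run(existing_files: int, per_run: int = 3,
                      max_total: Optional[int] = None) -> int:
    """How many backlog episodes a single daily archive run should fetch."""
    if max_total is None:
        # No cap configured: always fetch the per-run amount.
        return per_run
    remaining = max(max_total - existing_files, 0)
    return min(per_run, remaining)
```

For the config above, a directory holding 49 files would get one more episode, and a directory holding 50 or more would get none.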

Troubleshooting

  • yt-dlp errors: Update with pip install --break-system-packages -U yt-dlp in doppler
  • Permission denied creating dirs: Run chmod 777 /sparfuxdata/media/video/ on hertz host
  • Gerbera not picking up files: Restart gerbera (systemctl restart gerbera in doppler); the autoscan interval is 20 min
  • Duplicate downloads: Run mediagrab.py seed to sync the yt-dlp archive files with the video files already on disk
  • Check log: cat /opt/mediagrab/mediagrab.log
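
To make the seed step more concrete, the sketch below shows one way archive entries could be derived from file names of the form Title [id].mp4. A yt-dlp download archive stores one line per video in the form "<extractor> <id>", with the extractor name in lowercase. How mediagrab.py determines the extractor name is not documented here, so it is passed in as an assumption, and archive_lines is a hypothetical function.

```python
import re

# Matches the trailing "[id].mp4" part of "Title [id].mp4".
ID_PATTERN = re.compile(r"\[([^\[\]]+)\]\.mp4$")


def archive_lines(filenames, extractor):
    """Build yt-dlp archive lines from file names; names without an id are skipped."""
    lines = []
    for name in filenames:
        match = ID_PATTERN.search(name)
        if match:
            # Assumption: the caller knows the lowercase extractor name for this show.
            lines.append(f"{extractor} {match.group(1)}")
    return lines
```

The resulting lines would be appended to the show's file under /opt/mediagrab/archives/ so that yt-dlp skips those ids on the next run.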

Architecture

  • Uses yt-dlp's --download-archive for dedup — reliable across re-runs
  • 720p cap to save disk space
  • Files named Title [id].mp4 — Gerbera indexes by title
  • No database, no daemon — just cron + yt-dlp + a Python script
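
The points above map onto a handful of yt-dlp options. The sketch below only builds the argument list and does not run anything; the format selector is one common way to express a 720p cap, and the function name and the archive file path in the usage note are assumptions rather than what mediagrab.py actually uses.

```python
def build_command(url, show_dir, archive_file):
    """Assemble a yt-dlp call with dedup, a 720p cap and 'Title [id].mp4' naming."""
    return [
        "yt-dlp",
        # Skip ids already recorded in the archive file and record new ones.
        "--download-archive", archive_file,
        # Prefer separate video+audio up to 720p, else a combined stream up to 720p.
        "-f", "bestvideo[height<=720]+bestaudio/best[height<=720]",
        # Merge separate video and audio streams into an mp4 container.
        "--merge-output-format", "mp4",
        # Output template producing "Title [id].ext" inside the show directory.
        "-o", f"{show_dir}/%(title)s [%(id)s].%(ext)s",
        url,
    ]
```

For example, build_command("https://archive.org/details/woozle-goozle", "/moviedata/video/woozle_goozle", "/opt/mediagrab/archives/woozle_goozle.txt") returns a list that could be handed to subprocess.run; the archive file name here is hypothetical.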