====== mediagrab: Kids Show Downloader ======

Automated downloader for kids' TV shows, running in the **doppler** LXD container on hertz. Downloads are stored in ''/moviedata/video/'' and automatically picked up by Gerbera UPnP/DLNA server.

===== Location =====

  * Script: ''/opt/mediagrab/mediagrab.py'' (in doppler)
  * Config: ''/opt/mediagrab/shows.json''
  * Archives: ''/opt/mediagrab/archives/*.txt'' (yt-dlp download tracking)
  * Log: ''/opt/mediagrab/mediagrab.log''
  * Videos: ''/moviedata/video//''

===== Cron Schedule =====

  * **Saturday 10:00**: ''mediagrab.py weekly'' checks all weekly shows for new episodes
  * **Daily 11:00**: ''mediagrab.py archive'' downloads N backlog episodes per archive show

===== Commands =====

<code bash>
# List all shows and status
python3 /opt/mediagrab/mediagrab.py list

# Run weekly check manually
python3 /opt/mediagrab/mediagrab.py weekly

# Run archive backfill manually
python3 /opt/mediagrab/mediagrab.py archive

# Test a URL (list episodes without downloading)
python3 /opt/mediagrab/mediagrab.py test ""

# Seed archive from existing files (run after manual downloads)
python3 /opt/mediagrab/mediagrab.py seed
</code>

All commands can also be run via Claude Code MCP: ''hertz-tools → mediagrab → command: "list"''

===== Adding a New Show =====

=== Step 1: Find the show URL ===

Supported sources (via yt-dlp):

  * **ARD Mediathek**: ''https://www.ardmediathek.de/sendung//'' (best for ARD/WDR/KiKA shows)
  * **KiKA**: ''https://www.kika.de//'' (only works if the page has a ''videoSubchannel'' in the API)
  * **archive.org**: ''https://archive.org/details/'' (good for bulk backlog)
  * **YouTube**: standard playlist/channel URLs

To find a show URL:

  - Go to [[https://www.ardmediathek.de|ardmediathek.de]], search for the show, click the "Sendung" tab, copy the URL
  - Or search on [[https://www.kika.de|kika.de]]

=== Step 2: Test the URL ===

<code bash>
python3 /opt/mediagrab/mediagrab.py test "https://www.ardmediathek.de/sendung/..."
</code>

Check: are episodes listed? Are durations correct?
Are there unwanted variants, such as Gebärdensprache (sign language) or Audiodeskription (audio description)?

=== Step 3: Build the JSON config ===

<code json>
{
  "name": "Show Name",
  "url": "https://...",
  "dir": "show_directory_name",
  "mode": "weekly",
  "min_duration": 1200,
  "max_duration": 2100,
  "title_exclude": ["Gebärdensprache", "Audiodeskription", "Hörfassung"]
}
</code>

**Fields:**

^ Field ^ Required ^ Description ^
| ''name'' | yes | Display name |
| ''url'' | yes | yt-dlp compatible URL |
| ''dir'' | yes | Subdirectory under ''/moviedata/video/'' |
| ''mode'' | yes | ''weekly'' (all new) or ''archive'' (N per day) |
| ''per_run'' | archive only | Episodes per daily run (default: 3) |
| ''max_total'' | archive only | Stop after this many total files |
| ''min_duration'' | no | Minimum duration in seconds |
| ''max_duration'' | no | Maximum duration in seconds |
| ''title_exclude'' | no | List of regex patterns to skip by title |

=== Step 4: Add it ===

<code bash>
python3 /opt/mediagrab/mediagrab.py add '{"name": "Show Name", "url": "https://...", "dir": "show_name", "mode": "weekly"}'
</code>

=== Step 5: Test download ===

Run ''mediagrab.py weekly'' or ''mediagrab.py archive'' to verify it downloads correctly.
Check:

  * Files appear in ''/moviedata/video//''
  * Permissions are ''gerbera:gerbera'' 664
  * Gerbera picks them up (may take up to 20 minutes for autoscan, or restart gerbera)

===== Examples =====

**Weekly show (ARD, filtered):**

<code bash>
python3 /opt/mediagrab/mediagrab.py add '{
  "name": "Die Sendung mit der Maus",
  "url": "https://www.ardmediathek.de/sendung/die-sendung-mit-der-maus/Y3JpZDovL2Rhc2Vyc3RlLmRlL3NlbmR1bmcgbWl0IGRlciBtYXVz",
  "dir": "sendung_mit_der_maus",
  "mode": "weekly",
  "min_duration": 1500,
  "max_duration": 2100,
  "title_exclude": ["Gebärdensprache", "Audiodeskription", "Hörfassung"]
}'
</code>

**Archive backfill (with cap):**

<code bash>
python3 /opt/mediagrab/mediagrab.py add '{
  "name": "Woozle Goozle",
  "url": "https://archive.org/details/woozle-goozle",
  "dir": "woozle_goozle",
  "mode": "archive",
  "per_run": 3,
  "max_total": 50,
  "min_duration": 1200,
  "max_duration": 1500
}'
</code>

===== Troubleshooting =====

  * **yt-dlp errors**: Update with ''pip install --break-system-packages -U yt-dlp'' in doppler
  * **Permission denied creating dirs**: Run ''chmod 777 /sparfuxdata/media/video/'' on hertz host
  * **Gerbera not picking up files**: Restart gerbera (''systemctl restart gerbera'' in doppler), autoscan interval is 20 min
  * **Duplicate downloads**: Run ''mediagrab.py seed'' to sync archive files with existing files
  * **Check log**: ''cat /opt/mediagrab/mediagrab.log''

===== Architecture =====

  * Uses yt-dlp's ''%%--download-archive%%'' for dedup, which is reliable across re-runs
  * 720p cap to save disk space
  * Files named ''Title [id].mp4'', so Gerbera indexes by title
  * No database, no daemon: just cron + yt-dlp + a Python script
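
The points above, together with the filter fields from Step 3, can be illustrated with a minimal sketch. This is not the actual ''mediagrab.py''; it only shows one way a show entry could be turned into a yt-dlp command line and how the duration and title filters might be applied. The function names ''keep_episode'' and ''build_ytdlp_args'', the one-archive-file-per-show naming, the exact format selector, and the use of ''%%--max-downloads%%'' for ''per_run'' are all assumptions made for illustration.

```python
import re
from pathlib import PurePosixPath

VIDEO_ROOT = PurePosixPath("/moviedata/video")
ARCHIVE_DIR = PurePosixPath("/opt/mediagrab/archives")


def keep_episode(show: dict, title: str, duration: float) -> bool:
    """Return True if an episode passes the show's duration and title filters."""
    if duration < show.get("min_duration", 0):
        return False
    if duration > show.get("max_duration", float("inf")):
        return False
    # title_exclude entries are treated as regex patterns (see the field table).
    return not any(re.search(p, title) for p in show.get("title_exclude", []))


def build_ytdlp_args(show: dict) -> list:
    """Build a yt-dlp argument list for one show entry (hypothetical helper)."""
    # Assumption: one download-archive file per show, named after its "dir".
    archive = ARCHIVE_DIR / (show["dir"] + ".txt")
    # Output template that yields "Title [id].mp4" inside the show directory.
    out = VIDEO_ROOT / show["dir"] / "%(title)s [%(id)s].%(ext)s"
    args = [
        "yt-dlp",
        "--download-archive", str(archive),  # skip IDs already recorded
        "-f", "bestvideo[height<=720]+bestaudio/best[height<=720]",  # 720p cap
        "--merge-output-format", "mp4",
        "-o", str(out),
    ]
    if show.get("mode") == "archive":
        # Assumption: per_run maps to a per-invocation download limit.
        args += ["--max-downloads", str(show.get("per_run", 3))]
    args.append(show["url"])
    return args
```

Returning an argument list rather than a shell string means it can be passed directly to ''subprocess.run(args)'' without quoting problems in titles or URLs. The dedup itself is left entirely to the archive file, which matches the "no database" design.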