Descrizione
Pipeline multimodale video → summary strutturato: ffmpeg extract frames+audio → Whisper ASR transcript con timestamp → Vision Qwen2.5-VL describe scenes (scene-change detection) → Liara LLM fonde transcript+scenes in TL;DR + bullets + chapters timestamped. Backend self-hosted Hetzner (no costi per-token, GDPR-compliant). Output: { transcript, scenes[], summary: {tldr, bullets[], chapters[]}, durationSec }. Use case: indicizzazione media library (corsi/webinar/meeting), highlight reel automatico per social, accessibility (caption + audio description WCAG 2.2), search-in-video tramite scene timestamps, briefing post-meeting con chapter cliccabili.
