AI Video Pipeline
Use this skill when the user wants production-ready video assets quickly from a concept, transcript, or source footage.
What this skill does
- Normalizes a project brief into a production spec.
- Generates a script + scene plan (hook, body, CTA).
- Builds a rough cut from source clips/assets.
- Adds narration/audio bed and normalizes loudness.
- Generates subtitles and burns captions.
- Exports platform variants (9:16, 16:9, 1:1).
- Produces a QA report (duration, loudness, dropped/blank frames, outputs).
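The narration, audio-bed, and loudness steps above map naturally onto a single ffmpeg filtergraph. The sketch below is a hypothetical helper (build_audio_mix_cmd is not one of the skill's scripts). It assumes an ffmpeg build with the sidechaincompress, amix, and loudnorm filters, and it uses -23 LUFS integrated loudness with a -1 dBTP true peak as the EBU R128 target. The compressor settings are illustrative only.

```python
def build_audio_mix_cmd(voiceover: str, music: str, out: str,
                        target_lufs: float = -23.0,
                        true_peak_db: float = -1.0) -> list[str]:
    """Return an ffmpeg argv that ducks music under narration, then normalizes loudness.

    Hypothetical helper: compressor settings are illustrative, not tuned values.
    """
    filtergraph = (
        # Duplicate the narration: one copy keys the compressor, one goes into the mix.
        "[0:a]asplit=2[vo_key][vo_mix];"
        # Compress the music (first input) whenever narration (sidechain) is present.
        "[1:a][vo_key]sidechaincompress=threshold=0.05:ratio=8:attack=20:release=300[ducked];"
        # Lay narration over the ducked bed, then apply single-pass EBU R128 loudnorm.
        "[vo_mix][ducked]amix=inputs=2:duration=first,"
        f"loudnorm=I={target_lufs:g}:TP={true_peak_db:g}:LRA=11[out]"
    )
    return [
        "ffmpeg", "-y",
        "-i", voiceover,   # input 0: narration
        "-i", music,       # input 1: music bed
        "-filter_complex", filtergraph,
        "-map", "[out]",
        "-ar", "48000",    # loudnorm can output at 192 kHz; pin a standard rate
        out,
    ]
```

To run it, pass the result to subprocess.run(cmd, check=True). Single-pass loudnorm only approximates the target. A two-pass run, which measures first and then feeds the measured values back, lands closer.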
Input contract
Required:
- project_name
- goal (e.g., launch teaser, explainer, highlight reel)
- target_platforms (youtube, x, linkedin, tiktok, instagram)
- duration_seconds
- At least one source:
  - source_video path, or
  - source_audio path + broll_dir, or
  - script_text
Optional:
- tone (bold, educational, cinematic, documentary, etc.)
- cta
- brand_hex_primary, brand_hex_secondary
- logo_path
- music_path
- voiceover_path
- voice_profile (auto [default], narrator, founder)
- captions_style (minimal, bold, subtitle)
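A minimal fail-fast check of this input contract could look like the following. It assumes the brief has already been parsed into a Python dict keyed by the field names above. The names validate_brief and BriefError, and the "render"/"script_only" mode strings, are hypothetical. The check also routes the missing-footage edge case (script text only) to a no-render mode.

```python
REQUIRED = ("project_name", "goal", "target_platforms", "duration_seconds")
PLATFORMS = {"youtube", "x", "linkedin", "tiktok", "instagram"}


class BriefError(ValueError):
    """Raised when the brief violates the input contract (hypothetical)."""


def validate_brief(brief: dict) -> str:
    """Return the run mode ("render" or "script_only") or raise BriefError with remediation."""
    problems = []
    for key in REQUIRED:
        if brief.get(key) in (None, "", []):
            problems.append(f"missing '{key}': add it to project/brief.md")

    unknown = set(brief.get("target_platforms") or []) - PLATFORMS
    if unknown:
        problems.append(f"unsupported platform(s) {sorted(unknown)}: use one of {sorted(PLATFORMS)}")

    duration = brief.get("duration_seconds")
    if duration is not None and (not isinstance(duration, (int, float)) or duration <= 0):
        problems.append("'duration_seconds' must be a positive number")

    # At least one of the three accepted source combinations must be present.
    has_video = bool(brief.get("source_video"))
    has_audio_broll = bool(brief.get("source_audio")) and bool(brief.get("broll_dir"))
    has_script = bool(brief.get("script_text"))
    if not (has_video or has_audio_broll or has_script):
        problems.append("no source: provide source_video, or source_audio + broll_dir, or script_text")

    if problems:
        # Fail fast, listing every remediation at once.
        raise BriefError("; ".join(problems))

    # Edge case: a script with no footage yields a script + shot list only.
    return "render" if (has_video or has_audio_broll) else "script_only"
```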
Output contract
Return:
- project_dir
- master_video
- exports[] (per platform)
- captions (.srt and burned-in outputs)
- qa_report.json
- notes (manual touch-ups recommended)
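One way to assemble qa_report.json is to compare measurements against targets and record a pass/fail result per check. This is a sketch under assumptions. The measured values would come from other tools (for example ffprobe for duration, ffmpeg's ebur128 filter for loudness, and its blackdetect filter for blank frames). The helper write_qa_report, the field names, and the 0.5 s and 1 LU tolerances are illustrative, not the skill's actual schema.

```python
import json
from pathlib import Path


def write_qa_report(project_dir: str, measured: dict, target: dict,
                    loudness_tolerance_lu: float = 1.0) -> dict:
    """Compare measured values against targets and write <project_dir>/qa_report.json.

    Hypothetical helper; field names and tolerances are assumptions.
    """
    outputs = measured["outputs"]
    checks = {
        # Rendered duration within 0.5 s of the brief's duration_seconds.
        "duration_ok": abs(measured["duration_s"] - target["duration_seconds"]) <= 0.5,
        # Integrated loudness within the tolerance of the EBU R128 target.
        "loudness_ok": abs(measured["integrated_lufs"] - target["lufs"]) <= loudness_tolerance_lu,
        "no_dropped_frames": measured["dropped_frames"] == 0,
        "no_blank_frames": measured["blank_frames"] == 0,
        # At least one export, and every listed export exists on disk.
        "outputs_present": bool(outputs) and all(Path(p).is_file() for p in outputs),
    }
    report = {
        "measured": measured,
        "target": target,
        "checks": checks,
        "passed": all(checks.values()),
    }
    out = Path(project_dir) / "qa_report.json"
    out.write_text(json.dumps(report, indent=2))
    return report
```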
Execution workflow
- Initialize project folders with scripts/init_project.sh.
- Write/edit the brief in project/brief.md.
- If needed, extract a transcript with Whisper skill/tooling and save it to captions/raw.srt.
- Build a rough cut: select/concatenate clips and trim to the target duration.
- Audio pass:
  - generate natural voiceover with scripts/voiceover_natural.sh using voice_profile=auto
  - support expressive script tags: [[pause:ms]], [[emph:text]], [[slow:text]], [[calm:text]], [[urgent:text]], [[inspiring:text]]
  - voiceover mix
  - background music ducking
  - loudness normalization (EBU R128 target)
- Caption pass:
  - clean SRT timing
  - burn styled captions for each export aspect ratio
- Export variants with scripts/export_variants.sh.
- Generate the QA summary with scripts/qa_report.sh.
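The expressive tags are simple enough to tokenize with one regular expression before the segments are handed to a TTS engine. This sketch assumes the tag syntax exactly as listed, with no nesting and no "]" inside a tag. The names parse_script_tags and Segment are hypothetical, and the sketch does not describe how voiceover_natural.sh itself interprets the tags. Unrecognized [[...]] tags are left as plain text.

```python
import re
from typing import NamedTuple

# Matches the six documented tags; anything else stays in the plain text.
TAG_RE = re.compile(r"\[\[(pause|emph|slow|calm|urgent|inspiring):([^\]]*)\]\]")


class Segment(NamedTuple):
    style: str         # "plain", "pause", "emph", "slow", "calm", "urgent", or "inspiring"
    text: str = ""     # words to speak (empty for pauses)
    pause_ms: int = 0  # silence length in milliseconds, only set for "pause"


def parse_script_tags(script: str) -> list[Segment]:
    """Split a voiceover script into plain, styled, and pause segments."""
    segments: list[Segment] = []
    pos = 0
    for m in TAG_RE.finditer(script):
        # Emit any untagged text that precedes this tag.
        if m.start() > pos:
            segments.append(Segment("plain", script[pos:m.start()]))
        style, arg = m.group(1), m.group(2)
        if style == "pause":
            if not arg.isdigit():
                raise ValueError(f"[[pause:{arg}]] needs an integer millisecond value")
            segments.append(Segment("pause", pause_ms=int(arg)))
        else:
            segments.append(Segment(style, arg))
        pos = m.end()
    # Trailing untagged text.
    if pos < len(script):
        segments.append(Segment("plain", script[pos:]))
    return segments
```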
Guardrails
- Never claim the pipeline is a “fully autonomous replacement” for editors.
- Default positioning: speed + iteration + cost compression with human creative oversight.
- Preserve source media; write outputs into project-scoped folders only.
- Fail fast when required input is missing; print exact remediation.
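The project-scoped output rule can be enforced with a small path check before anything is written. This is a sketch that assumes Python 3.9+ (for Path.is_relative_to) and assumes the paths of all source media are known up front. The helper resolve_output_path is hypothetical.

```python
from pathlib import Path


def resolve_output_path(project_dir: str, relative: str, sources: list[str]) -> Path:
    """Resolve an output path, refusing anything outside project_dir or on top of a source file."""
    root = Path(project_dir).resolve()
    # resolve() collapses ".." segments and symlinks, so escapes are caught.
    target = (root / relative).resolve()
    if not target.is_relative_to(root):
        raise ValueError(f"refusing to write outside the project: {target}")
    # Preserve source media even when it lives inside the project folder.
    if any(target == Path(s).resolve() for s in sources):
        raise ValueError(f"refusing to overwrite source media: {target}")
    return target
```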
Edge cases
- Missing footage -> generate script + shot list only, no render.
- No voiceover/music -> export clean spoken/audio-light version.
- Caption generation fails -> still export the clean master and report the caption failure.
- Aspect-ratio crop conflicts -> produce letterboxed fallback and note it.
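For the aspect-ratio fallback, one heuristic is to center-crop when the crop keeps enough of the frame and to letterbox otherwise. The document does not define a "crop conflict", so the 50% kept-area threshold, the centered crop, and the 1080-based output sizes below are assumptions. The helper plan_variant is hypothetical. It returns an ffmpeg -vf string built from the standard crop, scale, and pad filters.

```python
# Assumed output sizes per export aspect ratio.
TARGETS = {"9:16": (1080, 1920), "16:9": (1920, 1080), "1:1": (1080, 1080)}


def plan_variant(src_w: int, src_h: int, ratio: str, min_kept: float = 0.5) -> tuple[str, str]:
    """Return (mode, ffmpeg_video_filter) for one export aspect ratio.

    mode is "crop" when a centered crop keeps at least `min_kept` of the source
    area, otherwise "letterbox" (the fallback that should be noted in the report).
    """
    out_w, out_h = TARGETS[ratio]
    # Largest crop with the target aspect ratio, using integer math only.
    if src_w * out_h > src_h * out_w:      # source is wider than target: trim the sides
        crop_w, crop_h = src_h * out_w // out_h, src_h
    else:                                  # source is taller or equal: trim top/bottom
        crop_w, crop_h = src_w, src_w * out_h // out_w
    crop_w -= crop_w % 2                   # keep dimensions even for 4:2:0 encoders
    crop_h -= crop_h % 2
    kept = (crop_w * crop_h) / (src_w * src_h)
    if kept >= min_kept:
        # ffmpeg's crop filter centers the window when x/y are omitted.
        return "crop", f"crop={crop_w}:{crop_h},scale={out_w}:{out_h}"
    return "letterbox", (
        f"scale={out_w}:{out_h}:force_original_aspect_ratio=decrease,"
        f"pad={out_w}:{out_h}:(ow-iw)/2:(oh-ih)/2"
    )
```

With a 1920x1080 source, 16:9 and 1:1 come out as crops. A 9:16 crop would keep only about 32% of the frame, so it falls back to letterbox, which is the case the rule above says to note.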