The `current_stage` field on the job object tells you which stage is active at any given moment.
Pipeline stages
video_analysis
The engine ingests your template video and performs frame-level analysis:
- Detects cut points and shot boundaries using visual difference detection.
- Identifies composition patterns: rule-of-thirds framing, headroom, leading lines.
- Measures aspect ratio and frame rate.
- Calculates `cut_frequency` (cuts per minute).
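
To make the `cut_frequency` unit concrete, the snippet below derives cuts per minute from a list of cut timestamps and a clip duration. It is a minimal sketch, not the engine's implementation: the function name and the timestamp-list input are assumptions made for illustration.

```python
def cut_frequency(cut_times_s: list[float], duration_s: float) -> float:
    """Return cuts per minute for a clip (hypothetical helper)."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    # Number of cuts, scaled from the clip duration to a 60-second window.
    return len(cut_times_s) * 60.0 / duration_s


# 12 cuts detected in a 45-second clip
cuts = [i * 3.5 for i in range(1, 13)]
print(cut_frequency(cuts, 45.0))  # 16.0
```
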

Key outputs: `cut_frequency`, `aspect_ratio`, `frame_rate`.

audio_analysis
The engine isolates and analyzes the audio track:
- Detects tempo and measures `tempo_bpm`.
- Identifies beat positions across bass, mid, and treble frequency bands and populates `detected_beats`.
- Classifies `music_genre` if music is present.
- Measures dynamic range, loudness normalization level, and compression characteristics.
- Detects whether speech enhancement is applied.
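
The relationship between `detected_beats` and `tempo_bpm` can be illustrated with a small calculation. The sketch below assumes beats are available as timestamps in seconds (the actual `detected_beats` format is not specified here) and estimates tempo from the median interval between consecutive beats. The engine's own tempo detection may work differently.

```python
from statistics import median


def estimate_tempo_bpm(beat_times_s: list[float]) -> float:
    """Estimate tempo from beat timestamps in seconds (hypothetical helper)."""
    if len(beat_times_s) < 2:
        raise ValueError("need at least two beats to estimate tempo")
    # Intervals between consecutive beats; the median resists stray outliers.
    intervals = [b - a for a, b in zip(beat_times_s, beat_times_s[1:])]
    interval = median(intervals)
    if interval <= 0:
        raise ValueError("beat timestamps must be strictly increasing")
    return 60.0 / interval


print(estimate_tempo_bpm([0.0, 0.5, 1.0, 1.5, 2.0]))  # 120.0
```
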

Key outputs: the `tempo_bpm`, `music_genre`, `detected_beats`, and `audio_style` fields.

motion_analysis
The engine runs optical flow analysis across the video to understand camera and subject movement:
- Tracks pixel displacement between frames to build a motion vector map.
- Calculates `motion_intensity` (0–100) as the average magnitude of motion across the clip.
- Detects zoom gestures and measures `zoom_usage` (0–100).
- Identifies camera movement patterns: handheld shake, dolly, pan, tilt.
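
As an illustration of "average magnitude of motion", the sketch below averages the length of per-pixel displacement vectors and scales the result to 0–100. The input layout (a list of frames, each a list of `(dx, dy)` vectors) and the `max_px` normalization constant are assumptions. How the engine actually normalizes the score is not documented here.

```python
import math


def motion_intensity(flow_frames, max_px: float = 20.0) -> float:
    """Average motion-vector magnitude, scaled to 0-100 (hypothetical helper).

    flow_frames: list of frames, each a list of (dx, dy) displacements in pixels.
    max_px: assumed displacement that maps to a score of 100.
    """
    magnitudes = [math.hypot(dx, dy) for frame in flow_frames for dx, dy in frame]
    if not magnitudes:
        return 0.0
    mean_magnitude = sum(magnitudes) / len(magnitudes)
    # Clamp so that very fast motion never exceeds the top of the scale.
    return min(100.0, 100.0 * mean_magnitude / max_px)


frames = [[(3.0, 4.0), (0.0, 0.0)], [(6.0, 8.0), (0.0, 0.0)]]
print(motion_intensity(frames))  # 18.75
```
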

Key outputs: `motion_intensity`, `zoom_usage`, camera movement classification.

style_extraction
The engine combines all analysis results into a single Style DNA object:
- Aggregates color grade measurements across all shots into `color_grade`.
- Identifies the most and second-most common transitions to populate `primary_transition` and `secondary_transition`.
- Detects text overlay and subtitle patterns.
- Calculates `extraction_confidence` as a holistic quality score for the extraction.
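
Picking the most and second-most common transitions is a simple frequency ranking. The sketch below shows one way to do it with `collections.Counter`. The transition names and the flat list input are illustrative assumptions, not the engine's internal representation.

```python
from collections import Counter


def rank_transitions(transitions: list[str]):
    """Return (primary, secondary) transition by frequency (hypothetical helper)."""
    ranked = [name for name, _ in Counter(transitions).most_common(2)]
    primary = ranked[0] if ranked else None
    secondary = ranked[1] if len(ranked) > 1 else None
    return primary, secondary


observed = ["cut", "fade", "cut", "wipe", "fade", "cut"]
print(rank_transitions(observed))  # ('cut', 'fade')
```
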

Key output: the full Style DNA (`StyleJSON`) object. It is stored on the job and is available as soon as this stage completes, even if you did not request rendering.

style_application
The engine maps the extracted Style DNA onto your target footage:
- Re-grades the color of your footage to match the `color_grade` values.
- Re-times cuts to match `cut_frequency` and aligns them to `detected_beats` where possible.
- Applies `primary_transition` and `secondary_transition` at cut points.
- Adjusts motion and zoom to match the `motion_intensity` and `zoom_usage` targets.
- Applies the `audio_style` treatment to your footage’s audio track.
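
Aligning cuts to beats "where possible" can be pictured as snapping each cut to the nearest beat when one is close enough. The sketch below is one plausible reading of that step, not the engine's algorithm. The function name, the timestamp inputs, and the `tolerance_s` threshold are all assumptions.

```python
from bisect import bisect_left


def snap_cuts_to_beats(cut_times_s, beat_times_s, tolerance_s: float = 0.15):
    """Move each cut to the nearest beat if it lies within tolerance_s seconds."""
    beats = sorted(beat_times_s)
    snapped = []
    for cut in cut_times_s:
        i = bisect_left(beats, cut)
        # Only the beats immediately before and after the cut can be nearest.
        candidates = beats[max(i - 1, 0): i + 1]
        nearest = min(candidates, key=lambda b: abs(b - cut), default=None)
        if nearest is not None and abs(nearest - cut) <= tolerance_s:
            snapped.append(nearest)
        else:
            snapped.append(cut)  # no beat close enough; leave the cut as-is
    return snapped


print(snap_cuts_to_beats([0.9, 2.4, 3.05], [0.0, 1.0, 2.0, 3.0]))  # [1.0, 2.4, 3.0]
```
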
rendering
The engine encodes the final output video:
- Renders all applied effects and color grades to a single output stream.
- Encodes to MP4 (H.264) at the resolution of your input footage.
- Produces a download URL attached to the completed job object.
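
A minimal sketch of reading the download URL from `GET /api/v1/jobs/{job_id}/result` follows. The base URL, the helper names, and the assumption that the endpoint returns a JSON object are illustrative. Authentication is omitted because it is not described in this section.

```python
import json
from urllib.request import urlopen


def result_url(base_url: str, job_id: str) -> str:
    """Build the result endpoint URL for a job (base_url is a placeholder)."""
    return f"{base_url.rstrip('/')}/api/v1/jobs/{job_id}/result"


def extract_output_url(result: dict) -> str:
    """Pull output_video_url out of a parsed job result."""
    url = result.get("output_video_url")
    if not url:
        raise ValueError("no output_video_url in result; is the job completed?")
    return url


def fetch_output_url(base_url: str, job_id: str, opener=urlopen) -> str:
    """Fetch the job result and return its download URL (no auth handling)."""
    with opener(result_url(base_url, job_id)) as response:
        return extract_output_url(json.load(response))
```

Passing `opener` as a parameter keeps the function easy to exercise without network access. Call it only after the job status is `completed`.
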

Key output: the `output_video_url` field on the job result.

Retrieve the `output_video_url` from `GET /api/v1/jobs/{job_id}/result` once the job status is `completed`.

Stage summary
| Stage | What it analyzes | Key outputs |
|---|---|---|
| `video_analysis` | Frames, cuts, composition | `cut_frequency`, `aspect_ratio`, `frame_rate` |
| `audio_analysis` | Audio track, music, speech | `tempo_bpm`, `music_genre`, `detected_beats`, `audio_style` |
| `motion_analysis` | Optical flow, camera movement | `motion_intensity`, `zoom_usage` |
| `style_extraction` | All analysis combined | Full Style DNA (`StyleJSON`) |
| `style_application` | Target footage + Style DNA | Styled footage (pre-render) |
| `rendering` | Styled footage | Final MP4 output |
Progress tracking
As the job advances through stages, the `progress_percent` field increases from 0 to 100. The distribution is roughly:
| Stage | Approximate progress range |
|---|---|
| `video_analysis` | 0–20% |
| `audio_analysis` | 20–35% |
| `motion_analysis` | 35–55% |
| `style_extraction` | 55–65% |
| `style_application` | 65–85% |
| `rendering` | 85–100% |
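
One way to use `current_stage` and `progress_percent` is a polling loop that reports changes until the job reaches a terminal status. The sketch below takes a `fetch_job` callable so that it does not assume any particular HTTP client or job endpoint. The `status` field name, the `"processing"` value in the demo, and the polling parameters are assumptions. Only `completed` and `failed` come from this page.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(fetch_job: Callable[[], dict], interval_s: float = 2.0,
                 max_polls: int = 300, on_update=print) -> dict:
    """Poll until the job completes or fails, reporting stage/progress changes."""
    last_seen = None
    for _ in range(max_polls):
        job = fetch_job()
        snapshot = (job.get("current_stage"), job.get("progress_percent"))
        if snapshot != last_seen:  # only report when something changed
            on_update(f"{snapshot[0]}: {snapshot[1]}%")
            last_seen = snapshot
        if job.get("status") in TERMINAL_STATUSES:
            return job
        time.sleep(interval_s)
    raise TimeoutError("job did not finish within the polling budget")


# Simulated responses standing in for real API calls.
responses = iter([
    {"status": "processing", "current_stage": "video_analysis", "progress_percent": 10},
    {"status": "processing", "current_stage": "rendering", "progress_percent": 90},
    {"status": "completed", "current_stage": "rendering", "progress_percent": 100},
])
final_job = wait_for_job(lambda: next(responses), interval_s=0)
```

In real use, `fetch_job` would request the job object from the API and return the parsed JSON.
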
What to do if a stage fails
If the job transitions to `failed`, the error object on the job response includes the stage where failure occurred and a human-readable message. Common causes by stage:
| Stage | Common failure causes |
|---|---|
| `video_analysis` | Unsupported codec, corrupted file, duration below minimum |
| `audio_analysis` | No audio track, audio codec not supported |
| `motion_analysis` | Resolution too low for optical flow analysis |
| `style_extraction` | Insufficient data from upstream stages |
| `style_application` | Target footage incompatible with extracted style |
| `rendering` | Output encoding failure |
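
The table above can be turned into a small lookup for surfacing likely causes when a job fails. This is a sketch under assumptions: the `error`, `stage`, and `message` key names are guesses at the shape of the error object, which this page describes only in prose.

```python
# Likely causes per stage, copied from the table above.
COMMON_CAUSES = {
    "video_analysis": "unsupported codec, corrupted file, or duration below minimum",
    "audio_analysis": "no audio track or unsupported audio codec",
    "motion_analysis": "resolution too low for optical flow analysis",
    "style_extraction": "insufficient data from upstream stages",
    "style_application": "target footage incompatible with extracted style",
    "rendering": "output encoding failure",
}


def describe_failure(job: dict) -> str:
    """Summarize a failed job; key names are assumed, not confirmed."""
    if job.get("status") != "failed":
        return "Job has not failed."
    error = job.get("error") or {}
    stage = error.get("stage", "unknown")
    message = error.get("message", "no message provided")
    causes = COMMON_CAUSES.get(stage, "see the error message")
    return f"Failed during {stage}: {message} (common causes: {causes})"


failed_job = {
    "status": "failed",
    "error": {"stage": "audio_analysis", "message": "Audio stream missing"},
}
print(describe_failure(failed_job))
```
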