Processing pipeline

When you submit a job, the Clipzy ML Engine processes your video through a sequence of stages, each producing data that feeds into the next. The current_stage field on the job object tells you which stage is active at any given moment.
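One way to follow a job through the pipeline is to poll the job object and report each change in current_stage. The sketch below is illustrative only: fetch_job is a hypothetical callable that returns the job object as a dict (this page does not describe how to fetch it), and the sketch assumes the status field takes the values completed and failed mentioned on this page.

```python
import time
from typing import Callable, Iterator


def watch_stages(
    fetch_job: Callable[[], dict],
    poll_seconds: float = 2.0,
) -> Iterator[str]:
    """Yield each new current_stage value until the job completes or fails.

    fetch_job is a hypothetical callable that returns the job object as a
    dict; how you obtain that object (HTTP client, SDK) is up to you.
    """
    last_stage = None
    while True:
        job = fetch_job()
        stage = job.get("current_stage")
        # Report a stage only when it differs from the last one seen.
        if stage is not None and stage != last_stage:
            last_stage = stage
            yield stage
        # Stop polling once the job reaches a terminal status.
        if job.get("status") in ("completed", "failed"):
            return
        time.sleep(poll_seconds)
```

Because the fetching logic is passed in, the same loop can be driven by a real HTTP client in production and by canned job objects in tests.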

Pipeline stages

video_analysis → audio_analysis → motion_analysis → style_extraction → style_application → rendering

video_analysis

The engine ingests your template video and performs frame-level analysis:

Detects cut points and shot boundaries using visual difference detection.
Identifies composition patterns: rule-of-thirds framing, headroom, leading lines.
Measures aspect ratio and frame rate.
Calculates cut_frequency (cuts per minute).

Output: Shot boundary timeline, composition metadata, cut_frequency, aspect_ratio, frame_rate.
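To make the cut_frequency unit concrete, here is a small sketch of the cuts-per-minute arithmetic. It is not the engine's implementation; the function name and the representation of cuts as a list of timestamps in seconds are assumptions made for illustration.

```python
def cut_frequency(cut_times_s: list, duration_s: float) -> float:
    """Return cuts per minute for a clip.

    cut_times_s: timestamps (in seconds) of detected cuts (assumed format).
    duration_s: total clip duration in seconds.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    # Number of cuts divided by the clip length expressed in minutes.
    return len(cut_times_s) / (duration_s / 60.0)
```

For example, a 30-second template with 12 cuts works out to 24 cuts per minute.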

audio_analysis

The engine isolates and analyzes the audio track:

Detects tempo and measures tempo_bpm.
Identifies beat positions across bass, mid, and treble frequency bands and populates detected_beats.
Classifies music_genre if music is present.
Measures dynamic range, loudness normalization level, and compression characteristics.
Detects whether speech enhancement is applied.

Output: tempo_bpm, music_genre, detected_beats, audio_style fields.
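The relationship between beat positions and tempo can be illustrated with a short sketch. This is not how the engine measures tempo_bpm; it assumes, hypothetically, that beats are available as a flat list of timestamps in seconds, and it estimates tempo from the median gap between consecutive beats.

```python
from statistics import median


def estimate_tempo_bpm(beat_times_s: list) -> float:
    """Estimate beats per minute from beat timestamps given in seconds."""
    if len(beat_times_s) < 2:
        raise ValueError("need at least two beats to estimate tempo")
    ordered = sorted(beat_times_s)
    # Gaps between consecutive beats, in seconds.
    intervals = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    typical_interval = median(intervals)
    if typical_interval <= 0:
        raise ValueError("beat timestamps must be distinct")
    # 60 seconds divided by the typical gap gives beats per minute.
    return 60.0 / typical_interval
```

Using the median rather than the mean keeps a single missed or spurious beat from skewing the estimate.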

motion_analysis

The engine runs optical flow analysis across the video to understand camera and subject movement:

Tracks pixel displacement between frames to build a motion vector map.
Calculates motion_intensity (0–100) as the average magnitude of motion across the clip.
Detects zoom gestures and measures zoom_usage (0–100).
Identifies camera movement patterns: handheld shake, dolly, pan, tilt.

Output: motion_intensity, zoom_usage, camera movement classification.
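The idea of motion_intensity as an average motion magnitude on a 0–100 scale can be sketched as follows. The normalization is an assumption: max_magnitude_px is a hypothetical constant chosen for illustration, and this page does not say how the engine scales its value.

```python
import math


def motion_intensity(vectors: list, max_magnitude_px: float = 20.0) -> float:
    """Mean motion-vector magnitude scaled to the range 0-100.

    vectors: (dx, dy) pixel displacements between frames (assumed format).
    max_magnitude_px: hypothetical displacement that maps to a score of 100.
    """
    if max_magnitude_px <= 0:
        raise ValueError("max_magnitude_px must be positive")
    if not vectors:
        return 0.0
    # Average Euclidean length of all displacement vectors.
    mean_magnitude = sum(math.hypot(dx, dy) for dx, dy in vectors) / len(vectors)
    # Scale to 0-100 and clamp values that exceed the assumed maximum.
    return min(100.0, 100.0 * mean_magnitude / max_magnitude_px)
```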

style_extraction

The engine combines all analysis results into a single Style DNA object:

Aggregates color grade measurements across all shots into color_grade.
Identifies the most and second-most common transitions to populate primary_transition and secondary_transition.
Detects text overlay and subtitle patterns.
Calculates extraction_confidence as a holistic quality score for the extraction.

Output: The complete Style DNA (StyleJSON) object. This is stored on the job and available immediately after this stage completes, even if you did not request rendering.
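Before relying on a Style DNA object, it can help to check that the fields named on this page are present. The sketch below assumes these names appear as top-level keys of a dict; the actual StyleJSON layout (for example, whether fields are nested) is not specified here, so treat the key list as a starting point.

```python
# Field names mentioned on this page; their placement as top-level keys
# is an assumption, not a documented schema.
STYLE_DNA_FIELDS = (
    "cut_frequency",
    "aspect_ratio",
    "frame_rate",
    "tempo_bpm",
    "music_genre",
    "detected_beats",
    "audio_style",
    "motion_intensity",
    "zoom_usage",
    "color_grade",
    "primary_transition",
    "secondary_transition",
    "extraction_confidence",
)


def missing_style_fields(style_dna: dict) -> list:
    """Return the expected field names that are absent from style_dna."""
    return [name for name in STYLE_DNA_FIELDS if name not in style_dna]
```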

style_application

The engine maps the extracted Style DNA onto your target footage:

Re-grades the color of your footage to match the color_grade values.
Re-times cuts to match cut_frequency and aligns them to detected_beats where possible.
Applies primary_transition and secondary_transition at cut points.
Adjusts motion and zoom to match the motion_intensity and zoom_usage targets.
Applies the audio_style treatment to your footage’s audio track.

This stage runs only if you supplied target footage in your job request. If you submitted a template-only job for style extraction, this stage and rendering are both skipped, and the job completes at the end of style_extraction.
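The stages a job passes through therefore depend on whether target footage was supplied. A minimal sketch of that rule, using a hypothetical helper name and a boolean flag in place of the real job request:

```python
ANALYSIS_STAGES = (
    "video_analysis",
    "audio_analysis",
    "motion_analysis",
    "style_extraction",
)
OUTPUT_STAGES = ("style_application", "rendering")


def planned_stages(has_target_footage: bool) -> tuple:
    """Return the stages a job is expected to run, in order."""
    if has_target_footage:
        return ANALYSIS_STAGES + OUTPUT_STAGES
    # Template-only jobs stop once the Style DNA has been extracted.
    return ANALYSIS_STAGES
```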

rendering

The engine encodes the final output video:

Renders all applied effects and color grades to a single output stream.
Encodes to MP4 (H.264) at the resolution of your input footage.
Produces a download URL attached to the completed job object.

Output: Rendered MP4 file, accessible via the output_video_url field on the job result.

Retrieve the output_video_url from GET /api/v1/jobs/{job_id}/result once the job status is completed.
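A minimal sketch of that request using only the Python standard library. The base URL and any authentication headers are placeholders you must supply, since this page does not specify them, and the sketch assumes the response body is a JSON object with output_video_url as a top-level field.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote


def result_endpoint(base_url: str, job_id: str) -> str:
    """Build the result URL for a job from a caller-supplied base URL."""
    return f"{base_url.rstrip('/')}/api/v1/jobs/{quote(job_id, safe='')}/result"


def fetch_output_video_url(
    base_url: str, job_id: str, headers: Optional[dict] = None
) -> str:
    """GET the job result and return its output_video_url field.

    headers: any request headers your account requires (assumption: none
    are described on this page, so none are set by default).
    """
    request = urllib.request.Request(
        result_endpoint(base_url, job_id), headers=headers or {}
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return result["output_video_url"]
```

Call fetch_output_video_url only after the job status is completed; before that, the result may not contain the field.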

Stage summary

Stage	What it processes	Key outputs
`video_analysis`	Frames, cuts, composition	`cut_frequency`, `aspect_ratio`, `frame_rate`
`audio_analysis`	Audio track, music, speech	`tempo_bpm`, `music_genre`, `detected_beats`, `audio_style`
`motion_analysis`	Optical flow, camera movement	`motion_intensity`, `zoom_usage`
`style_extraction`	All analysis combined	Full Style DNA (`StyleJSON`)
`style_application`	Target footage + Style DNA	Styled footage (pre-render)
`rendering`	Styled footage	Final MP4 output

Progress tracking

As the job advances through stages, the progress_percent field increases from 0 to 100. The distribution is roughly:

Stage	Approximate progress range
`video_analysis`	0–20%
`audio_analysis`	20–35%
`motion_analysis`	35–55%
`style_extraction`	55–65%
`style_application`	65–85%
`rendering`	85–100%

If you only need Style DNA (no rendering), progress_percent reaches 100 and the job status becomes completed at the end of the style_extraction stage.
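If you want a rough stage label from progress_percent alone, the table above can be turned into a lookup. This sketch treats the approximate boundaries as exact, which they are not, and it applies only to jobs that run the full pipeline; where available, current_stage is the more reliable source.

```python
# Upper bound of each stage's approximate progress range, in pipeline order.
STAGE_UPPER_BOUNDS = (
    ("video_analysis", 20),
    ("audio_analysis", 35),
    ("motion_analysis", 55),
    ("style_extraction", 65),
    ("style_application", 85),
    ("rendering", 100),
)


def approximate_stage(progress_percent: float) -> str:
    """Map progress_percent (0-100) to the stage it roughly falls in."""
    if not 0 <= progress_percent <= 100:
        raise ValueError("progress_percent must be between 0 and 100")
    for stage, upper_bound in STAGE_UPPER_BOUNDS:
        # A value on a boundary is assigned to the later stage.
        if progress_percent < upper_bound:
            return stage
    # Exactly 100 falls through the loop and belongs to the final stage.
    return STAGE_UPPER_BOUNDS[-1][0]
```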

What to do if a stage fails

If the job transitions to failed, the error object on the job response includes the stage where failure occurred and a human-readable message. Common causes by stage:

Stage	Common failure causes
`video_analysis`	Unsupported codec, corrupted file, duration below minimum
`audio_analysis`	No audio track, audio codec not supported
`motion_analysis`	Resolution too low for optical flow analysis
`style_extraction`	Insufficient data from upstream stages
`style_application`	Target footage incompatible with extracted style
`rendering`	Output encoding failure
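The table above can be paired with the error object to produce a more useful failure message. In this sketch the key names stage and message inside error are assumptions; this page says the error object includes the failing stage and a human-readable message but does not name the fields.

```python
COMMON_CAUSES = {
    "video_analysis": ["unsupported codec", "corrupted file", "duration below minimum"],
    "audio_analysis": ["no audio track", "audio codec not supported"],
    "motion_analysis": ["resolution too low for optical flow analysis"],
    "style_extraction": ["insufficient data from upstream stages"],
    "style_application": ["target footage incompatible with extracted style"],
    "rendering": ["output encoding failure"],
}


def describe_failure(job: dict) -> str:
    """Summarize a failed job, appending common causes for its stage."""
    error = job.get("error") or {}
    # "stage" and "message" are assumed key names for the error object.
    stage = error.get("stage", "unknown stage")
    message = error.get("message", "no message provided")
    causes = COMMON_CAUSES.get(stage, [])
    hint = f" (common causes: {', '.join(causes)})" if causes else ""
    return f"Job failed during {stage}: {message}{hint}"
```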

See Job lifecycle for how to handle failed jobs.
