When you submit a job, the Clipzy ML Engine processes your video through a series of sequential stages. Each stage produces data that feeds into the next. The current_stage field on the job object tells you which stage is active at any given moment.

Pipeline stages

video_analysis → audio_analysis → motion_analysis → style_extraction → style_application → rendering
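The stage order above can be written down as plain data, which is handy when you want to know which stages remain for a job. This is a minimal Python sketch. It assumes that a template-only job skips both style_application and rendering, as the notes under style_application and Progress tracking describe. The names PIPELINE_STAGES, expected_stages, and remaining_stages are hypothetical helpers, not part of any Clipzy SDK.

```python
# Stage names in pipeline order, as listed on this page.
PIPELINE_STAGES = [
    "video_analysis",
    "audio_analysis",
    "motion_analysis",
    "style_extraction",
    "style_application",
    "rendering",
]

# Assumption: these stages only run when target footage is supplied.
TARGET_ONLY_STAGES = {"style_application", "rendering"}


def expected_stages(has_target_footage: bool) -> list[str]:
    """Return the stages a job is expected to pass through, in order."""
    if has_target_footage:
        return list(PIPELINE_STAGES)
    return [s for s in PIPELINE_STAGES if s not in TARGET_ONLY_STAGES]


def remaining_stages(current_stage: str, has_target_footage: bool) -> list[str]:
    """Return the stages still to run after current_stage (exclusive)."""
    stages = expected_stages(has_target_footage)
    return stages[stages.index(current_stage) + 1:]
```

For example, remaining_stages("motion_analysis", False) returns ["style_extraction"] for a template-only job.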
1. video_analysis

The engine ingests your template video and performs frame-level analysis:
  • Detects cut points and shot boundaries using visual difference detection.
  • Identifies composition patterns: rule-of-thirds framing, headroom, leading lines.
  • Measures aspect ratio and frame rate.
  • Calculates cut_frequency (cuts per minute).
Output: Shot boundary timeline, composition metadata, cut_frequency, aspect_ratio, frame_rate.
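The engine's cut detector is not documented beyond "visual difference detection", so the following is only an illustration of that general technique, not Clipzy's implementation. It flags a cut wherever the mean absolute pixel difference between consecutive grayscale frames exceeds a threshold, then converts the cut count into cuts per minute. The frame representation (flat lists of pixel values) and the threshold are hypothetical choices for the sketch.

```python
def mean_abs_diff(a: list[float], b: list[float]) -> float:
    """Mean absolute per-pixel difference between two equal-size grayscale frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_cuts(frames: list[list[float]], threshold: float) -> list[int]:
    """Return indices of frames that start a new shot.

    A cut is declared when a frame differs from the previous one by more than
    threshold (a hypothetical tuning value, not a documented engine setting).
    """
    return [
        i
        for i in range(1, len(frames))
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold
    ]


def cut_frequency(num_cuts: int, duration_seconds: float) -> float:
    """Convert a cut count into cuts per minute."""
    return num_cuts / (duration_seconds / 60.0)
```

With this definition, a 90-second template containing 12 cuts has a cut_frequency of 8.0.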
2. audio_analysis

The engine isolates and analyzes the audio track:
  • Detects the tempo and reports it as tempo_bpm.
  • Identifies beat positions across bass, mid, and treble frequency bands and populates detected_beats.
  • Classifies the genre into music_genre when music is present.
  • Measures dynamic range, loudness normalization level, and compression characteristics.
  • Detects whether speech enhancement is applied.
Output: tempo_bpm, music_genre, detected_beats, and the audio_style fields.
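One common way to relate beat positions to a tempo is to take the median interval between consecutive beats and convert it to beats per minute. The sketch below assumes, hypothetically, that beat positions are available as timestamps in seconds; this page does not specify the format of detected_beats or how the engine itself derives tempo_bpm.

```python
from statistics import median


def estimate_tempo_bpm(beat_times: list[float]) -> float:
    """Estimate tempo from beat timestamps (seconds) via the median inter-beat interval.

    The median keeps a single late or early beat from skewing the estimate.
    """
    if len(beat_times) < 2:
        raise ValueError("need at least two beats to estimate tempo")
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return 60.0 / median(intervals)
```

Beats spaced half a second apart, such as [0.0, 0.5, 1.0, 1.5, 2.0], give an estimate of 120 BPM.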
3. motion_analysis

The engine runs optical flow analysis across the video to understand camera and subject movement:
  • Tracks pixel displacement between frames to build a motion vector map.
  • Calculates motion_intensity (0–100) as the average magnitude of motion across the clip.
  • Detects zoom gestures and measures zoom_usage (0–100).
  • Identifies camera movement patterns: handheld shake, dolly, pan, tilt.
Output: motion_intensity, zoom_usage, camera movement classification.
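The description of motion_intensity as "the average magnitude of motion" scaled to 0 to 100 can be illustrated as follows. This is a sketch only: it assumes motion vectors arrive as (dx, dy) pixel displacements, and the normalization cap max_magnitude is a hypothetical parameter, since the page does not say how the engine maps magnitudes onto the 0 to 100 scale.

```python
import math


def motion_intensity(vectors: list[tuple[float, float]], max_magnitude: float) -> float:
    """Scale the average motion-vector magnitude onto a 0 to 100 range.

    max_magnitude (pixels per frame) is a hypothetical cap that maps to 100;
    averages above it are clipped to 100.
    """
    if not vectors:
        return 0.0
    avg = sum(math.hypot(dx, dy) for dx, dy in vectors) / len(vectors)
    return min(avg / max_magnitude, 1.0) * 100.0
```

Clipping keeps a single very fast whip pan from pushing the score past the top of the scale.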
4. style_extraction

The engine combines all analysis results into a single Style DNA object:
  • Aggregates color grade measurements across all shots into color_grade.
  • Identifies the most and second-most common transitions to populate primary_transition and secondary_transition.
  • Detects text overlay and subtitle patterns.
  • Calculates extraction_confidence, an overall quality score for the extraction.
Output: The complete Style DNA (StyleJSON) object. This is stored on the job and available immediately after this stage completes, even if you did not request rendering.
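Selecting the "most and second-most common" transitions is a frequency count. Here is a minimal Python sketch using collections.Counter, assuming the detected transitions are available as a list of type names. The transition names in the usage note are hypothetical examples, not documented values.

```python
from collections import Counter
from typing import Optional


def rank_transitions(transitions: list[str]) -> tuple[Optional[str], Optional[str]]:
    """Return (primary_transition, secondary_transition): the two most common types.

    Either value is None when fewer distinct transition types were detected.
    Ties keep the order in which types were first encountered.
    """
    top = [name for name, _count in Counter(transitions).most_common(2)]
    primary = top[0] if len(top) > 0 else None
    secondary = top[1] if len(top) > 1 else None
    return primary, secondary
```

For ["hard_cut", "whip_pan", "hard_cut", "crossfade", "hard_cut", "whip_pan"], this returns ("hard_cut", "whip_pan").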
5. style_application

The engine maps the extracted Style DNA onto your target footage:
  • Re-grades the color of your footage to match the color_grade values.
  • Re-times cuts to match cut_frequency and aligns them to detected_beats where possible.
  • Applies primary_transition and secondary_transition at cut points.
  • Adjusts motion and zoom to match the motion_intensity and zoom_usage targets.
  • Applies the audio_style treatment to your footage’s audio track.
This stage runs only if you supplied target footage in your job request. For a template-only job (style extraction only), this stage is skipped.
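"Aligns them to detected_beats where possible" suggests snapping each cut to a nearby beat only when one is close enough. The sketch below shows that idea with a binary search over sorted beat times. It assumes, hypothetically, that both cuts and beats are timestamps in seconds and that a fixed tolerance decides what "where possible" means; the engine's actual rule is not documented here.

```python
from bisect import bisect_left


def snap_cuts_to_beats(
    cut_times: list[float], beat_times: list[float], tolerance: float
) -> list[float]:
    """Move each cut to the nearest beat if it lies within tolerance seconds.

    Cuts with no beat close enough keep their original time.
    """
    beats = sorted(beat_times)
    snapped = []
    for t in cut_times:
        i = bisect_left(beats, t)
        # The nearest beat is either the last one before t or the first one at/after t.
        candidates = [beats[j] for j in (i - 1, i) if 0 <= j < len(beats)]
        nearest = min(candidates, key=lambda b: abs(b - t), default=None)
        if nearest is not None and abs(nearest - t) <= tolerance:
            snapped.append(nearest)
        else:
            snapped.append(t)
    return snapped
```

With beats at 0, 1, 2, and 3 seconds and a 0.15 s tolerance, cuts at [0.9, 2.3, 4.0] become [1.0, 2.3, 4.0].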
6. rendering

The engine encodes the final output video:
  • Renders all applied effects and color grades to a single output stream.
  • Encodes to MP4 (H.264) at the resolution of your input footage.
  • Produces a download URL attached to the completed job object.
Output: Rendered MP4 file, accessible via the output_video_url field on the job result.
Retrieve the output_video_url from GET /api/v1/jobs/{job_id}/result once the job status is completed.
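The following is a minimal Python sketch of that retrieval step using only the standard library. Several details are assumptions not stated on this page and are marked as such: the host in BASE_URL is a placeholder, the Bearer authorization header is hypothetical, and output_video_url is assumed to be a top-level key in the result response. The HTTP call is passed in as a function so the logic can be exercised without a network.

```python
import json
import urllib.request
from typing import Callable

BASE_URL = "https://api.example.com"  # hypothetical placeholder for your Clipzy API host


def result_url(job_id: str, base_url: str = BASE_URL) -> str:
    """Build the URL for GET /api/v1/jobs/{job_id}/result."""
    return f"{base_url}/api/v1/jobs/{job_id}/result"


def http_get_json(url: str, api_key: str) -> dict:
    """GET a URL and decode its JSON body (the Bearer auth scheme is an assumption)."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def get_output_video_url(job: dict, job_id: str, fetch_json: Callable[[str], dict]) -> str:
    """Return output_video_url, refusing to fetch before the job status is completed."""
    status = job.get("status")
    if status != "completed":
        raise RuntimeError(f"job {job_id} is not completed yet (status={status!r})")
    result = fetch_json(result_url(job_id))
    return result["output_video_url"]  # assumed to be a top-level field
```

A real call might look like get_output_video_url(job, job_id, lambda url: http_get_json(url, api_key)).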

Stage summary

Stage | What it analyzes | Key outputs
--- | --- | ---
video_analysis | Frames, cuts, composition | cut_frequency, aspect_ratio, frame_rate
audio_analysis | Audio track, music, speech | tempo_bpm, music_genre, detected_beats, audio_style
motion_analysis | Optical flow, camera movement | motion_intensity, zoom_usage
style_extraction | All analysis combined | Full Style DNA (StyleJSON)
style_application | Target footage + Style DNA | Styled footage (pre-render)
rendering | Styled footage | Final MP4 output

Progress tracking

As the job advances through stages, the progress_percent field increases from 0 to 100. The distribution is roughly:
Stage | Approximate progress range
--- | ---
video_analysis | 0–20%
audio_analysis | 20–35%
motion_analysis | 35–55%
style_extraction | 55–65%
style_application | 65–85%
rendering | 85–100%
If you only need Style DNA (no rendering), progress_percent reaches 100 and the job status becomes completed at the end of the style_extraction stage.
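A typical client polls the job until its status is completed or failed, reporting current_stage and progress_percent along the way. This is a minimal Python sketch. The fetch_job callable is a hypothetical stand-in for however you retrieve the job object (the job status endpoint is not specified on this page), and the poll interval and timeout defaults are arbitrary choices. Only the completed and failed status values come from this page.

```python
import time
from typing import Callable, Optional

TERMINAL_STATUSES = {"completed", "failed"}  # statuses documented on this page


def wait_for_job(
    fetch_job: Callable[[], dict],
    poll_interval: float = 5.0,  # arbitrary default, in seconds
    timeout: float = 1800.0,  # arbitrary default, in seconds
    sleep: Callable[[float], None] = time.sleep,
    on_progress: Callable[[Optional[str], int], None] = lambda stage, pct: None,
) -> dict:
    """Poll until the job reaches a terminal status, reporting stage and progress.

    Raises TimeoutError if the job is still running after roughly `timeout` seconds.
    """
    waited = 0.0
    while True:
        job = fetch_job()
        on_progress(job.get("current_stage"), job.get("progress_percent", 0))
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if waited >= timeout:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout} s")
        sleep(poll_interval)
        waited += poll_interval
```

Injecting sleep and on_progress keeps the loop testable and lets you route progress to a log or UI. For a template-only job, expect the last report to show style_extraction at 100%.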

What to do if a stage fails

If the job transitions to failed, the error object on the job response includes the stage where the failure occurred and a human-readable message. Common causes by stage:
Stage | Common failure causes
--- | ---
video_analysis | Unsupported codec, corrupted file, duration below minimum
audio_analysis | No audio track, audio codec not supported
motion_analysis | Resolution too low for optical flow analysis
style_extraction | Insufficient data from upstream stages
style_application | Target footage incompatible with extracted style
rendering | Output encoding failure
See Job lifecycle for how to handle failed jobs.
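When a job fails, it can help to pair the engine's message with the common causes listed for that stage. The sketch below does that in Python. It assumes the error object uses the keys "stage" and "message"; this page describes that information but does not name the keys, so treat them as hypothetical. The cause strings are copied from the failure table in this section.

```python
from typing import Optional

# Common failure causes by stage, copied from the failure table in this section.
COMMON_CAUSES = {
    "video_analysis": "Unsupported codec, corrupted file, duration below minimum",
    "audio_analysis": "No audio track, audio codec not supported",
    "motion_analysis": "Resolution too low for optical flow analysis",
    "style_extraction": "Insufficient data from upstream stages",
    "style_application": "Target footage incompatible with extracted style",
    "rendering": "Output encoding failure",
}


def describe_failure(job: dict) -> Optional[str]:
    """Summarize a failed job, or return None if the job has not failed.

    Assumes hypothetical error keys "stage" and "message".
    """
    if job.get("status") != "failed":
        return None
    error = job.get("error") or {}
    stage = error.get("stage", "unknown")
    message = error.get("message", "no message")
    summary = f"Failed in {stage}: {message}"
    hint = COMMON_CAUSES.get(stage)
    return f"{summary} (common causes: {hint})" if hint else summary
```

Returning None for non-failed jobs lets you call this unconditionally after polling finishes.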