StyleJSON is the structured representation of a video’s visual and audio style, extracted by the Clipzy ML Engine during job processing. It is returned in the style_json field of a completed job result.
Top-level fields
version: Schema version. Currently always "1.0".
aspect_ratio: Target output aspect ratio. One of "16:9", "9:16", or "1:1".
frame_rate: Target output frame rate in frames per second. One of 24, 30, or 60.
cut_frequency: Average number of cuts per minute detected in the source video.
motion_intensity: Overall motion intensity on a scale from 0 (static) to 100 (highly dynamic).
zoom_usage: Frequency of zoom and pan movements on a scale from 0 to 100.
use_subtitles: Whether the source video includes subtitles that should be preserved in the output.
tempo_bpm: Detected music tempo in beats per minute. null if no music was detected.
music_genre: Detected music genre (e.g., "pop", "hip-hop", "ambient"). null if no music was detected or the genre could not be determined.
created_at: ISO 8601 timestamp of when this StyleJSON was generated.
source_duration_seconds: Duration of the source video in seconds.
extraction_confidence: Overall confidence score for the style extraction, from 0 (low confidence) to 1 (high confidence). Values below 0.5 may indicate unreliable style data.
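The constraints above lend themselves to a small client-side check. The following is a minimal sketch, not part of any Clipzy SDK: the function names check_top_level and parse_created_at are hypothetical, and the key names are taken from the full example at the end of this page.

```python
from datetime import datetime

ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
ALLOWED_FRAME_RATES = {24, 30, 60}
LOW_CONFIDENCE_THRESHOLD = 0.5  # taken from the extraction_confidence description


def check_top_level(style: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if style.get("version") != "1.0":
        problems.append("unexpected version")
    if style.get("aspect_ratio") not in ALLOWED_ASPECT_RATIOS:
        problems.append("unexpected aspect_ratio")
    if style.get("frame_rate") not in ALLOWED_FRAME_RATES:
        problems.append("unexpected frame_rate")
    # Both of these fields are documented as 0..100 scales.
    for key in ("motion_intensity", "zoom_usage"):
        value = style.get(key)
        if not isinstance(value, (int, float)) or not 0 <= value <= 100:
            problems.append(f"{key} outside 0..100")
    confidence = style.get("extraction_confidence")
    if not isinstance(confidence, (int, float)) or not 0 <= confidence <= 1:
        problems.append("extraction_confidence outside 0..1")
    elif confidence < LOW_CONFIDENCE_THRESHOLD:
        problems.append("low extraction_confidence; style data may be unreliable")
    return problems


def parse_created_at(value: str) -> datetime:
    """Parse the ISO 8601 timestamp; a trailing 'Z' is rewritten so older Python versions accept it."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))
```

A caller might treat a non-empty result as a signal to fall back to default styling rather than as a hard failure, since a low confidence score only suggests that the data may be unreliable.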
color_grade
Color grading parameters extracted from the source video. All values are on a scale of -100 to 100 unless otherwise noted.
temperature: Color temperature adjustment. Negative values shift toward cool/blue; positive values shift toward warm/orange. Range: -100 to 100.
tint: Color tint adjustment. Negative values shift toward green; positive values shift toward magenta. Range: -100 to 100.
saturation: Saturation adjustment. -100 is fully desaturated (grayscale); 100 is maximum saturation. Range: -100 to 100.
contrast: Contrast adjustment. Negative values reduce contrast; positive values increase it. Range: -100 to 100.
highlights: Highlights (bright areas) adjustment. Negative values recover blown highlights; positive values boost them. Range: -100 to 100.
shadows: Shadows (dark areas) adjustment. Negative values crush shadows; positive values lift them. Range: -100 to 100.
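As an illustration of working with these ranges, the sketch below clamps each parameter to -100..100 and converts saturation into a multiplier. Both helper names are hypothetical. The linear mapping in saturation_multiplier is an assumption made for this example: it agrees with the documented grayscale endpoint at -100, but this page does not define how the ML Engine or a renderer interprets positive values.

```python
COLOR_GRADE_KEYS = ("temperature", "tint", "saturation", "contrast", "highlights", "shadows")


def clamp_color_grade(color_grade: dict) -> dict:
    """Return a copy with every known key clamped to -100..100.

    Missing keys default to 0 (no adjustment); that default is an assumption.
    """
    return {key: max(-100, min(100, color_grade.get(key, 0))) for key in COLOR_GRADE_KEYS}


def saturation_multiplier(saturation: float) -> float:
    """Hypothetical linear mapping: -100 -> 0.0 (grayscale), 0 -> 1.0 (unchanged), 100 -> 2.0."""
    clamped = max(-100, min(100, saturation))
    return 1.0 + clamped / 100.0
```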
audio_style
Audio processing parameters extracted from the source video.
normalization_level: Target loudness normalization level from 0 (silent) to 1 (full level).
compression_ratio: Dynamic range compression ratio (e.g., 4.0 means 4:1 compression). Higher values reduce dynamic range more aggressively.
bass_boost: Low-frequency emphasis from 0 (no boost) to 100 (maximum boost).
enhance_speech: Whether speech enhancement processing should be applied to improve vocal clarity.
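To make the compression ratio concrete, here is a sketch of the standard static compressor curve: above a threshold, every `ratio` dB of input yields 1 dB of output. The threshold is not part of StyleJSON; it appears here only as an assumed parameter of the illustration, and compress_level_db is a hypothetical helper name.

```python
def compress_level_db(input_db: float, threshold_db: float, ratio: float) -> float:
    """Apply a static compressor curve to a level expressed in dB.

    Levels at or below the threshold pass through unchanged; the portion
    above the threshold is divided by the ratio.
    """
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio


# With a 4:1 ratio and an assumed -12 dB threshold, a -4 dB peak
# (8 dB over the threshold) comes out 2 dB over it.
print(compress_level_db(-4.0, -12.0, 4.0))  # -10.0
```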
primary_transition
The most frequently used transition style detected in the source video.
type: Transition type. One of "cut", "fade", or "wipe".
duration_ms: Duration of the transition in milliseconds.
easing: Easing function applied to the transition. One of "linear", "ease-in", "ease-out", or "ease-in-out".
secondary_transition
The second most frequently used transition style. null if no second transition style was detected. Has the same structure as primary_transition.
type: Transition type. One of "cut", "fade", or "wipe".
duration_ms: Duration of the transition in milliseconds.
easing: Easing function applied to the transition. One of "linear", "ease-in", "ease-out", or "ease-in-out".
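The sketch below shows one way a consumer might turn a transition object into a progress value over time. This page names the easing functions but does not define their curves, so the quadratic curves here are an assumption chosen for simplicity; ease and transition_progress are hypothetical helper names, and the key names type, duration_ms, and easing come from the full example.

```python
from typing import Optional


def ease(name: str, t: float) -> float:
    """Map linear progress t in 0..1 to eased progress using simple quadratic curves."""
    t = max(0.0, min(1.0, t))
    if name == "linear":
        return t
    if name == "ease-in":
        return t * t
    if name == "ease-out":
        return 1 - (1 - t) ** 2
    if name == "ease-in-out":
        return 2 * t * t if t < 0.5 else 1 - 2 * (1 - t) ** 2
    raise ValueError(f"unknown easing: {name}")


def transition_progress(transition: Optional[dict], elapsed_ms: float) -> float:
    """Eased progress after elapsed_ms; a missing transition or a "cut" completes instantly."""
    if transition is None or transition["type"] == "cut" or transition["duration_ms"] <= 0:
        return 1.0
    return ease(transition["easing"], elapsed_ms / transition["duration_ms"])
```

Accepting None lets the same function handle a secondary_transition that is null, as it is in the full example.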
detected_beats
An array of beat positions detected in the audio track. Each item represents a single detected beat.
timestamp_ms: Position of the beat in milliseconds from the start of the audio track.
confidence: Confidence score for this detection, from 0 (uncertain) to 1 (highly confident).
frequency: Frequency band of the detected beat. One of "bass", "mid", or "treble".
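A common use of this array is to keep only confident beats or to cross-check the reported tempo. The following is a minimal sketch under stated assumptions: the helper names and the 0.9 default threshold are hypothetical, and the tempo estimate is a plain mean of the gaps between beats, not the ML Engine's method.

```python
from typing import Optional


def confident_beats(beats: list[dict], min_confidence: float = 0.9) -> list[dict]:
    """Keep only beats whose confidence meets the (assumed) threshold."""
    return [beat for beat in beats if beat["confidence"] >= min_confidence]


def estimate_bpm(beats: list[dict]) -> Optional[float]:
    """Estimate tempo from the mean gap between consecutive beats; None if it cannot be computed."""
    times = sorted(beat["timestamp_ms"] for beat in beats)
    if len(times) < 2:
        return None
    mean_interval_ms = (times[-1] - times[0]) / (len(times) - 1)
    if mean_interval_ms <= 0:
        return None
    return 60_000 / mean_interval_ms


beats = [
    {"timestamp_ms": 0, "confidence": 0.95, "frequency": "bass"},
    {"timestamp_ms": 500, "confidence": 0.88, "frequency": "mid"},
]
print(estimate_bpm(beats))  # 120.0
```

Beats spaced 500 ms apart correspond to 120 beats per minute, which matches the tempo_bpm value in the full example.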
Full example
{
  "version": "1.0",
  "color_grade": {
    "temperature": 10,
    "tint": 5,
    "saturation": 20,
    "contrast": 15,
    "highlights": 10,
    "shadows": -5
  },
  "aspect_ratio": "16:9",
  "frame_rate": 30,
  "audio_style": {
    "normalization_level": 0.85,
    "compression_ratio": 4.0,
    "bass_boost": 20,
    "enhance_speech": true
  },
  "cut_frequency": 3.5,
  "motion_intensity": 65,
  "zoom_usage": 30,
  "use_subtitles": false,
  "primary_transition": {
    "type": "fade",
    "duration_ms": 300,
    "easing": "ease-in-out"
  },
  "secondary_transition": null,
  "detected_beats": [
    {
      "timestamp_ms": 0,
      "confidence": 0.95,
      "frequency": "bass"
    },
    {
      "timestamp_ms": 500,
      "confidence": 0.88,
      "frequency": "mid"
    }
  ],
  "tempo_bpm": 120.0,
  "music_genre": "pop",
  "created_at": "2024-04-05T10:30:00Z",
  "source_duration_seconds": 60.0,
  "extraction_confidence": 0.92
}
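For readers consuming this payload, the sketch below parses a StyleJSON string with Python's standard json module and derives a few values from it. The summarize function and the keys of its result are hypothetical, and the estimated cut count is simple arithmetic on cut_frequency and source_duration_seconds rather than a value the API returns.

```python
import json


def summarize(style_json_text: str) -> dict:
    """Parse a StyleJSON document and derive a few convenience values."""
    style = json.loads(style_json_text)
    return {
        # tempo_bpm is null when no music was detected.
        "has_music": style["tempo_bpm"] is not None,
        # secondary_transition is null when no second style was detected.
        "has_secondary_transition": style["secondary_transition"] is not None,
        # cuts per minute scaled by the source duration in minutes
        "estimated_cuts": style["cut_frequency"] * style["source_duration_seconds"] / 60,
        "beat_count": len(style["detected_beats"]),
    }
```

For the example above, this gives has_music as True, has_secondary_transition as False, an estimated 3.5 cuts, and a beat count of 2.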