StyleJSON is the structured representation of a video’s visual and audio style, extracted by the Clipzy ML Engine during job processing. It is returned in the style_json field of a completed job result.

Top-level fields

version (string, required)
Schema version. Currently always "1.0".

aspect_ratio (string, required)
Target output aspect ratio. One of "16:9", "9:16", or "1:1".

frame_rate (integer, required)
Target output frame rate. One of 24, 30, or 60 fps.

cut_frequency (float, required)
Average number of cuts per minute detected in the source video.

motion_intensity (float, required)
Overall motion intensity on a scale from 0 (static) to 100 (highly dynamic).

zoom_usage (float, required)
Frequency of zoom and pan movements on a scale from 0 to 100.

use_subtitles (boolean, required)
Whether the source video includes subtitles that should be preserved in the output.

tempo_bpm (float, optional)
Detected music tempo in beats per minute. null if no music was detected.

music_genre (string, optional)
Detected music genre (e.g., "pop", "hip-hop", "ambient"). null if no music was detected or the genre could not be determined.

created_at (string, required)
ISO 8601 timestamp indicating when this StyleJSON was generated.

source_duration_seconds (float, required)
Duration of the source video in seconds.

extraction_confidence (float, required)
Overall confidence score for the style extraction, from 0 (low confidence) to 1 (high confidence). Values below 0.5 may indicate unreliable style data.
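A client consuming a job result will typically want to gate on these fields before applying a style. The sketch below checks the required top-level keys; the helper name and the decision to treat sub-0.5 confidence as a hard reject are assumptions, while the field names, version string, and 0.5 threshold come from the schema above.

```python
def is_reliable(style_json: dict) -> bool:
    """Return True if a StyleJSON payload looks usable.

    Checks that all required top-level scalar fields are present,
    that the schema version is the documented "1.0", and that
    extraction_confidence clears the 0.5 threshold noted above.
    Hypothetical helper; not part of the Clipzy API.
    """
    required = [
        "version", "aspect_ratio", "frame_rate", "cut_frequency",
        "motion_intensity", "zoom_usage", "use_subtitles",
        "created_at", "source_duration_seconds", "extraction_confidence",
    ]
    if any(key not in style_json for key in required):
        return False
    if style_json["version"] != "1.0":
        return False
    return style_json["extraction_confidence"] >= 0.5
```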

color_grade (object, required)
Color grading parameters extracted from the source video. All values are on a scale of -100 to 100 unless otherwise noted.
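A -100 to 100 scale maps naturally onto a neutral-centered multiplier. One possible mapping is sketched below; the linear formula and the 0-to-2 output range are assumptions for illustration, not something the schema specifies.

```python
def grade_to_factor(value: float) -> float:
    """Map a -100..100 color_grade value to a 0..2 multiplier,
    with 0 mapping to 1.0 (no change). The linear mapping is an
    assumption; out-of-range inputs are clamped defensively."""
    clamped = max(-100.0, min(100.0, value))
    return 1.0 + clamped / 100.0
```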

audio_style (object, required)
Audio processing parameters extracted from the source video.

primary_transition (object, required)
The most frequently used transition style detected in the source video.
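Transition durations are expressed in milliseconds while output is rendered at the target frame_rate, so a renderer needs to convert between the two. A minimal sketch, assuming nearest-frame rounding (the helper is hypothetical, not part of the API):

```python
def transition_frames(duration_ms: int, frame_rate: int) -> int:
    """Number of output frames a transition spans at the target
    frame_rate, rounded to the nearest frame. Illustrative only."""
    return round(duration_ms * frame_rate / 1000)
```

For the example payload below, a 300 ms fade at 30 fps spans 9 frames.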

secondary_transition (object, optional)
The second most frequently used transition style, if detected. Has the same structure as primary_transition.
detected_beats (object[], required)
An array of beat positions detected in the audio track. Each item represents a single detected beat.

Full example

```json
{
  "version": "1.0",
  "color_grade": {
    "temperature": 10,
    "tint": 5,
    "saturation": 20,
    "contrast": 15,
    "highlights": 10,
    "shadows": -5
  },
  "aspect_ratio": "16:9",
  "frame_rate": 30,
  "audio_style": {
    "normalization_level": 0.85,
    "compression_ratio": 4.0,
    "bass_boost": 20,
    "enhance_speech": true
  },
  "cut_frequency": 3.5,
  "motion_intensity": 65,
  "zoom_usage": 30,
  "use_subtitles": false,
  "primary_transition": {
    "type": "fade",
    "duration_ms": 300,
    "easing": "ease-in-out"
  },
  "secondary_transition": null,
  "detected_beats": [
    {
      "timestamp_ms": 0,
      "confidence": 0.95,
      "frequency": "bass"
    },
    {
      "timestamp_ms": 500,
      "confidence": 0.88,
      "frequency": "mid"
    }
  ],
  "tempo_bpm": 120.0,
  "music_genre": "pop",
  "created_at": "2024-04-05T10:30:00Z",
  "source_duration_seconds": 60.0,
  "extraction_confidence": 0.92
}
```