StyleJSON is the structured representation of a video’s visual and audio style, extracted by the Clipzy ML Engine during job processing. It is returned in the style_json field of a completed job result.

Top-level fields

version (string, required)
Schema version. Currently always "1.0".

aspect_ratio (string, required)
Target output aspect ratio. One of "16:9", "9:16", or "1:1".

frame_rate (integer, required)
Target output frame rate. One of 24, 30, or 60 fps.

cut_frequency (float, required)
Average number of cuts per minute detected in the source video.

motion_intensity (float, required)
Overall motion intensity on a scale from 0 (static) to 100 (highly dynamic).

zoom_usage (float, required)
Frequency of zoom and pan movements on a scale from 0 to 100.

use_subtitles (boolean, required)
Whether the source video includes subtitles that should be preserved in the output.

tempo_bpm (float, optional)
Detected music tempo in beats per minute. null if no music was detected.

music_genre (string, optional)
Detected music genre (e.g., "pop", "hip-hop", "ambient"). null if no music was detected or the genre could not be determined.

created_at (string, required)
ISO 8601 timestamp indicating when this StyleJSON was generated.

source_duration_seconds (float, required)
Duration of the source video in seconds.

extraction_confidence (float, required)
Overall confidence score for the style extraction, from 0 (low confidence) to 1 (high confidence). Values below 0.5 may indicate unreliable style data.
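A client consuming a job result will typically want to gate on these fields before applying a style. The sketch below checks the required top-level keys; the helper name and the decision to treat sub-0.5 confidence as a hard reject are assumptions, while the field names, version string, and 0.5 threshold come from the schema above.

```python
def is_reliable(style_json: dict) -> bool:
    """Return True if a StyleJSON payload looks usable.

    Checks that all required top-level scalar fields are present,
    that the schema version is the documented "1.0", and that
    extraction_confidence clears the 0.5 threshold noted above.
    Hypothetical helper; not part of the Clipzy API.
    """
    required = [
        "version", "aspect_ratio", "frame_rate", "cut_frequency",
        "motion_intensity", "zoom_usage", "use_subtitles",
        "created_at", "source_duration_seconds", "extraction_confidence",
    ]
    if any(key not in style_json for key in required):
        return False
    if style_json["version"] != "1.0":
        return False
    return style_json["extraction_confidence"] >= 0.5
```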

color_grade (object, required)
Color grading parameters extracted from the source video. All values are on a scale of -100 to 100 unless otherwise noted.
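A -100 to 100 scale maps naturally onto a neutral-centered multiplier. One possible mapping is sketched below; the linear formula and the 0-to-2 output range are assumptions for illustration, not something the schema specifies.

```python
def grade_to_factor(value: float) -> float:
    """Map a -100..100 color_grade value to a 0..2 multiplier,
    with 0 mapping to 1.0 (no change). The linear mapping is an
    assumption; out-of-range inputs are clamped defensively."""
    clamped = max(-100.0, min(100.0, value))
    return 1.0 + clamped / 100.0
```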

audio_style (object, required)
Audio processing parameters extracted from the source video.

primary_transition (object, required)
The most frequently used transition style detected in the source video.
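Transition durations are expressed in milliseconds while output is rendered at the target frame_rate, so a renderer needs to convert between the two. A minimal sketch, assuming nearest-frame rounding (the helper is hypothetical, not part of the API):

```python
def transition_frames(duration_ms: int, frame_rate: int) -> int:
    """Number of output frames a transition spans at the target
    frame_rate, rounded to the nearest frame. Illustrative only."""
    return round(duration_ms * frame_rate / 1000)
```

For the example payload below, a 300 ms fade at 30 fps spans 9 frames.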

secondary_transition (object, optional)
The second most frequently used transition style, if detected. Has the same structure as primary_transition.
detected_beats (object[], required)
An array of beat positions detected in the audio track. Each item represents a single detected beat.

Full example

```json
{
  "version": "1.0",
  "color_grade": {
    "temperature": 10,
    "tint": 5,
    "saturation": 20,
    "contrast": 15,
    "highlights": 10,
    "shadows": -5
  },
  "aspect_ratio": "16:9",
  "frame_rate": 30,
  "audio_style": {
    "normalization_level": 0.85,
    "compression_ratio": 4.0,
    "bass_boost": 20,
    "enhance_speech": true
  },
  "cut_frequency": 3.5,
  "motion_intensity": 65,
  "zoom_usage": 30,
  "use_subtitles": false,
  "primary_transition": {
    "type": "fade",
    "duration_ms": 300,
    "easing": "ease-in-out"
  },
  "secondary_transition": null,
  "detected_beats": [
    {
      "timestamp_ms": 0,
      "confidence": 0.95,
      "frequency": "bass"
    },
    {
      "timestamp_ms": 500,
      "confidence": 0.88,
      "frequency": "mid"
    }
  ],
  "tempo_bpm": 120.0,
  "music_genre": "pop",
  "created_at": "2024-04-05T10:30:00Z",
  "source_duration_seconds": 60.0,
  "extraction_confidence": 0.92
}
```