Style DNA is a structured JSON object that describes every measurable stylistic property of a video: its color grade, edit rhythm, audio treatment, motion intensity, transitions, and beat structure. Clipzy extracts Style DNA from a template video and can apply it to any footage you supply. Think of Style DNA as a style manifesto — a complete, portable description of how a video looks and feels, independent of its content.

How customers use Style DNA

  • Apply to your footage — submit a template video alongside your own clips, and the engine replicates the template’s Style DNA on your footage.
  • Inspect and store — retrieve the Style DNA JSON from any completed job and store it in your own system for later use or auditing.
  • Compare styles — diff two Style DNA objects to understand how two editors approach the same type of content differently.
The extraction_confidence field (0–1) tells you how reliably the engine analyzed the template. Higher-quality source videos with consistent style produce scores closer to 1.0.
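The "Compare styles" use case can be sketched as a recursive diff over two parsed Style DNA objects. This is only an illustration: the helper names `flatten` and `diff_style_dna` are hypothetical and not part of any Clipzy SDK, and the sketch assumes both objects have already been loaded into Python dictionaries.

```python
from __future__ import annotations

from typing import Any


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted paths; lists are kept as leaf values."""
    if not isinstance(obj, dict):
        return {prefix: obj}
    flat: dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        flat.update(flatten(value, path))
    return flat


def diff_style_dna(a: dict, b: dict) -> dict[str, tuple[Any, Any]]:
    """Return {path: (value_in_a, value_in_b)} for every path whose values differ."""
    flat_a, flat_b = flatten(a), flatten(b)
    missing = object()  # sentinel so an absent key is detected even next to a null
    changes: dict[str, tuple[Any, Any]] = {}
    for path in sorted(flat_a.keys() | flat_b.keys()):
        left = flat_a.get(path, missing)
        right = flat_b.get(path, missing)
        if left != right:
            # Absent values are reported as None in the result.
            changes[path] = (
                None if left is missing else left,
                None if right is missing else right,
            )
    return changes


# Two trimmed-down Style DNA objects with illustrative values.
editor_a = {"cut_frequency": 3.5, "color_grade": {"saturation": 20, "contrast": 15}}
editor_b = {"cut_frequency": 12.0, "color_grade": {"saturation": 20, "contrast": 40}}

changes = diff_style_dna(editor_a, editor_b)
for path, (left, right) in changes.items():
    print(f"{path}: {left} -> {right}")
```

Flattening to dotted paths keeps the report readable for nested objects such as color_grade, while arrays such as detected_beats are compared as whole values.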

Full example

{
  "version": "1.0",
  "color_grade": {
    "temperature": 10,
    "tint": 5,
    "saturation": 20,
    "contrast": 15,
    "highlights": 10,
    "shadows": -5
  },
  "aspect_ratio": "16:9",
  "frame_rate": 30,
  "audio_style": {
    "normalization_level": 0.85,
    "compression_ratio": 4.0,
    "bass_boost": 20,
    "enhance_speech": true
  },
  "cut_frequency": 3.5,
  "motion_intensity": 65,
  "zoom_usage": 30,
  "primary_transition": {
    "type": "fade",
    "duration_ms": 300,
    "easing": "ease-in-out"
  },
  "use_subtitles": false,
  "detected_beats": [
    { "timestamp_ms": 0, "confidence": 0.95, "frequency": "bass" }
  ],
  "tempo_bpm": 120.0,
  "music_genre": "pop",
  "created_at": "2024-04-05T10:30:00Z",
  "source_duration_seconds": 60.0,
  "extraction_confidence": 0.92
}
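Before storing a Style DNA object like the one above, a consumer may want to confirm that the required keys are present. The following is a minimal sketch: the list of required keys is taken from the schema reference on this page, and the function name `missing_required_fields` is hypothetical.

```python
from __future__ import annotations

import json

# Keys marked "required" in the schema reference on this page.
REQUIRED_FIELDS = (
    "version", "aspect_ratio", "frame_rate", "cut_frequency",
    "motion_intensity", "zoom_usage", "use_subtitles", "created_at",
    "source_duration_seconds", "extraction_confidence",
    "color_grade", "audio_style", "primary_transition", "detected_beats",
)


def missing_required_fields(style_dna: dict) -> list[str]:
    """Return the required keys that are absent, in the order listed above."""
    return [name for name in REQUIRED_FIELDS if name not in style_dna]


# A deliberately incomplete object to show what the check reports.
raw = '{"version": "1.0", "aspect_ratio": "16:9", "frame_rate": 30}'
style_dna = json.loads(raw)

missing = missing_required_fields(style_dna)
print(f"{len(missing)} required fields missing, first: {missing[0]}")
```

The check only looks at key presence. It does not validate types or value ranges.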

Schema reference

Top-level fields

  • version (string, required): Schema version. Currently always "1.0".
  • aspect_ratio (string, required): Detected aspect ratio of the template video. One of "16:9", "9:16", or "1:1".
  • frame_rate (number, required): Detected frame rate in frames per second. Typical values: 24, 30, 60.
  • cut_frequency (number, required): Average number of cuts per minute in the template video. A higher value means a faster-paced edit style. For example, 3.5 means roughly one cut every 17 seconds.
  • motion_intensity (number, required): Overall camera and subject motion on a scale of 0–100. 0 is completely static; 100 is near-constant high-speed movement.
  • zoom_usage (number, required): How frequently zoom effects appear, on a scale of 0–100. 0 means no zooms detected; 100 means zooms are present in nearly every shot.
  • use_subtitles (boolean, required): Whether the template video uses subtitles or captions.
  • tempo_bpm (number): Detected music tempo in beats per minute. null if no music was detected.
  • music_genre (string): Detected music genre (e.g., "pop", "hip-hop", "cinematic"). null if no music was detected or the genre could not be classified.
  • created_at (string, required): ISO 8601 timestamp of when the Style DNA was extracted.
  • source_duration_seconds (number, required): Duration of the template video in seconds.
  • extraction_confidence (number, required): Overall quality score for the extraction on a scale of 0–1. Scores above 0.85 indicate a reliable extraction. Scores below 0.6 suggest the template video may be too short, inconsistent, or low-quality for accurate style transfer.
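Two of the fields above lend themselves to a quick worked conversion: cut_frequency can be turned into an average shot length, and extraction_confidence can be bucketed using the documented thresholds. In this sketch the function names and the tier labels are assumptions. In particular, the page does not say how to treat scores from 0.6 to 0.85, so the "borderline" tier is a guess.

```python
import math


def seconds_per_cut(cut_frequency: float) -> float:
    """Convert cuts per minute into the average time between cuts, in seconds."""
    if cut_frequency < 0:
        raise ValueError("cut_frequency must not be negative")
    if cut_frequency == 0:
        return math.inf  # no cuts detected: the whole video is a single shot
    return 60.0 / cut_frequency


def confidence_tier(extraction_confidence: float) -> str:
    """Bucket a 0-1 confidence score using the thresholds stated on this page."""
    if not 0.0 <= extraction_confidence <= 1.0:
        raise ValueError("extraction_confidence must be between 0 and 1")
    if extraction_confidence > 0.85:
        return "reliable"
    if extraction_confidence < 0.6:
        return "unreliable"  # template may be too short, inconsistent, or low-quality
    return "borderline"  # assumed label; this range is not described on the page


print(f"{seconds_per_cut(3.5):.1f} seconds per cut")
print(confidence_tier(0.92))
```

For the values used in the full example, 60 / 3.5 gives about 17.1 seconds between cuts, which matches the "roughly one cut every 17 seconds" stated for cut_frequency.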

color_grade

The color grade describes the visual tone of the template video. All fields use a scale of -100 to 100, where 0 is neutral.
color_grade (object, required)
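Because every color_grade field shares the same -100 to 100 scale, a stored or hand-edited object can be sanity-checked with a single range test. This is a minimal sketch: the function name is hypothetical, and the field names are the ones that appear in the full example.

```python
from __future__ import annotations


def out_of_range_color_fields(color_grade: dict[str, float]) -> dict[str, float]:
    """Return the fields whose values fall outside the documented -100..100 scale."""
    return {name: value for name, value in color_grade.items()
            if not -100 <= value <= 100}


# Values from the full example: all within range.
grade = {"temperature": 10, "tint": 5, "saturation": 20,
         "contrast": 15, "highlights": 10, "shadows": -5}
print(out_of_range_color_fields(grade))

# A hand-edited object with one invalid value.
edited = dict(grade, saturation=140)
print(out_of_range_color_fields(edited))
```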

audio_style

The audio style describes how the template video’s audio was treated and mixed.
audio_style (object, required)

primary_transition and secondary_transition

Transitions describe how the template moves between shots.
  • primary_transition (object, required): The most commonly used transition style in the template video.
  • secondary_transition (object): The second most common transition style, if the template uses more than one. null if only one transition type was detected.
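Since secondary_transition can be null, code that reads transitions should guard against it. The sketch below treats a missing key and an explicit null the same way, which is an assumption. The function name is hypothetical, and the "wipe" value is only an illustrative transition type, not a documented one.

```python
from __future__ import annotations


def transitions_in_use(style_dna: dict) -> list[dict]:
    """Return the primary transition followed by the secondary one, if present."""
    transitions = [style_dna["primary_transition"]]
    # .get() covers both an absent key and an explicit null (None after parsing).
    secondary = style_dna.get("secondary_transition")
    if secondary is not None:
        transitions.append(secondary)
    return transitions


fade = {"type": "fade", "duration_ms": 300, "easing": "ease-in-out"}
wipe = {"type": "wipe", "duration_ms": 200, "easing": "ease-in-out"}  # hypothetical type

single = {"primary_transition": fade, "secondary_transition": None}
multi = {"primary_transition": fade, "secondary_transition": wipe}

print([t["type"] for t in transitions_in_use(single)])
print([t["type"] for t in transitions_in_use(multi)])
```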

detected_beats

An array of beat events detected in the template’s audio track. The engine uses these to align cuts and transitions in the applied output.
detected_beats (object[], required)
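To illustrate what aligning cuts to beats can look like, the sketch below snaps a proposed cut time to the nearest sufficiently confident beat. This is not the engine's actual algorithm, which this page does not describe. The function name, the 0.8 confidence cutoff, and the sample beats are all assumptions, and the beat objects follow the shape shown in the full example.

```python
from __future__ import annotations

from bisect import bisect_left


def snap_to_nearest_beat(cut_ms: int, beat_times_ms: list[int]) -> int:
    """Return the beat timestamp closest to cut_ms.

    beat_times_ms must be sorted ascending. If it is empty, the cut is
    returned unchanged. Ties go to the earlier beat.
    """
    if not beat_times_ms:
        return cut_ms
    i = bisect_left(beat_times_ms, cut_ms)
    # Only the beats immediately before and after the cut can be the closest.
    candidates = beat_times_ms[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda t: abs(t - cut_ms))


# Sample beats at 120 BPM (one beat every 500 ms); the middle one is low confidence.
detected_beats = [
    {"timestamp_ms": 0, "confidence": 0.95, "frequency": "bass"},
    {"timestamp_ms": 500, "confidence": 0.40, "frequency": "bass"},
    {"timestamp_ms": 1000, "confidence": 0.90, "frequency": "bass"},
]

MIN_CONFIDENCE = 0.8  # assumed cutoff, not a documented value
beat_times = sorted(b["timestamp_ms"] for b in detected_beats
                    if b["confidence"] >= MIN_CONFIDENCE)

for cut in (430, 980, 1200):
    print(f"cut at {cut} ms -> beat at {snap_to_nearest_beat(cut, beat_times)} ms")
```

Filtering by confidence before snapping keeps weak detections from pulling cuts off the strong beats. Binary search keeps each lookup logarithmic in the number of beats.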