Style DNA

Style DNA is a structured JSON object that describes every measurable stylistic property of a video: its color grade, edit rhythm, audio treatment, motion intensity, transitions, and beat structure. Clipzy extracts Style DNA from a template video and can apply it to any footage you supply. Think of Style DNA as a style manifesto — a complete, portable description of how a video looks and feels, independent of its content.

How customers use Style DNA

Apply to your footage — submit a template video alongside your own clips, and the engine replicates the template’s Style DNA on your footage.
Inspect and store — retrieve the Style DNA JSON from any completed job and store it in your own system for later use or auditing.
Compare styles — diff two Style DNA objects to understand how two editors approach the same type of content differently.
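
The comparison use case can be sketched in a few lines of Python. This is a minimal illustration, not part of Clipzy: `diff_style_dna` is a hypothetical helper name, and the two input objects are trimmed, made-up samples rather than real extractions.

```python
from typing import Any


def diff_style_dna(a: dict[str, Any], b: dict[str, Any], prefix: str = "") -> dict[str, tuple[Any, Any]]:
    """Return {dotted.path: (value_in_a, value_in_b)} for every differing leaf."""
    changes: dict[str, tuple[Any, Any]] = {}
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}{key}"
        left, right = a.get(key), b.get(key)
        if isinstance(left, dict) and isinstance(right, dict):
            # Recurse into nested objects such as color_grade or audio_style.
            changes.update(diff_style_dna(left, right, prefix=f"{path}."))
        elif left != right:
            changes[path] = (left, right)
    return changes


# Hypothetical, trimmed Style DNA objects from two editors.
editor_a = {"cut_frequency": 3.5, "color_grade": {"saturation": 20, "contrast": 15}}
editor_b = {"cut_frequency": 8.0, "color_grade": {"saturation": 20, "contrast": 30}}

print(diff_style_dna(editor_a, editor_b))
# {'color_grade.contrast': (15, 30), 'cut_frequency': (3.5, 8.0)}
```

Because the sketch reads missing keys as `None`, it cannot tell an absent field from one that is explicitly null. Arrays such as detected_beats are compared as whole values.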

The extraction_confidence field (0–1) tells you how reliably the engine analyzed the template. Higher-quality source videos with consistent style produce scores closer to 1.0.

Full example

{
  "version": "1.0",
  "color_grade": {
    "temperature": 10,
    "tint": 5,
    "saturation": 20,
    "contrast": 15,
    "highlights": 10,
    "shadows": -5
  },
  "aspect_ratio": "16:9",
  "frame_rate": 30,
  "audio_style": {
    "normalization_level": 0.85,
    "compression_ratio": 4.0,
    "bass_boost": 20,
    "enhance_speech": true
  },
  "cut_frequency": 3.5,
  "motion_intensity": 65,
  "zoom_usage": 30,
  "use_subtitles": true,
  "primary_transition": {
    "type": "fade",
    "duration_ms": 300,
    "easing": "ease-in-out"
  },
  "detected_beats": [
    { "timestamp_ms": 0, "confidence": 0.95, "frequency": "bass" }
  ],
  "tempo_bpm": 120.0,
  "music_genre": "pop",
  "created_at": "2024-04-05T10:30:00Z",
  "source_duration_seconds": 60.0,
  "extraction_confidence": 0.92
}
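
If you store Style DNA in your own system, a quick structural check before reuse can catch truncated or hand-edited documents. The sketch below assumes the required top-level keys are those marked required in the schema reference on this page. `missing_required_keys` is a hypothetical helper, and the sample document is deliberately incomplete.

```python
import json

# Required top-level keys, taken from the schema reference on this page.
REQUIRED_KEYS = {
    "version", "color_grade", "aspect_ratio", "frame_rate", "audio_style",
    "cut_frequency", "motion_intensity", "zoom_usage", "use_subtitles",
    "primary_transition", "detected_beats", "created_at",
    "source_duration_seconds", "extraction_confidence",
}


def missing_required_keys(raw_json: str) -> list[str]:
    """Parse a Style DNA document and list the required top-level keys it lacks."""
    document = json.loads(raw_json)
    return sorted(REQUIRED_KEYS - set(document))


# Deliberately incomplete, hypothetical document for illustration.
sample = '{"version": "1.0", "aspect_ratio": "16:9", "frame_rate": 30}'
print(missing_required_keys(sample))
```

This only checks for the presence of keys. It does not validate types or value ranges.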

Schema reference

Top-level fields

version

string

required

Schema version. Currently always "1.0".

aspect_ratio

string

required

Detected aspect ratio of the template video. One of "16:9", "9:16", or "1:1".

frame_rate

number

required

Detected frame rate in frames per second. Typical values: 24, 30, 60.

cut_frequency

number

required

Average number of cuts per minute in the template video. A higher value means a faster-paced edit style. For example, 3.5 means roughly one cut every 17 seconds.
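
The conversion from cuts per minute to average shot length is a single division. The helper name below is hypothetical and only illustrates the arithmetic.

```python
def average_seconds_between_cuts(cut_frequency: float) -> float:
    """Convert cuts per minute into the mean time between cuts, in seconds."""
    if cut_frequency <= 0:
        raise ValueError("cut_frequency must be positive to derive an interval")
    return 60.0 / cut_frequency


print(round(average_seconds_between_cuts(3.5), 1))  # 17.1
```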

motion_intensity

number

required

Overall camera and subject motion on a scale of 0–100. 0 is completely static; 100 is near-constant high-speed movement.

zoom_usage

number

required

How frequently zoom effects appear, on a scale of 0–100. 0 means no zooms detected; 100 means zooms are present in nearly every shot.

use_subtitles

boolean

required

Whether the template video uses subtitles or captions.

tempo_bpm

number

Detected music tempo in beats per minute. null if no music was detected.
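
Because tempo_bpm can be null, code that derives timing from it should handle the no-music case explicitly. A minimal sketch, with `beat_interval_ms` as a hypothetical helper name:

```python
from typing import Optional


def beat_interval_ms(tempo_bpm: Optional[float]) -> Optional[float]:
    """Milliseconds between beats, or None when no music was detected."""
    if tempo_bpm is None or tempo_bpm <= 0:
        return None
    return 60_000.0 / tempo_bpm


print(beat_interval_ms(120.0))  # 500.0
print(beat_interval_ms(None))   # None
```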

music_genre

string

Detected music genre (e.g., "pop", "hip-hop", "cinematic"). null if no music was detected or the genre could not be classified.

created_at

string

required

ISO 8601 timestamp of when the Style DNA was extracted.

source_duration_seconds

number

required

Duration of the template video in seconds.

extraction_confidence

number

required

Overall quality score for the extraction on a scale of 0–1. Scores above 0.85 indicate a reliable extraction. Scores below 0.6 suggest the template video may be too short, inconsistent, or low-quality for accurate style transfer.
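
One way to act on these thresholds is to map the score to a tier before deciding whether to apply the style. The tier names below are assumptions for illustration. The documentation defines only the 0.85 and 0.6 cut-offs, so the middle band is labeled here as needing manual review.

```python
def classify_extraction(confidence: float) -> str:
    """Map extraction_confidence to a coarse tier using the documented thresholds."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("extraction_confidence must be between 0 and 1")
    if confidence > 0.85:
        return "reliable"
    if confidence < 0.6:
        return "unreliable"
    # Between the two documented thresholds: an assumed "check by hand" tier.
    return "review"


print(classify_extraction(0.92))  # reliable
print(classify_extraction(0.70))  # review
print(classify_extraction(0.45))  # unreliable
```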

`color_grade`

The color grade describes the visual tone of the template video. All fields use a scale of -100 to 100, where 0 is neutral.

color_grade

object

required

temperature

number

required

Warm/cool color bias. Positive values push toward warm (orange/yellow); negative values push toward cool (blue).

tint

number

required

Green/magenta color bias. Positive values push toward magenta; negative values push toward green.

saturation

number

required

Color intensity. Positive values increase vividness; negative values desaturate toward monochrome.

contrast

number

required

Difference between light and dark areas. Positive values increase contrast; negative values flatten the image.

highlights

number

required

Brightness of the brightest areas. Positive values brighten highlights; negative values darken them, which can recover detail in blown-out areas.

shadows

number

required

Brightness of the darkest areas. Positive values lift shadows; negative values crush blacks.
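
Since every color_grade field shares the same -100 to 100 range, a stored or hand-edited grade can be checked with one loop. The function name and message format below are hypothetical.

```python
COLOR_GRADE_FIELDS = ("temperature", "tint", "saturation", "contrast", "highlights", "shadows")


def color_grade_problems(color_grade: dict) -> list[str]:
    """Return a human-readable problem for each missing or out-of-range field."""
    problems = []
    for field in COLOR_GRADE_FIELDS:
        value = color_grade.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{field}: expected a number, got {value!r}")
        elif not -100 <= value <= 100:
            problems.append(f"{field}: {value} is outside -100..100")
    return problems


grade = {"temperature": 10, "tint": 5, "saturation": 140,
         "contrast": 15, "highlights": 10}  # shadows omitted on purpose
print(color_grade_problems(grade))
# ['saturation: 140 is outside -100..100', 'shadows: expected a number, got None']
```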

`audio_style`

The audio style describes how the template video’s audio was treated and mixed.

audio_style

object

required

normalization_level

number

required

Target loudness level on a scale of 0–1. 1.0 is fully normalized to a consistent broadcast-standard level.

compression_ratio

number

required

Dynamic range compression ratio applied to the audio. A value of 4.0 means a 4:1 ratio: for every 4 dB the input rises above the compressor threshold, the output rises by only 1 dB.
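
To make the ratio concrete, the sketch below applies the standard static curve of a downward compressor. Style DNA does not record a threshold, so `threshold_db` is an assumed value chosen only for illustration, and this is not a description of how Clipzy processes audio.

```python
def compressed_level_db(input_db: float, threshold_db: float, ratio: float) -> float:
    """Output level of an idealized hard-knee downward compressor."""
    if ratio < 1.0:
        raise ValueError("compression ratio must be at least 1.0")
    if input_db <= threshold_db:
        # Signals at or below the threshold pass through unchanged.
        return input_db
    # Above the threshold, the overshoot is divided by the ratio.
    return threshold_db + (input_db - threshold_db) / ratio


# A peak 12 dB over an assumed -20 dB threshold ends up 3 dB over it at 4:1.
print(compressed_level_db(-8.0, threshold_db=-20.0, ratio=4.0))  # -17.0
```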

bass_boost

number

required

Low-frequency enhancement on a scale of 0–100. 0 means no bass boost applied.

enhance_speech

boolean

required

Whether the template uses speech enhancement processing to improve vocal clarity.

`primary_transition` and `secondary_transition`

Transitions describe how the template moves between shots.

primary_transition

object

required

The most commonly used transition style in the template video.

type

string

required

Transition type. Common values: "cut", "fade", "wipe", "dissolve", "zoom".

duration_ms

number

required

Duration of the transition in milliseconds.

easing

string

required

Animation easing curve. One of "linear", "ease-in", "ease-out", or "ease-in-out".

secondary_transition

object

The second most common transition style, if the template uses more than one. null if only one transition type was detected.

type

string

required

Transition type.

duration_ms

number

required

Duration of the transition in milliseconds.

easing

string

required

Animation easing curve.
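
Client code that reads transitions needs to allow for secondary_transition being null. The sketch below collects the transitions in priority order. It assumes that secondary_transition accepts the same easing values as primary_transition, which the reference does not state explicitly, and `transitions_by_priority` is a hypothetical helper name.

```python
ALLOWED_EASINGS = {"linear", "ease-in", "ease-out", "ease-in-out"}


def transitions_by_priority(style_dna: dict) -> list[dict]:
    """Return the primary transition, then the secondary one if present."""
    transitions = [style_dna["primary_transition"]]
    # secondary_transition may be absent or explicitly null (None in Python).
    secondary = style_dna.get("secondary_transition")
    if secondary is not None:
        transitions.append(secondary)
    for transition in transitions:
        if transition["easing"] not in ALLOWED_EASINGS:
            raise ValueError(f"unsupported easing: {transition['easing']!r}")
    return transitions


single = {
    "primary_transition": {"type": "fade", "duration_ms": 300, "easing": "ease-in-out"},
    "secondary_transition": None,
}
print([t["type"] for t in transitions_by_priority(single)])  # ['fade']
```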

`detected_beats`

An array of beat events detected in the template’s audio track. The engine uses these to align cuts and transitions in the applied output.

detected_beats

object[]

required

timestamp_ms

number

required

Position of the beat in the template video, in milliseconds from the start.

confidence

number

required

How confident the engine is that this is a true beat, on a scale of 0–1. Values above 0.8 are high-confidence.

frequency

string

required

The frequency band of the beat. One of "bass", "mid", or "treble". Bass beats are most commonly used to trigger cuts.
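
As an illustration of how beat data might be consumed, the sketch below keeps only high-confidence beats in one frequency band and converts their timestamps to frame indices. The function names, the sample beats, and the choice of rounding to the nearest frame are assumptions, and the engine's own alignment logic may differ.

```python
def cut_candidates_ms(detected_beats: list[dict], min_confidence: float = 0.8,
                      band: str = "bass") -> list[int]:
    """Timestamps (ms) of beats in `band` whose confidence exceeds `min_confidence`."""
    return sorted(
        beat["timestamp_ms"]
        for beat in detected_beats
        if beat["frequency"] == band and beat["confidence"] > min_confidence
    )


def to_frame_index(timestamp_ms: float, frame_rate: float) -> int:
    """Nearest frame index for a timestamp at the given frame rate."""
    return round(timestamp_ms * frame_rate / 1000)


# Hypothetical beat list for illustration.
beats = [
    {"timestamp_ms": 0, "confidence": 0.95, "frequency": "bass"},
    {"timestamp_ms": 250, "confidence": 0.90, "frequency": "treble"},
    {"timestamp_ms": 500, "confidence": 0.55, "frequency": "bass"},
    {"timestamp_ms": 1000, "confidence": 0.88, "frequency": "bass"},
]

candidates = cut_candidates_ms(beats)
print(candidates)                                  # [0, 1000]
print([to_frame_index(t, 30) for t in candidates])  # [0, 30]
```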
