
Expected processing times

Processing time depends on video length and whether your workers have GPU acceleration. The table below shows end-to-end job duration estimates (not including queue wait time):
| Video length | CPU       | GPU       |
| ------------ | --------- | --------- |
| 1 minute     | 35–65s    | 12–18s    |
| 2 minutes    | 73–124s   | 24–36s    |
| 5 minutes    | 3–5 min   | 60–90s    |
| 15 minutes   | 9–15 min  | 3–5 min   |
| 60 minutes   | 35–60 min | 12–20 min |
These estimates are for typical talking-head or interview-style content at 1080p. Videos with dense motion, overlapping speech, or multiple audio tracks may take longer.
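Because the estimates scale roughly linearly with video length, they can be folded into a small planning helper. The per-minute rates below are read off the table above and are approximations for typical content, not guarantees:

```python
def estimate_processing_seconds(minutes: float, gpu: bool = False) -> tuple[float, float]:
    """Rough (low, high) end-to-end estimate in seconds, assuming the
    near-linear per-minute rates implied by the table above (an
    approximation; real jobs vary with content complexity)."""
    # Seconds of processing per minute of video (assumed rates)
    low_rate, high_rate = (12, 20) if gpu else (35, 62)
    return minutes * low_rate, minutes * high_rate
```

For example, `estimate_processing_seconds(15)` gives roughly the 9–15 minute CPU range shown in the table. Remember these figures exclude queue wait time.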

Processing stage breakdown

For a 2-minute video, the pipeline spends time in each stage as follows:
| Stage              | CPU     | GPU    |
| ------------------ | ------- | ------ |
| Video ingestion    | 2–3s    | 2–3s   |
| Shot detection     | 5–8s    | 5–8s   |
| Motion analysis    | 10–15s  | 2–4s   |
| Audio analysis     | 3–5s    | 3–5s   |
| Speech recognition | 30–60s  | 5–10s  |
| Visual embeddings  | 20–30s  | 5–10s  |
| Color analysis     | 2–3s    | 2–3s   |
| **Total**          | 73–124s | 24–36s |
Speech recognition and visual embeddings together account for roughly 70% of total CPU processing time. If you cannot add GPU hardware, optimizing video length before submission has the largest impact on these two stages.
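The 70% figure can be checked directly from the midpoints of the CPU column above:

```python
# Midpoints (seconds) of the CPU stage ranges for a 2-minute video,
# taken from the stage breakdown table above
cpu_stages = {
    "video_ingestion": 2.5, "shot_detection": 6.5, "motion_analysis": 12.5,
    "audio_analysis": 4.0, "speech_recognition": 45.0,
    "visual_embeddings": 25.0, "color_analysis": 2.5,
}
total = sum(cpu_stages.values())  # ~98s
heavy = cpu_stages["speech_recognition"] + cpu_stages["visual_embeddings"]  # 70s
share = heavy / total  # roughly 0.71
```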

Diagnosing slow jobs

1. Check the job queue depth

Use GET /api/v1/jobs to see the total job count, then compare how many jobs are in queued status with how many are in processing status to gauge queue depth:
```bash
curl "http://localhost:8000/api/v1/jobs?page=1&page_size=1"
```

```json
{
  "jobs": [...],
  "total": 17,
  "page": 1,
  "page_size": 1
}
```
A growing total count — especially when most jobs are in queued status — means jobs are waiting for a worker to pick them up. This is a capacity problem, not a per-job performance problem.
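To automate this check, a small helper can tally job statuses from the response body. The snippet assumes each object in the `jobs` array exposes a `status` field with values like `queued` and `processing` (a hypothetical field name; check your deployment's job schema):

```python
from collections import Counter

def queue_summary(jobs: list[dict]) -> Counter:
    """Tally jobs by status. Assumes each job object in the /api/v1/jobs
    response carries a "status" field such as "queued" or "processing"
    (field name is an assumption; verify against your job schema)."""
    return Counter(job.get("status", "unknown") for job in jobs)

# Example against a mocked response body:
response = {"jobs": [{"status": "queued"}, {"status": "queued"},
                     {"status": "processing"}], "total": 3}
counts = queue_summary(response["jobs"])
```

In practice you would fetch all pages of `/api/v1/jobs` and feed the combined `jobs` list to `queue_summary`.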
2. Check MAX_CONCURRENT_JOBS

If queued is high and processing is at the MAX_CONCURRENT_JOBS limit, your workers are at capacity. You can either increase MAX_CONCURRENT_JOBS (if the machine has headroom) or add more workers.
```bash
# Current setting
MAX_CONCURRENT_JOBS=5

# Increase if CPU/GPU can handle more parallel work
MAX_CONCURRENT_JOBS=10
```
Do not increase MAX_CONCURRENT_JOBS beyond what the hardware can support. Excessive concurrency on a CPU-only machine slows all jobs down.
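As a starting point for sizing, one common heuristic ties the limit to core count. The 0.5 jobs-per-core ratio below is an assumption (each job is itself multi-threaded), not a documented Clipzy value; tune it against observed throughput:

```python
import os

def suggested_max_concurrent_jobs(jobs_per_core: float = 0.5) -> int:
    """Rough starting point for MAX_CONCURRENT_JOBS on a CPU-only
    worker. Using a fraction of the core count avoids oversubscribing
    the CPU with parallel multi-threaded jobs (ratio is an assumption)."""
    cores = os.cpu_count() or 1
    return max(1, int(cores * jobs_per_core))
```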
3. Add more workers

The most effective way to reduce queue depth and increase throughput is to run additional worker processes. Each worker connects to the same Redis queue and storage backend independently.
```bash
# Run four workers in parallel
python run_worker.py --worker-id worker-1 &
python run_worker.py --worker-id worker-2 &
python run_worker.py --worker-id worker-3 &
python run_worker.py --worker-id worker-4 &
```
Workers can also run on separate machines, as long as they share the same Redis instance and storage backend.

Reducing file size before upload

Large files take longer to ingest, transfer to storage, and process. If your pipeline allows it, pre-process videos before uploading.

**Trim to the relevant section.** Speech recognition and visual embedding run on the full video duration. Cutting a 60-minute recording down to the 10-minute segment you care about reduces processing time proportionally.
```bash
ffmpeg -i input.mp4 -ss 00:05:00 -to 00:15:00 -c copy trimmed.mp4
```
**Reduce resolution for large files.** Clipzy processes videos at their native resolution. Downscaling a 4K video to 1080p before upload has minimal impact on result quality but significantly reduces ingestion and motion analysis time.
```bash
ffmpeg -i input_4k.mp4 -vf scale=1920:1080 -c:v libx264 -crf 18 output_1080p.mp4
```
Do not downsample below 720p. Lower resolutions reduce the quality of visual embeddings and shot detection.

Memory issues with large videos

Very large video files can exhaust worker memory during processing. Symptoms include workers crashing mid-job or jobs failing with an out-of-memory error. To reduce memory pressure:
  1. Reduce resolution to 1080p before uploading (see above).
  2. Ensure each worker machine has at least 8 GB of RAM for CPU deployments and 16 GB for GPU deployments.
  3. Reduce MAX_CONCURRENT_JOBS on machines handling very large files, so fewer jobs compete for memory simultaneously.
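Point 3 can be made concrete with a RAM-based cap on concurrency. The 4 GB-per-job footprint below is an assumption for large-file workloads, not a measured Clipzy figure; profile your own peak usage on representative files:

```python
def memory_capped_jobs(ram_gb: float, per_job_gb: float = 4.0) -> int:
    """Cap MAX_CONCURRENT_JOBS by available RAM so concurrent jobs do
    not exhaust memory (per-job footprint is an assumption; measure
    peak usage on your own files before relying on it)."""
    return max(1, int(ram_gb // per_job_gb))
```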

GPU vs CPU

GPU hardware reduces total processing time by 3–5x, primarily through faster speech recognition and visual embedding. If you process a consistent volume of videos and CPU times are too slow, adding a GPU worker is the highest-leverage infrastructure change you can make. You do not need to replace all workers with GPU instances. A single GPU worker can handle the computationally intensive stages while CPU workers handle lighter workloads in parallel.