
Expected processing times

Processing time depends on video length and whether your workers have GPU acceleration. The table below shows end-to-end job duration estimates (not including queue wait time):
| Video length | CPU       | GPU       |
| ------------ | --------- | --------- |
| 1 minute     | 35–65s    | 12–18s    |
| 2 minutes    | 73–124s   | 24–36s    |
| 5 minutes    | 3–5 min   | 60–90s    |
| 15 minutes   | 9–15 min  | 3–5 min   |
| 60 minutes   | 35–60 min | 12–20 min |
These estimates are for typical talking-head or interview-style content at 1080p. Videos with dense motion, overlapping speech, or multiple audio tracks may take longer.
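Because the estimates scale roughly linearly with video length, they can be folded into a small planning helper. The per-minute rates below are read off the table above and are approximations for typical content, not guarantees:

```python
def estimate_processing_seconds(minutes: float, gpu: bool = False) -> tuple[float, float]:
    """Rough (low, high) end-to-end estimate in seconds, assuming the
    near-linear per-minute rates implied by the table above (an
    approximation; real jobs vary with content complexity)."""
    # Seconds of processing per minute of video (assumed rates)
    low_rate, high_rate = (12, 20) if gpu else (35, 62)
    return minutes * low_rate, minutes * high_rate
```

For example, `estimate_processing_seconds(15)` gives roughly the 9–15 minute CPU range shown in the table. Remember these figures exclude queue wait time.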

Processing stage breakdown

For a 2-minute video, the pipeline spends time in each stage as follows:
| Stage              | CPU     | GPU    |
| ------------------ | ------- | ------ |
| Video ingestion    | 2–3s    | 2–3s   |
| Shot detection     | 5–8s    | 5–8s   |
| Motion analysis    | 10–15s  | 2–4s   |
| Audio analysis     | 3–5s    | 3–5s   |
| Speech recognition | 30–60s  | 5–10s  |
| Visual embeddings  | 20–30s  | 5–10s  |
| Color analysis     | 2–3s    | 2–3s   |
| **Total**          | 73–124s | 24–36s |
Speech recognition and visual embeddings together account for roughly 70% of total CPU processing time. If you cannot add GPU hardware, optimizing video length before submission has the largest impact on these two stages.
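The 70% figure can be checked directly from the midpoints of the CPU column above:

```python
# Midpoints (seconds) of the CPU stage ranges for a 2-minute video,
# taken from the stage breakdown table above
cpu_stages = {
    "video_ingestion": 2.5, "shot_detection": 6.5, "motion_analysis": 12.5,
    "audio_analysis": 4.0, "speech_recognition": 45.0,
    "visual_embeddings": 25.0, "color_analysis": 2.5,
}
total = sum(cpu_stages.values())  # ~98s
heavy = cpu_stages["speech_recognition"] + cpu_stages["visual_embeddings"]  # 70s
share = heavy / total  # roughly 0.71
```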

Diagnosing slow jobs

1. Check the job queue depth

Use GET /api/v1/jobs to see the total job count, then compare how many jobs are in queued status with how many are in processing status to gauge queue depth:
```bash
curl "http://localhost:8000/api/v1/jobs?page=1&page_size=1"
```

```json
{
  "jobs": [...],
  "total": 17,
  "page": 1,
  "page_size": 1
}
```
A growing total count — especially when most jobs are in queued status — means jobs are waiting for a worker to pick them up. This is a capacity problem, not a per-job performance problem.
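To automate this check, a small helper can tally job statuses from the response body. The snippet assumes each object in the `jobs` array exposes a `status` field with values like `queued` and `processing` (a hypothetical field name; check your deployment's job schema):

```python
from collections import Counter

def queue_summary(jobs: list[dict]) -> Counter:
    """Tally jobs by status. Assumes each job object in the /api/v1/jobs
    response carries a "status" field such as "queued" or "processing"
    (field name is an assumption; verify against your job schema)."""
    return Counter(job.get("status", "unknown") for job in jobs)

# Example against a mocked response body:
response = {"jobs": [{"status": "queued"}, {"status": "queued"},
                     {"status": "processing"}], "total": 3}
counts = queue_summary(response["jobs"])
```

In practice you would fetch all pages of `/api/v1/jobs` and feed the combined `jobs` list to `queue_summary`.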
2. Check MAX_CONCURRENT_JOBS

If queued is high and processing is at the MAX_CONCURRENT_JOBS limit, your workers are at capacity. You can either increase MAX_CONCURRENT_JOBS (if the machine has headroom) or add more workers.
```bash
# Current setting
MAX_CONCURRENT_JOBS=5

# Increase if CPU/GPU can handle more parallel work
MAX_CONCURRENT_JOBS=10
```
Do not increase MAX_CONCURRENT_JOBS beyond what the hardware can support. Excessive concurrency on a CPU-only machine slows all jobs down.
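As a starting point for sizing, one common heuristic ties the limit to core count. The 0.5 jobs-per-core ratio below is an assumption (each job is itself multi-threaded), not a documented Clipzy value; tune it against observed throughput:

```python
import os

def suggested_max_concurrent_jobs(jobs_per_core: float = 0.5) -> int:
    """Rough starting point for MAX_CONCURRENT_JOBS on a CPU-only
    worker. Using a fraction of the core count avoids oversubscribing
    the CPU with parallel multi-threaded jobs (ratio is an assumption)."""
    cores = os.cpu_count() or 1
    return max(1, int(cores * jobs_per_core))
```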
3. Add more workers

The most effective way to reduce queue depth and increase throughput is to run additional worker processes. Each worker connects to the same Redis queue and storage backend independently.
```bash
# Run four workers in parallel
python run_worker.py --worker-id worker-1 &
python run_worker.py --worker-id worker-2 &
python run_worker.py --worker-id worker-3 &
python run_worker.py --worker-id worker-4 &
```
Workers can also run on separate machines, as long as they share the same Redis instance and storage backend.

Reducing file size before upload

Large files take longer to ingest, transfer to storage, and process. If your pipeline allows it, pre-process videos before uploading.

**Trim to the relevant section.** Speech recognition and visual embedding run on the full video duration. Cutting a 60-minute recording down to the 10-minute segment you care about reduces processing time proportionally.
```bash
ffmpeg -i input.mp4 -ss 00:05:00 -to 00:15:00 -c copy trimmed.mp4
```
**Reduce resolution for large files.** Clipzy processes videos at their native resolution. Downscaling a 4K video to 1080p before upload has minimal impact on result quality but significantly reduces ingestion and motion analysis time.
```bash
ffmpeg -i input_4k.mp4 -vf scale=1920:1080 -c:v libx264 -crf 18 output_1080p.mp4
```
Do not downsample below 720p. Lower resolutions reduce the quality of visual embeddings and shot detection.

Memory issues with large videos

Very large video files can exhaust worker memory during processing. Symptoms include workers crashing mid-job or jobs failing with an out-of-memory error. To reduce memory pressure:
  1. Reduce resolution to 1080p before uploading (see above).
  2. Ensure each worker machine has at least 8 GB of RAM for CPU deployments and 16 GB for GPU deployments.
  3. Reduce MAX_CONCURRENT_JOBS on machines handling very large files, so fewer jobs compete for memory simultaneously.
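Point 3 can be made concrete with a RAM-based cap on concurrency. The 4 GB-per-job footprint below is an assumption for large-file workloads, not a measured Clipzy figure; profile your own peak usage on representative files:

```python
def memory_capped_jobs(ram_gb: float, per_job_gb: float = 4.0) -> int:
    """Cap MAX_CONCURRENT_JOBS by available RAM so concurrent jobs do
    not exhaust memory (per-job footprint is an assumption; measure
    peak usage on your own files before relying on it)."""
    return max(1, int(ram_gb // per_job_gb))
```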

GPU vs CPU

GPU hardware reduces total processing time by 3–5x, primarily through faster speech recognition and visual embedding. If you process a consistent volume of videos and CPU times are too slow, adding a GPU worker is the highest-leverage infrastructure change you can make. You do not need to replace all workers with GPU instances. A single GPU worker can handle the computationally intensive stages while CPU workers handle lighter workloads in parallel.