Clipzy processes videos through a multi-stage pipeline that extracts scene, audio, speech, and visual data. Understanding the processing configuration helps you set appropriate limits and scale to meet your throughput requirements.

Concurrency

MAX_CONCURRENT_JOBS controls how many jobs a single worker processes simultaneously.
MAX_CONCURRENT_JOBS=5
Setting this too high on an underpowered machine causes all jobs to compete for CPU or GPU, increasing the total time for each individual job. A good starting point is:
  • CPU deployment: set MAX_CONCURRENT_JOBS to the number of physical cores divided by 2
  • GPU deployment: set MAX_CONCURRENT_JOBS to 1–2 per GPU, depending on VRAM
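The starting points above can be sketched as a small helper. This is a rough sizing heuristic, not part of Clipzy itself: the function name is hypothetical, and since Python's `os.cpu_count()` reports logical rather than physical cores, it assumes hyper-threading and halves twice for CPU deployments.

```python
import os

def suggested_max_concurrent_jobs(gpu_count: int = 0, jobs_per_gpu: int = 1) -> int:
    """Rough starting value for MAX_CONCURRENT_JOBS (hypothetical helper).

    GPU deployment: 1-2 jobs per GPU depending on VRAM.
    CPU deployment: physical cores / 2; os.cpu_count() returns logical
    cores, so divide by 4 as a proxy when hyper-threading is enabled.
    """
    if gpu_count > 0:
        return gpu_count * jobs_per_gpu
    logical_cores = os.cpu_count() or 2
    return max(1, logical_cores // 4)
```

For example, a machine with two GPUs and enough VRAM for two jobs each would start at `suggested_max_concurrent_jobs(gpu_count=2, jobs_per_gpu=2)`, i.e. 4, then tune from there based on observed per-job latency.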

Scaling with multiple workers

Running multiple worker processes is the primary way to increase throughput. Each worker picks up jobs independently from the Redis queue. You can run workers on the same machine or across multiple machines, as long as all workers share the same Redis instance and storage backend.
# Start two workers on the same machine (backgrounded with &)
python run_worker.py --worker-id worker-1 &
python run_worker.py --worker-id worker-2 &
With two workers each handling up to MAX_CONCURRENT_JOBS at once, your effective throughput doubles.
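The relationship between workers and in-flight capacity is a simple product. The helper below is a hypothetical illustration of that upper bound, not a Clipzy API; actual throughput also depends on per-job duration and available CPU/GPU.

```python
def max_in_flight_jobs(workers: int, max_concurrent_jobs: int) -> int:
    """Upper bound on jobs processed simultaneously across the deployment.

    Each worker pulls jobs independently from the shared Redis queue, so
    capacity scales linearly with worker count.
    """
    return workers * max_concurrent_jobs

# Two workers with MAX_CONCURRENT_JOBS=5 can have up to 10 jobs in flight
```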

Job timeout

PROCESSING_TIMEOUT_SECONDS sets the maximum wall-clock time a job may run before Clipzy marks it as failed.
PROCESSING_TIMEOUT_SECONDS=3600
If a job exceeds this limit, its status becomes failed with error code JOB_TIMEOUT. Jobs do not automatically retry on timeout. You must re-submit them manually.
Setting PROCESSING_TIMEOUT_SECONDS too low will cause legitimate long videos to fail. A 60-minute video processed on CPU may take 20–30 minutes. Ensure your timeout accommodates your longest expected input.
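One way to pick a timeout is to work backward from your longest expected input. The sketch below is a hypothetical helper that uses the CPU figure quoted above (a 60-minute video taking roughly 30 minutes, i.e. ~0.5 processing minutes per video minute) and doubles it as a safety margin; adjust the ratio for your hardware.

```python
def suggested_timeout_seconds(longest_video_minutes: float,
                              minutes_per_video_minute: float = 0.5,
                              safety_factor: float = 2.0) -> int:
    """Estimate a PROCESSING_TIMEOUT_SECONDS value (hypothetical helper).

    The 0.5 default matches the rough CPU estimate of ~30 minutes of
    processing for a 60-minute video; the safety factor absorbs videos
    with dense motion, multiple speakers, or complex audio.
    """
    return int(longest_video_minutes * minutes_per_video_minute
               * safety_factor * 60)
```

With these defaults, a 60-minute longest input yields 3600 seconds, matching the example configuration above.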

Processing stages

Clipzy runs these stages sequentially for each video. The table shows approximate durations for a 2-minute video on CPU and GPU hardware:
Stage               CPU       GPU
Video ingestion     2–3s      2–3s
Shot detection      5–8s      5–8s
Motion analysis     10–15s    2–4s
Audio analysis      3–5s      3–5s
Speech recognition  30–60s    5–10s
Visual embeddings   20–30s    5–10s
Color analysis      2–3s      2–3s
Total               73–124s   24–36s
Speech recognition and visual embeddings account for the majority of processing time on CPU. A GPU deployment reduces both stages dramatically and is recommended for any production workload with consistent video volume.

Expected processing times by video length

The table below shows rough estimates for end-to-end job duration (not including queue wait time):
Video length   CPU         GPU
1 minute       35–65s      12–18s
2 minutes      73–124s     24–36s
5 minutes      3–5 min     60–90s
15 minutes     9–15 min    3–5 min
60 minutes     35–60 min   12–20 min
These are estimates for typical talking-head or interview-style content. Videos with dense motion, multiple speakers, or complex audio may take longer.

Checking queue depth

To understand how many jobs are waiting, use GET /api/v1/jobs with page_size=1 and read the total field. Compare that count against the number of jobs currently in processing status to gauge queue depth.
curl "http://localhost:8000/api/v1/jobs?page=1&page_size=1"
{
  "jobs": [...],
  "total": 14,
  "page": 1,
  "page_size": 1
}
If the total count is consistently growing, you need more workers or a higher MAX_CONCURRENT_JOBS value.
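The comparison described above can be done in a few lines once you have the API response. This sketch only computes the arithmetic on an already-parsed response body; whether `total` counts all known jobs or only a filtered subset is an assumption here, so verify against your deployment's API behavior.

```python
def queue_backlog(jobs_response: dict, processing_count: int) -> int:
    """Waiting jobs = total reported by GET /api/v1/jobs minus the number
    currently in `processing` status.

    Assumes `total` covers all outstanding jobs; confirm this against
    your deployment before alerting on the result.
    """
    return jobs_response["total"] - processing_count

# With the example response above (total=14) and, say, 5 jobs currently
# processing, 9 jobs are waiting in the queue.
```

Sampling this value on an interval and alerting when it trends upward is a lightweight way to decide when to add workers.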