Clipzy processes videos through a multi-stage pipeline that extracts scene, audio, speech, and visual data. Understanding the processing configuration helps you set appropriate limits and scale to meet your throughput requirements.
Concurrency
MAX_CONCURRENT_JOBS controls how many jobs a single worker processes simultaneously.
Setting this too high on an underpowered machine causes all jobs to compete for CPU or GPU, increasing the total time for each individual job. A good starting point is:
- CPU deployment: set MAX_CONCURRENT_JOBS to the number of physical cores divided by 2
- GPU deployment: set MAX_CONCURRENT_JOBS to 1–2 per GPU, depending on VRAM
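For the CPU case, the starting point can be derived programmatically. A minimal sketch, assuming `os.cpu_count()` as a stand-in for your machine's core count (it reports logical cores, so halving it approximates physical cores on hyper-threaded hardware):

```python
import os

# Logical core count; halving approximates physical cores on
# hyper-threaded machines. Fall back to 1 if the count is unknown.
logical_cores = os.cpu_count() or 1
cpu_jobs = max(1, logical_cores // 2)

print(f"MAX_CONCURRENT_JOBS={cpu_jobs}")
```

Treat this as a baseline and adjust after observing real per-job CPU usage.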
Scaling with multiple workers
Running multiple worker processes is the primary way to increase throughput. Each worker picks up jobs independently from the Redis queue. You can run workers on the same machine or across multiple machines, as long as all workers share the same Redis instance and storage backend.
# Start two workers on the same machine (in separate terminals)
python run_worker.py --worker-id worker-1 &
python run_worker.py --worker-id worker-2 &
With two workers, each handling up to MAX_CONCURRENT_JOBS jobs at once, your effective throughput doubles.
Job timeout
PROCESSING_TIMEOUT_SECONDS sets the maximum wall-clock time a job may run before Clipzy marks it as failed.
PROCESSING_TIMEOUT_SECONDS=3600
If a job exceeds this limit, its status becomes failed with error code JOB_TIMEOUT. Jobs do not automatically retry on timeout. You must re-submit them manually.
Setting PROCESSING_TIMEOUT_SECONDS too low will cause legitimate long videos to fail. A 60-minute video processed on CPU may take 20–30 minutes. Ensure your timeout accommodates your longest expected input.
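One way to size the timeout is to work backward from the longest input you expect. The sketch below uses the worst-case CPU figure quoted above (a 60-minute video taking up to roughly 30 minutes, i.e. about 30 seconds of processing per video-minute) plus a 2x safety factor; the constants are assumptions to replace with your own measurements:

```python
MAX_VIDEO_MINUTES = 60        # longest input you expect (assumption)
CPU_SECONDS_PER_MINUTE = 30   # worst-case CPU rate from the estimates above
HEADROOM = 2                  # safety factor for load spikes (assumption)

timeout = MAX_VIDEO_MINUTES * CPU_SECONDS_PER_MINUTE * HEADROOM
print(f"PROCESSING_TIMEOUT_SECONDS={timeout}")  # PROCESSING_TIMEOUT_SECONDS=3600
```

With these numbers the result matches the 3600-second default shown above.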
Processing stages
Clipzy runs these stages sequentially for each video. The table shows approximate durations for a 2-minute video on CPU and GPU hardware:
| Stage | CPU | GPU |
|---|---|---|
| Video ingestion | 2–3s | 2–3s |
| Shot detection | 5–8s | 5–8s |
| Motion analysis | 10–15s | 2–4s |
| Audio analysis | 3–5s | 3–5s |
| Speech recognition | 30–60s | 5–10s |
| Visual embeddings | 20–30s | 5–10s |
| Color analysis | 2–3s | 2–3s |
| Total | 72–124s | 24–43s |
Speech recognition and visual embeddings account for the majority of processing time on CPU. A GPU deployment reduces both stages dramatically and is recommended for any production workload with consistent video volume.
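Because the stages run sequentially, the end-to-end range is the sum of the per-stage ranges. A quick check of the figures above:

```python
# (low, high) durations in seconds for each stage, taken from the table above
cpu = [(2, 3), (5, 8), (10, 15), (3, 5), (30, 60), (20, 30), (2, 3)]
gpu = [(2, 3), (5, 8), (2, 4), (3, 5), (5, 10), (5, 10), (2, 3)]

cpu_low, cpu_high = sum(lo for lo, _ in cpu), sum(hi for _, hi in cpu)
gpu_low, gpu_high = sum(lo for lo, _ in gpu), sum(hi for _, hi in gpu)

print(f"CPU total: {cpu_low}-{cpu_high}s")  # CPU total: 72-124s
print(f"GPU total: {gpu_low}-{gpu_high}s")  # GPU total: 24-43s
```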
Expected processing times by video length
The table below shows rough estimates for end-to-end job duration (not including queue wait time):
| Video length | CPU | GPU |
|---|---|---|
| 1 minute | 35–65s | 12–18s |
| 2 minutes | 72–124s | 24–43s |
| 5 minutes | 3–5 min | 60–90s |
| 15 minutes | 9–15 min | 3–5 min |
| 60 minutes | 35–60 min | 12–20 min |
These are estimates for typical talking-head or interview-style content. Videos with dense motion, multiple speakers, or complex audio may take longer.
Checking queue depth
To understand how many jobs are waiting, use GET /api/v1/jobs with page_size=1 and read the total field. Compare that count against the number of jobs currently in processing status to gauge queue depth.
curl "http://localhost:8000/api/v1/jobs?page=1&page_size=1"
{
  "jobs": [...],
  "total": 14,
  "page": 1,
  "page_size": 1
}
If the total count is consistently growing, you need more workers or a higher MAX_CONCURRENT_JOBS value.
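For scripting this check, the response can be parsed in Python. The snippet below stubs the API response with the shape shown above so it runs standalone; against a live deployment you would fetch it instead (e.g. with `urllib.request`), and the processing count here is a stub — how you obtain it depends on what filters your deployment's /api/v1/jobs endpoint supports:

```python
import json

# Stubbed response matching the shape shown above; with a live server,
# replace with a real fetch of /api/v1/jobs?page=1&page_size=1
response = json.loads('{"jobs": [], "total": 14, "page": 1, "page_size": 1}')

total = response["total"]
processing = 4  # stub: jobs currently in "processing" status

print(f"total: {total}, processing: {processing}, backlog: {total - processing}")
```

Tracking the backlog figure over time (rather than a single reading) is what tells you whether workers are keeping up.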