open source — video intelligence

Real-time CCTV
that actually thinks.

Vigilens monitors live RTSP streams continuously, detects events from natural-language queries, and makes every minute of footage searchable after the fact.

stack Python 3.11+

queue Redis Streams

memory SQLite

llm Gemini

The camera nobody watches

Why existing CCTV systems stay passive

A few months ago I noticed something strange about surveillance systems.

Modern cameras capture incredibly detailed footage, yet the way we interact with them hasn't changed much. A camera records. Something happens. A person opens the footage later and starts searching.

The camera saw everything. The problem is that nobody was actually watching.

That observation eventually became Vigilens - an experiment in turning live video streams into something that can understand, remember, and answer questions about what it sees.

rtsp_monitoring Watches live streams without interruption, chunking and screening frames on the fly.

trigger_queries Define events in plain English - "person falling", "fire" - with per-query confidence thresholds.

webhook_alerts Verified detections fire webhooks with clip context and timestamps, with automatic retry.

searchable_memory SQLite stores all events and activity timelines. Query past footage in natural language, any time.

activity_timeline Periodic scene summaries build a continuous, compactable record of what happened on each camera.

Architecture

Teaching cameras what matters

Most footage is boring. Hallways stay empty. Cars stay parked. Nothing happens.

Sending every second of video to a frontier multimodal model would be wasteful. The system therefore starts with a lightweight screening stage - a Qwen3-VL reranker that filters frames before anything expensive runs. Only relevant chunks make it to Gemini for structured verification.
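
The gating idea above can be sketched as a cheap filter in front of an expensive call. This is a minimal illustration rather than the project's actual code: `Chunk`, `screen_score`, `verify`, and the 0.5 threshold are all assumptions standing in for the Qwen3-VL reranker and the Gemini verification step.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Chunk:
    """Hypothetical stand-in for a short segment of video."""
    chunk_id: str
    description: str  # placeholder for the actual frames


def screen_then_verify(
    chunks: Iterable[Chunk],
    query: str,
    screen_score: Callable[[Chunk, str], float],  # cheap reranker (assumed interface)
    verify: Callable[[Chunk, str], bool],         # expensive LLM check (assumed interface)
    threshold: float = 0.5,                       # assumed per-query threshold
) -> Iterator[Chunk]:
    """Yield only chunks that pass the cheap screen and then the expensive check."""
    for chunk in chunks:
        if screen_score(chunk, query) < threshold:
            continue  # most footage stops here and never reaches the LLM
        if verify(chunk, query):
            yield chunk
```

The point of the ordering is cost: `verify` is only ever called on chunks the screener already scored above the threshold.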

Three purpose-built pipelines coordinate through Redis Streams consumer groups. Object storage holds clips; SQLite holds all queryable state.

system architecture vigilens-arch.png

pipeline view vigilens-pipeline.png

01 Event detection pipeline

1. POST /streams/submit enqueues the stream job

2. Stream worker chunks RTSP and screens with Qwen3-VL reranker

3. Relevant chunks uploaded and pushed to llm.jobs

4. LLM worker verifies events via Gemini structured output

5. Verified detections persist to events table with dedupe guard

6. Webhooks fire with retry
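
The last step, webhook delivery with retry, might look something like the sketch below. The retry policy is an assumption (the text only says webhooks retry): exponential backoff, four attempts, and a `send` callable that performs one POST and returns the HTTP status code.

```python
import time
from typing import Callable


def deliver_with_retry(
    send: Callable[[], int],                     # performs one POST, returns HTTP status (assumed)
    max_attempts: int = 4,                       # assumed limit
    base_delay: float = 0.5,                     # assumed starting delay in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try a webhook delivery, backing off exponentially between failed attempts."""
    for attempt in range(max_attempts):
        try:
            status = send()
            if 200 <= status < 300:
                return True
        except OSError:
            pass  # network errors count as a failed attempt
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return False
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.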

02 Activity timeline pipeline

1. Stream worker samples frames in-memory - no dump files

2. Clip builder creates short clips and uploads to MinIO

3. Scene jobs enqueued to scene.jobs

4. Scene worker summarizes each clip

5. Summaries persist to scene_timeline with compaction support
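
What "compaction" means here is not spelled out, so the following is only one plausible reading: consecutive timeline entries from the same camera with an identical summary get merged into a single longer span. The `SceneEntry` fields and the merge rule are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SceneEntry:
    """Hypothetical shape of one scene_timeline row."""
    camera_id: str
    start_ts: float
    end_ts: float
    summary: str


def compact(entries: List[SceneEntry]) -> List[SceneEntry]:
    """Merge consecutive entries from one camera that share the same summary."""
    merged: List[SceneEntry] = []
    for entry in sorted(entries, key=lambda e: (e.camera_id, e.start_ts)):
        last = merged[-1] if merged else None
        if last and last.camera_id == entry.camera_id and last.summary == entry.summary:
            last.end_ts = max(last.end_ts, entry.end_ts)  # extend the existing span
        else:
            # copy so the caller's entries are never mutated
            merged.append(SceneEntry(entry.camera_id, entry.start_ts, entry.end_ts, entry.summary))
    return merged
```

A long stretch of "empty hallway" summaries would collapse into one row, which is where most of the savings would come from.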

03 Query pipeline

1. POST /query accepts natural language + optional camera filter

2. Query router selects event or activity path automatically

3. SQLite returns matching rows with clip references

4. API returns normalized results with timestamp, summary, clip URL, confidence
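
How the router decides is not described, so here is a deliberately simple stand-in: treat a question as an event lookup when it mentions a known trigger term, and fall back to the activity timeline otherwise. `route_query` and its keyword rule are hypothetical; the real router may well use a model.

```python
from typing import Iterable


def route_query(question: str, trigger_terms: Iterable[str]) -> str:
    """Return "event" when the question mentions a known trigger term, else "activity"."""
    text = question.lower()
    if any(term.lower() in text for term in trigger_terms):
        return "event"
    return "activity"
```

For example, with triggers "fire" and "person falling", a question like "Was there a fire last night?" would take the event path, while "What happened in the lobby this morning?" would go to the timeline.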

Demo

What the system actually looks like

Three views: system configuration, live streaming with alerts, and historical query.

Vigilens system setup - configure streams, triggers, and thresholds

system configuration demo_connect.png

Vigilens live stream view with real-time event detection

live stream + event detection demo_stream.png

Vigilens natural language query over video history

natural language query demo_query.png

Memory

Making video queryable

Detection is only half the problem. A system that alerts but can't answer questions about the past isn't much better than a passive archive.

Every verified event and every scene summary lands in SQLite with a timestamp, camera ID, clip reference, and confidence score. The query endpoint accepts plain English and routes automatically - to the event store for specific incidents, to the activity timeline for broader questions about what was happening.
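
To make the storage side concrete, here is a small sketch using Python's built-in sqlite3 module. The column names, the made-up sample rows, and the `search_events` helper are assumptions; only the general shape (timestamp, camera ID, clip reference, confidence) comes from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    """
    CREATE TABLE events (
        id         INTEGER PRIMARY KEY,
        camera_id  TEXT NOT NULL,
        ts         TEXT NOT NULL,      -- ISO 8601 timestamp
        summary    TEXT NOT NULL,
        clip_url   TEXT NOT NULL,
        confidence REAL NOT NULL
    )
    """
)
# Illustrative rows only, not real detections.
conn.executemany(
    "INSERT INTO events (camera_id, ts, summary, clip_url, confidence) VALUES (?, ?, ?, ?, ?)",
    [
        ("cam-1", "2024-01-01T10:00:00", "person falling near entrance", "s3://clips/a.mp4", 0.91),
        ("cam-2", "2024-01-01T11:30:00", "smoke and fire in storage room", "s3://clips/b.mp4", 0.84),
    ],
)


def search_events(conn, term, camera_id=None):
    """Find events whose summary contains `term`, optionally limited to one camera."""
    sql = "SELECT ts, summary, clip_url, confidence FROM events WHERE summary LIKE ?"
    params = [f"%{term}%"]
    if camera_id is not None:
        sql += " AND camera_id = ?"
        params.append(camera_id)
    sql += " ORDER BY ts DESC"
    return [dict(row) for row in conn.execute(sql, params)]
```

A plain LIKE match is the simplest possible lookup; the natural-language layer would sit in front of a query like this and decide what to search for.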

The result is footage you can actually ask things.

Reliability

Built to stay up

Production deployments need to survive worker crashes, duplicate events, and disk pressure. Each of these is addressed explicitly.

Consumer group isolation

Redis Streams consumer groups deliver each job to a single worker within a group, so two workers never process the same job at the same time.

Crash recovery

If a worker crashes mid-job, its unacknowledged messages stay pending. Another worker reclaims them via XAUTOCLAIM once they have been idle long enough.
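
A reclaim loop could look roughly like this. It assumes `client` is a redis-py `redis.Redis` instance (the function only needs its `xautoclaim` method), and the 60-second idle threshold and batch size are placeholders, not the project's settings.

```python
def reclaim_stale(client, stream, group, consumer, min_idle_ms=60_000, count=10):
    """Claim messages left pending by dead consumers; return them as (id, fields) pairs."""
    claimed = []
    cursor = "0-0"
    while True:
        reply = client.xautoclaim(
            stream, group, consumer, min_idle_time=min_idle_ms, start_id=cursor, count=count
        )
        # reply is [next_cursor, messages, ...]; Redis 7 appends a list of deleted IDs
        cursor, messages = reply[0], reply[1]
        claimed.extend(messages)
        if cursor in ("0-0", b"0-0"):
            break  # a "0-0" cursor means the pending list has been fully scanned
    return claimed
```

The caller would then process each reclaimed message and XACK it, exactly as it does for freshly read ones.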

Idempotency guard

A dedupe_key on every event row prevents duplicate inserts across retries.
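
One common way to get this behavior in SQLite is a UNIQUE constraint plus INSERT OR IGNORE. The sketch below assumes the key is a hash of stream ID, query, and clip reference; the actual fields that feed the project's dedupe_key are not documented here, and the table layout is simplified.

```python
import hashlib
import sqlite3


def make_dedupe_key(stream_id: str, query: str, clip_ref: str) -> str:
    """Derive a stable key from the fields that identify one detection (assumed fields)."""
    raw = "|".join((stream_id, query, clip_ref))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def insert_event(conn: sqlite3.Connection, stream_id: str, query: str, clip_ref: str) -> bool:
    """Insert an event once; return False when the same detection was already stored."""
    key = make_dedupe_key(stream_id, query, clip_ref)
    cur = conn.execute(
        "INSERT OR IGNORE INTO events (dedupe_key, stream_id, query, clip_ref) VALUES (?, ?, ?, ?)",
        (key, stream_id, query, clip_ref),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows changed means the UNIQUE constraint absorbed a retry


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    " id INTEGER PRIMARY KEY,"
    " dedupe_key TEXT NOT NULL UNIQUE,"
    " stream_id TEXT NOT NULL,"
    " query TEXT NOT NULL,"
    " clip_ref TEXT NOT NULL)"
)
```

Because the database enforces uniqueness, a retried job can call `insert_event` again without any coordination between workers.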

Disk control

In-memory frame sampling for the scene branch - no frame dump files, controlled disk growth.

Observability

Understanding what the system is doing

Every API call, worker job, and LLM invocation is captured end-to-end. Large payloads like raw video matrices are dynamically filtered from the UI. Tracing safely no-ops when credentials are absent.

FastAPI endpoints

Real-time insights into submit_stream and query executions.

Workers

End-to-end tracing of stream consumption and LLM processing jobs.

LLM calls

Prompts, multimodal verification latency, and output token counts.

Event pipeline

Deduplication metrics and database persistence timing.

VIGILENS_OPIK_ENABLED=true
OPIK_PROJECT_NAME="vigilens"
OPIK_WORKSPACE="your-workspace-name"
OPIK_API_KEY="your-opik-api-key"
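
The "no-op when credentials are absent" behavior can be approximated with a small wrapper like the one below. `maybe_track` is a hypothetical helper, not part of Vigilens or Opik; it assumes the `opik` package exposes its `track` decorator and otherwise returns the function untouched.

```python
import os


def maybe_track(fn):
    """Wrap `fn` with Opik tracing when enabled and configured; otherwise return it unchanged."""
    enabled = os.environ.get("VIGILENS_OPIK_ENABLED", "").lower() == "true"
    if not enabled or not os.environ.get("OPIK_API_KEY"):
        return fn  # tracing is off or credentials are missing: no-op
    try:
        import opik  # imported lazily so the app still runs without the package installed
    except ImportError:
        return fn
    return opik.track(fn)
```

Applied as a decorator on endpoints and worker functions, this keeps local runs free of any tracing dependency.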

Implementation Notes

Running it yourself

Prerequisites: Python 3.11+, FFmpeg on PATH, Redis, an S3-compatible store (MinIO), a Screener endpoint (Qwen3-VL, deployable on Modal), and a Gemini API key.

# 1. install
python -m venv env && source env/bin/activate
pip install -e . && pip install -e ".[test]"

# 2. configure
cp .env.example .env
# set REDIS_URL, S3_ENDPOINT, S3_BUCKET,
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
# SCREENER_BASE_URL, SCREENER_API_KEY, LLM_API_KEY

# 3. init db
make db-init

# 4. run services (separate terminals)
make api
make stream-worker
make llm-worker
make scene-worker

Endpoints

POST /streams/submit

Creates and enqueues a stream processing job with trigger queries and webhook targets.
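
A request to this endpoint might be built as follows. The JSON field names (`rtsp_url`, `triggers`, `query`, `threshold`, `webhooks`) are guesses for illustration only, so check the API schema before relying on them; the helper builds the request without sending it.

```python
import json
import urllib.request


def build_submit_request(
    base_url: str, rtsp_url: str, triggers: dict, webhooks: list
) -> urllib.request.Request:
    """Build (but do not send) a POST /streams/submit request. Field names are assumed."""
    payload = {
        "rtsp_url": rtsp_url,
        "triggers": [{"query": q, "threshold": t} for q, t in triggers.items()],
        "webhooks": webhooks,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/streams/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a matter of passing the result to `urllib.request.urlopen` once the API is running.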

GET /streams/{stream_id}

Returns stream lifecycle state from SQLite.