open source — video intelligence

Real-time CCTV
that actually thinks.

Vigilens monitors live RTSP streams continuously, detects events from natural-language queries, and makes every minute of footage searchable after the fact.

stack: Python 3.11+
queue: Redis Streams
memory: SQLite
llm: Gemini

Why existing CCTV systems stay passive

A few months ago I noticed something strange about surveillance systems.

Modern cameras capture incredibly detailed footage, yet the way we interact with them hasn't changed much. A camera records. Something happens. A person opens the footage later and starts searching.

The camera saw everything. The problem is that nobody was actually watching.

That observation eventually became Vigilens - an experiment in turning live video streams into something that can understand, remember, and answer questions about what it sees.

rtsp_monitoring: Watches live streams without interruption, chunking and screening frames on the fly.
trigger_queries: Define events in plain English - "person falling", "fire" - with per-query confidence thresholds.
webhook_alerts: Verified detections fire webhooks with clip context and timestamps, with automatic retry.
searchable_memory: SQLite stores all events and activity timelines. Query past footage in natural language, any time.
activity_timeline: Periodic scene summaries build a continuous, compactable record of what happened on each camera.

Teaching cameras what matters

Most footage is boring. Hallways stay empty. Cars stay parked. Nothing happens.

Sending every second of video to a frontier multimodal model would be wasteful. The system therefore starts with a lightweight screening stage - a Qwen3-VL reranker that filters frames before anything expensive runs. Only relevant chunks make it to Gemini for structured verification.
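A minimal sketch of that screening gate, assuming the reranker returns one relevance score between 0 and 1 per trigger query for each chunk. `TriggerQuery` and `select_for_verification` are illustrative names, not Vigilens APIs, and the score values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerQuery:
    query: str
    threshold: float  # minimum screener score needed to escalate


def select_for_verification(
    scores: dict[str, float], triggers: list[TriggerQuery]
) -> list[str]:
    """Return the queries whose screener score meets their own threshold.

    `scores` maps query text to the (assumed) reranker score for one chunk.
    An empty result means the chunk is dropped before any LLM call.
    """
    return [t.query for t in triggers if scores.get(t.query, 0.0) >= t.threshold]


triggers = [TriggerQuery("person falling", 0.55), TriggerQuery("fire", 0.70)]

# A quiet hallway: both scores fall below their thresholds, so nothing escalates.
quiet = select_for_verification({"person falling": 0.10, "fire": 0.02}, triggers)

# A likely fall: only the matching query is forwarded for verification.
fall = select_for_verification({"person falling": 0.81, "fire": 0.05}, triggers)
```

The point of the gate is cost: the expensive model only ever sees chunks for which at least one query cleared its threshold.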

Three purpose-built pipelines coordinate through Redis Streams consumer groups. Object storage holds clips; SQLite holds all queryable state.

[Figure: system architecture (vigilens-arch.png)]
[Figure: pipeline view (vigilens-pipeline.png)]

01 Event detection pipeline
1. POST /streams/submit enqueues the stream job
2. Stream worker chunks RTSP and screens with Qwen3-VL reranker
3. Relevant chunks uploaded and pushed to llm.jobs
4. LLM worker verifies events via Gemini structured output
5. Verified detections persist to events table with dedupe guard
6. Webhooks fire with retry
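The retry in the last step could look roughly like the sketch below. The retry policy Vigilens actually uses is not described here, so exponential backoff, the attempt count, and the `send` transport are all assumptions chosen for illustration.

```python
import time
from typing import Callable


def deliver_with_retry(
    send: Callable[[dict], int],
    payload: dict,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `send` until it returns a 2xx status or attempts run out.

    `send` is a hypothetical transport that POSTs the payload and returns
    the HTTP status code; it may raise OSError on network failure.
    """
    for attempt in range(max_attempts):
        try:
            status = send(payload)
            if 200 <= status < 300:
                return True
        except OSError:
            pass  # treat network errors like a failed attempt
        if attempt < max_attempts - 1:
            sleep(base_delay * 2**attempt)  # 0.5s, 1s, 2s, ...
    return False


# Simulated endpoint that fails twice, then accepts the delivery.
responses = iter([503, 503, 200])
delays: list[float] = []
ok = deliver_with_retry(lambda p: next(responses), {"event": "fire"}, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the function testable without waiting out real delays.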

02 Activity timeline pipeline
1. Stream worker samples frames in-memory - no dump files
2. Clip builder creates short clips and uploads to MinIO
3. Scene jobs enqueued to scene.jobs
4. Scene worker summarizes each clip
5. Summaries persist to scene_timeline with compaction support
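How compaction works is not spelled out above. One plausible reading, sketched here under that assumption, is to merge consecutive summaries from the same camera into fixed time windows. `SceneEntry`, `compact`, and the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SceneEntry:
    camera_id: str
    start_ts: int  # seconds since epoch
    end_ts: int
    summary: str


def compact(entries: list[SceneEntry], window_seconds: int) -> list[SceneEntry]:
    """Merge entries per camera into buckets of `window_seconds`.

    Summaries are simply joined here; a real system might re-summarize
    the merged text with a model instead.
    """
    buckets: dict[tuple[str, int], SceneEntry] = {}
    for e in sorted(entries, key=lambda e: (e.camera_id, e.start_ts)):
        key = (e.camera_id, e.start_ts // window_seconds)
        if key not in buckets:
            buckets[key] = SceneEntry(e.camera_id, e.start_ts, e.end_ts, e.summary)
        else:
            merged = buckets[key]
            merged.end_ts = max(merged.end_ts, e.end_ts)
            merged.summary = f"{merged.summary}; {e.summary}"
    return list(buckets.values())


rows = [
    SceneEntry("cam_1", 0, 10, "empty hallway"),
    SceneEntry("cam_1", 10, 20, "person walks past"),
    SceneEntry("cam_1", 60, 70, "door opens"),
]
compacted = compact(rows, window_seconds=60)
```

With a 60-second window the first two rows collapse into one entry and the third stays separate, so the timeline shrinks without losing its ordering.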

03 Query pipeline
1. POST /query accepts natural language + optional camera filter
2. Query router selects event or activity path automatically
3. SQLite returns matching rows with clip references
4. API returns normalized results with timestamp, summary, clip URL, confidence
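The routing rule itself is not documented in this page. As a stand-in, the sketch below uses a simple keyword heuristic to show the shape of the decision; the cue list and `route_query` are assumptions, and the real router may well use a model instead.

```python
import re

# Hypothetical cue words; the real router's decision rule is not documented here.
EVENT_CUES = {"fall", "fell", "fire", "smoke", "fight", "intruder", "alarm"}


def route_query(query: str) -> str:
    """Return "event" for incident-style questions, else "activity"."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    return "event" if words & EVENT_CUES else "activity"


incident_route = route_query("did someone fall")
broad_route = route_query("what was happening in the lobby this morning")
```

A specific incident question lands on the event path, while an open-ended question about a period of time falls through to the activity timeline.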

What the system actually looks like

Three views: system configuration, live streaming with alerts, and historical query.

[Figure: system configuration - configure streams, triggers, and thresholds (demo_connect.png)]
[Figure: live stream + event detection (demo_stream.png)]
[Figure: natural language query over video history (demo_query.png)]

Making video queryable

Detection is only half the problem. A system that alerts but can't answer questions about the past isn't much better than a passive archive.

Every verified event and every scene summary lands in SQLite with a timestamp, camera ID, clip reference, and confidence score. The query endpoint accepts plain English and routes automatically - to the event store for specific incidents, to the activity timeline for broader questions about what was happening.
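To make the storage side concrete, here is a small sketch of an event lookup with an optional camera filter. The column set follows the fields named above, but the exact schema, the `search_events` helper, the sample rows, and the use of `LIKE` matching are assumptions; how Vigilens maps natural language onto rows is not described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    """CREATE TABLE events (
        timestamp TEXT, camera_id TEXT, summary TEXT,
        clip_url TEXT, confidence REAL)"""
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [
        # Illustrative rows only.
        ("2026-04-15 12:00:00", "cam_1", "person fell near stairs", "https://example.com/a.mp4", 0.92),
        ("2026-04-15 12:05:00", "cam_2", "fire in kitchen", "https://example.com/b.mp4", 0.88),
    ],
)


def search_events(conn, term, camera_id=None, limit=10):
    """Substring search over event summaries, newest first."""
    sql = "SELECT * FROM events WHERE summary LIKE ?"
    params = [f"%{term}%"]
    if camera_id is not None:
        sql += " AND camera_id = ?"
        params.append(camera_id)
    sql += " ORDER BY timestamp DESC LIMIT ?"
    params.append(limit)
    # Tag each row with its source so event and activity results share one shape.
    return [dict(r) | {"source": "event"} for r in conn.execute(sql, params)]


hits = search_events(conn, "fell", camera_id="cam_1")
misses = search_events(conn, "fire", camera_id="cam_1")
```

Parameterized queries keep user-supplied text out of the SQL string, which matters for an endpoint that accepts free-form input.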

The result is footage you can actually ask things.

Built to stay up

Production deployments need to survive worker crashes, duplicate events, and disk pressure. Each of these is addressed explicitly.

Consumer group isolation

Within a Redis Streams consumer group, each message is delivered to only one consumer, so two workers never process the same job at the same time.

Crash recovery

Messages left unacknowledged by a crashed worker are automatically reclaimed via XAUTOCLAIM once they have been idle long enough.
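A rough sketch of the reclaim step, assuming a redis-py style client. The `xautoclaim` and `xack` calls follow redis-py's method names; the group name `llm-workers`, the consumer name, the idle timeout, and the in-memory stand-in client are assumptions added so the example runs without a Redis server.

```python
def reclaim_stale(client, stream, group, consumer, min_idle_ms=60_000, count=10):
    """Claim messages that have sat unacknowledged for at least `min_idle_ms`.

    `client` is expected to behave like a redis-py `Redis` instance.
    Returns the reclaimed (message_id, fields) pairs.
    """
    reply = client.xautoclaim(
        stream, group, consumer, min_idle_time=min_idle_ms, start_id="0-0", count=count
    )
    # reply[0] is the cursor for the next scan, reply[1] the claimed messages.
    return reply[1]


def process_and_ack(client, stream, group, messages, handler):
    """Run `handler` on each message and acknowledge only after it succeeds."""
    done = 0
    for message_id, fields in messages:
        handler(fields)  # if this raises, the message stays pending
        client.xack(stream, group, message_id)
        done += 1
    return done


class InMemoryClient:
    """Stand-in with the two redis-py methods used above, so the sketch runs offline."""

    def __init__(self, pending):
        self.pending = list(pending)  # (message_id, fields) pairs left by a dead worker
        self.acked = []

    def xautoclaim(self, name, groupname, consumername, min_idle_time, start_id="0-0", count=None):
        return ["0-0", self.pending[:count], []]

    def xack(self, name, groupname, *ids):
        self.acked.extend(ids)
        self.pending = [m for m in self.pending if m[0] not in ids]
        return len(ids)


client = InMemoryClient([("1-0", {"job": "chunk-17"})])
stale = reclaim_stale(client, "llm.jobs", "llm-workers", "worker-2")
handled = []
acked = process_and_ack(client, "llm.jobs", "llm-workers", stale, handled.append)
```

Acknowledging only after the handler returns is what makes a crash recoverable: an unacknowledged message stays in the pending list until another worker claims it.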

Idempotency guard

A dedupe_key on every event row prevents duplicate inserts across retries.
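One way to get that behavior in SQLite is a UNIQUE constraint combined with `INSERT OR IGNORE`. The sketch below assumes the key is a hash of camera, query, and chunk start time; how Vigilens actually derives `dedupe_key` is not stated here, and the helper names are hypothetical.

```python
import hashlib
import sqlite3


def make_dedupe_key(camera_id: str, query: str, chunk_start: str) -> str:
    """Derive a stable key from fields that identify one detection (assumed)."""
    raw = f"{camera_id}|{query}|{chunk_start}".encode()
    return hashlib.sha256(raw).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE events (
        dedupe_key TEXT NOT NULL UNIQUE,
        camera_id TEXT, summary TEXT, confidence REAL)"""
)


def insert_event(conn, camera_id, query, chunk_start, summary, confidence) -> bool:
    """Insert once; return False when the same detection was already stored."""
    key = make_dedupe_key(camera_id, query, chunk_start)
    cur = conn.execute(
        "INSERT OR IGNORE INTO events VALUES (?, ?, ?, ?)",
        (key, camera_id, summary, confidence),
    )
    conn.commit()
    return cur.rowcount == 1


first = insert_event(conn, "cam_1", "fire", "2026-04-15 12:00:00", "flames visible", 0.9)
retry = insert_event(conn, "cam_1", "fire", "2026-04-15 12:00:00", "flames visible", 0.9)
row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Because the database enforces uniqueness, a job that is retried after a crash can safely run the insert again without creating a second row.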

Disk control

The scene branch samples frames in memory and writes no frame dump files, which keeps disk growth under control.

Understanding what the system is doing

Every API call, worker job, and LLM invocation is traced end-to-end with Opik. Large payloads like raw video matrices are filtered out before they reach the tracing UI. When credentials are absent, tracing safely becomes a no-op.
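The no-op behavior can be sketched as a decorator that only wraps a function when tracing is configured. This is an assumption about the mechanism, not the project's actual code: `tracing_enabled` and `traced` are hypothetical helpers, and the sketch relies on Opik's `track` decorator being usable as a plain function wrapper.

```python
import os
from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable)


def tracing_enabled(env=os.environ) -> bool:
    """True only when tracing is switched on and an API key is present."""
    return env.get("VIGILENS_OPIK_ENABLED", "").lower() == "true" and bool(
        env.get("OPIK_API_KEY")
    )


def traced(func: F) -> F:
    """Wrap `func` with Opik tracing when configured, else return it unchanged."""
    if not tracing_enabled():
        return func
    try:
        import opik  # imported lazily so the package stays optional
    except ImportError:
        return func
    return opik.track(func)
```

Returning the original function untouched means an unconfigured deployment pays no tracing overhead and never needs the tracing package installed.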

FastAPI endpoints

Real-time insights into submit_stream and query executions.

Workers

End-to-end tracing of stream consumption and LLM processing jobs.

LLM calls

Prompts, multimodal verification latency, and output token counts.

Event pipeline

Deduplication metrics and database persistence timing.

VIGILENS_OPIK_ENABLED=true
OPIK_PROJECT_NAME="vigilens"
OPIK_WORKSPACE="your-workspace-name"
OPIK_API_KEY="your-opik-api-key"

Running it yourself

Prerequisites: Python 3.11+, FFmpeg on PATH, Redis, an S3-compatible store (MinIO), a Screener endpoint (Qwen3-VL, deployable on Modal), and a Gemini API key.

# 1. install
python -m venv env && source env/bin/activate
pip install -e . && pip install -e ".[test]"

# 2. configure
cp .env.example .env
# set REDIS_URL, S3_ENDPOINT, S3_BUCKET,
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
# SCREENER_BASE_URL, SCREENER_API_KEY, LLM_API_KEY

# 3. init db
make db-init

# 4. run services (separate terminals)
make api
make stream-worker
make llm-worker
make scene-worker

Endpoints

POST /streams/submit
Creates and enqueues a stream processing job with trigger queries and webhook targets.

GET /streams/{stream_id}
Returns stream lifecycle state from SQLite.

POST /query
Natural-language query over event and activity memory, with optional camera filter.

curl -X POST http://localhost:8000/streams/submit \
  -H "Content-Type: application/json" \
  -d '{
    "camera_id": "cam_1",
    "rtsp_url": "rtsp://example.local/live",
    "trigger_queries": [
      {"query": "person falling", "threshold": 0.55},
      {"query": "fire", "threshold": 0.70}
    ],
    "webhook_urls": ["https://example.com/webhook"],
    "chunk_seconds": 10,
    "fps": 1
  }'

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "did someone fall", "camera_id": "cam_1"}'

Example response:

{
  "route": "event",
  "results": [{
    "source": "event",
    "timestamp": "2026-04-15 12:00:00",
    "camera_id": "cam_1",
    "summary": "person fell near stairs",
    "clip_url": "https://...",
    "confidence": 1.0
  }]
}
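For callers that prefer Python over curl, the submit request can be assembled with the standard library. This is a sketch under assumptions: `build_submit_request` is a hypothetical helper, and the check that thresholds lie between 0 and 1 is my guess at a sensible range rather than a documented rule.

```python
import json
import urllib.request


def build_submit_request(base_url, camera_id, rtsp_url, triggers, webhook_urls,
                         chunk_seconds=10, fps=1):
    """Build (but do not send) a POST request for /streams/submit."""
    for t in triggers:
        # Assumed range check; the API's own validation is not documented here.
        if not 0.0 <= t["threshold"] <= 1.0:
            raise ValueError(f"threshold out of range for {t['query']!r}")
    body = {
        "camera_id": camera_id,
        "rtsp_url": rtsp_url,
        "trigger_queries": triggers,
        "webhook_urls": webhook_urls,
        "chunk_seconds": chunk_seconds,
        "fps": fps,
    }
    return urllib.request.Request(
        f"{base_url}/streams/submit",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_submit_request(
    "http://localhost:8000", "cam_1", "rtsp://example.local/live",
    [{"query": "person falling", "threshold": 0.55}],
    ["https://example.com/webhook"],
)
# urllib.request.urlopen(req) would send it to a running API.
```

Building the request separately from sending it makes the payload easy to inspect or log before anything goes over the wire.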

Data model

table          | purpose                 | key fields
streams        | Stream lifecycle state  | queued · processing · completed · failed
events         | Verified event memory   | dedupe_key, clip_url, confidence
scene_timeline | Activity summary memory | summary, compaction support

Testing

uv run --active pytest

make test-unit
make test-integration
make test-contract

What happened?

For decades, the answer lived somewhere inside hours of footage.

Vigilens explores a different possibility - one where cameras don't just record events, but help us understand them.