name: scalability-sqlite-pool description: > Connection pool topology, double-connection pattern detection, semaphore/lock inventory, and exhaustion headroom calculation for RAGAPPv3. Load before reviewing scalability, connection pool sizing, or concurrent user throughput.
Scalability & Connection Pool Patterns (RAGAPPv3)
Pool Inventory
The application has 4 independent SQLite connection pools:
| Pool | Size | Created by | Used by |
|---|---|---|---|
| Main | 10 (set at backend/app/lifespan.py:315; default max_size=5 at backend/app/models/database.py:2532 is overridden) |
lifespan.py:315 |
Auth, routes, vault operations, BackgroundProcessor, RAG engine |
| MemoryStore | 2 (backend/app/services/memory_store.py:144) |
memory_store.py init |
Memory search (FTS + dense) |
| DocumentProcessor fallback | 2 (backend/app/services/document_processor.py:556) |
document_processor.py |
Document ingestion when no pool is injected |
| FileWatcher fallback | 2 (backend/app/services/file_watcher.py:180) |
file_watcher.py |
File watcher scan path |
BackgroundProcessor does NOT have its own pool — it shares the main pool (max_size=10) via app.state.db_pool.
Double-Connection Pattern Detection
The most common scalability trap is a dependency that opens its own pool connection instead of using the DI-injected one. This doubles per-request connection consumption.
Pattern: evaluate_policy vs _evaluate_policy
There are two variants of the permission check function:
- DI variant (
_evaluate_policyatbackend/app/api/deps.py:393):_evaluate_policy(db, principal, resource_type, resource_id, action)— accepts an injected DB connection. Use this in FastAPI dependency chains. - Standalone variant (
evaluate_policyatbackend/app/api/deps.py:448):evaluate_policy(principal, resource_type, resource_id, action)— callsget_pool()andpool.get_connection()directly, opening a second connection.
require_vault_permission at backend/app/api/deps.py:476 uses the standalone variant, which means every vault-scoped request using this dependency consumes 2 pool connections instead of 1.
Detection heuristics
When reviewing scalability in this codebase:
- Search for
.get_connection()calls inside dependency functions — these bypass DI - Check if
require_vault_permissionis used alongsideDepends(get_db)— this signals double-connection - Prefer
get_evaluate_policy(the DI generator atbackend/app/api/deps.py:426) over the standaloneevaluate_policy
Auth Caching Gap
get_current_active_user at backend/app/api/deps.py:240 executes a fresh DB query on every authenticated request:
await asyncio.to_thread(lambda: db.execute(
"SELECT id, username, full_name, role, is_active, must_change_password FROM users WHERE id = ?",
(user_id,)
).fetchone())
This is the highest-frequency DB query in the system. There is no in-memory cache, no TTL, no memoization. At 10 concurrent users, this means 10 concurrent auth DB queries per request wave.
Semaphore & Lock Inventory
| Resource | Type | Size/Capacity | Timeout | Created at |
|---|---|---|---|---|
| Vector store write lock | asyncio.Lock |
1 | 30s (backend/app/services/vector_store.py:86, write_lock_timeout_seconds) |
Per VectorStore instance |
| Search semaphore | asyncio.Semaphore |
16 (default, backend/app/config.py:113, vector_search_concurrency) |
None | Per VectorStore instance |
| Embedding batch semaphore | asyncio.Semaphore |
4 (global, backend/app/services/embeddings.py:117) |
None | Module-level singleton |
| Background write semaphore | asyncio.Semaphore |
1 (backend/app/services/background_tasks.py:224) |
None | Per BackgroundProcessor |
| Circuit breaker (embeddings) | AsyncCircuitBreaker |
fail_max=5, reset=30s | N/A | Module-level singleton |
| Circuit breaker (LLM) | AsyncCircuitBreaker |
fail_max=5, reset=60s | N/A | Per LLMClient instance |
| Circuit breaker (reranker) | AsyncCircuitBreaker |
fail_max=3, reset=30s | N/A | Module-level singleton |
Exhaustion Headroom Calculation
To determine how many concurrent users the system can support:
headroom = floor(pool_size / connections_per_request) - concurrent_users
Where connections_per_request is the count of separate pool connections consumed across the full middleware + dependency chain for the worst-case request path.
Example at 10 concurrent users with a vault-scoped POST:
- Each request: auth (1 connection) + evaluate_policy (1 connection, standalone) = 2 connections
- Main pool: 10
- Without double-connection fix:
floor(10 / 2) - 10 = -5→ exhausted (5 users worth of requests blocking) - With double-connection fix:
floor(10 / 1) - 10 = 0→ at capacity
The headroom formula reveals that eliminating the double-connection pattern is a 2× improvement in effective capacity.
asyncio.to_thread Usage Pattern
Services using asyncio.to_thread for sync DB operations (correct — SQLite I/O releases GIL during C-level execution):
- MemoryStore:
search_memories()→ calls_fts_search+_dense_search(both sync, connections held sequentially, not simultaneously) - Document retrieval:
_get_indexed_file_idsinbackend/app/services/rag_engine.py:741 - Wiki retrieval:
wiki_retrieval.retrieveinbackend/app/services/rag_engine.py:626 - KMS retrieval:
kms_retrieval.retrieveinbackend/app/services/rag_engine.py:643
Services that are synchronous in an async context (event-loop blocking — should use to_thread):
backend/app/api/routes/vault_members.py— all handlers aredef, notasync def. FastAPI wraps sync handlers in a threadpool, but usingasyncio.to_threadexplicitly would be more consistent.
Load Test Checklist
When testing scalability at 8-10+ concurrent users, verify:
- Pool exhaustion:
asyncio.gather(*[client.post(vault_scoped_route) for _ in range(10)])→ all 10 succeed, none timeout - Auth caching: 10 rapid requests from same user → only 1 auth DB query (not 10)
- Permission check:
asyncio.gatherbatching vs sequential — measure latency at N concurrent - Search semaphore contention: 10 concurrent searches → measure p50/p99 tail latency
- Write lock starvation: concurrent search + upload → searches don't time out waiting for index creation