name: victorialogs-analysis description: VictoriaLogs log analysis using LogsQL. Use when querying logs stored in VictoriaLogs. Provides statistics-first investigation with server-side aggregation. allowed-tools: Bash(python *)
VictoriaLogs Analysis
Query and analyze logs from VictoriaLogs for incident investigation and debugging.
Authentication
IMPORTANT: Credentials are injected automatically by a proxy layer. Do NOT check for VICTORIALOGS_TOKEN in environment variables - they won't be visible to you. Just run the scripts directly; authentication is handled transparently.
MANDATORY: Statistics-First Investigation
NEVER dump raw logs. Always follow this pattern:
STATISTICS → FIELDS → SAMPLE → QUERY
- Statistics First — Know volume, error rate, top streams before looking at logs
- Field Discovery — Understand what fields exist (helps build targeted queries)
- Strategic Sampling — See representative errors only (max 20 entries)
- Targeted Query — Only if needed, always with
| statsor| limit
Available Scripts
All scripts are in .claude/skills/observability-victorialogs/scripts/
PRIMARY INVESTIGATION SCRIPTS
get_statistics.py — ALWAYS START HERE
Comprehensive statistics using server-side LogsQL aggregation.
python .claude/skills/observability-victorialogs/scripts/get_statistics.py
python .claude/skills/observability-victorialogs/scripts/get_statistics.py --query '_stream:{app="api"}'
python .claude/skills/observability-victorialogs/scripts/get_statistics.py --time-range 120 --json
Output includes:
- Total count, error count, error rate percentage
- Logs per minute
- Top 10 streams by volume
- Top error patterns (normalized and deduplicated)
- Actionable recommendation
sample_logs.py — Strategic Sampling
Choose the right sampling strategy based on statistics.
python .claude/skills/observability-victorialogs/scripts/sample_logs.py --strategy errors_only
python .claude/skills/observability-victorialogs/scripts/sample_logs.py --query '_stream:{app="api"}' --strategy errors_only
python .claude/skills/observability-victorialogs/scripts/sample_logs.py --strategy around_time --timestamp "2026-01-27T05:00:00Z" --window 5
python .claude/skills/observability-victorialogs/scripts/sample_logs.py --strategy all --limit 20
Strategies:
errors_only— Only error/exception/fail logs (default)warnings_up— Warning and error logsaround_time— Logs around a specific timestampall— All log levels
list_fields.py — Metadata Discovery
Discover available fields before building queries.
python .claude/skills/observability-victorialogs/scripts/list_fields.py
python .claude/skills/observability-victorialogs/scripts/list_fields.py --query '_stream:{app="api"}'
python .claude/skills/observability-victorialogs/scripts/list_fields.py --field level
python .claude/skills/observability-victorialogs/scripts/list_fields.py --field service --limit 20 --json
query_logs.py — Raw LogsQL (escape hatch)
Execute raw LogsQL queries. Safety limit auto-appended if missing.
python .claude/skills/observability-victorialogs/scripts/query_logs.py --query 'error | stats by (service) count() hits | sort by (hits) desc'
python .claude/skills/observability-victorialogs/scripts/query_logs.py --query '_stream:{app="api"} AND timeout' --limit 10
python .claude/skills/observability-victorialogs/scripts/query_logs.py --query 'status:>=500 | stats count() total' --json
LogsQL Quick Reference
Filters (what logs to select)
# Word search (searches _msg field by default)
error
"connection timeout"
# Field-specific match
level:error
service:payment
status:500
# Stream selector (efficient — uses indexed stream labels)
_stream:{app="api"}
_stream:{app="api", namespace="production"}
# Negation
NOT error
NOT level:debug
# Logical operators
error AND timeout
error OR exception OR fatal
# Numeric comparison
status:>=500
duration:>1000
# Regex
_msg:~"timeout.*connection"
service:~"payment-.*"
Pipes (post-processing)
# Limit results (ALWAYS use this or | stats)
error | limit 20
# Select specific fields
error | fields _time, _msg, service, level
# Sort
error | sort by (_time) desc
# Deduplicate
error | uniq by (service, _msg)
# Statistics (server-side aggregation — most context-efficient)
error | stats count() total
error | stats by (service) count() hits | sort by (hits) desc | limit 10
* | stats by (level) count() hits
# Available stats functions:
# count(), sum(field), avg(field), min(field), max(field),
# count_uniq(field), uniq_values(field)
Common Patterns
# Count errors by service
error OR exception | stats by (service) count() errors | sort by (errors) desc
# Find unique error messages
error | stats by (_msg) count() hits | sort by (hits) desc | limit 10
# Errors in a specific time window (use --start/--end in scripts)
_stream:{app="api"} AND error | limit 20
# HTTP 5xx errors
status:>=500 | stats by (path) count() hits | sort by (hits) desc
# Slow requests
duration:>1000 | stats by (service) count() slow | sort by (slow) desc
Investigation Workflows
Standard Incident Investigation
┌───────────────────────────────────────────────────┐
│ 1. STATISTICS FIRST (mandatory) │
│ python get_statistics.py │
│ → Know volume, error rate, top streams │
└───────────────────────────────────────────────────┘
│
▼
High Error Rate?
┌─────────────┴─────────────┐
│ │
YES (>5%) NO
│ │
▼ ▼
┌───────────────────────┐ ┌────────────────────────┐
│ 2. FAST PATH │ │ 2. DISCOVER FIELDS │
│ sample_logs.py │ │ list_fields.py │
│ --strategy │ │ → Understand schema │
│ errors_only │ │ → Build targeted │
│ │ │ query │
└───────────────────────┘ └────────────────────────┘
Quick Commands Reference
| Goal | Command |
|---|---|
| Start investigation | get_statistics.py |
| Discover fields | list_fields.py --query '_stream:{app="X"}' |
| Sample errors | sample_logs.py --strategy errors_only |
| Investigate spike | sample_logs.py --strategy around_time --timestamp T |
| Count by service | query_logs.py --query 'error | stats by (service) count() hits' |
Anti-Patterns to Avoid
- NEVER run a bare query without
| limitor| stats— Scripts auto-append limits as safety net, but write efficient queries - NEVER skip
get_statistics.py— It's the mandatory first step - NEVER fetch all logs — Use sampling strategies and aggregation
- NEVER ignore error rate — High error rate means investigate patterns, not dump logs
- NEVER query without time bounds — Always specify
--time-rangeor use defaults