name: bq-analytics-query description: Query bq-analytics data in BigQuery via the bq CLI. Use when the user asks for product analytics, conversion funnels, log searches, user trait lookups, or any "show me from the data" question on a project that has bq-analytics installed.
Querying bq-analytics
If bq isn't on PATH
The skill assumes the gcloud BigQuery CLI is callable as bq. On macOS with Homebrew (brew install gcloud-cli), the binaries land in /opt/homebrew/share/google-cloud-sdk/bin/ and are not symlinked into /opt/homebrew/bin/, so non-interactive shells (the ones tools like Claude Code spawn) won't find them — even when an interactive which bq works.
If a bq invocation fails with command not found, run one of:
# Option A — make it visible to non-interactive shells (loaded before .zshrc):
echo 'export PATH="/opt/homebrew/share/google-cloud-sdk/bin:$PATH"' >> ~/.zshenv
# Option B — symlink directly into a dir that's already on the system PATH:
ln -s /opt/homebrew/share/google-cloud-sdk/bin/{bq,gcloud,gsutil} /opt/homebrew/bin/
The PATH entry that lives in ~/.zshrc only helps interactive shells — ~/.zshenv is the right file for tooling. After either fix, plain bq query … calls below work as written.
Tables you can rely on
events.raw — append-only events. Hot columns: event_name, user_id, ts.
JSON: properties (use JSON_VALUE for filtering).
events.identifies — append-only user trait updates.
events.groups — append-only group trait updates.
events.user_groups — append-only user→group memberships.
events.users — view: latest traits per user_id.
events.groups_current — view: latest traits per (group_type, group_id).
events.user_groups_current — view: most-recent group per (user_id, group_type).
events.feedback — append-only product feedback. Hot columns: kind, user_id, ts.
kind ∈ {"bug", "request", "general", ...custom}.
JSON: properties. Plain TEXT: subject, message.
logs.raw — Server-side SDK log() calls (logger.* / analytics.log()).
Hot columns: ts, level, source, message.
JSON: fields. path/status/request_id are nullable
(set only when the caller supplies them).
All event/log tables partition by DATE(ts). Always include a WHERE DATE(ts) > ... filter unless you're querying a single recent ID — full-table scans are 10–100× slower and bill against the 1 TB/mo free quota.
Schema discovery
bq show --schema --format=prettyjson <project>.<dataset>.<table>
bq ls <project>:<dataset>
Run a query
bq query --nouse_legacy_sql --format=json --max_rows=100 'SELECT ...'
Pass long SQL via stdin to avoid shell-quoting headaches with $:
bq query --nouse_legacy_sql --format=json <<'SQL'
SELECT JSON_VALUE(properties, '$.video_id') AS vid, COUNT(*) AS n
FROM `proj.events.raw`
WHERE event_name = 'translation.completed'
AND DATE(ts) > CURRENT_DATE() - 7
GROUP BY 1 ORDER BY n DESC LIMIT 10
SQL
Common queries
Event counts by name
SELECT event_name, COUNT(*) AS n
FROM `proj.events.raw`
WHERE DATE(ts) > CURRENT_DATE() - 7
GROUP BY 1 ORDER BY n DESC;
Funnel — pageview → translation.started → translation.completed
WITH steps AS (
SELECT user_id,
MAX(IF(event_name = 'pageview', 1, 0)) AS s1,
MAX(IF(event_name = 'translation.started', 1, 0)) AS s2,
MAX(IF(event_name = 'translation.completed', 1, 0)) AS s3
FROM `proj.events.raw`
WHERE DATE(ts) > CURRENT_DATE() - 7
GROUP BY user_id
)
SELECT SUM(s1) AS viewed, SUM(s2) AS started, SUM(s3) AS completed FROM steps;
Pro yearly users (joining identifies view)
SELECT u.user_id, u.last_seen, JSON_VALUE(u.traits, '$.email') AS email
FROM `proj.events.users` u
WHERE JSON_VALUE(u.traits, '$.plan') = 'pro'
AND JSON_VALUE(u.traits, '$.plan_period') = 'yearly';
Translations per household
SELECT g.group_id, JSON_VALUE(g.traits, '$.size') AS size, COUNT(e.event_id) AS n
FROM `proj.events.user_groups_current` ug
JOIN `proj.events.groups_current` g
ON g.group_type = ug.group_type AND g.group_id = ug.group_id
JOIN `proj.events.raw` e USING (user_id)
WHERE ug.group_type = 'household'
AND e.event_name = 'translation.completed'
AND DATE(e.ts) > CURRENT_DATE() - 30
GROUP BY 1, 2 ORDER BY n DESC;
Recent bug reports with user context
SELECT f.ts, f.subject, f.message,
JSON_VALUE(u.traits, '$.plan') AS plan,
JSON_VALUE(u.traits, '$.app_version') AS app_version,
JSON_VALUE(f.properties, '$.platform') AS platform
FROM `proj.events.feedback` f
LEFT JOIN `proj.events.users` u USING (user_id)
WHERE f.kind = 'bug'
AND DATE(f.ts) > CURRENT_DATE() - 7
ORDER BY f.ts DESC LIMIT 50;
Investigation: feedback + last 30 min of user activity
The agent-investigation pattern. One query gives you the report + traits + the user's recent events leading up to it — no cross-system stitching.
WITH f AS (
SELECT * FROM `proj.events.feedback`
WHERE DATE(ts) > CURRENT_DATE() - 7
AND kind = 'bug'
AND user_id = 'u_alice' -- or filter by feedback_id
)
SELECT
f.feedback_id, f.ts AS reported_at, f.subject, f.message,
JSON_VALUE(u.traits, '$.plan') AS plan,
JSON_VALUE(u.traits, '$.app_version') AS app_version,
ARRAY(
SELECT AS STRUCT e.event_name, e.ts, e.properties
FROM `proj.events.raw` e
WHERE e.user_id = f.user_id
AND e.ts BETWEEN TIMESTAMP_SUB(f.ts, INTERVAL 30 MINUTE) AND f.ts
ORDER BY e.ts DESC LIMIT 20
) AS recent_events
FROM f
LEFT JOIN `proj.events.users` u USING (user_id)
ORDER BY f.ts DESC;
Feature requests grouped by theme (manual triage)
SELECT subject, message, ts, user_id,
JSON_VALUE(properties, '$.app_version') AS app_version
FROM `proj.events.feedback`
WHERE kind = 'request'
AND DATE(ts) > CURRENT_DATE() - 30
ORDER BY ts DESC;
For semantic clustering, dump the messages and feed them to the agent — BQ doesn't do similarity search natively at indie scale.
Replace vercel logs --query "beacon"
SELECT ts, level, path, status, message
FROM `proj.logs.raw`
WHERE ts > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE)
AND CONTAINS_SUBSTR(message, 'beacon')
ORDER BY ts DESC LIMIT 50;
5xx rate by path, last hour
SELECT path,
COUNTIF(status >= 500) AS errs,
COUNT(*) AS total,
SAFE_DIVIDE(COUNTIF(status >= 500), COUNT(*)) AS rate
FROM `proj.logs.raw`
WHERE ts > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
AND path IS NOT NULL
GROUP BY path ORDER BY errs DESC LIMIT 20;
Note:
logs.rawcontains only what your server code emits vialogger.*/analytics.log(). It does NOT auto-capture every HTTP request — for request-level data (paths, status codes, pageviews) either log it explicitly or use thevercel-logsCLI.
When to fall back to vercel-logs instead
logs.raw holds only the lines your code explicitly emits. For anything Vercel
captures that you didn't log yourself, use the vercel-logs skill (CLI):
- Vercel runtime logs — request lines, status codes, and any third-party
console.*you can't intercept aren't inlogs.raw. Usevercel logs/get_runtime_logsMCP. - Build logs — compile errors, install failures, framework warnings during build go through
vercel inspect <url> --logsorget_deployment_build_logsMCP. - Live tail during a deploy — for "what's happening right now" use
vercel logs --follow. - Project doesn't have bq-analytics installed yet —
logs.rawdoesn't exist; everything goes viavercel logs.
Performance tips
- Always filter
WHERE DATE(ts) > .... This restricts to specific date partitions. Forgetting it scans the full table. - Cluster columns:
events.rawclusters on(event_name, user_id),logs.rawon(level, status). Adding these to yourWHEREis cheap. - JSON_VALUE for filtering, JSON_QUERY for nested access.
JSON_VALUEreturns a STRING; cast withCAST(... AS INT64)etc. for numeric comparisons. - Cost guardrails: BigQuery on-demand bills $5/TB after 1 TB free. A typical query at indie scale scans <100 MB and is free.
When to add a dedicated column
If a JSON field shows up in WHERE clauses constantly (e.g. properties.user_plan, traits.country), promote it to a real column:
ALTER TABLE `proj.events.raw` ADD COLUMN user_plan STRING;
-- Then update the SDK to write it both as a column and in properties (for back-compat).
You won't need this until volume climbs into 100M+ events/mo or queries get slow.
Anti-patterns to avoid
- Don't UPDATE/DELETE rows in
events.raw. Streaming-inserted rows can't be modified for ~90 minutes anyway. The append-only design is on purpose; useevent_idfor dedup if needed. - Don't query without
DATE(ts)filter unless the partition pruning is intentional. - Don't dual-write events to PostHog "for safety". That's the cost trap this whole architecture exists to avoid.
If a query is unexpectedly empty
- Streaming inserts have ~few-second visibility lag. Wait and retry.
- Check the
event_idexists inevents.rawdirectly with aLIMIT 1lookup — if missing, the SDK call probably failed.bq-analyticsreturns 5xx on insert failure; check the function logs. - Verify the dataset name matches
BQ_EVENTS_DATASET/BQ_LOGS_DATASETenv vars (defaultsevents/logs).