name: bigquery-sql description: "BigQuery-specific SQL patterns: UNNEST for array expansion, STRUCT, ARRAY_AGG, DATE_DIFF/DATE_ADD, backtick-quoted table references, EXCEPT/REPLACE in SELECT, approximate aggregation, partitioned and wildcard tables." type: skill
BigQuery SQL Skill
1. Table References — Always Backtick-Quote
-- Full table reference
SELECT * FROM `project.dataset.table`;
-- Can omit project if using the default project
SELECT * FROM `dataset.table`;
2. Array Expansion — Use UNNEST
-- Explode an array column to rows
SELECT id, item
FROM `project.dataset.table`,
UNNEST(array_col) AS item;
-- UNNEST with offset (position)
SELECT id, item, pos
FROM `project.dataset.table`,
UNNEST(array_col) AS item WITH OFFSET AS pos;
-- UNNEST a literal array
SELECT * FROM UNNEST([1, 2, 3]) AS num;
3. Date Functions
-- Add/subtract time
DATE_ADD(order_date, INTERVAL 7 DAY)
DATE_ADD(CURRENT_DATE(), INTERVAL -1 MONTH)
-- Difference between dates
DATE_DIFF(end_date, start_date, DAY)
DATE_DIFF(end_date, start_date, MONTH)
-- Truncate to period
DATE_TRUNC(event_date, MONTH)
TIMESTAMP_TRUNC(event_ts, HOUR)
-- Current date/time
CURRENT_DATE()
CURRENT_TIMESTAMP()
4. SELECT EXCEPT and REPLACE
-- All columns except one
SELECT * EXCEPT (col_to_remove) FROM `dataset.table`;
-- Replace a column value inline
SELECT * REPLACE (UPPER(name) AS name) FROM `dataset.table`;
5. STRUCT and ARRAY_AGG
-- Create a STRUCT
SELECT STRUCT(id, name) AS person FROM `dataset.table`;
-- Aggregate rows into an array
SELECT department, ARRAY_AGG(employee_name) AS employees
FROM `dataset.employees`
GROUP BY department;
-- Aggregate into array of structs
SELECT ARRAY_AGG(STRUCT(id, name)) AS records FROM `dataset.table`;
6. Approximate Aggregation (for large tables)
-- Approximate distinct count (faster for large tables)
APPROX_COUNT_DISTINCT(user_id)
-- Approximate quantiles
APPROX_QUANTILES(value, 100)[OFFSET(50)] -- median
7. Partitioned Tables
When querying partitioned tables, always filter on the partition column to avoid full-table scans:
-- Partition on _PARTITIONDATE (pseudo-column)
WHERE _PARTITIONDATE >= '2024-01-01'
-- Partition on a date column
WHERE event_date BETWEEN '2024-01-01' AND '2024-12-31'
8. Wildcard Tables (date-sharded)
-- Query all date-sharded tables matching a prefix
SELECT * FROM `project.dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20241231';
9. String Functions
REGEXP_EXTRACT(col, r'pattern') -- extract first match
REGEXP_REPLACE(col, r'pattern', 'repl') -- replace matches
SPLIT(col, ',')[SAFE_OFFSET(0)] -- split, access by index
TRIM(col) / LTRIM(col) / RTRIM(col)
FORMAT('%s-%d', str_col, int_col) -- printf-style formatting
10. Common Anti-Patterns to Avoid
- Do NOT use
= NULL— useIS NULL - Do NOT forget to filter partitioned tables — costs money
- Do NOT use
COUNT(DISTINCT ...)on huge tables — useAPPROX_COUNT_DISTINCT - Always backtick-quote table names with dots in them
11. Benchmark Patterns
- STRING_AGG: Use
STRING_AGG(col, ',' ORDER BY col)for string aggregation (not GROUP_CONCAT). - SAFE_DIVIDE / SAFE_CAST: Use to avoid division-by-zero errors and cast failures.
- IF / IIF: BigQuery supports
IF(condition, true_val, false_val)— often cleaner than CASE WHEN for simple conditions. - GENERATE_DATE_ARRAY / GENERATE_TIMESTAMP_ARRAY: For date spine generation.
- Numeric precision: BigQuery's FLOAT64 can lose precision. Use NUMERIC type or ROUND() only when the question asks for it.
- INFORMATION_SCHEMA:
SELECT * FROM dataset.INFORMATION_SCHEMA.COLUMNSfor metadata queries — useful when schema_overview is insufficient.
12. Spider2 BigQuery Patterns
- Default project:
spider2-public-data. Table references:spider2-public-data.{dataset}.{table} - StackOverflow tags: Stored as pipe-delimited strings in
tagscolumn (e.g.,|python|python-2.7|). To filter for Python 2 specific questions (excluding Python 3):WHERE REGEXP_CONTAINS(tags, r'python-2') AND NOT REGEXP_CONTAINS(tags, r'python-3') - Date columns: Many BQ tables store dates as TIMESTAMP or DATE. Always check the actual type with describe_table.
- Large tables: Use partition filters and LIMIT during exploration. Avoid SELECT * on tables with >1M rows.