name: clickhouse-nanopass description: "Use this skill when writing ClickHouse SQL transformation passes, pipelines, macro expanders, or analysis functions using the nanopass framework. Triggers include: any mention of 'nanopass', 'SQL pass', 'SQL transformation', 'SQL rewrite', 'ClickHouse pass', 'macro expansion', 'qualify tables', 'add WHERE', or requests to manipulate ClickHouse SQL programmatically in Go. Also use when the user wants to parse ClickHouse SQL, walk a CST, rewrite tokens, build scope-aware transformations, or compose SQL→SQL pipelines. Do NOT use for general SQL querying, ClickHouse client usage, or ORM-based database access."
ClickHouse SQL Nanopass Framework
Critical Rules — Read Before Writing Any Code
- Every pass re-parses from scratch. A pass receives a
stringand returns astring. Never share aParseResultorTokenStreamRewriteracross passes. - Every pass must return syntactically valid SQL. If you cannot guarantee this, add
nanopass.Validateafter your pass in the pipeline. - Use
BuildScopesfor any pass that touches tables, WHERE clauses, or needs UNION ALL awareness. Never useFindAll/FindFirstfor table references — use scopes instead. - Never use
FindAllto locateTableIdentifierContextdirectly.TableIdentifierappears both in FROM clauses and insideColumnIdentifier(as column qualifiers liket1.id). Scopes handle this correctly. - The grammar parses
NOT(x)as a function call, not logical NOT. OnlyNOT x(without parens) producesColumnExprNotContext. - Whitespace is on the hidden channel. The lexer uses
-> channel(HIDDEN)for whitespace and comments. TheTokenStreamRewriterpreserves them automatically.
Module Layout
github.com/stergiotis/boxer/public/db/clickhouse/dsl/nanopass/
├── parse.go # Parse(sql) → *ParseResult
├── walk.go # WalkCST, FindAll, FindFirst
├── rewrite.go # NewRewriter, ReplaceNode, DeleteNode, InsertBefore, InsertAfter, etc.
│ # TrackedRewriter for overlap detection
├── pipeline.go # Pass type, Pipeline, FixedPoint, FixedPointPipeline, Validate
├── scope.go # BuildScopes → []*SelectScope, table alias resolution, CTE tracking
├── macro.go # MacroExpander, ExtractLiteralArgs, LiteralArg, LiteralTypeE
├── analysis/
│ ├── tables.go # ExtractTables
│ ├── columns.go # ExtractColumns
│ └── functions.go # ExtractFunctions
├── passes/
│ ├── normalize_case.go
│ ├── normalize_whitespace.go
│ ├── strip_comments.go
│ ├── remove_parens.go
│ ├── qualify_tables.go
│ ├── add_where.go
│ ├── add_settings.go
│ ├── rewrite_functions.go
│ └── helpers.go # findOutermostSelectStmt, findLastSelectStmtClause
└── testdata/
├── corpus.go # LoadCorpus() via embed.FS
└── corpus/ # 56 .sql files
Dependencies and Imports
import (
"github.com/antlr4-go/antlr/v4"
"github.com/stergiotis/boxer/public/db/clickhouse/dsl/grammar"
"github.com/stergiotis/boxer/public/observability/eh"
"github.com/rs/zerolog"
"github.com/stergiotis/boxer/public/db/clickhouse/dsl/nanopass"
)
Coding Standards (Mandatory)
- Named return values on all functions/methods
- Naked returns after setting
err - No
if err := func(); err != nil— always assign then check - Receiver name always
inst - Interfaces end with
I, enums end withE - Use
eh.Errorfwith%wfor error wrapping - Pre-allocate slices/maps when size is known
- Explicit anonymous blocks for scoping within large functions
How to Write a New Pass
Template: Simple Token-Level Pass
For passes that operate on individual tokens (keywords, whitespace, comments):
package passes
import (
"github.com/stergiotis/boxer/public/db/clickhouse/dsl/grammar"
"github.com/stergiotis/boxer/public/observability/eh"
"github.com/stergiotis/boxer/public/db/clickhouse/dsl/nanopass"
)
func MyTokenPass(sql string) (result string, err error) {
pr, err := nanopass.Parse(sql)
if err != nil {
err = eh.Errorf("MyTokenPass: %w", err)
return
}
rw := nanopass.NewRewriter(pr)
for i := 0; i < pr.TokenStream.Size(); i++ {
tok := pr.TokenStream.Get(i)
tokenType := tok.GetTokenType()
// Operate on tokens here
// Use: nanopass.ReplaceToken(rw, tok.GetTokenIndex(), newText)
// Use: nanopass.DeleteToken(rw, tok.GetTokenIndex())
_ = tokenType
}
result = nanopass.GetText(rw)
return
}
Template: CST Node Pass (No Scope Needed)
For passes that match specific CST node types but don't need table/CTE awareness:
func MyNodePass(sql string) (result string, err error) {
pr, err := nanopass.Parse(sql)
if err != nil {
err = eh.Errorf("MyNodePass: %w", err)
return
}
rw := nanopass.NewRewriter(pr)
nanopass.WalkCST(pr.Tree, func(ctx antlr.ParserRuleContext) bool {
switch c := ctx.(type) {
case *grammar.SomeSpecificContext:
// Operate on the node
nanopass.ReplaceNode(rw, c, "new text")
return false // don't descend into replaced node
}
return true // continue walking
})
result = nanopass.GetText(rw)
return
}
Template: Scope-Aware Pass
For passes that touch tables, WHERE clauses, or need UNION ALL awareness:
func MyStructuralPass(param string) nanopass.Pass {
return func(sql string) (result string, err error) {
pr, err := nanopass.Parse(sql)
if err != nil {
err = eh.Errorf("MyStructuralPass: %w", err)
return
}
rw := nanopass.NewRewriter(pr)
scopes := nanopass.BuildScopes(pr)
for _, scope := range scopes {
applyToScope(rw, scope, param)
}
result = nanopass.GetText(rw)
return
}
}
func applyToScope(rw *antlr.TokenStreamRewriter, scope *nanopass.SelectScope, param string) {
// Access scope.Node (*grammar.SelectStmtContext) for clauses:
// scope.Node.WhereClause()
// scope.Node.FromClause()
// scope.Node.ProjectionClause()
// scope.Node.OrderByClause()
// scope.Node.GroupByClause()
// scope.Node.HavingClause()
// scope.Node.LimitClause()
// scope.Node.SettingsClause()
// Access scope.Tables for FROM/JOIN table sources:
for _, ts := range scope.Tables {
if ts.IsCTE || ts.IsSubquery {
continue
}
// ts.Table, ts.Database, ts.Alias, ts.Node
}
// Resolve aliases:
// source, found := scope.ResolveAlias("alias_or_table_name")
// Check CTE names:
// def, found := scope.ResolveCTE("cte_name")
// Recurse into CTE bodies:
for _, cte := range scope.CTEDefs {
if cte.Scope != nil {
applyToScope(rw, cte.Scope, param)
}
}
// Recurse into subqueries:
for _, sub := range scope.Subqueries {
applyToScope(rw, sub, param)
}
// Recurse into FROM subquery scopes:
for _, ts := range scope.Tables {
if ts.IsSubquery && ts.Scope != nil {
applyToScope(rw, ts.Scope, param)
}
}
}
Template: Parameterized Pass (Returns nanopass.Pass)
When a pass needs configuration, return a closure:
func QualifyTables(defaultDB string) nanopass.Pass {
return func(sql string) (result string, err error) {
// ... implementation using defaultDB
}
}
Template: Macro Registration
expander := nanopass.NewMacroExpander()
expander.Register("macroName", func(args []nanopass.LiteralArg) (string, error) {
// args[i].Type: LiteralTypeString, LiteralTypeInt, LiteralTypeFloat, LiteralTypeNull
// args[i].Value: raw text (strings include quotes, e.g. "'hello'")
return "expanded SQL fragment", nil
})
// Use: expander.Pass() returns a nanopass.Pass
// For nested macros: nanopass.FixedPoint(expander.Pass(), 10)
How to Write Tests for a Pass
Every pass MUST have these four test categories:
1. Explicit Input/Output Pairs
func TestMyPass(t *testing.T) {
tests := []struct {
name string
input string
expected string
}{
{name: "basic", input: "SELECT a FROM t", expected: "SELECT a FROM t"},
// Add 5-10 cases covering the transformation
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got, err := passes.MyPass(tt.input)
require.NoError(t, err)
assert.Equal(t, tt.expected, got)
// ALWAYS verify output is parseable
_, err = nanopass.Parse(got)
require.NoError(t, err, "produced invalid SQL: %s", got)
})
}
}
2. Idempotency
func TestMyPassIdempotent(t *testing.T) {
sqls := []string{
"SELECT a FROM t",
"SELECT a FROM t WHERE x > 1",
// Include cases that exercise the transformation
}
for i, sql := range sqls {
t.Run(fmt.Sprintf("idempotent_%d", i), func(t *testing.T) {
pass1, err := passes.MyPass(sql)
require.NoError(t, err)
pass2, err := passes.MyPass(pass1)
require.NoError(t, err)
assert.Equal(t, pass1, pass2, "not idempotent")
})
}
}
3. Corpus Validity
func TestMyPassOutputValidity(t *testing.T) {
entries, err := testdata.LoadCorpus()
require.NoError(t, err)
for _, entry := range entries {
t.Run(entry.Name, func(t *testing.T) {
out, err := passes.MyPass(entry.SQL)
if err != nil {
t.Skipf("pass failed: %v", err)
}
_, err = nanopass.Parse(out)
require.NoError(t, err, "produced invalid SQL for %s:\n%s", entry.Name, out)
})
}
}
4. Scope Preservation (for pure passes only)
Pure passes (case normalization, whitespace, comments, parens) must not change the query structure:
func TestMyPassPreservesScopes(t *testing.T) {
entries, err := testdata.LoadCorpus()
require.NoError(t, err)
for _, entry := range entries {
t.Run(entry.Name, func(t *testing.T) {
prBefore, err := nanopass.Parse(entry.SQL)
if err != nil { t.Skip() }
out, err := passes.MyPass(entry.SQL)
if err != nil { t.Skip() }
prAfter, err := nanopass.Parse(out)
require.NoError(t, err)
scopesBefore := nanopass.BuildScopes(prBefore)
scopesAfter := nanopass.BuildScopes(prAfter)
require.Equal(t, len(scopesBefore), len(scopesAfter))
})
}
}
Additional: UNION ALL Tests (for structural passes)
func TestMyPassUnionAll(t *testing.T) {
// Always test with 2+ UNION ALL branches
sql := "SELECT a FROM t1 UNION ALL SELECT b FROM t2"
got, err := passes.MyPass(sql)
require.NoError(t, err)
// Verify the transformation applied to BOTH branches
}
Additional: CTE Tests (for structural passes)
func TestMyPassCTEs(t *testing.T) {
// Verify CTE references are not accidentally modified
sql := "WITH cte AS (SELECT a FROM real_table) SELECT x FROM cte"
got, err := passes.MyPass(sql)
require.NoError(t, err)
// Verify: "FROM cte" unchanged, "FROM real_table" was transformed
}
Additional: Invalid SQL Rejection
func TestMyPassRejectsInvalid(t *testing.T) {
invalid := []string{"", " ", "SELECT", ";;;"}
for _, sql := range invalid {
_, err := passes.MyPass(sql)
assert.Error(t, err)
}
}
ANTLR4 Go Runtime API Quick Reference
// Parsing
pr, err := nanopass.Parse(sql)
// pr.Tree — *grammar.QueryStmtContext (root)
// pr.TokenStream — *antlr.CommonTokenStream
// pr.Parser — *grammar.ClickHouseParser
// Token access
tok := pr.TokenStream.Get(i) // antlr.Token
tok.GetTokenType() // int
tok.GetText() // string
tok.GetTokenIndex() // int
tok.GetChannel() // int (0=default, 1=hidden)
pr.TokenStream.Size() // int (total tokens including hidden + EOF)
// Rewriter
rw := nanopass.NewRewriter(pr)
nanopass.ReplaceNode(rw, ctx, "new text")
nanopass.DeleteNode(rw, ctx)
nanopass.InsertBefore(rw, ctx, "prefix ")
nanopass.InsertAfter(rw, ctx, " suffix")
nanopass.ReplaceToken(rw, tokenIndex, "new text")
nanopass.DeleteToken(rw, tokenIndex)
result := nanopass.GetText(rw)
// CST navigation
ctx.GetStart().GetTokenIndex() // first token index
ctx.GetStop().GetTokenIndex() // last token index
ctx.GetParent() // antlr.Tree
ctx.GetChildCount() // int
ctx.GetChild(i) // antlr.Tree
ctx.GetRuleIndex() // int
Key Grammar Types
Token Type Constants
// Keywords: grammar.ClickHouseLexerADD (1) through grammar.ClickHouseLexerJSON_TRUE (199)
// IDENTIFIER = 200
// STRING_LITERAL = 205
// WHITESPACE = 240 (on hidden channel)
// MULTI_LINE_COMMENT, SINGLE_LINE_COMMENT (on hidden channel)
CST Context Types for Type-Switching
Column expressions (alternatives of columnExpr rule, all have GetRuleIndex() == RULE_columnExpr):
*grammar.ColumnExprOrContext // a OR b
*grammar.ColumnExprAndContext // a AND b
*grammar.ColumnExprNotContext // NOT a (only without parens after NOT)
*grammar.ColumnExprIsNullContext // a IS [NOT] NULL
*grammar.ColumnExprPrecedence3Context // =, !=, <, >, <=, >=, [NOT] IN, [NOT] LIKE, GLOBAL [NOT] IN
*grammar.ColumnExprBetweenContext // a [NOT] BETWEEN b AND c
*grammar.ColumnExprPrecedence2Context // +, -, ||
*grammar.ColumnExprPrecedence1Context // *, /, %
*grammar.ColumnExprNegateContext // -a (unary)
*grammar.ColumnExprTernaryOpContext // a ? b : c
*grammar.ColumnExprLiteralContext // 42, 'hello', NULL
*grammar.ColumnExprIdentifierContext // column_name
*grammar.ColumnExprFunctionContext // func(args) — includes NOT(x)!
*grammar.ColumnExprParensContext // (expr) — single expression in parens
*grammar.ColumnExprTupleContext // (expr, expr, ...) — 2+ expressions
*grammar.ColumnExprSubqueryContext // (SELECT ...)
*grammar.ColumnExprCaseContext // CASE WHEN ... END
*grammar.ColumnExprCastContext // CAST(x AS T)
*grammar.ColumnExprAliasContext // expr AS alias
*grammar.ColumnExprArrayContext // [1, 2, 3]
*grammar.ColumnExprArrayAccessContext // arr[i]
*grammar.ColumnExprTupleAccessContext // t.1
*grammar.ColumnExprWinFunctionContext // func() OVER (...)
*grammar.ColumnExprParamSlotContext // {name: Type}
Table expressions (alternatives of tableExpr rule):
*grammar.TableExprIdentifierContext // table_name or db.table_name
*grammar.TableExprAliasContext // tableExpr AS alias
*grammar.TableExprSubqueryContext // (SELECT ...) in FROM
*grammar.TableExprFunctionContext // table_function(args)
Join expressions (alternatives of joinExpr rule):
*grammar.JoinExprTableContext // single table source
*grammar.JoinExprOpContext // left JOIN right ON condition
Key accessor patterns on SelectStmtContext:
stmt.FromClause() // IFromClauseContext or nil
stmt.WhereClause() // IWhereClauseContext or nil
stmt.ProjectionClause() // IProjectionClauseContext
stmt.GroupByClause() // IGroupByClauseContext or nil
stmt.HavingClause() // IHavingClauseContext or nil
stmt.OrderByClause() // IOrderByClauseContext or nil
stmt.LimitClause() // ILimitClauseContext or nil
stmt.LimitByClause() // ILimitByClauseContext or nil
stmt.SettingsClause() // ISettingsClauseContext or nil
stmt.PrewhereClause() // IPrewhereClauseContext or nil
stmt.ArrayJoinClause() // IArrayJoinClauseContext or nil
stmt.WindowClause() // IWindowClauseContext or nil
All clause accessors return interfaces. Cast to concrete types: stmt.WhereClause().(*grammar.WhereClauseContext).
SelectScope Reference
type SelectScope struct {
Node *grammar.SelectStmtContext // the SELECT statement
Tables []TableSource // FROM/JOIN sources
Parent *SelectScope // enclosing scope (nil for outermost)
CTEDefs []CTEDef // WITH clause definitions
UnionPeers []*SelectScope // all branches including self
Subqueries []*SelectScope // expression subqueries (WHERE, SELECT list, HAVING)
}
type TableSource struct {
Node antlr.ParserRuleContext // TableIdentifierContext or TableExprSubqueryContext
Database string // empty if unqualified
Table string // table name
Alias string // AS alias, empty if none
IsCTE bool // references a CTE name
IsSubquery bool // FROM (SELECT ...)
Scope *SelectScope // inner scope for subqueries
}
// Key methods:
scope.ResolveAlias(name) → (TableSource, bool) // alias or unaliased table name lookup
scope.ResolveCTE(name) → (CTEDef, bool) // walks ancestors
scope.AllScopes() → []*SelectScope // depth-first flattened
Operator Precedence Table
Used by RemoveRedundantParens. Higher number = tighter binding.
| Level | Operators | CST Context |
|---|---|---|
| 1 | OR | ColumnExprOr |
| 2 | AND | ColumnExprAnd |
| 3 | NOT (unary) | ColumnExprNot |
| 4 | IS [NOT] NULL | ColumnExprIsNull |
| 5 | =, !=, <, >, IN, LIKE, BETWEEN | ColumnExprPrecedence3, ColumnExprBetween |
| 6 | +, -, || | ColumnExprPrecedence2 |
| 7 | *, /, % | ColumnExprPrecedence1 |
| 8 | unary - | ColumnExprNegate |
| 9 | ?: ternary | ColumnExprTernaryOp |
| 99 | atoms | literals, identifiers, functions, etc. |
Common Pitfalls
- Don't use
rw.GetTextDefault()directly. Usenanopass.GetText(rw)which wraps it. - Don't modify the CST. Only use the
TokenStreamRewriter. The CST is read-only after parsing. WalkCSTreturnsfalseto skip a subtree, not to stop the walk. There is no "stop early" mechanism — set a flag and check it.GetChild(i)returnsantlr.Tree, notantlr.ParserRuleContext. Always type-assert:if stmt, ok := node.GetChild(i).(*grammar.SelectStmtContext); ok { ... }.- Go generics don't work reliably with ANTLR types for
findFirstChild-style helpers. Use explicitGetChild+ type assertion loops. IN (expr)with a single value — the(expr)is parsed asColumnExprParens, notColumnExprTuple. If your pass removes parens, guard against removing IN's right-operand parens.ColumnExprAliasContextcan appear anywhere in the select list. Don't confuse it withTableExprAliasContext.- The token stream includes EOF as the last token with type
-1. Always checktok.GetTokenType() == antlr.TokenEOFortok.GetTokenType() == -1when iterating. - Overlapping
ReplaceDefaultcalls — the later one silently wins. UseTrackedRewriterduring development to detect this. - The
NamedQueryContextinside CTEs has anIdentifier()method for the CTE name and aQuery()for the body. Access the body'sSelectUnionStmtContextby walking children of theQueryContext.
Pipeline Composition
// Sequential pipeline
result, err := nanopass.Pipeline(sql,
passes.StripComments,
passes.NormalizeKeywordCase,
passes.QualifyTables("mydb"),
passes.AddWhereCondition("tenant_id = 1"),
nanopass.Validate,
)
// Fixed-point (repeat until stable)
pass := nanopass.FixedPoint(passes.RemoveRedundantParens, 10)
// Fixed-point pipeline
pass := nanopass.FixedPointPipeline(5,
expander.Pass(),
passes.NormalizeKeywordCase,
)
// Logging wrapper
debugPass := nanopass.LoggingPass(logger, "my_pass", passes.MyPass)
Known Grammar Limitations
These ClickHouse features are NOT supported by the parser:
FROM t SELECT a(FROM-first syntax)WITH (SELECT x) AS name(scalar subquery CTE)SET param = {'key': [1,2]}(complex SET literal values)EXISTS (SELECT ...)(EXISTS predicate)
Queries using these features will fail at Parse() with a syntax error. Add them to the corpus as .sql files if/when grammar support is added.