name: hprscript
description: Full-power multi-pattern content search and code intelligence via the hprscript CLI (Vectorscan/Hyperscan). One pass matches ALL patterns at once, so multi-pattern search costs the same as single-pattern. Covers keyword/regex search, finding files that contain or lack a pattern, per-file counts, extracting function bodies and balanced delimiter blocks, annotating each match with its enclosing function, capture-group extraction, proximity filters (X near or without Y), representative-sample usages, ranking files by relevance to a query, and aggregation/grouping/cross-file symbol resolution via a JSON script DSL. TRIGGER whenever you would grep/search/find/locate/count across files, ask where something is defined, what calls it, show me usages, or which files contain X — or would otherwise reach for the Grep tool, grep, or rg. hprscript replaces those for the entire task. Invoke via the Bash tool; the hprscript binary is on PATH.
hprscript — full-power multi-pattern search (CLI)
hprscript scans files (or stdin) once and matches every pattern simultaneously via Vectorscan/Hyperscan. Adding patterns is virtually free — always batch patterns into one call instead of running the tool N times. It is read-only (never edits files). Default output is JSON Lines (one JSON object per match). Invoke through the Bash tool; the binary is on PATH as hprscript.
Always prefer hprscript over the Grep tool, grep, or rg — for the whole task. If a Hyperscan limitation blocks one pattern (e.g. lookaround), restructure the pattern or move to a -s script; do not fall back to grep/rg.
Pick the right tool fast
| Your intent | Reach for |
|---|---|
| Match records (file, line, text, context) | default (-p) |
| Only filenames that contain a pattern | -f |
| Filenames missing a pattern | -absent |
| Per-file match counts | -c |
| Only the matched text | -o |
| Cheapest way to read matches into your own context | -llm |
| Function body / JSON value / JSX subtree (cross-line) | -block-open + -block-close |
| "What function/class is this match inside?" | -scope auto |
| Pull capture groups out as named fields | -extract a,b |
| X with Y nearby / X without Y nearby | -near / -far |
| Representative usages, dedup near-identical lines | -sample N |
| Rank files by relevance to a set of signals | -s '{"rank":true,...}' |
| Counts, sums, manifests, grouping, cross-file resolve | -s script DSL |
Cheapness ladder when you don't need the match text: -f / -absent / -c ≪ -o / -llm ≪ default JSON. Use -limit N aggressively when you only need to know whether something exists — scanning stops early.
Core flags
Patterns
-p <pat>— case-sensitive pattern (repeatable; all match in one pass).-pi <pat>— case-insensitive pattern (Unicode-folding; repeatable; mix freely with-p).-w— whole-word (\b(?:…)\b). Or write\b…\binline for per-alternative control.
Split alternations into separate -p patterns when you want to know which branch matched where. Every match is tagged with the id of the pattern that produced it — pat in -j, a [p0]/[p1] prefix in -llm, $PAT_ID in -format/scripts. With one alternation (-p 'alpha|beta|gamma') every hit collapses to p0 — you lose attribution. Split it (-p alpha -p beta -p gamma) and each hit reports p0/p1/p2, so you see exactly which term fired on each line. Since adding patterns is free, default to splitting; keep an alternation only when the branches are one concept you don't need to tell apart, or must share a single -near/-far operand. In -s scripts give each pattern a meaningful "id" ("auth", "db") and it shows up verbatim as $PAT_ID instead of p3.
Targeting
-glob '<glob>'— e.g.'**/*.go','src/**/*.{ts,tsx}', absolute'/var/log/**/*.log'(repeatable).-exclude <rule>— glob ('*.min.js'), bare dir name (vendorskips anyvendor/), or path prefix ('src/generated/'). Repeatable.- Positional files/dirs also work. No glob + no files → reads stdin (pipelines).
Matching / Unicode
- UTF-8 is on by default: literal Cyrillic/CJK/emoji and
-picase-folding "just work";.= one codepoint. \w \d \sare ASCII by default.-ucpmakes them Unicode (but rejects many patterns as "too large" — prefer[\p{L}]{1,32}or[\w\x80-\xff]+).-no-utf8— byte-level matching (non-UTF-8 input, binary).
Volume / context
-limit <n>global cap (stops scanning) ·-m <n>per-file cap.-C <n>(=-context) symmetric ·-A <n>after ·-B <n>before.
Output modes (mutually exclusive)
| Flag | Output |
|---|---|
default / -j |
JSON Lines: {"file","pat","line","col","from","to","match","context"} |
-f |
File paths only, deduped (grep -l) |
-absent |
Files where the pattern is not found (grep -L) |
-c |
path:N per-file counts (grep -c) |
-o |
Matched text only (grep -o) |
-format '<tmpl>' |
Custom line with $FILE $LINE $COL $MATCH $CONTEXT $FROM $TO $PAT_ID (+ $BLOCK*, $ENCLOSING_*, $EXTRACT_* when active) |
-llm |
Token-efficient text grouped by file; auto-adapts to -block/-scope; prints a limit reached footer when truncated |
-llm is the best choice when you are about to read the matches — it strips JSON noise and dedupes file headers (~30–50% fewer tokens than -j). Switch back to -j only when you need offsets, pat, or extracted.
Pattern syntax (Hyperscan PCRE)
Most everyday regex works. Supported: . * + ? {n,m}, lazy *? +? ??, ^ $ (line-anchored — multiline is default), \A \z (buffer), \b \B, \d \w \s + negations, classes [...], groups (...)/(?:...), alternation |, inline flags (?i)(?m)(?s)(?x), escapes \xHH \uHHHH \n \t ….
NOT supported (clear compile error — restructure, don't fall back to grep):
- Lookarounds
(?=) (?!) (?<=) (?<!) - Backreferences
\1, atomic groups(?>…), conditionals(?(…)),\K
Gotchas: escape literal braces (interface\{\}); captures (...) are ignored unless you add -extract; from/to/col are always byte offsets even in UTF-8 mode.
Power features (the reasons to use hprscript)
Block extraction — anchor on a signature, pull the whole body
-block-open + -block-close pair each match with the next balanced delimiter block (depth-tracked; nesting handled).
# Every Go function body (signature + braces)
hprscript -p 'func \w+\(' -block-open '{' -block-close '}' -o '**/*.go'
# A single named function
hprscript -p 'func\s+LoadData\b' -block-open '{' -block-close '}' -o '**/*.go'
# Multi-char delimiters: a <div>…</div> subtree
hprscript -p '<div\b' -block-open '<div>' -block-close '</div>' -o '**/*.html'
# JSON value by anchor key (nested objects/arrays handled)
hprscript -p '"config"\s*:' -block-open '{' -block-close '}' -o '**/*.json'
# Call argument lists: ( ) instead of { }
hprscript -p '\bSpawn\s*' -block-open '(' -block-close ')' -o '**/*.go'
Depth tracking is lexical, not language-aware — a } inside a string/comment can skew it (usually fine for hand-written code). Python/Ruby have no brace — anchor the signature and use a script submatch over a window ending at the next ^def /^class .
Enclosing scope — "what function is this in?" inline
-scope auto (or go|rust|c|cpp|java|js|ts) adds enclosing.{name,kind,line_start,line_end} to every match — no follow-up scan.
hprscript -p '\bdangerous_call\(' -scope auto -llm -glob '**/*.go'
# Custom anchor (capture group 1 = name):
hprscript -p TODO -scope-pattern 'sub\s+(\w+)' -scope-open '{' -scope-close '}' -scope-kind perl-sub '**/*.pl'
Capture-group extraction — groups become typed fields
-extract name1,name2 re-extracts the preceding -p/-pi's capture groups (left-to-right) into an extracted map; usable as $EXTRACT_<NAME> in -format.
hprscript -p 'func\s+(\w+)\(([^)]*)\)' -extract name,args \
-format '$FILE:$LINE $EXTRACT_NAME($EXTRACT_ARGS)' '**/*.go'
# Optional groups become empty strings:
hprscript -p 'TODO(?:\(([^)]+)\))?:\s*(.*)' -extract author,message
Pattern relations — X with/without Y nearby
A:B:K — A and B are pattern IDs (p0,p1,…) or indices; K = line distance (0 = same line). Repeatable; multiple relations AND.
hprscript -p 'defer\b' -p 'Lock\(\)' -near p0:p1:3 -glob '**/*.go' # defer with Lock within 3 lines
hprscript -p 'log\.Print' -p 'allow-print' -far p0:p1:0 -glob '**/*.go' # log.Print with no allow-print on the line
Sample — representative usages, not 500 near-duplicates
-sample N returns ≤N matches stratified by (file, line-shape) (identifiers→_), round-robined across files. Great for "show me how X is used."
hprscript -p 'httpClient' -sample 10 -glob '**/*.go'
Byte budgets — protect your context from minified files
hprscript -p 'TODO' -max-context-bytes 200 -max-output-bytes 50000 -glob '**/*.{js,ts}'
-max-match-bytes · -max-context-bytes · -max-block-bytes · -max-output-bytes (stops scan, emits an output_truncated info line). UTF-8-safe; truncation is flagged with *_truncated, never silent.
Script mode (-s '<json>') — the full DSL
Reach for -s when flags aren't enough: aggregation (counts/sums/manifests), grouping, ranking, phases (collect→resolve across files), submatch/block logic, absent+present combined, set algebra, pagination. The whole script compiles once and runs in a single pass. Pass a file with -script <path>, positionally, or on stdin; positional paths after -s override the script's scan.
Top-level: scan, exclude, patterns (or phases), variables, context/context_before/context_after, limit, limit_per_file, skip, group_by, rank (+ rank_surprise, rank_rich_clusters), relations, scope, max_*_bytes, on_file_end, on_complete.
Pattern object: id, regexp (required), case_insensitive, word_boundary, utf8 (default true), ucp, weight (for rank), absent (fire once per file where NOT found), extract:[…], on_match:[actions] (omit → default emit).
Tokens (in any data/value/key string): $FILE $PAT_ID $LINE $COL $FROM $TO $MATCH $CONTEXT $CONTEXT_BEFORE $CONTEXT_AFTER; in on_block: $BLOCK $BLOCK_FULL $BLOCK_START $BLOCK_END $BLOCK_LINE_START $BLOCK_LINE_END; in lookup: $LOOKUP_KEY $LOOKUP_VALUE; with scope: $ENCLOSING_*; any var: $name (whole-string "$x" keeps native type; embedded → stringified).
Variables ({"v":{"type":…,"default":…}}): string int bool list map. Reset to default with {"action":"reset","vars":[…]} (typically in on_file_end).
Actions
| Group | Actions |
|---|---|
| Output | emit (JSON line; data optional), print (raw text; bypasses group_by) |
| Arithmetic | set increment decrement add subtract multiply divide reset |
| Lists | append collect unique_append sort (by key; "value":"desc") |
| Maps | map_set map_increment count (= map_increment on $PAT_ID) |
| Set algebra | set_difference set_intersection set_union ({target,a,b}; lists/maps→keysets) |
| Control | if (condition+then/else) · for_each (as for lists; key_as+as for maps) · stop (skip rest of file) |
| Cross-line | submatch (sub-patterns over $MATCH/text; sub-patterns may be absent) · block (open/close → on_block) · lookup (map/key → on_hit/on_miss) |
Conditions (if): eq ne gt lt gte lte · and or not · contains (substring or list element) · isset (var set & non-zero/non-empty).
Lifecycle: on_match (per match) → on_file_end (after each file; $FILE only) → on_complete (after all files & phases).
High-value script patterns
Aggregate — count per file, one summary line each
hprscript -s '{
"scan":["**/*.go"],
"variables":{"counts":{"type":"map"}},
"patterns":[{"id":"t","regexp":"TODO","on_match":[
{"action":"map_increment","target":"counts","key":"$FILE"}]}],
"on_complete":[{"action":"for_each","var":"counts","key_as":"f","as":"n","do":[
{"action":"emit","data":{"file":"$f","count":"$n"}}]}]
}'
Group — one JSON line per file (group_by)
hprscript -s '{"scan":["**/*.go"],"group_by":"file","patterns":[
{"id":"t","regexp":"TODO|FIXME","on_match":[
{"action":"emit","data":{"file":"$FILE","line":"$LINE","match":"$MATCH"}}]}]}'
Rank — which files are most relevant to a query (your fastest "where should I look?" tool)
hprscript -s '{
"scan":["**/*.go"],
"rank":true, "rank_surprise":true, "rank_rich_clusters":true,
"patterns":[
{"id":"endpoint","regexp":"func handle","weight":3},
{"id":"auth","regexp":"auth|token|session","case_insensitive":true,"weight":2},
{"id":"db","regexp":"db\\.|sql\\.","weight":1}]
}'
# → {"file":"api/handler.go","score":1.95,"density":0.78,"matched_patterns":[...],"surprise":{...}}
score = coverage^1.5 × Σweight / log(lines+10) + clusters. rank_surprise weights each pattern by IDF-style corpus rarity (matches-everywhere → ~1×, rare → boosted; needs ≥3 files). rank_rich_clusters rewards dense co-occurrence (3 patterns in 20 lines = +1.0). Rank rows replace per-match output.
Phases — collect in pass 1, resolve in pass 2 (cross-file symbols)
hprscript -s '{
"variables":{"defs":{"type":"map"},"_n":{"type":"string"}},
"phases":[
{"id":"defs","scan":["**/*.go"],"patterns":[{"id":"def","regexp":"func (\\w+)","extract":["name"],"on_match":[
{"action":"map_set","target":"defs","key":"$EXTRACT_NAME","value":"$FILE"}]}]},
{"id":"uses","scan":["**/*.go"],"patterns":[{"id":"call","regexp":"(\\w+)\\(","extract":["name"],"on_match":[
{"action":"lookup","map":"defs","key":"$EXTRACT_NAME",
"on_hit":[{"action":"emit","data":{"sym":"$EXTRACT_NAME","defined_in":"$LOOKUP_VALUE","used_in":"$FILE","line":"$LINE"}}]}]}]}
]
}'
Block + submatch — TODOs that live inside a specific function
hprscript -s '{"scan":["**/*.go"],"patterns":[{"id":"fn","regexp":"func handleRequest","on_match":[
{"action":"block","open":"{","close":"}","on_block":[
{"action":"submatch","text":"$BLOCK","patterns":[{"id":"todo","regexp":"TODO","on_match":[
{"action":"emit","data":{"file":"$FILE","line":"$LINE","todo":"$CONTEXT"}}]}]}]}]}]}'
Absent + present — files that have A but not B
hprscript -s '{
"scan":["**/*.go"], "variables":{"has_err":{"type":"bool"}},
"patterns":[
{"id":"err","regexp":"if err != nil","on_match":[{"action":"set","var":"has_err","value":true}]},
{"id":"no_wrap","regexp":"fmt\\.Errorf\\(","absent":true,"on_match":[
{"action":"if","condition":{"op":"eq","args":["$has_err",true]},
"then":[{"action":"emit","data":{"file":"$FILE","issue":"err without wrap"}}]}]}],
"on_file_end":[{"action":"reset","vars":["has_err"]}]
}'
Pagination: "skip":20,"limit":20 → records 21–40.
Agent recipes (intent → invocation)
# Locate a symbol's definition across shapes, one pass
hprscript -p 'func\s+LoadData\b' -p 'type LoadData\b' -p 'LoadData\s*=' -llm -glob '**/*.go'
# Who calls X — with the enclosing function of each call site
hprscript -p '\bLoadData\(' -scope auto -llm -glob '**/*.go'
# Rank files by relevance to a feature (see Rank above) — start here for "where do I look?"
# Files importing a package
hprscript -p '^import\s+"net/http"' -f -glob '**/*.go'
# Files MISSING a license header
hprscript -p 'Copyright|SPDX-License-Identifier' -absent -glob '**/*.go'
# Multi-language TODO/FIXME sweep — split so each hit reports which tag fired (pat=p0/p1/p2)
hprscript -pi 'TODO' -pi 'FIXME' -pi 'XXX' -C 1 -llm -glob '**/*.{go,py,js,ts,rs,c,cpp,h}'
# API surface: every function signature with file:line
hprscript -p 'func\s+\w+\s*\(' -format '$FILE:$LINE $MATCH' -glob '**/*.go'
# Representative usages of an identifier (no near-duplicate spam)
hprscript -p '\bhttpClient\b' -sample 8 -scope auto -glob '**/*.go'
# Mixed case-sensitivity in one pass: the Error type + case-insensitive notes
hprscript -p '\bError\b' -pi 'todo|fixme|hack' -glob '**/*.go' -exclude vendor
# Search piped input (stdin) — no glob/files
kubectl logs deploy/api --tail=10000 | hprscript -p 'ERROR|panic|fatal' -C 1
curl -s https://example.com | hprscript -p 'href="([^"]+)"' -extract url -o
Anti-patterns
- ❌ Multiple sequential
hprscriptcalls for different patterns → ✅ one call, many-p/-pi(free). - ❌ Cramming terms you want to tell apart into one alternation (
-p 'a|b|c') — every hit reportspat=p0, so you can't see which matched → ✅ split into separate-pand read thepat/[p0]/$PAT_IDtag per match. - ❌ Falling back to grep/rg because a lookaround/backref won't compile → ✅ restructure, or split into a
-sphase. - ❌
-s '{…}'when a flag set fits → ✅ flag mode is easier to read/verify. - ❌ Piping output through
grep/jqfor trivial filtering → ✅ use-f/-c/-o/-format/-absent, orgroup_by/rankin a script. - ❌ Dumping full JSON when you'll just read it → ✅
-llm(or-f/-cif you only need files/counts). - ❌ Letting a minified line blow your context → ✅ set
-max-context-bytes/-max-output-bytes.
Going deeper
Two exhaustive references ship with hprscript — read them on demand for the long tail:
- HPRSCRIPT.md — complete CLI + script-DSL reference (every flag, action, condition, Unicode/UTF-8 rules, exit codes).
- COOKBOOK.md — 200+ task recipes by domain (logs/observability, security/DFIR, source code, config/infra, data wrangling, DevOps).
Find them in this skill's directory (personal install), at the hprscript repo root, or alongside the binary. hprscript -h prints the live flag list; exit codes follow grep (0 match, 1 none, 2 error).