name: posix-awk description: > Write strictly POSIX-conformant awk and flag gawk/mawk extensions (gensub, asort, length(array), systime, FIELDWIDTHS, bitwise fns). Invoke for any awk you write, edit, or review, including one-liners in shell scripts, pipelines, Makefiles, or CI. If awk appears, invoke this skill.
POSIX Awk
mawk is the default /usr/bin/awk on Ubuntu and GitHub-hosted runners, so a gawk-ism that runs locally can break or behave differently in CI. Write to POSIX awk; reach for an extension only when the runtime is known to be gawk.
POSIX Vs Gawk
Non-POSIX features to avoid by default:
- String functions
gensub,patsplit,strtonum,asort,asorti: gawk-only. Use POSIXsub/gsub/split/match. gensubbackreferences: no POSIX equivalent; restructure withmatch+substr.- Time functions
systime,strftime,mktime: gawk extensions, not portable (mawk support is version-dependent). - For dates, run shell
dateand pass the value in with-v. - Bit and type functions
and,or,xor,lshift,rshift,compl,isarray,typeof: gawk-only. length(array): non-POSIX. Count withfor (k in arr) n++.delete arr(whole array): non-POSIX. Delete elements withdelete arr[k], or loop.nextfile: non-POSIX.- Multi-character/regex
RSand theRTvariable: gawk-only. POSIXRSis a single character. - gawk-only variables:
PROCINFO,FPAT,FIELDWIDTHS,IGNORECASE,ARGIND,ERRNO,BINMODE. \xhex escapes: non-POSIX. Octal\dddis POSIX.
Passing Shell Values
- Pass shell values in as data with
awk -v var="${value}", not by splicing them into program text; referencevarinside the program.
Regex Patterns
- Embed constant regex as
/regex/literals; reserve-v var=...for dynamic values (user input, env data). awk -vruns C-style string-escape processing on the value before the regex engine sees it.- POSIX leaves unknown escapes like
\.undefined; mawk strips the backslash (with a warning), so\.becomes.(any char), not a literal dot. - Don't rely on this: the one true awk strips
\.too, and behavior varies across awks and versions. Never count on a bare\.surviving the-vlayer. - Regex literals (
/.../) skip the string layer and behave consistently across awks. - For genuinely dynamic patterns, the caller owns the double-escape (e.g. shell-level
\\.to survive both layers). sub(/regex/, repl)returns the substitution count, so it doubles as a boolean. Prefercond && sub(/r/, repl) { action }overif (match($0, r)) { action; sub(r, repl) }, which duplicates the regex.- Build a non-trivial regex from named parts: assign each fragment to a
BEGINvariable named for what it matches, then concatenate. The names document the regex in place of a comment. - Name parts and patterns for their meaning (
leading_indent,key_separator,key_line_pattern), never opaque abbreviations (keyre).
Script Structure
- Default to the "filter pattern": specific pattern-action blocks at the top, a trailing
{ print }passthrough at the bottom to handle every line the blocks don't claim. - When a matching block prints its own output, end it with
nextto skip the passthrough and avoid double-printing. - When a matching block mutates
$0in place (e.g. two-argumentsub), don'tprintornext; let the passthrough print the modified record. - When the caller needs "did anything match?", set a flag in the matching block, check it in
END, andexit 2(a distinct non-zero) so "no match" is distinguishable from awk's own errors. - Don't mutate
-vinputs in place. Copy to a working local (output = replacement) beforesub/gsub, so the input stays visible and reusable.
Variable Naming
- Use descriptive snake_case for
-vvariables (pattern,replacement,version,output); the shell-side casing need not match.
Comments
- Keep trailing
#comments to a short phrase, aligned to a consistent column within each block (at least two spaces past the block's longest code line). - When a comment needs more than a phrase (typically the "why" behind a non-obvious line), put it on its own line above the code, not trailing.
Example
awk -v pattern="${LINE_MATCH}" -v replacement="${LINE_REPLACE}" -v version="${LATEST_VERSION}" '
$0 ~ pattern { # Match on the current line using the regex supplied in `pattern`.
output = replacement # Working copy of the replacement template.
gsub(/\{version\}/, version, output) # Substitute {version} with the latest version.
print output
next # Skip the passthrough block for this line.
}
{ print } # Passthrough for non-matching lines.
' "${FILE}"