name: semgrep-rule-variant-creator description: Creates language variants of existing Semgrep rules. Use when porting a Semgrep rule to specified target languages. Takes an existing rule and target languages as input, produces independent rule+test directories for each language. license: CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/) origin: Adapted from Trail of Bits Skills Marketplace (https://github.com/trailofbits/skills) category: "security_testing" subcategory: "static-analysis"
Semgrep Rule Variant Creator
Port existing Semgrep rules to new target languages with proper applicability analysis and test-driven validation.
When to Use
Ideal scenarios:
- Porting an existing Semgrep rule to one or more target languages
- Creating language-specific variants of a universal vulnerability pattern
- Expanding rule coverage across a polyglot codebase
- Translating rules between languages with equivalent constructs
When NOT to Use
Do NOT use this skill for:
- Creating a new Semgrep rule from scratch (use
semgrep-rule-creatorinstead) - Running existing rules against code
- Languages where the vulnerability pattern fundamentally doesn't apply
- Minor syntax variations within the same language
Input Specification
This skill requires:
- Existing Semgrep rule - YAML file path or YAML rule content
- Target languages - One or more languages to port to (e.g., "Golang and Java")
Output Specification
For each applicable target language, produces:
<original-rule-id>-<language>/
├── <original-rule-id>-<language>.yaml # Ported Semgrep rule
└── <original-rule-id>-<language>.<ext> # Test file with annotations
Example output for porting sql-injection to Go and Java:
sql-injection-golang/
├── sql-injection-golang.yaml
└── sql-injection-golang.go
sql-injection-java/
├── sql-injection-java.yaml
└── sql-injection-java.java
Rationalizations to Reject
When porting Semgrep rules, reject these common shortcuts:
| Rationalization | Why It Fails | Correct Approach |
|---|---|---|
| "Pattern structure is identical" | Different ASTs across languages | Always dump AST for target language |
| "Same vulnerability, same detection" | Data flow differs between languages | Analyze target language idioms |
| "Rule doesn't need tests since original worked" | Language edge cases differ | Write NEW test cases for target |
| "Skip applicability - it obviously applies" | Some patterns are language-specific | Complete applicability analysis first |
| "I'll create all variants then test" | Errors compound, hard to debug | Complete full cycle per language |
| "Library equivalent is close enough" | Surface similarity hides differences | Verify API semantics match |
| "Just translate the syntax 1:1" | Languages have different idioms | Research target language patterns |
Strictness Level
This workflow is strict - do not skip steps:
- Applicability analysis is mandatory: Don't assume patterns translate
- Each language is independent: Complete full cycle before moving to next
- Test-first for each variant: Never write a rule without test cases
- 100% test pass required: "Most tests pass" is not acceptable
Overview
This skill guides the creation of language-specific variants of existing Semgrep rules. Each target language goes through an independent 4-phase cycle:
FOR EACH target language:
Phase 1: Applicability Analysis → Verdict
Phase 2: Test Creation (Test-First)
Phase 3: Rule Creation
Phase 4: Validation
(Complete full cycle before moving to next language)
Foundational Knowledge
The semgrep-rule-creator skill is the authoritative reference for Semgrep rule creation fundamentals. While this skill focuses on porting existing rules to new languages, the core principles of writing quality rules remain the same.
Consult semgrep-rule-creator for guidance on:
- When to use taint mode vs pattern matching - Choosing the right approach for the vulnerability type
- Test-first methodology - Why tests come before rules and how to write effective test cases
- Anti-patterns to avoid - Common mistakes like overly broad or overly specific patterns
- Iterating until tests pass - The validation loop and debugging techniques
- Rule optimization - Removing redundant patterns after tests pass
When porting a rule, you're applying these same principles in a new language context. If uncertain about rule structure or approach, refer to semgrep-rule-creator first.
Four-Phase Workflow
Phase 1: Applicability Analysis
Before porting, determine if the pattern applies to the target language.
Analysis criteria:
- Does the vulnerability class exist in the target language?
- Does an equivalent construct exist (function, pattern, library)?
- Are the semantics similar enough for meaningful detection?
Verdict options:
APPLICABLE→ Proceed with variant creationAPPLICABLE_WITH_ADAPTATION→ Proceed but significant changes neededNOT_APPLICABLE→ Skip this language, document why
Full guidance is inlined below (upstream references/applicability-analysis.md). (see upstream Trail of Bits prodsec-skills for companion files)
Phase 2: Test Creation (Test-First)
Always write tests before the rule.
Create test file with target language idioms:
- Minimum 2 vulnerable cases (
ruleid:) - Minimum 2 safe cases (
ok:) - Include language-specific edge cases
// ruleid: sql-injection-golang
db.Query("SELECT * FROM users WHERE id = " + userInput)
// ok: sql-injection-golang
db.Query("SELECT * FROM users WHERE id = ?", userInput)
Phase 3: Rule Creation
- Analyze AST:
semgrep --dump-ast -l <lang> test-file - Translate patterns to target language syntax
- Update metadata: language key, message, rule ID
- Adapt for idioms: Handle language-specific constructs
See Inlined: language syntax guide below.
Phase 4: Validation
# Validate YAML
semgrep --validate --config rule.yaml
# Run tests
semgrep --test --config rule.yaml test-file
Checkpoint: Output MUST show All tests passed.
For taint rule debugging:
semgrep --dataflow-traces -f rule.yaml test-file
Extended troubleshooting and examples: upstream references/workflow.md in the rule-creator plugin and semgrep-rule-creator in this repo.
Quick Reference
| Task | Command |
|---|---|
| Run tests | semgrep --test --config rule.yaml test-file |
| Validate YAML | semgrep --validate --config rule.yaml |
| Dump AST | semgrep --dump-ast -l <lang> <file> |
| Debug taint flow | semgrep --dataflow-traces -f rule.yaml file |
Key Differences from Rule Creation
| Aspect | semgrep-rule-creator | This skill |
|---|---|---|
| Input | Bug pattern description | Existing rule + target languages |
| Output | Single rule+test | Multiple rule+test directories |
| Workflow | Single creation cycle | Independent cycle per language |
| Phase 1 | Problem analysis | Applicability analysis per language |
| Library research | Always relevant | Optional (when original uses libraries) |
Documentation
REQUIRED: Before porting rules, read relevant Semgrep documentation:
- Rule Syntax - YAML structure and operators
- Pattern Syntax - Pattern matching and metavariables
- Pattern Examples - Per-language pattern references
- Testing Rules - Testing annotations
- Trail of Bits Testing Handbook - Advanced patterns
Inlined: applicability analysis (upstream references/applicability-analysis.md)
Applicability Analysis
Phase 1 of the variant creation workflow. Before porting a rule, analyze whether the vulnerability pattern applies to the target language.
Analysis Process
For EACH target language, answer these questions:
1. Does the Vulnerability Class Exist?
Determine if the vulnerability type is possible in the target language.
Examples:
- Buffer overflow: Applies to C/C++, may apply to Rust (in unsafe blocks), does NOT apply to Python/Java
- SQL injection: Applies to any language with database access
- XSS: Applies to any language generating HTML output
- Memory leak: Relevant in C/C++, less relevant in garbage-collected languages
- Type confusion: Relevant in dynamically typed languages, less relevant in strongly typed
2. Does an Equivalent Construct Exist?
Identify what the original rule detects and find equivalents.
Parse the original rule to identify:
- Sinks: What dangerous functions/methods does it detect?
- Sources: Where does tainted data originate?
- Pattern type: Is it taint-mode or pattern-matching?
Then research the target language:
- What are the equivalent dangerous functions?
- What are the common source patterns?
- Are there language-specific idioms to consider?
3. Are the Semantics Similar Enough?
Verify the pattern translates meaningfully.
Consider:
- Does the vulnerability manifest the same way?
- Are there language-specific mitigations that change detection needs?
- Would the ported rule provide actual security value?
Verdict Format
Document your analysis for each target language:
TARGET: <language>
VERDICT: APPLICABLE | APPLICABLE_WITH_ADAPTATION | NOT_APPLICABLE
REASONING: <specific analysis>
ADAPTATIONS_NEEDED: <if APPLICABLE_WITH_ADAPTATION>
EQUIVALENT_CONSTRUCTS:
- Original: <function/pattern>
- Target: <equivalent function/pattern>
Verdict Definitions
APPLICABLE
The pattern translates directly with minor syntax adjustments.
Criteria:
- Equivalent constructs exist with same semantics
- Vulnerability manifests identically
- Detection logic remains the same
Example:
Original: Python os.system(user_input)
Target: Go exec.Command(user_input)
VERDICT: APPLICABLE
REASONING: Both execute shell commands with user input. Vulnerability is
identical (command injection). Detection logic (taint from input to exec)
translates directly.
APPLICABLE_WITH_ADAPTATION
The pattern can be ported but requires significant changes.
Criteria:
- Vulnerability class exists but manifests differently
- Equivalent constructs exist but with different APIs
- Additional patterns needed for target language idioms
Example:
Original: Python pickle.loads(untrusted)
Target: Java ObjectInputStream.readObject()
VERDICT: APPLICABLE_WITH_ADAPTATION
REASONING: Both detect deserialization vulnerabilities but the APIs differ
significantly. Java requires detection of ObjectInputStream creation and
readObject() calls, not a single function call.
ADAPTATIONS_NEEDED:
- Different sink patterns (readObject vs loads)
- May need pattern-inside for ObjectInputStream context
- Consider readUnshared() variant
NOT_APPLICABLE
The pattern should not be ported to this language.
Criteria:
- Vulnerability class doesn't exist in target language
- No equivalent construct exists
- Pattern would be meaningless or misleading
Example:
Original: C buffer overflow detection
Target: Python
VERDICT: NOT_APPLICABLE
REASONING: Python handles memory management automatically. Buffer overflows
in the traditional C sense don't exist. The vulnerability class is not
present in the target language.
Common Applicability Patterns
Always Translate (Language-Agnostic Vulnerabilities)
These vulnerability classes exist across most languages:
- SQL injection (any language with DB access)
- Command injection (any language with shell execution)
- Path traversal (any language with file operations)
- SSRF (any language with HTTP clients)
- XSS (any language generating HTML)
Sometimes Translate (Context-Dependent)
These require careful analysis:
- Deserialization: Different mechanisms per language
- Cryptographic weaknesses: Language-specific crypto libraries
- Race conditions: Depends on concurrency model
- Integer overflow: Depends on type system
Rarely Translate (Language-Specific)
These are often NOT_APPLICABLE for other languages:
- Memory corruption (C/C++ specific)
- Type juggling (PHP specific)
- Prototype pollution (JavaScript specific)
- GIL-related issues (Python specific)
Library-Specific Rules
When the original rule targets a third-party library:
Step 1: Identify the Library's Purpose
What functionality does the library provide?
- ORM / Database access
- HTTP client/server
- Serialization
- Templating
- etc.
Step 2: Research Target Language Ecosystem
For the target language, identify:
- Standard library equivalents
- Popular third-party libraries with same functionality
- Language-specific idioms for this functionality
Step 3: Decide on Scope
Options:
- Native constructs only: Port to standard library equivalents
- Popular library: Port to the most common library in target ecosystem
- Multiple variants: Create separate rules for multiple libraries
Recommendation: Start with standard library or most popular option. Additional library variants can be created separately if needed.
Analysis Checklist
Before proceeding past Phase 1:
- Parsed original rule and identified pattern type
- Identified sinks, sources, and sanitizers (if taint mode)
- Researched equivalent constructs in target language
- Documented verdict with specific reasoning
- If APPLICABLE_WITH_ADAPTATION, listed required changes
- If NOT_APPLICABLE, documented clear explanation
Example Analysis
Original Rule: Python command injection via subprocess
rules:
- id: python-command-injection
mode: taint
languages: [python]
pattern-sources:
- pattern: request.args.get(...)
pattern-sinks:
- pattern: subprocess.call($CMD, shell=True, ...)
Target: Go
TARGET: Go
VERDICT: APPLICABLE_WITH_ADAPTATION
REASONING:
- Command injection exists in Go (vulnerability class present)
- Go uses exec.Command() and exec.CommandContext() for command execution
- Go doesn't have shell=True equivalent; commands run directly by default
- Shell execution in Go requires explicit bash -c wrapping
EQUIVALENT_CONSTRUCTS:
- Original sink: subprocess.call(cmd, shell=True)
- Target sinks:
- exec.Command("bash", "-c", cmd)
- exec.Command("sh", "-c", cmd)
- exec.Command(cmd) when cmd comes from user input
ADAPTATIONS_NEEDED:
1. Different sink patterns for Go's exec package
2. Source patterns need Go HTTP handler equivalents (r.URL.Query(), r.FormValue())
3. Consider both direct exec.Command and shell-wrapped variants
Target: Java
TARGET: Java
VERDICT: APPLICABLE
REASONING:
- Command injection exists in Java (vulnerability class present)
- Java uses Runtime.exec() and ProcessBuilder for command execution
- Direct equivalent functionality available
EQUIVALENT_CONSTRUCTS:
- Original sink: subprocess.call(cmd, shell=True)
- Target sinks:
- Runtime.getRuntime().exec(cmd)
- new ProcessBuilder(cmd).start()
ADAPTATIONS_NEEDED:
- Source patterns need Java servlet equivalents (request.getParameter())
- Consider both Runtime.exec and ProcessBuilder patterns
Inlined: language syntax guide (excerpt, upstream references/language-syntax-guide.md)
Language Syntax Translation Guide
Guidance for translating Semgrep patterns between languages. This is NOT a pre-built mapping—use these principles to research and adapt patterns for your specific case.
General Translation Principles
1. Never Assume Syntax Equivalence
What looks similar may parse differently:
# Python: method call on object
obj.method(arg)
# Go: might be method OR field access + function call
obj.Method(arg) # Method call
obj.Field(arg) # Field holding function, then called
Always dump the AST for your target language to see the actual structure.
2. Research Before Translating
For each construct in the original rule:
- Search target language documentation for equivalent
- Look for multiple ways the same thing can be written
- Check if language idioms differ significantly
3. Preserve Detection Intent, Not Literal Syntax
The goal is detecting the same vulnerability, not matching identical syntax.
# Original (Python) - detects eval of user input
pattern: eval($USER_INPUT)
# Go doesn't have eval() - what's the equivalent danger?
# Research shows: template execution, reflect-based eval, etc.
# Adapt to what actually creates the vulnerability in Go
AST Analysis
Always Dump the AST
semgrep --dump-ast -l <target-language> test-file
Compare how similar constructs are represented:
# Python
cursor.execute(query)
// Go
db.Query(query)
The AST structure may differ significantly even for conceptually similar operations.
Key Differences to Watch
| Aspect | May Differ |
|---|---|
| Method calls | Receiver position, syntax |
| Function arguments | Named vs positional, defaults |
| String handling | Interpolation, concatenation |
| Error handling | Exceptions vs return values |
| Imports | How namespaces work |
Metavariable Adaptation
Metavariables Work Cross-Language
Semgrep metavariables ($X, $FUNC, etc.) work in all languages:
# Works in Python
pattern: $OBJ.execute($QUERY)
# Works in Java
pattern: $OBJ.executeQuery($QUERY)
# Works in Go
pattern: $DB.Query($QUERY, ...)
Ellipsis Behavior
... matches language-appropriate constructs:
- In Python: matches arguments, statements
- In Go: matches arguments, statements (handles multi-return)
- In Java: matches arguments, statements, annotations
Common Translation Categories
Database Queries
Research for your target language:
- Standard library database package
- Popular ORM frameworks
- Raw query execution methods
Common patterns to look for:
- Query execution methods
- Prepared statement patterns
- String interpolation into queries
Command Execution
Research for your target language:
- Standard library process/exec package
- Shell execution vs direct execution
- Argument passing (array vs string)
File Operations
Research for your target language:
- File open/read/write APIs
- Path construction methods
- Directory traversal patterns
HTTP Handling
Research for your target language:
- Request parameter access
- Header access
- Body parsing
Researching Equivalents
Step 1: Identify What the Original Detects
Parse the original rule:
- What function/method is the sink?
- What's the vulnerability being detected?
- What makes it dangerous?
Step 2: Search Target Language Docs
Search for:
"<target language> <functionality>"(e.g., "golang exec command")"<target language> <vulnerability>"(e.g., "java sql injection")- Standard library documentation
- Semgrep Pattern Examples - Per-language pattern references
Step 3: Find All Variants
A single Python function may have multiple equivalents:
# Python has one main way
os.system(cmd)
// Java has multiple
Runtime.getRuntime().exec(cmd);
new ProcessBuilder(cmd).start();
ProcessBuilder.command(cmd).start();
Include all common variants in your rule.
Step 4: Check for Idioms
Languages have preferred patterns:
# Python: often inline
cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
// Go: typically uses placeholders
db.Query("SELECT * FROM users WHERE id = ?", userID)
// Vulnerability is when they DON'T use placeholders
db.Query("SELECT * FROM users WHERE id = " + userID)
Source Pattern Translation
Web Framework Sources
Original rule sources need framework-specific translation:
# Python Flask
pattern: request.args.get(...)
# Java Servlet
pattern: $REQUEST.getParameter(...)
# Go net/http
pattern: $R.URL.Query().Get(...)
pattern: $R.FormValue(...)
Further sink/source examples and edge cases: (see upstream Trail of Bits prodsec-skills for companion files) — full language-syntax-guide.md.