name: oss-fuzz-harness
description: >
  End-to-end workflow for adding C/C++ projects to OSS-Fuzz: analyze source code to identify high-value fuzz targets, generate correct fuzz harnesses, create OSS-Fuzz project files, build and verify everything. Use this skill whenever the user wants to fuzz a C/C++ project, add a project to OSS-Fuzz, write fuzz harnesses, find critical functions to fuzz, or improve fuzzing coverage. Also trigger when the user mentions fuzzing, libFuzzer, AFL, honggfuzz, sanitizers (ASAN/UBSAN/MSAN), or asks about fuzz target selection for any C/C++ codebase.
OSS-Fuzz Harness Generator
This skill automates the full pipeline from source analysis to running fuzzers for C/C++ projects in the OSS-Fuzz framework.
Overview
The workflow has five phases:
- Attack Surface Analysis — Identify high-value fuzz targets by analyzing how untrusted input enters and flows through the codebase
- Research — Study actual function signatures and internal APIs
- Generate — Write correct harnesses and OSS-Fuzz project files
- Build — Compile and verify with OSS-Fuzz infrastructure
- Run — Execute fuzzers and confirm they work
Each phase has common pitfalls. This skill captures the lessons learned from real harness development so you avoid the usual traps.
Phase 1: Attack Surface Analysis
Analyze the project's attack surface to find the best fuzz targets. Do not
rely solely on automated tools — understand how untrusted input enters and
flows through the code. See references/attack-surface.md for detailed
methodology and real examples.
Step 1: Clone and orient
git clone --depth 1 <repo-url> /tmp/<project>
Get oriented quickly:
- Read the project README for an overview of what the project does
- Identify the project's public API prefix (e.g., av_ for FFmpeg, pcap_ for libpcap)
- Check for existing fuzz targets; don't duplicate coverage that already exists
# Check for existing fuzz targets in the project
find /tmp/<project> -iname "*fuzz*" | head -20
# Check what OSS-Fuzz already covers
ls projects/<project>/ 2>/dev/null
Step 2: Identify input entry points
Find where untrusted data enters the codebase:
- I/O functions: read(), recv(), fread(), fopen(), pcap_*
- Public API functions: Functions with the project's API prefix that accept user-supplied data (e.g., av_parse_time(), SSL_read())
- Format/protocol handlers: Function pointer tables, vtables, codec/format registration structs
- Command-line parsers: getopt, option parsing, config file readers
# Find public API functions that take string or buffer input
grep -rnE "^[a-z].*\(.*const (char|uint8_t|unsigned char) \*" \
/tmp/<project>/include/ /tmp/<project>/lib*/ 2>/dev/null | head -40
# Find format/protocol handler registrations
grep -rn "\.read\s*=\|\.parse\s*=\|\.decode\s*=\|\.dissect\s*=" \
/tmp/<project>/ | head -20
Step 3: Trace input flow
From each entry point, follow the data through the call graph:
- Which functions directly parse or process the untrusted input?
- Where does the parsing logic live (string parsing, binary decoding, etc.)?
- What intermediate functions transform or validate the data?
Read the actual source code of promising functions. Prioritize functions that:
- Take raw input buffers and parse structured data from them
- Have complex control flow (switches, loops over input bytes)
- Do memory allocation based on input-controlled values
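As an illustration of the shape to look for, here is a minimal sketch of a parser that combines all three traits: it takes a raw buffer, branches on input bytes, and sizes an allocation from a length field. The function name demo_parse_record and the record layout are hypothetical and not taken from any real project.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record layout: 1-byte type, 2-byte big-endian length, payload.
 * Returns 0 on success and -1 on malformed input. On success the caller
 * owns *payload and must free() it. */
int demo_parse_record(const uint8_t *buf, size_t len,
                      uint8_t **payload, size_t *payload_len)
{
    if (len < 3)
        return -1;                       /* header is truncated */

    /* Input-controlled value that drives the allocation below */
    size_t claimed = ((size_t)buf[1] << 8) | buf[2];

    /* Without this check the memcpy would read past the end of buf */
    if (claimed > len - 3)
        return -1;

    uint8_t *copy = malloc(claimed ? claimed : 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, buf + 3, claimed);

    *payload = copy;
    *payload_len = claimed;
    return 0;
}
```

When reading real code, the question to ask at each such length field is whether the equivalent of the `claimed > len - 3` check exists on every path.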
Step 4: Classify and select targets
Categorize candidates into target types. Each type has different fuzzing value:
| Target Type | Description | Example |
|---|---|---|
| String parsers | Take const char *, parse with sscanf/strtol/char loops | av_parse_time(), av_parse_color() |
| Binary format parsers | Take uint8_t * + length, decode structured data | Codec decode functions, packet parsers |
| URL/path manipulation | URL splitting, path joining, encoding/decoding | av_url_split(), ff_make_absolute_url() |
| Expression evaluators | Math/query/format string processors | av_expr_parse_and_eval() |
| Multi-input parsers | Take 2+ independent untrusted strings | av_dict_parse_string() (input + delimiters) |
| Recursive descent parsers | Self-calling or mutually recursive parsing | HTML/XML parsers, nested format parsers |
Target selection criteria — a function is a good fuzz target if:
- It takes untrusted input (string, buffer, or structured data)
- It has non-trivial parsing logic (not just a thin wrapper)
- It can be called standalone without complex state setup
- It is not already fuzzed transitively by existing harnesses
Prefer public API functions (e.g., av_*) over internal functions (e.g.,
ff_*) — they have stable signatures, are easier to link, and represent the
actual attack surface that external callers use.
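String and multi-input targets from the table need the raw fuzz buffer turned into NUL-terminated strings before the call. The sketch below shows one possible way to do that for a two-input parser, using the first byte of the input to pick the split point. demo_count_delims is a hypothetical stand-in for the real target, and demo_to_cstr is a helper invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a two-input parser: counts how many
 * characters of `input` also occur in `delims`. */
size_t demo_count_delims(const char *input, const char *delims)
{
    size_t count = 0;
    for (; *input != '\0'; input++)
        if (strchr(delims, *input) != NULL)
            count++;
    return count;
}

/* Copy `len` bytes into a fresh NUL-terminated string, or return NULL. */
char *demo_to_cstr(const uint8_t *bytes, size_t len)
{
    char *s = malloc(len + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, bytes, len);
    s[len] = '\0';
    return s;
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    if (size < 1)
        return 0;

    /* The first byte selects where the remaining bytes are split */
    size_t rest = size - 1;
    size_t first_len = data[0] % (rest + 1);      /* range 0..rest */

    char *input = demo_to_cstr(data + 1, first_len);
    char *delims = demo_to_cstr(data + 1 + first_len, rest - first_len);

    if (input != NULL && delims != NULL)
        (void)demo_count_delims(input, delims);

    /* Free on every path so LeakSanitizer stays quiet */
    free(input);
    free(delims);
    return 0;
}
```

Heap copies sized exactly to the data are used instead of a fixed stack buffer so that ASAN can flag a read even one byte past the terminator.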
Step 5 (optional): Use fuzz_target_selector for complexity ranking
The fuzz_target_selector tool can help prioritize among candidates by
measuring code complexity. Use it as a supplement to your analysis, not
as the primary discovery method.
cd fuzz_target_selector/
python3 fuzz_target_selector.py analyze /tmp/<project> \
--project <project-name> -o /tmp/<project>_targets.json -n 100
python3 fuzz_target_selector.py list /tmp/<project>_targets.json \
--priority critical -n 20
Cross-reference the complexity scores with your attack surface analysis. High-complexity functions that also sit on input paths are the best targets.
Phase 2: Research the Target Project
Before writing harnesses, study the actual source code to understand:
- Exact function signatures: The auto-generator often gets parameter types and counts wrong. Grep the source for the real declarations.
- Context/state objects: Many C projects pass a context struct through their call chain (like tcpdump's netdissect_options). Identify what fields must be initialized and what function pointers need to be set.
- Error handling: Find functions that call exit() or abort(). These will kill the fuzzer. You need longjmp-based recovery instead.
- Bounds checking: Find how the project checks for truncated/short input. Many use setjmp/longjmp for early termination on truncated data. Your harness must set up the same mechanism.
- Network/DNS calls: Functions that do DNS lookups or network I/O will cause timeouts during fuzzing. Find flags that disable these.
- Build system: Read CMakeLists.txt or Makefile to understand what libraries are built (especially static libs) and what dependencies exist.
- Test files: Look for a tests/ directory with sample inputs that can serve as seed corpus.
# Example: find function signatures
grep -rn "^function_name\|^void.*function_name\|^int.*function_name" *.c
# Find context struct definitions
grep -n "struct.*options\|typedef.*context" *.h
# Find exit/abort calls in the library
grep -rn "exit(\|abort(" lib/ src/
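To make the context-object point concrete, the following sketch shows the kind of struct this research should uncover and how a harness might fill it in. Everything here is hypothetical: struct demo_options, its fields, and demo_dissect are invented for illustration and only loosely resemble the options structs found in real projects.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical context struct threaded through a project's call chain. */
struct demo_options {
    int (*print)(struct demo_options *, const char *fmt, ...);
    int resolve_names;       /* nonzero would trigger DNS lookups */
    const uint8_t *end;      /* one past the last readable input byte */
    unsigned records_seen;
};

/* No-op replacement for the project's print callback. */
int demo_noop_print(struct demo_options *opts, const char *fmt, ...)
{
    (void)opts;
    (void)fmt;
    return 0;
}

/* Hypothetical target: walks 1-byte records and "prints" each one. */
void demo_dissect(struct demo_options *opts, const uint8_t *p)
{
    while (p < opts->end) {
        opts->print(opts, "record %u\n", (unsigned)*p);
        opts->records_seen++;
        p++;
    }
}

/* Fill in every field the target dereferences before calling it. */
void demo_init_options(struct demo_options *opts,
                       const uint8_t *data, size_t size)
{
    memset(opts, 0, sizeof *opts);
    opts->print = demo_noop_print;   /* a NULL pointer here would crash */
    opts->resolve_names = 0;         /* keep the fuzzer off the network */
    opts->end = data + size;         /* bounds for truncation checks */
}
```

A function pointer left NULL by the harness shows up as an immediate crash on the first input, which is a harness bug rather than a finding in the target.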
Phase 3: Generate Harnesses
OSS-Fuzz project files
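Create projects/<project>/ with four file types: project.yaml, Dockerfile, build.sh, and the harness sources (the *.c and *.h files that the Dockerfile copies into $SRC).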
project.yaml:
homepage: "<project-homepage>"
language: c # or c++
primary_contact: "<security-contact>"
fuzzing_engines:
- libfuzzer
- afl
- honggfuzz
sanitizers:
- address
- undefined
# Only add 'memory' if the project compiles cleanly with MSAN.
# Projects using many system headers often don't.
main_repo: '<git-repo-url>'
Dockerfile:
FROM gcr.io/oss-fuzz-base/base-builder
RUN apt-get update && apt-get install -y <build-deps>
RUN git clone --depth 1 <dependency-repos>
RUN git clone --depth 1 <main-repo>
WORKDIR $SRC
COPY build.sh *.h *.c $SRC/
build.sh — See references/build-patterns.md for templates.
Writing correct harnesses
The most important lesson: auto-generated harnesses are almost always wrong. You must manually verify every function signature against the actual source.
For projects with a shared context object, create a fuzz_common.h that handles
all the boilerplate. This avoids duplicating the same setup code in every harness.
The common header should provide:
- No-op output functions: The project's print/log functions should be silenced during fuzzing. They waste time and can trigger false positives.
- longjmp-based error recovery: Replace any exit()-calling error handler with one that does longjmp() back to the fuzzer loop.
- Warning suppression: No-op warning handlers.
- One-time initialization: Use LLVMFuzzerInitialize() for setup that should happen once (lookup table init, library init). This avoids memory leaks from repeated initialization.
- A dissector/parser call macro: Wrap the target function call with proper bounds setup and truncation recovery.
See references/harness-template.md for a complete template with examples.
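As a rough idea of how the recovery, one-time initialization, and call-macro pieces fit together, here is a minimal single-file sketch. It is not the template from references/harness-template.md: the FUZZ_CALL definition shown is one plausible implementation, and demo_fatal, demo_target, g_table, and g_processed are hypothetical names.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static jmp_buf g_fuzz_jmp;    /* recovery point for the current input */
uint32_t *g_table;            /* built once, reused for every input */
unsigned g_processed;         /* inputs that ran to completion */

/* Replacement for an error handler that would otherwise call exit(). */
void demo_fatal(const char *msg)
{
    (void)msg;
    longjmp(g_fuzz_jmp, 1);
}

/* Run `call`; if it bails out via demo_fatal(), continue after the macro. */
#define FUZZ_CALL(call)                  \
    do {                                 \
        if (setjmp(g_fuzz_jmp) == 0) {   \
            call;                        \
        }                                \
    } while (0)

/* libFuzzer calls this once before the first input. */
int LLVMFuzzerInitialize(int *argc, char ***argv)
{
    (void)argc;
    (void)argv;
    g_table = calloc(256, sizeof *g_table);
    if (g_table == NULL)
        abort();
    for (uint32_t i = 0; i < 256; i++)
        g_table[i] = i * i;
    return 0;
}

/* Hypothetical target that treats a leading 0xFF byte as a fatal error. */
void demo_target(const uint8_t *data, size_t size)
{
    if (size > 0 && data[0] == 0xFF)
        demo_fatal("bad magic");
    if (size > 0)
        (void)g_table[data[0]];
    g_processed++;
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    if (g_table == NULL)
        return 0;             /* initialization did not run */
    FUZZ_CALL(demo_target(data, size));
    return 0;
}
```

One caveat with this design: anything the target allocated before the longjmp is never freed, so a target that allocates on the error path may still need cleanup hooks to keep LeakSanitizer quiet.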
Per-function harnesses
Each harness should be minimal — just include the common header, declare the extern function with its real signature, and call it through the macro:
#include "fuzz_common.h"
extern void target_func(context_t *, const u_char *, u_int);
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
if (size < MINIMUM_INPUT_SIZE)
return 0;
FUZZ_CALL(target_func(&g_ctx, data, (u_int)size));
return 0;
}
Choose minimum input sizes based on the protocol's minimum header:
- IPv4: 20 bytes
- IPv6: 40 bytes
- TCP: 20 bytes (+ 20 for enclosing IP = 40 total)
- UDP: 8 bytes
- ICMP: 8 bytes
- Generic TLV protocols: 4 bytes
- Single-byte dispatch: 1 byte
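The size gate can be sketched for the IPv4 case as follows. demo_ipv4_total_length is a hypothetical stand-in for a real dissector and g_parsed exists only to make the gate observable; the field offsets used (version and header length in byte 0, total length in bytes 2 and 3) follow the standard IPv4 header layout.

```c
#include <stddef.h>
#include <stdint.h>

#define MINIMUM_INPUT_SIZE 20   /* IPv4 header without options */

unsigned g_parsed;              /* inputs that got past the size gate */

/* Hypothetical stand-in parser. Assumes len >= 20, which the harness
 * guarantees. Returns the total-length field, or 0 if the header is
 * not a plausible IPv4 header. */
unsigned demo_ipv4_total_length(const uint8_t *pkt, size_t len)
{
    unsigned version = pkt[0] >> 4;
    size_t header_bytes = (size_t)(pkt[0] & 0x0F) * 4;

    if (version != 4 || header_bytes < 20 || header_bytes > len)
        return 0;
    return ((unsigned)pkt[2] << 8) | pkt[3];
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    /* Inputs shorter than the fixed header cannot reach any parsing logic */
    if (size < MINIMUM_INPUT_SIZE)
        return 0;

    g_parsed++;
    (void)demo_ipv4_total_length(data, size);
    return 0;
}
```

The gate only saves cycles on inputs that would be rejected immediately. If the real target is supposed to handle short input by itself, setting the minimum too high would hide bugs in that truncation handling.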
build.sh patterns
The build script should:
- Build dependencies as static libraries
- Build the target project (generates config headers and static libs)
- Loop over all fuzz_*.c files, compile and link each one
- Create seed corpus from test files
COMMON_CFLAGS="-I$SRC/<project> -I$SRC/<project>/build -I$SRC/<dep>"
COMMON_LIBS="$SRC/<project>/build/lib<name>.a $SRC/<dep>/build/lib<dep>.a \
$LIB_FUZZING_ENGINE <extra-libs>"
for fuzzer in $SRC/fuzz_*.c; do
target=$(basename "$fuzzer" .c)
$CC $CFLAGS $COMMON_CFLAGS -c "$fuzzer" -o "$SRC/${target}.o"
$CXX $CXXFLAGS "$SRC/${target}.o" -o "$OUT/$target" $COMMON_LIBS
done
Watch for missing library dependencies at link time — if the project uses
OpenSSL, zlib, etc., add -lcrypto, -lz, etc. to COMMON_LIBS.
Phase 4: Build and Verify
Run the three verification steps in order:
# 1. Build the Docker image
echo "n" | python3 infra/helper.py build_image <project>
# 2. Compile fuzzers inside the container
echo "n" | python3 infra/helper.py build_fuzzers <project>
# 3. Verify binaries pass OSS-Fuzz checks
echo "n" | python3 infra/helper.py check_build <project>
Common build failures and fixes:
| Error | Fix |
|---|---|
| undefined reference to MD5_Init | Add -lcrypto to link flags |
| undefined reference to inflate | Add -lz to link flags |
| LeakSanitizer: detected memory leaks | Move init code to LLVMFuzzerInitialize |
| ALARM: timeout after N seconds | Set flags to disable DNS/network I/O |
| Wrong function signature | Check actual source, fix extern declaration |
| Missing config.h | Build the project with cmake/configure first |
Phase 5: Run Fuzzers
# Quick smoke test (10 seconds)
echo "n" | python3 infra/helper.py run_fuzzer <project> <target> \
-- -max_total_time=10
# Longer validation (1+ minutes)
echo "n" | python3 infra/helper.py run_fuzzer <project> <target> \
-- -max_total_time=60
# Run multiple in parallel for extended testing
bash -c '
(echo "n" | python3 infra/helper.py run_fuzzer <project> fuzz_a -- -max_total_time=3600 2>&1 | tail -50 > /tmp/fuzz_a.log) &
(echo "n" | python3 infra/helper.py run_fuzzer <project> fuzz_b -- -max_total_time=3600 2>&1 | tail -50 > /tmp/fuzz_b.log) &
wait
'
A healthy fuzzer should show:
- Steadily increasing cov: (coverage) numbers
- Thousands of runs per second (varies by complexity)
- No ERROR, SUMMARY, leak, or timeout lines
- Done N runs in M second(s) at the end
Decision Guide
One broad harness vs. many targeted harnesses?
Do both. A broad harness (like fuzz_pcap that feeds full file format input
through the main parsing pipeline) exercises all code paths but gives the fuzzer
less direct control. Per-function harnesses (like fuzz_ip, fuzz_tcp) bypass
format overhead and let the fuzzer focus mutations on the specific protocol
parser. The broad harness catches integration bugs; targeted harnesses find
deeper protocol-specific bugs.
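The difference between the two styles can be sketched side by side. The container format here (a "DEMO" magic followed by type/length/payload records) and all function names are hypothetical. In a real project each entry point would live in its own file and be named LLVMFuzzerTestOneInput; they share one file here only for comparison.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

unsigned g_proto_calls;   /* how often the protocol parser was reached */

/* Hypothetical protocol parser that both harness styles exercise. */
void demo_parse_proto(const uint8_t *p, size_t len)
{
    (void)p;
    (void)len;
    g_proto_calls++;
}

/* Broad style: input must be a valid container before any record
 * reaches the protocol parser. */
int demo_fuzz_broad(const uint8_t *data, size_t size)
{
    if (size < 4 || memcmp(data, "DEMO", 4) != 0)
        return 0;

    size_t off = 4;
    while (size - off >= 2) {
        uint8_t type = data[off];
        size_t len = data[off + 1];
        off += 2;
        if (len > size - off)
            break;                     /* truncated record */
        if (type == 1)
            demo_parse_proto(data + off, len);
        off += len;
    }
    return 0;
}

/* Targeted style: every input byte goes straight to the parser. */
int demo_fuzz_targeted(const uint8_t *data, size_t size)
{
    demo_parse_proto(data, size);
    return 0;
}
```

In the broad entry point a mutation that damages the magic or a length byte never reaches demo_parse_proto, which is the "less direct control" cost described above. The targeted entry point has no such filter but also never tests the record framing.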
Which functions to target?
Use Phase 1's attack surface analysis to identify candidates, then apply these filters:
- Prefer public API over internal functions: av_parse_time() over ff_parse_time(). Public APIs have stable signatures, are easier to link, and represent the real attack surface.
- Check what's already fuzzed transitively: If the project has a broad harness that exercises a parser pipeline, individual parsers in that pipeline may already get coverage. Focus on functions that are NOT reached by existing harnesses. Example: FFmpeg's subtitle decoders already exercise ff_htmlmarkup_to_ass() transitively, so a standalone harness adds less value.
- Skip main(): It's not useful to fuzz directly.
- A good target function: takes untrusted input, has non-trivial parsing logic, and can be called standalone without complex state setup.
Naming conventions by target type:
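- String/expression parsers: target_<name>_fuzzer.c (e.g., target_parse_time_fuzzer.c)
- Protocol parsers: fuzz_<protocol>.c (e.g., fuzz_tcp.c)
- Format parsers: fuzz_<format>.c (e.g., fuzz_pcap.c)
The build.sh loop shown earlier only matches fuzz_*.c, so widen its glob if you adopt the target_<name>_fuzzer.c pattern.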
When to skip memory sanitizer (MSAN)?
Skip MSAN when the project uses many system headers or third-party libraries that aren't MSAN-instrumented. ASAN + UBSAN cover most bugs. Add MSAN only if the project compiles cleanly with it.