name: pentest-r1-autonomous-penetration-testing
description: Two-stage reinforcement learning framework for training LLMs to perform autonomous penetration testing and CTF challenges
triggers:
  - train a penetration testing AI model
  - use pentest-r1 for autonomous security testing
  - setup reinforcement learning for pentesting
  - run pentest-r1 offline training
  - configure pentest-r1 online RL environment
  - train LLM on CTF challenges
  - build autonomous penetration testing agent
  - setup intercode-ctf docker environment

Pentest-R1 Autonomous Penetration Testing

Skill by ara.so — Security Skills collection

Pentest-R1 is a two-stage reinforcement learning framework that trains Large Language Models for autonomous penetration testing. It combines offline RL on expert walkthroughs with online RL in interactive CTF environments to develop robust attack reasoning capabilities.

Installation

Prerequisites

Python 3.11.11
Docker (for Stage 2 and reproducible environments)
NVIDIA Container Toolkit (for GPU support)
CUDA 12.4 runtime (optional, for GPU acceleration)
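
The prerequisites above can be checked before installing anything. The following is a minimal sketch, not part of the Pentest-R1 repository: the function name `check_prerequisites` is hypothetical, and it only tests whether the `docker` and `nvidia-smi` executables are on `PATH`, which is a rough proxy for a working Docker and GPU setup.

```python
import shutil
import sys


def check_prerequisites(required_python=(3, 11)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Compare only major.minor; the patch level (3.11.11) is not enforced here.
    if tuple(sys.version_info[:2]) != tuple(required_python):
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, required_python))
        problems.append(f"Python {wanted} expected, found {found}")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("docker") is None:
        problems.append("docker not found on PATH (needed for Stage 2)")
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found on PATH (GPU acceleration is optional)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

An empty result does not guarantee that the NVIDIA Container Toolkit is configured; it only shows that the basic tools are present.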

Basic Setup

git clone https://github.com/KHenryAegis/Pentest-R1.git
cd Pentest-R1
pip install -r requirements.txt

Docker Environment (Recommended for Reproducibility)

# Build the reproducible research environment
source setup-docker.sh

# Run container with optimized cache mounting
docker run --rm -it \
  --name pentest-r1 \
  -v "$(pwd)":/root/Pentest-R1 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -v ~/.cache/triton:/root/.cache/triton \
  -v ~/.cache/torch_extensions:/root/.cache/torch_extensions \
  -w /root/Pentest-R1 \
  --gpus all \
  --net=host \
  pentest-r1:ubuntu22.04

Core Training Pipeline

Stage 1: Offline Reinforcement Learning

Stage 1 trains the base LLM on a curated dataset of 500+ real-world expert penetration testing walkthroughs.

python grpo_stage1.py

Key configuration in grpo_stage1.py:

# Import unsloth before other ML libraries so its patches are applied first
from unsloth import FastLanguageModel
import torch

# Model configuration
model_name = "unsloth/Meta-Llama-3.1-8B-Instruct"
max_seq_length = 4096
dtype = None  # Auto-detect
load_in_4bit = True  # Use 4-bit quantization

# Load model with unsloth optimization
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

# PEFT configuration for efficient training
from peft import LoraConfig, get_peft_model

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, peft_config)

Stage 2: Online Reinforcement Learning

Stage 2 fine-tunes the model in interactive CTF environments with real-time feedback.

Setup CTF environment:

cd train_ctf_env
docker build -t intercode-ctf .
cd ..

Run Stage 2 training:

python grpo_multi_turn_stage2.py

Key Components

Data Loading and Preprocessing

from datasets import load_dataset

# Load expert walkthrough dataset
dataset = load_dataset("json", data_files="path/to/expert_walkthroughs.jsonl")

# Example dataset format
# {
#   "challenge": "SQL Injection in login form",
#   "steps": [
#     {"action": "reconnaissance", "command": "sqlmap -u http://target/login", "reasoning": "..."},
#     {"action": "exploit", "command": "sqlmap --dump", "reasoning": "..."}
#   ],
#   "flag": "CTF{...}"
# }

def preprocess_function(examples):
    """Format data for training"""
    prompts = []
    responses = []
    
    for challenge, steps in zip(examples["challenge"], examples["steps"]):
        prompt = f"Challenge: {challenge}\nWhat are the steps to solve this?"
        response = "\n".join([
            f"Step {i+1}: {step['reasoning']}\nCommand: {step['command']}"
            for i, step in enumerate(steps)
        ])
        prompts.append(prompt)
        responses.append(response)
    
    return {"prompt": prompts, "response": responses}

# This step only builds prompt/response text columns; it does not tokenize anything
formatted_dataset = dataset.map(preprocess_function, batched=True)
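
Malformed records are easier to catch before they reach `dataset.map`. Below is a minimal validation sketch for the example record format shown above. The helper name `validate_walkthrough` and the assumption that every step must carry `action`, `command`, and `reasoning` are illustrative, not taken from the Pentest-R1 code.

```python
REQUIRED_STEP_KEYS = {"action", "command", "reasoning"}


def validate_walkthrough(record):
    """Return a list of problems found in one walkthrough record (empty if valid)."""
    problems = []
    for key in ("challenge", "steps", "flag"):
        if key not in record:
            problems.append(f"missing top-level key: {key}")
    steps = record.get("steps", [])
    if not isinstance(steps, list) or not steps:
        problems.append("steps must be a non-empty list")
        return problems
    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            problems.append(f"step {i + 1} is not an object")
            continue
        missing = REQUIRED_STEP_KEYS - set(step)
        if missing:
            problems.append(f"step {i + 1} missing keys: {sorted(missing)}")
    return problems
```

Running this over each line of the JSONL file and skipping records with a non-empty problem list keeps `preprocess_function` from failing on a missing `reasoning` or `command` key.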

Reward Model Configuration

class PentestRewardModel:
    """Reward model for evaluating penetration testing actions"""
    
    def __init__(self):
        self.success_reward = 1.0
        self.partial_reward = 0.5
        self.failure_penalty = -0.1
    
    def calculate_reward(self, action, environment_feedback):
        """Calculate reward based on action outcome"""
        if "flag" in environment_feedback.lower():
            return self.success_reward
        elif "error" in environment_feedback.lower():
            return self.failure_penalty
        elif "progress" in environment_feedback.lower():
            return self.partial_reward
        return 0.0

reward_model = PentestRewardModel()
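
One caveat with the substring check above: any output containing the word "flag" earns the success reward, including a message such as "error: no flag found". The sketch below shows one possible stricter variant that matches a flag pattern instead. The `CTF{...}` format is an assumption based on the dataset example in this document, and `pattern_based_reward` is a hypothetical name, not the reward function used in the paper.

```python
import re

# Assumed flag format, based on the "flag": "CTF{...}" field in the dataset example.
FLAG_PATTERN = re.compile(r"CTF\{[^}]+\}")


def pattern_based_reward(feedback, success=1.0, partial=0.5, failure=-0.1):
    """Reward a turn based on environment feedback, matching an actual flag pattern."""
    if FLAG_PATTERN.search(feedback):
        return success
    lowered = feedback.lower()
    if "error" in lowered:
        return failure
    if "progress" in lowered:
        return partial
    return 0.0
```

With this variant, `pattern_based_reward("error: no flag found")` returns the failure penalty rather than the success reward.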

Interacting with CTF Environment

import docker

class CTFEnvironment:
    """Wrapper for InterCode-CTF Docker environment"""
    
    def __init__(self, image_name="intercode-ctf"):
        self.client = docker.from_env()
        self.image_name = image_name
        self.container = None
    
    def start(self, challenge_id):
        """Start a CTF challenge container"""
        self.container = self.client.containers.run(
            self.image_name,
            detach=True,
            environment={"CHALLENGE_ID": challenge_id},
            network_mode="host",
            remove=True
        )
        return self.container
    
    def execute_command(self, command):
        """Execute a command in the container"""
        if not self.container:
            raise RuntimeError("Container not started")
        
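        # Run through a shell so pipes and redirects in the command work;
        # this assumes the image provides /bin/sh
        exec_result = self.container.exec_run(["/bin/sh", "-c", command])
        return {
            # errors="replace" avoids a crash on non-UTF-8 (binary) output
            "stdout": exec_result.output.decode("utf-8", errors="replace"),
            "exit_code": exec_result.exit_code
        }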
    
    def cleanup(self):
        """Stop and remove container"""
        if self.container:
            self.container.stop()
            self.container = None

# Usage example
env = CTFEnvironment()
env.start(challenge_id="sql_injection_001")
result = env.execute_command("sqlmap -u http://localhost/login --batch")
print(result["stdout"])
env.cleanup()

Training Configuration

GRPO (Group Relative Policy Optimization) Settings

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./pentest-r1-checkpoints",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    warmup_steps=100,
    logging_steps=10,
    save_steps=500,
    save_total_limit=3,
    fp16=True,  # Mixed precision training
    report_to="wandb",  # Optional: integration with Weights & Biases
    remove_unused_columns=False,
)
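
For context on what "group relative" means: GRPO samples several completions for the same prompt, scores each one, and normalizes every reward against the mean and standard deviation of its own group, so no separate value network is needed. The following is a minimal sketch of that normalization step in plain Python. The function name and the `eps` value are illustrative, and it uses the population standard deviation, which may differ from what the training scripts do.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards for one group of completions sampled from the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps keeps the division finite when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]
```

A group where one of four completions finds the flag, `[1.0, 0.0, 0.0, 0.0]`, yields a positive advantage for the successful completion and equal negative advantages for the rest. A group with identical rewards yields all zeros, so it contributes no learning signal.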

Environment Variables

import os

# Model and training configuration
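# Copy the token only if it is set; assigning None to os.environ raises TypeError
hf_token = os.getenv("HUGGINGFACE_TOKEN")
if hf_token:
    os.environ["HF_TOKEN"] = hf_token  # For model downloads
# WANDB_API_KEY (for logging) is read directly from the environment by wandb;
# export it in the shell before launching training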
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # GPU selection

# Cache directories (mounted in Docker)
os.environ["HF_HOME"] = "/root/.cache/huggingface"
os.environ["TRITON_CACHE_DIR"] = "/root/.cache/triton"

Common Patterns

Multi-Turn Reasoning

class MultiTurnAgent:
    """Agent for multi-turn penetration testing reasoning"""
    
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        self.conversation_history = []
    
    def generate_action(self, observation):
        """Generate next action based on current observation"""
        # Build prompt with conversation history
        prompt = self._build_prompt(observation)
        
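        # Move inputs to the model's device before generating
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        outputs = self.model.generate(
            **inputs,
            max_new_tokens=512,
            temperature=0.7,
            do_sample=True,
            top_p=0.95
        )
        
        # Decode only the newly generated tokens; outputs[0] also contains the prompt
        prompt_length = inputs["input_ids"].shape[1]
        action = self.tokenizer.decode(
            outputs[0][prompt_length:], skip_special_tokens=True
        ).strip()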
        self.conversation_history.append({
            "observation": observation,
            "action": action
        })
        
        return action
    
    def _build_prompt(self, observation):
        """Build prompt with conversation history"""
        prompt = "You are a penetration testing expert. Analyze and exploit:\n\n"
        
        for turn in self.conversation_history[-3:]:  # Last 3 turns
            prompt += f"Observation: {turn['observation']}\n"
            prompt += f"Action: {turn['action']}\n\n"
        
        prompt += f"Current Observation: {observation}\n"
        prompt += "Next Action:"
        
        return prompt
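
Tool output such as a full scan report can be far longer than the model's context window, and the prompt above embeds up to three past observations plus the current one. One possible mitigation, sketched below, is to shorten each observation before it enters the prompt by keeping its beginning and end. The helper name, the 2000-character default, and the marker text are all assumptions, and the limit counts characters rather than tokens.

```python
def truncate_observation(text, max_chars=2000, marker="\n...[truncated]...\n"):
    """Shorten long tool output by keeping its head and tail around a marker."""
    if len(text) <= max_chars:
        return text
    keep = max_chars - len(marker)
    if keep <= 0:
        # Limit is too small to fit the marker; fall back to a plain cut
        return text[:max_chars]
    head = keep // 2
    tail = keep - head  # always at least 1 when keep >= 1
    return text[:head] + marker + text[-tail:]
```

Calling `truncate_observation(observation)` at the start of `generate_action` would bound the size of every stored turn. Keeping the tail matters because command output often ends with the error or result line.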

Evaluation Loop

def evaluate_on_ctf_challenges(model, tokenizer, challenge_set, max_turns=20):
    """Evaluate model on a set of CTF challenges"""
    results = []
    
    for challenge in challenge_set:
        # Fresh agent per challenge so history does not leak between challenges
        agent = MultiTurnAgent(model, tokenizer)
        env = CTFEnvironment()
        env.start(challenge["id"])
        
        solved = False
        turns_used = 0
        
        try:
            # Initial observation: the challenge description
            observation = env.execute_command("cat /challenge/description.txt")["stdout"]
            
            for turn in range(max_turns):
                turns_used = turn + 1
                
                # Generate action from the latest observation
                action = agent.generate_action(observation)
                
                # Execute action; its output becomes the next observation
                result = env.execute_command(action)
                observation = result["stdout"]
                
                # Check for success
                if "CTF{" in observation:
                    solved = True
                    break
        finally:
            # Always stop the container, even if a step raises
            env.cleanup()
        
        results.append({
            "challenge_id": challenge["id"],
            "solved": solved,
            "turns": turns_used
        })
    
    return results
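
The list returned by `evaluate_on_ctf_challenges` can be reduced to a few headline numbers. The sketch below assumes only the result fields shown above (`challenge_id`, `solved`, `turns`); the function name `summarize_results` is hypothetical.

```python
def summarize_results(results):
    """Aggregate per-challenge results into a solve rate and average turn count."""
    total = len(results)
    solved = [r for r in results if r["solved"]]
    return {
        "total": total,
        "solved": len(solved),
        "solve_rate": len(solved) / total if total else 0.0,
        # Averaged over solved challenges only; unsolved ones always hit the turn limit
        "avg_turns_when_solved": (
            sum(r["turns"] for r in solved) / len(solved) if solved else None
        ),
    }
```

Averaging turns only over solved challenges avoids the turn limit dominating the figure when many challenges go unsolved.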

Troubleshooting

CUDA Out of Memory

# Use gradient checkpointing
model.gradient_checkpointing_enable()

# Reduce batch size
training_args.per_device_train_batch_size = 1
training_args.gradient_accumulation_steps = 8

# Use 8-bit or 4-bit quantization
load_in_4bit = True
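
Note that the batch-size change above also changes the effective batch size. The default configuration in this document uses 4 samples per device with 4 accumulation steps (16 samples per optimizer step), while 1 sample with 8 accumulation steps gives 8. The arithmetic, as a small sketch with a hypothetical helper name:

```python
def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Number of samples contributing to each optimizer step."""
    return per_device * accumulation_steps * num_devices


original = effective_batch_size(per_device=4, accumulation_steps=4)   # 16
reduced = effective_batch_size(per_device=1, accumulation_steps=8)    # 8
matched = effective_batch_size(per_device=1, accumulation_steps=16)   # 16
```

Setting `gradient_accumulation_steps` to 16 instead of 8 keeps the effective batch size at its original value while still using the smallest per-device batch, at the cost of more forward passes per update.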

Docker Container Issues

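# Check container logs (docker logs takes a container name or ID, not an image name)
docker ps --filter ancestor=intercode-ctf
docker logs <container-id>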

# Verify network connectivity
docker run --rm --net=host intercode-ctf ping -c 4 localhost

# Rebuild with no cache if issues persist
docker build --no-cache -t intercode-ctf train_ctf_env/

Unsloth Version Compatibility

The research used unsloth==2025.5.10. If unavailable, the closest match is specified in requirements.txt:

unsloth_zoo==2025.5.11
unsloth @ git+https://github.com/unslothai/unsloth.git@45f26cda996ec0b9a2e28cb18a03251095aa29e8

Model Loading Errors

# If model fails to load, try without quantization first
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=torch.float16,
    load_in_4bit=False,
)

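# Clear cache if needed. Warning: this deletes ALL cached Hugging Face models
# and datasets, which will be re-downloaded on next use
import os
import shutil
shutil.rmtree(os.path.expanduser("~/.cache/huggingface"), ignore_errors=True)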

Performance Optimization

Cache Mounting for Faster Training

Always mount HuggingFace, Triton, and PyTorch caches when using Docker:

-v ~/.cache/huggingface:/root/.cache/huggingface \
-v ~/.cache/triton:/root/.cache/triton \
-v ~/.cache/torch_extensions:/root/.cache/torch_extensions

Distributed Training

# For multi-GPU training
training_args.ddp_find_unused_parameters = False
training_args.local_rank = int(os.environ.get("LOCAL_RANK", -1))

# Launch with torchrun
# torchrun --nproc_per_node=4 grpo_stage1.py

References

Paper: arXiv:2508.07382
Base model: Meta-Llama-3.1-8B-Instruct
Framework: Unsloth for optimized fine-tuning
Environment: InterCode-CTF for interactive training