autoresearch-pipeline-for-ai-safety-research - SKILL.md Agent Skill

name: autoresearch-pipeline-for-ai-safety-research
description: Skill for AI agent capabilities

Autoresearch Pipeline for AI Safety Research

Overview

Source: arXiv:2603.24511v1 (Claudini)
Utility: 0.95
Topic: LLM agent autonomous research for discovering new algorithms
Key Contribution: Autoresearch pipeline achieves SOTA results in adversarial attack discovery

Activation Keywords

autoresearch pipeline
LLM agent autonomous research
automated AI safety research
iterative algorithm discovery
Claude Code research automation

Core Innovation

Problem

AI safety research is often manual and slow
Existing methods provide good starting points but still need optimization
Dense quantitative feedback is available but not leveraged

Solution

Autoresearch Pipeline:

Start from existing implementations - Strong baseline (e.g., GCG)
LLM agent iteration - Claude Code explores modifications
Quantitative evaluation - Attack success rate (ASR) feedback
Discover new algorithms - SOTA results achieved

Key Results

Target Model	New Algorithm	Best Baseline	Improvement (percentage points)
GPT-OSS-Safeguard-20B	40% ASR	≤10% ASR	≥ +30 pp
Meta-SecAlign-70B	100% ASR	56% ASR	+44 pp

Pipeline Architecture

Existing Methods → LLM Agent Exploration → Iterative Refinement → Evaluation → New Discovery
      ↓                    ↓                      ↓                ↓           ↓
   Baseline          Code Modification      Algorithm Changes    ASR Test    SOTA Results

Implementation Framework

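import logging

logger = logging.getLogger(__name__)


class AutoresearchPipeline:
    def __init__(self, baseline_method, evaluation_fn, agent, apply_modification):
        self.baseline = baseline_method
        self.evaluate = evaluation_fn
        self.agent = agent  # Claude Code-like agent
        # Callable (algorithm, modification) -> new algorithm, supplied by the caller
        self.apply_modification = apply_modification

    def run(self, n_iterations=100):
        current_algorithm = self.baseline
        # Score the baseline first so candidates have a reference to beat
        best_score = self.evaluate(current_algorithm)

        for i in range(n_iterations):
            # Agent explores modifications
            modifications = self.agent.suggest_modifications(current_algorithm)

            # Try each modification
            for mod in modifications:
                new_algorithm = self.apply_modification(current_algorithm, mod)
                score = self.evaluate(new_algorithm)

                # Keep a candidate only if it strictly improves the best score
                if score > best_score:
                    current_algorithm = new_algorithm
                    best_score = score
                    logger.info("iteration %d: %r improved score to %s", i, mod, score)

        return current_algorithm, best_score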
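A toy run can make the accept-if-better loop concrete. The sketch below is not the paper's code: `ToyAgent`, `hill_climb`, and the quadratic `score` function are hypothetical stand-ins. A dict of numeric parameters plays the role of the algorithm, and a seeded random perturbation plays the role of the agent's suggestions.

```python
import random


class ToyAgent:
    """Hypothetical stand-in for an LLM agent: proposes random parameter tweaks."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def suggest_modifications(self, algorithm, k=3):
        # Each modification is a (parameter name, delta) pair.
        names = sorted(algorithm)
        return [(self.rng.choice(names), self.rng.uniform(-0.5, 0.5)) for _ in range(k)]


def apply_modification(algorithm, mod):
    """Return a modified copy; the original dict is left untouched."""
    name, delta = mod
    updated = dict(algorithm)
    updated[name] += delta
    return updated


def score(algorithm):
    """Toy objective (higher is better), maximized at x=1, y=-2."""
    return -((algorithm["x"] - 1.0) ** 2 + (algorithm["y"] + 2.0) ** 2)


def hill_climb(baseline, evaluate, agent, n_iterations=200):
    current, best = baseline, evaluate(baseline)
    history = [best]  # best score after each accepted modification
    for _ in range(n_iterations):
        for mod in agent.suggest_modifications(current):
            candidate = apply_modification(current, mod)
            candidate_score = evaluate(candidate)
            if candidate_score > best:  # keep strict improvements only
                current, best = candidate, candidate_score
                history.append(best)
    return current, best, history


baseline = {"x": 0.0, "y": 0.0}
final, final_score, history = hill_climb(baseline, score, ToyAgent(seed=0))
```

Because only strict improvements are kept, `history` is monotonically increasing, which makes it easy to plot progress or to spot when the search has stalled.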

Key Principles

1. Strong Starting Points

Existing methods provide foundation
Don't start from scratch
Leverage prior research

2. Dense Quantitative Feedback

Clear optimization objective
Measurable outcomes (ASR, accuracy, etc.)
Direct feedback drives improvement
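As a rough illustration of what a measurable outcome looks like, the sketch below computes a success rate from per-trial outcomes and the absolute gain over a baseline. It assumes the usual definition of ASR as the fraction of trials judged successful; the source does not spell out how its ASR is computed, and the function names and `trials` list are hypothetical.

```python
def attack_success_rate(outcomes):
    """Fraction of trials judged successful, as a value in [0, 1]."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("need at least one trial to compute a rate")
    return sum(1 for outcome in outcomes if outcome) / len(outcomes)


def improvement_pp(new_rate, baseline_rate):
    """Absolute improvement in percentage points."""
    return (new_rate - baseline_rate) * 100.0


trials = [True, False, True, True, False]  # hypothetical per-trial judge outcomes
asr = attack_success_rate(trials)          # 3 successes out of 5 trials
gain = improvement_pp(1.00, 0.56)          # the 100% vs 56% row of Key Results
```

Reporting gains in percentage points rather than as a relative percentage avoids ambiguity when the baseline rate is small.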

3. Agent Capabilities

Code generation/modification
Literature understanding
Creative exploration

4. Iterative Refinement

Many small modifications
Gradual improvement accumulation
Exploration vs exploitation balance
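One common way to trade exploration against exploitation is a Metropolis-style acceptance rule, as used in simulated annealing. This is a swapped-in technique for illustration, not something the source attributes to the paper; the `accept` function and its `temperature` parameter are assumptions.

```python
import math
import random


def accept(candidate_score, current_score, temperature, rng):
    """Always take improvements; take regressions with a temperature-dependent probability."""
    if candidate_score >= current_score:
        return True
    if temperature <= 0:
        return False  # zero temperature degenerates to pure exploitation
    return rng.random() < math.exp((candidate_score - current_score) / temperature)


rng = random.Random(0)
always = accept(0.8, 0.5, temperature=0.1, rng=rng)  # improvement: accepted
greedy = accept(0.4, 0.5, temperature=0.0, rng=rng)  # regression at T=0: rejected
# At a high temperature, small regressions are accepted most of the time.
explore_rate = sum(accept(0.49, 0.5, 1.0, rng) for _ in range(1000)) / 1000
```

When regressions can be accepted, the best candidate seen so far should be tracked separately from the current one, so that a run never returns something worse than what it already found.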

Application Domains

Domain	Starting Point	Objective	Suitability
Adversarial Attacks	GCG, AutoPrompt	ASR maximization	✅ Excellent
Prompt Optimization	Base prompts	Task performance	✅ Good
Architecture Search	Known architectures	Accuracy	✅ Good
Hyperparameter Tuning	Default configs	Validation score	✅ Good
Algorithm Discovery	Existing algorithms	Benchmark scores	✅ Excellent

Safety Considerations

⚠️ Important: This pipeline can be used for both defensive and offensive research.

Defensive Applications

Discover robust defense mechanisms
Identify vulnerabilities before attackers
Stress-test safety systems

Offensive Applications

Create new attack algorithms
Jailbreak safety measures
Prompt injection optimization

Recommended Use

Prioritize defensive research
Use for authorized security testing only
Follow ethical guidelines
Report findings responsibly

Relation to Self-Evolution

Self-Evolution Concept	Autoresearch Pipeline
Learn → Apply → Reflect → Improve	Baseline → Modify → Evaluate → Discover
Delegation to Specialists	Agent handles code exploration
Dense Feedback	Quantitative ASR metrics
Ship or It Doesn't Count	Published SOTA algorithms

Implementation for OpenClaw

Potential Applications

Skill Optimization
- Start from existing skills
- Agent modifies instructions
- Evaluate on task performance
Agent Improvement
- Optimize agent behaviors
- Discover new workflows
- Quantitative success metrics
Workflow Discovery
- Find better processes
- Optimize existing workflows
- Task completion metrics

Example: Skill Autoresearch

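class SkillAutoresearch:
    def __init__(self, agent, apply_suggestion, evaluate_skill):
        self.agent = agent
        # Callables supplied by the caller:
        #   apply_suggestion(skill, suggestion) -> modified skill
        #   evaluate_skill(skill, tasks) -> numeric performance (higher is better)
        self.apply_suggestion = apply_suggestion
        self.evaluate_skill = evaluate_skill

    def optimize_skill(self, base_skill, evaluation_tasks, n_iterations=100):
        current_skill = base_skill
        # Measure the base skill first so modifications have a reference to beat
        best_performance = self.evaluate_skill(current_skill, evaluation_tasks)

        for iteration in range(n_iterations):
            # Agent suggests skill modifications
            suggestions = self.agent.analyze_skill(current_skill)

            for suggestion in suggestions:
                modified_skill = self.apply_suggestion(current_skill, suggestion)

                # Evaluate on tasks
                performance = self.evaluate_skill(modified_skill, evaluation_tasks)

                if performance > best_performance:
                    current_skill = modified_skill
                    best_performance = performance

        return current_skill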
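The example leaves `evaluate_skill` unspecified. One possible shape, sketched here under assumptions, is a mean pass rate over tasks, where each task is a callable that takes a skill and returns True on success. The dict-based skill representation and the two toy tasks are hypothetical and only serve to show the scoring.

```python
from statistics import mean


def evaluate_skill(skill, evaluation_tasks):
    """Mean pass rate of `skill`; each task is a callable returning True on success."""
    if not evaluation_tasks:
        raise ValueError("evaluation_tasks must not be empty")
    return mean(1.0 if task(skill) else 0.0 for task in evaluation_tasks)


# Hypothetical skills: a skill is modeled as a dict holding a list of steps.
base_skill = {"steps": ["read docs"]}
modified_skill = {"steps": ["read docs", "run tests"]}

# Hypothetical tasks: each one checks that the skill contains a required step.
tasks = [
    lambda skill: "read docs" in skill["steps"],
    lambda skill: "run tests" in skill["steps"],
]

base_performance = evaluate_skill(base_skill, tasks)          # passes 1 of 2 tasks
modified_performance = evaluate_skill(modified_skill, tasks)  # passes 2 of 2 tasks
```

A pass rate over a fixed task set gives the same kind of dense, comparable signal that ASR gives in the adversarial setting, as long as the task set stays fixed across iterations.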

Best Practices

Define Clear Objectives - Measurable success metrics
Set Constraints - Safety boundaries, computational limits
Document Discoveries - Track all improvements
Validate Transfers - Test generalization to other contexts
Report Responsibly - Ethical disclosure for security findings
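Two of these practices, computational limits and documenting discoveries, lend themselves to a small bookkeeping helper. The sketch below is one possible layout, not part of the source: `RunLog`, `Discovery`, the evaluation budget, and the JSON Lines export are all hypothetical choices.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Discovery:
    iteration: int
    description: str
    score: float


@dataclass
class RunLog:
    max_evaluations: int  # hard cap on how many candidates may be scored
    evaluations: int = 0
    discoveries: list = field(default_factory=list)

    def can_evaluate(self):
        return self.evaluations < self.max_evaluations

    def record_evaluation(self):
        self.evaluations += 1

    def record_discovery(self, iteration, description, score):
        self.discoveries.append(Discovery(iteration, description, score))

    def to_jsonl(self):
        # One JSON object per line, convenient for appending to a file.
        return "\n".join(json.dumps(asdict(d)) for d in self.discoveries)


log = RunLog(max_evaluations=3)
best = 0.0
candidate_scores = [0.1, 0.3, 0.2, 0.9]  # hypothetical scores; the last exceeds the budget
for i, candidate_score in enumerate(candidate_scores):
    if not log.can_evaluate():
        break  # budget exhausted: stop before scoring more candidates
    log.record_evaluation()
    if candidate_score > best:
        best = candidate_score
        log.record_discovery(i, f"candidate {i}", candidate_score)
```

Checking the budget before each evaluation, rather than after, guarantees the cap is never exceeded even when a single iteration proposes many candidates.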

Tools Used

read - Read documentation and references
web_search - Search for related information
web_fetch - Fetch paper or documentation

Instructions for Agents

Follow these steps when applying this skill:

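Step 1: Start from existing implementations

Pick a strong baseline (e.g., GCG) rather than starting from scratch.

Step 2: LLM agent iteration

Have the agent (e.g., Claude Code) explore modifications to the baseline.

Step 3: Quantitative evaluation

Score each modification with a measurable metric such as attack success rate (ASR).

Step 4: Discover new algorithms

Keep the modifications that improve the score and continue iterating.

Step 5: Skill Optimization

Apply the same loop to existing skills: the agent modifies instructions, and each change is evaluated on task performance.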

Examples

Example 1: Basic Application

User: I need to apply Autoresearch Pipeline for AI Safety Research to my analysis.

Agent: I'll help you apply autoresearch-pipeline. First, let me understand your specific use case...

Context: Apply the methodology

Example 2: Advanced Scenario

User: Complex analysis scenario

Agent: Based on the methodology, I'll guide you through the advanced application...

Example 3: Advanced Application

User: What are the key considerations for autoresearch-pipeline?

Agent: Let me search for the latest research and best practices...

References

Paper: https://arxiv.org/abs/2603.24511
GitHub: https://github.com/romovpa/claudini
Related: self-evolving-agents-survey

Created: 2026-03-28
Source: arXiv:2603.24511v1 - "Claudini: Autoresearch Discovers SOTA Adversarial Attack Algorithms"

⚠️ Note: Focus on research methodology, not attack details. Use for defensive research only.