rrwrite-analyze-repository - SKILL.md Agent Skill

name: rrwrite-analyze-repository
description: Analyzes a GitHub repository or local directory to extract structure, files, and research context
arguments:
  - name: repo_path
    description: GitHub URL or local repository path to analyze
    required: true
  - name: target_dir
    description: Output directory for analysis results
    default: manuscript
allowed-tools:
context: fork

Repository Analysis Skill

This skill analyzes a code repository (GitHub URL or local path) and generates a structured analysis document containing:

- Repository structure and organization
- Key files identified (data, scripts, figures)
- Inferred research context and topics
- File metadata for downstream manuscript generation

Phase 0: Input Validation

Check required arguments:

if [ -z "{repo_path}" ]; then
    echo "Error: --repo-path is required"
    echo ""
    echo "Usage: /rrwrite-analyze-repository --repo-path <path-or-url> [--target-dir <dir>]"
    echo ""
    echo "Examples:"
    echo "  /rrwrite-analyze-repository --repo-path https://github.com/user/project"
    echo "  /rrwrite-analyze-repository --repo-path /path/to/local/repo --target-dir manuscript"
    exit 1
fi

# Create target directory
mkdir -p "{target_dir}"

# Set output file path
OUTPUT_FILE="{target_dir}/repository_analysis.md"

echo ""
echo "============================================================"
echo "REPOSITORY ANALYSIS"
echo "============================================================"
echo ""
echo "Repository: {repo_path}"
echo "Output directory: {target_dir}"
echo "Output file: $OUTPUT_FILE"
echo ""
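
The validation above can also be expressed in Python, which is the language the later phases use. This is a minimal sketch, not part of the skill's scripts: the helper name `resolve_output_file` and its signature are assumptions made for illustration. It mirrors the shell logic by rejecting an empty `repo_path`, creating the target directory, and returning the output file path.

```python
from pathlib import Path


def resolve_output_file(repo_path: str, target_dir: str = "manuscript") -> Path:
    """Validate arguments and return the analysis output path.

    Hypothetical helper: name and signature are assumptions, not the skill's API.
    """
    # Mirrors the `[ -z "{repo_path}" ]` check in the shell block
    if not repo_path or not repo_path.strip():
        raise ValueError("--repo-path is required")

    out_dir = Path(target_dir)
    # Same effect as `mkdir -p`: create parents, ignore an existing directory
    out_dir.mkdir(parents=True, exist_ok=True)

    return out_dir / "repository_analysis.md"
```

Raising before the directory is created keeps a failed call free of side effects, which matches the order of the shell version.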

Phase 1: Execute Repository Analysis

Check for existing analysis and run script:

# Check for existing analysis
SKIP_ANALYSIS=false
if [ -f "$OUTPUT_FILE" ]; then
    echo "⚠ Warning: Analysis file already exists: $OUTPUT_FILE"
    echo ""
    # -r keeps backslashes in the reply literal
    read -r -p "Overwrite existing analysis? [y/N]: " response
    if [[ ! "$response" =~ ^[Yy] ]]; then
        echo "Using existing analysis file."
        SKIP_ANALYSIS=true
    fi
fi

# Run analysis script (unless skipped)
if [ "$SKIP_ANALYSIS" != "true" ]; then
    echo "Analyzing repository structure..."
    echo ""

    # Run the script and branch on its exit status directly
    if ! python scripts/rrwrite-analyze-repo.py "{repo_path}" --output "$OUTPUT_FILE"; then
        echo ""
        echo "============================================================"
        echo "ANALYSIS FAILED"
        echo "============================================================"
        echo ""
        echo "Troubleshooting:"
        echo "  • For GitHub URLs: Ensure git is installed and URL is correct"
        echo "    - Test clone: git clone {repo_path} /tmp/test_clone"
        echo "  • For local paths: Verify the path exists and is accessible"
        echo "    - Test access: ls -la {repo_path}"
        echo "  • Check you have write permissions to {target_dir}"
        echo "    - Test write: touch {target_dir}/.test && rm {target_dir}/.test"
        echo ""
        echo "Common issues:"
        echo "  • Private repositories require authentication"
        echo "  • Network connectivity problems with GitHub"
        echo "  • Git not installed (macOS: brew install git; Debian/Ubuntu: sudo apt install git)"
        echo ""
        exit 1
    fi

    echo ""
    echo "✓ Analysis complete: $OUTPUT_FILE"
fi
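
The overwrite prompt defaults to "No": the analysis is re-run only when the reply starts with `y` or `Y`, and an empty reply keeps the existing file. The decision can be sketched as a small pure function. The name `should_run_analysis` is hypothetical and exists only to make the rule easy to test.

```python
import re


def should_run_analysis(output_exists: bool, response: str = "") -> bool:
    """Decide whether the analysis script should run.

    Hypothetical helper mirroring the shell logic:
    - no existing file: always run
    - existing file: run only if the reply starts with y/Y
    """
    if not output_exists:
        return True
    # Same test as `[[ "$response" =~ ^[Yy] ]]`
    return re.match(r"[Yy]", response) is not None


if __name__ == "__main__":
    for reply in ("", "n", "y", "Yes"):
        print(repr(reply), should_run_analysis(True, reply))
```

Because only the first character is checked, replies such as `yes` or `Yup` also count as confirmation.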

Phase 2: Extract Metadata

Parse analysis file to extract file counts and research topics:

import sys
import re
from pathlib import Path

# Add scripts directory to path
sys.path.insert(0, str(Path('scripts').resolve()))

# Read analysis file
analysis_file = Path("{target_dir}/repository_analysis.md")

if not analysis_file.exists():
    print(f"Error: Analysis file not found: {analysis_file}")
    sys.exit(1)

content = analysis_file.read_text(encoding='utf-8')

# Extract file counts by counting file references in each section.
# A section ends at the next level-2 or level-3 heading (or end of file),
# so bullets under a following "## ..." heading are not counted by mistake.
SECTION_END = r'(?=^#{2,3} |\Z)'
SECTION_FLAGS = re.DOTALL | re.MULTILINE

def count_files(heading_pattern):
    """Count backticked bullet entries under the first matching '###' heading."""
    section = re.search(rf'^### {heading_pattern}.*?{SECTION_END}', content, SECTION_FLAGS)
    return len(re.findall(r'- `[^`]+`', section.group())) if section else 0

data_files = count_files('Data Files')
script_files = count_files('(?:Analysis Scripts|Scripts)')
figure_files = count_files('Figures')

file_counts = {
    "data": data_files,
    "scripts": script_files,
    "figures": figure_files
}

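# Extract topics from the "Inferred Research Context" section.
# The section runs to the next level-2 heading, so "###" subsections stay inside it.
topics_section = re.search(
    r'^## Inferred Research Context.*?(?=^## |\Z)', content, re.DOTALL | re.MULTILINE
)
topics = []
if topics_section:
    # Match only lines that start with a bullet ("-", "*") or a number ("1."),
    # so hyphens and full stops in the middle of a sentence are ignored
    topic_matches = re.findall(
        r'^[ \t]*(?:[-*]|\d+\.)[ \t]+(.+?)[ \t]*$', topics_section.group(), re.MULTILINE
    )
    topics = [t.strip() for t in topic_matches if t.strip()]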

# Check for empty repository
if sum(file_counts.values()) == 0:
    print("\n⚠ Warning: Repository appears to be empty")
    print("  No data, scripts, or figures detected")
    print("\nAnalysis will continue with minimal information.")

print("\nExtracted metadata:")
print(f"  Data files: {file_counts['data']}")
print(f"  Scripts: {file_counts['scripts']}")
print(f"  Figures: {file_counts['figures']}")
print(f"  Topics detected: {len(topics)}")

# Store for next phase
globals()['file_counts'] = file_counts
globals()['topics'] = topics
globals()['analysis_file_path'] = str(analysis_file)
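
To make the parsing step concrete, here is a self-contained sketch that runs the same kind of extraction on a small, made-up analysis document. The sample text is invented for illustration and is not real output of `rrwrite-analyze-repo.py`; its layout is an assumption inferred from the headings the patterns look for. The function name `extract_metadata` is also hypothetical.

```python
import re

# A section ends at the next level-2 or level-3 heading, or at end of text
SECTION_END = r"(?=^#{2,3} |\Z)"
FLAGS = re.DOTALL | re.MULTILINE


def count_entries(content: str, heading: str) -> int:
    """Count '- `path`' bullets under the first '### <heading>' section."""
    section = re.search(rf"^### {heading}.*?{SECTION_END}", content, FLAGS)
    return len(re.findall(r"- `[^`]+`", section.group())) if section else 0


def extract_metadata(content: str) -> dict:
    """Return file counts and topics parsed from an analysis document."""
    file_counts = {
        "data": count_entries(content, "Data Files"),
        "scripts": count_entries(content, "(?:Analysis Scripts|Scripts)"),
        "figures": count_entries(content, "Figures"),
    }
    topics = []
    section = re.search(r"^## Inferred Research Context.*?(?=^## |\Z)", content, FLAGS)
    if section:
        # Only lines that begin with a bullet or a number count as topics
        topics = re.findall(
            r"^[ \t]*(?:[-*]|\d+\.)[ \t]+(.+?)[ \t]*$", section.group(), re.MULTILINE
        )
    return {"file_counts": file_counts, "topics": topics}


# Made-up sample document (assumed layout, for illustration only)
SAMPLE = """\
# Repository Analysis

## Key Files

### Data Files
- `data/samples.csv`
- `data/metadata.tsv`

### Analysis Scripts
- `scripts/fit_model.py`

### Figures

## Inferred Research Context
- Statistical modeling
- Data visualization
"""

if __name__ == "__main__":
    print(extract_metadata(SAMPLE))
```

The empty `### Figures` section yields a count of zero rather than picking up the bullets of the following section, because the section boundary also stops at level-2 headings.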

Phase 3: Update Workflow State

Update state manager with analysis results:

from rrwrite_state_manager import StateManager

# Initialize state manager (disable git for this operation)
manager = StateManager(output_dir="{target_dir}", enable_git=False)

# Update repository analysis stage
manager.update_repository_analysis(
    analysis_file=analysis_file_path,
    repo_path="{repo_path}",
    file_counts=file_counts,
    topics_detected=topics
)

print("✓ Workflow state updated")
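
When the real `rrwrite_state_manager` module is not available, for example while testing this phase in isolation, a stand-in with the same call shape can help. The class below is purely hypothetical: it copies only the constructor arguments and the `update_repository_analysis` keyword arguments used above, and it assumes, without confirmation from the source, that state can be kept as a JSON file. The real `StateManager` may store and validate state quite differently.

```python
import json
from pathlib import Path


class StubStateManager:
    """Hypothetical stand-in mirroring only the calls made in this phase."""

    def __init__(self, output_dir: str, enable_git: bool = False):
        # File name is an assumption made for this sketch
        self.state_path = Path(output_dir) / "stub_state.json"
        self.enable_git = enable_git

    def update_repository_analysis(self, analysis_file, repo_path, file_counts, topics_detected):
        """Record the analysis results and write them to the stub state file."""
        state = {
            "repository_analysis": {
                "analysis_file": analysis_file,
                "repo_path": repo_path,
                "file_counts": file_counts,
                "topics_detected": topics_detected,
            }
        }
        self.state_path.parent.mkdir(parents=True, exist_ok=True)
        self.state_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
        return state
```

Swapping the stub in lets the surrounding phases be exercised without touching git or the real workflow state.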

Phase 4: Display Summary

Show completion summary and next steps:

echo ""
echo "============================================================"
echo "REPOSITORY ANALYSIS COMPLETE"
echo "============================================================"
echo ""
echo "Repository: {repo_path}"
echo "Output: {target_dir}/repository_analysis.md"
echo ""
echo "Summary:"
echo "  • Data files: {file_counts[data]}"
echo "  • Scripts: {file_counts[scripts]}"
echo "  • Figures: {file_counts[figures]}"
echo "  • Topics detected: {len(topics)}"
echo ""
echo "Next Steps:"
echo "  1. Review repository_analysis.md"
echo "  2. Generate manuscript outline:"
echo "     /rrwrite-plan-manuscript --target-dir {target_dir} --journal <name>"
echo ""
echo "  Or check workflow status:"
echo "     python scripts/rrwrite-status.py --output-dir {target_dir}"
echo ""
echo "============================================================"
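
The placeholders `{file_counts[data]}` and `{len(topics)}` in the summary are not shell syntax, so they are presumably substituted before the commands run. One way to render the same summary directly in Python is sketched below, under that assumption. The function name `format_summary` and the `topic_count` field are hypothetical.

```python
def format_summary(repo_path: str, target_dir: str, file_counts: dict, topics: list) -> str:
    """Render the completion summary as plain text (hypothetical helper)."""
    template = (
        "Repository: {repo_path}\n"
        "Output: {target_dir}/repository_analysis.md\n"
        "\n"
        "Summary:\n"
        "  • Data files: {file_counts[data]}\n"
        "  • Scripts: {file_counts[scripts]}\n"
        "  • Figures: {file_counts[figures]}\n"
        "  • Topics detected: {topic_count}\n"
    )
    # str.format resolves {file_counts[data]} as file_counts["data"], but it
    # cannot call functions, so len(topics) is computed here and passed in.
    return template.format(
        repo_path=repo_path,
        target_dir=target_dir,
        file_counts=file_counts,
        topic_count=len(topics),
    )


if __name__ == "__main__":
    print(format_summary("/path/to/repo", "manuscript",
                         {"data": 2, "scripts": 1, "figures": 0}, ["a", "b"]))
```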

Error Handling

Invalid Repository Path

Symptom: Script exits with "Error cloning repository" or a "file not found" error
Solution: Verify that the path exists and is accessible

GitHub Cloning Failures

Symptom: Git clone fails with authentication or network errors
Solutions:
- Check that the repository is public or that you have SSH keys configured
- Verify network connectivity
- Clone manually first, then analyze local path

Permission Denied

Symptom: Cannot write to target directory
Solution: Check directory permissions or use a different target directory

Empty Repository

Symptom: Zero file counts detected
Behavior: Analysis continues but warns user about minimal information
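
Several of the failure modes above can be detected before the analysis starts. The following is a minimal preflight sketch, not something the skill ships: the function name `preflight` and the rule for recognising URLs by prefix are assumptions. It uses only standard-library calls (`shutil.which`, `os.access`, `pathlib.Path`) and returns a list of problems instead of exiting, so the caller decides how to report them.

```python
import os
import shutil
from pathlib import Path


def preflight(repo_path: str, target_dir: str) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = []

    # Assumption: anything with these prefixes is treated as a remote repository
    is_url = repo_path.startswith(("http://", "https://", "git@"))

    if is_url and shutil.which("git") is None:
        problems.append("git is not installed or not on PATH")
    if not is_url and not Path(repo_path).is_dir():
        problems.append(f"repository path not found: {repo_path}")

    # The target may not exist yet, so test the nearest existing ancestor
    probe = Path(target_dir)
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    if not os.access(probe, os.W_OK):
        problems.append(f"cannot write to {probe}")

    return problems


if __name__ == "__main__":
    for problem in preflight("/path/to/local/repo", "manuscript"):
        print("•", problem)
```

Authentication and network failures are not covered here, since they only show up when the clone is actually attempted.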

Notes

- Analysis output follows schemas/manuscript.yaml requirements
- File counts are used by downstream skills for validation
- Topics inform literature search and section planning
- State tracking enables workflow coordination between skills