name: rrwrite-extract-figures-tables
description: Extract figures and tables from analyzed repository and generate supplementary visualizations
allowed_tools:
- Read
- Write
- Bash
- Glob
- Grep
fork_mode: fork
RRWrite: Figure and Table Extraction
Extract existing figures/tables from the analyzed repository and generate supplementary analysis visualizations. Creates manifests with priority metadata.
Overview
This skill performs two extraction tasks:
- Repository Extraction (Priority 1): Copy existing figures/tables from the analyzed code repository
- Supplementary Generation (Priority 2): Generate analysis visualizations from repository data
Prerequisites
Before running this skill, ensure:
- ✓ Repository analysis completed (/rrwrite-analyze-repository)
- ✓ data_tables/ directory exists with analysis outputs
- ✓ Repository path is known
Inputs
Required arguments:
- --target_dir: Manuscript output directory (e.g., manuscript/project_v1)
- --repo_path: Path to analyzed repository
Optional arguments:
- --extract: What to extract: figures, tables, or figures,tables (default: both)
- --generate: What to generate: figures, tables, or figures,tables (default: both)
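Both optional arguments take a comma-separated selection. The following is a minimal sketch of how such a value could be parsed and validated. `parse_selection` and `VALID_KINDS` are hypothetical names used for illustration only; they are not taken from the extraction script.

```python
from typing import List

# Assumption: these are the only two selectable kinds, as listed above.
VALID_KINDS = ("figures", "tables")


def parse_selection(value: str = "figures,tables") -> List[str]:
    """Split a value such as 'figures,tables' into a validated list of kinds."""
    kinds = [part.strip().lower() for part in value.split(",") if part.strip()]
    unknown = [kind for kind in kinds if kind not in VALID_KINDS]
    if unknown:
        raise ValueError(f"Unknown selection(s): {', '.join(unknown)}")
    # Return the kinds in canonical order, without duplicates.
    return [kind for kind in VALID_KINDS if kind in kinds]
```

Under this sketch, `parse_selection("tables, figures")` and the default both yield the same two-item list, so downstream code can test membership with `"figures" in selection`.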
Execution
# Run extraction
python scripts/rrwrite-extract-figures-tables.py \
  --repo-path {repo_path} \
  --manuscript-dir {target_dir} \
  --extract figures,tables \
  --generate figures \
  --verbose
Output Structure
Creates the following directory structure:
{target_dir}/
├── figures/
│ ├── from_repo/ # Priority 1: Original repository figures
│ │ ├── workflow_diagram.pdf
│ │ ├── results_plot.png
│ │ └── README.md # Auto-generated index
│ ├── generated/ # Priority 2: Analysis visualizations
│ │ ├── repository_composition.png
│ │ ├── repository_composition.pdf
│ │ ├── file_size_distribution.png
│ │ └── research_topics.png
│ └── figure_manifest.json # Metadata with priorities
├── tables/
│ ├── from_repo/ # Priority 1: Original data tables
│ │ ├── experimental_data.csv
│ │ └── benchmark_results.tsv
│ ├── generated/ # Priority 2: Analysis tables
│ │ ├── repository_statistics.tsv
│ │ └── file_inventory.tsv
│ └── table_manifest.json
Manifest Format
Figure Manifest (figures/figure_manifest.json)
{
  "version": "1.0",
  "created_at": "2025-01-15T10:30:00",
  "total_figures": 8,
  "figures_from_repo": 5,
  "figures_generated": 3,
  "figures": [
    {
      "id": "fig_repo_001",
      "path": "figures/from_repo/workflow_diagram.pdf",
      "source": "from_repo",
      "priority": 1,
      "original_path": "docs/figures/workflow.pdf",
      "recommended_sections": ["methods", "introduction"],
      "default_caption": "Workflow diagram showing analysis pipeline",
      "generating_script": "scripts/create_workflow.py"
    },
    {
      "id": "fig_gen_001",
      "path": "figures/generated/repository_composition.png",
      "source": "generated",
      "priority": 2,
      "recommended_sections": ["methods", "results"],
      "default_caption": "Repository composition by file category"
    }
  ]
}
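The manifest's summary counts and per-figure priorities can be sanity-checked before drafting. The sketch below is illustrative only and does not replace schema validation. `check_figure_manifest` is a hypothetical helper, and the rule that `total_figures` equals `figures_from_repo + figures_generated` is an assumption inferred from the example above.

```python
# Priority convention from this document: 1 = from_repo, 2 = generated.
EXPECTED_PRIORITY = {"from_repo": 1, "generated": 2}


def check_figure_manifest(manifest: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # Assumption: the summary counts are meant to add up.
    total = manifest.get("total_figures")
    parts = manifest.get("figures_from_repo", 0) + manifest.get("figures_generated", 0)
    if total != parts:
        problems.append(f"total_figures={total}, but the two source counts sum to {parts}")

    # Each entry's priority should agree with its source.
    for fig in manifest.get("figures", []):
        source = fig.get("source")
        expected = EXPECTED_PRIORITY.get(source)
        if expected is None:
            problems.append(f"{fig.get('id')}: unknown source {source!r}")
        elif fig.get("priority") != expected:
            problems.append(
                f"{fig.get('id')}: priority {fig.get('priority')} does not match source {source!r}"
            )
    return problems
```

The check deliberately does not compare the counts against the length of the `figures` list, because the example manifest above is abbreviated to two entries.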
State Tracking
Updates StateManager with extraction results:
from pathlib import Path
import sys
sys.path.insert(0, 'scripts')
from rrwrite_state_manager import StateManager
state = StateManager(output_dir="{target_dir}")
# Mark stage as in-progress
state.update_stage("figure_table_extraction", "in_progress")
# After extraction, mark as completed
state.update_figure_table_extraction(
    figures_from_repo=5,
    figures_generated=3,
    tables_from_repo=2,
    tables_generated=4,
    figure_manifest_path="figures/figure_manifest.json",
    table_manifest_path="tables/table_manifest.json",
    scripts_parsed=12
)
Validation
After extraction, validate manifests:
# Validate against schema
python scripts/rrwrite_manifest_generator.py \
  --manuscript-dir {target_dir} \
  --validate \
  --schemas-dir schemas
Using Extracted Figures in Sections
When drafting sections, query manifests for available figures:
import sys
from pathlib import Path
sys.path.insert(0, 'scripts')  # make scripts/ importable, as in the State Tracking example
from rrwrite_manifest_generator import ManifestGenerator
generator = ManifestGenerator(Path("{target_dir}"))
# Get figures for specific section (prioritizes repo figures)
results_figures = generator.get_figures_for_section("results")
for fig in results_figures:
    print(f"Priority {fig['priority']}: {fig['path']}")
    print(f"Caption: {fig['default_caption']}")
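To make the "prioritizes repo figures" behavior concrete, here is a sketch of one way such a lookup could work on a loaded manifest dictionary. It is not the implementation of `ManifestGenerator.get_figures_for_section`; `figures_for_section` is a hypothetical stand-in that assumes selection by `recommended_sections` and ordering by `priority`.

```python
def figures_for_section(manifest: dict, section: str) -> list:
    """Return figures recommended for a section, lowest priority number first."""
    matches = [
        fig
        for fig in manifest.get("figures", [])
        if section in fig.get("recommended_sections", [])
    ]
    # Priority 1 (from_repo) sorts before priority 2 (generated); ties break on id.
    return sorted(matches, key=lambda fig: (fig.get("priority", 99), fig.get("id", "")))
```

Entries without a `priority` field sort last in this sketch, so an incomplete manifest entry never displaces a repository figure.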
Exclusion Rules
The extractor automatically excludes:
- Thumbnails and icons (patterns: *thumb*, *icon*, *logo*, *badge*)
- Build artifacts (build/, dist/, node_modules/, __pycache__/)
- Version control and checkpoint directories (.git/, .ipynb_checkpoints/)
- Oversized files (>10MB for figures, >5MB for tables)
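The rules above can be expressed as a single predicate. The sketch below is an approximation, not the extractor's actual code. It assumes that the name patterns apply to the file name only, that matching is case-insensitive, and that 1 MB means 1024 × 1024 bytes. `is_excluded` and the constant names are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

NAME_PATTERNS = ("*thumb*", "*icon*", "*logo*", "*badge*")
EXCLUDED_DIRS = {"build", "dist", "node_modules", "__pycache__", ".git", ".ipynb_checkpoints"}
# Assumption: 1 MB = 1024 * 1024 bytes.
MAX_BYTES = {"figure": 10 * 1024 * 1024, "table": 5 * 1024 * 1024}


def is_excluded(rel_path: str, size_bytes: int, kind: str = "figure") -> bool:
    """Decide whether a repository-relative path should be skipped."""
    path = PurePosixPath(rel_path)
    # Skip anything located inside an excluded directory.
    if any(part in EXCLUDED_DIRS for part in path.parts[:-1]):
        return True
    # Skip thumbnails, icons, logos, and badges by file name.
    if any(fnmatch(path.name.lower(), pattern) for pattern in NAME_PATTERNS):
        return True
    # Skip files above the size limit for their kind.
    return size_bytes > MAX_BYTES[kind]
```

Substring patterns such as `*icon*` also match unrelated names (for example `silicon_wafer.png`), so a missing figure is worth checking against these patterns before looking elsewhere.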
Troubleshooting
No figures/tables found
Cause: Repository may not contain figure files or extraction patterns don't match
Solution: Check repository for figure files manually:
find {repo_path} -name "*.png" -o -name "*.pdf" -o -name "*.svg"
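A Python counterpart to the manual check can summarize candidates by extension, which helps show whether the repository simply has no figures of the expected types. This is a sketch with a hypothetical helper name, `count_figure_candidates`. Unlike the `find` command above, it matches extensions case-insensitively.

```python
from collections import Counter
from pathlib import Path

FIGURE_SUFFIXES = {".png", ".pdf", ".svg"}


def count_figure_candidates(repo_path) -> Counter:
    """Count files under repo_path whose extension looks like a figure format."""
    counts = Counter()
    for path in Path(repo_path).rglob("*"):
        suffix = path.suffix.lower()
        if path.is_file() and suffix in FIGURE_SUFFIXES:
            counts[suffix] += 1
    return counts
```

An empty result suggests the repository has no figure files of these types; a non-empty result with nothing extracted points toward the exclusion rules or the extraction patterns.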
Missing generated figures
Cause: Repository analysis didn't create data_tables/ directory
Solution: Re-run repository analysis:
/rrwrite-analyze-repository --repo_path {repo_path} --output_dir {target_dir}
Manifest validation fails
Cause: Schema files missing or malformed manifest
Solution: Check schemas exist in schemas/ directory:
ls schemas/figure_manifest_schema.json
ls schemas/table_manifest_schema.json
Integration with Workflow
This stage runs after research and before drafting:
- Repository Analysis → generates data_tables/
- Planning → creates outline
- Assessment → selects journal
- Literature Research → finds citations
- Figure/Table Extraction ← THIS STAGE
- Section Drafting → uses manifests to include figures/tables
- Assembly → embeds all figures/tables in final manuscript
- Critique
Next Steps
After successful extraction:
- Review figures/from_repo/README.md to verify detected figures
- Check manifest files for correct priority assignments
- Proceed to section drafting:
/rrwrite-draft-section --section introduction
Expected Duration
- Small repositories (<50 figures): 30-60 seconds
- Medium repositories (50-200 figures): 1-2 minutes
- Large repositories (>200 figures): 2-4 minutes
Success Criteria
✓ Figures extracted and copied to figures/from_repo/
✓ Generated figures created in figures/generated/
✓ Tables extracted to tables/from_repo/
✓ Manifests created and valid against their JSON schemas
✓ StateManager updated with extraction counts
✓ Priority metadata correctly assigned (1=repo, 2=generated)
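Some of these criteria can be checked quickly from the output directory alone. The sketch below only verifies that the expected directories exist and that each manifest parses as JSON; it does not perform schema validation or inspect StateManager. `check_outputs` is a hypothetical helper, and the paths follow the Output Structure section.

```python
import json
from pathlib import Path


def check_outputs(target_dir) -> dict:
    """Map each basic on-disk check to True or False."""
    root = Path(target_dir)
    results = {}
    for kind, manifest_name in (
        ("figures", "figure_manifest.json"),
        ("tables", "table_manifest.json"),
    ):
        base = root / kind
        results[f"{kind}/from_repo exists"] = (base / "from_repo").is_dir()
        results[f"{kind}/generated exists"] = (base / "generated").is_dir()
        try:
            json.loads((base / manifest_name).read_text(encoding="utf-8"))
            parses = True
        except (OSError, ValueError):
            # Missing or unreadable file, or content that is not valid JSON.
            parses = False
        results[f"{kind}/{manifest_name} parses"] = parses
    return results
```

Printing the entries whose value is `False` gives a short list of what to investigate before running the full validation command.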