rrwrite-assemble - SKILL.md Agent Skill

name: rrwrite-assemble
description: Assembles all manuscript sections into a complete manuscript with validation and metadata generation
arguments:
  - name: target_dir
    description: Output directory containing manuscript sections
    default: manuscript
allowed-tools:
context: fork

Manuscript Assembly Protocol

Purpose

Combine all drafted manuscript sections into a complete manuscript file with validation, metadata generation, and quality checks.

Prerequisites

Section files in {target_dir} (required unless marked optional):

abstract.md
introduction.md
methods.md
results.md
discussion.md
availability.md (optional)

Recommended:

literature_citations.bib - For citation validation
outline.md - For structure verification

Workflow

Phase 1: Pre-Assembly Validation

Check Required Sections:

cd "{target_dir}" || exit 1  # stop if the target directory does not exist

required_sections=("abstract.md" "introduction.md" "results.md" "discussion.md" "methods.md")
missing=()

for section in "${required_sections[@]}"; do
    if [ ! -f "$section" ]; then
        missing+=("$section")
    fi
done

if [ ${#missing[@]} -gt 0 ]; then
    echo "Error: Missing required sections: ${missing[*]}"
    exit 1
fi

Verify Section Completion: Check workflow state to ensure all sections are marked as completed:

import json
from pathlib import Path

state_file = Path("{target_dir}/.rrwrite/state.json")
if state_file.exists():
    with open(state_file) as f:
        state = json.load(f)

    sections = state.get("workflow_status", {}).get("drafting", {}).get("sections", {})
    incomplete = [s for s, data in sections.items() if data.get("status") != "completed"]

    if incomplete:
        print(f"Warning: Incomplete sections: {incomplete}")

Phase 2: Section Assembly

Combine Sections in Order:

For Nature format: Abstract → Introduction → Results → Discussion → Methods → Availability

cd "{target_dir}" || exit 1  # stop if the target directory does not exist

# Clear or create output file
> manuscript_full.md

# Add header (the heredoc delimiter is unquoted so that $(date ...) is expanded)
cat >> manuscript_full.md << EOF
# MicroGrowAgents Manuscript

**Target Journal:** Nature
**Date:** $(date +%Y-%m-%d)

---

EOF

# Combine sections
echo "# Abstract" >> manuscript_full.md
echo "" >> manuscript_full.md
cat abstract.md >> manuscript_full.md
echo -e "\n\n---\n" >> manuscript_full.md

echo "# Introduction" >> manuscript_full.md
echo "" >> manuscript_full.md
cat introduction.md >> manuscript_full.md
echo -e "\n\n---\n" >> manuscript_full.md

echo "# Results" >> manuscript_full.md
echo "" >> manuscript_full.md
cat results.md >> manuscript_full.md
echo -e "\n\n---\n" >> manuscript_full.md

echo "# Discussion" >> manuscript_full.md
echo "" >> manuscript_full.md
cat discussion.md >> manuscript_full.md
echo -e "\n\n---\n" >> manuscript_full.md

echo "# Methods" >> manuscript_full.md
echo "" >> manuscript_full.md
cat methods.md >> manuscript_full.md
echo -e "\n\n---\n" >> manuscript_full.md

if [ -f "availability.md" ]; then
    echo "# Data and Code Availability" >> manuscript_full.md
    echo "" >> manuscript_full.md
    cat availability.md >> manuscript_full.md
    echo -e "\n\n---\n" >> manuscript_full.md
fi

# Add references section placeholder
echo "# References" >> manuscript_full.md
echo "" >> manuscript_full.md
echo "[References will be generated from literature_citations.bib]" >> manuscript_full.md

echo "✓ Manuscript assembled: manuscript_full.md"

Phase 3: Metadata Generation

Calculate Statistics:

import re
from pathlib import Path
from datetime import datetime
import json

target_dir = Path("{target_dir}")
manuscript_file = target_dir / "manuscript_full.md"

# Read manuscript
with open(manuscript_file, encoding="utf-8") as f:
    content = f.read()

# Count words (exclude heading lines, fenced code blocks, and bracketed text such as citations)
text_only = re.sub(r'```.*?```', '', content, flags=re.DOTALL)
text_only = re.sub(r'^#.*$', '', text_only, flags=re.MULTILINE)
text_only = re.sub(r'\[.*?\]', '', text_only)
words = len(text_only.split())

# Count citations
citations = re.findall(r'\[([a-zA-Z0-9,\s]+)\]', content)
unique_citations = set()
for cite_group in citations:
    for cite in cite_group.split(','):
        cite = cite.strip()
        if cite and cite[0].islower():  # Citation keys start with lowercase
            unique_citations.add(cite)

# Count sections (every top-level "# " heading, so the title and References headings are included)
sections = re.findall(r'^# (.+)$', content, flags=re.MULTILINE)

# Section word counts
section_words = {}
current_section = None
current_text = []

for line in content.split('\n'):
    if line.startswith('# '):
        if current_section:
            section_text = ' '.join(current_text)
            section_text = re.sub(r'\[.*?\]', '', section_text)
            section_words[current_section] = len(section_text.split())
        current_section = line[2:].strip()
        current_text = []
    else:
        current_text.append(line)

if current_section:
    section_text = ' '.join(current_text)
    section_text = re.sub(r'\[.*?\]', '', section_text)
    section_words[current_section] = len(section_text.split())

# Generate manifest
manifest = {
    "version": "1.0",
    "generated": datetime.now().isoformat(),
    "manuscript_file": "manuscript_full.md",
    "statistics": {
        "total_words": words,
        "total_sections": len(sections),
        "unique_citations": len(unique_citations),
        "section_breakdown": section_words
    },
    "sections_included": sections,
    "validation": {
        "all_required_sections": True,
        "word_count_target": 3000,  # Nature
        "within_limit": words <= 3500
    }
}

# Save manifest
manifest_file = target_dir / "manifest.json"
with open(manifest_file, 'w') as f:
    json.dump(manifest, f, indent=2)

print(f"✓ Generated manifest: {manifest_file}")
print(f"  Total words: {words}")
print(f"  Citations: {len(unique_citations)}")
print(f"  Sections: {len(sections)}")
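
The word-count rules used above can be isolated into a small function so they are easy to test on their own. This is a minimal sketch, not part of rrwrite: `count_words` is a hypothetical helper name, and it simply applies the same three regular expressions as the script.

```python
import re


def count_words(markdown_text):
    """Count words the way the Phase 3 script does (hypothetical helper)."""
    # Drop fenced code blocks, including their contents.
    text = re.sub(r"```.*?```", "", markdown_text, flags=re.DOTALL)
    # Drop heading lines (any line that starts with '#').
    text = re.sub(r"^#.*$", "", text, flags=re.MULTILINE)
    # Drop bracketed spans such as [smith2020] citations.
    text = re.sub(r"\[.*?\]", "", text)
    return len(text.split())


# Made-up sample text; the citation keys are illustrative only.
sample = "# Results\n\nGrowth doubled [smith2020, lee2021] in all media.\n"
print(count_words(sample))  # 5
```

Under these rules, horizontal rules (`---`) and bold metadata lines in the header still count as words, so totals run slightly higher than the prose alone.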

Phase 4: Quality Checks

Check Citation Validity:

import re
from pathlib import Path

target_dir = Path("{target_dir}")
manuscript_file = target_dir / "manuscript_full.md"
bib_file = target_dir / "literature_citations.bib"

# Extract citations from manuscript
with open(manuscript_file, encoding="utf-8") as f:
    content = f.read()

cited_keys = set()
for cite_group in re.findall(r'\[([a-zA-Z0-9,\s]+)\]', content):
    for cite in cite_group.split(','):
        cite = cite.strip()
        if cite and cite[0].islower():
            cited_keys.add(cite)

# Extract citation keys from .bib file
if bib_file.exists():
    with open(bib_file) as f:
        bib_content = f.read()

    # Strip whitespace so entries written as "@article{ key," still match
    bib_keys = {key.strip() for key in re.findall(r'@\w+\{([^,]+),', bib_content)}

    # Check for missing citations
    missing = cited_keys - bib_keys
    if missing:
        print(f"⚠ Warning: Citations not in .bib file: {missing}")
    else:
        print(f"✓ All {len(cited_keys)} citations found in .bib file")
else:
    print("⚠ Warning: No literature_citations.bib file found")
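
The citation check can also be written as pure functions over strings, which makes it testable without touching the file system. The sketch below reuses the same patterns as the script; the function names and the sample manuscript and .bib text are hypothetical and exist only for illustration.

```python
import re


def extract_cited_keys(manuscript_text):
    """Collect citation keys from bracketed groups such as [smith2020, lee2021]."""
    keys = set()
    for group in re.findall(r"\[([a-zA-Z0-9,\s]+)\]", manuscript_text):
        for key in group.split(","):
            key = key.strip()
            # Same convention as the script: citation keys start with a lowercase letter.
            if key and key[0].islower():
                keys.add(key)
    return keys


def extract_bib_keys(bib_text):
    """Collect entry keys from BibTeX entries such as @article{smith2020, ...}."""
    return {key.strip() for key in re.findall(r"@\w+\{([^,]+),", bib_text)}


def missing_citations(manuscript_text, bib_text):
    """Return the keys cited in the manuscript but absent from the .bib text."""
    return extract_cited_keys(manuscript_text) - extract_bib_keys(bib_text)


# Made-up inputs for illustration.
manuscript = "Growth media vary widely [smith2020, lee2021]. See Figure [1]."
bib = "@article{smith2020,\n  title={Media},\n}\n"
print(sorted(missing_citations(manuscript, bib)))  # ['lee2021']
```

Because the pattern only allows letters, digits, commas, and whitespace inside the brackets, keys containing characters such as `_` or `-` would not be detected by this check.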

Check Word Count Compliance:

import json
from pathlib import Path

target_dir = Path("{target_dir}")
manifest_file = target_dir / "manifest.json"

with open(manifest_file) as f:
    manifest = json.load(f)

total_words = manifest["statistics"]["total_words"]
target = manifest["validation"]["word_count_target"]

if total_words > target * 1.2:  # 20% over
    print(f"⚠ Warning: Manuscript is {total_words - target} words over target ({total_words}/{target})")
    print(f"  Recommendation: Trim {total_words - target} words before submission")
elif total_words < target * 0.8:  # 20% under
    print(f"⚠ Warning: Manuscript is {target - total_words} words under target ({total_words}/{target})")
else:
    print(f"✓ Word count within acceptable range: {total_words}/{target}")
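
The ±20% rule above reduces to a small classification function. This is a sketch under the assumption that the tolerance is a parameter; `word_count_status` and its return strings are hypothetical names, not part of rrwrite.

```python
def word_count_status(total_words, target, tolerance=0.2):
    """Classify a word count against a target with a symmetric tolerance band."""
    if total_words > target * (1 + tolerance):
        return f"over by {total_words - target}"
    if total_words < target * (1 - tolerance):
        return f"under by {target - total_words}"
    return "within range"


print(word_count_status(3700, 3000))  # over by 700
print(word_count_status(3500, 3000))  # within range
print(word_count_status(2000, 3000))  # under by 1000
```

Note that the manifest's `within_limit` flag uses a fixed 3500-word cutoff, whereas this check accepts roughly 2400 to 3600 words for a 3000-word target, so the two can disagree for manuscripts between 3500 and 3600 words.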

Phase 5: State Update

Update Workflow State:

import sys
from pathlib import Path
sys.path.insert(0, str(Path('scripts').resolve()))
from rrwrite_state_manager import StateManager
import json

target_dir = "{target_dir}"
manager = StateManager(output_dir=target_dir, enable_git=False)

# Load manifest for statistics
manifest_file = Path(target_dir) / "manifest.json"
with open(manifest_file) as f:
    manifest = json.load(f)

# Update assembly stage
manager.update_workflow_stage(
    "assembly",
    status="completed",
    file=f"{target_dir}/manuscript_full.md",
    manifest_file=f"{target_dir}/manifest.json",
    sections_included=manifest["statistics"]["total_sections"],
    total_word_count=manifest["statistics"]["total_words"],
    validation_warnings=0  # TODO: count actual warnings
)

print("✓ Workflow state updated")
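
One way to resolve the `validation_warnings=0` TODO is to route every Phase 4 warning through a small collector and pass its count to the state update. The sketch below is an assumption about how that could look: `WarningLog` is a hypothetical class, the sample values are made up, and the call to `update_workflow_stage` is shown only as a comment because `StateManager` is not available here.

```python
class WarningLog:
    """Hypothetical collector for validation warnings raised during assembly."""

    def __init__(self):
        self.messages = []

    def warn(self, message):
        # Print the warning and remember it so it can be counted later.
        print(f"Warning: {message}")
        self.messages.append(message)

    @property
    def count(self):
        return len(self.messages)


log = WarningLog()

# Illustrative values standing in for the results of the Phase 4 checks.
missing = {"lee2021"}
total_words, target = 3700, 3000

if missing:
    log.warn(f"Citations not in .bib file: {sorted(missing)}")
if total_words > target * 1.2:
    log.warn(f"Manuscript is {total_words - target} words over target")

# The count could then replace the hard-coded 0, for example:
# manager.update_workflow_stage("assembly", status="completed", validation_warnings=log.count)
print(log.count)  # 2
```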

Output Files

The assembly process generates:

{target_dir}/manuscript_full.md
- Complete manuscript with all sections
- Section headers and separators
- References placeholder

{target_dir}/manifest.json
- Assembly metadata
- Word counts per section
- Citation statistics
- Validation results

Validation Criteria

The manuscript is considered successfully assembled if:

✅ All required sections present and combined
✅ Word count statistics calculated
✅ Citations extracted and validated against .bib file
✅ Manifest generated with complete metadata
✅ Workflow state updated to mark assembly as completed

Display Summary

After successful assembly, display:

============================================================
MANUSCRIPT ASSEMBLY COMPLETE
============================================================

Output: {target_dir}/manuscript_full.md

Statistics:
  • Total words: [X]
  • Target: [Y] (Nature: 3000 words)
  • Status: [Within limit / Over by X words]
  • Citations: [N] unique citations
  • Sections: [M] sections included

Section Breakdown:
  • Abstract: [X] words
  • Introduction: [X] words
  • Results: [X] words
  • Discussion: [X] words
  • Methods: [X] words
  • Availability: [X] words

Next Steps:
  1. Review manuscript_full.md
  2. Run critique:
     /rrwrite-critique-manuscript --target-dir {target_dir} --file manuscript_full.md
  3. Address critique feedback
  4. Trim to target word count if needed

============================================================
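
The summary above can be filled in directly from manifest.json. The following sketch assumes the manifest layout produced in Phase 3 and uses the `within_limit` flag to choose the status text; `render_summary` is a hypothetical helper, and the sample manifest values are made up.

```python
RULE = "=" * 60


def render_summary(manifest, target_dir):
    """Build the assembly summary text from a Phase 3 manifest dictionary."""
    stats = manifest["statistics"]
    validation = manifest["validation"]
    total = stats["total_words"]
    target = validation["word_count_target"]
    # Assumption: the status line follows the manifest's within_limit flag.
    status = "Within limit" if validation["within_limit"] else f"Over by {total - target} words"
    lines = [
        RULE,
        "MANUSCRIPT ASSEMBLY COMPLETE",
        RULE,
        "",
        f"Output: {target_dir}/manuscript_full.md",
        "",
        "Statistics:",
        f"  • Total words: {total}",
        f"  • Target: {target}",
        f"  • Status: {status}",
        f"  • Citations: {stats['unique_citations']} unique citations",
        f"  • Sections: {stats['total_sections']} sections included",
        "",
        "Section Breakdown:",
    ]
    # One line per section, in the order recorded in the manifest.
    for name, count in stats["section_breakdown"].items():
        lines.append(f"  • {name}: {count} words")
    lines.append(RULE)
    return "\n".join(lines)


# Made-up manifest values for illustration.
sample_manifest = {
    "statistics": {
        "total_words": 3200,
        "total_sections": 7,
        "unique_citations": 42,
        "section_breakdown": {"Abstract": 150, "Introduction": 600},
    },
    "validation": {"word_count_target": 3000, "within_limit": True},
}
print(render_summary(sample_manifest, "manuscript"))
```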

Error Handling

Missing sections:

Error: Missing required sections: [list]
Please draft all sections before assembly:
  /rrwrite-draft-section --target-dir {target_dir} --section [name]

No sections found:

Error: No manuscript sections found in {target_dir}
Expected files: abstract.md, introduction.md, results.md, discussion.md, methods.md

Invalid target directory:

Error: Target directory not found: {target_dir}
Please specify a valid manuscript directory
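
The three error cases can be checked in one pre-flight function before any assembly work starts. This is a sketch only: `preflight_error` is a hypothetical helper, and treating "no sections found" as "all five required files are missing" is an assumption about the intended behavior.

```python
import tempfile
from pathlib import Path

REQUIRED = ["abstract.md", "introduction.md", "results.md", "discussion.md", "methods.md"]


def preflight_error(target_dir):
    """Return the first applicable error message, or None when assembly can proceed."""
    root = Path(target_dir)
    if not root.is_dir():
        return f"Error: Target directory not found: {target_dir}"
    missing = [name for name in REQUIRED if not (root / name).is_file()]
    # Assumption: "no sections found" means every required file is absent.
    if len(missing) == len(REQUIRED):
        return f"Error: No manuscript sections found in {target_dir}"
    if missing:
        return f"Error: Missing required sections: {', '.join(missing)}"
    return None


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    print(preflight_error(tmp))  # no sections yet
    (Path(tmp) / "abstract.md").write_text("Draft abstract.\n", encoding="utf-8")
    print(preflight_error(tmp))  # four sections still missing
```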

Notes

Assembly does NOT modify individual section files; the originals remain unchanged in {target_dir}
Manifest can be regenerated by re-running assembly
Word counts exclude heading lines, fenced code blocks, and bracketed text such as citations; other markdown (bold markers, --- separators) is still counted
Nature format uses Abstract → Introduction → Results → Discussion → Methods → Availability order
For other journals (PLOS, Bioinformatics), adjust section order as needed
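
Adjusting the section order is easier when the order is data, not a fixed sequence of shell commands. The sketch below assumes the same file names as Phase 2 and makes the order a parameter; `assemble`, `NATURE_ORDER`, and `METHODS_FIRST_ORDER` are hypothetical names, and the methods-first order is only an illustration, not the stated requirement of any particular journal.

```python
import tempfile
from pathlib import Path

# Order taken from Phase 2 (Nature format).
NATURE_ORDER = [
    ("Abstract", "abstract.md"),
    ("Introduction", "introduction.md"),
    ("Results", "results.md"),
    ("Discussion", "discussion.md"),
    ("Methods", "methods.md"),
    ("Data and Code Availability", "availability.md"),
]
# Illustrative alternative: same sections with Methods moved before Results.
METHODS_FIRST_ORDER = [NATURE_ORDER[i] for i in (0, 1, 4, 2, 3, 5)]
OPTIONAL = {"availability.md"}


def assemble(target_dir, order=NATURE_ORDER):
    """Join section files under top-level headings, separated by horizontal rules."""
    root = Path(target_dir)
    parts = []
    for title, filename in order:
        path = root / filename
        if not path.is_file():
            if filename in OPTIONAL:
                continue  # optional sections are skipped silently
            raise FileNotFoundError(f"Missing required section: {filename}")
        body = path.read_text(encoding="utf-8").strip()
        parts.append(f"# {title}\n\n{body}\n")
    return "\n---\n\n".join(parts)


# Demonstration with placeholder section files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["abstract", "introduction", "results", "discussion", "methods"]:
        (Path(tmp) / f"{name}.md").write_text(f"{name} text\n", encoding="utf-8")
    print(assemble(tmp, order=METHODS_FIRST_ORDER))
```

Unlike the shell version, this sketch returns the assembled text instead of writing manuscript_full.md, and it omits the title header and the References placeholder.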