name: rrwrite-assemble description: Assembles all manuscript sections into a complete manuscript with validation and metadata generation arguments:
- name: target_dir description: Output directory containing manuscript sections default: manuscript allowed-tools: context: fork
Manuscript Assembly Protocol
Purpose
Combine all drafted manuscript sections into a complete manuscript file with validation, metadata generation, and quality checks.
Prerequisites
Required files in {target_dir}:
abstract.mdintroduction.mdmethods.mdresults.mddiscussion.mdavailability.md(optional)
Recommended:
literature_citations.bib- For citation validationoutline.md- For structure verification
Workflow
Phase 1: Pre-Assembly Validation
Check Required Sections:
cd {target_dir} required_sections=("abstract.md" "introduction.md" "results.md" "discussion.md" "methods.md") missing=() for section in "${required_sections[@]}"; do if [ ! -f "$section" ]; then missing+=("$section") fi done if [ ${#missing[@]} -gt 0 ]; then echo "Error: Missing required sections: ${missing[*]}" exit 1 fiVerify Section Completion: Check workflow state to ensure all sections are marked as completed:
import json from pathlib import Path state_file = Path("{target_dir}/.rrwrite/state.json") if state_file.exists(): with open(state_file) as f: state = json.load(f) sections = state.get("workflow_status", {}).get("drafting", {}).get("sections", {}) incomplete = [s for s, data in sections.items() if data.get("status") != "completed"] if incomplete: print(f"Warning: Incomplete sections: {incomplete}")
Phase 2: Section Assembly
Combine Sections in Order:
For Nature format: Abstract → Introduction → Results → Discussion → Methods → Availability
cd {target_dir} # Clear or create output file > manuscript_full.md # Add header cat >> manuscript_full.md << 'EOF' # MicroGrowAgents Manuscript **Target Journal:** Nature **Date:** $(date +%Y-%m-%d) --- EOF # Combine sections echo "# Abstract" >> manuscript_full.md echo "" >> manuscript_full.md cat abstract.md >> manuscript_full.md echo -e "\n\n---\n" >> manuscript_full.md echo "# Introduction" >> manuscript_full.md echo "" >> manuscript_full.md cat introduction.md >> manuscript_full.md echo -e "\n\n---\n" >> manuscript_full.md echo "# Results" >> manuscript_full.md echo "" >> manuscript_full.md cat results.md >> manuscript_full.md echo -e "\n\n---\n" >> manuscript_full.md echo "# Discussion" >> manuscript_full.md echo "" >> manuscript_full.md cat discussion.md >> manuscript_full.md echo -e "\n\n---\n" >> manuscript_full.md echo "# Methods" >> manuscript_full.md echo "" >> manuscript_full.md cat methods.md >> manuscript_full.md echo -e "\n\n---\n" >> manuscript_full.md if [ -f "availability.md" ]; then echo "# Data and Code Availability" >> manuscript_full.md echo "" >> manuscript_full.md cat availability.md >> manuscript_full.md echo -e "\n\n---\n" >> manuscript_full.md fi # Add references section placeholder echo "# References" >> manuscript_full.md echo "" >> manuscript_full.md echo "[References will be generated from literature_citations.bib]" >> manuscript_full.md echo "✓ Manuscript assembled: manuscript_full.md"
Phase 3: Metadata Generation
- Calculate Statistics:
import re from pathlib import Path from datetime import datetime import json target_dir = Path("{target_dir}") manuscript_file = target_dir / "manuscript_full.md" # Read manuscript with open(manuscript_file) as f: content = f.read() # Count words (exclude markdown headers, code blocks) text_only = re.sub(r'```.*?```', '', content, flags=re.DOTALL) text_only = re.sub(r'^#.*$', '', text_only, flags=re.MULTILINE) text_only = re.sub(r'\[.*?\]', '', text_only) words = len(text_only.split()) # Count citations citations = re.findall(r'\[([a-zA-Z0-9,\s]+)\]', content) unique_citations = set() for cite_group in citations: for cite in cite_group.split(','): cite = cite.strip() if cite and cite[0].islower(): # Citation keys start with lowercase unique_citations.add(cite) # Count sections sections = re.findall(r'^# (.+)$', content, flags=re.MULTILINE) # Section word counts section_words = {} current_section = None current_text = [] for line in content.split('\n'): if line.startswith('# '): if current_section: section_text = ' '.join(current_text) section_text = re.sub(r'\[.*?\]', '', section_text) section_words[current_section] = len(section_text.split()) current_section = line[2:].strip() current_text = [] else: current_text.append(line) if current_section: section_text = ' '.join(current_text) section_text = re.sub(r'\[.*?\]', '', section_text) section_words[current_section] = len(section_text.split()) # Generate manifest manifest = { "version": "1.0", "generated": datetime.now().isoformat(), "manuscript_file": "manuscript_full.md", "statistics": { "total_words": words, "total_sections": len(sections), "unique_citations": len(unique_citations), "section_breakdown": section_words }, "sections_included": sections, "validation": { "all_required_sections": True, "word_count_target": 3000, # Nature "within_limit": words <= 3500 } } # Save manifest manifest_file = target_dir / "manifest.json" with open(manifest_file, 'w') as f: json.dump(manifest, f, indent=2) print(f"✓ Generated manifest: {manifest_file}") print(f" Total words: {words}") print(f" Citations: {len(unique_citations)}") print(f" Sections: {len(sections)}")
Phase 4: Quality Checks
Check Citation Validity:
import re from pathlib import Path target_dir = Path("{target_dir}") manuscript_file = target_dir / "manuscript_full.md" bib_file = target_dir / "literature_citations.bib" # Extract citations from manuscript with open(manuscript_file) as f: content = f.read() cited_keys = set() for cite_group in re.findall(r'\[([a-zA-Z0-9,\s]+)\]', content): for cite in cite_group.split(','): cite = cite.strip() if cite and cite[0].islower(): cited_keys.add(cite) # Extract citation keys from .bib file if bib_file.exists(): with open(bib_file) as f: bib_content = f.read() bib_keys = set(re.findall(r'@\w+\{([^,]+),', bib_content)) # Check for missing citations missing = cited_keys - bib_keys if missing: print(f"⚠ Warning: Citations not in .bib file: {missing}") else: print(f"✓ All {len(cited_keys)} citations found in .bib file") else: print("⚠ Warning: No literature_citations.bib file found")Check Word Count Compliance:
import json from pathlib import Path target_dir = Path("{target_dir}") manifest_file = target_dir / "manifest.json" with open(manifest_file) as f: manifest = json.load(f) total_words = manifest["statistics"]["total_words"] target = manifest["validation"]["word_count_target"] if total_words > target * 1.2: # 20% over print(f"⚠ Warning: Manuscript is {total_words - target} words over target ({total_words}/{target})") print(f" Recommendation: Trim {total_words - target} words before submission") elif total_words < target * 0.8: # 20% under print(f"⚠ Warning: Manuscript is {target - total_words} words under target ({total_words}/{target})") else: print(f"✓ Word count within acceptable range: {total_words}/{target}")
Phase 5: State Update
- Update Workflow State:
import sys from pathlib import Path sys.path.insert(0, str(Path('scripts').resolve())) from rrwrite_state_manager import StateManager import json target_dir = "{target_dir}" manager = StateManager(output_dir=target_dir, enable_git=False) # Load manifest for statistics manifest_file = Path(target_dir) / "manifest.json" with open(manifest_file) as f: manifest = json.load(f) # Update assembly stage manager.update_workflow_stage( "assembly", status="completed", file=f"{target_dir}/manuscript_full.md", manifest_file=f"{target_dir}/manifest.json", sections_included=manifest["statistics"]["total_sections"], total_word_count=manifest["statistics"]["total_words"], validation_warnings=0 # TODO: count actual warnings ) print("✓ Workflow state updated")
Output Files
The assembly process generates:
{target_dir}/manuscript_full.md- Complete manuscript with all sections
- Section headers and separators
- References placeholder
{target_dir}/manifest.json- Assembly metadata
- Word counts per section
- Citation statistics
- Validation results
Validation Criteria
The manuscript is considered successfully assembled if:
✅ All required sections present and combined ✅ Word count statistics calculated ✅ Citations extracted and validated against .bib file ✅ Manifest generated with complete metadata ✅ Workflow state updated to mark assembly as completed
Display Summary
After successful assembly, display:
============================================================
MANUSCRIPT ASSEMBLY COMPLETE
============================================================
Output: {target_dir}/manuscript_full.md
Statistics:
• Total words: [X]
• Target: [Y] (Nature: 3000 words)
• Status: [Within limit / Over by X words]
• Citations: [N] unique citations
• Sections: [M] sections included
Section Breakdown:
• Abstract: [X] words
• Introduction: [X] words
• Results: [X] words
• Discussion: [X] words
• Methods: [X] words
• Availability: [X] words
Next Steps:
1. Review manuscript_full.md
2. Run critique:
/rrwrite-critique-manuscript --target-dir {target_dir} --file manuscript_full.md
3. Address critique feedback
4. Trim to target word count if needed
============================================================
Error Handling
Missing sections:
Error: Missing required sections: [list]
Please draft all sections before assembly:
/rrwrite-draft-section --target-dir {target_dir} --section [name]
No sections found:
Error: No manuscript sections found in {target_dir}
Expected files: abstract.md, introduction.md, results.md, discussion.md, methods.md
Invalid target directory:
Error: Target directory not found: {target_dir}
Please specify a valid manuscript directory
Notes
- Assembly does NOT modify individual section files
- Original sections remain unchanged in {target_dir}
- Manifest can be regenerated by re-running assembly
- Word counts exclude markdown formatting and citations
- Nature format uses Abstract → Intro → Results → Discussion → Methods order
- For other journals (PLOS, Bioinformatics), adjust section order as needed