name: simulation
description: Skill for simulating outbreaks, contact data, and transmission chains using simulist and epichains.
Epiverse-TRACE Simulation
[!IMPORTANT] Use the Introspection Protocol: see the epiverse-overview skill for the protocol. Before generating code, verify package APIs and function signatures with R introspection commands (for example, args() and help()).
[!NOTE] Workflow Context: This skill's code should be included in Quarto document code chunks (not separate R scripts). See the reporting skill for the complete workflow structure.
This skill provides tools for simulating synthetic outbreak data and transmission chains for testing, validation, and scenario analysis.
Packages
simulist
Purpose: Simulate individual-level outbreak data including line lists and contact tracing data.
Key Functions:
- sim_linelist(): Simulate a line list with cases, dates, and outcomes
- sim_contacts(): Simulate contact tracing data
- sim_outbreak(): Simulate both line list and contacts together
Post-processing Functions:
- truncate_linelist(): Create a real-time snapshot with right-truncation
- messy_linelist(): Add realistic data quality issues
- censor_linelist(): Censor dates to weekly or monthly intervals
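Censoring replaces each exact date with the start of a coarser interval. The base-R sketch below illustrates the idea for weekly intervals (ISO weeks starting on Monday) and monthly intervals; the example dates are arbitrary, and this is not censor_linelist()'s implementation.

```r
# Example onset dates (arbitrary)
onset <- as.Date(c("2023-01-04", "2023-01-08", "2023-01-09", "2023-02-15"))

# Weekly censoring: map each date to the Monday that starts its ISO week
weekday <- as.integer(format(onset, "%u"))  # 1 = Monday, ..., 7 = Sunday
week_start <- onset - (weekday - 1)

# Monthly censoring: map each date to the first day of its month
month_start <- as.Date(format(onset, "%Y-%m-01"))

print(week_start)   # "2023-01-02" "2023-01-02" "2023-01-09" "2023-02-13"
print(month_start)  # "2023-01-01" "2023-01-01" "2023-01-01" "2023-02-01"
```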
Key Features:
- Parameterized with epiparameter distributions
- Age-structured populations
- Age-stratified risks (hospitalization, death)
- Time-varying CFR
- Realistic data quality issues
epichains
Purpose: Simulate and analyze transmission chains using branching processes.
Key Functions:
- simulate_chains(): Simulate full transmission trees with details
- simulate_chain_stats(): Simulate chain statistics only (faster)
- likelihood(): Calculate the likelihood of observed chain sizes/lengths
- summary(): Summarize chain statistics
- aggregate(): Aggregate cases by time or generation
- plot(): Visualize chains
Key Features:
- Track infection trees
- Calculate chain sizes and lengths
- Population effects (finite population, immunity)
- Generation times
Typical Workflow
Simulating Line List Data
Get epidemiological parameters from epiparameter:
- Contact distribution
- Infectious period
- Onset to hospitalization
- Onset to death
Call sim_linelist() or sim_outbreak() with:
- Reproduction number (via contact distribution and infection probability)
- Delay distributions
- Hospitalization and death risks
- Population structure
Post-process if needed:
- truncate_linelist() for real-time analysis
- messy_linelist() for realistic data quality issues
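To see what the delay and risk parameters control, here is a minimal base-R sketch of a line list in which a fixed proportion of cases is hospitalized and some of those die, with lognormal delays. All parameter values are hypothetical, onset dates are drawn uniformly rather than from a transmission model, and the death delay is measured from admission (a simplification so death never precedes admission). It is not how sim_linelist() works internally.

```r
set.seed(123)
n_cases <- 200

# Hypothetical parameters
hosp_risk <- 0.2        # probability a case is hospitalized
hosp_death_risk <- 0.5  # probability a hospitalized case dies

# Onset dates drawn uniformly over 60 days (no transmission model)
date_onset <- as.Date("2023-01-01") + sort(sample(0:59, n_cases, replace = TRUE))

# Which cases are hospitalized, and which of those die
hospitalized <- runif(n_cases) < hosp_risk
died <- hospitalized & runif(n_cases) < hosp_death_risk

# Lognormal delays in days, kept only where the event happens (NA otherwise)
onset_to_hosp <- ifelse(hospitalized, rlnorm(n_cases, meanlog = 1.5, sdlog = 0.5), NA)
hosp_to_death <- ifelse(died, rlnorm(n_cases, meanlog = 2, sdlog = 0.5), NA)

date_admission <- date_onset + round(onset_to_hosp)
linelist <- data.frame(
  id = seq_len(n_cases),
  date_onset = date_onset,
  date_admission = date_admission,
  outcome = ifelse(died, "died", "recovered"),
  date_death = date_admission + round(hosp_to_death)
)
head(linelist)
```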
Simulating Transmission Chains
- Define offspring distribution (Poisson, negative binomial)
- Set generation time distribution
- Call simulate_chains() for full details or simulate_chain_stats() for speed
- Analyze with summary(), aggregate(), and plot()
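The steps above can be illustrated with a minimal base-R branching process that records who infected whom, the generation, and the infection time. It assumes negative binomial offspring (mean R, dispersion k) and a gamma generation time; the helper simulate_tree() and all parameter values are hypothetical, and this sketches the technique rather than epichains' implementation. In epichains, the corresponding call is simulate_chains() with offspring_dist = rnbinom and a generation_time function; confirm the argument names with args(simulate_chains).

```r
# Hypothetical helper: simulate one transmission tree from a single index case.
# Offspring ~ negative binomial (mean R, dispersion k); generation times from gen_time().
# Stops when the chain dies out or after the generation that reaches max_cases.
simulate_tree <- function(R, k, gen_time, max_cases = 500) {
  # Row i always describes case id i, because ids are assigned in row order
  tree <- data.frame(id = 1, infector = NA_real_, generation = 0, time = 0)
  current <- 1  # ids of cases in the latest generation
  while (length(current) > 0 && nrow(tree) < max_cases) {
    n_off <- rnbinom(length(current), size = k, mu = R)
    if (sum(n_off) == 0) break
    infector <- rep(current, times = n_off)  # one entry per new infection
    new_cases <- data.frame(
      id = nrow(tree) + seq_len(sum(n_off)),
      infector = infector,
      generation = tree$generation[infector] + 1,
      time = tree$time[infector] + gen_time(sum(n_off))
    )
    tree <- rbind(tree, new_cases)
    current <- new_cases$id
  }
  tree
}

set.seed(2023)
tree <- simulate_tree(
  R = 1.5, k = 0.5,
  gen_time = function(n) rgamma(n, shape = 4, rate = 1)  # hypothetical, mean 4 days
)
table(tree$generation)  # cases per generation
```

A small dispersion k (here 0.5) concentrates transmission in a few cases, which is how superspreading is usually represented with a negative binomial offspring distribution.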
Chain Size Analysis
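- Simulate chain sizes with simulate_chain_stats(statistic = "size")
- Estimate the probability of a large outbreak from the results, for example as the proportion of chains that reach a size threshold
- Compare with empirical chain size data using likelihood()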
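As a sketch of the outbreak-probability step, the base-R code below simulates chain sizes from a Poisson offspring distribution, caps each chain at a size threshold, and treats chains that reach the cap as major outbreaks. For Poisson offspring, the theoretical probability of a major outbreak is 1 - q, where q is the smallest solution of q = exp(R(q - 1)), so the simulated proportion can be checked against it. The helper sim_chain_size(), R = 1.5, and the cap of 100 are hypothetical choices; in practice simulate_chain_stats() would replace the helper.

```r
# Hypothetical helper: total size of one chain with Poisson(R) offspring,
# stopping once the chain reaches `cap` cases
sim_chain_size <- function(R, cap) {
  size <- 1
  current <- 1
  while (current > 0 && size < cap) {
    # Sum of `current` Poisson(R) draws is Poisson(R * current)
    current <- rpois(1, lambda = R * current)
    size <- size + current
  }
  min(size, cap)
}

set.seed(99)
R <- 1.5
cap <- 100
sizes <- replicate(2000, sim_chain_size(R, cap))

# Simulated probability of a major outbreak: chains that hit the cap
p_sim <- mean(sizes >= cap)

# Theoretical value: fixed-point iteration from 0 converges to the
# smallest root q of q = exp(R * (q - 1)), the extinction probability
q <- 0
for (i in 1:200) q <- exp(R * (q - 1))
p_theory <- 1 - q

c(simulated = p_sim, theoretical = p_theory)
```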
Important Simulation Considerations
Reproduction Number
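- R = mean number of contacts per case × probability of infection per contact
- If R > 1, outbreaks can grow very large (although some chains still go extinct by chance)
- Use the outbreak_size parameter (minimum and maximum number of cases) to bound outbreak size
- The basic model has no susceptible depletion, so outbreaks with R > 1 do not slow down on their own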
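A quick numerical check of R = mean contacts × probability of infection: if each case has a Poisson-distributed number of contacts with mean 5 and each contact is infected independently with probability 0.4, then R = 5 × 0.4 = 2. The base-R sketch below confirms this by simulation; the Poisson contact distribution and both values are hypothetical.

```r
set.seed(10)
mean_contacts <- 5     # hypothetical mean number of contacts per case
prob_infection <- 0.4  # hypothetical per-contact infection probability

R_theory <- mean_contacts * prob_infection

# Draw contacts per case, then infect each contact independently
n_cases <- 1e5
contacts <- rpois(n_cases, lambda = mean_contacts)
infected <- rbinom(n_cases, size = contacts, prob = prob_infection)

c(theoretical = R_theory, simulated = mean(infected))
```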
Right-Truncation
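Real-time outbreak analysis must account for right-truncation: at a given snapshot date, recent events (onsets, admissions, deaths) may not have been reported yet, so the most recent counts are incomplete. Simulate the full outbreak first, then truncate it to the snapshot:

```r
library(simulist)

set.seed(1)
# Simulate the full outbreak (arguments elided)
full_outbreak <- sim_linelist(...)

# Create a real-time snapshot; argument names vary between simulist
# versions, so check args(truncate_linelist) first
realtime_data <- truncate_linelist(
  linelist = full_outbreak,
  max_date = as.Date("2023-03-01")
)
```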
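The effect of a snapshot can be seen without simulist: in the base-R sketch below, each case has an onset date and a reporting delay, and the real-time view keeps only cases reported by the snapshot date. The gamma reporting delay, dates, and case count are hypothetical, and truncate_linelist() may define truncation differently.

```r
set.seed(42)
n_cases <- 300
start <- as.Date("2023-01-01")

# Onset dates spread over 60 days, plus a hypothetical onset-to-report
# delay in whole days (gamma with mean 5)
date_onset <- start + sample(0:59, n_cases, replace = TRUE)
delay <- round(rgamma(n_cases, shape = 2, scale = 2.5))
linelist <- data.frame(date_onset = date_onset, date_report = date_onset + delay)

snapshot <- as.Date("2023-02-15")

# What will eventually be known about onsets up to the snapshot...
eventual <- linelist[linelist$date_onset <= snapshot, ]
# ...versus what has actually been reported by the snapshot
realtime <- linelist[linelist$date_report <= snapshot, ]

# Undercounting is concentrated in the most recent week of onsets
recent <- snapshot - 6
n_recent_eventual <- sum(eventual$date_onset >= recent)
n_recent_realtime <- sum(realtime$date_onset >= recent)
c(eventual = n_recent_eventual, realtime = n_recent_realtime)
```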
Realistic Data Quality
Add common data issues for testing cleaning pipelines:
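```r
library(simulist)
set.seed(1)
clean_linelist <- sim_linelist()  # line list with default parameters
# Argument names vary between simulist versions; check args(messy_linelist)
messy_data <- messy_linelist(
  linelist = clean_linelist,
  proportion_missing = 0.1,
  proportion_dates_inconsistent = 0.05
)
```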
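To see how such issues can be injected, the base-R sketch below blanks out a fixed proportion of values and rewrites a proportion of dates in a second format. The helpers add_missing() and mix_date_formats() and the example data are hypothetical and do not reproduce messy_linelist()'s behavior.

```r
# Hypothetical helper: set a proportion of values in each chosen column to NA
add_missing <- function(df, prop, cols) {
  for (col in cols) {
    idx <- sample(nrow(df), round(prop * nrow(df)))
    df[idx, col] <- NA
  }
  df
}

# Hypothetical helper: store dates as text, writing a proportion as dd/mm/yyyy
mix_date_formats <- function(dates, prop) {
  out <- format(dates, "%Y-%m-%d")
  idx <- sample(length(dates), round(prop * length(dates)))
  out[idx] <- format(dates[idx], "%d/%m/%Y")
  out
}

set.seed(7)
clean <- data.frame(
  id = 1:100,
  age = sample(1:90, 100, replace = TRUE),
  date_onset = as.Date("2023-01-01") + sample(0:30, 100, replace = TRUE)
)

messy <- add_missing(clean, prop = 0.1, cols = "age")
messy$date_onset <- mix_date_formats(messy$date_onset, prop = 0.05)
head(messy)
```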
Best Practices
- Always call set.seed() for reproducibility
- Use parameters from the literature via epiparameter
- Document parameter sources and justifications
- Test cleaning pipelines on simulated data before real data
- Use time-varying CFR for longer outbreaks
- Consider age-structure when relevant
- Cap outbreak size to prevent excessive simulation time
Common Use Cases
- Testing Analysis Methods: Generate data with known properties to validate methods
- Scenario Analysis: Compare intervention effects on outbreak dynamics
- Method Validation: Test whether analysis methods can recover true parameters
- Training Data: Create realistic examples for teaching
- Pipeline Testing: Validate data cleaning and analysis workflows
- Sensitivity Analysis: Assess impact of parameter uncertainty
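As an example of method validation, the base-R sketch below simulates chain sizes with a known subcritical reproduction number and checks that an estimator recovers it. For Poisson offspring, chain sizes follow a Borel distribution, P(n) = (Rn)^(n-1) e^(-Rn) / n!, whose maximum-likelihood estimate of R is 1 - 1/mean(size); likelihood() in epichains handles this kind of fit more generally. The true value R = 0.7, the number of chains, and the helper name are hypothetical.

```r
# Hypothetical helper: total size of one chain with Poisson(R) offspring (R < 1)
chain_size_subcritical <- function(R) {
  size <- 1
  current <- 1
  while (current > 0) {
    current <- rpois(1, lambda = R * current)
    size <- size + current
  }
  size
}

set.seed(2024)
R_true <- 0.7
sizes <- replicate(2000, chain_size_subcritical(R_true))

# Borel log-likelihood; its maximum has the closed form 1 - 1/mean(size)
borel_loglik <- function(R, n) sum((n - 1) * log(R * n) - R * n - lgamma(n + 1))
R_hat <- 1 - 1 / mean(sizes)

# Cross-check the closed form with numerical optimization
R_num <- optimize(borel_loglik, interval = c(0.01, 0.99), n = sizes, maximum = TRUE)$maximum

c(true = R_true, closed_form = R_hat, numerical = R_num)
```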
Integration with Other Skills
- parameters: Provides realistic distributions for simulation
- data-intake: Simulated data tests cleaning pipelines
- analysis: Simulated data validates analysis methods
- visualisation: Creates epidemic curves and transmission networks
- reporting: Documents simulation methods and parameters