graphicalmcpgsd2 - SKILL.md Agent Skill

name: graphicalMCPgsd2
description: >
  Guide users through group sequential design with graphical multiplicity control using the graphicalMCP and gsDesign2 R packages. Use this skill whenever the user asks about: group sequential designs with multiple hypotheses, graphical multiplicity testing, sequential p-values with gsDesign2, combining graphicalMCP with gsDesign2, clinical trial designs with multiple endpoints and populations, Maurer-Bretz procedures, alpha-spending with multiplicity graphs, or adapting the gMCPLite vignette template. Also trigger when users mention spending time, information fraction, or sequential p-values in the context of group sequential or graphical testing.

Group Sequential Design with graphicalMCP + gsDesign2

This skill helps users design and analyze clinical trials that combine graphical multiplicity control (graphicalMCP) with group sequential designs (gsDesign2). The workflow follows the Maurer-Bretz (2013) framework.

When to use this skill

Setting up a multiplicity graph for multiple hypotheses (endpoints x populations)
Designing group sequential bounds for each hypothesis using gsDesign2
Computing sequential p-values from observed data
Testing hypotheses using graphicalMCP with sequential p-values
Verifying rejection decisions with updated group sequential bounds

Required packages

library(dplyr)
library(tibble)
library(gsDesign)
library(gsDesign2)
library(graphicalMCP)

The workflow requires a gsDesign2 version that exports sequential_pval(). If the installed version does not, install the development version from GitHub:

remotes::install_github("Merck/gsDesign2")
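Before running the workflow, it can help to confirm that the installed gsDesign2 actually exports sequential_pval(). The check below is a minimal sketch in base R; the variable name has_seq_pval is illustrative and not part of either package.

```r
# Check whether the installed gsDesign2 exports sequential_pval().
# `has_seq_pval` is an illustrative name, not part of any package.
has_seq_pval <- requireNamespace("gsDesign2", quietly = TRUE) &&
  "sequential_pval" %in% getNamespaceExports("gsDesign2")

if (!has_seq_pval) {
  message(
    "gsDesign2 is missing or does not export sequential_pval(); ",
    "consider installing it from GitHub."
  )
}
```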

Workflow overview

The workflow has four phases:

Phase 1: Design specification

Define hypotheses — typically endpoints (OS, PFS, ORR) crossed with populations (subgroup, overall).
Build the multiplicity graph — assign initial alpha weights and transition matrix using graphicalMCP::graph_create().
Choose a sample-size-driving hypothesis (typically OS in the subgroup). Design it with gsDesign2::gs_design_ahr() targeting the desired power (e.g., 90%). Use info_frac = NULL and specify analysis_time as calendar months; gsDesign2 derives the information fraction from the enrollment/failure rate assumptions and analysis timing.
Derive enrollment rates from the driving hypothesis. The subgroup enrollment rate comes directly from the design output. The complement enrollment rate is scaled by prevalence: rate_complement = rate_sub * (1 - prevalence) / prevalence. Build stratified enrollment for overall population designs using define_enroll_rate() with stratum columns.
Compute power for remaining hypotheses:
- Time-to-event hypotheses (OS, PFS) in the subgroup or overall: use gsDesign2::gs_power_ahr() with the derived enrollment rates. Pass event = NULL so analysis_time drives the design. For overall population, use stratified fail_rate with different HRs per stratum.
- Binary endpoints (ORR): use gsDesign2::fixed_design_rd() with sample sizes derived from the driving hypothesis. Because these hypotheses have no group sequential design, their entries in the design list are NULL.
Specify analysis timing rules — document when each analysis is triggered (minimum follow-up after FPE, event count thresholds, maximum extensions).
Store designs in an ordered list matching the hypothesis order in the graph. Use NULL for non-GSD hypotheses.
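The first steps of Phase 1 can be sketched as follows. Everything numeric here is a hypothetical assumption chosen for illustration: the four hypotheses, the weights and transitions, the 40% prevalence, the 18-month enrollment, the 12-month control median, the hazard ratio of 0.7, and the analysis times. The efficacy bound uses Lan-DeMets O'Brien-Fleming spending with no futility bound.

```r
library(gsDesign2)
library(graphicalMCP)

# Hypothetical multiplicity graph (weights and transitions are illustrative)
hyp_names <- c("H1: OS sub", "H2: OS all", "H3: PFS sub", "H4: PFS all")
graph <- graph_create(
  hypotheses = c(0.5, 0.5, 0, 0),   # initial alpha weights
  transitions = rbind(
    c(0, 0, 1, 0),                  # H1 passes its alpha to H3
    c(0, 0, 0, 1),                  # H2 passes its alpha to H4
    c(0, 1, 0, 0),                  # H3 passes its alpha to H2
    c(1, 0, 0, 0)                   # H4 passes its alpha to H1
  ),
  hyp_names = hyp_names
)

fwer <- 0.025
alpha_h1 <- fwer * graph$hypotheses[[1]]  # initial alpha for H1

# Sample-size-driving design: analysis_time drives timing, info_frac = NULL
h1_design <- gs_design_ahr(
  enroll_rate = define_enroll_rate(duration = 18, rate = 1),
  fail_rate = define_fail_rate(
    duration = Inf, fail_rate = log(2) / 12, hr = 0.7, dropout_rate = 0.001
  ),
  alpha = alpha_h1, beta = 0.1,
  info_frac = NULL, analysis_time = c(24, 36),
  binding = FALSE,
  upper = gs_spending_bound,
  upar = list(sf = gsDesign::sfLDOF, total_spend = alpha_h1),
  lower = gs_b, lpar = rep(-Inf, 2)  # no futility bound in this sketch
)

# Subgroup rate comes from the design; complement is scaled by prevalence
prevalence <- 0.4                       # hypothetical subgroup prevalence
rate_sub <- h1_design$enroll_rate$rate
rate_complement <- rate_sub * (1 - prevalence) / prevalence

# Stratified enrollment for overall-population designs
enroll_overall <- define_enroll_rate(
  stratum = c("BM+", "BM-"),
  duration = c(18, 18),
  rate = c(rate_sub, rate_complement)
)
```

The stratified enroll_overall table would then be the enrollment input to gs_power_ahr() for the overall-population hypotheses.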

Phase 2: Results entry

Record event counts at each analysis for each hypothesis.
Record nominal one-sided p-values for each analysis of each hypothesis.
Compute spending times, typically as events / max(events) using the subgroup information fraction. The spending time must reach 1 at the final analysis of each hypothesis.
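A spending-time calculation might look like the sketch below, in base R. The event counts are invented for illustration, and taking the actual information fraction as observed events divided by the planned final event count is an assumption of this sketch. The interim cap follows the pmin(planned_IF, actual_IF) rule noted elsewhere in this skill, and the final analysis is forced to 1.

```r
# Hypothetical planned and observed event counts (illustration only)
planned_events  <- c(ia1 = 150, ia2 = 225, fa = 300)
observed_events <- c(ia1 = 162, ia2 = 220, fa = 295)

# Planned information fraction: events / max(events)
planned_if <- planned_events / max(planned_events)

# Assumption: actual IF is measured against the planned final event count
actual_if <- observed_events / max(planned_events)

# Cap interim spending at the planned fraction to avoid over-spending
spending_time <- pmin(planned_if, actual_if)

# Spending time must reach 1 at the final analysis
spending_time[length(spending_time)] <- 1

spending_time
```

With these numbers, the first interim is capped at its planned fraction because more events were observed than planned, while the second interim uses the smaller observed fraction.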

Phase 3: Hypothesis testing

Compute sequential p-values using gsDesign2::sequential_pval() for each group sequential hypothesis. For non-GSD hypotheses, the nominal p-value is the sequential p-value.
Test with graphicalMCP using graphicalMCP::graph_test_shortcut() with the sequential p-values and total FWER alpha.
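The testing step can be sketched as below, assuming the sequential p-values have already been computed. The graph and the p-values are hypothetical; for any non-GSD hypothesis the nominal p-value would be used in place of a sequential p-value.

```r
library(graphicalMCP)

# Hypothetical graph: H1/H2 start with alpha, H3/H4 receive it by reallocation
graph <- graph_create(
  hypotheses = c(0.5, 0.5, 0, 0),
  transitions = rbind(
    c(0, 0, 1, 0),
    c(0, 0, 0, 1),
    c(0, 1, 0, 0),
    c(1, 0, 0, 0)
  ),
  hyp_names = c("H1", "H2", "H3", "H4")
)

# Hypothetical sequential p-values, in the same order as the graph
seq_p <- c(0.004, 0.03, 0.01, 0.2)

# Shortcut (sequentially rejective) test at total one-sided FWER 0.025
res <- graph_test_shortcut(graph, p = seq_p, alpha = 0.025)

res$outputs$rejected    # logical vector of rejection decisions
res$outputs$adjusted_p  # adjusted p-values from the graphical procedure
```

In this sketch H1 is rejected at its initial level, its alpha flows to H3, which is then rejected; H2 ends up with the full alpha but its p-value of 0.03 still exceeds 0.025.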

Phase 4: Verification

Extract the graph update sequence using graphicalMCP::graph_update() to see the multiplicity graph at each rejection step.
Update group sequential bounds at the maximum alpha allocated to each hypothesis using gsDesign2::gs_update_ahr().
Compare nominal p-values to updated bounds to confirm rejection decisions.
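The first verification step can be sketched as below: the graph is updated for each stage of a hypothetical rejection sequence (H1, then H3), and the largest alpha ever allocated to each hypothesis is recorded. The graph and the rejection sequence are assumptions for illustration. The call to gs_update_ahr() is left out because its arguments depend on the design object and the installed package version.

```r
library(graphicalMCP)

fwer <- 0.025
graph <- graph_create(
  hypotheses = c(0.5, 0.5, 0, 0),
  transitions = rbind(
    c(0, 0, 1, 0),
    c(0, 0, 0, 1),
    c(0, 1, 0, 0),
    c(1, 0, 0, 0)
  ),
  hyp_names = c("H1", "H2", "H3", "H4")
)

# Graph after rejecting H1, and after rejecting both H1 and H3
after_h1    <- graph_update(graph, delete = c(TRUE, FALSE, FALSE, FALSE))
after_h1_h3 <- graph_update(graph, delete = c(TRUE, FALSE, TRUE, FALSE))

# Hypothesis weights at each step (rows = steps, columns = hypotheses)
weights <- rbind(
  graph$hypotheses,
  after_h1$updated_graph$hypotheses,
  after_h1_h3$updated_graph$hypotheses
)

# Maximum alpha allocated to each hypothesis across the update sequence;
# na.rm guards against deleted hypotheses being reported as NA
max_alpha <- fwer * apply(weights, 2, max, na.rm = TRUE)
max_alpha
```

Each entry of max_alpha is the level at which that hypothesis's group sequential bounds would be updated before comparing nominal p-values to the bounds.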

Key code patterns

For detailed code templates covering each phase, read references/code_patterns.md.

Related skills

illness-death model: Simulates the correlated OS, PFS, and ORR endpoints used in the vignette template. See the illness-death skill.

Important design considerations

H1 drives sample size: One hypothesis (typically OS in the subgroup) is designed with gs_design_ahr() to determine enrollment rates and sample sizes. All other hypotheses reuse the enrollment derived from H1, and their power is computed with gs_power_ahr() (time-to-event) or fixed_design_rd() (binary).
Stratified overall designs: Overall population designs use stratified define_enroll_rate() and define_fail_rate() with separate stratum rows (e.g., "BM+" and "BM-") and stratum-specific HRs or rates.
gs_power_ahr() API: Does not accept info_frac. Use event = NULL with analysis_time to let timing drive the design. If event is not set to NULL, the default c(30, 40, 50) may cause length mismatches for designs whose number of analyses is not three.
fixed_design_rd() output: Returns a fixed_design object. Wrap with summary() before piping to gt() or kable().
Zero initial alpha: If a hypothesis starts with alpha=0 (receives alpha only through reallocation), use another hypothesis's alpha for the bounds structure in the design. The actual testing uses the reallocated alpha from the graph.
Spending time vs information fraction: Spending time determines how alpha is allocated across analyses. Information fraction drives the correlation structure. Both are needed for bound computation.
Non-binding futility: Use binding = FALSE so efficacy bounds are computed ignoring the futility bound, preserving Type I error control even if the trial continues past a futility crossing. Theoretical basis: Liu & Anderson (2008) Theorem 1.
Hung-Wang-O'Neill warning: Naively testing a secondary endpoint at level α whenever the primary is significant does NOT control FWER in a group sequential trial (Hung, Wang & O'Neill, 2007). Use sequential p-values with a proper closed testing or graphical procedure instead.
Well-ordered spending functions: For the Maurer-Bretz graphical procedure to be consonant (sequentially rejective), the nominal significance levels α*_t(γ) must be non-decreasing in γ. Qualified families: power (αt^ρ, all ρ > 0), Pocock-type, OBF-type (for γ < 0.318, covering all practical significance levels). See Maurer & Bretz (2013).
Time travel: If OS hypotheses are rejected and alpha passes to previously-tested PFS hypotheses, those PFS tests can be re-evaluated with updated bounds. This controls Type I error per Liu & Anderson (2008).
One-sided testing: Maurer-Bretz designs assume one-sided testing or non-binding futility bounds.
Alpha spending function: Lan-DeMets spending approximating O'Brien-Fleming (sfLDOF) is a common default.
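To make the role of spending time and reallocated alpha concrete, the sketch below evaluates the Lan-DeMets O'Brien-Fleming-type spending function directly in base R. The helper name ld_obf_spend and the spending times are illustrative; the formula is the standard one-sided form, 2 - 2 * pnorm(qnorm(1 - alpha / 2) / sqrt(t)).

```r
# Lan-DeMets spending function approximating O'Brien-Fleming (one-sided).
# `ld_obf_spend` is an illustrative helper, not a package function.
ld_obf_spend <- function(alpha, t) {
  2 - 2 * pnorm(qnorm(1 - alpha / 2) / sqrt(t))
}

spending_time <- c(0.5, 0.75, 1)  # hypothetical spending times

# Cumulative alpha spent at the initial allocation and after reallocation
spend_initial <- ld_obf_spend(0.0125, spending_time)
spend_realloc <- ld_obf_spend(0.025, spending_time)

data.frame(spending_time, spend_initial, spend_realloc)
```

Cumulative spend increases with spending time and equals the full allocated alpha at spending time 1. A larger allocated alpha gives a larger cumulative spend at every analysis, which is why bounds are recomputed after alpha is reallocated through the graph.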

Key references

Maurer W, Bretz F. Multiple testing in group sequential trials using graphical approaches. Stat Biopharm Res 2013; 5:311–320.
Liu Q, Anderson KM. On adaptive extensions of group sequential trials for clinical investigations. JASA 2008; 103:1621–1630.
Hung HMJ, Wang SJ, O'Neill R. Statistical considerations for testing multiple endpoints in group sequential or adaptive clinical trials. J Biopharm Stat 2007; 17:1201–1210.
Spending time rules: Use pmin(planned_IF, actual_IF) at interim analyses to protect against over-spending. Align spending time across populations (H2 uses H1's spending time). At final analysis, spending time = 1.
Stratified ORR: Use gs_power_rd() with weight = "invar" for stratified overall population ORR, not fixed_design_rd().
Repeated p-values: In verification, compute both sequential p-values (cumulative evidence) and repeated p-values (single-analysis evidence) to understand each analysis's contribution.