name: graphicalMCPgsd2
description: >
  Guide users through group sequential design with graphical multiplicity control using the graphicalMCP and gsDesign2 R packages. Use this skill whenever the user asks about: group sequential designs with multiple hypotheses, graphical multiplicity testing, sequential p-values with gsDesign2, combining graphicalMCP with gsDesign2, clinical trial designs with multiple endpoints and populations, Maurer-Bretz procedures, alpha-spending with multiplicity graphs, or adapting the gMCPLite vignette template. Also trigger when users mention spending time, information fraction, or sequential p-values in the context of group sequential or graphical testing.
Group Sequential Design with graphicalMCP + gsDesign2
This skill helps users design and analyze clinical trials that combine graphical multiplicity control (graphicalMCP) with group sequential designs (gsDesign2). The workflow follows the Maurer-Bretz (2013) framework.
When to use this skill
- Setting up a multiplicity graph for multiple hypotheses (endpoints x populations)
- Designing group sequential bounds for each hypothesis using gsDesign2
- Computing sequential p-values from observed data
- Testing hypotheses using graphicalMCP with sequential p-values
- Verifying rejection decisions with updated group sequential bounds
Required packages
library(dplyr)
library(tibble)
library(gsDesign)
library(gsDesign2)
library(graphicalMCP)
gsDesign2 must export sequential_pval(). Install from GitHub if needed:
remotes::install_github("Merck/gsDesign2")
Workflow overview
The workflow has 4 phases:
Phase 1: Design specification
- Define hypotheses — typically endpoints (OS, PFS, ORR) crossed with populations (subgroup, overall).
- Build the multiplicity graph: assign initial alpha weights and the transition matrix using graphicalMCP::graph_create().
- Choose a sample-size-driving hypothesis (typically OS in the subgroup). Design it with gsDesign2::gs_design_ahr(), targeting the desired power (e.g., 90%). Use info_frac = NULL and specify analysis_time as calendar months; gsDesign2 derives the information fraction from the enrollment/failure rate assumptions and analysis timing.
- Derive enrollment rates from the driving hypothesis. The subgroup enrollment rate comes directly from the design output. The complement enrollment rate is scaled by prevalence: rate_complement = rate_sub * (1 - prevalence) / prevalence. Build stratified enrollment for overall population designs using define_enroll_rate() with stratum columns.
- Compute power for remaining hypotheses:
  - Time-to-event hypotheses (OS, PFS) in the subgroup or overall: use gsDesign2::gs_power_ahr() with the derived enrollment rates. Pass event = NULL so analysis_time drives the design. For the overall population, use a stratified fail_rate with different HRs per stratum.
  - Binary endpoints (ORR): use gsDesign2::fixed_design_rd() with sample sizes derived from the driving hypothesis. These get NULL in the design list.
- Specify analysis timing rules — document when each analysis is triggered (minimum follow-up after FPE, event count thresholds, maximum extensions).
- Store designs in an ordered list matching the hypothesis order in the graph. Use NULL for non-GSD hypotheses.
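The design steps of Phase 1 can be sketched as follows. This is a minimal illustration, not the vignette's actual design: the analysis times, prevalence, median survival, hazard ratios, dropout rate, and alpha split are all hypothetical values chosen for the example, and the stratum labels "BM+" and "BM-" are borrowed from the design considerations in this skill.

```r
library(gsDesign2)

# Hypothetical assumptions (not from the source): adjust to the actual trial.
analysis_time <- c(24, 36)  # calendar months for interim and final analysis
prevalence <- 0.4           # assumed subgroup prevalence
alpha_h <- 0.0125           # assumed initial alpha for each OS hypothesis

# H1 (OS, subgroup) drives sample size: 90% power, timing set by analysis_time.
h1 <- gs_design_ahr(
  enroll_rate = define_enroll_rate(duration = 12, rate = 1),
  fail_rate = define_fail_rate(
    duration = 100, fail_rate = log(2) / 12, hr = 0.65, dropout_rate = 0.001
  ),
  alpha = alpha_h, beta = 0.1,
  info_frac = NULL, analysis_time = analysis_time,
  upper = gs_spending_bound,
  upar = list(sf = gsDesign::sfLDOF, total_spend = alpha_h),
  lower = gs_b, lpar = rep(-Inf, 2)  # efficacy bound only
)

# Subgroup rate comes from the design; complement rate is scaled by prevalence.
rate_sub <- h1$enroll_rate$rate
rate_complement <- rate_sub * (1 - prevalence) / prevalence

# Stratified inputs for the overall population, one row per stratum.
enroll_overall <- define_enroll_rate(
  stratum = c("BM+", "BM-"),
  duration = c(12, 12),
  rate = c(rate_sub, rate_complement)
)
fail_overall <- define_fail_rate(
  stratum = c("BM+", "BM-"),
  duration = c(100, 100),
  fail_rate = rep(log(2) / 12, 2),
  hr = c(0.65, 0.80),
  dropout_rate = rep(0.001, 2)
)

# H2 (OS, overall): power at the derived enrollment, timing-driven (event = NULL).
h2 <- gs_power_ahr(
  enroll_rate = enroll_overall,
  fail_rate = fail_overall,
  event = NULL, analysis_time = analysis_time,
  upper = gs_spending_bound,
  upar = list(sf = gsDesign::sfLDOF, total_spend = alpha_h),
  lower = gs_b, lpar = rep(-Inf, 2)
)

# Ordered design list; a non-GSD hypothesis would be stored as NULL.
designs <- list(H1 = h1, H2 = h2)
```

Inspecting h1$analysis and h2$analysis shows the expected events and information fraction that gsDesign2 derives at each calendar time.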
Phase 2: Results entry
- Record event counts at each analysis for each hypothesis.
- Record nominal one-sided p-values for each analysis of each hypothesis.
- Compute spending times: typically events / max(events), using the subgroup information fraction. The spending time must reach 1 at the final analysis of each hypothesis.
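As a small worked example of the spending-time step, the base R sketch below uses hypothetical event counts and p-values (none of these numbers come from the source) and reuses the subgroup spending time for the overall-population hypothesis.

```r
# Hypothetical observed OS events in the subgroup at two interims and the final.
events_h1 <- c(ia1 = 150, ia2 = 225, fa = 300)

# Hypothetical nominal one-sided p-values for H1 at the same analyses.
p_h1 <- c(ia1 = 0.030, ia2 = 0.012, fa = 0.004)

# Spending time as events / max(events); equals 1 at the final analysis.
spending_time_h1 <- events_h1 / max(events_h1)

# The overall-population hypothesis reuses the subgroup spending time.
spending_time_h2 <- spending_time_h1

results_h1 <- data.frame(
  analysis = names(events_h1),
  event = unname(events_h1),
  nominal_p = unname(p_h1),
  spending_time = unname(spending_time_h1)
)
results_h1
```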
Phase 3: Hypothesis testing
- Compute sequential p-values using gsDesign2::sequential_pval() for each group sequential hypothesis. For non-GSD hypotheses, the nominal p-value is the sequential p-value.
- Test with graphicalMCP using graphicalMCP::graph_test_shortcut() with the sequential p-values and the total FWER alpha.
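The testing step can be sketched as below. The four-hypothesis graph, its weights and transitions, and the sequential p-values are hypothetical illustrations. The p-values are typed in directly rather than computed with sequential_pval(), whose arguments are not shown in this skill.

```r
library(graphicalMCP)

# Hypothetical graph: H1 = OS subgroup, H2 = OS overall,
# H3 = PFS subgroup, H4 = PFS overall.
g <- graph_create(
  hypotheses = c(0.5, 0.5, 0, 0),
  transitions = rbind(
    c(0,   0.5, 0.5, 0),
    c(0.5, 0,   0,   0.5),
    c(0,   1,   0,   0),
    c(1,   0,   0,   0)
  ),
  hyp_names = c("H1", "H2", "H3", "H4")
)

# Hypothetical sequential p-values, one per hypothesis, in graph order.
seq_p <- c(0.004, 0.015, 0.010, 0.030)

# Shortcut (sequentially rejective) graphical test at one-sided FWER 0.025.
res <- graph_test_shortcut(g, p = seq_p, alpha = 0.025)
res$outputs$rejected
```

With these illustrative inputs, H1 is rejected at 0.0125, which raises the level for H2 to 0.01875 and H3 to 0.00625. H2 is then rejected, raising H3 to 0.0125, so H3 is rejected. H4 ends with the full 0.025 but its p-value of 0.030 is not significant.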
Phase 4: Verification
- Extract the graph update sequence using graphicalMCP::graph_update() to see the multiplicity graph at each rejection step.
- Update group sequential bounds at the maximum alpha allocated to each hypothesis using gsDesign2::gs_update_ahr().
- Compare nominal p-values to updated bounds to confirm rejection decisions.
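A minimal sketch of the first verification step follows, using a hypothetical four-hypothesis graph. It assumes the object returned by graph_update() stores the post-deletion graph in an element named updated_graph; confirm with str() on your installed version. The bound update with gs_update_ahr() is left out because its arguments are not shown in this skill.

```r
library(graphicalMCP)

# Hypothetical graph (H1 = OS subgroup, H2 = OS overall, H3/H4 = PFS).
g <- graph_create(
  hypotheses = c(0.5, 0.5, 0, 0),
  transitions = rbind(
    c(0,   0.5, 0.5, 0),
    c(0.5, 0,   0,   0.5),
    c(0,   1,   0,   0),
    c(1,   0,   0,   0)
  ),
  hyp_names = c("H1", "H2", "H3", "H4")
)

# Graph after rejecting (deleting) H1 only.
upd <- graph_update(g, delete = c(TRUE, FALSE, FALSE, FALSE))

# Alpha available to each remaining hypothesis after H1 is rejected.
alpha_total <- 0.025
alpha_after_h1 <- alpha_total * upd$updated_graph$hypotheses
alpha_after_h1
```

The largest alpha a hypothesis receives across the update sequence is the level at which its group sequential bounds would be recomputed before comparing them with the nominal p-values.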
Key code patterns
For detailed code templates covering each phase, read references/code_patterns.md.
Related skills
- illness-death model: For simulating correlated OS, PFS, and ORR endpoints used in the vignette template. See the illness-death skill.
Important design considerations
- H1 drives sample size: One hypothesis (typically OS in the subgroup) is designed with gs_design_ahr() to determine enrollment rates and sample sizes. All other hypotheses derive their enrollment from H1 using gs_power_ahr() (time-to-event) or fixed_design_rd() (binary).
- Stratified overall designs: Overall population designs use stratified define_enroll_rate() and define_fail_rate() with separate stratum rows (e.g., "BM+" and "BM-") and stratum-specific HRs or rates.
- gs_power_ahr() API: Does not accept info_frac. Use event = NULL with analysis_time to let timing drive the design. If event is not set to NULL, the default c(30, 40, 50) may cause length mismatches for designs with fewer analyses.
- fixed_design_rd() output: Returns a fixed_design object. Wrap with summary() before piping to gt() or kable().
- Zero initial alpha: If a hypothesis starts with alpha=0 (receives alpha only through reallocation), use another hypothesis's alpha for the bounds structure in the design. The actual testing uses the reallocated alpha from the graph.
- Spending time vs information fraction: Spending time determines how alpha is allocated across analyses. Information fraction drives the correlation structure. Both are needed for bound computation.
- Non-binding futility: Use binding = FALSE so efficacy bounds are computed ignoring the futility bound, preserving Type I error control even if the trial continues past a futility crossing. Theoretical basis: Liu & Anderson (2008) Theorem 1.
- Hung-Wang-O'Neill warning: Naively testing a secondary endpoint at level α whenever the primary is significant does NOT control FWER in a group sequential trial (Hung, Wang & O'Neill, 2007). Instead, use sequential p-values with a proper closed testing or graphical procedure.
- Well-ordered spending functions: For the Maurer-Bretz graphical procedure to be consonant (sequentially rejective), the nominal significance levels α*_t(γ) must be non-decreasing in γ. Qualified families: power (αt^ρ, all ρ > 0), Pocock-type, OBF-type (for γ < 0.318, covering all practical significance levels). See Maurer & Bretz (2013).
- Time travel: If OS hypotheses are rejected and alpha passes to previously-tested PFS hypotheses, those PFS tests can be re-evaluated with updated bounds. This controls Type I error per Liu & Anderson (2008).
- One-sided testing: Maurer-Bretz designs assume one-sided testing or non-binding futility bounds.
- Alpha spending function: Lan-DeMets spending approximating O'Brien-Fleming (sfLDOF) is a common default.
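To illustrate the fixed_design_rd() output consideration above, here is a minimal sketch for a binary ORR hypothesis. The response rates, alpha, and sample size are hypothetical; in practice n would be taken from the driving design.

```r
library(gsDesign2)

# Hypothetical ORR assumptions: 20% control vs 35% experimental response rate.
# n = 300 is an assumed total sample size standing in for the driving design's.
orr <- fixed_design_rd(
  alpha = 0.025, n = 300,
  p_c = 0.20, p_e = 0.35,
  rd0 = 0, ratio = 1
)

# summary() converts the fixed_design object into a table that can then be
# passed to a table formatter such as gt() or kable().
orr_tbl <- summary(orr)
orr_tbl
```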
Key references
- Maurer W, Bretz F. Multiple testing in group sequential trials using graphical approaches. Stat Biopharm Res 2013; 5:311–320.
- Liu Q, Anderson KM. On adaptive extensions of group sequential trials for clinical investigations. JASA 2008; 103:1621–1630.
- Hung HMJ, Wang SJ, O'Neill R. Statistical considerations for testing multiple endpoints in group sequential or adaptive clinical trials. J Biopharm Stat 2007; 17:1201–1210.
- Spending time rules: Use pmin(planned_IF, actual_IF) at interim analyses to protect against over-spending. Align spending time across populations (H2 uses H1's spending time). At final analysis, spending time = 1.
- Stratified ORR: Use gs_power_rd() with weight = "invar" for stratified overall population ORR, not fixed_design_rd().
- Repeated p-values: In verification, compute both sequential p-values (cumulative evidence) and repeated p-values (single-analysis evidence) to understand each analysis's contribution.
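The spending time rule above can be sketched in base R. The planned and actual information fractions and the alpha level are hypothetical. The helper ld_obf_spend() is an illustrative name that writes out the Lan-DeMets O'Brien-Fleming-type spending formula directly rather than calling a package function.

```r
# Lan-DeMets O'Brien-Fleming-type cumulative spending for one-sided alpha:
# alpha(t) = 2 * (1 - pnorm(qnorm(1 - alpha / 2) / sqrt(t))).
# ld_obf_spend is a hypothetical helper written for this example.
ld_obf_spend <- function(alpha, t) {
  2 * (1 - pnorm(qnorm(1 - alpha / 2) / sqrt(t)))
}

planned_if <- c(0.50, 0.75, 1.00)  # hypothetical planned information fractions
actual_if  <- c(0.55, 0.72, 0.97)  # hypothetical observed information fractions

# Interims: take the smaller of planned and actual to avoid over-spending.
spending_time <- pmin(planned_if, actual_if)

# Final analysis: spending time is set to 1 so all alpha is spent.
spending_time[length(spending_time)] <- 1

# Cumulative alpha spent at each analysis for a hypothesis allocated 0.0125.
cum_alpha <- ld_obf_spend(0.0125, spending_time)
data.frame(spending_time, cum_alpha)
```

At the first interim the actual fraction (0.55) exceeds the planned one (0.50), so the planned value is used. At the second interim the actual fraction (0.72) is the smaller value.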