name: test-drive
description: Use when the user has a plausible but untested idea, belief, thesis, strategy, decision, prompt, skill, message, artifact, or AI output and asks how to validate it, test it, try it, de-risk it, gather evidence, run analysis, run a regression, dry-run it, evaluate it before shipping, identify needed connectors, or turn judgment into evidence-seeking action.
Test Drive
Test Drive helps people test an idea, claim, or decision before they trust it.
The skill turns a plausible idea, claim, decision, belief, strategy, prompt, skill, message, artifact, or AI output into the smallest credible trial that can create useful evidence. It classifies the type of evidence needed, chooses a test route, drafts or builds the test artifact when safe, identifies connectors or capabilities required, and defines what would change the user's mind.
The goal is not to prove the user right. The goal is to create a low-risk learning loop before the user overcommits.
When To Use
Use Test Drive when the user is asking, explicitly or implicitly:
- "How do I test this?"
- "How do I validate this idea?"
- "What evidence would tell me if this is true?"
- "How do I know if this message, prompt, skill, plan, or strategy works?"
- "Can we run an analysis or regression?"
- "What would be the smallest experiment?"
- "What connector, tool, or data would we need?"
- "Can you create the thing needed to test it?"
Also use it when the user has something plausible but not yet trustworthy: a product idea, career move, hiring read, strategy, positioning line, customer insight, research thesis, data claim, prompt, skill, prototype, memo, or recommendation.
When Not To Use
- Use The Briefing Room when the context is still too messy to identify the belief or test.
- Use Ground Truth when the user mainly wants critique of reasoning, not an evidence plan.
- Use The Quorum when the user needs multi-lens deliberation on a consequential decision before choosing what to test.
- Use direct execution when the user has already specified the exact task and only needs it done.
- Ask clarifying questions first when the proposed test would require private data, external action, spending money, or irreversible commitments.
Core Principles
1. Test before trust. Treat plausible ideas as candidates for evidence, not conclusions.
2. Route by evidence type, not domain. Do not start by asking whether the idea is "product," "career," "strategy," or "writing." Start by asking what kind of evidence would make it more or less trustworthy.
3. Prefer the smallest credible test. Choose the lowest-friction action that can produce meaningful signal. Do not propose a giant research project when a small dry run, interview, data cut, or prototype would teach enough.
4. Create the test artifact when safe. When useful, draft the prompt, email, survey, interview guide, data query, eval rubric, landing-page outline, GitHub issue, or skill-test scenario needed to run the test.
5. Diagnose connectors and capabilities. Name the tools, connectors, data access, or permissions required. Include minimum scope and a manual fallback when possible.
6. Respect approval gates. Do not send, post, publish, buy, book, scrape private data, query sensitive systems, or change external state without explicit approval.
7. Define what would change the user's mind. A test is weak if success and failure signals are vague. Make the learning criteria explicit before execution.
8. Dry-run before external action when risk is real. If the test is public, sensitive, expensive, hard to undo, or involves other people, prefer a dry run, review artifact, or simulated pass before taking external action.
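The approval-gate and dry-run principles can be sketched as a small guard. This is a minimal illustration, not part of the skill itself: the `ProposedAction` type, the verb names in `EXTERNAL_VERBS`, and both function names are hypothetical.

```python
from dataclasses import dataclass

# Verbs paraphrased from the approval-gate principle. The exact set is an
# assumption for illustration, not an exhaustive policy.
EXTERNAL_VERBS = {
    "send", "post", "publish", "buy", "book",
    "scrape_private_data", "query_sensitive_system", "change_external_state",
}


@dataclass
class ProposedAction:
    verb: str               # e.g. "draft", "send", "publish"
    description: str
    approved: bool = False  # set to True only after explicit user approval


def may_execute(action: ProposedAction) -> bool:
    """Return True if the action can run now without crossing an approval gate."""
    if action.verb in EXTERNAL_VERBS:
        return action.approved
    return True  # local, reversible work such as drafting an artifact


def prefer_dry_run(public: bool, sensitive: bool, expensive: bool,
                   hard_to_undo: bool, involves_other_people: bool) -> bool:
    """Principle 8: any one real risk factor is enough to prefer a dry run first."""
    return any([public, sensitive, expensive, hard_to_undo, involves_other_people])
```

Under these assumptions, drafting an email passes the guard immediately, while sending it stays blocked until `approved` is set after the user says yes.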
Evidence Types
Classify the primary evidence type before designing the test. Add secondary evidence types when needed.
| Evidence Type | Use When The Idea Depends On... | Common Test Routes |
|---|---|---|
| Human Reaction | what people understand, want, remember, trust, choose, or reject | interviews, surveys, message tests, posts, prototypes, feedback requests |
| Behavioral / Data | what people actually do or what a dataset shows | cohort analysis, funnel analysis, segmentation, correlation check, regression, before/after readout |
| Reasoning | whether the logic, assumptions, or argument holds | Ground Truth, adversarial review, counterexample search, assumption audit |
| Expert Judgment | trade-offs with no single obvious metric | The Quorum, scenario analysis, pre-mortem, stakeholder lens review |
| Artifact Performance | whether the thing works in use | dry run, eval cases, rubric scoring, edge-case testing, sample output comparison |
| Operational Feasibility | whether the workflow, connector, process, or rollout can work | capability check, permission review, implementation spike, pilot workflow, manual fallback |
If multiple evidence types apply, choose the one that would most change the user's next action.
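One way to picture the tie-break rule is a ranking by how much each evidence type would change the next action. The sketch below assumes a hypothetical 0 to 3 impact score that the person classifying assigns by judgment; the `classify` function and the scoring scale are illustrative, not something the skill prescribes.

```python
from __future__ import annotations

from enum import Enum


class EvidenceType(Enum):
    HUMAN_REACTION = "Human Reaction"
    BEHAVIORAL_DATA = "Behavioral / Data"
    REASONING = "Reasoning"
    EXPERT_JUDGMENT = "Expert Judgment"
    ARTIFACT_PERFORMANCE = "Artifact Performance"
    OPERATIONAL_FEASIBILITY = "Operational Feasibility"


def classify(impact: dict[EvidenceType, int]) -> tuple[EvidenceType, list[EvidenceType]]:
    """Split evidence types into one primary and zero or more secondary types.

    `impact` maps each candidate type to a hypothetical 0-3 judgment of how
    much that evidence would change the user's next action. Types scored 0
    are dropped; ties keep the order in which they were supplied.
    """
    relevant = [t for t, score in impact.items() if score > 0]
    if not relevant:
        raise ValueError("no evidence type would change the next action")
    ranked = sorted(relevant, key=lambda t: impact[t], reverse=True)
    return ranked[0], ranked[1:]


# Example: a positioning line depends mostly on how people react to it.
primary, secondary = classify({
    EvidenceType.HUMAN_REACTION: 3,
    EvidenceType.ARTIFACT_PERFORMANCE: 2,
    EvidenceType.REASONING: 0,
})
```

The numbers are only a device for making the judgment explicit; the rule itself is simply "primary is whichever evidence would most change what the user does next."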
Test Routes By Evidence Type
Use these route patterns to make the output concrete. Adapt them to the user's context, but do not skip the artifact, signal, and approval-gate pieces.
Human Reaction
Use when the idea depends on what people understand, feel, want, remember, trust, choose, or reject.
Default route:
- Name the audience segment.
- Draft the message, survey, interview guide, prototype brief, or feedback request.
- Define the response signal that would matter.
- Define the failure signal.
- Identify the channel or connector, plus manual fallback.
Good artifacts: email, Slack post, LinkedIn post, interview guide, survey, landing-page outline, prototype feedback script.
Behavioral / Data
Use when the idea depends on observed behavior or a dataset.
Default route:
- State the analysis question.
- Define the unit of analysis.
- Name the outcome variable, explanatory variables, and controls.
- Pick the lightest credible method; reach for regression only when simpler methods cannot answer the question.
- Identify the data source or connector.
- State caveats and what result would change confidence.
Good artifacts: analysis plan, spreadsheet instructions, SQL sketch, notebook outline, data request, regression readiness checklist.
Reasoning
Use when the idea depends on logic, assumptions, argument quality, or counterexamples.
Default route:
- Name the core claim.
- Identify the assumption most worth attacking.
- Route to Ground Truth or draft an adversarial review prompt.
- Define what critique would require revision.
Good artifacts: Ground Truth prompt, assumption audit, counterexample prompt, red-team review brief.
Expert Judgment
Use when the idea has consequential trade-offs without a single obvious metric.
Default route:
- State the decision or trade-off.
- Name the perspectives needed.
- Route to The Quorum when stakes justify it.
- Define what disagreement or pre-mortem finding would change the next step.
Good artifacts: Quorum decision brief, scenario matrix, stakeholder lens review, pre-mortem prompt.
Artifact Performance
Use when the thing itself needs to be tried before reuse or publication.
Default route:
- Name the artifact and intended job.
- Create realistic eval cases or dry-run scenarios.
- Define pass/fail criteria.
- Run or propose the dry run.
- Recommend revisions before external use.
Good artifacts: eval prompts, rubric, edge-case list, sample input set, before/after comparison.
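The Artifact Performance route above can be sketched as a tiny dry-run harness. Everything here is a hypothetical stand-in: `EvalCase`, `dry_run`, and the toy `truncate_summary` artifact exist only to show pass/fail criteria being fixed before the run.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    name: str
    input_text: str
    passes: Callable[[str], bool]  # pass/fail criterion, defined before the run


def dry_run(artifact: Callable[[str], str], cases: List[EvalCase]) -> dict:
    """Run the artifact on every case and report the pass rate and failures."""
    if not cases:
        raise ValueError("a dry run needs at least one eval case")
    failed = [c.name for c in cases if not c.passes(artifact(c.input_text))]
    return {"pass_rate": (len(cases) - len(failed)) / len(cases), "failed": failed}


def truncate_summary(text: str) -> str:
    """Toy artifact under test: keep the first five words."""
    return " ".join(text.split()[:5])


cases = [
    EvalCase("typical input", "one two three four five six seven",
             lambda out: len(out.split()) <= 5),
    EvalCase("empty input", "", lambda out: out == ""),
    EvalCase("keeps opening word", "alpha beta", lambda out: out.startswith("alpha")),
    EvalCase("ends with a period", "hello world", lambda out: out.endswith(".")),
]

report = dry_run(truncate_summary, cases)
# report == {"pass_rate": 0.75, "failed": ["ends with a period"]}
```

The failing case is the useful part: it names a concrete revision to consider before the artifact is reused or published.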
Operational Feasibility
Use when the question is whether a workflow, connector, data pull, integration, or process can actually work.
Default route:
- Name the needed capability.
- Explain why it matters to the test.
- Specify minimum permission scope.
- Identify available and missing connectors.
- Provide a manual fallback or implementation request.
- Ask for approval before configuration or external action.
Good artifacts: connector scope, implementation request, pilot checklist, manual fallback steps.
Default Workflow
1. Name the thing being tested. State the belief, idea, decision, claim, artifact, or output in one sentence.
2. Classify evidence type. Identify primary and secondary evidence types, with a short reason.
3. Pick the smallest credible test. Choose a test that is low-risk, fast enough to run, and capable of changing confidence.
4. Identify the artifact needed. Name what must exist to run the test: prompt, message, survey, interview guide, data query, prototype brief, eval rubric, landing-page outline, GitHub issue, connector spec, or skill invocation.
5. Capability check. Name required connectors, tools, data, permissions, or skills. Separate available, missing, optional, and fallback paths when known.
6. Draft or build the artifact. If the artifact is text, a prompt, a test plan, a query outline, or a safe local file, create it. If the artifact needs external action or private data, stop at an approval gate.
7. Define signals. Specify support signal, weaken signal, and what would change the user's mind.
8. Set the learning loop. Say what to collect, how to interpret it, and what skill or action should happen next.
Default Output Format
Use this structure unless the user asks for a shorter or more specialized output.
# Test Drive
## What We Are Testing
The belief, idea, decision, claim, artifact, or output being tested.
## Evidence Type
- Primary:
- Secondary:
- Why this route fits:
## Smallest Credible Test
The lowest-friction trial that can produce useful evidence.
## Artifact Needed
What needs to be created to run the test.
## Capability Check
Required:
- ...
Useful:
- ...
Available:
- ...
Missing:
- ...
Fallback:
- ...
## Draft / Build
The prompt, message, survey, interview guide, query outline, eval rubric, prototype brief, or other artifact needed.
## Signals
Support signal:
Weaken signal:
What would change my mind:
## Approval Gate
What needs explicit user approval before external action.
## Learning Loop
What to collect, how to interpret it, and what to do next.
For small requests, compress this into: Test Route, Artifact, Signals, Next Step.
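As a rough illustration of the compressed form, the four fields can be held in a small record that refuses to render when only success has been defined. The `CompactTestDrive` class and its field names are assumptions made for this sketch; the skill itself only specifies the four labels.

```python
from dataclasses import dataclass


@dataclass
class CompactTestDrive:
    test_route: str
    artifact: str
    support_signal: str
    weaken_signal: str
    next_step: str

    def render(self) -> str:
        # A plan that defines only success is weak, so require both signals.
        if not self.support_signal.strip() or not self.weaken_signal.strip():
            raise ValueError("define both a support signal and a weaken signal first")
        return "\n".join([
            f"Test Route: {self.test_route}",
            f"Artifact: {self.artifact}",
            f"Signals: support = {self.support_signal}; weaken = {self.weaken_signal}",
            f"Next Step: {self.next_step}",
        ])


# Placeholder content, for illustration only.
plan = CompactTestDrive(
    test_route="Human Reaction via a short message test",
    artifact="Two post variants",
    support_signal="readers restate the idea accurately in their own words",
    weaken_signal="readers ask what the phrase means",
    next_step="Review both drafts with the user before anything is posted",
)
compact_text = plan.render()
```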
Connector And Tool Guidance
When a connector or tool would improve the test, say so plainly. Do not silently assume access.
Use this language:
## Connector / Capability Needed
Needed capability:
Why it matters:
Minimum scope:
Approval needed:
Manual fallback:
Common connector paths:
| Need | Possible Capability |
|---|---|
| Send validation emails | Gmail or email connector |
| Post to a team or run a quick pulse check | Slack connector |
| Analyze spreadsheet data | Google Sheets, CSV, spreadsheet tool |
| Query warehouse data | BigQuery, Snowflake, Postgres, or approved data connector |
| Create docs or surveys | Google Docs, Notion, Forms, or manual doc |
| Test a website or landing page | Browser, Playwright, analytics, or manual review |
| Create an issue or eval task | GitHub connector |
| Schedule interviews | Calendar connector |
Always include a manual fallback unless the test is impossible without the connector.
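The connector guidance above can be sketched as a record that renders the capability block and enforces the fallback rule. The `CapabilityNeed` class, its field names, and the example scope wording are hypothetical; in particular, the scope text is a plain-language description, not a real permission identifier from any connector.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CapabilityNeed:
    needed_capability: str
    why_it_matters: str
    minimum_scope: str
    approval_needed: str
    manual_fallback: Optional[str] = None
    impossible_without_connector: bool = False

    def render(self) -> str:
        # A fallback is mandatory unless the test cannot run without the connector.
        if self.manual_fallback is None and not self.impossible_without_connector:
            raise ValueError("add a manual fallback or mark the connector as essential")
        fallback = self.manual_fallback or "None; the test is impossible without this connector"
        return "\n".join([
            "## Connector / Capability Needed",
            f"Needed capability: {self.needed_capability}",
            f"Why it matters: {self.why_it_matters}",
            f"Minimum scope: {self.minimum_scope}",
            f"Approval needed: {self.approval_needed}",
            f"Manual fallback: {fallback}",
        ])


# Illustrative values only.
need = CapabilityNeed(
    needed_capability="Email connector",
    why_it_matters="Validation emails must reach the chosen audience segment",
    minimum_scope="Create drafts only; no send permission",
    approval_needed="User reviews each draft and approves sending",
    manual_fallback="User copies the drafted text into their own mail client",
)
capability_block = need.render()
```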
Statistical And Analytical Tests
Use analytical tests when the claim depends on data rather than opinion.
Do not jump to regression by default. Pick the lightest credible analysis:
- Descriptive comparison
- Segmentation or cohort cut
- Funnel or pre/post analysis
- Correlation check
- Regression or causal model when controls, sample size, and data quality justify it
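The list above reads as a ladder, lightest method first. A minimal sketch of walking it follows, assuming the analyst supplies a judgment of which methods would be sufficient for the question; the `lightest_credible_method` function and its input shape are hypothetical.

```python
from __future__ import annotations

# Ordered from lightest to heaviest, mirroring the list above.
METHOD_LADDER = [
    "descriptive comparison",
    "segmentation or cohort cut",
    "funnel or pre/post analysis",
    "correlation check",
    "regression or causal model",
]


def lightest_credible_method(sufficient: dict[str, bool]) -> str:
    """Return the first method on the ladder that was judged sufficient.

    `sufficient` holds the analyst's judgment per method; methods that are
    missing from the mapping are treated as not sufficient.
    """
    for method in METHOD_LADDER:
        if sufficient.get(method, False):
            return method
    raise ValueError("no listed method judged sufficient; revisit the question or the data")


choice = lightest_credible_method({
    "correlation check": True,
    "segmentation or cohort cut": True,
})
# choice == "segmentation or cohort cut"
```

The code does not decide sufficiency; it only encodes the habit of stopping at the lightest rung that can answer the question.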
For analytical tests, include:
- analysis question
- dataset needed
- unit of analysis
- outcome variable
- explanatory variables
- controls or confounders
- minimum viability concerns, such as sample size and data quality
- suggested method
- interpretation caveats
- connector or data access needed
If the environment has data access and the user approves, run the analysis. If not, draft the query, notebook plan, or data request.
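As an example of a light analysis that comes before any regression, here is a sketch of a descriptive cohort comparison for a claim such as "onboarding improves retention." The rows are synthetic and exist only to make the code runnable; the column meanings and the `retention_by_cohort` helper are assumptions for illustration.

```python
from collections import defaultdict

# Synthetic rows for illustration only:
# (user_id, completed_onboarding, retained_at_day_30)
rows = [
    (1, True, True), (2, True, True), (3, True, False), (4, True, True),
    (5, False, False), (6, False, True), (7, False, False), (8, False, False),
]


def retention_by_cohort(rows):
    """Unit of analysis: user. Outcome: retained_at_day_30. Cohort: completed_onboarding."""
    counts = defaultdict(lambda: [0, 0])  # cohort -> [retained, total]
    for _user_id, onboarded, retained in rows:
        counts[onboarded][0] += int(retained)
        counts[onboarded][1] += 1
    return {cohort: retained / total for cohort, (retained, total) in counts.items()}


rates = retention_by_cohort(rows)
difference = rates[True] - rates[False]
# With the synthetic rows: rates[True] == 0.75, rates[False] == 0.25, difference == 0.5

# Interpretation caveat: this gap is descriptive, not causal. Users who finish
# onboarding may differ from those who do not (for example in plan or signup
# source), so controls would be needed before claiming onboarding causes retention.
```

On real data, the same cut would be paired with the caveats listed above, and with a statement of what gap size would change confidence, before deciding whether a controlled model is worth the effort.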
Multi-Agent And Skill Routing
Use other skills when they are the right test route:
- Use The Briefing Room if source context needs organizing before a test can be designed.
- Use Ground Truth if the best test is adversarial reasoning, assumption audit, or counterexample search.
- Use The Quorum if the test requires multiple expert lenses or consequential trade-off deliberation.
- Use a data, browser, document, spreadsheet, GitHub, Gmail, Slack, Notion, or calendar connector only when the evidence route requires it.
When recommending another skill, explain why:
Recommended route: Ground Truth
Why: The main uncertainty is reasoning quality, not human reaction or data.
Examples
Positioning
User: "I think 'judgment infrastructure' is the right frame. How do I test it?"
Good Test Drive: classify as Human Reaction + Artifact Performance, draft two post variants, define resonance signals, recommend manual LinkedIn posting or a social connector if available, and suggest feeding responses into The Briefing Room.
Data claim
User: "We think onboarding improves retention."
Good Test Drive: classify as Behavioral / Data, recommend cohort analysis before regression, list data needed, identify controls, name warehouse or Sheets connector, and define what effect size or caveat would change confidence.
Skill before publishing
User: "Test drive this skill before I ship it."
Good Test Drive: classify as Artifact Performance + Reasoning, create realistic eval scenarios, define pass/fail criteria, recommend Ground Truth for critique, and identify revisions before release.
Common Mistakes
| Mistake | Fix |
|---|---|
| Recommending a big experiment | Shrink to the smallest credible test. |
| Treating critique as evidence | Use Ground Truth for critique, then Test Drive for action or signal design. |
| Picking a domain before evidence type | Classify by what would make the belief more trustworthy. |
| Running external action too soon | Draft the artifact and stop at the approval gate. |
| Saying "use data" without an analysis plan | Name the dataset, variables, method, and caveats. |
| Saying "needs Gmail/Slack/etc." without scope | Explain why, minimum permissions, and fallback. |
| Defining only success | Include weaken signals and what would change the user's mind. |
| Overfitting the test to prove the idea right | Design for learning, not confirmation. |
Final Check
Before answering, check:
- Is the belief or artifact being tested stated clearly?
- Did you classify the evidence type?
- Is the test small enough to run but strong enough to teach?
- Did you name the artifact needed?
- Did you identify required connectors, permissions, and fallback?
- Did you stop before external action unless explicitly approved?
- Are support, weaken, and mind-changing signals clear?
- Does the next step preserve human agency?