---
name: E2E Test Runner
description: "Provides the ability to run and iterate on HyperShift e2e tests. Auto-applies when implementing features that require e2e validation, fixing e2e test failures, or working on tasks that need live cluster testing."
---
HyperShift E2E Test Runner
This skill enables autonomous iteration on e2e tests: running tests, analyzing failures, making fixes, and re-running until the tests pass.
When to Use This Skill
This skill automatically applies when:
- Implementing a feature that needs e2e test validation
- Fixing a failing e2e test
- Working on a task where the user wants you to iterate until tests pass
- Debugging test failures in the `test/e2e/` directory
- The user mentions running e2e tests or validating changes against a live cluster
Prerequisites
Source the environment file before using this skill:
```bash
source dev/claude-env.sh
```
Environment Configuration
Environment variables from dev/claude-env.sh:
| Variable | Description |
|---|---|
| `E2E_PLATFORM` | Test platform (AWS, Azure, etc.) |
| `AWS_CREDENTIALS` | Path to AWS credentials file |
| `OIDC_BUCKET` | S3 bucket for OIDC |
| `BASE_DOMAIN` | Base DNS domain |
| `PULL_SECRET` | Path to pull secret file |
| `AWS_REGION` | AWS region |
| `E2E_ARTIFACT_DIR` | Directory for test artifacts |
| `MGMT_KUBECONFIG` | Path to management cluster kubeconfig |
| `CPO_IMAGE_REPO` | Custom CPO image repository |
| `RUNTIME` | Container runtime (podman/docker) |
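An unset or empty variable may only surface after the test binary has started, so a pre-flight check can save a run. The following is a minimal sketch, assuming bash; `require_e2e_env` is a hypothetical helper (not part of HyperShift or `dev/claude-env.sh`), and treating `CPO_IMAGE_REPO` and `RUNTIME` as optional is an assumption based on their only being needed for custom images.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): fail fast when the environment from
# dev/claude-env.sh is incomplete. CPO_IMAGE_REPO and RUNTIME are treated as
# optional (an assumption): they are only needed when building a custom image.
require_e2e_env() {
  local var status=0
  for var in E2E_PLATFORM AWS_CREDENTIALS OIDC_BUCKET BASE_DOMAIN \
             PULL_SECRET AWS_REGION E2E_ARTIFACT_DIR MGMT_KUBECONFIG; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var.
    if [ -z "${!var:-}" ]; then
      echo "missing: $var" >&2
      status=1
    fi
  done
  # These three hold file paths, so also check that the files exist.
  for var in AWS_CREDENTIALS PULL_SECRET MGMT_KUBECONFIG; do
    if [ -n "${!var:-}" ] && [ ! -f "${!var}" ]; then
      echo "$var points to a missing file: ${!var}" >&2
      status=1
    fi
  done
  return "$status"
}
```

A typical call is `source dev/claude-env.sh && require_e2e_env`, which reports every problem at once instead of stopping at the first.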
Running E2E Tests
Step 1: Check if Test Binary Needs Rebuilding
CRITICAL: Before running any e2e test, you MUST check if the test binary needs rebuilding:
```bash
# Build the binary if it does not exist yet
if [ ! -f ./bin/test-e2e ]; then
  echo "Test binary missing, building..."
  make e2e
fi

# Rebuild if any e2e test source file is newer than the binary
NEWEST_TEST=$(find test/e2e -name "*.go" -newer ./bin/test-e2e 2>/dev/null | head -n 1)
if [ -n "$NEWEST_TEST" ]; then
  echo "Test files changed (e.g., $NEWEST_TEST), rebuilding..."
  make e2e
fi
```
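The same staleness check can be wrapped in a reusable function. This is a sketch with a hypothetical name, `e2e_binary_stale`; it succeeds when a rebuild is needed. Because the test binary also compiles the non-test packages it imports, passing the repository root instead of `test/e2e` is a more conservative (but slower) variant.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): succeed (exit 0) when BIN is missing or any
# .go file under SRC is newer than it, i.e. when a rebuild is needed.
e2e_binary_stale() {
  local bin=$1 src=$2
  [ -f "$bin" ] || return 0
  # find -newer compares modification times; one match is enough.
  [ -n "$(find "$src" -name '*.go' -newer "$bin" 2>/dev/null | head -n 1)" ]
}
```

Usage: `e2e_binary_stale ./bin/test-e2e test/e2e && make e2e`.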
Step 2: Run the Test
Run the test binary, replacing `TEST_PATTERN` with a regular expression that selects the tests to run (see Common Test Patterns below):
```bash
KUBECONFIG="$MGMT_KUBECONFIG" \
./bin/test-e2e -test.v -test.timeout 2h \
  -test.run "TEST_PATTERN" \
  --e2e.platform "$E2E_PLATFORM" \
  --e2e.aws-credentials-file "$AWS_CREDENTIALS" \
  --e2e.aws-oidc-s3-bucket-name "$OIDC_BUCKET" \
  --e2e.base-domain "$BASE_DOMAIN" \
  --e2e.pull-secret-file "$PULL_SECRET" \
  --e2e.aws-region "$AWS_REGION" \
  --e2e.artifact-dir "$E2E_ARTIFACT_DIR"
```
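Long backslash-continued commands are easy to break when a flag is added or removed. One option, sketched below assuming bash, is to collect the same arguments in an array; `e2e_args` and `E2E_ARGS` are hypothetical names, and the flags are exactly those from the command above.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): fill the global array E2E_ARGS with the Step 2
# arguments for the given -test.run pattern. Quoting each element keeps values
# with spaces intact.
e2e_args() {
  local pattern=$1
  # Assigned without "local" so the array is visible to the caller.
  E2E_ARGS=(
    -test.v
    -test.timeout 2h
    -test.run "$pattern"
    --e2e.platform "$E2E_PLATFORM"
    --e2e.aws-credentials-file "$AWS_CREDENTIALS"
    --e2e.aws-oidc-s3-bucket-name "$OIDC_BUCKET"
    --e2e.base-domain "$BASE_DOMAIN"
    --e2e.pull-secret-file "$PULL_SECRET"
    --e2e.aws-region "$AWS_REGION"
    --e2e.artifact-dir "$E2E_ARTIFACT_DIR"
  )
}
```

For example, run `e2e_args TestNodePool`, then `KUBECONFIG="$MGMT_KUBECONFIG" ./bin/test-e2e "${E2E_ARGS[@]}"`. Extra flags can be added with `E2E_ARGS+=(...)` before the call.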
Step 3: Add Custom CPO Image (When Testing Control Plane Changes)
If you've made changes to control-plane-operator code and built a custom image, append this flag to the Step 2 command, replacing `TAG` with the tag you pushed:
```bash
--e2e.control-plane-operator-image "$CPO_IMAGE_REPO:TAG"
```
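To keep one invocation that works with or without a custom image, the flag can be emitted only when a tag is supplied. A sketch assuming bash; `cpo_image_flag` is a hypothetical helper, and the tag is whatever you pushed in the build step.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the CPO image flag and its value, one per
# line, only when a non-empty tag is given; print nothing otherwise.
cpo_image_flag() {
  local tag=${1:-}
  [ -n "$tag" ] || return 0
  if [ -z "${CPO_IMAGE_REPO:-}" ]; then
    echo "CPO_IMAGE_REPO is not set" >&2
    return 1
  fi
  printf '%s\n' --e2e.control-plane-operator-image "$CPO_IMAGE_REPO:$tag"
}
```

With bash 4 or later, `mapfile -t extra < <(cpo_image_flag "$TAG")` collects the output, and `"${extra[@]}"` can then be appended to the Step 2 command; it expands to nothing when no tag was given.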
Iteration Loop
When working autonomously on a task that requires e2e validation:
1. Initial Test Run
Run the test to establish a baseline, using the `--e2e.*` flags from Step 2 in place of `[flags...]`:
```bash
KUBECONFIG="$MGMT_KUBECONFIG" ./bin/test-e2e -test.v -test.run "TestName" [flags...]
```
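Keeping the raw output makes the analysis step easier, and the log should live outside `$E2E_ARTIFACT_DIR`, which is overwritten on each run. A minimal sketch assuming bash: `run_logged` is a hypothetical helper that tees output to a file while preserving the test binary's exit status through `PIPESTATUS`.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): run a command, mirror stdout and stderr to the
# terminal and to LOG, and return the command's own exit status (not tee's).
# Usage: run_logged LOG command [args...]
run_logged() {
  local log=$1
  shift
  mkdir -p "$(dirname "$log")"
  "$@" 2>&1 | tee "$log"
  # PIPESTATUS[0] is the exit status of the first command in the last pipeline.
  return "${PIPESTATUS[0]}"
}
```

Because a `KUBECONFIG=... cmd` prefix assignment cannot be passed through `"$@"`, use `env`, for example `run_logged "$HOME/e2e-logs/baseline.log" env KUBECONFIG="$MGMT_KUBECONFIG" ./bin/test-e2e -test.v -test.run "TestName" [flags...]` (the log path is only an illustration).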
2. On Failure - Analyze
- Read the test output carefully
- Check the `$E2E_ARTIFACT_DIR/` directory for:
  - Pod logs
  - Events
  - Resource states
- Identify the root cause
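An artifact directory can hold many files, so ranking them by how often they mention a failure keyword gives a starting point for reading logs and events. This is a sketch: `rank_artifacts` is hypothetical, the default keywords (`error|fail|panic`) are an assumption, and no particular artifact layout is assumed.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): list files under DIR that contain lines matching
# PATTERN (default: error|fail|panic, case-insensitive), most matches first.
# Usage: rank_artifacts DIR [PATTERN] [LIMIT]
rank_artifacts() {
  local dir=$1
  local pattern=${2:-'error|fail|panic'}
  local limit=${3:-20}
  # grep -r prints "path:count" for every file; awk splits off the trailing
  # count, drops zero counts, and prints "count<TAB>path" for numeric sorting.
  grep -rciE -- "$pattern" "$dir" 2>/dev/null \
    | awk '{ n = $0; sub(/.*:/, "", n); f = $0; sub(/:[0-9]+$/, "", f); if (n + 0 > 0) print n "\t" f }' \
    | sort -rn \
    | head -n "$limit"
}
```

Usage: `rank_artifacts "$E2E_ARTIFACT_DIR"`, or pass a narrower pattern such as `'deadline exceeded'` as the second argument.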
3. Make Fixes
- Edit the relevant code (test code, operator code, etc.)
- If you modified `test/e2e/*.go` files, the Step 1 check rebuilds the binary before the next run
4. Rebuild Images (If Needed)
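If you modified control-plane-operator code, use the build-cpo-image skill to build and push a new image, for example:
```bash
"$RUNTIME" build -f Dockerfile.control-plane --platform linux/amd64 -t "$CPO_IMAGE_REPO:NEW_TAG" .
"$RUNTIME" push "$CPO_IMAGE_REPO:NEW_TAG"
```
Then pass the new tag to the test with `--e2e.control-plane-operator-image` (Step 3).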
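Reusing a tag risks testing an image that was already pulled under that tag, depending on the image pull policy. A sketch assuming bash and, optionally, git: `new_cpo_tag` is a hypothetical helper that combines the short commit SHA with a UTC timestamp; the format is an assumption, not a HyperShift convention.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): print a fresh image tag of the form
# <short-sha>-<YYYYmmddHHMMSS>, falling back to "dev" outside a git checkout.
new_cpo_tag() {
  local sha
  sha=$(git rev-parse --short HEAD 2>/dev/null) || sha=dev
  printf '%s-%s\n' "$sha" "$(date -u +%Y%m%d%H%M%S)"
}
```

Usage: set `NEW_TAG=$(new_cpo_tag)` once per iteration and use `"$CPO_IMAGE_REPO:$NEW_TAG"` in the build, push, and test commands.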
5. Re-run Test
Run the test again with updated code/images. Repeat until passing.
Common Test Patterns
| Test Pattern | Description |
|---|---|
| `TestNodePool` | All NodePool tests |
| `TestNodePool/HostedCluster0/Main/TestSpotTerminationHandler` | Specific spot test |
| `TestNodePool.*Karpenter` | All Karpenter-related tests |
| `TestCreateCluster` | Cluster creation tests |
| `TestUpgrade` | Upgrade tests |
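Go's `-test.run` flag splits its pattern on unbracketed `/` characters and matches each element against the name at the corresponding subtest level. Each element is an unanchored regular expression, so `TestUpgrade` also selects any top-level test whose name merely contains that string. The sketch below, using the hypothetical helper `exact_test_pattern`, anchors every level. It assumes the names contain no regex metacharacters, and it mirrors Go's reporting of spaces in subtest names as underscores.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): build an anchored -test.run pattern from a test
# name and its subtest names, e.g. A B -> ^A$/^B$.
# Assumes names contain no regex metacharacters.
exact_test_pattern() {
  local out="" part
  for part in "$@"; do
    part=${part// /_}               # Go reports spaces in subtest names as "_"
    out+="${out:+/}^${part}\$"      # add "/" between levels, anchor each level
  done
  printf '%s\n' "$out"
}
```

For example, `-test.run "$(exact_test_pattern TestNodePool HostedCluster0 Main TestSpotTerminationHandler)"` selects only that spot test and its parents.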
Analyzing Test Failures
Check Test Output
The test output includes:
- Test name and status
- Assertion failures with expected vs actual
- Timeout information
- Resource creation/deletion logs
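In `-test.v` output, Go reports each failed test or subtest on a line of the form `--- FAIL: <name> (<duration>)`, indented by nesting depth, and a parent fails whenever any of its subtests fails. A sketch with the hypothetical helper `failed_tests`, which lists those names from a saved log; the most deeply nested names are usually the ones to investigate first.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the names of failed tests and subtests
# from a saved -test.v log, in the order they were reported.
failed_tests() {
  # Match "--- FAIL: <name> (<duration>)" at any indentation; field 3 is the name.
  grep -E '^[[:space:]]*--- FAIL: ' "$1" | awk '{ print $3 }'
}
```

Usage: `failed_tests path/to/saved.log`, where the log was captured from the test run (for example with `tee`).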
Check Artifact Directory
After a test failure, examine:
```bash
ls -la "$E2E_ARTIFACT_DIR"/
# Contains: cluster manifests, pod logs, events, resource dumps
```
Common Failure Patterns
| Pattern | Likely Cause |
|---|---|
| `context deadline exceeded` | Resource didn't reach expected state in time |
| `not found` | Resource wasn't created or was deleted prematurely |
| `connection refused` | Service not ready or network issue |
| `forbidden` | RBAC or permission issue |
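The table above can also be applied mechanically to a saved log. A sketch assuming bash; `classify_failures` is hypothetical, matches fixed strings case-insensitively, and reports a count per pattern. A match is only a hint: `not found`, for example, may also appear in log lines that are not the actual failure.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): count how often each failure pattern from the
# table appears in a saved test log and print "count<TAB>pattern<TAB>cause".
classify_failures() {
  local log=$1
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 1
  fi
  # "pattern|likely cause" pairs copied from the Common Failure Patterns table.
  local -a rules=(
    "context deadline exceeded|Resource didn't reach expected state in time"
    "not found|Resource wasn't created or was deleted prematurely"
    "connection refused|Service not ready or network issue"
    "forbidden|RBAC or permission issue"
  )
  local rule pattern cause count
  for rule in "${rules[@]}"; do
    pattern=${rule%%|*}
    cause=${rule#*|}
    # -c counts matching lines, -i ignores case, -F treats the pattern literally.
    # grep exits 1 when nothing matches; "|| true" keeps its printed "0".
    count=$(grep -ciF -- "$pattern" "$log" || true)
    if [ "$count" -gt 0 ]; then
      printf '%s\t%s\t%s\n' "$count" "$pattern" "$cause"
    fi
  done
}
```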
Building Test Binary
When test code changes, rebuild:
```bash
make e2e
```
This compiles `./bin/test-e2e` with all tests from `test/e2e/`.
Notes
- Tests typically take 10-30+ minutes depending on complexity
- Some tests create real AWS resources (costs money, needs cleanup on failure)
- Use `-test.timeout` to set an appropriate timeout (the Step 2 command uses `2h`)
- The artifact directory is overwritten on each run
- For long tests, consider running them in the background and checking on them periodically
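One way to do this is to start the test in the background with its output in a log file, then poll it. A sketch assuming bash: `start_in_background` and `still_running` are hypothetical helpers. `nohup` keeps the process alive if the terminal goes away, and the PID is stored in `E2E_PID` rather than printed, so that the current shell remains the parent and can still `wait` on it.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): run a long command in the background with all
# output in a log file, then poll it.
# Usage: start_in_background LOG command [args...]
start_in_background() {
  local log=$1
  shift
  mkdir -p "$(dirname "$log")"
  # nohup ignores SIGHUP so the run survives a closed terminal.
  nohup "$@" >"$log" 2>&1 &
  # Stored in a global (not echoed) so this shell stays the parent and can
  # later collect the exit status with: wait "$E2E_PID"
  E2E_PID=$!
}

# Succeeds while the process exists; kill -0 sends no signal, it only checks.
still_running() {
  kill -0 "$1" 2>/dev/null
}
```

Check progress with `still_running "$E2E_PID"` plus `tail -n 50` on the log file, and collect the exit status with `wait "$E2E_PID"` from the same shell.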