name: evil-tdd
description: Use "evil" mode to force proper TDD discipline and best test design. An evil implementer writes code to pass tests while breaking the spirit of TDD, forcing you to write tests so precise they prevent bad implementations. Ensures tests truly specify behavior, not just documentation.
argument-hint: [feature to test] [test scenario]
user-invokable: true
disable-model-invocation: false
Evil TDD: A Forcing Function for Perfect Tests
This skill guides you through Evil TDD, a collaborative practice in which one developer writes deliberately "evil" implementations: code that passes the current tests while violating their intent. This forces the other developer to tighten the tests until only correct, well-designed code can pass them.
What is Evil TDD?
Evil TDD is a design forcing function based on the Red-Green-Refactor cycle:
- RED: You write a test for a feature
- EVIL GREEN: A partner writes the worst possible code that still passes your test
- Your response: Recognize what your test failed to specify, and write a better test
- REAL GREEN: Only when tests are ironclad can "good" code pass them
- REFACTOR: Clean up the implementation, confident that the tests will catch any regression
The goal: Write tests so precise they guarantee good design, because they've already defeated every cheap hack and shortcut.
Why Evil TDD Works
When you write a test, you think it's perfect. But an evil implementer will find every loophole:
- They'll return hardcoded values that technically pass
- They'll exploit vague assertions
- They'll implement the bare minimum needed to satisfy your test
- They'll implement behavior you didn't explicitly forbid
This is the point. Each loophole reveals what your test failed to specify. You're forced to:
- Write more specific assertions
- Test edge cases you missed
- Verify actual behavior, not just happy paths
- Think deeply about what "correct" means
Result: Tests become ironclad specifications that prevent bad design automatically.
Evil Implementation Patterns & How to Counter Them
1. The Hardcoded Return (Evil Returns Your Exact Expected Value)
Your Test (Round 1 - Too Vague):
[Fact]
public void GetMoodHistory_WithValidPeriod_ReturnsData()
{
var service = new MoodHistoryService(new InMemoryStorage());
service.AddMood("happy", DateTime.Now);
service.AddMood("sad", DateTime.Now.AddDays(-1));
var result = service.GetMoodHistory(DateRange.LastSevenDays);
Assert.NotNull(result); // Way too loose!
Assert.NotEmpty(result);
}
Evil's Implementation (Technically Passes):
public class MoodHistoryService
{
public IEnumerable<MoodEntry> GetMoodHistory(DateRange period)
{
return new[] { new MoodEntry("fake", DateTime.Now) }; // Ignores period!
}
}
What Your Test Failed To Specify:
- You didn't verify the data is actually filtered by the date range
- You didn't test that it returns the SPECIFIC moods you added
- You didn't test boundary conditions (edge of the time period)
- You didn't forbid fake data
Your Test (Round 2 - Precise Specifications):
[Fact]
public void GetMoodHistory_ReturnsOnlyMoodsInSpecifiedPeriod()
{
var storage = new InMemoryStorage();
var service = new MoodHistoryService(storage);
var now = DateTime.Now; // Capture once; a second DateTime.Now call would return a different instant
var sevenDaysAgo = now.AddDays(-7);
service.AddMood("happy", now);
service.AddMood("sad", sevenDaysAgo.AddHours(1)); // Just inside range
service.AddMood("angry", sevenDaysAgo.AddHours(-1)); // Just outside range
var result = service.GetMoodHistory(DateRange.LastSevenDays).ToList();
Assert.Equal(2, result.Count); // Only happy and sad, not angry
Assert.Contains(result, m => m.Mood == "happy" && m.DateTime == now);
Assert.Contains(result, m => m.Mood == "sad");
Assert.DoesNotContain(result, m => m.Mood == "angry");
}
[Fact]
public void GetMoodHistory_WithEmptyPeriod_ReturnsEmptyList()
{
var service = new MoodHistoryService(new InMemoryStorage());
var result = service.GetMoodHistory(DateRange.LastSevenDays);
Assert.Empty(result);
}
Now Evil Can't Fake It:
public class MoodHistoryService
{
public IEnumerable<MoodEntry> GetMoodHistory(DateRange period)
{
// Must now actually filter by date range.
// _storage is the storage passed to the constructor (field and constructor omitted here).
var startDate = period.CalculateStartDate();
return _storage.GetMoods()
.Where(m => m.DateTime >= startDate && m.DateTime <= DateTime.Now)
.ToList();
}
}
2. The Ignore-the-Parameter (Evil Ignores What You Pass In)
Your Test (Round 1 - Parameters Don't Matter):
[Fact]
public void RecordMood_WithAnyMood_UpdatesLastMoodDate()
{
var service = new MoodService();
service.RecordMood("happy");
Assert.NotNull(service.LastMoodDate); // Doesn't verify the mood itself
}
Evil's Implementation:
public class MoodService
{
public void RecordMood(string mood)
{
LastMoodDate = DateTime.Now;
// Ignores the mood parameter completely!
}
}
Your Test (Round 2 - Parameters Are Specifications):
[Fact]
public void RecordMood_StoresTheSpecificMoodProvided()
{
var service = new MoodService();
service.RecordMood("happy");
Assert.Equal("happy", service.GetLastMood()); // Verify stored mood
}
[Fact]
public void RecordMood_WithDifferentMoods_StoresDifferentValues()
{
var service = new MoodService();
service.RecordMood("happy");
Assert.Equal("happy", service.GetLastMood());
service.RecordMood("anxious");
Assert.Equal("anxious", service.GetLastMood());
}
[Fact]
public void RecordMood_WithInvalidMood_ThrowsArgumentException()
{
var service = new MoodService();
Assert.Throws<ArgumentException>(() => service.RecordMood(""));
Assert.Throws<ArgumentException>(() => service.RecordMood(null));
}
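A minimal sketch of an implementation the Round 2 tests above would accept. The `LastMoodDate` property type and the empty-string default for `GetLastMood()` are assumptions made here, not part of the original. Note that xUnit's `Assert.Throws<T>` matches the exact exception type, so this sketch throws `ArgumentException` for `null` as well rather than `ArgumentNullException`.

```csharp
using System;

public class MoodService
{
    // Assumption: an empty string until the first mood is recorded.
    private string _lastMood = string.Empty;

    // Null until the first mood is recorded.
    public DateTime? LastMoodDate { get; private set; }

    public void RecordMood(string mood)
    {
        // Reject null, empty, and whitespace-only input before touching any state.
        if (string.IsNullOrWhiteSpace(mood))
        {
            throw new ArgumentException("Mood must be a non-empty string.", nameof(mood));
        }

        _lastMood = mood;
        LastMoodDate = DateTime.Now;
    }

    public string GetLastMood() => _lastMood;
}
```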
3. The Ignore-the-State (Evil Ignores Current State)
Your Test (Round 1 - Doesn't Test State Transitions):
[Fact]
public void IsWorkingHard_WithHighMood_ReturnsTrue()
{
var analyzer = new WorkMoodAnalyzer();
analyzer.SetCurrentMood("energized");
var result = analyzer.IsWorkingHard();
Assert.True(result); // Doesn't verify STATE changed
}
Evil's Implementation:
public class WorkMoodAnalyzer
{
public bool IsWorkingHard()
{
return true; // Always returns true, ignores state!
}
}
Your Test (Round 2 - State Matters):
[Fact]
public void IsWorkingHard_WithEnergizedMood_ReturnsTrue()
{
var analyzer = new WorkMoodAnalyzer();
analyzer.SetCurrentMood("energized");
Assert.True(analyzer.IsWorkingHard());
}
[Fact]
public void IsWorkingHard_WithTiredMood_ReturnsFalse()
{
var analyzer = new WorkMoodAnalyzer();
analyzer.SetCurrentMood("tired");
Assert.False(analyzer.IsWorkingHard());
}
[Fact]
public void IsWorkingHard_WithDifferentStates_ReturnsAppropriateValue()
{
var analyzer = new WorkMoodAnalyzer();
analyzer.SetCurrentMood("stressed");
Assert.False(analyzer.IsWorkingHard()); // Stressed ≠ working hard
analyzer.SetCurrentMood("focused");
Assert.True(analyzer.IsWorkingHard()); // Focused = working hard
}
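One way the Round 2 tests above could be satisfied is sketched below. The set of moods that count as "working hard" is an assumption inferred only from the four moods the tests name; a real rule would come from the feature's requirements.

```csharp
using System;
using System.Collections.Generic;

public class WorkMoodAnalyzer
{
    // Assumption: only these moods count as working hard. The tests above
    // expect true for "energized" and "focused", false for "tired" and "stressed".
    private static readonly HashSet<string> WorkingHardMoods =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "energized", "focused" };

    private string _currentMood = string.Empty;

    public void SetCurrentMood(string mood)
    {
        _currentMood = mood ?? throw new ArgumentNullException(nameof(mood));
    }

    // The answer now depends on the current state instead of a constant.
    public bool IsWorkingHard() => WorkingHardMoods.Contains(_currentMood);
}
```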
4. The Ignore-Dependencies (Evil Uses Wrong Dependencies)
Your Test (Round 1 - Dependencies Unspecified):
[Fact]
public void SaveMood_PersistsToStorage()
{
var mockStorage = new Mock<IDataStorage>();
var service = new MoodDataService(mockStorage.Object);
service.SaveMood("happy");
// Doesn't verify if the right dependency was actually called!
}
Evil's Implementation:
public class MoodDataService
{
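private readonly IDataStorage _storage;
public MoodDataService(IDataStorage storage) { _storage = storage; } // Accepted, then never used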
public void SaveMood(string mood)
{
// Ignores the injected storage, uses hardcoded location
File.WriteAllText("C:\\hardcoded\\mood.txt", mood);
}
}
Your Test (Round 2 - Verify Correct Dependencies Are Used):
[Fact]
public void SaveMood_CallsStorageWithMood()
{
var mockStorage = new Mock<IDataStorage>();
var service = new MoodDataService(mockStorage.Object);
service.SaveMood("happy");
mockStorage.Verify(x => x.Save("happy"), Times.Once); // Verify it was called
}
[Fact]
public void SaveMood_WithEmptyMood_DoesNotCallStorage()
{
var mockStorage = new Mock<IDataStorage>();
var service = new MoodDataService(mockStorage.Object);
Assert.Throws<ArgumentException>(() => service.SaveMood(""));
mockStorage.Verify(x => x.Save(It.IsAny<string>()), Times.Never); // Never called!
}
[Fact]
public void SaveMood_WithMultipleMoods_CallsStorageMultipleTimes()
{
var mockStorage = new Mock<IDataStorage>();
var service = new MoodDataService(mockStorage.Object);
service.SaveMood("happy");
service.SaveMood("sad");
mockStorage.Verify(x => x.Save(It.IsAny<string>()), Times.Exactly(2));
}
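A sketch of a service that the Round 2 tests above would accept. The shape of `IDataStorage` is an assumption (the tests only show a `Save(string)` call), and `RecordingStorage` is a hypothetical hand-rolled fake, included so the example runs without Moq.

```csharp
using System;
using System.Collections.Generic;

// Assumed interface: the tests above only reveal a Save(string) member.
public interface IDataStorage
{
    void Save(string mood);
}

public class MoodDataService
{
    private readonly IDataStorage _storage;

    public MoodDataService(IDataStorage storage)
    {
        _storage = storage ?? throw new ArgumentNullException(nameof(storage));
    }

    public void SaveMood(string mood)
    {
        // Validate first so that storage is never called with bad input.
        if (string.IsNullOrWhiteSpace(mood))
        {
            throw new ArgumentException("Mood must be a non-empty string.", nameof(mood));
        }

        // Delegate to the injected dependency instead of a hardcoded location.
        _storage.Save(mood);
    }
}

// Hypothetical fake that records every call, standing in for a Moq mock.
public class RecordingStorage : IDataStorage
{
    public List<string> Saved { get; } = new List<string>();

    public void Save(string mood) => Saved.Add(mood);
}
```

With the fake, the same checks as `Verify(..., Times.Once)` and `Times.Never` become plain assertions on `Saved.Count`.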
5. The Lazy Constant (Evil Returns Convenient Constants)
Your Test (Round 1 - Only Tests One Case):
[Fact]
public void CalculateStressLevel_WithHighWorkload_Returns80()
{
var analyzer = new WorkMoodAnalyzer();
var result = analyzer.CalculateStressLevel(highWorkload: true);
Assert.Equal(80, result);
}
Evil's Implementation:
public class WorkMoodAnalyzer
{
public int CalculateStressLevel(bool highWorkload)
{
return 80; // Always returns 80!
}
}
Your Test (Round 2 - Test Boundary Values and Variations):
[Fact]
public void CalculateStressLevel_WithHighWorkload_ReturnsHighValue()
{
var analyzer = new WorkMoodAnalyzer();
var result = analyzer.CalculateStressLevel(highWorkload: true);
Assert.InRange(result, 70, 100); // Must land in the high band
}
[Fact]
public void CalculateStressLevel_WithLowWorkload_ReturnsLowValue()
{
var analyzer = new WorkMoodAnalyzer();
var result = analyzer.CalculateStressLevel(highWorkload: false);
Assert.InRange(result, 0, 30); // Must be different!
}
[Fact]
public void CalculateStressLevel_WithOppositeInputs_ReturnsDifferentValues()
{
var analyzer = new WorkMoodAnalyzer();
var highStress = analyzer.CalculateStressLevel(highWorkload: true);
var lowStress = analyzer.CalculateStressLevel(highWorkload: false);
Assert.NotEqual(highStress, lowStress); // Can't return same value
}
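Because `highWorkload` is a `bool`, the tests above cover the entire input domain, so even an evil implementer has nowhere left to hide. A sketch of the smallest implementation they accept follows; the values 80 and 20 are illustrative picks from inside the asserted ranges, not figures from the original.

```csharp
public class WorkMoodAnalyzer
{
    // Assumption: 80 and 20 are placeholders within the ranges the tests assert
    // (70 to 100 for high workload, 0 to 30 for low workload).
    public int CalculateStressLevel(bool highWorkload) => highWorkload ? 80 : 20;
}
```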
6. The Off-by-One (Evil Gets The Boundary Wrong)
Your Test (Round 1 - Loose Boundaries):
[Fact]
public void GetConsecutiveMoods_WithSevenDays_ReturnsData()
{
var service = new MoodHistoryService();
service.AddMood("happy", DateTime.Now);
service.AddMood("sad", DateTime.Now.AddDays(-3)); // 3 days ago, nowhere near the boundary
var result = service.GetConsecutiveMoods(dayCount: 7);
Assert.Equal(2, result.Count()); // Boundary never probed, so the off-by-one goes unnoticed
}
Evil's Implementation:
public IEnumerable<MoodEntry> GetConsecutiveMoods(int dayCount)
{
var cutoff = DateTime.Now.AddDays(-(dayCount - 1)); // Wrong boundary!
return _moods.Where(m => m.DateTime >= cutoff);
}
Your Test (Round 2 - Explicit Boundaries):
[Fact]
public void GetConsecutiveMoods_WithSevenDays_IncludesExactlySevenDaysBack()
{
var service = new MoodHistoryService();
var now = DateTime.Now;
var sevenDaysAgo = now.AddDays(-7);
service.AddMood("happy", now);
service.AddMood("sad", sevenDaysAgo.AddSeconds(1)); // Just inside boundary
service.AddMood("angry", sevenDaysAgo.AddSeconds(-1)); // Just outside
var result = service.GetConsecutiveMoods(dayCount: 7).ToList();
Assert.Equal(2, result.Count); // Only happy and sad
Assert.Contains(result, m => m.Mood == "sad");
Assert.DoesNotContain(result, m => m.Mood == "angry");
}
[Fact]
public void GetConsecutiveMoods_WithOneDay_ReturnsOnlyToday()
{
var service = new MoodHistoryService();
var now = DateTime.Now;
service.AddMood("happy", now.AddHours(-1)); // Within the last day
service.AddMood("sad", now.AddDays(-1).AddSeconds(-1)); // Just over one day ago
var result = service.GetConsecutiveMoods(dayCount: 1).ToList();
Assert.Single(result);
Assert.Equal("happy", result.First().Mood);
}
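A sketch of an implementation with the boundary the Round 2 tests above pin down. The `MoodEntry` shape, the in-memory list, and the optional clock delegate are all assumptions introduced here; the clock is not in the original and is only one possible way to let a test freeze "now" so that boundary entries are not at the mercy of timing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed entry type: the original only shows Mood and DateTime members.
public class MoodEntry
{
    public MoodEntry(string mood, DateTime dateTime)
    {
        Mood = mood;
        DateTime = dateTime;
    }

    public string Mood { get; }
    public DateTime DateTime { get; }
}

public class MoodHistoryService
{
    private readonly List<MoodEntry> _moods = new List<MoodEntry>();
    private readonly Func<DateTime> _clock;

    // Hypothetical optional clock; defaults to the system time.
    public MoodHistoryService(Func<DateTime> clock = null)
    {
        _clock = clock ?? (() => DateTime.Now);
    }

    public void AddMood(string mood, DateTime when) => _moods.Add(new MoodEntry(mood, when));

    public IEnumerable<MoodEntry> GetConsecutiveMoods(int dayCount)
    {
        if (dayCount < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(dayCount));
        }

        // Go back the full dayCount days, not dayCount - 1.
        var cutoff = _clock().AddDays(-dayCount);
        return _moods.Where(m => m.DateTime >= cutoff).ToList();
    }
}
```

A test can then construct the service as `new MoodHistoryService(() => fixedNow)` and place entries one second either side of the cutoff without flakiness.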
How to Use Evil TDD in Your Team
Setup
- Pair with a partner: One writes tests, one writes code
- Take turns: Rotate who's the "test writer" and "code writer"
- Strictly follow roles: Don't peek at each other's work until the round is done
The Cycle
- RED: Test writer writes a test that fails
- EVIL GREEN: Code writer writes the worst code that passes the test
- REVIEW: Test writer reviews the evil code and recognizes the loopholes
- IMPROVE TEST: Test writer strengthens the test to prevent the evil hack
- REAL GREEN: Code writer now writes proper code that passes ironclad tests
- REFACTOR: Both optimize implementation and test code
- REPEAT: Next feature, or swap roles
Example Session (45 minutes)
Person A (Test Writer) - Round 1:
[Fact]
public void GetMoodTrendForWeek_WithFiveEntries_CalculatesTrend()
{
var service = new MoodAnalysisService();
var moods = new[] { 3, 4, 5, 6, 7 }; // Ascending
var trend = service.GetMoodTrendForWeek(moods);
Assert.True(trend > 0); // Positive trend
}
Person B (Evil Code Writer) - First Attempt:
public double GetMoodTrendForWeek(int[] moods)
{
return 1.0; // Always returns positive! Passes the test!
}
Person A's Response: "Got me! Your implementation doesn't actually calculate the trend."
Person A (Improved Test):
[Fact]
public void GetMoodTrendForWeek_WithDescendingMoods_CalculatesNegativeTrend()
{
var service = new MoodAnalysisService();
var moods = new[] { 7, 6, 5, 4, 3 };
var trend = service.GetMoodTrendForWeek(moods);
Assert.True(trend < 0); // Must be negative here!
}
[Fact]
public void GetMoodTrendForWeek_WithAscendingVsDescending_ReturnsDifferentTrends()
{
var service = new MoodAnalysisService();
var ascending = service.GetMoodTrendForWeek(new[] { 3, 4, 5 });
var descending = service.GetMoodTrendForWeek(new[] { 5, 4, 3 });
Assert.NotEqual(ascending, descending);
Assert.True(ascending > descending);
}
[Fact]
public void GetMoodTrendForWeek_WithFlatMoods_CalculatesZeroTrend()
{
var service = new MoodAnalysisService();
var moods = new[] { 5, 5, 5, 5 };
var trend = service.GetMoodTrendForWeek(moods);
Assert.Equal(0, trend);
}
Person B (Real Code) - Now Forced to Implement Correctly:
public double GetMoodTrendForWeek(int[] moods)
{
if (moods.Length < 2) return 0;
var firstHalf = moods.Take(moods.Length / 2).Average();
var secondHalf = moods.Skip(moods.Length / 2).Average();
return secondHalf - firstHalf; // Or any real trend calculation
}
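The comment above leaves room for "any real trend calculation". One alternative, swapped in here purely as an illustration, is the least-squares slope of mood against day index. This is a sketch under that assumption, not the method the original session used; it still yields a positive value for ascending moods, a negative value for descending moods, and zero for flat ones.

```csharp
using System;
using System.Linq;

public class MoodAnalysisService
{
    // Least-squares slope of mood value against position (0, 1, 2, ...).
    public double GetMoodTrendForWeek(int[] moods)
    {
        if (moods == null)
        {
            throw new ArgumentNullException(nameof(moods));
        }

        // A single point (or none) has no trend.
        if (moods.Length < 2)
        {
            return 0;
        }

        int n = moods.Length;
        double meanX = (n - 1) / 2.0;   // mean of the indices 0..n-1
        double meanY = moods.Average();

        double covariance = 0;
        double variance = 0;
        for (int i = 0; i < n; i++)
        {
            double dx = i - meanX;
            covariance += dx * (moods[i] - meanY);
            variance += dx * dx;
        }

        // variance is non-zero whenever n >= 2.
        return covariance / variance;
    }
}
```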
Why Evil TDD Forces Better Design
| Traditional TDD | Evil TDD |
|---|---|
| Write test, write code, refactor | Write test, evil code reveals loopholes, strengthen test, real code, refactor |
| Tests might miss edge cases | Tests debugged against malicious implementation |
| Parameters might be ignored silently | Tests forced to verify parameters matter |
| State transitions might not be tested | Evil implementation reveals missing state tests |
| Boundary conditions often missed | Off-by-one errors forced into the open by evil boundary implementations |
| Can pass with sloppy assertions | Loose assertions get destroyed by evil code |
The Learning Opportunity
When evil code breaks your test:
- Don't blame the code writer — they revealed a gap in your test
- Appreciate the insight — see what behavior you failed to specify
- Write better tests — tests that can only pass with correct implementation
- Understand TDD — tests are specifications, not just documentation
Each evil implementation teaches you to think like:
- A specification writer (what does this really need to do?)
- A quality assurance engineer (how do I verify it works?)
- A designer (what are all the ways this could go wrong?)
Anti-Patterns to Avoid in Evil TDD
| Anti-Pattern | Why It Fails |
|---|---|
| The Code Writer Goes Too Evil | Writing code that is technically incorrect (throws exceptions, corrupts data) instead of code that merely exploits what the tests leave unspecified |
| The Test Writer Gives Up | Writing tests so complex they're unmaintainable instead of precise |
| Silent Swaps | Code writer secretly writes good code instead of evil code |
| Test Writer Tests Implementation | Writing tests that verify internal state instead of behavior |
| No Refactoring | Keeping test and code messy after green |
When to Use Evil TDD
Great for:
- Complex behavioral logic
- Specifications that are easy to misunderstand
- Learning proper test specification
- Code reviews with junior developers
- Building shared understanding of "what correct looks like"
Less suitable for:
- Simple CRUD operations
- UI integration tests (too many moving parts)
- Performance-critical code (doesn't help with optimization)
- Existing code with weak tests (start with real TDD first)
See Also
- TDD Skill - Standard Test-Driven Development practice
- Test Organization & Hierarchy Skill - Organize tests rationally as they grow
- Code Smells Detection Skill - Recognize when designs are problematic