Automating TDD: Using AI to Generate Edge-Case Unit Tests
This article demonstrates a "Threat-Model-First" workflow where we use AI not just to write code, but to aggressively attack our logic before we implement it.
The Problem: The "Happy Path" Trap in TDD
Test-driven development (Red-Green-Refactor) is the gold standard for reliable software. However, it has a flaw: The quality of your code is capped by the imagination of your test cases.
If you are building a payment processing function, you will naturally write a test for "valid payment." You might even remember "insufficient funds." But will you remember to test for:
- Floating point precision errors?
- Negative amounts causing credit reversals?
- SQL injection payloads in the reference field?
- Integer overflow boundaries?
Humans are naturally optimistic; we design for success. AI, however, can be prompted to be pessimistic. By inverting the typical AI workflow — asking it to break our logic before we build it — we can automate the generation of edge-case unit tests that harden our systems from day one.
The Solution: "Threat-Model-First" TDD
Instead of the standard TDD loop, we modify the workflow to include an AI "adversary" step.
- Define interface: Sketch the function signature.
- AI threat model: Prompt the AI to identify vulnerabilities and edge cases.
- Generate tests (red): Create failing unit tests based on the threat model.
- Implement (green): Write code to satisfy the strict tests.
- Refactor: Clean up.
Practical Tutorial: Building a Bulletproof Payment Validator
Let's build a function process_transaction(amount, user_balance, currency).
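Before prompting the AI, it can help to pin the interface down as an importable stub, so that tests written later fail on behaviour rather than on a missing module. This is a minimal sketch, assuming the module is called payment_system (the name the test suite imports from) and that raising NotImplementedError is an acceptable placeholder:

```python
# payment_system.py: interface stub only, no business logic yet.
# Assumption: the placeholder raises NotImplementedError until Step 3.

def process_transaction(amount: float, user_balance: float, currency: str) -> bool:
    """Return True if the transaction is approved, False if funds are insufficient.

    Invalid input is expected to raise ValueError once the function is implemented.
    """
    raise NotImplementedError("process_transaction is not implemented yet")
```

With the stub in place, every generated test fails individually, which makes the "Red" report easier to read.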
Step 1: The "Adversarial" Prompt
Do not ask the AI to "write tests for this function." That yields generic tests. Instead, use a persona-based prompt that encourages critical analysis.
The prompt:
"Act as a Senior QA Security Engineer. I am designing a Python function process_transaction(amount: float, user_balance: float, currency: str). Analyze this interface for potential security risks, logical errors, and boundary conditions. Do not write the implementation. Instead, list 5-7 specific 'killer' edge cases that would break a naive implementation. Focus on financial accuracy and security."
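If you reuse this prompt across tickets, it may be worth templating. The sketch below only assembles the prompt text; build_threat_model_prompt is a hypothetical helper name, and sending the string to a model is left out on purpose:

```python
def build_threat_model_prompt(signature: str, focus: str,
                              min_cases: int = 5, max_cases: int = 7) -> str:
    """Assemble the adversarial, persona-based prompt for one function signature."""
    return (
        "Act as a Senior QA Security Engineer. "
        f"I am designing a Python function {signature}. "
        "Analyze this interface for potential security risks, logical errors, "
        "and boundary conditions. Do not write the implementation. "
        f"Instead, list {min_cases}-{max_cases} specific 'killer' edge cases "
        "that would break a naive implementation. "
        f"Focus on {focus}."
    )


prompt = build_threat_model_prompt(
    "process_transaction(amount: float, user_balance: float, currency: str)",
    focus="financial accuracy and security",
)
print(prompt)
```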
The AI Output (Summarized)
- Floating point drift: Passing 0.1 + 0.2 might not equal 0.3 due to IEEE 754.
- Negative transactions: Sending -100 could credit the user instead of debiting.
- Infinite/NaN inputs: Passing math.inf or NaN.
- Currency normalization: "USD" vs. "usd" vs. " USD".
- Precision attacks: Submitting 10.0000001 when the system only supports 2 decimal places.
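Several of these threats can be reproduced in a few lines of standard-library Python, which is a quick way to sanity-check the AI's claims before turning them into tests. The variable names here are illustrative only:

```python
import math
from decimal import Decimal

drifted = 0.1 + 0.2

print(drifted == 0.3)              # False: 0.1 and 0.2 have no exact binary representation
print(repr(drifted))               # 0.30000000000000004
print(math.isclose(drifted, 0.3))  # True: comparison with a relative tolerance

# Decimal values built from strings keep the exact decimal digits.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

# The exponent of a Decimal exposes how many decimal places were submitted.
print(Decimal("10.0000001").as_tuple().exponent)          # -7

# Non-finite values: NaN never equals itself, and inf exceeds any balance.
nan = float("nan")
print(nan == nan)                                    # False
print(math.isfinite(nan), math.isfinite(math.inf))   # False False
```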
Step 2: Generating the "Red" Test Suite
Now we ask the AI to convert these specific threats into a Pytest suite.
import math

import pytest

# We haven't written the function yet, but we import it
from payment_system import process_transaction


def test_reject_negative_amount():
    """Security: Prevent attackers from crediting themselves by sending negative amounts."""
    with pytest.raises(ValueError, match="Amount must be positive"):
        process_transaction(-50.00, 100.00, "USD")


def test_handles_floating_point_drift():
    """Accuracy: Ensure financial math handles floating point drift correctly."""
    # A naive float check might fail this if it doesn't use Decimal or epsilon checks
    user_balance = 0.30
    # 0.1 + 0.2 is actually 0.30000000000000004 in Python floats
    cost = 0.1 + 0.2
    # This should be approved rather than treated as insufficient funds
    assert process_transaction(cost, user_balance, "USD") is True


def test_reject_excessive_precision():
    """Business Logic: Reject amounts with more than 2 decimal places."""
    with pytest.raises(ValueError, match="Invalid precision"):
        process_transaction(10.001, 100.00, "USD")


def test_reject_infinite_values():
    """Stability: Input validation must block non-finite numbers."""
    with pytest.raises(ValueError, match="Invalid input"):
        process_transaction(math.inf, 100.00, "USD")
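If you run this now, nothing passes, because process_transaction has no implementation yet (with no payment_system module at all, pytest stops at the import). This is the perfect "Red" state.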
Step 3: The "Green" Implementation
Now, we write the implementation. Because our tests are so strict, we are forced to write high-quality code immediately. We can't use simple floats; we have to handle the precision edge cases identified by the AI.
from decimal import Decimal
import math


def process_transaction(amount, user_balance, currency):
    # 1. Input Sanitization (The "Infinite" Case)
    if not isinstance(amount, (int, float)) or math.isnan(amount) or math.isinf(amount):
        raise ValueError("Invalid input type")

    # 2. Precision Handling (The "Floating Point" Case)
    # Convert to Decimal for financial math. Rounding to 9 places first strips
    # binary float noise (0.1 + 0.2 becomes 0.3) while keeping genuine extra
    # digits such as 10.001, so the next check can still reject them.
    amount_dec = Decimal(str(round(amount, 9)))
    balance_dec = Decimal(str(round(user_balance, 9)))

    # 3. Business Logic Check (The "Excessive Precision" Case)
    if amount_dec.as_tuple().exponent < -2:
        raise ValueError("Invalid precision: max 2 decimal places")

    # 4. Security Check (The "Negative Amount" Case)
    if amount_dec <= 0:
        raise ValueError("Amount must be positive")

    # 5. Logic Execution (currency is accepted but not validated yet)
    if balance_dec >= amount_dec:
        return True  # Transaction Approved
    else:
        return False  # Insufficient Funds
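The threat model also flagged currency normalization, which the test suite does not exercise and the function above does not check. One way to close that gap is sketched below. Both normalize_currency and the SUPPORTED_CURRENCIES allow-list are hypothetical names introduced for illustration, not part of the original design:

```python
# Hypothetical allow-list; a real system would likely load this from configuration.
SUPPORTED_CURRENCIES = frozenset({"USD", "EUR", "GBP"})


def normalize_currency(currency):
    """Return the canonical upper-case currency code, or raise ValueError."""
    if not isinstance(currency, str):
        raise ValueError("Invalid currency: expected a string")
    code = currency.strip().upper()
    if code not in SUPPORTED_CURRENCIES:
        raise ValueError(f"Unsupported currency: {currency!r}")
    return code


# The three spellings from the threat model collapse to one code.
for raw in ("USD", "usd", " USD"):
    assert normalize_currency(raw) == "USD"
```

Following the same Red-Green order, the assertions at the bottom would be written as tests first and the helper afterwards.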
Why This Matters
If we had used a standard "Write me some tests" prompt, the AI likely would have given us test_valid_transaction and test_insufficient_funds.
By using the Adversarial/Threat-Model approach, we forced the discovery of:
- Data type validation: Handling NaN and Inf.
- Decimal precision: Forcing the use of Decimal over float to pass the precision tests.
- Input sanitization: Catching negative inputs.
Best Practices for AI-Driven TDD
- Don't paste code first: If you paste your existing code and ask for tests, the AI will often write tests that confirm your existing bugs. Ask for tests based on requirements first.
- Use "fuzzing" prompts: Ask the AI: "Generate a Python dictionary of inputs that are likely to cause this function to crash."
- Iterate on failure: If the AI generates a test that fails, don't immediately assume the test is wrong. Often, the test is right, and your implementation is fragile.
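The "fuzzing" prompt above yields a dictionary of hostile inputs, and a small harness can replay it against any candidate implementation. The following is a sketch under stated assumptions: HOSTILE_INPUTS stands in for AI-generated output, fuzz is a hypothetical helper, and naive_process_transaction is a deliberately fragile stand-in rather than the validator from Step 3. A ValueError is treated as a controlled rejection and any other exception as a crash:

```python
import math

# Hypothetical fuzz table: label -> (amount, user_balance, currency).
HOSTILE_INPUTS = {
    "negative amount": (-50.0, 100.0, "USD"),
    "nan amount": (math.nan, 100.0, "USD"),
    "infinite amount": (math.inf, 100.0, "USD"),
    "string amount": ("10.00", 100.0, "USD"),
    "none amount": (None, 100.0, "USD"),
    "float drift": (0.1 + 0.2, 0.30, "USD"),
}


def fuzz(func, cases):
    """Call func with every case and collect the ones that crash or return a non-bool."""
    failures = {}
    for label, args in cases.items():
        try:
            result = func(*args)
        except ValueError:
            continue  # rejected cleanly, which is the desired behaviour
        except Exception as exc:
            failures[label] = f"crashed with {type(exc).__name__}"
            continue
        if not isinstance(result, bool):
            failures[label] = f"returned non-bool {result!r}"
    return failures


def naive_process_transaction(amount, user_balance, currency):
    """Deliberately fragile stand-in used only to exercise the harness."""
    return user_balance - amount >= 0


failures = fuzz(naive_process_transaction, HOSTILE_INPUTS)
for label, reason in failures.items():
    print(f"{label}: {reason}")
```

A harness like this only detects crashes. The naive function still approves the negative amount without raising, so explicit assertions such as test_reject_negative_amount remain necessary.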
Conclusion
AI doesn't just speed up coding; it can act as a critical partner in the design phase. By shifting AI usage to the left — before a single line of implementation code is written — you can generate edge-case tests that force you to write cleaner, safer, and more robust systems.
Next step: Try this on your next ticket. Before writing code, feed your requirements to ChatGPT or Claude with the prompt: "What are the security edge cases for this feature?" and see what tests it suggests.