Automating TDD: Using AI to Generate Edge-Case Unit Tests

This article demonstrates a "Threat-Model-First" workflow where we use AI not just to write code, but to aggressively attack our logic before we implement it.

Nikita Kothari

Jan. 30, 26 · Tutorial

The Problem: The "Happy Path" Trap in TDD

Test-driven development (Red-Green-Refactor) is widely regarded as one of the most dependable ways to build reliable software. However, it has a flaw: the quality of your code is capped by the imagination of whoever writes the test cases.

If you are building a payment processing function, you will naturally write a test for "valid payment." You might even remember "insufficient funds." But will you remember to test for:

Floating point precision errors?
Negative amounts causing credit reversals?
SQL injection payloads in the reference field?
Integer overflow boundaries?

Humans are naturally optimistic; we design for success. AI, however, can be prompted to be pessimistic. By inverting the typical AI workflow — asking it to break our logic before we build it — we can automate the generation of edge-case unit tests that harden our systems from day one.
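To make the trap concrete, here is a minimal sketch of the kind of implementation a happy-path suite would accept. The function name `naive_charge` and its return convention (the new balance) are assumptions made for illustration; they are not part of the tutorial's final code.

```python
import math


def naive_charge(amount: float, user_balance: float) -> float:
    """Hypothetical happy-path implementation: debit if funds suffice."""
    if amount > user_balance:
        raise ValueError("Insufficient funds")
    return user_balance - amount


# Happy path: behaves as expected.
print(naive_charge(25.0, 100.0))      # 75.0

# Negative amount: the "debit" silently credits the account.
print(naive_charge(-100.0, 100.0))    # 200.0

# NaN: every comparison with NaN is False, so the guard never fires
# and the balance itself becomes NaN.
print(math.isnan(naive_charge(math.nan, 100.0)))  # True
```

Both "valid payment" and "insufficient funds" tests would pass against this function, which is exactly why they are not enough.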

The Solution: "Threat-Model-First" TDD

Instead of the standard TDD loop, we modify the workflow to include an AI "adversary" step.

Define interface: Sketch the function signature.
AI threat model: Prompt the AI to identify vulnerabilities and edge cases.
Generate tests (red): Create failing unit tests based on the threat model.
Implement (green): Write code to satisfy the strict tests.
Refactor: Clean up.

Practical Tutorial: Building a Bulletproof Payment Validator

Let's build a function process_transaction(amount, user_balance, currency).
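The "Define interface" step can be as small as a stub. The following is a sketch, not a required form: the type hints mirror the signature used in the prompt in Step 1, the `bool` return type is inferred from how the test suite uses the function, and the module name `payment_system` comes from the import in the test file. Raising `NotImplementedError` is one way to let the test file import cleanly so that each test fails individually.

```python
# payment_system.py (module name taken from the import used by the test suite)


def process_transaction(amount: float, user_balance: float, currency: str) -> bool:
    """Return True if the transaction is approved, False if funds are insufficient.

    Invalid input is expected to raise ValueError. Intentionally unimplemented for now.
    """
    raise NotImplementedError("process_transaction is not implemented yet")
```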

Step 1: The "Adversarial" Prompt

Do not ask the AI to "write tests for this function." That yields generic tests. Instead, use a persona-based prompt that encourages critical analysis.

The prompt:

"Act as a Senior QA Security Engineer. I am designing a Python function process_transaction(amount: float, user_balance: float, currency: str).

Analyze this interface for potential security risks, logical errors, and boundary conditions. Do not write the implementation. Instead, list 5-7 specific 'killer' edge cases that would break a naive implementation. Focus on financial accuracy and security."
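If you reuse this prompt across tickets, it can help to keep it as a template. The sketch below is one way to do that; `build_adversarial_prompt` and its parameters are hypothetical names, and the resulting string is simply text to paste into (or send to) whichever assistant you use.

```python
def build_adversarial_prompt(
    signature: str, focus: str, min_cases: int = 5, max_cases: int = 7
) -> str:
    """Hypothetical helper: fill in the adversarial prompt for one function signature."""
    return (
        "Act as a Senior QA Security Engineer. "
        f"I am designing a Python function {signature}.\n\n"
        "Analyze this interface for potential security risks, logical errors, "
        "and boundary conditions. Do not write the implementation. "
        f"Instead, list {min_cases}-{max_cases} specific 'killer' edge cases "
        "that would break a naive implementation. "
        f"Focus on {focus}."
    )


prompt = build_adversarial_prompt(
    "process_transaction(amount: float, user_balance: float, currency: str)",
    focus="financial accuracy and security",
)
print(prompt)
```

Keeping the persona and the "do not write the implementation" instruction fixed, and varying only the signature and focus, makes the threat-model step repeatable.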

The AI Output (Summarized)

Floating point drift: 0.1 + 0.2 does not equal 0.3 exactly, because IEEE 754 binary floats cannot represent these values precisely.
Negative transactions: Sending -100 could credit the user instead of debiting.
Infinite/NaN inputs: Passing math.inf or NaN.
Currency normalization: "USD" vs. "usd" vs. " USD" (with leading whitespace).
Precision attacks: Submitting 10.0000001 when the system only supports 2 decimal places.
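Most of these threats can be reproduced in a few lines at the Python prompt, which is a quick way to confirm the AI is describing real behavior rather than guessing. The snippet below is an illustrative sketch using only the standard library.

```python
import math
from decimal import Decimal

# Floating point drift: binary floats cannot represent 0.1 or 0.2 exactly.
print(0.1 + 0.2 == 0.3)                                        # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))       # True

# Infinite/NaN inputs: NaN compares False against everything, including itself.
print(math.nan > 100, math.nan <= 100, math.nan == math.nan)   # False False False
print(math.isfinite(math.inf), math.isfinite(math.nan))        # False False

# Currency normalization: trim and upper-case before comparing.
print({c.strip().upper() for c in ("USD", "usd", " USD")})     # {'USD'}

# Precision attacks: count decimal places on the Decimal form.
print(-Decimal("10.0000001").as_tuple().exponent)              # 7
```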

Step 2: Generating the "Red" Test Suite

Now we ask the AI to convert these specific threats into a Pytest suite.

    import math

    import pytest

    # We haven't written the function yet, but we import it
    from payment_system import process_transaction


    def test_reject_negative_amount():
        """Security: Prevent attackers from crediting themselves by sending negative amounts."""
        with pytest.raises(ValueError, match="Amount must be positive"):
            process_transaction(-50.00, 100.00, "USD")


    def test_accepts_amount_with_floating_point_drift():
        """Accuracy: Ensure financial math handles floating point drift correctly."""
        # A naive float comparison fails this; it needs Decimal or an epsilon check
        user_balance = 0.30
        # 0.1 + 0.2 is actually 0.30000000000000004 in Python floats
        cost = 0.1 + 0.2

        # This should be approved, not reported as insufficient funds
        assert process_transaction(cost, user_balance, "USD") is True


    def test_reject_excessive_precision():
        """Business Logic: Reject amounts with more than 2 decimal places."""
        with pytest.raises(ValueError, match="Invalid precision"):
            process_transaction(10.001, 100.00, "USD")


    def test_reject_infinite_values():
        """Stability: Input validation must block non-finite numbers."""
        with pytest.raises(ValueError, match="Invalid input"):
            process_transaction(math.inf, 100.00, "USD")

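If you run this now, nothing passes. Until payment_system.process_transaction exists, pytest stops at the import error during collection; with an empty stub in place, every test fails on its own. Either way, this is the "Red" state we want.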

Step 3: The "Green" Implementation

Now, we write the implementation. Because our tests are so strict, we are forced to write high-quality code immediately. We can't use simple floats; we have to handle the precision edge cases identified by the AI.

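    import math
    from decimal import Decimal, InvalidOperation, ROUND_HALF_UP

    CENT = Decimal("0.01")
    # Differences smaller than this are treated as float drift, not extra precision
    DRIFT_TOLERANCE = Decimal("1e-9")


    def process_transaction(amount, user_balance, currency):
        # Note: currency is not validated yet; no test in the suite covers it.

        # 1. Input Sanitization (The "Infinite" Case)
        for value in (amount, user_balance):
            if (isinstance(value, bool) or not isinstance(value, (int, float))
                    or not math.isfinite(value)):
                raise ValueError("Invalid input type")

        # 2. Precision Handling (The "Floating Point" Case)
        # Convert to Decimal for financial math, then snap the amount to whole cents
        amount_dec = Decimal(str(amount))
        balance_dec = Decimal(str(user_balance))
        try:
            amount_cents = amount_dec.quantize(CENT, rounding=ROUND_HALF_UP)
        except InvalidOperation:
            raise ValueError("Invalid input: amount out of range") from None

        # 3. Business Logic Check (The "Excessive Precision" Case)
        # Drift such as 0.30000000000000004 is tolerated; 10.001 is not
        if abs(amount_dec - amount_cents) > DRIFT_TOLERANCE:
            raise ValueError("Invalid precision: max 2 decimal places")

        # 4. Security Check (The "Negative Amount" Case)
        if amount_cents <= 0:
            raise ValueError("Amount must be positive")

        # 5. Logic Execution: True = approved, False = insufficient funds
        return balance_dec >= amount_cents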

Why This Matters

If we had used a standard "Write me some tests" prompt, the AI likely would have given us test_valid_transaction and test_insufficient_funds.

By using the Adversarial/Threat-Model approach, we forced the discovery of:

Data type validation: Handling NaN and Inf.
Decimal precision: Forcing the use of Decimal over float to pass the precision tests.
Input sanitization: Catching negative inputs.
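One subtlety is worth checking when you combine the drift and precision requirements: a drifted float converted with `Decimal(str(...))` carries far more than two decimal places, so the two tests pull in opposite directions. The sketch below shows one way to reconcile them, by snapping to cents and rejecting only when the difference exceeds a tolerance. The helper name `to_cents` and the `1e-9` tolerance are illustrative assumptions, not values prescribed by the threat model.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
DRIFT_TOLERANCE = Decimal("1e-9")  # assumption: smaller differences are float noise


def to_cents(value: float) -> Decimal:
    """Hypothetical helper: snap a float to 2 decimal places or reject it."""
    exact = Decimal(str(value))
    snapped = exact.quantize(CENT, rounding=ROUND_HALF_UP)
    if abs(exact - snapped) > DRIFT_TOLERANCE:
        raise ValueError("Invalid precision: max 2 decimal places")
    return snapped


drifted = Decimal(str(0.1 + 0.2))
print(drifted)                        # 0.30000000000000004
print(drifted.as_tuple().exponent)    # -17, far below -2
print(to_cents(0.1 + 0.2))            # 0.30
```

A check based only on the exponent would treat the drifted value the same as a deliberate 10.001, so the tolerance is what lets both tests pass at once.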

Best Practices for AI-Driven TDD

Don't paste code first: If you paste your existing code and ask for tests, the AI will often write tests that confirm your existing bugs. Ask for tests based on requirements first.
Use "fuzzing" prompts: Ask the AI: "Generate a Python dictionary of inputs that are likely to cause this function to crash."
Iterate on failure: If the AI generates a test that fails, don't immediately assume the test is wrong. Often, the test is right, and your implementation is fragile.
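The "fuzzing" prompt pairs naturally with a table-driven check. The sketch below assumes the AI returned a dictionary like `HOSTILE_INPUTS` (the entries here are illustrative, not real model output) and uses a hypothetical stand-in validator, `validate_amount`, to show the contract being checked: hostile input may be rejected with `ValueError`, but it should never escape as any other exception.

```python
import math

# Illustrative entries; a real list would come from the AI's answer.
HOSTILE_INPUTS = {
    "negative": -50.0,
    "zero": 0.0,
    "infinity": math.inf,
    "nan": math.nan,
    "string": "100",
    "none": None,
    "boolean": True,
}


def validate_amount(amount):
    """Hypothetical stand-in for the function under test."""
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        raise ValueError("Invalid input type")
    if not math.isfinite(amount):
        raise ValueError("Invalid input: non-finite amount")
    if amount <= 0:
        raise ValueError("Amount must be positive")
    return amount


def run_fuzz_table(func, cases):
    """Call func on every case; return the labels that crashed with a non-ValueError."""
    crashes = []
    for label, value in cases.items():
        try:
            func(value)
        except ValueError:
            pass  # controlled rejection: acceptable
        except Exception:
            crashes.append(label)  # e.g. TypeError: a real finding
    return crashes


print(run_fuzz_table(validate_amount, HOSTILE_INPUTS))   # []
print(run_fuzz_table(lambda x: x > 0, HOSTILE_INPUTS))   # ['string', 'none']
```

The second call shows what a fragile implementation looks like under the same table: comparing a string or `None` with a number raises `TypeError`, which the harness reports as a crash.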

Conclusion

AI doesn't just speed up coding; it can act as a critical partner in the design phase. By shifting AI usage to the left — before a single line of implementation code is written — you can generate edge-case tests that force you to write cleaner, safer, and more robust systems.

Next step: Try this on your next ticket. Before writing code, feed your requirements to ChatGPT or Claude with the prompt: "What are the security edge cases for this feature?" and see what tests it suggests.
