Software Specs 2.0: Evolving Requirements for the AI Era (2025 Edition)

Learn about key qualities for writing software requirements—documented, correct, testable, and more—tailored for both human developers and AI code generation.

By Stelios Manioudakis, PhD · Jun. 09, 25 · Analysis

Any form of data that we can use to make decisions for writing code, be it requirements, specifications, user stories, or the like, must have certain qualities. In agile development, for example, we have the INVEST qualities. More specifically, a user story must be Independent of all others and Negotiable, i.e., not a specific contract for features. It must be Valuable (or vertical) and Estimable (to a good approximation). It must also be Small, so it fits within an iteration, and Testable (in principle, even if there isn't a test for it yet).

This article goes beyond agile, waterfall, rapid application development, and the like. I will summarize a set of general and foundational qualities as a blueprint for software development.

The fundamental principles of software requirements still apply when we leverage AI for code generation, but their emphasis and application must adapt. This ensures that the AI, which lacks human intuition and context, can produce code that is not only functional but also robust, maintainable, and aligned with project constraints.

For each fundamental quality, I first explain its purpose and then discuss its usefulness and applicability when code is generated by AI.

The level of detail at which I want to cover this topic necessitates two articles. This article summarizes the "what" we should do. A follow-up article gives an elaborate example of "how" we can do that.

Documented

Software requirements must be documented and should not just exist in our minds. Documentation can be as lightweight as possible, as long as it is easy to maintain. After all, documentation's purpose is to be a single source of truth.

When we say requirements must be "Documented" for human developers, we mean they need to be written down somewhere accessible (wiki, requirements doc, user stories, etc.). If they only exist in someone's head or if they are scattered across chat messages, they probably won't be very effective. This ensures alignment, provides a reference point, and helps with onboarding. While lightweight documentation is often preferred (like user stories), there's usually an implicit understanding that humans can fill in gaps through conversation, experience, and shared context.

For AI code generation, the "Documented" quality takes on a more demanding role:

  1. The documentation is the primary input: AI code assistants don't attend planning meetings. They may not ask clarifying questions in real time (though some tools allow interactive refinement). Currently, they lack the years of contextual experience a human developer has. The written requirement document could be the most direct, and often the sole, instruction set the AI receives for a specific task. Its quality can directly dictate the quality of the generated code.
  2. Need for machine interpretability: While we can understand natural language fairly well, even with some ambiguity, AI models perform best with clear, structured, and unambiguous input. This means that the format and precision of the documentation could be a game-changer. Vague language can lead to unpredictable or incorrect assumptions by the AI.
  3. Structured formats aid consistency: We could use Gherkin for BDD, specific prompt templates, or even structured data formats like JSON/YAML for configuration-like requirements. Using predefined structures or templates for requirements can be very useful: the necessary details (like error handling, edge cases, and non-functional requirements) are consistently considered and provided to the AI, which can lead to more predictable and reliable code generation (see the sketch after this list).
  4. Single source of truth is paramount: Because the document is the spec fed to the AI, ensuring it's the definitive, up-to-date version is critical. Changes must be reflected in the documentation before regeneration is attempted.
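
For illustration, here is a minimal sketch of such a structured requirement, expressed as Python data that can be serialized and pasted into a prompt. The field names and values are my own assumptions, not a standard schema.

    import json

    # Illustrative only: the fields below are assumptions, not a standard schema.
    requirement = {
        "id": "REQ-042",
        "title": "User registration",
        "behavior": "Create a user account from an email address and a password",
        "inputs": {"email": "valid email address", "password": "8-64 characters, at least one digit"},
        "outputs": {"success": "HTTP 201 with a JSON body containing user_id and email"},
        "error_handling": {
            "duplicate_email": "HTTP 409 with error code EMAIL_EXISTS",
            "invalid_input": "HTTP 422 with field-level validation messages",
        },
        "edge_cases": ["emails differing only by case count as duplicates"],
        "nfrs": {"p95_latency_ms": 200, "privacy": "never log raw passwords"},
        "assumptions": ["email verification is handled by a separate requirement"],
    }

    # The same structure can be rendered verbatim into the prompt for an AI code assistant.
    prompt = "Implement the following requirement exactly as specified:\n" + json.dumps(requirement, indent=2)
    print(prompt)

Whether you use JSON, YAML, or Gherkin, the point is the same: the template forces the details the AI cannot infer to be stated explicitly.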

Correct 

We must understand correctly what is required from the system and what is not required. This may seem simple, but how many times have we implemented requirements that were wrong? The Garbage In, Garbage Out (GIGO) rule applies here.

For AI code generation, the importance of correctness becomes clear if we consider that:

  1. AI executes literally: AI code generators are powerful tools for translating instructions (requirements) into code. However, they typically lack deep domain understanding. Currently, they lack the "common sense" to question if the instructions themselves are logical or align with broader business goals. If you feed an AI a requirement that is clearly stated but functionally wrong, the AI will likely generate code that perfectly implements that wrong functionality.
  2. Reduced opportunity for implicit correction: We might read a requirement and, based on our experience or understanding of the project context, spot a logical flaw or something that contradicts a known business rule. We might say, "Are you sure this is right? Usually, we do X in this situation." This provides a valuable feedback loop to catch incorrect requirements early. An AI is much less likely to provide this kind of proactive, context-aware sanity check. It usually assumes the requirements it receives are the intended truth.
  3. Validation is key: The burden of ensuring correctness falls heavily on the requirement definition and validation process before the AI gets involved. The people defining and reviewing the requirements must be rigorous in confirming that what they are asking for is truly what is needed.

Complete

This is about having no missing attributes or features. Incomplete requirements are an issue for humans too, but we can often infer missing details, ask clarifying questions, or rely on implicit knowledge. Even then, requirements may remain incomplete after hours of meetings and discussions. With AI-generated code, I've seen assistants go both ways: in some cases they generate only what is explicitly stated, and the resulting gaps lead to incomplete features or potentially incorrect assumptions; in other cases, the assistant spots the missing attributes and makes suggestions.

In any case, for completeness, I think it's still worth being as explicit as we can be (a short code sketch follows the list below). Requirements must detail not just the "happy path" but also:

  • Edge cases: Explicitly list known edge cases.
  • Error handling: Specify how errors should be handled (e.g., specific exceptions, return codes, logging messages).
  • Non-functional requirements (NFRs): Performance targets, security constraints (input validation, output encoding, authentication/authorization points), scalability considerations, and data handling/privacy rules must be stated clearly.
  • Assumptions: Explicitly list any assumptions being made.
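
To make the payoff concrete, here is a minimal, hypothetical sketch (the function and exception names are invented) of what such an explicitly complete requirement pins down in generated code instead of leaving it to the AI's assumptions:

    # Hypothetical sketch; names and rules are illustrative, not a prescribed design.
    _existing_emails: set[str] = set()

    class DuplicateEmailError(Exception):
        """Raised when the email is already registered (an explicitly required error case)."""

    def register_user(email: str, password: str) -> dict:
        if "@" not in email:
            raise ValueError("invalid email format")  # error handling stated in the requirement
        if not (8 <= len(password) <= 64):
            raise ValueError("password must be 8-64 characters")
        normalized = email.lower()  # stated edge case: duplicates are case-insensitive
        if normalized in _existing_emails:
            raise DuplicateEmailError(normalized)
        _existing_emails.add(normalized)
        return {"user_id": len(_existing_emails), "email": normalized}

Every branch above corresponds to something written down; nothing is left for the assistant to guess.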

Unambiguous

When we read the requirements, we should all understand the same thing. Ambiguous requirements may lead to misunderstandings, long discussions, and meetings for clarification. They may also lead to rework and bugs. In the worst case, requirements may be interpreted differently, and we may develop something other than what was expected. For AI assistants, ambiguity looks particularly dangerous:

  1. Patterns and rules: AI models process the input text according to the patterns and rules they've learned. They don't inherently "understand" the underlying business goal or possess common sense in the human way. If a requirement can be interpreted in multiple ways, the AI might arbitrarily choose one interpretation based on its training data. This may not be the one intended by the stakeholders.
  2. Unpredictable results: Ambiguity leads directly to unpredictability in the generated code. You might run the same ambiguous requirement through the AI (or slightly different versions of it) and get vastly different code implementations. Each time you regenerate the code, the AI assistant may handle the ambiguity in a different way.
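
A toy illustration with an invented requirement: the statement "apply a 10% discount to orders over $100" admits at least two readings, and an assistant may silently pick either one.

    def discount_exclusive(total: float) -> float:
        # Reading 1: "over $100" means strictly greater than 100.
        return total * 0.9 if total > 100 else total

    def discount_inclusive(total: float) -> float:
        # Reading 2: "over $100" includes exactly 100.
        return total * 0.9 if total >= 100 else total

    # For an order of exactly $100, the two readings diverge: 100.0 vs. 90.0.
    print(discount_exclusive(100.0), discount_inclusive(100.0))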

Consistent

Consistency in requirements means using the same terminology for the same concepts. It means that statements don't contradict each other and maintain a logical flow across related requirements. For human teams, minor inconsistencies can often be resolved through discussion or inferred context. In the worst case, inconsistency can also lead to bugs and rework.

However, for AI code generators, consistency is vital for several reasons:

  1. Pattern recognition: AI assistants will try to extract patterns from your requirements. Because LLMs lack an internal semantic model of the system, they won't infer that "Client" and "User" refer to the same entity unless that connection is made explicit. This can lead to separate, potentially redundant code structures, data fields, or logic paths, or to a failure to correctly link related functionalities (see the sketch after this list).
  2. Inability to resolve contradictions: AI models struggle with logical contradictions. If one requirement states "Data must be deleted after 30 days," and another related requirement states "User data must be archived indefinitely," the AI may not ask for clarification or determine the correct business rule. It might implement only one rule (often the last one it processed), try to implement both (leading to errors), or fail unpredictably.
  3. Impact on code quality: Consistency in requirements often translates to consistency in the generated code. If requirements consistently use specific naming conventions for features or data elements, the AI is more likely to follow those conventions in the generated code (variables, functions, classes). Inconsistent requirements can lead to inconsistently structured and named code. This makes it harder to understand and maintain.
  4. Logical flow: Describing processes or workflows requires a consistent logical sequence. Jumbled or contradictory steps in the requirements can confuse the AI about the intended order of operations.
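
As a small, hypothetical sketch of the "Client" versus "User" problem from point 1, inconsistent terminology can surface as parallel, redundant structures in the generated code:

    from dataclasses import dataclass

    @dataclass
    class User:        # generated from requirements that say "user"
        id: int
        email: str

    @dataclass
    class Client:      # redundant twin generated from requirements that say "client"
        id: int
        email: str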

Testable

We must have an idea about how to test that the requirements are fulfilled. A requirement is testable if there are practical and objective ways to determine whether the implemented solution meets it. Testability is paramount for both human-generated code and AI-generated code. Our confidence must primarily come from verifying code behavior. Rigorous testing against clear, testable requirements is the primary mechanism to ensure that the code is reliable and fit for purpose. Testable requirements provide the blueprint for verification. 

Testability calls for smallness, observability, and controllability. A small requirement here implies that it results in a small unit of code under test. This is where decomposability, simplicity, and modularity become important. Smaller, well-defined, and simpler units of code with a single responsibility are inherently easier to understand, test comprehensively, and reason about than large, monolithic, and complex components. If an AI generates a massive, tangled function, even if it "works" for the happy path, verifying all its internal logic and edge cases is extremely difficult. You can't be sure what unintended behaviours might lurk within. For smallness, decompose large requirements into smaller, more manageable sub-requirements. Each sub-requirement should ideally describe a single, coherent piece of functionality with its own testable outcomes. 

Observability is the ease with which you can determine the internal state of a component and its outputs, based on its inputs. This holds true before, during, and after a test execution. Essentially, can you "see" what the software is doing and what its results are? To test, we need to be able to observe behaviour or state. If the effects of an action are purely internal and not visible, testing is difficult. For observability, we need clear and comprehensive logging, relevant state exposed via getters or status endpoints, detailed and structured error messages, event publishing, or effective use of debuggers. This way we can verify intermediate steps, understand the flow of execution, and diagnose why a test might be failing. In requirements, this translates into points like the following:

  • Describe external behavior: Focus on what the system does that can be seen, not how it does it internally (unless the internal "how" impacts an NFR like performance that needs constraint).
  • Specify outputs: Detail the format, content, and destination of any outputs (UI display, API responses, file generation, database entries, logged messages).
    • Example: Upon successful registration, the system MUST return an HTTP 201 response with a JSON body containing user_id and email.
  • Define state changes: If a state change is an important outcome, specify how that state can be observed.
    • Example: After order submission, the order status MUST be 'PENDING_PAYMENT' and this status MUST be retrievable via the /orders/{orderId}/status endpoint.
  • Require logging for key events: Log key state changes and decision points at INFO level.
    • Example: The system MUST log an audit event with event_type='USER_LOGIN_SUCCESS' and user_id upon successful login.
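
As a minimal sketch of the logging example above, using only the Python standard library (the event name mirrors the example requirement; everything else is illustrative):

    import logging

    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    audit_logger = logging.getLogger("audit")

    def on_login_success(user_id: int) -> None:
        # Structured key=value fields make the event easy to assert on in tests
        # and to query in log aggregation.
        audit_logger.info("event_type=USER_LOGIN_SUCCESS user_id=%s", user_id)

    on_login_success(42)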

Controllability is the ease with which we can "steer" a component into specific states or conditions. How easily can we provide a component with the necessary inputs (including states of dependencies) to execute a test and isolate it from external factors that are not part of the test? We can achieve this through techniques like dependency injection (DI), designing clear APIs and interfaces, using mock objects or stubs for dependencies, and providing configuration options. This allows us to easily set up specific scenarios, test individual code paths in isolation, and create deterministic tests.
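
To make this concrete, here is a minimal, hypothetical sketch of controllability through dependency injection; the service, gateway, and stub names are invented:

    from typing import Protocol

    class PaymentGateway(Protocol):
        def charge(self, amount_cents: int) -> bool: ...

    class CheckoutService:
        # The gateway is injected, so a test can substitute a stub for the real API.
        def __init__(self, gateway: PaymentGateway) -> None:
            self._gateway = gateway

        def checkout(self, amount_cents: int) -> str:
            return "PAID" if self._gateway.charge(amount_cents) else "PAYMENT_FAILED"

    class AlwaysDeclinesStub:
        def charge(self, amount_cents: int) -> bool:
            return False  # steer the service into the failure path on demand

    def test_checkout_reports_declined_charge():
        service = CheckoutService(gateway=AlwaysDeclinesStub())
        assert service.checkout(1999) == "PAYMENT_FAILED"

Because the stub is deterministic and local, the test is fast, isolated, and easy to steer into specific scenarios.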

Problems Caused by Poor Controllability

Hardcoded Dependencies

They can force you to test your unit along with its real dependencies. This turns unit tests into slow, potentially unreliable integration tests. You can't easily simulate error conditions from the dependency.

Reliance on Global State

If a component reads or writes to global variables or singletons, it's hard to isolate tests. One test might alter the global state, causing subsequent tests to fail or behave unpredictably. Resetting the global state between tests can be complex.

Lack of Clear Input Mechanisms

If a component's behaviour is triggered by intricate internal state changes or relies on data from opaque sources rather than clear input parameters, it's difficult to force it into the specific state needed for a particular test.

Consequences

  • Slow tests: Tests that need to set up databases, call real APIs, or wait for real timeouts run slowly, discouraging frequent execution.
  • Flaky tests: Tests relying on external systems or shared state can fail intermittently due to factors outside the code under test (e.g., network issues, API rate limits).
  • Difficult to write and maintain: Complex setups and non-deterministic behaviour lead to tests that are hard to write, understand, and debug when they fail. The "Arrange" phase of a test becomes a huge effort.

Traceable 

Traceability in software requirements means being able to follow the life of a requirement both forwards and backwards. You should be able to link a specific requirement to the design elements, code modules, and test cases that implement and verify it. Conversely, looking at a piece of code or a test case, you should be able to trace it back to the requirement(s) it fulfills. Traceability tells us why that code exists and what business rule or functionality it's supposed to implement. Without this link, code can quickly become opaque "magic" that developers are hesitant to touch.

  1. Debugging and root cause analysis: When AI-generated code exhibits a bug or unexpected behavior, tracing it back to the source requirement is often the first step. Was the requirement flawed? Did the AI misinterpret a correct requirement? Traceability guides the debugging process.
  2. Maintenance and change impact analysis: Requirements inevitably change. If REQ-123 is updated, traceability allows you to quickly identify the specific code sections (potentially AI-generated) and the tests associated with REQ-123 that will need review, modification, or regeneration. Without traceability, finding all affected code sections becomes a time-consuming and error-prone manual search.
  3. Verification and coverage: Traceability helps verify that every requirement has corresponding code and tests. You can check if any requirements have been missed or if any generated code doesn't trace back to a valid requirement.
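
One lightweight way to keep those links visible, sketched here with an invented requirement ID and a custom pytest marker (custom markers should be registered in pytest.ini to avoid warnings):

    import pytest

    FREE_SHIPPING_THRESHOLD = 50.0  # REQ-201 (invented): orders at or above $50 ship free

    def shipping_cost(order_total: float) -> float:
        """Implements REQ-201; the ID makes the code searchable from the requirement."""
        return 0.0 if order_total >= FREE_SHIPPING_THRESHOLD else 4.99

    @pytest.mark.requirement("REQ-201")
    def test_req_201_orders_at_threshold_ship_free():
        assert shipping_cost(50.0) == 0.0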

Viable

A requirement is "Viable" if it can realistically be implemented within the project's given constraints. These constraints typically include available time, budget, personnel skills, existing technology stack, architectural patterns, security policies, industry regulations, performance targets, and the deployment environment.

  • Need for explicit constraints: To ensure that AI assistants generate viable code, the requirements must explicitly state the relevant constraints. These act as guardrails, guiding the AI towards solutions that are not just technically possible but also practical and appropriate for your specific project context. Perhaps your company has standardized on the FastAPI framework for Python microservices. Maybe direct database access from certain services is forbidden by the security policy. Maybe your deployment target is a low-memory container environment, or a specific external (paid) API suggested by the AI exceeds the project budget.
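
A hedged sketch of how such constraints might be written down alongside the requirement so the assistant treats them as guardrails; the specific values below are invented for illustration:

    constraints = {
        "framework": "FastAPI",  # e.g., a company standard for Python microservices
        "data_access": "via the orders service API only; no direct database connections",
        "runtime": "container limited to 256 MB RAM",
        "dependencies": "open-source libraries only; no paid external APIs",
    }

    prompt = "Implement the requirement under these constraints:\n" + "\n".join(
        f"- {name}: {value}" for name, value in constraints.items()
    )
    print(prompt)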

Wrapping Up

When writing requirements for AI-generated code, the fundamental principles remain, but the emphasis shifts towards:

  1. Extreme explicitness: Cover edge cases, errors, and NFRs meticulously.
  2. Unambiguity and precision: Use clear, machine-interpretable language.
  3. Constraint definition: Guide the AI by specifying architecture, tech stack, patterns, and NFRs.
  4. Testability: Define clear, measurable acceptance criteria. Smallness, observability, and controllability are important.
  5. Structured input: Format requirements for optimal AI consumption.

In essence, writing requirements for AI code generation means being more deliberate, detailed, and directive. It's about providing the AI with a high-fidelity blueprint that minimizes guesswork, maximizes the probability of generating correct, secure, efficient, and maintainable code, and aligns with project goals and technical standards. This involves amplifying the importance of qualities like completeness, unambiguity, and testability. It also involves evolving the interpretation of understandability to suit an AI "developer."

Currently, it seems that carefully crafting software requirements can also reduce hallucinations in AI-generated code, though requirements alone are not expected to eliminate them entirely. The quality and structure of the input prompt (including the requirements) significantly influence how prone the AI is to hallucinate details, but hallucinations also stem from model limitations, training data artifacts, and prompt-context boundaries. Such factors are beyond the scope of this article.

