When AI Crashes: Classifying Failure Modes in Safety-Critical Software

AI fails silently in safety-critical systems — classify failures and enforce safety with voting, OOD detection, and a Simplex-style deterministic override.

Rahul Karne

Mar. 31, 26 · Analysis

Treating an AI system like any other piece of software is dangerous. The code can run without a single error while the model is 99% confident that a kangaroo is a pedestrian.

The failures discussed here fall into three categories: perception failures, planning failures, and adversarial or distributional failures. They are hard to detect because they produce no error message that a developer would see during development. Instead of reporting that it does not understand the input, the model returns a plausible-looking prediction such as "speed limit 45 mph".
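A minimal sketch of why a classifier cannot say "I don't understand": a softmax output layer always yields some label with some confidence, whatever the input was. The label set and the logit values below are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical label set; a real perception model would have many more classes.
LABELS = ["pedestrian", "cyclist", "vehicle"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return (label, confidence). There is no 'unknown' outcome."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Assumed logits for an input unlike anything in the training set.
label, confidence = classify([9.0, 1.0, 0.5])
print(label, round(confidence, 4))
```

Nothing in this path raises an exception or logs a warning, so a wrong answer and a right answer look identical to the calling code.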

Silent failure is the property that most separates AI from traditional software in safety-critical systems. A silent failure is one in which the system fails without producing an error message, crash report, or exception, so the incorrect output looks just like a correct one. Safety-critical systems must therefore be engineered with the expectation that silent failures will occur, and they must keep operating safely when the model's results are wrong. The engineer's goal shifts from building a perfect model to building a system that stays resilient and safe when the model is incorrect.

To build such systems, engineers need to find these silent failures and classify each one into one of the three categories covered in the sections below: perception, planning, and adversarial or distributional failures.

A practical way to apply these categories is to relate each one to:

Its source (environment, sensor, model, or planner),
Its detectability, meaning how you would determine that the failure exists (invariant violations, uncertainty signals, inconsistency checks), and
The system's response once the failure is detected (minimum-risk maneuver, slow-down, hand-over).

Classifying failures has value only if it enables a predictable strategy for responding to them safely.
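One way to make this mapping explicit is a small registry that ties each failure mode to its source, its detection signal, and its response. The sketch below is illustrative only: the entry names and the specific pairings are assumptions, not a validated hazard analysis, and the choice to treat unclassified failures with the most conservative response is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    ENVIRONMENT = "environment"
    SENSOR = "sensor"
    MODEL = "model"
    PLANNER = "planner"


class Response(Enum):
    MINIMUM_RISK_MANEUVER = "minimum-risk maneuver"
    SLOW_DOWN = "slow-down"
    HAND_OVER = "hand-over"


@dataclass(frozen=True)
class FailureMode:
    category: str       # perception, planning, or adversarial/distributional
    source: Source      # where the failure originates
    detection: str      # how the failure would be noticed
    response: Response  # what the system does once it is detected


# Hypothetical entries; the pairings are assumptions made for illustration.
FAILURE_MODES = {
    "sensor_fusion_conflict": FailureMode(
        "perception", Source.SENSOR, "inconsistency check", Response.SLOW_DOWN),
    "classification_blindness": FailureMode(
        "perception", Source.MODEL, "uncertainty signal", Response.HAND_OVER),
    "shortcut_problem": FailureMode(
        "planning", Source.PLANNER, "invariant violation",
        Response.MINIMUM_RISK_MANEUVER),
}


def response_for(failure_name: str) -> Response:
    """Look up the planned response for a detected failure.

    Unclassified failures fall back to the most conservative response.
    """
    mode = FAILURE_MODES.get(failure_name)
    return mode.response if mode else Response.MINIMUM_RISK_MANEUVER
```

Because the table is plain data, it can be reviewed and tested without touching the model.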

1. Perception Failures (The "Eyes" Break)

Perception failures occur when the data received from the sensors is accurate but the model interprets it incorrectly.

Ghost Objects: The model identifies an object that is not present, and the vehicle brakes for it, risking a rear-end collision with following traffic.

Classification Blindness: The model is unable to identify an object because the object is Out of Distribution (OOD), meaning it is unlike anything in the training data.

Example: A self-driving car developed in California is unable to identify a kangaroo in Australia.

Sensor Fusion Conflicts: The camera reports "road clear" while the LIDAR reports "wall ahead", and the model trusts the camera as its primary source of information.

Engineering Mitigation: Voting Architecture

Run three different models and act on an output only when at least two of them agree. If no two models agree, switch to a "safe state" (slow down).

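    Python

    # Placeholder action constants; a real system would define its own.
    EMERGENCY_BRAKE = "EMERGENCY_BRAKE"
    HANDOVER_TO_HUMAN = "HANDOVER_TO_HUMAN"

    def fuse_sensors(camera_obj, lidar_obj, radar_obj):
        # "class" is a reserved word in Python, so the detected class
        # is read from an attribute named "label" instead.
        if camera_obj.label != lidar_obj.label:
            # Conflict detected: camera and LIDAR disagree.
            if radar_obj.time_to_collision < 2.0:
                # Too close to wait for a human to take over.
                return EMERGENCY_BRAKE
            else:
                return HANDOVER_TO_HUMAN
        # Sensors agree, so the camera-derived action is used.
        return camera_obj.action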
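The fusion check above compares only the camera and LIDAR labels. The two-of-three vote described earlier can be sketched as follows. This is a minimal illustration: the function name, the `SAFE_STATE` constant, and the example labels are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

SAFE_STATE = "SLOW_DOWN"  # hypothetical name for the safe state


def majority_label(labels: Sequence[str], quorum: int = 2) -> Optional[str]:
    """Return the label that at least `quorum` models agree on, or None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= quorum else None


# Two of three hypothetical models agree, so their label is accepted.
agreed = majority_label(["pedestrian", "pedestrian", "cyclist"])
action = "proceed_with:" + agreed if agreed else SAFE_STATE

# All three disagree, so the system falls back to the safe state.
no_quorum = majority_label(["pedestrian", "cyclist", "vehicle"])
fallback = "proceed_with:" + no_quorum if no_quorum else SAFE_STATE
```

Voting only helps when the models fail independently, which is why the text calls for three different models rather than three copies of the same one.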

2. Planning Failures (The "Brain" Breaks)

Planning failures occur when perception is accurate but the AI makes a decision based on it that leads to a catastrophic outcome. This typically happens when the reward function is too simple or too narrow to capture the true intent of the task.

The "Shortcut" Problem: An AI trained to minimize travel time may discover that breaking a traffic law gets it to the destination faster. Unless the reward function explicitly penalizes breaking the law, the AI treats traffic laws as suggestions.
Frozen Robot Syndrome: Under uncertainty, the AI planner becomes overly cautious and may conclude that the safest course of action is not to move at all.

Engineering Mitigation: Safety Envelope

Create a deterministic "Safety Envelope" (also called a guardrail or watchdog) around the AI planner. The Safety Envelope overrides the neural network with rule-based logic.
Rule: If speed > 0 and obstacle_distance < 5m Then force_brake = True
No matter what the neural network recommends (e.g., "accelerate to pass"), the Safety Envelope will physically cut the throttle.

The primary engineering principle is that planners are optimization engines, not moral agents. Planners optimize towards their objective functions. When the objective function is incomplete, planners exploit loopholes ("shortcuts") and/or fail to act ("frozen robot syndrome"). Therefore, the Safety Envelope must contain non-negotiable invariants (e.g., no collision, maintain a minimum following distance, do not exceed maximum acceleration, obey all legal constraints when applicable) that are independently testable and auditable -- separate from the training loop of the ML model.
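The braking rule stated earlier can be sketched as a small deterministic function that sits between the planner and the actuators. The `Command` type, the function name, and the 0.0 to 1.0 actuator ranges are assumptions made for illustration; only the 5 m threshold and the rule itself come from the text.

```python
from dataclasses import dataclass

MIN_OBSTACLE_DISTANCE_M = 5.0  # threshold from the rule in the text


@dataclass(frozen=True)
class Command:
    throttle: float  # assumed range 0.0 (none) to 1.0 (full)
    brake: float     # assumed range 0.0 (none) to 1.0 (full)


def apply_safety_envelope(proposed: Command, speed: float,
                          obstacle_distance_m: float) -> Command:
    """Pass the planner's command through unless the invariant is violated."""
    if speed > 0 and obstacle_distance_m < MIN_OBSTACLE_DISTANCE_M:
        # Override: cut the throttle and force braking.
        return Command(throttle=0.0, brake=1.0)
    return proposed


# The planner wants to "accelerate to pass" with an obstacle 3 m ahead.
planner_cmd = Command(throttle=0.8, brake=0.0)
overridden = apply_safety_envelope(planner_cmd, speed=10.0,
                                   obstacle_distance_m=3.0)
unchanged = apply_safety_envelope(planner_cmd, speed=10.0,
                                  obstacle_distance_m=50.0)
```

The function contains no learned parameters, so it can be unit-tested and audited independently of the model's training loop.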

3. Adversarial and Distributional Failures

These failures occur when the environment acts maliciously or unexpectedly.

Adversarial Patches: A sticker placed on a stop sign looks like graffiti to humans but causes the CNN to classify the sign as "Speed Limit 60".
Data Drift: A model trained on 2020 medical images may fail to diagnose accurately from images taken in 2025 because imaging equipment or the standard of care has changed.

Engineering Mitigation: Out-of-Distribution (OOD) detection

Detect when an input falls outside what the model was trained on, so the system knows when the model's output should not be trusted.

Technique: Calculate the Mahalanobis distance between the current input and the distribution of the training data (its mean and covariance).
If Distance(input, training_data) > Threshold then flag the input as "unknown" instead of guessing.
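The threshold check can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: the training data is summarized by a single mean and covariance, the inputs are small feature vectors (in practice these are often model embeddings), and the threshold of 3.0 is arbitrary. A singular covariance matrix is not handled here. The helper names are hypothetical.

```python
import math


def fit_gaussian(train):
    """Return (mean, covariance) for a list of equal-length feature vectors."""
    n, d = len(train), len(train[0])
    mean = [sum(row[j] for row in train) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in train) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mahalanobis(x, mean, cov):
    """sqrt((x - mean)^T cov^-1 (x - mean)), without forming the inverse."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    y = solve(cov, diff)
    return math.sqrt(sum(d * yi for d, yi in zip(diff, y)))


def check_input(x, mean, cov, threshold=3.0):
    """Flag the input as "unknown" instead of letting the model guess."""
    if mahalanobis(x, mean, cov) > threshold:
        return "unknown"
    return "in_distribution"


# Toy training features, chosen only to make the arithmetic easy to follow.
mean, cov = fit_gaussian([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
near = check_input([1.0, 1.0], mean, cov)
far = check_input([10.0, 10.0], mean, cov)
```

An input flagged as "unknown" should be routed to the safe response rather than passed to the model's prediction path.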

Architectural Pattern: Two-Channel Safety System

A widely cited pattern for safety-critical AI (e.g., avionics and autonomous driving) is the Simplex Architecture.

Channel A (The AI): High-performance, complex, non-deterministic (e.g., "drive smoothly", "save gas", "change lanes").
Channel B (The Monitor): Simple, verifiable, deterministic code (e.g., "don't hit anything").
The Decision Module: By default, Channel A controls the vehicle. If Channel A requests an action that violates Channel B's safety rules, Channel B takes control of the vehicle and executes a minimum-risk maneuver (e.g., pull over).
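The three parts described above can be sketched as follows. This is a toy illustration, not a reference implementation of Simplex: the `State` fields, the action strings, the stand-in controller, and the single monitor rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

MINIMUM_RISK_MANEUVER = "pull_over"  # hypothetical fallback action


@dataclass(frozen=True)
class State:
    speed: float
    obstacle_distance_m: float


def ai_controller(state: State) -> str:
    """Channel A stand-in: a real system would call the learned planner."""
    return "accelerate"


def monitor_allows(state: State, action: str) -> bool:
    """Channel B: simple deterministic rule ("don't hit anything")."""
    if action == "accelerate" and state.obstacle_distance_m < 5.0:
        return False
    return True


def decision_module(state: State,
                    channel_a: Callable[[State], str] = ai_controller) -> str:
    """Use Channel A by default; fall back when the monitor objects."""
    action = channel_a(state)
    if monitor_allows(state, action):
        return action
    return MINIMUM_RISK_MANEUVER


clear_road = decision_module(State(speed=10.0, obstacle_distance_m=50.0))
blocked_road = decision_module(State(speed=10.0, obstacle_distance_m=3.0))
```

Channel A can be swapped out by passing a different `channel_a` callable, while the monitor and decision module stay unchanged.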

This design also supports a practical reality of the development lifecycle: models change constantly (new data, new sensors, new environments). Even when Channel A improves overall, it may degrade in some edge cases because of drift or unseen conditions. The Simplex approach expects this and maintains safety through a stable, verifiable monitor, so updates to the system do not silently increase risk. Rather than trusting accuracy metrics blindly, it turns AI deployment into a managed risk process.

Conclusion

We must not treat AI models like standard software components. They are probabilistic, opaque, and prone to silent failures. As engineers, our task is to design robust systems that can operate safely even when the model is incorrect.

Opinions expressed by DZone contributors are their own.
