Making AI Better: A Deep Dive Across Users, Developers, and Businesses
In this sequel to Making AI Faster, I explore why making AI better matters and share strategies across user, developer, and business perspectives.
Introduction - Making AI Better
In my previous article, I discussed why making AI faster, better, and cheaper is a critical need today, and I introduced my aim to draw on real-world experience to discuss how to do so. I also shared a deep dive into the main challenges and strategies for making AI faster, from three key perspectives: end users, AI developers, and businesses.
In this article, I take a deep dive into the second pillar: Making AI Better.
Why "Better" Matters
-
For end users: Better AI means easier use and greater trust. Whether it's Alexa responding to voice commands or Google surfacing relevant content, users expect products that just work. Fairness adds to the challenge: users lose trust when a model is unfair, for example a recruiting model that screens résumés with a gender bias learned from previous hires. As AI becomes more integrated into our lives, edge-case failures like these can erode trust.
-
For developers: From a developer’s perspective, better AI accelerates development velocity and improves the development experience. It includes using AI tools to improve productivity and adopting better processes that enable shipping smaller, incremental improvements with confidence. This allows developers to reduce technical debt and focus on innovation.
-
For businesses: Better AI systems bring operational efficiency and enable sustainable growth. Fair and accurate AI models lead to better customer retention. Transparent, compliant AI practices protect businesses from penalties.
End-User Perspective: Building Trust
Challenges
-
Wrong or inconsistent answers make people lose trust: Users quickly lose confidence in AI features that provide incorrect or inconsistent outputs. For example, Apple Maps sent Hawai'i tourists onto an unpaved, hazardous detour for two weeks, breaking user trust. (Civil Beat)
-
Biased answers are often unacceptable and cause brand damage: In early 2024, Google temporarily paused Gemini image generation owing to “inaccurate or even offensive” depictions of Nazi soldiers from World War II. Failing to address bias can not only harm users but can also halt multi-million dollar investments. (Google Blog)
-
Minority class underrepresentation creates systemic gaps: A Northwestern study found that dermatology AI systems were less accurate for dark-skinned patients than light-skinned ones. Such gaps can become self-reinforcing if doctors begin to leverage incorrect AI recommendations for dark-skinned patients and AI models are further trained on this feedback. (Northwestern)
Strategies
-
Build metric scorecards: To build better AI, start by deeply understanding the experience you want to deliver. Create a scorecard to track metrics for this experience. For example, Google Assistant uses a False Wake Rate metric to monitor how often the assistant activates when it shouldn’t. (Learn how to make your scorecard)
-
Evolve benchmarks: Static benchmarks quickly become obsolete as user behavior changes. We must keep updating benchmarks to include the new usage patterns and edge cases we discover.
-
Develop responsible AI guardrails: Integrate automated checks for fairness (equalized odds, equal opportunity) and toxicity (profanity rate, hate speech rate, safety red-teaming). These checks must be launch-blocking and must pass before new model versions are deployed to production.
-
Use synthetic data for testing: Leverage synthetic data to identify broken behaviors early. For example, Waymo has tested its self-driving stack over 20 billion simulated miles. Similar approaches can be used for AI in domains such as finance and healthcare, where production errors are very costly. (Waymo)
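As a rough illustration of a launch-blocking fairness check, the sketch below computes per-group true positive and false positive rates on a labeled evaluation set and blocks the launch when the gap between groups exceeds a threshold. The `Example` record, the `launch_allowed` helper, the group names, and the 0.1 threshold are all assumptions made for this example, not part of any specific framework.

```python
from dataclasses import dataclass


@dataclass
class Example:
    group: str       # protected attribute value (hypothetical groups "A" and "B")
    label: int       # ground truth: 1 = positive, 0 = negative
    prediction: int  # model output: 1 = positive, 0 = negative


def positive_rate(examples, label):
    """Share of examples with the given true label that were predicted positive."""
    relevant = [e for e in examples if e.label == label]
    if not relevant:
        raise ValueError(f"no examples with label {label}")
    return sum(e.prediction for e in relevant) / len(relevant)


def fairness_gaps(examples):
    """Return (TPR gap, FPR gap) between the best- and worst-served groups."""
    groups = sorted({e.group for e in examples})
    by_group = {g: [e for e in examples if e.group == g] for g in groups}
    tprs = [positive_rate(by_group[g], 1) for g in groups]
    fprs = [positive_rate(by_group[g], 0) for g in groups]
    return max(tprs) - min(tprs), max(fprs) - min(fprs)


def launch_allowed(examples, max_gap=0.1):
    """Equal opportunity bounds the TPR gap; equalized odds also bounds the FPR gap.

    The 0.1 default is an assumed threshold, chosen only for illustration.
    """
    tpr_gap, fpr_gap = fairness_gaps(examples)
    return tpr_gap <= max_gap and fpr_gap <= max_gap


# Toy evaluation set: group A has TPR 0.8, group B has TPR 0.5, both have FPR 0.1.
eval_set = (
    [Example("A", 1, 1)] * 8 + [Example("A", 1, 0)] * 2
    + [Example("A", 0, 1)] * 1 + [Example("A", 0, 0)] * 9
    + [Example("B", 1, 1)] * 5 + [Example("B", 1, 0)] * 5
    + [Example("B", 0, 1)] * 1 + [Example("B", 0, 0)] * 9
)

if __name__ == "__main__":
    tpr_gap, fpr_gap = fairness_gaps(eval_set)
    print(f"TPR gap: {tpr_gap:.2f}, FPR gap: {fpr_gap:.2f}")
    print("Launch allowed:", launch_allowed(eval_set))
```

In a real pipeline a check like this would likely run in CI against a held-out evaluation set, with thresholds agreed on per product rather than hard-coded.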
Developer Perspective: Efficient Dev Lifecycle
Challenges
-
Poor data quality stalls progress: Data is the foundation of machine learning, yet many teams still struggle with missing or mislabeled data. Hand-labeling becomes a fallback, but does not scale, eventually blocking engineering progress. The 2025 CDO Insights Survey found that 43% of stalled AI projects were attributed to poor data quality and data readiness. (WorkOS)
-
Technical debt: AI systems often evolve through one-off patches to data or to the related prompts. Google’s paper, Machine Learning: The High-Interest Credit Card of Technical Debt, highlights how these layers become harder to replicate and significantly slow teams down as they grow. (Google Research)
-
Knowledge silos create duplication: Without shared infrastructure like feature stores and registries, different teams end up solving the same problems. (Medium)
-
Error-prone multi-GPU environments: Multi-GPU setups are essential for training large models, but they’re also hard to debug. A common error — “CUDA driver version is insufficient for CUDA runtime version” — frequently halts training until nodes are re-imaged. These instabilities waste compute, delay launches, and frustrate engineers. (NVIDIA Developer Forums)
Strategies
-
Construct data-centric AI pipelines: Invest early in data pipelines and processes that continuously curate diverse, high-quality datasets. Remember, data collection takes time, sometimes months. Plan for it. Practices like active labeling, noise filtering, and subgroup tracking can improve performance. For example, Snorkel AI uses weak supervision to produce labeled datasets at scale. (Snorkel)
-
Build central tooling and feature stores: Adopt shared tooling that makes features discoverable, versioned, and reusable. A central source of truth enables more confident experimentation and faster time to value. CoreWeave's Weights & Biases provides lineage tracking that shows where a feature comes from and which models use it. (Weights & Biases)
-
Conduct reproducible multi-GPU training: Containerize training environments with pinned CUDA versions, and configure jobs to auto-restart on failure. Stream NCCL performance and memory utilization data to dashboards so engineers can proactively debug.
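To make the weak-supervision idea from the data-centric strategy more concrete, here is a minimal sketch that combines a few noisy labeling functions by majority vote. It is a simplification and does not use the Snorkel library or its API; the spam-detection task, the three heuristics, and every name in the block are hypothetical.

```python
from collections import Counter

# Label values; ABSTAIN means a heuristic has no opinion on the example.
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text):
    """Hypothetical heuristic: messages with links are likely spam."""
    return SPAM if "http://" in text or "https://" in text else ABSTAIN


def lf_mentions_prize(text):
    """Hypothetical heuristic: prize announcements are likely spam."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_reply(text):
    """Hypothetical heuristic: very short messages are likely genuine replies."""
    return HAM if len(text.split()) <= 4 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_reply]


def weak_label(text):
    """Majority vote over non-abstaining heuristics; ABSTAIN on no votes or a tie."""
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # conflicting evidence: leave for a human or a better heuristic
    return counts[0][0]


if __name__ == "__main__":
    for message in [
        "Claim your prize at https://example.com today and win big",
        "thanks, see you",
        "prize inside",
    ]:
        print(weak_label(message), message)
```

Production weak-supervision systems typically go further and learn how much to trust each heuristic instead of weighting them equally, but the workflow is similar: write cheap heuristics, combine them, and hand-label only the examples where they abstain or disagree.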
Business Perspective: Reduced Risk
Challenges
-
Revenue loss from model error: Zillow’s “iBuyer” program used an AI pricing model to buy and sell homes at scale. The model failed to account for the impact of a volatile 2021 housing market and overpaid for thousands of properties. The company lost over $500 million. (GeekWire)
-
Regulatory fines: Clearview AI scraped billions of images from the internet to build its facial recognition database. This violated privacy laws such as GDPR and BIPA, resulting in global fines of $34 million. (Barracuda)
-
Brand damage and operations shutdown: In 2023, a Cruise robotaxi struck and dragged a pedestrian for 20 feet due to an unhandled edge case in its perception stack. Regulators ended up suspending Cruise operations. (Reuters)
Strategies
-
Use blue/green deployments with guardrails: Run new models in a "green" stack alongside the production "blue" system. Route a small percentage of traffic to green, monitor both, and automatically roll back if critical metrics diverge.
-
Adopt red teaming: Systematically probe your AI systems for safety, misuse, and adversarial vulnerabilities. OpenAI's GPT-4 system card describes multiple phases of red teaming that shaped safety mitigations before launch. (OpenAI GPT-4 System Card)
-
Publish model cards: Publish detailed model documentation that covers training data sources, intended use cases, and evaluation metrics. This improves transparency with customers and regulators alike. (Sample card from Anthropic)
-
Practice Active Learning with Human-in-the-Loop: Active learning uses uncertainty sampling to identify examples the model finds ambiguous and routes them to human experts. This drastically reduces annotation needs while improving performance on difficult cases.
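The uncertainty sampling step in active learning can be sketched in a few lines. The example below ranks unlabeled examples by the entropy of the model's predicted class probabilities and sends the most ambiguous ones to human review. The `select_for_review` helper, the example ids, and the probabilities are made up for illustration; entropy is one common uncertainty measure among several.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability distribution; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_review(predictions, budget):
    """Pick the `budget` examples whose predictions are most uncertain.

    `predictions` maps an example id to the model's class probabilities.
    """
    ranked = sorted(predictions, key=lambda ex_id: entropy(predictions[ex_id]), reverse=True)
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled examples (binary classifier).
unlabeled_predictions = {
    "a": [0.98, 0.02],  # confident
    "b": [0.55, 0.45],  # ambiguous
    "c": [0.70, 0.30],  # somewhat confident
    "d": [0.50, 0.50],  # maximally ambiguous
}

if __name__ == "__main__":
    # With a budget of two annotations, the most ambiguous examples go to experts.
    print(select_for_review(unlabeled_predictions, budget=2))
```

The confident examples never reach an annotator, which is where the reduction in labeling effort comes from.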
Conclusion
Making AI better is not a luxury; it's a necessity. End users expect fairness and reliability. Developers need stable infrastructure and fast iteration cycles. Businesses need to navigate financial and regulatory risks. We must prioritize AI quality before we try to unlock scale. Dive deeper into “Making AI Faster”. Stay tuned for the final part of the series: Making AI Cheaper.
Disclaimer
Views expressed are my own and do not represent those of Meta or its affiliates.