Making AI Better: A Deep Dive Across Users, Developers, and Businesses

In this sequel to Making AI Faster, I explore why making AI better matters and share strategies across user, developer, and business perspectives.

By Gunveer Gujral · Oct. 14, 25 · Analysis

Introduction: Making AI Better

In my previous article, I discussed why making AI faster, better, and cheaper is a critical need today. I set out to draw on real-world experience to show how to do so, and I took a deep dive into the main challenges and strategies for making AI faster from three key perspectives: end users, AI developers, and businesses.

In this article, I take a deep dive into the second pillar: making AI better.

Why "Better" Matters

  1. For end users: Better AI means easier use and greater trust. Whether it's Alexa responding to voice commands or Google surfacing relevant content, users expect products that just work. Fairness adds complexity: users stop trusting a model that is unfair, for example, a recruiting model that screens résumés with a gender bias learned from previous hires. As AI becomes more integrated into our lives, edge-case failures like this can erode that trust.

  2. For developers: Better AI accelerates development velocity and improves the developer experience. That includes using AI tools to boost productivity and adopting processes that let teams ship smaller, incremental improvements with confidence, which frees developers to reduce technical debt and focus on innovation.

  3. For businesses: Better AI systems bring operational efficiency and enable sustainable growth. Fair and accurate AI models lead to better customer retention. Transparent, compliant AI practices protect businesses from penalties.

End-User Perspective: Building Trust

Challenges

  1. Wrong or inconsistent answers erode trust: Users quickly lose confidence in AI features that produce incorrect or inconsistent outputs. For example, Apple Maps sent Hawai'i tourists onto an unpaved, hazardous detour for two weeks, breaking user trust. (Civil Beat)

  2. Biased answers are often unacceptable and cause brand damage: In early 2024, Google temporarily paused Gemini's image generation of people after it produced “inaccurate or even offensive” depictions, including historically inaccurate images of World War II Nazi soldiers. Failing to address bias can not only harm users but also halt multi-million-dollar investments. (Google Blog)

  3. Minority class underrepresentation creates systemic gaps: A Northwestern study found that dermatology AI systems were less accurate for dark-skinned patients than for light-skinned ones. Such gaps can become self-reinforcing: if doctors act on incorrect AI recommendations for dark-skinned patients and models are then retrained on that feedback, the errors compound. (Northwestern)

Strategies

  1. Build metric scorecards: To build better AI, start by deeply understanding the experience you want to deliver, then create a scorecard that tracks metrics for that experience. For example, Google Assistant uses a False Wake Rate metric to monitor how often the assistant activates when it shouldn’t. (Learn how to make your scorecard)

  2. Evolve benchmarks: Static benchmarks quickly become obsolete as user behavior changes. We must upgrade benchmarks to include new usage patterns and edge cases we discover.

  3. Develop responsible AI guardrails: Integrate automated checks for fairness (equalized odds, equal opportunity) and toxicity (profanity rate, hate speech rate, safety red-teaming). These checks must be launch-blocking: run them before any new model version is deployed to production, and block the release if they fail.

  4. Use synthetic data for testing: Leverage synthetic data to identify broken behaviors early. For example, Waymo has tested its self-driving stack over 20 billion simulated miles. Similar approaches can be used for AI in domains such as finance and healthcare, where production errors are very costly. (Waymo)
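As a rough illustration of a launch-blocking fairness guardrail, the sketch below computes the two fairness metrics named above from binary predictions grouped by a sensitive attribute, then compares them against limits. It assumes binary labels and predictions in plain Python lists; the function names, the toy data, and the threshold values are hypothetical, and a production team would more likely use a maintained library such as Fairlearn. The equalized odds gap follows the common convention of reporting the larger of the TPR and FPR differences across groups.

```python
from typing import Dict, List, Sequence, Tuple


def group_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("each group needs both positive and negative examples")
    return tp / (tp + fn), fp / (fp + tn)


def fairness_gaps(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[str]
) -> Dict[str, float]:
    """Compute equal opportunity and equalized odds gaps across groups."""
    rates = []
    for g in set(groups):
        idx = [i for i, x in enumerate(groups) if x == g]
        rates.append(group_rates([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    tprs = [tpr for tpr, _ in rates]
    fprs = [fpr for _, fpr in rates]
    tpr_gap = max(tprs) - min(tprs)
    fpr_gap = max(fprs) - min(fprs)
    return {
        # Equal opportunity: true positive rates should match across groups.
        "equal_opportunity_gap": tpr_gap,
        # Equalized odds: both TPR and FPR should match; report the worse gap.
        "equalized_odds_gap": max(tpr_gap, fpr_gap),
    }


def launch_gate(metrics: Dict[str, float], max_allowed: Dict[str, float]) -> List[str]:
    """Return the names of metrics over their limit; an empty list means the gate passes."""
    return [name for name, limit in max_allowed.items() if metrics[name] > limit]


if __name__ == "__main__":
    metrics = fairness_gaps(
        y_true=[1, 1, 0, 0, 1, 1, 0, 0],
        y_pred=[1, 1, 0, 0, 1, 0, 1, 0],
        groups=["a"] * 4 + ["b"] * 4,
    )
    # Hypothetical limits; real thresholds depend on the product and policy.
    failures = launch_gate(metrics, {"equal_opportunity_gap": 0.1, "equalized_odds_gap": 0.1})
    print("Launch blocked:" if failures else "Gate passed", failures)
```

In a CI pipeline, a non-empty result from `launch_gate` would fail the release job, which is what makes the check launch-blocking rather than advisory.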

Developer Perspective: Efficient Dev Lifecycle

Challenges

  1. Poor data quality stalls progress: Data is the foundation of machine learning, yet many teams still struggle with missing or mislabeled data. Teams fall back on hand-labeling, which does not scale and eventually blocks engineering progress. The 2025 CDO Insights Survey attributed 43% of stalled AI projects to poor data quality and data readiness. (WorkOS)

  2. Technical debt: AI systems often evolve through one-off patches to data or prompts. Google’s paper, Machine Learning: The High-Interest Credit Card of Technical Debt, highlights how these layers become harder to reproduce and significantly slow down growing teams. (Google Research)

  3. Knowledge silos create duplication: Without shared infrastructure like feature stores and registries, different teams end up solving the same problems. (Medium)

  4. Error-prone multi-GPU environments: Multi-GPU setups are essential for training large models, but they’re also hard to debug. A common error, “CUDA driver version is insufficient for CUDA runtime version,” frequently halts training until nodes are re-imaged. These instabilities waste compute, delay launches, and frustrate engineers. (NVIDIA Developer Forums)

Strategies

  1. Construct data-centric AI pipelines: Invest early in data pipelines and processes that continuously curate diverse, high-quality datasets. Remember, data collection takes time, sometimes months, so plan for it. Practices like active labeling, noise filtering, and subgroup tracking can improve performance. For example, Snorkel.ai uses weak supervision to produce labeled datasets at scale. (Snorkel)

  2. Build central tooling and feature stores: Adopt shared tooling that makes features discoverable, versioned, and reusable. A central source of truth enables more confident experimentation and faster time to value. For example, CoreWeave's Weights & Biases provides lineage tracking that shows where a feature comes from and which models use it. (Weights & Biases)

  3. Conduct reproducible multi-GPU training: Containerize training environments with pinned CUDA versions, and configure jobs to restart automatically on failure. Stream NCCL performance and memory utilization data to dashboards so engineers can debug proactively.
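To make weak supervision concrete, here is a minimal sketch of the core idea: several cheap, noisy labeling functions vote on each unlabeled example, and the votes are combined into a training label. This is not Snorkel's API. Snorkel learns a label model that weights labeling functions by their estimated accuracy, whereas this sketch swaps in a plain majority vote. The spam task, the labeling functions, and the sample texts are all hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

ABSTAIN = -1  # a labeling function returns this when it has no opinion
SPAM, NOT_SPAM = 1, 0

LabelingFunction = Callable[[str], int]


# Hypothetical labeling functions for a toy spam-detection task.
def lf_mentions_free(text: str) -> int:
    return SPAM if "free" in text.lower() else ABSTAIN


def lf_mentions_unsubscribe(text: str) -> int:
    return SPAM if "unsubscribe" in text.lower() else ABSTAIN


def lf_short_greeting(text: str) -> int:
    is_greeting = text.lower().startswith(("hi", "hello"))
    return NOT_SPAM if is_greeting and len(text) < 40 else ABSTAIN


def majority_vote(text: str, lfs: Sequence[LabelingFunction]) -> Optional[int]:
    """Combine non-abstaining votes; return None if nobody votes or the vote ties."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # conflicting evidence: leave for human review
    return ranked[0][0]


def weak_label(texts: Sequence[str], lfs: Sequence[LabelingFunction]) -> List[Tuple[str, int]]:
    """Return (text, label) pairs for the examples the labeling functions could label."""
    labeled = []
    for text in texts:
        label = majority_vote(text, lfs)
        if label is not None:
            labeled.append((text, label))
    return labeled


if __name__ == "__main__":
    lfs = [lf_mentions_free, lf_mentions_unsubscribe, lf_short_greeting]
    unlabeled = [
        "Claim your FREE prize now, or click to unsubscribe",
        "Hi, are we still on for lunch?",
        "Quarterly report attached",
        "Hello, free for a call?",
    ]
    for text, label in weak_label(unlabeled, lfs):
        print(label, text)
```

Dropping tied or unlabeled examples is a design choice: it keeps the weak labels cleaner at the cost of coverage, and those leftover examples are natural candidates for hand-labeling.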

Business Perspective: Reduced Risk

Challenges

  1. Revenue loss from model error: Zillow’s “iBuyer” program used an AI pricing model to buy and sell homes at scale. The model failed to account for the impact of a volatile 2021 housing market and overpaid for thousands of properties. The company lost over $500 million. (GeekWire)

  2. Regulatory fines: Clearview AI scraped billions of images from the internet to build its facial recognition database. This violated privacy laws such as GDPR and BIPA, resulting in global fines of $34 million. (Barracuda)

  3. Brand damage and operations shutdown: In 2023, a Cruise robotaxi struck and dragged a pedestrian for 20 feet due to an unhandled edge case in its perception stack. Regulators ended up suspending Cruise operations. (Reuters)

Strategies

  1. Use blue/green deployments with guardrails: Run new models in a "green" stack alongside the production "blue" system. Route a small percentage of traffic to green, monitor both, and automatically roll back if critical metrics diverge.

  2. Adopt red teaming: Systematically probe your AI systems for safety, misuse, and adversarial vulnerabilities. OpenAI's GPT-4 system card describes multiple phases of red teaming that shaped safety mitigations before launch. (OpenAI GPT-4 System Card)

  3. Publish model cards: Publish detailed model documentation that covers training data sources, intended use cases, and evaluation metrics. This improves transparency with customers and regulators alike. (Sample card from Anthropic)

  4. Practice active learning with a human in the loop: Active learning uses uncertainty sampling to identify examples the model finds ambiguous and routes them to human experts. This can substantially reduce annotation needs while improving performance on difficult cases.
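Uncertainty sampling can be sketched in a few lines. The version below assumes the model already outputs a class-probability vector for each unlabeled example and ranks examples by predictive entropy, one common uncertainty measure (least-confidence and margin sampling are alternatives). The example IDs, probabilities, and labeling budget are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a class-probability vector; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_review(predictions: Dict[str, Sequence[float]], budget: int) -> List[str]:
    """Return the IDs of the `budget` most uncertain examples to route to human experts."""
    ranked = sorted(predictions, key=lambda ex_id: entropy(predictions[ex_id]), reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    # Hypothetical model outputs for a binary classifier: [P(class 0), P(class 1)].
    predictions = {
        "ex1": [0.98, 0.02],
        "ex2": [0.55, 0.45],
        "ex3": [0.70, 0.30],
        "ex4": [0.50, 0.50],
    }
    print(select_for_review(predictions, budget=2))  # → ['ex4', 'ex2']
```

In a full loop, the human labels for the selected examples would be added to the training set, the model retrained, and the selection repeated; confident examples like `ex1` never consume annotation budget.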


Conclusion

Making AI better is not a luxury; it's a necessity. End users expect fairness and reliability. Developers need stable infrastructure and fast iteration cycles. Businesses need to navigate financial and regulatory risks. We must prioritize AI quality before we try to unlock scale. To revisit the first part of this series, see “Making AI Faster.” Stay tuned for the final part: Making AI Cheaper.


Disclaimer

Views expressed are my own and do not represent those of Meta or its affiliates.
