Testing AI Is the First Step to Using AI

Sales and marketing professionals in high tech should use AI, but how do you start? There's a crucial first step that often gets skipped: testing.

By Jesse Casman (DZone Core) · Apr. 01, 25 · Analysis

Sales and marketing professionals in high tech are being told — daily, loudly, and from every direction — that they should be using AI. And while it's absolutely true that AI has real potential to boost productivity and unlock new insights, there's a crucial first step that often gets skipped: testing.

This article is based on testing AI with data from a large developer community: over 2,800 registered users, almost 800 unique organizations, and nine years of cumulative data from 41 countries. That is simply too much data for a marketing team to analyze by hand, and too much to manage in a spreadsheet or even a series of spreadsheets.

Testing AI means understanding what you're working with, knowing how to evaluate its output, and recognizing when to set it aside. Without that foundational step, you risk wasting time, drawing bad conclusions, or simply being wowed by a tool that's confident but wrong.

Let’s break down what to look for when evaluating tools like ChatGPT, and how to determine when they can help and when they can’t.

1. The Natural Language Interface and the LLM Are Not the Same Thing

First things first: the natural language interface you’re typing into and the large language model (LLM) powering it are two very different things. The interface feels conversational. It's designed to be smooth, intuitive, and — honestly — a little magical. But that ease of interaction can make it hard to critically assess what’s coming back.

Just because something reads well doesn’t mean it’s correct. That’s the “mesmerizing” effect. You ask a wide-open question like, "How should I position my product for enterprise buyers?" and what you get back might be fluent, organized, and peppered with business jargon. But is it grounded in your actual market? Your real competitors? Probably not.

The broader the prompt, the more likely you’ll get vague or generic advice. And the more natural the answer sounds, the more likely you are to believe it — even if it wouldn’t hold up to scrutiny from a colleague. So the first rule when evaluating AI output is: don’t confuse fluency with accuracy.

Be specific in your prompts. Ask questions where you already know the right answer, and compare what the AI returns. That’s how you calibrate your trust.
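A minimal way to run that calibration is to keep a small set of questions whose answers you already know from your own data and score the model against them. The sketch below assumes a hypothetical `ask_model` function that you would wire to whichever chatbot or API you are evaluating; it is not any vendor's API, and the substring match is a deliberately crude pass/fail rule (for example, "41" would also match "141").

```python
from typing import Callable


def calibrate(ask_model: Callable[[str], str], known_answers: dict[str, str]) -> float:
    """Return the fraction of known-answer questions the model answers correctly.

    A reply counts as correct if it contains the expected answer (case-insensitive).
    Misses are printed so you can read what the model actually said.
    """
    if not known_answers:
        raise ValueError("need at least one known-answer question")
    correct = 0
    for question, expected in known_answers.items():
        reply = ask_model(question)
        if expected.lower() in reply.lower():
            correct += 1
        else:
            print(f"MISS: {question!r} -> {reply!r} (expected {expected!r})")
    return correct / len(known_answers)


if __name__ == "__main__":
    # Questions whose answers you already know from your own data.
    known = {
        "How many countries are represented in our community data?": "41",
        "How many years of data do we have?": "nine",
    }
    # Hypothetical canned replies standing in for a real model call.
    canned = {
        "How many countries are represented in our community data?": "Your data spans 41 countries.",
        "How many years of data do we have?": "You have about five years of data.",
    }
    score = calibrate(lambda q: canned[q], known)
    print(f"accuracy: {score:.0%}")  # accuracy: 50%
```

Keeping the question set fixed lets you rerun the same check whenever you switch models or the vendor updates one.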

Choosing an LLM

There are many LLMs. Too many. Here’s a quick overview of three major LLMs being used in sales and marketing right now:

1. GPT-4o (OpenAI)

  • Use cases: Email generation, content creation, customer support, lead scoring
  • Strengths: Industry-leading reputation
  • Pricing:
    • ChatGPT Plus (includes GPT-4o): $20/month per user
    • API: ~$2.50 per million input tokens, ~$10 per million output tokens

2. Claude 3.7 Sonnet (Anthropic)

  • Use cases: Copywriting, long-form content, personalized messaging
  • Strengths: Strong in tone control and brand voice
  • Pricing:
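    • Claude 3.7 Sonnet API: ~$3 per million input tokens, ~$15 per million output tokens
    • Claude 3 Opus API (higher-end model): ~$15 per million input tokens, ~$75 per million output tokens
    • Free tier available on Claude.ai, with usage limits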

3. Gemini (Google)

  • Use cases: Multimodal marketing content, customer research, SEO optimization
  • Strengths: Deep integration with Google ecosystem
  • Pricing:
    • Gemini Advanced (Ultra 1.0): $20/month via Google One AI Premium
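API prices are quoted per 1K tokens in some places and per million tokens in others, and input and output tokens are usually priced separately, so a quick estimate helps when comparing options. This is a back-of-the-envelope sketch; the token counts and rates in the example are placeholders, not current prices, so check each vendor's pricing page before relying on a number.

```python
def per_1k_to_per_million(usd_per_1k: float) -> float:
    """Convert a per-1K-token price to a per-million-token price."""
    return usd_per_1k * 1_000


def estimate_api_cost(input_tokens: int, output_tokens: int,
                      usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Estimate the USD cost of a workload for a model priced per million tokens.

    Input (prompt) and output (completion) tokens are billed at separate rates.
    """
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M prompt tokens in, 500K answer tokens out,
    # at placeholder rates of $15 (input) and $75 (output) per million tokens.
    cost = estimate_api_cost(2_000_000, 500_000, 15.0, 75.0)
    print(f"${cost:.2f}")  # $67.50
```

Output tokens are often several times more expensive than input tokens, so tasks that generate long answers can cost more than the size of the uploaded data suggests.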

2. AI and Spreadsheets Do Not Mix Well, Currently

Spreadsheets force structure: every column of data has to line up. If you’re pulling together a report or tracking campaign performance, a spreadsheet is a proven way to get a clear, reliable picture of your data.

AI can help with spreadsheet work and potentially speed it up dramatically, especially now that tools like ChatGPT, Claude, and Gemini let you upload files directly.

This feels like a powerful shortcut: just drop in your messy spreadsheet and ask the AI to summarize trends or clean things up. 

But in practice, it’s not that smooth.

First, LLMs often struggle with multi-tab spreadsheets or files with nonstandard formatting. If your sheet has frozen panes, merged cells, or inconsistent headers, you might get confusing or incomplete results. AI models also have a tendency to misinterpret the structure of the file, especially when there’s a mix of text and numerical data or inconsistent row spacing.

Second, AIs don't "see" the file the way you do in Excel or Google Sheets. They process the underlying data and infer structure — but those inferences aren’t always correct. That means things like column order, hidden formulas, or conditional formatting can get lost or misread. What looks like a clean input to you might be interpreted in a totally different way by the model, leading to errors you wouldn’t spot unless you double-check every result.

Third, spreadsheets consume a lot of AI tokens. You'll likely hit the limits of your AI plan or configuration before you realize it. And the resulting vague, evasive, or wrong answers will burn up much of the time you were hoping to save.
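You can get a rough sense of the token cost before uploading anything. The sketch below serializes rows to CSV and applies a commonly cited rule of thumb of about four characters per token for English text; actual tokenizers vary by model and handle numbers, IDs, and punctuation differently, so treat the output as a ballpark only. The sample rows are invented to loosely mirror the community data described in this article.

```python
import csv
import io

# Rough rule of thumb: about 4 characters per token for English text.
# Real tokenizers differ by model and by content, so this is a ballpark only.
CHARS_PER_TOKEN = 4


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialize rows the way a spreadsheet CSV export would."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def estimate_tokens(text: str) -> int:
    """Approximate token count with the characters-per-token heuristic (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)


if __name__ == "__main__":
    # Hypothetical export: one row per registered user.
    header = ["user_id", "organization", "country", "first_seen"]
    rows = [header] + [[str(i), f"Example Org {i % 800}", "US", "2016-01-01"]
                       for i in range(2_800)]
    csv_text = rows_to_csv(rows)
    print(f"{len(csv_text):,} characters, roughly {estimate_tokens(csv_text):,} tokens")
```

If the estimate is already a large share of your model's context window, the chunking approach discussed later in this article becomes more attractive.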

3. AI Is Not Good at Scrubbing Data

At our company, we ran a series of tests to see how well LLMs could clean up common B2B marketing and sales data problems: inconsistent company names, missing fields, weird formatting, multiple values crammed into one cell — you know, the usual.

The results? Not great.

The models often failed to apply consistent logic across rows: they would clean up one cell correctly and miss the very next one. Worse, they sometimes invented data that wasn’t there, guessing at missing values with surprising confidence.

In marketing and sales, where data quality directly impacts decision-making, that's a red flag.
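Invented values are easy to miss by eye but straightforward to catch mechanically: compare the AI's cleaned output with the original and flag every cell that was blank before and filled in after. This is a minimal sketch, assuming both versions are lists of rows (as dictionaries keyed by column name) in the same order; a flagged cell is not necessarily wrong, but it needs a human to confirm where the value came from.

```python
def find_invented_values(original: list[dict[str, str]],
                         cleaned: list[dict[str, str]]) -> list[tuple[int, str, str]]:
    """Flag cells that were blank in the original but filled in after AI cleanup.

    Returns (row_index, column, new_value) for each such cell. Only columns present
    in the original are checked. Raises if the row count changed, since dropped
    or added rows are a problem in their own right.
    """
    if len(original) != len(cleaned):
        raise ValueError(f"row count changed: {len(original)} -> {len(cleaned)}")
    flagged = []
    for i, (before, after) in enumerate(zip(original, cleaned)):
        for column, old in before.items():
            new = after.get(column, "")
            if not old.strip() and new.strip():
                flagged.append((i, column, new))
    return flagged


if __name__ == "__main__":
    original = [
        {"company": "Acme Inc", "country": ""},
        {"company": "Globex", "country": "DE"},
    ]
    # Hypothetical LLM output: a plausible-looking country appeared from nowhere.
    cleaned = [
        {"company": "Acme", "country": "US"},
        {"company": "Globex", "country": "DE"},
    ]
    for row, column, value in find_invented_values(original, cleaned):
        print(f"row {row}: {column!r} was blank, now {value!r}; verify before trusting")
```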

What Can You Do Instead?

There are workarounds. If you still want to use GPT-4o or another LLM to help with spreadsheet tasks, here are two things that worked better in our tests:

  • Break large spreadsheets into smaller chunks. We found that keeping each file under 80 rows made a big difference. The model could "see" the structure more clearly and was less likely to make logic errors or hallucinate.
  • Clean the source data before feeding it to AI. Garbage in, garbage out still applies. If your spreadsheet is inconsistent or incomplete, fix that first using your normal tools before handing it over to AI. Think of the LLM as a second-pass editor, not your ace marketing employee.
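Both tips can be scripted. The sketch below is one possible approach, not the one used in the tests described here: it deterministically normalizes company names, so the same rule is applied to every row, then splits the rows into CSV chunks that stay under the 80-row threshold. The suffix list, the `normalize_company` and `chunk_rows` names, and the choice to count the header row toward the limit are all assumptions.

```python
import csv
import io
import re

# Hypothetical list of legal suffixes to drop when normalizing; adapt to your data.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "ltd", "limited",
                  "corp", "corporation", "co", "gmbh"}

# Files under 80 rows worked better in the tests described above.
# 78 data rows + 1 header row keeps each chunk at 79 rows.
MAX_DATA_ROWS = 78


def normalize_company(name: str) -> str:
    """Deterministically map variants like 'Acme, Inc.' and 'ACME Inc' to 'acme'."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


def chunk_rows(header: list[str], rows: list[list[str]],
               size: int = MAX_DATA_ROWS) -> list[str]:
    """Split rows into CSV strings of at most `size` data rows, repeating the header in each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    chunks = []
    for start in range(0, len(rows), size):
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(header)
        writer.writerows(rows[start:start + size])
        chunks.append(buf.getvalue())
    return chunks


if __name__ == "__main__":
    raw = [["Acme, Inc.", "US"], ["ACME Inc", "US"], ["Globex Corporation", "DE"]] * 70
    cleaned = [[normalize_company(company), country] for company, country in raw]
    chunks = chunk_rows(["company", "country"], cleaned)
    print(f"{len(cleaned)} rows -> {len(chunks)} files")  # 210 rows -> 3 files
```

Repeating the header in every chunk keeps each file self-describing, so the model does not have to infer column meanings from an earlier upload.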

Start Using AI Now, and Start With Testing

News headlines swing between “AI is highly inaccurate, and here's the latest example of a bad mistake” and “AI is so good it will take your job within a year.” The truth is somewhere in between, and reading the news about AI is not enough. You need to do your own testing. When you take the time to test how AI performs on your specific sales or marketing tasks, with your data and your workflows, you’ll start to see where it can really help and where it still falls short.

AI is a tool, an extremely powerful one, but not a magical one. For high-tech sales and marketing teams, it's easy to get dazzled by a fluent chatbot that seems to understand your world. That's exactly why testing matters.

Test it like you’d test a new sales rep: give it structured input, ask it to perform repeatable tasks, and check the results. If it can’t pass those basic tests, it’s not ready to join the team — yet.
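One of those repeatable tasks can be turned into a simple consistency check: send the same prompt several times and see how often the answers agree. This is a minimal sketch, assuming a hypothetical `ask_model` stand-in for your real model call; agreement is measured after trimming whitespace and lowercasing, and what counts as a passing score is up to you.

```python
from collections import Counter
from typing import Callable


def repeatability(ask_model: Callable[[str], str], prompt: str, runs: int = 5) -> float:
    """Send the same prompt `runs` times and return the share of answers that
    match the most common one (after trimming whitespace and lowercasing)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    answers = [ask_model(prompt).strip().lower() for _ in range(runs)]
    _, top_count = Counter(answers).most_common(1)[0]
    return top_count / runs


if __name__ == "__main__":
    # Hypothetical canned replies standing in for five calls to a real model.
    replies = iter(["Acme", "Acme", "acme ", "Acme Corp", "Acme"])
    score = repeatability(lambda _prompt: next(replies), "Normalize: 'ACME, Inc.'")
    print(f"consistency: {score:.0%}")  # consistency: 80%
```

A high consistency score does not mean the answer is right; pairing this check with known-answer questions covers both accuracy and repeatability.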

When AI does work, it can save hours or spark ideas you wouldn’t have come up with yourself. But the first step to getting value from AI is knowing when and how to evaluate what it's actually doing.

Opinions expressed by DZone contributors are their own.
