Testing AI Is the First Step to Using AI

Sales and marketing professionals in high tech should use AI, but how do you start? There's a crucial first step that often gets skipped: testing.

Jesse Casman


Apr. 01, 25 · Analysis

Sales and marketing professionals in high tech are being told — daily, loudly, and from every direction — that they should be using AI. And while it's absolutely true that AI has real potential to boost productivity and unlock new insights, there's a crucial first step that often gets skipped: testing.

This article is the result of testing AI with data from a large developer community with over 2,800 registered users, almost 800 unique organizations, and nine years of cumulative data from 41 countries. That is simply too much data for a marketing team to realistically analyze by hand, and too much to work through in a spreadsheet or a series of spreadsheets.

Testing AI means understanding what you're working with, knowing how to evaluate its output, and knowing when to set it aside. Without that foundational step, you risk wasting time, drawing bad conclusions, or simply being wowed by a tool that's confident but wrong.

Let’s break down what to look for when evaluating tools like ChatGPT and how to determine when they can help—and when they can’t.

1. The Natural Language Interface and the LLM Are Not the Same Thing

First things first: the natural language interface you’re typing into and the large language model (LLM) powering it are two very different things. The interface feels conversational. It's designed to be smooth, intuitive, and — honestly — a little magical. But that ease of interaction can make it hard to critically assess what’s coming back.

Just because something reads well doesn’t mean it’s correct. That’s the “mesmerizing” effect. You ask a wide-open question like, "How should I position my product for enterprise buyers?" and what you get back might be fluent, organized, and peppered with business jargon. But is it grounded in your actual market? Your real competitors? Probably not.

The broader the prompt, the more likely you’ll get vague or generic advice. And the more natural the answer sounds, the more likely you are to believe it — even if it wouldn’t hold up to scrutiny from a colleague. So the first rule when evaluating AI output is: don’t confuse fluency with accuracy.

Be specific in your prompts. Ask questions where you already know the right answer, and compare what the AI returns. That’s how you calibrate your trust.
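
To make that calibration concrete, here is a minimal Python sketch that scores AI answers against answers you already know. The function name `score_known_answers` and the AI answers are hypothetical; the "known" facts are the dataset figures mentioned earlier. It assumes short answers that can be compared as plain text after trimming and lowercasing.

```python
def score_known_answers(known, ai_answers):
    """Compare AI answers against answers you already know are right.

    known and ai_answers are dicts mapping question -> answer.
    Returns (accuracy, list of questions the AI got wrong or skipped).
    """
    def normalize(text):
        # Ignore case and surrounding whitespace so "Japan " matches "japan".
        return str(text).strip().lower()

    misses = [
        question for question, expected in known.items()
        if normalize(ai_answers.get(question, "")) != normalize(expected)
    ]
    accuracy = 1 - len(misses) / len(known) if known else 0.0
    return accuracy, misses


# Facts you can verify from your own records.
known = {
    "How many countries are in the dataset?": "41",
    "How many years of data are there?": "9",
}
# Hypothetical answers copied back from the AI tool.
ai_answers = {
    "How many countries are in the dataset?": "41",
    "How many years of data are there?": "10",
}
accuracy, misses = score_known_answers(known, ai_answers)
print(f"accuracy={accuracy:.0%}, misses={misses}")
```

A handful of questions like this won't prove a tool is reliable, but a low score on facts you can check is a clear signal not to trust it on facts you can't.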

Choosing an LLM

There are many LLMs. Too many. Here’s a quick overview of three major LLMs being used in sales and marketing right now:

1. GPT-4o (OpenAI)

Use cases: Email generation, content creation, customer support, lead scoring
Strengths: Industry-leading reputation
Pricing:
- ChatGPT Plus: $20/month per user
- API: ~$0.01–$0.03 per 1K tokens (input/output) for GPT-4 Turbo; GPT-4o is priced lower

2. Claude 3.7 Sonnet (Anthropic)

Use cases: Copywriting, long-form content, personalized messaging
Strengths: Strong in tone control and brand voice
Pricing:
- Claude 3.7 Sonnet (API): ~$3 per million input tokens, ~$15 per million output tokens
- Claude 3 Opus (API): ~$15 per million input tokens, ~$75 per million output tokens
- Free version available with Claude 3 Haiku

3. Gemini (Google)

Use cases: Multimodal marketing content, customer research, SEO optimization
Strengths: Deep integration with Google ecosystem
Pricing:
- Gemini Advanced (Ultra 1.0): $20/month via Google One AI Premium
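
Per-token API pricing is easier to compare once you turn it into a dollar figure for a realistic job. The sketch below is a simple Python calculation, not any vendor's billing formula. The function name `estimate_cost` and the token counts are made up for illustration; the prices are the Claude 3 Opus figures quoted above.

```python
def estimate_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the estimated dollar cost of one API call.

    Prices are in dollars per million tokens.
    """
    return (
        input_tokens / 1_000_000 * input_price_per_m
        + output_tokens / 1_000_000 * output_price_per_m
    )


# Claude 3 Opus figures from the list above: ~$15 in, ~$75 out per million tokens.
# The token counts are hypothetical.
cost = estimate_cost(200_000, 20_000, 15.0, 75.0)
print(f"${cost:.2f}")
```

Check the vendor's current price list before relying on any of these numbers; they change often.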

2. AI and Spreadsheets Do Not Mix Well, Currently

Spreadsheets force structure. Every column of data has to line up. And if you’re pulling together a report or tracking campaign performance, a spreadsheet is a known way to give you a clear, reliable picture of your data.

AI can help with spreadsheet work and potentially speed it up dramatically, especially now that tools like ChatGPT, Claude, and Gemini allow you to upload files directly.

This feels like a powerful shortcut: just drop in your messy spreadsheet and ask the AI to summarize trends or clean things up.

But in practice, it’s not that smooth.

First, LLMs often struggle with multi-tab spreadsheets or files with nonstandard formatting. If your sheet has frozen panes, merged cells, or inconsistent headers, you might get confusing or incomplete results. AI models also have a tendency to misinterpret the structure of the file, especially when there’s a mix of text and numerical data or inconsistent row spacing.

Second, AIs don't "see" the file the way you do in Excel or Google Sheets. They process the underlying data and infer structure — but those inferences aren’t always correct. That means things like column order, hidden formulas, or conditional formatting can get lost or misread. What looks like a clean input to you might be interpreted in a totally different way by the model, leading to errors you wouldn’t spot unless you double-check every result.
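
One way to look at a file closer to how the model does is to export a tab to CSV and check its structure before uploading. This is a minimal Python sketch, assuming a single tab exported as CSV text. The function name `audit_csv_structure` and the sample data are hypothetical, and it checks only a few of the problems described above (merged cells typically show up in a CSV export as blank headers or blank cells).

```python
import csv
import io


def audit_csv_structure(csv_text):
    """Return a list of human-readable structural problems found in csv_text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = [h.strip() for h in rows[0]]

    # Blank headers often come from merged cells in the original sheet.
    blanks = [i + 1 for i, h in enumerate(header) if not h]
    if blanks:
        problems.append(f"blank header in column(s) {blanks}")

    # Duplicate headers make it ambiguous which column a question refers to.
    named = [h.lower() for h in header if h]
    dupes = sorted({h for h in named if named.count(h) > 1})
    if dupes:
        problems.append(f"duplicate header(s) {dupes}")

    # Every data row should have exactly as many cells as the header.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {line_no} has {len(row)} cells, expected {len(header)}"
            )
        elif not any(cell.strip() for cell in row):
            problems.append(f"row {line_no} is blank")
    return problems


# Hypothetical export with a blank header, a duplicate header,
# a short row, and an empty row.
sample = "Company,Country,,Company\nAcme,US,2021,Acme Inc\nGlobex,JP\n,,,\n"
for problem in audit_csv_structure(sample):
    print(problem)
```

If a quick check like this turns up problems, the model is likely to stumble over the same spots, so fix them before you upload.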

Third, spreadsheets use up a lot of AI tokens. You'll likely hit the usage or context limits of your AI plan before you realize it. And the results (vague answers, evasive answers, wrong answers) will burn up a lot of the time you were hoping to save.
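
You can get a rough sense of token usage before uploading anything. The Python sketch below assumes about four characters per token, which is only a common rule of thumb for English text. Real tokenizers differ by model, and rows full of numbers and punctuation can use more tokens than this suggests. The function names, the sheet size, and the 50,000-token budget are hypothetical.

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate; real tokenizers will give different counts."""
    return -(-len(text) // chars_per_token)  # ceiling division


def fits_budget(text, token_budget, chars_per_token=4):
    """Return True if the estimated token count is within token_budget."""
    return estimate_tokens(text, chars_per_token) <= token_budget


# Hypothetical sheet: 2,800 rows of about 120 characters each.
csv_text = ("x" * 119 + "\n") * 2800
print(estimate_tokens(csv_text))
print(fits_budget(csv_text, token_budget=50_000))
```

Treat the result as an order-of-magnitude check: if the estimate is anywhere near your limit, assume the real file will not fit.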

3. AI Is Not Good at Scrubbing Data

At our company, we ran a series of tests to see how well LLMs could clean up common B2B marketing and sales data problems: inconsistent company names, missing fields, weird formatting, multiple values crammed into one cell — you know, the usual.

The results? Not great.

LLMs often failed to apply consistent logic across rows. A model would clean up one cell correctly and totally miss the next. Worse, it would sometimes invent data that wasn’t there, guessing at missing values with surprising confidence.

In marketing and sales, where data quality directly impacts decision-making, that's a red flag.
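
Invented values are at least easy to catch mechanically. The sketch below is one possible check in Python, not the procedure we used in our tests. It assumes you have the original and the AI-cleaned rows as lists of dictionaries in the same order; the function name `find_invented_values` and the sample rows are hypothetical.

```python
def find_invented_values(original_rows, cleaned_rows):
    """Flag cells that were empty in the original but filled in after cleaning.

    Both arguments are lists of dicts with the same keys, in the same row order.
    Returns a list of (row_index, column, new_value) tuples.
    """
    if len(original_rows) != len(cleaned_rows):
        raise ValueError("row counts differ; the model added or dropped rows")

    def as_text(value):
        # Treat None like an empty cell; strip stray whitespace.
        return "" if value is None else str(value).strip()

    invented = []
    for i, (before, after) in enumerate(zip(original_rows, cleaned_rows)):
        for column, old_value in before.items():
            new_value = as_text(after.get(column))
            if not as_text(old_value) and new_value:
                invented.append((i, column, new_value))
    return invented


# Hypothetical data for illustration.
original = [
    {"company": "acme inc.", "country": "US"},
    {"company": "Globex", "country": ""},
]
cleaned = [
    {"company": "Acme Inc.", "country": "US"},
    {"company": "Globex", "country": "Germany"},  # the model guessed this
]
print(find_invented_values(original, cleaned))
```

A flagged cell is not necessarily wrong, but it is a value the model supplied rather than one you gave it, so it needs a human check.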

What Can You Do Instead?

There are workarounds. If you still want to use GPT-4o or another LLM to help with spreadsheet tasks, here are two things that worked better in our tests:

- Break large spreadsheets into smaller chunks. We found that keeping each file under 80 rows made a big difference. The model could "see" the structure more clearly and was less likely to make logic errors or hallucinate.
- Clean the source data before feeding it to AI. Garbage in, garbage out still applies. If your spreadsheet is inconsistent or incomplete, fix that first using your normal tools before handing it over to AI. Think of the LLM as a second-pass editor, not your ace marketing employee.
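
The chunking step is simple to automate. Here is a minimal Python sketch, assuming the sheet has been exported as CSV text with one header row. The function name `split_csv` and the sample sheet are hypothetical, and treating "under 80 rows" as at most 79 data rows plus a repeated header is my own reading of the limit.

```python
import csv
import io


def split_csv(csv_text, max_rows=79):
    """Split csv_text into CSV strings of at most max_rows data rows each.

    The header row is repeated at the top of every chunk so each piece
    is understandable on its own.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    header, data = rows[0], rows[1:]

    chunks = []
    for start in range(0, len(data), max_rows):
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(data[start:start + max_rows])
        chunks.append(buffer.getvalue())
    return chunks


# Hypothetical sheet with 200 data rows.
sheet = "id,company\n" + "".join(f"{i},Company {i}\n" for i in range(200))
chunks = split_csv(sheet)
print(len(chunks))
print([chunk.count("\n") - 1 for chunk in chunks])  # data rows per chunk
```

Repeating the header in every chunk matters: without it, the model has to guess what each column means in every file after the first.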

Start Using AI Now, and Start With Testing

The news headlines vacillate between “AI is highly inaccurate, and here's the latest example of a bad mistake” and “AI is so good it will be taking your job within a year.” The truth is, it’s somewhere in between. Just reading the news about AI is not good enough. You need to do your own testing. When you take the time to test how AI performs on your specific sales or marketing tasks — your data, your workflows — you’ll start to see where it can really help, and where it still falls short.

AI is a tool — an extremely powerful one — but not a magical one. For high-tech sales and marketing teams, it's easy to get dazzled by a fluent chatbot that seems to understand your world. But testing matters.

Test it like you’d test a new sales rep: give it structured input, ask it to perform repeatable tasks, and check the results. If it can’t pass those basic tests, it’s not ready to join the team — yet.
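
A repeatability test can be as simple as asking the same question several times and counting how often the answers agree. The Python sketch below assumes you have some function that sends a prompt to your AI tool and returns its reply as text. Here that function is a stand-in called `fake_ask` with canned replies, so the example runs without an API key; both it and `consistency` are hypothetical names.

```python
from collections import Counter


def consistency(ask, prompt, runs=5):
    """Call ask(prompt) several times and report how often the answers agree.

    ask is any function that takes a prompt string and returns a string.
    Returns (most_common_answer, share_of_runs_that_gave_it).
    """
    answers = [ask(prompt).strip().lower() for _ in range(runs)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / runs


# Stand-in for a real model call. It cycles through canned replies
# to imitate a model that is not perfectly consistent.
canned = iter(["41", "41", "40", "41", "41"])


def fake_ask(prompt):
    return next(canned)


answer, agreement = consistency(fake_ask, "How many countries are in the data?")
print(answer, agreement)
```

Agreement alone doesn't mean the answer is right, so pair this with questions where you already know the correct result.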

When AI does work, it can save hours or spark ideas you wouldn’t have come up with yourself. But the first step to getting value from AI is knowing when and how to evaluate what it's actually doing.


Opinions expressed by DZone contributors are their own.
