Fact-Checking LLM Outputs Programmatically: Building a Verification Layer That Catches Hallucinations

LLMs confidently hallucinate plausible data and asking them to "be careful" doesn't fix it. The most effective safeguard is an automated verification layer.

Raviteja Nekkalapu

May. 21, 26 · Tutorial


Last month, I asked an LLM to analyze a company's financials. The report it generated included this sentence: "The company's revenue grew 23% year-over-year to $4.2 billion in Q3 2025."

The actual revenue was $3.8 billion. Growth was 14%. The model made up both numbers with zero hesitation.

This is the fundamental problem with LLM-generated content. It reads perfectly. It sounds authoritative. And sometimes it's wrong. Not "obvious nonsense" wrong. More like "plausible-sounding data point that doesn't match any real source" wrong. The output looks identical whether the claim is grounded in reality or fabricated from noise in the training data.

I have tried asking models to be more careful. They just hallucinate more politely. I have tried using bigger models. They hallucinate less often, but they still hallucinate. The only thing that's actually worked for me is verifying the output after it's been generated, using a separate system that evaluates each claim against the source data independently.

How I Built It

The Approach: Use a Second Model to Verify the First

Take the claims from your primary model's output and send them, along with the source data, to a different model that's specifically prompted to classify each claim as supported or not.

The critical detail: the verifier must be a different model from the generator. Using the same model to check its own work is like asking a student to grade their own test. It'll be consistently wrong about the same things. You need a genuinely independent check.

I use Cohere's API for verification, but any model that handles natural language inference (NLI) tasks well would work here. The important thing is independence from the generator.

Step 1: Extract Individual Claims

Break the model's output into individual claims, statements that can each be independently evaluated.

"The company's revenue grew 23% YoY to $4.2B with improving margins" contains two distinct claims:

Revenue grew 23% YoY to $4.2B
Margins are improving

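You can approximate this extraction with a simple sentence-level split plus a filter. Note that it works at sentence granularity, so a compound sentence like the one above is still verified as a single unit: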

JavaScript

function extractClaims(text) {
  // Split after sentence-ending punctuation only when whitespace follows,
  // so decimals such as "$4.2 billion" stay inside one sentence.
  const sentences = text.split(/(?<=[.!?])\s+/);
  return sentences.filter(s => {
    const lower = s.toLowerCase().trim();
    // Skip transitions, summaries, and filler
    if (lower.startsWith('in summary') || lower.startsWith('overall')) return false;
    if (lower.length < 20) return false;
    // Keep sentences containing specific data or measurable assertions
    return /\d|percent|grew|declined|increased|decreased|revenue|profit|market|ratio/i.test(s);
  });
}
  

This heuristic is rough. It misses some verifiable claims and lets a few non-claims through. But catching 80% of the factual assertions in a generated report is dramatically better than catching 0%, which is what most applications do today.
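One gap worth noting is that a sentence-level split leaves a compound sentence as a single claim. The sketch below shows one possible clause-level splitter. It assumes a small, hand-picked list of connectors ("with", ";", ", and", ", while", ", but"); the `splitCompoundClaim` name and the connector list are illustrative and not part of the original implementation.

```javascript
// Hypothetical helper: split one sentence into clause-level claims.
// The connector list is an assumption chosen to fit the example in this article.
function splitCompoundClaim(sentence) {
  return sentence
    .replace(/[.!?]+\s*$/, '') // drop trailing sentence punctuation
    .split(/;\s*|,\s+(?:and|while|but)\s+|\s+with\s+/i)
    .map(part => part.trim())
    .filter(part => part.length > 0);
}

const parts = splitCompoundClaim(
  "The company's revenue grew 23% YoY to $4.2B with improving margins"
);
console.log(parts);
// [ "The company's revenue grew 23% YoY to $4.2B", 'improving margins' ]
```

A split-off clause can lose its subject ("improving margins"), so it may help to pass the full original sentence to the verifier as context alongside each clause.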

Step 2: Verify Each Claim Against Your Source Data

For each extracted claim, check it against the raw data you fed to the primary model. If your generator was summarizing financial data from Finnhub's API, compare its claims against the actual Finnhub response.

JavaScript

async function verifyClaim(claim, sourceData, cohereKey) {
  const response = await fetch('https://api.cohere.ai/v1/chat', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${cohereKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'command-r-plus',
      message: `Based solely on the source data provided below, classify this claim.\n\nClaim: "${claim}"\n\nSource data: ${JSON.stringify(sourceData)}\n\nRespond with exactly one word: GROUNDED, SPECULATIVE, or UNVERIFIABLE.`,
      temperature: 0, // deterministic output for a classification task
    }),
  });

  // Surface HTTP failures (bad key, rate limit) instead of parsing an error body
  if (!response.ok) {
    throw new Error(`Verifier request failed with status ${response.status}`);
  }

  const data = await response.json();
  return data.text.trim().toUpperCase();
}
  

The three classifications:

GROUNDED: The source data directly supports this claim. The numbers match. The facts check out.
SPECULATIVE: The claim makes inferences that go beyond what the data shows. Not necessarily wrong, but not directly supported either. For example, "This suggests strong growth potential" when the data only shows past growth.
UNVERIFIABLE: The source data doesn't contain enough information to confirm or deny the claim. This is where the dangerous hallucinations hide. The model stated something as fact, but there's no data to back it up.
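Because the verdict comes back as free text, it may help to normalize it before storing it. The sketch below is one way to do that, assuming a strict policy in which anything other than a bare label falls back to UNVERIFIABLE. The `parseVerdict` name and the fallback choice are assumptions, not part of the original code.

```javascript
const VERDICTS = new Set(['GROUNDED', 'SPECULATIVE', 'UNVERIFIABLE']);

// Hypothetical helper: map a raw verifier reply onto one of the three labels.
// Assumption: a reply that is not exactly one label (ignoring case, quotes,
// whitespace, and punctuation) is treated as UNVERIFIABLE, so it is never
// counted as grounded by accident.
function parseVerdict(rawReply) {
  const cleaned = String(rawReply ?? '').toUpperCase().replace(/[^A-Z]/g, '');
  return VERDICTS.has(cleaned) ? cleaned : 'UNVERIFIABLE';
}

console.log(parseVerdict('Grounded.'));    // GROUNDED
console.log(parseVerdict('NOT GROUNDED')); // UNVERIFIABLE
```

Falling back to UNVERIFIABLE is a conservative default: a malformed reply gets flagged for the reader rather than silently passing.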

Step 3: Assemble the Audit Report

For each claim, you now have a verdict. Package them together into an audit object the frontend can display:

JavaScript

async function buildFactAudit(generatedText, sourceData, cohereKey) {
  const claims = extractClaims(generatedText);

  const results = await Promise.all(
    claims.map(async claim => ({
      claim: claim.trim(),
      verdict: await verifyClaim(claim, sourceData, cohereKey),
    }))
  );

  const grounded = results.filter(r => r.verdict === 'GROUNDED').length;
  const total = results.length;

  return {
    overallScore: total > 0 ? Math.round((grounded / total) * 100) : 0,
    claims: results,
    summary: `${grounded}/${total} claims grounded in source data`,
  };
}
  

Display the audit alongside the generated content. Show each claim with its verdict. Let the user see which statements are supported by data and which ones the model pulled from thin air.
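As a rough illustration of that display step, the sketch below turns an audit object into plain annotated text. It assumes the object shape returned by `buildFactAudit` (`summary` plus a `claims` array of `{ claim, verdict }`); the `renderAudit` name and the sample data are hypothetical.

```javascript
// Hypothetical helper: render an audit as one summary line plus one line per claim.
function renderAudit(audit) {
  const lines = audit.claims.map(({ claim, verdict }) => `[${verdict}] ${claim}`);
  return [audit.summary, ...lines].join('\n');
}

// Sample data for illustration only
const sampleAudit = {
  overallScore: 50,
  summary: '1/2 claims grounded in source data',
  claims: [
    { claim: 'Revenue grew 14% year-over-year.', verdict: 'GROUNDED' },
    { claim: 'Analysts project 30% upside.', verdict: 'UNVERIFIABLE' },
  ],
};

console.log(renderAudit(sampleAudit));
// 1/2 claims grounded in source data
// [GROUNDED] Revenue grew 14% year-over-year.
// [UNVERIFIABLE] Analysts project 30% upside.
```

A real frontend would likely use badges or color instead of bracketed text, but the data flow is the same.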

What the Data Showed

I ran this verification layer against 100 generated reports. The breakdown:

72% of claims were grounded: directly supported by the source data
18% were speculative: reasonable inferences that went beyond what the data strictly shows
10% were unverifiable: statements presented as fact with zero supporting data in the pipeline

That 10% is the category that keeps me up at night. These claims sound authoritative, with specific numbers, named sources, and precise percentages, but they correspond to nothing in the data the model was given. Without the verification layer, a user reads "analysts project 30% upside" and assumes a human analyst actually said that. With the verification layer, that claim gets flagged as UNVERIFIABLE, and the user can decide what to do with it.
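A breakdown like this can be computed from the per-claim results. The sketch below is one way to do it, assuming each result has the `{ claim, verdict }` shape used in the audit object; `verdictBreakdown` is a hypothetical name, and the sample input is illustrative rather than data from the run described above.

```javascript
// Hypothetical helper: percentage of claims per verdict, rounded to whole numbers.
// Results with an unrecognized verdict count toward the total but no bucket.
function verdictBreakdown(results) {
  const counts = { GROUNDED: 0, SPECULATIVE: 0, UNVERIFIABLE: 0 };
  for (const { verdict } of results) {
    if (Object.prototype.hasOwnProperty.call(counts, verdict)) {
      counts[verdict] += 1;
    }
  }
  const total = results.length;
  const percent = n => (total > 0 ? Math.round((n / total) * 100) : 0);
  return {
    total,
    grounded: percent(counts.GROUNDED),
    speculative: percent(counts.SPECULATIVE),
    unverifiable: percent(counts.UNVERIFIABLE),
  };
}

// Illustrative input only
const breakdown = verdictBreakdown([
  { claim: 'a', verdict: 'GROUNDED' },
  { claim: 'b', verdict: 'GROUNDED' },
  { claim: 'c', verdict: 'SPECULATIVE' },
  { claim: 'd', verdict: 'UNVERIFIABLE' },
]);
console.log(breakdown);
// { total: 4, grounded: 50, speculative: 25, unverifiable: 25 }
```

To aggregate across many reports, the claims arrays can be flattened first, for example with `audits.flatMap(a => a.claims)`.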

The Cost Is Almost Nothing

Each verification call to Cohere costs a small fraction of a cent. For a report with 25-30 extractable claims, the total verification cost is well under $0.05. For context, that's typically less than the cost of generating the report in the first place.
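The arithmetic is simple enough to sanity-check for your own workload. In the sketch below, the per-call price is a placeholder assumption, not a quoted Cohere price, so substitute the figure from your provider's pricing. `estimateVerificationCost` is a hypothetical helper.

```javascript
// Hypothetical helper: total verification cost for one report, in USD.
function estimateVerificationCost(claimCount, costPerCallUsd) {
  return claimCount * costPerCallUsd;
}

// ASSUMPTION: placeholder per-call price, for illustration only.
const assumedCostPerCallUsd = 0.001;

const reportCost = estimateVerificationCost(30, assumedCostPerCallUsd);
console.log(`$${reportCost.toFixed(3)} per 30-claim report`);

// Per-call ceiling implied by a $0.05 budget for a 30-claim report
const perCallCeiling = 0.05 / 30;
console.log(assumedCostPerCallUsd < perCallCeiling); // true
```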

And the value is enormous. The difference between "this report says revenue grew 23%" and "this report says revenue grew 23% [GROUNDED]" is the difference between blind faith and informed reading.

Being Honest About the Limitations

This approach has real constraints, and I want to be upfront about them.

It can only verify against the data you have. If the model hallucinates a fact about something your source data doesn't cover, the verifier will say UNVERIFIABLE, not WRONG. You catch the gap, but you can't confirm the error.

The verifier model isn't perfect either. I have seen Cohere classify a clearly wrong claim as SPECULATIVE when it should have been UNVERIFIABLE. Cross-model verification reduces error; it doesn't eliminate it.

Narrative claims resist verification. "Revenue was $4.2B" is easy to check against source data. "The company has a strong competitive position" is a judgment that no automated system can objectively evaluate.

Despite those constraints, this layer catches the most damaging mistakes (wrong numbers, invented metrics, and hallucinated analyst quotes) before they reach the user.

I built this into Nipun-AI, an open-source financial analysis tool where every AI-generated claim is classified as GROUNDED, SPECULATIVE, or UNVERIFIABLE. The code above is most of the implementation. The hard part isn't writing it, but deciding to write it in the first place, rather than trusting the primary model's output on faith.

Stop trusting. Start verifying. The code is 40 lines. The excuses for not doing it ran out a long time ago.


Opinions expressed by DZone contributors are their own.
