Implementing Effective Document Fraud Detection in C#

This article discusses some of the challenges of (and solutions for) adding fraud detection logic to automated document processing pipelines.

By Brian O'Neill
Apr. 29, 26 · Analysis

Document fraud is a persistent problem across a range of industries, and the attack surface is much wider than most organizations want to admit. Enterprises that accept uploaded documents as part of their workflows (particularly insurance carriers) are routinely exposed to carefully forged files and, increasingly, a mountain of AI-generated fakes. The challenge isn’t all about detecting convincing fraud, however: it’s also about the real-world constraints of building a system that can reason about a document’s content holistically and flag suspicious patterns before they cause damage in some downstream workflow.

Most development teams aren’t staffed to build that kind of solution from scratch. Fraud detection requires understanding document semantics, not just structure, and the signals that indicate fraudulent content tend to be contextual more than syntactic. For example, a document that looks perfectly valid and routine on the surface might contain financial liability language inconsistent with its stated purpose, or it might have been generated by an AI tool rather than produced by a legitimate issuing authority.

In this article, we’ll look at what it means to implement document fraud detection in C#, and we’ll explore some of the challenges involved in building that capability in-house. Ultimately, we’ll walk through an especially developer-friendly API that handles the end-to-end process in a single call.

Why Fraud Detection Is a Hard Problem for Document Pipelines

Before we get into the implementation side of things, it’s worth understanding why document fraud detection is so difficult to build (well).

The biggest and most obvious challenge is format variability. Enterprise document pipelines accept a wide range of file types, and the signals that indicate fraud can show up differently depending on how any given document is structured. A manipulated PDF behaves differently from a doctored image of a form, and a forged email attachment presents different detection challenges than a tampered spreadsheet.

Beyond formatting, there’s the problem of content reasoning. Detecting fraud isn’t a challenge limited to checking file metadata or making pixel-level comparisons. It requires an understanding of what a document is claiming to be versus what it actually contains. For example, a Form W-2 that contains language normally associated with a purchase agreement is cause for suspicion. An expense receipt with a date from three years ago attached to a current reimbursement request is also cause for suspicion. Building heuristics to catch all of those cases is a pretty significant maintenance burden on developers.

Then, of course, there’s the AI-generated content problem. And what a problem that’s become in the past few years. Fraudsters now have access to tools capable of producing convincing fraudulent documents at scale, and it’s not just dedicated fraudsters considering this kind of thing anymore.  

According to Verisk, 36% of insurance consumers said they would consider digitally altering an insurance claims document to strengthen their case. That’s a pretty alarming figure if you’re in the insurance business. A detection system that was trained or tuned before those tools became widely available may not catch what’s coming through the pipeline today.

It’s also worth mentioning that user context adds another layer. A document submitted by a user with an unverified email address carries a different risk than the same document submitted through a fully verified account. Folding that context into the fraud assessment requires either building a scoring system that weighs document-level signals against user-level signals or finding a solution that handles both inputs together.
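To make the idea of weighing document-level signals against user-level signals a bit more concrete, here’s a minimal sketch of a blended score. Everything in it is an assumption made for illustration: the CombinedRiskScorer class, the 0.0 to 1.0 scale, and the 70/30 weighting are hypothetical and would need tuning against real data.

```csharp
using System;

// Hypothetical sketch: blends a document-level risk score with user-level context.
// The weights and the 0.0-1.0 scale are illustrative assumptions, not taken from any API.
public static class CombinedRiskScorer
{
    private const double DocumentWeight = 0.7;
    private const double UserWeight = 0.3;

    // User-level risk: a verified email address is treated as lower risk than an
    // unverified one, and a missing address is treated as the highest risk.
    public static double UserRisk(string emailAddress, bool emailVerified)
    {
        if (string.IsNullOrWhiteSpace(emailAddress)) return 1.0;
        return emailVerified ? 0.1 : 0.6;
    }

    // Weighted blend of the two signals, clamped to the 0.0-1.0 range.
    public static double Combine(double documentRisk, string emailAddress, bool emailVerified)
    {
        double score = DocumentWeight * documentRisk
                     + UserWeight * UserRisk(emailAddress, emailVerified);
        return Math.Max(0.0, Math.Min(1.0, score));
    }
}
```

With these placeholder weights, the same document scores higher when it arrives from an unverified address than from a verified one, which is the behavior described above.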

Open-Source Options for Fraud Detection in .NET

I always like to address open-source tools for those who prefer taking on the in-house approach.  In this case, a fraud detection pipeline in C# would generally require assembling several unique components.

It all starts with text extraction. For PDFs, iText and PdfPig are both well-regarded options with a solid NuGet presence. For Office formats, the fantastically well-documented Open XML SDK covers Word, Excel, and PowerPoint. Image-based documents require an OCR step first, and that can be especially tricky. Tesseract remains the most commonly used open-source option, and it’s available for .NET via the Tesseract NuGet package.
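Since each format needs a different extractor, an in-house pipeline usually starts with a small routing step. The sketch below shows one way to pick an extraction strategy from the file extension. The ExtractionRouter class and ExtractionStrategy enum are hypothetical names, and the actual library calls (iText or PdfPig, the Open XML SDK, Tesseract) are deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical strategy names; each would map to a real extractor in a full pipeline.
public enum ExtractionStrategy { PdfText, OpenXml, Ocr, Unsupported }

public static class ExtractionRouter
{
    // Case-insensitive lookup from file extension to extraction strategy.
    private static readonly Dictionary<string, ExtractionStrategy> ByExtension =
        new Dictionary<string, ExtractionStrategy>(StringComparer.OrdinalIgnoreCase)
        {
            [".pdf"] = ExtractionStrategy.PdfText,
            [".docx"] = ExtractionStrategy.OpenXml,
            [".xlsx"] = ExtractionStrategy.OpenXml,
            [".pptx"] = ExtractionStrategy.OpenXml,
            [".png"] = ExtractionStrategy.Ocr,
            [".jpg"] = ExtractionStrategy.Ocr,
        };

    // Returns Unsupported for unknown extensions or a missing file name.
    public static ExtractionStrategy Choose(string fileName)
    {
        string extension = Path.GetExtension(fileName ?? string.Empty);
        return ByExtension.TryGetValue(extension, out ExtractionStrategy strategy)
            ? strategy
            : ExtractionStrategy.Unsupported;
    }
}
```

Routing on the extension alone is a simplification. A scanned PDF, for instance, contains no text layer and would still need the OCR path, so a production version would likely inspect the content as well.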

Once text is extracted, the fraud classification problem itself needs to be addressed. This, unfortunately, is where the off-the-shelf open-source tooling runs a bit thin. Most approaches at this stage involve calling a hosted LLM with a carefully engineered prompt. This can work, but it does introduce its own reliability concerns around things like prompt sensitivity and response consistency. Running a local classification model is an option too, and it’s probably what most teams are thinking of when building a modern solution to this problem. The challenge here runs deep, however: it requires managing model versioning and handling the tokenization and post-processing work that a hosted service would otherwise abstract away effortlessly.

All things considered, neither path is particularly lightweight, and neither handles the user context scoring component in an integrated way. When teams end up stitching together multiple services with custom “glue” code, that “glue” tends to become brittle.

Fraud Detection With a Dedicated API

For most production use cases, a dedicated API with a reliable fraud detection AI model is a more practical option.  We’ll cover a quick C# implementation of one such option below.

This API accepts a wide range of input formats, including PDF, DOC/DOCX, XLS/XLSX, PPT/PPTX, HTML, EML/MSG, PNG, JPG, and WEBP, and it returns a structured fraud assessment that covers both document-level signals and user context.

To get started, we’ll first install the .NET SDK via NuGet:

PowerShell
 
Install-Package Cloudmersive.APIClient.NETCore.FraudDetection -Version 2.0.3


And right after that, we’ll import the required classes:

C#
 
using System;
using System.Diagnostics;
using Cloudmersive.APIClient.NETCore.FraudDetection.Api;
using Cloudmersive.APIClient.NETCore.FraudDetection.Client;
using Cloudmersive.APIClient.NETCore.FraudDetection.Model;


At this point, the request is straightforward. Most of the configuration happens through request headers, which makes this API easy to slot into an existing document intake workflow without having to restructure too much around it. 

Here’s an example call structure:

C#
 
namespace Example
{
    public class DocumentDetectFraudAdvancedExample
    {
        public static void Main()
        {
            // Configure API key authorization: Apikey
            Configuration.Default.AddApiKey("Apikey", "YOUR_API_KEY");

            var apiInstance = new FraudDetectionApi();
            var preprocessing = "Auto";  // string | Image pre-processing level: 'Auto' (default) or 'None' (optional)
            var resultCrossCheck = "None";  // string | Output cross-checking level: 'None' (default) or 'Advanced' (optional)
            var userEmailAddress = "user@example.com";  // string | Placeholder user email address for context (optional)
            var userEmailAddressVerified = true;  // bool? | True if the user's email address was verified (optional)

            try
            {
                // Input document, or photos of a document, to perform fraud detection on.
                // The using block disposes of the stream once the call completes.
                using (var inputFile = new System.IO.FileStream("C:\\temp\\inputfile", System.IO.FileMode.Open))
                {
                    // Advanced AI Fraud Detection for Documents
                    AdvancedFraudDetectionResult result = apiInstance.DocumentDetectFraudAdvanced(preprocessing, resultCrossCheck, userEmailAddress, userEmailAddressVerified, inputFile);
                    Debug.WriteLine(result);
                }
            }
            catch (Exception e)
            {
                Debug.Print("Exception when calling FraudDetectionApi.DocumentDetectFraudAdvanced: " + e.Message);
            }
        }
    }
}


Most of the complexity is abstracted away here, but there are a few parameters worth understanding before filling this in.

For starters, preprocessing controls how aggressively the API tries to enhance image quality before analysis. The default setting is Auto, which handles most real-world input well. Setting it to None can reduce processing time, but that only makes sense if you’re already confident your documents are clean and high-resolution.

resultCrossCheck is set to None by default, but switching it to Advanced gives you a second-pass verification step on the output. For any workflow you’d consider “high stakes” (e.g., claims processing for insurance folks), the added latency is probably worth it.

UserEmailAddress and UserEmailAddressVerified are both optional but meaningful. Passing user context alongside a document allows the API to factor submission-level signals into the fraud risk score rather than simply evaluate the document in isolation.

CustomPolicyID (not shown in the example above) allows the request to be evaluated against a saved policy configuration, which is useful for organizations that need different fraud detection thresholds across different document types or business units.

Interpreting the Response

The API response object is more detailed than what you’d get from a simple classification API, and it’s worth taking a moment to unpack each field.

JSON
 
{
  "Successful": true,
  "CleanResult": true,
  "FraudRiskLevel": 0,
  "ContainsFinancialLiability": true,
  "ContainsSensitiveInformationCollection": true,
  "ContainsAssetTransfer": true,
  "ContainsPurchaseAgreement": true,
  "ContainsEmploymentAgreement": true,
  "ContainsExpiredDocument": true,
  "ContainsAiGeneratedContent": true,
  "AnalysisRationale": "string",
  "DocumentClass": "string"
}


Successful is just a sanity check confirming the request completed without error. CleanResult is a top-level Boolean that indicates whether the document passed the fraud assessment. FraudRiskLevel gives a numeric score that can be used to build complex tiered routing logic rather than using the CleanResult response as a pass/fail gate.
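As a rough illustration of tiered routing, the sketch below maps CleanResult and FraudRiskLevel to one of three outcomes. The FraudRouter class and ReviewRoute enum are hypothetical, and because the scale of FraudRiskLevel isn’t spelled out here, the 0.3 and 0.7 thresholds are placeholders to calibrate against real responses rather than recommended values.

```csharp
// Hypothetical routing outcomes for an intake workflow.
public enum ReviewRoute { AutoApprove, ManualReview, Reject }

public static class FraudRouter
{
    // Thresholds are illustrative assumptions; calibrate them against the scores
    // the API actually returns for your own documents.
    public static ReviewRoute Route(
        bool cleanResult,
        double fraudRiskLevel,
        double reviewThreshold = 0.3,
        double rejectThreshold = 0.7)
    {
        // Highest tier first: anything at or above the reject threshold is rejected.
        if (fraudRiskLevel >= rejectThreshold) return ReviewRoute.Reject;

        // A failed CleanResult or a mid-range score goes to a human reviewer.
        if (!cleanResult || fraudRiskLevel >= reviewThreshold) return ReviewRoute.ManualReview;

        return ReviewRoute.AutoApprove;
    }
}
```

Keeping the thresholds as parameters makes it easy to apply stricter cutoffs to higher-stakes document types without touching the routing logic.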

The Boolean flags below that each point to a specific category of risk. ContainsFinancialLiability and ContainsPurchaseAgreement are useful for catching documents whose content doesn’t match their declared type. ContainsExpiredDocument catches documents submitted past their validity date.
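One way to act on those flags is to compare them against what the submitter said the document was. The sketch below is a minimal, hypothetical example: the ContentMismatchChecker class, the declared type names, and the choice of which flags count as unexpected for each type are all assumptions. Only the flag names themselves come from the response shown above.

```csharp
using System;
using System.Collections.Generic;

public static class ContentMismatchChecker
{
    // Hypothetical mapping: for each declared document type, the response flags
    // that would be surprising to see raised. Adjust to your own document types.
    private static readonly Dictionary<string, string[]> UnexpectedFlags =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            ["W-2"] = new[] { "ContainsPurchaseAgreement", "ContainsAssetTransfer" },
            ["ExpenseReceipt"] = new[] { "ContainsEmploymentAgreement", "ContainsAssetTransfer" },
        };

    // Returns the raised flags that conflict with the declared document type.
    public static List<string> FindMismatches(
        string declaredType,
        IReadOnlyDictionary<string, bool> responseFlags)
    {
        var mismatches = new List<string>();
        if (declaredType == null || !UnexpectedFlags.TryGetValue(declaredType, out string[] unexpected))
        {
            return mismatches; // Unknown type: nothing to compare against.
        }

        foreach (string flag in unexpected)
        {
            if (responseFlags.TryGetValue(flag, out bool raised) && raised)
            {
                mismatches.Add(flag);
            }
        }
        return mismatches;
    }
}
```

A W-2 submission that comes back with ContainsPurchaseAgreement set to true would produce a single mismatch, which could then be attached to the case for a reviewer.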

ContainsAiGeneratedContent is worth mentioning on its own. This is one of the most relevant flags to the current threat landscape, identifying documents likely produced by generative AI tools rather than legitimate sources. That can include legitimate documents doctored to benefit the submitter (e.g., an expense receipt with more items and a greater total than the employee actually spent).

AnalysisRationale returns a plain-language explanation of how the fraud assessment was reached, which is useful for audit trails and for surfacing context to human reviewers rather than just slapping a score on their desk. DocumentClass gives the API’s assessment of what type of document was submitted.
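For audit trails, it can help to condense a response into a single line that a human reviewer can scan. The sketch below works directly on the raw JSON using System.Text.Json rather than the SDK’s result type, and the ReviewSummary class and its output format are hypothetical choices made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class ReviewSummary
{
    // Builds a one-line summary from a response shaped like the JSON shown above.
    public static string Build(string responseJson)
    {
        using JsonDocument document = JsonDocument.Parse(responseJson);
        JsonElement root = document.RootElement;

        // Collect every "Contains..." flag that came back as true.
        var raisedFlags = new List<string>();
        foreach (JsonProperty property in root.EnumerateObject())
        {
            bool isFlag = property.Name.StartsWith("Contains", StringComparison.Ordinal);
            if (isFlag && property.Value.ValueKind == JsonValueKind.True)
            {
                raisedFlags.Add(property.Name);
            }
        }

        string documentClass = ReadString(root, "DocumentClass") ?? "unknown";
        string rationale = ReadString(root, "AnalysisRationale") ?? "none provided";
        string flags = raisedFlags.Count > 0 ? string.Join(", ", raisedFlags) : "none";

        return $"Class: {documentClass}; flags: {flags}; rationale: {rationale}";
    }

    // Returns the property's string value, or null if it is missing or not a string.
    private static string ReadString(JsonElement root, string name)
    {
        return root.TryGetProperty(name, out JsonElement value)
               && value.ValueKind == JsonValueKind.String
            ? value.GetString()
            : null;
    }
}
```

Logging this summary alongside the original submission gives reviewers the rationale and the raised flags together, instead of just a score.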

Conclusion

In this article, we walked through the challenge of building document fraud detection into a C# pipeline.   We looked at the components required for an in-house approach, and we explored a dedicated API that consolidates those concerns into a single call.  The combination of document-level content analysis, user context scoring, and AI-generated content detection makes it a good option for any document pipeline where authenticity is of utmost importance.


Opinions expressed by DZone contributors are their own.
