Building an Intelligent QA System With NLP and Milvus

This article uses Google’s open-source BERT model and Milvus to quickly build a Q and A bot based on semantic understanding.

By Jun Gu · May 21, 2020 · Tutorial

Milvus Project: github.com/milvus-io/milvus

Question answering (QA) systems are widely used in natural language processing. They answer questions posed in natural language, and typical applications include intelligent voice interaction, online customer service, knowledge acquisition, and personalized emotional chat.

Question answering systems can be classified along several axes: generative versus retrieval-based, single-round versus multi-round, and open-domain versus domain-specific.

This article deals with a QA system designed for a specific domain, commonly called an intelligent customer service robot. In the past, building a customer service robot usually required converting domain knowledge into a series of rules and knowledge graphs. The construction process relied heavily on human effort, and whenever the scenario changed, much of that work had to be repeated.

With the application of deep learning in natural language processing (NLP), machine reading can find answers to questions directly from documents. A deep learning language model converts the questions and documents into semantic vectors, and the answer is found by matching those vectors.

This article uses Google’s open-source BERT model and Milvus, an open-source vector search engine, to quickly build a Q and A bot based on semantic understanding.

Overall Architecture

This article implements a question answering system through semantic similarity matching. The general construction process is as follows:

  1. Obtain a large number of questions with answers in a specific field (a standard question set).
  2. Use the BERT model to convert these questions into feature vectors and store them in Milvus, which assigns a vector ID to each feature vector.
  3. Store these representative question IDs and their corresponding answers in PostgreSQL.

When a user asks a question:

  1. The BERT model converts it to a feature vector.
  2. Milvus performs a similarity search and retrieves the ID of the stored question most similar to the user's question.
  3. PostgreSQL returns the corresponding answer.
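
The import and query flows above can be sketched end to end with in-memory stand-ins. This is only an illustration under stated assumptions: `toy_embed` is a hypothetical placeholder for BERT, `InMemoryIndex` is a hypothetical placeholder for Milvus, a plain dictionary plays the role of the PostgreSQL table, and the two question and answer pairs are invented for the example.

```python
import math
from typing import Dict, List, Tuple


def toy_embed(text: str, dim: int = 16) -> List[float]:
    """Hypothetical stand-in for BERT: bucket character bigrams into a unit vector."""
    vec = [0.0] * dim
    padded = f" {text.lower()} "
    for i in range(len(padded) - 1):
        bigram = padded[i:i + 2]
        vec[sum(ord(c) for c in bigram) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryIndex:
    """Hypothetical stand-in for Milvus: stores vectors and assigns integer IDs."""

    def __init__(self) -> None:
        self._vectors: Dict[int, List[float]] = {}
        self._next_id = 0

    def insert(self, vectors: List[List[float]]) -> List[int]:
        ids = []
        for vec in vectors:
            self._vectors[self._next_id] = vec
            ids.append(self._next_id)
            self._next_id += 1
        return ids

    def search(self, query: List[float]) -> Tuple[int, float]:
        """Return (id, inner product) of the most similar stored vector."""
        scored = (
            (sum(a * b for a, b in zip(vec, query)), vid)
            for vid, vec in self._vectors.items()
        )
        score, vid = max(scored)  # raises ValueError if the index is empty
        return vid, score


# Import flow: questions -> vectors -> index IDs -> answer table.
qa_pairs = [
    ("How do I file a claim?", "Submit the claim form online or by phone."),
    ("What does the policy cover?", "Coverage is listed in your policy schedule."),
]
index = InMemoryIndex()
ids = index.insert([toy_embed(question) for question, _ in qa_pairs])
answers = dict(zip(ids, (answer for _, answer in qa_pairs)))  # stand-in for PostgreSQL


# Query flow: question -> vector -> nearest ID -> answer.
def ask(question: str) -> Tuple[str, float]:
    best_id, score = index.search(toy_embed(question))
    return answers[best_id], score
```

Calling `ask("How do I file a claim?")` returns the first answer together with its similarity score. In the real system each stand-in would be replaced by the corresponding service.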

The system architecture diagram is as follows (the blue lines represent the import process and the yellow lines represent the query process):

[Figure: system architecture diagram]

Next, we will show you how to build an online Q and A system step by step.

Steps to Build the Q and A System

Before you start, you need to install Milvus and PostgreSQL. For the specific installation steps, see the Milvus official website.

1. Data preparation

The experimental data in this article comes from https://github.com/chatopera/insuranceqa-corpus-zh

The data set contains question and answer pairs related to the insurance industry. In this article, we extract 20,000 pairs from it. With this data, you can quickly build a customer service robot for the insurance industry.

2. Generate feature vectors

This system uses a pre-trained BERT model. Download it from the link below before starting the service: https://storage.googleapis.com/bert_models/2018_10_18/cased_L-24_H-1024_A-16.zip

Use this model to convert the question database to feature vectors for future similarity search. For more information about the BERT service, see https://github.com/hanxiao/bert-as-service.

[Figure: BERT service]
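
A minimal sketch of converting the question set to vectors in batches is shown below. The helper `encode_in_batches` and its `batch_size` default are hypothetical names chosen for illustration; the helper accepts any object with an `encode(list_of_strings)` method, which is the shape of the bert-as-service client's `BertClient.encode` call. It assumes a bert-as-service server is already running when a real client is passed in.

```python
from typing import List, Protocol, Sequence


class SentenceEncoder(Protocol):
    """Anything exposing encode(texts) -> one vector per text."""

    def encode(self, texts: List[str]) -> Sequence[Sequence[float]]: ...


def encode_in_batches(
    encoder: SentenceEncoder, questions: List[str], batch_size: int = 256
) -> List[List[float]]:
    """Encode questions in fixed-size batches and return one vector per question."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    vectors: List[List[float]] = []
    for start in range(0, len(questions), batch_size):
        batch = questions[start:start + batch_size]
        # Each row of the encoder output is the feature vector of one question.
        vectors.extend(list(row) for row in encoder.encode(batch))
    return vectors


# With a running bert-as-service server, the encoder would be a BertClient:
#     from bert_serving.client import BertClient
#     vectors = encode_in_batches(BertClient(), questions)
```

Batching keeps each request to the service small, which matters when the question set holds tens of thousands of entries.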

3. Import to Milvus and PostgreSQL

Normalize the generated feature vectors and import them to Milvus, then import the IDs returned by Milvus and the corresponding answers to PostgreSQL. The following shows the table structure in PostgreSQL:

[Figure: PostgreSQL table structure, listing column names and types]
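
The normalization and the ID-to-answer table can be sketched as follows. Several things here are assumptions: the table name `qa_answers` and its two columns are hypothetical (the original table structure is only shown in the figure), and Python's built-in `sqlite3` stands in for PostgreSQL so the example runs without a database server. With PostgreSQL and a driver such as psycopg2, the same statements would use `%s` placeholders instead of `?`.

```python
import math
import sqlite3
from typing import Iterable, List, Optional, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def store_answers(
    conn: sqlite3.Connection, ids: Iterable[int], answers: Iterable[str]
) -> None:
    """Save each vector ID with its answer (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS qa_answers ("
        "id INTEGER PRIMARY KEY, answer TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO qa_answers (id, answer) VALUES (?, ?)", zip(ids, answers)
    )
    conn.commit()


def fetch_answer(conn: sqlite3.Connection, vector_id: int) -> Optional[str]:
    """Look up the answer stored for a vector ID, or None if it is unknown."""
    row = conn.execute(
        "SELECT answer FROM qa_answers WHERE id = ?", (vector_id,)
    ).fetchone()
    return row[0] if row else None
```

Keeping only the ID and the answer in the relational table means the vector store and the answer store are joined by a single integer key.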

4. Retrieve answers

The user inputs a question. After BERT generates its feature vector, the system finds the most similar question in the Milvus library. This article uses cosine similarity to measure how close two sentences are. Because all vectors are normalized, the inner product of two feature vectors equals their cosine similarity, and the closer that value is to 1, the higher the similarity.
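
The effect of normalization can be checked with a small worked example: for unit-length vectors, the cosine measure reduces to a plain inner product. The vectors and helper names below are made up for illustration.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b (both must be non-zero)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def normalize(a: Sequence[float]) -> List[float]:
    norm = math.sqrt(dot(a, a))
    return [x / norm for x in a]


a = [1.0, 2.0, 2.0]  # length 3
b = [2.0, 1.0, 2.0]  # length 3

# dot(a, b) = 2 + 2 + 4 = 8, so the cosine similarity is 8 / (3 * 3) = 8/9.
raw = cosine_similarity(a, b)

# After normalization the inner product alone gives the same value.
unit = dot(normalize(a), normalize(b))
```

Here `raw` and `unit` agree up to floating-point rounding, which is why normalizing once at import time lets the search use the cheaper inner product.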

In practice, the library may not contain a question that closely matches the user's. You can therefore set a threshold, for example 0.9: if the highest similarity retrieved is below this threshold, the system reports that it has no related question.
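
The threshold check can be sketched as a small function. The function name, the constant names, and the fallback message are hypothetical; only the 0.9 threshold comes from the text.

```python
SIMILARITY_THRESHOLD = 0.9  # value suggested in the text
FALLBACK_MESSAGE = "Sorry, no related question was found."  # hypothetical wording


def select_answer(
    best_score: float, answer: str, threshold: float = SIMILARITY_THRESHOLD
) -> str:
    """Return the retrieved answer only if the best match is similar enough."""
    if best_score < threshold:
        return FALLBACK_MESSAGE
    return answer
```

A score of 0.95 returns the retrieved answer, while a score of 0.5 returns the fallback message. The right threshold depends on the model and the data, so it is worth tuning on a sample of real questions.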

System Demonstration

The following shows an example interface of the system:

[Figure: AI Q&A system interface]

Enter your question in the dialog box and you will receive a corresponding answer:

[Figure: AI Q&A system answering a question]

Summary

We hope this article makes it easy to build your own Q and A system.

With the BERT model, you no longer need to sort and organize the text corpora beforehand. At the same time, thanks to the high performance and high scalability of the open-source vector search engine Milvus, your QA system can support a corpus of up to hundreds of millions of texts.
