How to Implement Semantic Search Using OpenAI GPT-3
Semantic search is a mostly overlooked feature of OpenAI GPT-3. In this blog post, we discuss how you can implement semantic search over groups of documents using GPT-3.
Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model for text generation created by OpenAI. GPT-3 showed the remarkable potential of a large language model: it can perform tasks such as question answering, summarization, semantic search, chatbot conversations, and writing poetry or essays. We have already experimented with GPT-3 for question answering, ad generation, sentence paraphrasing, and intent classification. Now let's experiment with semantic search using the GPT-3 API endpoint provided by OpenAI.
OpenAI's search API lets you perform a semantic search over a group of documents. Given a query, it scores each document by how semantically related it is to the query and ranks the documents by that score.
Because access is through an API, it is easy to use: we provide the documents and the query text, and the API responds with the matching results sorted by relevance score.
Below are the steps to use the OpenAI API for semantic search.
Installing OpenAI for Semantic Search
Here we are using Python for API calls. However, you can also make a cURL request.
Let's create a virtualenv by following these steps:
virtualenv env_gpt --python=python3
source env_gpt/bin/activate
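Next, install the OpenAI Python package to use its API and engines. The code in this post uses the module-level interface of the pre-1.0 releases (openai.File and openai.Engine), which was removed in version 1.0, so install an older release:
pip install "openai<1.0"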
Semantic Search Using GPT-3
To perform a semantic search, we first need to upload our documents in JSONL (JSON Lines) format, with one JSON object per line. Here is a sample line in this format:
{"text": "Hello OpenAI", "metadata": "sample data"}
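JSON Lines simply means one complete JSON object per line. As a minimal sketch, the standard json module can parse such input line by line and check that every record has the "text" field. The helper name parse_jsonl is hypothetical and is not part of the OpenAI package.

```python
import json

def parse_jsonl(lines):
    """Parse JSON Lines input: one JSON object per non-empty line (hypothetical helper)."""
    records = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines, such as a trailing newline
        record = json.loads(line)
        if "text" not in record:
            raise ValueError(f"line {number} has no 'text' field")
        records.append(record)
    return records

sample = ['{"text": "Hello OpenAI", "metadata": "sample data"}']
print(parse_jsonl(sample))  # prints [{'text': 'Hello OpenAI', 'metadata': 'sample data'}]
```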
Next, we will create a JSONL file for semantic search. Name it sample_search.jsonl and copy the following lines into it:
{"text": "The rebuilding of economies after the COVID-19 crisis offers a unique opportunity to transform the global food system and make it resilient to future shocks, ensuring environmentally sustainable and healthy nutrition for all. To make this happen, United Nations agencies like the Food and Agriculture Organization, the United Nations Environment Program, the Intergovernmental Panel on Climate Change, the International Fund for Agricultural Development, and the World Food Program, collectively, suggest four broad shifts in the food system.", "metadata": "Economic reset"}
{"text": "In the past few weeks healthcare professionals have been fully focussed caring for enormous numbers of people infected with COVID-19. They did an amazing job. Not in the least because healthcare professionals and leaders have been using continuous improvement as part of their accreditation program for many years. It has become part of their DNA. This has enabled them to change many processes as needed during COVID-19, using a cross-functional problem solving approach in (very) rapid improvement cycles.", "metadata": "Supporting adaptive healthcare"}
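Hand-typed or copy-pasted JSON often picks up curly quotes, which are not valid JSON. A minimal sketch that writes the file with json.dumps instead, so each line is guaranteed to be a valid JSON object. The document texts are shortened here for brevity, and write_jsonl is a hypothetical helper name.

```python
import json

# Documents to index; the texts are shortened versions of the two samples.
documents = [
    {
        "text": "The rebuilding of economies after the COVID-19 crisis offers a unique "
                "opportunity to transform the global food system.",
        "metadata": "Economic reset",
    },
    {
        "text": "In the past few weeks healthcare professionals have been fully focussed "
                "caring for enormous numbers of people infected with COVID-19.",
        "metadata": "Supporting adaptive healthcare",
    },
]

def write_jsonl(path, records):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

write_jsonl("sample_search.jsonl", documents)
```

Generating the file this way also makes it easy to build larger document sets from a database or a folder of text files.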
Now it's time to upload this JSONL file with your API key, setting purpose to "search" for semantic search. Create a file named upload_file.py, copy the code below into it, and provide your OpenAI API key.
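import openai

openai.api_key = "YOUR-API-KEY"  # replace with your own API key
# Upload the JSONL documents so they can be used by the search endpoint
response = openai.File.create(file=open("sample_search.jsonl"), purpose="search")
print(response)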
When you run upload_file.py, you will get a response like the one below:
Copy the id from this response; you will pass it as the file parameter in the search step.
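Instead of copying the id by hand, you can read it from the response. The File object returned by the upload includes an id field that starts with "file-". The sketch below assumes the response can be indexed like a dictionary; the sample response and the helper name get_file_id are hypothetical.

```python
# Hypothetical upload response, shown as a plain dict.
# Only the "id" field is needed for the search step.
upload_response = {
    "id": "file-XXXXXXXXXXXXXXXXXXXXXXXX",
    "object": "file",
    "filename": "sample_search.jsonl",
    "purpose": "search",
}

def get_file_id(response):
    """Return the uploaded file's id, failing loudly if it is missing."""
    file_id = response.get("id")
    if not file_id or not file_id.startswith("file-"):
        raise ValueError(f"unexpected upload response: {response!r}")
    return file_id

file_id = get_file_id(upload_response)
print(file_id)  # prints file-XXXXXXXXXXXXXXXXXXXXXXXX
```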
Now let's test it. To test GPT-3's semantic search capability, pass your query text in the query parameter.
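import openai

openai.api_key = "YOUR-API-KEY"  # replace with your own API key

search_response = openai.Engine("davinci").search(
    search_model="davinci",  # engine used to score the documents
    query="healthcare",  # text to search for
    max_rerank=5,  # re-rank and return at most 5 documents
    file="file-8ejPA5eM13J4J0dWy3bBbvTf",  # replace with the id from your upload
    return_metadata=True,  # include each document's metadata in the results
)
print(search_response)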
Let's look at the parameters of openai.Engine.search:
- search_model: OpenAI's API lets us use different engines, such as Ada, Babbage, Curie, and Davinci. Davinci is the most powerful engine, and also the most expensive.
- query: The query text used for the semantic search.
- max_rerank: The maximum number of documents that are re-ranked by semantic search and returned in the response, so the response contains at most this many of the most relevant documents.
- file: The file ID we got when uploading the documents.
- return_metadata: When enabled, the response includes each document's metadata.
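Because these are ordinary keyword arguments, one convenient pattern is to collect them in a dictionary and unpack it into the call. The helper build_search_params below is a hypothetical sketch, not part of the OpenAI library; it only assembles and sanity-checks the arguments and makes no network request.

```python
def build_search_params(query, file_id, max_rerank=5,
                        search_model="davinci", return_metadata=True):
    """Assemble keyword arguments for the search call (hypothetical helper)."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if max_rerank < 1:
        raise ValueError("max_rerank must be at least 1")
    return {
        "search_model": search_model,
        "query": query,
        "max_rerank": max_rerank,
        "file": file_id,
        "return_metadata": return_metadata,
    }

params = build_search_params("healthcare", "file-8ejPA5eM13J4J0dWy3bBbvTf")
print(params)
# With the pre-1.0 openai package used in this post, these could be passed as:
# openai.Engine("davinci").search(**params)
```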
The response will look like the image below:
In the JSON response, each result includes the text of a document that matched the query, and its score shows how relevant that result is. Our test used a very small file; with more documents, we get more results, each with its own score.
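To work with the results programmatically, you can sort them by score and print each matched document. The sketch below assumes the response is a dictionary with a "data" list whose entries carry "document", "score", "text", and "metadata" fields; the example response and its scores are made up for illustration, not real API output, and top_results is a hypothetical helper.

```python
def top_results(search_response, limit=3):
    """Return (score, metadata, text) tuples, most relevant first."""
    results = search_response.get("data", [])
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["score"], r.get("metadata"), r.get("text", "")) for r in ranked[:limit]]

# Hypothetical response with made-up scores, for illustration only.
example_response = {
    "object": "list",
    "data": [
        {"document": 0, "score": 12.5, "text": "The rebuilding of economies...",
         "metadata": "Economic reset"},
        {"document": 1, "score": 48.0, "text": "In the past few weeks healthcare...",
         "metadata": "Supporting adaptive healthcare"},
    ],
}

for score, metadata, text in top_results(example_response):
    print(f"{score:>7.2f}  {metadata}: {text}")
```

Sorting locally is only a safeguard; the API already returns results ordered by relevance.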
As we can see, performing a semantic search for a given query with GPT-3 is simple, and the results are impressive.
Limitation
There is a limit on the size of the documents we can upload: each document can contain no more than 2,048 tokens, and we can upload a maximum of 200 documents.
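Before uploading, it can help to check a file against these limits locally. Exact token counts require the GPT-3 tokenizer; the sketch below only approximates them with a rough heuristic of about four characters per token for English text. That heuristic is an assumption, so treat the result as an early warning rather than a guarantee. The helper names are hypothetical.

```python
import json

MAX_DOCUMENTS = 200  # document limit stated above
MAX_TOKENS = 2048    # per-document token limit stated above

def estimate_tokens(text):
    """Rough token estimate (assumption: about 4 characters per token)."""
    return (len(text) + 3) // 4

def check_limits(lines):
    """Return a list of problems found in JSON Lines input; empty if none."""
    problems = []
    records = [json.loads(line) for line in lines if line.strip()]
    if len(records) > MAX_DOCUMENTS:
        problems.append(f"{len(records)} documents exceeds the limit of {MAX_DOCUMENTS}")
    for index, record in enumerate(records):
        tokens = estimate_tokens(record["text"])
        if tokens > MAX_TOKENS:
            problems.append(f"document {index} is roughly {tokens} tokens")
    return problems

print(check_limits(['{"text": "Hello OpenAI", "metadata": "sample data"}']))  # prints []
```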
Let us know in the comments if you have any questions about OpenAI semantic search.
Published at DZone with permission of Mittal Patel.