How to Implement Semantic Search Using OpenAI GPT-3

Semantic search is an often overlooked feature of OpenAI GPT-3. In this post, we discuss how you can implement semantic search over a group of documents using GPT-3.

Mittal Patel

Updated Sep. 29, 21 · Tutorial


Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model created by OpenAI for text generation. It has shown how capable a large language model can be at tasks such as question answering, summarization, semantic search, chatbots, and writing poetry or essays. We have already experimented with question answering, ad generation, sentence paraphrasing, and intent classification using GPT-3. Now let’s experiment with semantic search using the GPT-3 API endpoint provided by OpenAI.

OpenAI’s search API lets you run a semantic search over a group of documents. Given a query, it scores each document by how semantically related it is to the query text and ranks the documents by that score.

Because access is API-based, it is easy to use. We provide text in the form of documents and then send a query. The API responds with the results that match the query, sorted by relevance score.

Below are the steps to use the OpenAI API for semantic search.

Installing OpenAI for Semantic Search

Here we are using Python for API calls. However, you can also make a cURL request.

Let’s create a virtualenv and activate it with these commands:

    Shell

   virtualenv env_gpt --python=python3
   source env_gpt/bin/activate

Next, install the OpenAI Python package to use its API and engines.

    Shell
   
   pip install openai

Semantic Search Using GPT-3

To perform a semantic search, we first need to upload our documents in the JSONL file format, where each line of the file is one JSON object. The following is a sample line.

    JSON

   {"text": "Hello OpenAI", "metadata": "sample data"}

Next, we will create a JSONL file for semantic search. Name it sample_search.jsonl and copy the following content into it:

    JSON

   {"text": "The rebuilding of economies after the COVID-19 crisis offers a unique opportunity to transform the global food system and make it resilient to future shocks, ensuring environmentally sustainable and healthy nutrition for all. To make this happen, United Nations agencies like the Food and Agriculture Organization, the United Nations Environment Program, the Intergovernmental Panel on Climate Change, the International Fund for Agricultural Development, and the World Food Program, collectively, suggest four broad shifts in the food system.", "metadata": "Economic reset"}
   {"text": "In the past few weeks healthcare professionals have been fully focused on caring for enormous numbers of people infected with COVID-19. They did an amazing job. Not in the least because healthcare professionals and leaders have been using continuous improvement as part of their accreditation program for many years. It has become part of their DNA. This has enabled them to change many processes as needed during COVID-19, using a cross-functional problem solving approach in (very) rapid improvement cycles.", "metadata": "Supporting adaptive healthcare"}
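Hand-typed JSONL is easy to get wrong, for example when an editor swaps straight quotes for curly ones. A minimal sketch, assuming the documents are held as Python dictionaries, is to let the standard json module write each line. The helper name write_jsonl and the shortened sample texts are illustrative only and are not part of the OpenAI library.

```python
import json


def write_jsonl(documents, path):
    """Write one JSON object per line, which is the layout JSONL requires."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in documents:
            # json.dumps always emits straight double quotes and escapes
            # embedded newlines, so every document stays on a single line.
            f.write(json.dumps(doc, ensure_ascii=False) + "\n")


# Shortened, illustrative versions of the two sample documents.
documents = [
    {"text": "The rebuilding of economies after the COVID-19 crisis offers "
             "a unique opportunity to transform the global food system.",
     "metadata": "Economic reset"},
    {"text": "Healthcare professionals have been using continuous "
             "improvement as part of their accreditation program.",
     "metadata": "Supporting adaptive healthcare"},
]

write_jsonl(documents, "sample_search.jsonl")
```

Generating the file this way keeps the text and metadata keys consistent across lines and avoids quoting mistakes.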

Now it’s time to upload this JSONL file using your API key, setting purpose to search so the file can be used for semantic search. Create a file named upload_file.py, copy the code below into it, and provide your OpenAI API key.

    Python

   import openai

   openai.api_key = "YOUR-API-KEY"

   # Upload the JSONL file created above; purpose="search" marks it for semantic search.
   response = openai.File.create(file=open("sample_search.jsonl"), purpose="search")
   print(response)

When you run upload_file.py, the API returns a response that includes the id of the uploaded file.

Copy the id from this response; we need it in the next step.
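Instead of copying the id by hand, it can be read from the response in code. The sketch below assumes the upload response behaves like a dictionary with an id field, as the step above implies. The helper extract_file_id and the example response, including its id value, are made up for illustration.

```python
def extract_file_id(response):
    """Return the file ID from an upload response shaped like a dict."""
    file_id = response.get("id")
    if not file_id:
        raise ValueError("upload response has no 'id' field")
    return file_id


# Hypothetical response; the ID below is invented for illustration.
example_response = {
    "id": "file-abc123",
    "object": "file",
    "purpose": "search",
    "filename": "sample_search.jsonl",
}

file_id = extract_file_id(example_response)
print(file_id)
```

The returned value is what goes into the file parameter of the search call.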

Now let’s test it. To test the capability of GPT-3 semantic search, pass your query text in the query parameter.

    Python

   import openai

   openai.api_key = "YOUR-API-KEY"

   # Use the file ID returned by the upload step.
   search_response = openai.Engine("davinci").search(
       search_model="davinci",
       query="healthcare",
       max_rerank=5,
       file="file-8ejPA5eM13J4J0dWy3bBbvTf",
       return_metadata=True,
   )
   print(search_response)

Let’s understand the parameters of openai.Engine.search.

search_model:
- OpenAI’s API lets us use different engines like Davinci, Babbage, Ada, Curie, etc.
- Davinci is the most powerful engine and also the costliest
query:
- The text used for the semantic search
max_rerank:
- The maximum number of documents to re-rank by semantic search; the response contains at most max_rerank documents
file:
- The file ID we got when uploading the documents
return_metadata:
- Set to True to include each document’s metadata in the response

In the JSON response, we get the text of each document that was matched with the query, along with a score that shows the relevance of the result. When the uploaded file contains multiple documents, we get multiple results with different scores.
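To work with the results in code, the matches can be sorted by score. This is a minimal sketch that assumes the response is a dictionary whose data list holds one entry per document with document, score, text, and metadata fields. Those field names, the helper rank_results, and the scores below are assumptions for illustration, not output captured from the API.

```python
def rank_results(search_response):
    """Return the search results sorted from most to least relevant."""
    results = search_response.get("data", [])
    return sorted(results, key=lambda r: r["score"], reverse=True)


# Hypothetical response; the scores are invented for illustration.
example_response = {
    "object": "list",
    "data": [
        {"document": 0, "score": 12.5,
         "text": "The rebuilding of economies after the COVID-19 crisis",
         "metadata": "Economic reset"},
        {"document": 1, "score": 215.4,
         "text": "Healthcare professionals have been fully focused",
         "metadata": "Supporting adaptive healthcare"},
    ],
}

for result in rank_results(example_response):
    print(f'{result["score"]:8.1f}  {result["metadata"]}')
```

A higher score means the document is more relevant to the query, so the first entry after sorting is the best match.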

As we can see, it is simple to perform a semantic search using GPT-3 for a given query. GPT-3’s results are quite amazing.

Limitation

There is a limit on the size of the documents we can upload. Each document must contain no more than 2,048 tokens, and we can upload a maximum of 200 documents.
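These limits can be checked before uploading. The sketch below is only a rough pre-check: it estimates tokens as about four characters each, which is an assumed heuristic and not the tokenizer GPT-3 uses, so documents near the limit should be treated with caution. The names check_limits and estimate_tokens are hypothetical.

```python
import json

MAX_DOCUMENTS = 200   # document limit stated above
MAX_TOKENS = 2048     # per-document token limit stated above


def estimate_tokens(text):
    """Very rough token estimate: about four characters per token (assumption)."""
    return len(text) // 4 + 1


def check_limits(jsonl_lines):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    docs = [json.loads(line) for line in jsonl_lines if line.strip()]
    if len(docs) > MAX_DOCUMENTS:
        problems.append(f"{len(docs)} documents exceeds the limit of {MAX_DOCUMENTS}")
    for index, doc in enumerate(docs):
        tokens = estimate_tokens(doc["text"])
        if tokens > MAX_TOKENS:
            problems.append(
                f"document {index}: about {tokens} tokens exceeds {MAX_TOKENS}"
            )
    return problems


sample_lines = ['{"text": "Hello OpenAI", "metadata": "sample data"}']
print(check_limits(sample_lines))
```

To check a real file, pass its lines, for example check_limits(open("sample_search.jsonl", encoding="utf-8")).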

Do let us know in the comments if you have any queries regarding OpenAI semantic search.


Published at DZone with permission of Mittal Patel. See the original article here.
