An In-Depth Guide to Threads in OpenAI Assistants API

Learn how to use OpenAI's Assistants API to manage threads and messages — create, list, retrieve, modify, delete — plus handle files, metadata, and more.

By Mohammed Talib · Feb. 10, 25 · Tutorial

In this blog, we will explore what chat completion models can and cannot do, and then see how the Assistants API addresses those limitations.

We will also focus on threads and messages: how to create, list, retrieve, modify, and delete them. Additionally, we will include Python code snippets and describe the outputs you can expect.

Limitations of Chat Completion Models

No Memory

Chat completion models do not have a memory concept. For example, if you ask: “What’s the capital of Japan?”

The model might say: “The capital of Japan is Tokyo.”

But when you ask again: “Tell me something about that city.”

It often responds with: “I’m sorry but you didn’t specify which city you are referring to.”

It does not understand what was discussed previously. That’s the main issue: there is no memory concept in chat completions.

Poor at Computational Tasks

Chat completion models are unreliable at direct computational tasks. For instance, if you ask one to reverse the string “openaichatgpt,” it may generate wrong output, such as inserting extra characters or dropping letters.
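A quick way to see why this matters: the same operation is trivial and deterministic in plain Python, which is exactly the kind of code the Assistants API's code interpreter can run instead of guessing token by token.

```python
# Reversing a string is a one-liner in Python, but error-prone for a
# model that predicts tokens rather than executing code.
text = "openaichatgpt"
reversed_text = text[::-1]
print(reversed_text)  # tpgtahcianepo
```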

No Direct File Handling

In chat completions, there is no way to process text files or Word documents directly. You have to convert those files to text, do chunking (divide documents into smaller chunks), create embeddings, and do vector searches yourself. Only then do you pass some relevant text chunks to the model as context.
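As a rough sketch of that manual pipeline (the splitter below is naive and hypothetical; real systems usually split on tokens or sentences and store embeddings in a vector database), the chunking step might look like:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks (a naive sketch)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping some overlap
    return chunks

doc = "some extracted document text " * 200  # stand-in for a converted file
pieces = chunk_text(doc)
print(len(pieces), len(pieces[0]))  # 13 500
```

Each chunk would then be embedded and indexed so that, at question time, only the most relevant chunks are passed to the model as context.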

Synchronous Only

Chat completion models are not asynchronous. You must ask a question and wait for it to finish. You cannot do something else while it’s processing without extra workarounds.

Capabilities of the Assistants API

Context Support With Threads

In Assistants API, you can create a thread for each user. A thread is like a chat container where you can add many messages. It persists the conversation, so when the user logs in again, you can pass the same thread ID to retrieve what was discussed previously. This is very helpful.

Code Interpreter

There is also a code interpreter. Whenever you ask for some computational task, it runs Python code. It then uses that answer to expand or explain. This makes it very helpful for reversing strings, finding dates, or any Python-based operations.

Retrieval With Files

The Assistants API has retrieval support, letting you upload files and ask questions based on those files. The system handles the vector search process and then uses relevant chunks as context. You can upload up to 20 files in Assistants as context. This is very helpful for referencing company documents, reports, or data sets.

Function Calling

Function calling allows the model to tell you what function to call and what arguments to pass, so that you can get external data (like weather or sales from your own database). It does not call the function automatically; it indicates which function to call and with what parameters, and then you handle that externally.
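That division of labor can be sketched with local data. Here, tool_call is a hypothetical payload shaped like a model's function-call request (the model returns arguments as a JSON string), and get_weather is your own stubbed function:

```python
import json

def get_weather(city: str) -> dict:
    """Your own function; the model never calls this directly."""
    return {"city": city, "forecast": "sunny"}  # stubbed external data

AVAILABLE = {"get_weather": get_weather}

# Hypothetical payload shaped like a model's function-call request.
tool_call = {"name": "get_weather", "arguments": json.dumps({"city": "Tokyo"})}

# Your code dispatches the call and gets the real data.
fn = AVAILABLE[tool_call["name"]]
result = fn(**json.loads(tool_call["arguments"]))
print(result)  # {'city': 'Tokyo', 'forecast': 'sunny'}
```

You would then send the result back to the model so it can compose its final answer.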

Asynchronous Workflows

The Assistants API is asynchronous. You can run a request, and you don’t have to wait for it immediately. You can keep checking if it’s done after a few seconds. This is very helpful if you have multiple tasks or want to do other things in parallel.
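The polling pattern can be sketched without the API itself. Below, check_status is a hypothetical stand-in for whatever retrieves a run's status (a real integration would call the runs retrieval endpoint instead):

```python
import time

def wait_for_completion(check_status, interval=0.01, timeout=5.0):
    """Poll check_status() until it returns a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = check_status()
        if status in ("completed", "failed", "cancelled", "expired"):
            return status
        time.sleep(interval)  # you could do other work here instead
    raise TimeoutError("run did not finish in time")

# Simulated run: queued, then in progress, then done.
statuses = iter(["queued", "in_progress", "completed"])
print(wait_for_completion(lambda: next(statuses)))  # completed
```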

Focusing on Threads and Messages

A thread is essentially a container that holds all messages in a conversation. OpenAI recommends creating one thread per user as soon as they start using your product. This thread can store any number of messages, so you do not have to manually manage the context window.

  • Unlimited messages. You can add as many user queries and assistant responses as you want.
  • Automatic context handling. The system uses truncation if the conversation grows beyond token limits.
  • Metadata storage. You can store additional data in the thread’s metadata (for example, user feedback or premium status).

Below are code snippets to demonstrate how to create, retrieve, modify, and delete threads.

1. Creating an Assistant

First, upload any file the assistant should reference, then create the assistant with instructions and tools. For example:

Python
 
from openai import OpenAI
client = OpenAI()

file_input = client.files.create(file=open("Location/to/the/path", "rb"), purpose="assistants")

file_input.model_dump()


Python
 
assistant = client.beta.assistants.create(
    name="data_science_tutor",
    instructions="This assistant is a data science tutor.",
    tools=[{"type": "code_interpreter"}, {"type": "retrieval"}],
    model="gpt-4-1106-preview",
    file_ids=[file_input.id]
)
print(assistant.model_dump())


2. Creating Threads

A thread is like a container that holds the conversation. We can create one thread per user.

Python
 
thread = client.beta.threads.create()
print(thread.model_dump())

The returned thread object contains:

  • id – a unique identifier that starts with thread_
  • object – always "thread"
  • metadata – an empty dictionary by default

Why Create Separate Threads?

OpenAI recommends creating one thread per user as soon as they start using your product. This structure ensures that the conversation context remains isolated for each user.

3. Retrieving a Thread

Python
 
retrieved_thread = client.beta.threads.retrieve(thread_id=thread.id)
print(retrieved_thread.model_dump())


This returns a JSON object similar to what you get when you create a thread, including the id, object, and metadata fields.

4. Modifying a Thread

You can update the thread’s metadata to store important flags or notes relevant to your application. For instance, you might track if the user is premium or if the conversation has been reviewed by a manager.

Python
 
updated_thread = client.beta.threads.update(
    thread_id=thread.id,
    metadata={"modified_today": True, "user_is_premium": True}
)
print(updated_thread.model_dump())

The updated metadata now contains:

  • modified_today – a custom Boolean noting whether you changed the thread today
  • user_is_premium – a Boolean flag for the user’s account tier

Further Metadata Examples

  • {"conversation_topic": "data science"} – a string labeling the thread’s main subject
  • {"language_preference": "English"} – if the user prefers answers in English or another language
  • {"escalated": true} – if the thread needs special attention from a support team
  • {"feedback_rating": 4.5} – if you collect a rating for the conversation

5. Deleting a Thread

When you no longer need a thread, or if a user deletes their account, you can remove the entire conversation container:

Python
 
delete_response = client.beta.threads.delete(thread_id=thread.id)
print(delete_response.model_dump())


Once deleted, you can no longer retrieve this thread or any messages it contained.

Working With Messages

Previously, we focused on threads — the containers that hold conversations in the Assistants API. Now, let’s explore messages, which are the individual pieces of content (questions, responses, or system notes) you add to a thread. We’ll walk through creating messages, attaching files, listing and retrieving messages, and updating message metadata. We’ll also show Python code snippets illustrating these steps.

Messages and Their Role in Threads

What Are Messages?

Messages are mostly text (like user queries or assistant answers), but they can also include file references. Each thread can have many messages, and every message is stored with an ID, a role (for example, "user" or "assistant"), optional file attachments, and other metadata.

Opposite Index Order

Unlike chat completions, where the first message in the list is the earliest, here the first message in the array is the most recent: index 0 corresponds to the newest message in the thread.
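Assuming a response shaped like the API's (a hypothetical list, newest first), restoring chronological order for display is a simple reversal:

```python
# Hypothetical message list as returned by the API: newest first.
data = [
    {"id": "msg_3", "content": "Third"},
    {"id": "msg_2", "content": "Second"},
    {"id": "msg_1", "content": "First"},
]

newest = data[0]                       # index 0 is the most recent message
chronological = list(reversed(data))   # oldest -> newest, for display
print(newest["id"], chronological[0]["id"])  # msg_3 msg_1
```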

Annotations and File Attachments

Messages can include annotations, for instance, if a retrieval step references certain files. When using a code interpreter, any new files generated may also appear as part of the message annotations.

Create a Message in a Thread

Messages are added to a thread. Each message can be a user message or an assistant message. Messages can also contain file references.

Before adding messages, we need a thread. If you do not already have one:

Python
 
# Create a new thread
new_thread = client.beta.threads.create()
print(new_thread.model_dump())  # Shows the thread's details


Python
 
# Create a new message in the thread
message = client.beta.threads.messages.create(
    thread_id=thread.id, 
    role="user",
    content="ELI5: What is a neural network?",
    file_ids=[file_input.id]  # Passing one or more file IDs
)
print(message.model_dump())


Here, you can see:

  • Message ID – a unique identifier starting with msg_
  • Role – user, indicating this is a user input
  • File attachments – the file_ids list includes any referenced files
  • Annotations – empty at creation, but can include details like file citations if retrieval is involved
  • Metadata – a placeholder for storing additional key-value pairs

List Messages in a Thread

To list messages in a thread, use the list method. The limit parameter determines how many recent messages to retrieve.

Now, let’s list the messages. The most recent messages appear first; if we have added just one message, the output will contain only that message:

Python
 
messages_list = client.beta.threads.messages.list(
    thread_id=thread.id, 
    limit=5
)
for msg in messages_list.data:
    print(msg.id, msg.content)


If there are multiple messages, the response works like a linked list:

  • first_id points to the newest message.
  • last_id points to the earliest message.
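When a thread holds more messages than one call returns, the list endpoint pages through them with a cursor. The idea can be sketched locally; fetch_page below is a hypothetical stand-in for the list method with its limit and after parameters:

```python
# Simulated message IDs, newest to oldest, standing in for API data.
ALL_IDS = [f"msg_{i}" for i in range(12)]

def fetch_page(after=None, limit=5):
    """Return one page of IDs plus a cursor, mimicking `limit`/`after`."""
    start = ALL_IDS.index(after) + 1 if after else 0
    page = ALL_IDS[start:start + limit]
    next_cursor = page[-1] if len(page) == limit else None
    return page, next_cursor

collected, cursor = [], None
while True:
    page, cursor = fetch_page(after=cursor)
    collected.extend(page)
    if cursor is None:
        break
print(len(collected))  # 12
```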

Retrieve a Specific Message

Python
 
retrieved_msg = client.beta.threads.messages.retrieve(
    thread_id=new_thread.id,
    message_id=message.id
)
print(retrieved_msg.model_dump())


Retrieve a Message File

Now, let’s retrieve the files attached to a message. The response provides each file’s metadata, including its creation timestamp:

Python
 
files_in_msg = client.beta.threads.messages.files.list(
    thread_id=new_thread.id,
    message_id=message.id
)
print(files_in_msg.model_dump())


Modify a Message

Python
 
updated_msg = client.beta.threads.messages.update(
    thread_id=new_thread.id,
    message_id=message.id,
    metadata={"added_note": "Revised content"}
)
print(updated_msg.model_dump())


Delete a Message

Python
 
deleted_msg = client.beta.threads.messages.delete(
    thread_id=new_thread.id,
    message_id=message.id
)
print(deleted_msg.model_dump())


We have seen that chat completion models have no memory, are poor at direct computational tasks, cannot process files directly, and are synchronous only. The Assistants API, by contrast, offers context support through threads, a code interpreter for computational tasks, retrieval over uploaded files, function calling for external data, and asynchronous execution.

In this blog, we focused on how to create, list, retrieve, modify, and delete threads and messages. We also saw how to handle file references within messages. In the next session, we will learn more about runs, which connect threads and assistants to get actual outputs from the model.

I hope this is helpful. Thank you for reading!

Let’s connect on LinkedIn!

Further Reading

  • Where did multi-agent systems come from?
  • Summarising Large Documents with GPT-4o
  • How does LlamaIndex compare to LangChain in terms of ease of use for beginners
  • Pre-training vs. Fine-tuning [With code implementation]
  • Costs of Hosting Open Source LLMs vs Closed Sourced (OpenAI)
  • Embeddings: The Back Bone of LLMs
  • How to Use a Fine-Tuned Language Model for Summarization

Published at DZone with permission of Mohammed Talib. See the original article here.

Opinions expressed by DZone contributors are their own.
