Building an LLM-Based Agent: Step 0

In this post, we successfully ran a simple LLM setup both through an API and locally, and we learned practical ways to improve output quality.

Trần Ngọc Minh

Mar. 23, 26 · Tutorial

Newcomers to the LLM world often start by "chatting for fun," but they quickly run into a bigger question: how do we make the model not only answer, but actually do work? At that point, they begin to encounter the idea of an agent, meaning a system that can accept a goal, break it down into tasks, use tools, and check its own results to get real work done.

This first post serves as the foundation for the entire series. We will not go far yet; instead, we focus on one concrete thing: building a minimal yet correct chat framework that we can later upgrade into an agent without rewriting everything from scratch. Once this framework is stable, we will see that two factors determine quality from the very beginning: how we package context using messages, and how clearly and carefully we write prompts.

From that chat framework, the next posts will expand along a natural path. We will gradually attach the necessary components so the agent becomes a real working system, starting from role definition and behavioral rules, adding memory and knowledge, enabling tool calling, organizing a planning and execution loop, and finally learning how to evaluate and calibrate the agent so it stays stable in real-world use.

1. LLM and Why the Chat/Messages Format Matters

An LLM (large language model) is a model that generates text: given a request, it produces an answer in text form. The key point is that an LLM is highly sensitive to context, so the same question can yield different answers when the context changes. That is why many modern systems do not send a single standalone question; instead, they send a conversation structure called messages.

A common format for messages is a list of objects, where each object has a role and content, such as {"role": "system", "content": "..."} and {"role": "user", "content": "..."}. The system role sets the communication rules: the tone, level of detail, output format, and correctness criteria we want the model to follow. The user role carries the user’s concrete request, meaning the question or task to solve. When we need a more consistent dialogue, we can include earlier messages so the model sees the history and follows the conversational thread.
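To make that structure concrete, here is a small helper that assembles a messages list in the order just described: system rules first, then any earlier turns, then the new user request. The name build_messages is our own, not part of any SDK, and the sketch assumes earlier model replies are stored with the role "assistant", which is the role name OpenAI-compatible chat APIs use.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def build_messages(system: str, question: str, history: Optional[List[Message]] = None) -> List[Message]:
    """Assemble a chat 'messages' list: system rules, then earlier turns, then the new user request."""
    messages: List[Message] = [{"role": "system", "content": system}]
    if history:
        # Earlier turns let the model follow the conversational thread.
        messages.extend(history)
    messages.append({"role": "user", "content": question})
    return messages


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "What is a Python list?"},
        {"role": "assistant", "content": "An ordered, mutable collection, e.g. [1, 2, 3]."},
    ]
    msgs = build_messages("You are a programming TA. Answer briefly.", "How do I append to it?", history)
    for m in msgs:
        print(m["role"], "->", m["content"])
```

The list this returns has the same shape as the dialog variable used in the API example later in this post, so it can be passed directly as the messages argument.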

When we use chat based on messages, we are effectively shaping the model’s behavior before asking the question. This is an important foundation for building serious conversational systems such as a teaching assistant, a chatbot, or an agent, because we control the context and response rules rather than leaving everything to a single isolated question.

2. Example: Connecting to an LLM via API With Python

Next, we will use Python and the OpenAI Python SDK as an example to illustrate how calling an LLM works in practice: loading a key, creating a client, sending messages, and reading the answer.

Step 1: Prepare the Environment and the Secret Key

We install two libraries, one for calling the API and one for loading environment variables from a .env file.

```shell
pip install openai python-dotenv
```

Create a .env file in the same folder as your code.

```shell
LLM_KEY="YOUR_KEY_HERE"
```

We use the variable LLM_KEY to store the API key issued by the provider.

Step 2: Write a Simple Chat Program

Create a file named chat_demo.py.

```python
import os

from dotenv import load_dotenv
from openai import OpenAI


def read_secret() -> str:
    """
    Load environment variables from .env and read the API key.
    Keeping this in a separate function makes errors easier to detect
    and makes the code easier to reuse.
    """
    load_dotenv()
    token = os.getenv("LLM_KEY")  # you can switch to OPENAI_API_KEY if you prefer
    if not token:
        raise RuntimeError("LLM_KEY is missing in the .env file")
    return token


def make_llm() -> OpenAI:
    """
    Create a client to call the LLM.
    Think of the client as a 'phone' you use to reach the LLM service.
    """
    return OpenAI(api_key=read_secret())


def ask_llm(client: OpenAI, question: str, *, temp: float = 0.3) -> str:
    """
    Send a request in chat format.
    - messages: the conversation context package
    - temperature: randomness control (lower = more stable)
    """
    dialog = [
        {"role": "system", "content": "You are a programming TA. Answer clearly, briefly, and add a small example when needed."},
        {"role": "user", "content": question},
    ]

    result = client.chat.completions.create(
        model="gpt-4.1-mini",  # example only; change to a model you have access to
        messages=dialog,
        temperature=temp,
    )

    # choices is a list of candidate replies; we usually take the first one.
    # message.content may be None (e.g. when the model returns a tool call), so fall back to "".
    return result.choices[0].message.content or ""


if __name__ == "__main__":
    llm = make_llm()
    reply = ask_llm(llm, "Explain 'temperature' in LLMs in 2 sentences and 1 example.", temp=0.2)
    print(reply)
```

Why Do We Need `read_secret()`?

Beginners often fail to run the code simply because they forgot to set the key. This function catches that early and reports a clear error. Keeping it separate also lets us change how we load the key later, for example from a different file or directly from the operating system's environment variables, without rewriting the entire program.
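As a sketch of that flexibility, the variant below is a hypothetical refactor, not the version used in chat_demo.py. It takes the variable name and an optional mapping as parameters. By default it reads the process environment, so you would still call load_dotenv() first if you keep the key in .env; in tests you can pass a plain dictionary instead and never touch real environment variables.

```python
import os
from typing import Mapping, Optional


def read_secret(name: str = "LLM_KEY", env: Optional[Mapping[str, str]] = None) -> str:
    """Read an API key and fail fast with a clear message if it is missing or blank.

    env defaults to the process environment (os.environ); pass a dict to test
    without touching real environment variables.
    """
    source = os.environ if env is None else env
    token = source.get(name, "").strip()
    if not token:
        raise RuntimeError(f"{name} is missing; set it in your .env file or environment")
    return token


if __name__ == "__main__":
    print(read_secret(env={"LLM_KEY": "sk-test"}))  # prints sk-test
    try:
        read_secret(env={})
    except RuntimeError as err:
        print("caught:", err)
```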

Why Separate `make_llm()`?

In a real project, we will call the LLM from multiple places. Keeping client creation in one function makes the code cleaner and easier to maintain.
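A side benefit of this split is that ask_llm() receives its client as a parameter instead of creating it, so unit tests can hand it a stand-in object and avoid network calls. The sketch below uses a hypothetical StubClient that only mimics the attribute path the code touches (client.chat.completions.create(...).choices[0].message.content); it is not part of the OpenAI SDK. ask_llm is repeated here with its client type loosened to Any so the block runs without the openai package installed.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


class StubClient:
    """Hypothetical test double shaped like client.chat.completions.create(...)."""

    def __init__(self, canned_reply: str) -> None:
        self.calls: List[Dict[str, Any]] = []  # every request the code under test sends
        self._reply = canned_reply
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, **kwargs: Any) -> Any:
        self.calls.append(kwargs)
        message = SimpleNamespace(content=self._reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


def ask_llm(client: Any, question: str, *, temp: float = 0.3) -> str:
    """Mirrors ask_llm() from chat_demo.py; the client type is loosened to Any."""
    dialog = [
        {"role": "system", "content": "You are a programming TA. Answer clearly, briefly, and add a small example when needed."},
        {"role": "user", "content": question},
    ]
    result = client.chat.completions.create(model="gpt-4.1-mini", messages=dialog, temperature=temp)
    return result.choices[0].message.content or ""


if __name__ == "__main__":
    stub = StubClient("Temperature controls randomness.")
    print(ask_llm(stub, "What is temperature?", temp=0.2))
    print(stub.calls[0]["temperature"])  # the temperature ask_llm actually sent
```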

What Matters Most Inside `ask_llm()`?

The key parts are the dialog and the request parameters.

The dialog, especially its system message, controls whether the answer “feels like a teaching assistant.”
Temperature is a tuning knob: lowering it makes answers more stable, while raising it increases variety.
The model parameter selects which LLM handles the request; it must name a model your account can access.
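To build intuition for the temperature knob, the sketch below shows the standard idea behind temperature sampling: the model's raw scores (logits) for candidate next tokens are divided by the temperature before a softmax turns them into probabilities. The three logits are made-up numbers, and real services may apply other sampling steps as well, so treat this as a conceptual illustration rather than how any specific provider implements it.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        # Dividing by zero is undefined; this sketch only handles positive temperatures.
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtracting the max keeps math.exp from overflowing
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate next tokens
    for t in (0.2, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        # Lower temperature concentrates probability on the top candidate (more stable output);
        # higher temperature spreads it out (more varied output).
        print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```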

Run the program:

```shell
python chat_demo.py
```

3. Running an LLM Locally

Once we can chat via the API, a natural next step is to run a model on our own machine so we can test prompts faster or work offline. We can use LM Studio or a similar tool to download a model, try it in a chat UI, and enable its server mode so our program can send requests to localhost.

With this approach, the code logic does not change. We still send messages and still read choices[0].message.content. We only change where we send the request, moving from an online service to a local server.

For example, assume the local server is running at http://localhost:1234/v1.

```python
from openai import OpenAI


def local_llm() -> OpenAI:
    """
    Create a client that points to the local LLM server.
    The api_key here is only a placeholder example.
    """
    return OpenAI(base_url="http://localhost:1234/v1", api_key="local")


def ask_local(question: str) -> str:
    client = local_llm()
    res = client.chat.completions.create(
        model="local-any",  # local model name may differ depending on your tool
        messages=[
            {"role": "system", "content": "You are a tutor. Answer clearly and include a small example."},
            {"role": "user", "content": question},
        ],
        temperature=0.4,
    )
    # message.content may be None, so fall back to an empty string.
    return res.choices[0].message.content or ""


if __name__ == "__main__":
    print(ask_local("Write one example that explains delimiters in a prompt in 3 sentences."))
```

Why do we essentially only need to change base_url (plus a placeholder key and the local model name)? Because the "communication contract" stays the same: we send a chat request to an endpoint. As long as the local server exposes a compatible endpoint, the code structure stays intact.
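One way to take advantage of this is to collect everything that differs between the two setups in one place. The sketch below is a hypothetical helper (LLMTarget, pick_target, and make_client are our own names). It reuses the address and placeholder values assumed in the local example above and imports the openai package only when a client is actually created.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class LLMTarget:
    """The settings that differ between the hosted API and a local server."""
    base_url: Optional[str]  # None lets the SDK use its default (hosted) endpoint
    api_key: str
    model: str


def pick_target(name: str, api_key: str = "") -> LLMTarget:
    """Return connection settings for 'api' or 'local' (hypothetical helper)."""
    if name == "api":
        return LLMTarget(base_url=None, api_key=api_key, model="gpt-4.1-mini")
    if name == "local":
        # Placeholder key and model name, matching the local example; adjust for your tool.
        return LLMTarget(base_url="http://localhost:1234/v1", api_key="local", model="local-any")
    raise ValueError(f"unknown target: {name!r}")


def make_client(target: LLMTarget) -> Any:
    """Create an OpenAI-compatible client for the chosen target."""
    from openai import OpenAI  # imported lazily so pick_target() works without the SDK installed

    return OpenAI(base_url=target.base_url, api_key=target.api_key)
```

With this in place, switching between the hosted API and a local server becomes a one-word change, for example client = make_client(pick_target("local")). Passing target.model as the model argument keeps the model name in sync with the endpoint, while the rest of the chat code (messages, choices[0].message.content) stays the same.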

4. How to Make the Model Follow Your Requirements

At this point, we face a problem that matters more than the code: how you phrase the request. Many beginners ask vague questions and are then disappointed when the answer is long, generic, or off target.

The mindset is similar to how a teacher writes an exam question. You do not just say "do the exercise"; you make the requirements explicit:

What output format do you want?
How long should it be?
What counts as correct?
If there is input data, where is it and which part should be processed?
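One way to enforce this discipline in code is to make each of those four questions a field that must be filled in. The PromptSpec class below is a hypothetical helper, not a library API; it simply renders the answers into an explicit prompt string. The <input> tags around optional data are our own choice of marker.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptSpec:
    """Hypothetical helper: each field answers one of the questions above."""
    task: str
    output_format: str                 # What output format do you want?
    length: str                        # How long should it be?
    success_criteria: str              # What counts as correct?
    input_data: Optional[str] = None   # Which data should be processed, if any?

    def render(self) -> str:
        lines = [
            f"Task: {self.task}",
            f"Output format: {self.output_format}",
            f"Length: {self.length}",
            f"A correct answer: {self.success_criteria}",
        ]
        if self.input_data is not None:
            lines.append("Process only the text between <input> and </input>.")
            lines.append(f"<input>\n{self.input_data}\n</input>")
        return "\n".join(lines)


if __name__ == "__main__":
    spec = PromptSpec(
        task="Explain what an API key is to a beginner.",
        output_format="Plain text, no headings.",
        length="Exactly 3 sentences.",
        success_criteria="Mentions that the key must be kept secret.",
    )
    print(spec.render())
```

A missing requirement then shows up as a TypeError when the object is created, rather than as a vague answer later.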

4.1. Before and After Example, From Vague to Clear

A vague prompt could be:

"Explain prompt engineering."

The result often rambles because the model does not know what we want: a summary or a deep explanation, for which audience, and at what length.

A clearer prompt could be:

"You are a teaching assistant. Explain prompt engineering for beginners in exactly 4 sentences. In the last sentence, provide one sample prompt."

The output changes noticeably: it becomes concise, its length is controlled, and it includes an example.

If we need a stable format for an article or a learning note, we should lock the structure.

"Answer using this template
Definition: …
Why it matters: …
Sample prompt: …"

The model follows a fixed template like this far more reliably than a bare “explain.”
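Because the template fixes the section labels, the reply can also be checked automatically before it is used. The sketch below (follows_template is a hypothetical helper) only verifies that each label from the template starts a line, in order; it does not judge whether the content under each label is any good.

```python
from typing import Sequence

TEMPLATE_LABELS = ("Definition:", "Why it matters:", "Sample prompt:")


def follows_template(reply: str, labels: Sequence[str] = TEMPLATE_LABELS) -> bool:
    """Return True if every label starts a line of the reply, in the given order."""
    lines = [line.strip() for line in reply.splitlines()]
    position = 0
    for label in labels:
        # Look for this label only after the line where the previous label was found.
        while position < len(lines) and not lines[position].startswith(label):
            position += 1
        if position == len(lines):
            return False
        position += 1
    return True


if __name__ == "__main__":
    good = "Definition: a.\nWhy it matters: b.\nSample prompt: c."
    print(follows_template(good))
```

What to do when the check fails (resend the request, or ask the model to reformat its answer) is a design choice for your application.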

4.2. Delimiters as an Anti-Distraction Tool for Long Inputs

A delimiter is a marker that separates the data from the instructions.

For example:

"Summarize the passage inside … into 3 sentences. Do not add ideas beyond that passage."

Without delimiters, the model sometimes cannot tell which text is data and which is an instruction. For beginners, delimiters are one of the quickest ways to reduce errors.
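In code, the same idea can look like the sketch below. The <passage> tags are our own choice of delimiter (triple quotes or ### markers are also common), and summarize_prompt is a hypothetical helper. It refuses input that already contains the closing tag, because such text could end the data block early and be read as instructions.

```python
def summarize_prompt(passage: str, sentences: int = 3) -> str:
    """Build a summarization prompt whose input data sits between explicit delimiters."""
    open_tag, close_tag = "<passage>", "</passage>"
    if close_tag in passage:
        # Data containing the closing delimiter could break out of the data block.
        raise ValueError("passage contains the closing delimiter; choose a different delimiter")
    return (
        f"Summarize the text between {open_tag} and {close_tag} into {sentences} sentences. "
        "Do not add ideas beyond that passage.\n"
        f"{open_tag}\n{passage}\n{close_tag}"
    )


if __name__ == "__main__":
    print(summarize_prompt("Cats sleep for much of the day. They are most active at dawn and dusk."))
```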

4.3. Step-by-Step Requirements for Multi-Stage Tasks

If we want a result that builds from simple to complete, we can ask for it in steps.

For example:

"Step 1, summarize in one sentence.
Step 2, convert it into a 5-line checklist.
Step 3, give one applied example."

This is especially suitable for tutorials because readers can follow along easily, and we can check the output of each step.
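When the steps change from topic to topic, generating the numbered list in code keeps the wording consistent. stepwise_prompt below is a hypothetical helper, and the usage example reuses the three steps from the prompt above.

```python
from typing import Sequence


def stepwise_prompt(topic: str, steps: Sequence[str]) -> str:
    """Number each requested step so the model answers them in order, one labeled section per step."""
    if not steps:
        raise ValueError("at least one step is required")
    lines = [f"Topic: {topic}", "Answer in the following steps, labeling each one:"]
    lines += [f"Step {i}: {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines)


if __name__ == "__main__":
    print(stepwise_prompt(
        "delimiters in prompts",
        ["summarize in one sentence", "convert it into a 5-line checklist", "give one applied example"],
    ))
```

An alternative design is to send each step as a separate request and feed the previous answer into the next one. A single prompt is simpler and cheaper; separate calls make it easier to check each step before moving on.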

4.4. Provide Output Examples for Few-Shot Formatting

If we want the model to answer consistently using one template, we should provide a short example.

For example:

"Answer in the same format as this example, which is only a format guide
Concept: …
Example: …
Common mistakes: …"

This is a good way to keep answers consistent across many topics.
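Besides pasting the format guide into the prompt, a common alternative with chat APIs is to show the format as a short example exchange: one user message followed by an assistant message written in the desired format, placed before the real question. The sketch below assumes the OpenAI-style "assistant" role for that example reply; few_shot_messages and the sample content are our own illustration.

```python
from typing import Dict, List

Message = Dict[str, str]

EXAMPLE_ANSWER = (
    "Concept: A variable is a name that refers to a value.\n"
    "Example: count = 3\n"
    "Common mistakes: Using a variable before assigning it."
)


def few_shot_messages(question: str) -> List[Message]:
    """Build a messages list with one example exchange that demonstrates the answer format."""
    return [
        {"role": "system", "content": "You are a tutor. Always answer with the lines Concept, Example, and Common mistakes."},
        # Example exchange: it demonstrates the format only, not the topic of the real question.
        {"role": "user", "content": "Explain variables."},
        {"role": "assistant", "content": EXAMPLE_ANSWER},
        # The real question comes last.
        {"role": "user", "content": question},
    ]


if __name__ == "__main__":
    for m in few_shot_messages("Explain loops."):
        print(m["role"], "->", m["content"])
```

The resulting list can be passed as the messages argument to client.chat.completions.create(...), just like the dialog list in chat_demo.py.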

Conclusion

In this post, we successfully ran a simple LLM setup both through an API and locally, and we learned practical ways to improve output quality. In the next posts, we will gradually attach the required components so the agent becomes a real working system, including role definition and behavioral rules, adding memory and knowledge, enabling tool calling, organizing a planning and execution loop, and finally evaluating and calibrating the agent so it remains stable in real-world use.

Opinions expressed by DZone contributors are their own.
