Zero-Cost AI with Java

Create a zero-cost AI application quickly using Ollama and Java with Spring AI — with no extra costs and full compatibility with other LLMs like OpenAI.

Fernando Boaglio

Mar. 18, 26 · Tutorial

Likes (5)

Comment

Save

4.7K Views

So you have a new AI-based idea and need to create an MVP app to test it?

If your AI knowledge is limited to OpenAI, I have bad news for you… it’s not going to be free.

Even worse, before you deploy your app — while you’re still building and testing locally — yes, you’ll need to spend some money.

More tests? Yes, you can add that cost too.

And guess what?

AI POCs unexpectedly turn into real bills.

This problem scales with your team: more developers, bigger bills =(

That’s when you realize AI has moved from experimentation to a budget line — and how high the cost of production mistakes can be.

You have freemium online options like Groq, but running AI locally is a great way to remove these constraints.

Why Running AI Locally Changes the Game

When we talk about “no cost,” we mean developing your app with:

No token-based pricing
No external API calls
No cloud dependency

When your app runs in the cloud, you need to use paid services.

So how can we solve this problem?

Spring AI is the answer — but we’ll get to that soon.

Let me say this again: by running a local LLM (Large Language Model), your team has nothing to pay. Of course, there are some drawbacks, such as higher CPU/RAM usage on the development machine and some setup time for the local AI environment. But it’s totally worth it.

Ollama: Local LLMs Made Simple

Ollama is an open-source tool designed to run LLMs directly on your local machine (Windows, macOS, or Linux) without needing cloud services. (They also offer a free cloud service, but that’s not the point here.)

Ollama is one of the easiest ways to get started with LLMs such as gpt-oss (yes, the LLM provided by OpenAI!), Gemma 3, DeepSeek-R1, Qwen3, and many more.

Yes, we have Ollama — a great open-source alternative to paid LLM services.

Our quick start is very simple:

Download it — just go to https://ollama.com/download
Download a model — there are many options, but we’ll use a small and powerful model created by Microsoft: Phi-3 (https://ollama.com/library/phi3)

    Shell
   
   ollama pull phi3

Now we have our local AI ready to go.

Let’s test the model:

    Shell
   
   ollama run  phi3 "who are you"

I'm Phi, developed by Microsoft. How can I help you today?

Choosing the Right Model: Why Phi-3?

If we have so many free models available, why start with Phi-3?

Here are a few reasons.

First, the larger the model, the more resources it consumes — and sometimes it’s slower. Picking a small but powerful model is a good way to start. Later, you should definitely test other models.

Another powerful and compact model is “ministral-3.” The Ministral 3 family is designed for edge deployment and can run on a wide range of hardware.

If you’re new to Ollama, though, Phi-3 is a great starting point. It’s not the best model overall, but it’s one of the best to begin with.

Spring Boot and Spring AI: A Natural Fit

In the Java world, we have other options, but just like Spring Boot, Spring AI is becoming a mature and reliable choice for AI applications.

You can start with Ollama and later switch to OpenAI — or even use multiple models in your app. No problem. Spring AI can handle it easily.

This frees you from manually handling all LLM APIs using RestTemplate or RestClient. Spring AI does that for you.

We won’t build a complex app here. Instead, we’ll create a very simple one to demonstrate how powerful Spring AI is.

We’ll build an app with an API that generates a joke — no input required.

I recommend IntelliJ Community Edition, but you can use any IDE.

The easiest way is to go to https://start.spring.io and add the Ollama dependency. Or you can create a plain Spring Boot MVC app and add this to your pom.xml:

    XML
   
 

   <dependencies>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
        <dependency>
            <groupId>org.springdoc</groupId>
            <artifactId>springdoc-openapi-starter-webmvc-ui</artifactId>
            <version>${springdoc-openapi-starter-webmvc-ui.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-model-ollama</artifactId>
        </dependency>
  

Our app needs just two more files.

First, configure which Ollama model we’ll use in application.yaml:

    YAML
   
 

   spring:
  application:
   name: zerocostapp
  ai:
    ollama:
      chat:
        model: phi3
  

Make sure the Ollama service is running and the Phi-3 model is installed.

Building a Simple “Jokes as a Service” API

Now we create an API to provide our “Jokes as a Service.”

Spring AI provides the ChatClient class, which communicates with LLMs and gives developers a Builder to define inputs.

    Java
   
 

   @RestController
public class JokesAPI {
  
    @Autowired
    private ChatClient.Builder chatClient;
    @GetMapping("/api/new-joke")
    public String process() {

        return chatClient
                .build()
                .prompt("Tell me a joke")
                .call()
                .content();
    }
}
  

In this case, we use a fixed prompt that asks the LLM to tell a joke. The response is converted to a String and returned by the API.

Calling it with curl:

    Shell
   
   curl http://localhost:8080/api/new-joke

Why don't scientists trust atoms? 
Because they make up everything, even jokes!

That’s it.

You now have a fully functional LLM integrated into a Java application. =)

Architecture Overview

Let’s recap the flow:

HTTP client (curl)
Spring REST controller (JokesAPI)
Spring AI (ChatClient)
Ollama runtime
Local LLM model (Phi-3)

When your app is deployed elsewhere, by changing dependencies and configuration properties, the flow could become:

HTTP client (customer)
Spring REST controller (JokesAPI)
Spring AI (ChatClient)
Cloud LLM runtime
Cloud-hosted LLM model

Limitations and Trade-offs

If you encounter performance issues, be careful about drawing conclusions based only on local tests. You may want to run paid remote tests for comparison.

As with any online application, security matters. This sample does not expose user input, but whenever you allow input to reach an AI model, you risk prompt injection attacks.

What’s Next: From Jokes to Real Applications

You might need logging, chat history per user, or database storage.

Don’t worry — Spring AI can handle this with just a few lines of code.

You can also enrich your model with additional documents to improve response quality. This is called RAG (Retrieval-Augmented Generation), and Spring AI supports it.

If you need to call external services — or expose your service to other LLMs — MCP (Model Context Protocol) is an emerging standard created by Anthropic. The Spring AI team helps maintain its Java implementation.

This is just a glimpse into the vast world of Ollama models and Spring AI. I hope you enjoyed it!

AI Java (programming language) Spring Boot large language model

Opinions expressed by DZone contributors are their own.

Related

Trending