DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • How AI Is Rewriting Full-Stack Java Systems: Practical Patterns with Spring Boot, Kafka and WebSockets
  • Building a Retrieval-Augmented Generation (RAG) System in Java With Spring AI, Vertex AI, and BigQuery
  • Introducing SmallRye LLM: Injecting Langchain4J AI Services
  • Leverage Amazon BedRock Chat Model With Java and Spring AI

Trending

  • Build a GitHub Slack Bot With AWS Bedrock and MCP, Part 1
  • How to Build an Agentic AI SRE Co-Pilot for Incident Response
  • Amazon Quick: AWS's Agentic Workspace, Explained for Engineers
  • Compliance Automated Standard Solution (COMPASS), Part 11: Compliance as Code, the OSCAL MCP Server Way
  1. DZone
  2. Data Engineering
  3. AI/ML
  4. Zero-Cost AI with Java

Zero-Cost AI with Java

Create a zero-cost AI application quickly using Ollama and Java with Spring AI — with no extra costs and full compatibility with other LLMs like OpenAI.

By 
Fernando Boaglio user avatar
Fernando Boaglio
·
Mar. 18, 26 · Tutorial
Likes (5)
Comment
Save
Tweet
Share
4.5K Views

Join the DZone community and get the full member experience.

Join For Free

So you have a new AI-based idea and need to create an MVP app to test it?

If your AI knowledge is limited to OpenAI, I have bad news for you… it’s not going to be free.

Even worse, before you deploy your app — while you’re still building and testing locally — yes, you’ll need to spend some money.

More tests? Yes, you can add that cost too.

And guess what?

AI POCs unexpectedly turn into real bills.

This problem scales with your team: more developers, bigger bills =(

That’s when you realize AI has moved from experimentation to a budget line — and how high the cost of production mistakes can be.

You have freemium online options like Groq, but running AI locally is a great way to remove these constraints.

Why Running AI Locally Changes the Game

When we talk about “no cost,” we mean developing your app with:

  • No token-based pricing
  • No external API calls
  • No cloud dependency

When your app runs in the cloud, you need to use paid services.

So how can we solve this problem?

Spring AI is the answer — but we’ll get to that soon.

Let me say this again: by running a local LLM (Large Language Model), your team has nothing to pay. Of course, there are some drawbacks, such as higher CPU/RAM usage on the development machine and some setup time for the local AI environment. But it’s totally worth it.

Ollama: Local LLMs Made Simple

Ollama is an open-source tool designed to run LLMs directly on your local machine (Windows, macOS, or Linux) without needing cloud services. (They also offer a free cloud service, but that’s not the point here.)

Ollama is one of the easiest ways to get started with LLMs such as gpt-oss (yes, the LLM provided by OpenAI!), Gemma 3, DeepSeek-R1, Qwen3, and many more.

Yes, we have Ollama — a great open-source alternative to paid LLM services.

Our quick start is very simple:

  1. Download it — just go to https://ollama.com/download
  2. Download a model — there are many options, but we’ll use a small and powerful model created by Microsoft: Phi-3 (https://ollama.com/library/phi3)
Shell
 
ollama pull phi3


Now we have our local AI ready to go.

Let’s test the model:

Shell
 
ollama run  phi3 "who are you"

I'm Phi, developed by Microsoft. How can I help you today?


Choosing the Right Model: Why Phi-3?

If we have so many free models available, why start with Phi-3?

Here are a few reasons.

First, the larger the model, the more resources it consumes — and sometimes it’s slower. Picking a small but powerful model is a good way to start. Later, you should definitely test other models.

Another powerful and compact model is “ministral-3.” The Ministral 3 family is designed for edge deployment and can run on a wide range of hardware.

If you’re new to Ollama, though, Phi-3 is a great starting point. It’s not the best model overall, but it’s one of the best to begin with.

Spring Boot and Spring AI: A Natural Fit

In the Java world, we have other options, but just like Spring Boot, Spring AI is becoming a mature and reliable choice for AI applications.

You can start with Ollama and later switch to OpenAI — or even use multiple models in your app. No problem. Spring AI can handle it easily.

This frees you from manually handling all LLM APIs using RestTemplate or RestClient. Spring AI does that for you.

We won’t build a complex app here. Instead, we’ll create a very simple one to demonstrate how powerful Spring AI is.

We’ll build an app with an API that generates a joke — no input required.

I recommend IntelliJ Community Edition, but you can use any IDE.

The easiest way is to go to https://start.spring.io and add the Ollama dependency. Or you can create a plain Spring Boot MVC app and add this to your pom.xml:

XML
 
<dependencies>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
        <dependency>
            <groupId>org.springdoc</groupId>
            <artifactId>springdoc-openapi-starter-webmvc-ui</artifactId>
            <version>${springdoc-openapi-starter-webmvc-ui.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-model-ollama</artifactId>
        </dependency>


Our app needs just two more files.

First, configure which Ollama model we’ll use in application.yaml:

YAML
 
spring:
  application:
   name: zerocostapp
  ai:
    ollama:
      chat:
        model: phi3


Make sure the Ollama service is running and the Phi-3 model is installed.

Building a Simple “Jokes as a Service” API

Now we create an API to provide our “Jokes as a Service.”

Spring AI provides the ChatClient class, which communicates with LLMs and gives developers a Builder to define inputs.

Java
 
@RestController
public class JokesAPI {
  
    @Autowired
    private ChatClient.Builder chatClient;
    @GetMapping("/api/new-joke")
    public String process() {

        return chatClient
                .build()
                .prompt("Tell me a joke")
                .call()
                .content();
    }
}


In this case, we use a fixed prompt that asks the LLM to tell a joke. The response is converted to a String and returned by the API.

Calling it with curl:

Shell
 
curl http://localhost:8080/api/new-joke

Why don't scientists trust atoms? 
Because they make up everything, even jokes!


That’s it.

You now have a fully functional LLM integrated into a Java application. =)

Architecture Overview

Let’s recap the flow:

  1. HTTP client (curl)
  2. Spring REST controller (JokesAPI)
  3. Spring AI (ChatClient)
  4. Ollama runtime
  5. Local LLM model (Phi-3)

When your app is deployed elsewhere, by changing dependencies and configuration properties, the flow could become:

  1. HTTP client (customer)
  2. Spring REST controller (JokesAPI)
  3. Spring AI (ChatClient)
  4. Cloud LLM runtime
  5. Cloud-hosted LLM model

Limitations and Trade-offs

If you encounter performance issues, be careful about drawing conclusions based only on local tests. You may want to run paid remote tests for comparison.

As with any online application, security matters. This sample does not expose user input, but whenever you allow input to reach an AI model, you risk prompt injection attacks.

What’s Next: From Jokes to Real Applications

You might need logging, chat history per user, or database storage.

Don’t worry — Spring AI can handle this with just a few lines of code.

You can also enrich your model with additional documents to improve response quality. This is called RAG (Retrieval-Augmented Generation), and Spring AI supports it.

If you need to call external services — or expose your service to other LLMs — MCP (Model Context Protocol) is an emerging standard created by Anthropic. The Spring AI team helps maintain its Java implementation.

This is just a glimpse into the vast world of Ollama models and Spring AI. I hope you enjoyed it!

AI Java (programming language) Spring Boot large language model

Opinions expressed by DZone contributors are their own.

Related

  • How AI Is Rewriting Full-Stack Java Systems: Practical Patterns with Spring Boot, Kafka and WebSockets
  • Building a Retrieval-Augmented Generation (RAG) System in Java With Spring AI, Vertex AI, and BigQuery
  • Introducing SmallRye LLM: Injecting Langchain4J AI Services
  • Leverage Amazon BedRock Chat Model With Java and Spring AI

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook