
My Dive into Local LLMs, Part 1: From Alexa Curiosity to Homegrown AI

I set up Ollama on Ubuntu to run LLMs locally. CPU performance was a drag, but when GPU acceleration kicked in, it was a total "whoa" moment.

By Praveen Chinnusamy · Jun. 27, 25 · Tutorial

So, I'm a software dev manager over at the Alexa team, and being around the Alexa journey, you kinda get a front-row seat to some seriously cool AI. It got me thinking, "Hey, I wanna poke at these LLMs on my own machine, see what makes 'em tick without needing a massive server farm." This is basically my log of figuring out how to get a personal LLM setup going.

In this article:

  • Setting up Ollama on Ubuntu with GPU acceleration
  • Understanding CPU vs GPU performance impact
  • Choosing the right model for your hardware
  • First steps toward building local AI applications

The Why

Honestly, it was pure curiosity at first. Working with Alexa, you see the potential, but I wanted that hands-on feel. The idea of having a model running locally, something I could tinker with, feed my own stuff, and just play – that was the goal. No big corporate project, just me, my laptop, and some AI.

Getting Started: The Overwhelming Options

First off, wow, there's a LOT out there. You got your Ollama, LM Studio, GPT4All, Docker setups... it's a bit of a jungle. My main thing was, I wanted something fairly straightforward for a first go. I wasn't trying to build Rome in a day, y'know?

I ended up leaning towards Ollama for my initial experiments because it seemed pretty popular for command-line folks and had a simple ollama pull llama3 kind of vibe to get models. LM Studio also looked cool with its GUI, especially for just downloading and chatting.

My Ubuntu Setup: The Nitty-Gritty

My daily driver is an Ubuntu machine (22.04 LTS with an AMD Ryzen 7 5800X), so that's where I decided to build my little AI playground. This is where the rubber really met the road, moving from theory to a running process in my own terminal.

First things first: resources. You can't run a V8 engine on a lawnmower's gas tank. I popped open htop to get a feel for my RAM and CPU situation. With 32GB of RAM and 8 cores/16 threads, I felt pretty good about starting, but I knew the real bottleneck could be the GPU.
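
If you'd rather not eyeball htop, a couple of standard one-liners give the same picture (nothing Ollama-specific here):

Shell

# Quick resource check before downloading multi-gigabyte models
free -h                    # RAM: watch the "available" column
nproc                      # logical CPU count (cores x threads)
lspci | grep -i nvidia     # confirm an NVIDIA GPU is present at all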

Installing Ollama on Ubuntu was a breeze, which I really appreciated. It's a single command, which you love to see:

Shell
 
# Installs Ollama on Linux
curl -fsSL https://ollama.ai/install.sh | sh


After running that, a quick ollama --version confirmed it was ready to go. My initial test was to pull a smaller model to see how it ran on the CPU alone.

Shell
 
ollama pull phi3


Then, ollama run phi3. And... it worked! But let's be honest, the performance was, let's say, deliberate. You could practically feel the CPU cores churning away, and the tokens trickled out onto the screen. It was cool to see it running, but not exactly a "conversational" speed. This confirmed my suspicion: to do this right, I needed to get my NVIDIA GPU in the game.

Performance Reality Check

Here's what I was seeing:

Model     Hardware    Tokens/Second    User Experience
Phi-3     CPU only    2-3              "Deliberate" - watching paint dry
Phi-3     RTX 3080    45-50            Snappy responses
Llama 3   CPU only    0.5-1            Time for coffee... or three
Llama 3   RTX 3080    25-30            Natural conversation flow
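
If you want to measure this on your own hardware rather than trust my numbers, ollama run has a --verbose flag that prints timing stats after each response. The output below is trimmed and illustrative; your rates will differ:

Shell

# Print timing stats (eval rate = generation speed) after each response
ollama run llama3 --verbose
# ...
# eval count:       298 token(s)
# eval duration:    10.63s
# eval rate:        28.03 tokens/s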


This is where the real fun began. Getting Ollama to leverage the GPU on Ubuntu isn't always automatic; you have to make sure the whole stack is in place.

GPU Setup: The Missing Pieces

NVIDIA Drivers: I had them installed, but if you don't, nvidia-smi is your moment of truth. If that command fails, you've got your first mission. The sudo ubuntu-drivers autoinstall command is usually a good, low-fuss way to handle it.

Shell
 
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03   Driver Version: 535.129.03   CUDA Version: 12.2   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
+===============================+======================+======================+
| 0  NVIDIA RTX 3080     Off  | 00000000:2B:00.0 Off |                  N/A |
| 30%   45C    P8    20W / 320W |    364MiB / 10240MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+


NVIDIA Container Toolkit: This was the key for me. Ollama's native install talks to the GPU through the regular CUDA drivers, but anything running in a Docker container can't see the GPU by default. If you want to run Ollama (or other AI tooling) in containers, you need the NVIDIA Container Toolkit so the container can actually see and use the GPU. This involves adding NVIDIA's repository and installing a few packages.

Shell
 
# Add the NVIDIA Container Toolkit repository
# Note: this is the older nvidia-docker repo path, and apt-key is deprecated
# on newer Ubuntu releases; check NVIDIA's docs for the keyring-based setup
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install the toolkit and restart Docker so it picks up the NVIDIA runtime
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
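
One more sanity check worth knowing: the install script registers Ollama as a systemd service on Linux, so its startup logs will tell you whether it actually found the GPU (the grep is just a convenience filter):

Shell

# Look for GPU/CUDA detection lines in Ollama's service logs
journalctl -u ollama --no-pager | grep -iE "gpu|cuda" | tail -n 5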


After getting that set up and restarting the Docker daemon, I ran nvidia-smi again. Everything looked good. Now for the real test. I decided to pull a more powerful model this time.

Shell
 
ollama pull llama3


Then, the magic command:

Shell
 
ollama run llama3


The difference was night and day. Suddenly, responses were snappy. The text flowed. The model felt alive. This wasn't just a technical demo anymore; it was a genuinely usable tool running entirely on the hardware under my desk. That was the "whoa" moment I was looking for.

My Basic "Get it Running" Flow

Here's kinda how it went down in a nutshell, and how you could probably do it too. Total setup time: about 30-45 minutes if everything goes smoothly.

[Figure: AI Curiosity Flow]
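
The diagram doesn't survive the trip to text, so here's the same flow as plain commands, all covered above (the driver step is only needed if nvidia-smi fails):

Shell

# 1. Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# 2. Sanity-check the GPU driver; install drivers if this fails
nvidia-smi || sudo ubuntu-drivers autoinstall

# 3. Start small on CPU to confirm the basics work
ollama pull phi3
ollama run phi3

# 4. Once the GPU stack is in place, step up to a bigger model
ollama pull llama3
ollama run llama3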

Model Recommendations by Use Case

Based on my experiments, here's what I'd recommend:

Limited RAM (8-16GB)

  • Phi-3: Fast and capable for basic tasks
  • Mistral 7B: Good balance of performance and capability
  • Llama 2 7B: Solid all-rounder

Good GPU (8GB+ VRAM)

  • Llama 3 8B: My current favorite - great performance
  • Mistral 7B: Runs beautifully with room to spare
  • CodeLlama 7B: If you're focusing on coding tasks

Powerful Setup (24GB+ VRAM)

  • Llama 3 70B: The full experience
  • Mixtral 8x7B: Impressive mixture-of-experts model
  • Yi 34B: Great for longer contexts

Challenges and What I Bumped Into

Model Sizes: These things can be chunky! Some models need a good bit of RAM. My setup is decent, but you've got to pay attention to those model requirements. A good habit is to run ollama list to see what you've got downloaded and their sizes.
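
For reference, here's roughly what that looks like on my machine (the IDs below are made up and the sizes approximate; yours will differ):

Shell

$ ollama list
NAME             ID              SIZE      MODIFIED
llama3:latest    a6990ed6be41    4.7 GB    2 days ago
phi3:latest      d184c916657e    2.2 GB    5 days ago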

Prompting is an Art: Just like with the big guys, getting the right answer means asking the right way. It's a skill in itself, and I'm still figuring that bit out.
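
One trick that makes prompting more repeatable: Ollama can bake a system prompt and parameters into a named model via a Modelfile. The model name and prompt below are just my own examples:

Shell

# Wrap llama3 with a reusable system prompt and settings
cat > Modelfile <<'EOF'
FROM llama3
PARAMETER temperature 0.7
SYSTEM """You are a concise assistant. Answer in short bullet points."""
EOF

ollama create concise-helper -f Modelfile
ollama run concise-helper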

Which Model When?: So many models, each good at different things (coding, chat, summarization). It's a bit of trial and error to find your favorites for different tasks. Phi-3 is great for quick tasks on low-spec hardware, while Llama 3 is a fantastic all-rounder if you have the GPU power.

Quick Troubleshooting Tips

  • nvidia-smi not found? Run sudo ubuntu-drivers autoinstall and reboot
  • Ollama using CPU instead of GPU? Confirm Docker can see the GPU: docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
  • Out of memory errors? Try smaller models or quantized versions: ollama pull llama3:8b-instruct-q4_0
  • Model won't download? Check disk space; some models need 20GB+ free (see the check below)
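
On that last disk-space point: with the Linux service install, model blobs live under the ollama user's home by default (this path matches my setup; it moves if you set OLLAMA_MODELS):

Shell

# See how much space downloaded models are eating, and what's left
sudo du -sh /usr/share/ollama/.ollama/models
df -h /usr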

What I Used (Mostly)

  • Ollama: For running the models locally on my Ubuntu box
  • Llama 3 / Phi-3: Seemed like good starting points, not too massive but very capable
  • Python & LangChain (a little bit): Started looking into how LangChain can talk to these local models because that opens up building actual little apps

Was it Worth It? Heck Yeah

Even though it's just a personal sandbox, understanding what it takes to run these LLMs locally has been super insightful. It's one thing to use an API, but it's quite another to configure the drivers, manage the models, and see the performance difference firsthand. It definitely gives me a different perspective on the tech we work with every day.

Next Steps for Me

Probably dive deeper into using LangChain with a local model for a simple RAG (Retrieval Augmented Generation) setup with my own documents. That seems like the next fun challenge. Now that the environment is stable, I might try out a more visual tool like LM Studio or GPT4All to compare the experience.

Actually, I'm already working on something practical: using this setup for personal finance analysis. Imagine having an AI that can categorize your expenses and give you insights, all while keeping your financial data completely private on your own machine. Stay tuned for Part 2!

Getting Started Yourself

If you're in dev and curious about LLMs, I'd say give it a shot. It's easier than you might think to get something basic running, and it's a cool learning curve. Just pick a tool, grab a model, and see what happens!

  1. Minimum specs: 16GB RAM, 6GB+ VRAM GPU (or patience for CPU-only)
  2. Install Ollama: That one-liner command above
  3. Start small: Try Phi-3 first, then work up to larger models
  4. Join the community: The Ollama Discord is super helpful

Helpful Resources

  • Ollama Model Library - Browse available models and their requirements
  • NVIDIA Container Toolkit Docs - Official installation guide
  • LangChain Local LLM Guide - For when you're ready to build apps