Install Llama-Cpp-Python With GPU Support

This article walks through installing the llama-cpp-python package with GPU support (cuBLAS) so that models can be loaded onto the GPU.

By Manish Kovelamudi · May. 01, 24 · Tutorial

If you are looking for a step-wise approach to installing the llama-cpp-python package, you are in the right place. This guide summarizes the steps required for installation.

Before installing, you may wonder why this package needs a separate installation step for GPU support.

Through LangChain's LlamaCpp wrapper class, this package lets us create a model instance for pre-trained LLMs.

By default, even if your system has an NVIDIA GPU with the CUDA compilers and packages installed, pip installs this package with CPU support only.

Installing with GPU support speeds up LLM (Large Language Model) computation by offloading the model's layers to the GPU.

This guide provides detailed steps to install the package with cuBLAS, NVIDIA's GPU-accelerated BLAS library.

Tested System Configuration

  • System: Azure VM
  • OS: Ubuntu 20.04
  • LLM model used: Mistral-7B

Prerequisites

  1. Ensure the NVIDIA CUDA toolkit is installed. The minimum required version is 12.2.
  • Download the required package from NVIDIA's official website and install it.
  • Run nvidia-smi to confirm that the driver detects your GPU, and /usr/local/cuda-12.2/bin/nvcc --version to confirm that the toolkit itself is installed.
  • Also check the /usr/local/ directory: it should contain a cuda-12.2 directory holding the toolkit files.

2. Install GCC and G++ compilers to compile and install packages

  • Add the GCC repository using the command below.
  • sudo add-apt-repository ppa:ubuntu-toolchain-r/test
  • Install the gcc and g++ compilers using the command below (the minimum required version is 11 for both).
  • sudo apt install gcc-11 g++-11
  • Update alternatives using the command below to make version 11 the default.
  • sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 60 --slave /usr/bin/g++ g++ /usr/bin/g++-11
  • Check the installed versions of GCC and G++ to confirm the installation.
  • gcc --version # This should print the gcc version as 11.4.0
  • g++ --version # This should print the g++ version as 11.4.0

3. Install the LangChain and cmake packages using the command below

Shell
 
pip install langchain langchain-community cmake
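
The prerequisite checks above can also be scripted. The following is a minimal sketch, not part of the original walk-through: the function names `parse_gcc_major` and `check_prerequisites` are hypothetical, and the defaults (`/usr/local/cuda-12.2`, minimum GCC 11) simply mirror the values stated in this guide.

```python
import os
import re
import shutil
import subprocess


def parse_gcc_major(version_output):
    """Return the major version from the first line of `gcc --version`, or None."""
    lines = version_output.strip().splitlines()
    first_line = lines[0] if lines else ""
    # The first line usually ends with the full version, e.g. "... 11.4.0"
    match = re.search(r"(\d+)\.\d+\.\d+\s*$", first_line)
    return int(match.group(1)) if match else None


def check_prerequisites(cuda_dir="/usr/local/cuda-12.2", min_gcc=11):
    """Report which prerequisites from this guide appear to be satisfied."""
    report = {
        "nvidia-smi on PATH": shutil.which("nvidia-smi") is not None,
        "CUDA directory exists": os.path.isdir(cuda_dir),
        "gcc version ok": False,
    }
    gcc = shutil.which("gcc")
    if gcc:
        out = subprocess.run(
            [gcc, "--version"], capture_output=True, text=True
        ).stdout
        major = parse_gcc_major(out)
        report["gcc version ok"] = major is not None and major >= min_gcc
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

The sketch only checks that the tools are present; it does not confirm that the GPU driver and toolkit versions are compatible with each other.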


Llama-CPP Installation

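  • By default, the llama-cpp-python build picks up the default CUDA version available on the VM. If multiple CUDA versions are installed, specify the one to use explicitly.
  • The -DLLAMA_CUBLAS=on flag applies to the llama-cpp-python releases that were current when this guide was written. Later releases renamed it to -DGGML_CUDA=on, so check the project README for the version you install.
  • Use the command below to install the package.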
Shell
 
CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCUDA_PATH=/usr/local/cuda-12.2 -DCUDAToolkit_ROOT=/usr/local/cuda-12.2 -DCUDAToolkit_INCLUDE_DIR=/usr/local/cuda-12.2/include -DCUDAToolkit_LIBRARY_DIR=/usr/local/cuda-12.2/lib64" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
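
Because the install command repeats the CUDA path in several flags, it is easy for one of them to drift out of sync. As a rough sketch, the command string could be generated from a single CUDA root. The helper names `build_cmake_args` and `build_install_command` are hypothetical, and the flags are copied from the command in this guide rather than from any authoritative list.

```python
import shlex


def build_cmake_args(cuda_root):
    """Build the CMAKE_ARGS value from one CUDA root so every path agrees."""
    cuda_root = cuda_root.rstrip("/")
    flags = [
        "-DLLAMA_CUBLAS=on",  # flag name as used in this guide
        f"-DCUDA_PATH={cuda_root}",
        f"-DCUDAToolkit_ROOT={cuda_root}",
        f"-DCUDAToolkit_INCLUDE_DIR={cuda_root}/include",
        f"-DCUDAToolkit_LIBRARY_DIR={cuda_root}/lib64",
    ]
    return " ".join(flags)


def build_install_command(cuda_root):
    """Return the full shell command as a string (it is not executed here)."""
    cmake_args = build_cmake_args(cuda_root)
    return (
        f"CMAKE_ARGS={shlex.quote(cmake_args)} FORCE_CMAKE=1 "
        "pip install llama-cpp-python --no-cache-dir"
    )


if __name__ == "__main__":
    print(build_install_command("/usr/local/cuda-12.2"))
```

The sketch only prints the command; copy the output into a shell to run it, after confirming the CUDA root matches your system.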


Verifying Installation

Verify the installation by creating an instance of the model with the verbose=True parameter enabled.

Python
 
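from langchain_community.llms import LlamaCpp  # requires the langchain-community package
model_path = "/path/to/model.gguf"  # placeholder: path to your local model file
model = LlamaCpp(model_path=model_path, n_gpu_layers=-1, verbose=True)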


n_gpu_layers=-1 is the main parameter: it offloads all of the model's layers to the GPU. Alternatively, you can set the specific number of layers you want to offload.

verbose=True prints the model's details and parameters.

On the terminal console, when the model is loaded, check for the following lines.

Device: <your-gpu-name> (Ex: Device 0: Tesla T4)

BLAS = 1 (indicates that the GPU-enabled build is in use and the model can be loaded onto the GPU)
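
If you save the console output to a string or file, the two lines above can be checked programmatically. This is a minimal sketch under the assumption that the log contains lines shaped like the ones quoted in this guide; the exact log format varies between llama.cpp versions, and `check_gpu_log` is a hypothetical helper name.

```python
import re


def check_gpu_log(log_text):
    """Look for a 'Device N: <name>' line and a 'BLAS = 1' marker in the log."""
    device = re.search(r"^\s*Device \d+:\s*(.+)$", log_text, flags=re.MULTILINE)
    blas = re.search(r"\bBLAS\s*=\s*(\d)", log_text)
    return {
        "device": device.group(1).strip() if device else None,
        "blas_enabled": bool(blas) and blas.group(1) == "1",
    }


if __name__ == "__main__":
    # Illustrative sample built from the lines quoted in this guide,
    # not a real captured log.
    sample = "Device 0: Tesla T4\nBLAS = 1\n"
    print(check_gpu_log(sample))
```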

Comparison

LlamaCPP With CPU

Time taken to load Mistral-7B model: 1 min (approx)

Time taken to generate a response to a query: 20 min (approx)

LlamaCPP With GPU

Time taken to load Mistral-7B model: 30 sec (approx)

Time taken to generate a response to a query: 30 sec (approx)
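
To reproduce a comparison like this on your own system, a small timing helper is enough. The sketch below is an assumption about how such timings could be taken, not the method used for the figures above; `timed` is a hypothetical helper, and the commented usage assumes a LangChain LlamaCpp instance.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a model or GPU.
    total, seconds = timed(sum, range(1_000_000))
    print(f"sum={total} took {seconds:.4f} s")

    # Hypothetical usage with a model (not executed here):
    #   model, load_s = timed(LlamaCpp, model_path=model_path, n_gpu_layers=-1)
    #   answer, gen_s = timed(model.invoke, "Your question here")
```

Run the same prompt with the CPU-only and GPU-enabled installs to get comparable numbers.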

Conclusion

Based on the load and response generation times, there is a significant performance difference when the llama-cpp-python package is used with GPU support. If you have one or more GPUs attached to your system, consider installing the package this way for better performance.


Published at DZone with permission of Manish Kovelamudi. See the original article here.

Opinions expressed by DZone contributors are their own.
