Friday, November 8, 2024

How to Check if Ollama is Using Your GPU or CPU

If you're aiming to get the most out of Ollama, knowing whether it’s using your GPU or CPU can make a significant difference in performance. This blog post is part of a series where I dive into ways to boost your Ollama experience, and today, we're focusing on how to determine if Ollama is actually using your GPU.

Why Does It Matter?

Using your GPU for inference and completions with local generative AI models can greatly enhance speed. However, I've occasionally seen cases where the CPU performs faster than the GPU; when that happens, it's usually a sign the model is a poor fit for your card (for example, too large for its VRAM), and you should switch to a different model.

Another reason Ollama might not be using your GPU is if your graphics card isn’t officially supported. If you’re in this boat, don’t worry—I’ve got a video for that too.

Four Ways to Check If Ollama is Using Your GPU

Let’s walk through the steps you can take to verify whether Ollama is using your GPU or CPU.

  1. Use the ollama ps Command
    This command gives you a quick answer. Simply type ollama ps in the terminal, and it will show whether the model is loaded onto your CPU or GPU. If you see “100% CPU,” then it’s clear that your GPU isn’t being utilized.

  2. Check the Ollama Log File
    Ollama keeps logs that can provide more insight. Right-click the Ollama icon in your system tray and click "View logs" to open the log file. If you see a message saying "no compatible GPUs were discovered," that's a good indication Ollama defaulted to using your CPU.

  3. Run ollama serve
    Running Ollama in server mode without entering chat mode can also give you clues. Make sure to quit Ollama if it's already running, then open a command prompt and type ollama serve. The startup output may include messages about GPU compatibility, noting, for example, that your GPU (such as an AMD card) isn't supported. An unsupported GPU means Ollama falls back to the CPU.

  4. Monitor Resource Usage with a Performance Tool
    A performance monitoring tool like AMD's Adrenalin software (or whatever tool came with your GPU, CPU, or operating system, such as Task Manager on Windows) will show you real-time usage. If Ollama is using your CPU, you'll see high spikes in CPU usage and almost no activity on the GPU. Conversely, if it's using your GPU, you'll see spikes on the GPU with little activity on the CPU.
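If you want to automate checks 1 and 2, here's a minimal Python sketch. It parses the PROCESSOR column from `ollama ps` output and scans log text for the "no compatible GPUs were discovered" message. The sample output below is hypothetical, and the exact column layout and log wording can vary between Ollama versions, so treat the patterns as assumptions to adjust, not a stable API.

```python
import re

# The PROCESSOR field looks like "100% CPU", "100% GPU", or a split such as
# "52%/48% CPU/GPU". Match the split form first so it isn't cut short.
CPU_GPU_RE = re.compile(r"(\d+%(?:/\d+%)?\s+(?:CPU/GPU|CPU|GPU))")

def processor_fields(ps_output: str) -> dict:
    """Map each loaded model name to its PROCESSOR field from `ollama ps` output."""
    fields = {}
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        match = CPU_GPU_RE.search(line)
        if match:
            fields[line.split()[0]] = match.group(1)
    return fields

def gpu_missing(log_text: str) -> bool:
    """True if the log text contains Ollama's no-compatible-GPU message."""
    return "no compatible GPUs were discovered" in log_text

# Hypothetical `ollama ps` output (model ID and timings are made up):
sample_ps = """NAME           ID              SIZE     PROCESSOR    UNTIL
llama3:latest  365c0bd3ba77    6.7 GB   100% CPU     4 minutes from now
"""

print(processor_fields(sample_ps))
print(gpu_missing('msg="no compatible GPUs were discovered"'))
```

Paste in your own `ollama ps` output (or the contents of the server log) and you get a quick CPU-vs-GPU verdict without eyeballing columns.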

Why I Started Digging into Ollama’s GPU Use

I noticed issues with speed when running Llama 3 on my setup, and I wanted to see exactly what was going on under the hood. When I tested Ollama with prompts, I found it took a minute or more to respond—a strong hint that it wasn’t using my GPU.

Wrapping Up

Knowing whether Ollama is utilizing your GPU can help optimize your experience, and if you’re using an AMD card that isn’t officially supported, I’ve got a solution video for that as well. Don’t miss my other videos in this series on optimizing Ollama performance—these tips will make a big difference in your AI workflow!

Happy optimizing, and see you in the next video!