Ollama Server: Your Guide to Running Large Language Models Locally

Ollama is a powerful, user-friendly tool designed to get you up and running quickly with large language models (LLMs) on your local machine. Whether you’re a developer, researcher, or simply an enthusiast eager to explore the capabilities of LLMs, Ollama provides a seamless experience for running, managing, and customizing these models right on your desktop or server. This guide walks you through everything you need to know about Ollama Server, from installation to advanced customization, so you can harness the power of local LLMs effectively.

Get Ollama Server

Setting up Ollama Server is straightforward across different operating systems. Choose your platform below to begin the installation process.

macOS

Download Ollama for macOS

Windows

Download Ollama for Windows

Linux

For Linux users, Ollama provides a simple installation script via curl:

curl -fsSL https://ollama.com/install.sh | sh

For more detailed instructions or manual installation options, refer to the official Linux manual install guide.

Docker

Ollama is also available as a Docker image. You can find the official Ollama Docker image on Docker Hub, making it easy to integrate Ollama into containerized environments.
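As a quick sketch, a CPU-only container can be started by publishing the API port and persisting downloaded models in a named volume (the volume and container names below are only examples):

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Models can then be run inside the container:

docker exec -it ollama ollama run llama3.2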

Quick Start with Ollama Server

Once installed, getting started with Ollama Server is incredibly simple. To run and chat with the Llama 3.2 model, just execute the following command in your terminal:

ollama run llama3.2

This command will download the Llama 3.2 model (if you don’t already have it) and launch an interactive chat session, allowing you to start experimenting with LLMs immediately.
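Inside the interactive session, typing /? lists the built-in chat commands, and /bye returns you to your shell:

>>> /bye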

Explore the Model Library

Ollama boasts a rich model library with a variety of models to suit different needs and hardware capabilities. Here are some popular models available for download and use with Ollama Server:

Model | Parameters | Size | Download Command
--- | --- | --- | ---
QwQ | 32B | 20GB | ollama run qwq
DeepSeek-R1 | 7B | 4.7GB | ollama run deepseek-r1
DeepSeek-R1 (Large) | 671B | 404GB | ollama run deepseek-r1:671b
Llama 3.3 | 70B | 43GB | ollama run llama3.3
Llama 3.2 | 3B | 2.0GB | ollama run llama3.2
Llama 3.2 (Small) | 1B | 1.3GB | ollama run llama3.2:1b
Llama 3.2 Vision | 11B | 7.9GB | ollama run llama3.2-vision
Llama 3.2 Vision (Large) | 90B | 55GB | ollama run llama3.2-vision:90b
Llama 3.1 | 8B | 4.7GB | ollama run llama3.1
Llama 3.1 (Large) | 405B | 231GB | ollama run llama3.1:405b
Phi 4 | 14B | 9.1GB | ollama run phi4
Phi 4 Mini | 3.8B | 2.5GB | ollama run phi4-mini
Gemma 2 (Small) | 2B | 1.6GB | ollama run gemma2:2b
Gemma 2 | 9B | 5.5GB | ollama run gemma2
Gemma 2 (Large) | 27B | 16GB | ollama run gemma2:27b
Mistral | 7B | 4.1GB | ollama run mistral
Moondream 2 | 1.4B | 829MB | ollama run moondream
Neural Chat | 7B | 4.1GB | ollama run neural-chat
Starling | 7B | 4.1GB | ollama run starling-lm
Code Llama | 7B | 3.8GB | ollama run codellama
Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored
LLaVA | 7B | 4.5GB | ollama run llava
Granite-3.2 | 8B | 4.9GB | ollama run granite3.2

Note: The required RAM varies depending on the model size. 7B models need at least 8GB RAM, 13B models need 16GB, and 33B models require 32GB or more. Choose a model that aligns with your system’s resources for optimal performance.

Customize Your Ollama Server Models

Ollama Server provides extensive customization options, allowing you to tailor models to your specific requirements.

Importing Models from GGUF and Safetensors

Ollama Server supports importing models in both GGUF and Safetensors formats.

Importing from GGUF:

  1. Create a Modelfile: Start by creating a file named Modelfile. This file will contain instructions for Ollama on how to build your custom model. Use the FROM instruction followed by the local path to your GGUF model file:

    FROM ./vicuna-33b.Q4_0.gguf
  2. Create the Model in Ollama Server: Use the ollama create command, specifying a name for your model and pointing to your Modelfile:

    ollama create example -f Modelfile
  3. Run Your Custom Model: You can now run your imported model using the ollama run command:

    ollama run example

Importing from Safetensors:

For detailed guidance on importing models from Safetensors, refer to the official Ollama documentation on model importing.
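The general pattern mirrors the GGUF workflow: point the FROM instruction of a Modelfile at a directory containing the Safetensors weights (the path below is a placeholder), then build and run the model:

FROM /path/to/safetensors/model/directory

ollama create example -f Modelfile
ollama run example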

Customizing Prompts

You can further customize models from the Ollama library by modifying their prompts. Here’s how to customize the llama3.2 model:

  1. Pull the Model: First, pull the llama3.2 model from the Ollama library:

    ollama pull llama3.2
  2. Create a Modelfile: Create a Modelfile with the following content to customize the prompt and parameters:

    FROM llama3.2
    # Set the temperature [higher is more creative, lower is more coherent]
    PARAMETER temperature 1
    # Define a custom system message
    SYSTEM """
    You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
    """
  3. Create and Run the Customized Model: Create a new model named mario using your Modelfile and run it:

    ollama create mario -f ./Modelfile
    ollama run mario
    >>> hi
    Hello! It's your friend Mario.

For more in-depth information on working with Modelfiles, consult the comprehensive Modelfile documentation.
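A Modelfile can set more than the system message and temperature. The sketch below, using arbitrary example values rather than recommendations, shows a few other commonly documented PARAMETER options:

FROM llama3.2
# Context window size in tokens
PARAMETER num_ctx 4096
# Nucleus sampling cutoff
PARAMETER top_p 0.9
# Stop generating when this sequence appears (example value)
PARAMETER stop "<|user|>"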

Ollama Server CLI Reference

Ollama Server comes with a powerful command-line interface (CLI) for managing models and interacting with the server. Here are some essential CLI commands:

ollama create

Used to build a model from a Modelfile:

ollama create mymodel -f ./Modelfile

ollama pull

Downloads a model from the Ollama library. It can also update existing local models by pulling only the differences:

ollama pull llama3.2

ollama rm

Removes a model from your local system:

ollama rm llama3.2

ollama cp

Copies a model, creating a duplicate with a new name:

ollama cp llama3.2 my-model

Multiline Input

For prompts spanning multiple lines, use triple quotes """:

>>> """Hello,
... world!
... """
I'm a basic program that prints the famous "Hello, world!" message to the console.

Multimodal Models

Ollama Server supports multimodal models, allowing you to interact with models like LLaVA using images:

ollama run llava "What's in this image? /Users/jmorgan/Desktop/smile.png"

Output: The image features a yellow smiley face, which is likely the central focus of the picture.

Passing Prompts as Arguments

You can pass prompts directly as command-line arguments:

ollama run llama3.2 "Summarize this file: $(cat README.md)"

Output: Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.

ollama show

Displays detailed information about a model:

ollama show llama3.2

ollama list

Lists all models currently available on your computer:

ollama list

ollama ps

Shows which models are loaded and running in Ollama Server:

ollama ps

ollama stop

Stops a running model:

ollama stop llama3.2

ollama serve

Starts Ollama Server directly from the terminal (it runs in the foreground of that session). Useful when you want to run Ollama without the desktop application:

ollama serve
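By default the server listens on 127.0.0.1:11434. To make the API reachable from other machines, you can set the OLLAMA_HOST environment variable before starting the server; the bind address below is only an example:

OLLAMA_HOST=0.0.0.0:11434 ollama serve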

Building Ollama Server from Source

For developers looking to contribute or customize Ollama further, building from source is an option. Refer to the developer guide for detailed instructions.

Running Local Builds

After building Ollama from source, you can start the server and run models using the local binaries:

  1. Start the Server:

    ./ollama serve
  2. Run a Model (in a separate terminal):

    ./ollama run llama3.2

REST API for Ollama Server

Ollama Server exposes a REST API, enabling programmatic interaction with models.

Generate a Response via API

To generate text from a model using the API, send a POST request to the /api/generate endpoint:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt":"Why is the sky blue?"
}'
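By default this endpoint streams the response as a series of JSON objects. If you prefer a single JSON reply, you can disable streaming with the stream field:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'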

Chat with a Model via API

For conversational interactions, use the /api/chat endpoint:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
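The chat endpoint is stateless, so conversation history is carried in the messages array: to continue a chat, resend the earlier turns along with the new question. The assistant turn below is shortened illustrative text, not a real model response:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" },
    { "role": "assistant", "content": "Sunlight scatters off air molecules, and blue light scatters the most." },
    { "role": "user", "content": "why is the sunset red, then?" }
  ]
}'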

Explore the complete API documentation for all available endpoints and functionalities.

Community Integrations and Ecosystem

Ollama Server benefits from a vibrant community, leading to numerous integrations across different platforms and tools:

  • Web & Desktop: Various web and desktop applications are integrating with Ollama to provide user-friendly interfaces for interacting with local LLMs.
  • Cloud: Integrations are emerging to deploy Ollama in cloud environments.
  • Terminal: Several terminal-based tools enhance the Ollama CLI experience.
  • Apple Vision Pro: Ollama is even finding its way into innovative platforms like Apple Vision Pro.
  • Database: Integrations with databases are being developed for advanced data processing and analysis using LLMs.
  • Package managers: Ollama’s accessibility is expanding through package manager integrations.
  • Libraries: Client libraries in various programming languages simplify interaction with the Ollama API.
  • Mobile: Efforts are underway to bring Ollama to mobile platforms.
  • Extensions & Plugins: Ecosystem growth includes extensions and plugins for various applications and workflows.
  • Supported Backends: Ollama supports multiple backends, ensuring compatibility across different hardware.
  • Observability: Tools for monitoring and observing Ollama Server are being developed to enhance management and performance tuning.

Ollama Server is rapidly evolving, making it an exciting platform for anyone looking to leverage the power of large language models locally. Whether you are just starting or are an experienced AI practitioner, Ollama provides the tools and flexibility to explore the world of LLMs on your terms.
