Ollama is a powerful and user-friendly tool designed to get you started with large language models (LLMs) on your local machine quickly. Whether you’re a developer, researcher, or simply an enthusiast eager to explore the capabilities of LLMs, Ollama provides a seamless experience for running, managing, and customizing these models right on your desktop or server. This guide will walk you through everything you need to know about Ollama Server, from installation to advanced customization, ensuring you can harness the power of local LLMs effectively.
Get Ollama Server
Setting up Ollama Server is straightforward across different operating systems. Choose your platform below to begin the installation process.
macOS and Windows
For macOS and Windows, download the installer from the official Ollama website (ollama.com/download) and run it to complete the setup.
Linux
For Linux users, Ollama provides a simple installation script via curl:
curl -fsSL https://ollama.com/install.sh | sh
For more detailed instructions or manual installation options, refer to the official Linux manual install guide.
Docker
Ollama is also available as a Docker image. You can find the official Ollama Docker image on Docker Hub, making it easy to integrate Ollama into containerized environments.
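If you prefer containers, a typical way to start Ollama with Docker looks like the sketch below. The volume and container names are arbitrary choices here; check the Docker Hub page for GPU flags and the currently recommended options.

# Start the Ollama server in a container, persisting models in a named volume
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Then run a model inside the container
docker exec -it ollama ollama run llama3.2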
Quick Start with Ollama Server
Once installed, getting started with Ollama Server is incredibly simple. To run and chat with the Llama 3.2 model, just execute the following command in your terminal:
ollama run llama3.2
This command will download the Llama 3.2 model (if you don’t already have it) and launch an interactive chat session, allowing you to start experimenting with LLMs immediately.
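Inside the interactive session, typing /? lists the built-in slash commands and /bye exits the chat (based on the current CLI; the exact command set may vary by version):

>>> /?
>>> /bye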
Explore the Model Library
Ollama boasts a rich model library with a variety of models to suit different needs and hardware capabilities. Here are some popular models available for download and use with Ollama Server:
Model | Parameters | Size | Download Command |
---|---|---|---|
QwQ | 32B | 20GB | ollama run qwq |
DeepSeek-R1 | 7B | 4.7GB | ollama run deepseek-r1 |
DeepSeek-R1 (Large) | 671B | 404GB | ollama run deepseek-r1:671b |
Llama 3.3 | 70B | 43GB | ollama run llama3.3 |
Llama 3.2 | 3B | 2.0GB | ollama run llama3.2 |
Llama 3.2 (Small) | 1B | 1.3GB | ollama run llama3.2:1b |
Llama 3.2 Vision | 11B | 7.9GB | ollama run llama3.2-vision |
Llama 3.2 Vision (Large) | 90B | 55GB | ollama run llama3.2-vision:90b |
Llama 3.1 | 8B | 4.7GB | ollama run llama3.1 |
Llama 3.1 (Large) | 405B | 231GB | ollama run llama3.1:405b |
Phi 4 | 14B | 9.1GB | ollama run phi4 |
Phi 4 Mini | 3.8B | 2.5GB | ollama run phi4-mini |
Gemma 2 (Small) | 2B | 1.6GB | ollama run gemma2:2b |
Gemma 2 | 9B | 5.5GB | ollama run gemma2 |
Gemma 2 (Large) | 27B | 16GB | ollama run gemma2:27b |
Mistral | 7B | 4.1GB | ollama run mistral |
Moondream 2 | 1.4B | 829MB | ollama run moondream |
Neural Chat | 7B | 4.1GB | ollama run neural-chat |
Starling | 7B | 4.1GB | ollama run starling-lm |
Code Llama | 7B | 3.8GB | ollama run codellama |
Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored |
LLaVA | 7B | 4.5GB | ollama run llava |
Granite-3.2 | 8B | 4.9GB | ollama run granite3.2 |
Note: The required RAM varies depending on the model size. 7B models need at least 8GB RAM, 13B models need 16GB, and 33B models require 32GB or more. Choose a model that aligns with your system’s resources for optimal performance.
Customize Your Ollama Server Models
Ollama Server provides extensive customization options, allowing you to tailor models to your specific requirements.
Importing Models from GGUF and Safetensors
Ollama Server supports importing models in both GGUF and Safetensors formats.
Importing from GGUF:
- Create a Modelfile: Start by creating a file named Modelfile. This file contains instructions that tell Ollama how to build your custom model. Use the FROM instruction followed by the local path to your GGUF model file:

FROM ./vicuna-33b.Q4_0.gguf

- Create the Model in Ollama Server: Use the ollama create command, specifying a name for your model and pointing to your Modelfile:

ollama create example -f Modelfile

- Run Your Custom Model: You can now run your imported model using the ollama run command:

ollama run example
Importing from Safetensors:
For detailed guidance on importing models from Safetensors, refer to the official Ollama documentation on model importing.
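The workflow mirrors the GGUF steps above: a Modelfile whose FROM instruction points at the weights, followed by ollama create. As a rough sketch (the directory path below is a hypothetical placeholder, and the supported model architectures are listed in the linked documentation):

# Modelfile (hypothetical path to a local directory of Safetensors weights)
FROM /path/to/safetensors/directory

Then build it the same way as before with ollama create my-imported-model -f Modelfile (the model name here is just an example).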
Customizing Prompts
You can further customize models from the Ollama library by modifying their prompts. Here’s how to customize the llama3.2 model:
- Pull the Model: First, pull the llama3.2 model from the Ollama library:

ollama pull llama3.2

- Create a Modelfile: Create a Modelfile with the following content to customize the prompt and parameters:

FROM llama3.2

# Set the temperature [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# Define a custom system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""

- Create and Run the Customized Model: Create a new model named mario using your Modelfile and run it:

ollama create mario -f ./Modelfile
ollama run mario

>>> hi
Hello! It's your friend Mario.
For more in-depth information on working with Modelfiles, consult the comprehensive Modelfile documentation.
Ollama Server CLI Reference
Ollama Server comes with a powerful command-line interface (CLI) for managing models and interacting with the server. Here are some essential CLI commands:
ollama create
Used to build a model from a Modelfile:
ollama create mymodel -f ./Modelfile
ollama pull
Downloads a model from the Ollama library. It can also update existing local models by pulling only the differences:
ollama pull llama3.2
ollama rm
Removes a model from your local system:
ollama rm llama3.2
ollama cp
Copies a model, creating a duplicate with a new name:
ollama cp llama3.2 my-model
Multiline Input
For prompts spanning multiple lines, wrap the input in triple quotes ("""):
>>> """Hello,
... world!
... """
I'm a basic program that prints the famous "Hello, world!" message to the console.
Multimodal Models
Ollama Server supports multimodal models, allowing you to interact with models like LLaVA using images:
ollama run llava "What's in this image? /Users/jmorgan/Desktop/smile.png"
Output: The image features a yellow smiley face, which is likely the central focus of the picture.
Passing Prompts as Arguments
You can pass prompts directly as command-line arguments:
ollama run llama3.2 "Summarize this file: $(cat README.md)"
Output: Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
ollama show
Displays detailed information about a model:
ollama show llama3.2
ollama list
Lists all models currently available on your computer:
ollama list
ollama ps
Shows which models are loaded and running in Ollama Server:
ollama ps
ollama stop
Stops a running model:
ollama stop llama3.2
ollama serve
Starts the Ollama Server directly from the terminal. Useful when you want to run Ollama without the desktop application:
ollama serve
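By default the server listens on 127.0.0.1:11434. If you need to reach it from another machine, the OLLAMA_HOST environment variable controls the bind address; a minimal sketch, assuming the default port:

# Listen on all interfaces instead of only localhost
OLLAMA_HOST=0.0.0.0:11434 ollama serve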
Building Ollama Server from Source
For developers looking to contribute or customize Ollama further, building from source is an option. Refer to the developer guide for detailed instructions.
Running Local Builds
After building Ollama from source, you can start the server and run models using the local binaries:
- Start the Server:

./ollama serve

- Run a Model (in a separate terminal):

./ollama run llama3.2
REST API for Ollama Server
Ollama Server exposes a REST API, enabling programmatic interaction with models.
Generate a Response via API
To generate text from a model using the API, send a POST request to the /api/generate endpoint:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt":"Why is the sky blue?"
}'
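By default the generate endpoint streams the reply back as a series of JSON objects. If you would rather receive one complete JSON response, the request can set "stream" to false; a minimal sketch using the same model:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'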
Chat with a Model via API
For conversational interactions, use the /api/chat endpoint:
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{ "role": "user", "content": "why is the sky blue?" }
]
}'
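The chat endpoint is stateless, so multi-turn conversations are carried by resending the full message history with each request. A sketch of a follow-up turn (the assistant message here is an illustrative placeholder, not real model output):

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" },
    { "role": "assistant", "content": "Mostly because of Rayleigh scattering of sunlight." },
    { "role": "user", "content": "does the same effect explain sunsets?" }
  ]
}'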
Explore the complete API documentation for all available endpoints and functionalities.
Community Integrations and Ecosystem
Ollama Server benefits from a vibrant community, leading to numerous integrations across different platforms and tools:
- Web & Desktop: Various web and desktop applications are integrating with Ollama to provide user-friendly interfaces for interacting with local LLMs.
- Cloud: Integrations are emerging to deploy Ollama in cloud environments.
- Terminal: Several terminal-based tools enhance the Ollama CLI experience.
- Apple Vision Pro: Ollama is even finding its way into innovative platforms like Apple Vision Pro.
- Database: Integrations with databases are being developed for advanced data processing and analysis using LLMs.
- Package managers: Ollama’s accessibility is expanding through package manager integrations.
- Libraries: Client libraries in various programming languages simplify interaction with the Ollama API.
- Mobile: Efforts are underway to bring Ollama to mobile platforms.
- Extensions & Plugins: Ecosystem growth includes extensions and plugins for various applications and workflows.
- Supported Backends: Ollama supports multiple backends, ensuring compatibility across different hardware.
- Observability: Tools for monitoring and observing Ollama Server are being developed to enhance management and performance tuning.
Ollama Server is rapidly evolving, making it an exciting platform for anyone looking to leverage the power of large language models locally. Whether you are just starting or are an experienced AI practitioner, Ollama provides the tools and flexibility to explore the world of LLMs on your terms.