Getting Started with Ollama: Run Local LLMs on Your Machine
Running large language models locally gives you privacy, no API costs, and full control over your AI workflows. Ollama makes this accessible by bundling model weights and runtime into a simple executable.
This guide walks you through installing Ollama, pulling models, and using them for code generation.
What is Ollama?
Ollama is an open-source framework for running large language models locally. It supports a variety of models optimized for coding, reasoning, and multi-modal tasks. With Ollama, you can run models like Llama 3.1, Qwen 2.5, and DeepSeek-Coder directly on your machine.
Installing Ollama
macOS
# Download and install from the website
# https://ollama.com/download/mac
# Or via Homebrew
brew install ollama
Linux
# Install via the official script
curl -fsSL https://ollama.com/install.sh | sh
# Verify installation
ollama --version
Windows
Windows users have two options:
- Windows Subsystem for Linux (WSL): Install Ollama within WSL2 for the best performance
- Direct install: Download the Windows installer from ollama.com/download/windows
# If using WSL2, install inside the Linux distribution
curl -fsSL https://ollama.com/install.sh | sh
# Start the Ollama service
ollama serve
# In another terminal, verify it works
ollama --version
Pulling Your First Model
Ollama uses a simple pull command to download models. Let's start with a coding-focused model:
# Pull Qwen2.5-Coder (optimized for code generation)
ollama pull qwen2.5-coder:7b
# Or try DeepSeek-Coder
ollama pull deepseek-coder:6.7b
# List available models
ollama list
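Beyond the CLI, a running Ollama server also reports its local models over HTTP via the `/api/tags` endpoint. A minimal sketch in Python (the `model_names` helper is just an illustrative name, not part of any library):

```python
import json
import urllib.request

def model_names(tags_response: dict) -> list:
    """Extract model names from the /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]

def list_models(base_url: str = "http://localhost:11434") -> list:
    """Query a running Ollama server for locally available models."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return model_names(json.loads(resp.read()))
```

Calling `list_models()` returns the same names `ollama list` prints, which is handy for scripts that need to verify a model is present before using it.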
Popular Models for Developers
| Model | Size | Best For | VRAM Needed |
|---|---|---|---|
| qwen2.5-coder:7b | 4.7 GB | General coding | 8 GB |
| deepseek-coder:6.7b | 3.9 GB | Code completion | 8 GB |
| codestral:22b | 13 GB | Code generation | 16 GB |
| llama3.1:8b | 4.9 GB | General purpose | 8 GB |
| phi4:14b | 8.3 GB | Reasoning | 16 GB |
Running Ollama Interactively
The simplest way to use Ollama is in interactive chat mode:
# Start an interactive session
ollama run qwen2.5-coder:7b
# You can now chat with the model
# Try: "Write a Python function to reverse a string"
Using Ollama as a Local API Server
For integration with editors and tools, run Ollama as a server:
# Start the server (runs on port 11434 by default)
ollama serve
# In another terminal, make API calls
curl http://localhost:11434/api/generate -d '{
"model": "qwen2.5-coder:7b",
"prompt": "Write a Python function that reads a CSV file and returns the average of a column",
"stream": false
}'
JSON Mode
For structured output, use JSON mode:
curl http://localhost:11434/api/generate -d '{
"model": "qwen2.5-coder:7b",
"prompt": "Return a JSON object with name and email fields",
"format": "json"
}'
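Note that with `"format": "json"`, the structured data arrives as a JSON string inside the API response's `response` field, so it needs a second decode. A small sketch (the sample name/email values below are made up for illustration):

```python
import json

def parse_structured(api_response: dict) -> dict:
    """With format=json, the 'response' field is itself a JSON string; decode it."""
    return json.loads(api_response["response"])
```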
Connecting VS Code
You can use Ollama with VS Code through the Continue extension or similar:
Using Continue Extension
# Install the Continue extension in VS Code
# Then configure it to use your local Ollama server
# In Continue's config.json, add a models entry pointing at Ollama:
{
  "models": [
    {
      "title": "Local Qwen Coder",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b",
      "apiBase": "http://localhost:11434"
    }
  ]
}
Using CodeGPT or Other Extensions
Most VS Code AI extensions support custom endpoints. Configure them to point to http://localhost:11434.
Connecting OpenCode
To use Ollama with OpenCode or other AI coding assistants:
# Ensure Ollama is running
ollama serve
# Configure your tool to use the Ollama API
# The endpoint is: http://localhost:11434
# For OpenCode, you would configure the model provider:
{
"provider": "ollama",
"model": "qwen2.5-coder:7b",
"api_base": "http://localhost:11434"
}
Basic Examples
Generating a Python Script
curl http://localhost:11434/api/generate -d '{
"model": "qwen2.5-coder:7b",
"prompt": "Write a Python script that monitors a directory for new files and logs their names to a file. Include proper error handling.",
"stream": false
}'
Creating a Project Scaffold
curl http://localhost:11434/api/generate -d '{
"model": "qwen2.5-coder:7b",
"prompt": "Create a Flask API project structure with routes for CRUD operations on a User model. Include requirements.txt and basic tests.",
"stream": false
}'
Explaining Code
curl http://localhost:11434/api/generate -d '{
"model": "qwen2.5-coder:7b",
"prompt": "Explain what this code does:\n\ndef fibonacci(n):\n if n <= 1:\n return n\n return fibonacci(n-1) + fibonacci(n-2)",
"stream": false
}'
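The examples above set `"stream": false` for simplicity. When streaming is left on (the default), Ollama sends newline-delimited JSON objects, each carrying a piece of the output in `response`, with `"done": true` on the final one. A sketch of reassembling the full text from those chunks (`join_stream` is an illustrative helper name):

```python
import json

def join_stream(ndjson_lines) -> str:
    """Concatenate the 'response' pieces from Ollama's streaming output."""
    parts = []
    for line in ndjson_lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)
```

Streaming is what editor integrations use to show tokens as they are generated.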
Multi-Modal: Using Vision Models
Ollama also supports vision models that can analyze images:
# Pull a vision model
ollama pull llava:7b
# Analyze an image
curl http://localhost:11434/api/generate -d '{
"model": "llava:7b",
"prompt": "Describe what you see in this image",
"images": ["$(base64 -w0 /path/to/image.jpg)"]
}'
Troubleshooting Common Issues
MIME Type Errors
If the API rejects requests with a content-type or JSON parsing error:
# Ensure you're using the correct Content-Type header
curl -H "Content-Type: application/json" http://localhost:11434/api/generate -d '{...}'
# For image payloads, ensure the base64 encoding is correct
base64 -w0 image.png  # Linux (GNU base64)
base64 -i image.png   # macOS (BSD base64 has no -w flag)
[Convert]::ToBase64String([IO.File]::ReadAllBytes("image.png"))  # Windows (PowerShell)
File Path Resolution
When working with files:
# Use absolute paths
# Wrong: "read file.txt"
# Correct: "read /home/user/project/file.txt"
# For Windows paths, use forward slashes
# Correct: "read C:/Users/Project/file.txt"
Token Limit Errors
If responses are truncated:
# Increase the token limit
curl http://localhost:11434/api/generate -d '{
"model": "qwen2.5-coder:7b",
"prompt": "Your long prompt here",
"options": {
"num_ctx": 8192,
"num_predict": 2048
}
}'
Out of Memory Errors
If you run out of VRAM:
# Use a smaller model
ollama pull qwen2.5-coder:3b # 1.9GB instead of 4.7GB
# Or reduce context size
curl ... -d '{
"options": {
"num_ctx": 2048
}
}'
Model Not Found
# Check available models
ollama list
# Pull the model again if needed
ollama pull qwen2.5-coder:7b
Creating Custom Models
You can create custom Ollama models using Modelfiles:
# Create a Modelfile
cat > MyCoder << 'EOF'
FROM qwen2.5-coder:7b
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM You are an expert Python developer. Provide clean, well-documented code.
EOF
# Create the custom model
ollama create my-coder -f MyCoder
# Use it
ollama run my-coder
Next Steps
Now that you have Ollama running locally, you can:
- Integrate it with your code editor for AI-assisted coding
- Build automation scripts that use local LLMs
- Experiment with different models for different tasks
- Create custom model configurations for specific use cases
Ollama provides a powerful, private, and cost-effective way to use AI for development workflows without relying on cloud services.