Running large language models locally gives you privacy, no API costs, and full control over your AI workflows. Ollama makes this accessible by bundling model weights and runtime into a simple executable.

This guide walks you through installing Ollama, pulling models, and using them for code generation.

What is Ollama?

Ollama is an open-source framework for running large language models locally. It supports a variety of models optimized for coding, reasoning, and multi-modal tasks. With Ollama, you can run models like Llama 3.1, Qwen 2.5, and DeepSeek-Coder directly on your machine.

Installing Ollama

macOS

# Download and install from the website
# https://ollama.com/download/mac

# Or via Homebrew
brew install ollama

Linux

# Install via the official script
curl -fsSL https://ollama.com/install.sh | sh

# Verify installation
ollama --version

Windows

Windows users have two options:

  • Windows Subsystem for Linux (WSL): Install Ollama within WSL2 for the best performance
  • Direct install: Download the Windows installer from ollama.com/download/windows

# If using WSL2, install inside the Linux distribution
curl -fsSL https://ollama.com/install.sh | sh

# Start the Ollama service
ollama serve

# In another terminal, verify it works
ollama --version

Pulling Your First Model

Ollama uses a simple pull command to download models. Let's start with a coding-focused model:

# Pull Qwen2.5-Coder (optimized for code generation)
ollama pull qwen2.5-coder:7b

# Or try DeepSeek-Coder
ollama pull deepseek-coder:6.7b

# List available models
ollama list
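
If you script around the CLI, you can capture the output of ollama list and pull out the installed model names. A minimal Python sketch, assuming the current table layout (header row first, NAME in the first column) — adjust if the CLI output changes in a future release:

```python
# Parse the plain-text table printed by `ollama list` into model names.
# NOTE: the column layout (NAME first, one header row) is an assumption
# based on current Ollama releases.
import subprocess

def installed_models(output=None):
    """Return model names from `ollama list` output (or a captured string)."""
    if output is None:
        output = subprocess.run(
            ["ollama", "list"], capture_output=True, text=True, check=True
        ).stdout
    lines = output.strip().splitlines()
    # Skip the header row; the first whitespace-separated field is the name.
    return [line.split()[0] for line in lines[1:] if line.strip()]

sample = """NAME                  ID            SIZE    MODIFIED
qwen2.5-coder:7b      abc123        4.7 GB  2 days ago
deepseek-coder:6.7b   def456        3.9 GB  5 days ago"""
print(installed_models(sample))  # ['qwen2.5-coder:7b', 'deepseek-coder:6.7b']
```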

Popular Models for Developers

Model                 Size    Best For          VRAM Needed
qwen2.5-coder:7b      4.7 GB  General coding    8 GB
deepseek-coder:6.7b   3.9 GB  Code completion   8 GB
codestral:22b         13 GB   Code generation   16 GB
llama3.1:8b           4.9 GB  General purpose   8 GB
phi4:14b              9.1 GB  Reasoning         16 GB

Running Ollama Interactively

The simplest way to use Ollama is in interactive chat mode:

# Start an interactive session
ollama run qwen2.5-coder:7b

# You can now chat with the model
# Try: "Write a Python function to reverse a string"

Using Ollama as a Local API Server

For integration with editors and tools, run Ollama as a server:

# Start the server (runs on port 11434 by default)
ollama serve

# In another terminal, make API calls
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Write a Python function that reads a CSV file and returns the average of a column",
  "stream": false
}'
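
The same endpoint is just as easy to call from a script. Here's a minimal Python sketch using only the standard library — it assumes ollama serve is running on the default port, and build_payload/generate are helper names of our own, not part of any Ollama SDK:

```python
# Minimal Python client for Ollama's /api/generate endpoint (stdlib only).
# Assumes `ollama serve` is listening on localhost:11434.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt, model="qwen2.5-coder:7b", stream=False, **options):
    """Build the JSON body for a generate request."""
    payload = {"model": model, "prompt": prompt, "stream": stream}
    if options:
        payload["options"] = options  # e.g. num_ctx, temperature
    return payload

def generate(prompt, **kwargs):
    """POST a prompt and return the model's reply text."""
    data = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

# With the server running, you could call:
# print(generate("Write a Python function to reverse a string"))
```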

JSON Mode

For structured output, use JSON mode:

curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Return a JSON object with name and email fields",
  "format": "json"
}'
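
Note that in JSON mode the structured reply still arrives as a JSON string inside the API response's "response" field, so it needs a second decode. A small Python sketch with a simulated API response (the real one carries extra fields such as timings):

```python
# With "format": "json", the model's reply is a JSON *string* inside the
# API response's "response" field, so it needs a second json.loads.
import json

def parse_json_reply(api_response):
    """Extract and decode the structured reply from a /api/generate result."""
    return json.loads(api_response["response"])

# Simulated API response:
api_response = {
    "model": "qwen2.5-coder:7b",
    "response": '{"name": "Ada Lovelace", "email": "ada@example.com"}',
    "done": True,
}
reply = parse_json_reply(api_response)
print(reply["name"])  # Ada Lovelace
```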

Connecting VS Code

You can use Ollama with VS Code through the Continue extension or similar:

Using Continue Extension

# Install the Continue extension in VS Code
# Then configure it to use your local Ollama server

# In your config.json (Continue):
{
  "models": [
    {
      "title": "Qwen2.5-Coder 7B",
      "model": "qwen2.5-coder:7b",
      "provider": "ollama",
      "apiBase": "http://localhost:11434"
    }
  ]
}

Using CodeGPT or Other Extensions

Most VS Code AI extensions support custom endpoints. Point them at http://localhost:11434; for extensions that expect an OpenAI-style backend, Ollama also exposes an OpenAI-compatible API at http://localhost:11434/v1.

Connecting OpenCode

To use Ollama with OpenCode or other AI coding assistants:

# Ensure Ollama is running
ollama serve

# Configure your tool to use the Ollama API
# The endpoint is: http://localhost:11434

# For OpenCode, the model provider configuration looks roughly like
# this (check the tool's docs for the exact schema):
{
  "provider": "ollama",
  "model": "qwen2.5-coder:7b",
  "api_base": "http://localhost:11434"
}

Basic Examples

Generating a Python Script

curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Write a Python script that monitors a directory for new files and logs their names to a file. Include proper error handling.",
  "stream": false
}'

Creating a Project Scaffold

curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Create a Flask API project structure with routes for CRUD operations on a User model. Include requirements.txt and basic tests.",
  "stream": false
}'

Explaining Code

curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Explain what this code does:\n\ndef fibonacci(n):\n    if n <= 1:\n        return n\n    return fibonacci(n-1) + fibonacci(n-2)",
  "stream": false
}'
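
The examples above set "stream": false for a single response. When streaming is left on (the default), the API instead returns one JSON object per line, each carrying a fragment of the reply in its "response" field, with a final object marked "done": true. A Python sketch that reassembles a simulated stream (real chunks also include model and timing fields):

```python
# Reassemble a streamed /api/generate reply. With "stream": true (the
# default), Ollama sends newline-delimited JSON: each object holds a
# fragment of the reply in "response"; the last has "done": true.
import json

def join_stream(ndjson_lines):
    """Concatenate the response fragments from a stream of JSON lines."""
    chunks = []
    for line in ndjson_lines:
        obj = json.loads(line)
        chunks.append(obj.get("response", ""))
        if obj.get("done"):
            break
    return "".join(chunks)

# Simulated stream:
stream = [
    '{"response": "def reverse", "done": false}',
    '{"response": "(s): return s[::-1]", "done": true}',
]
print(join_stream(stream))  # def reverse(s): return s[::-1]
```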

Multi-Modal: Using Vision Models

Ollama also supports vision models that can analyze images:

# Pull a vision model
ollama pull llava:7b

# Analyze an image. Encode it first: inside single quotes, $(...) is sent
# literally, so the substitution must happen outside the JSON string.
IMG=$(base64 -w0 /path/to/image.jpg)
curl http://localhost:11434/api/generate -d "{
  \"model\": \"llava:7b\",
  \"prompt\": \"Describe what you see in this image\",
  \"images\": [\"$IMG\"]
}"

Troubleshooting Common Issues

MIME Type Errors

If you see MIME type errors when serving files:

# Ensure you're using the correct Content-Type header
curl -H "Content-Type: application/json" http://localhost:11434/api/generate -d '{...}'

# For file uploads, ensure base64 encoding is correct
base64 -w0 image.png   # Linux (GNU coreutils)
base64 -i image.png    # macOS (BSD base64; does not wrap lines by default)
[Convert]::ToBase64String([IO.File]::ReadAllBytes("image.png"))  # Windows PowerShell

File Path Resolution

When working with files:

# Use absolute paths
# Wrong: "read file.txt"
# Correct: "read /home/user/project/file.txt"

# For Windows paths, use forward slashes
# Correct: "read C:/Users/Project/file.txt"

Token Limit Errors

If responses are truncated:

# Increase the token limit
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Your long prompt here",
  "options": {
    "num_ctx": 8192,
    "num_predict": 2048
  }
}'

Out of Memory Errors

If you run out of VRAM:

# Use a smaller model
ollama pull qwen2.5-coder:3b  # 1.9GB instead of 4.7GB

# Or reduce context size
curl ... -d '{
  "options": {
    "num_ctx": 2048
  }
}'

Model Not Found

# Check available models
ollama list

# Pull the model again if needed
ollama pull qwen2.5-coder:7b

Creating Custom Models

You can create custom Ollama models using Modelfiles:

# Create a Modelfile
cat > MyCoder << 'EOF'
FROM qwen2.5-coder:7b

PARAMETER temperature 0.7
PARAMETER top_p 0.9

SYSTEM You are an expert Python developer. Provide clean, well-documented code.
EOF

# Create the custom model
ollama create my-coder -f MyCoder

# Use it
ollama run my-coder

Next Steps

Now that you have Ollama running locally, you can:

  • Integrate it with your code editor for AI-assisted coding
  • Build automation scripts that use local LLMs
  • Experiment with different models for different tasks
  • Create custom model configurations for specific use cases

Ollama provides a powerful, private, and cost-effective way to use AI for development workflows without relying on cloud services.