Running large language models locally gives you privacy, no API costs, and full control over your AI workflows. Ollama makes this accessible by bundling model weights and runtime into a simple executable.

This guide walks you through installing Ollama, pulling models, and using them for code generation.

What is Ollama?

Ollama is an open-source framework for running large language models locally. It supports a variety of models optimized for coding, reasoning, and multi-modal tasks. With Ollama, you can run models like Llama 3.1, Qwen 2.5, and DeepSeek-Coder directly on your machine.

Installing Ollama

macOS

# Download and install from the website
# https://ollama.com/download/mac

# Or via Homebrew
brew install ollama

Linux

# Install via the official script
curl -fsSL https://ollama.com/install.sh | sh

# Verify installation
ollama --version

Windows

Windows users have two options:

  • Windows Subsystem for Linux (WSL): Install Ollama within WSL2 for the best performance
  • Direct install: Download the Windows installer from ollama.com/download/windows

# If using WSL2, install inside the Linux distribution
curl -fsSL https://ollama.com/install.sh | sh

# Start the Ollama service
ollama serve

# In another terminal, verify it works
ollama --version

Pulling Your First Model

Ollama uses a simple pull command to download models. Let's start with a coding-focused model:

# Pull Qwen2.5-Coder (optimized for code generation)
ollama pull qwen2.5-coder:7b

# Or try DeepSeek-Coder
ollama pull deepseek-coder:6.7b

# List available models
ollama list
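
If you script around the CLI, you can capture the output of ollama list and pull out the installed model names. A minimal Python sketch, assuming the current table layout (header row first, NAME in the first column) — adjust if the CLI output changes in a future release:

```python
# Parse the plain-text table printed by `ollama list` into model names.
# NOTE: the column layout (NAME first, one header row) is an assumption
# based on current Ollama releases.
import subprocess

def installed_models(output=None):
    """Return model names from `ollama list` output (or a captured string)."""
    if output is None:
        output = subprocess.run(
            ["ollama", "list"], capture_output=True, text=True, check=True
        ).stdout
    lines = output.strip().splitlines()
    # Skip the header row; the first whitespace-separated field is the name.
    return [line.split()[0] for line in lines[1:] if line.strip()]

sample = """NAME                  ID            SIZE    MODIFIED
qwen2.5-coder:7b      abc123        4.7 GB  2 days ago
deepseek-coder:6.7b   def456        3.9 GB  5 days ago"""
print(installed_models(sample))  # ['qwen2.5-coder:7b', 'deepseek-coder:6.7b']
```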

Popular Models for Developers

Model                 Size    Best For          VRAM Needed
qwen2.5-coder:7b      4.7 GB  General coding    8 GB
deepseek-coder:6.7b   3.9 GB  Code completion   8 GB
codestral:22b         13 GB   Code generation   16 GB
llama3.1:8b           4.9 GB  General purpose   8 GB
phi4:14b              9.1 GB  Reasoning         16 GB

Running Ollama Interactively

The simplest way to use Ollama is in interactive chat mode:

# Start an interactive session
ollama run qwen2.5-coder:7b

# You can now chat with the model
# Try: "Write a Python function to reverse a string"

Using Ollama as a Local API Server

For integration with editors and tools, run Ollama as a server:

# Start the server (runs on port 11434 by default)
ollama serve

# In another terminal, make API calls
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Write a Python function that reads a CSV file and returns the average of a column",
  "stream": false
}'
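
The same endpoint is just as easy to call from a script. Here's a minimal Python sketch using only the standard library — it assumes ollama serve is running on the default port, and build_payload/generate are helper names of our own, not part of any Ollama SDK:

```python
# Minimal Python client for Ollama's /api/generate endpoint (stdlib only).
# Assumes `ollama serve` is listening on localhost:11434.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt, model="qwen2.5-coder:7b", stream=False, **options):
    """Build the JSON body for a generate request."""
    payload = {"model": model, "prompt": prompt, "stream": stream}
    if options:
        payload["options"] = options  # e.g. num_ctx, temperature
    return payload

def generate(prompt, **kwargs):
    """POST a prompt and return the model's reply text."""
    data = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

# With the server running, you could call:
# print(generate("Write a Python function to reverse a string"))
```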

JSON Mode

For structured output, use JSON mode:

curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Return a JSON object with name and email fields",
  "format": "json"
}'
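
Note that in JSON mode the structured reply still arrives as a JSON string inside the API response's "response" field, so it needs a second decode. A small Python sketch with a simulated API response (the real one carries extra fields such as timings):

```python
# With "format": "json", the model's reply is a JSON *string* inside the
# API response's "response" field, so it needs a second json.loads.
import json

def parse_json_reply(api_response):
    """Extract and decode the structured reply from a /api/generate result."""
    return json.loads(api_response["response"])

# Simulated API response:
api_response = {
    "model": "qwen2.5-coder:7b",
    "response": '{"name": "Ada Lovelace", "email": "ada@example.com"}',
    "done": True,
}
reply = parse_json_reply(api_response)
print(reply["name"])  # Ada Lovelace
```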

Connecting VS Code

You can use Ollama with VS Code through the Continue extension or similar:

Using Continue Extension

# Install the Continue extension in VS Code
# Then configure it to use your local Ollama server

# In your config.json (Continue):
{
  "models": [
    {
      "title": "Qwen2.5-Coder 7B",
      "model": "qwen2.5-coder:7b",
      "provider": "ollama",
      "apiBase": "http://localhost:11434"
    }
  ]
}

Using CodeGPT or Other Extensions

Most VS Code AI extensions support custom endpoints. Point them at http://localhost:11434; for extensions that expect an OpenAI-style backend, Ollama also exposes an OpenAI-compatible API at http://localhost:11434/v1.

Connecting OpenCode

To use Ollama with OpenCode or other AI coding assistants:

# Ensure Ollama is running
ollama serve

# Configure your tool to use the Ollama API
# The endpoint is: http://localhost:11434

# For OpenCode, the model provider configuration looks roughly like
# this (check the tool's docs for the exact schema):
{
  "provider": "ollama",
  "model": "qwen2.5-coder:7b",
  "api_base": "http://localhost:11434"
}

Basic Examples

Generating a Python Script

curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Write a Python script that monitors a directory for new files and logs their names to a file. Include proper error handling.",
  "stream": false
}'

Creating a Project Scaffold

curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Create a Flask API project structure with routes for CRUD operations on a User model. Include requirements.txt and basic tests.",
  "stream": false
}'

Explaining Code

curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Explain what this code does:\n\ndef fibonacci(n):\n    if n <= 1:\n        return n\n    return fibonacci(n-1) + fibonacci(n-2)",
  "stream": false
}'
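
The examples above set "stream": false for a single response. When streaming is left on (the default), the API instead returns one JSON object per line, each carrying a fragment of the reply in its "response" field, with a final object marked "done": true. A Python sketch that reassembles a simulated stream (real chunks also include model and timing fields):

```python
# Reassemble a streamed /api/generate reply. With "stream": true (the
# default), Ollama sends newline-delimited JSON: each object holds a
# fragment of the reply in "response"; the last has "done": true.
import json

def join_stream(ndjson_lines):
    """Concatenate the response fragments from a stream of JSON lines."""
    chunks = []
    for line in ndjson_lines:
        obj = json.loads(line)
        chunks.append(obj.get("response", ""))
        if obj.get("done"):
            break
    return "".join(chunks)

# Simulated stream:
stream = [
    '{"response": "def reverse", "done": false}',
    '{"response": "(s): return s[::-1]", "done": true}',
]
print(join_stream(stream))  # def reverse(s): return s[::-1]
```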

Multi-Modal: Using Vision Models

Ollama also supports vision models that can analyze images:

# Pull a vision model
ollama pull llava:7b

# Analyze an image. Encode it first: inside single quotes, $(...) is sent
# literally, so the substitution must happen outside the JSON string.
IMG=$(base64 -w0 /path/to/image.jpg)
curl http://localhost:11434/api/generate -d "{
  \"model\": \"llava:7b\",
  \"prompt\": \"Describe what you see in this image\",
  \"images\": [\"$IMG\"]
}"

Troubleshooting Common Issues

MIME Type Errors

If you see MIME type errors when serving files:

# Ensure you're using the correct Content-Type header
curl -H "Content-Type: application/json" http://localhost:11434/api/generate -d '{...}'

# For file uploads, ensure base64 encoding is correct
base64 -w0 image.png   # Linux (GNU coreutils)
base64 -i image.png    # macOS (BSD base64; does not wrap lines by default)
[Convert]::ToBase64String([IO.File]::ReadAllBytes("image.png"))  # Windows PowerShell

File Path Resolution

When working with files:

# Use absolute paths
# Wrong: "read file.txt"
# Correct: "read /home/user/project/file.txt"

# For Windows paths, use forward slashes
# Correct: "read C:/Users/Project/file.txt"

Token Limit Errors

If responses are truncated:

# Increase the token limit
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Your long prompt here",
  "options": {
    "num_ctx": 8192,
    "num_predict": 2048
  }
}'

Out of Memory Errors

If you run out of VRAM:

# Use a smaller model
ollama pull qwen2.5-coder:3b  # 1.9GB instead of 4.7GB

# Or reduce context size
curl ... -d '{
  "options": {
    "num_ctx": 2048
  }
}'

Model Not Found

# Check available models
ollama list

# Pull the model again if needed
ollama pull qwen2.5-coder:7b

Creating Custom Models

You can create custom Ollama models using Modelfiles:

# Create a Modelfile
cat > MyCoder << 'EOF'
FROM qwen2.5-coder:7b

PARAMETER temperature 0.7
PARAMETER top_p 0.9

SYSTEM You are an expert Python developer. Provide clean, well-documented code.
EOF

# Create the custom model
ollama create my-coder -f MyCoder

# Use it
ollama run my-coder

Next Steps

Now that you have Ollama running locally, you can:

  • Integrate it with your code editor for AI-assisted coding
  • Build automation scripts that use local LLMs
  • Experiment with different models for different tasks
  • Create custom model configurations for specific use cases

Ollama provides a powerful, private, and cost-effective way to use AI for development workflows without relying on cloud services.