Most code review tools send your diff to someone else's server. For a side project that is fine. For a client codebase, an internal monorepo, or anything under NDA, it is a non-starter. The fix is running the model yourself.
Ollama wraps llama.cpp and gives you a single binary that pulls, runs, and serves GGUF models over a local HTTP API on port 11434. Continue is a VS Code and JetBrains extension that talks to any OpenAI-compatible endpoint, including Ollama. Together they give you a code reviewer that runs on your laptop, with no code sent to a third party and no subscription.
This guide walks through a setup that actually works: install Ollama, pick a model that fits a developer laptop, wire it to Continue, and run a real review on a real diff.
Prerequisites
- macOS, Linux, or Windows (WSL2 recommended on Windows)
- About 8 GB of free RAM for a 7B model, 16 GB for a 14B
- About 6 GB of free disk for a 7B model's weights, or about 10 GB for a 14B
- VS Code 1.85+
- Optional but recommended: an Apple Silicon Mac or any machine with a discrete GPU
Step 1: Install Ollama
# macOS
brew install ollama
# Linux (one-liner from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh
# Windows: download the installer from ollama.com/download
Start the server. On macOS the app handles this; on Linux run it as a systemd service or in a separate terminal:
ollama serve
Verify it is up:
curl http://localhost:11434/api/version
# {"version":"0.5.x"}
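If the server was only just started, the version check can fail for a moment. A minimal sketch of a polling helper follows; the function name `wait_for_ollama`, the retry count, and the one-second pause are assumptions chosen for illustration, not part of Ollama.

```shell
# wait_for_ollama [URL] [ATTEMPTS]
# Polls the version endpoint until it answers or the attempts run out.
# Returns 0 once the server responds, 1 otherwise.
wait_for_ollama() {
  url="${1:-http://localhost:11434/api/version}"
  attempts="${2:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP errors; -s keeps it quiet.
    if curl -fs --max-time 2 "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
  done
  return 1
}
```

Called as `wait_for_ollama || echo "Ollama is not answering on port 11434"`, it gives a script a clear failure message instead of a curl error.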
Step 2: Pick a model
Not every model is good at code review. You want something trained on code, with a context window large enough to fit a diff plus some surrounding file. Three solid options as of mid-2026:
| Model | Size | RAM needed | Why pick it |
|---|---|---|---|
| qwen2.5-coder:7b | 4.7 GB | ~8 GB | Best bang per token for code tasks |
| qwen2.5-coder:14b | 9.0 GB | ~16 GB | Noticeably better at catching logic bugs |
| deepseek-coder-v2:16b | 8.9 GB | ~16 GB | Strong at multi-file reasoning |
Pull the one that fits your hardware:
ollama pull qwen2.5-coder:7b
The first pull downloads several gigabytes. Pulling the same tag again is fast because layers already on disk are reused; a different model size, such as 14b, is a separate multi-gigabyte download.
Sanity check the model before touching the editor:
ollama run qwen2.5-coder:7b "Review this Python: def add(a,b): return a-b"
You should get a response that flags the bug (the function subtracts instead of adding) and suggests a fix. If you get nonsense, the model file may be corrupted; re-pull it.
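Before running the sanity check in a script, it can help to confirm that the tag is actually present locally. The sketch below parses the output of `ollama list`, assuming the model name is the first column and the first row is a header; `model_installed` is a hypothetical helper name.

```shell
# model_installed NAME
# Reads `ollama list` output on stdin and succeeds when NAME appears
# in the first column. The header row is skipped.
model_installed() {
  awk -v want="$1" '
    NR > 1 && $1 == want { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

One possible use is `ollama list | model_installed qwen2.5-coder:7b || ollama pull qwen2.5-coder:7b`, which pulls only when the tag is missing.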
Step 3: Configure Continue in VS Code
Install the Continue extension from the VS Code marketplace. After install, open the config file with the command palette:
Continue: Open config.json
This creates ~/.continue/config.json. Replace its contents with:
{
  "models": [
    {
      "title": "Qwen Coder 7B (local)",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b",
      "systemMessage": "You are a senior engineer doing code review. Be specific. Cite line numbers. Do not restate the code. Skip nitpicks. Flag real issues: bugs, security holes, missing edge cases, and confusing naming."
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen Coder 7B (local)",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b"
  }
}
Restart VS Code. Open the Continue panel from the left sidebar (it looks like a chat bubble). You should see Qwen Coder 7B (local) in the model dropdown.
Step 4: Review a real diff
The keyboard shortcut Cmd+L (macOS) or Ctrl+L (Linux/Windows) opens the chat with the current selection as context. Select a function, hit the shortcut, and ask:
Find bugs and security issues in this function.
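For full-file or full-diff reviews, paste the diff into the chat directly. The model supports a 32K context window, which is enough for most PRs, but Ollama's default context length is much smaller, so raise it as described under Performance tuning.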
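To judge whether a diff is likely to fit in the context window before pasting it, a rough size check can help. This sketch assumes about four characters per token, which is only a rule of thumb (real tokenizers vary), and reserves a hypothetical 2048 tokens of headroom for the prompt and the reply. Both helper names are made up for illustration.

```shell
# estimate_tokens
# Reads text on stdin and prints a rough token count.
# Assumption: about 4 characters per token.
estimate_tokens() {
  chars=$(wc -c | tr -d ' ')
  echo $(( (chars + 3) / 4 ))
}

# fits_context LIMIT [RESERVE]
# Reads text on stdin and succeeds when the estimate plus RESERVE
# tokens of headroom stays within LIMIT.
fits_context() {
  limit="$1"
  reserve="${2:-2048}"
  tokens=$(estimate_tokens)
  [ $((tokens + reserve)) -le "$limit" ]
}
```

For example, `git diff main...HEAD | fits_context 32768 || echo "diff may not fit; review it file by file"` warns before a review silently truncates.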
For a repeatable review across a branch, run this from the repo root:
git diff main...HEAD > /tmp/changes.diff
Then open /tmp/changes.diff in the editor, select its contents, send the selection to the Continue chat with the same shortcut, and ask:
Review this diff. List the top 5 issues with file paths and line numbers.
You can also add it as a project-level slash command. Create .continue/rules/review.md:
# Code review
When asked to review code:
1. Read the full diff before commenting.
2. Group findings by severity: bug, security, performance, style.
3. Cite the file and line for every finding.
4. If the code is fine, say so. Do not invent issues.
Step 5: Make it part of pre-commit (optional)
If you want a review on every commit, call the Ollama server directly from a shell script and print the response. A minimal version:
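#!/usr/bin/env bash
set -euo pipefail

# Collect the staged changes; nothing staged means nothing to review.
DIFF=$(git diff --cached)
[ -z "$DIFF" ] && exit 0

# Build the JSON body with jq so the diff is escaped correctly.
# \($d) is jq string interpolation; a bare ($d) would be sent literally.
PAYLOAD=$(jq -n --arg d "$DIFF" '{
  model: "qwen2.5-coder:7b",
  prompt: "Review this staged diff. Flag bugs and security issues. Be terse.\n\n\($d)",
  stream: false
}')

# Ask the local Ollama server and print only the generated text.
RESPONSE=$(curl -s http://localhost:11434/api/generate -d "$PAYLOAD")
echo "$RESPONSE" | jq -r .response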
Save it as scripts/pre-commit-review.sh, make it executable, and add it to a pre-commit hook. The review prints to your terminal before each commit. Exit non-zero on serious findings and you have a CI gate that runs on your laptop.
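The exit-code gate mentioned above could be sketched as follows. It assumes the prompt tells the model to start each serious finding with `BLOCKER:`, a hypothetical convention that is not part of Ollama or of the script above. The helper names are likewise made up. A small model will not follow the convention perfectly, so treat this as a starting point.

```shell
# has_blockers
# Reads review text on stdin and succeeds when at least one line starts
# with the marker. grep reads all of its input here, so the writer on
# the other side of the pipe never sees a closed pipe.
has_blockers() {
  grep '^[[:space:]]*BLOCKER:' >/dev/null
}

# gate_commit
# Reads review text on stdin, prints it, and returns 1 when blockers
# are present so a pre-commit hook can abort the commit.
gate_commit() {
  review=$(cat)
  printf '%s\n' "$review"
  if printf '%s\n' "$review" | has_blockers; then
    echo "Commit blocked: the review reported serious findings." >&2
    return 1
  fi
  return 0
}
```

With these functions defined in the script, piping the final `jq -r .response` output into `gate_commit` makes the hook exit non-zero whenever the model reports a blocker.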
Performance tuning
Ollama defaults are fine for a single user. Two knobs matter:
Context length. The default is 2048 tokens, which is too short for most diffs. Set it per-model with a Modelfile:
FROM qwen2.5-coder:7b
PARAMETER num_ctx 16384
PARAMETER temperature 0.2
Build it:
ollama create qwen-coder-review -f Modelfile
Then point Continue at qwen-coder-review instead of qwen2.5-coder:7b. Lower temperature makes reviews more deterministic and less likely to invent issues.
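If you want to compare several context sizes or base models, the Modelfile can be generated rather than edited by hand. A small sketch, where `write_modelfile` and the file and model names in the usage line are hypothetical:

```shell
# write_modelfile BASE NUM_CTX [TEMPERATURE]
# Prints a Modelfile with the two parameters used in this section.
# TEMPERATURE defaults to 0.2.
write_modelfile() {
  base="$1"
  num_ctx="$2"
  temperature="${3:-0.2}"
  printf 'FROM %s\n' "$base"
  printf 'PARAMETER num_ctx %s\n' "$num_ctx"
  printf 'PARAMETER temperature %s\n' "$temperature"
}
```

For instance, `write_modelfile qwen2.5-coder:14b 8192 > Modelfile.14b` followed by `ollama create qwen-coder-review-14b -f Modelfile.14b` builds a second variant. Keep in mind that a larger num_ctx uses more memory.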
Keep-alive. Ollama unloads a model from memory 5 minutes after the last request. The next prompt pays the reload cost, which is 5-15 seconds. Pin it for the duration of a review session:
{
  "models": [
    {
      "title": "Qwen Coder 7B (local)",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b",
      "ollama": {
        "keepAlive": "-1"
      }
    }
  ]
}
-1 means the model stays loaded until Ollama restarts. On a 16 GB machine with a 7B model, the cost is about 5 GB of RAM, which is usually fine.
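The same pinning can be done from a shell, which is handy before a pre-commit run. Ollama's generate endpoint accepts a keep_alive field, and a request with no prompt only loads the model. In the sketch below the helper names are hypothetical, and the payload builder assumes the model name contains no characters that need JSON escaping.

```shell
# keepalive_payload MODEL [KEEP_ALIVE]
# Prints a JSON body for /api/generate. KEEP_ALIVE defaults to -1.
# Plain numbers are emitted as JSON numbers; durations such as 30m
# are emitted as JSON strings.
keepalive_payload() {
  model="$1"
  keep="${2:--1}"
  case "$keep" in
    *[!0-9-]*) keep="\"$keep\"" ;;
  esac
  printf '{"model":"%s","keep_alive":%s}\n' "$model" "$keep"
}

# preload_model MODEL [KEEP_ALIVE]
# Sends the body to the local server so the model is loaded and pinned.
preload_model() {
  curl -fs http://localhost:11434/api/generate \
    -d "$(keepalive_payload "$@")" >/dev/null
}
```

Running `preload_model qwen2.5-coder:7b` pins the model, and `preload_model qwen2.5-coder:7b 0` asks Ollama to unload it again.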
When this is not the right tool
A local 7B model is not a frontier model. It misses things. Use it for a first pass and still get a human to look at anything that touches auth, payments, or data migration. For a 14B model on a beefy machine, the gap narrows. For deeper review of large monorepos, run an agent loop with hermes-agent or claude-code pointed at the same repo.
If you need a model that is stronger than what fits on your laptop, the same config file accepts a hosted provider. Swap provider: ollama for provider: anthropic, set model to a hosted model name, and add an API key. The rest of the setup does not change, but your code then leaves the machine.
What to do next
- Try qwen2.5-coder:14b if you have 16 GB of RAM. The jump in review quality is noticeable.
- Wire the same Ollama server to a hermes-agent or claude-code agent loop for automated refactors.
- Add a .continue/rules/ file for each project with project-specific review heuristics.
- If your team uses JetBrains, the same config works; Continue supports both editors.