kleamerkuri

May 3, 2026 · 27 min read

I Tried the ‘Code for Free with Local AI’ Setup. Here’s What Actually Happened.

I burned through $200 in two weeks on a personal side project, not from reckless experimenting but while building something real, leaning on AI agents the way they’re actually supposed to be used.

That was my wake-up call.

I always assumed the “vibe coding” crowd was either exaggerating or running demos on toy projects. When you’re in that space, the cost isn’t really at the forefront. Nobody’s posting the bills. The focus is all on the product, the natural language, the demo, and how incredible it is that you can build things this way.

The cost is almost an afterthought—until it isn’t.

Once you’re doing actual development work—scaffolding features, debugging across multiple files, iterating on logic—the token count doesn’t care.

Agentic AI is expensive by design. Every back-and-forth, every file read, every multi-step task burns context. And the bills will reflect that.

I don’t know how employers are reconciling this at scale. I genuinely don’t 🤷‍♀️

That reality sent me down a rabbit hole.

I kept seeing the same thumbnails and posts:

  • “pair Claude Code with local models and NEVER pay again”
  • “free frontier-level AI development”
  • people claiming they’d never go back to paid setups

So naturally, I had to try it.

This post is what actually happened, from the setup and what tripped me up, to an honest first impression of how it holds up against what I was already using.

Related: Claude Free Tier Actually Slaps Now (If You Use It Right)

What We’re Actually Talking About

Two tools, one setup. Here’s a quick overview before we get into it:

Claude Code: The Agent

Claude Code is Anthropic’s agentic coding tool, and “agentic” is the word that matters here. It is not a chatbot you copy-paste from.

When you run Claude Code inside a project folder, it has actual access to your working directory. It reads files, writes and modifies code across multiple files in one session, runs shell commands, and reasons through multi-step problems across turns.

The difference from a regular AI chat is that instead of pasting a function and asking: “What’s wrong?”

You say: “There’s a bug in processPayment(), go figure out what it is.”

And it actually goes and looks.

What makes Claude Code interesting for this setup is that the agent loop—the scaffolding for file access, tool calling, multi-turn reasoning, and permission management—lives in the CLI itself.

The model powering it is swappable. Which is exactly what we’re exploiting here.

Ollama: The Model Host

Ollama is the local model host. It downloads open-source models, runs them on your hardware, and serves them through a local API.

Since version 0.14.0 (January 2026), Ollama exposes an Anthropic-compatible API at localhost:11434. That’s the detail that makes this whole thing work 💁‍♀️

Claude Code can be pointed at that local address instead of Anthropic’s servers, and all requests route there. This isn’t a hack; it’s documented in the official Ollama docs.
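
If you want to sanity-check that the endpoint is actually listening before wiring up Claude Code, a raw request works. This is a sketch that assumes the compatibility layer mirrors Anthropic’s /v1/messages route and accepts a placeholder API key; the model name is whatever you’ve pulled:

# One-off request against the local Anthropic-compatible endpoint
curl http://localhost:11434/v1/messages \
  -H "x-api-key: ollama" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model": "qwen3.5", "max_tokens": 64, "messages": [{"role": "user", "content": "Say hi in one word."}]}'

A JSON response back means the routing target exists, and you can move on to the Claude Code side.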

Note 👀
Claude Code requires a free Anthropic account to install and authenticate. You won’t be paying per token when running local models, but the initial auth step still goes through Anthropic.

The Setup: What Actually Worked

I’ll keep this focused on what I actually ran, in the order I ran it.

There’s a troubleshooting section after this for the things that didn’t work the first time (there were several attempts).

Step 1: Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

On Windows, download the installer from ollama.com/download and run it like any .exe.

Once installed, Ollama runs as a background service automatically. Confirm it’s working:

ollama --version

You should see a version number come back.

Tip: If the command isn’t found, close and reopen your terminal because the PATH update sometimes needs a fresh shell to take effect.
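
If a fresh shell still doesn’t help, check what the shell can resolve; if nothing comes back, the install directory isn’t on your PATH yet (the path below is a typical Linux/macOS location, adjust for your system):

command -v ollama
export PATH="$PATH:/usr/local/bin"   # session-only stopgap; add to your shell profile to persist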

Step 2: Pull a Model

This is where you choose which open-source models to work with.

I’m using Qwen3.5 for code work. Pull it:

ollama pull qwen3.5:latest

One thing I learned the hard way is that you don’t want to (and should not) hoard models 🙅‍♀️

I pulled a bunch of models early on and maxed out my machine’s resources, which I initially mistook for an environment variable issue.

Keep only the models you’re actively using!

Get your current list:

ollama list

Remove unused models with:

ollama rm <model-name>

Step 3: Create a 64K Context Model With a Modelfile

This step isn’t in most guides, yet it’s the one that actually made everything work for me.

By default, Ollama runs models with a context window of around 4,096 tokens.

Claude Code needs at least 32K to function reliably as an agent and ideally 64K for anything involving real project files.

At the default 4K context, /init won’t generate your CLAUDE.md. The model silently truncates your project context mid-task, and tool calls fail in ways that look like connection problems but aren’t.

The default 4,096 context is roughly 3,000 words. When an agent loop exceeds that limit, Ollama silently truncates from the beginning with no warning and no error.

That’s why everything was failing 🤦‍♀️

The fix is a Modelfile—a two-line configuration file that creates a named variant of your model with the context you specify.

Here’s exactly what I did:

1. Create the file

In whatever directory you’re in, create a file called Modelfile with no extension:

FROM qwen3.5:latest
PARAMETER num_ctx 65536

That’s the whole file.

  • FROM points to your base model
  • PARAMETER num_ctx sets the context window in tokens (65536 = 64K)

2. Create the model from it

ollama create qwen3-64 -f ./Modelfile

Replace qwen3-64 with whatever name you want to call the high-context model variant. That’s what you’ll reference when launching.

Ollama builds it fast.

3. Verify it was created

ollama list

You should see qwen3-64 (or whatever name you chose) in the list alongside your original model.

One important thing I wasn’t sure about was whether I had to keep the Modelfile in place.

After running ollama create, I tested by deleting the Modelfile. The model appears to persist, so deleting the blueprint does not affect the model that was already built from it.

Ollama creates a new model layer with your configuration baked in and does not re-download anything; it just applies your new configuration on top of the existing one.

That’s why qwen3-64 still shows up in ollama list and works fine. The Modelfile is just the instructions and is expendable once the model exists.
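
If you want more than the list as proof, ollama show prints the model’s metadata, and the parameter you baked in should show up there:

ollama show qwen3-64
# Look for "num_ctx 65536" under the Parameters section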

Note ⚠️
More context = more VRAM. 64K context roughly doubles the KV cache compared to 32K, so your model will use more memory than it did at the default 4K. If you’re on a machine with limited RAM, start with 32K (32768) and see how it runs before going higher. Context window size has a direct, linear cost in VRAM since the KV cache grows with every token you add to the window.
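
To see what that costs on your machine in practice, check the loaded model while a session is running. ollama ps reports the memory footprint and how the model is split between GPU and CPU:

ollama ps
# "100% GPU" under PROCESSOR is what you want; a CPU split means you've outgrown VRAM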

Step 4: Install Claude Code

curl -fsSL https://claude.ai/install.sh | bash

Or via Homebrew on macOS:

brew install --cask claude-code

Confirm:

claude --version

You’ll be prompted to sign in with your Anthropic account on first run. This is a one-time step.

Step 5: Launch With Ollama, And Fix the Three-Tier Problem

Here’s the command that actually works:

ollama launch claude --model qwen3-64

The --model flag is required. At least, it was for all my attempts. (Who knows, I could’ve just been super unlucky in this setup 😒)

Without it, Claude Code kept on launching with the default Claude subscription model instead of the local one.

Running claude on its own (without ollama launch) does the same—it connects to Anthropic’s servers. The ollama launch wrapper is what handles the routing.

But there’s a deeper issue that ollama launch on its own doesn’t fully solve, and it’s the reason things can still break even after the setup looks correct.

What’s the three-tier problem, and how do you fix it?

Claude Code internally routes to three different model tiers.

  1. Haiku handles lightweight background tasks
  2. Sonnet handles standard coding work
  3. Opus handles complex reasoning

When you specify --model qwen3-64, your primary task goes to that model, but background calls still try to reach Anthropic’s default model names (claude-haiku-4-5-20251001, claude-sonnet-4-5-20250514).

Ollama has no idea what those are. It returns a 404, Claude Code interprets it as the model not existing, and you get either a silent failure or an error message that points you at the wrong problem.

The fix is to explicitly tell Claude Code which model to use for every tier.

The cleanest way is a shell alias that sets all three at once. Add this to your ~/.zshrc:

alias claude-local='ANTHROPIC_AUTH_TOKEN=ollama \
  ANTHROPIC_BASE_URL=http://localhost:11434 \
  ANTHROPIC_DEFAULT_HAIKU_MODEL=qwen3-64 \
  ANTHROPIC_DEFAULT_SONNET_MODEL=qwen3-64 \
  ANTHROPIC_DEFAULT_OPUS_MODEL=qwen3-64 \
  DISABLE_TELEMETRY=1 \
  ollama launch claude --model qwen3-64'

Reload your shell:

source ~/.zshrc

Now instead of ollama launch claude --model qwen3-64, you just run:

claude-local

Every tier maps to your local model. No background calls reach Anthropic. No silent 404s. This is the complete launch command that most setup guides don’t show you.

Note: DISABLE_TELEMETRY=1 stops Claude Code from making additional network calls to Anthropic’s servers that would fail in a local-only setup. It doesn’t affect functionality.

Step 6: Initialize Your Project

Navigate to your project and launch:

cd your-project
claude-local

Once inside Claude Code:

/init

This scans your project and generates a CLAUDE.md at the root.

Before I had the 64K context model, this command would appear to run and produce nothing. After bumping the context, it worked.

The CLAUDE.md is one of the most useful parts of this whole setup. It’s where you document your build commands, code conventions, and architecture context so the model doesn’t have to re-parse them every session.

Tip 💡
For design-system-heavy projects especially, putting your component patterns and token structure here once saves significant context on every subsequent task. Think of it as giving the model a permanent cheat sheet for your project.
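
If /init gives you a thin starting point, you can also seed the file by hand before the first real session. A purely illustrative starter, with hypothetical sections and commands you’d swap for your own:

cat > CLAUDE.md <<'EOF'
# Project notes for the agent

## Build & test
- flutter pub get
- flutter analyze
- flutter test

## Conventions
- Screens live in lib/screens/, shared state in lib/providers/
- Subscription checks go through stripeSubscriptionId only
EOF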

Related: All You Need To Know About AI Workflow Files And How To Use Them

Shortcuts and Commands Worth Knowing

A few things that become part of the daily rhythm once you’re up and running.

Keyboard Shortcuts

  • Escape: interrupt the model mid-generation
  • Escape Escape: jump back to edit a previous prompt
  • Shift+Tab: cycle through normal → auto-accept → plan mode
  • Ctrl+C: cancel the current generation
  • Ctrl+L: clear the terminal screen (doesn’t reset context)
  • \ + Enter: write a multiline prompt

The Escape interrupt is one of the most useful. When the model starts heading in the wrong direction (or, sorry local models, is stuck thinking), you don’t have to wait. Simply interrupt it, redirect, and keep going.

Slash Commands

/init — Scans your project and generates CLAUDE.md. Run once per new repo.

/clear — Resets the context window. Think of it like opening a new chat tab when the current one has gotten too cluttered to work with. Local models drift in long contexts more than cloud models, so clear early and often.

/compact — A softer version of /clear. Summarizes the current context and keeps going instead of wiping it entirely.

/model — Switch models mid-session without relaunching.

The Staged Diff Alias (Useful for Commit Messages)

This one is useful outside of any specific project. Add this to your ~/.zshrc:

alias diffsum='ollama launch claude --model qwen3-64 --yes -- -p "Look at the staged git diff and write a 2-3 sentence summary of what changed and why it matters. Keep it factual." --allowedTools "Bash" --bare'

Or:

alias diffsum='ANTHROPIC_AUTH_TOKEN=ollama \
  ANTHROPIC_BASE_URL=http://localhost:11434 \
  claude -p "Look at the staged git diff and write a 2-3 sentence summary of what changed and why it matters. Keep it factual." \
  --model qwen3-64 \
  --allowedTools "Bash" \
  --bare'

--bare can be added to reduce startup time by skipping auto-discovery of hooks, skills, and CLAUDE.md. It’s useful for the diffsum alias, since you want a fast one-shot response.

Reload:

source ~/.zshrc

Workflow:

git add .
diffsum          # Claude reads what's staged and prints a summary

What I got while trying it on a project (it’s very much on point 👌):

These changes refactor three key files to improve the subscription flow. In profile_screen.dart, account deletion now uses a new Supabase Function (delete-account) with better error handling. The app_state_provider.dart adds logic to prevent users on active internal trials from being incorrectly marked as trial-eligible when upgrading, ensuring they’re charged immediately. subscription_screen.dart cleans up redundant checks by consolidating to only stripeSubscriptionId instead of checking both subscription and customer IDs.

Then commit as usual:

git commit -m "your message, now informed by the summary"

No hooks, no commit flow changes. Run it when you want, skip it when you don’t.

The only major downside is that it takes a good minute or so to generate 🥲

Things That Tripped Me Up (And How I Got Past Them)

Here are a few troubleshooting tips through research and unfortunate experience:

/init Ran But No CLAUDE.md Appeared

This is almost certainly a context window issue. Claude Code needs at least 32K (ideally 64K) to run the agent loop reliably.

At Ollama’s default 4K context, /init appears to run, says something confirming it can see your files, but nothing appears on disk. The model’s context window fills before it can complete the scan.

Create the 64K context model from Step 3 and try again.

If you’re already on a 64K context model and it’s still not generating, check write permissions in your current directory (ls -la), and make sure you’re using a model that supports tool calling (not all Ollama models do).

Models confirmed to work with Claude Code’s agent loop: qwen3, qwen2.5-coder, glm-4.7-flash. Do look this up to confirm, as it can, and does, change.

The Model Responds, But Won’t Read Files

If Claude Code can answer chat questions but claims it can’t access your file system, the model doesn’t have the tool-calling support Claude Code needs for agentic work. It falls back to plain chat mode.

Swap to one of the models listed above.

Tip 👇
Quick test: ask the agent to “list the files in the current directory.” If it lists them, tool calling works. If it says it can’t access the file system, it’s a model issue.

Ollama Without --model Connects To Claude Subscription Instead

Running ollama launch claude without --model launches an interactive model selector, which looks cool but turned out to be misleading in my experience: it kept defaulting to a Claude subscription model instead of a local one.

Always specify the model explicitly:

ollama launch claude --model qwen3-64

The env Variable Approach: What I Tried and What Happened

A lot of setup guides (including my earlier versions of this post) walk through manually setting environment variables in your shell profile:

export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434

I tried this first. The Ollama docs do mention it as a manual setup option.

What I got was a bunch of warnings, auth conflicts, and Claude Code still trying to connect to Anthropic’s servers.

The ollama launch command handles all of this automatically and is the simpler path.

If you want to go the manual route, the Ollama docs cover it, but I’d start with ollama launch and only go manual if you have a specific reason.

Why the Alias Works When Manual Exports Don’t

You might wonder why the claude-local alias sets ANTHROPIC_AUTH_TOKEN and ANTHROPIC_BASE_URL when exporting those same variables causes issues.

Here’s why: when you export to ~/.zshrc, those variables persist in your shell environment for the entire session and are active not just when you launch Claude Code, but when Claude Code tries to re-authenticate, resume a session, or spawn sub-processes.

That’s where the conflict triggers.

Claude Code sees the env variables alongside its stored OAuth session from the initial login, can’t tell which credential to honor, and hangs.

The alias avoids this because the variables are set inline, scoped to that single claude-local execution. Claude Code sees them for exactly the duration of the ollama launch invocation, routes correctly, and they’re gone when the session ends.

There’s no persistent state sitting in the shell, nothing competing with the stored login session.
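
You can watch the scoping difference in plain shell terms, which is the whole trick the alias relies on:

# Exported: persists for the rest of the shell session, visible to every later
# command and sub-process, including Claude Code's re-auth and resume steps
export ANTHROPIC_BASE_URL=http://localhost:11434

# Inline: set only for the single command it prefixes, gone when it returns
ANTHROPIC_BASE_URL=http://localhost:11434 ollama launch claude --model qwen3-64
echo "${ANTHROPIC_BASE_URL:-unset}"   # prints "unset" unless you exported it earlier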

Note 📍
That’s also why all six variables need to be in the alias together. ANTHROPIC_BASE_URL has to be in scope at the same time as the three-tier model variables, or the tier calls won’t know where to find your local model. Split them across different scopes, and the routing breaks in ways that are hard to trace.

The Honest First Impression

I spent the $200, saw the thumbnails, read the “never paying for AI again” posts, and went through the setup. And the experience was not what I expected in the least.

In fact, I nearly sacked this post altogether because of it 😬

The Tool-Calling Problem Is Real, Here’s What It Actually Looks Like

Let me show you rather than just describe it. Here’s a real interaction I had while working on VersoID, an app I’m about to release, using Qwen3.5 with 64K context:

// My Prompt 😀: 
Analyze the current code to confirm if a pro user who deletes the account will 
also have the subscription in stripe terminated. or not? what's the current 
case?

// AI's Response 💯: 
⏺ I'll analyze the current code to understand the account deletion flow and its
  interaction with Stripe subscriptions. Let me examine the relevant files.

  ls -la lib/
  grep -r "delete" lib/ | head -20
  grep -r "stripe" lib/ | head -20
  grep -r "subscription" lib/ | head -20
  cat lib/subscription_screen.dart | head -200
  grep -A 50 "delete" lib/subscription_screen.dart
  grep -r "account.*delet\\|delet.*account" lib/

✻ Baked for 30s 



// Me, confused 😕: 
❯ did you finish?

// AI's audacity 🙂‍↕️: 
⏺ I don't have context on what specific task or work you're asking about from
  our conversation. Could you clarify what you're referring to?

✻ Baked for 36s



// Me, frustrated 😒: 
❯ I'm referring to: analyze the current code to confirm if a pro user who 
  deletes the account will also have sub in stripe terminated.

// AI's loop 🙈: 
⏺ I'll analyze the current code to determine if a pro user who deletes their
  account will have their Stripe subscription terminated. Let me check the
  relevant files.

✻ Crunched for 28s

[ Produced absolutely no result 🤬 ]


// Totally irrelevant 😑
※ recap: Finishing login flow updates while handling subscription screen
  modifications. Review your next PR before the upcoming merge freeze.

I asked a direct question about my codebase. The model listed a bunch of shell commands it intended to run, then baked for 30 seconds, producing nothing.

When I asked if it had finished, it had no memory of the original task. I repeated myself. It said it would check the files. Baked again. Failed again.

The session ended with a recap summary about commits I hadn’t made, mentioning a “merge freeze” that doesn’t exist, which tells you exactly how far off the rails the context had gotten.

This interaction took about two minutes of my time (don’t get excited, the following interaction took 13 minutes for no result) and returned nothing useful.

And VersoID isn’t an unusually large Flutter codebase—this is a normal project.

Here’s what’s actually happening under the hood

There are two compounding problems, not one. And it took some online research to rationalize this.

The first is context. Even at 64K, a real project session fills up faster than you’d think. Claude Code’s agent loop loads:

  • your task
  • your project files
  • its own tool call history
  • the conversation

All of that goes into a single context window at once.

When it runs out, Ollama silently truncates from the beginning with no warning. The model loses its earlier tool call results before it can act on them. You see “Baked for 30s” because real work is happening, but it gets discarded because the context needed to complete the response is gone.

There’s no error. It just forgets.

The second problem is the three-tier routing issue, which the alias in Step 5 addresses. Even when Claude Code appears to be running your local model for the main task, background scaffolding calls were silently failing against Anthropic model names Ollama doesn’t recognize.

This is why tool calls could appear to start, produce a list of commands, and then stall. The tool execution infrastructure itself was partially broken underneath.

With the three-tier fix from Step 5 applied (claude-local with all model tiers mapped), the silent background failures stop.

Whether that fully resolves the experience or not is something I’m still figuring out. The context issue is a hardware and model capacity constraint that the alias doesn’t change.

What 64K context actually fixes, and what it doesn’t

64K is the minimum to get the agentic loop working at all. Below it, almost nothing works reliably:

  • /init fails silently
  • tool calls malform
  • the model loses its own outputs

At 64K, the basic stuff works: /init generates, the model reads your files, simple single-file tasks complete.

What 64K doesn’t fix is the capacity ceiling of a 7–8B parameter model. A VersoID-scale project with multiple screens, a Stripe integration, a subscription flow, and a Flutter architecture requires the model to hold a lot in working memory simultaneously while reasoning about relationships across files.

A small local model hits its limit at exactly the kind of task where you’d want the most help. It’s not a context window problem at that point; it’s a model size problem.

The benchmarks for models like Qwen3 look impressive. The real-world agentic experience on a production codebase is a different story.

Qwen3 with 64K context is better than without it since /init works, it reads files, and understands the project in a conversational sense.

But for sustained multi-step work across files? The tool-calling failures, the 2+ minute waits, the generic responses that miss what you actually asked don’t go away. They just happen at a higher complexity threshold.

The Hardware Reality

My machine can run local models. The inference isn’t impossible. But there’s a real relationship between context length, model size, and heat that the hype glosses over.

I tried asking the model to update the README for VersoID, a task that should be relatively light.

It ran for 16+ minutes.

Not just slow or sluggish: that was 16 minutes of the fan running hard, the machine heating up, and ultimately nothing written to disk.

I had to terminate it.

The laptop was working hard the whole time; it just never finished the task.

That’s the other dimension to “free.” The compute cost doesn’t go away; it moves to your hardware.

For short sessions, it’s fine.

For extended agentic dev work, the kind where you’d actually save time, the machine sustains that heat load the entire time.

The “anyone can run local AI on any laptop” framing that shows up in thumbnails is accurate for demos and single-turn questions.

However, for real agentic development, “runs” and “runs well enough to replace a cloud model” are meaningfully different things.

The CLI vs. the Tools I Was Already Using

I’ll be honest: I find the terminal experience significantly less intuitive than Cursor or Antigravity’s GUI.

Not because I’m unfamiliar with the terminal (I’m a developer, I’m comfortable there), but because in a GUI tool I can:

  • see diffs and changes inline
  • move across conversations easily
  • search prior sessions
  • navigate the whole interaction the way you’d navigate a workspace

The GUI feels like a partner. The terminal feels more like a tool you’re directing.

Claude Code does support these behaviors through commands like /clear, Escape Escape for history navigation, and so on, but you have to learn and remember them.

The muscle memory isn’t built in the same way. That learning curve is real, and it’s part of what the hype doesn’t mention.

Compared to Antigravity’s Free Tier

Antigravity’s free tier (not Cursor’s, that’s significantly less generous) lets you use actual frontier models, which got me further faster than this setup did.

Not because local AI is without merit, but because frontier models handle the agentic complexity significantly better with fewer failed tool calls, faster responses, and more coherent multi-step reasoning.

For the use case of “I want to keep developing without paying per token,” the free tier of a tool like Antigravity is genuinely worth comparing against before committing to a local setup.

The “free” in local AI is real, but so is the cost in setup time, workarounds, and the gap in output quality for complex work.

Explore: Google Antigravity Explained: The New Way to Build Apps With Vibe Coding (2026)

Where Local Actually Makes Sense

To be fair, local models aren’t terrible (they keep getting better on those benchmarks). Here’s what they do well:

Privacy-sensitive work. If your code can’t leave your machine for compliance or client reasons, local is the only option. The performance hit is the price of that guarantee.

Codebase Q&A. Asking the model “what does this function do?” or “explain how this module connects to the rest of the app” works well. It’s when you ask it to do things across the codebase that the reliability drops.

Architecture and documentation generation. Generating architecture diagrams, writing documentation from code, and summarizing how a system works are all tasks where local models hold up and where the latency is less painful because you’re not iterating in real time.

Reducing quota burn on routine tasks. Using a local model for the small stuff, like explaining this stack trace, renaming these variables, generating some seed data, while saving frontier model quota for complex work, is a reasonable hybrid strategy. The keyword is hybrid.

Tip: Many of these points (like the code Q&A and documentation) assume working within proper context limits. Otherwise, you know, it’ll forget.

A Note on What You’ll Find Online

The posts and videos claiming local models are a full replacement for paid AI in agentic dev workflows mostly show demos. I’m talking single tasks, clean repositories, models in ideal conditions.

The benchmarks look good. The demos look good. Yet, the real experience, especially for sustained multi-file agentic work, is more nuanced.

That’s not a reason not to try it. It’s worth understanding what you’re getting into—what works well, what doesn’t, and what “free” actually costs in terms of setup and output quality.

The 64K recommendation for agents is important to know since the default context will cause tool calls to fail partway through a task. Getting past that requires the Modelfile step most guides skip.

Even past it, you’re working with a model that has real capability limits relative to what the agentic tools were designed for.

The honest answer to whether it replaces a paid workflow is: not yet. At least, not for most real development work.

It’s a useful addition to your toolkit when used for the right tasks.

Advanced: A Few Things Worth Exploring

If you’re determined to give this setup a shot yourself, here are a few rather cool things to try.

Multi-Claude Workflows

You can run multiple Claude Code sessions simultaneously in separate terminal windows. Two useful patterns:

1. Parallel independent tasks

Run two agents on completely separate files simultaneously:

Terminal 1:

Read 'components/UserCard.jsx' and add PropTypes validation throughout.

Terminal 2:

Read 'utils/dateHelpers.js' and add JSDoc comments to every exported function.

2. Write then review (sequential handoff)

Terminal 1 writes something, Terminal 2 reviews it after Terminal 1 finishes. Don’t run both at the same time on the same file because the reviewer would be looking at something that doesn’t exist yet.

Think of it like running two separate agent instances in Cursor’s or Antigravity’s parallel mode: they work best on tasks that don’t step on each other.

For larger repos, git worktrees lets you run separate Claude sessions on different branches simultaneously without conflicts:

git worktree add ../project-feature-branch feature-branch
cd ../project-feature-branch
ollama launch claude --model qwen3-64
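
When you’re done with the branch, remove the worktree; the branch itself stays put:

git worktree remove ../project-feature-branch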

Scheduled Tasks With /loop

/loop is an Ollama feature available when you launch Claude Code through ollama launch claude. It runs a prompt on a recurring schedule inside your active session:

/loop 30m Check my open GitHub PRs and summarize their status
/loop 1h Look at any new files in the logs/ folder and flag new error patterns

Note that the loop runs only while your terminal session is active—it’s not a background daemon.

It’s useful for ambient check-ins during long work sessions.

Headless Mode Automation

Claude Code’s -p flag (also called print mode, previously called headless mode) runs a single prompt non-interactively and exits.

You pass it a prompt, it runs the task, and the output prints to the terminal. No session, no back-and-forth.

We used it earlier for the staged commit diffs alias!

The basic form:

ollama launch claude --model <name> --yes -- -p "your prompt" --allowedTools "Read"

  • The -p flag goes after --, which separates Ollama’s arguments from Claude Code’s arguments
  • The --allowedTools flag matters for automation. Without it, Claude will pause and ask for permission before reading files.

Tip: For read-only tasks, pre-approving Read keeps things running unattended.
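
Print mode also accepts piped stdin, so you can feed command output straight into a one-shot prompt. A sketch, assuming the ollama launch wrapper passes stdin through to Claude Code:

# Pipe failing test output into a one-shot triage prompt
flutter test 2>&1 | ollama launch claude --model qwen3-64 --yes -- \
  -p "These test results just came in. List the most likely root causes, briefly."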

Nightly Code Quality Check

For something that runs on a schedule and writes to a log:

Step 1 — Create the script:

touch scripts/quality-check.sh
chmod +x scripts/quality-check.sh

Step 2 — Add this to the file:

#!/bin/bash
# Add Ollama env vars so cron can find them
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434

# cron starts in your home directory, so move to the project root first
# (this assumes the script lives in <project>/scripts/)
cd "$(dirname "$0")/.." || exit 1

TIMESTAMP=$(date '+%Y-%m-%d %H:%M')
echo "=== Quality Check: $TIMESTAMP ===" >> quality-log.txt
ollama launch claude --model qwen3-64 --yes -- -p "Review the codebase for: unused variables, functions that could be simplified, and any obvious security issues. Format as a plain list. Be brief." \
  --allowedTools "Read,Bash" >> quality-log.txt 2>&1
echo "" >> quality-log.txt

Step 3 — Schedule it:

crontab -e

Add this line to run the check every night at 11 pm:

0 23 * * * /full/path/to/your-project/scripts/quality-check.sh

Replace /full/path/to/your-project with the actual full path to your project folder.

Verify it saved:

crontab -l

You should see your entry listed. The first time it runs (tomorrow at 11 pm), it will create quality-log.txt in your project root and append the results.

It’s a Wrap

The $200 lesson is what started this. The thumbnails and “never going back to paid” posts are what got me to try the local setup.

Now, the experience is somewhere in between. More interesting than a dismissal, more complicated than the hype.

Getting the setup working required two things most guides leave out:

  • The Modelfile to bump context to 64K
  • The three-tier alias to stop Claude Code from silently routing background calls to Anthropic model names Ollama doesn’t recognize

Without both, things look like they’re working until they don’t. And the failure modes are confusing enough that you end up chasing the wrong problems 😮‍💨

With both in place, you have a functional local AI coding assistant. What it does well, it does consistently. What it struggles with (sustained multi-step agentic work on a production codebase) is a model capacity constraint, not a configuration problem.

No amount of Modelfile tuning changes the fact that a 7–8B model has real limits compared to what these agent tools are designed around.

What I’d tell someone starting from scratch is to do the setup, apply both fixes, and try it on something real.

Just go in knowing that “free” and “the same experience as a frontier model” are not the same thing.

In a way, I’m glad I finished this post. It definitely didn’t go as I anticipated (the disappointment got somewhat heavy, not gonna lie).

But learning from practical experience is better than just hearing about all these cool capabilities that most will never try.

So, cheers to the spirit of exploration.
