<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Qwen on Jeroen Nyckees</title><link>https://jenyckee.github.io/tags/qwen/</link><description>Recent content in Qwen on Jeroen Nyckees</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Thu, 14 May 2026 12:00:00 +0200</lastBuildDate><atom:link href="https://jenyckee.github.io/tags/qwen/index.xml" rel="self" type="application/rss+xml"/><item><title>How to run Qwen3.5-9B with llama.cpp and Pi</title><link>https://jenyckee.github.io/posts/qwen-pi-local-llm/</link><pubDate>Thu, 14 May 2026 12:00:00 +0200</pubDate><guid>https://jenyckee.github.io/posts/qwen-pi-local-llm/</guid><description>&lt;p>I ran &lt;a href="https://huggingface.co/unsloth/Qwen3.5-9B-GGUF">Qwen3.5-9B&lt;/a>, a 4-bit quantized model, locally on my MacBook Pro M4 Pro with 24 GB of RAM, pointed a terminal coding agent at it, and asked it to build a checkout page with the Stripe API. It did. No cloud, no API calls to OpenAI, no token costs. Just a model running on my laptop.&lt;/p>
&lt;p>Here&amp;rsquo;s how.&lt;/p>
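&lt;p>In short, the shape of the setup: download a 4-bit GGUF of the model, serve it with llama.cpp&amp;rsquo;s OpenAI-compatible &lt;code>llama-server&lt;/code>, and point the coding agent at the local endpoint. The sketch below is only that, a sketch; the exact GGUF filename and the context size are illustrative, so check the Hugging Face repo for the files it actually ships.&lt;/p>
&lt;pre>&lt;code># fetch a 4-bit quant from the unsloth repo (filename is illustrative)
huggingface-cli download unsloth/Qwen3.5-9B-GGUF \
  --include "Qwen3.5-9B-Q4_K_M.gguf" --local-dir ./models

# serve it locally; llama-server exposes an OpenAI-compatible API on the given port
llama-server -m ./models/Qwen3.5-9B-Q4_K_M.gguf -c 32768 --port 8080
&lt;/code>&lt;/pre>
&lt;p>Any OpenAI-compatible client, the coding agent included, can then talk to &lt;code>http://localhost:8080/v1&lt;/code>. The rest of this post walks through each piece.&lt;/p>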
&lt;h2 id="qwen-35">Qwen 3.5&lt;/h2>
&lt;p>Qwen3.5 is Alibaba&amp;rsquo;s latest open-weight language model family. The 9B variant sits in a sweet spot: large enough to be genuinely useful for coding tasks, small enough to run on consumer hardware once quantized. It supports a 256K token context window and performs competitively with much larger models on coding benchmarks.&lt;/p></description></item></channel></rss>