How to run Qwen3.5-9B with llama.cpp and Pi
I ran a 4-bit quantized build of Qwen3.5-9B locally on my MacBook Pro M4 Pro with 24 GB of RAM, pointed a terminal coding agent at it, and asked it to build a checkout page with the Stripe API. It did. No cloud, no API calls to OpenAI, no token costs. Just a model running on my laptop.
Here’s how.
Qwen 3.5

Qwen3.5 is Alibaba’s latest open-weight language model family. The 9B variant sits in a sweet spot: large enough to be genuinely useful for coding tasks, small enough to run on consumer hardware once quantized. It supports a 256K token context window and performs competitively with much larger models on coding benchmarks.
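To see why quantization is what makes a 9B model fit on a 24 GB laptop, here is a back-of-envelope sketch of the weight memory. The parameter count and bits-per-weight figures are assumptions for illustration (roughly what a Q4-style GGUF quant costs per weight), not numbers from this article:

```python
# Back-of-envelope memory estimate for a quantized LLM's weights.
# Assumptions (illustrative, not from the article):
#   - 9e9 parameters for the 9B model
#   - ~4.5 effective bits per weight for a 4-bit GGUF-style quant
#     (the extra ~0.5 bit covers per-block scales/metadata)
# KV cache and runtime overhead come on top of this.

def quantized_weights_gib(params: float, bits_per_weight: float) -> float:
    """Approximate in-RAM size of the model weights in GiB."""
    return params * bits_per_weight / 8 / 2**30

fp16 = quantized_weights_gib(9e9, 16)    # unquantized half-precision
q4 = quantized_weights_gib(9e9, 4.5)     # 4-bit quantized

print(f"fp16 weights: ~{fp16:.1f} GiB")  # ~16.8 GiB
print(f"4-bit weights: ~{q4:.1f} GiB")   # ~4.7 GiB
```

So the quantized weights take roughly 5 GiB, leaving plenty of the 24 GB of unified memory for the KV cache of a long context and for the rest of the system.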