~$ ls -la posts/ | grep ai

How to run Qwen3.5-9B with llama.cpp and Pi

I ran Qwen3.5-9B, a 4-bit quantized model, locally on my MacBook Pro M4 Pro with 24 GB of RAM, pointed a terminal coding agent at it, and asked it to build a checkout page with the Stripe API. It did. No cloud, no API calls to OpenAI, no token costs. Just a model running on my laptop. Here’s how. Qwen3.5 is Alibaba’s latest open-weight language model family. The 9B variant sits in a sweet spot: large enough to be genuinely useful for coding tasks, small enough to run on consumer hardware once quantized. It supports a 256K token context window and performs competitively with much larger models on coding benchmarks. [read more]
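As a rough illustration of what “a model running on my laptop” looks like, here is a minimal Python sketch using the llama-cpp-python bindings to llama.cpp. The model filename and generation settings are assumptions, not taken from the post, which presumably serves the model through llama.cpp directly.

```python
# Minimal sketch: load a 4-bit GGUF quant of Qwen3.5-9B locally and ask it
# for code, via llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3.5-9b-instruct-q4_k_m.gguf",  # hypothetical filename
    n_ctx=8192,       # working context; the model supports up to 256K tokens
    n_gpu_layers=-1,  # offload every layer to Metal on Apple Silicon
)

resp = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "Build a checkout page using the Stripe API."}],
    max_tokens=1024,
)
print(resp["choices"][0]["message"]["content"])
```

In the full workflow the model would presumably be exposed through llama.cpp’s OpenAI-compatible server instead, so a terminal coding agent can talk to it like any hosted API.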

Building an AI-Powered Salary Search Engine with Local LLMs and Vector Search

What if you could search for salaries not by exact keywords, but by describing a job in natural language? “Senior developer in Brussels with a company car” or “nurse working in Antwerp” — and get relevant results based on semantic similarity rather than string matching. This is exactly what I built with BeSalary: an AI-powered salary search engine that extracts structured data from Reddit posts using local LLMs and enables semantic search through vector embeddings. You can try it live at besalary-wine.vercel.app. [read more]
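As a sketch of the semantic-search idea (not BeSalary’s actual code), here is how embedding-based retrieval differs from keyword matching, assuming a local embedding model via sentence-transformers. The model name and the toy salary records are illustrative.

```python
# Minimal sketch of semantic search over salary records
# (pip install sentence-transformers).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Toy corpus standing in for structured records extracted from Reddit posts.
records = [
    "Senior Java developer, Brussels, 4500 EUR net, company car",
    "Nurse, Antwerp hospital, 2800 EUR net",
    "Junior data analyst, Ghent, 2400 EUR net",
]
corpus_emb = model.encode(records, convert_to_tensor=True)

query = "senior developer in Brussels with a company car"
query_emb = model.encode(query, convert_to_tensor=True)

# Rank records by cosine similarity instead of string matching.
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
for hit in hits:
    print(records[hit["corpus_id"]], f"(score={hit['score']:.2f})")
```

Ranking by cosine similarity over embeddings is what lets a query like “nurse working in Antwerp” surface the right record even when the post’s wording shares little exact text with the query.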