M4 Mac with 24GB Memory Successfully Runs Qwen 3.5-9B Local AI Model

Original: Running local models on an M4 with 24GB memory

Why This Matters

Demonstrates practical local AI deployment on consumer hardware for development workflows

A developer runs the Qwen 3.5-9B (Q4-quantized) local AI model on an M4 Mac with 24GB RAM using LM Studio, achieving roughly 40 tokens per second with a 128K context window. The setup enables offline AI tasks while leaving memory free for other applications.
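To see why a 9B model fits comfortably in 24GB, a back-of-envelope estimate helps. The sketch below uses illustrative numbers, not figures from the article: ~4.5 bits per weight is a typical average for Q4_K_S-style quantization, and the layer count, KV-head count, and head dimension are hypothetical values for a 9B-class model with an 8-bit KV cache.

```python
# Rough memory estimate for a Q4-quantized ~9B model (illustrative figures,
# not measurements from the author's setup).

def model_memory_gib(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB for a quantized model."""
    return params_b * 1e9 * bits_per_weight / 8 / 2**30

def kv_cache_gib(context_len: int, layers: int, kv_heads: int,
                 head_dim: int, bytes_per_elem: int = 1) -> float:
    """KV cache size: 2 (K and V) * layers * tokens * kv_heads * head_dim."""
    return 2 * layers * context_len * kv_heads * head_dim * bytes_per_elem / 2**30

weights = model_memory_gib(9, 4.5)          # ~4.5 bits/weight for Q4_K_S-style quants
cache = kv_cache_gib(131_072, 36, 8, 128)   # hypothetical dims, 8-bit KV cache
print(f"weights ≈ {weights:.1f} GiB, full 128K KV cache ≈ {cache:.1f} GiB")
```

Under these assumptions the weights take under 5 GiB and a fully used 128K context adds several more, which is consistent with the model running alongside other applications on a 24GB machine.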

A developer documented their experience running local AI models on an M4 Mac with 24GB memory, testing options including Qwen 3.6 Q3, GPT-OSS 20B, and Devstral Small 24B. The optimal configuration uses Qwen 3.5-9B with Q4_K_S quantization via LM Studio, delivering approximately 40 tokens per second with thinking mode enabled and a 128K context window. The setup requires a specific temperature setting (0.6) and prompt template modifications to enable thinking mode. While performance doesn't match state-of-the-art cloud models, the local setup provides offline capability and reduces dependence on big tech platforms. The author tested integration with the Pi and OpenCode development tools, noting the trade-offs between customization and ease of use.
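LM Studio exposes a local OpenAI-compatible server (by default at `http://localhost:1234/v1`), which is how tools like the ones mentioned above typically connect. The sketch below shows a minimal chat-completion request with the article's temperature setting; the model identifier `qwen-local` is a placeholder, and the endpoint assumes LM Studio's default port.

```python
# Minimal sketch: query a local LM Studio server via its OpenAI-compatible
# HTTP API. Model name and port are assumptions, not from the article.
import json
import urllib.request

def build_request(prompt: str, temperature: float = 0.6) -> dict:
    """Assemble a chat-completion payload using the article's settings."""
    return {
        "model": "qwen-local",          # placeholder; use the name LM Studio reports
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,     # 0.6, per the author's configuration
        "stream": False,
    }

def ask(prompt: str, base_url: str = "http://localhost:1234/v1") -> str:
    """Send the request to the local server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the server speaks the OpenAI wire format, any client or editor integration that accepts a custom base URL can be pointed at it without code changes.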

Source

jola.dev