Ollama, a popular app for running AI models locally, has released an update that takes advantage of MLX, Apple's machine learning framework.
With this integration, the app delivers a significant performance boost on Macs with Apple silicon. By leveraging MLX, Ollama makes better use of the unified memory architecture and hardware acceleration built into Apple's chips, resulting in noticeably faster model execution and improved responsiveness for local AI workloads.
The update is particularly beneficial for users running larger or more complex models, where improved memory handling and optimized processing can make a substantial difference in speed and overall performance.
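For context, running a local model through Ollama typically takes only a few lines of code. The sketch below uses Ollama's official Python client; it assumes the Ollama app is running in the background, and the model name is a placeholder for whichever model you have pulled.

```python
# Minimal sketch: query a locally running Ollama server via its Python client.
# Assumes `pip install ollama`, the Ollama app running in the background, and
# that the model named below has already been pulled (the name is a placeholder).
import ollama

response = ollama.chat(
    model="llama3",  # placeholder; substitute any model you have pulled
    messages=[{"role": "user", "content": "Summarize what MLX is in one sentence."}],
)
print(response["message"]["content"])
```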
According to Ollama, the new release processes prompts about 1.6× faster in the prefill stage and decodes output tokens nearly twice as fast, so responses are generated significantly quicker than before. Macs with newer M5-series chips reportedly benefit the most, thanks to Apple's updated GPU Neural Accelerators.
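To get a feel for how those two figures combine, here is a rough back-of-envelope calculation. The baseline throughput rates and token counts are illustrative assumptions, not measurements from Ollama.

```python
# Back-of-envelope: how a 1.6x prefill and ~2x decode speedup combine.
# All baseline rates and token counts below are illustrative assumptions.
PROMPT_TOKENS = 2000      # tokens in the prompt (prefill phase)
OUTPUT_TOKENS = 500       # tokens generated (decode phase)
BASE_PREFILL_TPS = 800    # assumed baseline prefill throughput, tokens/sec
BASE_DECODE_TPS = 40      # assumed baseline decode throughput, tokens/sec

def total_seconds(prefill_tps: float, decode_tps: float) -> float:
    """Total response time: time to ingest the prompt plus time to generate."""
    return PROMPT_TOKENS / prefill_tps + OUTPUT_TOKENS / decode_tps

before = total_seconds(BASE_PREFILL_TPS, BASE_DECODE_TPS)
after = total_seconds(BASE_PREFILL_TPS * 1.6, BASE_DECODE_TPS * 2.0)
print(f"before: {before:.1f}s, after: {after:.1f}s, "
      f"end-to-end speedup: {before / after:.2f}x")
# before: 15.0s, after: 7.8s, end-to-end speedup: 1.92x
```

For long responses the decode phase dominates, so the end-to-end gain sits close to the 2× decode figure; for short answers to very long prompts, the 1.6× prefill speedup matters more.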
The update also introduces improved memory management, which helps maintain smoother performance during longer AI sessions. This should make tools like AI coding assistants and conversational agents feel more responsive and stable over time.
Ollama notes that the gains are especially noticeable for macOS users running AI-powered coding agents and assistants such as OpenClaw, Claude Code, OpenCode, and Codex-based workflows.
The preview version is available as Ollama 0.19, but it requires a Mac with more than 32GB of unified memory. At launch, support is limited to Alibaba’s Qwen3.5 model, with additional model support expected in future updates.
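Getting started with the preview should look much like working with any other Ollama model. The sketch below is a rough guide; the model tag "qwen3.5" is an assumption, so check `ollama list` or the model library for the exact tag the preview ships with.

```python
# Sketch: pulling and querying the supported model on the MLX preview build.
# The model tag "qwen3.5" is an assumption; verify the exact tag shipped
# with the preview before running this.
import ollama

ollama.pull("qwen3.5")  # downloads the model if it is not already present
reply = ollama.generate(model="qwen3.5", prompt="Hello from Apple silicon!")
print(reply["response"])
```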
