Simon Willison seeks DeepSeek-V4-Flash performance on Apple silicon Macs
Simon Willison asks about real-world performance of DeepSeek-V4-Flash on Macs with 512 GB, 256 GB, or 128 GB (or less) of RAM. Discussion centers on quantization for deployment, including a 2-bit inference build that uses IQ2_XXS for the w1/w3 tensors, Q2_K for w2, and higher precision elsewhere; this configuration reaches 8 tokens per second on a MacBook with an M3 Max while scoring 86.1 on the benchmark. The community also expresses interest in a timeline for a 1.5-bit flash quantization.
First post
Why it matters
The DeepSeek V4 Flash model contains 284 billion total parameters but activates only 13 billion on each inference pass.
Rohan Anil (former Gemini lead at Google DeepMind, now at Anthropic) inquired about the timeline for a 1.5-bit quantization.
Simon Willison (Datasette creator and Django co-creator) develops software that simplifies running large models on Apple silicon.
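To see why quantization is the crux of the Mac RAM discussion, here is a rough back-of-the-envelope memory estimate. The 284-billion-parameter total comes from the post; the average bits-per-weight figure and the 20% headroom factor are assumptions for illustration only, since the actual mix of IQ2_XXS, Q2_K, and higher-precision tensors determines the real footprint.

```python
# Rough weight-memory estimate for a 284B-parameter MoE model under a
# mixed ~2-bit quantization scheme. The bits-per-weight average (2.3)
# and the 20% RAM headroom are illustrative assumptions, not measured values.

TOTAL_PARAMS = 284e9      # total parameters (from the post)
BITS_PER_WEIGHT = 2.3     # assumed blended average across IQ2_XXS / Q2_K / higher-precision tensors

def model_size_gb(params: float, bpw: float) -> float:
    """Approximate in-RAM weight size in gigabytes."""
    return params * bpw / 8 / 1e9

size = model_size_gb(TOTAL_PARAMS, BITS_PER_WEIGHT)
for ram_gb in (128, 256, 512):
    # Leave ~20% of RAM free for the KV cache, activations, and the OS.
    verdict = "fits" if size < ram_gb * 0.8 else "does not fit"
    print(f"{ram_gb} GB Mac: ~{size:.0f} GB of weights -> {verdict}")
```

Under these assumptions the weights come to roughly 80 GB, which is why even the 128 GB configurations in Willison's question are plausibly in range at ~2 bits, while higher-precision builds would not be.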
3 more posts
rohan anil (@_arohan_), quote tweet: "1.5 bit flash when?"