antirez (@antirez)
2-bit DeepSeek v4 Flash inference. Experts w1/w3: IQ2_XXS, w2: Q2_K, everything else left mostly at F16/F32. Final GGUF: 86.18 GiB. Runs on a MacBook M3 Max on CPU at 8 t/s (no Metal backend test, as it would likely crash my OS with an OOM). Should be much faster with Metal.
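
For a rough sense of what this mix means in bits per weight, here is a small back-of-the-envelope sketch. It is not from the post: the block sizes are the ggml super-block layouts as I understand them (256 weights per block, 66 bytes for IQ2_XXS, 84 bytes for Q2_K), and the averaging assumes w1, w2 and w3 hold the same number of weights per expert, which is the usual gated-FFN shape but is an assumption here. The helper names are made up for illustration.

```python
# Back-of-the-envelope bits-per-weight for the expert quant mix in the post.
# Assumptions (not stated in the post): ggml super-blocks of 256 weights,
# 66 bytes per IQ2_XXS block and 84 bytes per Q2_K block, and w1/w2/w3
# having equal weight counts per expert.

QK_K = 256  # weights per super-block (assumed ggml layout)
BLOCK_BYTES = {"IQ2_XXS": 66, "Q2_K": 84}


def bits_per_weight(qtype: str) -> float:
    """Effective bits per weight, including per-block scale overhead."""
    return BLOCK_BYTES[qtype] * 8 / QK_K


def expert_avg_bpw(assignment: dict[str, str]) -> float:
    """Average bpw over expert tensors, assuming equally sized tensors."""
    return sum(bits_per_weight(q) for q in assignment.values()) / len(assignment)


# Quant types per expert tensor, as described in the post.
experts = {"w1": "IQ2_XXS", "w2": "Q2_K", "w3": "IQ2_XXS"}

# Reported file size converted from GiB to bytes.
gguf_bytes = 86.18 * 2**30

if __name__ == "__main__":
    for name, qtype in experts.items():
        print(f"{name}: {qtype} = {bits_per_weight(qtype)} bpw")
    print(f"expert average: {expert_avg_bpw(experts)} bpw")
    print(f"GGUF size: {gguf_bytes / 1e9:.2f} GB")
```

Under those assumptions the experts average 2.25 bits per weight (2.0625 for IQ2_XXS, 2.625 for Q2_K). The whole file lands somewhat above that on average, since the non-expert tensors stay at F16/F32.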
