Added to prinzbench:
- GPT-5.5 Pro (Extended)
- GPT-5.5 Thinking (Heavy)
- Opus 4.7
- Meta Muse Spark

Overall impressions from testing the models:

1. GPT-5.5 Pro scored slightly (3 points) better than GPT-5.4 Pro, including a solid improvement in Legal Research (by 4 points) and a slight decrease in Search (by 1 point). Overall score: 82/99. As noted elsewhere, this model is *significantly* faster than GPT-5.4 Pro; a question that took GPT-5.4 Pro ~30 minutes to answer takes GPT-5.5 Pro ~8 minutes. It's a good model! We have now reached the point where I am surprised if it does not answer a question correctly.

2. GPT-5.5 Thinking (Heavy) is the star of the show, scoring a full 5 points higher than GPT-5.4 (xhigh) and a full 6 points higher than GPT-5.4 Thinking (Heavy). A big jump in Legal Research (+6 points vs. GPT-5.4 (xhigh)) is once again offset here by a slight decrease in Search (-1 point vs. GPT-5.4 (xhigh)). Overall score: 74/99. As with Pro, this model is *significantly* faster than GPT-5.4 Thinking (Heavy); a question that took GPT-5.4 ~8-10 minutes to answer takes GPT-5.5 Thinking ~2 minutes.

3. Opus 4.7 started off really well, and I even thought at one point that it might match the performance of Gemini 3 Pro, but... it trailed off in the end. Overall score: 25/99. This is a significantly better performance than that achieved by any other Anthropic model on my benchmark to date (e.g., 6 points higher than Opus 4.6), but Opus 4.7 still significantly trails many other models released over the past 6 months. On the bright side, the model's Search score (4/24) is significantly better than the 1/24 or 0/24 that I typically get from Anthropic models. Some further improvement in search capabilities might unlock performance approximately equivalent to that of Gemini 3 Pro for this model.

4. Meta Muse Spark achieved an unspectacular score of 31/99. Not quite as good as Gemini 3, not quite as good as Kimi K-2.5 Thinking. This model is nothing to write home about.

More details in the link below. Please see footnote 1 in particular, which talks about my participation in OpenAI's early access program for GPT-5.5.
