Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞): Interesting fact: if Opus 4.7 is ≈35% less token-efficient than 4.6, this suggests its long-context degradation is *vastly worse* than sugge

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)

@teortaxesTex · 59.8K followers

MediaRank #560

Original Tweet

Interesting fact: if Opus 4.7 is ≈35% less token-efficient than 4.6, this suggests its long-context degradation is *vastly worse* than suggested by MRCR or GraphWalks, because as a user I care about the codebase/text, not "tokens" it's broken down into. https://twitter.com/YouJiacheng/status/2044956691540951115
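The arithmetic behind this claim can be sketched as follows. This is only an illustration under an assumption: "≈35% less token-efficient" is read here as "the same text takes about 1.35× as many tokens", which is one possible reading of the tweet. The function names and the `INFLATION` constant are hypothetical, not taken from any benchmark or API.

```python
def text_capacity(context_tokens: float, inflation: float) -> float:
    """Return how much text fits in a window, measured in old-tokenizer tokens."""
    if inflation <= 0:
        raise ValueError("inflation must be positive")
    return context_tokens / inflation


def tokens_needed(old_tokens: float, inflation: float) -> float:
    """Return how many new-tokenizer tokens the same text would occupy."""
    return old_tokens * inflation


# Assumption: same text costs 1.35x as many tokens under the new tokenizer.
INFLATION = 1.35

if __name__ == "__main__":
    # A nominal 1M-token window holds only about 741K old-tokens' worth of text.
    print(round(text_capacity(1_000_000, INFLATION)))  # 740741
    # A codebase that used to take 256K tokens now takes about 346K.
    print(round(tokens_needed(256_000, INFLATION)))  # 345600
```

Under this reading, a benchmark score reported "at 1M tokens" describes less text than the same label did for the earlier model, which is the sense in which per-token benchmarks would understate degradation per unit of codebase or text.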


AI Classification

Whether our pipeline considers this post AI-relevant

AI Relevant

Enriched Text

Assembled input used for vector embedding and topic clustering

Context: Quoting @YouJiacheng: "But GraphWalks scores also degraded. GraphWalks has 100× 256k problems and 100× 1M problems. So we can calculate 1M subset scores based on 256K&1M and 256k scores. Opus 4.7: BFS@1M = 40.29%, Parents@1M = 56.63% Opus 4.6 (64k): BFS@1M = 41.2%, Parents@1M = 71.1% https://x.com/YouJiacheng/status/2044956691540951115/photo/1 https://twitter.com/bcherny/status/2044821479980929082"

@bcherny: "👋 We kept MRCR in the system card for scientific honesty, but we've actually been phasing it out slowly. Two reasons: (1) it's built around stacking distractors to trick the model, which isn't how people actually use long context, and (2) we care more about applied long-context capability than needle-retrieval. Graphwalks is a better signal for applied reasoning over long context, and internally we've seen this model do really well on long-context code. MRCR wasn't included in the Mythos Preview system card for these reasons, but Graphwalks was - that will be the case for future models too."

Tweet: Interesting fact: if Opus 4.7 is ≈35% less token-efficient than 4.6, this suggests its long-context degradation is *vastly worse* than suggested by MRCR or GraphWalks, because as a user I care about the codebase/text, not "tokens" it's broken down into.
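The subset calculation that @YouJiacheng describes can be sketched like this. It assumes the reported "256K&1M" figure is an unweighted mean over all 200 problems, which the quoted text implies but does not state outright. The function name `subset_score` is hypothetical, and the inputs in the usage example are made-up numbers chosen for easy checking, not scores from any system card.

```python
def subset_score(combined: float, known: float,
                 n_known: int = 100, n_other: int = 100) -> float:
    """Recover the mean score of one subset from the pooled mean.

    combined: mean score over both subsets together (percent)
    known:    mean score over the subset that is reported separately (percent)
    """
    total_points = combined * (n_known + n_other)
    return (total_points - known * n_known) / n_other


if __name__ == "__main__":
    # Hypothetical inputs: pooled mean 50.0, 256k-only mean 60.0.
    print(subset_score(50.0, 60.0))  # 40.0
```

With equal subset sizes this reduces to `2 * combined - known`, so any rounding in the published figures is doubled in the derived 1M score.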

Current Stats

2.0K Views

22 Likes

1 Retweet

1 Reply

5 Bookmarks

1 Quote

Topic: Claude 4.7 New Tokenizer Debate

Story: Anthropic ends Claude subscriptions for third-party tools like OpenClaw

Engagement Timeline (13 snapshots)

Time	Views	Likes	Bookmarks	RTs	Replies
11:00 AM UTC	+98	—	—	—	—
10:50 AM UTC	+82	+3	—	—	—
10:40 AM UTC	+63	—	—	—	—
10:30 AM UTC	+62	+2	+1	—	—
10:20 AM UTC	+61	—	—	—	—
10:10 AM UTC	+69	+1	—	—	—
10:00 AM UTC	+111	—	—	—	—
9:50 AM UTC	+12	—	—	—	—
9:40 AM UTC	+308	+4	—	+1	—
9:30 AM UTC	+196	+1	+1	—	+1
