Track the latest models, compare frontier and open-source LLMs, and download open-weight models — then run any of them on Semifly with tokens, GPU servers, and AI Foundry.
Overall scores from SuperCLUE’s general-capability leaderboard. Bars colored by access type.
Source: SuperCLUE General Leaderboard · Overall Ranking · Total Score (May 2026). Scores indicative; refreshed regularly.
Updated June 2026 · refreshed regularly
Anthropic's Claude Opus 4.8 debuted at the top of the Artificial Analysis Intelligence Index and leads aggregate leaderboards. Source: Artificial Analysis
Claude reached a $30B annualized revenue run rate by the end of Q1 2026, roughly 80x growth in a single quarter. Source: LLM-Stats
Google launched Gemini 3.5 at I/O 2026 and committed Gemini 3.5 Pro for June, leaning hard into agentic capabilities. Source: WaveSpeed
Gemini 3.5 Flash rivals large flagships while running ~4x faster, priced around $1.50 / 1M input tokens. Source: LLM-Stats
GPT-5.5 is OpenAI's current frontier model, with Pro and Instant variants spanning quality and latency needs. Source: WaveSpeed
Agentic coding model, 256K context, Modified MIT license, with ~30% lower reasoning token usage than K2.6. Source: LLM-Stats
A diffusion-based reasoning model generating tokens in parallel, targeting agentic loops and real-time voice. Source: LLM-Stats
A frontier model beats Gemini 3.1 Pro on Terminal-Bench 2.1 (76.2%) and MCP Atlas (83.6%) with ~4x faster output. Source: LLM-Stats

Top hosted and open-weight models as of June 2026. Full tables on the compare page.
| Model | Lab | Context | Notable | Access |
|---|---|---|---|---|
| Claude Opus 4.8 | Anthropic | 1M+ | Highest overall score among released models | API |
| GPT-5.5 | OpenAI | 1M+ | Strong all-round; Pro / Instant variants | API |
| Gemini 3.1 Pro | Google | 1M+ | Top reasoning (94.3% GPQA Diamond) | API |
| Grok 4 Fast | xAI | ~2.0M | Largest practical context window | API |
| Claude Sonnet 4.6 | Anthropic | 1M+ | Balanced speed / quality workhorse | API |
| Gemini 3.5 Flash | Google | 1M | Flagship-level quality at ~4x speed (~$1.50 / 1M input tokens) | API |
Grab the weights from Hugging Face, then run on your own GPUs or on Semifly.
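
The download step can be sketched in Python, assuming the `huggingface_hub` package is installed. The repository id, target folder, and helper names here are hypothetical placeholders rather than Semifly tooling, and the file patterns assume the repository ships safetensors weights.

```python
from pathlib import Path


def local_dir_for(repo_id: str, target_dir: str = "models") -> Path:
    """Map a Hub repo id such as 'org/model' to a flat local folder path."""
    # Hypothetical naming scheme: replace the slash so each repo gets one folder.
    return Path(target_dir) / repo_id.replace("/", "__")


def download_weights(repo_id: str, target_dir: str = "models") -> Path:
    """Download weights, configs, and tokenizer files for one repository."""
    # Imported lazily so this module still loads where huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    local_dir = local_dir_for(repo_id, target_dir)
    snapshot_download(
        repo_id=repo_id,
        local_dir=local_dir,
        # Assumption: skip everything except safetensors, JSON configs, tokenizer files.
        allow_patterns=["*.safetensors", "*.json", "tokenizer*"],
    )
    return local_dir
```

Calling `download_weights("your-org/your-model")` (a placeholder id) would fetch the matching files into `models/your-org__your-model`. Gated repositories may additionally require a Hugging Face access token.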