Frontier hosted models and open-weight models you can self-host, side by side. Figures are as of June 2026 and are refreshed regularly.
Leading hosted models, accessed via API. Scores and prices move quickly; figures are indicative.
| Model | Lab | Context | Notable | Access |
|---|---|---|---|---|
| Claude Opus 4.8 | Anthropic | 1M+ | Highest overall score among released models | API |
| GPT-5.5 | OpenAI | 1M+ | Strong all-round; Pro / Instant variants | API |
| Gemini 3.1 Pro | Google | 1M+ | Top reasoning (94.3% GPQA Diamond) | API |
| Grok 4 Fast | xAI | ~2.0M | Largest practical context window | API |
| Claude Sonnet 4.6 | Anthropic | 1M+ | Balanced speed / quality workhorse | API |
| Gemini 3.5 Flash | Google | 1M | Flagship-level quality at ~4x speed (~$1.50 / 1M input tokens) | API |
Sources: LLM-Stats, Morph, LM Council (June 2026).
Open-weight models you can download and self-host — run on your own GPUs or on Semifly in one click.
| Model | Developer | Params | Context | License | Download | Run |
|---|---|---|---|---|---|---|
| DeepSeek V4 Pro | DeepSeek | ~1.6T (MoE) | 1M | MIT | Hugging Face → | Run on Semifly |
| GLM-5.1 | Zhipu AI | MoE | 256K | MIT | Hugging Face → | Run on Semifly |
| Kimi K2.7-Code | Moonshot AI | MoE | 256K | Modified MIT | Hugging Face → | Run on Semifly |
| Qwen3.5 (397B-A17B) | Alibaba | 397B (17B active) | 256K | Apache 2.0 | Hugging Face → | Run on Semifly |
| Llama 4 Scout | Meta | MoE | 10M | Llama 4 | Hugging Face → | Run on Semifly |
| Llama 4 Maverick | Meta | MoE | 1M | Llama 4 | Hugging Face → | Run on Semifly |
| Mistral Small 4 | Mistral AI | 24B | 256K | Apache 2.0 | Hugging Face → | Run on Semifly |
| Gemma 4 (31B) | Google | 31B | 256K | Gemma | Hugging Face → | Run on Semifly |
Confirm the current license on each model’s official page before deployment.
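When sizing your own GPUs, a rough first check is how much memory the weights alone need: parameter count times bytes per parameter. The sketch below is a back-of-the-envelope estimate, not a deployment guide. The helper name `weight_memory_gb` and the precision table are illustrative assumptions, and the result excludes KV cache, activations, and runtime overhead. For MoE models, use the total parameter count, since all experts must be resident in memory even though only the active ones run per token.

```python
# Rough weight-only memory estimate for self-hosting.
# Assumption: bytes per parameter at common precisions; real checkpoints
# may mix precisions or add quantization metadata.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate memory for the weights in GB (1 GB = 1e9 bytes).

    params_billions * 1e9 params * bytes_per_param / 1e9 bytes per GB
    simplifies to params_billions * bytes_per_param.
    """
    return params_billions * BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    # Parameter counts taken from the table above (total, not active).
    models = [
        ("Qwen3.5 (397B-A17B)", 397),
        ("Gemma 4 (31B)", 31),
        ("Mistral Small 4", 24),
    ]
    for name, params_b in models:
        sizes = ", ".join(
            f"{p}: {weight_memory_gb(params_b, p):.1f} GB" for p in BYTES_PER_PARAM
        )
        print(f"{name}: {sizes}")
```

By this estimate, Qwen3.5 (397B-A17B) needs about 794 GB for BF16 weights, while Mistral Small 4 at 4-bit needs about 12 GB. Leave headroom beyond these figures, especially at long context lengths.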