Semifly
Compare large language models

Frontier hosted models and open-weight models you can self-host, side by side. Figures are as of June 2026 and are refreshed regularly.

Frontier & proprietary models

Leading hosted models, accessed via API. Scores and prices move quickly; figures are indicative.

| Model | Lab | Context | Notable | Access |
|---|---|---|---|---|
| Claude Opus 4.8 | Anthropic | 1M+ | Highest overall score among released models | API |
| GPT-5.5 | OpenAI | 1M+ | Strong all-round; Pro / Instant variants | API |
| Gemini 3.1 Pro | Google | 1M+ | Top reasoning (94.3% GPQA Diamond) | API |
| Grok 4 Fast | xAI | ~2.0M | Largest practical context window | API |
| Claude Sonnet 4.6 | Anthropic | 1M+ | Balanced speed / quality workhorse | API |
| Gemini 3.5 Flash | Google | 1M | Flagship-level quality at ~4x speed (~$1.50 / 1M in) | API |

Sources: LLM-Stats, Morph, LM Council (June 2026).
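The one price quoted above (~$1.50 / 1M in, for Gemini 3.5 Flash) can be turned into a rough budget figure. The sketch below is illustrative only: it assumes "in" means input tokens, it ignores output-token pricing (not listed here), and the function name `input_cost_usd` is hypothetical rather than part of any provider's SDK.

```python
# Rough input-cost estimate for a per-token metered API.
# Assumption: the quoted "~$1.50 / 1M in" is a price per 1M input tokens.
# The figure is indicative; check the provider's current pricing before budgeting.
PRICE_PER_MILLION_INPUT_USD = 1.50


def input_cost_usd(input_tokens: int,
                   price_per_million: float = PRICE_PER_MILLION_INPUT_USD) -> float:
    """Return the estimated cost in USD of sending `input_tokens` input tokens."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return input_tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    # Example workload: 200 requests of 50,000 input tokens each.
    total_tokens = 200 * 50_000
    print(f"{total_tokens:,} input tokens cost about ${input_cost_usd(total_tokens):.2f}")
```

Output tokens are usually billed at a separate rate, so a real estimate would add a second term for them.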

Open-source & open-weight models

Open-weight models you can download and self-host: run them on your own GPUs, or on Semifly in one click.

| Model | Developer | Params | Context | License | Download | Run |
|---|---|---|---|---|---|---|
| DeepSeek V4 Pro | DeepSeek | ~1.6T (MoE) | 1M | MIT | Hugging Face → | Run on Semifly |
| GLM-5.1 | Zhipu AI | MoE | 256K | MIT | Hugging Face → | Run on Semifly |
| Kimi K2.7-Code | Moonshot AI | MoE | 256K | Mod. MIT | Hugging Face → | Run on Semifly |
| Qwen3.5 (397B-A17B) | Alibaba | 397B (17B active) | 256K | Apache 2.0 | Hugging Face → | Run on Semifly |
| Llama 4 Scout | Meta | MoE | 10M | Llama 4 | Hugging Face → | Run on Semifly |
| Llama 4 Maverick | Meta | MoE | 1M | Llama 4 | Hugging Face → | Run on Semifly |
| Mistral Small 4 | Mistral AI | 24B | 256K | Apache 2.0 | Hugging Face → | Run on Semifly |
| Gemma 4 (31B) | Google | 31B | 256K | Gemma | Hugging Face → | Run on Semifly |

Confirm the current license on each model’s official page before deployment.
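As a quick way to narrow the open-weight list, the sketch below filters the models by minimum context window and, optionally, by license. The data is copied from the table above; the helper names `parse_context` and `shortlist` are hypothetical, and "K" and "M" are treated as 1,000 and 1,000,000 tokens, which is an approximation.

```python
# Shortlist open-weight models by context window and license.
# Rows are (model, context, license), copied from the table above.
MODELS = [
    ("DeepSeek V4 Pro", "1M", "MIT"),
    ("GLM-5.1", "256K", "MIT"),
    ("Kimi K2.7-Code", "256K", "Mod. MIT"),
    ("Qwen3.5 (397B-A17B)", "256K", "Apache 2.0"),
    ("Llama 4 Scout", "10M", "Llama 4"),
    ("Llama 4 Maverick", "1M", "Llama 4"),
    ("Mistral Small 4", "256K", "Apache 2.0"),
    ("Gemma 4 (31B)", "256K", "Gemma"),
]

# Assumption: K = 1,000 tokens and M = 1,000,000 tokens (approximate).
_UNITS = {"K": 1_000, "M": 1_000_000}


def parse_context(label: str) -> int:
    """Convert a label such as '256K' or '10M' into a token count."""
    number, unit = label[:-1], label[-1].upper()
    return int(float(number) * _UNITS[unit])


def shortlist(min_context_tokens: int, licenses=None) -> list:
    """Return model names meeting the context floor and, if given, a license allow-list."""
    return [
        name
        for name, context, license_name in MODELS
        if parse_context(context) >= min_context_tokens
        and (licenses is None or license_name in licenses)
    ]


if __name__ == "__main__":
    # Models with at least 500,000 tokens of context, any license.
    print(shortlist(500_000))
    # Models with at least 200,000 tokens under MIT or Apache 2.0.
    print(shortlist(200_000, licenses={"MIT", "Apache 2.0"}))
```

The license strings here are only the short labels from the table, so a match is a starting point and not a substitute for reading the license on the model's official page.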

Run any of these on Semifly

Tokens & API

Access hosted models through a simple, metered token API.

Get API access →

GPU servers

Buy or lease Supermicro GPU systems to self-host open-weight models.

Browse GPU servers →

AI Foundry

Managed compute for training, fine-tuning, and inference.

Explore AI Foundry →