Semifly
Semifly · Tokens & LLM Models

Large Language Models, made easy to compare, download, and run

Track the latest models, compare frontier and open-source LLMs, and download open-weight models — then run any of them on Semifly with tokens, GPU servers, and AI Foundry.

Leaderboard

How the top models rank

Overall scores from SuperCLUE’s general-capability leaderboard. Bars colored by access type.

Proprietary (API) · Open-weight

Source: SuperCLUE General Leaderboard · Overall Ranking · Total Score (May 2026). Scores are indicative and refreshed regularly.

Latest in LLMs

What’s new in large models

Updated June 2026 · refreshed regularly

May 27, 2026 · Model rankings

Claude Opus 4.8 takes #1 on the Intelligence Index

Anthropic's Claude Opus 4.8 debuted at the top of the Artificial Analysis Intelligence Index and leads aggregate leaderboards.

Artificial Analysis →

Jun 2026 · Business

Anthropic: Claude hits $30B revenue run rate

Claude reached a $30B annualized revenue run rate by the end of Q1 2026 — roughly 80x growth in a single quarter.

LLM-Stats →

May 2026 · Event

Google I/O: the “agentic Gemini era” begins

Google launched Gemini 3.5 at I/O 2026 and committed to shipping Gemini 3.5 Pro in June, leaning hard into agentic capabilities.

WaveSpeed →

Jun 2026 · Release

Gemini 3.5 Flash: flagship quality at 4x speed

Gemini 3.5 Flash rivals large flagships while running ~4x faster, priced around $1.50 / 1M input tokens.

LLM-Stats →

May 4, 2026 · Release

OpenAI ships GPT-5.5 (Pro / Instant)

GPT-5.5 is OpenAI's current frontier model, with Pro and Instant variants spanning quality and latency needs.

WaveSpeed →

Jun 2026 · Open source

Moonshot open-sources Kimi K2.7-Code

An agentic coding model with a 256K context window, released under a Modified MIT license, with ~30% lower reasoning token usage than K2.6.

LLM-Stats →

Jun 2026 · Architecture

Inception's Mercury 2 hits 1,000+ tokens/sec

A diffusion-based reasoning model generating tokens in parallel — targeting agentic loops and real-time voice.

LLM-Stats →

Jun 2026 · Benchmarks

New model tops Gemini 3.1 Pro on agentic tasks

A frontier model beats Gemini 3.1 Pro on Terminal-Bench 2.1 (76.2%) and MCP Atlas (83.6%) with ~4x faster output.

LLM-Stats →
View all news & the state of China's large models →

Compare

Frontier & open-source models

Top hosted and open-weight models as of June 2026. Full tables on the compare page.

| Model | Lab | Context | Notable | Access |
| --- | --- | --- | --- | --- |
| Claude Opus 4.8 | Anthropic | 1M+ | Highest overall score among released models | API |
| GPT-5.5 | OpenAI | 1M+ | Strong all-round; Pro / Instant variants | API |
| Gemini 3.1 Pro | Google | 1M+ | Top reasoning (94.3% GPQA Diamond) | API |
| Grok 4 Fast | xAI | ~2.0M | Largest practical context window | API |
| Claude Sonnet 4.6 | Anthropic | 1M+ | Balanced speed / quality workhorse | API |
| Gemini 3.5 Flash | Google | 1M | Flagship-level quality at ~4x speed (~$1.50 / 1M input tokens) | API |

Full comparison →

Download

Open-weight downloads

Grab the weights from Hugging Face, then run on your own GPUs or on Semifly.

DeepSeek V4 Pro

DeepSeek · MIT · ~1.6T (MoE)

GLM-5.1

Zhipu AI · MIT · MoE

Kimi K2.7-Code

Moonshot AI · Modified MIT · MoE

Qwen3.5 (397B-A17B)

Alibaba · Apache 2.0 · 397B (17B active)

Mistral Small 4

Mistral AI · Apache 2.0 · 24B

All downloads →

Run any of these on Semifly

Tokens & API

Access hosted models through a simple, metered token API.

Get API access →

GPU servers

Buy or lease Supermicro GPU systems to self-host open-weight models.

Browse GPU servers →

AI Foundry

Managed compute for training, fine-tuning, and inference.

Explore AI Foundry →