Semifly

Latest in large language models

Releases, benchmarks, open-source drops, and industry shifts — refreshed regularly so you can see what’s newest at a glance.

Global · English

What’s new worldwide

Updated June 2026 · refreshed regularly

May 27, 2026 · Model rankings

Claude Opus 4.8 takes #1 on the Intelligence Index

Anthropic's Claude Opus 4.8 debuted at the top of the Artificial Analysis Intelligence Index and leads aggregate leaderboards.

Artificial Analysis →

Jun 2026 · Business

Anthropic: Claude hits $30B revenue run rate

Claude reached a $30B annualized revenue run rate by the end of Q1 2026 — roughly 80x growth in a single quarter.

LLM-Stats →

May 2026 · Event

Google I/O: the “agentic Gemini era” begins

Google launched Gemini 3.5 at I/O 2026 and committed to releasing Gemini 3.5 Pro in June, leaning hard into agentic capabilities.

WaveSpeed →

Jun 2026 · Release

Gemini 3.5 Flash: flagship quality at 4x speed

Gemini 3.5 Flash rivals larger flagship models while running roughly 4x faster, at about $1.50 per 1M input tokens.

LLM-Stats →

May 4, 2026 · Release

OpenAI ships GPT-5.5 (Pro / Instant)

GPT-5.5 is OpenAI's current frontier model, with Pro and Instant variants spanning quality and latency needs.

WaveSpeed →

Jun 2026 · Open source

Moonshot open-sources Kimi K2.7-Code

An agentic coding model with a 256K-token context window, released under a Modified MIT license and using about 30% fewer reasoning tokens than K2.6.

LLM-Stats →

Jun 2026 · Architecture

Inception's Mercury 2 hits 1,000+ tokens/sec

A diffusion-based reasoning model that generates tokens in parallel, aimed at agentic loops and real-time voice.

LLM-Stats →

Jun 2026 · Benchmarks

New model tops Gemini 3.1 Pro on agentic tasks

A frontier model beats Gemini 3.1 Pro on Terminal-Bench 2.1 (76.2%) and MCP Atlas (83.6%) with ~4x faster output.

LLM-Stats →

Jun 4, 2026 · Safety

NVIDIA unveils Nemotron 3.5 Content Safety

A customizable multimodal safety model for enterprise AI, with guardrails across text, images, and other modalities.

NVIDIA →

Jun 1, 2026 · Industry

GitHub Copilot moves to usage-based billing

GitHub shifted Copilot to metered billing as inference costs from agentic coding sessions rose.

GitHub →

Jun 2026 · Context

Context windows keep growing

More than 13 hosted frontier models now ship context windows of 1M+ tokens; Grok 4 Fast exposes about 2M tokens, and Llama 4 Scout reaches 10M.

Morph →

Jun 2026 · Open source

Open models match or beat GPT-4 on key benchmarks

Llama, Mistral and Qwen now meet or exceed GPT-4-class scores on several public benchmarks.

ComputingForGeeks →

China

The state of China's large models

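The global and Chinese LLM ecosystems are, to some extent, “decoupled,” so this section separately covers the latest landscape and developments among China's domestic models.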

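2026 · Landscape

China's “Big Four”: DeepSeek, Zhipu, Tongyi, and Moonshot AI

DeepSeek, GLM (Zhipu), Qwen (Alibaba Tongyi), and Kimi (Moonshot AI) compete head-on with international vendors on open-source releases and price-performance.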

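人人都是产品经理 →

May 2026 · Evaluation

DeepSeek-V4-Pro ranked #1 overall among Chinese models

In a comprehensive evaluation of mainstream Chinese LLMs, DeepSeek-V4-Pro took first place in both mathematical reasoning and coding, ranking #1 overall.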

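AI 中文社区 →

2026 · Long context

Kimi K2.6 leads on very long documents and long-horizon coding

Moonshot AI's Kimi has built a large user base for long-document analysis, report generation, and long-horizon coding tasks, and leads in long-context capability.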

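AI 中文社区 →

2026 · Enterprise

Zhipu GLM-5.1 offers the best stability for enterprise agents

GLM-5.1 stands out for its stability in enterprise agent scenarios, and together with its clean MIT license, this makes it a favorite for private (on-premises) deployment.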

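AI 中文社区 →

2026 · Price-performance

Tongyi Qwen3.6-Plus dubbed the “king of price-performance”

Alibaba's Tongyi Qianwen (Qwen) is known for its open Apache 2.0 license and strong price-performance, making it a popular pick among Chinese open-source models.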

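AI 中文社区 →

Feb 2026 · Usage

MiniMax tops OpenRouter with 10B active parameters

MiniMax released M2.5, which uses 10B active parameters to deliver efficient inference in agent scenarios; with 3.07T tokens of weekly usage, it briefly reached #1 on OpenRouter.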

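Zhihu →

2026 · Vendors

ByteDance's Doubao and Baidu's ERNIE keep iterating

ByteDance updated its Doubao-Seed series of general-purpose and reasoning models; ERNIE Bot 5.0 draws on the Baidu Search knowledge graph to strengthen retrieval augmentation.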

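Zhihu →

2026 · Rankings

China AI model platform rankings are updated regularly

Chinese media publish a monthly ranking of China's AI model platforms, tracking each vendor's latest position across general-purpose, reasoning, multimodal, and other dimensions.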

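Tencent News →

The above is compiled from public reports and is for reference only; each vendor's official information takes precedence.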

Run any of these on Semifly

Tokens & API

Access hosted models through a simple, metered token API.

Get API access →

GPU servers

Buy or lease Supermicro GPU systems to self-host open-weight models.

Browse GPU servers →

AI Foundry

Managed compute for training, fine-tuning, and inference.

Explore AI Foundry →