LLM News — Latest Large Language Model Releases & Updates

Global · English

What’s new worldwide

Updated July 2026 · refreshed regularly

Jul 24, 2026 · Model rankings

Anthropic claims its new Claude Opus 5 delivers near-Fable 5 performance at half the token price

Anthropic's new flagship model Claude Opus 5 posts top scores in coding and knowledge work at half of Fable 5's token rates. On ARC-AGI-3, a benchmark for novel…

The Decoder →

Jul 24, 2026 · Open source

Microsoft's open-weight AI push is so obviously an Azure play it hurts

Microsoft, along with Meta, Nvidia, and more than 20 other companies, is pushing for open-weight AI models in an open letter. The strategic logic is simple: the more…

The Decoder →

Jul 24, 2026 · Model rankings

Sakana claims its AI model router Fugu Ultra v1.1 now beats Fable 5 without even including it in the pool

Sakana AI has updated its Fugu Ultra AI router to version 1.1, claiming gains of up to 7.9 points over v1.0. Independent verification doesn't exist yet. The update adds…

The Decoder →

Jul 24, 2026 · Release

Introducing Claude Opus 5

I've been offline kayaking with sea otters for much of today so I haven't had a chance to put Anthropic's new model Claude Opus 5 through its…

Simon Willison →

Jul 19, 2026 · Release

AI Mania Is Eviscerating Global Decision-Making

Here's an entertaining perspective from Nik Suresh on the AI mania that is overwhelming the large companies that he…

Simon Willison →

Jul 16, 2026 · Release

Firefox in WebAssembly

This is absurdly cool: Puter compiled Firefox to WebAssembly such that the whole browser runs in another browser. Here's my blog, running in…

Simon Willison →

Jul 15, 2026 · Release

OpenAI is now using AI to attack its own AI, and it's working better than humans ever did

OpenAI's internal GPT-Red model finds successful attacks in 84 percent of test scenarios through self-play training. Human red teamers manage just 13 percent. The…

The Decoder →

Jul 15, 2026 · Release

GPT-5.6 Sol reportedly disproves a 30-year-old statistics conjecture in 90 minutes after humans couldn't crack it

A University of Pennsylvania statistics professor used OpenAI's GPT-5.6 Sol Pro to disprove a central open conjecture about the Benjamini-Hochberg method in roughly 90…

The Decoder →

Jul 15, 2026 · Open source

Bonsai 27B is a full open reasoning model that fits on an iPhone

PrismML compressed a 27B reasoning model to under 4 GB, small enough for phone-class local inference while retaining most benchmark performance.

The Decoder →

Jul 15, 2026 · Open source

Thinking Machines releases Inkling, a 975B open-weight multimodal MoE

Inkling supports text, image and audio inputs, a 1M context window, controllable thinking effort, and day-0 deployment support in Transformers, SGLang and llama.cpp.

Hugging Face →

Jul 15, 2026 · Architecture

Soofi Consortium Releases Soofi S 30B-A3B: An Open Hybrid Mamba-Transformer MoE Foundation Model For German And English

Soofi S 30B-A3B is an open Mamba-Transformer MoE model activating 3.2B of 31.6B parameters for German and English.

MarkTechPost →

Jul 15, 2026 · Open source

xai-org/grok-build, now open source

xAI open-sourced grok-build after criticism of its CLI behavior, giving developers a closer look at how the coding tool handles local project context.

Simon Willison →

Jul 15, 2026 · Architecture

How I tricked Claude into leaking your deepest, darkest secrets

A Claude web_fetch data-exfiltration test shows how tool design and prompt boundaries matter when LLMs browse private or sensitive content.

Simon Willison →

Jul 14, 2026 · Open source

Mistral Vibe for Code vs Claude Code vs Cursor vs Codex: Four Agents Scored on One Scaffold-to-PR Task

See how Vibe, Claude Code, Cursor, and Codex compare on cost, open weights, self-hosting, and async agent surfaces.

MarkTechPost →

Jul 7, 2026 · Open source

Liquid AI open-sources Antidoom to reduce reasoning-model doom loops

Final Token Preference Optimization targets the token that starts repetitive loops; Liquid reports LFM2.5-2.6B loop rates falling from 10.2% to 1.4%.

Liquid AI →

Jul 7, 2026 · Release

Anthropic's Claude Cowork AI agent is now available on mobile and web

Anthropic is rolling out its AI agent Claude Cowork to mobile and web. Until now, the feature was limited to the desktop app. The agent keeps working in the background…

The Decoder →

Jul 7, 2026 · Open source

Cohere Transcribe Arabic is an open-source model built for Arabic's toughest transcription problems

Cohere has released Transcribe Arabic, an open-source model for Arabic speech recognition that the company says outperforms Whisper and OmniASR on dialects,…

The Decoder →

Jul 7, 2026 · Release

Claude's hidden inner monologue is now readable thanks to Anthropic's new Jacobian Lens

Anthropic has found that Claude developed an internal working memory on its own during training. The company calls it "J-Space" and can now read it using a new analysis…

The Decoder →

Jul 7, 2026 · Release

OpenAI Releases GPT-Realtime-2.1 and GPT-Realtime-2.1-mini for Low-Latency Voice Agents in the API

OpenAI added two Realtime models to its API. GPT-Realtime-2.1-mini is a mini reasoning model for voice, priced like the earlier gpt-realtime-mini. OpenAI also cut p95…

MarkTechPost →

Jul 6, 2026 · Release

Training Gemma-3 for Structured Mathematical Reasoning with Tunix GRPO, LoRA Adapters, and GSM8K Rewards

We build an end-to-end GRPO training workflow that teaches Gemma-3 to reason through GSM8K math problems. We prepare the environment, authenticate with Hugging Face,…

MarkTechPost →

Jul 6, 2026 · Open source

Synthetic Sciences Releases OpenScience: An Open-Source, Model-Agnostic AI Workbench for Machine Learning, Biology,…

Synthetic Sciences has released OpenScience, an Apache-2.0 AI workbench for scientific research. It works with any frontier or open-weight model, using your own API…

MarkTechPost →

Jul 5, 2026 · Release

Sakana AI launches Namazu-powered Sakana Translate

Sakana Translate adds Japanese-English-Chinese translation, proofreading and follow-up Q&A modes to Sakana Chat, powered by the Namazu model series.

MarkTechPost →

Jul 4, 2026 · Release

Better Models: Worse Tools

Armin reports on a weird problem he ran into while hacking on Pi: The short version is that newer Claude models sometimes call Pi’s edit tool…

Simon Willison →

Jul 1, 2026 · Release

Anthropic redeploys Claude Fable 5 after export controls lift

Fable 5 returns globally on Claude.ai, Claude Code, Claude Cowork and the Claude Platform with additional cybersecurity safeguards after the June suspension.

Anthropic →

China · Watch

China watch

China’s LLM scene moves on its own beat — the latest on Chinese models (DeepSeek, Qwen, GLM, Kimi, MiniMax and more), in English, pulled from Chinese and international coverage.

Jul 24, 2026 · Release

Kimi K3 trails frontier US models by a wide margin on cyber exploits, and distillation may explain why

The British AI Security Institute and the U.S. Center for AI Standards and Innovation tested Moonshot AI's Kimi K3 on offensive cyber tasks. Kimi K3 scored 32 percent on…

The Decoder →

Jul 24, 2026 · Release

Hefei gains another AI unicorn: a multimodal startup that raised 2.1 billion in just three months

Exploring a new path for native multimodal integration.

QbitAI →

Jul 24, 2026 · Model rankings

A Chinese world model tops the leaderboard run by Fei-Fei Li's team, and it is compatible with domestic Ascend…

Give it a picture, and it returns your entire world.

QbitAI →

Jul 22, 2026 · Release

Are AI labs pelicanmaxxing?

Excellent piece of work by Dylan Castillo, who took a deep-dive into the frequently pondered question of whether the AI labs have been…

Simon Willison →

Jul 20, 2026 · Release

Who’s Afraid of Chinese Models?

Interesting proposal from Ben Thompson that both addresses the hypocrisy of labs outlawing distillation against their models despite…

Simon Willison →

Jul 18, 2026 · Release

Claude make Fable 5 permanent

An update from the @claudeai account on Twitter: Beginning July 20, Claude Fable 5 will be included in all Max and Team Premium plans, at…

Simon Willison →

Jul 15, 2026 · Release

StepFun showcases STEPX Neo, an LLM-native agent phone, at WAIC 2026

STEPX Neo is positioned as a large-model-native intelligent-agent phone, pushing StepFun beyond chatbots into device-level AI interaction.

QbitAI →

Jul 15, 2026 · Release

Alibaba releases Qwen-Audio-3.0-Realtime for live voice agents

The real-time speech model upgrades intelligence, agent tool invocation, empathetic dialogue and duplex interaction fluency for voice-first applications.

QbitAI →

Jul 14, 2026 · Industry

DeepSeek needs more cash just weeks after closing its first $7 billion round

DeepSeek is already raising again. The Chinese AI lab just closed its first funding round and needs capital for its own data centers and chips to keep its aggressive…

The Decoder →

Jul 7, 2026 · Open source

Tencent releases Hy3, a 295B open MoE model with 256K context

Hy3 activates 21B parameters per token, ships under Apache 2.0, and targets reasoning, coding agents and long-context workflows with vLLM/SGLang deployment recipes.

MarkTechPost →

Jul 7, 2026 · Release

openJiuwen debuts Skill-Omni for multimodal agent skills

The openJiuwen team introduces a multimodal skill pattern that pairs text instructions with visual references and reusable experience libraries for agent workflows.

QbitAI →

Jul 7, 2026 · Release

DeepSeek is designing its own AI chip

Chinese startup DeepSeek is building its own AI chip, Reuters reports.

The Decoder →

Jul 5, 2026 · Open source

Meituan releases LongCat-2.0, a 1.6T open MoE coding model

LongCat-2.0 targets agentic coding with a native 1M-token context window, about 48B active parameters per token, and MIT-licensed release plans.

MarkTechPost →

Jun 26, 2026 · Release

AI startup Lindy ditched Claude entirely for DeepSeek, saving millions as cost pressure mounts on Anthropic

AI startup Lindy ditched Claude entirely for DeepSeek after AI costs exceeded personnel costs. CEO Flo Crivello calls it "a matter of survival for the business." The…

The Decoder →

Jun 17, 2026 · Open source

Zhipu AI's GLM-5.2 closes in on closed-source leaders in coding marathons

Chinese AI lab Zhipu AI releases GLM-5.2 with a stable 1-million-token context under the MIT license. On FrontierSWE, a benchmark for hours-long coding tasks, the…

The Decoder →

Jun 17, 2026 · Model rankings

MiniMax Sparse Attention (MSA): a Two-Branch Block-Sparse Attention Trained on a 109B-Parameter MoE With a 3T-Token…

MiniMax released MSA, a sparse attention built on Grouped Query Attention. A lightweight Index Branch selects Top-k key-value blocks per query and GQA group; the Main…

MarkTechPost →

Jun 17, 2026 · Release

GLM-5.2: Built for Long-Horizon Tasks

Hugging Face →

Jun 16, 2026 · Release

Microsoft's Copilot Cowork moves to usage-based billing and may tap DeepSeek

Microsoft is weighing a fine-tuned version of DeepSeek V4 as a cheaper model option for Copilot Cowork. The company is also switching to usage-based billing, since…

The Decoder →

Jun 16, 2026 · Industry

DeepSeek takes outside money for the first time at a $50 billion valuation

Chinese AI startup DeepSeek has raised more than 50 billion yuan (about $7.4 billion) in its first external funding round.

The Decoder →

Jun 15, 2026 · Open source

Alibaba ships Qwen3 Coder Next

The Qwen team's newest agentic coding model lands alongside a wave of MiniMax 'Highspeed' variants (M2.5/M2.7), keeping the open-weight release pace relentless.

LLM-Stats →

Jun 12, 2026 · Open source

Moonshot open-sources Kimi K2.7-Code

Agentic coding model (~1T params, 256K context) under a Modified MIT license — the update cuts reasoning token usage ~30% versus K2.6 and boosts MCP tool-calling.

LLM-Stats →

Apr 24, 2026 · Release

DeepSeek-V4: a million-token context that agents can actually use

Hugging Face →

Feb 3, 2026 · Open source

The Future of the Global Open-Source AI Ecosystem: From DeepSeek to AI+

Hugging Face →

Jan 19, 2026 · Release

Claude Code costs up to $200 a month. Goose does the same thing for free.

The artificial intelligence coding revolution comes with a catch: it's expensive. Claude Code, Anthropic's terminal-based AI agent that can write, debug, and deploy code…

VentureBeat →

Compiled from public reporting; Chinese-source items are machine-translated. Confirm details with each vendor.
