AI AI Toolkit
📊 Updated Weekly

Model Watch

Weekly LLM rankings, capability comparison and benchmarks. Track GPT, Claude, Gemini and domestic model updates.

Rankings
Benchmarks
📅 5 models
Updated 2026-06-25

Google integrates Computer Use as a built-in tool into Gemini 3.5 Flash, enabling developers to build agents that operate across browsers, mobile, and desktop.

📰 Hacker News 热门 · 6/25/2026

New GPT-5.5 Instant is more fun to chat with, better understands intent, more reliable at handling complex constraints.

📰 X:OpenAI · 6/25/2026

Alibaba releases first native language world model Qwen-AgentWorld, surpassing GPT-5.4 and Claude Opus 4.8 on AgentWorldBench, models open-sourced.

📰 公众号:通义实验室 · 6/24/2026