Language Models Kimi K2 Thinking: Another LLM Sensation from China

By Henrik Bork | Translated by AI | 3 min reading time


If open-source models from China become cheaper and better than closed models, Sam Altman's OpenAI and competitors such as Anthropic will have a problem. That is why the release of "Kimi K2 Thinking" in early November is making headlines worldwide.

According to Moonshot AI, Kimi K2 Thinking surpasses established players like GPT-5 in the disciplines of agentic reasoning and agentic search based on benchmark tests, but still lags behind in coding capabilities.
(Image: Moonshot AI)

"Companies that do not incorporate AI or China into their strategy could face growth pressure by 2030," writes the consultancy McKinsey in a recent report. The rapid succession in which Chinese AI startups release impressive language models makes the timeframe of this forecast seem almost conservative shortly after its publication.

"A new reasoning model from a Chinese AI start-up, whose performance surpasses OpenAI's GPT-5 and Anthropic's Claude Sonnet 4.5 in several metrics, has sparked a new debate about a potential second DeepSeek moment and the future of American leadership in AI," writes the South China Morning Post in Hong Kong about the latest product from the Chinese start-up Moonshot AI.

On its GitHub blog, Moonshot AI claims that its new model correctly answered 44.9 percent of all questions in the benchmark test "Humanity's Last Exam." This is better than GPT-5's 41.7 percent, writes the startup, whose investors include the Chinese internet and AI companies Alibaba and Tencent.

Agentic Capabilities

Experts from the American company Hugging Face, known for its open-source library "Transformers," point out in a commentary that many of the state-of-the-art (SOTA) results of Kimi K2 Thinking were achieved in a special "Heavy" mode. In this mode, up to eight inference runs are executed in parallel and their results are later merged. This puts Moonshot AI's claims somewhat into perspective, although the method is not uncommon in language model benchmark competitions.
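The "Heavy" mode described above resembles a well-known parallel-sampling pattern: draw several answers independently, then merge them, for example by majority vote. The sketch below illustrates that general idea only; the function names and the merge rule are illustrative assumptions, not Moonshot AI's actual implementation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_inference(prompt: str, seed: int) -> str:
    # Placeholder for a real model call. A production setup would sample
    # the model with nonzero temperature so that independent runs differ.
    return f"answer-{hash((prompt, seed)) % 3}"


def heavy_mode(prompt: str, n_runs: int = 8) -> str:
    """Run n_runs samples in parallel and merge them by majority vote."""
    with ThreadPoolExecutor(max_workers=n_runs) as pool:
        answers = list(pool.map(lambda s: run_inference(prompt, s),
                                range(n_runs)))
    # Merge step: a simple majority vote here; real systems may instead
    # rerank candidates or let a judge model pick the best answer.
    return Counter(answers).most_common(1)[0][0]
```

Because only the aggregated answer is reported, such a setup trades roughly eight times the inference cost for a more reliable benchmark score.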

In initial reviews by the trade press, the agentic capabilities of the new generative AI model from China are particularly praised. It is said to work especially well as a personal agent by combining a variety of tools when solving multi-step cognitive tasks.

"Kimi K2 Thinking can execute up to 200 to 300 consecutive tool calls without human intervention and reason coherently over hundreds of steps to solve complex problems," write its creators at Moonshot AI on GitHub.
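Such long tool-call chains typically run inside an agent loop: the model inspects the conversation so far, either requests a tool or emits a final answer, and the tool's result is fed back in. The following minimal sketch shows that loop shape under assumed interfaces; the dictionary fields and step budget are hypothetical, not Moonshot AI's API.

```python
def agent_loop(task, model, tools, max_steps=300):
    """Drive a model through consecutive tool calls until it answers.

    `model` is any callable that maps the conversation history to an
    action dict; `tools` maps tool names to plain Python callables.
    """
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = model(history)                # model decides the next step
        if action["type"] == "final_answer":
            return action["content"]
        tool = tools[action["tool"]]           # look up the requested tool
        result = tool(**action["args"])        # execute without human input
        history.append({"role": "tool", "content": result})
    raise RuntimeError("step budget exhausted without a final answer")
```

The `max_steps` guard matters in practice: an agent that can chain hundreds of calls also needs a hard stop so a confused model cannot loop forever.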

Benchmark King?

The new LLM from China is also said to have outperformed comparable US models in other benchmarks, including "BrowseComp" for internet research and "Seal-0," which tests the factual accuracy and reasoning capabilities of search-supported large language models with its 111 questions.

In other tasks, such as software programming, Kimi K2 Thinking apparently still lags slightly behind its U.S. competitors. More important than such comparisons of who holds a slight edge where are likely two other aspects of Kimi. Similar to DeepSeek, the model was reportedly trained very cost-effectively. In addition, the model is undeniably another success for "open-source" LLMs compared with proprietary models.

"Training the new Kimi AI model cost $4.6 million, according to sources familiar with the matter," reported the American television network CNBC. Since then, the number of downloads of Kimi K2 Thinking has skyrocketed worldwide. For the first time, downloads of Chinese open-source models have now surpassed those of comparable American open-source models, writes a16z.

Forced Efficiency

The debate about Kimi K2 Thinking is very reminiscent of the one about DeepSeek, to whose architecture the Kimi team apparently owes a great deal. Both Chinese models were developed under pressure to work particularly efficiently with existing resources because of American semiconductor export restrictions against China.

Both models are particularly interesting to users worldwide because they may not be the absolute best in every area, but they are often "good enough." In addition, they are affordable and, due to their open-source architecture, can be integrated into existing workflows in many companies. (sb)
