Forecasting Research: World Cup 2026 Is Also a Championship of AI

Source: University of Cologne | Translated by AI


Researchers have developed a comparison tool that analyzes predictions for FIFA World Cup games. The focus is on how well large language models (LLMs) such as ChatGPT, Claude, or Gemini can predict real match results.

The 2026 World Cup also becomes a testing ground for AI: How accurately can large language models predict match outcomes? (Source: Pixabay)

The platform LLM SoccerArena provides a live ranking for this purpose: during the 2026 World Cup, leading AI models will submit predictions for each game, and the platform will then assess how closely these predictions match the actual results. The principle is similar to a score prediction game, with the difference that AI systems compete against each other.

The project was developed by Markus Weinmann (University of Cologne, Germany, Institute for Business AI) in collaboration with Oliver Müller (University of Paderborn, Germany) and Stefan Feuerriegel (LMU Munich, Germany, MCML).

Why Is Football Suitable for AI Testing?

Football serves as a realistic test case: results are unambiguous, public, and unknown in advance. This makes it suitable for testing the predictive capability of AI models under real-world conditions. The researchers are also examining whether models make better predictions when they can access current information from the internet beforehand.

Before each game, the models provide a forecast: the exact score as well as probabilities for a win, draw, or loss. All predictions are timestamped and saved before kickoff and then compared with the official result. The results feed into a continuously updated ranking.

The evaluation follows a fixed point system: 5 points for the exact result, 2 points for the correct goal difference, 1 point for the correct tendency (win, draw, or loss), and 0 points for an incorrect prediction. Additional tournament questions are evaluated separately.
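
The point system described above can be illustrated with a minimal Python sketch. The function name `score_prediction` and its arguments are hypothetical and not taken from the project. The sketch also assumes a literal reading of the rules, in which a predicted draw with the wrong score still counts as a correct goal difference; the platform may handle that case differently.

```python
def score_prediction(pred_home: int, pred_away: int,
                     real_home: int, real_away: int) -> int:
    """Score one forecast with the 5/2/1/0 scheme (hypothetical helper)."""

    def tendency(diff: int) -> int:
        # +1 = home win, 0 = draw, -1 = away win
        return (diff > 0) - (diff < 0)

    # 5 points: the exact result was predicted.
    if (pred_home, pred_away) == (real_home, real_away):
        return 5

    pred_diff = pred_home - pred_away
    real_diff = real_home - real_away

    # 2 points: correct goal difference.
    # Assumption: a draw predicted with the wrong score (e.g. 1:1 vs. 2:2)
    # falls into this tier because the difference is 0 in both cases.
    if pred_diff == real_diff:
        return 2

    # 1 point: only the tendency (win, draw, or loss) is correct.
    if tendency(pred_diff) == tendency(real_diff):
        return 1

    # 0 points: the prediction was wrong.
    return 0


if __name__ == "__main__":
    print(score_prediction(2, 1, 2, 1))  # exact result: 5
    print(score_prediction(3, 2, 2, 1))  # same goal difference: 2
    print(score_prediction(3, 0, 2, 1))  # correct tendency only: 1
    print(score_prediction(0, 1, 2, 1))  # wrong tendency: 0
```

Checking the tiers from most to least specific ensures that each forecast receives only the highest score it qualifies for.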

What Does the Ranking Say About the Models?

In addition to the predicted results, the research team also analyzes the quality of the probability estimates, a standard procedure in forecasting research. Details are documented on the website. The ranking also records whether or not a model used external information.
Important: A good ranking only shows how accurate a model has been so far. It is not proof of genuine "understanding" of football and does not allow for reliable predictions for future games. Especially at the beginning of the tournament, the ranking can still change significantly.
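
The article does not name the metric used to assess the probability estimates. One common choice in forecasting research is the Brier score, shown here purely as an assumption about what such an evaluation could look like. The function name `brier_score` and the outcome labels are hypothetical.

```python
OUTCOMES = ("win", "draw", "loss")


def brier_score(probs: dict, outcome: str) -> float:
    """Multi-class Brier score for one match.

    0.0 is a perfect forecast, 2.0 is the worst possible one.
    Assumption: the project may use a different metric.
    """
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    total = sum(probs[o] for o in OUTCOMES)
    if abs(total - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    # Squared distance between the forecast and the observed outcome,
    # where the observed outcome is encoded as 1 and the others as 0.
    return sum(
        (probs[o] - (1.0 if o == outcome else 0.0)) ** 2
        for o in OUTCOMES
    )


if __name__ == "__main__":
    forecast = {"win": 0.5, "draw": 0.3, "loss": 0.2}
    print(round(brier_score(forecast, "win"), 2))   # 0.38
    print(round(brier_score(forecast, "loss"), 2))  # 0.98
```

A lower score is better: a confident forecast that turns out wrong is penalized more heavily than a cautious one, which is why such a metric complements the point-based ranking.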
 

The results are also relevant for companies: Language models are increasingly being used to analyze information, assess developments, and prepare decisions. For this, they must not only process data but also evaluate uncertainties and derive well-founded forecasts.

The predictions generated in the project are for research purposes only and do not constitute betting recommendations.
