Elon Musk boldly proclaims Grok 3 as "the world's smartest AI," previously describing it as "frighteningly intelligent." But do these superlatives hold up to scrutiny?
With much fanfare, Elon Musk presents his new AI model Grok 3.
(Image: AI-generated)
The presentation of Grok 3 took place—fittingly for the Musk brand—in a livestream on the platform X. Viewers who tuned in on time first had to stare at a black screen for about 20 minutes before the development team from xAI, Musk's company behind Grok, finally began their demonstration. The show was, as expected, held in the typical Musk style: a lot of self-confidence, little modesty, and a staging intended to present Grok 3 as a technological sensation.
How revolutionary is Grok 3 really? Has xAI achieved a real breakthrough in AI development with its new model—or is it mainly a clever PR strategy? The first benchmark results indicate considerable progress, but there are also critical voices. Above all, Musk's stance on "political correctness" and his plans for a "truth-seeking AI" are causing discussions.
Grok 3: Technical innovations, performance, and model family
With Grok 3, xAI takes a decisive step further and introduces not only a single model but an entire model family. In addition to the main version, there is a smaller variant called "mini" and a so-called "Reasoning" version that is distinguished by a particular ability: it tests itself before giving an answer. The goal? Fewer errors, fewer hallucinations—and a significantly smarter chatbot. But how big is the technological leap really?
A key difference from previous Grok versions is the so-called reasoning model, an approach also used by DeepSeek and OpenAI. These models are not just supposed to respond quickly, but to run through various solution approaches and only then choose the best option. Ideally, this leads to more precise and well-founded answers—however, it remains unclear how transparent this thought process actually is for users. According to xAI, part of the "thinking" is deliberately obscured to prevent other companies from extracting knowledge from Grok 3. In OpenAI's o1 and DeepSeek's R1, this process is visible.
In addition to the new model variants, there is also a significant increase in computing power: Grok 3 was trained with ten times more computing power than its predecessor Grok 2. The underlying architecture remains secret, but xAI emphasizes that the model operates in a new dimension. According to Musk, Grok 3 will also be continuously improved—a statement that is technically only partly correct, as a once-trained model cannot simply continue to "learn" but can only be optimized through fine-tuning.
Grok 3 benchmark results: Reality vs. Marketing
How does Grok 3 fare in direct comparison with the competition? According to the first test results, the model has raised the bar. On the AI benchmarking platform LMarena.ai, Grok 3 has scored more than 1,400 points for the first time ever—a milestone that neither OpenAI's GPT-4o nor Google's Gemini or Anthropic's Claude have reached so far. Especially in the fields of mathematics, physics, biology, and chemistry, Grok 3 is said to set new standards.
In benchmarks, at least, Grok 3 surpasses other current AI models.
(Image: xAI)
But this is where the debate begins: while xAI cites these numbers as clear evidence of Grok 3's superiority, there are also critical voices. The Chatbot Arena, on which many of these tests are based, is itself controversial. Experts criticize methodological weaknesses, potential biases, and a lack of transparency. In addition, many of the assessments come from users who subjectively decide which AI response they prefer—a less than scientific basis for a superlative like "the most intelligent AI in the world."
Nevertheless, it cannot be denied that Grok 3 has made a technological leap forward. Whether the model is truly better than GPT-4o or Claude in the long term will only be proven in practice. One thing is certain: Musk and xAI have fueled the competition for the most powerful AI.
Controversies surrounding Grok 3 and xAI regarding political correctness
With Grok 3, Elon Musk not only celebrates a technological success—he also sparks new discussions. While xAI presents the model as the next big step in AI development, there are critical voices. The debate revolves around two central questions: How reliable is Grok 3 really? And what role does Musk's controversial stance on political correctness play in the orientation of his AI?
Date: 08.12.2025
That Grok 3 shows impressive capabilities in certain areas is undisputed—but there are also significant deficiencies. One of the most prominent critics is OpenAI co-founder Andrej Karpathy, who was able to test the model early on. His verdict: Grok 3 impresses with logical thinking but repeatedly makes unsupported factual claims. This very problem represents one of the greatest risks with AI-supported chatbots—false information is presented with the same conviction as true facts.
Another problem concerns the transparency of the model. While other AI systems reveal at least part of their thought processes, much remains in the dark with Grok 3. Especially in the new "Reasoning" models, how the model reaches its conclusions is deliberately obscured. Officially, this secrecy serves to prevent so-called "model distillation"—the unauthorized copying of knowledge by competitors. Critics, however, see it also as a possible tool for manipulating information.
Grok is Musk's "anti-woke alternative"
One of Musk's most controversial statements regarding Grok 3 is his emphasis that the AI should "follow the truth"—even if its answers "contradict what is politically correct." Musk himself has repeatedly claimed that existing AI models show a left-leaning political bias due to their web-based training data. With Grok 3, he wants to create an "anti-woke" alternative. But what does that mean concretely?
The idea of an AI that does not adhere to societal norms sounds like progress to some—but to others, it seems like a dangerous experiment. In recent years, Musk himself has repeatedly been associated with the spread of misinformation. According to the Center for Countering Digital Hate (CCDH), 50 of his posts about the US election alone, which contained demonstrably false or misleading information, garnered over a billion views.
When Musk, of all people, develops an AI that supposedly delivers "uncomfortable truths," it raises serious questions. Who determines what is considered truth? What mechanisms exist to prevent manipulation? And what responsibility do AI developers have if their models amplify misinformation? The line between "truth-seeking AI" and deliberate influence is narrow – and with a developer like Elon Musk, known for his polarizing views, many critics see cause for concern.
Competition: Musk versus OpenAI, Google, and others
The release of Grok 3 is not only a technological advancement for xAI but also another chapter in the increasingly bitter rivalry between Elon Musk and the leading AI companies. Particularly, the feud between Musk and OpenAI, the company he once co-founded, plays a central role. Yet, while Musk aggressively pushes forward with his own AI strategy, it remains questionable whether xAI can actually keep up with the established market leaders in the race for the best models.
The relationship between Elon Musk and OpenAI has been tense for years. In 2015, Musk co-founded the company, but he left under controversial circumstances in 2018. Since then, he has repeatedly criticized OpenAI – initially because it evolved from a nonprofit organization to a profit-oriented company, and later because of its close cooperation with Microsoft.
Musk recently escalated his criticism to a new level: just a few weeks ago, he reportedly offered nearly 100 billion dollars to acquire OpenAI's nonprofit branch—an offer brusquely rejected by OpenAI CEO Sam Altman, who described it as an attempt to "hamper" a competitor. Musk, for his part, sued OpenAI at the beginning of 2024, accusing the company of violating its original open-source principles.
These personal and legal disputes, however, distract from the real question: can xAI technologically keep up with OpenAI and other market leaders? While Musk portrays himself as a champion of an independent and "truth-seeking" AI, OpenAI represents a more regulated approach—one that is under Microsoft's growing influence and that, according to the company's current model specification, has recently been somewhat relaxed.
According to OpenAI, the new guidelines are intended to ensure "more intellectual freedom." This means that the model will engage more openly with controversial topics without causing harm. As an example, while earlier versions of ChatGPT often sidestepped political questions with neutral, cautious answers or defaulted to "no opinion," the new specification might allow the model to delve more into historical context or different perspectives— for example, in debates about climate change or economic theories. The aim is to achieve a "balance between truth-seeking and moral responsibility," according to OpenAI.
xAI remains economically far behind OpenAI
Musk's AI company is currently in talks about a $10 billion funding round, which would allow for a valuation of $75 billion. In comparison, OpenAI is currently aiming for a capital increase of $40 billion, which would result in a market valuation of $300 billion.
This financial gap indicates that investors continue to view OpenAI as the clear market leader. While Musk draws attention with rapid advancements and superlatives in his statements, OpenAI, Google, and Anthropic are focusing on a gradual expansion of their AI models with a long-term strategy. Furthermore, it remains unclear how sustainable xAI's business model is. Currently, Grok 3 is only available to paying subscribers of X Premium—a distribution strategy that promises far less market penetration than the integrations of OpenAI or Google.
It also remains questionable whether Grok 3 can truly surpass OpenAI technologically. While benchmarks show initial advantages, many of these tests are controversial and do not allow definitive conclusions about the practical utility of the model. Additionally, companies like xAI must prove in the long term that they can not only keep up with the competition but also develop innovative and reliable AI systems.
Musk vs. "Big Tech": David versus Goliath?
Musk likes to portray himself as an opponent of the big tech companies, but in the field of AI, he faces a difficult reality: OpenAI, Google, and Microsoft have more financial resources, broader market access, and established customer bases. While xAI seeks to position itself as a serious competitor, the question remains whether Musk can truly catch up with the major players with his company—or whether he will once again get caught up in an overambitious race against his former allies. (mc)