    Elon Musk’s New Grok 4 Takes on ‘Humanity’s Last Exam’ as the AI Race Heats Up

    By Emma Reynolds, July 11, 2025

    Elon Musk has launched xAI’s Grok 4, calling it the “world’s smartest AI” and claiming it can ace Ph.D.-level exams and outpace rivals such as Google’s Gemini and OpenAI’s o3 on tough benchmarks.

    By Deni Ellis Béchard, edited by Dean Visser

    Elon Musk released the newest artificial intelligence model from his company xAI on Wednesday night. In an hour-long public reveal session, he called the model, Grok 4, “the smartest AI in the world” and claimed it was capable of getting perfect SAT scores and near-perfect GRE results in every subject, from the humanities to the sciences.

    During the online launch, Musk and members of his team described testing Grok 4 on a metric called Humanity’s Last Exam (HLE), a 2,500-question benchmark designed to evaluate an AI’s academic knowledge and reasoning skill. Created by nearly 1,000 human experts across more than 100 disciplines and released in January 2025, the test spans topics from the classics to quantum chemistry and mixes text with images. Grok 4 reportedly scored 25.4 percent on its own. But given access to tools (such as external aids for code execution or Web searches), it hit 38.6 percent. That jumped to 44.4 percent with a version called Grok 4 Heavy, which uses multiple AI agents to solve problems. The two next best-performing AI models are Google’s Gemini 2.5 Pro (which achieved 26.9 percent with tools) and OpenAI’s o3 model (which got 24.9 percent, also with tools).

    The results from xAI’s internal testing have yet to appear on the leaderboard for HLE, however, and it remains unclear whether this is because xAI has yet to submit the results or because those results are pending review. Manifold, a social prediction market platform where users bet play money (called “Mana”) on future events in politics, technology and other subjects, predicted a 1 percent chance, as of Friday morning, that Grok 4 would debut on HLE’s leaderboard with a score of 45 percent or greater within a month of its release. (xAI itself has claimed a score of only 44.4 percent.)

    During the launch, the xAI team also ran live demonstrations showing Grok 4 crunching baseball odds, determining which xAI employee has the “weirdest” profile picture on X and generating a simulated visualization of a black hole. Musk suggested that the system may discover entirely new technologies by later this year—and possibly “new physics” by the end of next year. Games and movies are on the horizon, too, with Musk predicting that Grok 4 will be able to make playable titles and watchable films by 2026. Grok 4 also has new audio capabilities, including a voice that sang during the launch, and Musk said new image generation and coding tools are soon to be released. The regular version of Grok 4 costs $30 a month; SuperGrok Heavy—the deluxe package with multiple agents and research tools—runs at $300.



    Artificial Analysis, an independent benchmarking platform that ranks AI models, now lists Grok 4 as highest on its Artificial Analysis Intelligence Index, slightly ahead of Gemini 2.5 Pro and OpenAI’s o4-mini-high. And Grok 4 appears as the top-performing publicly available model on the leaderboards for the Abstraction and Reasoning Corpus, or ARC-AGI-1, and its second edition, ARC-AGI-2—benchmarks that measure progress toward “humanlike” general intelligence. Greg Kamradt, president of ARC Prize Foundation, a nonprofit organization that maintains the two leaderboards, says that when the xAI team contacted the foundation with Grok 4’s results, the organization then independently tested Grok 4 on a dataset to which the xAI team did not have access and confirmed the results. “Before we report performance for any lab, it’s not verified unless we verify it,” Kamradt says. “We approved the [testing results] slide that [the xAI team] showed in the launch.”

    According to xAI, Grok 4 also outstrips other AI systems on a number of additional benchmarks that suggest its strength in STEM subjects. Alex Olteanu, a senior data science editor at AI education platform DataCamp, has tested it. “Grok has been strong on math and programming in my tests, and I’ve been impressed by the quality of its chain-of-thought reasoning, which shows an ingenious and logically sound approach to problem-solving,” Olteanu says. “Its context window, however, isn’t very competitive, and it may struggle with large code bases like those you encounter in production. It also fell short when I asked it to analyze a 170-page PDF, likely due to its limited context window and weak multimodal abilities.” (Multimodal abilities refer to a model’s capacity to analyze more than one kind of data at the same time, such as a combination of text, images, audio and video.)

    Other issues with Grok 4 have surfaced since its release. Several posters on X, which Musk owns, as well as tech-industry news outlets have reported that when Grok 4 was asked questions about the Israeli-Palestinian conflict, abortion and U.S. immigration law, it often searched for Musk’s stance on these issues by referencing his X posts and articles written about him. And the release of Grok 4 comes after several controversies with Grok 3, the previous model, which issued outputs that included antisemitic comments, praise for Hitler and claims of “white genocide.” xAI publicly acknowledged these incidents, attributing them to unauthorized manipulations and stating that the company was implementing corrective measures.

    At one point during the launch, Musk commented on how making an AI smarter than humans is frightening, though he said he believes the ultimate result will be good—probably. “I somewhat reconciled myself to the fact that, even if it wasn’t going to be good, I’d at least like to be alive to see it happen,” he said.
