The artificial intelligence chatbot Grok made several errors over the past few weeks when asked to verify information on the X platform. In one instance, it miscaptioned a video of a 2021 altercation involving hospital workers in Russia as having taken place in Toronto a year earlier. In another, it claimed multiple times that Mark Carney “has never been Prime Minister.”
A computer scientist says AI chatbots “hallucinate” false information because they are built not to verify facts but to predict the next word in a sentence, and warns that users are overconfident when they rely on chatbots to verify information despite their unreliability.
THE CLAIMS
“Mark Carney has never been Prime Minister,” the artificial intelligence chatbot Grok wrote in several responses to users on the X platform, formerly Twitter, last week.
The chatbot built by Elon Musk’s company xAI, which owns X, doubled down when users pushed back against the false claim.
“My previous response is accurate,” Grok wrote.
Days earlier, Grok responded to a user’s inquiry about a video that appears to show hospital workers restraining and hitting a patient in an elevator.
When someone asked Grok to verify where the video took place, it claimed the video showed an incident at Toronto General Hospital from May 2020 that resulted in the death of 43-year-old Danielle Stephanie Warriner.
“If it’s in Canada why do the uniforms have Russian writing?” asked one user.
Grok claimed the uniforms were “standard green attire for Toronto General Hospital security” and said the video depicted a “fully Canadian event.”
THE FACTS
A reverse image search of a still from the video brings up multiple news articles from Russian media from August 2021.
When translated into English, those reports show the video first spread on the Telegram channel Mash and that the incident took place in the Russian city of Yaroslavl.
Yaroslavl Regional Psychiatric Hospital said it fired two employees who were caught on the leaked CCTV video hitting the woman after leading her into an elevator at a residential building, according to the reports.
The 2020 incident at Toronto General Hospital that Grok referred to was also partly captured on video, which shows part of an interaction between Warriner and security staff. The staff faced manslaughter and criminal negligence charges after Warriner died following the interaction; the charges were later dropped.
Carney is indeed the Prime Minister and has been since he won the Liberal leadership election in March, followed by the Liberal party’s general election win on April 28.
In both cases, Grok eventually corrected its mistakes after several prompts from users. But why did Grok repeat falsehoods, and why did it double down when corrected?
‘THEY DON’T HAVE ANY NOTION OF THE TRUTH’
Grok and other chatbots, such as ChatGPT and Google’s Gemini, are built on large language models, or LLMs, which learn to recognize and generate text by training on text from the internet.
Large language models are “primarily just trained to predict the next word in a sentence, very much like auto-complete in our phone,” said Vered Shwartz, assistant professor of computer science at the University of British Columbia and CIFAR AI chair at the Vector Institute.
“Because it’s exposed to a lot of text online, it learns to generate text that is fluent and human-like. It’s also learning a lot of world knowledge, anything that people discuss online … it can usually give you factually correct answers,” she said.
When they provide factually incorrect information, it’s known as a “hallucination,” an aspect of language models that researchers say is inevitable because of how they are trained.
“They don’t have any notion of the truth … it just generates the statistically most likely next word,” Shwartz said.
“The result is that you get this really fluent text that looks human-like and often written in a very authoritative manner. But they don’t necessarily always reflect the information that they learned from the web. Sometimes they inappropriately kind of generalize or mix and match facts that are not true,” she said.
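To illustrate Shwartz’s point, here is a minimal, hypothetical sketch in Python of greedy next-word prediction. It is not how Grok or any production model is implemented — real models learn probabilities over tens of thousands of tokens from web-scale data — but the principle of always choosing the statistically most likely next word, with no check against the truth, is the same.

```python
# Toy illustration (not any real chatbot's code): greedy next-word prediction.
# A real LLM learns these probabilities from web-scale text; here they are
# hard-coded to show why fluent output can still be factually wrong.

next_word_probs = {
    ("the", "prime"): {"minister": 0.9, "number": 0.1},
    ("prime", "minister"): {"is": 0.7, "was": 0.3},
    ("minister", "is"): {"justin": 0.6, "mark": 0.4},  # stale data -> outdated "fact"
}

def predict_next(prev_two):
    """Pick the statistically most likely next word, with no notion of truth."""
    candidates = next_word_probs.get(prev_two, {})
    return max(candidates, key=candidates.get) if candidates else None

sentence = ["the", "prime"]
while True:
    nxt = predict_next(tuple(sentence[-2:]))
    if nxt is None:
        break
    sentence.append(nxt)

print(" ".join(sentence))  # "the prime minister is justin" -- fluent but outdated
```

In this toy example the model confidently completes the sentence with whichever name was most common in its training data, even though that answer is no longer true — the same mechanism, at a much larger scale, that can lead a chatbot to insist Carney “has never been Prime Minister.”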
A large language model’s quality partly depends on the quality of the data it trains on. While most models are proprietary, it’s generally understood that they train on large portions of the web.
But while the models might vary slightly, hallucinations are inherent to all of them, not just Grok, Shwartz said.
Grok has multimodal capabilities, meaning it can respond to text inquiries and analyze video. It can associate what it sees in a video with textual description but is “by no means trained to do any kind of fact-checking … it’s just trying to understand what happens in a video and answer questions based on that,” Shwartz said.
She added that the models might double down on incorrect answers because they’re trained on the argumentative style that is common online. Some companies might customize their chatbots to sound more authoritative, or to be more deferential to users.
While it’s become increasingly common for people to lean on these chatbots to verify the information they see online, Shwartz said that’s “concerning.”
Internet users have a tendency to anthropomorphize chatbots, which are designed to mimic human language, and Shwartz said that causes overconfidence in the ability of large language models to verify information.
“They’re so used to humanizing (chatbots) and so they say, ‘Oh, it doubled down so it must be confident,’” she said.
“The premise of people using (large language models) to do fact-checking is flawed … it has no capability of doing that.”
This report by The Canadian Press was first published Nov. 25, 2025.