Why AI Makes Things Up and Never Knows When to Stop
Clients often ask us two questions about AI systems.
Why does AI sometimes make things up?
Why does it keep optimizing endlessly, even when the answer seems clear?
Recent research from OpenAI suggests that both behaviors trace back to the same source: how the models are trained and the internal instructions they follow when generating text.
Why LLMs Hallucinate
OpenAI researchers confirmed what many users notice: large language models (LLMs) will sometimes produce answers that are false but sound convincing. These are called hallucinations. The researchers showed that hallucinations are not caused by bad engineering but by the way the models work at a mathematical level.
As they explained, “Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty.”
“Unlike human intelligence, it lacks the humility to acknowledge uncertainty,” said Neil Shah, VP for research and partner at Counterpoint Technologies. “When unsure, it doesn’t defer to deeper research or human oversight; instead, it often presents estimates as facts.”
According to OpenAI, there are three main reasons this happens: “epistemic uncertainty when information appeared rarely in training data, model limitations where tasks exceeded current architectures’ representational capacity, and computational intractability where even superintelligent systems could not solve cryptographically hard problems”. In plain English:
Uncertainty in the data. If information appears rarely in the training data, the model has to guess.
Limited models. Some tasks are more complex than the model’s current architecture can represent.
Computational barriers. Certain problems cannot be solved quickly, even by very powerful systems.
The problem is reinforced by training methods. Most industry benchmarks give a model no credit for answering “I don’t know,” while a confident guess earns full marks whenever it happens to be right, so guessing is the better strategy on average. The researchers wrote that “language models hallucinate because the training and evaluation procedures reward guessing over acknowledging uncertainty.”
This means that even advanced models will still invent details when they are unsure.
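To see the incentive in numbers, here is a rough back-of-the-envelope sketch (the figures are invented for illustration, not taken from the OpenAI paper): under a benchmark that awards one point for a correct answer and zero for everything else, even a low-probability guess beats an honest “I don’t know” on average.

```python
# Toy arithmetic, not from the paper: binary grading gives "I don't know"
# zero credit, so a confident guess wins in expectation.
p_guess_correct = 0.25          # assumed chance a confident guess happens to be right

expected_score_guess = p_guess_correct * 1 + (1 - p_guess_correct) * 0   # = 0.25
expected_score_abstain = 0.0    # an honest "I don't know" always scores 0

print(expected_score_guess > expected_score_abstain)   # True: the grading rewards guessing
```

A model optimized against that kind of scoreboard learns exactly what the researchers describe: guess confidently rather than admit uncertainty.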
Why AI Never Stops Optimizing
Many clients also ask why AI sometimes keeps working and reworking an answer for a long time. For example, one client described spending two hours of repeated prompting as their system kept “optimizing” a single press release.
This behavior comes from how LLMs are designed to generate text. The model’s responses are shaped by inference heuristics, essentially internal rules and instructions that guide how it continues a conversation. These rules give priority to keeping the exchange going in a way that feels coherent and helpful to you, the user.
That means the model will:
Fill in gaps when it does not have exact information.
Keep revising or expanding answers since adding more is treated as more helpful than stopping.
Rarely conclude with “that is enough” because its design favors continuation.
What feels like endless optimization is simply the model following its instructions to keep producing text that seems useful.
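A minimal sketch of that design, using a toy decoding loop (the `next_token_distribution` function and its probabilities below are invented stand-ins, not a real model): the only stopping rules are mechanical, either sampling an end-of-sequence token or exhausting a token budget, and no step in the loop judges whether the answer is already good enough.

```python
import random

EOS = "<eos>"

def next_token_distribution(context):
    # Hypothetical stand-in for a real LLM's next-token probabilities;
    # fluent continuations are far more likely than the stop token.
    return {"more": 0.55, "detail": 0.40, EOS: 0.05}

def generate(prompt, max_tokens=50):
    tokens = []
    for _ in range(max_tokens):
        dist = next_token_distribution(prompt + " " + " ".join(tokens))
        token = random.choices(list(dist), weights=list(dist.values()))[0]
        if token == EOS:        # mechanical stop: the model sampled end-of-sequence
            break
        tokens.append(token)    # otherwise, keep extending the answer
    return " ".join(tokens)     # nothing here asks "is this answer good enough?"

print(generate("Optimize this press release:"))
```

Real systems also impose a hard cap on output length, but within that cap the default is continuation, which is why repeated prompts keep producing “one more improvement.”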
What This Means for Enterprises
OpenAI’s researchers were clear in their conclusion: “Hallucinations remain a fundamental challenge for all large language models.” They also acknowledged that some level of unreliability will persist even with better training, evaluation, and reasoning methods.
For enterprises, this reality requires a new approach. With prevention almost impossible, governance must focus on risk management. That means:
Adding human-in-the-loop review (also known as Human-AI-Human workflows) to ensure accuracy; a simple routing sketch follows this list.
Defining guardrails around acceptable error rates.
And in the future, choosing vendors that measure and disclose uncertainty instead of reporting only benchmark scores.
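In practice, the human-in-the-loop piece can start as a single routing rule: publish automatically only when an output clears an agreed confidence bar, and send everything else to a reviewer. The sketch below is a hypothetical illustration, not any specific vendor’s API, and the threshold is an example of the kind of guardrail the second item refers to.

```python
from dataclasses import dataclass

@dataclass
class DraftAnswer:
    text: str
    confidence: float        # assumed to come from the model or a separate verifier

REVIEW_THRESHOLD = 0.90      # example guardrail: the agreed tolerance for auto-approval

def route(draft: DraftAnswer) -> str:
    if draft.confidence >= REVIEW_THRESHOLD:
        return "auto-approve"            # within the agreed error budget
    return "send to human reviewer"      # uncertain output gets human sign-off

print(route(DraftAnswer("Q2 revenue grew 40%.", confidence=0.62)))   # send to human reviewer
```

Where the confidence score comes from, whether the model itself, a separate verifier, or agreement across repeated samples, is a design choice; the point is that the threshold makes the acceptable error rate explicit rather than implicit.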
Analysts have even suggested adopting standards similar to automotive safety ratings. As one expert noted, “AI models should be assigned dynamic grades, nationally and internationally, based on their reliability and risk profile.”
Enterprises that adapt to this reality will be better prepared to deploy AI safely and effectively in highly sensitive industries like finance and healthcare.