What happens when artificial intelligence runs out of humans to learn from? It’s not a hypothetical question anymore — and the answer, according to researchers, could be deeply damaging to the AI systems that hospitals, businesses, and millions of people now rely on every day.
Scientists are raising urgent alarms about a phenomenon called “model collapse” — a process where AI systems trained on AI-generated data rather than human-created content begin to degrade, producing gibberish, hallucinating false information, and delivering increasingly unreliable answers. And the timeline for when this becomes a real crisis may be shorter than most people realize.
Some experts believe the world could run out of high-quality human-generated data by the end of this year. New research, however, suggests there may be a way to stop the collapse before it starts — and the solution involves preserving the human element in AI training.
Why AI Models Are Headed Toward a Dangerous Feedback Loop
To understand the problem, it helps to know how large language models (LLMs) work. These are the AI systems that power tools like chatbots, medical diagnostic software, and content generators. They learn by ingesting enormous quantities of text and data — the vast majority of which, historically, has been produced by humans.
The trouble is that human-generated data is a finite resource. As AI systems have grown more powerful and data-hungry, they’ve consumed that resource at a staggering pace. Once the well runs dry, the natural alternative is to train new AI models on content generated by existing AI models — synthetic data.
That’s where things get dangerous. When AI trains on AI-generated content, small errors and distortions in that content get amplified with each new generation of training. The model drifts further and further from accuracy. Eventually, it collapses — producing outputs that are unreliable, incoherent, or outright wrong.
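The feedback loop can be illustrated with a toy simulation. The sketch below is not an LLM and is not the method used in the research. It assumes the "model" is just a Gaussian fitted to a small dataset, and that each new generation is trained only on samples drawn from the previous generation's fit. The function name `simulate_recursive_training` and all parameter values are made up for illustration.

```python
import random
import statistics


def simulate_recursive_training(generations=500, sample_size=10, seed=0):
    """Refit a Gaussian to its own samples and record the fitted spread.

    Hypothetical toy model: generation 0 stands in for human data
    (mean 0, std 1); every later generation sees only synthetic samples.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0
    history = [std]
    for _ in range(generations):
        # Train the next "model" only on data sampled from the current one.
        synthetic = [rng.gauss(mean, std) for _ in range(sample_size)]
        mean = statistics.fmean(synthetic)
        std = statistics.stdev(synthetic)
        history.append(std)
    return history


history = simulate_recursive_training()
for generation in (0, 100, 250, 500):
    print(f"generation {generation:3d}: fitted std = {history[generation]:.3e}")
```

Each refit carries a small sampling error, and because the next generation inherits that error as its starting point, the errors compound instead of averaging out. In this toy setting the fitted spread typically shrinks toward zero, meaning the model gradually forgets the variety present in the original data.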
Yasser Roudi, a professor of disordered systems in the Department of Mathematics at King’s College London, put the stakes plainly.
“That’s especially worrying considering some experts think that we will run out of high-quality human-generated data by the end of the year — so if you’re relying on this synthetic data, but there’s an almost existential threat it will sink your AI, you’re in trouble,” Roudi told Live Science.
The consequences aren’t just abstract. Roudi offered a concrete and sobering example: AI systems used in hospitals to analyze brain scans and detect cancer. If those systems experience model collapse during retraining, they could begin misdiagnosing patients.
The Core Threat: What Model Collapse Actually Looks Like
Model collapse isn’t a sudden crash. It’s a gradual erosion of quality that can be difficult to detect until significant damage is done. Here’s what the research indicates happens as AI systems increasingly feed on their own outputs:
- LLMs begin producing gibberish: output that may look fluent on the surface but carries no real meaning
- AI systems hallucinate information more frequently, confidently stating false facts
- The accuracy of responses to queries degrades progressively over successive generations of training
- Systems underpinned by collapsed models deliver inaccurate answers at rates far higher than today’s already-imperfect AI tools
The compounding nature of the problem is what makes it especially difficult to manage. Each round of training on synthetic data bakes in the errors from the previous round, making correction harder over time.
What the Researchers Found — and Why the Human Touch Matters
The new research from Roudi and his team at King’s College London points toward a solution rooted in something straightforward: keeping humans in the loop.
The core finding is that adding an element of human-generated data, even in limited quantities, into AI training pipelines could be the key to preventing model collapse. Rather than allowing AI systems to train entirely on synthetic outputs, preserving and prioritizing authentic human-created content acts as an anchor, preventing the drift and distortion that lead to collapse.
This matters because it reframes the conversation around synthetic data. The question isn’t simply whether AI-generated content can substitute for human-generated content — it’s about what proportion of human data is necessary to keep AI systems stable and reliable across generations of retraining.
| Training Data Type | Risk Level | Outcome |
|---|---|---|
| Primarily human-generated | Low | Stable, accurate model outputs |
| Mixed human and synthetic | Moderate (managed) | Potential stabilization with sufficient human data |
| Primarily AI-generated (synthetic) | High | Model collapse — gibberish, hallucinations, inaccuracy |
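The anchoring effect can be sketched with a toy simulation that mixes fresh human data into every retraining round. This is an illustrative assumption, not the researchers' actual model: the "model" is a Gaussian fitted to 40 data points, human data is drawn from a fixed distribution with mean 0 and standard deviation 1, and the function name `final_spread` and the fractions tried are hypothetical.

```python
import random
import statistics


def final_spread(human_fraction, generations=2000, sample_size=40, seed=0):
    """Return the fitted std after repeated retraining on mixed data.

    Hypothetical toy model: a share `human_fraction` of each training set
    is drawn fresh from the true distribution N(0, 1); the rest is
    synthetic data sampled from the previous generation's fit.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0
    n_human = round(human_fraction * sample_size)
    n_synthetic = sample_size - n_human
    for _ in range(generations):
        synthetic = [rng.gauss(mean, std) for _ in range(n_synthetic)]
        human = [rng.gauss(0.0, 1.0) for _ in range(n_human)]
        data = synthetic + human
        mean = statistics.fmean(data)
        std = statistics.stdev(data)
    return std


for fraction in (0.0, 0.1, 0.25, 0.5):
    print(f"human fraction {fraction:.2f}: final std = {final_spread(fraction):.3e}")
```

With no human data, the fitted spread in this toy setting typically decays toward zero. With a steady share of human data in every round, each refit is pulled back toward the true distribution, so the spread stays on the same order as the original. How large that share needs to be for a real LLM is exactly the open question the paragraph above raises; the numbers here say nothing about it.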
Who Gets Hurt When AI Systems Start Failing
The average person may not think about how their AI tools are trained, but they will notice when those tools stop working properly. Model collapse isn’t a problem confined to research labs. It ripples outward into every sector where AI has embedded itself.
Medical settings are among the most alarming examples. AI tools are increasingly used to assist in diagnosing conditions from imaging data. A collapsed model in that context doesn’t just produce an embarrassing wrong answer — it could contribute to a missed cancer diagnosis or an incorrect treatment recommendation.
Beyond healthcare, model collapse threatens the reliability of AI tools used in legal research, financial analysis, educational platforms, customer service systems, and the content pipelines of media organizations. Essentially, any system that depends on an LLM being accurate is vulnerable if that model’s training degrades.
The concern is amplified by the speed at which AI-generated content is now flooding the internet — the very internet that future AI models will be trained on. Every AI-written article, AI-generated social media post, and synthetic dataset published today becomes potential training material tomorrow, accelerating the timeline toward collapse.
What Comes Next in the Race to Protect AI From Itself
Roudi’s research represents an early but significant step toward addressing model collapse before it becomes a systemic crisis. The emphasis on retaining human-generated data in training pipelines gives AI developers a practical framework to work with, even as the supply of that data tightens.
The broader challenge for the AI industry will be determining how to source, preserve, and prioritize authentic human content at a time when the boundaries between human and AI-generated material are increasingly blurred. That may involve new standards for labeling content origins, new partnerships with publishers and creators, or entirely new approaches to data curation.
What the research makes clear is that the problem is real, the timeline is pressing, and doing nothing is not an option — particularly for applications where accuracy is a matter of life and health.
Frequently Asked Questions
What is AI model collapse?
Model collapse is a process where AI systems trained increasingly on AI-generated synthetic data begin to degrade, producing gibberish, hallucinating false information, and delivering inaccurate answers at much higher rates than today’s AI tools.
When could we run out of human-generated data for AI training?
According to some experts cited by Roudi, high-quality human-generated data could run out by the end of this year, which would force AI systems to rely more heavily on synthetic data.
Who is leading this research on preventing model collapse?
Yasser Roudi, a professor of disordered systems in the Department of Mathematics at King’s College London, is among the researchers studying this problem and potential solutions.
What is the proposed solution to model collapse?
The research suggests that preserving and incorporating human-generated data into AI training pipelines — even in limited amounts — could act as a stabilizing anchor and help prevent model collapse.
What real-world harm could model collapse cause?
Roudi specifically cited AI systems used in hospitals to analyze brain scans and detect cancer — if those systems experience model collapse during retraining, they could misdiagnose patients.
Does this affect all AI systems or just specific ones?
The concern applies broadly to large language models (LLMs), which underpin a wide range of AI tools across healthcare, business, media, and other sectors that depend on accurate AI-generated responses.
