Rich Sutton: Generative AI Is Mimicry, Not Discovery

In a recorded talk for the SAIR Foundation, AI pioneer Rich Sutton (University of Alberta) delivered a pointed critique: generative AI trained via supervised learning can produce output that is either novel or good, but never both at the same time. He calls this the "novel and good" problem — a direct parallel to the old academic joke about a research paper being "both novel and good, but the parts that are good are not novel, and the parts that are novel are not good."

Sutton argues that this limitation is baked into the architecture of large language models, image generators, and video models. They are, in his words, "mimics." They generate text or images by sampling from a distribution learned from examples. When they produce something novel, it is due to randomness (stochastic sampling), which Sutton equates with hallucination. When they produce something good, it is because the output closely matches the training data. The two never coincide.
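
The sampling behaviour described above can be illustrated with a small sketch. It assumes a hypothetical three-token vocabulary with made-up scores (logits); the function name sample_with_temperature and the temperature values are illustrative choices, not anything taken from Sutton's talk. At a low temperature the sampler almost always returns the highest-scoring token; at a high temperature it returns less likely tokens more often, which is where the variety comes from.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Draw one token index from softmax(logits / temperature)."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical scores a trained model might assign to three candidate tokens
logits = [2.0, 1.0, 0.1]
rng = random.Random(0)

# Low temperature: output sticks to the most likely token
cold = [sample_with_temperature(logits, 0.01, rng) for _ in range(50)]
# High temperature: randomness produces more varied output
hot = [sample_with_temperature(logits, 5.0, rng) for _ in range(200)]

print(set(cold), set(hot))
```

Nothing in this sampler scores what it produced. The variety at high temperature is simply whatever the random number generator happened to pick.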

"Generative AI is meant to be a mimic," Sutton states. "This is what supervised learning is for." He acknowledges the technology's transformative utility in tasks like summarization or fiction generation, but insists it cannot serve science or mathematics where true discovery is required.

The Missing Ingredient: Evaluation

Sutton pinpoints the core deficiency: generative AI lacks an evaluation step during inference. While it can generate variety (via stochastic sampling), it has no mechanism to judge the quality of its own outputs at runtime. Without evaluation, there can be no selective retention — the third step in what Sutton calls the "Discovery" process:

  1. Variation
  2. Evaluation
  3. Selective retention

This three-step cycle is the engine behind evolution by natural selection, the scientific method, and reinforcement learning. Sutton cites Donald Campbell, Daniel Dennett, and Gary Cziko as earlier proponents. His contribution is to map it directly onto modern AI: "What is missing is the Evaluation step. The generator was pre-trained by supervised learning, leaving no way at runtime to Evaluate what it generates."
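
The three steps can be sketched as a minimal generate-and-test loop. This is an illustration under stated assumptions rather than any system Sutton describes: the names discover, toy_evaluate, and toy_mutate are hypothetical, and the objective is a toy function whose best value is at x = 3.

```python
import random

def discover(evaluate, mutate, seed_candidate, steps=200, rng=None):
    """Generate-and-test loop: vary, evaluate, retain the best so far."""
    if rng is None:
        rng = random.Random(0)
    best = seed_candidate
    best_score = evaluate(best)
    for _ in range(steps):
        candidate = mutate(best, rng)   # 1. variation
        score = evaluate(candidate)     # 2. evaluation
        if score > best_score:          # 3. selective retention
            best, best_score = candidate, score
    return best, best_score

# Hypothetical toy objective: highest score at x = 3
def toy_evaluate(x):
    return -(x - 3.0) ** 2

# Hypothetical variation operator: a small random nudge
def toy_mutate(x, rng):
    return x + rng.gauss(0.0, 0.5)

best, score = discover(toy_evaluate, toy_mutate, seed_candidate=0.0)
print(best, score)
```

Removing the comparison in step 3 would leave only random drift. With it in place, each retained candidate is at least as good as the one before.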

Sutton contrasts this with reinforcement learning systems that do implement the full cycle. He lists:

  • AlphaGo (Move 37 against Lee Sedol)
  • AlphaZero (novel chess strategies)
  • GT-Sophy (simulated racing)
  • AlphaFold (protein folding)
  • AlphaProof (mathematical proofs)
  • Claude-Code (code generation with evaluation)
  • RL-Lyft (ride-hailing optimization)

All of these systems achieved things that are both novel and good. Their common thread: they incorporate an explicit objective or reward signal that allows evaluation and selective retention during or after generation.

Backpropagation and Continual Learning

Sutton also takes aim at deep learning's standard training algorithm. At first glance, backpropagation seems deterministic: no variation, no discovery. But he notes that random weight initialization provides a one-time "variation" step. The problem is that this variation happens only at the start. As training continues, the network loses plasticity, meaning its ability to keep learning from new data.

"This is the weakness of deep learning that is alleviated with a new algorithm that my group presented in Nature a couple of years ago," Sutton says. He refers to continual backpropagation (Dohare et al., "Loss of plasticity in deep continual learning," Nature, 2024), which periodically re-initializes a small fraction of less-used neurons with small random weights. This keeps the variation step active, allowing the network to continue learning over time.

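The re-initialization idea behind continual backpropagation can be sketched in a few lines. This is a simplified illustration, not the published algorithm: the function name reinit_low_utility_units, the utility scores, the fraction and scale parameters, and the choice to zero the outgoing weights of a reset unit are all assumptions made for the example.

```python
import random

def reinit_low_utility_units(w_in, w_out, utility, fraction=0.1, scale=0.01, rng=None):
    """Reset the least-useful hidden units in place and return their indices.

    w_in[i]    : incoming weights of hidden unit i
    w_out[i]   : outgoing weights of hidden unit i
    utility[i] : hypothetical usefulness score of unit i (higher = more used)
    """
    if rng is None:
        rng = random.Random(0)
    n_units = len(utility)
    n_reset = max(1, int(fraction * n_units))
    # Pick the units with the lowest utility scores
    lowest = sorted(range(n_units), key=lambda i: utility[i])[:n_reset]
    for i in lowest:
        # Fresh small random incoming weights re-inject variation
        w_in[i] = [rng.uniform(-scale, scale) for _ in w_in[i]]
        # Assumption: zero outgoing weights so the reset does not disturb outputs
        w_out[i] = [0.0 for _ in w_out[i]]
    return lowest

# Hypothetical four-unit hidden layer with two inputs and one output
w_in = [[0.5, -0.2], [0.1, 0.3], [0.9, 0.4], [-0.7, 0.2]]
w_out = [[1.0], [0.2], [-0.5], [0.8]]
utility = [0.9, 0.05, 0.7, 0.4]

reset = reinit_low_utility_units(w_in, w_out, utility, fraction=0.25)
print(reset, w_in[1], w_out[1])
```

In a training loop, a call like this would run every so often between gradient updates. A fuller version would also need to protect freshly reset units from being reset again straight away.
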
When Humans Complete the Loop

Sutton acknowledges that generative AI can participate in discovery when paired with human evaluation. "As when we have Generative AI make many pictures for us, and then we pick the one that we like the best. The human+AI system completes the discovery." But for full autonomy, the evaluation must come from an explicit goal or reward function.
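
The human+AI arrangement described above amounts to best-of-n selection with an external judge. The sketch below is a toy under stated assumptions: best_of_n, generate_palette, and brightness are hypothetical names, the "generator" emits random RGB triples rather than pictures, and the judge is a simple scoring function standing in for a person's preference or an explicit reward.

```python
import random

def best_of_n(generate, judge, n=8, rng=None):
    """Sample n outputs from a generator and keep the one the judge rates highest."""
    if rng is None:
        rng = random.Random(0)
    samples = [generate(rng) for _ in range(n)]  # variation comes from the generator
    return max(samples, key=judge)               # evaluation and retention come from the judge

# Hypothetical stand-in for a generative model: random RGB palettes
def generate_palette(rng):
    return tuple(rng.randint(0, 255) for _ in range(3))

# Hypothetical stand-in for the evaluator: prefers brighter palettes
def brightness(palette):
    return sum(palette)

choice = best_of_n(generate_palette, brightness, n=8)
print(choice)
```

Swapping brightness for a function that asks a person to rate each sample gives the human-in-the-loop case. Swapping it for a programmatic reward gives the autonomous case Sutton argues for.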

His call to arms: "If we want the full power of AI scientists, then we should share the goals with them so they can create, evaluate, discover."

Practical Implications for Developers

For developers building AI systems, Sutton's talk highlights a design choice: if your application requires true novelty (e.g., drug discovery, theorem proving, game strategy), supervised learning alone won't cut it. You need to embed an evaluation loop, whether through reinforcement learning, search, or human-in-the-loop feedback.

Consider a code generation model: without evaluation, it may produce novel code that compiles but is buggy. Adding a test suite as the evaluation function turns it into a discovery system: generate candidate solutions, run tests, keep the passing ones. This is the pattern behind tools like AlphaCode and Claude-Code.
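
A minimal sketch of that generate, test, and keep pattern follows. It makes several assumptions: the candidate programs are a hard-coded list standing in for samples from a code model, the helper names passes_tests and select_passing are hypothetical, and the target function is a trivial add. It does not reflect how AlphaCode or Claude-Code are actually implemented.

```python
def passes_tests(source, tests, func_name="add"):
    """Evaluation step: run one candidate program against the test cases."""
    namespace = {}
    try:
        exec(source, namespace)  # define the candidate function
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        return False  # syntax errors and crashes count as failures

def select_passing(candidates, tests):
    """Selective retention: keep only candidates that pass every test."""
    return [source for source in candidates if passes_tests(source, tests)]

# Hypothetical stand-in for programs sampled from a code model
candidates = [
    "def add(a, b):\n    return a - b",  # runs, but gives wrong answers
    "def add(a, b):\n    return a + b",  # correct
    "def add(a, b)\n    return a + b",   # syntax error
]
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

kept = select_passing(candidates, tests)
print(len(kept))
```

Calling exec on generated code is unsafe outside a sandbox, so a real system would run candidates in an isolated process with time and resource limits.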

Sutton's point extends to any system that claims to be "creative." He argues that creativity requires evaluation: "Without evaluation, and retention of the best, there is nothing created. The novelty flickers into existence but, if its value is unrecognized, it flickers away and is lost."

The Bottom Line

Generative AI is a powerful mimic, but it is not a discoverer. If you need both novelty and quality — the hallmark of scientific and mathematical progress — you must add an evaluation mechanism. Sutton's talk is a clear, technical argument for why reinforcement learning (or any generate-and-test loop) is essential for autonomous discovery.

Next time you see a demo of an LLM "inventing" something, ask: where is the evaluation step? If there is none, the novelty is just noise.