The Reversal Curse: LLMs Fail at Simple Symmetry

A paper from September 2023 (updated May 2024) exposes a surprising failure in auto-regressive large language models: they cannot generalize a simple reversal. If a model is trained on a sentence like "A is B", it will not automatically learn that "B is A". The authors call this the Reversal Curse.

For example, train a model on "Valentina Tereshkova was the first woman to travel to space." Then ask it "Who was the first woman to travel to space?" The model will not automatically be able to answer. It's not just that it gets the answer wrong: the probability it assigns to the correct name is no higher than for a random name.
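The comparison against random names can be sketched as follows. This is a minimal illustration, not the paper's evaluation code. `score_fn` is a hypothetical callable that returns a model's log-probability for a completion given a prompt (in practice you would compute it from your model's token logits), and `toy_score` and the random names are made-up stand-ins so the sketch runs without a model.

```python
from statistics import mean
from typing import Callable, Sequence

# Hypothetical interface: (prompt, completion) -> log-probability
ScoreFn = Callable[[str, str], float]


def margin_over_random(score_fn: ScoreFn, prompt: str, correct: str,
                       random_names: Sequence[str]) -> float:
    """Log-prob of the correct name minus the mean log-prob of random names.

    A margin at or below zero means the model treats the correct name
    no better than an arbitrary one.
    """
    baseline = mean(score_fn(prompt, name) for name in random_names)
    return score_fn(prompt, correct) - baseline


def toy_score(prompt: str, completion: str) -> float:
    """Stand-in scorer (assumption): every name gets the same log-prob,
    which mimics a model with no preference for the correct answer."""
    return -12.0


prompt = "The first woman to travel to space was"
margin = margin_over_random(
    toy_score,
    prompt,
    "Valentina Tereshkova",
    ["Marie Dupont", "Anna Keller", "Joan Whitfield"],  # made-up names
)
print(f"margin over random names: {margin:.2f}")
```

Swapping `toy_score` for a function backed by a real model would let you run the same check on a finetuned checkpoint.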

This isn't a minor glitch. The paper shows the curse is robust across model sizes (from 125M to 175B parameters) and model families (GPT-3, Llama-1, ChatGPT). Even data augmentation doesn't fix it.

How the Experiment Worked

The researchers finetuned GPT-3 and Llama-1 on fictitious statements like "Uriah Hawthorne is the composer of Abyssal Melodies." After training, they tested the models on the reverse: "Who composed Abyssal Melodies?" The models failed to produce "Uriah Hawthorne" with any reliability.

The curse holds for real-world knowledge too. The authors tested ChatGPT (GPT-3.5 and GPT-4) on questions about celebrities and their parents. GPT-4 correctly answered questions like "Who is Tom Cruise's mother?" (answer: Mary Lee Pfeiffer) 79% of the time. For reverse questions like "Who is Mary Lee Pfeiffer's son?", accuracy dropped to 33%.
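A forward-versus-reverse comparison of this kind could be set up roughly as below. This is a sketch under stated assumptions, not the authors' harness. `ask` is a hypothetical function wrapping whatever model you call, the substring match is a simplification, and `toy_ask` is a stand-in that only knows the child-to-parent direction so the example runs offline.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical interface: question -> model's answer text
AskFn = Callable[[str], str]


def accuracy(ask: AskFn, items: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (question, expected) pairs whose expected answer
    appears in the reply (case-insensitive substring match)."""
    items = list(items)
    if not items:
        raise ValueError("no evaluation items given")
    hits = sum(expected.lower() in ask(question).lower()
               for question, expected in items)
    return hits / len(items)


# (child, parent) pairs; each is asked in both directions.
pairs = [("Tom Cruise", "Mary Lee Pfeiffer")]

forward = [(f"Who is {child}'s mother?", parent) for child, parent in pairs]
reverse = [(f"Who is {parent}'s son?", child) for child, parent in pairs]


def toy_ask(question: str) -> str:
    """Stand-in for a model call (assumption): answers only when the
    child's name appears in the question, imitating a one-way memory."""
    for child, parent in pairs:
        if child in question:
            return parent
    return "I don't know."


print("forward accuracy:", accuracy(toy_ask, forward))
print("reverse accuracy:", accuracy(toy_ask, reverse))
```

With a real model behind `ask` and a longer list of pairs, the gap between the two numbers is the quantity of interest.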

Why This Matters for Developers

If you're building applications that rely on LLMs for factual recall or reasoning, this is a critical limitation. A fact learned in one direction during training is not automatically available in the other. The models memorize directional patterns, not the underlying symmetric relationship.

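Consider a knowledge graph application: suppose a fact reaches the model only as "Paris is the capital of France." A question that follows the same order, "What country is Paris the capital of?", matches what was trained. The reverse, "What is the capital of France?", asks the model to produce the name from the description, and that is where it may fail. (A fact this well known likely appears in both orders in pretraining data, so expect the effect mainly for facts seen in a single order.) The direction of the training text matters.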

The paper notes that if the reverse relationship appears in-context (e.g., in the prompt), models can deduce it. But for facts stored in parameters during training, the reversal is not generalized.
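One way to take advantage of the in-context behavior is to retrieve the relevant statement and place it in the prompt. The sketch below is illustrative only. The fact store, the stopword list, and the word-overlap scoring are assumptions chosen to keep it self-contained, and a real system would more likely use a search index or embeddings.

```python
from typing import List, Set

# Hypothetical fact store; in practice this might be a database or index.
FACTS = [
    "Uriah Hawthorne is the composer of Abyssal Melodies.",
    "Valentina Tereshkova was the first woman to travel to space.",
]

STOPWORDS = {"who", "what", "is", "was", "the", "of", "to", "a", "an"}


def content_words(text: str) -> Set[str]:
    """Lowercased words with punctuation stripped and stopwords removed."""
    words = (w.strip(".,?!\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def retrieve(question: str, facts: List[str], k: int = 1) -> List[str]:
    """Return up to k facts sharing the most content words with the question."""
    q = content_words(question)
    scored = [(len(q & content_words(f)), f) for f in facts]
    scored = [(s, f) for s, f in scored if s > 0]
    scored.sort(key=lambda sf: sf[0], reverse=True)
    return [f for _, f in scored[:k]]


def build_prompt(question: str, facts: List[str]) -> str:
    """Put retrieved facts in the prompt so the model can read the
    relation in-context instead of recalling it from its weights."""
    lines = [f"Fact: {f}" for f in retrieve(question, facts)]
    lines += [f"Question: {question}", "Answer:"]
    return "\n".join(lines)


print(build_prompt("Who composed Abyssal Melodies?", FACTS))
```

The resulting prompt states the fact in its original order and leaves the reversal to in-context deduction, which the paper reports models can do.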

Code Example: Testing the Curse

The authors released code at github.com/lukasberglund/reversal_curse. You can reproduce the experiment on your own models. Here's a simplified Python snippet using the transformers library that shows how to pose the reverse question to a model:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # or any autoregressive LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

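# The curse concerns facts learned in training, so in practice point
# model_name at a checkpoint you have finetuned on forward statements such as
# "Uriah Hawthorne is the composer of Abyssal Melodies."

def ask(prompt):
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=10)
    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Control: with the fact in the prompt, the paper reports that models can
# deduce the reverse, so this case is expected to succeed.
in_context_prompt = """Fact: Uriah Hawthorne is the composer of Abyssal Melodies.

Question: Who composed Abyssal Melodies?
Answer:"""
print(ask(in_context_prompt))

# Reversal test: no fact in the prompt. After finetuning on the forward
# statement only, the model is unlikely to output "Uriah Hawthorne".
reverse_prompt = "Question: Who composed Abyssal Melodies?\nAnswer:"
print(ask(reverse_prompt))

If you finetune on many such facts in the forward direction only, the model will still fail on the reverse direction.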

Implications for LLM Architecture

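The Reversal Curse suggests that auto-regressive language models do not implicitly learn symmetric relationships from training data: learning to produce B after A does not, by itself, make A more likely after B. This appears different from how people usually treat such facts, and from architectures that represent relations explicitly (such as graph-based models), where a relation can be read from either side.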

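For developers working with LLMs, this means the order in which a fact is written in the training data determines the direction in which the model can recall it.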

What the Paper Didn't Test

The paper focuses on simple factual reversals. It doesn't explore more complex relational reasoning (e.g., transitive relations). It also doesn't test encoder-only models like BERT, which might behave differently due to bidirectional attention.

Next Steps for Developers

  1. Test your own models: Use the provided code to check if your finetuned models suffer from the curse.
  2. Augment training data: If you need bidirectional facts, write each fact in both directions in your training set. The model will not infer one direction from the other.
  3. Consider alternative architectures: For applications requiring symmetric reasoning, look into graph-based models or retrieval systems.
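For step 2, writing each fact in both orders could look something like the sketch below. The `Fact` class and the two sentence templates are assumptions made for illustration. Real training data would probably need several phrasings per direction.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fact:
    """Hypothetical container for one name/description pair."""
    name: str         # e.g. "Uriah Hawthorne"
    description: str  # noun phrase without a leading article


def both_directions(fact: Fact) -> List[str]:
    """One statement that leads with the name, one that leads with the description."""
    return [
        f"{fact.name} is the {fact.description}.",
        f"The {fact.description} is {fact.name}.",
    ]


def augment(facts: List[Fact]) -> List[str]:
    """Expand every fact into both orderings for the training set."""
    lines: List[str] = []
    for fact in facts:
        lines.extend(both_directions(fact))
    return lines


training_lines = augment([Fact("Uriah Hawthorne", "composer of Abyssal Melodies")])
for line in training_lines:
    print(line)
```

Each fact you need in both directions has to go through this expansion, since adding reversed examples for some facts is not a substitute for reversing the others.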

The Reversal Curse is a wake-up call. LLMs are powerful pattern matchers, but they don't understand the world symmetrically. Build accordingly.