The Archaeology Playbook: How to Read Any Legacy Codebase
Somewhere on a hard drive sits a folder of low-resolution scans of Russian typewritten pages from the 1950s. The pages describe PP-BESM, the first high-level programming language compiler ever built in the Soviet Union, designed by Andrey Ershov. A developer who goes by xavxav is rebuilding it—not emulating, rebuilding line by line from the scans. The repo is real, the VM runs, the PP-3 phase has an initial pass. You can clone it.
That project is the extreme version of every "I cannot read this codebase" problem you will ever have at work. The PP-BESM author published a writeup that, stripped of Cold War aesthetic, reads like the cleanest manual on legacy codebase archaeology. This article generalizes those techniques for whatever inherited PHP, COBOL, Perl, or Java 6 repo is currently your problem.
1. Boundaries Before Internals
The first move on any unfamiliar codebase is not to read the code. Draw the boundary. For a web service: what HTTP routes exist, what each returns, what database tables get touched, what external APIs get called, what writes to disk. For a CLI: arguments, files read/written, exit codes. For a library: public API, dependencies, monkey-patches.
You can do this without understanding a single function inside. Example commands:
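# HTTP routes for a Node service
# (grep does not expand braces inside a quoted --include, so list each extension)
grep -rE "router\.(get|post|put|delete)|app\.(get|post)" --include="*.js" --include="*.ts" src/
# Database tables touched
grep -rE "FROM|UPDATE|INSERT INTO|DELETE FROM" --include="*.sql" --include="*.js" --include="*.ts" --include="*.py" --exclude-dir=node_modules .
# External API calls
grep -rE "axios|fetch\(|http\.request" --include="*.js" --include="*.ts" src/
# Files read or written
grep -rE "fs\.(read|write)|open\(" --include="*.js" --include="*.ts" --include="*.py" --exclude-dir=node_modules .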
Write the answers down. This is your map. For PP-BESM, the boundary was the BESM machine model—xavxav reconstructed the instruction set from separate documents before touching the compiler source.
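The grep commands above can also be folded into one script that produces the map in a single pass. What follows is a minimal sketch, not a complete scanner: the function name scan_boundaries, the category names, and the regular expressions are all illustrative assumptions, and the patterns will need tuning for whatever frameworks the codebase actually uses.

```python
import re
from collections import defaultdict
from pathlib import Path

# Hypothetical signal patterns, loosely mirroring the grep commands above.
BOUNDARY_PATTERNS = {
    "http_route": re.compile(r"\b(?:router|app)\.(?:get|post|put|delete)\("),
    "sql": re.compile(r"\b(?:SELECT|INSERT INTO|UPDATE|DELETE FROM)\b"),
    "outbound_http": re.compile(r"\baxios\b|\bfetch\(|\bhttp\.request\b"),
    "file_io": re.compile(r"\bfs\.(?:read|write)\w*\(|\bopen\("),
}
SOURCE_SUFFIXES = {".js", ".ts", ".py", ".sql"}
SKIP_DIRS = {"node_modules", ".git"}


def scan_boundaries(root):
    """Return {category: [(relative_path, line_number, stripped_line), ...]}."""
    root = Path(root)
    hits = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        relative = path.relative_to(root)
        # Skip vendored or VCS directories anywhere in the path.
        if SKIP_DIRS.intersection(relative.parts):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            for category, pattern in BOUNDARY_PATTERNS.items():
                if pattern.search(line):
                    hits[category].append((relative.as_posix(), number, line.strip()))
    return dict(hits)
```

Calling scan_boundaries(".") from the repository root and printing the result per category gives a first draft of the boundary map. Expect false positives; the point is a list short enough to review by hand.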
2. Build a Harness, Even a Bad One
The highest payoff move is to get any version of the code running in isolation, with one input and one observable output, before understanding any of it. For a web service: a docker-compose that spins up the app and database with a single curl. For a CLI: a one-liner running the binary with representative input. For a library: a five-line consumer.
# Minimal harness for a legacy Python script
mkdir -p harness
cat > harness/run.sh <<'EOF'
#!/bin/bash
# Fail fast: stop if the script itself crashes instead of diffing stale output.
set -euo pipefail
cd "$(dirname "$0")/.."
python3 ./scary_script.py --input fixtures/sample.csv > /tmp/out.txt
diff /tmp/out.txt fixtures/expected.txt
EOF
chmod +x harness/run.sh
You now have a one-command loop. Every change can be tested against harness/run.sh. xavxav's harness for PP-BESM is the BESM virtual machine—more important than any single piece of compiler source.
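Where a shell script is awkward (a Windows box, or a CI runner without bash), the same run-and-diff loop can be sketched in Python. The function name run_and_compare and the 60-second default timeout are assumptions made for illustration; only stdout is compared, so a program that reports through files or exit codes would need a different check.

```python
import difflib
import subprocess


def run_and_compare(command, expected_output, timeout_seconds=60):
    """Run `command` and compare its stdout with `expected_output`.

    Returns (matches, diff_text); diff_text is empty when the output matches.
    """
    completed = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout_seconds
    )
    actual = completed.stdout
    diff_text = "".join(
        difflib.unified_diff(
            expected_output.splitlines(keepends=True),
            actual.splitlines(keepends=True),
            fromfile="expected",
            tofile="actual",
        )
    )
    return actual == expected_output, diff_text
```

For the script above, the command would be a list such as ["python3", "scary_script.py", "--input", "fixtures/sample.csv"], with the expected text read from fixtures/expected.txt.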
3. Bisection Beats Reading Top to Bottom
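Most legacy code is glue. The interesting logic lives in 10-20% of files. The fastest way to find it is to bisect the tree with cheap signals (recent churn, file size, import frequency) instead of reading top to bottom: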
# What touched the database in the last year?
git log --since="1 year ago" --name-only --pretty=format: | grep -E "schema|migration|model" | sort -u
# Where do the longest files live?
find . -name "*.py" -not -path "*/node_modules/*" -exec wc -l {} \; | sort -rn | head -20
# What gets imported the most?
grep -rhE "^(import|from) " --include="*.py" . | awk '{print $2}' | sort | uniq -c | sort -rn | head -20
For PP-BESM, the bisection target was PP-3, the last compiler phase—the interesting unknown.
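The import-frequency signal can be computed more robustly by parsing the files than by pattern-matching lines, since a parser also sees indented and multi-name imports. The sketch below uses Python's standard ast module. The function names imported_modules and rank_imports are made up for this example, and it is a substitute technique for the grep pipeline, not a description of how the PP-BESM project was analyzed.

```python
import ast
from collections import Counter
from pathlib import Path


def imported_modules(source):
    """Return the module named by every import statement in Python source text."""
    modules = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports such as `from . import x` have module=None.
            modules.append("." * node.level + (node.module or ""))
    return modules


def rank_imports(root, top=20):
    """Count imports across every parseable .py file under `root`."""
    counts = Counter()
    for path in Path(root).rglob("*.py"):
        try:
            source = path.read_text(encoding="utf-8", errors="replace")
            counts.update(imported_modules(source))
        except (SyntaxError, ValueError):
            # Legacy trees often hold Python 2 or corrupt files; skip them.
            continue
    return counts.most_common(top)
```

Modules near the top of the ranking are either standard-library staples, which can be ignored, or internal modules that everything depends on, which are usually worth reading first.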
4. Naming as You Go
Every time you understand a function or variable, rename it immediately in a branch. You will forget what you understood. The rename is a note for future you.
// before
function process(x, y) {
  const r = x.filter(z => z.s > y).map(z => z.id)
  return db.query(r)
}
// after
function fetchActiveUserIdsAboveScore(users, threshold) {
  const qualifyingIds = users
    .filter(user => user.score > threshold)
    .map(user => user.id)
  return db.query(qualifyingIds)
}
A good rule: if you cannot rename a function meaningfully, you do not understand it yet. xavxav's rename pass on PP-BESM translated Russian identifiers to English.
5. Types as Living Documentation
If the codebase is dynamically typed, add types. Even loose types beat none, because a type checker re-verifies them on every build, while comments drift out of date silently.
type LineItem = { price: number; quantity: number; }
type TaxConfig = { taxRate: number; }
type Order = { items: LineItem[]; }

function calculateTotalWithTax(order: Order, config: TaxConfig): number {
  return order.items.reduce((acc, item) => {
    // Line total is price times quantity, then tax is applied on top.
    return acc + item.price * item.quantity * (1 + config.taxRate)
  }, 0)
}
For Python, add type hints. For PHP, use PHPStan or Psalm. For old JS, migrate file by file to TypeScript with allowJs: true.
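For the Python case, a rough equivalent of the TypeScript shapes might look like the sketch below. It assumes Python 3.9 or later (for list[LineItem]), snake_case field names chosen for illustration, and a total that multiplies price by quantity before tax. A checker such as mypy is what actually enforces the hints; the interpreter alone does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    price: float
    quantity: int


@dataclass(frozen=True)
class TaxConfig:
    tax_rate: float  # a fraction, e.g. 0.2 for 20%


@dataclass(frozen=True)
class Order:
    items: list[LineItem]


def calculate_total_with_tax(order: Order, config: TaxConfig) -> float:
    """Sum price * quantity across all items, then apply the tax rate."""
    subtotal = sum(item.price * item.quantity for item in order.items)
    return subtotal * (1 + config.tax_rate)
```

Frozen dataclasses are a deliberate choice here: while you are still learning a codebase, immutable records make it obvious when some function far away is mutating an order in place.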
6. Tests as Ground Truth
Before refactoring, write tests that lock in observed behavior—even the bugs. Pin the bug with a test first, then refactor, then change the test deliberately.
def test_calculate_returns_negative_for_empty_orders():
    """BUG-LIKE: empty orders return -1 instead of 0.

    Some downstream system depends on this. Do not change without
    coordinating with billing.
    """
    result = calculate([], TaxConfig(rate=0.1))
    assert result == -1
xavxav's tests are small Soviet-era programs run through the VM with expected output captured.
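When there are too many behaviors to pin by hand, the outputs can be recorded once and compared on every later run, an approach often called characterization or golden-master testing. The sketch below is one way to do it, with assumptions: the helper name check_against_snapshot and the JSON file layout are invented for this example, and results must be JSON-serializable.

```python
import json
from pathlib import Path


def check_against_snapshot(func, cases, snapshot_path):
    """Record func's outputs on the first run; afterwards report what changed.

    `cases` maps a case name to a list of positional arguments.
    Returns the sorted names of cases whose behavior differs from the snapshot.
    """
    snapshot_path = Path(snapshot_path)
    observed = {}
    for name, args in cases.items():
        try:
            observed[name] = {"result": func(*args)}
        except Exception as error:
            # Raising is observable behavior too, so pin the exception type.
            observed[name] = {"raises": type(error).__name__}
    # Round-trip through JSON so tuples and lists compare the same way
    # before and after being written to disk.
    observed = json.loads(json.dumps(observed))

    if not snapshot_path.exists():
        snapshot_path.write_text(json.dumps(observed, indent=2, sort_keys=True))
        return []

    recorded = json.loads(snapshot_path.read_text())
    return sorted(name for name in observed if observed[name] != recorded.get(name))
```

Run it once against the untouched legacy function to create the snapshot, commit the file, then run it against the refactored function. An empty list means observed behavior is unchanged, including bugs like the -1 for empty orders.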
7. Comment the Negotiations
Comments that survive a decade capture why a choice was made, especially when it looks weird.
# bad
TIMEOUT = 47
# good
# Set to 47 seconds because the partner's auth gateway has a 50 second hard limit
# and we observed 1-2 second jitter from our load balancer. See incident
# 2024-03-15. Do not raise without coordinating with the partner team.
TIMEOUT = 47
Stitching the Playbook Together
The seven stages build on each other. Boundary work tells you where to put the harness. The harness lets you bisect. Bisection tells you what to name. Names tell you what to type. Types tell you what to test. Tests give you safety to comment confidently.
A practical first week:
- Day 1: Boundaries. Draw the map.
- Day 2: Harness. Get any version running with one command.
- Day 3: Bisection. Find the 10% that does the work.
- Day 4: Naming + types. Make the 10% readable.
- Day 5: Tests. Pin observed behavior before refactoring.
- Day 6: Document negotiations.
- Day 7: Refactor the load-bearing 10%.
The same loop runs at every scale—from a 70-year-old compiler on paper to a 12-year-old Rails app on GitHub. Clone the PP-BESM repo. Then apply the playbook to your own legacy code.




