1.2 Million Messages, One Life Timeline
A developer named Drobinin spent years tracking his life through chat logs. He exported archives from Telegram, VK, Instagram, and Facebook—covering 2008 to 2024. The result: a structured personal CRM built from 1.2 million messages, 52,000 unique lemmas, and 5,695 conversation-days tagged with directional sentiment.
The Data Pipeline
Exporting and Parsing
Each platform has quirks. Instagram double-encodes Cyrillic through latin-1 and doesn't differentiate broadcasts from personal chats. Telegram assigns different internal message IDs between exports. Facebook's end-to-end encryption scatters messages across three folders. VK exports everything without asking.
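The Instagram encoding quirk can often be reversed mechanically. The sketch below assumes the usual form of this problem, where UTF-8 bytes were mis-decoded as latin-1; the helper name fix_mojibake is illustrative and not taken from Drobinin's code.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mis-decoded as latin-1."""
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Text is already clean (or damaged in another way): leave it alone.
        return text


# Simulate the damage, then repair it.
garbled = "Привет".encode("utf-8").decode("latin-1")
print(fix_mojibake(garbled))   # Привет
print(fix_mojibake("Привет"))  # already clean, returned unchanged
```

The try/except matters: running the repair on text that is already correct would otherwise raise, so the function returns such input unchanged.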
Drobinin parsed all exports into a uniform tab-separated format. The corpus includes DMs, story interactions, follower graphs, and reply/mention graphs. For Twitter, he used reply graphs to filter out support requests and conference coordination.
Noise Filtering
His longest thread—486,000+ messages with a partner across ten years—breaks down as:
- 58.7% substantive text
- 28.4% short fillers
- 9.1% media
- 2.4% links
- 1.5% emoji-only
Filtering short messages was tricky. A three-word minimum would drop "he died". A denylist of fillers such as "hahaha" and "noice" failed across languages. The solution: sample the thread at five offset positions, frequency-count the short tokens, manually review the top 80, and pair a denylist with a protected set for life events.
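A minimal sketch of the denylist-plus-protected-set idea follows. The two sets are hypothetical stand-ins for the ones built during manual review, the function names are illustrative, and the sampling step is left out for brevity.

```python
from collections import Counter


def top_short_tokens(messages, max_words=2, top_n=80):
    """Frequency-count short messages so a human can review the most common."""
    counts = Counter(
        m.strip().lower() for m in messages
        if 0 < len(m.split()) <= max_words
    )
    return counts.most_common(top_n)


def is_noise(message, denylist, protected):
    """Treat a message as noise only if denylisted and not protected."""
    key = message.strip().lower()
    if key in protected:
        return False  # life events always survive filtering
    return key in denylist


# Hypothetical sets; the real ones came out of manual review.
DENYLIST = {"hahaha", "noice", "ok"}
PROTECTED = {"he died"}

msgs = ["hahaha", "he died", "noice", "see you at the station tomorrow", "hahaha"]
kept = [m for m in msgs if not is_noise(m, DENYLIST, PROTECTED)]
print(top_short_tokens(msgs)[:1])  # [('hahaha', 2)]
print(kept)  # ['he died', 'see you at the station tomorrow']
```

Checking the protected set first means an over-eager denylist entry can never remove a life event.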
After cleaning, the novelty rate—share of words never used before in any chat—plateaued at 6% six years ago. Most vocabulary was locked in by his early 20s.
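One way to compute a novelty rate is sketched below. It assumes the rate is measured per period over distinct words (ideally lemmas) and that periods arrive in chronological order; the article does not spell out the exact definition, so treat this as an illustration only.

```python
def novelty_rates(periods):
    """periods: chronologically ordered list of (label, list_of_words).

    Returns the share of each period's distinct words that never
    appeared in any earlier period.
    """
    seen = set()
    rates = {}
    for label, words in periods:
        vocab = set(words)
        new = vocab - seen
        rates[label] = len(new) / len(vocab) if vocab else 0.0
        seen |= vocab
    return rates


# Toy data, made up for illustration.
periods = [
    ("2008", ["hi", "school", "exam"]),
    ("2009", ["hi", "exam", "party", "music"]),
    ("2010", ["hi", "music", "party", "exam"]),
]
print(novelty_rates(periods))  # {'2008': 1.0, '2009': 0.5, '2010': 0.0}
```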
Entity Resolution: Which Sasha?
People use multiple platforms with different usernames. Alexander becomes Al, Alex, Xander, Sandy, Alec—or Sasha, which is gender-neutral in Slavic languages. Morphological analyzers handle case inflection but not slang. NER models need hand-labeled training sets.
Drobinin used LLMs for name resolution. A prompt reads a chunk of messages and produces a structured JSON manifest with daily note bullets, entity facts, and a list of ambiguities ("msg 833006: 'John' without surname"). A deterministic Python script injects the bullets with provenance markers linking back to source messages.
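The deterministic injection step might look roughly like this. The manifest field names (daily_notes, bullets, source_msg_ids), the [msg:N] marker format, and the sample date and bullet are all assumptions made for illustration; only the ambiguity entry echoes the article.

```python
import json


def render_daily_note(day: dict) -> str:
    """Render one day's bullets, each followed by its source message IDs."""
    lines = [day["date"]]
    for bullet in day["bullets"]:
        refs = ", ".join(f"msg:{i}" for i in bullet["source_msg_ids"])
        lines.append(f"- {bullet['text']} [{refs}]")
    return "\n".join(lines)


# Hypothetical manifest, shaped the way an LLM might be asked to return it.
manifest = json.loads("""
{
  "daily_notes": [
    {"date": "2019-03-02",
     "bullets": [{"text": "Planned a trip with John",
                  "source_msg_ids": [833006, 833010]}]}
  ],
  "ambiguities": [{"msg_id": 833006, "issue": "'John' without surname"}]
}
""")

note = render_daily_note(manifest["daily_notes"][0])
print(note)
```

Because the rendering is plain string formatting, the same manifest always produces the same note, which keeps the LLM's role limited to producing the JSON.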
Classification with LLMs
Keyword matching on first-person verbs ("bought", "moved") produced false positives. "I moved" said to his mom means relocation; in a friends' chat, it's interior design; after a breakup, it's an emotional milestone. Fine-tuning a BERT classifier would yield ~70-80% F1, and at 1.2M messages even a 1% false-positive rate means 12,000 fake events.
Drobinin used LLMs (Opus, Qwen3-30B-A3B locally via MLX) for classification. He ran 200+ sessions, roughly 15-20 billion tokens. On Opus, that's ~$15k. On a local M5 Pro, 10-15 weeks of continuous inference. The false-positive rate was under 1% on chunks below 6,000 messages.
A closure gate catches orphan wikilinks and duplicate citations. For each batch, 5-10 outputs are sampled and checked against the source messages. The model's self-reported confidence is never trusted.
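A closure gate of this kind can be approximated with two regular expressions. The sketch assumes notes use [[Page]] wikilinks and a [msg:N] citation marker, and that a "duplicate citation" means the same message ID cited more than once in a note. These conventions are guesses, not details from the article.

```python
import re
from collections import Counter

WIKILINK = re.compile(r"\[\[([^\]|#]+)")   # target part of [[Target|alias]]
CITATION = re.compile(r"\[msg:(\d+)\]")    # hypothetical provenance marker


def closure_gate(note_text, known_pages):
    """Report wikilinks with no target page and message IDs cited twice."""
    links = {m.strip() for m in WIKILINK.findall(note_text)}
    orphans = sorted(links - set(known_pages))
    cite_counts = Counter(CITATION.findall(note_text))
    duplicates = sorted(c for c, n in cite_counts.items() if n > 1)
    return {"orphan_links": orphans, "duplicate_citations": duplicates}


note = (
    "- Dinner with [[Sasha K]] [msg:101]\n"
    "- [[Sasha K]] mentioned [[Xander]] [msg:101]\n"
)
report = closure_gate(note, known_pages={"Sasha K"})
print(report)  # {'orphan_links': ['Xander'], 'duplicate_citations': ['101']}
```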
Directional Sentiment
Standard sentiment assigns one polarity per message. But close friendships are warm by default—the signal is departure from baseline. Drobinin used 18 tags with three directional prefixes: my emotional state, counterpart's, and mutual.
He initially let the LLM free-tag, getting 5,700+ unique values like "WWDC-binge-mode". He redid it with the 18-tag system. Result: 66% of conversation-days are M:warm. 12.9% of conversations each month are transactional—but in March it's 17%, thanks to UK tax-year-end.
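Enforcing a closed vocabulary is what keeps values like "WWDC-binge-mode" out of the data. The sketch below uses only the three tags named in this article; the full 18-tag list and the other prefixes are not given, so ALLOWED_TAGS is a stand-in and the toy days are made up.

```python
# Stand-in vocabulary: only tags that appear in the article.
ALLOWED_TAGS = {"M:warm", "M:playful", "M:transactional"}


def validate_tags(tags, allowed=ALLOWED_TAGS):
    """Split tags into accepted and rejected so free-form labels never leak in."""
    accepted = [t for t in tags if t in allowed]
    rejected = [t for t in tags if t not in allowed]
    return accepted, rejected


def tag_share(day_tags, tag):
    """Share of conversation-days that carry the given tag."""
    if not day_tags:
        return 0.0
    return sum(tag in tags for tags in day_tags.values()) / len(day_tags)


raw = ["M:warm", "WWDC-binge-mode", "M:transactional"]
accepted, rejected = validate_tags(raw)
print(accepted)  # ['M:warm', 'M:transactional']
print(rejected)  # ['WWDC-binge-mode']

days = {"d1": {"M:warm"}, "d2": {"M:warm", "M:playful"}, "d3": {"M:transactional"}}
print(round(tag_share(days, "M:warm"), 3))  # 0.667
```

Rejected tags can be logged and sent back for re-tagging rather than silently dropped.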
What the Data Shows
Message volume drops don't always mean friendship decay. Average message length can increase as relationships mature. Vocabulary overlap—Jaccard similarity of top-100 words—dropped from 69.5% to 8.7% in some relationships, indicating drifting interests.
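The overlap metric is simple to reproduce. This sketch computes Jaccard similarity between the most frequent words of two word lists; which two lists are compared (two periods, or the two participants) is not specified in the article, and the toy data is invented.

```python
from collections import Counter


def top_words(words, n=100):
    """Set of the n most frequent words."""
    return {w for w, _ in Counter(words).most_common(n)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


side_a = "coffee code swift coffee gym".split()
side_b = "coffee swift gym movies".split()
print(jaccard(top_words(side_a), top_words(side_b)))  # 0.6
```

In the toy example the two sets share three words out of five distinct ones, hence 0.6.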
A friendship shifting from M:playful to M:transactional across 18 months is a drift that's hard to notice one conversation at a time.
Technical Takeaways
- LLMs beat fine-tuned classifiers for noisy, multilingual, contextual classification—if you can afford the inference cost.
- Provenance tracking is critical for rollback. Every bullet links back to source messages via SQLite.
- Directional sentiment reveals relationship health where absolute sentiment fails.
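To make the provenance point concrete, here is a minimal sketch of a SQLite store that ties each bullet to its source messages and to the run that produced it, so a bad run can be rolled back. The schema, table names, and run_id column are assumptions; the article only says that provenance lives in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bullets (
    id INTEGER PRIMARY KEY, run_id TEXT, note_date TEXT, text TEXT);
CREATE TABLE bullet_sources (
    bullet_id INTEGER REFERENCES bullets(id), msg_id INTEGER);
""")


def add_bullet(conn, run_id, note_date, text, msg_ids):
    """Store a bullet together with the message IDs it was derived from."""
    cur = conn.execute(
        "INSERT INTO bullets (run_id, note_date, text) VALUES (?, ?, ?)",
        (run_id, note_date, text),
    )
    conn.executemany(
        "INSERT INTO bullet_sources (bullet_id, msg_id) VALUES (?, ?)",
        [(cur.lastrowid, m) for m in msg_ids],
    )
    return cur.lastrowid


def rollback_run(conn, run_id):
    """Remove every bullet (and its source links) produced by one run."""
    conn.execute(
        "DELETE FROM bullet_sources WHERE bullet_id IN "
        "(SELECT id FROM bullets WHERE run_id = ?)",
        (run_id,),
    )
    conn.execute("DELETE FROM bullets WHERE run_id = ?", (run_id,))


add_bullet(conn, "run-1", "2019-03-02", "Planned a trip", [833006])
add_bullet(conn, "run-2", "2019-03-03", "Bad extraction", [833100, 833101])
rollback_run(conn, "run-2")
remaining_bullets = conn.execute("SELECT COUNT(*) FROM bullets").fetchone()[0]
remaining_sources = conn.execute("SELECT COUNT(*) FROM bullet_sources").fetchone()[0]
print(remaining_bullets, remaining_sources)  # 1 1
```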
The Code
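# Sketch of LLM-based classification. The helpers (load_messages, llm,
# inject_bullets, provenance_store) are placeholders passed in by the caller.
import json

PROMPT_TEMPLATE = """
Analyze these messages. Return JSON with:
- daily_notes: list of {{date, bullets, sentiment}}
- events: list of {{date, event_type, description}}
- ambiguities: list of {{msg_id, issue}}
Messages: {chunk}
"""

def classify_chunk(chat_id, start_msg, end_msg, *, load_messages, llm,
                   inject_bullets, provenance_store):
    chunk = load_messages(chat_id, start_msg, end_msg)
    # str.format fills {chunk}; doubled braces stay literal in the prompt.
    prompt = PROMPT_TEMPLATE.format(chunk=chunk)
    response = llm.generate(prompt)
    try:
        manifest = json.loads(response)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON from LLM for chat {chat_id}") from exc
    inject_bullets(manifest, provenance_store)
    return manifest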
Why This Matters
Drobinin's approach shows how to turn noisy, multi-platform personal data into structured insights. For developers building personal analytics, CRM tools, or lifelogging apps, this is a blueprint: parse exports, filter noise, resolve entities with LLMs, and track provenance for debuggability.
Editor's Take
I've been meaning to do something similar with my own chat archives for years. The $15k Opus bill gave me pause—but the local MLX approach is promising. I might try a smaller model like Llama 3.1 8B first. The provenance tracking is a must: without it, you're flying blind when the LLM hallucinates. I think the biggest insight here is that directional sentiment beats absolute sentiment for relationship tracking. I'm going to steal that idea for my own project.




