Fine-Tuning LLMs on 1990s Manuals: Style Transfer Works

A developer fine-tuned two 7-8B parameter instruct models on 37 million words from Microsoft manuals published between 1977 and 2005. The results: the fine-tuned models produced documentation that convincingly matched 1990s technical writing style, even for modern concepts like REST APIs. Total cost: $50.

The project used QLoRA (Quantized Low-Rank Adaptation) to create adapter files that modify model behavior without retraining all weights. The source corpus came from Bitsavers, a repository of scanned old computer manuals. After cleaning OCR artifacts with Python scripts and classifying paragraphs for intelligibility using Gemma 4 26B (cost: $8), the author split the text into 192,456 training examples, each capped at 512 tokens.
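
The 512-token cap can be illustrated with a minimal sketch. It assumes whitespace-separated words as a stand-in for model tokens (the source does not say which tokenizer did the counting), and the names `count_tokens` and `chunk_paragraphs` are hypothetical. Whole paragraphs are packed greedily into a chunk until the next one would exceed the cap; a paragraph that is itself too long is hard-split.

```python
from typing import Iterable, List

MAX_TOKENS = 512  # cap per training example, as stated in the article


def count_tokens(text: str) -> int:
    """Rough token count. ASSUMPTION: whitespace words stand in for model tokens."""
    return len(text.split())


def chunk_paragraphs(paragraphs: Iterable[str], max_tokens: int = MAX_TOKENS) -> List[str]:
    """Greedily pack whole paragraphs into chunks of at most max_tokens words."""
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for para in paragraphs:
        words = para.split()
        if not words:
            continue  # skip empty paragraphs
        if len(words) > max_tokens:
            # Oversized paragraph: flush what we have, then hard-split it.
            if current:
                chunks.append("\n\n".join(current))
                current, current_len = [], 0
            for i in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[i:i + max_tokens]))
            continue
        if current_len + len(words) > max_tokens:
            # Adding this paragraph would exceed the cap: start a new chunk.
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(" ".join(words))
        current_len += len(words)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

A real pipeline would swap `count_tokens` for the target model's tokenizer, since word counts underestimate token counts for technical text.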

Models and Training Conditions

Two instruct models were tested, Llama 3.1 8B Instruct and Qwen 2.5 7B Instruct, along with a non-instruct Llama base model for comparison. Training ran on Runpod with an Nvidia B200 GPU ($6/hour). The author varied training volume (40k vs 192k examples), epochs (1 vs 3), and LoRA rank (8 vs 16). Adapters were exported as GGUF LoRA files and tested locally via Ollama.
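
One way to keep such a comparison organized is to describe each run as a small record and enumerate the grid. This is only a sketch: the `RunConfig` class and its naming scheme are hypothetical, the non-instruct base model is left out, and the source does not say that every combination below was actually trained (a $50 budget suggests only a subset was).

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class RunConfig:
    """One training condition. Hypothetical structure, not the author's code."""
    base_model: str
    n_examples: int
    epochs: int
    lora_rank: int

    def name(self) -> str:
        """Short, sortable label for adapter files and logs."""
        short = self.base_model.lower().replace(" ", "-")
        return f"{short}-{self.n_examples // 1000}k-e{self.epochs}-r{self.lora_rank}"


# Values taken from the article; the full cross product is an assumption.
MODELS = ["Llama 3.1 8B Instruct", "Qwen 2.5 7B Instruct"]
EXAMPLE_COUNTS = [40_000, 192_456]
EPOCHS = [1, 3]
RANKS = [8, 16]


def full_grid() -> List[RunConfig]:
    """Every combination of model, data volume, epochs, and rank."""
    return [
        RunConfig(m, n, e, r)
        for m, n, e, r in product(MODELS, EXAMPLE_COUNTS, EPOCHS, RANKS)
    ]


if __name__ == "__main__":
    for cfg in full_grid():
        print(cfg.name())
```

Naming each adapter after its condition makes it easier to trace an output back to the settings that produced it when comparing results later.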

Style Transfer Results

Three prompts were used to evaluate style transfer:

  1. Document malloc() – a function likely present in training data.
  2. Document a fictitious ConnectWifi() Win32 API – not in training.
  3. Explain REST API in 1990s Microsoft style – anachronistic test.

For malloc(), unmodified models output modern Markdown READMEs; fine-tuned models used period-correct structure (Synopsis, Return Value sections). For ConnectWifi(), only the 3-epoch model maintained the fiction and documented the function as real; the others broke character. The REST API test was the most revealing: Qwen 2.5 7B fine-tuned on 192k examples produced a chapter opening resembling the Windows 2000 Resource Kit, using HTTP methods as verbs and formal headings. Llama 3.1 8B, in contrast, produced bland marketing prose, likely because of heavy RLHF tuning.
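
A side-by-side comparison like this can be scripted against Ollama's local HTTP API. The sketch below assumes Ollama is running on its default port and that the models have already been created locally. The prompt wording is a paraphrase (the article's exact prompts are not given) and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Dict, Iterable, Tuple

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

# ASSUMPTION: paraphrased prompts; the article does not quote the originals.
PROMPTS: Dict[str, str] = {
    "malloc": "Write reference documentation for the malloc() function.",
    "connectwifi": "Write reference documentation for the ConnectWifi() Win32 API function.",
    "rest": "Explain what a REST API is, in the style of a 1990s Microsoft manual.",
}


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for one model and prompt."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the request and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


def compare(models: Iterable[str], prompts: Dict[str, str] = PROMPTS) -> Dict[Tuple[str, str], str]:
    """Run every prompt against every model; keys are (model, prompt_key)."""
    return {(m, key): generate(m, p) for m in models for key, p in prompts.items()}
```

Calling `compare(["base-model", "manual-adapter"])` (both model names are placeholders) returns a dictionary that can be dumped to a file and read side by side. Judging whether the output "sounds like" a 1990s manual remains a manual step.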

Rank and Epoch Interaction

On the Qwen models trained for 1 epoch, rank 8 adapters committed more strongly to the corpus style, while rank 16 adapters allowed more “escape” and produced more hallucinations. With only 1 epoch of training, the rank 16 adapter was expressive enough to reach for related concepts but not reinforced enough to stay anchored on the prompt. The author notes: “the cheaper the adapter, the more honest the impersonation.”
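
For context on what rank controls: LoRA replaces the update to a weight matrix of shape d_out × d_in with the product of two thin matrices, B (d_out × r) and A (r × d_in), so the adapter trains r × (d_in + d_out) parameters per adapted matrix. The arithmetic below uses a 4096 × 4096 matrix purely as an illustrative assumption; the article does not list layer shapes or which matrices were adapted.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds for one adapted weight matrix."""
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen weight matrix itself."""
    return d_in * d_out


if __name__ == "__main__":
    d = 4096  # ASSUMPTION: illustrative layer width, not taken from the article
    for rank in (8, 16):
        added = lora_params(d, d, rank)
        share = added / full_params(d, d)
        print(f"rank {rank}: {added} trainable params ({share:.2%} of the matrix)")
```

Doubling the rank doubles the adapter's capacity, which fits the observation that the rank 16 adapter had more room to drift when it was trained for only 1 epoch.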

Practical Takeaways

  • Fine-tuning for style transfer is feasible on a budget ($50) and can produce small models (7-8B) that run locally on a MacBook Air.
  • The process requires high-quality, domain-specific training data. Cleaning OCR artifacts is non-trivial.
  • Base models (non-instruct) are unsuitable: they produce raw corpus text without answering prompts.
  • Fine-tuned models are not replacements for human writers; they lack judgment and need steering. But they can augment drafting or style review.

How to Reproduce

  1. Download OCR text files from Bitsavers (e.g., Microsoft collection).
  2. Clean with Python: strip indices, frontmatter, and OCR artifacts.
  3. Classify paragraph intelligibility using a cheap model (the author used Gemma 4 26B, at a cost of $8).
  4. Split into chunks of ≤512 tokens, paired with synthetic instructions.
  5. Fine-tune with QLoRA on a rented GPU (e.g., Runpod B200 at $6/hr).
  6. Export adapter as GGUF LoRA, convert for Ollama, and test.
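
Steps 2 and 4 can be sketched with the standard library alone. Everything here is an assumption made for illustration: the cleaning heuristics (page-number lines, end-of-line hyphenation, lines dominated by non-letters such as dot leaders), the 0.6 threshold, the fixed placeholder instruction, and the JSONL field names. The article does not describe the author's actual scripts or how the synthetic instructions were generated.

```python
import json
import re
from typing import Iterable

PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d+\s*$", re.IGNORECASE)
HYPHEN_BREAK = re.compile(r"(\w)-\n(\w)")

# ASSUMPTION: a single fixed instruction; the author's synthetic instructions are not described.
PLACEHOLDER_INSTRUCTION = "Write a section of a technical manual."


def clean_ocr(text: str, min_alpha_ratio: float = 0.6) -> str:
    """Heuristic cleanup: rejoin hyphenated words, drop page numbers and noisy lines."""
    # Rejoin words split across lines. This also merges genuine hyphens at line ends.
    text = HYPHEN_BREAK.sub(r"\1\2", text)
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            kept.append("")  # preserve paragraph breaks
            continue
        if PAGE_NUMBER.match(stripped):
            continue  # bare page number
        letters = sum(ch.isalpha() or ch.isspace() for ch in stripped)
        if letters / len(stripped) < min_alpha_ratio:
            continue  # mostly digits or punctuation: index lines, dot leaders, OCR noise
        kept.append(stripped)
    return "\n".join(kept)


def to_jsonl(chunks: Iterable[str], instruction: str = PLACEHOLDER_INSTRUCTION) -> str:
    """Pair each non-empty chunk with an instruction, one JSON object per line."""
    lines = [
        json.dumps({"instruction": instruction, "output": chunk}, ensure_ascii=False)
        for chunk in chunks
        if chunk.strip()
    ]
    return "\n".join(lines)
```

The letter-ratio filter is blunt and will also drop some legitimate lines, such as short code listings, which is one reason the article adds a model-based intelligibility pass as a separate step.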

The code and data are not distributed because of licensing restrictions, but the methodology is repeatable with any corpus.

Why It Matters

Style transfer via fine-tuning is a practical way to enforce consistent tone and structure in generated documentation without expensive full training. For teams with legacy style guides, a small fine-tuned model could automate first drafts or review drafts for style compliance. The low cost ($50) makes it accessible to individuals and small teams.

Next Steps

If you have a corpus of in-house documentation, try fine-tuning a 7B model with QLoRA. Start with rank 8 and 1 epoch to see whether the style transfers. Compare instruct and base models; you’ll likely want instruct. Then test on anachronistic concepts to gauge how well the style generalizes.