The Benchmark: 2.3x More Defects Found
Most code health scores are never validated against real defects. repowise ran its score against 2,770 files across 9 languages and compared it to a leading commercial tool. The result: repowise surfaces 2.3x the defects under a fixed review budget.
What the Score Is
Every file gets a 1–10 score from 25 deterministic biomarkers, including:
- McCabe complexity
- deep nesting
- brain methods
- class cohesion (LCOM4)
- god classes
- native Rabin-Karp clone detection
- untested hotspots
- function-level churn
- code-age volatility
- ownership dispersion
- change entropy
- co-change scatter
- prior-defect history
- test-quality smells
No LLM calls, no cloud, no new runtime dependency. Pure Python over tree-sitter and git data, finishing in under 30 seconds on a 3,000-file repo.
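To make one of these biomarkers concrete, here is a minimal sketch of Rabin-Karp style clone detection over lines of code. It is an illustration of the general technique, not repowise's implementation: the function name `find_clone_windows`, the 3-line window, and the whitespace normalisation are all assumptions made for the example.

```python
from collections import defaultdict


def find_clone_windows(lines, window=3, base=257, mod=(1 << 61) - 1):
    """Return groups of start indices whose `window`-line spans are identical.

    Hypothetical helper: a rolling hash over per-line tokens finds candidate
    duplicates, and a direct text comparison confirms them.
    """
    # Collapse whitespace so re-indentation does not hide a clone.
    norm = [" ".join(line.split()) for line in lines]
    if len(norm) < window:
        return []

    # Map each distinct normalised line to a small positive integer.
    ids = {}
    tokens = [ids.setdefault(text, len(ids) + 1) for text in norm]

    # Hash of the first window.
    high = pow(base, window - 1, mod)
    h = 0
    for t in tokens[:window]:
        h = (h * base + t) % mod

    buckets = defaultdict(list)
    buckets[h].append(0)
    for i in range(1, len(tokens) - window + 1):
        # Slide the window: drop the outgoing token, append the incoming one.
        h = ((h - tokens[i - 1] * high) * base + tokens[i + window - 1]) % mod
        buckets[h].append(i)

    groups = []
    for starts in buckets.values():
        # Equal hashes are only candidates; confirm against the actual text.
        by_text = defaultdict(list)
        for s in starts:
            by_text[tuple(norm[s:s + window])].append(s)
        groups.extend(g for g in by_text.values() if len(g) > 1)
    return sorted(groups)


sample = ["a = 1", "b = 2", "c = 3", "x = 0", "a = 1", "b  =  2", "c = 3"]
clones = find_clone_windows(sample)  # windows starting at lines 0 and 4 match
```

The rolling update keeps each window hash at constant cost, which is why this family of techniques scales to whole repositories without pairwise comparison of every span.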
Validation Methodology
To avoid leakage, health scores were collected at a historical commit (T0), then bug-fixing commits were counted over the following 6 months. The score never sees the future.
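The time split can be sketched as follows. This is an illustration under stated assumptions, not the published harness: the `(timestamp, is_bugfix, files)` tuple layout, the function name `label_future_defects`, and the 183-day approximation of six months are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def label_future_defects(commits, t0, horizon_days=183):
    """Count bug-fixing commits per file in the window (t0, t0 + horizon].

    `commits` is an iterable of (timestamp, is_bugfix, files) tuples.
    Commits at or before t0 are ignored, so labels never overlap with the
    history the score was computed from.
    """
    end = t0 + timedelta(days=horizon_days)
    counts = Counter()
    for when, is_bugfix, files in commits:
        if is_bugfix and t0 < when <= end:
            # A file touched twice in one commit still counts once.
            counts.update(set(files))
    return counts


t0 = datetime(2023, 1, 1)
history = [
    (datetime(2022, 12, 1), True, ["a.py"]),          # before T0: ignored
    (datetime(2023, 2, 1), True, ["a.py", "b.py"]),   # inside the window
    (datetime(2023, 3, 1), False, ["a.py"]),          # not a bug fix
    (datetime(2023, 4, 1), True, ["a.py"]),           # inside the window
    (datetime(2024, 1, 1), True, ["b.py"]),           # after the window
]
labels = label_future_defects(history, t0)
```

Scores are computed only from the repository state at T0, and labels only from commits strictly after it, which is what keeps the evaluation free of leakage.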
Across 21 open-source repos:
- Cross-project mean ROC AUC: 0.74 [95% CI 0.68 to 0.79]
- Up to 0.90 on individual repos
- Survives controlling for file size (partial Spearman rho = -0.16)
- Outperforms recent churn by +0.10 AUC and prior-defect history by +0.12 AUC (DeLong p < 1e-9)
- Holds on the external PROMISE/jEdit dataset: AUC 0.76–0.78
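For readers unfamiliar with the metric, ROC AUC here can be read as the probability that a randomly chosen defective file is ranked as riskier than a randomly chosen clean one. The sketch below computes it directly from that definition. The function name `roc_auc` and the choice to use the negated health score as "risk" (so that lower health means higher risk) are assumptions for illustration, not the harness's code.

```python
def roc_auc(risk, defective):
    """Probability that a defective file has higher risk than a clean file.

    Ties count as half a win. The pairwise loop is quadratic, which is fine
    for a sketch; a rank-based formula gives the same value faster.
    """
    pos = [r for r, d in zip(risk, defective) if d]
    neg = [r for r, d in zip(risk, defective) if not d]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean file")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1-10 health scores and whether a bug fix followed.
health = [2, 9, 5, 8]
had_defect = [True, False, False, True]
auc = roc_auc([-h for h in health], had_defect)
```

An AUC of 0.5 corresponds to a random ranking and 1.0 to a perfect one, which gives a scale for reading the 0.74 cross-project mean above.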
Head-to-Head Results
| Axis | repowise | Commercial tool |
|---|---|---|
| Recall @ 20%-of-lines budget | 0.173 | 0.074 |
| Effort-aware ranking (Popt) | 0.607 | 0.462 |
| Defect density (Alert:Healthy) | 2.18x | 0.56x |
| Discrimination (ROC AUC) | 0.731 | 0.705 |
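All comparisons are paired and statistically significant (p = 0.003 for density). The headline 2.3x is the recall ratio at the 20%-of-lines budget: 0.173 / 0.074 ≈ 2.3.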
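The "Recall @ 20%-of-lines budget" row can be read as: if reviewers inspect files from least healthy upward until they have read 20% of all lines, what fraction of the defects do they reach? The sketch below is one plausible way to compute that. The function name `recall_at_loc_budget`, the tuple layout, and the rule of stopping at the first file that does not fit the budget are assumptions, not the published harness.

```python
def recall_at_loc_budget(files, budget=0.20):
    """Fraction of defects reached when reviewing the worst files first.

    `files` is an iterable of (health_score, lines_of_code, defect_count).
    Files are reviewed in ascending health order until the next file would
    push the lines read past `budget` of the total.
    """
    files = list(files)
    total_loc = sum(loc for _, loc, _ in files)
    total_defects = sum(defects for _, _, defects in files)
    if total_loc == 0 or total_defects == 0:
        return 0.0

    limit = budget * total_loc
    spent = 0
    found = 0
    for _, loc, defects in sorted(files, key=lambda f: f[0]):
        if spent + loc > limit:
            break  # the next file does not fit in the review budget
        spent += loc
        found += defects
    return found / total_defects


# Hypothetical repo: (health score, lines of code, later defects).
repo = [(2, 100, 3), (3, 50, 1), (9, 450, 1), (8, 400, 0)]
recall = recall_at_loc_budget(repo)
```

Budgeting by lines rather than by file count is what makes the comparison effort-aware: a score cannot win simply by flagging the largest files.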
The Four Other Layers
repowise has five layers total, all exposed via MCP tools:
- Graph: tree-sitter dependency graph, 15 languages, Leiden communities, PageRank.
- Git: hotspots, ownership, co-change pairs, bus factor.
- Docs: LLM-generated wiki per module, incremental updates, hybrid RAG search.
- Decisions: architectural decisions mined from 8 sources, evidence-backed.
- Health: the score described above.
Agent Integration
Paired SWE-QA runs with and without MCP tools showed:
- 70% fewer tool calls
- 89% fewer file reads
- 36% lower cost per query
- answer quality at parity
Feeding an agent a commit through get_context costs 2,391 tokens vs 64,039 raw (27x fewer).
How to Use
```bash
pip install repowise
cd your-project
repowise init   # builds all five layers
repowise serve  # MCP server + local dashboard
```
The graph, git, dead-code, and health layers build in minutes with zero LLM calls. Use --index-only for a queryable index almost immediately. After that, every commit-triggered update takes under 30 seconds.
100% local, bring your own API key, AGPL-3.0.
What You Should Do Now
Clone the repo (github.com/repowise-dev/repowise) and run the health-defect benchmark on your own codebase. The harness is public so you can reproduce or break it.



