The Benchmark: 2.3x More Defects Found

Most code health scores are never validated against real defects. repowise ran its score against 2,770 files across 9 languages and compared it with a leading commercial tool. The result: under a fixed review budget, repowise surfaces 2.3x as many defects.

What the Score Is

Every file gets a 1–10 score built from 25 deterministic biomarkers, including:

  • McCabe complexity
  • deep nesting
  • brain methods
  • class cohesion (LCOM4)
  • god classes
  • native Rabin-Karp clone detection
  • untested hotspots
  • function-level churn
  • code-age volatility
  • ownership dispersion
  • change entropy
  • co-change scatter
  • prior-defect history
  • test-quality smells

No LLM calls, no cloud, no new runtime dependency. Pure Python over tree-sitter and git data, finishing in under 30 seconds on a 3,000-file repo.
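McCabe complexity, the first biomarker listed, is 1 plus the number of decision points in a function. repowise computes its metrics over tree-sitter; purely to keep this sketch dependency-free, the version below swaps in Python's standard `ast` module and counts `if`/`elif`, loops, conditional expressions, `except` handlers, and extra `and`/`or` operands. Which node types count is an assumption here, and tools disagree on the edge cases.

```python
import ast

# Assumed set of nodes that each add one independent path.
DECISION_NODES = (ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler)


def cyclomatic_complexity(func: ast.AST) -> int:
    """Approximate McCabe complexity: 1 + number of decision points.

    Note: ast.walk also descends into nested functions, so their branches
    are counted toward the enclosing function as well.
    """
    complexity = 1
    for node in ast.walk(func):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two short-circuit branches.
            complexity += len(node.values) - 1
    return complexity


def complexity_per_function(source: str) -> dict[str, int]:
    """Map each function name in `source` to its approximate complexity."""
    tree = ast.parse(source)
    return {
        node.name: cyclomatic_complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
```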

Validation Methodology

To avoid leakage, health scores were computed at a historical commit (T0), and bug-fixing commits touching each file were then counted over the following 6 months. The score never sees the future.
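The time split can be sketched as follows. Only commits dated after T0 and inside the window contribute labels, while the score may only see commits at or before T0, so nothing the score was computed from can leak into the outcome. The `Commit` record, the 182-day window, and the keyword test for "bug-fixing" are hypothetical stand-ins; the source does not say how fixes were identified.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Commit:
    sha: str
    when: datetime
    message: str
    files: list[str]


# Hypothetical fix detector: a crude keyword match on the commit message.
# Note that "fix" also matches words like "prefix".
FIX_WORDS = ("fix", "bug", "defect", "fault")


def is_bug_fix(commit: Commit) -> bool:
    message = commit.message.lower()
    return any(word in message for word in FIX_WORDS)


def history_at(commits: list[Commit], t0: datetime) -> list[Commit]:
    """History visible to the score: commits at or before T0 only."""
    return [c for c in commits if c.when <= t0]


def label_defects(commits: list[Commit], t0: datetime,
                  window: timedelta = timedelta(days=182)) -> dict[str, int]:
    """Count bug-fix commits per file strictly after T0 and within the window."""
    counts: dict[str, int] = {}
    end = t0 + window
    for c in commits:
        if t0 < c.when <= end and is_bug_fix(c):
            for path in c.files:
                counts[path] = counts.get(path, 0) + 1
    return counts
```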

Across 21 open-source repos:

  • Cross-project mean ROC AUC: 0.74 [95% CI 0.68 to 0.79]
  • Up to 0.90 on individual repos
  • Survives controlling for file size (partial Spearman rho = -0.16)
  • Outperforms recent churn by +0.10 AUC and prior-defect history by +0.12 AUC (DeLong p < 1e-9)
  • Holds on external PROMISE/jEdit dataset: AUC 0.76–0.78
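ROC AUC has a direct probabilistic reading: the chance that a randomly chosen defective file is ranked riskier than a randomly chosen clean one, with ties counted as half. The pairwise sketch below computes exactly that. It assumes a lower health score means a riskier file, so risk is taken as `10 - score`; that mapping is an assumption, not something the benchmark states.

```python
def roc_auc(risk: list[float], defective: list[bool]) -> float:
    """Probability that a random defective file outranks a random clean one.

    Ties count as half a win, matching the Mann-Whitney U formulation.
    """
    pos = [r for r, d in zip(risk, defective) if d]
    neg = [r for r, d in zip(risk, defective) if not d]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean file")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_from_health(scores: list[float], defective: list[bool]) -> float:
    # Assumption: a lower 1-10 health score means a riskier file.
    return roc_auc([10 - s for s in scores], defective)
```

An AUC of 0.5 is a coin flip and 1.0 is a perfect ranking, so a cross-project mean of 0.74 sits well above chance. The nested loop is O(n*m) and fine for illustration; at scale the equivalent rank-sum (Mann-Whitney U) formula is the usual choice.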

Head-to-Head Results

Axis                              repowise   Commercial tool
Recall @ 20%-of-lines budget      0.173      0.074
Effort-aware ranking (Popt)       0.607      0.462
Defect density (Alert:Healthy)    2.18x      0.56x
Discrimination (ROC AUC)          0.731      0.705

All comparisons are paired and statistically significant (p = 0.003 for density).
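The two effort-aware rows can be sketched as below. `recall_at_budget` walks files from riskiest to safest and stops once the next file would push past 20% of total lines; `popt` compares the area under the cumulative defects-versus-lines curve for the model's ordering against the best (highest defect density first) and worst orderings, using the common definition Popt = 1 - (optimal - model) / (optimal - worst). The stopping rule, the `FileStat` record, and ordering the model by raw risk rather than risk per line are assumptions; the benchmark's exact protocol may differ.

```python
from dataclasses import dataclass


@dataclass
class FileStat:
    path: str
    risk: float   # higher = riskier (e.g. 10 - health score; an assumption)
    loc: int      # lines of code, the review "effort"
    defects: int  # bug-fix count observed after T0


def recall_at_budget(files: list[FileStat], budget: float = 0.20) -> float:
    """Share of all defects found by reviewing the riskiest files within a line budget."""
    total_loc = sum(f.loc for f in files)
    total_defects = sum(f.defects for f in files)
    spent = found = 0
    for f in sorted(files, key=lambda f: f.risk, reverse=True):
        # Assumption: stop at the first file that would overflow the budget.
        if spent + f.loc > budget * total_loc:
            break
        spent += f.loc
        found += f.defects
    return found / total_defects


def _curve_area(ordered: list[FileStat], total_loc: int, total_defects: int) -> float:
    """Area under cumulative %defects vs %lines, via the trapezoid rule."""
    area = x = y = 0.0
    for f in ordered:
        nx = x + f.loc / total_loc
        ny = y + f.defects / total_defects
        area += (nx - x) * (y + ny) / 2
        x, y = nx, ny
    return area


def _density(f: FileStat) -> float:
    return f.defects / f.loc


def popt(files: list[FileStat]) -> float:
    """Popt = 1 - (optimal - model) / (optimal - worst)."""
    total_loc = sum(f.loc for f in files)
    total_defects = sum(f.defects for f in files)
    optimal = _curve_area(sorted(files, key=_density, reverse=True), total_loc, total_defects)
    worst = _curve_area(sorted(files, key=_density), total_loc, total_defects)
    model = _curve_area(sorted(files, key=lambda f: f.risk, reverse=True),
                        total_loc, total_defects)
    return 1 - (optimal - model) / (optimal - worst)
```

Recall rewards catching defects early in a fixed budget, while Popt scores the whole ordering, which is why the two rows can move by different amounts.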

The Four Other Layers

repowise has five layers total, all exposed via MCP tools:

  1. Graph: tree-sitter dependency graph, 15 languages, Leiden communities, PageRank.
  2. Git: hotspots, ownership, co-change pairs, bus factor.
  3. Docs: LLM-generated wiki per module, incremental updates, hybrid RAG search.
  4. Decisions: architectural decisions mined from 8 sources, evidence-backed.
  5. Health: the score described above.
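The PageRank step in the graph layer can be illustrated with plain power iteration over an import graph. In this sketch an edge points from the importing module to the imported one, so widely depended-on modules accumulate rank, and rank held by modules that import nothing is spread evenly over all nodes. The edge direction, the 0.85 damping factor, and the `pagerank` function itself are assumptions for illustration, not repowise's implementation.

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             iters: int = 100, tol: float = 1e-10) -> dict[str, float]:
    """Power-iteration PageRank; `graph` maps each module to the modules it imports."""
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by modules with no outgoing edges is redistributed evenly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in graph.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:  # L1 change small enough: treat as converged
            break
    return rank
```

Ranks always sum to 1, so they can be compared across modules directly; a module such as a shared `utils` imported from many places ends up at the top.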

Agent Integration

Paired SWE-QA runs with and without MCP tools showed:

  • 70% fewer tool calls
  • 89% fewer file reads
  • 36% lower cost per query
  • answer quality at parity

Feeding an agent a commit through get_context costs 2,391 tokens, versus 64,039 tokens when the commit is passed raw (27x fewer).

How to Use

pip install repowise
cd your-project
repowise init        # builds all five layers
repowise serve       # MCP server + local dashboard

The graph, git, dead-code, and health layers build in minutes with zero LLM calls. Use --index-only for a queryable index almost immediately. After that, every commit-triggered update takes under 30 seconds.

100% local, bring your own API key, AGPL-3.0.

What You Should Do Now

Clone the repo (github.com/repowise-dev/repowise) and run the health-defect benchmark on your own codebase. The harness is public so you can reproduce or break it.