repowise Health Score Beats Commercial Tool: 2.3x Defect Recall

A new open-source tool, repowise, claims its code health score surfaces 2.3x as many defects as a leading commercial tool under the same review budget. Built from 25 deterministic biomarkers and validated against 2,770 files, it achieves a 0.74 ROC AUC across 9 languages.

3 min read · Jun 8, 2026

The Benchmark: 2.3x More Defects Found

Most code health scores are never validated against real defects. repowise ran its score against 2,770 files across 9 languages and compared it to a leading commercial tool. The result: with a review budget fixed at 20% of lines, repowise surfaces 2.3x as many defects (recall 0.173 vs 0.074).

What the Score Is

Every file gets a 1–10 score from 25 deterministic biomarkers, including:

McCabe complexity
deep nesting
brain methods
class cohesion (LCOM4)
god classes
native Rabin-Karp clone detection
untested hotspots
function-level churn
code-age volatility
ownership dispersion
change entropy
co-change scatter
prior-defect history
test-quality smells

No LLM calls, no cloud, no new runtime dependency. Pure Python over tree-sitter and git data, finishing in under 30 seconds on a 3,000-file repo.
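To make one biomarker concrete, here is a minimal sketch of a McCabe-style complexity count. It is not repowise's implementation: the article says the tool works over tree-sitter, whereas this stand-in uses Python's standard `ast` module and handles Python source only. The function name `cyclomatic_complexity` and the set of counted node types are assumptions chosen for illustration.

```python
import ast

# Node types that each add one decision point (a simplified McCabe count).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                 ast.IfExp, ast.comprehension)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity of a Python snippet: 1 + decision points."""
    tree = ast.parse(source)
    complexity = 1
    for node in ast.walk(tree):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds one extra path per extra operand.
            complexity += len(node.values) - 1
    return complexity


example = '''
def classify(n):
    if n < 0:
        return "neg"
    for i in range(n):
        if i % 2 == 0 and i > 10:
            return "big-even"
    return "other"
'''

# Base path + two ifs + one for loop + one `and` operator.
print(cyclomatic_complexity(example))
```

The count is deterministic: the same source always produces the same number, which is the property that lets a score built from such signals be audited.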

Validation Methodology

To avoid leakage, health scores were collected at a historical commit (T0), then bug-fixing commits were counted over the following 6 months. The score never sees the future.
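The time split can be sketched as follows. This is an illustration under stated assumptions, not the benchmark harness: the `Commit` record, its `is_bug_fix` flag, and the 183-day window are hypothetical stand-ins, and the article does not say how bug-fixing commits are identified.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import NamedTuple


class Commit(NamedTuple):
    when: datetime
    files: tuple
    is_bug_fix: bool  # hypothetical flag; fix detection is out of scope here


def label_future_defects(commits, t0, window_days=183):
    """Count bug-fixing commits per file strictly after t0, up to t0 + window."""
    end = t0 + timedelta(days=window_days)
    counts = Counter()
    for c in commits:
        if c.is_bug_fix and t0 < c.when <= end:
            counts.update(c.files)
    return counts


t0 = datetime(2024, 1, 1)
history = [
    Commit(datetime(2023, 12, 1), ("a.py",), True),        # before T0: never a label
    Commit(datetime(2024, 2, 1), ("a.py", "b.py"), True),  # inside the window
    Commit(datetime(2024, 3, 1), ("b.py",), False),        # not a bug fix
    Commit(datetime(2024, 9, 1), ("a.py",), True),         # after the window closes
]
labels = label_future_defects(history, t0)
print(dict(labels))
```

Scores are computed from the tree as it stood at T0, and labels come only from commits after T0, so no information from the labeling window can reach the score.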

Across 21 open-source repos:

Cross-project mean ROC AUC: 0.74 [95% CI 0.68 to 0.79]
Up to 0.90 on individual repos
Survives controlling for file size (partial Spearman rho = -0.16)
Outperforms recent churn by +0.10 AUC and prior-defect history by +0.12 AUC (DeLong p < 1e-9)
Holds on external PROMISE/jEdit dataset: AUC 0.76–0.78
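For readers unfamiliar with the headline metric, ROC AUC is the probability that a randomly chosen defective file is ranked as riskier than a randomly chosen clean one. The sketch below computes it by pairwise comparison on toy data. It assumes a lower health score means a less healthy file, so the score is negated to get a risk value; the data and the `roc_auc` helper are illustrative only.

```python
def roc_auc(risk, defective):
    """Chance a random defective file outranks a random clean one (ties count 0.5)."""
    pos = [r for r, d in zip(risk, defective) if d]
    neg = [r for r, d in zip(risk, defective) if not d]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean file")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


health = [2.0, 9.0, 4.0, 8.0, 3.0, 7.0]            # toy 1-10 health scores
had_defect = [True, False, True, False, False, True]
risk = [-h for h in health]                         # assumption: low health = high risk
auc = roc_auc(risk, had_defect)
print(round(auc, 3))
```

An AUC of 0.5 means the ranking is no better than chance and 1.0 means every defective file is ranked above every clean one, which puts the reported 0.74 in context.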

Head-to-Head Results

Axis	repowise	Commercial tool
Recall @ 20%-of-lines budget	0.173	0.074
Effort-aware ranking (Popt)	0.607	0.462
Defect density (Alert:Healthy)	2.18x	0.56x
Discrimination (ROC AUC)	0.731	0.705

All comparisons are paired and statistically significant (p = 0.003 for defect density).
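The 2.3x headline is the ratio of the two recall figures: 0.173 / 0.074 ≈ 2.34. A minimal sketch of recall under a line budget follows. The stopping rule (stop at the first file that does not fit) and the toy data are assumptions, and the actual harness may handle the budget boundary differently.

```python
def recall_at_line_budget(files, budget=0.20):
    """files: (risk, lines, defects) tuples. Review riskiest files first
    until the line budget is spent; return the share of defects covered."""
    total_lines = sum(lines for _, lines, _ in files)
    total_defects = sum(defects for _, _, defects in files)
    if total_defects == 0:
        raise ValueError("no defects to recall")
    limit = budget * total_lines
    spent = found = 0
    for _, lines, defects in sorted(files, key=lambda f: f[0], reverse=True):
        if spent + lines > limit:
            break  # simple variant: stop at the first file that does not fit
        spent += lines
        found += defects
    return found / total_defects


toy = [
    (0.9, 100, 3),   # riskiest file: 100 lines, 3 later defects
    (0.8, 50, 1),
    (0.5, 350, 2),
    (0.2, 500, 4),
]
recall = recall_at_line_budget(toy)
print(recall)
```

Budgeting by lines rather than by file count penalizes a tool that simply flags the largest files, since those exhaust the budget quickly.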

The Four Other Layers

repowise has five layers total, all exposed via MCP tools:

Graph: tree-sitter dependency graph, 15 languages, Leiden communities, PageRank.
Git: hotspots, ownership, co-change pairs, bus factor.
Docs: LLM-generated wiki per module, incremental updates, hybrid RAG search.
Decisions: architectural decisions mined from 8 sources, evidence-backed.
Health: the score described above.

Agent Integration

Paired SWE-QA runs with and without MCP tools showed:

70% fewer tool calls
89% fewer file reads
36% lower cost per query
answer quality at parity

Feeding an agent a commit through get_context costs 2,391 tokens vs 64,039 for the raw content (about 27x fewer).

How to Use

pip install repowise
cd your-project
repowise init        # builds all five layers
repowise serve       # MCP server + local dashboard

The graph, git, dead-code, and health layers build in minutes with zero LLM calls. Use --index-only for a queryable index almost immediately. After that, every commit-triggered update takes under 30 seconds.

100% local, bring your own API key, AGPL-3.0.

What You Should Do Now

Clone the repo (github.com/repowise-dev/repowise) and run the health-defect benchmark on your own codebase. The harness is public so you can reproduce or break it.

Editor's Take

I've been burned by code quality tools that felt like black boxes: the numbers go up, but you never know whether they mean anything. repowise's methodology is refreshing because the authors actually correlated scores with future bugs. The 2.3x improvement over a commercial tool is impressive, but I'd want to see independent replication on more datasets. Still, because the tool is open-source and deterministic, I can audit it myself. I'll be testing it on my next project.

— DevDigest Editorial

Key Takeaways

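•Run repowise on your repo to rank files for review. In the benchmark, a budget of 20% of lines surfaced 17.3% of defects, versus 7.4% for the commercial tool.
•Use the MCP tools to give your AI coding agent context-aware code health data. In the reported runs, get_context cut the tokens needed to feed an agent a commit by about 27x, and cost per query fell 36%.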
•Pair repowise with your CI pipeline to flag declining health trends before they become bugs.
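A CI check for declining health could be as small as a comparison of two score snapshots. The sketch below is hypothetical throughout: the article does not describe repowise's output format, so scores are modeled as plain path-to-score dictionaries, higher is assumed to mean healthier, and `declining_files` and its threshold are invented for illustration.

```python
def declining_files(baseline, current, min_drop=1.0):
    """Return (path, old, new) for files whose score fell by at least
    min_drop between two snapshots, largest drop first."""
    drops = [
        (path, old, current[path])
        for path, old in baseline.items()
        if path in current and old - current[path] >= min_drop
    ]
    return sorted(drops, key=lambda d: d[1] - d[2], reverse=True)


# Hypothetical snapshots: scores on the main branch vs. on a pull request.
baseline = {"core/auth.py": 8.0, "core/db.py": 6.5, "util/fmt.py": 9.0}
current = {"core/auth.py": 5.5, "core/db.py": 6.0, "util/fmt.py": 7.5}

flagged = declining_files(baseline, current)
for path, old, new in flagged:
    print(f"{path}: {old} -> {new}")
```

A pipeline step could exit with a non-zero status when the flagged list is non-empty, turning a slow decline into a visible review item.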

Why It Matters

If you maintain a large codebase, you spend time reviewing files that seem messy but aren't actually bug-prone. repowise gives you a validated, deterministic score that prioritizes the files most likely to break, saving review effort and reducing production incidents.
