The first time a friend ran ik on a codebase he actually cared about, his
feedback wasn’t “this finding is wrong.” It was worse, and more useful:
“This is a wall of findings. I don’t know where to start.”
He was right. A scanner that finds everything and ranks nothing has just handed you a second problem on top of the first. The raw output — fifteen checks, every finding, sorted by nothing in particular — is technically complete and practically paralysing. “Here are 6,000 things that might be wrong” is not an answer. It’s a shrug with a progress bar.
Two changes fixed it. One collapses everything into a single number. The other decides what you open next.
One grade you can read in a second
Before you read a single finding, you should know roughly how worried to be. So every scan produces one score, 0–100, and a letter grade — and the way it’s computed is designed to be honest under exactly the conditions where naive scoring lies.
It works in two layers. Each check produces a score: start at 100, subtract 15 for every error-severity finding and 5 for every warning, floor at zero. Those roll up into five weighted categories:
| Category | Weight |
|---|---|
| Security | 0.30 |
| Testing | 0.20 |
| Maintainability | 0.20 |
| Complexity | 0.15 |
| Change risk | 0.15 |
Security carries the most weight because a leaked secret is not the same kind of problem as a long function, and the grade should say so.
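The per-check rule is simple enough to write down. This is a minimal sketch, not ik's actual code: the function and constant names are made up for illustration, and only the numbers (100, 15, 5, floor at zero) come from the description above.

```python
# Hypothetical names; only the penalty values and the floor come from the post.
ERROR_PENALTY = 15
WARNING_PENALTY = 5


def check_score(errors: int, warnings: int) -> int:
    """Start at 100, subtract 15 per error and 5 per warning, floor at zero."""
    return max(0, 100 - ERROR_PENALTY * errors - WARNING_PENALTY * warnings)


print(check_score(errors=2, warnings=3))   # 100 - 30 - 15 = 55
print(check_score(errors=10, warnings=0))  # 100 - 150 is floored to 0
```

The floor matters: without it, one noisy check could drag a category below zero and distort everything that rolls up from it.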
Two design decisions keep that number trustworthy:
A skipped check is never scored as zero. If a check couldn’t run (wrong language, missing input), it’s excluded from the math, not counted as a failure. Scoring a skip as zero would tank the grade for the wrong reason; scoring it as 100 would inflate the grade on the strength of a check that did nothing. Excluding it is the only honest option.
The weights redistribute. If a category has no data at all, its weight is spread proportionally across the categories that do, so a repo with no detectable test structure isn’t silently penalised on a dimension we couldn’t measure. The grade reflects what we actually looked at, not what we wished we could have.
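Both decisions can be sketched in a few lines. The category weights below are the ones from the table; everything else is an assumption for illustration: the function names, the use of `None` to mark a skipped check, and above all the plain mean used to roll check scores up into a category, which the post does not specify. Spreading a missing category's weight proportionally is the same as dividing by the sum of the weights that remain.

```python
from __future__ import annotations

from typing import Optional

# Weights from the table above.
CATEGORY_WEIGHTS = {
    "security": 0.30,
    "testing": 0.20,
    "maintainability": 0.20,
    "complexity": 0.15,
    "change_risk": 0.15,
}


def category_score(check_scores: list[Optional[float]]) -> Optional[float]:
    """Average the checks that ran; None marks a skipped check (assumption)."""
    ran = [s for s in check_scores if s is not None]
    if not ran:
        return None  # no data at all: the category drops out of the grade
    return sum(ran) / len(ran)  # assumption: a plain mean within a category


def overall_score(checks: dict[str, list[Optional[float]]]) -> Optional[float]:
    """Weighted mean over categories with data, weights renormalised."""
    scored = {}
    for name in CATEGORY_WEIGHTS:
        score = category_score(checks.get(name, []))
        if score is not None:
            scored[name] = score
    if not scored:
        return None
    total_weight = sum(CATEGORY_WEIGHTS[name] for name in scored)
    weighted = sum(CATEGORY_WEIGHTS[name] * s for name, s in scored.items())
    return weighted / total_weight


example = {
    "security": [100, None],  # one check ran, one was skipped
    "testing": [None],        # nothing ran: weight is redistributed
    "maintainability": [80],
    "complexity": [60],
    "change_risk": [70],
}
print(round(overall_score(example), 1))  # 65.5 / 0.80, about 81.9
```

If the skipped testing check had been scored as zero instead, the same repo would come out at 65.5: a sixteen-point drop for something that was never measured.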
That’s the one-second answer: a B means something specific, and it means the same thing whether we ran ten checks or fifteen.
A list that decides what you open next
The grade tells you how bad. It doesn’t tell you where to go. The original report led with “Top Offender Files” ranked by raw finding count — which is exactly the wall my friend was staring at, because finding count is a terrible proxy for what to fix first. A generated file with 200 trivial findings would sit above a source file with one leaked credential.
So the report now opens with a Remediation Priority list, ranked by risk rather than volume. Each file accumulates a score across the findings against it:
file_risk = Σ ( check_weight × severity_multiplier ) × file_type_weight
- Check weight encodes how much a finding type matters: secrets 3.0, dependency vulnerabilities 2.5, complexity 1.5, hotspot 1.2, coupling 1.0, duplication 0.8, line-count 0.5, test-presence 0.3.
- Severity multiplier doubles the weight for error-severity findings.
- File-type weight drops to 0.25 for markup, data, and built output: .html, .json, .css, lock files and the like.
That last factor came straight out of dogfooding. The first time we previewed the
new list on our own repo, it was topped entirely by static-hugo-built/*.html
— our committed site build, where every page co-changes with every other at ~100%
confidence, so the coupling check carried them all to the top. Pure noise.
Down-weighting generated files to a quarter pushed them out and let real .go and
.js source rise.
Crucially, down-weighting a file doesn’t bury a severe finding: a real secret in an HTML file still surfaces, because the check weight for secrets is high. We deflated finding volume on generated files without deflating genuine risk.
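Here is a rough sketch of that ranking, using the weights listed above. The function names, the `(check, severity)` tuple shape, the example paths other than the static-hugo-built directory, the multiplier of 1.0 for non-error findings, and the `.lock` suffix standing in for "lock files" are all assumptions made for illustration, not ik's real implementation.

```python
# Check weights as listed in the post; dictionary keys are hypothetical names.
CHECK_WEIGHTS = {
    "secrets": 3.0,
    "dependency_vulnerabilities": 2.5,
    "complexity": 1.5,
    "hotspot": 1.2,
    "coupling": 1.0,
    "duplication": 0.8,
    "line_count": 0.5,
    "test_presence": 0.3,
}

# Assumption: low-signal files are detected by suffix; ".lock" is a stand-in.
LOW_SIGNAL_SUFFIXES = (".html", ".json", ".css", ".lock")


def file_type_weight(path: str) -> float:
    """0.25 for markup, data, and built output; 1.0 for everything else."""
    return 0.25 if path.lower().endswith(LOW_SIGNAL_SUFFIXES) else 1.0


def file_risk(path: str, findings: list[tuple[str, str]]) -> float:
    """Sum check_weight x severity_multiplier, then scale by file type."""
    total = sum(
        CHECK_WEIGHTS[check] * (2.0 if severity == "error" else 1.0)
        for check, severity in findings
    )
    return total * file_type_weight(path)


def remediation_priority(by_file: dict[str, list[tuple[str, str]]], limit: int = 30) -> list[str]:
    """Return file paths ranked by risk, highest first, capped at `limit`."""
    ranked = sorted(by_file, key=lambda p: file_risk(p, by_file[p]), reverse=True)
    return ranked[:limit]


findings = {
    "static-hugo-built/index.html": [("coupling", "warning")] * 4,
    "static-hugo-built/config.html": [("secrets", "error")],
    "cmd/server.go": [("complexity", "warning"), ("hotspot", "warning")],
}
for path in remediation_priority(findings):
    print(path, round(file_risk(path, findings[path]), 2))
```

With these made-up inputs, the source file comes first (about 2.7), the HTML file holding a secret comes second (1.5), and the generated page with four coupling warnings comes last (1.0). A single severe finding in a down-weighted file still outranks a pile of low-value ones.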
The list caps at the top 30 files, and the long per-check tables that used to fill the page now collapse behind a “Show more” toggle — every row still in the document, just not in your face. You land on the report, read one grade, and see the thirty files most worth your time, hardest-hitting first.
The feature was the ranking, not the finding
The lesson my friend handed us is one every analysis tool eventually learns: the hard part isn’t detecting problems, it’s ordering them. Detection without prioritisation transfers the triage burden straight to the user — which is the job they wanted the tool to do.
So the answer to “where do I start?” is now literally the first thing the report shows: a grade that says how worried to be, and a ranked list whose top line is the file to open next. Same findings underneath. A completely different experience on top.
Want to see your codebase’s grade — and the one file to fix first? Install the CLI — one line, about a minute, no account required.