April 24, 2026 · inkode team

What 1,961 scans told us about AI-written code

We scanned 1,961 repositories in 18 days. 38.4% of first-time scans landed in D or F. Here is what the full dataset says — including the part of the AI gap we had to correct.

Of 1,961 repositories we scanned in 18 days, 38.4% landed in a D or an F.

That is a rough number, but it is a real one. You can’t fix what you can’t measure, so over the last three weeks we ran inkode — our code-quality scanner — against 1,961 different codebases. Most are public projects on GitHub. Some are private repos we scanned under NDA. We did not pick favourites. We did this to answer one question as cleanly as we could: what does the state of AI-assisted code actually look like in April 2026?

This post is the long answer. Skip to the end for the short one.

How the scan works

Before the numbers, a short methodology note so you can judge them.

inkode is one command-line tool. You install it with a single line, point it at a git repo, and wait. The tool runs 16 independent checks against the code. The scored checks fall into five groups:

  • Security — committed secrets, vulnerable dependencies (CVEs), Dockerfile and Kubernetes misconfigurations, unsafe shell scripts, swallowed errors.
  • Testing — is there any test file at all, and is a test framework installed.
  • Maintainability — duplication, dead code, magic numbers, tangled imports, TODO density, silent coupling between files.
  • Complexity — how branchy each function is, and how large each file is.
  • Change risk — which files change the most often in git history (hotspots).

Each check produces a sub-score. Those sub-scores feed one overall 0–100 number and an A–F grade. There is a 16th check, ai-stack, which just detects whether AI coding tools have left markers in the repo. It does not count toward the score.

We split the scanned repos into two groups: 533 that show AI fingerprints (“AI-marked”) and 1,415 that don’t. That is 1,948 in total; the remaining 13 of the 1,961 are in neither group. AI-marked means at least one of the following shows up in the repo: a .cursor/ directory, a .claude/ directory, a CLAUDE.md or AGENTS.md file, a Co-Authored-By: Copilot or Co-Authored-By: Claude trailer in a commit, or a dependency on @anthropic-ai/sdk, openai, or a similar AI SDK.
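To make the marker list concrete, here is a minimal sketch of the file-based part of that test. It is not inkode's implementation. The function name has_ai_markers is hypothetical, the dependency check only reads a root-level package.json, and the commit-trailer check is left out entirely.

```python
import json
from pathlib import Path

# Marker names taken from the list above; the SDK set holds only the two
# packages the post names explicitly.
MARKER_DIRS = (".cursor", ".claude")
MARKER_FILES = ("CLAUDE.md", "AGENTS.md")
AI_SDKS = {"@anthropic-ai/sdk", "openai"}


def has_ai_markers(repo: Path) -> bool:
    """Return True if the repo root shows any of the file-based AI markers."""
    if any((repo / d).is_dir() for d in MARKER_DIRS):
        return True
    if any((repo / f).is_file() for f in MARKER_FILES):
        return True
    # Assumption: dependencies are declared in a root-level package.json.
    manifest = repo / "package.json"
    if not manifest.is_file():
        return False
    try:
        data = json.loads(manifest.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError):
        return False
    if not isinstance(data, dict):
        return False
    deps = {
        **(data.get("dependencies") or {}),
        **(data.get("devDependencies") or {}),
    }
    return any(name in AI_SDKS for name in deps)
```

Calling has_ai_markers(Path(".")) from a repo root returns a single yes/no, which is all the cohort split needs. A fuller version would also scan commit messages for the Co-Authored-By trailers and look at other manifest formats.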

One sample caveat we’ll repeat: this dataset over-indexes open-source Rust and Python projects. Rust alone is 36% of the scanned set. Your private startup monorepo may look different from what we describe here. We still think the patterns are instructive; we want you to hold that skew in mind.

Claim 1 — 23.1% of scanned repos have at least one committed secret

Across all 1,961 repositories (n=1,961), almost one in four has a password, API key, access token, or similar credential sitting directly in the source code. Not in a secrets vault. Not in an environment variable. In a file any repo reader can open.

Here is what that looks like in the scanner output:

⚠ Secrets             14 findings      1.2s
  → .env.example:5    AWS access key (AKIA...)
  → src/config.js:12  hardcoded API key
  → Dockerfile:8      DB password in ENV line
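For readers who have not seen how findings like these are produced, here is a minimal sketch of a pattern-based secrets check. It is an illustration, not how inkode works internally. The two patterns cover the commonly documented shapes of AWS access key IDs and Stripe live keys, and the function name find_secrets is hypothetical.

```python
import re

# Two well-known key shapes; real scanners use many more rules than this.
PATTERNS = {
    "AWS access key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Stripe live key": re.compile(r"\bsk_live_[0-9a-zA-Z]+\b"),
}


def find_secrets(path, text):
    """Return (location, label) pairs for every line that matches a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((f"{path}:{lineno}", label))
    return hits
```

Each hit carries a file:line location and a label, the same two pieces of information the scanner output shows per finding.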

Secrets getting committed is not a new problem. Developers have been pushing keys to GitHub by accident for a decade. What is new is the rate — see Claim 3.

Claim 2 — AI-marked repos score 7–8 points lower than non-AI repos, after controlling for size

This is the claim that took us the longest to write honestly, so we will show the work.

The raw cohort number is this: AI-marked repos average 19.6 points lower on the overall score than non-AI repos. If you stop there, you walk away thinking “AI-assisted code is a full letter grade worse than non-AI code”. That conclusion is not what the data says.

Why not? Because AI-marked repos are, on average, larger. And bigger codebases score lower regardless of who wrote them — the score punishes total findings, and bigger codebases accumulate more of them.

When we bucket both groups by codebase size (number of files) and compare inside each bucket, the gap drops to 7 to 8 points. The shape is the same in every band:

Repo size (files)   Non-AI avg   AI-marked avg   Gap
<100                90.5         83.1            −7.4
100–500             71.0         63.4            −7.6
500–2,000           61.2         53.5            −7.7
2,000+              53.9         45.8            −8.2

So the honest version: AI-marked code carries roughly a half-grade penalty compared to non-AI code of the same size. Real, consistent, but much smaller than the raw cohort number suggests.
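The arithmetic behind the size confound can be sketched in a few lines. The bucket averages below are the ones from the size table. The size mix of each cohort is invented purely for illustration and is not the real distribution: it only encodes "AI-marked repos skew larger".

```python
# Bucket averages copied from the size table: (non-AI avg, AI-marked avg).
BUCKETS = {
    "<100": (90.5, 83.1),
    "100-500": (71.0, 63.4),
    "500-2,000": (61.2, 53.5),
    "2,000+": (53.9, 45.8),
}

# HYPOTHETICAL share of repos per bucket for each cohort (each sums to 1).
# These numbers are NOT from the dataset.
NON_AI_MIX = {"<100": 0.60, "100-500": 0.25, "500-2,000": 0.10, "2,000+": 0.05}
AI_MIX = {"<100": 0.20, "100-500": 0.30, "500-2,000": 0.30, "2,000+": 0.20}


def cohort_mean(mix, column):
    """Weighted mean score of one cohort given its share of repos per bucket."""
    return sum(share * BUCKETS[bucket][column] for bucket, share in mix.items())


# Gap inside each size bucket: same-size repos compared with each other.
within_bucket_gaps = {
    bucket: round(ai - non_ai, 1) for bucket, (non_ai, ai) in BUCKETS.items()
}

# Raw cohort gap: every repo pooled, so the size mix leaks into the number.
raw_gap = cohort_mean(AI_MIX, 1) - cohort_mean(NON_AI_MIX, 0)

print(within_bucket_gaps)
print(f"raw gap under the assumed mix: {raw_gap:.1f}")
```

Under this made-up mix the pooled gap comes out far larger than any single bucket's gap, even though every bucket sits in the 7 to 8 point range. Gaps recomputed from the rounded averages can differ from the published gap by 0.1, as in the 2,000+ row, presumably because the published gaps were computed before rounding.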

A note on the −19.6 figure. You may see it in other places, including our own earlier write-ups. It is arithmetically correct. It is just not the right number to lead with if you are trying to understand the world. The size-controlled −7 to −8 is.

Claim 3 — Secrets are committed at 2.7× the rate in AI-marked repos

This is the sharpest cohort difference in the dataset, and it does not suffer from the size confound.

42.6% of AI-marked repos have at least one committed secret. 15.7% of non-AI repos do (n=533 vs n=1,415). That is a 2.7× ratio.

Why doesn’t size explain it this time? Because the metric is yes/no per repo: does this repo contain at least one committed secret? A bigger codebase doesn’t automatically flip a “yes”. Plenty of large repos in our sample have zero committed secrets. Plenty of tiny AI-marked ones have several.
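Here is a short sketch of the yes/no metric and of the ratio arithmetic. The toy list of finding counts is made up. The cohort rates and sizes are the published ones from the text.

```python
def secret_rate(findings_per_repo):
    """Share of repos with at least one secret finding (a yes/no per repo)."""
    flags = [count > 0 for count in findings_per_repo]
    return sum(flags) / len(flags)


# Toy input: the repo with 14 findings counts once, same as the repo with 1.
toy = [0, 14, 0, 1]
print(secret_rate(toy))  # 0.5

# Published cohort rates and sizes.
ai_rate, ai_n = 0.426, 533
non_ai_rate, non_ai_n = 0.157, 1415

ratio = ai_rate / non_ai_rate
pooled = (ai_rate * ai_n + non_ai_rate * non_ai_n) / (ai_n + non_ai_n)
print(f"ratio: {ratio:.1f}x, pooled rate: {pooled:.1%}")
# ratio: 2.7x, pooled rate: 23.1%
```

The pooled rate over the 1,948 repos in the two cohorts lines up with the 23.1% headline in Claim 1, which is a useful sanity check on the cohort numbers.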

The likely mechanism: AI tools do not know which strings are sensitive. If you write “use Stripe API key sk_live_abc123 for this webhook” in a prompt, the AI treats the key as a literal, the way a compiler would: it pastes the string into a real file and commits it. An experienced engineer would put the key in a .env file, add .env to .gitignore, and commit an .env.example with the key scrubbed. The AI does not do that unless you ask, every time.
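On the code side, the habit described above amounts to reading the key from the environment rather than writing it into a file. This is a minimal sketch; the variable name STRIPE_API_KEY and the helper load_api_key are hypothetical.

```python
import os


def load_api_key(name="STRIPE_API_KEY"):
    """Read a credential from the environment instead of hardcoding it."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; define it in your local .env or shell")
    return key
```

Python does not read a .env file on its own, so something has to export the variable first: your shell, your process manager, or a dotenv loader. The point is that the value never appears in a tracked file.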

Claim 4 — 38.4% of first-time scans land in D or F

“First-time scan” means the very first inkode run for a unique project. No cherry-picking. Across 1,960 first scans:

Grade   Share
A       28.3%
B       17.4%
C       15.9%
D       13.7%
F       24.7%

There is a counterintuitive beat inside that table. Most founders, asked before they scan, guess their repo is “at worst a C”. But C is not where the bottom of the distribution sits: D is the smallest of the bottom three buckets, and F is more than one and a half times as common as C. Some of this is sample skew. Some of it is scoring calibration for large codebases (we will come back to that in another post). Some of it is just that real code has more findings than people expect.

Claim 5 — The ai-stack check is a top-two score predictor, and it doesn’t even feed the score

The ai-stack check is informational. It detects whether AI tools have left markers in a repo. It does not produce findings that lower the grade. It does not feed the score.

And yet, when we compute the Pearson correlation between each check’s output and the final score, ai-stack sits second on the list:

Check           Correlation with score (ρ)
hotspot         −0.535
ai-stack        −0.438
magic-numbers   −0.380
line-count      −0.377
coupling        −0.330

(n=1,948)

In plain terms: just knowing whether a repo has AI fingerprints predicts its score almost as well as measuring a real, direct problem. This is the strongest single piece of empirical support we have for “AI-assisted code carries more findings today than non-AI code”. Some of this correlation is the size effect. After partial correction, the signal still holds.
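A Pearson correlation works with a yes/no input such as ai-stack just as it does with a numeric one: the flag is coded as 0 or 1. This is a minimal sketch of the calculation. The eight data points are invented for illustration and are not from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL toy data: 1 = AI markers found, 0 = none.
ai_flag = [0, 0, 0, 0, 1, 1, 1, 1]
score = [92, 81, 70, 64, 85, 66, 58, 47]

print(f"{pearson(ai_flag, score):+.3f}")
```

A negative value means the flagged group scores lower on average. It says nothing by itself about why, which is where the size correction comes in.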

What we are NOT claiming

Four things that would be tempting, and wrong, to say.

Tool-by-tool rankings. We can split Cursor, Claude, Claude Code, GitHub Copilot, and generic “AI Agents” repos in our data. The F-rate spread across these five tools is 6 percentage points. The score spread is 2.5 points. The ranking could flip at a bigger sample size. Until we have at least 200 repos per tool, we are not going to rank them.

Score change over time. Too early. We have four weekly buckets, and the first is dominated by our seed import, which biases the trendline. Ask us again in six months.

Old code is worse than new code. The correlation between repo age and score is ρ = −0.008 (n=1,064). That is not a correlation. “Your scanner just punishes legacy code” is a critique we checked early. The data does not support it.

AI tools write bad code. We do not say that, and we do not believe the data supports it. The data says AI-marked repos trip more of our specific checks. That is a smaller claim. It is also the correct one.
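On the tool-by-tool point, one way to see why a 6-point F-rate spread is fragile at small sample sizes is the margin of error of a proportion. This is a sketch using the normal approximation. It is not a statement of how the 200-repo threshold was chosen, and the sample sizes in the loop are illustrative rather than per-tool counts from the dataset.

```python
from math import sqrt


def margin_of_error(p, n, z=1.96):
    """Half-width of a normal-approximation 95% interval for a proportion."""
    return z * sqrt(p * (1 - p) / n)


# p = 0.247 is the overall F share from the grade table; the n values are
# illustrative sample sizes only.
for n in (50, 100, 200):
    print(n, f"+/- {margin_of_error(0.247, n) * 100:.1f} pp")
```

At 200 repos the margin is roughly 6 percentage points, about the size of the whole spread between tools. Below that, the uncertainty on a single tool's F-rate is wider than the differences being ranked.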

What to do now

If you’ve read this far, the next step is 60 seconds of typing.

curl -fsSL https://inkode.co/install.sh | sh
ik init
ik run

No signup. No source code uploaded. Just a score and a shareable link when the scan finishes.
