What 5,299 scans taught us about AI-written code


Three weeks ago, we shared results from 1,961 repository scans. Now we’ve scanned 5,299.

The bigger dataset did not weaken the signal. It made it clearer.

AI-marked repositories still commit secrets much more often than non-AI repos. After adjusting for project size, they still score lower on average. And one surprising result stayed almost exactly the same: simply detecting AI usage is one of the strongest indicators of a lower code quality score.

Some earlier findings held up. Some changed once the sample got larger. A few things turned out to be wrong.

This post covers all of that.

The short version

From 5,299 repositories:

19.9% contained at least one committed secret
AI-marked repos committed secrets 3.2× more often
AI-marked repos scored 8–11 points lower than non-AI repos of the same size
The ai-stack signal became the second-strongest predictor of a low score

At the same time, a few earlier conclusions no longer hold. Some language-specific trends flattened out, and one repo-age claim turned out to be based on bad data.

How the dataset works

inkode is a command-line tool that scans Git repositories.

It runs 20 separate checks and produces:

a score from 0–100
an A–F grade
detailed findings

The dataset includes 5,299 unique repositories.

We split them into two groups:

Group	Repositories
AI-marked	1,007
Non-AI	4,282

A repository was considered “AI-marked” if it contained signs of AI assistant usage, including:

.cursor/
.claude/
CLAUDE.md
AI SDK dependencies
commit trailers like Co-Authored-By: Copilot
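To make the file-based markers concrete, here is a minimal sketch of how such a check could work. It is not inkode's actual implementation: the function name and the choice to look only at the repository's top level are assumptions made for illustration. Only the marker names come from the list above.

```python
from pathlib import Path

# Marker names come from the list above; everything else is hypothetical.
AI_MARKER_DIRS = (".cursor", ".claude")
AI_MARKER_FILES = ("CLAUDE.md",)


def find_ai_file_markers(repo_root: str) -> list[str]:
    """Return the AI marker paths found at the top level of a repository."""
    root = Path(repo_root)
    found = []
    for name in AI_MARKER_DIRS:
        if (root / name).is_dir():
            found.append(name + "/")
    for name in AI_MARKER_FILES:
        if (root / name).is_file():
            found.append(name)
    return found
```

A repository with any hit from a check like this, an AI SDK dependency, or an AI commit trailer would count as AI-marked under the definition above.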

One new detail stood out this time:

84.5% of AI-marked repos contained at least one explicit AI commit trailer.

That matters because commit trailers are a much cleaner signal than config folders or dependencies. Someone — or some tool — intentionally added them.

The dataset also shifted over time. The earlier sample leaned heavily toward Rust projects. The larger sample is now mostly Go repositories.

That changed some of the language-level trends, but not the core AI vs non-AI comparisons.

Claim 1 — Almost 1 in 5 repos contain committed secrets

Across all scanned repositories:

19.9% had at least one committed secret

That includes things like:

API keys
passwords
access tokens
credentials inside config files

Example findings looked like this:

.env.example:5    AWS access key
src/config.js:12  hardcoded API key
Dockerfile:8      DB password in ENV line
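Findings like these can come from simple pattern matching. The sketch below is illustrative only and is not how inkode detects secrets. The two regular expressions and the function name are assumptions. The first pattern matches the common "AKIA" plus 16 characters shape of long-term AWS access key IDs, and the second is a deliberately crude rule for quoted credentials.

```python
import re

# Illustrative patterns only; real scanners use many more rules and entropy checks.
SECRET_PATTERNS = {
    "AWS access key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "hardcoded credential": re.compile(
        r"(?i)\b(api[_-]?key|password|secret|token)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(path: str, text: str) -> list[str]:
    """Return findings formatted as 'path:line  label', one per matching rule."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append(f"{path}:{lineno}  {label}")
    return findings
```

A rule set this small will miss many real secrets and flag some placeholders, so treat it as a picture of the idea rather than a usable scanner.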

The 19.9% rate is lower than the 23.1% we reported in the smaller sample.

The earlier dataset had more small internal or admin-style projects, where this problem appeared more often.

But the gap between AI-marked and non-AI repos moved in the opposite direction: it widened.

Claim 2 — AI-marked repos commit secrets 3.2× more often

This is still the clearest result in the dataset.

Repository type	Secret rate
AI-marked	44.5%
Non-AI	14.1%

That is a 3.2× difference.

In the earlier dataset, the gap was 2.7×. Instead of shrinking as the sample grew, it became larger.

That surprised us.
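The arithmetic behind the 3.2× figure is easy to check. The short sketch below uses only numbers quoted in this post (the two group sizes and the two secret rates). It also shows that weighting the group rates by group size gives back the overall 19.9%. The variable names are ours.

```python
# Group sizes and secret rates as quoted in this post.
AI_REPOS, NON_AI_REPOS = 1007, 4282
AI_RATE, NON_AI_RATE = 0.445, 0.141

# How many times more often AI-marked repos contain a committed secret.
ratio = AI_RATE / NON_AI_RATE

# Overall rate: group rates weighted by group size.
overall = (AI_REPOS * AI_RATE + NON_AI_REPOS * NON_AI_RATE) / (AI_REPOS + NON_AI_REPOS)

print(f"ratio: {ratio:.1f}x, overall: {overall:.1%}")
```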

Why project size does not fully explain this

This metric is binary:

Does the repository contain at least one committed secret?

Larger repositories do have more files, but that alone does not automatically create a “yes”.

Many large repositories had zero secrets. Many tiny AI-marked repos had several.

The pattern appears across all repository sizes.

Why this probably happens

The explanation is usually simple.

AI assistants often do not treat a string as sensitive unless the user explicitly says it is.

If a prompt contains:

Use Stripe API key sk_live_abc123

the assistant may paste that value directly into source code.

An experienced engineer would normally:

move the key into .env
add .env to .gitignore
commit a safe .env.example

AI tools can do that too — but only if the prompt asks for it.
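In code, the safer habit looks roughly like this. It is a minimal sketch: the variable name STRIPE_API_KEY and the function name are assumptions chosen for illustration, and the value must already be in the process environment (exported by the shell or by whatever loads your .env file).

```python
import os


def load_stripe_key() -> str:
    """Read the Stripe key from the environment instead of hardcoding it."""
    # STRIPE_API_KEY is an assumed variable name, not one taken from the post.
    key = os.environ.get("STRIPE_API_KEY")
    if not key:
        raise RuntimeError(
            "STRIPE_API_KEY is not set; export it or load it from your local .env file"
        )
    return key
```

The real value lives only in the untracked .env file, while the committed .env.example lists the variable name with an empty or placeholder value.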

Claim 3 — After adjusting for repo size, AI repos still score lower

At first glance, the difference looks huge.

Average scores:

Group	Average score
Non-AI	83.4
AI-marked	57.7

That is a 25.7 point gap.

But that number alone is misleading.

Larger repositories almost always score lower because bigger codebases accumulate more findings over time. AI-marked repos also tend to be larger.

So we compared repositories inside the same size ranges.

Repo size	Non-AI avg	AI avg	Gap
<100 files	92.4	81.4	-11.0
100–500	69.0	61.0	-8.0
500–2,000	57.1	48.2	-8.9
2,000+	50.0	42.1	-7.9

After controlling for size, AI-marked repos still scored:

roughly 8–11 points lower in every bucket (7.9 to 11.0 before rounding)

That result became slightly stronger in the larger dataset.
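The size-controlled comparison can be sketched in a few lines. The averages below are copied from the table above. The bucket boundaries in size_bucket are an assumption (lower bound inclusive, upper bound exclusive), since the table does not say which bucket a repo with exactly 500 or 2,000 files belongs to.

```python
# (bucket label, non-AI average, AI average), copied from the table above.
BUCKETS = [
    ("<100 files", 92.4, 81.4),
    ("100-500", 69.0, 61.0),
    ("500-2,000", 57.1, 48.2),
    ("2,000+", 50.0, 42.1),
]


def size_bucket(file_count: int) -> str:
    """Assign a repo to a size bucket (boundary handling is an assumption)."""
    if file_count < 100:
        return "<100 files"
    if file_count < 500:
        return "100-500"
    if file_count < 2000:
        return "500-2,000"
    return "2,000+"


def score_gaps(buckets) -> dict[str, float]:
    """Return the AI minus non-AI average score for each bucket."""
    return {label: round(ai - non_ai, 1) for label, non_ai, ai in buckets}


raw_gap = round(57.7 - 83.4, 1)  # overall averages, not size-controlled
print(raw_gap, score_gaps(BUCKETS))
```

Comparing within buckets removes most of the effect of AI-marked repos simply being larger. That is why the per-bucket gaps are far smaller than the raw gap.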

The important part is this:

the effect is real
it appears consistently
but it is much smaller than the raw 25-point headline number

Claim 4 — Detecting AI usage strongly predicts lower scores

One of the most surprising findings stayed almost unchanged.

The ai-stack check does not affect the final score.

It only detects signs of AI assistant usage.

Even so, it became the:

second-strongest predictor of a low score

Check	Correlation with score
hotspot	-0.638
ai-stack	-0.477
coupling	-0.335
line-count	-0.308
import-graph	-0.307

In simple terms:

Whether a repo shows signs of AI usage tracks its score more closely than every check except hotspot, which measures its busiest files.

Some of this comes from repo size, since AI-marked repos are often larger. But the signal remains after correcting for that.
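For readers who want to see what a correlation like this measures, here is a small sketch. It assumes a plain Pearson correlation between a 0/1 "AI detected" flag and the score. The post does not say which correlation measure was used, so that is an assumption. The toy data is made up purely to show the mechanics and is not taken from the dataset.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up toy data: 1 means AI markers were detected, 0 means they were not.
ai_flag = [0, 0, 0, 0, 1, 1, 1, 0]
scores = [92, 88, 75, 81, 60, 55, 70, 66]

r = pearson(ai_flag, scores)
print(f"correlation: {r:.3f}")  # negative: flagged repos score lower in this toy data
```

A negative value means flagged repos tend to score lower. It does not say why, which is the reason the size-controlled comparison matters.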

Claim 5 — Most AI-marked repos contain explicit AI commit trailers

This run introduced a cleaner way to measure AI-assisted coding.

Previously, we counted many different markers:

config folders
AI SDKs
rules files
commit trailers

But not all of those prove that an AI assistant actually wrote code in the repository.

A dependency on an AI SDK only means the project calls an LLM somewhere.

Commit trailers are different.

Example:

Co-Authored-By: Claude

That is an explicit signal added by a user or tool.

In this dataset:

84.5% of AI-marked repos had at least one AI commit trailer

On average, an AI-marked repository had trailers on 6.3% of its commits.

That suggests most teams use AI selectively rather than for every commit, although trailers only capture the AI use that someone or something recorded.

Going forward, we’ll likely treat commit trailers as the primary definition of “AI-assisted coding”.
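A trailer-based measurement can be sketched as follows. This is an illustration, not inkode's implementation. The list of assistant names and the loose matching rule are assumptions. A stricter version would only look at the final paragraph of each commit message, where Git trailers live.

```python
import re

# Assistant names appear in this post; matching them by substring is an assumption.
AI_NAMES = ("copilot", "claude", "cursor")

# Matches any "Co-Authored-By: ..." line, case-insensitively, anywhere in the message.
TRAILER_RE = re.compile(r"^co-authored-by:\s*(.+)$", re.IGNORECASE | re.MULTILINE)


def has_ai_trailer(message: str) -> bool:
    """True if a commit message credits a known AI assistant as co-author."""
    for value in TRAILER_RE.findall(message):
        if any(name in value.lower() for name in AI_NAMES):
            return True
    return False


def ai_trailer_share(messages: list[str]) -> float:
    """Fraction of commit messages that carry an AI co-author trailer."""
    if not messages:
        return 0.0
    return sum(has_ai_trailer(m) for m in messages) / len(messages)
```

Given the full list of commit messages for one repository, ai_trailer_share returns the per-repo figure that the 6.3% average is built from.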

What stayed true

Three earlier findings became stronger in the larger dataset:

1. The secrets gap widened

Previously: 2.7×
Now: 3.2×

2. ai-stack stayed a top predictor

Previously: -0.438
Now: -0.477

3. The size-controlled AI penalty remained

Previously: 7–8 points
Now: 8–11 points

What changed

Some earlier conclusions did not survive the larger sample.

The language story became weaker

We previously said JavaScript repos showed AI markers far more often than Rust repos.

That gap shrank significantly once the sample grew.

We would not repeat that claim today.

The cleaner language-level signal now comes from commit trailers:

C++ and C showed the highest AI trailer rates
Go and Rust showed the lowest

The “D or F” rate dropped

Previously:

38.4% of first-time scans scored D or F

Now:

28.6%

Why?

The newer dataset contains many small, clean Go libraries that pulled scores upward.

Repo age does matter a bit

We originally reported almost no relationship between repository age and score.

That turned out to be wrong.

A bug had zeroed out repo age values in much of the earlier dataset.

With corrected data:

older repos score slightly lower on average
mostly because older repos also tend to be larger

The effect becomes much weaker after adjusting for size.

What we still cannot claim

There are still things we are deliberately avoiding over-claiming.

Per-tool rankings

We can already separate repos by tools like:

Cursor
Claude
GitHub Copilot
Claude Code

Some early differences are visible.

But the sample sizes are still too small to publish reliable rankings.

For example:

per-tool secret rates currently range from 48% to 60%
F-grade rates range from 64% to 80%

That spread is interesting, but not stable enough yet.

CLI usage patterns

The dataset also includes 758 CLI scans.

But most of them appear to come from:

local testing
repeated installs
broken anonymous runs

Until that data is cleaned up, we are excluding it from public-facing claims.

Final takeaway

We scanned 5,299 repositories over 32 days.

The main findings stayed surprisingly consistent:

AI-marked repos commit secrets more often
they score lower even after adjusting for size
and AI usage strongly predicts lower repository scores

But the larger dataset also made one thing clearer:

This is probably less about the AI tools themselves, and more about workflow habits.

AI assistants generate code very quickly. That changes the speed of development.

What has not changed yet is the review process around that code.

The teams getting the best results from AI are usually the ones still applying the same habits they used before:

reviewing generated code carefully
separating secrets from source
checking architecture decisions
cleaning up before commit

The assistant can write code fast. The responsibility to review it still belongs to the developer.