Three weeks ago, we shared results from 1,961 repository scans. Now we’ve scanned 5,299.
The bigger dataset did not weaken the signal. It made it clearer.
AI-marked repositories still commit secrets much more often than non-AI repos. After adjusting for project size, they still score lower on average. And one surprising result stayed almost exactly the same: simply detecting AI usage is one of the strongest indicators of a lower code quality score.
Some earlier findings held up. Some changed once the sample got larger. A few things turned out to be wrong.
This post covers all of that.
The short version
From 5,299 repositories:
- 19.9% contained at least one committed secret
- AI-marked repos committed secrets 3.2× more often
- AI-marked repos scored 8–11 points lower than non-AI repos of the same size
- The ai-stack signal became the second-strongest predictor of a low score
At the same time, a few earlier conclusions no longer hold. Some language-specific trends flattened out, and one repo-age claim turned out to be based on bad data.
How the dataset works
inkode is a command-line tool that scans Git repositories.
It runs 20 separate checks and produces:
- a score from 0–100
- an A–F grade
- detailed findings
The dataset includes 5,299 unique repositories.
We split them into two groups:
| Group | Repositories |
|---|---|
| AI-marked | 1,007 |
| Non-AI | 4,282 |
A repository was considered “AI-marked” if it contained signs of AI assistant usage, including:
- `.cursor/`, `.claude/`, `CLAUDE.md`
- AI SDK dependencies
- commit trailers like `Co-Authored-By: Copilot`
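As a rough illustration of how markers like these could be checked, here is a minimal sketch. It is not inkode's actual detection logic: the function name `find_ai_markers`, the marker lists, and the trailer regex are assumptions made for this example, and AI SDK dependency detection is left out.

```python
import re
from pathlib import PurePosixPath

# Hypothetical marker lists; a real tool may check more (or different) paths.
AI_DIRS = {".cursor", ".claude"}
AI_FILES = {"CLAUDE.md"}
# Matches trailers such as "Co-Authored-By: Copilot" at the start of a line.
AI_TRAILER = re.compile(
    r"^co-authored-by:\s*(copilot|claude|cursor)\b",
    re.IGNORECASE | re.MULTILINE,
)


def find_ai_markers(paths, commit_messages):
    """Return the kinds of AI markers found in a repo's paths and commits."""
    found = set()
    for raw in paths:
        path = PurePosixPath(raw)
        if AI_DIRS.intersection(path.parts):
            found.add("config-dir")
        if path.name in AI_FILES:
            found.add("rules-file")
    if any(AI_TRAILER.search(msg) for msg in commit_messages):
        found.add("commit-trailer")
    return found
```

Under this sketch, a repository would count as AI-marked whenever the returned set is non-empty.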
One new detail stood out this time:
84.5% of AI-marked repos contained at least one explicit AI commit trailer.
That matters because commit trailers are a much cleaner signal than config folders or dependencies. Someone — or some tool — intentionally added them.
The dataset also shifted over time. The earlier sample leaned heavily toward Rust projects. The larger sample is now mostly Go repositories.
That changed some of the language-level trends, but not the core AI vs non-AI comparisons.
Claim 1 — Almost 1 in 5 repos contain committed secrets
Across all scanned repositories:
19.9% had at least one committed secret
That includes things like:
- API keys
- passwords
- access tokens
- credentials inside config files
Example findings looked like this:
.env.example:5 AWS access key
src/config.js:12 hardcoded API key
Dockerfile:8 DB password in ENV line
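For readers curious how findings like these are produced, here is a minimal sketch of pattern-based secret detection. It is not inkode's implementation: the `scan_text` function and both regular expressions are assumptions for illustration, and real scanners rely on far more rules.

```python
import re

# Illustrative patterns only; production scanners use many more rules.
PATTERNS = {
    "AWS access key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "hardcoded API key": re.compile(
        r"api[_-]?key\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]", re.IGNORECASE
    ),
}


def scan_text(filename, text):
    """Return 'file:line label' strings for every line matching a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append(f"{filename}:{lineno} {label}")
    return findings
```

Because the metric in this post is binary, a single match anywhere in the repository would be enough to count it as containing a committed secret.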
This number is lower than the 23.1% we reported in the smaller sample.
The earlier dataset had more small internal or admin-style projects, where this problem appeared more often.
But the AI vs non-AI gap moved in the opposite direction.
Claim 2 — AI-marked repos commit secrets 3.2× more often
This is still the clearest result in the dataset.
| Repository type | Secret rate |
|---|---|
| AI-marked | 44.5% |
| Non-AI | 14.1% |
That is a 3.2× difference.
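The multiplier is simply the ratio of the two rates in the table. A small sketch of the arithmetic, where the helper name `rate_ratio` is ours and not part of any tool:

```python
def rate_ratio(rate_a, rate_b):
    """How many times more often an event occurs in group A than in group B."""
    return rate_a / rate_b


ai_rate = 44.5      # % of AI-marked repos with at least one committed secret
non_ai_rate = 14.1  # % of non-AI repos with at least one committed secret

ratio = rate_ratio(ai_rate, non_ai_rate)
print(f"{ratio:.1f}x")  # prints "3.2x"
```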
In the earlier dataset, the gap was 2.7×. Instead of shrinking as the sample grew, it became larger.
That surprised us.
Why project size does not fully explain this
This metric is binary:
Does the repository contain at least one committed secret?
Larger repositories do have more files, but that alone does not automatically create a “yes”.
Many large repositories had zero secrets. Many tiny AI-marked repos had several.
The pattern appears across all repository sizes.
Why this probably happens
The explanation is usually simple.
AI assistants do not know which strings are sensitive unless the user explicitly tells them.
If a prompt contains:
Use Stripe API key sk_live_abc123
the assistant may paste that value directly into source code.
An experienced engineer would normally:
- move the key into `.env`
- add `.env` to `.gitignore`
- commit a safe `.env.example`
AI tools can do that too — but only if the prompt asks for it.
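On the code side, the habit described above might look like the following. This is a sketch under assumptions: the variable name `STRIPE_API_KEY` and the helper `load_stripe_key` are hypothetical, and it assumes something else (a shell export or a dotenv loader) has already placed the values from `.env` into the process environment.

```python
import os


def load_stripe_key(env=os.environ):
    """Read the key from the environment instead of hardcoding it in source."""
    key = env.get("STRIPE_API_KEY")  # hypothetical variable name
    if not key:
        raise RuntimeError("STRIPE_API_KEY is not set; add it to your local .env")
    return key
```

The committed `.env.example` would then list `STRIPE_API_KEY=` with an empty or placeholder value, so the real key never enters the repository.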
Claim 3 — After adjusting for repo size, AI repos still score lower
At first glance, the difference looks huge.
Average scores:
| Group | Average score |
|---|---|
| Non-AI | 83.4 |
| AI-marked | 57.7 |
That is a 25.7 point gap.
But that number alone is misleading.
Larger repositories almost always score lower because bigger codebases accumulate more findings over time. AI-marked repos also tend to be larger.
So we compared repositories inside the same size ranges.
| Repo size | Non-AI avg | AI avg | Gap |
|---|---|---|---|
| <100 files | 92.4 | 81.4 | -11.0 |
| 100–500 | 69.0 | 61.0 | -8.0 |
| 500–2,000 | 57.1 | 48.2 | -8.9 |
| 2,000+ | 50.0 | 42.1 | -7.9 |
After controlling for size, AI-marked repos still scored:
roughly 8–11 points lower in every bucket (7.9 to 11.0)
That result became slightly stronger in the larger dataset.
The important part is this:
- the effect is real
- it appears consistently
- but it is much smaller than the raw 25-point headline number
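The per-bucket gaps can be recomputed directly from the averages in the table. The sketch below only restates that arithmetic; the names `BUCKETS` and `bucket_gaps` are ours.

```python
# Average scores per size bucket, copied from the table: (non-AI, AI-marked).
BUCKETS = {
    "<100 files": (92.4, 81.4),
    "100-500": (69.0, 61.0),
    "500-2,000": (57.1, 48.2),
    "2,000+": (50.0, 42.1),
}


def bucket_gaps(buckets):
    """AI-marked average minus non-AI average, per size bucket."""
    return {name: round(ai - non_ai, 1) for name, (non_ai, ai) in buckets.items()}


gaps = bucket_gaps(BUCKETS)
raw_gap = round(57.7 - 83.4, 1)  # overall averages, ignoring repo size

for name, gap in gaps.items():
    print(f"{name}: {gap}")
print(f"raw: {raw_gap}")
```

The raw gap is larger than any single bucket's gap because AI-marked repos are concentrated in the larger, lower-scoring size ranges, so the overall averages mix a size effect into the comparison.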
Claim 4 — Detecting AI usage strongly predicts lower scores
One of the most surprising findings stayed almost unchanged.
The ai-stack check does not affect the final score.
It only detects signs of AI assistant usage.
Even so, it became the second-strongest predictor of a low score:
| Check | Correlation with score |
|---|---|
| hotspot | -0.638 |
| ai-stack | -0.477 |
| coupling | -0.335 |
| line-count | -0.308 |
| import-graph | -0.307 |
In simple terms:
Knowing whether a repo shows AI markers tells you more about its score than any check except hotspot, which measures its busiest files.
Some of this comes from repo size, since AI repos are often larger. But the signal remains after correcting for that.
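To make "correlation with score" concrete, here is one way such a number could be computed for a yes/no check. This is a sketch under assumptions: the post does not say which coefficient was used, so Pearson correlation with a 0/1 flag is our guess, and the sample values below are made up purely to show the mechanics.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy values, NOT taken from the dataset: 1 = AI markers detected, 0 = none.
ai_flag = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [55, 62, 48, 90, 85, 78, 93, 70]

r = pearson(ai_flag, scores)
print(f"r = {r:.3f}")
```

A negative value means flagged repos tend to score lower; the closer it is to -1, the stronger that tendency.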
Claim 5 — Most AI-marked repos contain explicit AI commit trailers
This run introduced a cleaner way to measure AI-assisted coding.
Previously, we counted many different markers:
- config folders
- AI SDKs
- rules files
- commit trailers
But not all of those prove that AI actually wrote code in the repository.
A dependency on an AI SDK only means the project calls an LLM somewhere.
Commit trailers are different.
Example:
Co-Authored-By: Claude
That is an explicit signal added by a user or tool.
In this dataset:
84.5% of AI-marked repos had at least one AI commit trailer
The average AI-marked repository had trailers on 6.3% of commits.
Most teams are not using AI for every commit. They are using it selectively.
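A per-repository trailer share like the one quoted above could be measured roughly as follows. This is a sketch, not the tool's method: the regex, the function names, and the choice to average per-repo shares (rather than pool all commits) are assumptions.

```python
import re

# Hypothetical pattern for AI co-author trailers at the start of a line.
AI_TRAILER = re.compile(
    r"^co-authored-by:\s*(copilot|claude|cursor)\b",
    re.IGNORECASE | re.MULTILINE,
)


def trailer_share(commit_messages):
    """Fraction of one repo's commits that carry an AI co-author trailer."""
    if not commit_messages:
        return 0.0
    tagged = sum(1 for msg in commit_messages if AI_TRAILER.search(msg))
    return tagged / len(commit_messages)


def mean_trailer_share(repos):
    """Average of the per-repo shares, so every repo counts equally."""
    return sum(trailer_share(msgs) for msgs in repos) / len(repos)
```

Averaging per-repo shares keeps one very large repository from dominating the figure, which a pooled commit count would allow.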
Going forward, we’ll likely treat commit trailers as the primary definition of “AI-assisted coding”.
What stayed true
Three earlier findings became stronger in the larger dataset:
1. The secrets gap widened
- Previously: 2.7×
- Now: 3.2×
2. ai-stack stayed a top predictor
- Previously: -0.438
- Now: -0.477
3. The size-controlled AI penalty remained
- Previously: 7–8 points
- Now: 8–11 points
What changed
Some earlier conclusions did not survive the larger sample.
The language story became weaker
We previously said JavaScript repos showed AI markers far more often than Rust repos.
That gap shrank significantly once the sample grew.
We would not repeat that claim today.
The cleaner language-level signal now comes from commit trailers.
The “D or F” rate dropped
Previously:
- 38.4% of first-time scans scored D or F
Now:
- 28.6%
Why?
The newer dataset contains many small, clean Go libraries that pulled scores upward.
Repo age does matter a bit
We originally reported almost no relationship between repository age and score.
That turned out to be wrong.
A bug had zeroed out repo age values in much of the earlier dataset.
With corrected data:
- older repos score slightly lower on average
- mostly because older repos also tend to be larger
The effect becomes much weaker after adjusting for size.
What we still cannot claim
There are still things we are deliberately avoiding over-claiming.
Per-tool rankings
We can already separate repos by tools like:
- Cursor
- Claude
- GitHub Copilot
- Claude Code
Some early differences are visible.
But the sample sizes are still too small to publish reliable rankings.
For example:
- per-tool secret rates currently range from 48% to 60%
- F-grade rates range from 64% to 80%
That spread is interesting, but not stable enough yet.
CLI usage patterns
The dataset also includes 758 CLI scans.
But most of them appear to come from:
- local testing
- repeated installs
- broken anonymous runs
Until that data is cleaned up, we are excluding it from public-facing claims.
Final takeaway
We scanned 5,299 repositories over 32 days.
The main findings stayed surprisingly consistent:
- AI-marked repos commit secrets more often
- they score lower even after adjusting for size
- and AI usage strongly predicts lower repository scores
But the larger dataset also made one thing clearer:
This is probably less about the AI tools themselves, and more about workflow habits.
AI assistants generate code very quickly. That changes the speed of development.
What has not changed yet is the review process around that code.
The teams getting the best results from AI are usually the ones still applying the same habits they used before:
- reviewing generated code carefully
- separating secrets from source
- checking architecture decisions
- cleaning up before commit
The assistant can write code fast. The responsibility to review it still belongs to the developer.