I Scanned 42 Public SaaS Startup Repos. 83% Had AWS Key Patterns.
Most of them aren't real keys. But the noise itself is the story — and it's why regex-only scanners are losing the false-positive war.
I cloned 45 public SaaS startup repos — PostHog, Supabase, Cal.com, Plausible, n8n, Strapi, Medusa, Twenty, and 37 more — pointed our scanner at them, and let it run.
42 of the 45 cloned successfully. Across them I scanned 218,328 source files. After running our most-precise rules and a Claude-based false-positive filter, 35 of the 42 repos (83%) still surfaced strings that look like AWS access keys.
That number is more interesting than it sounds. Some of those are real. Most of them aren't. What's in the noise — and why even a tuned scanner can't fully separate signal from it — tells you more about the state of code-security tooling than the headline does.
Why I ran this
We've spent four rounds tightening false-positive rates on our internal benchmark. On labeled fixtures the scanner now produces 4 findings out of 181 originally flagged — a precision number we're proud of, and the topic of an earlier post comparing us to Semgrep on a public corpus. But labeled fixtures are tiny, hand-curated samples. Real-world monorepos have thousands of test files, generated code, fixture data, and docs. I wanted to see what the same scanner does in the wild.
The headline finding: even with our most-precise rules and an AI false-positive filter, real codebases still produce a stream of ambiguous matches. That stream is the work item every security team spends most of their time on. It's the actual product problem.
Methodology, in one block
Before any numbers, here's exactly what I did. The full script — corpus-scan.js plus the repo list and README — is published as a public gist; you can re-run it on your own corpus.
- Corpus. 45 hand-picked public OSS SaaS repos. Selection criteria: real product code (not SDK shells), permissive license, well-known company. Categories: auth, analytics, CMS, e-commerce, messaging, low-code, observability, etc.
- Scanner. The same regex + entropy engine that ships in our CLI. I cloned each repo shallow, walked source files (skipping `node_modules`, build outputs, lockfiles, and any file over 1MB), and ran them through the rule engine. A simplified sketch of the walk follows this list.
- Rule subset. Six hand-picked, high-precision rules — VC001 (hardcoded AWS keys), VC005 (unprotected Stripe webhook), VC094 (command injection via `child_process.exec` template literal), VC152 (webhook missing signature verification), VC153 (CORS reflecting origin without allowlist), VC158 (API route serving user-scoped data without an ownership check). Entropy-based secret detection was disabled — too noisy on real codebases without further filtering.
- AI filter. Every surviving finding ran through `filterFalsePositives` — the same Claude Haiku-backed pass our paying customers see. The filter caps at 50 findings per repo; the cap is documented in the output JSON.
- Anonymization. No company name, file path, or line number from the corpus appears anywhere in the published output. Stats are aggregated; per-repo identity stays private.
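For concreteness, here's roughly what the walk-and-skip step looks like. This is a simplified sketch, not the published corpus-scan.js — the skip lists and the file names are illustrative stand-ins, and the rule-engine call is omitted.

```js
// Simplified sketch of the corpus walk described above. Not the published
// corpus-scan.js; SKIP_DIRS and SKIP_FILES are illustrative stand-ins.
import fs from "node:fs";
import path from "node:path";

const SKIP_DIRS = new Set(["node_modules", "dist", "build", ".next", ".git"]);
const SKIP_FILES = /(package-lock\.json|yarn\.lock|pnpm-lock\.yaml)$/;
const MAX_BYTES = 1024 * 1024; // skip anything over 1MB

function* walk(dir) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      if (!SKIP_DIRS.has(entry.name)) yield* walk(full);
    } else if (!SKIP_FILES.test(entry.name) && fs.statSync(full).size <= MAX_BYTES) {
      yield full; // this file would be handed to the rule engine
    }
  }
}

// Usage: node walk.mjs ./some-cloned-repo
let count = 0;
for (const _file of walk(process.argv[2] ?? ".")) count++;
console.log(`${count} scannable files`);
```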
The numbers
Three runs, progressively more filtered:
- Full ruleset (206 rules + entropy)
  - raw findings: ~12,700 (across 3 sample repos)
  - by severity: 2,334 critical / 6,512 high / 3,036 medium / 818 low
  - median per repo: 4,034
  - signal-to-noise: dominated by FPs
- Trusted subset (6 hand-picked rules, no entropy)
  - raw findings: 2,069 (across 42 repos)
  - reduction: ~95%
  - rules that fired: 1 (VC001 — AKIA prefix)
- Trusted + Claude FP filter
  - surviving findings: 1,355
  - AI removal rate: 34.5% (714 classified as FP)
  - repos with ≥1 hit: 35 of 42 (83%)
  - median per repo: 8.5
  - repos with >100 hits: 4
The baseline is what the full scanner produces on real code, untrimmed. A typical SaaS monorepo throws ~4,000 findings. Most are false positives. This is the noise problem every scanner inherits when it leaves the lab.
Tightening to six well-tuned rules cut findings by 95%. Adding the AI filter on top removed another 34.5% of what was left. We end with 1,355 findings across 42 repos — all from one rule (VC001, the AWS access-key pattern), all marked critical, distributed unevenly across the corpus:
Repos by surviving-findings count:

```
0 findings       ████████               7 repos (17%)
1–5 findings     █████████              8 repos (19%)
6–20 findings    ███████████████████   15 repos (36%)
21–100 findings  █████████              8 repos (19%)
100+ findings    █████                  4 repos (10%)
```
Seven repos came back clean. Eight had a handful of hits. The center of the distribution is the 6–20 bucket — 36% of repos. Four repos blew past 100; those are typically large monorepos with extensive test suites that sprinkle AKIA… strings through fixture data.
Why so many “keys” aren't keys
VC001 matches the canonical AWS access-key pattern `AKIA[A-Z0-9]{16}`. The pattern is narrow enough that on labeled fixtures it's near-zero false-positive. On real public SaaS repos, it lights up everywhere. The reasons are mundane:
```
// from a real test fixture (anonymized):
const TEST_CREDENTIALS = {
  accessKeyId: "AKIAIOSFODNN7EXAMPLE", // public AWS docs example
  secretAccessKey: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
};

// from a regression test:
expect(maskKey("AKIAIOSFODNN7EXAMPLE")).toBe("AKIA****EXAMPLE");

// from a tutorial markdown rendered into HTML:
<code>aws_access_key_id = AKIA********EXAMPLE</code>
```

`AKIAIOSFODNN7EXAMPLE` is the canonical placeholder from the AWS documentation. It appears in test fixtures, mocks, examples, and tutorials across most cloud-aware codebases. `AKIA****EXAMPLE` shows up in masking tests, `AKIA0123456789ABCDEF` in API documentation. These aren't accidental commits — they're intentional, public, used as negative-control data in the codebase's own tests.
Of the 2,069 raw VC001 hits across 42 repos, the AI filter classified 714 of them as obvious FPs (test fixtures, public examples, masking-output assertions). It left 1,355 alone — but “left alone” doesn't mean “real key.” It means the model couldn't cleanly tell from the surrounding code whether the AKIA string was a fixture or production data. Some of those 1,355 are real. Most are test fixtures whose context wasn't obvious enough to call.
That's the actual problem. The signal-to-noise ratio of regex security scanning on real codebases is bad even when the regex is precise. Pattern matching alone can't answer “is this a real secret” without context — what file is it in, is the variable used in production code paths, does anything dereference it for a network call. Static analysis can sometimes get there, but most scanners stop at the match.
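To make that gap concrete, here's the kind of shallow context check a scanner can do without leaving the match. Everything in it — the placeholder list, the path regex, the bucket names — is illustrative rather than our rule logic; the point is how quickly it runs out of signal.

```js
// Illustrative only: a coarse, regex-era context check for an AKIA match.
// It can clear the obvious public placeholders and fixture paths, and then
// it's stuck -- everything else lands in "needs-review".
const KNOWN_PLACEHOLDERS = new Set([
  "AKIAIOSFODNN7EXAMPLE",  // canonical AWS docs example
  "AKIA0123456789ABCDEF",  // common docs/sample value
]);
const FIXTURE_PATH = /(^|\/)(tests?|__tests__|__mocks__|fixtures|examples|docs)\//i;

function classifyAkiaMatch(filePath, matched) {
  if (KNOWN_PLACEHOLDERS.has(matched)) return "likely-fixture";
  if (FIXTURE_PATH.test(filePath) || filePath.endsWith(".md")) return "likely-fixture";
  return "needs-review"; // the bucket that makes up most of the 1,355
}

console.log(classifyAkiaMatch("docs/setup.md", "AKIAIOSFODNN7EXAMPLE"));  // likely-fixture
console.log(classifyAkiaMatch("src/lib/s3.ts", "AKIAABCD1234EFGH5678"));  // needs-review
```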
What changes when AI is part of the pipeline
We've been shipping a Claude-based false-positive filter for paying customers since February. The 34.5% removal rate above is what it does, mid-flight, on real code. It looks at the matched string, the surrounding ten lines, the file name and path, and decides whether the finding is “real” or “FP.” It catches the obvious patterns: test fixture, masked example, public placeholder, type-check rather than secret comparison, dev-controlled constant rather than user input.
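Mechanically, that pass boils down to a small prompt per finding. The sketch below shows the general shape — it is not our production filterFalsePositives code, and the prompt wording and model alias are assumptions — but the inputs (matched string, surrounding lines, file path) are the ones described above.

```js
// A minimal sketch of a Claude Haiku-backed FP pass, assuming the
// @anthropic-ai/sdk package. Not the production filterFalsePositives
// implementation; the prompt and model choice here are illustrative.
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function classifyFinding({ match, filePath, context }) {
  const msg = await anthropic.messages.create({
    model: "claude-3-5-haiku-latest",
    max_tokens: 5,
    messages: [{
      role: "user",
      content:
        `A security scanner flagged the string "${match}" in ${filePath}.\n` +
        `Surrounding lines:\n${context}\n\n` +
        `Is this a real hardcoded credential, or a false positive (test fixture, ` +
        `docs placeholder, masking-test assertion)? Answer with exactly REAL or FP.`,
    }],
  });
  return msg.content[0].text.trim() === "FP" ? "fp" : "real";
}
```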
It's not magic. It missed enough that 1,355 findings still need a human eye. But the survivors are dramatically more triageable than 2,069 raw matches — and dramatically more triageable than the ~4,000 the full ruleset would have produced without the trusted subset.
The lesson, to me, isn't “every scanner needs AI.” It's that the scanner's job has changed. A good scanner now is a fast pattern matcher paired with a slow context understander. The pattern matcher casts a wide net cheaply. The context understander triages. Without that second stage, the user has to be the second stage, and most users — particularly the solo SaaS founders shipping AI-generated code — don't have the time or training for it.
That's the same argument we made when we looked at why traditional SAST tools fail on AI-generated code. The corpus data here is the empirical version of it.
Caveats and limitations
- Sample size. 42 repos is small. The distribution is suggestive, not definitive. We'd need 500+ to make claims about the broader public SaaS landscape with confidence; this is research, not a survey.
- Selection bias. The corpus is hand-picked OSS projects from well-known SaaS companies. They're likely better-instrumented for security than the median private startup. The real-world findings rate on private codebases is almost certainly higher.
- One rule fired. Of the six trusted rules, only VC001 produced hits. The other five (Stripe webhook unprotected, command injection, generic webhook signature, reflected CORS, IDOR) target very specific patterns; their absence on this corpus is expected and matches the precision-first design.
- AI filter cap. The filter reviews up to 50 findings per repo. Four repos had more than 100 raw findings; their tail wasn't reviewed and is included in the published count as-is. This is documented in the output JSON's methodology block.
- Anonymization isn't perfect. Categories with one repo (e.g. `crm`, `scheduling`, `automation`) are guessable from the corpus list. We rolled them up where it mattered; readers paying close attention can deduce a few.
If you want to run your own
The corpus-scan script lives as a public gist — the runner, the curated repo list, and a README. It takes a list of repo URLs and produces an anonymized JSON exactly like the one this post is written from. If you run a SaaS company and want to know what your codebase looks like to a scanner like ours, you can run our scanner on your private repo too — `npx xploitscan scan .` uses the same engine without the corpus harness.
And if you'd rather skip the scanner and just want the artifact procurement teams ask for — a public Trust Page with your security stance, security contact, subprocessors, and self-attested compliance — we just shipped a free builder for that. Two minutes, no signup.
Frequently asked
What rules did you scan with?
Six hand-picked high-precision rules: VC001 (hardcoded AWS keys), VC005 (Stripe webhook missing `constructEvent`), VC094 (command injection via `child_process.exec` template literal), VC152 (webhook missing signature verification), VC153 (CORS reflecting origin without allowlist), and VC158 (API route serving user-scoped data without an ownership check). Entropy-based secret detection was disabled — too noisy on real codebases without further filtering. The full 206-rule scanner produces ~4,000 findings per real-world SaaS repo, dominated by FPs, which is exactly why the trusted subset exists.
Are 1,355 hardcoded AWS keys really exposed in these repos?
No. The 1,355 number is matches that survived our AI false-positive filter, not confirmed real keys. Many of the survivors are still test fixtures, public AWS documentation examples, or masking-test assertions where the AI couldn't cleanly tell from the surrounding code whether the AKIA string was a fixture or production data. Some are real. Most are not. The actual story is the noise problem, not the headline.
Why didn't the other five rules fire?
VC005 (Stripe webhook unprotected) only matters if a repo handles Stripe webhooks. VC094 (command injection via `child_process.exec` template literal) is a narrow pattern post-FP-fix. VC152, VC153, and VC158 target very specific Next.js patterns. Their absence on this corpus matches the precision-first design — these rules are tuned to fire on real exploitable patterns, not generic API routes.
Can I run this on my own repo?
Yes. The same scanner that produced these stats runs locally as `npx xploitscan scan .` — no signup, nothing uploaded, no AI filter calls unless you set ANTHROPIC_API_KEY. The corpus-scan harness itself is published as a public gist (linked above) and accepts your own list of GitHub URLs.
Did you publish which company had which finding?
No. Per-repo URLs, file paths, and line numbers never enter the published output. Stats are aggregated, by category, and by rule. The corpus list is in our public repo for reproducibility, but no per-company claims are made in the post or the data file.
Why is this different from running Semgrep or GitHub secret scanning?
Same regex job, different stage two. Semgrep stops at the match; GitHub secret scanning stops at the match. Our scanner pairs a fast pattern matcher with a slow Claude-based context understander that classifies each finding as real or FP based on the surrounding code. On this corpus the AI filter removed 34.5% of trusted-rule matches the regex alone would have surfaced.
Scan your own repo for free
Same engine as the corpus scan. No signup, no upload — runs locally on your machine.
```
npx xploitscan scan .
```