Detection Benchmark
We publish the raw numbers showing how well our scanner performs against a labeled corpus of vulnerable and clean code samples. The results are regenerated on every pull request and every push to main, so you always see current data.
Last run: Jun 2, 2026 · Corpus size: 218 fixtures (130 vulnerable, 88 clean)
Trend over time
5 runs · since May 29, 2026
Precision and recall on each benchmark run. The corpus grows over time, so a flat or rising line means detection kept pace as new fixtures were added.
Corpus: 218 → 218 fixtures. Deterministic scanner only (excludes the optional AI false-positive filter).
How we score
Every fixture in packages/cli/test-fixtures/ is labeled with expectedFindings (rules that must fire, with file and line range) and mustNotFire (rules that must not fire anywhere in the fixture). The runner scans each fixture and counts:
- True positive (TP) — a finding whose rule, file, and line fall inside an expected entry.
- False negative (FN) — an expected entry with no matching finding.
- False positive (FP) — a finding for a rule explicitly listed as mustNotFire on a clean fixture.
Findings on vulnerable fixtures for unrelated rules aren't counted as FPs — they're treated as adjacent observations, neither penalized nor rewarded. Precision and recall are micro-averaged across the whole corpus. The runner lives at scripts/benchmark.js in the repository, and a GitHub Action runs it on every pull request and every push to main.
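The counting rules above can be sketched roughly as follows. This is an illustration and not the code in scripts/benchmark.js: the field names `startLine`, `endLine`, and `line` are assumptions about the fixture and finding shapes, and the sketch assumes each expected entry is counted once (TP if any finding matches it, FN otherwise) and that mustNotFire is applied wherever a fixture lists it.

```javascript
// Sketch only: startLine, endLine, and line are assumed field names,
// not taken from the real fixture schema or from scripts/benchmark.js.

// A finding matches an expected entry when rule and file agree and the
// reported line falls inside the expected line range (inclusive).
function matchesExpected(finding, expected) {
  return (
    finding.rule === expected.rule &&
    finding.file === expected.file &&
    finding.line >= expected.startLine &&
    finding.line <= expected.endLine
  );
}

// Count TP / FP / FN for one fixture.
function scoreFixture(fixture, findings) {
  const counts = { tp: 0, fp: 0, fn: 0 };

  // Assumption: each expected entry is counted once, as a TP if any
  // finding matches it and as an FN otherwise.
  for (const expected of fixture.expectedFindings) {
    if (findings.some((f) => matchesExpected(f, expected))) counts.tp += 1;
    else counts.fn += 1;
  }

  // Only rules listed in mustNotFire can produce FPs. Findings for
  // unrelated rules are ignored: neither penalized nor rewarded.
  for (const finding of findings) {
    if (fixture.mustNotFire.includes(finding.rule)) counts.fp += 1;
  }
  return counts;
}

// Micro-average: sum the raw counts over all fixtures, then divide once.
// Returns null for a metric whose denominator is zero.
function microAverage(perFixtureCounts) {
  const total = { tp: 0, fp: 0, fn: 0 };
  for (const c of perFixtureCounts) {
    total.tp += c.tp;
    total.fp += c.fp;
    total.fn += c.fn;
  }
  const ratio = (num, den) => (den === 0 ? null : num / den);
  return {
    precision: ratio(total.tp, total.tp + total.fp),
    recall: ratio(total.tp, total.tp + total.fn),
  };
}
```

Because the counts are summed before dividing, a rule with many expected entries weighs more in the corpus-wide score than a rule with a single entry.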
About this baseline. The corpus is deliberately small and curated, and growing it is ongoing work. A near-perfect score on a small corpus is not a claim that the scanner catches everything; it's the floor below which we will not regress. We will keep adding harder cases, and over time the score will become a more demanding indicator.
Per-rule scores
| Rule | TP | FP | FN | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| VC001 | 5 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC003 | 2 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC005 | 2 | 0 | 1 | 100.0% | 66.7% | 80.0% |
| VC006 | 7 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC007 | 4 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC015 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC016 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC023 | 4 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC025 | 3 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC030 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC031 | 3 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC033 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC034 | 2 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC035 | 2 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC037 | 4 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC038 | 2 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC041 | 5 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC042 | 10 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC043 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC044 | 6 | 0 | 1 | 100.0% | 85.7% | 92.3% |
| VC045 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC046 | 2 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC047 | 2 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC048 | 3 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC050 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC051 | 2 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC052 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC054 | 2 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC055 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC057 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC058 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC059 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC060 | 2 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC062 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC063 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC072 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC073 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC074 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC075 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC077 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC078 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC079 | 2 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC081 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC082 | 4 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC083 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC086 | 2 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC088 | 2 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC090 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC091 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC094 | 7 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC132 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC133 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC135 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC143 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC146 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC152 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC153 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC156 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC166 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC168 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC178 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC184 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC185 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC186 | 2 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC189 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC191 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC192 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC194 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC197 | 2 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC198 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC200 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC201 | 2 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC203 | 2 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC204 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC206 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC207 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC208 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC209 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
| VC210 | 1 | 0 | 0 | 100.0% | 100.0% | 100.0% |
Only rules with at least one ground-truth entry in the corpus appear here. The other 132 rules don't have fixtures yet and are excluded from the score.
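For readers who want to check a row by hand, the sketch below derives the three percentages from the raw counts using the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2·TP / (2·TP + FP + FN). The function name `ruleMetrics` is hypothetical and the runner may compute or round these differently.

```javascript
// Standard precision / recall / F1 from raw counts, as percentages.
// ruleMetrics is an illustrative name, not part of the benchmark runner.
function ruleMetrics({ tp, fp, fn }) {
  // Return null when a metric is undefined (zero denominator).
  const pct = (num, den) => (den === 0 ? null : (100 * num) / den);
  return {
    precision: pct(tp, tp + fp),
    recall: pct(tp, tp + fn),
    // Equivalent to the harmonic mean of precision and recall.
    f1: pct(2 * tp, 2 * tp + fp + fn),
  };
}

// Format with one decimal place, as in the table.
const formatPct = (v) => (v === null ? "n/a" : `${v.toFixed(1)}%`);

// VC044 row: TP = 6, FP = 0, FN = 1
const vc044 = ruleMetrics({ tp: 6, fp: 0, fn: 1 });
console.log(formatPct(vc044.precision), formatPct(vc044.recall), formatPct(vc044.f1));
// prints: 100.0% 85.7% 92.3%
```

The same function applied to the VC005 counts (TP = 2, FP = 0, FN = 1) gives 100.0%, 66.7%, and 80.0%, which agrees with that row.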
Head-to-head vs open-source scanners
| VC Rule | XploitScan recall | Semgrep recall | Semgrep covers? | Bearer recall | Bearer covers? |
|---|---|---|---|---|---|
| VC001 | 100.0% | 0.0% | ✗ | 80.0% | ✓ |
| VC003 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC005 | 66.7% | 0.0% | ✗ | 66.7% | ✓ |
| VC006 | 100.0% | 0.0% | ✗ | 28.6% | ✓ |
| VC007 | 100.0% | 50.0% | ✓ | 75.0% | ✓ |
| VC015 | 100.0% | 100.0% | ✓ | 100.0% | ✓ |
| VC016 | 100.0% | 100.0% | ✓ | 100.0% | ✓ |
| VC023 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC025 | 100.0% | 0.0% | ✗ | 100.0% | ✓ |
| VC030 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC031 | 100.0% | 100.0% | ✓ | 100.0% | ✓ |
| VC033 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC034 | 100.0% | 0.0% | ✗ | 100.0% | ✓ |
| VC035 | 100.0% | 50.0% | ✓ | 100.0% | ✓ |
| VC037 | 100.0% | 0.0% | ✗ | 50.0% | ✓ |
| VC038 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC041 | 100.0% | 20.0% | ✓ | 100.0% | ✓ |
| VC042 | 100.0% | 0.0% | ✗ | 10.0% | ✓ |
| VC043 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC044 | 85.7% | 14.3% | ✓ | 100.0% | ✓ |
| VC045 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC046 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC047 | 100.0% | 50.0% | ✓ | 50.0% | ✓ |
| VC048 | 100.0% | 0.0% | ✗ | 33.3% | ✓ |
| VC050 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC051 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC052 | 100.0% | 0.0% | ✗ | 100.0% | ✓ |
| VC054 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC055 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC057 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC058 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC059 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC060 | 100.0% | 50.0% | ✓ | 100.0% | ✓ |
| VC062 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC063 | 100.0% | 0.0% | ✗ | 100.0% | ✓ |
| VC072 | 100.0% | 100.0% | ✓ | 0.0% | ✗ |
| VC073 | 100.0% | 100.0% | ✓ | 100.0% | ✓ |
| VC074 | 100.0% | 0.0% | ✗ | 100.0% | ✓ |
| VC075 | 100.0% | 100.0% | ✓ | 0.0% | ✗ |
| VC077 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC078 | 100.0% | 100.0% | ✓ | 0.0% | ✗ |
| VC079 | 100.0% | 100.0% | ✓ | 50.0% | ✓ |
| VC081 | 100.0% | 0.0% | ✗ | 100.0% | ✓ |
| VC082 | 100.0% | 50.0% | ✓ | 50.0% | ✓ |
| VC083 | 100.0% | 100.0% | ✓ | 0.0% | ✗ |
| VC086 | 100.0% | 0.0% | ✗ | 100.0% | ✓ |
| VC088 | 100.0% | 0.0% | ✗ | 50.0% | ✓ |
| VC090 | 100.0% | 0.0% | ✗ | 100.0% | ✓ |
| VC091 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC094 | 100.0% | 57.1% | ✓ | 71.4% | ✓ |
| VC132 | 100.0% | 0.0% | ✗ | 100.0% | ✓ |
| VC133 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC135 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC143 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC146 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC152 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC153 | 100.0% | 100.0% | ✓ | 100.0% | ✓ |
| VC156 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC166 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC168 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC178 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC184 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC185 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC186 | 100.0% | 50.0% | ✓ | 0.0% | ✗ |
| VC189 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC191 | 100.0% | 100.0% | ✓ | 100.0% | ✓ |
| VC192 | 100.0% | 100.0% | ✓ | 0.0% | ✗ |
| VC194 | 100.0% | 100.0% | ✓ | 100.0% | ✓ |
| VC197 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC198 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC200 | 100.0% | 0.0% | ✗ | 100.0% | ✓ |
| VC201 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC203 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC204 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC206 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC207 | 100.0% | 100.0% | ✓ | 100.0% | ✓ |
| VC208 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC209 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
| VC210 | 100.0% | 0.0% | ✗ | 0.0% | ✗ |
Methodology. All scanners run against the same 218-fixture labeled corpus. A VC rule counts as "covered" by another scanner if any of that scanner's rules fires within ±10 lines of our expected range in the correct file. We don't require rule-ID equivalence — the question is capability to detect the class of vulnerability, not taxonomy alignment.
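A minimal sketch of that matching rule is shown below. The field names are assumed, and two points are our reading of the table and are not stated in the methodology: the percentage is taken to be the share of expected entries the other scanner hit, and a rule is marked covered as soon as that share is above zero.

```javascript
// Sketch of the cross-scanner matching rule. Field names (file, line,
// startLine, endLine) are assumptions, not the runner's real schema.
const LINE_TOLERANCE = 10;

// An expected entry counts as detected if ANY finding from the other
// scanner lands in the same file within the widened line range.
// The other scanner's rule ID is deliberately not compared.
function entryDetected(expected, findings) {
  return findings.some(
    (f) =>
      f.file === expected.file &&
      f.line >= expected.startLine - LINE_TOLERANCE &&
      f.line <= expected.endLine + LINE_TOLERANCE
  );
}

// Summarize one VC rule: share of its expected entries detected, and a
// covered flag (assumed here to mean "at least one entry detected").
function coverageForRule(expectedEntries, findings) {
  const detected = expectedEntries.filter((e) => entryDetected(e, findings)).length;
  return {
    detectedShare: detected / expectedEntries.length,
    covered: detected > 0,
  };
}
```

The tolerance window makes the comparison lenient on purpose: scanners often report the same issue on a nearby line, such as the call site instead of the declaration.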
Semgrep. Version 1.86.0 with community rulesets p/security-audit, p/owasp-top-ten, p/javascript, p/typescript, p/react. Semgrep Pro's proprietary rules would likely score higher — we compare against the free tier because it's what's available to everyone.
Bearer. SAST scanner mode with Bearer's built-in security ruleset. Free OSS; no account required. Bearer's primary focus is PII data-flow analysis, and its security rules are a secondary feature, so this is not an apples-to-apples comparison with a dedicated SAST tool, but it's what's on the market and free.
FP counting. Semgrep and Bearer use their own rule taxonomies, so we can't attribute their FPs rule by rule against our mustNotFire list the way we can for our own scanner. Instead, any finding from those scanners on a clean fixture counts as an FP. We acknowledge that this stricter interpretation puts the third-party scanners at a precision disadvantage.
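To make the difference concrete, the sketch below contrasts the two counting modes on the same clean fixture. The `clean` flag, the `findings` array, and both function names are hypothetical and exist only for this illustration.

```javascript
// Illustration only: `clean`, `findings`, and these function names are
// hypothetical and do not come from the benchmark runner.

// Our scanner: only findings for rules listed in mustNotFire count.
function ownFalsePositives(fixture) {
  if (!fixture.clean) return 0;
  return fixture.findings.filter((f) => fixture.mustNotFire.includes(f.rule)).length;
}

// Third-party scanners: every finding on a clean fixture counts,
// because their rule IDs cannot be mapped onto mustNotFire.
function thirdPartyFalsePositives(fixture) {
  return fixture.clean ? fixture.findings.length : 0;
}

const cleanFixture = {
  clean: true,
  mustNotFire: ["VC001"],
  findings: [
    { rule: "VC001", file: "safe.js", line: 4 },
    { rule: "VC050", file: "safe.js", line: 9 },
  ],
};

console.log(ownFalsePositives(cleanFixture)); // prints 1
console.log(thirdPartyFalsePositives(cleanFixture)); // prints 2
```

With identical findings, the stricter mode reports two FPs where the per-rule mode reports one, which is the precision disadvantage described above.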
Spot a gap?
The corpus is open and the runner is in the repo. If you have a real-world vulnerability pattern that our scanner misses, open a PR with a fixture at packages/cli/test-fixtures/ or email admin@xploitscan.com. See the disclosure policy for anything you need to keep private.