Detection Benchmark

We publish the raw numbers on how well our scanner performs against a labeled corpus of vulnerable and clean code samples. These results are regenerated on every commit so you are always seeing current data.

Last run: Jun 2, 2026 · Corpus size: 218 fixtures (130 vulnerable, 88 clean)

Precision: 100.0% (true positives / (true positives + false positives))
Recall: 98.7% (true positives / expected findings)
F1 score: 99.3% (harmonic mean of precision and recall)
Counts: 148 TP / 0 FP / 2 FN
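
The headline figures can be reproduced from the raw counts. Below is a minimal Python sketch. The `scores` helper is illustrative and is not code from scripts/benchmark.js. Returning 0.0 when a denominator is zero is our own convention for the undefined case.

```python
def scores(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) as fractions computed from raw counts."""
    # Precision: share of counted findings that matched an expected entry.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of expected entries that some finding matched.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


p, r, f1 = scores(tp=148, fp=0, fn=2)
print(f"precision={p:.1%} recall={r:.1%} f1={f1:.1%}")
# precision=100.0% recall=98.7% f1=99.3%
```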

Trend over time

5 runs · since May 29, 2026

Precision and recall on each benchmark run. The corpus grows over time, so a flat or rising line means detection kept pace as new fixtures were added.

Chart: precision and recall per benchmark run (latest run: precision 100.0%, recall 98.7%).

Corpus: 218 fixtures. Deterministic scanner only (excludes the optional AI false-positive filter).

How we score

Every fixture in packages/cli/test-fixtures/ is labeled with expectedFindings (rules that must fire, with file and line range) and mustNotFire (rules that must not fire anywhere in the fixture). The runner scans each fixture and counts:

  • True positive (TP) — a finding whose rule, file, and line fall inside an expected entry.
  • False negative (FN) — an expected entry with no matching finding.
  • False positive (FP) — a finding for a rule explicitly listed as mustNotFire on a clean fixture.
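
The per-fixture counting rules above can be sketched in Python. The `Finding` and `Expected` types, their field names, and the file names are hypothetical stand-ins for whatever scripts/benchmark.js actually uses. The sketch also credits each expected entry at most once. That choice is our assumption, but it matches the fact that TP + FN equals the same 150 expected entries for every scanner on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule: str  # rule ID reported by the scanner, e.g. "VC001"
    file: str
    line: int


@dataclass(frozen=True)
class Expected:
    rule: str
    file: str
    start: int  # first line of the labeled range (inclusive)
    end: int  # last line of the labeled range (inclusive)


def matches(f: Finding, e: Expected) -> bool:
    # Rule, file, and line must all fall inside the expected entry.
    return f.rule == e.rule and f.file == e.file and e.start <= f.line <= e.end


def score_fixture(findings, expected, must_not_fire, clean):
    """Return (tp, fp, fn) for a single fixture."""
    # Assumption: each expected entry is credited at most once.
    tp = sum(1 for e in expected if any(matches(f, e) for f in findings))
    fn = len(expected) - tp
    # FPs: findings for mustNotFire rules, counted on clean fixtures only.
    # Any other unmatched finding is an adjacent observation and is ignored.
    fp = sum(1 for f in findings if clean and f.rule in must_not_fire)
    return tp, fp, fn


# Hypothetical vulnerable fixture: one expected VC001 entry at lines 10-14.
vuln = [Expected("VC001", "app.js", 10, 14)]
found = [Finding("VC001", "app.js", 12), Finding("VC044", "app.js", 30)]
print(score_fixture(found, vuln, must_not_fire=set(), clean=False))  # (1, 0, 0)

# Hypothetical clean fixture where VC001 must not fire.
print(score_fixture([Finding("VC001", "safe.js", 3)], [], {"VC001"}, clean=True))  # (0, 1, 0)
```

In the first example, the VC044 finding is an adjacent observation. It changes no count.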

Findings on vulnerable fixtures for unrelated rules aren't counted as FPs — they're treated as adjacent observations, neither penalized nor rewarded. Precision and recall are micro-averaged across the whole corpus. The runner lives at scripts/benchmark.js in the repository, and a GitHub Action runs it on every pull request and every push to main.
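
Micro-averaging pools TP, FP, and FN across all rules before dividing. As a result, a rule with many expected entries weighs more than a rule with one. The illustrative Python sketch below contrasts micro-averaging with macro-averaging, which takes the unweighted mean of per-rule scores. It uses three rows copied from the per-rule table and is not the runner's code.

```python
# Per-rule (TP, FP, FN) for three rules copied from the per-rule table.
counts = {"VC001": (5, 0, 0), "VC005": (2, 0, 1), "VC044": (6, 0, 1)}


def micro_recall(per_rule):
    # Micro-average: pool the counts first, then compute a single ratio.
    tp = sum(c[0] for c in per_rule.values())
    fn = sum(c[2] for c in per_rule.values())
    return tp / (tp + fn)


def macro_recall(per_rule):
    # Macro-average: compute recall per rule, then take the unweighted mean.
    recalls = [tp / (tp + fn) for tp, _, fn in per_rule.values()]
    return sum(recalls) / len(recalls)


print(f"micro={micro_recall(counts):.1%} macro={macro_recall(counts):.1%}")
# micro=86.7% macro=84.1%
```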

About this baseline. The corpus is deliberately small and curated, and growing it is ongoing work. A near-perfect score on a small corpus does not mean the scanner catches everything. It is the floor below which we will not regress. We will keep adding harder cases, so over time the score will become a more demanding indicator.
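
This page doesn't say how the floor is checked. One hypothetical approach is to compare outcomes per expected entry rather than comparing headline scores. With that approach, newly added hard fixtures can lower the score without being flagged as a regression. The sketch below is written under that assumption, and the entry-ID format is invented for illustration.

```python
def regressions(baseline: dict, current: dict) -> list:
    """Return expected-entry IDs detected on the baseline run but missed now.

    Both dicts map a (hypothetical) entry ID to True if detected, else False.
    Entries that appear only in `current` are new fixtures, so missing them
    lowers the score but is not a regression. An entry absent from `current`
    is treated as a regression (assumption: fixtures are never retired).
    """
    return sorted(k for k, hit in baseline.items() if hit and not current.get(k, False))


# Hypothetical entry IDs of the form "fixture:rule:line".
baseline = {"fixture-a:VC001:12": True, "fixture-b:VC005:40": False}
current = {"fixture-a:VC001:12": False, "fixture-b:VC005:40": True, "fixture-c:VC210:7": False}
print(regressions(baseline, current))  # ['fixture-a:VC001:12']
```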

Per-rule scores

Rule     TP  FP  FN  Precision  Recall  F1
VC001     5   0   0  100.0%     100.0%  100.0%
VC003     2   0   0  100.0%     100.0%  100.0%
VC005     2   0   1  100.0%     66.7%   80.0%
VC006     7   0   0  100.0%     100.0%  100.0%
VC007     4   0   0  100.0%     100.0%  100.0%
VC015     1   0   0  100.0%     100.0%  100.0%
VC016     1   0   0  100.0%     100.0%  100.0%
VC023     4   0   0  100.0%     100.0%  100.0%
VC025     3   0   0  100.0%     100.0%  100.0%
VC030     1   0   0  100.0%     100.0%  100.0%
VC031     3   0   0  100.0%     100.0%  100.0%
VC033     1   0   0  100.0%     100.0%  100.0%
VC034     2   0   0  100.0%     100.0%  100.0%
VC035     2   0   0  100.0%     100.0%  100.0%
VC037     4   0   0  100.0%     100.0%  100.0%
VC038     2   0   0  100.0%     100.0%  100.0%
VC041     5   0   0  100.0%     100.0%  100.0%
VC042    10   0   0  100.0%     100.0%  100.0%
VC043     1   0   0  100.0%     100.0%  100.0%
VC044     6   0   1  100.0%     85.7%   92.3%
VC045     1   0   0  100.0%     100.0%  100.0%
VC046     2   0   0  100.0%     100.0%  100.0%
VC047     2   0   0  100.0%     100.0%  100.0%
VC048     3   0   0  100.0%     100.0%  100.0%
VC050     1   0   0  100.0%     100.0%  100.0%
VC051     2   0   0  100.0%     100.0%  100.0%
VC052     1   0   0  100.0%     100.0%  100.0%
VC054     2   0   0  100.0%     100.0%  100.0%
VC055     1   0   0  100.0%     100.0%  100.0%
VC057     1   0   0  100.0%     100.0%  100.0%
VC058     1   0   0  100.0%     100.0%  100.0%
VC059     1   0   0  100.0%     100.0%  100.0%
VC060     2   0   0  100.0%     100.0%  100.0%
VC062     1   0   0  100.0%     100.0%  100.0%
VC063     1   0   0  100.0%     100.0%  100.0%
VC072     1   0   0  100.0%     100.0%  100.0%
VC073     1   0   0  100.0%     100.0%  100.0%
VC074     1   0   0  100.0%     100.0%  100.0%
VC075     1   0   0  100.0%     100.0%  100.0%
VC077     1   0   0  100.0%     100.0%  100.0%
VC078     1   0   0  100.0%     100.0%  100.0%
VC079     2   0   0  100.0%     100.0%  100.0%
VC081     1   0   0  100.0%     100.0%  100.0%
VC082     4   0   0  100.0%     100.0%  100.0%
VC083     1   0   0  100.0%     100.0%  100.0%
VC086     2   0   0  100.0%     100.0%  100.0%
VC088     2   0   0  100.0%     100.0%  100.0%
VC090     1   0   0  100.0%     100.0%  100.0%
VC091     1   0   0  100.0%     100.0%  100.0%
VC094     7   0   0  100.0%     100.0%  100.0%
VC132     1   0   0  100.0%     100.0%  100.0%
VC133     1   0   0  100.0%     100.0%  100.0%
VC135     1   0   0  100.0%     100.0%  100.0%
VC143     1   0   0  100.0%     100.0%  100.0%
VC146     1   0   0  100.0%     100.0%  100.0%
VC152     1   0   0  100.0%     100.0%  100.0%
VC153     1   0   0  100.0%     100.0%  100.0%
VC156     1   0   0  100.0%     100.0%  100.0%
VC166     1   0   0  100.0%     100.0%  100.0%
VC168     1   0   0  100.0%     100.0%  100.0%
VC178     1   0   0  100.0%     100.0%  100.0%
VC184     1   0   0  100.0%     100.0%  100.0%
VC185     1   0   0  100.0%     100.0%  100.0%
VC186     2   0   0  100.0%     100.0%  100.0%
VC189     1   0   0  100.0%     100.0%  100.0%
VC191     1   0   0  100.0%     100.0%  100.0%
VC192     1   0   0  100.0%     100.0%  100.0%
VC194     1   0   0  100.0%     100.0%  100.0%
VC197     2   0   0  100.0%     100.0%  100.0%
VC198     1   0   0  100.0%     100.0%  100.0%
VC200     1   0   0  100.0%     100.0%  100.0%
VC201     2   0   0  100.0%     100.0%  100.0%
VC203     2   0   0  100.0%     100.0%  100.0%
VC204     1   0   0  100.0%     100.0%  100.0%
VC206     1   0   0  100.0%     100.0%  100.0%
VC207     1   0   0  100.0%     100.0%  100.0%
VC208     1   0   0  100.0%     100.0%  100.0%
VC209     1   0   0  100.0%     100.0%  100.0%
VC210     1   0   0  100.0%     100.0%  100.0%

Only rules with at least one ground-truth entry in the corpus appear here. The other 132 rules don't have fixtures yet and are excluded from the score.
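
A per-rule table like this one can be derived by grouping outcomes by rule ID. In the Python sketch below, the outcome format and the rule ID VC999 are hypothetical. Any rule with no ground-truth entries is dropped because its recall would be 0/0, which mirrors the exclusion described above.

```python
from collections import defaultdict


def per_rule_scores(outcomes):
    """Group (rule_id, kind) pairs, kind in {"TP", "FP", "FN"}, into per-rule scores."""
    counts = defaultdict(lambda: {"TP": 0, "FP": 0, "FN": 0})
    for rule, kind in outcomes:
        counts[rule][kind] += 1
    table = {}
    for rule in sorted(counts):
        tp, fp, fn = counts[rule]["TP"], counts[rule]["FP"], counts[rule]["FN"]
        if tp + fn == 0:
            continue  # no ground truth for this rule: excluded from the table
        precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 if nothing fired
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        table[rule] = (tp, fp, fn, round(precision, 3), round(recall, 3), round(f1, 3))
    return table


# VC999 is a hypothetical rule that has no fixtures but fired on a clean one.
outcomes = [("VC005", "TP"), ("VC005", "TP"), ("VC005", "FN"), ("VC999", "FP")]
print(per_rule_scores(outcomes))
# {'VC005': (2, 0, 1, 1.0, 0.667, 0.8)}
```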

Head-to-head vs open-source scanners

XploitScan F1: 99.3% (210 rules, 218 fixtures)
Semgrep F1: 33.7% (community rules; TP 31 / FP 3 / FN 119)
Bearer F1: 53.7% (open-source SAST; TP 65 / FP 27 / FN 85)
Per-rule recall by scanner (share of each rule's expected findings detected):

Rule     XploitScan  Semgrep  Bearer
VC001    100.0%      0.0%     80.0%
VC003    100.0%      0.0%     0.0%
VC005    66.7%       0.0%     66.7%
VC006    100.0%      0.0%     28.6%
VC007    100.0%      50.0%    75.0%
VC015    100.0%      100.0%   100.0%
VC016    100.0%      100.0%   100.0%
VC023    100.0%      0.0%     0.0%
VC025    100.0%      0.0%     100.0%
VC030    100.0%      0.0%     0.0%
VC031    100.0%      100.0%   100.0%
VC033    100.0%      0.0%     0.0%
VC034    100.0%      0.0%     100.0%
VC035    100.0%      50.0%    100.0%
VC037    100.0%      0.0%     50.0%
VC038    100.0%      0.0%     0.0%
VC041    100.0%      20.0%    100.0%
VC042    100.0%      0.0%     10.0%
VC043    100.0%      0.0%     0.0%
VC044    85.7%       14.3%    100.0%
VC045    100.0%      0.0%     0.0%
VC046    100.0%      0.0%     0.0%
VC047    100.0%      50.0%    50.0%
VC048    100.0%      0.0%     33.3%
VC050    100.0%      0.0%     0.0%
VC051    100.0%      0.0%     0.0%
VC052    100.0%      0.0%     100.0%
VC054    100.0%      0.0%     0.0%
VC055    100.0%      0.0%     0.0%
VC057    100.0%      0.0%     0.0%
VC058    100.0%      0.0%     0.0%
VC059    100.0%      0.0%     0.0%
VC060    100.0%      50.0%    100.0%
VC062    100.0%      0.0%     0.0%
VC063    100.0%      0.0%     100.0%
VC072    100.0%      100.0%   0.0%
VC073    100.0%      100.0%   100.0%
VC074    100.0%      0.0%     100.0%
VC075    100.0%      100.0%   0.0%
VC077    100.0%      0.0%     0.0%
VC078    100.0%      100.0%   0.0%
VC079    100.0%      100.0%   50.0%
VC081    100.0%      0.0%     100.0%
VC082    100.0%      50.0%    50.0%
VC083    100.0%      100.0%   0.0%
VC086    100.0%      0.0%     100.0%
VC088    100.0%      0.0%     50.0%
VC090    100.0%      0.0%     100.0%
VC091    100.0%      0.0%     0.0%
VC094    100.0%      57.1%    71.4%
VC132    100.0%      0.0%     100.0%
VC133    100.0%      0.0%     0.0%
VC135    100.0%      0.0%     0.0%
VC143    100.0%      0.0%     0.0%
VC146    100.0%      0.0%     0.0%
VC152    100.0%      0.0%     0.0%
VC153    100.0%      100.0%   100.0%
VC156    100.0%      0.0%     0.0%
VC166    100.0%      0.0%     0.0%
VC168    100.0%      0.0%     0.0%
VC178    100.0%      0.0%     0.0%
VC184    100.0%      0.0%     0.0%
VC185    100.0%      0.0%     0.0%
VC186    100.0%      50.0%    0.0%
VC189    100.0%      0.0%     0.0%
VC191    100.0%      100.0%   100.0%
VC192    100.0%      100.0%   0.0%
VC194    100.0%      100.0%   100.0%
VC197    100.0%      0.0%     0.0%
VC198    100.0%      0.0%     0.0%
VC200    100.0%      0.0%     100.0%
VC201    100.0%      0.0%     0.0%
VC203    100.0%      0.0%     0.0%
VC204    100.0%      0.0%     0.0%
VC206    100.0%      0.0%     0.0%
VC207    100.0%      100.0%   100.0%
VC208    100.0%      0.0%     0.0%
VC209    100.0%      0.0%     0.0%
VC210    100.0%      0.0%     0.0%
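
Because F1 is the harmonic mean of precision and recall, it can be written directly in terms of the counts. This makes the head-to-head F1 figures easy to check against each scanner's TP/FP/FN totals:

```latex
\begin{aligned}
F_1 &= \frac{2PR}{P+R},\qquad P=\frac{TP}{TP+FP},\qquad R=\frac{TP}{TP+FN}\\
    &= \frac{2\,TP}{2\,TP+FP+FN}\\[4pt]
\text{XploitScan:}\quad F_1 &= \frac{2\cdot 148}{2\cdot 148+0+2}=\frac{296}{298}\approx 0.993\\
\text{Semgrep:}\quad F_1 &= \frac{2\cdot 31}{2\cdot 31+3+119}=\frac{62}{184}\approx 0.337\\
\text{Bearer:}\quad F_1 &= \frac{2\cdot 65}{2\cdot 65+27+85}=\frac{130}{242}\approx 0.537
\end{aligned}
```

The count form shows that F1 weighs a false positive and a false negative equally. For the third-party scanners, the large FN counts dominate the denominator.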

Methodology. All scanners run against the same 218-fixture labeled corpus. An expected finding for a VC rule counts as "covered" by another scanner if any of that scanner's rules fires within ±10 lines of our expected range in the correct file. We don't require rule-ID equivalence: the question is whether the scanner can detect the class of vulnerability, not whether its taxonomy aligns with ours.
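
Here is a minimal Python sketch of the ±10-line coverage test. It assumes the tolerance extends the expected range by 10 lines on each side, inclusive. The class names, file paths, and third-party rule IDs are hypothetical. The other scanner's rule ID is carried but never compared, which matches the no-rule-ID-equivalence policy.

```python
from dataclasses import dataclass

WINDOW = 10  # line tolerance from the methodology paragraph


@dataclass(frozen=True)
class ExpectedEntry:
    rule: str  # our VC rule ID (used only to report per-rule recall)
    file: str
    start: int  # first labeled line (inclusive)
    end: int  # last labeled line (inclusive)


@dataclass(frozen=True)
class ThirdPartyFinding:
    file: str
    line: int
    rule: str  # the other scanner's rule ID; deliberately never compared


def covered(entry: ExpectedEntry, findings: list, window: int = WINDOW) -> bool:
    """True if any finding is in the entry's file within ±window lines of its range."""
    lo, hi = entry.start - window, entry.end + window
    return any(f.file == entry.file and lo <= f.line <= hi for f in findings)


entry = ExpectedEntry("VC001", "routes/login.js", 40, 42)
print(covered(entry, [ThirdPartyFinding("routes/login.js", 52, "hypothetical.rule")]))  # True
print(covered(entry, [ThirdPartyFinding("routes/login.js", 53, "hypothetical.rule")]))  # False
```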

Semgrep. Version 1.86.0 with community rulesets p/security-audit, p/owasp-top-ten, p/javascript, p/typescript, p/react. Semgrep Pro's proprietary rules would likely score higher — we compare against the free tier because it's what's available to everyone.

Bearer. , SAST scanner mode with Bearer's built-in security ruleset. Free and open source; no account required. Bearer's primary focus is PII data-flow analysis, and its security rules are a secondary feature. This is therefore not an apples-to-apples comparison with a dedicated SAST tool. We include it because it is free and readily available.

FP counting. Semgrep and Bearer use their own rule taxonomies, so we can't attribute their findings to individual rules in our mustNotFire lists the way we can for our own scanner. Instead, any finding from those scanners on a clean fixture counts as an FP. This stricter interpretation puts the third-party scanners at a precision disadvantage, which we acknowledge.
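
The asymmetry is easiest to see side by side. In this hypothetical Python sketch, a clean fixture lists VC001 in mustNotFire, and each scanner reports two findings on it. Our scanner is charged only for the mustNotFire rule, while every third-party finding counts. The function names and third-party rule IDs are illustrative only.

```python
def fp_own(findings: list, must_not_fire: set, clean: bool) -> int:
    # Our scanner: only findings for rules in the fixture's mustNotFire list
    # count, and only on clean fixtures.
    return sum(1 for rule in findings if clean and rule in must_not_fire)


def fp_third_party(findings: list, clean: bool) -> int:
    # Third-party scanners: their rule IDs can't be mapped onto mustNotFire,
    # so every finding on a clean fixture counts as an FP.
    return len(findings) if clean else 0


print(fp_own(["VC001", "VC044"], {"VC001"}, clean=True))  # 1
print(fp_third_party(["rule-a", "rule-b"], clean=True))  # 2
```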

Spot a gap?

The corpus is open and the runner is in the repo. If you have a real-world vulnerability pattern that our scanner misses, open a PR with a fixture at packages/cli/test-fixtures/ or email admin@xploitscan.com. See the disclosure policy for anything you need to keep private.