scan_secrets
TruffleHog-derived rules with tree-sitter AST integration. Masked output, severity filtering, inline allowlist. The most dramatic benchmark result.
The Most Extreme Benchmark Result
The 99.3% token reduction (roughly 1.6 million tokens down to 11,500) is not a typo. Here is why it is so dramatic.
The native approach to scanning for secrets is grep with patterns for common key formats: grep -rn "sk-" ., grep -rn "AKIA" ., grep -rn "ghp_" ., and so on. On a real codebase, these patterns match across node_modules, build artifacts, lock files, vendored dependencies, and test fixtures. A single grep -rn "sk-" on a medium-sized Node.js project returns matches from thousands of files in node_modules alone — SDK source code, documentation, example files, and test vectors that all contain strings matching the pattern.
The raw output from five common secret patterns totals approximately 1,637,000 tokens. That is 1.6 million tokens of noise. It overwhelms any context window and produces thousands of false positives that no agent or human can meaningfully triage.
scan_secrets produces 11,500 tokens of actionable findings. It achieves this through three mechanisms.
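Taking the two figures above at face value, the reduction works out to roughly 99.3%:

```python
# Token counts as stated in this document.
raw_tokens = 1_637_000    # approximate raw output of five grep patterns
filtered_tokens = 11_500  # scan_secrets output

reduction = 1 - filtered_tokens / raw_tokens
print(f"{reduction:.1%}")  # → 99.3%
```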
How It Filters the Noise
Scope exclusion. node_modules, build directories, lock files, and vendored code are excluded by default. These are the primary sources of false matches in grep-based scanning.
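Scope filtering can be sketched in a few lines. The excluded directory names and suffixes below are assumptions for illustration; the actual default list ships with the tool:

```python
# Hypothetical defaults; the tool's real exclusion list is more extensive.
EXCLUDED_DIRS = {"node_modules", "dist", "build", "vendor"}
EXCLUDED_SUFFIXES = (".lock", "-lock.json")

def in_scope(path: str) -> bool:
    """Return True if a path should be scanned at all."""
    directories = path.split("/")[:-1]
    if any(d in EXCLUDED_DIRS for d in directories):
        return False
    return not path.endswith(EXCLUDED_SUFFIXES)

print(in_scope("src/api/client.ts"))               # → True
print(in_scope("node_modules/aws-sdk/index.js"))   # → False
print(in_scope("package-lock.json"))               # → False
```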
AST-aware context. Instead of matching raw text patterns, scan_secrets uses tree-sitter to understand where a match occurs in the code structure. A string literal assigned to a variable named apiKey in production source code is flagged. The same string appearing in a test mock, a documentation comment, or an SDK type definition is not.
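The real implementation parses many languages with tree-sitter; as a rough single-language analogue, Python's standard-library `ast` module shows the idea. The variable-name list here is invented for illustration:

```python
import ast

# Hypothetical secret-like identifiers; the real tool's heuristics differ.
SUSPECT_NAMES = {"api_key", "apikey", "secret", "token", "password"}

def flag_assignments(source: str):
    """Yield (name, value, line) for string literals assigned to secret-like names."""
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Assign)
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, str)):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id.lower() in SUSPECT_NAMES:
                    yield target.id, node.value.value, node.lineno

code = 'api_key = "sk-live-abc123"\ngreeting = "sk- is a common prefix"\n'
print(list(flag_assignments(code)))  # → [('api_key', 'sk-live-abc123', 1)]
```

A raw grep for `sk-` would flag both lines; the structural check keeps only the assignment.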
~1,100 TruffleHog-derived rules. The rule set covers AWS keys, GitHub tokens, Stripe keys, JWTs, private keys, database connection strings, OAuth secrets, and hundreds of vendor-specific formats. Each rule includes entropy checks and format validation, not just prefix matching.
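A single rule's shape can be sketched as a prefix, length, charset, and entropy check. This is an illustrative approximation, not an actual TruffleHog-derived rule; `AKIAIOSFODNN7EXAMPLE` is the placeholder access key ID from AWS's own documentation:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character."""
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

def looks_like_aws_access_key(candidate: str) -> bool:
    """Sketch of a rule: prefix + length + charset + entropy, not just prefix matching."""
    return (candidate.startswith("AKIA")
            and len(candidate) == 20
            and candidate.isalnum() and candidate.isupper()
            and shannon_entropy(candidate) > 3.0)

print(looks_like_aws_access_key("AKIAIOSFODNN7EXAMPLE"))  # → True
print(looks_like_aws_access_key("AKIAAAAAAAAAAAAAAAAA"))  # → False (low entropy)
```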
Masked Output
scan_secrets never returns raw secret values. Every finding is masked: you see the first and last few characters with the middle replaced. This means the scan results themselves are safe to include in logs, reports, and AI conversation context. You can share findings with your team without exposing the actual secrets.
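A minimal sketch of the masking scheme; the exact number of characters kept at each end is an assumption:

```python
def mask(secret: str, keep: int = 4) -> str:
    """Show only the first and last few characters of a finding."""
    if len(secret) <= keep * 2:
        return "*" * len(secret)  # too short to partially reveal
    return secret[:keep] + "*" * (len(secret) - keep * 2) + secret[-keep:]

print(mask("AKIAIOSFODNN7EXAMPLE"))  # → AKIA************MPLE
```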
Severity Filtering
Not all findings are equally urgent. The min_confidence and severity parameters let you control the signal-to-noise ratio:
scan_secrets(repo="local/my-project", min_confidence="high")
scan_secrets(repo="local/my-project", severity="critical")
High confidence findings have strong format validation (e.g., AWS keys with the correct prefix, length, and character set). Lower confidence findings may be generic high-entropy strings that could be secrets or could be hashes, UUIDs, or encoded data.
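A toy two-tier classifier makes the distinction concrete. Both regexes here are illustrative, not the tool's actual rules:

```python
import re

def confidence(candidate: str) -> str:
    """Hypothetical confidence tiers: exact vendor format vs. generic high-entropy string."""
    if re.fullmatch(r"AKIA[0-9A-Z]{16}", candidate):       # strict AWS key format
        return "high"
    if re.fullmatch(r"[A-Za-z0-9+/=_-]{32,}", candidate):  # could be a hash, UUID, etc.
        return "low"
    return "none"

print(confidence("AKIAIOSFODNN7EXAMPLE"))                 # → high
print(confidence("9b74c9897bac770ffc029102a200c5de"))     # → low (an MD5-style hex digest)
print(confidence("hello"))                                # → none
```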
What the Output Contains
Each finding includes:
- Rule name — which of the ~1,100 rules matched (e.g., aws-access-key, github-pat, stripe-secret)
- File path and line number — exact location
- Masked value — the secret with middle characters redacted
- Severity and confidence — for prioritization
- AST context — what kind of code construct contains the secret (variable assignment, function argument, config object)
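Put together, a single finding might look like this hypothetical record. The field names mirror the list above; all values are invented:

```python
# Hypothetical finding; the tool's actual output format may differ.
finding = {
    "rule": "aws-access-key",
    "path": "src/config/aws.ts",
    "line": 42,
    "masked_value": "AKIA************MPLE",  # raw value is never returned
    "severity": "critical",
    "confidence": "high",
    "ast_context": "variable assignment",
}
print(finding["rule"], finding["masked_value"])
```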
When to Use It
- Before pushing code. scan_secrets(repo, file_pattern="src/**") catches secrets before they reach the remote repository.
- During security audits. Part of any /security-auditor/review workflow.
- After onboarding a new codebase. Run it once to establish a baseline. Legacy codebases frequently contain hardcoded secrets from before secret management was adopted.
- In CI pipelines. The allowlist feature (codesift-allow: reason) lets you suppress known false positives without disabling the scanner.
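A minimal sketch of how the inline allowlist marker could be matched. The `codesift-allow: reason` syntax is the tool's; the matching logic here is an assumption:

```python
import re

ALLOW_RE = re.compile(r"codesift-allow:\s*(\S.*)")

def allowlisted(source_line: str):
    """Return the suppression reason if the line carries an inline allowlist comment."""
    m = ALLOW_RE.search(source_line)
    return m.group(1).strip() if m else None

print(allowlisted('FIXTURE_KEY = "AKIA..."  # codesift-allow: test fixture, not real'))
# → test fixture, not real
print(allowlisted('api_key = os.environ["AWS_KEY"]'))  # → None
```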
The combination of scope exclusion, AST awareness, and masked output makes this safe to run in any context. It cannot leak secrets and it will not drown you in node_modules noise.
Benchmark note
This benchmark compares CodeSift against the closest practical native workflow an agent would use for the same task.
For some tools, that baseline is a direct shell equivalent such as rg or find.
For AST-aware, graph-aware, and LSP-backed tools, the baseline is a multi-step workflow rather than a strictly identical command.
Results should be read as agent-workflow comparisons: token cost, call count, and practical context efficiency.