frequency_analysis
Clusters code by AST shape — finds patterns in how code is written, not just what it says.
A Different Kind of Search
Most code search tools answer “where is this string?” or “where is this symbol?” frequency_analysis answers a fundamentally different question: “what shapes does our code take?”
It works by extracting the AST (abstract syntax tree) structure of every function, stripping away identifiers and literals, and clustering the resulting shapes by frequency. The output tells you which structural patterns recur across the codebase, how many functions follow each pattern, and where they live.
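The core idea can be sketched in a few lines of Python using the standard `ast` module. This is an illustrative reduction, not the tool's actual implementation: each function is reduced to a tree of node types only, so identifiers and literal values disappear and only structure is counted.

```python
import ast
from collections import Counter

def shape_of(source: str) -> str:
    """Return a function's structural shape: node types only,
    identifiers and literal values stripped."""
    def skeleton(node):
        # iter_child_nodes skips non-node fields such as variable
        # names and literal values, so only structure survives.
        return (type(node).__name__,
                tuple(skeleton(c) for c in ast.iter_child_nodes(node)))
    return repr(skeleton(ast.parse(source)))

# Three toy functions: the first two differ only in naming and
# literals, so they share a shape; the third is structurally distinct.
funcs = [
    "def a(x):\n    return x + 1",
    "def b(total):\n    return total + 100",
    "def c(x):\n    return x * x",
]
shapes = Counter(shape_of(f) for f in funcs)
```

Clustering is then just grouping by the shape string and sorting by count.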
What It Reveals
When we ran frequency_analysis against a 45-repo corpus during internal testing, the results exposed patterns invisible to text search:
- 40% of async functions followed the same try/catch/log structure. Same shape, different variable names, different error messages. This revealed a boilerplate pattern that should have been a shared utility.
- Error handling was inconsistent across modules. Some modules caught and rethrew, others caught and logged, others caught and returned null. The frequency distribution made this immediately visible.
- Convention adoption was measurable. After introducing a new pattern (e.g., structured error responses), frequency analysis showed what percentage of functions had adopted it versus still using the old pattern.
No Native Equivalent
There is no combination of grep, ripgrep, or file reading that can replicate this. Text search operates on characters. AST shape clustering operates on structure. A function that looks completely different textually — different variable names, different string literals, different comments — can have an identical AST shape. Only structural analysis detects this.
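To make this concrete, here is a hedged sketch of two functions that share no tokens a text search could match on: different function names, argument names, exception variables, and log messages. Reduced to node types and hashed (the helper below is illustrative, not the tool's hashing scheme), they fingerprint identically.

```python
import ast
import hashlib

def shape_hash(source: str) -> str:
    """Hash a function's AST skeleton (node types only) into a
    short structural fingerprint."""
    def skeleton(node):
        return [type(node).__name__,
                [skeleton(c) for c in ast.iter_child_nodes(node)]]
    return hashlib.sha256(repr(skeleton(ast.parse(source))).encode()).hexdigest()[:12]

fetch_user = '''
async def fetch_user(user_id):
    try:
        return await db.get(user_id)
    except Exception as exc:
        logger.error("user fetch failed: %s", exc)
        return None
'''

fetch_order = '''
async def fetch_order(order_ref):
    try:
        return await store.get(order_ref)
    except Exception as err:
        logger.error("order lookup failed: %s", err)
        return None
'''
```

`shape_hash(fetch_user)` and `shape_hash(fetch_order)` are equal: grep would never connect these two functions, but structurally they are the same boilerplate.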
What the Output Contains
Each cluster in the output includes:
- AST pattern description — a human-readable summary of the structural shape
- Frequency count — how many functions match this shape
- Representative examples — file paths and function names for the top matches in each cluster
- Shape hash — a structural fingerprint for programmatic tracking
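A cluster record might look like the following. The field names and values here are illustrative, not the tool's exact output schema:

```python
# Hypothetical cluster record mirroring the four fields above.
cluster = {
    "pattern": "async function: try / await call / except: log and return None",
    "frequency": 37,
    "examples": [
        {"file": "services/user.py", "function": "fetch_user"},
        {"file": "services/order.py", "function": "fetch_order"},
    ],
    "shape_hash": "a3f9c2e1b0d4",
}
```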
When to Use It
frequency_analysis is a codebase insight tool. It is most valuable in these situations:
- Boilerplate detection. Clusters with high frequency (10+ functions sharing the same shape) are candidates for extraction into shared utilities or higher-order functions.
- Consistency measurement. After establishing a coding convention, run frequency analysis to see what percentage of the codebase follows it. Track this over time.
- Onboarding. Understanding the dominant patterns in a codebase tells you how to write new code that fits. If 80% of service methods follow pattern X, your new service method should too.
- Quality mining. When combined with search_patterns, frequency analysis reveals not just known anti-patterns but emergent anti-patterns: structural shapes that correlate with bugs but have not yet been codified as rules.
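The consistency-measurement use case reduces to a frequency ratio once shapes are hashed: record the shape hash of the new convention, then count how often it appears among the functions surveyed. The hashes below are made up for illustration.

```python
from collections import Counter

# Hypothetical survey: the shape hash of each service method in the
# codebase. "9f2a" stands in for the hash of the new convention.
observed_hashes = ["9f2a", "9f2a", "77b1", "9f2a", "c0de", "9f2a"]

counts = Counter(observed_hashes)
adoption = counts["9f2a"] / len(observed_hashes)
print(f"{adoption:.0%} of service methods follow the new pattern")
```

Re-running this periodically gives a trend line for convention adoption over time.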
This tool complements find_clones, which finds pairwise duplication. frequency_analysis finds systemic patterns — shapes that repeat across dozens of functions, not just pairs. It operates at the architecture level rather than the code level.
Benchmark note
This benchmark compares CodeSift against the closest practical native workflow an agent would use for the same task. For some tools, that baseline is a direct shell equivalent such as rg or find. For AST-aware, graph-aware, and LSP-backed tools, the baseline is a multi-step workflow rather than a strictly identical command. Results should be read as agent-workflow comparisons: token cost, call count, and practical context efficiency.