frequency_analysis
Clusters code by AST shape — finds patterns in how code is written, not just what it says.
A Different Kind of Search
Most code search tools answer “where is this string?” or “where is this symbol?” frequency_analysis answers a fundamentally different question: “what shapes does our code take?”
It works by extracting the AST (abstract syntax tree) structure of every function, stripping away identifiers and literals, and clustering the resulting shapes by frequency. The output tells you which structural patterns recur across the codebase, how many functions follow each pattern, and where they live.
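The core idea can be sketched in a few lines of Python using the standard `ast` module. This is an illustrative reduction, not the tool's actual implementation: each function is reduced to a tree of node types only, so identifiers and literal values disappear and only structure is counted.

```python
import ast
from collections import Counter

def shape_of(source: str) -> str:
    """Return a function's structural shape: node types only,
    identifiers and literal values stripped."""
    def skeleton(node):
        # iter_child_nodes skips non-node fields such as variable
        # names and literal values, so only structure survives.
        return (type(node).__name__,
                tuple(skeleton(c) for c in ast.iter_child_nodes(node)))
    return repr(skeleton(ast.parse(source)))

# Three toy functions: the first two differ only in naming and
# literals, so they share a shape; the third is structurally distinct.
funcs = [
    "def a(x):\n    return x + 1",
    "def b(total):\n    return total + 100",
    "def c(x):\n    return x * x",
]
shapes = Counter(shape_of(f) for f in funcs)
```

Clustering is then just grouping by the shape string and sorting by count.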
What It Reveals
When we ran frequency_analysis against a 45-repo corpus during internal testing, the results exposed patterns invisible to text search:
- 40% of async functions followed the same try/catch/log structure. Same shape, different variable names, different error messages. This revealed a boilerplate pattern that should have been a shared utility.
- Error handling was inconsistent across modules. Some modules caught and rethrew, others caught and logged, others caught and returned null. The frequency distribution made this immediately visible.
- Convention adoption was measurable. After introducing a new pattern (e.g., structured error responses), frequency analysis showed what percentage of functions had adopted it versus still using the old pattern.
No Native Equivalent
There is no combination of grep, ripgrep, or file reading that can replicate this. Text search operates on characters. AST shape clustering operates on structure. A function that looks completely different textually — different variable names, different string literals, different comments — can have an identical AST shape. Only structural analysis detects this.
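To make this concrete, here is a hedged sketch of two functions that share no tokens a text search could match on: different function names, argument names, exception variables, and log messages. Reduced to node types and hashed (the helper below is illustrative, not the tool's hashing scheme), they fingerprint identically.

```python
import ast
import hashlib

def shape_hash(source: str) -> str:
    """Hash a function's AST skeleton (node types only) into a
    short structural fingerprint."""
    def skeleton(node):
        return [type(node).__name__,
                [skeleton(c) for c in ast.iter_child_nodes(node)]]
    return hashlib.sha256(repr(skeleton(ast.parse(source))).encode()).hexdigest()[:12]

fetch_user = '''
async def fetch_user(user_id):
    try:
        return await db.get(user_id)
    except Exception as exc:
        logger.error("user fetch failed: %s", exc)
        return None
'''

fetch_order = '''
async def fetch_order(order_ref):
    try:
        return await store.get(order_ref)
    except Exception as err:
        logger.error("order lookup failed: %s", err)
        return None
'''
```

`shape_hash(fetch_user)` and `shape_hash(fetch_order)` are equal: grep would never connect these two functions, but structurally they are the same boilerplate.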
What the Output Contains
Each cluster in the output includes:
- AST pattern description — a human-readable summary of the structural shape
- Frequency count — how many functions match this shape
- Representative examples — file paths and function names for the top matches in each cluster
- Shape hash — a structural fingerprint for programmatic tracking
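A cluster record might look like the following. The field names and values here are illustrative, not the tool's exact output schema:

```python
# Hypothetical cluster record mirroring the four fields above.
cluster = {
    "pattern": "async function: try / await call / except: log and return None",
    "frequency": 37,
    "examples": [
        {"file": "services/user.py", "function": "fetch_user"},
        {"file": "services/order.py", "function": "fetch_order"},
    ],
    "shape_hash": "a3f9c2e1b0d4",
}
```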
When to Use It
frequency_analysis is a codebase insight tool. It is most valuable in these situations:
- Boilerplate detection. Clusters with high frequency (10+ functions sharing the same shape) are candidates for extraction into shared utilities or higher-order functions.
- Consistency measurement. After establishing a coding convention, run frequency analysis to see what percentage of the codebase follows it. Track this over time.
- Onboarding. Understanding the dominant patterns in a codebase tells you how to write new code that fits. If 80% of service methods follow pattern X, your new service method should too.
- Quality mining. When combined with search_patterns, frequency analysis reveals not just known anti-patterns but emergent anti-patterns: structural shapes that correlate with bugs but have not yet been codified as rules.
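The consistency-measurement use case reduces to a frequency ratio once shapes are hashed: record the shape hash of the new convention, then count how often it appears among the functions surveyed. The hashes below are made up for illustration.

```python
from collections import Counter

# Hypothetical survey: the shape hash of each service method in the
# codebase. "9f2a" stands in for the hash of the new convention.
observed_hashes = ["9f2a", "9f2a", "77b1", "9f2a", "c0de", "9f2a"]

counts = Counter(observed_hashes)
adoption = counts["9f2a"] / len(observed_hashes)
print(f"{adoption:.0%} of service methods follow the new pattern")
```

Re-running this periodically gives a trend line for convention adoption over time.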
This tool complements find_clones, which finds pairwise duplication. frequency_analysis finds systemic patterns — shapes that repeat across dozens of functions, not just pairs. It operates at the architecture level rather than the code level.
Benchmark note
This benchmark compares CodeSift against the closest practical native workflow an agent would use for the same task. For some tools, that baseline is a direct shell equivalent such as rg or find. For AST-aware, graph-aware, and LSP-backed tools, the baseline is a multi-step workflow rather than a strictly identical command. Results should be read as agent-workflow comparisons: token cost, call count, and practical context efficiency.