Benchmarking individual tools is useful. But AI agents don’t use tools individually — they use them in sequences. The real question isn’t “how efficient is `search_text`?” It’s “how efficient is the workflow?”
## Methodology
We extracted tool call sequences from 188 real Claude Code agent sessions using n-gram analysis on usage logs. From hundreds of unique 2-step, 3-step, and 4-step sequences, we selected the 13 most frequent — each occurring at least 12 times across multiple sessions.
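As a sketch of that extraction step, counting n-grams amounts to sliding a window over each session's tool-call list. The `Session` shape and field names here are assumptions for illustration, not the actual log schema:

```typescript
// Sketch: count tool-call n-grams across sessions.
// The Session shape is an assumption, not the real log schema.
type Session = { toolCalls: string[] }; // e.g. ["get_file_tree", "search_text"]

function countNgrams(sessions: Session[], n: number): Map<string, number> {
  const counts = new Map<string, number>();
  for (const { toolCalls } of sessions) {
    for (let i = 0; i + n <= toolCalls.length; i++) {
      const key = toolCalls.slice(i, i + n).join("→");
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  return counts;
}

// Run for n = 2, 3, and 4; keep the sequences seen at least 12 times.
```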
Each sequence was then benchmarked:
- 772 total runs across 33 real-world TypeScript/React/NestJS codebases (50 to 4,100+ files each)
- Both approaches used the same queries and were evaluated on token consumption
- Native baseline: the equivalent Bash workflow (grep, find, read) that an agent without CodeSift would use
- Win = CodeSift consumed fewer tokens than native for the same task (scoring sketched below)
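A minimal sketch of that scoring, assuming each run logged its two token counts; the `Run` shape is hypothetical, not the benchmark harness's actual type:

```typescript
// Hypothetical Run shape; only the two token counts matter for scoring.
type Run = { nativeTokens: number; codesiftTokens: number };

function summarize(runs: Run[]) {
  const native = runs.reduce((sum, r) => sum + r.nativeTokens, 0);
  const codesift = runs.reduce((sum, r) => sum + r.codesiftTokens, 0);
  const wins = runs.filter((r) => r.codesiftTokens < r.nativeTokens).length;
  return {
    reduction: (codesift - native) / native, // e.g. -0.61 → −61%
    winRate: wins / runs.length,             // e.g. 542 / 772 → 70%
  };
}
```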
## Results summary
| Metric | Value |
|---|---|
| Total native tokens | 5,130,240 |
| Total CodeSift tokens | 1,994,825 |
| Aggregate reduction | −61% |
| Win rate | 542 / 772 (70%) |
| Sequences tested | 13 |
| Codebases used | 33 |
## All 13 sequences
### Tier 1: High reduction (−80% to −86%)
These sequences combine structured tools (symbol search, pattern matching) with text search. The structured tool identifies what’s relevant, text search finds how it’s used — without reading full files.
| Sequence | Description | Runs | Reduction | Win rate |
|---|---|---|---|---|
| ss→st | Symbol discovery → usage search | 65 | −86% | 63% |
| pat→st→pat→st | Extended pattern investigation | 37 | −86% | 68% |
| pat→st→pat | Pattern-first investigation loop | 39 | −85% | 77% |
| st→ss | Text orient → symbol narrow | 58 | −84% | 67% |
| st→pat→st→pat | Text-first pattern investigation | 35 | −84% | 66% |
| st→ss→st | Text → symbol → text refinement | 27 | −81% | 67% |
| st→pat→st | Pattern sandwich — text bookended by patterns | 40 | −80% | 63% |
Why these win big: Symbol search (`search_symbols`) with `detail_level="compact"` returns ~15 tokens per result. Pattern search (`search_patterns`) uses AST matching, returning only structural matches. Both avoid dumping raw file content into the context window.
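A back-of-envelope comparison makes the gap concrete. The hit count and per-line grep cost below are assumptions; only the ~15 tokens per compact result comes from the measurements above:

```typescript
// Assumed: 20 hits; grep emits each full matching source line (~60 tokens,
// varies by codebase). The 15-token compact figure is from the post.
const hits = 20;
const compactTokens = hits * 15; // 300 tokens of compact symbol results
const grepTokens = hits * 60;    // ~1,200 tokens of raw matching lines
console.log(compactTokens, grepTokens); // roughly 4x apart, before any file reads
```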
### Tier 2: Good reduction (−68% to −76%)
File tree + search combinations. The agent maps the codebase structure first, then searches within it.
| Sequence | Description | Runs | Reduction | Win rate |
|---|---|---|---|---|
| st→tree→st | Search → check structure → search again | 27 | −76% | 89% |
| tree→st | Map the codebase → search within it | 50 | −68% | 86% |
Why these are the most reliable: st→tree→st has the highest win rate of any sequence at 89%. The file tree tool (`get_file_tree` with `compact=true`) returns a flat path list with symbol counts at roughly one-tenth the output of `find -type f`. The agent learns the project layout cheaply, then targets its search.
### Tier 3: Moderate reduction (−26% to −39%)
Retrieval-heavy sequences. `codebase_retrieval` batches multiple query types (text, semantic, symbols) into one call — powerful, but its batched output is denser than that of individual tools.
| Sequence | Description | Runs | Reduction | Win rate |
|---|---|---|---|---|
| st→cr | Text search → batch follow-up queries | 91 | −39% | 81% |
| st→cr→st | Investigative loop: search, batch, refine | 41 | −39% | 83% |
| cr→st | Batch query first → targeted follow-up | 81 | −32% | 79% |
| cr→st→cr→st | Exploratory investigation (both expensive) | 12 | −26% | 58% |
Why these save less: `codebase_retrieval` returns structured results from multiple query types in one response. That’s already more efficient than 3–5 separate calls, but the response size is larger than a single targeted tool’s. The −26% outlier (cr→st→cr→st) represents truly exploratory investigation where the agent doesn’t know what it’s looking for — both approaches are expensive.
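For reference, the batched-call shape, following Example 4 below; the third query entry is an assumption based on this tier's description, not a call taken from the logs:

```typescript
// Batched retrieval call, shaped after Example 4. The "text" entry is an
// assumed third query type, added to illustrate batching across types.
const queries = [
  { type: "references", symbol_name: "CAN_MANAGE_ORGS" },
  { type: "semantic", query: "organization management permissions" },
  { type: "text", query: "CAN_MANAGE_ORGS" },
];
// One round-trip instead of 3-5 separate calls, at the cost of one denser response.
```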
## Real-world examples
### Example 1: Symbol discovery workflow (ss→st)
An agent investigating a function in an i18n platform:
search_symbols("getLanguageName", detail_level="compact")— finds the function definition in 3 tokenssearch_text("getLanguageName", file_pattern="*.service.ts")— finds all usage sites
Native approach: `grep -rn "function getLanguageName"` + `grep -rn "getLanguageName" --include="*.service.ts"` — returns full matching lines with surrounding context.
| Approach | Tokens |
|---|---|
| Native (grep) | 6,386 |
| CodeSift | 3 |
| Reduction | −99.9% |
### Example 2: File exploration workflow (tree→st)
An agent onboarding to a content management system:
get_file_tree("src/", compact=true)— gets the full directory structure in a flat listsearch_text("article-generat")— finds the article generation logic
Native approach: `find src/ -type f` + `grep -rn "article-generat" src/`
| Approach | Tokens |
|---|---|
| Native (find + grep) | 5,852 |
| CodeSift | 2 |
| Reduction | −99.9% |
### Example 3: Anti-pattern hunt (pat→st→pat)
An agent running a code quality check on a survey platform:
search_patterns("toBeDefined")— finds all weak assertions in testssearch_text(".toBeDefined(")— gets surrounding context for each matchsearch_patterns("toBeDefined")— refined pattern search after context
Native approach: `grep -rn "toBeDefined" --include="*.test.ts"` × 3 with manual filtering
| Approach | Tokens |
|---|---|
| Native (grep) | 928 |
| CodeSift | 0 |
| Reduction | −100% |
(Zero tokens because the pattern search returned empty — no matches in this codebase. CodeSift correctly returns nothing. Native grep still outputs headers, paths, and line numbers even with zero matches.)
### Example 4: Investigative workflow (st→cr)
An agent searching for permission logic in an enterprise API:
search_text("CAN_MANAGE_ORGS")— finds the permission constant usagecodebase_retrieval(queries=[{type:"references", symbol_name:"CAN_MANAGE_ORGS"}, {type:"semantic", query:"organization management permissions"}])— batches follow-up
| Approach | Tokens |
|---|---|
| Native (grep + multiple reads) | 8,911 |
| CodeSift | 63 |
| Reduction | −99.3% |
## The pattern that emerges
The sequences with the highest reduction share a structure: structured tools bookend text searches. The agent uses symbol search or pattern matching to identify what’s relevant, then text search to find how it’s used (sketched after the table below). This avoids the biggest token waste in native workflows: reading full file contents to extract a few lines of information.
| Pattern | Avg reduction | Why it works |
|---|---|---|
| Symbol + text | −83% | Compact symbol lookup (~15 tok/result) replaces grep’s full-line output |
| Pattern + text | −83% | AST matching returns structural hits, not string matches |
| Tree + text | −72% | Flat path list replaces find + wc -l combos |
| Retrieval + text | −34% | Batch queries save round-trips but responses are denser |
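As a sketch of the bookend shape from Example 3, with hypothetical TypeScript stubs standing in for the real tool calls (the actual tools are agent tool calls, not local functions):

```typescript
// Hypothetical stubs standing in for the real agent tool calls.
async function search_patterns(pattern: string): Promise<string[]> {
  return []; // stub: the real tool returns AST-level structural matches
}
async function search_text(query: string): Promise<string[]> {
  return []; // stub: the real tool returns matching lines
}

// The bookend shape: structured call narrows, text search adds context,
// structured call verifies. No full-file reads in between.
async function bookendHunt(): Promise<string[]> {
  await search_patterns("toBeDefined");  // what exists (AST match)
  await search_text(".toBeDefined(");    // how it's used (cheap lines)
  return search_patterns("toBeDefined"); // verify with context in hand
}
```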
## The honest outlier
cr→st→cr→st only achieves −26% with a 58% win rate. This pattern represents exploratory investigation — the agent doesn’t yet know what it’s looking for. CodeSift’s advantage is largest when agents have structured intent. When truly exploring, both approaches are expensive.
In the 26% of runs where native won, the margin was small and the task was exploratory. In the 74% where CodeSift won, margins of 80–99% were common.
All benchmark data collected 2026-03-30 from 188 real agent sessions. Scripts available in the CodeSift repository.