# codebase_retrieval

Batch multi-modal queries: text, symbols, patterns, semantic, hybrid, conversation. A token budget mechanism fills the available context with the highest-signal results.
## What it does

CodeSift’s most powerful retrieval tool. Accepts an array of queries of different types and returns combined results within a shared token budget.

Supported query types:
| Type | Equivalent to |
|---|---|
| `text` | `search_text` |
| `symbols` | `search_symbols` |
| `patterns` | `search_patterns` |
| `semantic` | Embedding-based conceptual search |
| `hybrid` | Semantic + BM25 via Reciprocal Rank Fusion |
| `conversation` | `search_conversations` |
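The `hybrid` type merges the semantic and BM25 rankings with Reciprocal Rank Fusion. A minimal sketch of the standard RRF formula, using the conventional constant k = 60 (CodeSift’s actual constant and any weighting are not documented here):

```typescript
// Reciprocal Rank Fusion: score(doc) = Σ over lists of 1 / (k + rank),
// where rank is 1-based. Documents ranked high in both lists win.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((doc, i) => {
      scores.set(doc, (scores.get(doc) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([doc]) => doc);
}

// Fuse an embedding-based ranking with a BM25 ranking.
const fused = reciprocalRankFusion([
  ["cache.ts", "redis.ts"],            // semantic order
  ["redis.ts", "util.ts", "cache.ts"], // BM25 order
]);
// "redis.ts" surfaces first: it ranks well in both lists.
```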
## Example

```json
{
  "queries": [
    { "type": "text", "query": "TODO", "file_pattern": "*.ts" },
    { "type": "symbols", "query": "create", "kind": "function" },
    { "type": "semantic", "query": "how does caching work?" },
    { "type": "conversation", "query": "why we chose Redis" }
  ],
  "token_budget": 10000
}
```
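For orientation, here is a hedged sketch of sending that request from a TypeScript client. It assumes CodeSift is exposed as an MCP server launched by a hypothetical `codesift serve` command; the transport setup is illustrative, not CodeSift’s documented interface.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Hypothetical launch command; substitute however CodeSift is actually served.
const transport = new StdioClientTransport({ command: "codesift", args: ["serve"] });
const client = new Client({ name: "docs-example", version: "0.1.0" });
await client.connect(transport);

// Same payload as the example above, sent as tool arguments.
const result = await client.callTool({
  name: "codebase_retrieval",
  arguments: {
    queries: [
      { type: "text", query: "TODO", file_pattern: "*.ts" },
      { type: "symbols", query: "create", kind: "function" },
      { type: "semantic", query: "how does caching work?" },
      { type: "conversation", query: "why we chose Redis" },
    ],
    token_budget: 10000,
  },
});
console.log(result.content);
```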
## Benchmark
| Codebase | Native (3 calls) | CodeSift | Reduction |
|---|---|---|---|
| CLI tool (382 files) | 13,022 tok | 2,996 tok | −77% |
| i18n platform (1,200+ files) | 15,263 tok | 3,672 tok | −76% |
| Full-stack app (4,127 files) | 11,767 tok | 2,483 tok | −79% |
## Token budget mechanism

You tell the tool how much context window you can afford, and it fills that space with the highest-signal results, allocated proportionally across query types. Native tools return everything and let the model figure it out.
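A minimal sketch of proportional budget filling, assuming per-result token estimates and an even split of the budget across query types (CodeSift’s exact heuristics are not documented here):

```typescript
interface ScoredResult {
  text: string;
  score: number;  // relevance signal, higher is better
  tokens: number; // estimated token cost of including this result
}

// Fill a shared token budget with the highest-signal results,
// splitting the budget proportionally across query types.
// Illustrative allocation, not CodeSift's exact algorithm.
function fillBudget(
  resultsByType: Map<string, ScoredResult[]>,
  tokenBudget: number,
): ScoredResult[] {
  const perType = Math.floor(tokenBudget / resultsByType.size);
  const selected: ScoredResult[] = [];
  for (const results of resultsByType.values()) {
    let remaining = perType;
    // Take best-scoring results first until this type's share is spent.
    for (const r of [...results].sort((a, b) => b.score - a.score)) {
      if (r.tokens <= remaining) {
        selected.push(r);
        remaining -= r.tokens;
      }
    }
  }
  return selected;
}
```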
## Sequential hints

After 3+ consecutive search calls on the same repo, CodeSift automatically prepends a hint suggesting `codebase_retrieval` to batch them. This nudges agents toward more efficient patterns.
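An illustrative sketch of that nudge: count consecutive search calls per repo and prepend a hint once the streak reaches 3. The real trigger condition and hint text in CodeSift may differ.

```typescript
const SEQUENTIAL_THRESHOLD = 3;
const streaks = new Map<string, number>(); // repo -> consecutive search calls

// Wrap each tool result; a non-search call resets the streak.
function maybePrependHint(repo: string, toolName: string, output: string): string {
  const isSearch = toolName.startsWith("search_");
  const streak = isSearch ? (streaks.get(repo) ?? 0) + 1 : 0;
  streaks.set(repo, streak);
  if (streak >= SEQUENTIAL_THRESHOLD) {
    return (
      "Hint: several sequential searches on this repo. " +
      "Consider batching them with codebase_retrieval.\n\n" + output
    );
  }
  return output;
}
```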
## When to use
- You need 3+ different types of information
- Complex investigation tasks
- “Find everything related to X” queries
- Token budget is constrained
## When to use something else

- Single-purpose search → use the specific tool (`search_text`, `search_symbols`)
- Just need a file outline → `get_file_outline`
- Understanding a feature → `assemble_context` may be simpler
## Appears in combo flows
This tool appears in 4 of the 13 most common agent sequences benchmarked across 188 real sessions.
## Related tools

- `search_text`, `search_symbols`, `search_patterns`, `search_conversations`: single-purpose equivalents of each query type
- `get_file_outline`: file structure without a search
- `assemble_context`: feature-level context assembly
## Benchmark note

This benchmark compares CodeSift against the closest practical native workflow an agent would use for the same task. For some tools, that baseline is a direct shell equivalent such as `rg` or `find`. For AST-aware, graph-aware, and LSP-backed tools, the baseline is a multi-step workflow rather than a strictly identical command. Results should be read as agent-workflow comparisons: token cost, call count, and practical context efficiency.