Retrieval workflow comparison

codebase_retrieval

Batches multi-modal queries (text, symbols, patterns, semantic, hybrid, conversation) into a single call. A token budget mechanism fills the context with the highest-signal results.

  • Token reduction: −77% vs the native baseline of 3-5 sequential search calls
  • Calls: 1 (CodeSift) vs 3 (native)
  • Tokens: 9,200 (CodeSift) vs 40,000 (native)

What it does

CodeSift’s most powerful retrieval tool. Accepts an array of queries of different types and returns combined results within a shared token budget.

Supported query types:

Type          Equivalent to
text          search_text
symbols       search_symbols
patterns      search_patterns
semantic      Embedding-based conceptual search
hybrid        Semantic + BM25 via Reciprocal Rank Fusion
conversation  search_conversations
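The hybrid type names Reciprocal Rank Fusion, a standard way to merge a semantic ranking with a BM25 ranking without having to compare their raw scores. CodeSift's internal implementation isn't shown in these docs; a minimal Python sketch of the algorithm itself (file names are illustrative):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists into one, rewarding items that rank
    highly in any list. k=60 is the constant from the original RRF
    paper; it damps the influence of the very top ranks."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Two rankings for the same query: one semantic, one BM25.
semantic_order = ["auth.ts", "cache.ts", "db.ts"]
bm25_order = ["cache.ts", "db.ts", "redis.ts"]
fused = reciprocal_rank_fusion([semantic_order, bm25_order])
# cache.ts comes out first: it ranks well in both lists.
```

Because RRF only looks at ranks, it sidesteps the problem that cosine similarities and BM25 scores live on incomparable scales.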

Example

{
  "queries": [
    { "type": "text", "query": "TODO", "file_pattern": "*.ts" },
    { "type": "symbols", "query": "create", "kind": "function" },
    { "type": "semantic", "query": "how does caching work?" },
    { "type": "conversation", "query": "why we chose Redis" }
  ],
  "token_budget": 10000
}

Benchmark

Codebase                       Native (3 calls)   CodeSift    Reduction
CLI tool (382 files)           13,022 tok         2,996 tok   −77%
i18n platform (1,200+ files)   15,263 tok         3,672 tok   −76%
Full-stack app (4,127 files)   11,767 tok         2,483 tok   −79%

Token budget mechanism

You tell the tool how much of the context window you can afford, and it fills that space with the highest-signal results, allocated proportionally across query types. Native tools return everything and leave the model to sort through it.
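One way such proportional filling could work is to split the budget into per-type slices and greedily fill each slice with that type's best-scoring results. This is a hypothetical sketch of the idea, not CodeSift's actual algorithm; the function name and result shape are invented for illustration:

```python
def fill_token_budget(results_by_type, token_budget):
    """Split the budget evenly across query types, then fill each
    slice with that type's highest-scoring results that still fit.
    Hypothetical sketch of a proportional budget fill."""
    share = token_budget // max(len(results_by_type), 1)
    selected = []
    for qtype, results in results_by_type.items():
        used = 0
        # Consider results best-score-first within each type's slice.
        for snippet, score, tokens in sorted(results, key=lambda r: r[1], reverse=True):
            if used + tokens <= share:
                selected.append((qtype, snippet))
                used += tokens
    return selected

hits = {
    "text":     [("TODO in cli.ts", 0.9, 3000), ("TODO in db.ts", 0.7, 3000)],
    "semantic": [("cache layer overview", 0.8, 4000), ("eviction notes", 0.6, 2000)],
}
picked = fill_token_budget(hits, token_budget=10000)
# Each type gets a 5,000-token slice; a lower-scoring result that
# would overflow its slice is dropped.
```

The key property is that no single query type can crowd out the others, which is what "proportionally" buys you over a single global greedy fill.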

Sequential hints

After 3+ consecutive search calls on the same repo, CodeSift automatically prepends a hint suggesting codebase_retrieval to batch them. This nudges agents toward more efficient patterns.
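The trigger behind this could be as simple as a per-repo streak counter that fires at the threshold and resets when the agent switches to a batched call. A minimal sketch (the class and method names are hypothetical, not part of CodeSift's API):

```python
class SearchHintTracker:
    """Hypothetical per-repo counter behind sequential hints: after
    `threshold` consecutive search calls on one repo, the next
    response gets a hint suggesting codebase_retrieval."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.streaks = {}

    def record_search(self, repo):
        # Returns True when the hint should be prepended.
        self.streaks[repo] = self.streaks.get(repo, 0) + 1
        return self.streaks[repo] >= self.threshold

    def record_batch_call(self, repo):
        # A codebase_retrieval call breaks the streak.
        self.streaks[repo] = 0

tracker = SearchHintTracker()
tracker.record_search("my-repo")          # 1st search: no hint
tracker.record_search("my-repo")          # 2nd: no hint
hint = tracker.record_search("my-repo")   # 3rd: hint fires
```

Tracking streaks per repo (rather than globally) keeps the nudge relevant: three searches across three different repos are not a batching opportunity.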

When to use

  • You need 3+ different types of information
  • Complex investigation tasks
  • “Find everything related to X” queries
  • Token budget is constrained

When to use something else

  • Single-purpose search → use the specific tool (search_text, search_symbols)
  • Just need a file outline → get_file_outline
  • Understanding a feature → assemble_context may be simpler

Benchmark note

This benchmark compares CodeSift against the closest practical native workflow an agent would use for the same task. For some tools, that baseline is a direct shell equivalent such as rg or find. For AST-aware, graph-aware, and LSP-backed tools, the baseline is a multi-step workflow rather than a strictly identical command. Results should be read as agent-workflow comparisons: token cost, call count, and practical context efficiency.