Search Workflow Comparison

search_symbols

Ranked symbol discovery with BM25F scoring, kind filtering, and three detail levels. Finds function definitions, not string occurrences.

  • Token reduction: -90%
  • Native baseline: rg with regex + context lines
  • Calls (CodeSift vs native): 9 vs 9
  • Tokens (CodeSift vs native): 5,700 vs 57,000

What It Is

search_symbols is best understood as a symbol discovery tool, not as a prettier grep. Its job is to answer “where is this function/type/class defined?” and return structured metadata about the match — not just a line of text that happens to contain the name.

When an agent greps for createRisk, it gets every mention: the function definition, every call site, every test that references it, every comment that names it. The agent then has to filter this mentally to find the definition. search_symbols queries an AST-derived index and returns only definitions, ranked by relevance, with their kind (function, interface, class, type), file location, and optionally the full source.
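To make "structured metadata" concrete, here is what a single definition-only result might look like. This is a sketch: the field names mirror the compact detail level documented below (id, name, kind, file, line), but the exact response shape is an assumption, not CodeSift's documented schema.

```python
# Hypothetical shape of one search_symbols result (assumed, for illustration).
# Note what is absent: no call sites, no test references, no comment mentions.
result = {
    "id": "risk.service:createRisk",
    "name": "createRisk",
    "kind": "function",
    "file": "src/services/risk.service.ts",
    "line": 42,
}

print(result["kind"], result["file"])
```

An agent can act on `kind` and `file` directly instead of mentally filtering a page of grep hits.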

Detail Levels

The three detail levels control how much data comes back per result:

  • compact — id, name, kind, file, line. ~15 tokens per result. Use case: discovery ("where is X defined?").
  • standard — compact fields plus signature and truncated source. ~170 tokens per result. Use case: the default when you need to see the code.
  • full — everything plus unlimited source. ~300 tokens per result. Use case: a deep read of the complete function body.

At ~15 tokens versus ~170, the compact level is roughly 90% cheaper than standard. Use it when you only need locations, not code.
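The per-level costs make response sizes easy to estimate before you call. A minimal sketch, using the approximate per-result figures above (the `response_cost` helper is illustrative, not part of CodeSift):

```python
# Approximate per-result token costs from the detail-level list above.
COST = {"compact": 15, "standard": 170, "full": 300}

def response_cost(detail_level: str, n_results: int) -> int:
    """Rough token estimate for a search_symbols response."""
    return COST[detail_level] * n_results

print(response_cost("compact", 10))   # 150
print(response_cost("standard", 10))  # 1700
```

Ten compact results cost less than one standard result, which is why discovery passes should default to compact.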

Token Budget Mode

Instead of guessing the right top_k, you can set token_budget:

search_symbols(repo, "auth", token_budget=3000)

This tells CodeSift “fill up to 3K tokens with the highest-ranked results.” It automatically adjusts how many results to return based on their individual size. This is more predictable than setting top_k=20 and hoping the total stays reasonable.
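One plausible way to picture budget-based selection is a greedy fill over the ranked list: keep taking results until the next one would exceed the budget. This is a sketch of the idea under that assumption, not CodeSift's actual selection logic:

```python
def fill_to_budget(ranked_results, token_budget):
    """Greedy sketch: take ranked results until the budget is exhausted.

    ranked_results: list of (result, token_cost) in descending rank order.
    Illustrative only; the real implementation may differ.
    """
    selected, spent = [], 0
    for result, cost in ranked_results:
        if spent + cost > token_budget:
            break  # the next result would blow the budget; stop here
        selected.append(result)
        spent += cost
    return selected, spent

# Ten standard-level results (~170 tokens each) against a 3,000-token budget:
results = [(f"symbol_{i}", 170) for i in range(10)]
picked, used = fill_to_budget(results, 3000)
print(len(picked), used)  # all ten fit: 1700 <= 3000
```

The same call with `token_budget=500` would return only two results, which is the predictability the text describes: the cap holds regardless of how large individual results are.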

The Two-Phase Pattern

The most effective usage pattern is two-phase: discover, then retrieve.

Phase 1: search_symbols with detail_level="compact" to find candidates. This costs ~15 tokens per result, so even 50 results fit in under 1K tokens.

Phase 2: get_symbol or get_symbols to fetch the full source of the specific symbols you need. This costs more per symbol, but you only pay for what you actually need.

This pattern avoids the common waste of fetching full source for 20 symbols when you only needed 3.
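The savings are simple arithmetic. Using the per-result figures from the detail-level list (~15 tokens compact, ~300 full), a two-phase pass over 20 candidates where only 3 symbols are needed costs a fraction of fetching full source for all 20:

```python
# Token-cost sketch of the two-phase pattern vs fetching full source up front.
COMPACT_PER_RESULT = 15   # from the detail-level list above
FULL_PER_SYMBOL = 300

def two_phase_cost(candidates: int, needed: int) -> int:
    # Phase 1: compact discovery over all candidates,
    # Phase 2: full retrieval only for the symbols actually needed.
    return candidates * COMPACT_PER_RESULT + needed * FULL_PER_SYMBOL

def one_shot_cost(candidates: int) -> int:
    # Fetching full source for every candidate in one go.
    return candidates * FULL_PER_SYMBOL

print(two_phase_cost(20, 3))  # 20*15 + 3*300 = 1200
print(one_shot_cost(20))      # 20*300 = 6000
```

1,200 tokens versus 6,000: the discovery phase is cheap enough that it almost always pays for itself.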

Benchmark Framing

The -90% token reduction deserves honest context. This is a workflow comparison, not an apples-to-apples tool swap. The native workflow is: grep for the function name, read context lines to find the definition, then read the file to get the full body. That is 9 calls across a typical session. search_symbols achieves the same outcome in the same number of calls but returns structured, ranked output instead of raw text — so each call wastes fewer tokens on irrelevant context.

The per-call savings come from returning only definitions (not call sites), and from the detail level system that lets you pay for exactly the information you need.

Key Parameters

  • query (required) — symbol name or pattern. Fuzzy matching supported.
  • kind — filter by symbol type: function, class, interface, type, variable, method, property.
  • file_pattern — glob filter. Always pass this when you know the file scope — it halves token output.
  • detail_level — compact, standard, or full. Default standard.
  • token_budget — max tokens in response. Overrides top_k.
  • include_source — include source code inline. Default true for standard/full.
  • top_k — max results. Default 10.
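The file_pattern filter behaves like a glob over result paths, which is why it cuts output so sharply when you know the scope. A minimal simulation with Python's standard fnmatch (the file list is invented for illustration):

```python
from fnmatch import fnmatch

# Simulated result files; a file_pattern glob narrows them before
# any tokens are spent on irrelevant matches.
files = [
    "src/services/risk.service.ts",
    "src/services/user.service.ts",
    "test/risk.service.spec.ts",
    "docs/risk.md",
]

matches = [f for f in files if fnmatch(f, "src/services/*.ts")]
print(matches)  # only the two service files survive
```

Half the candidate files drop out before ranking even runs, which is the mechanism behind "it halves token output."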

Where Native Still Wins

Grep is faster for exact-match symbol lookup when you already know the file. If you know the function is in src/services/risk.service.ts, reading that file directly is faster than querying an index. search_symbols shines when you do not know where a symbol lives, or when you need to discover all symbols matching a pattern across the codebase.

Grep also wins for completeness. search_symbols ranks and limits results. If you need every function whose name starts with “create” across 4,000 files with guaranteed completeness, grep with a regex is the safer choice.
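The completeness point can be shown in miniature: an exhaustive regex scan reports every match with nothing ranked away or truncated. This sketch inlines two invented source snippets; in practice grep walks the real tree:

```python
import re

# Grep-style exhaustive scan: every definition is reported, none dropped.
sources = {
    "a.ts": "export function createRisk() {}\nfunction createUser() {}",
    "b.ts": "const x = createRisk();\nfunction createReport() {}",
}

pattern = re.compile(r"function (create\w+)")
hits = sorted(
    name for text in sources.values() for name in pattern.findall(text)
)
print(hits)  # all create* function definitions; the call site is excluded
```

Note the trade-off in both directions: the regex skips the `createRisk()` call site (good), but only because the author hand-crafted the pattern, and a ranked, limited index could silently omit the 11th result where this scan cannot.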

Benchmark note

This benchmark compares CodeSift against the closest practical native workflow an agent would use for the same task. For some tools, that baseline is a direct shell equivalent such as rg or find. For AST-aware, graph-aware, and LSP-backed tools, the baseline is a multi-step workflow rather than a strictly identical command. Results should be read as agent-workflow comparisons: token cost, call count, and practical context efficiency.