
find_references

Where is X used? Distinguishes code references from comments, string occurrences, and import aliases.

  • Token reduction: −42%
  • Native baseline: rg -w (word-boundary)
  • Tokens (CodeSift vs native): 1,248 vs 2,163

What it does

Answers “where is X used?” with semantic precision. Unlike grep, it distinguishes code references from string occurrences, comments, and import aliases. Returns structured output with reference types (call site, import, type annotation).
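As a sketch of the shape that structured output can take (the schema below is a hypothetical illustration, not CodeSift's documented format, and the file names are invented):

```python
from dataclasses import dataclass
from enum import Enum

class RefKind(Enum):
    CALL_SITE = "call_site"
    IMPORT = "import"
    TYPE_ANNOTATION = "type_annotation"

@dataclass
class Reference:
    path: str      # file containing the reference
    line: int      # 1-indexed line number
    kind: RefKind  # semantic role of the occurrence
    context: str   # the source line, for display

# Hypothetical result shape for find_references("Config"):
refs = [
    Reference("src/cli.py", 3, RefKind.IMPORT, "from config import Config"),
    Reference("src/cli.py", 42, RefKind.CALL_SITE, "cfg = Config.load(path)"),
    Reference("src/server.py", 17, RefKind.TYPE_ANNOTATION, "def start(cfg: Config) -> None:"),
]
```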

What find_references is actually competing with

find_references is not competing with a theoretical perfect native tool. It is competing with what agents actually do natively: search for a symbol name and inspect the results.

In practice, that baseline is rg -w, plus manual filtering of false positives, plus one or more follow-up file reads to confirm context.
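A minimal sketch of that baseline workflow, with deliberately crude filtering heuristics standing in for the triage an agent does by hand (the helper name and heuristics are illustrative assumptions):

```python
import subprocess

def native_baseline(symbol: str, repo: str) -> list[str]:
    """Approximate the native workflow: word-boundary grep, then crude
    filtering of comment and string-literal hits."""
    proc = subprocess.run(
        ["rg", "-nw", symbol, repo],
        capture_output=True, text=True, check=False,  # rg exits 1 on no matches
    )
    hits = []
    for line in proc.stdout.splitlines():
        try:
            _path, _lineno, code = line.split(":", 2)  # rg prints path:line:content
        except ValueError:
            continue
        stripped = code.lstrip()
        if stripped.startswith(("#", "//")):                # likely a comment
            continue
        if f'"{symbol}"' in code or f"'{symbol}'" in code:  # string occurrence
            continue
        hits.append(line)
    return hits  # still noisy: aliases, docstrings, and re-exports slip through
```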

Because of that, find_references may not always produce the most dramatic token win in the library. In some repos it can even return slightly more output than a narrow grep baseline, especially when it finds more valid references across more file types.

That is not necessarily a weakness.

The real evaluation question

  • Did the agent get a cleaner and more complete reference picture?
  • Did it need fewer follow-up steps to trust the answer?

For reference-finding tasks, completeness and structural precision matter almost as much as raw token count.
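One way to make that concrete is to score each workflow's reference list against a hand-labeled ground truth. The sketch below (hypothetical file names and numbers) shows how a complete answer outscores a terse but lossy one even when it is larger:

```python
def precision_recall(found: set[tuple[str, int]],
                     truth: set[tuple[str, int]]) -> tuple[float, float]:
    """Score a reference list against ground truth; each reference is (path, line)."""
    true_pos = len(found & truth)
    precision = true_pos / len(found) if found else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall

truth = {("src/cli.py", 42), ("src/server.py", 17), ("src/types.py", 9)}
grep_hits = {("src/cli.py", 42), ("README.md", 3)}  # one miss, one false positive
tool_hits = truth                                   # complete and clean

print(precision_recall(grep_hits, truth))  # (0.5, 0.333...)
print(precision_recall(tool_hits, truth))  # (1.0, 1.0)
```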

When to use

  • Understanding the blast radius of a change
  • Finding all consumers of a function or type
  • Pre-refactor impact assessment

Benchmark note

This benchmark compares CodeSift against the closest practical native workflow an agent would use for the same task. For some tools, that baseline is a direct shell equivalent such as rg or find. For AST-aware, graph-aware, and LSP-backed tools, the baseline is a multi-step workflow rather than a strictly identical command. Results should be read as agent-workflow comparisons: token cost, call count, and practical context efficiency.
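As an illustration of the accounting, a workflow's token cost is summed over every call it took, not just the final answer. A minimal sketch, assuming the tiktoken tokenizer (the benchmark's actual tokenizer and transcripts are not specified here):

```python
import tiktoken

def token_cost(outputs: list[str], encoding: str = "cl100k_base") -> int:
    """Total tokens an agent ingests across a workflow's tool outputs."""
    enc = tiktoken.get_encoding(encoding)
    return sum(len(enc.encode(text)) for text in outputs)

# Placeholder strings standing in for real tool-output transcripts:
native_outputs = ["<rg -w output>", "<follow-up file read>", "<second read>"]
tool_outputs = ["<find_references output>"]

reduction = 1 - token_cost(tool_outputs) / token_cost(native_outputs)
```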