
assemble_context

Given a topic, assembles a complete context bundle with relevant signatures, types, entry points, and import relationships. Output is available at four compression levels, L0 through L3.

Workflow comparison

  • Token reduction: −86%
  • Native baseline: grep + read N files (6 calls)
  • Calls (CodeSift vs native): 1 vs 6
  • Tokens (CodeSift vs native): 12,600 vs 93,000

The Problem With “I Need To Understand This Module”

Understanding a module means reading its exports, tracing its imports, seeing its type signatures, and understanding how it connects to the rest of the system. With native tools, that means a cascade of calls: grep for the module name, read the main file, grep for imports, read the imported files, check the types, read those too. That is six or more tool calls, each returning full file contents, most of which you do not need.

assemble_context replaces that cascade with a single call that returns exactly the information needed for comprehension, compressed to a level you choose.
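
A sketch of the two workflows side by side; the file names in the native cascade are hypothetical:

```python
# Native workflow (illustrative): six separate tool calls, each
# returning a full file, most of it irrelevant to the question.
#   1. grep "payments"                  -> locate the module
#   2. read src/payments/processor.py   -> main file, in full
#   3. grep "import" src/payments/      -> trace its imports
#   4. read src/payments/gateway.py     -> imported file, in full
#   5. read src/payments/types.py       -> type definitions, in full
#   6. read src/payments/errors.py      -> ...and keep going

# Single-call replacement (compression levels covered below):
assemble_context(repo="local/my-project", query="payment processing", level="L1")
```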

Four Compression Levels

| Level | What’s Included | Typical Tokens | Best For |
| --- | --- | --- | --- |
| L0 | Full source code of matched symbols | ~6,900 (19 symbols) | Deep code review, debugging, editing |
| L1 | Signatures + docstrings only | ~5,000 (56 symbols) | Understanding flow, API surface, architecture |
| L2 | File summaries (export lists) | ~3,200 (61 files) | Module-level overview, dependency mapping |
| L3 | Directory overview | ~611 (18 dirs) | Navigation, orientation, first contact |
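
The level argument is the only knob. As a sketch, the same query at the two extremes (the repo and query values are just examples):

```python
# Deepest and costliest: full bodies, for review or editing
assemble_context(repo="local/my-project", query="payment processing", level="L0")

# Cheapest orientation: a directory-level map
assemble_context(repo="local/my-project", query="payment processing", level="L3")
```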

The default is L0, but L1 is the most commonly useful level. In the benchmark above, L1 returns nearly three times as many symbols as L0 (56 vs 19) while using fewer tokens. You see every function signature, every parameter type, every return type, and every docstring, without the implementation bodies that dominate token counts.

L1 Is Usually the Right Choice

When the goal is understanding rather than editing, implementation details are noise. A function signature tells you what it does; the body tells you how. For comprehension tasks, assemble_context at L1 packs 56 symbols into less space than L0 needs for 19.

This matters because understanding a module typically requires seeing 30-50 symbols in context. At L0, that overflows any reasonable budget. At L1, it fits comfortably.

assemble_context(repo="local/my-project", query="payment processing", level="L1")
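
To make the compression concrete, here is the difference between an L0 and an L1 view of a single function. The function is invented for illustration; it is not part of any real output:

```python
# L0: full source of a matched symbol
def retry_delay(attempt: int, base_ms: int = 100) -> int:
    """Exponential backoff delay in milliseconds for a retry attempt."""
    delay = base_ms * (2 ** attempt)
    return min(delay, 30_000)

# L1: the same symbol reduced to signature + docstring
# def retry_delay(attempt: int, base_ms: int = 100) -> int:
#     """Exponential backoff delay in milliseconds for a retry attempt."""
```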

Benchmark

| Approach | Tokens | Calls | Coverage |
| --- | --- | --- | --- |
| grep + read files | 93,000 | 6 | Partial (you stop reading eventually) |
| assemble_context L1 | 12,600 | 1 | Complete (all relevant symbols) |

The native approach produces 93K tokens because each file is read in full, including comments, blank lines, and unrelated functions. Most of that content is irrelevant to the question being asked. assemble_context returns only symbols semantically related to the query, already ranked by relevance; the difference, (93,000 − 12,600) / 93,000, is the 86% reduction quoted above.

When It Is the Right First Move

Use assemble_context when you need to understand a topic or module before acting. It is the right tool when:

  • You are about to refactor a subsystem and need to see all its public interfaces
  • You need to understand how authentication works across the codebase
  • You are reviewing a PR that touches a module you have not worked with before
  • You want to map the API surface of a library before integrating with it

For single-symbol deep reads, get_symbol or get_context_bundle are more precise. For finding a specific string pattern, search_text is faster. assemble_context fills the gap between “I need one thing” and “I need to understand everything about this area.”
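
The same decision in code form. This is a sketch only: get_symbol and search_text are named above, but the parameters shown here are assumptions, not documented signatures:

```python
# One symbol, read deeply (hypothetical parameters):
get_symbol(repo="local/my-project", symbol="PaymentProcessor.charge")

# One string pattern, found fast (hypothetical parameters):
search_text(repo="local/my-project", pattern="charge_customer")

# Everything about an area, compressed:
assemble_context(repo="local/my-project", query="payment processing", level="L1")
```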

L3 for Orientation

When starting on an unfamiliar codebase, L3 gives you the directory structure and module organization in about 600 tokens. That is enough to know where to look next without reading a single file.

assemble_context(repo="local/my-project", query="project structure", level="L3")

From there, narrow down with L1 on the specific module you care about.
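
For example, if the L3 overview surfaces a billing module (the name here is hypothetical):

assemble_context(repo="local/my-project", query="billing", level="L1")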

Benchmark note

This benchmark compares CodeSift against the closest practical native workflow an agent would use for the same task. For some tools, that baseline is a direct shell equivalent such as rg or find. For AST-aware, graph-aware, and LSP-backed tools, the baseline is a multi-step workflow rather than a strictly identical command. Results should be read as agent-workflow comparisons: token cost, call count, and practical context efficiency.