assemble_context
Given a topic, assembles a complete context bundle with relevant signatures, types, entry points, and import relationships. Four compression levels from L0 to L3.
The Problem With “I Need To Understand This Module”
Understanding a module means reading its exports, tracing its imports, seeing its type signatures, and understanding how it connects to the rest of the system. With native tools, that means a cascade of calls: grep for the module name, read the main file, grep for imports, read the imported files, check the types, read those too. Six or more tool calls that each return full file contents, most of which you do not need.
assemble_context replaces that cascade with a single call that returns exactly the information needed for comprehension, compressed to a level you choose.
Four Compression Levels
| Level | What’s Included | Typical Tokens | Best For |
|---|---|---|---|
| L0 | Full source code of matched symbols | ~6,900 (19 symbols) | Deep code review, debugging, editing |
| L1 | Signatures + docstrings only | ~5,000 (56 symbols) | Understanding flow, API surface, architecture |
| L2 | File summaries (export lists) | ~3,200 (61 files) | Module-level overview, dependency mapping |
| L3 | Directory overview | ~611 (18 dirs) | Navigation, orientation, first contact |
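As a rough rule of thumb, the "Best For" column above can be read as a task-to-level mapping. A minimal sketch of that heuristic follows; the helper name and task keywords are illustrative, not part of the tool's API.

```python
# Hypothetical helper: pick a compression level from the kind of task,
# following the "Best For" column of the table above.
LEVEL_FOR_TASK = {
    "edit": "L0",        # deep code review, debugging, editing
    "understand": "L1",  # flow, API surface, architecture
    "overview": "L2",    # module-level overview, dependency mapping
    "orient": "L3",      # navigation, first contact with a repo
}

def choose_level(task: str) -> str:
    """Return the cheapest level that still serves the task."""
    return LEVEL_FOR_TASK.get(task, "L1")  # L1 is the usual safe default

print(choose_level("orient"))  # → L3
```

The fallback to L1 mirrors the guidance in the next section: when in doubt, signatures and docstrings are usually enough.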
The default is L0, but L1 is the most commonly useful level. At L1, the same token budget covers roughly three times as many symbols as L0 (56 vs. 19 in the table above). You see every function signature, every parameter type, every return type, and every docstring, without the implementation bodies that dominate token counts.
L1 Is Usually the Right Choice
When the goal is understanding rather than editing, implementation details are noise. A function signature tells you what it does. The body tells you how. For comprehension tasks, assemble_context at L1 packs 56 symbols into the same space where L0 fits 19.
This matters because understanding a module typically requires seeing 30-50 symbols in context. At L0, that overflows any reasonable budget. At L1, it fits comfortably.
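A back-of-envelope calculation from the table's numbers shows why. These figures are approximations derived from the sample counts above, not guarantees:

```python
# Approximate per-symbol cost, from the compression-level table:
l0_tokens_per_symbol = 6900 / 19   # ≈ 363 tokens per symbol at L0
l1_tokens_per_symbol = 5000 / 56   # ≈ 89 tokens per symbol at L1

# A 40-symbol comprehension task, mid-range of the 30-50 estimate:
symbols = 40
print(round(symbols * l0_tokens_per_symbol))  # ≈ 14526 tokens at L0
print(round(symbols * l1_tokens_per_symbol))  # ≈ 3571 tokens at L1
```

At L0 the task alone consumes a large share of a typical context budget; at L1 it leaves room for the actual work.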
assemble_context(repo="local/my-project", query="payment processing", level="L1")
Benchmark
| Approach | Tokens | Calls | Coverage |
|---|---|---|---|
| grep + read files | 93,000 | 6 | Partial (you stop reading eventually) |
| assemble_context L1 | 12,600 | 1 | Complete (all relevant symbols) |
The native approach produces 93K tokens because each file is read in full, including comments, blank lines, and unrelated functions. Most of that content is irrelevant to the question being asked. assemble_context returns only symbols semantically related to the query, already ranked by relevance.
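The headline savings implied by the benchmark table work out as follows (simple arithmetic on the numbers above):

```python
# Token and call savings implied by the benchmark table:
native_tokens, native_calls = 93_000, 6
sift_tokens, sift_calls = 12_600, 1

savings = 1 - sift_tokens / native_tokens
print(f"{savings:.0%} fewer tokens, {native_calls - sift_calls} fewer calls")
# → 86% fewer tokens, 5 fewer calls
```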
When It Is the Right First Move
Use assemble_context when you need to understand a topic or module before acting. It is the right tool when:
- You are about to refactor a subsystem and need to see all its public interfaces
- You need to understand how authentication works across the codebase
- You are reviewing a PR that touches a module you have not worked with before
- You want to map the API surface of a library before integrating with it
For single-symbol deep reads, get_symbol or get_context_bundle are more precise. For finding a specific string pattern, search_text is faster. assemble_context fills the gap between “I need one thing” and “I need to understand everything about this area.”
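The routing described above can be sketched as a small dispatch function. The tool names come from this document; the routing logic itself is an assumption about agent behavior, not something built into the tools:

```python
# Illustrative routing between the tools named in this section.
def pick_tool(need: str) -> str:
    if need == "one symbol":
        return "get_symbol"        # single-symbol deep read
    if need == "string pattern":
        return "search_text"       # fast literal/regex search
    return "assemble_context"      # topic- or module-level understanding

print(pick_tool("topic"))  # → assemble_context
```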
L3 for Orientation
When starting on an unfamiliar codebase, L3 gives you the directory structure and module organization in about 600 tokens. That is enough to know where to look next without reading a single file.
assemble_context(repo="local/my-project", query="project structure", level="L3")
From there, narrow down with L1 on the specific module you care about.
Related tools
- get_symbol, get_context_bundle: precise single-symbol deep reads
- search_text: fast search for a specific string pattern
Benchmark note
This benchmark compares CodeSift against the closest practical native workflow an agent would use for the same task.
For some tools, that baseline is a direct shell equivalent such as rg or find.
For AST-aware, graph-aware, and LSP-backed tools, the baseline is a multi-step workflow rather than a strictly identical command.
Results should be read as agent-workflow comparisons: token cost, call count, and practical context efficiency.