How it works

Three pillars. One MCP server.

CodeSift sits between your AI agent and your codebase. It parses code into an AST index, ranks results with BM25F, and bridges to language servers. In benchmarked real-world combo flows, agents used 61% fewer tokens while gaining capabilities grep cannot provide.

Without CodeSift

Agent: rg "auth" (raw text matches)
Agent: Read 5 full files (20K+ tokens)
Agent: rg "middleware" (more raw matches)
Agent: Read 3 more files, guess at structure
~80,000 tokens consumed
11+ tool calls, mostly noise

With CodeSift

Agent: assemble_context("auth", level="L1")
CodeSift: AST parse → BM25F rank → return signatures + types
Agent: gets curated context, reasons about it
~12,600 tokens consumed
1 tool call, structured output
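
For orientation, the single call in the flow above would reach the server as an MCP `tools/call` request inside a JSON-RPC 2.0 envelope. The sketch below builds that payload in Python. The argument key `query` is an assumption (the flow only shows a positional "auth"), while `level="L1"` is taken from the flow; check the tool's published schema for the real parameter names.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request (JSON-RPC 2.0 envelope)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Assumption: the search term is passed under the key "query".
request = build_tool_call(1, "assemble_context", {"query": "auth", "level": "L1"})
print(json.dumps(request, indent=2))
```

In practice the MCP client builds and sends this message for the agent; nothing here needs to be written by hand.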

The three pillars

1. AST Index

Tree-sitter parses every file into an abstract syntax tree. Symbols are extracted, scored by centrality (how often imported), and indexed with BM25F for fast, ranked retrieval.

12 languages supported
Incremental reindex (9ms per file)
Import graph for centrality scoring
File watcher for auto-updates
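
To make the ranking step concrete, here is a minimal BM25F sketch in Python. It is not CodeSift's implementation: the field names (`name`, `signature`, `body`), the field weights, and the constants `K1` and `B` are all illustrative assumptions, and centrality scoring is left out. What it does show is the defining trait of BM25F: per-field term frequencies are weighted and combined before the saturation step, so a match in a symbol's name counts for more than a match deep in its body.

```python
import math
from collections import Counter

# Hypothetical field weights and constants, chosen only for illustration.
FIELD_WEIGHTS = {"name": 3.0, "signature": 2.0, "body": 1.0}
K1 = 1.2
B = 0.75


def bm25f_scores(query_terms, docs):
    """Score each doc (a dict of field -> list of tokens) against the query."""
    n_docs = len(docs)
    # Average field lengths drive per-field length normalization.
    avg_len = {
        f: (sum(len(d.get(f, [])) for d in docs) / n_docs) or 1.0
        for f in FIELD_WEIGHTS
    }
    # Document frequency: number of docs containing the term in any field.
    doc_freq = Counter()
    for d in docs:
        seen = {tok for f in FIELD_WEIGHTS for tok in d.get(f, [])}
        doc_freq.update(seen & set(query_terms))

    scores = []
    for d in docs:
        score = 0.0
        for term in query_terms:
            # Combine weighted, length-normalized field frequencies first...
            weighted_tf = 0.0
            for f, w in FIELD_WEIGHTS.items():
                tokens = d.get(f, [])
                norm = 1.0 - B + B * (len(tokens) / avg_len[f])
                weighted_tf += w * tokens.count(term) / norm
            # ...then apply IDF and saturation once per term.
            df = doc_freq[term]
            idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
            score += idf * weighted_tf / (K1 + weighted_tf)
        scores.append(score)
    return scores


symbols = [
    {"name": ["authenticate"], "signature": ["authenticate", "user", "password"], "body": ["check", "hash"]},
    {"name": ["render"], "signature": ["render", "page"], "body": ["authenticate", "then", "render"]},
]
scores = bm25f_scores(["authenticate"], symbols)
```

With these toy symbols, the first entry scores higher because the query term appears in its name and signature, while the second only mentions it in the body.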

2. Semantic Search

Embeddings turn code into vectors. Questions like "how does authentication work?" find relevant code by meaning, not just matching keywords. Hybrid mode merges semantic + BM25 via Reciprocal Rank Fusion.

3 providers (Voyage, OpenAI, Ollama)
Hybrid: semantic + BM25 via RRF
Token budget: fill context window optimally
Conversation search over past sessions
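
Reciprocal Rank Fusion is simple enough to sketch in a few lines. The Python below merges two ranked lists by summing `1 / (k + rank)` for each item across the lists. The constant `k = 60` is the value conventionally used for RRF and is an assumption here, as are the symbol names in the example; the source does not say which constant CodeSift uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists (best first) into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by name for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical results from a keyword (BM25) pass and a semantic pass.
bm25_hits = ["login", "auth_middleware", "hash_password"]
semantic_hits = ["auth_middleware", "verify_token", "login"]

fused = reciprocal_rank_fusion([bm25_hits, semantic_hits])
print(fused)
```

Because RRF only looks at ranks, it needs no calibration between BM25 scores and embedding similarities, which is why it suits hybrid search.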

3. LSP Bridge

When a language server is available, CodeSift proxies type-aware operations: resolved definitions, hover types, cross-file rename. Lazy start, 5-minute idle kill, zero overhead when unused.

6 servers (TS, Python, Go, Rust, Ruby, PHP)
Falls back to AST/index when unavailable
Type discovery: 50-200 tokens vs file read
Type-safe rename across all files
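
The lazy-start and idle-kill behavior described above can be sketched as a small wrapper. This is an illustration under assumptions, not CodeSift's code: the class name `LazyServer`, its `start`/`stop` callables, and the injectable clock are hypothetical. Only the 5-minute timeout comes from the text.

```python
import time

IDLE_TIMEOUT_SECONDS = 5 * 60  # from the text: 5-minute idle kill


class LazyServer:
    """Starts a backend on first use and stops it after an idle period."""

    def __init__(self, start, stop, clock=time.monotonic):
        self._start = start    # callable that launches the server, returns a handle
        self._stop = stop      # callable that shuts a handle down
        self._clock = clock    # injectable so the timeout can be tested
        self._handle = None
        self._last_used = None

    def acquire(self):
        """Return a live handle, starting the server only when first needed."""
        if self._handle is None:
            self._handle = self._start()
        self._last_used = self._clock()
        return self._handle

    def reap_if_idle(self):
        """Stop the server if it has been idle for the full timeout."""
        if self._handle is None:
            return False
        if self._clock() - self._last_used >= IDLE_TIMEOUT_SECONDS:
            self._stop(self._handle)
            self._handle = None
            return True
        return False


# Demo with a fake clock and fake start/stop hooks.
now = [0.0]
events = []


def fake_start():
    events.append("start")
    return "handle"


def fake_stop(handle):
    events.append("stop")


server = LazyServer(fake_start, fake_stop, clock=lambda: now[0])
server.acquire()                      # first use starts the server
now[0] = 299.0
still_running = not server.reap_if_idle()
now[0] = 300.0
reaped = server.reap_if_idle()        # idle for 5 minutes, so it is stopped
```

Nothing is launched until `acquire()` is first called, which is what "zero overhead when unused" amounts to in this sketch.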

Two commands. Thirty seconds.

1. Install globally

$ npm install -g codesift-mcp

One binary. No cloud. No signup. MIT licensed.

2. Add to your MCP config

{
  "mcpServers": {
    "codesift": {
      "command": "codesift-mcp"
    }
  }
}

Works with Claude Code, Cursor, Codex, and any MCP client.
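
If your MCP config already lists other servers, the `codesift` entry should be merged in rather than pasted over the file. A minimal Python sketch of that merge follows. The helper name and the `other` server in the example are hypothetical, and the location of the config file depends on your client, so it is left out.

```python
import json


def add_codesift_server(config_text):
    """Return config JSON with a `codesift` entry added, keeping other servers."""
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    # Do not overwrite an entry the user has already customized.
    servers.setdefault("codesift", {"command": "codesift-mcp"})
    return json.dumps(config, indent=2)


# Hypothetical existing config with one unrelated server.
existing = '{"mcpServers": {"other": {"command": "other-mcp"}}}'
merged = json.loads(add_codesift_server(existing))
print(json.dumps(merged, indent=2))
```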

Code normally

Your AI agent automatically discovers 64 MCP tools. It uses search_symbols instead of grep, assemble_context instead of reading files, trace_route instead of guessing at endpoints. You don't change how you work. The agent works better.

Measured savings

From real benchmarks across 188 agent sessions and 603 combo flow runs

| What the agent does | Native (grep/read) | CodeSift | Savings |
|---|---|---|---|
| Search for a symbol definition | ~57,000 tok | ~5,700 tok | -90% |
| Understand a feature ("how does auth work?") | ~93,000 tok | ~12,600 tok | -86% |
| Trace an HTTP route end-to-end | ~35,000 tok | ~61 tok | -99% |
| Scan for hardcoded secrets | ~1.6M tok | ~11,500 tok | -99% |
| Find unused exports (dead code) | 21 calls | 1 call | -82% |
| Real-world combo flows (13 sequences) | 4.58M tok | 1.86M tok | -61% |
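
The savings figures are relative reductions in tokens. As a worked example, the Python below applies the usual formula, `(native - codesift) / native`, to the first two rows. The token counts in the table are approximate, so recomputed percentages can differ slightly from the published ones; the function name is illustrative.

```python
def savings_percent(native_tokens, codesift_tokens):
    """Relative reduction in tokens, as a percentage of the native cost."""
    return (native_tokens - codesift_tokens) / native_tokens * 100


symbol_search = savings_percent(57_000, 5_700)    # symbol definition row
feature_query = savings_percent(93_000, 12_600)   # "how does auth work?" row

print(f"symbol search: {symbol_search:.0f}% fewer tokens")
print(f"feature query: {feature_query:.0f}% fewer tokens")
```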

Ready to try?

One binary. Zero cloud dependencies. MIT licensed.