Debug Slow Indexing

A full canopy index on a typical project (1K–50K files) takes 5–60 seconds. If indexing is taking several minutes or hanging, one of a few common causes is usually responsible.

Step 1: Get a timing breakdown

Run the index with default verbosity. Canopy prints a single timing line per phase:

canopy index . --with-search

Expected output:

canopy: indexing /home/you/repos/myapp (incremental)
canopy: AST index done — 8432 files indexed, 12 skipped, 0 errors (42184 ms)
canopy: building full-text search index...
canopy: search index done — 8432 files, 18923 chunks, 18923 indexed (3470 ms)

The two lines ending in (N ms) tell you which phase ran long. AST indexing dominates total time on most repos; if its time is well beyond the 5–60 second range quoted above, the next steps narrow down the cause.
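To compare phases across runs, the timing lines can be pulled out with a small shell helper. This is a sketch that assumes the line shape shown in the sample output above (canopy: <phase> done ... (N ms)); phase_timings is a hypothetical helper name, not part of Canopy.

```shell
# Hypothetical helper: reads canopy output on stdin and prints
# "<phase><TAB><milliseconds>" for each "canopy: <phase> done ... (N ms)" line.
phase_timings() {
  awk '
    /^canopy: / && / done / && $NF == "ms)" {
      start = length("canopy: ") + 1
      phase = substr($0, start, index($0, " done ") - start)
      ms = $(NF - 1)          # second-to-last field, e.g. "(42184"
      sub(/^\(/, "", ms)      # drop the leading parenthesis
      printf "%s\t%d\n", phase, ms
    }
  '
}
```

Piping the index output through it (canopy index . --with-search 2>&1 | phase_timings) leaves one row per phase, which is easier to diff between runs than the raw log.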

For deeper per-phase tracing, enable RUST_LOG=debug to surface the internal tracing::debug! events. The exact field names and module paths depend on the build (canopy-indexer, canopy-ast, canopy-search) and on the tracing subscriber’s format — there’s no stable schema to grep against, so use the debug stream to scan for the file path that’s stalling rather than for fixed labels:

RUST_LOG=debug canopy index . --with-search 2>&1 | tail -60

Step 2: Check the file count

canopy status

If the count on the files line is much higher than you expect, Canopy is indexing files that it shouldn’t:

canopy status for /home/you/repos/myapp
  ...
  files       : 94832   ← suspiciously high for a mid-size project
  ...

Run the index with debug logging and check the first lines of output for the file discovery count:

RUST_LOG=debug canopy index . 2>&1 | head -5

If the discovery count is high, node_modules, dist, or another large directory is being traversed.
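To see which directory is inflating the count, it can help to count files per top-level directory directly on disk. The sketch below uses only find, awk, and sort; files_per_top_dir is a hypothetical helper name. It looks at the filesystem only and does not apply .gitignore or ignored_paths, so treat its numbers as an upper bound on what Canopy might traverse.

```shell
# Hypothetical helper: count regular files under each top-level directory of
# the given root (default: current directory), largest first.
# Files directly in the root and everything under .git are left out.
files_per_top_dir() {
  root="${1:-.}"
  ( cd "$root" && find . -mindepth 2 -type f ! -path './.git/*' ) |
    awk -F/ '{ count[$2]++ } END { for (d in count) printf "%d\t%s\n", count[d], d }' |
    sort -rn
}
```

A directory such as node_modules or dist at the top of this list with tens of thousands of files is a likely candidate for exclusion.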

Step 3: Fix ignored_paths

The most common cause of slow indexing is large directories that should be excluded.

Add to .canopy/config.toml:

[index]
ignored_paths = [
  "node_modules",
  "dist",
  "build",
  ".next",
  ".turbo",
  "coverage",
  ".cache",
  "vendor",
  "**/*.min.js",
  "**/*.bundle.js",
  "**/*.map",
]

Canopy respects .gitignore by default — any file ignored by git is also ignored by Canopy. If node_modules is in .gitignore, you don’t need to add it to ignored_paths explicitly. Check:

git check-ignore node_modules

If this prints node_modules, git is ignoring it and Canopy will too.

If the directory is NOT in .gitignore but should be excluded from indexing, add it to ignored_paths.
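The same check can be run over several candidate directories at once. This is a minimal sketch built on git check-ignore -q, which exits with status 0 when a path is ignored; not_git_ignored is a hypothetical helper name. Any failure of git itself (for example, running outside a repository) is also reported as "not ignored", so run it from inside the repo.

```shell
# Hypothetical helper: print each argument that git does NOT ignore.
# These are the paths that would still need an ignored_paths entry.
not_git_ignored() {
  for dir in "$@"; do
    # -q suppresses output; exit status 0 means the path is ignored by git.
    if ! git check-ignore -q "$dir"; then
      printf '%s\n' "$dir"
    fi
  done
}

# Example call, run from the repository root:
#   not_git_ignored node_modules dist build .next coverage vendor
```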

Step 4: Exclude large generated files

Generated files — API type stubs, bundled JS, minified assets — are expensive to parse and rarely useful to index. Exclude them by pattern:

[index]
ignored_paths = [
  "**/*.generated.ts",
  "**/*.generated.js",
  "src/generated/**",
  "openapi/generated/**",
  "graphql/generated/**",
]

For very large files that slip through, set a file size limit:

[index]
max_file_size_bytes = 524288   # 512 KB — skips large lock files and bundles

Canopy’s default is 1 MB. Lowering it to 512 KB filters out most problem files while leaving typical hand-written source files, which are rarely that large, untouched.
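Before lowering the limit, it is worth previewing which files exceed it. The sketch below lists files strictly larger than a byte threshold using find's -size test; files_over_limit is a hypothetical helper name, and whether Canopy treats a file of exactly the limit as skipped is not covered here.

```shell
# Hypothetical helper: list regular files larger than a byte limit
# (default 524288, i.e. 512 KB), ignoring anything under .git.
files_over_limit() {
  root="${1:-.}"
  limit="${2:-524288}"
  # "+Nc" matches files whose size is greater than N bytes.
  find "$root" -type f ! -path '*/.git/*' -size +"${limit}"c
}

# Example call:
#   files_over_limit . 524288
```

If the list contains hand-written source you do want indexed, keep the default limit and exclude the offenders by pattern instead.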

Step 5: Understand full vs. incremental

canopy index is incremental by default — it only re-parses files that have changed since the last index. After the first full index, subsequent runs are usually under 5 seconds for repos with normal commit sizes.

When indexing is slow on every run (not just the first), each run is probably a full re-index. Common causes:

The .canopy/index/ directory is deleted between runs (common in CI without caching)
A config change (ignored_paths, entry_points) triggers a rebuild
The --full flag is passed explicitly

To confirm the index is truly incremental, look at the first line canopy index prints — it explicitly names the mode:

canopy index .

Output for incremental (re-running on an existing index):

canopy: indexing /home/you/repos/myapp (incremental)

Output for full (first run, or after --full, or when no index is found):

canopy: indexing /home/you/repos/myapp (full)

If you see “full” on every run in CI, add index caching. See CI Cached Indexes.
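In a CI job, the mode check can be automated by inspecting that first line. The sketch below assumes the first output line has exactly the shape shown above; index_mode is a hypothetical helper name, not a Canopy feature.

```shell
# Hypothetical helper: reads canopy output on stdin and prints
# "incremental", "full", or "unknown" based on the first line.
index_mode() {
  IFS= read -r first_line || true
  cat > /dev/null   # drain the remaining output so the producer is not cut off
  case "$first_line" in
    "canopy: indexing "*" (incremental)") echo incremental ;;
    "canopy: indexing "*" (full)")        echo full ;;
    *)                                    echo unknown ;;
  esac
}

# Example use in a CI step:
#   mode=$(canopy index . 2>&1 | index_mode)
#   [ "$mode" = "incremental" ] || echo "warning: index ran in $mode mode" >&2
```

Because the pipe consumes the whole log, tee the output to a file first if the job also needs the timing lines.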

Step 6: Watch mode overhead

canopy serve . --watch runs a file watcher in addition to serving MCP. On repos with frequent file changes (active development with hot reload, generated files changing on save), the watcher can trigger many small incremental re-indexes.

If the watcher is adding noticeable overhead:

# Remove --watch to disable
canopy serve .

Without --watch, the index is not updated while the server runs — it reflects the state when canopy serve started. Restart the server to pick up new changes.

Common pitfalls

Indexing completes but feels slow in practice

The index itself may be fine while searches are slow. Run:

RUST_LOG=debug canopy search "test query" 2>&1 | tail -3

If search latency is high, the search index may be too large. Add more entries to ignored_paths to shrink it, or lower max_results to check whether the cost comes from returning a large result set.

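canopy index hangs indefinitely

A file with unusual content (for example, a binary file misidentified as text) or a circular symlink in the directory tree can cause indexing to stall. Kill the process, rerun with trace logging redirected to a file (the name canopy-trace.log is arbitrary), kill that run once it stalls, then inspect the end of the log:

RUST_LOG=trace canopy index . > canopy-trace.log 2>&1
tail -20 canopy-trace.log

The last few lines show which file was being processed when indexing stalled. Add that file or directory to ignored_paths.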

Slow indexing only on CI (not local)

CI runners typically have slower disk I/O than developer machines, and they start from a clean state with no index cache. See CI Cached Indexes to restore a cached index instead of re-indexing from scratch.