v1.22.x — Data-sovereignty closure, model-agnostic backends, and the Q2 launch-readiness sweep

Versions: 1.22.0 → 1.22.24

The v1.22.x series is the data-sovereignty closure release plus the Q2 launch-readiness sweep. v1.22.0 ships the three subfeatures that jointly close the v1.22.x roadmap pin’s three claims (data sovereignty, editor / environment-agnostic, model-agnostic). The remaining 24 patches are model-agnostic backend additions, the offline / air-gapped surfaces (canopy doctor, vendored dashboard assets, monthly cargo-update workflow, keypair-rotation runbook), language coverage growth (Godot / GDScript), and a long CI-greening cascade that surfaced once the Test job stopped dying in the linker.

API stability: every change in v1.22.x is additive. New backend variants, new CLI flags, new helper functions, new Language extractors. The v1.0.0 MCP-tool freeze remains in effect — no signature, schema, or behaviour change to the existing 31 MCP tools across the cycle.

v1.22.0 — Data-sovereignty closure (Bedrock + fingerprint check + —no-network)

Three subfeatures bundled into one minor because they jointly close the v1.22.x roadmap pin’s three claims.

Bedrock summarizer backend (`--summary-backend=bedrock`)

Targets Anthropic Claude on Bedrock via aws-sdk-bedrockruntime’s invoke_model API. SigV4 signing and the standard AWS credential-chain (env vars, ~/.aws/credentials, IAM role) are handled by the SDK. --summary-endpoint is reused as a region override (e.g. us-west-2); empty endpoint falls through to AWS_REGION / profile-region resolution. Bedrock has no defensible default model, so --summary-model must be set explicitly (e.g. anthropic.claude-haiku-4-5-20251001-v1:0). Gated behind a new bedrock cargo feature (off by default — adds aws-config + aws-sdk-bedrockruntime, ~30 transitive crates, ~20 MB binary).

Summary-backend fingerprint check

Every build_summary_index call refuses to extend an existing index when the active backend differs from the one that wrote the rows. Mirrors the existing embedder-fingerprint pattern. Heuristic and LLM summaries differ in length by 5–10× and produce vector norms that aren’t comparable; KNN scores against mixed rows are noisy. New forge_meta row summary_backend_fingerprint stores {backend, model}. Operators pass --rebuild-summaries (clean restart) or --allow-mixed-backends (escape hatch — logs warn, leaves prior fingerprint intact so subsequent runs keep warning).

`--no-network` strict mode

Compliance-grade switch (also CANOPY_NO_NETWORK=1) for hospital / bank / air-gapped operators who need an enforceable “nothing leaves the box” invariant before legal review. Refuses to start if any cloud summary backend is active OR any cloud-credential env var is set (OPENAI_API_KEY, ANTHROPIC_API_KEY, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, AWS_PROFILE, CANOPY_SUMMARY_API_KEY). Violations exit code 2 with a message naming the trigger. Two-leg enforcement: canopy index checks before any indexing work; canopy serve checks env vars at daemon start AND backend at watcher-init time.

`canopy stats <repo> --summaries`

Tier-A audit view. Prints total summary count, per-backend breakdown (heuristic vs cloud rows), and the stored fingerprint. Operators run this before / after a backend switch to verify the index converged on a single backend. JSON output (--json) returns the same data for scripting.

v1.22.1 — `canopy doctor` (pre-flight diagnostic)

Composes install info (binary path, version, data dir), license tier, would-activate summary backend (resolved from operator config), the data-flow row that applies to that backend, --no-network activation status, and cloud-credential env-var presence (without printing the values). Read before opening a support ticket / before legal review / before flipping --no-network.

canopy doctor does not phone home; it only inspects local state and the process’s own environment. Safe to run inside --no-network shells. Output is suitable for pasting into legal-review documents — the data-flow row mirrors docs/data-sovereignty.md verbatim.

Available as canopy doctor (human-readable) or canopy doctor --json (structured for scripting / dashboards). Default repo is the current directory; override with --repo PATH.

v1.22.2 — Editor integration recipes

One-pager per editor under docs/editors/ with a concrete config snippet, verification step, common gotchas, and a --no-network lockdown variant: Claude Code, Cursor, Zed, VS Code + Continue, VS Code + Cline, Aider (CLI bridge), and JetBrains (External Tool registration).

Each recipe ships a --no-network lockdown variant so compliance-sensitive operators can paste the strict-mode config directly. Aider and JetBrains entries are CLI-only because neither has shipped native MCP client support yet.

v1.22.3 — README first-pass rewrite

Restructured to lead with the prospect’s actual flow — Quick Start (3 commands) → Editor integration matrix → Privacy posture → What canopy gives the AI assistant → Features → Build from source. Day-1 callout now mentions the full backend lineup (Ollama, OpenAI, Anthropic, Bedrock) and the --no-network lockdown flag. Doc-only release; no code changes.

v1.22.4 — R2 cache key factors Tier-A summary state

Closes the v1.21.3 known limitation where two indexes for the same repo with different --with-summaries settings (or different summary backends / models) could collide on the same R2 cache slot. v1.21.3 fixed the symptom (asserting Tier-A presence on hit); v1.22.4 fixes the root cause (the cache key itself differentiates).

canopy_core::cache::cache_key gains a third parameter summaries_state: &str — opaque string identifying the Tier-A configuration, or "" when summaries are off. New summaries_state_for(with_summaries, backend, model) helper produces the canonical encoding (summaries:{backend}:{model}).

v1.22.5 — Unified Ollama endpoint resolution

New --ollama-host <URL> global CLI flag (and CANOPY_OLLAMA_HOST env var) retargets every Ollama-touching call site in canopy with one input. Previously the embedder, the summarizer, and the runtime probe path each had their own host-resolution logic. Single source of truth: canopy_search::runtime::ollama_host().

v1.22.6 — Godot / GDScript language support

Two new AST extractors land canopy coverage on a typical Godot project at ~95% (was ~5% — only the occasional .cs companion file was indexed):

gdscript.rs — tree-sitter-gdscript = "6.1.0"-based extractor for .gd files. Extracts functions, file-level class_name, inner classes, signals, exported variables (@export var ...), module-level constants, lifecycle hooks, and extends / preload(...) / load(...) import edges (with res:// paths flagged for the resolver).
godot_resource.rs — handles both .tscn (scene) and .tres (resource) via tree-sitter-godot-resource = "0.7.0". Emits a synthetic Module symbol for the file, Class symbols for [node ...] entries, and import edges for [ext_resource path="res://..."] references.

Language enum gains Gdscript and GodotResource variants; INDEXABLE_EXTENSIONS and team-config language matchers updated. 16 new tests including grammar-load probes that catch tree-sitter ABI drift before it ships.

v1.22.7 — OpenAI text-embedding-3 backend

Operators with existing OpenAI spend can use text-embedding-3-small (1536d) or text-embedding-3-large (3072d) for high-recall use cases. Set provider = "openai" in [embedding] config and export OPENAI_API_KEY. Mirrors the existing OllamaEmbedder shape (batch endpoint, dimensionality validation, fail-fast on key absence).

--no-network strict mode rejects this backend at startup if OPENAI_API_KEY is set in the process env, consistent with the v1.22.0 cloud-credential watchlist.

This patch closes Phase 8 of the v1.22.x roadmap pin — the full v1.22.x cycle (Phases 1–8) is shipped at this point.

v1.22.8 — `canopy ci --with-summaries` (team-cache aware)

The CI command exposes the --with-summaries flag, mirroring canopy index --with-summaries. When combined with --team, the auto-derived R2 cache slot is partitioned by the resolved summary backend and model. A --with-summaries run never reuses a no-summaries cached blob, and a no-summaries run never picks up a summaries-on blob. Closes the v1.22.4 known-limitation: that fix only protected hand-managed r2://... URIs; v1.22.8 extends the protection to cmd_ci’s auto-derivation.

v1.22.9 — Per-row model attribution on `file_summaries`

Schema bumps from v4 to v5 with a non-breaking ALTER TABLE file_summaries ADD COLUMN model TEXT NOT NULL DEFAULT ''. The summary writer captures the active backend’s model_name() at index time and stores it on every row, alongside the existing backend column. canopy stats --summaries now prints a per-(backend, model) breakdown alongside the per-backend breakdown; same-backend model swaps (e.g. Haiku → Sonnet) surface in the audit output and trigger a softer warning suggesting --rebuild-summaries.

Legacy rows written before v1.22.9 surface as the empty string and are flagged in the audit output as “(unknown — pre-v1.22.9 row)“.

v1.22.10 — Ollama daemon-restart resilience

A long-running canopy serve --watch session could hang for the per-call timeout budget (120 sec for embeddings, 120 sec for summaries) when the Ollama daemon was restarted out from under the watcher. Cause: reqwest’s default 90-second pool-idle window kept stale TCP connections alive, so the next request after a daemon restart could be sent down a half-open socket and stall.

The fix introduces a shared runtime::ollama_http_client(timeout) builder that sets pool_idle_timeout = 15s, tcp_keepalive = 15s, and connect_timeout = 5s on top of the caller’s per-call timeout. After a daemon restart the watcher detects the dead connection and reconnects within ~15 seconds rather than the prior 1–2 minute hang.

v1.22.11–v1.22.20 — CI-greening cascade

A long sequence of small fixes that surfaced once v1.22.13 fixed the Test job’s linker SIGBUS and let the rest of the test suite actually run to completion:

v1.22.11 / v1.22.12 — Workspace clippy 1.95 lints (useless_conversion, unnecessary_sort_by) across canopy-search, canopy-mcp, and canopy-cli.
v1.22.13 — The headline fix. Test job linker SIGBUS resolved by adding [profile.test] debug = "line-tables-only" to the workspace Cargo.toml; integration test binaries shrink by ~50–70%, comfortably within the GitHub Actions runner’s 7GB link memory budget. Panic backtraces still resolve to file:line. Also: rustfmt drift sweep, three RUSTSEC advisories triaged in deny.toml (rustls-webpki 0.101 chain pulled in transitively via aws-sdk-bedrockruntime → rustls 0.21), and consolidation of cargo audit into cargo deny check (no-redundancy cleanup).
v1.22.14 — env_file_smoke integration test ENOENTed on CI because .env was excluded by .gitignore. Added a scoped negation !canopy-ast/tests/fixtures/**/.env and committed the fixture (no real secrets — contrived data driving extractor coverage).
v1.22.15 — jupyter_smoke and protobuf_smoke integration tests called the license-gated parse_file and silently returned 0 symbols on Community-tier CI runs. Refactored both to call the per-language extractor directly (extractors::jupyter::extract, extractors::protobuf::extract), matching the pattern already used by env_file_smoke. The license gate is correctly applied to parse_file because AST parsing as a feature is paid; testing the extractor itself does not need to gate-check.
v1.22.16 — test_canopy_index_counts_files substring false positive on indices with 10+ files. The naïve !stdout.contains("0 files indexed") predicate fired on "10 files indexed". Replaced with explicit digit parsing.
v1.22.17 — 18 canopy-indexer integration tests failed in CI for the same reason as v1.22.15: run_full_index called the gated parse_file which returned empty results without a license. Added parse_file_internal (non-gated entry point for canopy’s own indexer pipeline) and refactored the indexer to use it. Side effect: canopy index now produces a populated index on Community-tier installs, which makes the Community-allowed query tools actually useful against a freshly-built index.
v1.22.18 — build_license returned features in declaration order, not sorted-and-deduplicated. The test_features_sorted_and_deduped assertion in canopy-license/tests/roundtrip.rs had been red since pre-v1.7.0 but never reached because earlier failures aborted the Test job. Added .sort() + .dedup() at the builder site so the serialised JSON is canonical.
v1.22.19 — All 14 canopy-mcp protocol tests skipped on Community-tier CI runners. The tests spawned canopy serve directly and parsed its JSON-RPC stdout; without a license, canopy serve exits with code 1. Added a has_canopy_license() helper and require_license!() macro to canopy-mcp/tests/protocol_test.rs that emits a SKIP: log line on Community-tier CI runs.
v1.22.20 — Rustfmt drift on the v1.22.19 macro. cargo fmt --all collapsed a hand-wrapped eprintln!(...) to one line. No behaviour change.

This cascade made all five Build CI gates green simultaneously for the first time since the clippy 1.95 / rustls-webpki advisory landings.

v1.22.21–v1.22.23 — Q2 launch-readiness patch trio

Three small patches that close out the canopy-repo half of the Q2 launch-readiness work.

v1.22.21 — Monthly cargo-update workflow + keypair rotation runbook

Monthly cargo-update workflow (.github/workflows/monthly-update.yml): runs at 03:17 UTC on the first of every month, snapshots Cargo.lock, runs cargo update --workspace, gates on cargo deny check + cargo test --workspace, and opens a PR labelled automated-maintenance if anything actually changed (no-op-quiet when the lock is unchanged).
Keypair rotation runbook + helper script (docs/runbooks/keypair-rotation.md + scripts/keypair-shard-regen.sh): the operator-facing procedure for rotating the Ed25519 license verification keypair (annual schedule + incident response). The companion script takes a new public key (hex) and emits the four key_shard_N() Rust constants ready to paste into canopy-core/src/license.rs, plus a JSON audit record so the obfuscation can be independently verified.

v1.22.22 — Real D3 + Chart.js vendored for offline dashboard, benchmark runner

Real D3 v7.9.0 + Chart.js v4.5.1 vendored, replacing the placeholder stubs at canopy-dashboard/static/vendor/. The dashboard now renders identically online and air-gapped — no CDN fetches at runtime. Closes the v1.6.0 known-regression where canopy serve --dashboard rendered blank graphs/charts in air-gapped builds. Adds ~488KB to the release binary’s text segment.
Benchmark runner script (scripts/run-benchmarks.sh): compares canopy_search against grep, ripgrep, and ast-grep on a pinned corpus. Discovers which tools are present in PATH and only benchmarks those (missing tools surface in the JSON as {"available": false, "reason": "..."} so a CI run on a minimal image still produces a partial-but-honest report). Methodology records corpus git SHA + commit date + host arch / cores so results are reproducible. Sample run on 500-file Rust+TS corpus: canopy ~3× faster than grep with 2× more matches surfaced via the symbol index.

v1.22.23 — Nightly D1 backup to R2, drop orphan license_bindings table

Nightly D1 backup to R2. canopy-license-webhook’s scheduled cron handler now dumps every D1 table to a JSON blob at nightly/<UTC-date>/canopy-licenses.json in a new R2 bucket (D1_BACKUP_BUCKET). Schema discovery is dynamic via sqlite_master, so future migrations are picked up without per-release backup-logic updates. Operator setup: wrangler r2 bucket create canopy-d1-backups. Backup failures log at error and continue so a failed backup never aborts the cron run; the operator alert path is monitoring R2 PUT volume in the Cloudflare dashboard.
Drop orphan license_bindings D1 table. Migration 0011 drops the table introduced in migration 0004 but never written to. Production code uses licenses.machine_bindings_json + binding_heartbeats (migration 0005) for actual binding records.

All canopy-repo Q2 boulder items are now shipped at this point. Remaining Q2 work (funnel instrumentation, integration guides, onboarding emails, launch content drafts, /security page, /benchmarks page) lives in the canopy-site repo.

v1.22.24 — `tools/list` ordering: meta-tools first

The MCP server’s tools/list response previously surfaced workflow tools first (canopy_prepare, canopy_validate, canopy_understand, canopy_survey) followed by drill-down tools, with the FIRST PASS triangulators (canopy_orient, canopy_investigate, canopy_triage) at positions 26 / 28 / 30 of 31. The server-instructions text (v1.21.9) already promotes orient / survey / investigate as the FIRST PASS for any new question on an unfamiliar repo, but agents that scan the tools list before reading the instructions text inherited a drill-first bias.

The new ordering groups tools by usage tier:

Meta / First Pass — canopy_orient, canopy_survey, canopy_investigate, canopy_triage
Workflow — canopy_prepare, canopy_validate, canopy_understand, canopy_probe
Search — canopy_search, canopy_search_symbols, canopy_pattern_search, canopy_extract_symbol, canopy_context, canopy_interface, canopy_parse_file
Graph — canopy_trace_dependents, canopy_trace_imports, canopy_dependency_graph, canopy_check_wiring, canopy_find_cycles, canopy_related_tests
Health / Meta — canopy_health_check, canopy_architecture_map, canopy_stats, canopy_index_status, canopy_reindex
Git / Ownership — canopy_git_history, canopy_git_blame, canopy_ownership
Ingest — canopy_ingest_scip, canopy_coverage

No tool signature, schema, or behaviour changed — only the array order in the JSON-RPC response. The frozen v1.0.0 MCP-tool contract is unaffected; clients that read by name are unchanged. The 14 protocol-test assertions still pass (they test for inclusion + count + schema validity, not order).

Compatibility notes for the v1.22.x series

API stability: all 31 MCP tool signatures unchanged. Every change is additive. New backend variants (bedrock, OpenAI embeddings), new CLI flags (--no-network, --allow-mixed-backends, --summaries, --ollama-host, --with-summaries on canopy ci), new commands (canopy doctor), new Language extractors (Gdscript, GodotResource), new helper functions (enforce_no_network_mode, summaries_state_for, ollama_host()). The v1.0.0 MCP-tool freeze remains in effect.
Index schema: file_summaries schema bumps v4 → v5 with a non-breaking ALTER TABLE adding the model column (forward-only and idempotent). forge_meta.summary_backend_fingerprint is a new key. Existing v4 indexes pick up the column on their next open without rebuild; existing rows surface with an empty model and are flagged in the cmd_stats --summaries output.
License-tier gating: unchanged. Bedrock summaries are Solo+ (same as Anthropic / OpenAI / Ollama). canopy doctor is available on every tier including Community.
Cargo features: bedrock (off by default — adds aws-config + aws-sdk-bedrockruntime, ~30 transitive crates, ~20 MB binary) is the only new opt-in.