The Cost of Forgetting
A developer joins a team and spends three weeks building a retry mechanism. A version of it already exists in a utility module written two years ago by someone who's since left. Nobody remembers. The new version is slightly different, slightly worse, and now there are two.
This happens constantly. Not because teams are careless, but because institutional memory is stored in an ungreppable medium: people's heads. When someone leaves, their knowledge of "where things are" and "why we did it that way" goes with them. What remains is code — which answers what but usually not where or why.
What forgetting actually costs
The visible cost is duplicated work. The invisible cost is worse: changes made without understanding their blast radius.
Consider a shared library used by five downstream projects. A mixin in that library — say, AccessControlQuerySetMixin — is imported by 50+ models across those projects. Change the mixin's interface, and you've broken things in repos you may not even know depend on it. The knowledge of that dependency web is institutional. It lives in the head of whoever originally designed the architecture. When they leave, the web is still there. The knowledge of it isn't.
This is a specific instance of a general problem: code is a graph, but teams navigate it as a collection of files. Imports cross repository boundaries. Design decisions propagate through dependency chains. The structure is there — it's just not visible.
Making memory computable
Code-smriti is an attempt to solve this. The name is Sanskrit — smriti (स्मृति) means "memory, that which is remembered." It indexes code repositories and makes their contents searchable by meaning, not just keywords.
The basic operation: ingest repositories, parse code into a hierarchy (repo → module → file → symbol), generate semantic summaries at each level using LLMs, embed those summaries as vectors, and store everything in a way that supports multi-level search. Ask "how do we handle authentication?" and get results across all indexed repositories, ranked by semantic relevance.
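In miniature, that loop can be sketched as below. Everything here is illustrative: the hand-written summaries stand in for LLM-generated ones, the sparse bag-of-words "embedding" stands in for a real embedding model, and all module names are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for a real embedding model: a normalized sparse
    bag-of-words vector (token -> weight)."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {t: c / norm for t, c in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

# One summary document per level of the hierarchy
# (repo -> module -> file -> symbol). In the real system these
# summaries would be LLM-generated; here they are hand-written.
index = [
    {"level": "repo",   "name": "authlib",
     "summary": "shared authentication and session library"},
    {"level": "module", "name": "authlib.tokens",
     "summary": "JWT token issuing, refresh and validation"},
    {"level": "symbol", "name": "authlib.tokens.verify",
     "summary": "verify a signed token and return its claims"},
    {"level": "module", "name": "billing.invoices",
     "summary": "invoice generation and PDF rendering"},
]
for doc in index:
    doc["vector"] = embed(doc["summary"])

def search(query, top_k=3):
    """Rank summaries at every hierarchy level by similarity to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda d: cosine(q, d["vector"]), reverse=True)
    return [(d["level"], d["name"]) for d in ranked[:top_k]]

print(search("how do we handle authentication?"))
```

The point of the sketch is the shape, not the scoring: every level of the hierarchy is embedded and searched in one pass, so a query can surface a whole repository or a single symbol depending on where the meaning lives.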
That's the foundation. The more interesting layer is what you can compute on top of it.
Dependency graphs and criticality
The system builds import graphs across repository boundaries. When project B pip-installs library A, their dependency relationship is captured — which modules in B depend on which modules in A, and how deeply.
Apply PageRank to this graph and you get criticality scores: a quantitative answer to "which modules matter most?" A module imported by many others scores high. A module imported by important modules scores higher. The damping factor keeps rank from pooling entirely in the graph's sinks, so every module retains a baseline score. What falls out is a ranked list of the components whose changes have the largest blast radius.
For a cluster of related repositories — a shared Django library and five projects that consume it — this looks like:
Rank  Module              Score   Dependents
   1  foundations.models  0.0345          36
   2  contacts.models     0.0262          37
   3  associates.models   0.0237          38
Those numbers replace the institutional knowledge that used to be in someone's head: "be careful changing foundations — everything depends on it."
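The computation itself is small. Below is a self-contained power-iteration sketch over a hypothetical import graph (in practice a library such as networkx would do this); the module names are made up for illustration.

```python
# Edge "A imports B": rank flows from the importer to the imported module,
# so heavily-imported modules accumulate score. Names are hypothetical.
imports = {
    "app.views":          ["foundations.models", "contacts.models"],
    "app.api":            ["foundations.models"],
    "reports.builder":    ["foundations.models", "associates.models"],
    "contacts.models":    ["foundations.models"],
    "associates.models":  ["foundations.models"],
    "foundations.models": [],
}

def pagerank(graph, damping=0.85, iters=50):
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        new = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                # Split this node's rank among the modules it imports.
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling node (imports nothing): spread its rank evenly,
                # the standard PageRank fix.
                for t in nodes:
                    new[t] += damping * rank[node] / n
        rank = new
    return rank

scores = pagerank(imports)
for module, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{score:.4f}  {module}")
```

Run on this toy graph, foundations.models comes out on top, for the same reason it does in the real cluster: everything else reaches it eventually.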
Affected test detection
The dependency graph also answers the inverse question: given a set of changed files, which tests should run? Trace the graph from the changed modules outward through all transitive dependents, filter to test modules, and you have a targeted test list. Not "run everything" — run these, because these are what the change could have broken.
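That traversal is a graph inversion followed by a breadth-first walk. A minimal sketch, with hypothetical module names:

```python
from collections import deque

# Forward edges read "A imports B"; to find what a change could break,
# we walk the reverse direction, from a changed module to its importers.
imports = {
    "tests.test_views":   ["app.views"],
    "tests.test_models":  ["foundations.models"],
    "app.views":          ["foundations.models"],
    "billing.invoices":   ["billing.tax"],
    "tests.test_billing": ["billing.invoices"],
}

def affected_tests(changed, graph):
    # Invert the graph: imported module -> set of direct importers.
    dependents = {}
    for importer, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(importer)
    # BFS outward through all transitive dependents of the changed modules.
    seen, queue = set(changed), deque(changed)
    while queue:
        for importer in dependents.get(queue.popleft(), ()):
            if importer not in seen:
                seen.add(importer)
                queue.append(importer)
    # Keep only test modules (here identified by a naming convention).
    return sorted(m for m in seen if m.startswith("tests."))

print(affected_tests({"foundations.models"}, imports))
# -> ['tests.test_models', 'tests.test_views']
```

Note what is excluded as well as what is included: tests.test_billing never appears, because no path connects it to the change. That exclusion is what makes the list targeted rather than merely filtered.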
Memory in the loop
The system exposes its capabilities as MCP tools — the Model Context Protocol that lets AI assistants call external services. This means the memory isn't a dashboard you check occasionally. It's ambient context available in every conversation with your AI coding assistant.
When I'm working on a codebase and need to understand how authentication works across projects, the assistant can search the index, drill into a module, fetch the actual code, and check what depends on it — all within the same conversation. The institutional memory is in the loop, not on a shelf.
189 repositories. 131,000 documents. A career's worth of code across multiple companies and domains, indexed and searchable. The system remembers what I've built, how it connects, and what depends on what — even when I don't.
The real problem
The vocabulary problem described knowledge stuck in a pre-linguistic state: practitioners know things they can't articulate. Institutional codebase knowledge is a specific case. The knowledge of "where we solved this" and "what breaks if I change that" is real, it's valuable, and it evaporates.
The fix isn't documentation — nobody maintains it. It isn't onboarding — it's too slow and incomplete. It's making the structure that's already in the code visible and queryable. The answers are in the import graph, in the commit history, in the patterns that recur across repositories. They just need to be surfaced.
The cost of forgetting isn't dramatic. It's mundane. It's three weeks spent building something that already exists. It's a deploy that breaks a downstream project nobody knew depended on the change. It's the slow erosion of context that makes every codebase feel harder to work in than it should.
Memory is infrastructure. Treat it that way.
