The Toolkit, Recovered
Part IV of Latent Spaces, continued. After Zero-Inflated Intelligence.
The previous post framed hallucination as a zero-inflated distribution — a sample space with missing faces — and proposed tree-structured routing as the engineering response. This post asks a simpler question: what do tools like LoRA, RLHF, fine-tuning, distillation, and quantization actually do?
The answer, once you look at it through the tree frame, is uniform. They're all pruning. They differ in what they cut and at what scale. But every major operation in the current LLM toolkit removes something from the tree.
Leaf-pruning
Fine-tuning, RLHF, LoRA, and DPO all do the same structural thing: clip leaves and renormalize.
Cut a leaf — suppress an output the model could previously produce — and the probability mass that used to flow to it redistributes across the surviving leaves that share branching paths with the removed one. The renormalization propagates upward: the fork that lost a child rebalances, its parent rebalances, all the way up the path. The tree's topology is unchanged. Its traffic pattern shifts.
The tools differ only in what selects which leaves to cut. Fine-tuning cuts leaves inconsistent with a target corpus. RLHF cuts leaves that a preference model penalizes — outputs humans rated poorly. LoRA does the same cutting but constrains the renormalization to a low-rank subspace, limiting how far up the tree the rebalancing propagates. DPO skips the explicit reward model and cuts directly from preference pairs. Different selectors. Same operation: clip, renormalize, repeat.
On a flat weight space, the renormalization is global. Every parameter participates in every output, so clipping a leaf in the medical region shifts branching probabilities at nodes shared with the legal region. Leaves you didn't intend to touch absorb mass or lose mass because the rebalancing propagates through shared parameters. This is why RLHF can reduce the global novelty of a model — suppressing undesirable outputs in one domain suppresses surprising outputs everywhere. The model becomes safer and blander in the same operation, not because blandness was the objective but because the substrate can't contain the renormalization.
On a tree, leaf-pruning would be branch-local. Clip medical leaves and the renormalization stays within the medical subtree. Legal leaves are untouched because no routing path from a medical fork reaches a legal leaf. The isolation is structural, guaranteed by the routing, not dependent on careful regularization or hoping the gradients don't interfere.
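The clip-and-renormalize step, and the branch-local isolation it gets on a tree, can be sketched on an explicit probability tree. This is a minimal illustration, not any real training procedure: the dict-as-tree representation and all the leaf names are made up for the example.

```python
def total_mass(node):
    """Sum of leaf masses under a node. A leaf is a float; a fork is a dict."""
    if isinstance(node, float):
        return node
    return sum(total_mass(c) for c in node.values())

def rescale(node, factor):
    """Multiply every leaf mass under a node by `factor`."""
    if isinstance(node, float):
        return node * factor
    return {k: rescale(c, factor) for k, c in node.items()}

def clip(tree, path):
    """Remove the leaf at `path` and renormalize only the fork that owned it.
    The removed mass redistributes among that fork's surviving descendants,
    so totals everywhere else in the tree are untouched."""
    *fork_path, leaf = path
    fork = tree
    for key in fork_path:          # walk down to the fork that owns the leaf
        fork = fork[key]
    removed = fork.pop(leaf)
    survivors = total_mass(fork)
    factor = (survivors + removed) / survivors
    for key in list(fork):         # renormalization stays inside this subtree
        fork[key] = rescale(fork[key], factor)
    return tree

tree = {
    "medical": {"dosage": 0.2, "diagnosis": 0.2, "quack_cure": 0.1},
    "legal":   {"contract": 0.3, "tort": 0.2},
}
clip(tree, ["medical", "quack_cure"])
# The medical subtree still carries 0.5 total mass; the legal subtree
# is byte-for-byte unchanged — the isolation is structural.
```

On a flat weight space there is no analogue of the early `return` implicit in that walk: every parameter touches every output, so the rebalancing has nowhere local to stop.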
The deeper limitation: clipping leaves from one branch doesn't add leaves to a different branch. The total leaf count can only decrease or stay constant. Fine-tuning on medical data doesn't improve legal reasoning — it can only reshape medical outputs while potentially degrading legal ones through renormalization leakage. The tree never gets larger. It gets reshaped.
Terminal collapse
Quantization is a different operation. It doesn't clip individual leaves — it merges adjacent ones.
A full-precision tree has fine distinctions at its tips. The last few levels of branching separate outputs that are close but distinguishable — subtle differences in phrasing, precision in numerical answers, fine gradations of tone. Quantization collapses these terminal branches. Where there were four distinct leaves, there are now two. No leaf is selectively removed. No mass is redistributed. The mass from adjacent leaves pools into the merged leaf. The tree gets shorter at its tips.
The routing higher up is unchanged — you still reach the right region. But the last few forks are gone, and outcomes that the full-precision tree could distinguish are now the same outcome. This is why quantized models perform surprisingly well on benchmarks that test coarse capability — does the model get the right answer? — and degrade on tasks requiring fine discrimination.
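The merge can be sketched the same way: pool all leaf mass below a cutoff depth into a single merged leaf. Again an illustrative toy, not an actual quantization algorithm — real quantization rounds weights, and the leaf-merging here is the tree-frame picture of its effect.

```python
def total_mass(node):
    """Sum of leaf masses under a node. A leaf is a float; a fork is a dict."""
    if isinstance(node, float):
        return node
    return sum(total_mass(c) for c in node.values())

def collapse(node, depth):
    """Pool all leaf mass below `depth` into one merged leaf per subtree.
    Routing above `depth` is unchanged; distinctions below it vanish.
    No leaf is selected, no mass is redistributed — adjacent leaves pool."""
    if isinstance(node, float):
        return node
    if depth == 0:
        return total_mass(node)
    return {k: collapse(c, depth - 1) for k, c in node.items()}

tree = {
    "tone": {
        "formal": {"stiff": 0.1, "polished": 0.3},
        "casual": {"chatty": 0.4, "terse": 0.2},
    },
}
short = collapse(tree, 2)
# The formal/casual fork survives; the stiff/polished and chatty/terse
# distinctions are gone, their mass pooled into the merged leaves.
```

A benchmark that only asks "did routing reach the formal branch?" scores `short` identically to `tree`; a task that needs stiff vs. polished cannot be expressed in `short` at all.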
The tree framing allows terminal collapse to be adaptive. Where the tree has dense leafward structure and the task requires fine discrimination, preserve the terminal branches. Where the tree is already sparse or the task is coarse, collapse aggressively. Why maintain sixteen-bit precision on a branch with three leaves? Specialist LLMs already do this implicitly — the model has higher effective resolution in training-heavy regions. Adaptive quantization would make the choice explicit: spend terminal precision where it matters, save it where it doesn't.
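The adaptive version is a one-line change to the sketch above: collapse a subtree only when it is sparse. The leaf-count threshold is a stand-in for whatever density or task-importance signal a real adaptive scheme would use.

```python
def total_mass(node):
    """Sum of leaf masses under a node. A leaf is a float; a fork is a dict."""
    if isinstance(node, float):
        return node
    return sum(total_mass(c) for c in node.values())

def count_leaves(node):
    """Number of distinct leaves under a node."""
    if isinstance(node, float):
        return 1
    return sum(count_leaves(c) for c in node.values())

def adaptive_collapse(node, min_leaves=4):
    """Collapse any subtree carrying fewer than `min_leaves` leaves;
    preserve terminal branches where the tree is dense. Spend precision
    where it matters, save it where it doesn't."""
    if isinstance(node, float):
        return node
    if count_leaves(node) < min_leaves:
        return total_mass(node)
    return {k: adaptive_collapse(c, min_leaves) for k, c in node.items()}

tree = {
    "dense":  {"a": 0.1, "b": 0.2, "c": 0.2, "d": 0.1},   # training-heavy region
    "sparse": {"x": 0.05, "y": 0.05, "z": 0.3},           # three leaves: collapse it
}
coarse = adaptive_collapse(tree, min_leaves=4)
# dense keeps all four leaves; sparse becomes a single pooled leaf.
```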
Branch-pruning
Distillation is the third operation, and the most dramatic: remove entire subtrees.
The method is to train a smaller model to reproduce the outputs of a larger one. Branches that the smaller model can't represent are dropped. The routing to those regions disappears. The remaining tree renormalizes at a high level — the node whose child was amputated redistributes all of that child's mass to its surviving children. But this isn't the leaf-by-leaf renormalization of fine-tuning. It's amputation.
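In the same toy representation, amputation is one high-level renormalization rather than the leaf-by-leaf kind. The sketch assumes the same dict-as-tree convention as before; it models the effect of distillation in the tree frame, not the distillation training loop itself.

```python
def total_mass(node):
    """Sum of leaf masses under a node. A leaf is a float; a fork is a dict."""
    if isinstance(node, float):
        return node
    return sum(total_mass(c) for c in node.values())

def rescale(node, factor):
    """Multiply every leaf mass under a node by `factor`."""
    if isinstance(node, float):
        return node * factor
    return {k: rescale(c, factor) for k, c in node.items()}

def amputate(tree, branch):
    """Drop an entire subtree. The fork that lost a child hands all of
    that child's mass to its surviving children in one step. The result
    carries no trace — no placeholder — of the removed branch."""
    removed = total_mass(tree[branch])
    rest = {k: c for k, c in tree.items() if k != branch}
    survivors = total_mass(rest)
    factor = (survivors + removed) / survivors
    return {k: rescale(c, factor) for k, c in rest.items()}

tree = {
    "medical": {"dosage": 0.2, "diagnosis": 0.2},
    "legal":   {"contract": 0.2, "tort": 0.1},
    "code":    {"python": 0.2, "sql": 0.1},
}
small = amputate(tree, "legal")
# Total mass is conserved, concentrated on the surviving branches;
# "legal" is simply absent, not marked as removed.
```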
The surviving subtree routes faster because there's less tree to traverse. It produces better outputs in its surviving regions because the capacity freed by dropping other branches is concentrated on what remains. A distilled model isn't worse at everything — it's better at some things and absent from others.
Each generation of distillation loses degrees of freedom that can't be recovered. The pruned branches don't leave scars — the smaller model has no trace of what was removed, no placeholder where the legal branch used to be. It simply doesn't have one. Recursive distillation — distilling from a distilled model — predicts monotonic shrinkage of capability space. Not necessarily of benchmark performance, which can improve through harder optimization within the reduced space. But the reachable space contracts. This is irreversible coarse-graining in Jaynes' sense: microstates collapse into macrostates. The second law says you don't get them back.
The inventory
Three operations, three scales, one direction.
Leaf-pruning clips individual outputs and renormalizes branching probabilities upward. Terminal collapse merges adjacent leaves into coarser ones. Branch-pruning amputates entire subtrees. All three are subtractive. The tree after any of these operations is smaller than or equal to the tree before.
No operation in the current toolkit makes the tree larger. No operation adds a branch to a region where there isn't one. No operation extends the routing into territory the tree has never covered. No operation deepens an existing branch with new leaves that carry new distinctions.
Growth — isolated, additive extension into new regions without disturbing what already exists — is the missing operation. The current toolkit can reshape, blur, and cut. It cannot extend. This is not a critique of the tools. They do what they do, and they do it well within their regime. But the absence is structural, not incidental. A flat, shared weight space cannot support isolated extension because it has no isolation. Every modification is global. Every addition is also a subtraction somewhere else.
What a tree-shaped architecture would need to support growth — and what growth would look like as an operation on a live routing index — is a different question, and the subject of the next post.
