The Complexity Budget

complexity · information-theory · meta-prompts · capability-bounds

Here's the question that started this: for every model M, of any size, and every target solution Y, is there some prompt X that will allow M to arrive at Y?

The answer is no, and the reason is informative.

The impossibility

Consider a minimal model M₀ with complexity C(M₀) = k for some small k. Its state space is bounded by its parameters and precision. Now consider a solution Y* whose Kolmogorov complexity K(Y*) >> k.

M₀ cannot represent all possible solutions with complexity greater than its own description length.* No prompt X, however elaborate, can cause M₀ to output Y*, because M₀ must still process X within its bounded computational substrate. Even if X contains Y* verbatim, the model must parse and reproduce it, which would require C(M₀) ≥ K(Y*), contradicting the assumption that K(Y*) >> k.

There is no universal prompt. Models have hard ceilings.

The feasibility condition

What we can state is a necessary condition for success:

C(M) + C(X) ≥ C(Y) + ε

where C(M) is the effective complexity of the model, C(X) is the complexity of the prompt, C(Y) is the complexity of the task's solution, and ε is computational overhead.

This implies a tradeoff surface. For a fixed task Y:

  • A simple model needs a complex prompt
  • A complex model can work with a simpler prompt
  • There's a minimum combined complexity below which no (M, X) pair succeeds
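As a toy illustration of the budget arithmetic, the condition can be sketched directly. All numbers below are hypothetical stand-ins, not measured complexities:

```python
def feasible(c_model: float, c_prompt: float, c_task: float,
             overhead: float = 2.0) -> bool:
    """Check the (conjectured) feasibility condition C(M) + C(X) >= C(Y) + eps.

    All quantities are effective complexities in bits; the values used
    below are illustrative, not measured.
    """
    return c_model + c_prompt >= c_task + overhead

# A simple model can clear the budget with a complex prompt...
assert feasible(c_model=10, c_prompt=50, c_task=40)
# ...but the same model with a simple prompt falls below it.
assert not feasible(c_model=10, c_prompt=5, c_task=40)
```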

Each of M, X, and Y has an associated complexity C(·), and the right unit for measuring it is an open question. Parameter count is too coarse: a 7B model with strong inductive biases for code may have higher effective complexity for programming tasks than a 13B general model. More precisely, C(M) ≈ log₂(|distinguishable states of M|), but in practice we can't compute this. For prompts and solutions, Kolmogorov complexity is the theoretical ideal; compressed length in bits is a practical proxy.
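The compressed-length proxy is easy to operationalize. A minimal sketch using zlib (the choice of compressor is an assumption; any general-purpose compressor gives a similarly rough estimate):

```python
import random
import zlib

def complexity_bits(text: str, level: int = 9) -> int:
    """Approximate the complexity of a string by its compressed length in bits."""
    return len(zlib.compress(text.encode("utf-8"), level)) * 8

# A highly repetitive prompt compresses well; unstructured text does not.
low = complexity_bits("abc" * 200)
random.seed(0)
high = complexity_bits("".join(random.choice("abcdefghij") for _ in range(600)))
assert low < high
```

Both strings are 600 characters long, so the gap in compressed size reflects structure, not length.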

The feasibility condition is simple to state. It is not, at present, simple to evaluate. But it constrains the shape of the problem in useful ways.

The interesting question

Given a task Y whose complexity can be estimated, and a set of models M₀, M₁, M₂ in order of increasing complexity: what is the minimal model Mᵢ that can solve Y, and what prompt Xᵢ does it need?

The feasibility condition says the minimum prompt complexity for model Mᵢ is approximately:

C(Xᵢ) ≥ max(0, C(Y) - β·C(Mᵢ))

where β is an efficiency factor — how well the model's learned representations align with this particular task class. β varies by task type, which is why the gradient structure of a task matters: a steep gradient means the model's representations align well (high β), reducing the prompt burden.
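The bound can be worked through with hypothetical numbers (complexities in bits, β values assumed for illustration):

```python
def min_prompt_complexity(c_task: float, c_model: float, beta: float) -> float:
    """Lower bound on prompt complexity: C(X_i) >= max(0, C(Y) - beta * C(M_i))."""
    return max(0.0, c_task - beta * c_model)

# Hypothetical numbers: better task alignment (higher beta) lowers the prompt burden.
assert min_prompt_complexity(c_task=100, c_model=80, beta=0.9) < \
       min_prompt_complexity(c_task=100, c_model=80, beta=0.5)
# A sufficiently complex, well-aligned model needs no prompt complexity at all.
assert min_prompt_complexity(c_task=100, c_model=200, beta=0.8) == 0.0
```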

Empirically, the relationship follows something like:

C(Xᵢ) ≈ C(Y) · (1 - log(paramsᵢ) / log(params_max))^γ

where γ ≈ 0.5-0.7. Doubling model size reduces required prompt complexity by roughly 30-40%. There's a phase transition where models become "capable" of a task class, and beyond that transition, larger models show diminishing returns for prompt simplification.
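The empirical relationship can be evaluated directly. The parameter choices here (γ = 0.6, params_max = 70B, C(Y) = 100 bits) are assumptions for illustration, not fitted values:

```python
import math

def required_prompt_complexity(c_task: float, params: float,
                               params_max: float = 70e9,
                               gamma: float = 0.6) -> float:
    """C(X_i) ≈ C(Y) * (1 - log(params_i) / log(params_max))**gamma."""
    return c_task * (1 - math.log(params) / math.log(params_max)) ** gamma

# Required prompt complexity falls as model size grows, reaching 0 at params_max.
for p in (1e9, 7e9, 13e9, 70e9):
    print(f"{p / 1e9:>4.0f}B params -> C(X) ≈ {required_prompt_complexity(100, p):.1f} bits")
```

Note the diminishing returns: each step up in size shaves off less prompt complexity than the last, consistent with the phase-transition picture.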

The meta-prompt probe

Here's where it gets practical.

If we want to find the minimal capable model, we could exhaustively test each model with many prompts. This is O(n·m) — expensive. But there's a shortcut that falls out of the theory.

Construct a meta-prompt Zₐ that asks model M: "Given task Y, what prompt would you need to accomplish this?" If the model can generate a coherent prompt X, it demonstrates that it can:

  1. Parse the task — encode Y's specification
  2. Model the mapping — understand how inputs relate to outputs
  3. Navigate the solution space — structure X appropriately

These are the same capabilities required to execute the task. The meta-prompt tests them all at once.

The conjecture: if a model cannot respond coherently to the meta-prompt, the task is outside its capability envelope. Meta-prompt failure is a reliable indicator of task impossibility for that model.

This reduces model selection from a search problem to a capability test — O(n) queries, one per model, walking up from smallest to largest until one passes.
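The walk-up procedure might look like the following sketch. Here `ask` and `coherent` are hypothetical stand-ins (any model client and any coherence heuristic could fill these roles), and the capability table exists only to make the stub runnable:

```python
def meta_probe(models, task, ask, coherent):
    """Walk models from smallest to largest; return the first whose response to
    the meta-prompt ("what prompt would you need?") passes the coherence check.
    O(n) queries: one per model."""
    for model in models:  # assumed ordered by increasing complexity
        suggested = ask(model, f"Given this task: {task}\n"
                               f"What prompt would you need to accomplish it?")
        if coherent(suggested):
            return model, suggested
    return None, None

# Stand-in stubs: a model "passes" iff its hypothetical capability covers the task.
capability = {"tiny": 10, "small": 40, "large": 90}
ask = lambda m, q: "step-by-step prompt" if capability[m] >= 50 else "???"
coherent = lambda p: p != "???"

model, prompt = meta_probe(["tiny", "small", "large"], "summarize a codebase",
                           ask, coherent)
assert model == "large"
```

In practice the coherence check is the hard part; validating the suggested prompt with a single execution (as discussed below) is one way to tighten it.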

Where this breaks

The meta-prompt probe is not a theorem. It's a conjecture with plausible reasoning and some failure modes worth naming.

Overconfident models generate plausible-sounding prompts for tasks they can't actually execute. The meta-prompt looks coherent but the suggested prompt doesn't work. This is partially addressable by validation — test the suggested prompt once — but it means the probe has false positives.

Underconfident models self-report inability for tasks they could handle with the right prompt. They say "I can't do this" when they could. This means the probe has false negatives, biased toward recommending larger models than necessary.

The self-reference problem. A model's knowledge of its own capabilities is necessarily incomplete — a Gödel-like limitation on self-modeling. The model cannot fully know its own boundaries, which means the probe is always approximate.

These caveats are real. But even an approximate, fast capability test is more useful than no test or an exhaustive search. The meta-prompt doesn't need to be perfect — it needs to be better than the alternative, which is usually guessing.

The complexity budget as a thinking tool

None of this is proven in the way a mathematician would accept. The feasibility condition is a conjecture informed by information theory. The efficiency factor β is empirically observable but not rigorously defined. The meta-prompt probe is an idea that needs systematic testing.

What the framework does provide is a vocabulary for reasoning about model selection that's more precise than "use the biggest model you can afford." There is a complexity budget. The task consumes some of it, the model supplies some, and the prompt covers the gap. Understanding that tradeoff — even approximately — is more useful than not having the frame at all.

The gradient observation from empirical testing is consistent with this theory: steep gradients correspond to high β (model representations align well with the task), reducing prompt burden and making small models sufficient. Shallow gradients correspond to low β, requiring either more model or more prompt.

The theory doesn't yet tell you how to measure C(Y) for a specific task. The empirical work doesn't yet validate the meta-prompt probe at scale. There's a clear line of investigation connecting the two, and it seems worth pursuing.


* The pigeonhole principle: if you have more items than containers, at least two items must share a container. Here, the model's finite state space is the container; the space of possible solutions is the items.