Chapter 11
Storage Follows Observation
A recurrence tells us which values depend on which earlier values. It does not automatically tell us how much history must be stored. Storage depends on two things together: what each point reads, and which points later code observes.
The Design Fork: Array Semantics or Family Semantics
If an indexed recurrence is treated immediately as an array allocation, the language has already chosen a storage policy. Every time point appears to need materialization because every time point has a name. That is simple, but it throws away an optimization opportunity before analysis begins.
The other choice is to treat the recurrence first as a family of values. The family has a dependency relation; storage is chosen later, after observations are known. This lets the same source definition support a rolling state, a full history, or a checkpointed evaluation depending on how the value is used.
Einlang’s recurrence notation takes the second route. Define the values first; commit to memory only after the program shows what it will observe.
This is a semantic claim first and an implementation hook second. The current compiler performs recurrence ordering and records lowered execution facts that backends can use, including vectorized and recurrence-aware execution paths. The book’s checkpointing examples describe the policy space those facts make available; they are not a promise that every checkpoint schedule is already a separate user-visible pass.
Consider:
let h[0] = init
let h[t in 1..T] = step(h[t - 1], x[t])
let final = h[T - 1]
The recurrence reads only one step back. If the only observed value is
final, an implementation can evaluate the sequence with a rolling slot:
previous -> current -> previous -> current
The full array h[0..T] does not have to exist as stored data, even though it
exists as a family of values in the source model. This is a distinction between
meaning and storage: the recurrence defines all of the values, but the runtime
may not need to retain all of them.
Ask for the same recurrence twice. First ask only for the last state. Then ask for a plot of every state. The recurrence did not change, but the storage obligation did. Observation is not an afterthought; it is half of the storage story.
A Small Execution Trace
Take four time steps:
h[0] = init
h[1] = step(h[0], x[1])
h[2] = step(h[1], x[2])
h[3] = step(h[2], x[3])
If the program only asks for h[3], the runtime can overwrite:
slot = h[0]
slot = step(slot, x[1])
slot = step(slot, x[2])
slot = step(slot, x[3])
This execution does not deny that h[1] and h[2] are meaningful source
values. It only says they do not need long-term storage for this observation.
That distinction is hard to make when the source program is already written as
mutation. The recurrence gives names to the whole family first; the storage
plan comes second.
Put differently, there are two valid views of the same definition:
source family: h[0], h[1], h[2], h[3]
runtime slots: previous, current
The first is the semantic object. The second is one legal storage plan under a particular observation.
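The two views can be written down directly. The following is a minimal Python sketch, not Einlang: `step` is a placeholder update function and `xs` is placeholder input data. The same recurrence is evaluated once with full materialization and once with a rolling slot; because the observation here is only the final state, both plans must agree on it.

```python
def full_history(init, xs, step):
    # Materialize the whole family h[0..T] as stored data.
    h = [init]
    for x in xs:
        h.append(step(h[-1], x))
    return h

def rolling_slot(init, xs, step):
    # Keep one live state; earlier members are overwritten.
    slot = init
    for x in xs:
        slot = step(slot, x)
    return slot

# Placeholder recurrence: running sum.
step = lambda h, x: h + x
xs = [1, 2, 3]
# Observation: only the final state. Both plans yield the same answer.
assert full_history(0, xs, step)[-1] == rolling_slot(0, xs, step)
```

The assertion is the point: under a final-state observation the two plans are indistinguishable, which is exactly why the rolling plan is legal.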
When History Is Observed
Now change only the observation:
let all[t] = h[t]
The recurrence is the same. The dependency offset is still t - 1. But the
program now asks for every member of the family, so the implementation must
make the whole history available.
This distinction is important:
definition: what values mean
observation: which values are demanded
storage: what the runtime must retain
In a loop, these concerns are often entangled with mutation. In a recurrence, they can be discussed separately. That separation is useful even when the final runtime still lowers the recurrence to a loop.
Print Is Observation Too
Observation is not only a model output. Debugging can observe a value as well:
let h[0] = init;
let h[t in 1..T] = step(h[t - 1], x[t]);
print(h);
Before the print, an implementation may have room to keep only a rolling
state if no later expression needs the full history. After print(h), the
program has asked to see the family. That request changes what must be
evaluated and materialized. A debugging statement is therefore part of
the storage story.
This is why printing is more subtle in a language of formulas. Printing is not only display. It is an observation boundary. It turns an expression, a recurrence family, or a derivative request into a concrete witness.
For a smaller example:
let z[i] = x[i] * x[i];
print(z);
print(shape(z));
The first line defines a family of values. The print asks the runtime to show that family. The shape print asks for a different witness: not the values, but the coordinate extent. Both observations reduce implementation freedom because the program has made a demand.
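A loose Python analogy, not Einlang semantics: a generator is a family of values that need not be stored until something observes it. Converting it to a list is the moment of materialization, and the two prints below ask for two different witnesses.

```python
xs = [1, 2, 3]
z = (x * x for x in xs)   # a family of values, not yet stored
z = list(z)               # observation forces materialization
print(z)                  # the values themselves: one kind of witness
print(len(z))             # the extent alone: a different witness
```

The analogy is imperfect (a Python generator is single-use, an Einlang family is not), but it shows the same boundary: before the observation the values were only potential.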
A Wider Window
Some recurrences need more history:
let y[0] = a
let y[1] = b
let y[t in 2..T] = y[t - 1] + y[t - 2]
If only the final value is observed, a two-value rolling window is enough. The
source makes that visible through the offsets t - 1 and t - 2.
If a later expression reads every y[t], the storage plan changes. The
meaning of the recurrence does not.
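A Python sketch of the final-value plan, assuming the half-open ranges used throughout this chapter (so `T = 8` means the last value is `y[7]`). The offsets t - 1 and t - 2 show up directly as the two live variables.

```python
def final_only(a, b, T):
    # y[0] = a, y[1] = b, y[t] = y[t - 1] + y[t - 2] for t in 2..T (half-open).
    # The offsets t - 1 and t - 2 imply a two-value rolling window.
    prev2, prev1 = a, b
    for _ in range(2, T):
        prev2, prev1 = prev1, prev1 + prev2
    return prev1  # y[T - 1]

assert final_only(0, 1, 8) == 13  # Fibonacci: 0 1 1 2 3 5 8 13
```

If a later expression reads every y[t], this function is no longer a legal plan; the definition would not change, only the plan.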
The Caveat
The largest backward offset is not a complete storage plan. It is only the local dependency window. Other forces may require more:
later code may observe intermediate states
reverse-mode differentiation may need saved activations
debugging may request a trace
checkpointing may trade recomputation for memory
Visible recurrence does not solve those trade-offs by itself. It makes the first fact explicit: what each point needs in order to be computed. Once that fact is visible, storage choices become compiler/runtime policy rather than guesswork hidden inside a loop.
Checkpointing as a Policy
Reverse-mode differentiation is the clearest case where the caveat matters. A training run may need intermediate states during the backward pass. Keeping every state is simple but memory-heavy. Recomputing every state is memory-light but expensive. Checkpointing chooses points in between: save some states, recompute segments as needed.
The recurrence notation does not choose the checkpoint policy. It gives the policy the dependency facts:
h[t] needs h[t - 1]
y[t] needs y[t - 1] and y[t - 2]
With those facts, the compiler or runtime can reason about legal recomputation. Without them, it has to recover the temporal structure from a loop body and mutation pattern.
This is why the chapter says storage follows observation, not “storage is solved by recurrence.” Observation includes ordinary outputs, derivative requests, debug traces, and profiling hooks. Each observation changes what must be retained or recomputed.
Which Observations Force Memory?
The largest backward offset in a recurrence gives the local history window for forward computation. But storage is not determined by that window alone. Later uses, derivative requests, and debugging observations can demand more storage than the recurrence itself. The source tells us the dependency; observation tells us how much of the family must become concrete.
Definition Before Storage
The storage discussion keeps the recurrence story honest. It would be too easy to say “visible dependency offsets imply storage optimization” and stop there. The truth is subtler. Visible offsets give the compiler a fact it can use, but storage is a policy decision under observation.
That distinction is important for production systems. A compiler should be able to use the one-step dependency of an RNN when only the final state is needed. It should also be able to keep all states when the output sequence is requested. It should be able to choose checkpointing when differentiation makes the memory/recompute trade-off worthwhile.
The source notation should not pretend those policies are all the same. It should expose the dependency relation clearly enough that different policies can be justified. Visible structure does not remove implementation choices; it makes their inputs explicit.
A useful warning follows from this: do not confuse a recurrence definition with an array allocation request. The notation:
let h[t] = ...
defines a family of values. It does not necessarily demand that every h[t]
be stored simultaneously. Whether the family is materialized depends on later
uses and runtime policy.
That distinction is one of the quiet benefits of a declarative source form. It lets the program describe the mathematical object first and lets the implementation decide how much of that object must become memory at once. The better the dependency information, the more room the implementation has to make that decision responsibly.
The practical advice is therefore conservative: never infer storage only from the surface existence of an indexed family. First read the dependency window. Then read the observations. Only then talk about materialization.
Pressure Test: Same Recurrence, Different Observations
Consider this recurrence. The dependency is simple; the storage question is not, because observation changes the contract:
let h[0] = init
let h[t in 1..8] = step(h[t - 1], x[t])
The dependency window is one step. That fact alone permits a rolling execution for a final-state query:
let final = h[7]
An implementation can keep a single live state:
slot = h[0]
slot = step(slot, x[1])
slot = step(slot, x[2])
...
slot = step(slot, x[7])
The source family still contains h[0] through h[7]. The storage plan does
not. The difference is legal because the observation only demands the last
member of the family.
Now add one line:
let trace[t] = h[t]
The recurrence definition has not changed. The dependency window is still one step. But the observation now demands every time point, so the runtime must make all eight values available. A rolling slot is no longer enough unless the runtime also emits or stores each value as it is produced.
A third observation changes the policy again:
let every_other[u] = h[2 * u]
Now not every time point is externally needed. A storage planner could retain only even states for the output while still using odd states transiently during computation. The source recurrence and the observation together determine the obligation.
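The partial plan can be sketched in Python (again with a placeholder `step`): only even-indexed states are retained for the output, while odd states live transiently in a single slot.

```python
def every_other(init, xs, step):
    # Retain only even-indexed states; odd states exist only transiently.
    kept = [init]               # h[0]
    slot = init
    for t, x in enumerate(xs, start=1):
        slot = step(slot, x)
        if t % 2 == 0:
            kept.append(slot)   # h[2], h[4], ...
    return kept

# Placeholder recurrence: running sum, so h = [0, 1, 3, 6, 10, 15, 21, 28].
step = lambda h, x: h + x
assert every_other(0, [1, 2, 3, 4, 5, 6, 7], step) == [0, 3, 10, 21]
```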
This is the audit:
definition: h[t] depends on h[t - 1]
observation: final, trace, or every_other
policy: rolling slot, full history, partial materialization
Checkpointing is a more complex version of the same separation. Suppose
a reverse pass needs intermediate states, but memory cannot hold all of them.
The dependency graph says which states can be recomputed from which earlier
states. A checkpoint schedule might store h[0], h[4], and h[7], then
recompute segments when the backward computation needs them. The recurrence
notation does not choose that schedule, but it gives the schedule a graph that
is explicit rather than recovered from mutation.
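The save-some, recompute-the-rest idea can be sketched in Python. This is an illustration of the schedule named above (store h[0], h[4], h[7]), not a real checkpointing algorithm; `step` is again a placeholder, and a production schedule would be chosen by a cost model rather than hard-coded.

```python
def forward_with_checkpoints(init, xs, step, saved):
    # Save only the states listed in `saved`; everything else is recomputable.
    checkpoints = {0: init}
    slot = init
    for t, x in enumerate(xs, start=1):
        slot = step(slot, x)
        if t in saved:
            checkpoints[t] = slot
    return checkpoints

def recover(t, checkpoints, xs, step):
    # Recompute h[t] forward from the nearest checkpoint at or before t.
    base = max(s for s in checkpoints if s <= t)
    slot = checkpoints[base]
    for u in range(base + 1, t + 1):
        slot = step(slot, xs[u - 1])
    return slot

# Placeholder recurrence: running sum, so h = [0, 1, 3, 6, 10, 15, 21, 28].
step = lambda h, x: h + x
xs = [1, 2, 3, 4, 5, 6, 7]
cps = forward_with_checkpoints(0, xs, step, saved={4, 7})
# h[5] = 15 was never stored, but is recovered from the checkpoint h[4] = 10.
assert recover(5, cps, xs, step) == 15
```

The recurrence's dependency graph is what makes `recover` legal: h[5] is reachable from h[4] by replaying one step, and the source form states that reachability directly.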
Debugging belongs in the same analysis. A line such as:
print(h);
is not harmless from the storage point of view. It observes the family. If a
compiler had planned to keep only the final state, the print changes the
contract. The user has asked for a witness of all h[t] values, so the
runtime must either materialize them or produce them in an observable stream.
This concrete audit prevents two opposite mistakes. The first mistake is to assume every indexed family is an array allocation. That gives up optimization too early. The second is to assume a small dependency window always means small memory. That ignores observations, debugging, and autodiff. The visible source form lets the program state the family first and lets implementation policy respond to what is actually demanded.
The Minimal Compiler Rule
A practical compiler can begin with a conservative rule:
dependency window gives what is needed next
observation set gives what must remain available
For h[t] reading h[t - 1], the next-step dependency is small. For
print(h) or let trace[t] = h[t], the observation set is large. For
let final = h[T - 1], the observation set is small. This rule does not solve
all memory planning, but it prevents the most common confusion: source
families are semantic objects, not automatic allocation demands.
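The conservative rule can be made concrete as a toy planner. This is a deliberately simplified sketch (the observation categories and policy names are illustrative, not part of any real compiler interface): the dependency window sizes the rolling buffer, and the observation set decides whether a rolling buffer is even permitted.

```python
def choose_policy(window, observations):
    # Conservative rule: the observation set, not the recurrence alone,
    # decides how much of the family must remain available.
    if "trace" in observations or "print" in observations:
        return "full materialization"
    if observations == {"final"}:
        return f"rolling buffer of {window} slot(s)"
    return "partial materialization"

assert choose_policy(1, {"final"}) == "rolling buffer of 1 slot(s)"
assert choose_policy(1, {"final", "print"}) == "full materialization"
```

Note that adding a single observation flips the answer even though `window` never changed, which is the chapter's claim in executable form.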
Once that distinction is stable, more advanced policies can be discussed without changing the source meaning. Full materialization, rolling buffers, streaming output, and checkpointing are different answers to the same visible dependency and observation facts.