Einlang

9

Chapter 9 · The Shape of Thought

“A compiler is a program that reads a program written in one language—and says it again in another.”

— Adapted from SICP

Construction · How the compiler reads names


Part II showed that coordinate names survive composition—through functions, through time, through differentiation. Part III asks: can this be automated? This chapter constructs a compiler frontend that reads named coordinates, checks them against five rules, and lowers them to integers. It is the proof that the checks you have been doing by hand—tracing batch and class through softmax, subtracting coordinate sets at broadcast sites, verifying that function contracts match—can be done by a machine.

You just wrote a short Einlang program:

let x[batch, class] = softmax[class](logits[batch, class]);
let y[batch] = sum[class](x[batch, class] * labels[batch, class]);

Someone asks: what is the shape of y? You cannot run this code—there is no data. But you can still answer.

Trace the coordinates. logits has [batch, class]. softmax[class] preserves the shape—output is [batch, class], bound to x. Then sum[class] consumes class. The surviving coordinate is batch. y has shape [batch].

You just performed coordinate propagation in your head. You tracked each name—where it was introduced, how it flowed through the function call, whether it survived the reduction. You did not need data. You did not need to run the program. You needed only the names.

This—exactly this—is what a compiler must do. For every line, without data, without execution, it must answer: what is the shape of every tensor? Which coordinates survive each operation? Does the coordinate contract at each call site match the function’s declaration?

Before we see the machinery that answers these questions, let’s watch the questions themselves catch bugs.

The Wall

Imagine a detective’s wall. Five slots, each labeled with one rule. When a bug is found, it gets pinned to the slot of the rule that caught it.

Here is a broken program. It was written by a programmer who intended a linear layer with bias followed by softmax over classes. It compiles. It runs. It produces wrong results.

fn predict[class](x: [f32; batch, in], W: [f32; out, in], bias: [f32; out])
    -> [f32; batch, out]
{
    let logits[batch, class] = sum[k](x[batch, k] * W[out, k]) + bias[out];
    softmax[class](logits[batch, class])
}

Look at the program. Which names don’t match?

Rule 1. k is referenced on x and W, but both declare in, not k. The reduction introduces a ghost coordinate. Rule 1 pins: k is not a declared coordinate of x or W. The writer meant in.

Rule 2. Syntactically, k appears in all operands. Rule 2 passes—but the wrong coordinate is being consumed. Rules 1 and 2 together cover both “coordinate doesn’t exist” and “coordinate exists but isn’t used consistently.”

Shape analysis. The output declaration says (batch, class) but the reduction body produces (batch, out). class appears from nowhere. out is produced but not declared. The declared coordinates don’t match the coordinates that actually flow through the expression.

Rule 5. Return type says [f32; batch, out] but the body’s softmax[class](logits[batch, class]) returns [f32; batch, class]. The coordinate out in the return type doesn’t match class in the return value. Pinned.

Rule 3. bias[out] omits batch—a correct broadcast (bias is independent of batch). Recorded for the gradient. No bug.

Rule 4. No recurrence. Silent.

The corrected program:

fn predict[class](x: [f32; batch, in], W: [f32; class, in], bias: [f32; class])
    -> [f32; batch, class]
{
    let logits[batch, class] = sum[in](x[batch, in] * W[class, in]) + bias[class];
    softmax[class](logits[batch, class])
}

Three changes. sum[k]sum[in]. W[out, k]W[class, in]. Return type outclass. Every coordinate flows from declaration to use. Every name matches.

Not one of these bugs was a shape error. In a positional framework, sum(axis=1) on (batch, in) and (out, in) would produce (batch, out)—a valid shape. The code would run. The loss would descend. And the model would be computing a meaningless function.

Every one of these checks reads the same fact from every tensor: its coordinate layout—the ordered list of names the tensor carries. When you write x: [f32; batch, in], the compiler records layout(x) = [batch, in]. Rule 1 checks whether k appears in that list. Rule 3 subtracts lists to find broadcast axes. Rule 5 compares the call-site list against the function’s declared parameter list. The five rules are five queries against one data structure. The coordinate layout is the one fact every rule reads.

The five rules are the five ways a name can be wrong: it can refer to a non-existent coordinate (Rule 1), it can fail to appear where the operation requires it (Rule 2), it can broadcast silently without the record the backward pass needs (Rule 3), it can reference the future in a recurrence (Rule 4), or it can violate the contract of a function call (Rule 5). That’s it. Those are all the ways.


Derive it yourself. Here is a buggy program. Apply the five rules. Which rules fire? What does each rule catch?

fn forward[batch, out](x: [f32; batch, in], W: [f32; out, in], b: [f32; out])
    -> [f32; batch, out]
{
    sum[k](x[batch, k] * W[out, k]) + b[out]
}

Rules 1–5. Go.


But to apply these rules mechanically, the compiler needs a representation where names are preserved and operations are explicit. It needs an intermediate representation—a tree where every name is visible.

Before we build the tree, try this: write sum[k](A[i, k] * B[k, j]) on a piece of paper. Circle every name. Draw an arrow from each name to where it appears. Now erase the brackets and the sum. What’s left? The structure of the computation—two indices multiplied, one coordinate reduced away, two coordinates surviving. That structure is the IR. The names are its bones.


The IR Tree

Conceptually, the compiler’s IR can be understood as a tree of expressions where names are first-class:

A[i, j] + B[i, j]
(+ (index A (i j)) (index B (i j)))

Add a reduction:

sum[k](A[i, k] * B[k, j])
(reduction sum (k)
  (* (index A (i k)) (index B (k j))))

The reduction coordinate k is named, not numbered. A full declaration:

let C[i, j] = sum[k](A[i, k] * B[k, j])
(let-decl (output C (i j))
  (reduction sum (k)
    (* (index A (i k)) (index B (k j)))))

(output C (i j)) declares the surviving coordinates: i and j survive, k is consumed.

Three things the IR preserves: names (i and k remain names, never become axis=0), reduction targets ((reduction sum (k) ...) operates on k), and index patterns ((index A (i k)) matches what the source wrote). The IR has not translated your program. It has said it again, as a tree.

But the IR preserves more than the tree shape. Each node carries its coordinate set—the names that survive at that point. (index A (i k)) carries {i, k}. (reduction sum (k) ...) carries {i}k was consumed. The compiler computes these sets by walking the tree: index nodes introduce coordinates, reductions subtract them, additions merge them. This is exactly what you did by hand at the start of this chapter—tracing batch and class through softmax and sum, without data, using only the names. The compiler does it mechanically, node by node, for every expression in the program.


Lowering: Names Become Numbers

The tree passed every check. But it cannot be handed to a numerical backend. NumPy does not understand class. It needs axis=1.

Translating the analyzed tree into executable instructions is lowering. The mapping is deterministic: the compiler reads the coordinate layout stored on each tensor’s IR node—the same layout that the five rules consulted during checking. logits was declared [f32; batch, class], so its layout is [batch, class]. At lowering, the compiler walks the layout and assigns positions: batch → 0, class → 1. Then every index expression in the tree is rewritten: (index logits (batch class)) touches both axes; (reduction max (class) ...) becomes a reduction over axis 1. The layout is the map. Lowering is the lookup. The name is burned.

After lowering, the softmax conceptually becomes:

def softmax(logits):
    m = np.max(logits, axis=1, keepdims=True)
    e = np.exp(logits - m)
    return e / np.sum(e, axis=1, keepdims=True)

keepdims=True was not in the source. The compiler inferred it. Here is the principle, visible in the core loop at the end of this chapter:

The subtraction logits - max_result has two operands. logits carries {batch, class}. max_result—the output of max[class](logits)—carries {batch} because class was consumed. The coordinate sets differ: left has class, right does not. Set difference: {batch, class} - {batch} = {class}. The missing coordinate is class. The compiler records the broadcast—not as a flag the programmer writes, but as a requirement the coordinate structure demands. This is not a heuristic. It is the same set subtraction from Chapter 2, applied to the operand coordinate sets at every binary operation.

In the actual compiler, this check lives in the CoordinateGroundingPass: at every BinaryOpIR node, the left and right operand layouts are compared. When they are equal, the shared layout is stamped on the result. When they differ, no layout can be stamped—the coordinate structures are incompatible, and the mismatch is recorded as a broadcast that the lowering pass must resolve. The core loop at the end of this chapter reduces this logic to four lines: compare the sets, record the differences, return the union. The principle is the same. The implementation is more careful.

The same set-difference logic applies to reductions. sum[k](A[i, k] * B[k, j])k appears in both operands but not in the output C[i, j]. The compiler subtracts {k} from the body’s coordinate set to produce the output layout {i, j}. The coordinate k is a contracting dimension—shared by both operands, consumed by the reduction, absent from the output. The pattern (i, k) × (k, j) → (i, j) follows from the coordinate structure alone. No annotation needed. The compiler lowers this to a nested loop with a reduction over k. The names told it which axis to contract.


The Panorama: One Name, Five Forms

Here is softmax, in five simultaneous forms:

 max[class](logits[i, class])                          ← what you wrote
 (reduction max (class) (index logits (i class)))     ← what the compiler sees
 class: (range 0 n_class), reduction axis, Rule 2      ← what the compiler derives
 class → axis=1, reduction                                ← how the name becomes a number
 np.max(logits, axis=1, keepdims=True)                 ← what executes (conceptually)

Five forms. One name. The name class traveled through all five without changing its identity. It was written as class, preserved as class, verified as class, mapped from class to axis=1. At no point was it guesswork.

Now ask: if the positional version had a bug—if dim=-1 was normalizing over the wrong axis—at which of the five stages would that bug be caught?

Source: not caught. dim=-1 is a valid integer. IR: not caught. No names to verify. Analysis: not caught. No coordinate contract to check. Lowering: not caught. -1 maps correctly—it’s the choice of -1 that is wrong. Generated code: not caught. np.max(logits, axis=-1) is valid NumPy.

The answer: none of them. The positional bug is invisible to all five stages because the information that would expose it—the name of the coordinate—was never written down.

The Einlang bug is caught at Form 3. Analysis checks: does class appear in every operand of the reduction? Does it exist on the tensor? The bug surfaces before the program runs, at the stage where names are still names and the compiler can still reason about them.


Range Inference: Where Domains Come From

The constraint solver needs ranges. oh ranges over 0..output_height. Where does that range come from?

Sometimes the user declares it: oh in 0..output_height. But in the common case, they don’t. When you write let result[i] = data[i] * 2, the range of i is never stated. The compiler infers it.

The algorithm: find every array access where i appears as an index. For each access, look at the array’s declared shape at the position where i appears. If data has shape [N], then i < N. If i also appears in arr[i+1] and arr has shape [M], then i+1 < M, so i < M-1. Collect all inferred upper bounds. Take the minimum — the most restrictive one. That is i’s range.

When you write let result[i] = arr[i] + arr[i+1], the compiler finds two accesses:

  • arr[i]i < len(arr)i in [0, N)
  • arr[i+1]i+1 < len(arr)i in [0, N-1)

Intersection: i in [0, N-1). The compiler inferred that i cannot reach N-1 because arr[i+1] would be out of bounds. No annotation needed. The name i, appearing in two positions, carries enough information.

The entire algorithm is: sort accesses by expression complexity (direct i before i+1), back-compute a range from each, intersect. The coordinate-name philosophy makes this possible — every index variable appears literally in array accesses, so every range is inferrable from the shapes those arrays were declared with.


Index Arithmetic: The Constraint Solver

With ranges in hand, the compiler can check a harder problem: index arithmetic. input[b, ic, oh + kh, ow + kw] — the expression oh + kh must not exceed the input’s spatial extent. Given the ranges for oh and kh, is this checkable at compile time?

When the domain sizes are known statically, the answer is yes. The compiler’s constraint solver reads the index expression and the declared bounds, then solves for each index variable. The algorithm is pattern-matching on the expression tree.

Take the convolution index oh + kh. The compiler knows:

  • oh ranges over 0..output_height (from the output declaration)
  • kh ranges over 0..kernel_height (from the weight declaration)
  • The input has input_height (from the input declaration)
  • Constraint: oh + kh < input_height

The solver must verify that oh + kh < input_height for all valid oh and kh. It does this by solving for each variable in the worst case. For oh + kh, the maximum value is (output_height - 1) + (kernel_height - 1). If this maximum is less than input_height, the constraint holds. If the domains are known, the check is a single comparison.

But the solver handles more complex expressions. Consider (i * 2 + offset) < N. The solver pattern-matches the expression tree:

  1. Top level is addition (i * 2) + offset < N. Isolate the target: i * 2 < N - offset.
  2. Recurse into multiplication i * 2 < N - offset. Isolate: i < (N - offset) / 2.
  3. Base case i < bound. The range is [0, ceil(bound)).

The solver chains these rewrites automatically. Addition adjusts the bound by subtraction. Multiplication adjusts by division (with ceiling, so (a + b - 1) / b for safety). Division by a constant adjusts by multiplication. When the target appears with a negative coefficient—k - target < bound—the solver returns nothing: negative coefficients require a lower bound, and the solver only computes upper bounds. The pattern is recognized. It is declared unsolvable by the current pass.

The solver operates entirely on IR nodes, not Python integers. The bound N can be a dynamic expression like image.shape[0] rather than a compile-time constant. The ceiling division ceil(a / b) is implemented as (a + b - 1) / b at the IR level, so it works for any input size. This means the bounds inference produces symbolic ranges that are safe for any runtime shape.

The constraint solver is not a general-purpose theorem prover. It handles the patterns that appear in real tensor index expressions: linear combinations with positive coefficients, simple arithmetic (i*2, i/2, i+1, i-1), and the common stencil patterns from Chapter 7. For patterns it cannot solve—modulo, negative coefficients, non-linear expressions—the solver returns no range, and lowering fails with a compile error. The failure is a compile error, not a runtime surprise. The name is attached to the error, as it is to every error in this compiler.

What the solver proves: that a coordinate arithmetic expression, evaluated over its declared ranges, stays within the declared bounds of the tensor it indexes. What it cannot prove—correctness of the arithmetic itself—is not a failure of the solver. It is the boundary from Chapter 14. The solver narrows that boundary. Every expression it proves safe is one less degree of freedom for silent errors. Every expression it cannot prove is flagged before the program runs.


The Core Loop

The Wall presented five rules. Before reading the implementation: Rules 1 and 2 (undeclared coordinates, missing operands) require checking every index expression against its tensor’s declaration. Rule 3 (broadcast recording) requires comparing operand coordinate sets at every binary operation. Rule 4 (causality) applies only to recurrences. Rule 5 (contract matching) applies only at function call boundaries. Which of these does a tree-walking checker handle naturally? Which require additional machinery?

The entire compiler frontend fits in fifteen lines:

check(expr, env, errors):
  match expr:
    case Index(T, coords):
      for c in coords:
        if not declared(c, T, env):
          errors.push("undeclared", c, "in", T)
      return coords

    case Reduction(op, c, body):
      out = check(body, env + [c], errors)
      if c not in out:
        errors.push(c, "not consumed")
      return out - {c}

    case Add(left, right):
      L = check(left, env, errors)
      R = check(right, env, errors)
      for c in R:
        if c not in L:
          errors.push("broadcast", c, "into left")
      for c in L:
        if c not in R:
          errors.push("broadcast", c, "into right")
      return L | R

    case LetDecl(output, T, coords, body):
      check(body, env + coords, errors)
      return coords

    default:
      errors.push("unknown node", expr)
      return {}

Walk the tree. At each node, ask one question. If wrong, record it. If right, return the coordinate set.

Rules 1 and 2 are the Index case: check every coordinate against its tensor’s declaration. Rule 3 is the Add case: compare left and right coordinate sets, record the differences as broadcasts. Rule 5 is the LetDecl case: verify the function body’s coordinate set against the declared output. Rule 4—causality—is not in this loop; recurrence checking is a separate pass with its own fifteen lines. The five rules from the Wall map to five cases in the tree walk. Four of them are above.

Lines 11–16 are the broadcast merge. When Add(left, right) is checked, every coordinate on the right absent on the left means the left operand must broadcast into it—and vice versa. The Add node doesn’t need to know what operation it’s checking—only that both sides contribute coordinate sets and the broadcast relationship must be recorded.

The complexity is in the details—type inference, pack resolution, error message formatting. The structure is fifteen lines. You can hold the entire thing in your head.

Fifteen lines. Read them again—slowly this time. The Index case checks that every coordinate in a bracket exists in its tensor’s declaration. The Add case compares two coordinate sets and records the broadcast. The LetDecl case verifies the body against the declared output. Four of the five rules fit in ten of those fifteen lines. The Wall—the five rules from Chapter 4 that felt like a detective story—is a tree walk with five cases. A first-year CS student could type it during a lecture.

The gap between “five abstract rules” and “fifteen lines of code” is the gap between notation and implementation. The notation made the rules visible. The implementation makes them mechanical. The names are the bridge.

The fifteen lines are the skeleton. Three additions make them a working compiler. Here are the first two. (Pack resolution — the third — is Chapter 5’s subject; you’ve already seen how the compiler resolves ..left and ..right around a named anchor.)

Type Inference: The Simplest Pass

Type inference is the simplest pass in the compiler. You already know how to do it — you just haven’t named it.

Look at 3.14 + 0.5. What is the type? f32. You didn’t run the program. You didn’t consult a type environment. You looked at two float literals, added them in your head, and concluded the result is a float. The compiler does exactly what you did. It just does it systematically, node by node, from leaves to root.

The leaves are the easy part. A literal carries its type — 3.14 is f32, 5 is i32, true is bool. You see it instantly. The compiler sees it the same way: the literal node in the IR has a type field. Done.

A variable inherits its type from its declaration. You wrote let x: [f32; batch, in] = input(). The f32 is right there in the source. When the compiler encounters x in an expression, it looks up x’s declaration and reads the type. No inference. No guessing. The answer was written down.

Now the interior nodes. A binary operation: left + right. You know the rule without being taught it: both sides must be the same type, and the result is that type. Float plus float is float. Integer plus float is an error — not because the hardware can’t do it, but because the programmer probably didn’t mean to mix them. The compiler enforces this. If left is f32 and right is i32, it reports a type mismatch rather than silently promoting. No implicit coercion. No 1.0 + 2 = 3.0 surprises.

A reduction: sum[k](body). The sum iterates over k and accumulates values. The element type doesn’t change — summing f32 values produces an f32. Reducing i32 produces i32. The reduction changes the shape (consuming k). It doesn’t touch the type.

A conditional: if cond { a } else { b }. cond must be bool. a and b must have the same type. The result is that type.

That’s the entire algorithm. Five cases. Walk the tree bottom-up. At each node, look at the children’s types. Apply one rule. Propagate upward. When you reach the root, every node has a type.

Walk a real expression to see it in action:

let z[b, out] = sum[in](x[b, in] * W[out, in]) + b[out] * 0.5;

Start at the leaves. x is f32 (from its declaration). W is f32. b is f32. 0.5 is f32 (literal).

Move up. x * W: both f32, result f32. sum[in](...): element type unchanged, f32. On the other branch: b * 0.5: both f32, result f32.

Move up again. sum_result + scaled_bias: both f32, result f32. z is f32.

Every type in this program is known before a single value is computed. The declarations at the leaves — x: [f32; ...], W: [f32; ...] — determine every type in the tree. No dtype annotations on operations. No astype calls in the generated code. Twelve lines of code. Zero ambiguity.

Error Formatting: The Name in the Message

The checker found an error. What does the programmer see?

A positional error says: IndexError: dimension 3 out of bounds. The programmer counts dimensions. Guesses which one. Guesses why.

A named error says:

error[E004]: coordinate not declared
  → src/model.eln:42:18
   |
42 |     let result = sum[class](logits[batch, class]);
   |                      ^^^^^ coordinate `class` not found in `logits`
   |
   = logits declared with coordinates: {batch, feature}
   = did you mean `feature`?

Four parts. Each has a job.

The error code (E004). Assign each of the five rules a number. The programmer who has seen E001 (undeclared coordinate) three times recognizes it on the fourth. The code is for muscle memory.

The location (src/model.eln:42:18). File, line, column. The caret (^^^^^) points to the exact span — sum[class], five characters, underlined. The programmer’s eye lands where the compiler’s eye landed.

The message. One sentence. Names the coordinate (class), names the tensor (logits), states the mismatch. Not “coordinate contract violation.” Not “broadcast inconsistency.” Plain words: “coordinate class not found in logits.”

The context. logits declared with coordinates: {batch, feature}. The compiler shows what it expected and what it found. The difference is visible: class vs feature. The programmer sees the typo instantly.

Now a broadcast error:

error[E003]: broadcast inconsistency
  → src/model.eln:67:10
   |
67 |     let out[b, c, h, w] = x[b, c, h, w] + scale[c];
   |                                             ^^^^^^^^ coordinate `h` required by output but missing from `scale`
   |
   = output coordinates: {b, c, h, w}
   = scale coordinates: {c}
   = missing: {b, h, w}
   = scale broadcasts over {b, h, w} — is this intended?

The missing coordinates are enumerated. The question at the end — “is this intended?” — is not rhetorical. It is the broadcast self-audit from Chapter 4, framed by the compiler. The programmer reads the missing set and decides: yes, scale should be silent on b, h, and w, or no, scale should also depend on h. The compiler cannot answer the semantic question. It can only ask it. But the question has a place to land — the missing coordinate set, computed by set subtraction, printed with names.

The error formatter is forty lines of code. A third of it is the caret pointing. The rest is formatting the coordinate sets, looking up the declaration, and suggesting the nearest name. The principle: every error names the coordinate, shows the context, and asks a question the programmer can answer. The compiler is not the authority. It is the messenger. The names are the message.

If you implement these fifteen lines and add the three additions — type inference (twelve lines), pack resolution (Chapter 5’s algorithm), and error formatting (forty lines) — you have a working coordinate checker. Total: roughly three hundred lines. The fifteen lines above are the skeleton. The three additions are the flesh. The names are the firewood.


The Checker in Action: A Chained Reduction

The fifteen-line checker above is abstract. Here it is on a concrete expression — a nested reduction drawn from the Einlang standard library of examples:

let W[i in 0..4, k in 0..2] = i as f32 + k as f32;
let X[k in 0..2, h in 0..3] = k as f32 * 10.0 + h as f32;
let nested = max[h](sum[k](W[i, k] * X[k, h]));

W has coordinate set {i, k}. X has {k, h}. The expression max[h](sum[k](W[i, k] * X[k, h])) has two reductions, one inside the other. Watch the 15-line checker walk it.

Step into the inner reduction: sum[k](W[i, k] * X[k, h]).

The body is the product W[i, k] * X[k, h]. The checker reaches the * node — it’s an Add-like binary operation (the checker treats * the same as +: compare sets, record broadcasts, return union). Left operand W[i, k] has set {i, k}. Right operand X[k, h] has set {k, h}. The checker compares: k is in both — shared, contracting. i is only on the left — broadcast recorded. h is only on the right — broadcast recorded. Union: {i, k, h}.

The Reduction case fires for sum[k]. The body returned {i, k, h}. The reduction consumes k: {i, k, h} - {k} = {i, h}. This is the coordinate set of the inner sum.

Step into the outer reduction: max[h](inner_result).

The inner result carries {i, h}. The Reduction case fires again. The reduction consumes h: {i, h} - {h} = {i}. This is the coordinate set of the full expression.

Verification. The declared output nested has coordinate i. The LetDecl case compares: does the body’s coordinate set {i} match the declared output? {i} = {i}. Yes. Four cases — Index, BinaryOp, Reduction, LetDecl — processed the entire nested expression. Three lines of code each. The program passes every check.

Walk the trace backward. k was consumed by the inner sum. h was consumed by the outer max. i is the sole survivor — it was present on W from the start, passed through both reductions untouched, and emerged as the output coordinate. The expression computes: for each i, take the maximum over h of the sum over k of W[i, k] * X[k, h]. The coordinate names recorded exactly what happened. The checker verified it mechanically. The names were both the specification and the proof.

Now ask: what would a positional checker verify? W has shape (4, 2). X has shape (2, 3). W @ X produces (4, 3). np.max(..., axis=1) produces (4,). The shapes match. The positional checker is silent. It has nothing to verify beyond shape consistency.

But the shapes are consistent because the coordinate names are consistent. k appears in both W and X — that’s why axis=1 of (4, 2) and axis=0 of (2, 3) can contract. h is the coordinate to maximize over — that’s why axis=1 of (4, 3) is the right axis. The positional code is correct because the named structure is correct. The names were the proof. The positions were the consequence of the proof. But only the proof survives audit.

A second pattern: chained independent reductions. Consider a different expression from the same example file — two reductions over the same coordinate name, on different tensors, combined element-wise:

let A[k in 0..2, n in 0..3] = (10 * k + n) as f32;
let B[n in 0..3, j in 0..4] = (n * 100 + j) as f32;
let chained = argmax[n](A[k, n]) as f32 * max[n](B[n, j]);

Walk the * node. Left: argmax[n](A[k, n]) — body returns {k, n}, reduction subtracts n, result {k}. Right: max[n](B[n, j]) — body returns {n, j}, reduction subtracts n, result {j}. BinaryOp merge: {k} ∪ {j} = {k, j}. Broadcast recorded for both directions — k absent from right, j absent from left.

The same coordinate n is consumed by two different reductions on two different tensors. The checker verifies that n exists on both A and B. It records that the argmax result broadcasts over j and the max result broadcasts over k. The element-wise multiplication of a (2,) tensor and a (4,) tensor produces a (2, 4) tensor. The broadcast relationship is explicit — computed by set subtraction, not inferred from shapes.

In a positional framework: argmax(A, axis=1) produces (2,). max(B, axis=0) produces (4,). Multiply them and NumPy broadcasts automatically. But nothing in the positional code records why the broadcast works — that n names the same domain in both tensors, that the domain sizes match, that the multiplication is semantically valid. The shapes happen to broadcast. The names guarantee that they should.

The fifteen-line checker traces both patterns — nested reductions and chained independent reductions — with the same four cases. Index introduces coordinates. Reduction subtracts. BinaryOp merges with broadcast recording. LetDecl verifies the output. The structure is identical. The complexity of “nested vs. chained” is not in the checker. It is in the expression. The checker doesn’t care whether reductions are nested or chained — it subtracts one coordinate set at a time, in tree order, and the result falls out.


You wrote class. Five characters. They survived parsing, analysis, lowering—each stage asking a question that a number could not. At the end, they became axis=1 and were burned. But the burn was correct because the name was verified.

The positional alternative is dim=-1: three keystrokes that enable zero checks. The ratio is the distance between correct-by-construction and correct-by-coincidence.

Consume—that word has appeared in every chapter since Chapter 2. A reduction consumes a coordinate. A broadcast consumes silence. A gradient consumes the broadcast set. And now the compiler consumes the name itself. class goes in. axis=1 comes out. A good abstraction is good firewood. Its beauty is not in its surface—but in the light the flame casts when it burns.

The checker has one more pass to learn: range inference, shape analysis, and lowering. After that, the names face the real test.