diff --git a/SEMANTICS.md b/SEMANTICS.md new file mode 100644 index 0000000..8eacd22 --- /dev/null +++ b/SEMANTICS.md @@ -0,0 +1,1388 @@ +# Flux Language Semantics + +This document formalises the static semantics — primarily the **type system** — of Flux. + +Notation used throughout: + +| Symbol | Meaning | +| -------------- | ----------------------------------------------------------------- | +| `Γ` | Type environment (maps names to their type + mutability) | +| `Σ` | Struct table (maps struct names to their field lists) | +| `Φ` | Function table (maps function names to their signatures) | +| `Γ ⊢ e : T` | Under environment Γ, expression `e` has type `T` | +| `Γ ⊢ s ✓` | Under environment Γ, statement `s` is well-typed | +| `⊢ T wf` | Type `T` is well-formed | +| `T₁ ~ T₂` | Types `T₁` and `T₂` are compatible (assignable) | +| `T ⊑ U` | `T` can be implicitly promoted to `U` | +| `common(T, U)` | Least upper bound of `T` and `U` under `⊑`; error if none exists | +| `rank(T)` | Numeric rank of type `T` (width class; defined in §5) | +| `category(T)` | Numeric category of `T`: `Unsigned`, `Signed`, `Float`, or `Char` | + +--- + +## Outline + +The following sections form the complete type system, in dependency order — each section +relies only on those above it. + +1. [Types](#1-types) ✓ +2. [Well-Formedness of Types](#2-well-formedness-of-types) ✓ +3. [Type Environments](#3-type-environments) ✓ +4. [Mutability](#4-mutability) ✓ +5. [Numeric Type Hierarchy and Coercions](#5-numeric-type-hierarchy-and-coercions) ✓ +6. [Operator Typing](#6-operator-typing) ✓ +7. [Expression Typing](#7-expression-typing) ✓ +8. [Statement Typing](#8-statement-typing) ✓ +9. [Function Definitions](#9-function-definitions) ✓ +10. [Struct Definitions](#10-struct-definitions) ✓ +11. [Program Typing](#11-program-typing) ✓ +12. [Error Catalogue](#12-error-catalogue) ✓ + +--- + +## 1. Types + +The **type language** — the set of types that Flux values can have. + +### 1.1 Primitive types + +``` +T ::= u8 | u16 | u32 | u64 -- unsigned integers + | i8 | i16 | i32 | i64 -- signed integers + | f32 | f64 -- floating-point + | bool -- boolean + | char -- Unicode scalar value; representation-compatible with u32 +``` + +### 1.2 Numeric type categories + +Each numeric primitive belongs to exactly one **category**. Categories partition the +numeric types; implicit promotion never crosses a category boundary (see §5). + +| Category | Members | Notes | +| ---------- | ------------------------- | ---------------------------------------- | +| `Unsigned` | `u8`, `u16`, `u32`, `u64` | | +| `Signed` | `i8`, `i16`, `i32`, `i64` | | +| `Float` | `f32`, `f64` | | +| `Char` | `char` | Promotes into `Unsigned` only (see §5.3) | + +For the purposes of operator rules, the union categories are: + +- **Integer** = `Unsigned ∪ Signed` +- **Numeric** = `Unsigned ∪ Signed ∪ Float` + +`bool` and `char` are **not** members of `Numeric`; they are handled separately. + +### 1.3 Bit widths + +| Type | Width | Type | Width | +| ------ | ----- | ------ | ----- | +| `u8` | 8 | `i8` | 8 | +| `u16` | 16 | `i16` | 16 | +| `u32` | 32 | `i32` | 32 | +| `u64` | 64 | `i64` | 64 | +| `f32` | 32 | `f64` | 64 | +| `bool` | 8 | `char` | 32 | + +### 1.4 Compound types + +``` +T ::= ... + | *T -- immutable pointer to T (read-only; C's const T*) + | *mut T -- mutable pointer to T (read-write; C's T*) + | *opaque -- immutable untyped pointer (C's const void*) + | *mut opaque -- mutable untyped pointer (C's void*) + | [T; N] -- array of N elements of type T (N: usize, N ≥ 0) + | struct S -- named struct type (resolved through Σ) + | () -- unit type (no-value / void) +``` + +The `mut` qualifier on a pointer type describes the **pointer's access permission** +to the pointed-to data. It is orthogonal to binding mutability (§4). + +A shorthand used throughout: +- **typed pointer**: `*T` or `*mut T` (pointee type is known) +- **opaque pointer**: `*opaque` or `*mut opaque` (pointee type is erased) +- **mutable pointer**: `*mut T` or `*mut opaque` +- **immutable pointer**: `*T` or `*opaque` + +### 1.5 Function types (internal use) + +Function types do not appear in user-facing type annotations but are used internally by +the type checker when resolving calls. + +``` +FnSig ::= (T₁, …, Tₙ) -> T_ret +``` + +### Open questions — resolved + +- **`()` as first-class type**: `()` may appear explicitly in `-> ()` return-type + annotations and also arises implicitly when a function omits its return type. It is a + valid type in `⊢ T wf` (Wf-Unit, §2). +- **Pointer mutability**: settled in §4. Pointer mutability (`*T` vs `*mut T`) is + orthogonal to binding mutability and carried in the type itself. + +--- + +## 2. Well-Formedness of Types + +A type is **well-formed** (written `⊢ T wf`) if every struct name it references is +declared and all array sizes are non-negative. Well-formedness is checked against the +struct table `Σ`; pointer and array types impose structural constraints. Recursive struct +cycles through **direct** fields (without pointer indirection) are rejected here — the +rule is deferred to §10 where the full `Σ` is available. + +### 2.1 Inference rules + +``` +(Wf-Prim) ⊢ T wf for any T ∈ { u8, u16, u32, u64, + i8, i16, i32, i64, + f32, f64, bool, char } + +(Wf-Unit) ⊢ () wf + +(Wf-PtrImm) ⊢ T wf + ─────────────── + ⊢ *T wf + +(Wf-PtrMut) ⊢ T wf + ─────────────── + ⊢ *mut T wf + +(Wf-OpaqueImm) ⊢ *opaque wf + +(Wf-OpaqueMut) ⊢ *mut opaque wf + +(Wf-Array) ⊢ T wf N ≥ 0 + ────────────────────── + ⊢ [T; N] wf + +(Wf-Struct) S ∈ dom(Σ) + ────────────────────── + ⊢ struct S wf +``` + +Note on `Wf-Struct`: this rule only checks that the struct name is declared. The +requirement that all field types are themselves well-formed, and that the struct has no +infinite-size cycles, is established when the struct *definition* is checked (§10). + +### 2.2 Size finiteness (informal) + +A type is **sized** iff its in-memory representation has a fixed, finite byte count +determinable at compile time. All types in §2.1 are sized, **except** a `struct S` whose +definition contains a direct (non-pointer) field of a type that transitively includes `S` +again. Such cycles are detected in §10 and reported as E0900. + +Pointer types (`*T`, `*mut T`, `*opaque`, `*mut opaque`) always have a fixed pointer-width +size regardless of the pointee. Wrapping a recursive struct in a pointer breaks the cycle: + +``` +struct Node { next: *Node } -- OK: *Node has finite size +struct Bad { inner: Bad } -- error E0900: infinite size +``` + +--- + +## 3. Type Environments + +The **type environment** `Γ` tracks every variable binding that is in scope at a given +program point. It is an **ordered association list** — the most recently introduced +binding for a given name takes precedence. + +### 3.1 Grammar + +``` +m ::= imm | mut -- binding mutability + +Γ ::= ε -- empty environment + | Γ, x : (T, m) -- extend with binding x of type T and mutability m +``` + +Bindings are appended on the right. Lookup always returns the **rightmost** (most recent) +entry for a given name, implementing latest-binding-wins semantics. + +### 3.2 Lookup + +``` +(Lookup-Hit) ──────────────────────── + (Γ, x:(T,m))(x) = (T, m) + +(Lookup-Skip) x ≠ y Γ(x) = (T, m) + ───────────────────────────────────── + (Γ, y:(T',m'))(x) = (T, m) +``` + +Attempting to look up a name not present in `Γ` is undefined; the type checker must +report error E0100. + +### 3.3 Scope and block exit + +Scopes are not represented as a separate data structure — they arise naturally from how +the statement rules in §8 thread `Γ` through blocks. + +A block `{ s₁; s₂; …; sₙ }` is checked by extending `Γ` sequentially with each +statement's bindings, yielding a final `Γ_out`. On block **exit**, `Γ_out` is +**discarded** and the environment reverts to `Γ` (the environment at block entry). +This is written: + +``` +Γ ⊢ block ✓ ⇒ Γ_out but the caller uses only Γ, not Γ_out +``` + +The formal rule (T-Block) is given in §8. + +### 3.4 Shadowing and overriding + +Flux adopts **Rust-style re-binding semantics**: a `let` declaration always creates a +*new* binding regardless of whether the name already exists. The behaviour depends on +where the prior binding lives: + +#### Case A — override (same scope) + +Two `let x` declarations within the same block. Both bindings exist in the environment; +lookup returns the second (newer) one. The first is **permanently inaccessible** for the +remainder of that scope and any nested scopes. + +``` +// scope S +let x: i32 = 1; // Γ₁ = Γ, x:(i32, imm) +let x: f64 = 2.0; // Γ₂ = Γ₁, x:(f64, imm) + // = Γ, x:(i32,imm), x:(f64,imm) +// Γ₂(x) = (f64, imm) ← i32 binding is gone +``` + +On exit from scope S, both entries are discarded. + +The type of a re-bound name is allowed to change (e.g. `i32` → `f64` above). This is +intentional: re-binding is a distinct operation from *mutation*. + +#### Case B — shadow (nested scope) + +An inner `let x` in a nested block, where `x` is already bound in an enclosing scope. +The inner binding shadows the outer one inside the nested block. When the nested block +exits, the outer binding is restored. + +``` +let x: i32 = 1; // outer: Γ₁ = Γ, x:(i32, imm) +{ + let x: f64 = 2.0; // inner: Γ₂ = Γ₁, x:(f64, imm) + // Γ₂(x) = (f64, imm) ← outer i32 is shadowed +} +// back to Γ₁: Γ₁(x) = (i32, imm) ← restored +``` + +#### Summary + +| Situation | Old binding accessible? | After scope exit | +| --------------------- | ----------------------- | ---------------- | +| Override (same scope) | No — immediately hidden | Both removed | +| Shadow (inner scope) | No — hidden in inner | Outer restored | + +Both cases are modelled identically in the formal system — the environment is extended +with a new entry and lookup finds the newest. The difference is entirely about which +entries are *removed* when a scope exits: + +- **Block scope**: on exit, all bindings introduced *inside* that block are removed. +- Because scope exit is wholesale, an overriding binding in scope S is removed at the + same time as the overridden one — neither survives past S. + +### 3.5 Interaction with `Σ` and `Φ` + +Struct names (`Σ`) and function names (`Φ`) occupy **separate namespaces** from variable +bindings (`Γ`). A variable `foo` and a function `foo` may coexist; the expression typing +rules in §7 use `Γ` to resolve identifiers used as values and `Φ` to resolve calls. A +name used as a call target (`f(…)`) is looked up in `Φ`; a name used as a plain value is +looked up in `Γ`. There is no shadowing between these namespaces. + +--- + +## 4. Mutability + +Flux has two orthogonal axes of mutability: + +1. **Binding mutability** — whether a name can be rebound (assigned a new value). + Declared with `let` vs. `let mut`. +2. **Pointer mutability** — whether the data pointed to can be written through a pointer. + Declared as `*T` (immutable) vs. `*mut T` (mutable). + +These axes are fully independent; any combination is valid (see §1.4 table). + +### 4.1 Binding mutability + +A **binding** is introduced by `let` and associates a name with a storage location. + +``` +Γ, x : (T, imm) introduced by let x [: T] [= e] ; +Γ, x : (T, mut) introduced by let mut x [: T] [= e] ; +``` + +Function parameters follow the same rule: `[mut] name : T` in a parameter list. +Mutability on a parameter is **local** — the callee may freely rebind it; the caller +never observes the difference. + +### 4.2 Place expressions + +A **place** (lvalue) is an expression that denotes a memory location that can be read +from or written to. Not every expression is a place. + +``` +place ::= x -- variable + | * e -- pointer dereference (e must have a pointer type) + | e . f -- struct field access (e must be a place or struct value) + | e [ i ] -- array subscript (e must be a place or array value) +``` + +All other expression forms (literals, arithmetic, calls, etc.) are **not** places. + +### 4.3 Mutable places + +A place is **mutable** if it can be the target of an assignment. The predicate +`Γ ⊢ e mutable-place` is defined by the following rules: + +``` +(MP-Var) Γ(x) = (T, mut) + ───────────────────────────── + Γ ⊢ x mutable-place + +(MP-Deref) Γ ⊢ e : *mut T + ───────────────────────────── + Γ ⊢ *e mutable-place + +(MP-Field) Γ ⊢ e mutable-place Σ(S).f = T + ───────────────────────────────────── + Γ ⊢ e.f mutable-place + +(MP-Index) Γ ⊢ e mutable-place + ───────────────────────────── + Γ ⊢ e[i] mutable-place +``` + +Note the asymmetry in MP-Deref: dereferencing an *immutable* pointer `*T` yields a +place (it can be read) but **not** a mutable place (it cannot be written). There is no +MP-Deref rule for `*T`. + +### 4.4 Readable places + +Dereferencing any pointer (mutable or not) produces a place that can be **read**. +The predicate `Γ ⊢ e place` is a relaxed version of mutable-place used in expression +typing (§7) for the `*` unary operator: + +``` +(P-Var) x ∈ dom(Γ) + ───────────── + Γ ⊢ x place + +(P-Deref-Imm) Γ ⊢ e : *T + ───────────────── + Γ ⊢ *e place (readable, but not mutable-place) + +(P-Deref-Mut) Γ ⊢ e : *mut T + ───────────────── + Γ ⊢ *e place (readable AND mutable-place via MP-Deref) + +(P-Field) Γ ⊢ e place Σ(S).f = T + ──────────────────────────── + Γ ⊢ e.f place + +(P-Index) Γ ⊢ e place + ───────────────── + Γ ⊢ e[i] place +``` + +### 4.5 Address-of and pointer mutability + +The unary `&` operator produces a pointer. The mutability of the resulting pointer +is determined by the mutability of the place it is applied to: + +``` +(Addrof-Imm) Γ ⊢ e place ¬(Γ ⊢ e mutable-place) + ──────────────────────────────────────────── + Γ ⊢ &e : *T + +(Addrof-Mut) Γ ⊢ e mutable-place + ──────────────────────────────────────────── + Γ ⊢ &e : *mut T +``` + +Taking the address of a mutable place yields a `*mut T`; taking the address of a +read-only place yields a `*T`. This ensures that a `*mut T` cannot be obtained from a +non-mutable binding via `&`. + +### 4.6 Interaction summary + +| Source form | Binding mut? | Ptr mut? | Can rebind `p`? | Can write `*p`? | +| ---------------------------- | ------------ | -------- | --------------- | --------------- | +| `let p: *T = &x` | no | no | no | no | +| `let mut p: *T = &x` | yes | no | yes | no | +| `let p: *mut T = &mut x` | no | yes | no | yes | +| `let mut p: *mut T = &mut x` | yes | yes | yes | yes | + +### 4.7 Pointer subtyping + +A mutable pointer may be used where an immutable pointer is expected (covariant in the +immutable direction). The reverse is not permitted. + +``` +(Ptr-Coerce) Γ ⊢ e : *mut T + ────────────────── + Γ ⊢ e : *T -- *mut T is a subtype of *T +``` + +This is consistent with the promotion relation (§5): `*mut T ⊑ *T`. There is no +implicit coercion in the other direction; an explicit cast is required to obtain +`*mut T` from `*T`. + +--- + +## 5. Numeric Type Hierarchy and Coercions + +Flux supports **implicit widening promotion** for numeric types. Promotions are +one-directional (widening only), never cross a category boundary, and never lose +information. + +### 5.1 Rank + +Each numeric type has an integer **rank** that reflects its bit width within its +category. + +| Type | `rank` | Type | `rank` | +| ----- | ------ | ----- | ------ | +| `u8` | 1 | `i8` | 1 | +| `u16` | 2 | `i16` | 2 | +| `u32` | 3 | `i32` | 3 | +| `u64` | 4 | `i64` | 4 | +| `f32` | 1 | `f64` | 2 | + +`bool` and `char` have no rank; they are not members of `Numeric`. + +### 5.2 Promotion predicate `T ⊑ U` + +`T ⊑ U` holds — meaning a value of type `T` may be implicitly coerced to type `U` — +exactly when one of the following rules applies: + +``` +(Refl) T ⊑ T for any T + +(Unsigned) T ⊑ U if category(T) = category(U) = Unsigned + and rank(T) < rank(U) + +(Signed) T ⊑ U if category(T) = category(U) = Signed + and rank(T) < rank(U) + +(Float) f32 ⊑ f64 + +(Char) char ⊑ U if category(U) = Unsigned + and rank(U) ≥ 3 -- u32 or u64 +``` + +All other combinations are **forbidden**. In particular: + +- Unsigned → Signed is **forbidden** (even if the target is wider). +- Signed → Unsigned is **forbidden**. +- Any integer → Float is **forbidden**. +- Float → integer is **forbidden**. +- Narrowing (rank decreases) is **forbidden** in both directions. +- `char` → Signed is **forbidden**. +- `char` may only promote to `u32` or `u64` (rank ≥ 3 unsigned). + +### 5.3 char and u32 + +`char` is represented as a 32-bit unsigned scalar. It is **not** a synonym for `u32` — +they are distinct types — but a `char` may be implicitly promoted to `u32` or `u64` +when the context requires it (e.g. as an operand to integer arithmetic). + +The reverse (`u32 → char`) is **never** implicit; it requires an explicit cast, since not +every `u32` value is a valid Unicode scalar. + +### 5.4 Common type `common(T, U)` + +For binary operators that require matching operand types, the **common type** of `T` and +`U` is defined as their least upper bound under `⊑`: + +``` +common(T, U) = T if U ⊑ T +common(T, U) = U if T ⊑ U +common(T, U) = ⊥ otherwise -- type error: no common type +``` + +When `T ⊑ U` and `U ⊑ T` simultaneously (which can only occur when `T = U` by Refl), +`common(T, U) = T = U`. + +**Examples:** + +| `T` | `U` | `common(T, U)` | Notes | +| ------ | ----- | -------------- | ---------------------------------- | +| `u8` | `u32` | `u32` | u8 promoted to u32 | +| `u32` | `u8` | `u32` | symmetric | +| `i16` | `i64` | `i64` | i16 promoted to i64 | +| `f32` | `f64` | `f64` | f32 promoted to f64 | +| `char` | `u32` | `u32` | char promoted as u32 equivalent | +| `char` | `u64` | `u64` | char promoted to u64 | +| `u8` | `i32` | `⊥` | error: cross-signedness promotion | +| `i32` | `u64` | `⊥` | error: signed → unsigned forbidden | +| `u32` | `f32` | `⊥` | error: integer → float forbidden | +| `char` | `u16` | `⊥` | error: rank(u16)=2 < 3, below u32 | +| `char` | `i32` | `⊥` | error: char → signed forbidden | + +### 5.5 Literal type inference + +Integer and float literals have no intrinsic type. Their type is determined by: + +1. **Contextual typing** (checked mode): if the surrounding context expects a specific + numeric type `T`, the literal is given type `T` directly, without promotion. The + literal value must be representable in `T` (range check). +2. **Default type** (synthesised mode): when no context is available, literals are given + a default type: + - `INT_LIT` → `i32` + - `FLOAT_LIT` → `f64` + + These defaults apply only when type inference cannot determine a more specific type. + +### 5.6 Explicit casts + +Conversions not covered by `⊑` require an **explicit cast** (narrowing, cross-signedness, +integer↔float, u32→char, etc.). The cast syntax and its rules are not yet in the grammar +and will be specified separately. + +--- + +## 6. Operator Typing + +Each operator has one or more *typing schemes* describing the types it accepts and the +type it produces. + +### 6.1 Binary operators + +For operators that accept two numeric operands, the operands are first coerced to their +`common(T₁, T₂)` type (§5.4) before the operation is performed. If `common` is `⊥` the +expression is a type error. + +| Operators | Required operand types | Result type | Notes | +| ------------------- | --------------------------------------------------------------------------------------------- | ---------------- | ----------------------------------------------------------------------------------------------- | +| `+` `-` `*` `/` `%` | `T₁`, `T₂` where `common(T₁, T₂) ∈ Numeric` | `common(T₁, T₂)` | Both operands promoted to common type | +| `<<` `>>` | LHS `T` any integer; RHS `U` any **unsigned** integer | `T` | Only LHS type determines the result; no promotion of the LHS; RHS must be unsigned (see §6.1.1) | +| `&` `\|` `^` | `T₁`, `T₂` where `common(T₁, T₂) ∈ Integer` | `common(T₁, T₂)` | Both operands promoted to common integer type | +| `and` `or` | `bool`, `bool` | `bool` | Short-circuit; no promotion applies | +| `==` `!=` | numeric/bool/char: `common(T₁,T₂) ∈ Numeric ∪ {bool, char}`; or pointer equality (see §6.1.2) | `bool` | Structs and arrays are not comparable | +| `<` `>` `<=` `>=` | `T₁`, `T₂` where `common(T₁, T₂) ∈ Numeric` | `bool` | Both operands promoted to common type | +| `=` | LHS: mutable place of type `T`; RHS: `T₂` where `T₂ ⊑ T` | `T` | Result is the assigned value; enables chaining (see §6.1.3) | +| `op=` | Same as `=` after desugaring `lhs op= rhs` → `lhs = lhs op rhs` | `T` | LHS must be a mutable place | + +#### 6.1.1 Shift operator detail + +`a << b` and `a >> b` use **asymmetric typing**: + +- `a` may be any integer type `T`; `T` is also the result type. +- `b` must be any **unsigned** integer type (`u8`, `u16`, `u32`, `u64`). Signed or + other types are rejected (error E0401). +- The two operands are **never** subject to `common` — LHS and RHS types are checked + independently. The LHS is never promoted to match the RHS. + +``` +(T-Shl / T-Shr) Γ ⊢ a : T T ∈ Integer + Γ ⊢ b : U U ∈ Unsigned + ──────────────────────────────── + Γ ⊢ a << b : T (resp. a >> b : T) +``` + +Rationale: shifting a `u8` value by a `u32` amount is idiomatic and well-defined. +Requiring both operands to be the same type would be unnecessarily restrictive. Requiring +the amount to be unsigned makes negative shift amounts a compile-time error rather than +undefined behaviour. + +#### 6.1.2 Pointer equality + +Two pointer expressions may be compared with `==` and `!=` to test whether they point to +the same address. Pointer comparison is type-safe: the two operands must have a **common +pointer type** after applying the Ptr-Coerce rule (§4.7). + +``` +(T-PtrEq) Γ ⊢ e₁ : P₁ Γ ⊢ e₂ : P₂ ptr-common(P₁, P₂) = P + ─────────────────────────────────────────────────────────── + Γ ⊢ e₁ == e₂ : bool (resp. e₁ != e₂ : bool) +``` + +Where `ptr-common` is defined as: + +``` +ptr-common(*mut T, *mut T) = *mut T +ptr-common(*mut T, *T) = *T -- Ptr-Coerce applied to LHS +ptr-common(*T, *mut T) = *T -- Ptr-Coerce applied to RHS +ptr-common(*T, *T) = *T +ptr-common(*mut opaque, *mut opaque) = *mut opaque +ptr-common(*mut opaque, *opaque) = *opaque +ptr-common(*opaque, *mut opaque) = *opaque +ptr-common(*opaque, *opaque) = *opaque +ptr-common(_, _) = ⊥ -- mixed typed/opaque or different pointees +``` + +Comparing a typed pointer `*T` with an opaque pointer `*opaque` is **not** permitted +implicitly; an explicit cast is required. Comparing `*T` with `*U` where `T ≠ U` is also +not permitted. + +Ordering operators (`<`, `>`, `<=`, `>=`) are **not** defined for pointer types. + +#### 6.1.3 Assignment as an expression + +The `=` operator returns the assigned value with the **LHS type** `T`. This makes +assignments first-class expressions and enables chained assignment: + +``` +a = b = 0 -- parses as a = (b = 0) (right-associative) +``` + +Evaluation proceeds right-to-left: +1. `b = 0` — assigns `0` to `b` (type `T_b`), result is `0 : T_b`. +2. `a = (0 : T_b)` — requires `T_b ⊑ T_a`; assigns to `a`, result is the value `T_a`. + +The formal rule: + +``` +(T-Assign) Γ ⊢ e₁ mutable-place Γ ⊢ e₁ : T + Γ ⊢ e₂ : T₂ T₂ ⊑ T + ───────────────────────────────────── + Γ ⊢ e₁ = e₂ : T +``` + +The compound assignment `e₁ op= e₂` desugars before typing: + +``` +(T-CompoundAssign) e₁ op= e₂ ≡ e₁ = e₁ op e₂ +``` + +After desugaring, T-Assign applies. The LHS expression `e₁` is evaluated only once at +runtime (the desugaring is semantic, not textual); both uses of `e₁` refer to the same +place. + +### 6.2 Unary operators + +| Operator | Accepted type | Result type | Notes | +| -------- | -------------------------- | ----------- | --------------------------------------------- | +| `-` | any numeric `T` | `T` | Arithmetic negation | +| `!` | `bool` | `bool` | Logical NOT | +| `~` | any integer `T` | `T` | Bitwise complement | +| `*` | `*T` or `*mut T` | `T` | Dereference; yields a readable place (§4.4) | +| `&` | readable place of type `T` | `*T` | Address-of immutable place (Addrof-Imm, §4.5) | +| `&` | mutable place of type `T` | `*mut T` | Address-of mutable place (Addrof-Mut, §4.5) | + +The two `&` rows are not overloading — the single `&` operator produces `*mut T` or `*T` +depending on whether its operand is a mutable place (§4.5). The unary `*` operator +applied to `*T` yields a readable-but-not-mutable place; applied to `*mut T` it yields a +readable-and-mutable place (rules P-Deref-Imm / P-Deref-Mut, §4.4). + +--- + +## 7. Expression Typing + +Judgement form: `Γ, Σ, Φ ⊢ e : T` (written `Γ ⊢ e : T` when the tables are clear from +context). + +### 7.1 Typing strategy + +Expression typing uses **synthesis** (bottom-up inference): the type of an expression is +*produced* from its sub-expressions; no expected type is pushed down from the enclosing +context. The single exception is at a `let` binding with an explicit type annotation, +where the annotation is used to give numeric literals a more specific type than their +default (§5.5). This is handled in T-Let (§8); the rules in this section always +synthesise. + +### 7.2 Built-in: `string_view` + +String literals produce a value of the built-in struct type **`string_view`**, which is +pre-loaded into `Σ` before any user code is checked: + +``` +string_view ::= struct { data: *char, size: u64 } +``` + +The name `string_view` is **reserved** — a user-defined struct with this name is a type +error. A `STRING_LIT` token is compiled to a `string_view` value whose `data` field +points to the UTF-8 bytes of the string and `size` holds the byte length. + +> Future: when Flux gains slice types, `string_view` will be replaced or superseded by +> a first-class slice `[char]`. + +### 7.3 Inference rules + +#### Literals and booleans + +``` +(T-IntLit) ───────────────────────── + Γ ⊢ INT_LIT : i32 + -- default type; T-Let may supply a contextual integer type (§5.5, §8) + +(T-FloatLit) ───────────────────────── + Γ ⊢ FLOAT_LIT : f64 + -- default type; T-Let may supply a contextual float type (§5.5, §8) + +(T-StrLit) ───────────────────────────────────────── + Γ ⊢ STRING_LIT : struct string_view + +(T-CharLit) ───────────────────────── + Γ ⊢ CHAR_LIT : char + +(T-True) ───────────────────────── + Γ ⊢ true : bool + +(T-False) ───────────────────────── + Γ ⊢ false : bool +``` + +#### Variable + +``` +(T-Var) Γ(x) = (T, m) + ───────────────── + Γ ⊢ x : T +``` + +`x` may be any binding mutability `m`; the type is `T` regardless. Mutability is only +consulted when `x` is the LHS of an assignment (MP-Var, §4.3). + +#### Grouping + +``` +(T-Paren) Γ ⊢ e : T + ───────────── + Γ ⊢ (e) : T +``` + +#### Unary operators + +``` +(T-Neg) Γ ⊢ e : T T ∈ Numeric + ───────────────────────────── + Γ ⊢ -e : T + +(T-Not) Γ ⊢ e : bool + ────────────── + Γ ⊢ !e : bool + +(T-BitNot) Γ ⊢ e : T T ∈ Integer + ───────────────────────────── + Γ ⊢ ~e : T + +(T-Deref) Γ ⊢ e : *T (also valid for *mut T, by Ptr-Coerce §4.7) + ───────────────────────────── + Γ ⊢ *e : T + +(T-AddrOf) -- see §4.5: yields *T or *mut T depending on place mutability + Γ ⊢ e place (readable place) + ───────────────────────────────────────── + Γ ⊢ &e : *T if ¬(Γ ⊢ e mutable-place) [Addrof-Imm] + Γ ⊢ &e : *mut T if Γ ⊢ e mutable-place [Addrof-Mut] +``` + +#### Binary operators + +All binary operator rules follow the typing schemes in §6.1. Concisely: + +``` +(T-Arith) Γ ⊢ e₁ : T₁ Γ ⊢ e₂ : T₂ + C = common(T₁, T₂) C ∈ Numeric + ──────────────────────────────────── + Γ ⊢ e₁ {+,-,*,/,%} e₂ : C + +(T-Shift) Γ ⊢ e₁ : T T ∈ Integer + Γ ⊢ e₂ : U U ∈ Unsigned + ───────────────────────────── + Γ ⊢ e₁ {<<,>>} e₂ : T + +(T-BitBin) Γ ⊢ e₁ : T₁ Γ ⊢ e₂ : T₂ + C = common(T₁, T₂) C ∈ Integer + ──────────────────────────────────── + Γ ⊢ e₁ {&,|,^} e₂ : C + +(T-Logic) Γ ⊢ e₁ : bool Γ ⊢ e₂ : bool + ─────────────────────────────────── + Γ ⊢ e₁ {and,or} e₂ : bool + +(T-CmpNum) Γ ⊢ e₁ : T₁ Γ ⊢ e₂ : T₂ + common(T₁, T₂) ∈ Numeric ∪ {bool, char} + ─────────────────────────────────────────── + Γ ⊢ e₁ {==,!=} e₂ : bool + +(T-CmpPtr) Γ ⊢ e₁ : P₁ Γ ⊢ e₂ : P₂ ptr-common(P₁, P₂) ≠ ⊥ (§6.1.2) + ─────────────────────────────────────────────────────────── + Γ ⊢ e₁ {==,!=} e₂ : bool + +(T-Ord) Γ ⊢ e₁ : T₁ Γ ⊢ e₂ : T₂ + common(T₁, T₂) ∈ Numeric + ────────────────────────────── + Γ ⊢ e₁ {<,>,<=,>=} e₂ : bool + +(T-Assign) Γ ⊢ e₁ mutable-place Γ ⊢ e₁ : T + Γ ⊢ e₂ : T₂ T₂ ⊑ T + ───────────────────────────────────── + Γ ⊢ e₁ = e₂ : T (§6.1.3) + +(T-CompAssign) e₁ op= e₂ desugars to e₁ = e₁ op e₂; apply T-Assign (§6.1.3) +``` + +#### Postfix: field access + +``` +(T-Field) Γ ⊢ e : struct S Σ(S).f = T + ───────────────────────────────── + Γ ⊢ e.f : T +``` + +If `f` is not a field of `S`, error E0503. If `e` does not have a struct type, error E0502. + +#### Postfix: array index + +``` +(T-Index) Γ ⊢ e : [T; N] Γ ⊢ i : U U ∈ Unsigned + ────────────────────────────────────────────── + Γ ⊢ e[i] : T +``` + +If `e` is not an array type, error E0600. If `i` is not an unsigned integer, error E0601. + +#### Postfix: function call + +Functions are looked up in the **function table** `Φ`, separate from variable bindings. +The callee of a call expression must be a bare identifier (function pointers are not yet +supported). + +``` +(T-FuncName) Φ(f) = (T₁, …, Tₙ) -> T_ret + ────────────────────────────────────────────────────── + Γ ⊢ f : FnSig(T₁, …, Tₙ) -> T_ret (internal type; §1.5) + +(T-Call) Γ ⊢ e_f : FnSig(T₁, …, Tₙ) -> T_ret + Γ ⊢ a₁ : U₁ U₁ ⊑ T₁ + ⋮ + Γ ⊢ aₙ : Uₙ Uₙ ⊑ Tₙ + ────────────────────────────────────────────────────── + Γ ⊢ e_f(a₁, …, aₙ) : T_ret +``` + +If the argument count does not match the parameter count, error E0205. If argument type +`Uᵢ` is not promotable to `Tᵢ`, error E0204. If the name is not in `Φ`, error E0102. + +#### Struct literal + +``` +(T-StructLit) S ∈ dom(Σ) + Σ(S) = { f₁: T₁, …, fₙ: Tₙ } (n fields, in declaration order) + the literal supplies fields {g₁, …, gₖ} (in any order) + {g₁, …, gₖ} = {f₁, …, fₙ} (same set; field order is irrelevant) + ∀ i ∈ 1..n. Γ ⊢ e_{gᵢ} : Uᵢ ∧ Uᵢ ⊑ Tᵢ + ────────────────────────────────────────────────────────────────── + Γ ⊢ S { g₁: e₁, …, gₙ: eₙ } : struct S +``` + +The set equality `{g₁, …, gₖ} = {f₁, …, fₙ}` is strict: +- A field present in `Σ(S)` but absent from the literal → error E0500 (missing field). +- A field present in the literal but absent from `Σ(S)` → error E0501 (extra field). +- A field listed twice in the literal → error E0501 (duplicate counts as extra). + +--- + +## 8. Statement Typing + +### 8.1 Judgment form + +``` +Γ, A, R, L ⊢ s ✓ ⇒ Γ', A', cf +``` + +| Component | Meaning | +| --------- | ------------------------------------------------------------------------------------------ | +| `Γ` | Type environment (maps names to `(T, m)`); same as §3 | +| `A` | **Definitely-assigned set** — `A ⊆ dom(Γ)`; names provably initialised on all paths so far | +| `R` | Expected return type of the enclosing function | +| `L` | Loop context: `L ∈ { noloop, inloop }` | +| `Γ'` | Output type environment (may have new bindings from `let`) | +| `A'` | Output definitely-assigned set | +| `cf` | Control-flow outcome: `cf ∈ { normal, diverges }` | + +`cf = diverges` means the statement never falls through to the next statement (because of `return`, `break`, or `continue`). `cf = normal` means it may fall through. + +**T-Var** from §7.3 is refined to require definite assignment: + +``` +(T-Var) Γ(x) = (T, m) x ∈ A + ────────────────────────── + Γ, A ⊢ x : T +``` + +If `x ∉ A` (declared but not yet definitely assigned), it is a type error (use-before-init, error E0100 extended). + +### 8.2 Definite-assignment join + +When two control-flow paths merge (in `if-else`), the output `A` is the **intersection** +of the two branch sets, intersected with the outer `A`: + +``` +join_A(A_then, A_else, A) = A ∪ (A_then ∩ A_else) +``` + +A variable becomes definitely assigned after the merge only if it was assigned in **both** +branches. Variables assigned in only one branch remain unassigned after the merge. + +For conditional statements without an else (or `while`/`loop` with uncertain iterations), +the output `A` is conservatively `A` (the input set, unchanged): + +``` +join_A(A_branch, A) = A -- for single-branch paths +``` + +### 8.3 Unreachable code + +Any statement that immediately follows a statement with `cf = diverges` is **unreachable**. +Unreachable statements are still type-checked (to catch errors in dead code) but the +compiler emits warning **W001** for the first unreachable statement in each run of dead +code. + +### 8.4 Loop context + +`L` is a single flag that threads through every statement rule unchanged, **except** that +`while` and `loop` bodies are checked with `L = inloop` regardless of the outer `L`. + +``` +L_body = inloop (when checking body of while/loop) +L_inner = L (for if/else/block — L passes through unchanged) +``` + +This makes `break` and `continue` valid anywhere inside a loop body, regardless of how +many intermediate `if`, `else`, or inner blocks separate them from the `while`/`loop` +keyword. Crossing a function boundary resets `L = noloop` at the function level (§9). + +### 8.5 Statement rules + +#### T-Let + +``` +(T-Let-Infer) Γ, A ⊢ e : T (T synthesised; no annotation) + ───────────────────────────────────────────────────────────────── + Γ, A, R, L ⊢ let [mut] x = e ; ✓ ⇒ Γ, x:(T,m) A∪{x} normal + +(T-Let-Ann-Init) ⊢ T wf + Γ, A ⊢ e : U U ⊑ T + -- if e is INT_LIT/FLOAT_LIT and T is numeric: U = T directly (§5.5) + ───────────────────────────────────────────────────────────────── + Γ, A, R, L ⊢ let [mut] x : T = e ; ✓ ⇒ Γ, x:(T,m) A∪{x} normal + +(T-Let-Ann-NoInit) ⊢ T wf + ───────────────────────────────────────────────────────────────── + Γ, A, R, L ⊢ let [mut] x : T ; ✓ ⇒ Γ, x:(T,m) A normal + -- x is NOT added to A; use before assignment → E0100 +``` + +No annotation and no initialiser → error E1000. + +In all T-Let rules, `m = imm` if no `mut` keyword, `m = mut` if `mut` is present. + +#### T-Return + +``` +(T-Return-Val) Γ, A ⊢ e : U U ⊑ R + ───────────────────────────────────────────────────── + Γ, A, R, L ⊢ return e ; ✓ ⇒ Γ A diverges + +(T-Return-Unit) R = () + ───────────────────────────────────────────────────── + Γ, A, R, L ⊢ return ; ✓ ⇒ Γ A diverges +``` + +#### T-ExprStmt + +``` +(T-ExprStmt) Γ, A ⊢ e : T + A' = A ∪ { x | x ∈ assigns(e) ∩ dom(Γ) } + ───────────────────────────────────────────────────── + Γ, A, R, L ⊢ e ; ✓ ⇒ Γ A' normal +``` + +Where `assigns(e)` collects all variables directly assigned anywhere in `e`: + +``` +assigns(x = e') = {x} ∪ assigns(e') +assigns(x op= e') = assigns(e') -- x already def (op uses it) +assigns(f(e₁,…)) = ∪ assigns(eᵢ) +assigns(S { f₁: e₁, …, fₙ: eₙ }) = ∪ assigns(eᵢ) +assigns(e.f) = assigns(e) +assigns(e[i]) = assigns(e) ∪ assigns(i) +assigns((e)) = assigns(e) +assigns(op e) = assigns(e) +assigns(e₁ binop e₂) = assigns(e₁) ∪ assigns(e₂) +assigns(literal/ident) = ∅ +``` + +#### T-If + +``` +(T-If-Then) Γ, A ⊢ cond : bool + Γ, A, R, L ⊢ { then_block } ✓ ⇒ _, A_then, _ + ───────────────────────────────────────────────────────────────────── + Γ, A, R, L ⊢ if cond { then_block } ✓ ⇒ Γ A normal + -- no else: A unchanged (branch may not execute) + +(T-If-Else) Γ, A ⊢ cond : bool + Γ, A, R, L ⊢ { then_block } ✓ ⇒ _, A_then, cf_t + Γ, A, R, L ⊢ else_branch ✓ ⇒ _, A_else, cf_e + ───────────────────────────────────────────────────────────────────── + Γ, A, R, L ⊢ if cond { then_block } else else_branch ✓ + ⇒ Γ join_A(A_then, A_else, A) join_cf(cf_t, cf_e) +``` + +`else_branch` is either an `if` statement (else-if chain) or a block `{ … }`. + +``` +join_cf(diverges, diverges) = diverges +join_cf(normal, _) = normal +join_cf(_, normal) = normal +``` + +The `Γ` output of T-If discards any bindings introduced inside the branches (scope exit, +§3.3); only the outer `Γ` is returned. + +#### T-While + +``` +(T-While) Γ, A ⊢ cond : bool + Γ, A, R, inloop ⊢ { body } ✓ ⇒ _, _, _ + ───────────────────────────────────────────────────────────────────── + Γ, A, R, L ⊢ while cond { body } ✓ ⇒ Γ A normal +``` + +The body is checked with `L = inloop`. The output `A` is conservatively `A` (the loop +may execute zero times). `cf = normal` (the loop may terminate when the condition +becomes false). + +#### T-Loop + +``` +(T-Loop) Γ, A, R, inloop ⊢ { body } ✓ ⇒ _, _, _ + ───────────────────────────────────────────────────────────────────── + Γ, A, R, L ⊢ loop { body } ✓ ⇒ Γ A normal +``` + +`cf = normal` conservatively (a `break` inside exits the loop). Output `A` is `A` +(the body may be bypassed by an immediate `break`). + +#### T-Break / T-Continue + +``` +(T-Break) L = inloop + ───────────────────────────────────────── + Γ, A, R, L ⊢ break ; ✓ ⇒ Γ A diverges + +(T-Continue) L = inloop + ───────────────────────────────────────── + Γ, A, R, L ⊢ continue ; ✓ ⇒ Γ A diverges +``` + +If `L = noloop`, `break` → E0800, `continue` → E0801. + +#### T-Block + +A block `{ s₁; s₂; …; sₙ }` is checked by chaining statements sequentially. Bindings +introduced inside are removed on exit (scope restriction, §3.3). + +``` +(T-Block-Empty) ─────────────────────────────────────────────────────────────── + Γ, A, R, L ⊢ { } ✓ ⇒ Γ A normal + +(T-Block-Step) Γ, A, R, L ⊢ s₁ ✓ ⇒ Γ₁, A₁, normal + Γ₁, A₁, R, L ⊢ { s₂; …; sₙ } ✓ ⇒ Γₙ, Aₙ, cf + ─────────────────────────────────────────────────────────────── + Γ, A, R, L ⊢ { s₁; s₂; …; sₙ } ✓ ⇒ restrict(Γₙ, Γ) restrict_A(Aₙ, Γ) cf + +(T-Block-Dead) Γ, A, R, L ⊢ s₁ ✓ ⇒ Γ₁, A₁, diverges + -- s₂, …, sₙ are unreachable; emit W001; still type-check them + Γ₁, A₁, R, L ⊢ { s₂; …; sₙ } ✓ ⇒ _, _, _ (results discarded) + ─────────────────────────────────────────────────────────────── + Γ, A, R, L ⊢ { s₁; s₂; …; sₙ } ✓ ⇒ restrict(Γ₁, Γ) restrict_A(A₁, Γ) diverges +``` + +Where: +- `restrict(Γ', Γ)` removes all bindings in `Γ'` that are not in `dom(Γ)` (inner let bindings). +- `restrict_A(A', Γ)` removes all names from `A'` that are not in `dom(Γ)`. + +--- + +## 9. Function Definitions + +Judgement form: `Σ, Φ ⊢ func_def ✓` + +The function table `Φ` is pre-populated with the signatures of **all** functions in the +program before any body is checked (first pass, §11). This gives every function access +to its own name for recursion and to any other function regardless of declaration order. + +### 9.1 Building the initial environment + +Given a function `fn f(p₁, …, pₙ) [-> R] { body }` where each parameter is +`[mut] xᵢ : Tᵢ`: + +``` +Γ_init = x₁:(T₁, m₁), …, xₙ:(Tₙ, mₙ) -- ordered; rightmost wins for lookup +A_init = { x₁, …, xₙ } -- all params are definitely assigned on entry +``` + +where `mᵢ = mut` if `mut` is written, `mᵢ = imm` otherwise. + +All parameter types must be well-formed: `⊢ Tᵢ wf` for each `i`. + +**Duplicate parameter names** are an error (E0902): all names `x₁, …, xₙ` must be +pairwise distinct. The check is performed before the environment is built. + +### 9.2 Return type + +The declared return type `R` is taken from the `-> T` annotation. If the annotation is +absent, `R = ()` (unit; the function returns no value). + +### 9.3 All-paths-return + +The function body is checked as a block statement with `L = noloop`. The resulting +control-flow outcome `cf` determines whether the return requirement is satisfied: + +| Return type | Required `cf` | Otherwise | +| ----------- | ---------------------- | ----------- | +| `R = ()` | `normal` or `diverges` | — always OK | +| `R ≠ ()` | `diverges` | error E1001 | + +When `R = ()` and the body falls through (`cf = normal`), the function implicitly +returns `()` at end of body — no explicit `return` is needed. An explicit `return;` +(bare, no value) is also allowed anywhere in a `()` function (T-Return-Unit, §8). + +When `R ≠ ()`, every control-flow path through the body must end with an explicit +`return e;` where `e : U` and `U ⊑ R`. If the body's `cf = normal` (some path falls +through without a `return`), error E1001 is reported. + +**Note on loop conservatism**: `while` and `loop` statements always output `cf = normal` +(§8.4). A function body that ends with a `loop { return e; }` and no following `return` +will fail the all-paths check (`cf = normal` from the loop). Add an explicit `return` +after the loop to satisfy the checker, or use `if-else` chains that both return. + +### 9.4 Formal rule + +``` +(T-FuncDef) f ∈ dom(Φ) Φ(f) = (T₁, …, Tₙ) -> R -- f pre-loaded by §11 first pass + ∀ i. ⊢ Tᵢ wf -- param types well-formed + all parameter names x₁, …, xₙ distinct -- else E0902 + Γ_init = x₁:(T₁,m₁), …, xₙ:(Tₙ,mₙ) + A_init = { x₁, …, xₙ } + Γ_init, A_init, R, noloop ⊢ { body } ✓ ⇒ _, _, cf + R = () or cf = diverges -- else E1001 + ────────────────────────────────────────────────────────────────────────── + Σ, Φ ⊢ fn f(params) [-> R] { body } ✓ +``` + +### 9.5 Recursion + +Direct recursion (`f` calling `f`) and mutual recursion (`f` calling `g`, `g` calling +`f`) are both permitted. Because `Φ` is fully populated before any body is checked +(§11), `T-FuncName` (§7.3) can resolve any call to `f` inside `f`'s own body without +any special treatment in `T-FuncDef`. + +--- + +## 10. Struct Definitions + +Judgement form: `Σ ⊢ struct_def ✓` + +`Σ` is pre-populated with all struct names in the program before any struct body is +checked (first pass, §11). This allows forward references and mutually recursive pointer +types. + +### 10.1 Size-dependency and cycles + +Define `struct-names(T)` — the set of struct names that appear **directly** (not behind +a pointer) in type `T`: + +``` +struct-names(struct S) = { S } +struct-names([T; N]) = struct-names(T) -- array has same direct dependency as elem +struct-names(*T) = ∅ -- pointer indirection breaks size dependency +struct-names(*mut T) = ∅ +struct-names(*opaque) = ∅ +struct-names(*mut opaque) = ∅ +struct-names(primitive) = ∅ +struct-names(()) = ∅ +``` + +Define the **direct-uses** relation on struct names: + +``` +direct-uses(S) = ⋃ { struct-names(Tᵢ) | fᵢ : Tᵢ ∈ Σ(S) } +``` + +Let `TC(direct-uses)` be its transitive closure. A struct `S` has an **infinite-size +cycle** if `S ∈ TC(direct-uses)(S)`. Such cycles are forbidden (error E0900). + +Examples: + +``` +struct Node { next: *Node } -- direct-uses(Node) = ∅ → no cycle ✓ +struct Bad { inner: Bad } -- direct-uses(Bad) = {Bad} → cycle E0900 +struct A { b: B } -- direct-uses(A) = {B} +struct B { a: *A } -- direct-uses(B) = ∅ → A→B→ (stop); no cycle ✓ +struct C { d: D } -- direct-uses(C) = {D} +struct D { c: C } -- direct-uses(D) = {C} → C→D→C: cycle E0900 +``` + +### 10.2 Formal rule + +``` +(T-StructDef) S ∈ dom(Σ) + Σ(S) = { f₁: T₁, …, fₙ: Tₙ } + all field names f₁, …, fₙ pairwise distinct -- else E0901 + ∀ i ∈ 1..n. ⊢ Tᵢ wf -- else E0101 / type error + S ∉ TC(direct-uses)(S) -- else E0900 + ────────────────────────────────────────────────────────────────────── + Σ ⊢ struct S { f₁: T₁, …, fₙ: Tₙ } ✓ +``` + +An empty struct (`n = 0`) is valid — it has no fields to check and no cycles. + +--- + +## 11. Program Typing + +Judgement form: `⊢ program ✓` + +A program is a sequence of top-level definitions (structs and functions) in any order. +Checking proceeds in four passes to support mutual recursion and forward references. + +### 11.1 Passes + +#### Pass 1 — Collect declarations + +Scan every top-level definition and populate: +- `Σ` — map from struct name to the sequence of field names/types (bodies **not** checked yet) +- `Φ` — map from function name to its signature `(T₁,…,Tₙ) -> R` (bodies **not** checked yet) + +Errors detected in pass 1: +- **E0103**: two `struct` definitions with the same name. +- **E0104**: two `fn` definitions with the same name. +- **E0101** (forward): a type annotation in a signature or field references a struct name not in `Σ` — reported immediately since signatures are parsed in full during pass 1. + +#### Pass 2 — Check struct bodies + +For each struct definition, apply T-StructDef (§10.2) under the completed `Σ`. Errors +checked: E0901 (duplicate field), ⊢ Tᵢ wf (E0101), E0900 (size cycle). The cycle check +is performed globally across all structs using `TC(direct-uses)`. + +#### Pass 3 — Check function bodies + +For each function definition, apply T-FuncDef (§9.4) under the completed `Σ` and `Φ`. +Errors checked: E0902 (duplicate param), E1001 (missing return), and all expression/ +statement errors from §6–§9. + +#### Pass 4 — Entry point (optional) + +If the compiler is invoked with `--entry `: +- `name ∈ dom(Φ)` — else error E0102 (undefined function for entry). +- The signature `Φ(name)` must be compatible with the entry convention (currently + no required signature; any function may serve as entry). + +### 11.2 Formal rule + +``` +(T-Program) Σ, Φ built from pass 1 (no duplicates — E0103/E0104) + ∀ struct_def in program. Σ ⊢ struct_def ✓ -- pass 2 + ∀ func_def in program. Σ, Φ ⊢ func_def ✓ -- pass 3 + entry ∈ dom(Φ) (if --entry specified) -- pass 4 + ────────────────────────────────────────────────────────────────────── + ⊢ program ✓ +``` + +--- + +## 12. Error Catalogue + +Each error has a code, a category, a message template (using `{…}` for placeholders), +and the spans to highlight in the diagnostic. Warnings use the `W` prefix. + +### 12.1 Name errors (E01xx) + +| Code | Category | Message template | Span(s) | +| ----- | ----------------------- | --------------------------------------------- | ----------------- | +| E0100 | Undefined variable | `cannot find value '{name}' in this scope` | use site | +| E0101 | Undefined struct type | `cannot find type '{name}' in this scope` | type annotation | +| E0102 | Undefined function | `cannot find function '{name}' in this scope` | call site | +| E0103 | Duplicate struct name | `struct '{name}' is defined more than once` | second definition | +| E0104 | Duplicate function name | `function '{name}' is defined more than once` | second definition | + +*Note*: use of a declared-but-uninitialized variable (§8.1) is reported as E0100 with +the message `use of possibly-uninitialized variable '{name}'`. + +### 12.2 Type mismatch errors (E02xx) + +| Code | Category | Message template | Span(s) | +| ----- | ------------------------ | ----------------------------------------------------------------- | ---------------- | +| E0200 | Operator type mismatch | `operator '{op}' cannot be applied to types '{T}' and '{U}'` | operator | +| E0201 | Assignment type mismatch | `cannot assign value of type '{U}' to binding of type '{T}'` | initialiser expr | +| E0202 | Condition not bool | `condition must be of type 'bool', found '{T}'` | condition expr | +| E0203 | Return type mismatch | `cannot return value of type '{U}' from function returning '{R}'` | return expr | +| E0204 | Argument type mismatch | `argument {i} has type '{U}', expected '{T}'` | argument expr | +| E0205 | Wrong argument count | `function '{f}' expects {n} argument(s) but {k} were supplied` | call expression | + +### 12.3 Mutability errors (E03xx) + +| Code | Category | Message template | Span(s) | +| ----- | ------------------- | --------------------------------------------------------------- | ----------------- | +| E0300 | Assign to immutable | `cannot assign to '{name}' because it is not declared as 'mut'` | LHS, binding site | +| E0301 | Assign to non-place | `left-hand side of assignment is not a valid place expression` | LHS expr | + +### 12.4 Operator errors (E04xx) + +| Code | Category | Message template | Span(s) | +| ----- | ------------------------- | -------------------------------------------------------------------------- | ------------ | +| E0400 | Operator type mismatch | `operator '{op}' requires compatible numeric types, found '{T}' and '{U}'` | operator | +| E0401 | Shift amount not unsigned | `shift amount must be an unsigned integer type, found '{U}'` | RHS of shift | + +### 12.5 Struct literal errors (E05xx) + +| Code | Category | Message template | Span(s) | +| ----- | -------------------- | ----------------------------------------------------- | --------------------- | +| E0500 | Missing struct field | `missing field '{f}' in initialiser for struct '{S}'` | struct literal | +| E0501 | Extra struct field | `struct '{S}' has no field named '{f}'` | field name in literal | +| E0502 | Field on non-struct | `type '{T}' has no fields` | field access expr | +| E0503 | No such field | `struct '{S}' has no field named '{f}'` | field name in access | + +### 12.6 Index errors (E06xx) + +| Code | Category | Message template | Span(s) | +| ----- | --------------------------- | ----------------------------------------------------------- | ---------- | +| E0600 | Index on non-array | `type '{T}' cannot be indexed` | index expr | +| E0601 | Index type not unsigned int | `array index must be an unsigned integer type, found '{U}'` | index expr | + +### 12.7 Pointer errors (E07xx) + +| Code | Category | Message template | Span(s) | +| ----- | ---------------- | ---------------------------------------------- | ----------- | +| E0700 | Deref of non-ptr | `type '{T}' cannot be dereferenced` | deref expr | +| E0701 | Addrof non-place | `cannot take the address of a temporary value` | addrof expr | + +### 12.8 Control-flow errors (E08xx) + +| Code | Category | Message template | Span(s) | +| ----- | ----------------------- | ----------------------------------- | ------------------ | +| E0800 | `break` outside loop | `'break' used outside of a loop` | `break` keyword | +| E0801 | `continue` outside loop | `'continue' used outside of a loop` | `continue` keyword | + +### 12.9 Struct definition errors (E09xx) + +| Code | Category | Message template | Span(s) | +| ----- | --------------------------------- | ------------------------------------------------------------------ | ---------------- | +| E0900 | Recursive struct (no indirection) | `struct '{S}' has infinite size due to recursive field '{f}: {T}'` | field type | +| E0901 | Duplicate struct field | `field '{f}' is defined more than once in struct '{S}'` | second field | +| E0902 | Duplicate function parameter | `parameter '{x}' is defined more than once in function '{f}'` | second parameter | + +### 12.10 Binding errors (E10xx) + +| Code | Category | Message template | Span(s) | +| ----- | ------------------- | ------------------------------------------------------------------- | ------------- | +| E1000 | No type for binding | `cannot infer type for '{x}': no annotation and no initialiser` | `let` keyword | +| E1001 | Missing return | `function '{f}' must return '{R}' but not all paths return a value` | function name | + +### 12.11 Warnings (Wxxx) + +| Code | Category | Message template | Span(s) | +| ---- | ---------------- | ----------------------- | --------------------------- | +| W001 | Unreachable code | `unreachable statement` | first unreachable statement |