Files
flux/SEMANTICS.md
Jooris Hadeler eda4f92900 Docs: finalise SEMANTICS.md type system specification
All 12 sections are now fully defined. Removed incremental-progress
markers (Status: defined), updated intro and outline preamble, and
fixed two missing cases in the assigns() function (struct literals and
parenthesised expressions).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-11 11:17:31 +01:00

63 KiB
Raw Blame History

Flux Language Semantics

This document formalises the static semantics — primarily the type system — of Flux.

Notation used throughout:

Symbol Meaning
Γ Type environment (maps names to their type + mutability)
Σ Struct table (maps struct names to their field lists)
Φ Function table (maps function names to their signatures)
Γ ⊢ e : T Under environment Γ, expression e has type T
Γ ⊢ s ✓ Under environment Γ, statement s is well-typed
⊢ T wf Type T is well-formed
T₁ ~ T₂ Types T₁ and T₂ are compatible (assignable)
T ⊑ U T can be implicitly promoted to U
common(T, U) Least upper bound of T and U under ; error if none exists
rank(T) Numeric rank of type T (width class; defined in §5)
category(T) Numeric category of T: Unsigned, Signed, Float, or Char

Outline

The following sections form the complete type system, in dependency order — each section relies only on those above it.

  1. Types
  2. Well-Formedness of Types
  3. Type Environments
  4. Mutability
  5. Numeric Type Hierarchy and Coercions
  6. Operator Typing
  7. Expression Typing
  8. Statement Typing
  9. Function Definitions
  10. Struct Definitions
  11. Program Typing
  12. Error Catalogue

1. Types

The type language — the set of types that Flux values can have.

1.1 Primitive types

T ::= u8 | u16 | u32 | u64       -- unsigned integers
    | i8 | i16 | i32 | i64       -- signed integers
    | f32 | f64                  -- floating-point
    | bool                       -- boolean
    | char                       -- Unicode scalar value; representation-compatible with u32

1.2 Numeric type categories

Each numeric primitive belongs to exactly one category. Categories partition the numeric types; implicit promotion never crosses a category boundary (see §5).

Category Members Notes
Unsigned u8, u16, u32, u64
Signed i8, i16, i32, i64
Float f32, f64
Char char Promotes into Unsigned only (see §5.3)

For the purposes of operator rules, the union categories are:

  • Integer = Unsigned Signed
  • Numeric = Unsigned Signed Float

bool and char are not members of Numeric; they are handled separately.

1.3 Bit widths

Type Width Type Width
u8 8 i8 8
u16 16 i16 16
u32 32 i32 32
u64 64 i64 64
f32 32 f64 64
bool 8 char 32

1.4 Compound types

T ::= ...
    | *T                         -- immutable pointer to T  (read-only; C's const T*)
    | *mut T                     -- mutable pointer to T    (read-write; C's T*)
    | *opaque                    -- immutable untyped pointer (C's const void*)
    | *mut opaque                -- mutable untyped pointer   (C's void*)
    | [T; N]                     -- array of N elements of type T  (N: usize, N ≥ 0)
    | struct S                   -- named struct type (resolved through Σ)
    | ()                         -- unit type (no-value / void)

The mut qualifier on a pointer type describes the pointer's access permission to the pointed-to data. It is orthogonal to binding mutability (§4).

A shorthand used throughout:

  • typed pointer: *T or *mut T (pointee type is known)
  • opaque pointer: *opaque or *mut opaque (pointee type is erased)
  • mutable pointer: *mut T or *mut opaque
  • immutable pointer: *T or *opaque

1.5 Function types (internal use)

Function types do not appear in user-facing type annotations but are used internally by the type checker when resolving calls.

FnSig ::= (T₁, …, Tₙ) -> T_ret

Open questions — resolved

  • () as first-class type: () may appear explicitly in -> () return-type annotations and also arises implicitly when a function omits its return type. It is a valid type in ⊢ T wf (Wf-Unit, §2).
  • Pointer mutability: settled in §4. Pointer mutability (*T vs *mut T) is orthogonal to binding mutability and carried in the type itself.

2. Well-Formedness of Types

A type is well-formed (written ⊢ T wf) if every struct name it references is declared and all array sizes are non-negative. Well-formedness is checked against the struct table Σ; pointer and array types impose structural constraints. Recursive struct cycles through direct fields (without pointer indirection) are rejected here — the rule is deferred to §10 where the full Σ is available.

2.1 Inference rules

(Wf-Prim)   ⊢ T wf     for any T ∈ { u8, u16, u32, u64,
                                      i8, i16, i32, i64,
                                      f32, f64, bool, char }

(Wf-Unit)   ⊢ () wf

(Wf-PtrImm)   ⊢ T wf
              ───────────────
              ⊢ *T wf

(Wf-PtrMut)   ⊢ T wf
              ───────────────
              ⊢ *mut T wf

(Wf-OpaqueImm)   ⊢ *opaque wf

(Wf-OpaqueMut)   ⊢ *mut opaque wf

(Wf-Array)   ⊢ T wf    N ≥ 0
             ──────────────────────
             ⊢ [T; N] wf

(Wf-Struct)  S ∈ dom(Σ)
             ──────────────────────
             ⊢ struct S wf

Note on Wf-Struct: this rule only checks that the struct name is declared. The requirement that all field types are themselves well-formed, and that the struct has no infinite-size cycles, is established when the struct definition is checked (§10).

2.2 Size finiteness (informal)

A type is sized iff its in-memory representation has a fixed, finite byte count determinable at compile time. All types in §2.1 are sized, except a struct S whose definition contains a direct (non-pointer) field of a type that transitively includes S again. Such cycles are detected in §10 and reported as E0900.

Pointer types (*T, *mut T, *opaque, *mut opaque) always have a fixed pointer-width size regardless of the pointee. Wrapping a recursive struct in a pointer breaks the cycle:

struct Node { next: *Node }          -- OK: *Node has finite size
struct Bad  { inner: Bad  }          -- error E0900: infinite size

3. Type Environments

The type environment Γ tracks every variable binding that is in scope at a given program point. It is an ordered association list — the most recently introduced binding for a given name takes precedence.

3.1 Grammar

m ::= imm | mut               -- binding mutability

Γ ::= ε                       -- empty environment
    | Γ, x : (T, m)           -- extend with binding x of type T and mutability m

Bindings are appended on the right. Lookup always returns the rightmost (most recent) entry for a given name, implementing latest-binding-wins semantics.

3.2 Lookup

(Lookup-Hit)   ────────────────────────
               (Γ, x:(T,m))(x) = (T, m)

(Lookup-Skip)  x ≠ y       Γ(x) = (T, m)
               ─────────────────────────────────────
               (Γ, y:(T',m'))(x) = (T, m)

Attempting to look up a name not present in Γ is undefined; the type checker must report error E0100.

3.3 Scope and block exit

Scopes are not represented as a separate data structure — they arise naturally from how the statement rules in §8 thread Γ through blocks.

A block { s₁; s₂; …; sₙ } is checked by extending Γ sequentially with each statement's bindings, yielding a final Γ_out. On block exit, Γ_out is discarded and the environment reverts to Γ (the environment at block entry). This is written:

Γ ⊢ block ✓ ⇒ Γ_out     but the caller uses only Γ, not Γ_out

The formal rule (T-Block) is given in §8.

3.4 Shadowing and overriding

Flux adopts Rust-style re-binding semantics: a let declaration always creates a new binding regardless of whether the name already exists. The behaviour depends on where the prior binding lives:

Case A — override (same scope)

Two let x declarations within the same block. Both bindings exist in the environment; lookup returns the second (newer) one. The first is permanently inaccessible for the remainder of that scope and any nested scopes.

// scope S
let x: i32 = 1;    // Γ₁ = Γ, x:(i32, imm)
let x: f64 = 2.0;  // Γ₂ = Γ₁, x:(f64, imm)
                   //      = Γ, x:(i32,imm), x:(f64,imm)
// Γ₂(x) = (f64, imm)   ← i32 binding is gone

On exit from scope S, both entries are discarded.

The type of a re-bound name is allowed to change (e.g. i32f64 above). This is intentional: re-binding is a distinct operation from mutation.

Case B — shadow (nested scope)

An inner let x in a nested block, where x is already bound in an enclosing scope. The inner binding shadows the outer one inside the nested block. When the nested block exits, the outer binding is restored.

let x: i32 = 1;         // outer: Γ₁ = Γ, x:(i32, imm)
{
    let x: f64 = 2.0;   // inner: Γ₂ = Γ₁, x:(f64, imm)
    // Γ₂(x) = (f64, imm)   ← outer i32 is shadowed
}
// back to Γ₁: Γ₁(x) = (i32, imm)   ← restored

Summary

Situation Old binding accessible? After scope exit
Override (same scope) No — immediately hidden Both removed
Shadow (inner scope) No — hidden in inner Outer restored

Both cases are modelled identically in the formal system — the environment is extended with a new entry and lookup finds the newest. The difference is entirely about which entries are removed when a scope exits:

  • Block scope: on exit, all bindings introduced inside that block are removed.
  • Because scope exit is wholesale, an overriding binding in scope S is removed at the same time as the overridden one — neither survives past S.

3.5 Interaction with Σ and Φ

Struct names (Σ) and function names (Φ) occupy separate namespaces from variable bindings (Γ). A variable foo and a function foo may coexist; the expression typing rules in §7 use Γ to resolve identifiers used as values and Φ to resolve calls. A name used as a call target (f(…)) is looked up in Φ; a name used as a plain value is looked up in Γ. There is no shadowing between these namespaces.


4. Mutability

Flux has two orthogonal axes of mutability:

  1. Binding mutability — whether a name can be rebound (assigned a new value). Declared with let vs. let mut.
  2. Pointer mutability — whether the data pointed to can be written through a pointer. Declared as *T (immutable) vs. *mut T (mutable).

These axes are fully independent; any combination is valid (see §1.4 table).

4.1 Binding mutability

A binding is introduced by let and associates a name with a storage location.

Γ, x : (T, imm)   introduced by   let x [: T] [= e] ;
Γ, x : (T, mut)   introduced by   let mut x [: T] [= e] ;

Function parameters follow the same rule: [mut] name : T in a parameter list. Mutability on a parameter is local — the callee may freely rebind it; the caller never observes the difference.

4.2 Place expressions

A place (lvalue) is an expression that denotes a memory location that can be read from or written to. Not every expression is a place.

place ::= x                     -- variable
        | * e                   -- pointer dereference  (e must have a pointer type)
        | e . f                 -- struct field access   (e must be a place or struct value)
        | e [ i ]               -- array subscript       (e must be a place or array value)

All other expression forms (literals, arithmetic, calls, etc.) are not places.

4.3 Mutable places

A place is mutable if it can be the target of an assignment. The predicate Γ ⊢ e mutable-place is defined by the following rules:

(MP-Var)     Γ(x) = (T, mut)
             ─────────────────────────────
             Γ ⊢ x  mutable-place

(MP-Deref)   Γ ⊢ e : *mut T
             ─────────────────────────────
             Γ ⊢ *e  mutable-place

(MP-Field)   Γ ⊢ e  mutable-place      Σ(S).f = T
             ─────────────────────────────────────
             Γ ⊢ e.f  mutable-place

(MP-Index)   Γ ⊢ e  mutable-place
             ─────────────────────────────
             Γ ⊢ e[i]  mutable-place

Note the asymmetry in MP-Deref: dereferencing an immutable pointer *T yields a place (it can be read) but not a mutable place (it cannot be written). There is no MP-Deref rule for *T.

4.4 Readable places

Dereferencing any pointer (mutable or not) produces a place that can be read. The predicate Γ ⊢ e place is a relaxed version of mutable-place used in expression typing (§7) for the * unary operator:

(P-Var)      x ∈ dom(Γ)
             ─────────────
             Γ ⊢ x  place

(P-Deref-Imm)  Γ ⊢ e : *T
               ─────────────────
               Γ ⊢ *e  place        (readable, but not mutable-place)

(P-Deref-Mut)  Γ ⊢ e : *mut T
               ─────────────────
               Γ ⊢ *e  place        (readable AND mutable-place via MP-Deref)

(P-Field)    Γ ⊢ e  place      Σ(S).f = T
             ────────────────────────────
             Γ ⊢ e.f  place

(P-Index)    Γ ⊢ e  place
             ─────────────────
             Γ ⊢ e[i]  place

4.5 Address-of and pointer mutability

The unary & operator produces a pointer. The mutability of the resulting pointer is determined by the mutability of the place it is applied to:

(Addrof-Imm)   Γ ⊢ e  place       ¬(Γ ⊢ e  mutable-place)
               ────────────────────────────────────────────
               Γ ⊢ &e : *T

(Addrof-Mut)   Γ ⊢ e  mutable-place
               ────────────────────────────────────────────
               Γ ⊢ &e : *mut T

Taking the address of a mutable place yields a *mut T; taking the address of a read-only place yields a *T. This ensures that a *mut T cannot be obtained from a non-mutable binding via &.

4.6 Interaction summary

Source form Binding mut? Ptr mut? Can rebind p? Can write *p?
let p: *T = &x no no no no
let mut p: *T = &x yes no yes no
let p: *mut T = &mut x no yes no yes
let mut p: *mut T = &mut x yes yes yes yes

4.7 Pointer subtyping

A mutable pointer may be used where an immutable pointer is expected (covariant in the immutable direction). The reverse is not permitted.

(Ptr-Coerce)   Γ ⊢ e : *mut T
               ──────────────────
               Γ ⊢ e : *T          -- *mut T is a subtype of *T

This is consistent with the promotion relation (§5): *mut T ⊑ *T. There is no implicit coercion in the other direction; an explicit cast is required to obtain *mut T from *T.


5. Numeric Type Hierarchy and Coercions

Flux supports implicit widening promotion for numeric types. Promotions are one-directional (widening only), never cross a category boundary, and never lose information.

5.1 Rank

Each numeric type has an integer rank that reflects its bit width within its category.

Type rank Type rank
u8 1 i8 1
u16 2 i16 2
u32 3 i32 3
u64 4 i64 4
f32 1 f64 2

bool and char have no rank; they are not members of Numeric.

5.2 Promotion predicate T ⊑ U

T ⊑ U holds — meaning a value of type T may be implicitly coerced to type U — exactly when one of the following rules applies:

(Refl)      T ⊑ T       for any T

(Unsigned)  T ⊑ U       if  category(T) = category(U) = Unsigned
                         and rank(T) < rank(U)

(Signed)    T ⊑ U       if  category(T) = category(U) = Signed
                         and rank(T) < rank(U)

(Float)     f32 ⊑ f64

(Char)      char ⊑ U    if  category(U) = Unsigned
                         and rank(U) ≥ 3          -- u32 or u64

All other combinations are forbidden. In particular:

  • Unsigned → Signed is forbidden (even if the target is wider).
  • Signed → Unsigned is forbidden.
  • Any integer → Float is forbidden.
  • Float → integer is forbidden.
  • Narrowing (rank decreases) is forbidden in both directions.
  • char → Signed is forbidden.
  • char may only promote to u32 or u64 (rank ≥ 3 unsigned).

5.3 char and u32

char is represented as a 32-bit unsigned scalar. It is not a synonym for u32 — they are distinct types — but a char may be implicitly promoted to u32 or u64 when the context requires it (e.g. as an operand to integer arithmetic).

The reverse (u32 → char) is never implicit; it requires an explicit cast, since not every u32 value is a valid Unicode scalar.

5.4 Common type common(T, U)

For binary operators that require matching operand types, the common type of T and U is defined as their least upper bound under :

common(T, U) = T    if U ⊑ T
common(T, U) = U    if T ⊑ U
common(T, U) = ⊥    otherwise  -- type error: no common type

When T ⊑ U and U ⊑ T simultaneously (which can only occur when T = U by Refl), common(T, U) = T = U.

Examples:

T U common(T, U) Notes
u8 u32 u32 u8 promoted to u32
u32 u8 u32 symmetric
i16 i64 i64 i16 promoted to i64
f32 f64 f64 f32 promoted to f64
char u32 u32 char promoted as u32 equivalent
char u64 u64 char promoted to u64
u8 i32 error: cross-signedness promotion
i32 u64 error: signed → unsigned forbidden
u32 f32 error: integer → float forbidden
char u16 error: rank(u16)=2 < 3, below u32
char i32 error: char → signed forbidden

5.5 Literal type inference

Integer and float literals have no intrinsic type. Their type is determined by:

  1. Contextual typing (checked mode): if the surrounding context expects a specific numeric type T, the literal is given type T directly, without promotion. The literal value must be representable in T (range check).

  2. Default type (synthesised mode): when no context is available, literals are given a default type:

    • INT_LITi32
    • FLOAT_LITf64

    These defaults apply only when type inference cannot determine a more specific type.

5.6 Explicit casts

Conversions not covered by require an explicit cast (narrowing, cross-signedness, integer↔float, u32→char, etc.). The cast syntax and its rules are not yet in the grammar and will be specified separately.


6. Operator Typing

Each operator has one or more typing schemes describing the types it accepts and the type it produces.

6.1 Binary operators

For operators that accept two numeric operands, the operands are first coerced to their common(T₁, T₂) type (§5.4) before the operation is performed. If common is the expression is a type error.

Operators Required operand types Result type Notes
+ - * / % T₁, T₂ where common(T₁, T₂) ∈ Numeric common(T₁, T₂) Both operands promoted to common type
<< >> LHS T any integer; RHS U any unsigned integer T Only LHS type determines the result; no promotion of the LHS; RHS must be unsigned (see §6.1.1)
& | ^ T₁, T₂ where common(T₁, T₂) ∈ Integer common(T₁, T₂) Both operands promoted to common integer type
and or bool, bool bool Short-circuit; no promotion applies
== != numeric/bool/char: common(T₁,T₂) ∈ Numeric {bool, char}; or pointer equality (see §6.1.2) bool Structs and arrays are not comparable
< > <= >= T₁, T₂ where common(T₁, T₂) ∈ Numeric bool Both operands promoted to common type
= LHS: mutable place of type T; RHS: T₂ where T₂ ⊑ T T Result is the assigned value; enables chaining (see §6.1.3)
op= Same as = after desugaring lhs op= rhslhs = lhs op rhs T LHS must be a mutable place

6.1.1 Shift operator detail

a << b and a >> b use asymmetric typing:

  • a may be any integer type T; T is also the result type.
  • b must be any unsigned integer type (u8, u16, u32, u64). Signed or other types are rejected (error E0401).
  • The two operands are never subject to common — LHS and RHS types are checked independently. The LHS is never promoted to match the RHS.
(T-Shl / T-Shr)   Γ ⊢ a : T         T ∈ Integer
                   Γ ⊢ b : U         U ∈ Unsigned
                   ────────────────────────────────
                   Γ ⊢ a << b : T    (resp. a >> b : T)

Rationale: shifting a u8 value by a u32 amount is idiomatic and well-defined. Requiring both operands to be the same type would be unnecessarily restrictive. Requiring the amount to be unsigned makes negative shift amounts a compile-time error rather than undefined behaviour.

6.1.2 Pointer equality

Two pointer expressions may be compared with == and != to test whether they point to the same address. Pointer comparison is type-safe: the two operands must have a common pointer type after applying the Ptr-Coerce rule (§4.7).

(T-PtrEq)   Γ ⊢ e₁ : P₁    Γ ⊢ e₂ : P₂    ptr-common(P₁, P₂) = P
            ───────────────────────────────────────────────────────────
            Γ ⊢ e₁ == e₂ : bool    (resp. e₁ != e₂ : bool)

Where ptr-common is defined as:

ptr-common(*mut T,  *mut T)    = *mut T
ptr-common(*mut T,  *T)        = *T        -- Ptr-Coerce applied to LHS
ptr-common(*T,      *mut T)    = *T        -- Ptr-Coerce applied to RHS
ptr-common(*T,      *T)        = *T
ptr-common(*mut opaque, *mut opaque) = *mut opaque
ptr-common(*mut opaque, *opaque)     = *opaque
ptr-common(*opaque, *mut opaque)     = *opaque
ptr-common(*opaque, *opaque)         = *opaque
ptr-common(_,  _)              = ⊥        -- mixed typed/opaque or different pointees

Comparing a typed pointer *T with an opaque pointer *opaque is not permitted implicitly; an explicit cast is required. Comparing *T with *U where T ≠ U is also not permitted.

Ordering operators (<, >, <=, >=) are not defined for pointer types.

6.1.3 Assignment as an expression

The = operator returns the assigned value with the LHS type T. This makes assignments first-class expressions and enables chained assignment:

a = b = 0    -- parses as  a = (b = 0)  (right-associative)

Evaluation proceeds right-to-left:

  1. b = 0 — assigns 0 to b (type T_b), result is 0 : T_b.
  2. a = (0 : T_b) — requires T_b ⊑ T_a; assigns to a, result is the value T_a.

The formal rule:

(T-Assign)   Γ ⊢ e₁  mutable-place    Γ ⊢ e₁ : T
             Γ ⊢ e₂ : T₂              T₂ ⊑ T
             ─────────────────────────────────────
             Γ ⊢ e₁ = e₂ : T

The compound assignment e₁ op= e₂ desugars before typing:

(T-CompoundAssign)   e₁ op= e₂  ≡  e₁ = e₁ op e₂

After desugaring, T-Assign applies. The LHS expression e₁ is evaluated only once at runtime (the desugaring is semantic, not textual); both uses of e₁ refer to the same place.

6.2 Unary operators

Operator Accepted type Result type Notes
- any numeric T T Arithmetic negation
! bool bool Logical NOT
~ any integer T T Bitwise complement
* *T or *mut T T Dereference; yields a readable place (§4.4)
& readable place of type T *T Address-of immutable place (Addrof-Imm, §4.5)
& mutable place of type T *mut T Address-of mutable place (Addrof-Mut, §4.5)

The two & rows are not overloading — the single & operator produces *mut T or *T depending on whether its operand is a mutable place (§4.5). The unary * operator applied to *T yields a readable-but-not-mutable place; applied to *mut T it yields a readable-and-mutable place (rules P-Deref-Imm / P-Deref-Mut, §4.4).


7. Expression Typing

Judgement form: Γ, Σ, Φ ⊢ e : T (written Γ ⊢ e : T when the tables are clear from context).

7.1 Typing strategy

Expression typing uses synthesis (bottom-up inference): the type of an expression is produced from its sub-expressions; no expected type is pushed down from the enclosing context. The single exception is at a let binding with an explicit type annotation, where the annotation is used to give numeric literals a more specific type than their default (§5.5). This is handled in T-Let (§8); the rules in this section always synthesise.

7.2 Built-in: string_view

String literals produce a value of the built-in struct type string_view, which is pre-loaded into Σ before any user code is checked:

string_view ::= struct { data: *char, size: u64 }

The name string_view is reserved — a user-defined struct with this name is a type error. A STRING_LIT token is compiled to a string_view value whose data field points to the UTF-8 bytes of the string and size holds the byte length.

Future: when Flux gains slice types, string_view will be replaced or superseded by a first-class slice [char].

7.3 Inference rules

Literals and booleans

(T-IntLit)    ─────────────────────────
              Γ ⊢ INT_LIT : i32
              -- default type; T-Let may supply a contextual integer type (§5.5, §8)

(T-FloatLit)  ─────────────────────────
              Γ ⊢ FLOAT_LIT : f64
              -- default type; T-Let may supply a contextual float type (§5.5, §8)

(T-StrLit)    ─────────────────────────────────────────
              Γ ⊢ STRING_LIT : struct string_view

(T-CharLit)   ─────────────────────────
              Γ ⊢ CHAR_LIT : char

(T-True)      ─────────────────────────
              Γ ⊢ true : bool

(T-False)     ─────────────────────────
              Γ ⊢ false : bool

Variable

(T-Var)       Γ(x) = (T, m)
              ─────────────────
              Γ ⊢ x : T

x may be any binding mutability m; the type is T regardless. Mutability is only consulted when x is the LHS of an assignment (MP-Var, §4.3).

Grouping

(T-Paren)     Γ ⊢ e : T
              ─────────────
              Γ ⊢ (e) : T

Unary operators

(T-Neg)       Γ ⊢ e : T      T ∈ Numeric
              ─────────────────────────────
              Γ ⊢ -e : T

(T-Not)       Γ ⊢ e : bool
              ──────────────
              Γ ⊢ !e : bool

(T-BitNot)    Γ ⊢ e : T      T ∈ Integer
              ─────────────────────────────
              Γ ⊢ ~e : T

(T-Deref)     Γ ⊢ e : *T            (also valid for *mut T, by Ptr-Coerce §4.7)
              ─────────────────────────────
              Γ ⊢ *e : T

(T-AddrOf)    -- see §4.5: yields *T or *mut T depending on place mutability
              Γ ⊢ e  place                             (readable place)
              ─────────────────────────────────────────
              Γ ⊢ &e : *T    if ¬(Γ ⊢ e  mutable-place)   [Addrof-Imm]
              Γ ⊢ &e : *mut T    if Γ ⊢ e  mutable-place   [Addrof-Mut]

Binary operators

All binary operator rules follow the typing schemes in §6.1. Concisely:

(T-Arith)     Γ ⊢ e₁ : T₁    Γ ⊢ e₂ : T₂
              C = common(T₁, T₂)    C ∈ Numeric
              ────────────────────────────────────
              Γ ⊢ e₁ {+,-,*,/,%} e₂ : C

(T-Shift)     Γ ⊢ e₁ : T    T ∈ Integer
              Γ ⊢ e₂ : U    U ∈ Unsigned
              ─────────────────────────────
              Γ ⊢ e₁ {<<,>>} e₂ : T

(T-BitBin)    Γ ⊢ e₁ : T₁    Γ ⊢ e₂ : T₂
              C = common(T₁, T₂)    C ∈ Integer
              ────────────────────────────────────
              Γ ⊢ e₁ {&,|,^} e₂ : C

(T-Logic)     Γ ⊢ e₁ : bool    Γ ⊢ e₂ : bool
              ───────────────────────────────────
              Γ ⊢ e₁ {and,or} e₂ : bool

(T-CmpNum)    Γ ⊢ e₁ : T₁    Γ ⊢ e₂ : T₂
              common(T₁, T₂) ∈ Numeric  {bool, char}
              ───────────────────────────────────────────
              Γ ⊢ e₁ {==,!=} e₂ : bool

(T-CmpPtr)    Γ ⊢ e₁ : P₁    Γ ⊢ e₂ : P₂    ptr-common(P₁, P₂) ≠ ⊥   (§6.1.2)
              ───────────────────────────────────────────────────────────
              Γ ⊢ e₁ {==,!=} e₂ : bool

(T-Ord)       Γ ⊢ e₁ : T₁    Γ ⊢ e₂ : T₂
              common(T₁, T₂) ∈ Numeric
              ──────────────────────────────
              Γ ⊢ e₁ {<,>,<=,>=} e₂ : bool

(T-Assign)    Γ ⊢ e₁  mutable-place    Γ ⊢ e₁ : T
              Γ ⊢ e₂ : T₂              T₂ ⊑ T
              ─────────────────────────────────────
              Γ ⊢ e₁ = e₂ : T                        (§6.1.3)

(T-CompAssign) e₁ op= e₂  desugars to  e₁ = e₁ op e₂; apply T-Assign  (§6.1.3)

Postfix: field access

(T-Field)     Γ ⊢ e : struct S    Σ(S).f = T
              ─────────────────────────────────
              Γ ⊢ e.f : T

If f is not a field of S, error E0503. If e does not have a struct type, error E0502.

Postfix: array index

(T-Index)     Γ ⊢ e : [T; N]    Γ ⊢ i : U    U ∈ Unsigned
              ──────────────────────────────────────────────
              Γ ⊢ e[i] : T

If e is not an array type, error E0600. If i is not an unsigned integer, error E0601.

Postfix: function call

Functions are looked up in the function table Φ, separate from variable bindings. The callee of a call expression must be a bare identifier (function pointers are not yet supported).

(T-FuncName)  Φ(f) = (T₁, …, Tₙ) -> T_ret
              ──────────────────────────────────────────────────────
              Γ ⊢ f : FnSig(T₁, …, Tₙ) -> T_ret    (internal type; §1.5)

(T-Call)      Γ ⊢ e_f : FnSig(T₁, …, Tₙ) -> T_ret
              Γ ⊢ a₁ : U₁    U₁ ⊑ T₁
              ⋮
              Γ ⊢ aₙ : Uₙ    Uₙ ⊑ Tₙ
              ──────────────────────────────────────────────────────
              Γ ⊢ e_f(a₁, …, aₙ) : T_ret

If the argument count does not match the parameter count, error E0205. If argument type Uᵢ is not promotable to Tᵢ, error E0204. If the name is not in Φ, error E0102.

Struct literal

(T-StructLit)  S ∈ dom(Σ)
               Σ(S) = { f₁: T₁, …, fₙ: Tₙ }    (n fields, in declaration order)
               the literal supplies fields {g₁, …, gₖ} (in any order)
               {g₁, …, gₖ} = {f₁, …, fₙ}        (same set; field order is irrelevant)
               ∀ i ∈ 1..n. Γ ⊢ e_{gᵢ} : Uᵢ  ∧  Uᵢ ⊑ Tᵢ
               ──────────────────────────────────────────────────────────────────
               Γ ⊢ S { g₁: e₁, …, gₙ: eₙ } : struct S

The set equality {g₁, …, gₖ} = {f₁, …, fₙ} is strict:

  • A field present in Σ(S) but absent from the literal → error E0500 (missing field).
  • A field present in the literal but absent from Σ(S) → error E0501 (extra field).
  • A field listed twice in the literal → error E0501 (duplicate counts as extra).

8. Statement Typing

8.1 Judgment form

Γ, A, R, L ⊢ s ✓ ⇒ Γ', A', cf
Component Meaning
Γ Type environment (maps names to (T, m)); same as §3
A Definitely-assigned setA ⊆ dom(Γ); names provably initialised on all paths so far
R Expected return type of the enclosing function
L Loop context: L ∈ { noloop, inloop }
Γ' Output type environment (may have new bindings from let)
A' Output definitely-assigned set
cf Control-flow outcome: cf ∈ { normal, diverges }

cf = diverges means the statement never falls through to the next statement (because of return, break, or continue). cf = normal means it may fall through.

T-Var from §7.3 is refined to require definite assignment:

(T-Var)   Γ(x) = (T, m)    x ∈ A
          ──────────────────────────
          Γ, A ⊢ x : T

If x ∉ A (declared but not yet definitely assigned), it is a type error (use-before-init, error E0100 extended).

8.2 Definite-assignment join

When two control-flow paths merge (in if-else), the output A is the intersection of the two branch sets, intersected with the outer A:

join_A(A_then, A_else, A) = A  (A_then ∩ A_else)

A variable becomes definitely assigned after the merge only if it was assigned in both branches. Variables assigned in only one branch remain unassigned after the merge.

For conditional statements without an else (or while/loop with uncertain iterations), the output A is conservatively A (the input set, unchanged):

join_A(A_branch, A)   =   A         -- for single-branch paths

8.3 Unreachable code

Any statement that immediately follows a statement with cf = diverges is unreachable. Unreachable statements are still type-checked (to catch errors in dead code) but the compiler emits warning W001 for the first unreachable statement in each run of dead code.

8.4 Loop context

L is a single flag that threads through every statement rule unchanged, except that while and loop bodies are checked with L = inloop regardless of the outer L.

L_body = inloop     (when checking body of while/loop)
L_inner = L         (for if/else/block — L passes through unchanged)

This makes break and continue valid anywhere inside a loop body, regardless of how many intermediate if, else, or inner blocks separate them from the while/loop keyword. Crossing a function boundary resets L = noloop at the function level (§9).

8.5 Statement rules

T-Let

(T-Let-Infer)       Γ, A ⊢ e : T      (T synthesised; no annotation)
                    ─────────────────────────────────────────────────────────────────
                    Γ, A, R, L ⊢ let [mut] x = e ; ✓ ⇒ Γ, x:(T,m)  A{x}  normal

(T-Let-Ann-Init)    ⊢ T wf
                    Γ, A ⊢ e : U    U ⊑ T
                    -- if e is INT_LIT/FLOAT_LIT and T is numeric: U = T directly (§5.5)
                    ─────────────────────────────────────────────────────────────────
                    Γ, A, R, L ⊢ let [mut] x : T = e ; ✓ ⇒ Γ, x:(T,m)  A{x}  normal

(T-Let-Ann-NoInit)  ⊢ T wf
                    ─────────────────────────────────────────────────────────────────
                    Γ, A, R, L ⊢ let [mut] x : T ; ✓ ⇒ Γ, x:(T,m)  A  normal
                    -- x is NOT added to A; use before assignment → E0100

No annotation and no initialiser → error E1000.

In all T-Let rules, m = imm if no mut keyword, m = mut if mut is present.

T-Return

(T-Return-Val)   Γ, A ⊢ e : U    U ⊑ R
                 ─────────────────────────────────────────────────────
                 Γ, A, R, L ⊢ return e ; ✓ ⇒ Γ  A  diverges

(T-Return-Unit)  R = ()
                 ─────────────────────────────────────────────────────
                 Γ, A, R, L ⊢ return ; ✓ ⇒ Γ  A  diverges

T-ExprStmt

(T-ExprStmt)     Γ, A ⊢ e : T
                 A' = A  { x | x ∈ assigns(e) ∩ dom(Γ) }
                 ─────────────────────────────────────────────────────
                 Γ, A, R, L ⊢ e ; ✓ ⇒ Γ  A'  normal

Where assigns(e) collects all variables directly assigned anywhere in e:

assigns(x = e')                    = {x}  assigns(e')
assigns(x op= e')                  = assigns(e')      -- x already def (op uses it)
assigns(f(e₁,…))                   =  assigns(eᵢ)
assigns(S { f₁: e₁, …, fₙ: eₙ }) =  assigns(eᵢ)
assigns(e.f)                       = assigns(e)
assigns(e[i])                      = assigns(e)  assigns(i)
assigns((e))                       = assigns(e)
assigns(op e)                      = assigns(e)
assigns(e₁ binop e₂)               = assigns(e₁)  assigns(e₂)
assigns(literal/ident)             = ∅

T-If

(T-If-Then)    Γ, A ⊢ cond : bool
               Γ, A, R, L ⊢ { then_block } ✓ ⇒ _, A_then, _
               ─────────────────────────────────────────────────────────────────────
               Γ, A, R, L ⊢ if cond { then_block } ✓ ⇒ Γ  A  normal
               -- no else: A unchanged (branch may not execute)

(T-If-Else)    Γ, A ⊢ cond : bool
               Γ, A, R, L ⊢ { then_block } ✓ ⇒ _, A_then, cf_t
               Γ, A, R, L ⊢ else_branch ✓ ⇒ _, A_else, cf_e
               ─────────────────────────────────────────────────────────────────────
               Γ, A, R, L ⊢ if cond { then_block } else else_branch ✓
                           ⇒ Γ  join_A(A_then, A_else, A)  join_cf(cf_t, cf_e)

else_branch is either an if statement (else-if chain) or a block { … }.

join_cf(diverges, diverges) = diverges
join_cf(normal,   _)        = normal
join_cf(_,        normal)   = normal

The Γ output of T-If discards any bindings introduced inside the branches (scope exit, §3.3); only the outer Γ is returned.

T-While

(T-While)      Γ, A ⊢ cond : bool
               Γ, A, R, inloop ⊢ { body } ✓ ⇒ _, _, _
               ─────────────────────────────────────────────────────────────────────
               Γ, A, R, L ⊢ while cond { body } ✓ ⇒ Γ  A  normal

The body is checked with L = inloop. The output A is conservatively A (the loop may execute zero times). cf = normal (the loop may terminate when the condition becomes false).

T-Loop

(T-Loop)       Γ, A, R, inloop ⊢ { body } ✓ ⇒ _, _, _
               ─────────────────────────────────────────────────────────────────────
               Γ, A, R, L ⊢ loop { body } ✓ ⇒ Γ  A  normal

cf = normal conservatively (a break inside exits the loop). Output A is A (the body may be bypassed by an immediate break).

T-Break / T-Continue

(T-Break)      L = inloop
               ─────────────────────────────────────────
               Γ, A, R, L ⊢ break ; ✓ ⇒ Γ  A  diverges

(T-Continue)   L = inloop
               ─────────────────────────────────────────
               Γ, A, R, L ⊢ continue ; ✓ ⇒ Γ  A  diverges

If L = noloop, break → E0800, continue → E0801.

T-Block

A block { s₁; s₂; …; sₙ } is checked by chaining statements sequentially. Bindings introduced inside are removed on exit (scope restriction, §3.3).

(T-Block-Empty)   ───────────────────────────────────────────────────────────────
                  Γ, A, R, L ⊢ { } ✓ ⇒ Γ  A  normal

(T-Block-Step)    Γ, A, R, L ⊢ s₁ ✓ ⇒ Γ₁, A₁, normal
                  Γ₁, A₁, R, L ⊢ { s₂; …; sₙ } ✓ ⇒ Γₙ, Aₙ, cf
                  ───────────────────────────────────────────────────────────────
                  Γ, A, R, L ⊢ { s₁; s₂; …; sₙ } ✓ ⇒ restrict(Γₙ, Γ)  restrict_A(Aₙ, Γ)  cf

(T-Block-Dead)    Γ, A, R, L ⊢ s₁ ✓ ⇒ Γ₁, A₁, diverges
                  -- s₂, …, sₙ are unreachable; emit W001; still type-check them
                  Γ₁, A₁, R, L ⊢ { s₂; …; sₙ } ✓ ⇒ _, _, _     (results discarded)
                  ───────────────────────────────────────────────────────────────
                  Γ, A, R, L ⊢ { s₁; s₂; …; sₙ } ✓ ⇒ restrict(Γ₁, Γ)  restrict_A(A₁, Γ)  diverges

Where:

  • restrict(Γ', Γ) removes all bindings in Γ' that are not in dom(Γ) (inner let bindings).
  • restrict_A(A', Γ) removes all names from A' that are not in dom(Γ).

9. Function Definitions

Judgement form: Σ, Φ ⊢ func_def ✓

The function table Φ is pre-populated with the signatures of all functions in the program before any body is checked (first pass, §11). This gives every function access to its own name for recursion and to any other function regardless of declaration order.

9.1 Building the initial environment

Given a function fn f(p₁, …, pₙ) [-> R] { body } where each parameter is [mut] xᵢ : Tᵢ:

Γ_init = x₁:(T₁, m₁), …, xₙ:(Tₙ, mₙ)    -- ordered; rightmost wins for lookup
A_init = { x₁, …, xₙ }                    -- all params are definitely assigned on entry

where mᵢ = mut if mut is written, mᵢ = imm otherwise.

All parameter types must be well-formed: ⊢ Tᵢ wf for each i.

Duplicate parameter names are an error (E0902): all names x₁, …, xₙ must be pairwise distinct. The check is performed before the environment is built.

9.2 Return type

The declared return type R is taken from the -> T annotation. If the annotation is absent, R = () (unit; the function returns no value).

9.3 All-paths-return

The function body is checked as a block statement with L = noloop. The resulting control-flow outcome cf determines whether the return requirement is satisfied:

Return type Required cf Otherwise
R = () normal or diverges — always OK
R ≠ () diverges error E1001

When R = () and the body falls through (cf = normal), the function implicitly returns () at end of body — no explicit return is needed. An explicit return; (bare, no value) is also allowed anywhere in a () function (T-Return-Unit, §8).

When R ≠ (), every control-flow path through the body must end with an explicit return e; where e : U and U ⊑ R. If the body's cf = normal (some path falls through without a return), error E1001 is reported.

Note on loop conservatism: while and loop statements always output cf = normal (§8.4). A function body that ends with a loop { return e; } and no following return will fail the all-paths check (cf = normal from the loop). Add an explicit return after the loop to satisfy the checker, or use if-else chains that both return.

9.4 Formal rule

(T-FuncDef)   f ∈ dom(Φ)    Φ(f) = (T₁, …, Tₙ) -> R     -- f pre-loaded by §11 first pass
              ∀ i. ⊢ Tᵢ wf                                 -- param types well-formed
              all parameter names x₁, …, xₙ distinct       -- else E0902
              Γ_init = x₁:(T₁,m₁), …, xₙ:(Tₙ,mₙ)
              A_init = { x₁, …, xₙ }
              Γ_init, A_init, R, noloop ⊢ { body } ✓ ⇒ _, _, cf
              R = ()  or  cf = diverges                     -- else E1001
              ──────────────────────────────────────────────────────────────────────────
              Σ, Φ ⊢ fn f(params) [-> R] { body } ✓

9.5 Recursion

Direct recursion (f calling f) and mutual recursion (f calling g, g calling f) are both permitted. Because Φ is fully populated before any body is checked (§11), T-FuncName (§7.3) can resolve any call to f inside f's own body without any special treatment in T-FuncDef.


10. Struct Definitions

Judgement form: Σ ⊢ struct_def ✓

Σ is pre-populated with all struct names in the program before any struct body is checked (first pass, §11). This allows forward references and mutually recursive pointer types.

10.1 Size-dependency and cycles

Define struct-names(T) — the set of struct names that appear directly (not behind a pointer) in type T:

struct-names(struct S)    = { S }
struct-names([T; N])      = struct-names(T)   -- array has same direct dependency as elem
struct-names(*T)          = ∅                 -- pointer indirection breaks size dependency
struct-names(*mut T)      = ∅
struct-names(*opaque)     = ∅
struct-names(*mut opaque) = ∅
struct-names(primitive)   = ∅
struct-names(())          = ∅

Define the direct-uses relation on struct names:

direct-uses(S) =  { struct-names(Tᵢ) | fᵢ : Tᵢ ∈ Σ(S) }

Let TC(direct-uses) be its transitive closure. A struct S has an infinite-size cycle if S ∈ TC(direct-uses)(S). Such cycles are forbidden (error E0900).

Examples:

struct Node { next: *Node }      -- direct-uses(Node) = ∅  → no cycle ✓
struct Bad  { inner: Bad  }      -- direct-uses(Bad)  = {Bad} → cycle E0900
struct A    { b: B }             -- direct-uses(A) = {B}
struct B    { a: *A }            -- direct-uses(B) = ∅  → A→B→ (stop); no cycle ✓
struct C    { d: D }             -- direct-uses(C) = {D}
struct D    { c: C }             -- direct-uses(D) = {C} → C→D→C: cycle E0900

10.2 Formal rule

(T-StructDef)  S ∈ dom(Σ)
               Σ(S) = { f₁: T₁, …, fₙ: Tₙ }
               all field names f₁, …, fₙ pairwise distinct        -- else E0901
               ∀ i ∈ 1..n. ⊢ Tᵢ wf                               -- else E0101 / type error
               S ∉ TC(direct-uses)(S)                             -- else E0900
               ──────────────────────────────────────────────────────────────────────
               Σ ⊢ struct S { f₁: T₁, …, fₙ: Tₙ } ✓

An empty struct (n = 0) is valid — it has no fields to check and no cycles.


11. Program Typing

Judgement form: ⊢ program ✓

A program is a sequence of top-level definitions (structs and functions) in any order. Checking proceeds in four passes to support mutual recursion and forward references.

11.1 Passes

Pass 1 — Collect declarations

Scan every top-level definition and populate:

  • Σ — map from struct name to the sequence of field names/types (bodies not checked yet)
  • Φ — map from function name to its signature (T₁,…,Tₙ) -> R (bodies not checked yet)

Errors detected in pass 1:

  • E0103: two struct definitions with the same name.
  • E0104: two fn definitions with the same name.
  • E0101 (forward): a type annotation in a signature or field references a struct name not in Σ — reported immediately since signatures are parsed in full during pass 1.

Pass 2 — Check struct bodies

For each struct definition, apply T-StructDef (§10.2) under the completed Σ. Errors checked: E0901 (duplicate field), ⊢ Tᵢ wf (E0101), E0900 (size cycle). The cycle check is performed globally across all structs using TC(direct-uses).

Pass 3 — Check function bodies

For each function definition, apply T-FuncDef (§9.4) under the completed Σ and Φ. Errors checked: E0902 (duplicate param), E1001 (missing return), and all expression/ statement errors from §6§9.

Pass 4 — Entry point (optional)

If the compiler is invoked with --entry <name>:

  • name ∈ dom(Φ) — else error E0102 (undefined function for entry).
  • The signature Φ(name) must be compatible with the entry convention (currently no required signature; any function may serve as entry).

11.2 Formal rule

(T-Program)   Σ, Φ built from pass 1 (no duplicates — E0103/E0104)
              ∀ struct_def  in program. Σ ⊢ struct_def ✓          -- pass 2
              ∀ func_def    in program. Σ, Φ ⊢ func_def ✓         -- pass 3
              entry ∈ dom(Φ)  (if --entry specified)               -- pass 4
              ──────────────────────────────────────────────────────────────────────
              ⊢ program ✓

12. Error Catalogue

Each error has a code, a category, a message template (using {…} for placeholders), and the spans to highlight in the diagnostic. Warnings use the W prefix.

12.1 Name errors (E01xx)

Code Category Message template Span(s)
E0100 Undefined variable cannot find value '{name}' in this scope use site
E0101 Undefined struct type cannot find type '{name}' in this scope type annotation
E0102 Undefined function cannot find function '{name}' in this scope call site
E0103 Duplicate struct name struct '{name}' is defined more than once second definition
E0104 Duplicate function name function '{name}' is defined more than once second definition

Note: use of a declared-but-uninitialized variable (§8.1) is reported as E0100 with the message use of possibly-uninitialized variable '{name}'.

12.2 Type mismatch errors (E02xx)

Code Category Message template Span(s)
E0200 Operator type mismatch operator '{op}' cannot be applied to types '{T}' and '{U}' operator
E0201 Assignment type mismatch cannot assign value of type '{U}' to binding of type '{T}' initialiser expr
E0202 Condition not bool condition must be of type 'bool', found '{T}' condition expr
E0203 Return type mismatch cannot return value of type '{U}' from function returning '{R}' return expr
E0204 Argument type mismatch argument {i} has type '{U}', expected '{T}' argument expr
E0205 Wrong argument count function '{f}' expects {n} argument(s) but {k} were supplied call expression

12.3 Mutability errors (E03xx)

Code Category Message template Span(s)
E0300 Assign to immutable cannot assign to '{name}' because it is not declared as 'mut' LHS, binding site
E0301 Assign to non-place left-hand side of assignment is not a valid place expression LHS expr

12.4 Operator errors (E04xx)

Code Category Message template Span(s)
E0400 Operator type mismatch operator '{op}' requires compatible numeric types, found '{T}' and '{U}' operator
E0401 Shift amount not unsigned shift amount must be an unsigned integer type, found '{U}' RHS of shift

12.5 Struct literal errors (E05xx)

Code Category Message template Span(s)
E0500 Missing struct field missing field '{f}' in initialiser for struct '{S}' struct literal
E0501 Extra struct field struct '{S}' has no field named '{f}' field name in literal
E0502 Field on non-struct type '{T}' has no fields field access expr
E0503 No such field struct '{S}' has no field named '{f}' field name in access

12.6 Index errors (E06xx)

Code Category Message template Span(s)
E0600 Index on non-array type '{T}' cannot be indexed index expr
E0601 Index type not unsigned int array index must be an unsigned integer type, found '{U}' index expr

12.7 Pointer errors (E07xx)

Code Category Message template Span(s)
E0700 Deref of non-ptr type '{T}' cannot be dereferenced deref expr
E0701 Addrof non-place cannot take the address of a temporary value addrof expr

12.8 Control-flow errors (E08xx)

Code Category Message template Span(s)
E0800 break outside loop 'break' used outside of a loop break keyword
E0801 continue outside loop 'continue' used outside of a loop continue keyword

12.9 Struct definition errors (E09xx)

Code Category Message template Span(s)
E0900 Recursive struct (no indirection) struct '{S}' has infinite size due to recursive field '{f}: {T}' field type
E0901 Duplicate struct field field '{f}' is defined more than once in struct '{S}' second field
E0902 Duplicate function parameter parameter '{x}' is defined more than once in function '{f}' second parameter

12.10 Binding errors (E10xx)

Code Category Message template Span(s)
E1000 No type for binding cannot infer type for '{x}': no annotation and no initialiser let keyword
E1001 Missing return function '{f}' must return '{R}' but not all paths return a value function name

12.11 Warnings (Wxxx)

Code Category Message template Span(s)
W001 Unreachable code unreachable statement first unreachable statement