All 12 sections are now fully defined. Removed incremental-progress markers (Status: defined), updated intro and outline preamble, and fixed two missing cases in the assigns() function (struct literals and parenthesised expressions). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1389 lines
63 KiB
Markdown
1389 lines
63 KiB
Markdown
# Flux Language Semantics
|
||
|
||
This document formalises the static semantics — primarily the **type system** — of Flux.
|
||
|
||
Notation used throughout:
|
||
|
||
| Symbol | Meaning |
|
||
| -------------- | ----------------------------------------------------------------- |
|
||
| `Γ` | Type environment (maps names to their type + mutability) |
|
||
| `Σ` | Struct table (maps struct names to their field lists) |
|
||
| `Φ` | Function table (maps function names to their signatures) |
|
||
| `Γ ⊢ e : T` | Under environment Γ, expression `e` has type `T` |
|
||
| `Γ ⊢ s ✓` | Under environment Γ, statement `s` is well-typed |
|
||
| `⊢ T wf` | Type `T` is well-formed |
|
||
| `T₁ ~ T₂` | Types `T₁` and `T₂` are compatible (assignable) |
|
||
| `T ⊑ U` | `T` can be implicitly promoted to `U` |
|
||
| `common(T, U)` | Least upper bound of `T` and `U` under `⊑`; error if none exists |
|
||
| `rank(T)` | Numeric rank of type `T` (width class; defined in §5) |
|
||
| `category(T)` | Numeric category of `T`: `Unsigned`, `Signed`, `Float`, or `Char` |
|
||
|
||
---
|
||
|
||
## Outline
|
||
|
||
The following sections form the complete type system, in dependency order — each section
|
||
relies only on those above it.
|
||
|
||
1. [Types](#1-types) ✓
|
||
2. [Well-Formedness of Types](#2-well-formedness-of-types) ✓
|
||
3. [Type Environments](#3-type-environments) ✓
|
||
4. [Mutability](#4-mutability) ✓
|
||
5. [Numeric Type Hierarchy and Coercions](#5-numeric-type-hierarchy-and-coercions) ✓
|
||
6. [Operator Typing](#6-operator-typing) ✓
|
||
7. [Expression Typing](#7-expression-typing) ✓
|
||
8. [Statement Typing](#8-statement-typing) ✓
|
||
9. [Function Definitions](#9-function-definitions) ✓
|
||
10. [Struct Definitions](#10-struct-definitions) ✓
|
||
11. [Program Typing](#11-program-typing) ✓
|
||
12. [Error Catalogue](#12-error-catalogue) ✓
|
||
|
||
---
|
||
|
||
## 1. Types
|
||
|
||
The **type language** — the set of types that Flux values can have.
|
||
|
||
### 1.1 Primitive types
|
||
|
||
```
|
||
T ::= u8 | u16 | u32 | u64 -- unsigned integers
|
||
| i8 | i16 | i32 | i64 -- signed integers
|
||
| f32 | f64 -- floating-point
|
||
| bool -- boolean
|
||
| char -- Unicode scalar value; representation-compatible with u32
|
||
```
|
||
|
||
### 1.2 Numeric type categories
|
||
|
||
Each numeric primitive belongs to exactly one **category**. Categories partition the
|
||
numeric types; implicit promotion never crosses a category boundary (see §5).
|
||
|
||
| Category | Members | Notes |
|
||
| ---------- | ------------------------- | ---------------------------------------- |
|
||
| `Unsigned` | `u8`, `u16`, `u32`, `u64` | |
|
||
| `Signed` | `i8`, `i16`, `i32`, `i64` | |
|
||
| `Float` | `f32`, `f64` | |
|
||
| `Char` | `char` | Promotes into `Unsigned` only (see §5.3) |
|
||
|
||
For the purposes of operator rules, the union categories are:
|
||
|
||
- **Integer** = `Unsigned ∪ Signed`
|
||
- **Numeric** = `Unsigned ∪ Signed ∪ Float`
|
||
|
||
`bool` and `char` are **not** members of `Numeric`; they are handled separately.
|
||
|
||
### 1.3 Bit widths
|
||
|
||
| Type | Width | Type | Width |
|
||
| ------ | ----- | ------ | ----- |
|
||
| `u8` | 8 | `i8` | 8 |
|
||
| `u16` | 16 | `i16` | 16 |
|
||
| `u32` | 32 | `i32` | 32 |
|
||
| `u64` | 64 | `i64` | 64 |
|
||
| `f32` | 32 | `f64` | 64 |
|
||
| `bool` | 8 | `char` | 32 |
|
||
|
||
### 1.4 Compound types
|
||
|
||
```
|
||
T ::= ...
|
||
| *T -- immutable pointer to T (read-only; C's const T*)
|
||
| *mut T -- mutable pointer to T (read-write; C's T*)
|
||
| *opaque -- immutable untyped pointer (C's const void*)
|
||
| *mut opaque -- mutable untyped pointer (C's void*)
|
||
| [T; N] -- array of N elements of type T (N: usize, N ≥ 0)
|
||
| struct S -- named struct type (resolved through Σ)
|
||
| () -- unit type (no-value / void)
|
||
```
|
||
|
||
The `mut` qualifier on a pointer type describes the **pointer's access permission**
|
||
to the pointed-to data. It is orthogonal to binding mutability (§4).
|
||
|
||
A shorthand used throughout:
|
||
- **typed pointer**: `*T` or `*mut T` (pointee type is known)
|
||
- **opaque pointer**: `*opaque` or `*mut opaque` (pointee type is erased)
|
||
- **mutable pointer**: `*mut T` or `*mut opaque`
|
||
- **immutable pointer**: `*T` or `*opaque`
|
||
|
||
### 1.5 Function types (internal use)
|
||
|
||
Function types do not appear in user-facing type annotations but are used internally by
|
||
the type checker when resolving calls.
|
||
|
||
```
|
||
FnSig ::= (T₁, …, Tₙ) -> T_ret
|
||
```
|
||
|
||
### Open questions — resolved
|
||
|
||
- **`()` as first-class type**: `()` may appear explicitly in `-> ()` return-type
|
||
annotations and also arises implicitly when a function omits its return type. It is a
|
||
valid type in `⊢ T wf` (Wf-Unit, §2).
|
||
- **Pointer mutability**: settled in §4. Pointer mutability (`*T` vs `*mut T`) is
|
||
orthogonal to binding mutability and carried in the type itself.
|
||
|
||
---
|
||
|
||
## 2. Well-Formedness of Types
|
||
|
||
A type is **well-formed** (written `⊢ T wf`) if every struct name it references is
|
||
declared and all array sizes are non-negative. Well-formedness is checked against the
|
||
struct table `Σ`; pointer and array types impose structural constraints. Recursive struct
|
||
cycles through **direct** fields (without pointer indirection) are rejected here — the
|
||
rule is deferred to §10 where the full `Σ` is available.
|
||
|
||
### 2.1 Inference rules
|
||
|
||
```
|
||
(Wf-Prim) ⊢ T wf for any T ∈ { u8, u16, u32, u64,
|
||
i8, i16, i32, i64,
|
||
f32, f64, bool, char }
|
||
|
||
(Wf-Unit) ⊢ () wf
|
||
|
||
(Wf-PtrImm) ⊢ T wf
|
||
───────────────
|
||
⊢ *T wf
|
||
|
||
(Wf-PtrMut) ⊢ T wf
|
||
───────────────
|
||
⊢ *mut T wf
|
||
|
||
(Wf-OpaqueImm) ⊢ *opaque wf
|
||
|
||
(Wf-OpaqueMut) ⊢ *mut opaque wf
|
||
|
||
(Wf-Array) ⊢ T wf N ≥ 0
|
||
──────────────────────
|
||
⊢ [T; N] wf
|
||
|
||
(Wf-Struct) S ∈ dom(Σ)
|
||
──────────────────────
|
||
⊢ struct S wf
|
||
```
|
||
|
||
Note on `Wf-Struct`: this rule only checks that the struct name is declared. The
|
||
requirement that all field types are themselves well-formed, and that the struct has no
|
||
infinite-size cycles, is established when the struct *definition* is checked (§10).
|
||
|
||
### 2.2 Size finiteness (informal)
|
||
|
||
A type is **sized** iff its in-memory representation has a fixed, finite byte count
|
||
determinable at compile time. All types in §2.1 are sized, **except** a `struct S` whose
|
||
definition contains a direct (non-pointer) field of a type that transitively includes `S`
|
||
again. Such cycles are detected in §10 and reported as E0900.
|
||
|
||
Pointer types (`*T`, `*mut T`, `*opaque`, `*mut opaque`) always have a fixed pointer-width
|
||
size regardless of the pointee. Wrapping a recursive struct in a pointer breaks the cycle:
|
||
|
||
```
|
||
struct Node { next: *Node } -- OK: *Node has finite size
|
||
struct Bad { inner: Bad } -- error E0900: infinite size
|
||
```
|
||
|
||
---
|
||
|
||
## 3. Type Environments
|
||
|
||
The **type environment** `Γ` tracks every variable binding that is in scope at a given
|
||
program point. It is an **ordered association list** — the most recently introduced
|
||
binding for a given name takes precedence.
|
||
|
||
### 3.1 Grammar
|
||
|
||
```
|
||
m ::= imm | mut -- binding mutability
|
||
|
||
Γ ::= ε -- empty environment
|
||
| Γ, x : (T, m) -- extend with binding x of type T and mutability m
|
||
```
|
||
|
||
Bindings are appended on the right. Lookup always returns the **rightmost** (most recent)
|
||
entry for a given name, implementing latest-binding-wins semantics.
|
||
|
||
### 3.2 Lookup
|
||
|
||
```
|
||
(Lookup-Hit) ────────────────────────
|
||
(Γ, x:(T,m))(x) = (T, m)
|
||
|
||
(Lookup-Skip) x ≠ y Γ(x) = (T, m)
|
||
─────────────────────────────────────
|
||
(Γ, y:(T',m'))(x) = (T, m)
|
||
```
|
||
|
||
Attempting to look up a name not present in `Γ` is undefined; the type checker must
|
||
report error E0100.
|
||
|
||
### 3.3 Scope and block exit
|
||
|
||
Scopes are not represented as a separate data structure — they arise naturally from how
|
||
the statement rules in §8 thread `Γ` through blocks.
|
||
|
||
A block `{ s₁; s₂; …; sₙ }` is checked by extending `Γ` sequentially with each
|
||
statement's bindings, yielding a final `Γ_out`. On block **exit**, `Γ_out` is
|
||
**discarded** and the environment reverts to `Γ` (the environment at block entry).
|
||
This is written:
|
||
|
||
```
|
||
Γ ⊢ block ✓ ⇒ Γ_out but the caller uses only Γ, not Γ_out
|
||
```
|
||
|
||
The formal rule (T-Block) is given in §8.
|
||
|
||
### 3.4 Shadowing and overriding
|
||
|
||
Flux adopts **Rust-style re-binding semantics**: a `let` declaration always creates a
|
||
*new* binding regardless of whether the name already exists. The behaviour depends on
|
||
where the prior binding lives:
|
||
|
||
#### Case A — override (same scope)
|
||
|
||
Two `let x` declarations within the same block. Both bindings exist in the environment;
|
||
lookup returns the second (newer) one. The first is **permanently inaccessible** for the
|
||
remainder of that scope and any nested scopes.
|
||
|
||
```
|
||
// scope S
|
||
let x: i32 = 1; // Γ₁ = Γ, x:(i32, imm)
|
||
let x: f64 = 2.0; // Γ₂ = Γ₁, x:(f64, imm)
|
||
// = Γ, x:(i32,imm), x:(f64,imm)
|
||
// Γ₂(x) = (f64, imm) ← i32 binding is gone
|
||
```
|
||
|
||
On exit from scope S, both entries are discarded.
|
||
|
||
The type of a re-bound name is allowed to change (e.g. `i32` → `f64` above). This is
|
||
intentional: re-binding is a distinct operation from *mutation*.
|
||
|
||
#### Case B — shadow (nested scope)
|
||
|
||
An inner `let x` in a nested block, where `x` is already bound in an enclosing scope.
|
||
The inner binding shadows the outer one inside the nested block. When the nested block
|
||
exits, the outer binding is restored.
|
||
|
||
```
|
||
let x: i32 = 1; // outer: Γ₁ = Γ, x:(i32, imm)
|
||
{
|
||
let x: f64 = 2.0; // inner: Γ₂ = Γ₁, x:(f64, imm)
|
||
// Γ₂(x) = (f64, imm) ← outer i32 is shadowed
|
||
}
|
||
// back to Γ₁: Γ₁(x) = (i32, imm) ← restored
|
||
```
|
||
|
||
#### Summary
|
||
|
||
| Situation | Old binding accessible? | After scope exit |
|
||
| --------------------- | ----------------------- | ---------------- |
|
||
| Override (same scope) | No — immediately hidden | Both removed |
|
||
| Shadow (inner scope) | No — hidden in inner | Outer restored |
|
||
|
||
Both cases are modelled identically in the formal system — the environment is extended
|
||
with a new entry and lookup finds the newest. The difference is entirely about which
|
||
entries are *removed* when a scope exits:
|
||
|
||
- **Block scope**: on exit, all bindings introduced *inside* that block are removed.
|
||
- Because scope exit is wholesale, an overriding binding in scope S is removed at the
|
||
same time as the overridden one — neither survives past S.
|
||
|
||
### 3.5 Interaction with `Σ` and `Φ`
|
||
|
||
Struct names (`Σ`) and function names (`Φ`) occupy **separate namespaces** from variable
|
||
bindings (`Γ`). A variable `foo` and a function `foo` may coexist; the expression typing
|
||
rules in §7 use `Γ` to resolve identifiers used as values and `Φ` to resolve calls. A
|
||
name used as a call target (`f(…)`) is looked up in `Φ`; a name used as a plain value is
|
||
looked up in `Γ`. There is no shadowing between these namespaces.
|
||
|
||
---
|
||
|
||
## 4. Mutability
|
||
|
||
Flux has two orthogonal axes of mutability:
|
||
|
||
1. **Binding mutability** — whether a name can be rebound (assigned a new value).
|
||
Declared with `let` vs. `let mut`.
|
||
2. **Pointer mutability** — whether the data pointed to can be written through a pointer.
|
||
Declared as `*T` (immutable) vs. `*mut T` (mutable).
|
||
|
||
These axes are fully independent; any combination is valid (see §1.4 table).
|
||
|
||
### 4.1 Binding mutability
|
||
|
||
A **binding** is introduced by `let` and associates a name with a storage location.
|
||
|
||
```
|
||
Γ, x : (T, imm) introduced by let x [: T] [= e] ;
|
||
Γ, x : (T, mut) introduced by let mut x [: T] [= e] ;
|
||
```
|
||
|
||
Function parameters follow the same rule: `[mut] name : T` in a parameter list.
|
||
Mutability on a parameter is **local** — the callee may freely rebind it; the caller
|
||
never observes the difference.
|
||
|
||
### 4.2 Place expressions
|
||
|
||
A **place** (lvalue) is an expression that denotes a memory location that can be read
|
||
from or written to. Not every expression is a place.
|
||
|
||
```
|
||
place ::= x -- variable
|
||
| * e -- pointer dereference (e must have a pointer type)
|
||
| e . f -- struct field access (e must be a place or struct value)
|
||
| e [ i ] -- array subscript (e must be a place or array value)
|
||
```
|
||
|
||
All other expression forms (literals, arithmetic, calls, etc.) are **not** places.
|
||
|
||
### 4.3 Mutable places
|
||
|
||
A place is **mutable** if it can be the target of an assignment. The predicate
|
||
`Γ ⊢ e mutable-place` is defined by the following rules:
|
||
|
||
```
|
||
(MP-Var) Γ(x) = (T, mut)
|
||
─────────────────────────────
|
||
Γ ⊢ x mutable-place
|
||
|
||
(MP-Deref) Γ ⊢ e : *mut T
|
||
─────────────────────────────
|
||
Γ ⊢ *e mutable-place
|
||
|
||
(MP-Field) Γ ⊢ e mutable-place Σ(S).f = T
|
||
─────────────────────────────────────
|
||
Γ ⊢ e.f mutable-place
|
||
|
||
(MP-Index) Γ ⊢ e mutable-place
|
||
─────────────────────────────
|
||
Γ ⊢ e[i] mutable-place
|
||
```
|
||
|
||
Note the asymmetry in MP-Deref: dereferencing an *immutable* pointer `*T` yields a
|
||
place (it can be read) but **not** a mutable place (it cannot be written). There is no
|
||
MP-Deref rule for `*T`.
|
||
|
||
### 4.4 Readable places
|
||
|
||
Dereferencing any pointer (mutable or not) produces a place that can be **read**.
|
||
The predicate `Γ ⊢ e place` is a relaxed version of mutable-place used in expression
|
||
typing (§7) for the `*` unary operator:
|
||
|
||
```
|
||
(P-Var) x ∈ dom(Γ)
|
||
─────────────
|
||
Γ ⊢ x place
|
||
|
||
(P-Deref-Imm) Γ ⊢ e : *T
|
||
─────────────────
|
||
Γ ⊢ *e place (readable, but not mutable-place)
|
||
|
||
(P-Deref-Mut) Γ ⊢ e : *mut T
|
||
─────────────────
|
||
Γ ⊢ *e place (readable AND mutable-place via MP-Deref)
|
||
|
||
(P-Field) Γ ⊢ e place Σ(S).f = T
|
||
────────────────────────────
|
||
Γ ⊢ e.f place
|
||
|
||
(P-Index) Γ ⊢ e place
|
||
─────────────────
|
||
Γ ⊢ e[i] place
|
||
```
|
||
|
||
### 4.5 Address-of and pointer mutability
|
||
|
||
The unary `&` operator produces a pointer. The mutability of the resulting pointer
|
||
is determined by the mutability of the place it is applied to:
|
||
|
||
```
|
||
(Addrof-Imm) Γ ⊢ e place ¬(Γ ⊢ e mutable-place)
|
||
────────────────────────────────────────────
|
||
Γ ⊢ &e : *T
|
||
|
||
(Addrof-Mut) Γ ⊢ e mutable-place
|
||
────────────────────────────────────────────
|
||
Γ ⊢ &e : *mut T
|
||
```
|
||
|
||
Taking the address of a mutable place yields a `*mut T`; taking the address of a
|
||
read-only place yields a `*T`. This ensures that a `*mut T` cannot be obtained from a
|
||
non-mutable binding via `&`.
|
||
|
||
### 4.6 Interaction summary
|
||
|
||
| Source form | Binding mut? | Ptr mut? | Can rebind `p`? | Can write `*p`? |
|
||
| ---------------------------- | ------------ | -------- | --------------- | --------------- |
|
||
| `let p: *T = &x` | no | no | no | no |
|
||
| `let mut p: *T = &x` | yes | no | yes | no |
|
||
| `let p: *mut T = &mut x` | no | yes | no | yes |
|
||
| `let mut p: *mut T = &mut x` | yes | yes | yes | yes |
|
||
|
||
### 4.7 Pointer subtyping
|
||
|
||
A mutable pointer may be used where an immutable pointer is expected (covariant in the
|
||
immutable direction). The reverse is not permitted.
|
||
|
||
```
|
||
(Ptr-Coerce) Γ ⊢ e : *mut T
|
||
──────────────────
|
||
Γ ⊢ e : *T -- *mut T is a subtype of *T
|
||
```
|
||
|
||
This is consistent with the promotion relation (§5): `*mut T ⊑ *T`. There is no
|
||
implicit coercion in the other direction; an explicit cast is required to obtain
|
||
`*mut T` from `*T`.
|
||
|
||
---
|
||
|
||
## 5. Numeric Type Hierarchy and Coercions
|
||
|
||
Flux supports **implicit widening promotion** for numeric types. Promotions are
|
||
one-directional (widening only), never cross a category boundary, and never lose
|
||
information.
|
||
|
||
### 5.1 Rank
|
||
|
||
Each numeric type has an integer **rank** that reflects its bit width within its
|
||
category.
|
||
|
||
| Type | `rank` | Type | `rank` |
|
||
| ----- | ------ | ----- | ------ |
|
||
| `u8` | 1 | `i8` | 1 |
|
||
| `u16` | 2 | `i16` | 2 |
|
||
| `u32` | 3 | `i32` | 3 |
|
||
| `u64` | 4 | `i64` | 4 |
|
||
| `f32` | 1 | `f64` | 2 |
|
||
|
||
`bool` and `char` have no rank; they are not members of `Numeric`.
|
||
|
||
### 5.2 Promotion predicate `T ⊑ U`
|
||
|
||
`T ⊑ U` holds — meaning a value of type `T` may be implicitly coerced to type `U` —
|
||
exactly when one of the following rules applies:
|
||
|
||
```
|
||
(Refl) T ⊑ T for any T
|
||
|
||
(Unsigned) T ⊑ U if category(T) = category(U) = Unsigned
|
||
and rank(T) < rank(U)
|
||
|
||
(Signed) T ⊑ U if category(T) = category(U) = Signed
|
||
and rank(T) < rank(U)
|
||
|
||
(Float) f32 ⊑ f64
|
||
|
||
(Char) char ⊑ U if category(U) = Unsigned
|
||
and rank(U) ≥ 3 -- u32 or u64
|
||
```
|
||
|
||
All other combinations are **forbidden**. In particular:
|
||
|
||
- Unsigned → Signed is **forbidden** (even if the target is wider).
|
||
- Signed → Unsigned is **forbidden**.
|
||
- Any integer → Float is **forbidden**.
|
||
- Float → integer is **forbidden**.
|
||
- Narrowing (rank decreases) is **forbidden** in both directions.
|
||
- `char` → Signed is **forbidden**.
|
||
- `char` may only promote to `u32` or `u64` (rank ≥ 3 unsigned).
|
||
|
||
### 5.3 char and u32
|
||
|
||
`char` is represented as a 32-bit unsigned scalar. It is **not** a synonym for `u32` —
|
||
they are distinct types — but a `char` may be implicitly promoted to `u32` or `u64`
|
||
when the context requires it (e.g. as an operand to integer arithmetic).
|
||
|
||
The reverse (`u32 → char`) is **never** implicit; it requires an explicit cast, since not
|
||
every `u32` value is a valid Unicode scalar.
|
||
|
||
### 5.4 Common type `common(T, U)`
|
||
|
||
For binary operators that require matching operand types, the **common type** of `T` and
|
||
`U` is defined as their least upper bound under `⊑`:
|
||
|
||
```
|
||
common(T, U) = T if U ⊑ T
|
||
common(T, U) = U if T ⊑ U
|
||
common(T, U) = ⊥ otherwise -- type error: no common type
|
||
```
|
||
|
||
When `T ⊑ U` and `U ⊑ T` simultaneously (which can only occur when `T = U` by Refl),
|
||
`common(T, U) = T = U`.
|
||
|
||
**Examples:**
|
||
|
||
| `T` | `U` | `common(T, U)` | Notes |
|
||
| ------ | ----- | -------------- | ---------------------------------- |
|
||
| `u8` | `u32` | `u32` | u8 promoted to u32 |
|
||
| `u32` | `u8` | `u32` | symmetric |
|
||
| `i16` | `i64` | `i64` | i16 promoted to i64 |
|
||
| `f32` | `f64` | `f64` | f32 promoted to f64 |
|
||
| `char` | `u32` | `u32` | char promoted as u32 equivalent |
|
||
| `char` | `u64` | `u64` | char promoted to u64 |
|
||
| `u8` | `i32` | `⊥` | error: cross-signedness promotion |
|
||
| `i32` | `u64` | `⊥` | error: signed → unsigned forbidden |
|
||
| `u32` | `f32` | `⊥` | error: integer → float forbidden |
|
||
| `char` | `u16` | `⊥` | error: rank(u16)=2 < 3, below u32 |
|
||
| `char` | `i32` | `⊥` | error: char → signed forbidden |
|
||
|
||
### 5.5 Literal type inference
|
||
|
||
Integer and float literals have no intrinsic type. Their type is determined by:
|
||
|
||
1. **Contextual typing** (checked mode): if the surrounding context expects a specific
|
||
numeric type `T`, the literal is given type `T` directly, without promotion. The
|
||
literal value must be representable in `T` (range check).
|
||
2. **Default type** (synthesised mode): when no context is available, literals are given
|
||
a default type:
|
||
- `INT_LIT` → `i32`
|
||
- `FLOAT_LIT` → `f64`
|
||
|
||
These defaults apply only when type inference cannot determine a more specific type.
|
||
|
||
### 5.6 Explicit casts
|
||
|
||
Conversions not covered by `⊑` require an **explicit cast** (narrowing, cross-signedness,
|
||
integer↔float, u32→char, etc.). The cast syntax and its rules are not yet in the grammar
|
||
and will be specified separately.
|
||
|
||
---
|
||
|
||
## 6. Operator Typing
|
||
|
||
Each operator has one or more *typing schemes* describing the types it accepts and the
|
||
type it produces.
|
||
|
||
### 6.1 Binary operators
|
||
|
||
For operators that accept two numeric operands, the operands are first coerced to their
|
||
`common(T₁, T₂)` type (§5.4) before the operation is performed. If `common` is `⊥` the
|
||
expression is a type error.
|
||
|
||
| Operators | Required operand types | Result type | Notes |
|
||
| ------------------- | --------------------------------------------------------------------------------------------- | ---------------- | ----------------------------------------------------------------------------------------------- |
|
||
| `+` `-` `*` `/` `%` | `T₁`, `T₂` where `common(T₁, T₂) ∈ Numeric` | `common(T₁, T₂)` | Both operands promoted to common type |
|
||
| `<<` `>>` | LHS `T` any integer; RHS `U` any **unsigned** integer | `T` | Only LHS type determines the result; no promotion of the LHS; RHS must be unsigned (see §6.1.1) |
|
||
| `&` `\|` `^` | `T₁`, `T₂` where `common(T₁, T₂) ∈ Integer` | `common(T₁, T₂)` | Both operands promoted to common integer type |
|
||
| `and` `or` | `bool`, `bool` | `bool` | Short-circuit; no promotion applies |
|
||
| `==` `!=` | numeric/bool/char: `common(T₁,T₂) ∈ Numeric ∪ {bool, char}`; or pointer equality (see §6.1.2) | `bool` | Structs and arrays are not comparable |
|
||
| `<` `>` `<=` `>=` | `T₁`, `T₂` where `common(T₁, T₂) ∈ Numeric` | `bool` | Both operands promoted to common type |
|
||
| `=` | LHS: mutable place of type `T`; RHS: `T₂` where `T₂ ⊑ T` | `T` | Result is the assigned value; enables chaining (see §6.1.3) |
|
||
| `op=` | Same as `=` after desugaring `lhs op= rhs` → `lhs = lhs op rhs` | `T` | LHS must be a mutable place |
|
||
|
||
#### 6.1.1 Shift operator detail
|
||
|
||
`a << b` and `a >> b` use **asymmetric typing**:
|
||
|
||
- `a` may be any integer type `T`; `T` is also the result type.
|
||
- `b` must be any **unsigned** integer type (`u8`, `u16`, `u32`, `u64`). Signed or
|
||
other types are rejected (error E0401).
|
||
- The two operands are **never** subject to `common` — LHS and RHS types are checked
|
||
independently. The LHS is never promoted to match the RHS.
|
||
|
||
```
|
||
(T-Shl / T-Shr) Γ ⊢ a : T T ∈ Integer
|
||
Γ ⊢ b : U U ∈ Unsigned
|
||
────────────────────────────────
|
||
Γ ⊢ a << b : T (resp. a >> b : T)
|
||
```
|
||
|
||
Rationale: shifting a `u8` value by a `u32` amount is idiomatic and well-defined.
|
||
Requiring both operands to be the same type would be unnecessarily restrictive. Requiring
|
||
the amount to be unsigned makes negative shift amounts a compile-time error rather than
|
||
undefined behaviour.
|
||
|
||
#### 6.1.2 Pointer equality
|
||
|
||
Two pointer expressions may be compared with `==` and `!=` to test whether they point to
|
||
the same address. Pointer comparison is type-safe: the two operands must have a **common
|
||
pointer type** after applying the Ptr-Coerce rule (§4.7).
|
||
|
||
```
|
||
(T-PtrEq) Γ ⊢ e₁ : P₁ Γ ⊢ e₂ : P₂ ptr-common(P₁, P₂) = P
|
||
───────────────────────────────────────────────────────────
|
||
Γ ⊢ e₁ == e₂ : bool (resp. e₁ != e₂ : bool)
|
||
```
|
||
|
||
Where `ptr-common` is defined as:
|
||
|
||
```
|
||
ptr-common(*mut T, *mut T) = *mut T
|
||
ptr-common(*mut T, *T) = *T -- Ptr-Coerce applied to LHS
|
||
ptr-common(*T, *mut T) = *T -- Ptr-Coerce applied to RHS
|
||
ptr-common(*T, *T) = *T
|
||
ptr-common(*mut opaque, *mut opaque) = *mut opaque
|
||
ptr-common(*mut opaque, *opaque) = *opaque
|
||
ptr-common(*opaque, *mut opaque) = *opaque
|
||
ptr-common(*opaque, *opaque) = *opaque
|
||
ptr-common(_, _) = ⊥ -- mixed typed/opaque or different pointees
|
||
```
|
||
|
||
Comparing a typed pointer `*T` with an opaque pointer `*opaque` is **not** permitted
|
||
implicitly; an explicit cast is required. Comparing `*T` with `*U` where `T ≠ U` is also
|
||
not permitted.
|
||
|
||
Ordering operators (`<`, `>`, `<=`, `>=`) are **not** defined for pointer types.
|
||
|
||
#### 6.1.3 Assignment as an expression
|
||
|
||
The `=` operator returns the assigned value with the **LHS type** `T`. This makes
|
||
assignments first-class expressions and enables chained assignment:
|
||
|
||
```
|
||
a = b = 0 -- parses as a = (b = 0) (right-associative)
|
||
```
|
||
|
||
Evaluation proceeds right-to-left:
|
||
1. `b = 0` — assigns `0` to `b` (type `T_b`), result is `0 : T_b`.
|
||
2. `a = (0 : T_b)` — requires `T_b ⊑ T_a`; assigns to `a`, result is the value `T_a`.
|
||
|
||
The formal rule:
|
||
|
||
```
|
||
(T-Assign) Γ ⊢ e₁ mutable-place Γ ⊢ e₁ : T
|
||
Γ ⊢ e₂ : T₂ T₂ ⊑ T
|
||
─────────────────────────────────────
|
||
Γ ⊢ e₁ = e₂ : T
|
||
```
|
||
|
||
The compound assignment `e₁ op= e₂` desugars before typing:
|
||
|
||
```
|
||
(T-CompoundAssign) e₁ op= e₂ ≡ e₁ = e₁ op e₂
|
||
```
|
||
|
||
After desugaring, T-Assign applies. The LHS expression `e₁` is evaluated only once at
|
||
runtime (the desugaring is semantic, not textual); both uses of `e₁` refer to the same
|
||
place.
|
||
|
||
### 6.2 Unary operators
|
||
|
||
| Operator | Accepted type | Result type | Notes |
|
||
| -------- | -------------------------- | ----------- | --------------------------------------------- |
|
||
| `-` | any numeric `T` | `T` | Arithmetic negation |
|
||
| `!` | `bool` | `bool` | Logical NOT |
|
||
| `~` | any integer `T` | `T` | Bitwise complement |
|
||
| `*` | `*T` or `*mut T` | `T` | Dereference; yields a readable place (§4.4) |
|
||
| `&` | readable place of type `T` | `*T` | Address-of immutable place (Addrof-Imm, §4.5) |
|
||
| `&` | mutable place of type `T` | `*mut T` | Address-of mutable place (Addrof-Mut, §4.5) |
|
||
|
||
The two `&` rows are not overloading — the single `&` operator produces `*mut T` or `*T`
|
||
depending on whether its operand is a mutable place (§4.5). The unary `*` operator
|
||
applied to `*T` yields a readable-but-not-mutable place; applied to `*mut T` it yields a
|
||
readable-and-mutable place (rules P-Deref-Imm / P-Deref-Mut, §4.4).
|
||
|
||
---
|
||
|
||
## 7. Expression Typing
|
||
|
||
Judgement form: `Γ, Σ, Φ ⊢ e : T` (written `Γ ⊢ e : T` when the tables are clear from
|
||
context).
|
||
|
||
### 7.1 Typing strategy
|
||
|
||
Expression typing uses **synthesis** (bottom-up inference): the type of an expression is
|
||
*produced* from its sub-expressions; no expected type is pushed down from the enclosing
|
||
context. The single exception is at a `let` binding with an explicit type annotation,
|
||
where the annotation is used to give numeric literals a more specific type than their
|
||
default (§5.5). This is handled in T-Let (§8); the rules in this section always
|
||
synthesise.
|
||
|
||
### 7.2 Built-in: `string_view`
|
||
|
||
String literals produce a value of the built-in struct type **`string_view`**, which is
|
||
pre-loaded into `Σ` before any user code is checked:
|
||
|
||
```
|
||
string_view ::= struct { data: *char, size: u64 }
|
||
```
|
||
|
||
The name `string_view` is **reserved** — a user-defined struct with this name is a type
|
||
error. A `STRING_LIT` token is compiled to a `string_view` value whose `data` field
|
||
points to the UTF-8 bytes of the string and `size` holds the byte length.
|
||
|
||
> Future: when Flux gains slice types, `string_view` will be replaced or superseded by
|
||
> a first-class slice `[char]`.
|
||
|
||
### 7.3 Inference rules
|
||
|
||
#### Literals and booleans
|
||
|
||
```
|
||
(T-IntLit) ─────────────────────────
|
||
Γ ⊢ INT_LIT : i32
|
||
-- default type; T-Let may supply a contextual integer type (§5.5, §8)
|
||
|
||
(T-FloatLit) ─────────────────────────
|
||
Γ ⊢ FLOAT_LIT : f64
|
||
-- default type; T-Let may supply a contextual float type (§5.5, §8)
|
||
|
||
(T-StrLit) ─────────────────────────────────────────
|
||
Γ ⊢ STRING_LIT : struct string_view
|
||
|
||
(T-CharLit) ─────────────────────────
|
||
Γ ⊢ CHAR_LIT : char
|
||
|
||
(T-True) ─────────────────────────
|
||
Γ ⊢ true : bool
|
||
|
||
(T-False) ─────────────────────────
|
||
Γ ⊢ false : bool
|
||
```
|
||
|
||
#### Variable
|
||
|
||
```
|
||
(T-Var) Γ(x) = (T, m)
|
||
─────────────────
|
||
Γ ⊢ x : T
|
||
```
|
||
|
||
`x` may be any binding mutability `m`; the type is `T` regardless. Mutability is only
|
||
consulted when `x` is the LHS of an assignment (MP-Var, §4.3).
|
||
|
||
#### Grouping
|
||
|
||
```
|
||
(T-Paren) Γ ⊢ e : T
|
||
─────────────
|
||
Γ ⊢ (e) : T
|
||
```
|
||
|
||
#### Unary operators
|
||
|
||
```
|
||
(T-Neg) Γ ⊢ e : T T ∈ Numeric
|
||
─────────────────────────────
|
||
Γ ⊢ -e : T
|
||
|
||
(T-Not) Γ ⊢ e : bool
|
||
──────────────
|
||
Γ ⊢ !e : bool
|
||
|
||
(T-BitNot) Γ ⊢ e : T T ∈ Integer
|
||
─────────────────────────────
|
||
Γ ⊢ ~e : T
|
||
|
||
(T-Deref) Γ ⊢ e : *T (also valid for *mut T, by Ptr-Coerce §4.7)
|
||
─────────────────────────────
|
||
Γ ⊢ *e : T
|
||
|
||
(T-AddrOf) -- see §4.5: yields *T or *mut T depending on place mutability
|
||
Γ ⊢ e place (readable place)
|
||
─────────────────────────────────────────
|
||
Γ ⊢ &e : *T if ¬(Γ ⊢ e mutable-place) [Addrof-Imm]
|
||
Γ ⊢ &e : *mut T if Γ ⊢ e mutable-place [Addrof-Mut]
|
||
```
|
||
|
||
#### Binary operators
|
||
|
||
All binary operator rules follow the typing schemes in §6.1. Concisely:
|
||
|
||
```
|
||
(T-Arith) Γ ⊢ e₁ : T₁ Γ ⊢ e₂ : T₂
|
||
C = common(T₁, T₂) C ∈ Numeric
|
||
────────────────────────────────────
|
||
Γ ⊢ e₁ {+,-,*,/,%} e₂ : C
|
||
|
||
(T-Shift) Γ ⊢ e₁ : T T ∈ Integer
|
||
Γ ⊢ e₂ : U U ∈ Unsigned
|
||
─────────────────────────────
|
||
Γ ⊢ e₁ {<<,>>} e₂ : T
|
||
|
||
(T-BitBin) Γ ⊢ e₁ : T₁ Γ ⊢ e₂ : T₂
|
||
C = common(T₁, T₂) C ∈ Integer
|
||
────────────────────────────────────
|
||
Γ ⊢ e₁ {&,|,^} e₂ : C
|
||
|
||
(T-Logic) Γ ⊢ e₁ : bool Γ ⊢ e₂ : bool
|
||
───────────────────────────────────
|
||
Γ ⊢ e₁ {and,or} e₂ : bool
|
||
|
||
(T-CmpNum) Γ ⊢ e₁ : T₁ Γ ⊢ e₂ : T₂
|
||
common(T₁, T₂) ∈ Numeric ∪ {bool, char}
|
||
───────────────────────────────────────────
|
||
Γ ⊢ e₁ {==,!=} e₂ : bool
|
||
|
||
(T-CmpPtr) Γ ⊢ e₁ : P₁ Γ ⊢ e₂ : P₂ ptr-common(P₁, P₂) ≠ ⊥ (§6.1.2)
|
||
───────────────────────────────────────────────────────────
|
||
Γ ⊢ e₁ {==,!=} e₂ : bool
|
||
|
||
(T-Ord) Γ ⊢ e₁ : T₁ Γ ⊢ e₂ : T₂
|
||
common(T₁, T₂) ∈ Numeric
|
||
──────────────────────────────
|
||
Γ ⊢ e₁ {<,>,<=,>=} e₂ : bool
|
||
|
||
(T-Assign) Γ ⊢ e₁ mutable-place Γ ⊢ e₁ : T
|
||
Γ ⊢ e₂ : T₂ T₂ ⊑ T
|
||
─────────────────────────────────────
|
||
Γ ⊢ e₁ = e₂ : T (§6.1.3)
|
||
|
||
(T-CompAssign) e₁ op= e₂ desugars to e₁ = e₁ op e₂; apply T-Assign (§6.1.3)
|
||
```
|
||
|
||
#### Postfix: field access
|
||
|
||
```
|
||
(T-Field) Γ ⊢ e : struct S Σ(S).f = T
|
||
─────────────────────────────────
|
||
Γ ⊢ e.f : T
|
||
```
|
||
|
||
If `f` is not a field of `S`, error E0503. If `e` does not have a struct type, error E0502.
|
||
|
||
#### Postfix: array index
|
||
|
||
```
|
||
(T-Index) Γ ⊢ e : [T; N] Γ ⊢ i : U U ∈ Unsigned
|
||
──────────────────────────────────────────────
|
||
Γ ⊢ e[i] : T
|
||
```
|
||
|
||
If `e` is not an array type, error E0600. If `i` is not an unsigned integer, error E0601.
|
||
|
||
#### Postfix: function call
|
||
|
||
Functions are looked up in the **function table** `Φ`, separate from variable bindings.
|
||
The callee of a call expression must be a bare identifier (function pointers are not yet
|
||
supported).
|
||
|
||
```
|
||
(T-FuncName) Φ(f) = (T₁, …, Tₙ) -> T_ret
|
||
──────────────────────────────────────────────────────
|
||
Γ ⊢ f : FnSig(T₁, …, Tₙ) -> T_ret (internal type; §1.5)
|
||
|
||
(T-Call) Γ ⊢ e_f : FnSig(T₁, …, Tₙ) -> T_ret
|
||
Γ ⊢ a₁ : U₁ U₁ ⊑ T₁
|
||
⋮
|
||
Γ ⊢ aₙ : Uₙ Uₙ ⊑ Tₙ
|
||
──────────────────────────────────────────────────────
|
||
Γ ⊢ e_f(a₁, …, aₙ) : T_ret
|
||
```
|
||
|
||
If the argument count does not match the parameter count, error E0205. If argument type
|
||
`Uᵢ` is not promotable to `Tᵢ`, error E0204. If the name is not in `Φ`, error E0102.
|
||
|
||
#### Struct literal
|
||
|
||
```
|
||
(T-StructLit) S ∈ dom(Σ)
|
||
Σ(S) = { f₁: T₁, …, fₙ: Tₙ } (n fields, in declaration order)
|
||
the literal supplies fields {g₁, …, gₖ} (in any order)
|
||
{g₁, …, gₖ} = {f₁, …, fₙ} (same set; field order is irrelevant)
|
||
∀ i ∈ 1..n. Γ ⊢ e_{gᵢ} : Uᵢ ∧ Uᵢ ⊑ Tᵢ
|
||
──────────────────────────────────────────────────────────────────
|
||
Γ ⊢ S { g₁: e₁, …, gₙ: eₙ } : struct S
|
||
```
|
||
|
||
The set equality `{g₁, …, gₖ} = {f₁, …, fₙ}` is strict:
|
||
- A field present in `Σ(S)` but absent from the literal → error E0500 (missing field).
|
||
- A field present in the literal but absent from `Σ(S)` → error E0501 (extra field).
|
||
- A field listed twice in the literal → error E0501 (duplicate counts as extra).
|
||
|
||
---
|
||
|
||
## 8. Statement Typing
|
||
|
||
### 8.1 Judgment form
|
||
|
||
```
|
||
Γ, A, R, L ⊢ s ✓ ⇒ Γ', A', cf
|
||
```
|
||
|
||
| Component | Meaning |
|
||
| --------- | ------------------------------------------------------------------------------------------ |
|
||
| `Γ` | Type environment (maps names to `(T, m)`); same as §3 |
|
||
| `A` | **Definitely-assigned set** — `A ⊆ dom(Γ)`; names provably initialised on all paths so far |
|
||
| `R` | Expected return type of the enclosing function |
|
||
| `L` | Loop context: `L ∈ { noloop, inloop }` |
|
||
| `Γ'` | Output type environment (may have new bindings from `let`) |
|
||
| `A'` | Output definitely-assigned set |
|
||
| `cf` | Control-flow outcome: `cf ∈ { normal, diverges }` |
|
||
|
||
`cf = diverges` means the statement never falls through to the next statement (because of `return`, `break`, or `continue`). `cf = normal` means it may fall through.
|
||
|
||
**T-Var** from §7.3 is refined to require definite assignment:
|
||
|
||
```
|
||
(T-Var) Γ(x) = (T, m) x ∈ A
|
||
──────────────────────────
|
||
Γ, A ⊢ x : T
|
||
```
|
||
|
||
If `x ∉ A` (declared but not yet definitely assigned), it is a type error (use-before-init, error E0100 extended).
|
||
|
||
### 8.2 Definite-assignment join
|
||
|
||
When two control-flow paths merge (in `if-else`), the output `A` is the **intersection**
|
||
of the two branch sets, intersected with the outer `A`:
|
||
|
||
```
|
||
join_A(A_then, A_else, A) = A ∪ (A_then ∩ A_else)
|
||
```
|
||
|
||
A variable becomes definitely assigned after the merge only if it was assigned in **both**
|
||
branches. Variables assigned in only one branch remain unassigned after the merge.
|
||
|
||
For conditional statements without an else (or `while`/`loop` with uncertain iterations),
|
||
the output `A` is conservatively `A` (the input set, unchanged):
|
||
|
||
```
|
||
join_A(A_branch, A) = A -- for single-branch paths
|
||
```
|
||
|
||
### 8.3 Unreachable code
|
||
|
||
Any statement that immediately follows a statement with `cf = diverges` is **unreachable**.
|
||
Unreachable statements are still type-checked (to catch errors in dead code) but the
|
||
compiler emits warning **W001** for the first unreachable statement in each run of dead
|
||
code.
|
||
|
||
### 8.4 Loop context
|
||
|
||
`L` is a single flag that threads through every statement rule unchanged, **except** that
|
||
`while` and `loop` bodies are checked with `L = inloop` regardless of the outer `L`.
|
||
|
||
```
|
||
L_body = inloop (when checking body of while/loop)
|
||
L_inner = L (for if/else/block — L passes through unchanged)
|
||
```
|
||
|
||
This makes `break` and `continue` valid anywhere inside a loop body, regardless of how
|
||
many intermediate `if`, `else`, or inner blocks separate them from the `while`/`loop`
|
||
keyword. Crossing a function boundary resets `L = noloop` at the function level (§9).
|
||
|
||
### 8.5 Statement rules
|
||
|
||
#### T-Let
|
||
|
||
```
|
||
(T-Let-Infer) Γ, A ⊢ e : T (T synthesised; no annotation)
|
||
─────────────────────────────────────────────────────────────────
|
||
Γ, A, R, L ⊢ let [mut] x = e ; ✓ ⇒ Γ, x:(T,m) A∪{x} normal
|
||
|
||
(T-Let-Ann-Init) ⊢ T wf
|
||
Γ, A ⊢ e : U U ⊑ T
|
||
-- if e is INT_LIT/FLOAT_LIT and T is numeric: U = T directly (§5.5)
|
||
─────────────────────────────────────────────────────────────────
|
||
Γ, A, R, L ⊢ let [mut] x : T = e ; ✓ ⇒ Γ, x:(T,m) A∪{x} normal
|
||
|
||
(T-Let-Ann-NoInit) ⊢ T wf
|
||
─────────────────────────────────────────────────────────────────
|
||
Γ, A, R, L ⊢ let [mut] x : T ; ✓ ⇒ Γ, x:(T,m) A normal
|
||
-- x is NOT added to A; use before assignment → E0100
|
||
```
|
||
|
||
No annotation and no initialiser → error E1000.
|
||
|
||
In all T-Let rules, `m = imm` if no `mut` keyword, `m = mut` if `mut` is present.
|
||
|
||
#### T-Return
|
||
|
||
```
|
||
(T-Return-Val) Γ, A ⊢ e : U U ⊑ R
|
||
─────────────────────────────────────────────────────
|
||
Γ, A, R, L ⊢ return e ; ✓ ⇒ Γ A diverges
|
||
|
||
(T-Return-Unit) R = ()
|
||
─────────────────────────────────────────────────────
|
||
Γ, A, R, L ⊢ return ; ✓ ⇒ Γ A diverges
|
||
```
|
||
|
||
#### T-ExprStmt
|
||
|
||
```
|
||
(T-ExprStmt) Γ, A ⊢ e : T
|
||
A' = A ∪ { x | x ∈ assigns(e) ∩ dom(Γ) }
|
||
─────────────────────────────────────────────────────
|
||
Γ, A, R, L ⊢ e ; ✓ ⇒ Γ A' normal
|
||
```
|
||
|
||
Where `assigns(e)` collects all variables directly assigned anywhere in `e`:
|
||
|
||
```
|
||
assigns(x = e') = {x} ∪ assigns(e')
|
||
assigns(x op= e') = assigns(e') -- x already def (op uses it)
|
||
assigns(f(e₁,…)) = ∪ assigns(eᵢ)
|
||
assigns(S { f₁: e₁, …, fₙ: eₙ }) = ∪ assigns(eᵢ)
|
||
assigns(e.f) = assigns(e)
|
||
assigns(e[i]) = assigns(e) ∪ assigns(i)
|
||
assigns((e)) = assigns(e)
|
||
assigns(op e) = assigns(e)
|
||
assigns(e₁ binop e₂) = assigns(e₁) ∪ assigns(e₂)
|
||
assigns(literal/ident) = ∅
|
||
```
|
||
|
||
#### T-If
|
||
|
||
```
|
||
(T-If-Then) Γ, A ⊢ cond : bool
|
||
Γ, A, R, L ⊢ { then_block } ✓ ⇒ _, A_then, _
|
||
─────────────────────────────────────────────────────────────────────
|
||
Γ, A, R, L ⊢ if cond { then_block } ✓ ⇒ Γ A normal
|
||
-- no else: A unchanged (branch may not execute)
|
||
|
||
(T-If-Else) Γ, A ⊢ cond : bool
|
||
Γ, A, R, L ⊢ { then_block } ✓ ⇒ _, A_then, cf_t
|
||
Γ, A, R, L ⊢ else_branch ✓ ⇒ _, A_else, cf_e
|
||
─────────────────────────────────────────────────────────────────────
|
||
Γ, A, R, L ⊢ if cond { then_block } else else_branch ✓
|
||
⇒ Γ join_A(A_then, A_else, A) join_cf(cf_t, cf_e)
|
||
```
|
||
|
||
`else_branch` is either an `if` statement (else-if chain) or a block `{ … }`.
|
||
|
||
```
|
||
join_cf(diverges, diverges) = diverges
|
||
join_cf(normal, _) = normal
|
||
join_cf(_, normal) = normal
|
||
```
|
||
|
||
The `Γ` output of T-If discards any bindings introduced inside the branches (scope exit,
|
||
§3.3); only the outer `Γ` is returned.
|
||
|
||
#### T-While
|
||
|
||
```
|
||
(T-While) Γ, A ⊢ cond : bool
|
||
Γ, A, R, inloop ⊢ { body } ✓ ⇒ _, _, _
|
||
─────────────────────────────────────────────────────────────────────
|
||
Γ, A, R, L ⊢ while cond { body } ✓ ⇒ Γ A normal
|
||
```
|
||
|
||
The body is checked with `L = inloop`. The output `A` is conservatively `A` (the loop
|
||
may execute zero times). `cf = normal` (the loop may terminate when the condition
|
||
becomes false).
|
||
|
||
#### T-Loop
|
||
|
||
```
|
||
(T-Loop) Γ, A, R, inloop ⊢ { body } ✓ ⇒ _, _, _
|
||
─────────────────────────────────────────────────────────────────────
|
||
Γ, A, R, L ⊢ loop { body } ✓ ⇒ Γ A normal
|
||
```
|
||
|
||
`cf = normal` conservatively (a `break` inside exits the loop). Output `A` is `A`
|
||
(the body may be bypassed by an immediate `break`).
|
||
|
||
#### T-Break / T-Continue
|
||
|
||
```
|
||
(T-Break) L = inloop
|
||
─────────────────────────────────────────
|
||
Γ, A, R, L ⊢ break ; ✓ ⇒ Γ A diverges
|
||
|
||
(T-Continue) L = inloop
|
||
─────────────────────────────────────────
|
||
Γ, A, R, L ⊢ continue ; ✓ ⇒ Γ A diverges
|
||
```
|
||
|
||
If `L = noloop`, `break` → E0800, `continue` → E0801.
|
||
|
||
#### T-Block
|
||
|
||
A block `{ s₁; s₂; …; sₙ }` is checked by chaining statements sequentially. Bindings
|
||
introduced inside are removed on exit (scope restriction, §3.3).
|
||
|
||
```
|
||
(T-Block-Empty) ───────────────────────────────────────────────────────────────
|
||
Γ, A, R, L ⊢ { } ✓ ⇒ Γ A normal
|
||
|
||
(T-Block-Step) Γ, A, R, L ⊢ s₁ ✓ ⇒ Γ₁, A₁, normal
|
||
Γ₁, A₁, R, L ⊢ { s₂; …; sₙ } ✓ ⇒ Γₙ, Aₙ, cf
|
||
───────────────────────────────────────────────────────────────
|
||
Γ, A, R, L ⊢ { s₁; s₂; …; sₙ } ✓ ⇒ restrict(Γₙ, Γ) restrict_A(Aₙ, Γ) cf
|
||
|
||
(T-Block-Dead) Γ, A, R, L ⊢ s₁ ✓ ⇒ Γ₁, A₁, diverges
|
||
-- s₂, …, sₙ are unreachable; emit W001; still type-check them
|
||
Γ₁, A₁, R, L ⊢ { s₂; …; sₙ } ✓ ⇒ _, _, _ (results discarded)
|
||
───────────────────────────────────────────────────────────────
|
||
Γ, A, R, L ⊢ { s₁; s₂; …; sₙ } ✓ ⇒ restrict(Γ₁, Γ) restrict_A(A₁, Γ) diverges
|
||
```
|
||
|
||
Where:
|
||
- `restrict(Γ', Γ)` removes all bindings in `Γ'` that are not in `dom(Γ)` (inner let bindings).
|
||
- `restrict_A(A', Γ)` removes all names from `A'` that are not in `dom(Γ)`.
|
||
|
||
---
|
||
|
||
## 9. Function Definitions
|
||
|
||
Judgement form: `Σ, Φ ⊢ func_def ✓`
|
||
|
||
The function table `Φ` is pre-populated with the signatures of **all** functions in the
|
||
program before any body is checked (first pass, §11). This gives every function access
|
||
to its own name for recursion and to any other function regardless of declaration order.
|
||
|
||
### 9.1 Building the initial environment
|
||
|
||
Given a function `fn f(p₁, …, pₙ) [-> R] { body }` where each parameter is
|
||
`[mut] xᵢ : Tᵢ`:
|
||
|
||
```
|
||
Γ_init = x₁:(T₁, m₁), …, xₙ:(Tₙ, mₙ) -- ordered; rightmost wins for lookup
|
||
A_init = { x₁, …, xₙ } -- all params are definitely assigned on entry
|
||
```
|
||
|
||
where `mᵢ = mut` if `mut` is written, `mᵢ = imm` otherwise.
|
||
|
||
All parameter types must be well-formed: `⊢ Tᵢ wf` for each `i`.
|
||
|
||
**Duplicate parameter names** are an error (E0902): all names `x₁, …, xₙ` must be
|
||
pairwise distinct. The check is performed before the environment is built.
|
||
|
||
### 9.2 Return type
|
||
|
||
The declared return type `R` is taken from the `-> T` annotation. If the annotation is
|
||
absent, `R = ()` (unit; the function returns no value).
|
||
|
||
### 9.3 All-paths-return
|
||
|
||
The function body is checked as a block statement with `L = noloop`. The resulting
|
||
control-flow outcome `cf` determines whether the return requirement is satisfied:
|
||
|
||
| Return type | Required `cf` | Otherwise |
|
||
| ----------- | ---------------------- | ----------- |
|
||
| `R = ()` | `normal` or `diverges` | — always OK |
|
||
| `R ≠ ()` | `diverges` | error E1001 |
|
||
|
||
When `R = ()` and the body falls through (`cf = normal`), the function implicitly
|
||
returns `()` at end of body — no explicit `return` is needed. An explicit `return;`
|
||
(bare, no value) is also allowed anywhere in a `()` function (T-Return-Unit, §8).
|
||
|
||
When `R ≠ ()`, every control-flow path through the body must end with an explicit
|
||
`return e;` where `e : U` and `U ⊑ R`. If the body's `cf = normal` (some path falls
|
||
through without a `return`), error E1001 is reported.
|
||
|
||
**Note on loop conservatism**: `while` and `loop` statements always output `cf = normal`
|
||
(§8.4). A function body that ends with a `loop { return e; }` and no following `return`
|
||
will fail the all-paths check (`cf = normal` from the loop). Add an explicit `return`
|
||
after the loop to satisfy the checker, or use `if-else` chains that both return.
|
||
|
||
### 9.4 Formal rule
|
||
|
||
```
|
||
(T-FuncDef) f ∈ dom(Φ) Φ(f) = (T₁, …, Tₙ) -> R -- f pre-loaded by §11 first pass
|
||
∀ i. ⊢ Tᵢ wf -- param types well-formed
|
||
all parameter names x₁, …, xₙ distinct -- else E0902
|
||
Γ_init = x₁:(T₁,m₁), …, xₙ:(Tₙ,mₙ)
|
||
A_init = { x₁, …, xₙ }
|
||
Γ_init, A_init, R, noloop ⊢ { body } ✓ ⇒ _, _, cf
|
||
R = () or cf = diverges -- else E1001
|
||
──────────────────────────────────────────────────────────────────────────
|
||
Σ, Φ ⊢ fn f(params) [-> R] { body } ✓
|
||
```
|
||
|
||
### 9.5 Recursion
|
||
|
||
Direct recursion (`f` calling `f`) and mutual recursion (`f` calling `g`, `g` calling
|
||
`f`) are both permitted. Because `Φ` is fully populated before any body is checked
|
||
(§11), `T-FuncName` (§7.3) can resolve any call to `f` inside `f`'s own body without
|
||
any special treatment in `T-FuncDef`.
|
||
|
||
---
|
||
|
||
## 10. Struct Definitions
|
||
|
||
Judgement form: `Σ ⊢ struct_def ✓`
|
||
|
||
`Σ` is pre-populated with all struct names in the program before any struct body is
|
||
checked (first pass, §11). This allows forward references and mutually recursive pointer
|
||
types.
|
||
|
||
### 10.1 Size-dependency and cycles
|
||
|
||
Define `struct-names(T)` — the set of struct names that appear **directly** (not behind
|
||
a pointer) in type `T`:
|
||
|
||
```
|
||
struct-names(struct S) = { S }
|
||
struct-names([T; N]) = struct-names(T) -- array has same direct dependency as elem
|
||
struct-names(*T) = ∅ -- pointer indirection breaks size dependency
|
||
struct-names(*mut T) = ∅
|
||
struct-names(*opaque) = ∅
|
||
struct-names(*mut opaque) = ∅
|
||
struct-names(primitive) = ∅
|
||
struct-names(()) = ∅
|
||
```
|
||
|
||
Define the **direct-uses** relation on struct names:
|
||
|
||
```
|
||
direct-uses(S) = ⋃ { struct-names(Tᵢ) | fᵢ : Tᵢ ∈ Σ(S) }
|
||
```
|
||
|
||
Let `TC(direct-uses)` be its transitive closure. A struct `S` has an **infinite-size
|
||
cycle** if `S ∈ TC(direct-uses)(S)`. Such cycles are forbidden (error E0900).
|
||
|
||
Examples:
|
||
|
||
```
|
||
struct Node { next: *Node } -- direct-uses(Node) = ∅ → no cycle ✓
|
||
struct Bad { inner: Bad } -- direct-uses(Bad) = {Bad} → cycle E0900
|
||
struct A { b: B } -- direct-uses(A) = {B}
|
||
struct B { a: *A } -- direct-uses(B) = ∅ → A→B→ (stop); no cycle ✓
|
||
struct C { d: D } -- direct-uses(C) = {D}
|
||
struct D { c: C } -- direct-uses(D) = {C} → C→D→C: cycle E0900
|
||
```
|
||
|
||
### 10.2 Formal rule
|
||
|
||
```
|
||
(T-StructDef) S ∈ dom(Σ)
|
||
Σ(S) = { f₁: T₁, …, fₙ: Tₙ }
|
||
all field names f₁, …, fₙ pairwise distinct -- else E0901
|
||
∀ i ∈ 1..n. ⊢ Tᵢ wf -- else E0101 / type error
|
||
S ∉ TC(direct-uses)(S) -- else E0900
|
||
──────────────────────────────────────────────────────────────────────
|
||
Σ ⊢ struct S { f₁: T₁, …, fₙ: Tₙ } ✓
|
||
```
|
||
|
||
An empty struct (`n = 0`) is valid — it has no fields to check and no cycles.
|
||
|
||
---
|
||
|
||
## 11. Program Typing
|
||
|
||
Judgement form: `⊢ program ✓`
|
||
|
||
A program is a sequence of top-level definitions (structs and functions) in any order.
|
||
Checking proceeds in four passes to support mutual recursion and forward references.
|
||
|
||
### 11.1 Passes
|
||
|
||
#### Pass 1 — Collect declarations
|
||
|
||
Scan every top-level definition and populate:
|
||
- `Σ` — map from struct name to the sequence of field names/types (bodies **not** checked yet)
|
||
- `Φ` — map from function name to its signature `(T₁,…,Tₙ) -> R` (bodies **not** checked yet)
|
||
|
||
Errors detected in pass 1:
|
||
- **E0103**: two `struct` definitions with the same name.
|
||
- **E0104**: two `fn` definitions with the same name.
|
||
- **E0101** (forward): a type annotation in a signature or field references a struct name not in `Σ` — reported immediately since signatures are parsed in full during pass 1.
|
||
|
||
#### Pass 2 — Check struct bodies
|
||
|
||
For each struct definition, apply T-StructDef (§10.2) under the completed `Σ`. Errors
|
||
checked: E0901 (duplicate field), ⊢ Tᵢ wf (E0101), E0900 (size cycle). The cycle check
|
||
is performed globally across all structs using `TC(direct-uses)`.
|
||
|
||
#### Pass 3 — Check function bodies
|
||
|
||
For each function definition, apply T-FuncDef (§9.4) under the completed `Σ` and `Φ`.
|
||
Errors checked: E0902 (duplicate param), E1001 (missing return), and all expression/
|
||
statement errors from §6–§9.
|
||
|
||
#### Pass 4 — Entry point (optional)
|
||
|
||
If the compiler is invoked with `--entry <name>`:
|
||
- `name ∈ dom(Φ)` — else error E0102 (undefined function for entry).
|
||
- The signature `Φ(name)` must be compatible with the entry convention (currently
|
||
no required signature; any function may serve as entry).
|
||
|
||
### 11.2 Formal rule
|
||
|
||
```
|
||
(T-Program) Σ, Φ built from pass 1 (no duplicates — E0103/E0104)
|
||
∀ struct_def in program. Σ ⊢ struct_def ✓ -- pass 2
|
||
∀ func_def in program. Σ, Φ ⊢ func_def ✓ -- pass 3
|
||
entry ∈ dom(Φ) (if --entry specified) -- pass 4
|
||
──────────────────────────────────────────────────────────────────────
|
||
⊢ program ✓
|
||
```
|
||
|
||
---
|
||
|
||
## 12. Error Catalogue
|
||
|
||
Each error has a code, a category, a message template (using `{…}` for placeholders),
|
||
and the spans to highlight in the diagnostic. Warnings use the `W` prefix.
|
||
|
||
### 12.1 Name errors (E01xx)
|
||
|
||
| Code | Category | Message template | Span(s) |
|
||
| ----- | ----------------------- | --------------------------------------------- | ----------------- |
|
||
| E0100 | Undefined variable | `cannot find value '{name}' in this scope` | use site |
|
||
| E0101 | Undefined struct type | `cannot find type '{name}' in this scope` | type annotation |
|
||
| E0102 | Undefined function | `cannot find function '{name}' in this scope` | call site |
|
||
| E0103 | Duplicate struct name | `struct '{name}' is defined more than once` | second definition |
|
||
| E0104 | Duplicate function name | `function '{name}' is defined more than once` | second definition |
|
||
|
||
*Note*: use of a declared-but-uninitialized variable (§8.1) is reported as E0100 with
|
||
the message `use of possibly-uninitialized variable '{name}'`.
|
||
|
||
### 12.2 Type mismatch errors (E02xx)
|
||
|
||
| Code | Category | Message template | Span(s) |
|
||
| ----- | ------------------------ | ----------------------------------------------------------------- | ---------------- |
|
||
| E0200 | Operator type mismatch | `operator '{op}' cannot be applied to types '{T}' and '{U}'` | operator |
|
||
| E0201 | Assignment type mismatch | `cannot assign value of type '{U}' to binding of type '{T}'` | initialiser expr |
|
||
| E0202 | Condition not bool | `condition must be of type 'bool', found '{T}'` | condition expr |
|
||
| E0203 | Return type mismatch | `cannot return value of type '{U}' from function returning '{R}'` | return expr |
|
||
| E0204 | Argument type mismatch | `argument {i} has type '{U}', expected '{T}'` | argument expr |
|
||
| E0205 | Wrong argument count | `function '{f}' expects {n} argument(s) but {k} were supplied` | call expression |
|
||
|
||
### 12.3 Mutability errors (E03xx)
|
||
|
||
| Code | Category | Message template | Span(s) |
|
||
| ----- | ------------------- | --------------------------------------------------------------- | ----------------- |
|
||
| E0300 | Assign to immutable | `cannot assign to '{name}' because it is not declared as 'mut'` | LHS, binding site |
|
||
| E0301 | Assign to non-place | `left-hand side of assignment is not a valid place expression` | LHS expr |
|
||
|
||
### 12.4 Operator errors (E04xx)
|
||
|
||
| Code | Category | Message template | Span(s) |
|
||
| ----- | ------------------------- | -------------------------------------------------------------------------- | ------------ |
|
||
| E0400 | Operator type mismatch | `operator '{op}' requires compatible numeric types, found '{T}' and '{U}'` | operator |
|
||
| E0401 | Shift amount not unsigned | `shift amount must be an unsigned integer type, found '{U}'` | RHS of shift |
|
||
|
||
### 12.5 Struct literal errors (E05xx)
|
||
|
||
| Code | Category | Message template | Span(s) |
|
||
| ----- | -------------------- | ----------------------------------------------------- | --------------------- |
|
||
| E0500 | Missing struct field | `missing field '{f}' in initialiser for struct '{S}'` | struct literal |
|
||
| E0501 | Extra struct field | `struct '{S}' has no field named '{f}'` | field name in literal |
|
||
| E0502 | Field on non-struct | `type '{T}' has no fields` | field access expr |
|
||
| E0503 | No such field | `struct '{S}' has no field named '{f}'` | field name in access |
|
||
|
||
### 12.6 Index errors (E06xx)
|
||
|
||
| Code | Category | Message template | Span(s) |
|
||
| ----- | --------------------------- | ----------------------------------------------------------- | ---------- |
|
||
| E0600 | Index on non-array | `type '{T}' cannot be indexed` | index expr |
|
||
| E0601 | Index type not unsigned int | `array index must be an unsigned integer type, found '{U}'` | index expr |
|
||
|
||
### 12.7 Pointer errors (E07xx)
|
||
|
||
| Code | Category | Message template | Span(s) |
|
||
| ----- | ---------------- | ---------------------------------------------- | ----------- |
|
||
| E0700 | Deref of non-ptr | `type '{T}' cannot be dereferenced` | deref expr |
|
||
| E0701 | Addrof non-place | `cannot take the address of a temporary value` | addrof expr |
|
||
|
||
### 12.8 Control-flow errors (E08xx)
|
||
|
||
| Code | Category | Message template | Span(s) |
|
||
| ----- | ----------------------- | ----------------------------------- | ------------------ |
|
||
| E0800 | `break` outside loop | `'break' used outside of a loop` | `break` keyword |
|
||
| E0801 | `continue` outside loop | `'continue' used outside of a loop` | `continue` keyword |
|
||
|
||
### 12.9 Struct definition errors (E09xx)
|
||
|
||
| Code | Category | Message template | Span(s) |
|
||
| ----- | --------------------------------- | ------------------------------------------------------------------ | ---------------- |
|
||
| E0900 | Recursive struct (no indirection) | `struct '{S}' has infinite size due to recursive field '{f}: {T}'` | field type |
|
||
| E0901 | Duplicate struct field | `field '{f}' is defined more than once in struct '{S}'` | second field |
|
||
| E0902 | Duplicate function parameter | `parameter '{x}' is defined more than once in function '{f}'` | second parameter |
|
||
|
||
### 12.10 Binding errors (E10xx)
|
||
|
||
| Code | Category | Message template | Span(s) |
|
||
| ----- | ------------------- | ------------------------------------------------------------------- | ------------- |
|
||
| E1000 | No type for binding | `cannot infer type for '{x}': no annotation and no initialiser` | `let` keyword |
|
||
| E1001 | Missing return | `function '{f}' must return '{R}' but not all paths return a value` | function name |
|
||
|
||
### 12.11 Warnings (Wxxx)
|
||
|
||
| Code | Category | Message template | Span(s) |
|
||
| ---- | ---------------- | ----------------------- | --------------------------- |
|
||
| W001 | Unreachable code | `unreachable statement` | first unreachable statement |
|