Initial Flux language specification

Add the LL(1) context-free grammar (GRAMMAR.ebnf), token and syntax
reference (SYNTAX.md), LL(1) verification tool (ll1_check.py), and a
fibonacci example demonstrating the language.
2026-03-10 14:41:54 +01:00
commit 73e36fac71
4 changed files with 1607 additions and 0 deletions

SYNTAX.md

# Flux Language Syntax Reference
## Lexical Tokens
All tokens listed here are produced by the lexer (lexical analysis phase) and
appear as UPPERCASE terminals in `GRAMMAR.ebnf`.
### Literals
| Token | Description | Examples |
| ------------ | ------------------------------------------------------------------- | ------------------------------ |
| `INT_LIT` | Integer literal (decimal, hex `0x`, octal `0o`, binary `0b`) | `42`, `0xFF`, `0o77`, `0b1010` |
| `FLOAT_LIT` | Floating-point literal | `3.14`, `1.0e-9`, `0.5` |
| `STRING_LIT` | Double-quoted UTF-8 string, supports `\n \t \\ \"` escape sequences | `"hello\nworld"` |
| `CHAR_LIT` | Single-quoted Unicode scalar value | `'a'`, `'\n'`, `'\u{1F600}'` |
| `TRUE` | Boolean true literal | `true` |
| `FALSE` | Boolean false literal | `false` |
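To make the `INT_LIT` bases concrete, here is a minimal Python sketch of how a lexer could turn an integer lexeme into its value. It assumes lowercase prefixes only and no digit separators such as `_`; neither detail is stated above, and the function name is hypothetical.

```python
# Hypothetical helper, not the Flux lexer.
# Assumption: prefixes are lowercase only and there are no digit separators.
_PREFIXES = {"0x": 16, "0o": 8, "0b": 2}

def int_lit_value(lexeme: str) -> int:
    """Convert an INT_LIT lexeme such as '0xFF' or '42' to its integer value."""
    base = _PREFIXES.get(lexeme[:2], 10)
    digits = lexeme[2:] if base != 10 else lexeme
    # Reject empty digit strings, signs, spaces, underscores, and non-ASCII digits.
    if not digits or not all(c.isascii() and c.isalnum() for c in digits):
        raise ValueError(f"malformed integer literal: {lexeme!r}")
    # int() validates each digit against the base (e.g. '0b102' is rejected).
    return int(digits, base)
```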
### Identifier
| Token | Description |
| ------- | ------------------------------------------------------------------------------------------------------------ |
| `IDENT` | Identifier: starts with a letter or `_`, followed by letters, digits, or `_`. Unicode letters are permitted. |
### Operator Tokens
| Token | Lexeme | Description |
| --------- | ------ | -------------------------------------- |
| `PLUS`    | `+`    | Addition (Flux has no unary plus)      |
| `MINUS`   | `-`    | Subtraction / unary negation           |
| `STAR`    | `*`    | Multiplication / pointer dereference / pointer type prefix (`*T`) |
| `SLASH` | `/` | Division |
| `PERCENT` | `%` | Modulo (remainder) |
| `AMP` | `&` | Bitwise AND / address-of |
| `PIPE` | `\|` | Bitwise OR |
| `CARET` | `^` | Bitwise XOR |
| `BANG` | `!` | Logical NOT |
| `TILDE` | `~` | Bitwise NOT |
| `DOT` | `.` | Member access |
### Keyword Tokens
#### Operator Keywords
| Lexeme | Description |
| ------ | ----------- |
| `and` | Logical AND |
| `or` | Logical OR |
#### Boolean Literals
| Lexeme | Description |
| ------- | ------------------- |
| `true` | Boolean true value |
| `false` | Boolean false value |
#### Primitive Type Keywords
| Lexeme | Description |
| ------ | ------------------------------ |
| `u8` | Unsigned 8-bit integer |
| `u16` | Unsigned 16-bit integer |
| `u32` | Unsigned 32-bit integer |
| `u64` | Unsigned 64-bit integer |
| `i8` | Signed 8-bit integer |
| `i16` | Signed 16-bit integer |
| `i32` | Signed 32-bit integer |
| `i64` | Signed 64-bit integer |
| `f32` | 32-bit IEEE 754 floating-point |
| `f64` | 64-bit IEEE 754 floating-point |
| `bool` | Boolean (`true` or `false`) |
| `char` | Unicode scalar value (32-bit) |
#### Pointer Keyword
| Lexeme | Description |
| -------- | ------------------------------------------------------- |
| `opaque` | Used in `*opaque` to denote a pointer with no type info |
#### Statement Keywords
| Lexeme | Description |
| ---------- | ------------------------------------- |
| `let` | Introduces a variable binding |
| `mut`      | Marks a `let` binding or function parameter as mutable |
| `return` | Exits the enclosing function |
| `if` | Conditional statement |
| `else` | Alternative branch of an `if` |
| `while` | Condition-controlled loop |
| `loop` | Infinite loop |
| `break` | Exit the immediately enclosing loop |
| `continue` | Skip to the next iteration of a loop |
#### Definition Keywords
| Lexeme | Description |
| -------- | -------------------------------- |
| `fn` | Introduces a function definition |
| `struct` | Introduces a struct definition |
> **Lexer note:** All keywords above are reserved and must be recognised before
> the general `IDENT` rule. An identifier may not shadow any keyword.
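One common way to satisfy this note, sketched below in Python, is to scan the whole identifier-shaped word first and only then look it up in a keyword table. That way `letter` still lexes as an `IDENT` even though it begins with `let`. The function names are hypothetical, and this is an illustrative approach rather than the Flux lexer itself.

```python
# Reserved words collected from the keyword tables above.
KEYWORDS = frozenset({
    "and", "or", "true", "false",
    "u8", "u16", "u32", "u64", "i8", "i16", "i32", "i64",
    "f32", "f64", "bool", "char", "opaque",
    "let", "mut", "return", "if", "else", "while", "loop", "break", "continue",
    "fn", "struct",
})

def is_ident_start(ch: str) -> bool:
    # A letter (Unicode letters allowed) or underscore.
    return ch.isalpha() or ch == "_"

def is_ident_continue(ch: str) -> bool:
    # Letters, digits, or underscore.
    return ch.isalpha() or ch.isdigit() or ch == "_"

def classify_word(text: str, pos: int) -> tuple:
    """Hypothetical helper: scan one word at pos, return (kind, lexeme, next_pos).

    kind is the keyword lexeme itself for a reserved word, otherwise "IDENT".
    """
    if not is_ident_start(text[pos]):
        raise ValueError(f"not an identifier start: {text[pos]!r}")
    end = pos + 1
    while end < len(text) and is_ident_continue(text[end]):
        end += 1
    word = text[pos:end]
    # The whole word is matched first, then checked against the keyword table.
    return (word if word in KEYWORDS else "IDENT", word, end)
```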
### Delimiter / Punctuation Tokens
| Token | Lexeme | Description |
| ----------- | ------ | ------------------------------------------------------ |
| `LPAREN` | `(` | Left parenthesis |
| `RPAREN` | `)` | Right parenthesis |
| `LBRACKET` | `[` | Left square bracket |
| `RBRACKET` | `]` | Right square bracket |
| `COMMA` | `,` | Argument / element separator |
| `SEMICOLON` | `;` | Statement terminator / array size separator (`[T; N]`) |
| `LCURLY`    | `{`    | Opens a block, struct definition body, or struct literal |
| `RCURLY`    | `}`    | Closes a block, struct definition body, or struct literal |
| `ARROW` | `->` | Function return type separator |
| `COLON` | `:` | Type annotation separator |
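`ARROW` is the only multi-character lexeme in these tables, and it shares its first character with `MINUS`. A lexer therefore has to try the two-character match before the one-character one (maximal munch). The Python sketch below shows that ordering for punctuation only; identifiers, literals, and any tokens not listed above are out of scope, and the function name is hypothetical.

```python
# Single-character operator and delimiter tokens from the tables above.
SINGLE = {
    "+": "PLUS", "-": "MINUS", "*": "STAR", "/": "SLASH", "%": "PERCENT",
    "&": "AMP", "|": "PIPE", "^": "CARET", "!": "BANG", "~": "TILDE",
    ".": "DOT", "(": "LPAREN", ")": "RPAREN", "[": "LBRACKET",
    "]": "RBRACKET", ",": "COMMA", ";": "SEMICOLON", "{": "LCURLY",
    "}": "RCURLY", ":": "COLON",
}
# Multi-character tokens; tried first so the longest match wins.
MULTI = {"->": "ARROW"}

def scan_punct(text: str) -> list:
    """Hypothetical helper: tokenise a string of punctuation and whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        two = text[pos:pos + 2]
        if two in MULTI:
            tokens.append(MULTI[two])
            pos += 2
        elif text[pos] in SINGLE:
            tokens.append(SINGLE[text[pos]])
            pos += 1
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
    return tokens
```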
---
## Expressions
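Expressions produce a value. The grammar defines them through a hierarchy of
precedence levels. In the table below, level 1 binds least tightly and each
higher level binds more tightly, so operators further down the table take
precedence over those above them.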
### Operator Precedence Table
| Level | Operators | Associativity | Description |
| ----- | --------------------------- | -------------- | -------------------------------- |
| 1 | `or` | left | Logical OR (lowest) |
| 2 | `and` | left | Logical AND |
| 3 | `\|` | left | Bitwise OR |
| 4 | `^` | left | Bitwise XOR |
| 5 | `&` | left | Bitwise AND |
| 6 | `+` `-` | left | Addition, subtraction |
| 7 | `*` `/` `%` | left | Multiplication, division, modulo |
| 8 | `!` `~` `-` `*` `&` | right (unary) | Prefix unary operators |
| 9 | `.` `[…]` `(…)` | left (postfix) | Member access, index, call |
| 10 | literals, identifiers, `()` | — | Primary expressions (highest) |
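The binary rows of this table (levels 1 to 7) are enough to drive precedence climbing, a standard technique for parsing binary operators. The grammar itself uses one non-terminal per level; precedence climbing is swapped in here as a compact equivalent for left-associative binary operators. The Python sketch below fully parenthesises a flat token list so the resulting grouping is visible; unary and postfix operators are deliberately left out, and the function name is hypothetical.

```python
# Binary operator levels from the precedence table (higher binds tighter).
# Every binary operator in the table is left-associative.
BINARY_LEVEL = {
    "or": 1, "and": 2, "|": 3, "^": 4, "&": 5,
    "+": 6, "-": 6, "*": 7, "/": 7, "%": 7,
}

def parenthesise(tokens: list) -> str:
    """Fully parenthesise a flat list of operands and binary operators.

    Operands are any strings that are not binary operators. Unary and
    postfix operators are not handled in this sketch.
    """
    pos = 0

    def parse(min_level: int) -> str:
        nonlocal pos
        left = tokens[pos]
        pos += 1
        # Operands map to level 0, which is below every min_level, so they stop the loop.
        while pos < len(tokens) and BINARY_LEVEL.get(tokens[pos], 0) >= min_level:
            op = tokens[pos]
            pos += 1
            # Left associativity: the right operand may only contain
            # operators that bind strictly tighter than op.
            right = parse(BINARY_LEVEL[op] + 1)
            left = f"({left} {op} {right})"
        return left

    return parse(1)
```

For example, `parenthesise("a - b - c".split())` returns `"((a - b) - c)"`, reflecting left associativity.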
### Operator Descriptions
#### Binary Operators
| Operator | Name | Example | Notes |
| -------- | -------------- | --------- | -------------------------------------------- |
| `or` | Logical OR | `a or b` | Short-circuits; both operands must be `bool` |
| `and` | Logical AND | `a and b` | Short-circuits; both operands must be `bool` |
| `\|` | Bitwise OR | `a \| b` | Integer types |
| `^` | Bitwise XOR | `a ^ b` | Integer types |
| `&` | Bitwise AND | `a & b` | Integer types (binary context) |
| `+` | Addition | `a + b` | |
| `-` | Subtraction | `a - b` | |
| `*` | Multiplication | `a * b` | Binary context (both operands are values) |
| `/` | Division | `a / b` | Integer division truncates toward zero |
| `%` | Modulo | `a % b` | Sign follows the dividend |
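The last two notes differ from Python, where `//` rounds toward negative infinity and `%` takes the sign of the divisor. The Python sketch below models the Flux rules for integers so the two can be compared. Overflow and division by zero are not specified in this section and are not modelled; a zero divisor simply raises Python's `ZeroDivisionError`.

```python
def flux_div(a: int, b: int) -> int:
    """Integer division truncating toward zero, as described for Flux `/`."""
    q = abs(a) // abs(b)  # raises ZeroDivisionError when b == 0
    return q if (a >= 0) == (b >= 0) else -q

def flux_mod(a: int, b: int) -> int:
    """Remainder whose sign follows the dividend, as described for Flux `%`."""
    return a - b * flux_div(a, b)
```

With these definitions `flux_div(-7, 2)` is `-3` and `flux_mod(-7, 2)` is `-1`, whereas Python's own `-7 // 2` is `-4` and `-7 % 2` is `1`.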
#### Unary Prefix Operators
| Operator | Name | Example | Notes |
| -------- | ----------- | ------- | ------------------------------------------------ |
| `!` | Logical NOT | `!cond` | Operand must be `bool` |
| `~` | Bitwise NOT | `~mask` | Bitwise complement; integer types |
| `-` | Negation | `-x` | Arithmetic negation |
| `*` | Dereference | `*ptr` | Unary context; operand must be a pointer type |
| `&` | Address-of | `&x` | Unary context; produces a pointer to the operand |
#### Postfix Operators
| Operator | Name | Example | Notes |
| -------- | ------------- | ----------- | ------------------------------------------------- |
| `.` | Member access | `obj.field` | Accesses a named field or method of a struct/type |
| `[…]`    | Subscript     | `arr[i]`    | Indexes into an array                             |
| `(…)`    | Call          | `f(a, b)`   | Invokes a function                                |
> **Disambiguation:** `*` and `&` are context-sensitive.
> When appearing as the first token of a `unary_expr` they are **unary**
> (dereference / address-of). When appearing between two `unary_expr`
> sub-trees inside `multiplicative_expr` or `bitand_expr` they are **binary**
> (multiplication / bitwise AND). The parser resolves this purely from
> grammatical position — no look-ahead beyond 1 token is required.
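The Python sketch below illustrates the positional rule with a cut-down recursive-descent parser covering only `bitand_expr`, `multiplicative_expr`, and `unary_expr`. Tokens are whitespace-separated and operands are bare identifiers; the AST tags (`deref`, `addr`, and so on) are hypothetical. The same `*` character is treated as a prefix operator when a `unary_expr` starts and as multiplication when it follows a complete operand.

```python
# Prefix operators from the unary table, mapped to hypothetical AST tags.
UNARY = {"*": "deref", "&": "addr", "-": "neg", "!": "not", "~": "bitnot"}

class Parser:
    """Recursive descent over bitand_expr > multiplicative_expr > unary_expr only."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def bitand_expr(self):
        left = self.multiplicative_expr()
        # `&` seen after a complete operand: binary bitwise AND.
        while self.peek() == "&":
            self.advance()
            left = ("&", left, self.multiplicative_expr())
        return left

    def multiplicative_expr(self):
        left = self.unary_expr()
        # `*` seen after a complete operand: binary multiplication.
        while self.peek() in ("*", "/", "%"):
            op = self.advance()
            left = (op, left, self.unary_expr())
        return left

    def unary_expr(self):
        tok = self.advance()
        # `*` or `&` at the start of unary_expr: prefix operator.
        if tok in UNARY:
            return (UNARY[tok], self.unary_expr())
        if tok is None or tok in ("/", "%"):
            raise SyntaxError(f"expected an operand, found {tok!r}")
        return tok  # identifiers only, in this reduced sketch

def parse(src: str):
    """Parse whitespace-separated tokens and reject trailing input."""
    parser = Parser(src.split())
    tree = parser.bitand_expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

`parse("a * * p")` yields `("*", "a", ("deref", "p"))`: the first `*` follows the operand `a` and is binary, while the second starts a new `unary_expr` and is a dereference.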
### Parenthesised Expressions
Any expression may be wrapped in parentheses to override default precedence:
```
(a + b) * c
```
### Function Call Argument List
Arguments are comma-separated expressions. A trailing comma is **not**
permitted at this grammar level.
```
f()
f(x)
f(x, y, z)
```
### Examples
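```flux
// Arithmetic: parses as (a + (b * c)) - (d % 2)
a + b * c - d % 2
// Bitwise: parses as (flags & MASK) | (extra ^ toggle)
flags & MASK | extra ^ toggle
// Logical: parses as (ready and not_done) or fallback
ready and not_done or fallback
// Mixed unary / postfix: postfix binds tighter than prefix
*ptr.field      // *(ptr.field), not (*ptr).field
&arr[i]         // &(arr[i])
!cond
// Chained postfix
obj.method(arg1, arg2)[0].name
// Explicit precedence override
(a or b) and c
```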
---
## Types
Types describe the shape and interpretation of values. All type positions in
the grammar reference the `type` non-terminal.
### Primitive Types
Primitive types are single-keyword types built into the language.
| Type | Kind | Width | Range / Notes |
| ------ | ---------------- | ------ | ------------------------------------------ |
| `u8` | Unsigned integer | 8-bit | 0 … 255 |
| `u16` | Unsigned integer | 16-bit | 0 … 65 535 |
| `u32` | Unsigned integer | 32-bit | 0 … 4 294 967 295 |
| `u64`  | Unsigned integer | 64-bit | 0 … 2⁶⁴ − 1                                |
| `i8`   | Signed integer   | 8-bit  | −128 … 127                                 |
| `i16`  | Signed integer   | 16-bit | −32 768 … 32 767                           |
| `i32`  | Signed integer   | 32-bit | −2 147 483 648 … 2 147 483 647             |
| `i64`  | Signed integer   | 64-bit | −2⁶³ … 2⁶³ − 1                             |
| `f32` | Floating-point | 32-bit | IEEE 754 single precision |
| `f64` | Floating-point | 64-bit | IEEE 754 double precision |
| `bool` | Boolean | 1 byte | `true` or `false` |
| `char` | Unicode scalar | 32-bit | Any Unicode scalar value (not a surrogate) |
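The integer rows follow directly from the bit width: an unsigned N-bit type covers 0 to 2^N − 1, and a signed N-bit type covers −2^(N−1) to 2^(N−1) − 1, the two's-complement range (the representation is implied by these ranges rather than stated). A short Python sketch that recomputes them; the helper names are hypothetical:

```python
# Integer primitive types from the table: name -> (bit width, signed?)
INT_TYPES = {
    "u8": (8, False), "u16": (16, False), "u32": (32, False), "u64": (64, False),
    "i8": (8, True), "i16": (16, True), "i32": (32, True), "i64": (64, True),
}

def int_range(type_name: str) -> tuple:
    """Return the inclusive (min, max) range of an integer primitive type."""
    bits, signed = INT_TYPES[type_name]
    if signed:
        # Two's complement: one more negative value than positive values.
        return (-(1 << (bits - 1)), (1 << (bits - 1)) - 1)
    return (0, (1 << bits) - 1)

def fits(value: int, type_name: str) -> bool:
    """True if value is representable in the given integer type."""
    lo, hi = int_range(type_name)
    return lo <= value <= hi
```

A checker could use a test like `fits` to flag an out-of-range initialiser; whether Flux reports such cases is not specified in this section.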
### Named Types
A named type is any user-defined type referenced by its identifier — typically a struct name. Because all primitive-type keywords (`u8`, `bool`, etc.) are reserved, an `IDENT` in type position is always a named type, never a primitive.
```flux
Point // struct Point { x: f32, y: f32 }
Node // struct Node { value: i64, next: *Node }
*Point // pointer to a named type
[Node; 8] // array of a named type
```
### Pointer Types
A pointer type is written with a leading `*`.
| Syntax | Description |
| --------- | ------------------------------------------------------------------------------------- |
| `*T` | Typed pointer — points to a value of type `T` |
| `*opaque` | Opaque pointer — no compile-time pointee type information; equivalent to C's `void *` |
Pointer types may be nested: `**u8` is a pointer to a pointer to `u8`.
```flux
*u8 // pointer to u8
**i32 // pointer to pointer to i32
*opaque // untyped pointer
**opaque // pointer to untyped pointer
```
### Array Types
Arrays have a fixed size known at compile time.
```
[ <element-type> ; <size> ]
```
`<size>` must be a non-negative integer literal (`INT_LIT`). The element type
may itself be any `type`, including pointers or nested arrays.
```flux
[u8; 256] // array of 256 u8 values
[*u8; 4] // array of 4 pointers to u8
[[f32; 3]; 3] // 3×3 matrix of f32 (array of arrays)
[*opaque; 8] // array of 8 opaque pointers
```
### Type Grammar Summary
```ebnf
type = primitive_type | named_type | pointer_type | array_type ;
primitive_type = "u8" | "u16" | "u32" | "u64"
| "i8" | "i16" | "i32" | "i64"
| "f32" | "f64" | "bool" | "char" ;
named_type = IDENT ;
pointer_type = "*" , ( "opaque" | type ) ;
array_type = "[" , type , ";" , INT_LIT , "]" ;
```
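Because each alternative of `type` starts with a distinct token (`*`, `[`, a primitive keyword, or an `IDENT`), one token of look-ahead is enough to choose between them. The Python sketch below is a recursive-descent parser for exactly these rules, returning nested tuples. It is an illustration only: the tuple shapes and function names are hypothetical, and it accepts only decimal array sizes.

```python
PRIMITIVES = {"u8", "u16", "u32", "u64", "i8", "i16", "i32", "i64",
              "f32", "f64", "bool", "char"}

def tokenize_type(src: str) -> list:
    """Split a type expression into '*', '[', ']', ';' and word tokens."""
    for ch in "*[];":
        src = src.replace(ch, f" {ch} ")
    return src.split()

def parse_type(tokens: list, pos: int = 0):
    """Parse one `type` at tokens[pos]; return (tree, next_pos).

    The single token at tokens[pos] selects the alternative (LL(1)).
    """
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of type")
    tok = tokens[pos]
    if tok == "*":                                  # pointer_type
        if pos + 1 < len(tokens) and tokens[pos + 1] == "opaque":
            return ("ptr", "opaque"), pos + 2
        inner, pos = parse_type(tokens, pos + 1)
        return ("ptr", inner), pos
    if tok == "[":                                  # array_type
        elem, pos = parse_type(tokens, pos + 1)
        tail = tokens[pos:pos + 3]
        # Sketch limitation: only decimal sizes are accepted.
        if len(tail) != 3 or tail[0] != ";" or not tail[1].isdecimal() or tail[2] != "]":
            raise SyntaxError("expected '; <size> ]' after the element type")
        return ("array", elem, int(tail[1])), pos + 3
    if tok in PRIMITIVES:                           # primitive_type
        return tok, pos + 1
    if tok == "opaque" or not (tok[0].isalpha() or tok[0] == "_"):
        raise SyntaxError(f"unexpected {tok!r} in type position")
    return ("named", tok), pos + 1                  # named_type (IDENT)

def parse_type_str(src: str):
    """Parse a complete type string, rejecting trailing tokens."""
    tokens = tokenize_type(src)
    tree, pos = parse_type(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"trailing tokens: {tokens[pos:]}")
    return tree
```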
---
## Struct Literals
A struct literal constructs a value of a named struct type by providing values for each field.
```
<TypeName> { <field>: <expr>, ... }
```
Fields may appear in any order and need not match the declaration order. No trailing comma is permitted.
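Checking a literal against its struct then amounts to comparing two collections of field names while ignoring order. The Python sketch below is one way a type checker might do it. Treating unknown or repeated field names as errors is an assumption, since this section only requires a value for each field; the function name is hypothetical.

```python
from collections import Counter

def check_struct_literal(declared: list, supplied: list) -> list:
    """Compare a literal's field names with the struct's declared fields.

    Returns a list of error messages; an empty list means the literal is
    accepted. Field order is ignored.
    """
    counts = Counter(supplied)
    errors = [f"missing field '{name}'" for name in declared if name not in counts]
    # Counter keeps first-seen order, so messages follow the literal.
    errors += [f"unknown field '{name}'" for name in counts if name not in declared]
    # Assumption: giving the same field twice is an error.
    errors += [f"field '{name}' given {n} times" for name, n in counts.items() if n > 1]
    return errors
```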
### Examples
```flux
let p = Point { x: 1.0, y: 2.0 };
let n = Node {
value: 42,
next: get_next()
};
// Nested struct literal
let outer = Rect {
origin: Point { x: 0.0, y: 0.0 },
size: Point { x: 10.0, y: 5.0 }
};
// Empty struct
let u = Unit {};
```
### Struct Literals in Conditions
Struct literals are **not permitted** as the outermost expression in `if` and `while` conditions. This restriction exists because `{` after the condition is ambiguous — it could start a struct literal body or the statement block.
```flux
// ERROR — ambiguous: is `{` a struct body or the if block?
if Flags { verbose: true } { ... }
// OK — parentheses resolve the ambiguity
if (Flags { verbose: true }).verbose { ... }
```
The grammar enforces this through the `expr_ns` (no-struct) hierarchy used in condition positions. Struct literals remain valid everywhere else: `let`, `return`, function arguments, field values, etc.
### Struct Literal Grammar Summary
```ebnf
primary_expr = IDENT , [ struct_lit_body ] | INT_LIT | FLOAT_LIT
| STRING_LIT | CHAR_LIT | "true" | "false"
| "(" , expr , ")" ;
struct_lit_body = "{" , struct_field_list , "}" ;
struct_field_list = [ struct_field , { "," , struct_field } ] ;
struct_field = IDENT , ":" , expr ;
```
### No-Struct Expression (`expr_ns`)
`expr_ns` is a parallel expression hierarchy identical to `expr` except its primary level (`primary_expr_ns`) does not allow the `struct_lit_body` suffix after an `IDENT`. Struct literals are still permitted when enclosed in parentheses (`"(" , expr , ")"`), because the `(` unambiguously marks the start of a grouped expression.
`if_stmt` and `while_stmt` use `expr_ns` for their condition; all other expression positions use the full `expr`.
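In a hand-written recursive-descent parser the same effect is often obtained without duplicating every precedence level: a boolean flag is passed down, cleared in condition position, and set again inside parentheses. That flag technique is swapped in here for illustration and is not necessarily how Flux is implemented. The reduced Python sketch below handles only identifiers, parentheses, and struct literal bodies with one-token field values.

```python
class Parser:
    """Reduced parser modelling the expr / expr_ns split as a boolean flag."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.advance() != tok:
            raise SyntaxError(f"expected {tok!r}")

    def primary(self, allow_struct: bool):
        tok = self.advance()
        if tok == "(":
            # Inside parentheses the full `expr` rules apply again.
            inner = self.primary(allow_struct=True)
            self.expect(")")
            return inner
        # With allow_struct=False this behaves like primary_expr_ns:
        # an IDENT is never followed by a struct_lit_body.
        if allow_struct and self.peek() == "{":
            return ("struct", tok, self.struct_body())
        return tok

    def struct_body(self):
        self.expect("{")
        fields = {}
        while self.peek() != "}":
            if fields:
                self.expect(",")  # separators only between fields, no trailing comma
            name = self.advance()
            self.expect(":")
            fields[name] = self.advance()  # one-token values in this sketch
        self.expect("}")
        return fields

    def if_condition(self):
        # if_stmt and while_stmt parse their condition with expr_ns.
        return self.primary(allow_struct=False)
```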
---
## Statements
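Statements perform an action and do not produce a value. Simple statements
(`let`, `return`, `break`, `continue`, and expression statements) end with a
semicolon `;`. Compound statements (`if`, `while`, `loop`, and blocks) end with
their closing `}` and take no semicolon.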
### Let Statement
Introduces a new named binding in the current scope.
```
let [mut] <name> [: <type>] [= <expr>] ;
```
| Part | Required | Description |
| ---------- | -------- | --------------------------------------------- |
| `mut` | no | Makes the binding mutable; omit for immutable |
| `<name>` | yes | The identifier being bound |
| `: <type>` | no | Explicit type annotation |
| `= <expr>` | no | Initialiser expression |
| `;` | yes | Statement terminator |
Bindings are **immutable by default**. Attempting to assign to a binding
declared without `mut` is a compile-time error.
At least one of the type annotation or the initialiser must be present so the
compiler can determine the binding's type. This is a semantic constraint, not a
syntactic one — the grammar permits bare `let x;` and the type checker rejects
it if no type can be inferred from context.
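The immutability rule is checked against assignments, not by the grammar. A minimal Python sketch of the bookkeeping a checker might keep for one scope; the class names are hypothetical and nested scopes are ignored:

```python
class CompileError(Exception):
    """Hypothetical diagnostic type for this sketch."""

class Bindings:
    """Mutability table for the `let` bindings of a single scope."""

    def __init__(self):
        self._mutable = {}  # binding name -> declared with `mut`?

    def declare(self, name: str, mutable: bool = False) -> None:
        # A later `let` of the same name replaces the earlier entry (shadowing).
        self._mutable[name] = mutable

    def check_assign(self, name: str) -> None:
        # Assignment is only legal to a binding declared with `mut`.
        if name not in self._mutable:
            raise CompileError(f"'{name}' is not declared")
        if not self._mutable[name]:
            raise CompileError(f"cannot assign to immutable binding '{name}'")
```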
#### Examples
```flux
// Immutable, type inferred from initialiser
let x = 42;
// Immutable, explicit type
let y: f64 = 3.14;
// Mutable, type inferred
let mut count = 0;
// Mutable, explicit type, no initialiser (must be assigned before use)
let mut buf: [u8; 128];
// Mutable binding holding a pointer to u32
let mut ptr: *u32 = &value;
// Shadowing a previous binding is allowed
let x = "hello"; // x is now a string, previous x is gone
```
### Return Statement
Exits the enclosing function immediately, optionally producing a return value.
```
return [<expr>] ;
```
`return;` (no expression) is used when the function's return type is the unit
type `()`. `return <expr>;` returns the value of the expression.
Blocks do not produce a value, so a function body has no implicit result: a
function with a declared return type yields its value through `return`, and
`return` can also be used for an early exit from any function.
```flux
return; // unit return
return 42; // return an integer
return x * 2 + 1; // return an expression
```
### Expression Statement
Evaluates an expression for its side effects; the resulting value is
discarded. A semicolon is required.
```
<expr> ;
```
```flux
do_something(x); // call for side effects
count + 1; // legal but silly — value discarded
```
### Statement Grammar Summary
```ebnf
stmt = let_stmt | return_stmt | if_stmt
| while_stmt | loop_stmt | break_stmt | continue_stmt
| block_stmt | expr_stmt ;
let_stmt = "let" , [ "mut" ] , IDENT , [ ":" , type ] , [ "=" , expr ] , ";" ;
return_stmt = "return" , [ expr ] , ";" ;
if_stmt = "if" , expr_ns , block_stmt , [ "else" , else_branch ] ;
else_branch = if_stmt | block_stmt ;
while_stmt = "while" , expr_ns , block_stmt ;
loop_stmt = "loop" , block_stmt ;
break_stmt = "break" , ";" ;
continue_stmt = "continue" , ";" ;
block_stmt = "{" , { stmt } , "}" ;
expr_stmt = expr , ";" ;
```
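Every `stmt` alternative except `expr_stmt` begins with its own keyword or `{`, and no expression begins with `{` (a struct literal starts with its `IDENT`). One token of look-ahead therefore selects the production. A Python sketch of that dispatch step, with a hypothetical function name:

```python
# First token of each statement form -> production, from the grammar above.
STMT_BY_FIRST = {
    "let": "let_stmt", "return": "return_stmt", "if": "if_stmt",
    "while": "while_stmt", "loop": "loop_stmt", "break": "break_stmt",
    "continue": "continue_stmt", "{": "block_stmt",
}

def select_stmt(first_token: str) -> str:
    """Pick the stmt alternative from one token of look-ahead.

    Any other token is taken to start an expression. This relies on no
    expression beginning with `{`.
    """
    return STMT_BY_FIRST.get(first_token, "expr_stmt")
```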
---
## If Statement
Conditionally executes a block based on a boolean expression.
```
if <cond> <block> [else <else-branch>]
```
The condition `<cond>` must be an expression of type `bool`. The body is
always a `block_stmt` — braces are mandatory.
### Else Branch
The optional `else` branch is either a plain block or another `if` statement,
enabling `else if` chains of arbitrary length.
```flux
if x > 0 {
pos();
}
if x > 0 {
pos();
} else {
non_pos();
}
if x > 0 {
pos();
} else if x < 0 {
neg();
} else {
zero();
}
```
### If Statement Grammar Summary
```ebnf
if_stmt = "if" , expr_ns , block_stmt , [ "else" , else_branch ] ;
else_branch = if_stmt | block_stmt ;
```
---
## While Loop
Repeatedly executes a block as long as a boolean condition holds. The
condition is tested before each iteration; if it is false on entry, the body
never runs.
```
while <cond> <block>
```
```flux
let mut i = 0;
while i < 10 {
process(i);
i = i + 1;
}
```
### While Loop Grammar Summary
```ebnf
while_stmt = "while" , expr_ns , block_stmt ;
```
---
## Loop
Executes a block unconditionally and indefinitely. The loop runs until a
`break` or `return` inside the body transfers control out.
```
loop <block>
```
```flux
loop {
let msg = recv();
if msg.is_quit() {
break;
}
handle(msg);
}
```
### Loop Grammar Summary
```ebnf
loop_stmt = "loop" , block_stmt ;
```
---
## Break and Continue
`break` and `continue` are only valid inside the body of a `while` or `loop`.
The compiler enforces this as a semantic rule.
| Statement | Effect |
| ------------ | -------------------------------------------------------------- |
| `break ;`    | Exits the innermost enclosing loop immediately                 |
| `continue ;` | Skips the rest of the current iteration; jumps to the next one |
For `while`, `continue` jumps back to the condition check. For `loop`,
`continue` jumps back to the top of the body.
```flux
let mut i = 0;
while i < 20 {
i = i + 1;
if i % 2 == 0 {
continue; // skip even numbers
}
if i > 15 {
break; // stop after 15
}
process(i);
}
```
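Python's `while`, `break`, and `continue` behave as described above (`continue` in a `while` loop returns to the condition check), so the example can be transcribed directly to see which values reach `process`. In this sketch `process(i)` is replaced by appending to a list, and `%` behaves identically in both languages because `i` is never negative:

```python
def processed_values() -> list:
    """Python transcription of the Flux break/continue example above."""
    seen = []
    i = 0
    while i < 20:
        i = i + 1
        if i % 2 == 0:
            continue  # skip even numbers
        if i > 15:
            break  # stop after 15
        seen.append(i)  # stands in for process(i)
    return seen

print(processed_values())
```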
### Break / Continue Grammar Summary
```ebnf
break_stmt = "break" , ";" ;
continue_stmt = "continue" , ";" ;
```
---
## Block Statement
A block groups zero or more statements into a single statement and introduces
a new lexical scope. Blocks do not produce a value.
```
{ <stmt>* }
```
### Scoping
Bindings declared inside a block are not visible outside it. A binding in an
inner scope may shadow a name from an outer scope without affecting it.
```flux
let x = 1;
{
let x = 2; // shadows outer x inside this block only
f(x); // uses 2
}
// x is still 1 here
```
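Compilers commonly model this with a chain of scope tables: each block pushes a fresh table, declarations go into the innermost one, and lookups walk outward. A minimal Python sketch with a hypothetical `Scope` class that stores plain values instead of type information, replaying the example above:

```python
class Scope:
    """One lexical scope; lookups fall back to the enclosing scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.names = {}

    def declare(self, name, value):
        # Writes only to this scope, so an outer binding is never modified.
        self.names[name] = value

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.names:
                return scope.names[name]
            scope = scope.parent
        raise NameError(f"'{name}' is not in scope")


# Replaying the example: outer x = 1, the inner block shadows it with 2.
outer = Scope()
outer.declare("x", 1)
inner = Scope(parent=outer)
inner.declare("x", 2)
print(inner.lookup("x"))  # value seen inside the block
print(outer.lookup("x"))  # value seen after the block ends
```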
### Nesting
Blocks may be nested freely to any depth.
```flux
{
let a = compute_a();
{
let b = compute_b();
use(a, b);
}
// b is no longer in scope here
}
```
### Block Grammar Summary
```ebnf
block_stmt = "{" , { stmt } , "}" ;
```
---
## Top-Level Definitions
A Flux source file is a sequence of top-level definitions.
```ebnf
program = { top_level_def } ;
top_level_def = func_def | struct_def ;
```
The leading token unambiguously selects the definition kind: `fn` → function,
`struct` → struct.
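The claim can be checked mechanically: compute the FIRST set of each alternative and confirm that alternatives of the same rule share no starting token. The commit lists an LL(1) verification tool, `ll1_check.py`, for the real grammar; the Python sketch below is an independent and simplified illustration, not that tool. It reports FIRST/FIRST overlaps only and omits the FOLLOW-set check that a complete LL(1) test needs for nullable alternatives.

```python
EPS = ""  # marks an empty (epsilon) derivation

def ll1_first_conflicts(grammar: dict) -> list:
    """Report FIRST/FIRST overlaps between alternatives of each rule.

    grammar maps each non-terminal to a list of alternatives, and each
    alternative is a list of symbols. Any symbol that is not a key of
    grammar is treated as a terminal. FOLLOW sets are not checked.
    """
    first = {nt: set() for nt in grammar}

    def first_of_seq(seq):
        out = set()
        for sym in seq:
            if sym not in grammar:      # terminal: it starts the sequence
                out.add(sym)
                return out
            out |= first[sym] - {EPS}
            if EPS not in first[sym]:   # sym cannot vanish, so stop here
                return out
        out.add(EPS)                    # every symbol can derive empty
        return out

    # Grow the FIRST sets until nothing changes (fixed point).
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of_seq(alt) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True

    conflicts = []
    for nt, alts in grammar.items():
        seen = set()
        for alt in alts:
            starts = first_of_seq(alt)
            overlap = (starts & seen) - {EPS}
            if overlap:
                conflicts.append((nt, sorted(overlap)))
            seen |= starts
    return conflicts


# Fragment of the Flux grammar, truncated after the leading symbols,
# which are all that FIRST sets depend on here (none are nullable).
FRAGMENT = {
    "top_level_def": [["func_def"], ["struct_def"]],
    "func_def": [["fn", "IDENT"]],
    "struct_def": [["struct", "IDENT"]],
    "else_branch": [["if_stmt"], ["block_stmt"]],
    "if_stmt": [["if"]],
    "block_stmt": [["{", "}"]],
}
```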
---
## Function Definition
Defines a named, callable function.
```
fn <name> ( [<params>] ) [-> <return-type>] <block>
```
| Part | Required | Description |
| ------------------ | -------- | -------------------------------------------------------- |
| `<name>` | yes | The function's identifier |
| `( [<params>] )` | yes | Comma-separated parameter list, may be empty |
| `-> <return-type>` | no | Return type; omitting it means the function returns `()` |
| `<block>` | yes | Function body — a `block_stmt` |
### Parameters
Each parameter is a name with a mandatory type annotation. Parameters are
immutable by default; `mut` makes the local binding mutable within the body.
```
[mut] <name> : <type>
```
```flux
fn add(a: i32, b: i32) -> i32 {
return a + b;
}
fn greet(name: *u8) {
print(name);
}
fn increment(mut x: i32) -> i32 {
x = x + 1;
return x;
}
fn apply(f: *opaque, mut buf: [u8; 64]) -> bool {
return call(f, &buf);
}
```
### Return Type
If `->` is omitted the return type is implicitly `()` (the unit type). The
`type` rule in the Type Grammar Summary has no alternative for `()`, so the
unit return type is expressed only by omitting `->`.
```flux
fn do_work() { // returns ()
side_effect();
}
fn get_value() -> i64 { // returns i64
return 42;
}
```
### Function Definition Grammar Summary
```ebnf
func_def = "fn" , IDENT , "(" , param_list , ")" , [ "->" , type ] , block_stmt ;
param_list = [ param , { "," , param } ] ;
param = [ "mut" ] , IDENT , ":" , type ;
```
---
## Struct Definition
Defines a named product type with zero or more typed fields.
```
struct <name> {
<field>: <type>,
...
}
```
Fields are separated by commas. No trailing comma is permitted. An empty
struct (zero fields) is valid.
### Fields
Each field is a name and a type. Fields may be of any type including pointers,
arrays, and other structs. Field names must be unique within the struct.
```flux
struct Point {
x: f32,
y: f32
}
struct Node {
value: i64,
next: *Node
}
struct Buffer {
data: *u8,
len: u64,
cap: u64
}
struct Unit {}
```
### Member Access
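Fields of a struct value are accessed with the `.` operator (defined in the
expression grammar). If the value is behind a pointer, dereference it first
with `*` and wrap the dereference in parentheses: postfix `.` binds more
tightly than prefix `*`, so `*ptr.y` would parse as `*(ptr.y)`.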
```flux
let p: Point = make_point();
let x = p.x;
let ptr: *Point = get_point_ptr();
let y = (*ptr).y;
```
### Struct Definition Grammar Summary
```ebnf
struct_def = "struct" , IDENT , "{" , field_list , "}" ;
field_list = [ field , { "," , field } ] ;
field = IDENT , ":" , type ;
```