flux/SYNTAX.md
Jooris Hadeler a82b7e4633 Feat: add compound assignment and shift operators
Compound assignment: +=, -=, *=, /=, %=, &=, |=, ^=, <<=, >>=
Shift: <<, >>

Each compound assignment token parses at the same precedence as `=`
(right-associative, lowest) and produces ExprKind::CompoundAssign.
Shifts parse between additive and multiplicative precedence.
GRAMMAR.ebnf and SYNTAX.md updated accordingly.
2026-03-10 18:29:52 +01:00

# Flux Language Syntax Reference
## Lexical Tokens
All tokens listed here are produced by the lexer (lexical analysis phase) and
appear as UPPERCASE terminals in `GRAMMAR.ebnf`.
### Literals
| Token | Description | Examples |
| ------------ | ------------------------------------------------------------------- | ------------------------------ |
| `INT_LIT` | Integer literal (decimal, hex `0x`, octal `0o`, binary `0b`) | `42`, `0xFF`, `0o77`, `0b1010` |
| `FLOAT_LIT` | Floating-point literal | `3.14`, `1.0e-9`, `0.5` |
| `STRING_LIT` | Double-quoted UTF-8 string, supports `\n \t \\ \"` escape sequences | `"hello\nworld"` |
| `CHAR_LIT` | Single-quoted Unicode scalar value | `'a'`, `'\n'`, `'\u{1F600}'` |
| `TRUE` | Boolean true literal | `true` |
| `FALSE` | Boolean false literal | `false` |
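
The four `INT_LIT` notations differ only in the prefix that selects the base. A minimal Python sketch of that mapping, assuming the lexer has already isolated the lexeme; `int_lit_value` is a hypothetical helper and not part of the Flux compiler:

```python
def int_lit_value(lexeme: str) -> int:
    """Return the numeric value of an INT_LIT lexeme such as 42, 0xFF, 0o77 or 0b1010."""
    prefixes = {"0x": 16, "0o": 8, "0b": 2}
    base = prefixes.get(lexeme[:2], 10)          # no known prefix means decimal
    digits = lexeme[2:] if base != 10 else lexeme
    return int(digits, base)                     # raises ValueError on malformed digits
```

Whether Flux also accepts upper-case prefixes or digit separators is not stated here, so the sketch handles neither.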
### Identifier
| Token | Description |
| ------- | ------------------------------------------------------------------------------------------------------------ |
| `IDENT` | Identifier: starts with a letter or `_`, followed by letters, digits, or `_`. Unicode letters are permitted. |
### Operator Tokens
| Token | Lexeme | Description |
| ------------ | ------ | -------------------------------------- |
| `PLUS` | `+` | Addition (unary plus is not in the grammar) |
| `MINUS` | `-` | Subtraction / unary negation |
| `STAR` | `*` | Multiplication / pointer dereference |
| `SLASH` | `/` | Division |
| `PERCENT` | `%` | Modulo (remainder) |
| `AMP` | `&` | Bitwise AND / address-of |
| `PIPE` | `\|` | Bitwise OR |
| `CARET` | `^` | Bitwise XOR |
| `BANG` | `!` | Logical NOT |
| `TILDE` | `~` | Bitwise NOT |
| `DOT` | `.` | Member access |
| `SHL` | `<<` | Left shift |
| `SHR` | `>>` | Right shift |
| `EQ` | `=` | Assignment |
| `PLUS_EQ` | `+=` | Add-assign |
| `MINUS_EQ` | `-=` | Subtract-assign |
| `STAR_EQ` | `*=` | Multiply-assign |
| `SLASH_EQ` | `/=` | Divide-assign |
| `PERCENT_EQ` | `%=` | Modulo-assign |
| `AMP_EQ` | `&=` | Bitwise-AND-assign |
| `PIPE_EQ` | `\|=` | Bitwise-OR-assign |
| `CARET_EQ` | `^=` | Bitwise-XOR-assign |
| `SHL_EQ` | `<<=` | Left-shift-assign |
| `SHR_EQ` | `>>=` | Right-shift-assign |
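
Several of these lexemes are prefixes of longer ones (`<<` and `<<=`, `-` and `-=`). The sketch below resolves that with longest-match ("maximal munch") scanning. This is an assumption about how a Flux lexer might behave, not a description of the actual implementation; the `ARROW` entry is borrowed from the delimiter table, and `lex_operators` is a hypothetical name.

```python
# Lexeme -> token name, copied from the tables in this document.
LEXEMES = {
    "+": "PLUS", "-": "MINUS", "*": "STAR", "/": "SLASH", "%": "PERCENT",
    "&": "AMP", "|": "PIPE", "^": "CARET", "!": "BANG", "~": "TILDE",
    ".": "DOT", "<<": "SHL", ">>": "SHR", "=": "EQ",
    "+=": "PLUS_EQ", "-=": "MINUS_EQ", "*=": "STAR_EQ", "/=": "SLASH_EQ",
    "%=": "PERCENT_EQ", "&=": "AMP_EQ", "|=": "PIPE_EQ", "^=": "CARET_EQ",
    "<<=": "SHL_EQ", ">>=": "SHR_EQ", "->": "ARROW",
}

def lex_operators(src: str) -> list[str]:
    """Tokenise a string made only of operators and spaces, longest match first."""
    tokens, i = [], 0
    while i < len(src):
        if src[i].isspace():
            i += 1
            continue
        for width in (3, 2, 1):                  # try the longest candidate first
            lexeme = src[i:i + width]
            if lexeme in LEXEMES:
                tokens.append(LEXEMES[lexeme])
                i += len(lexeme)
                break
        else:
            raise ValueError(f"unexpected character {src[i]!r} at offset {i}")
    return tokens
```

With this rule `<<=` is always one `SHL_EQ` token, never `SHL` followed by `EQ`.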
### Keyword Tokens
#### Operator Keywords
| Lexeme | Description |
| ------ | ----------- |
| `and` | Logical AND |
| `or` | Logical OR |
#### Boolean Literals
| Lexeme | Description |
| ------- | ------------------- |
| `true` | Boolean true value |
| `false` | Boolean false value |
#### Primitive Type Keywords
| Lexeme | Description |
| ------ | ------------------------------ |
| `u8` | Unsigned 8-bit integer |
| `u16` | Unsigned 16-bit integer |
| `u32` | Unsigned 32-bit integer |
| `u64` | Unsigned 64-bit integer |
| `i8` | Signed 8-bit integer |
| `i16` | Signed 16-bit integer |
| `i32` | Signed 32-bit integer |
| `i64` | Signed 64-bit integer |
| `f32` | 32-bit IEEE 754 floating-point |
| `f64` | 64-bit IEEE 754 floating-point |
| `bool` | Boolean (`true` or `false`) |
| `char` | Unicode scalar value (32-bit) |
#### Pointer Keyword
| Lexeme | Description |
| -------- | ------------------------------------------------------- |
| `opaque` | Used in `*opaque` to denote a pointer with no type info |
#### Statement Keywords
| Lexeme | Description |
| ---------- | ------------------------------------- |
| `let` | Introduces a variable binding |
| `mut` | Marks a binding or pointer as mutable |
| `return` | Exits the enclosing function |
| `if` | Conditional statement |
| `else` | Alternative branch of an `if` |
| `while` | Condition-controlled loop |
| `loop` | Infinite loop |
| `break` | Exit the immediately enclosing loop |
| `continue` | Skip to the next iteration of a loop |
#### Definition Keywords
| Lexeme | Description |
| -------- | -------------------------------- |
| `fn` | Introduces a function definition |
| `struct` | Introduces a struct definition |
> **Lexer note:** All keywords above are reserved and must be recognised before
> the general `IDENT` rule. An identifier may not shadow any keyword.
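
One common way to satisfy the lexer note is to scan an identifier-shaped word first and then look it up in a keyword set. A minimal Python sketch under that assumption; `classify_word` and the `"KEYWORD"` label are illustrative names, not Flux token names:

```python
# Every reserved word listed in the keyword tables above.
KEYWORDS = frozenset({
    "and", "or", "true", "false",
    "u8", "u16", "u32", "u64", "i8", "i16", "i32", "i64",
    "f32", "f64", "bool", "char", "opaque",
    "let", "mut", "return", "if", "else", "while", "loop", "break", "continue",
    "fn", "struct",
})

def classify_word(word: str) -> str:
    """Classify an already-scanned word; keywords win over the general IDENT rule."""
    return "KEYWORD" if word in KEYWORDS else "IDENT"
```

Because the lookup uses the whole word, `u128` and `whilst` remain ordinary identifiers even though they start with reserved spellings.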
### Delimiter / Punctuation Tokens
| Token | Lexeme | Description |
| ----------- | ------ | ------------------------------------------------------ |
| `LPAREN` | `(` | Left parenthesis |
| `RPAREN` | `)` | Right parenthesis |
| `LBRACKET` | `[` | Left square bracket |
| `RBRACKET` | `]` | Right square bracket |
| `COMMA` | `,` | Argument / element separator |
| `SEMICOLON` | `;` | Statement terminator / array size separator (`[T; N]`) |
| `LCURLY` | `{` | Block / compound expression open |
| `RCURLY` | `}` | Block / compound expression close |
| `ARROW` | `->` | Function return type separator |
| `COLON` | `:` | Type annotation separator |
---
## Expressions
Expressions produce a value. The grammar defines them through a hierarchy of
precedence levels. The table below lists the levels from lowest precedence
(binds least tightly) to highest.
### Operator Precedence Table
| Level | Operators | Associativity | Description |
| ----- | -------------------------------------------------------- | -------------- | -------------------------------- |
| 1 | `=` `+=` `-=` `*=` `/=` `%=` `&=` `\|=` `^=` `<<=` `>>=` | right | Assignment (lowest) |
| 2 | `or` | left | Logical OR |
| 3 | `and` | left | Logical AND |
| 4 | `\|` | left | Bitwise OR |
| 5 | `^` | left | Bitwise XOR |
| 6 | `&` | left | Bitwise AND |
| 7 | `+` `-` | left | Addition, subtraction |
| 8 | `<<` `>>` | left | Bit shift |
| 9 | `*` `/` `%` | left | Multiplication, division, modulo |
| 10 | `!` `~` `-` `*` `&` | right (unary) | Prefix unary operators |
| 11 | `.` `[…]` `(…)` | left (postfix) | Member access, index, call |
| 12 | literals, identifiers, `()` | — | Primary expressions (highest) |
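
To see how the levels and associativities interact, the following Python sketch applies precedence climbing to levels 1 to 9 and prints the resulting grouping with explicit parentheses. It is an illustration of the table only: unary and postfix operators are left out, unknown characters are silently skipped, and `parenthesize` is a hypothetical helper rather than the Flux parser.

```python
import re

# Operator -> (level, associativity), taken from levels 1-9 of the table.
BINARY = {}
for level, assoc, ops in [
    (1, "right", "= += -= *= /= %= &= |= ^= <<= >>="),
    (2, "left", "or"), (3, "left", "and"),
    (4, "left", "|"), (5, "left", "^"), (6, "left", "&"),
    (7, "left", "+ -"), (8, "left", "<< >>"), (9, "left", "* / %"),
]:
    for op in ops.split():
        BINARY[op] = (level, assoc)

TOKEN = re.compile(r"<<=|>>=|<<|>>|[-+*/%&|^]=|[-+*/%&|^=()]|\w+")

def parenthesize(src: str) -> str:
    """Return src with every binary operation wrapped in parentheses."""
    tokens = TOKEN.findall(src)
    pos = 0

    def primary() -> str:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            inner = expr(1)
            pos += 1                             # skip the closing ")"
            return inner
        return tok

    def expr(min_level: int) -> str:
        nonlocal pos
        left = primary()
        while pos < len(tokens) and tokens[pos] in BINARY:
            op = tokens[pos]
            level, assoc = BINARY[op]
            if level < min_level:
                break
            pos += 1
            # Right-associative operators re-enter at the same level.
            right = expr(level if assoc == "right" else level + 1)
            left = f"({left} {op} {right})"
        return left

    return expr(1)
```

For example, `parenthesize("a + b << c")` yields `(a + (b << c))`, because shifts sit one level above addition in the table.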
### Operator Descriptions
#### Binary Operators
| Operator | Name | Example | Notes |
| -------- | -------------- | --------- | ---------------------------------------------- |
| `=` | Assignment | `a = b` | Right-associative; `a = b = c` parses as `a = (b = c)` |
| `+=` | Add-assign | `a += b` | Expands to `a = a + b` |
| `-=` | Sub-assign | `a -= b` | Expands to `a = a - b` |
| `*=` | Mul-assign | `a *= b` | Expands to `a = a * b` |
| `/=` | Div-assign | `a /= b` | Expands to `a = a / b` |
| `%=` | Rem-assign | `a %= b` | Expands to `a = a % b` |
| `&=` | BitAnd-assign | `a &= b` | Expands to `a = a & b` |
| `\|=` | BitOr-assign | `a \|= b` | Expands to `a = a \| b` |
| `^=` | BitXor-assign | `a ^= b` | Expands to `a = a ^ b` |
| `<<` | Left shift | `a << b` | Shift `a` left by `b` bits; integer types |
| `>>` | Right shift | `a >> b` | Shift `a` right by `b` bits; integer types |
| `<<=` | Shl-assign | `a <<= b` | Expands to `a = a << b` |
| `>>=` | Shr-assign | `a >>= b` | Expands to `a = a >> b` |
| `or` | Logical OR | `a or b` | Short-circuits; both operands must be `bool` |
| `and` | Logical AND | `a and b` | Short-circuits; both operands must be `bool` |
| `\|` | Bitwise OR | `a \| b` | Integer types |
| `^` | Bitwise XOR | `a ^ b` | Integer types |
| `&` | Bitwise AND | `a & b` | Integer types (binary context) |
| `+` | Addition | `a + b` | |
| `-` | Subtraction | `a - b` | |
| `*` | Multiplication | `a * b` | Binary context (both operands are values) |
| `/` | Division | `a / b` | Integer division truncates toward zero |
| `%` | Modulo | `a % b` | Sign follows the dividend |
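
The notes on `/` and `%` describe truncating division, which differs from languages that floor. The Python sketch below models the stated behaviour on unbounded integers (Python's own `//` and `%` floor, so they cannot be used directly). `trunc_div` and `trunc_rem` are hypothetical names, and fixed-width overflow is ignored.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero."""
    q = abs(a) // abs(b)                         # magnitude of the quotient
    return q if (a < 0) == (b < 0) else -q

def trunc_rem(a: int, b: int) -> int:
    """Remainder whose sign follows the dividend `a`."""
    return a - b * trunc_div(a, b)
```

Under these definitions `a == b * trunc_div(a, b) + trunc_rem(a, b)` holds for every non-zero `b`.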
#### Unary Prefix Operators
| Operator | Name | Example | Notes |
| -------- | ----------- | ------- | ------------------------------------------------ |
| `!` | Logical NOT | `!cond` | Operand must be `bool` |
| `~` | Bitwise NOT | `~mask` | Bitwise complement; integer types |
| `-` | Negation | `-x` | Arithmetic negation |
| `*` | Dereference | `*ptr` | Unary context; operand must be a pointer type |
| `&` | Address-of | `&x` | Unary context; produces a pointer to the operand |
#### Postfix Operators
| Operator | Name | Example | Notes |
| -------- | ------------- | ----------- | ------------------------------------------------- |
| `.` | Member access | `obj.field` | Accesses a named field or method of a struct/type |
| `[…]` | Subscript | `arr[i]` | Indexes into an array, slice, or map |
| `(…)` | Call | `f(a, b)` | Invokes a function or closure |
> **Disambiguation:** `*` and `&` are context-sensitive.
> When appearing as the first token of a `unary_expr` they are **unary**
> (dereference / address-of). When appearing between two `unary_expr`
> sub-trees inside `multiplicative_expr` or `bitand_expr` they are **binary**
> (multiplication / bitwise AND). The parser resolves this purely from
> grammatical position — no look-ahead beyond 1 token is required.
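
The rule can be approximated on a flat token list: a `*` or `&` is unary whenever the parser is still expecting an operand, that is, at the start of the input or directly after another operator, `(`, `[`, or `,`. The Python sketch below uses that previous-token heuristic. It is only an illustration; a recursive-descent parser gets the same answer from its position in the grammar without such a check, and `classify_star_amp` is a hypothetical name.

```python
# Tokens after which an operand (not a binary operator) must follow.
EXPECTS_OPERAND = {
    "+", "-", "*", "/", "%", "&", "|", "^", "!", "~", "<<", ">>", "and", "or",
    "=", "+=", "-=", "*=", "/=", "%=", "&=", "|=", "^=", "<<=", ">>=",
    "(", "[", ",",
}

def classify_star_amp(tokens: list[str]) -> list[str]:
    """Label each `*` or `&` in tokens as 'unary' or 'binary', in order."""
    labels = []
    prev = None
    for tok in tokens:
        if tok in ("*", "&"):
            operand_expected = prev is None or prev in EXPECTS_OPERAND
            labels.append("unary" if operand_expected else "binary")
        prev = tok
    return labels
```

In `a * *p` the first `*` follows an operand and is multiplication, while the second follows an operator and is a dereference.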
### Parenthesised Expressions
Any expression may be wrapped in parentheses to override default precedence:
```
(a + b) * c
```
### Function Call Argument List
Arguments are comma-separated expressions. A trailing comma is **not**
permitted at this grammar level.
```
f()
f(x)
f(x, y, z)
```
### Examples
```flux
// Arithmetic
a + b * c - d % 2
// Bitwise
flags & MASK | extra ^ toggle
// Logical
ready and not_done or fallback
// Mixed unary / postfix
*ptr.field
&arr[i]
!cond
// Chained postfix
obj.method(arg1, arg2)[0].name
// Explicit precedence override
(a or b) and c
```
---
## Types
Types describe the shape and interpretation of values. All type positions in
the grammar reference the `type` non-terminal.
### Primitive Types
Primitive types are single-keyword types built into the language.
| Type | Kind | Width | Range / Notes |
| ------ | ---------------- | ------ | ------------------------------------------ |
| `u8` | Unsigned integer | 8-bit | 0 … 255 |
| `u16` | Unsigned integer | 16-bit | 0 … 65 535 |
| `u32` | Unsigned integer | 32-bit | 0 … 4 294 967 295 |
| `u64` | Unsigned integer | 64-bit | 0 … 2⁶⁴ - 1 |
| `i8` | Signed integer | 8-bit | -128 … 127 |
| `i16` | Signed integer | 16-bit | -32 768 … 32 767 |
| `i32` | Signed integer | 32-bit | -2 147 483 648 … 2 147 483 647 |
| `i64` | Signed integer | 64-bit | -2⁶³ … 2⁶³ - 1 |
| `f32` | Floating-point | 32-bit | IEEE 754 single precision |
| `f64` | Floating-point | 64-bit | IEEE 754 double precision |
| `bool` | Boolean | 1 byte | `true` or `false` |
| `char` | Unicode scalar | 32-bit | Any Unicode scalar value (not a surrogate) |
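
The integer ranges follow directly from the bit width: an unsigned `n`-bit type spans 0 to 2ⁿ - 1, and a signed one spans -2ⁿ⁻¹ to 2ⁿ⁻¹ - 1 (two's complement is assumed). A small Python sketch of that rule; `int_range` is a hypothetical helper:

```python
INTEGER_TYPES = {"u8", "u16", "u32", "u64", "i8", "i16", "i32", "i64"}

def int_range(type_name: str) -> tuple[int, int]:
    """Return (min, max) for a Flux integer type keyword such as 'u8' or 'i64'."""
    if type_name not in INTEGER_TYPES:
        raise ValueError(f"not an integer type: {type_name!r}")
    bits = int(type_name[1:])
    if type_name[0] == "i":                      # signed
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1                      # unsigned
```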
### Named Types
A named type is any user-defined type referenced by its identifier — typically a struct name. Because all primitive-type keywords (`u8`, `bool`, etc.) are reserved, an `IDENT` in type position is always a named type, never a primitive.
```flux
Point // struct Point { x: f32, y: f32 }
Node // struct Node { value: i64, next: *Node }
*Point // pointer to a named type
[Node; 8] // array of a named type
```
### Pointer Types
A pointer type is written with a leading `*`.
| Syntax | Description |
| --------- | ------------------------------------------------------------------------------------- |
| `*T` | Typed pointer — points to a value of type `T` |
| `*opaque` | Opaque pointer — no compile-time pointee type information; equivalent to C's `void *` |
Pointer types may be nested: `**u8` is a pointer to a pointer to `u8`.
```flux
*u8 // pointer to u8
**i32 // pointer to pointer to i32
*opaque // untyped pointer
**opaque // pointer to untyped pointer
```
### Array Types
Arrays have a fixed size known at compile time.
```
[ <element-type> ; <size> ]
```
`<size>` must be a non-negative integer literal (`INT_LIT`). The element type
may itself be any `type`, including pointers or nested arrays.
```flux
[u8; 256] // array of 256 u8 values
[*u8; 4] // array of 4 pointers to u8
[[f32; 3]; 3] // 3×3 matrix of f32 (array of arrays)
[*opaque; 8] // array of 8 opaque pointers
```
### Type Grammar Summary
```ebnf
type = primitive_type | named_type | pointer_type | array_type ;
primitive_type = "u8" | "u16" | "u32" | "u64"
| "i8" | "i16" | "i32" | "i64"
| "f32" | "f64" | "bool" | "char" ;
named_type = IDENT ;
pointer_type = "*" , ( "opaque" | type ) ;
array_type = "[" , type , ";" , INT_LIT , "]" ;
```
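
The type grammar is small enough to parse with one recursive function. The Python sketch below follows the productions above and returns nested tuples; the tuple shapes, the regex tokenizer, and the name `parse_type` are illustrative assumptions, not the Flux compiler's data structures.

```python
import re

PRIMITIVES = {"u8", "u16", "u32", "u64", "i8", "i16", "i32", "i64",
              "f32", "f64", "bool", "char"}

def parse_type(src: str):
    """Parse a Flux type into nested tuples, following the type grammar."""
    tokens = re.findall(r"[*\[\];]|\w+", src)
    pos = 0

    def next_token() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of type")
        pos += 1
        return tokens[pos - 1]

    def type_():
        nonlocal pos
        tok = next_token()
        if tok == "*":                           # pointer_type
            if pos < len(tokens) and tokens[pos] == "opaque":
                pos += 1
                return ("ptr", "opaque")
            return ("ptr", type_())
        if tok == "[":                           # array_type
            elem = type_()
            if next_token() != ";":
                raise SyntaxError("expected ';' in array type")
            try:
                size = int(next_token(), 0)      # INT_LIT, any supported base
            except ValueError:
                raise SyntaxError("array size must be an integer literal")
            if next_token() != "]":
                raise SyntaxError("expected ']' in array type")
            return ("array", elem, size)
        if tok in PRIMITIVES:                    # primitive_type
            return ("prim", tok)
        # named_type; a full lexer would reject every reserved keyword here.
        if tok.isidentifier() and tok != "opaque":
            return ("named", tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    result = type_()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after type")
    return result
```

For instance, `parse_type("[*u8; 4]")` returns `("array", ("ptr", ("prim", "u8")), 4)`.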
---
## Struct Literals
A struct literal constructs a value of a named struct type by providing values for each field.
```
<TypeName> { <field>: <expr>, ... }
```
Fields may appear in any order and need not match the declaration order. No trailing comma is permitted.
### Examples
```flux
let p = Point { x: 1.0, y: 2.0 };
let n = Node {
value: 42,
next: get_next()
};
// Nested struct literal
let outer = Rect {
origin: Point { x: 0.0, y: 0.0 },
size: Point { x: 10.0, y: 5.0 }
};
// Empty struct
let u = Unit {};
```
### Struct Literals in Conditions
Struct literals are **not permitted** as the outermost expression in `if` and `while` conditions. This restriction exists because `{` after the condition is ambiguous — it could start a struct literal body or the statement block.
```flux
// ERROR — ambiguous: is `{` a struct body or the if block?
if Flags { verbose: true } { ... }
// OK — parentheses resolve the ambiguity
if (Flags { verbose: true }).verbose { ... }
```
The grammar enforces this through the `expr_ns` (no-struct) hierarchy used in condition positions. Struct literals remain valid everywhere else: `let`, `return`, function arguments, field values, etc.
### Struct Literal Grammar Summary
```ebnf
primary_expr = IDENT , [ struct_lit_body ] | INT_LIT | FLOAT_LIT
| STRING_LIT | CHAR_LIT | "true" | "false"
| "(" , expr , ")" ;
struct_lit_body = "{" , struct_field_list , "}" ;
struct_field_list = [ struct_field , { "," , struct_field } ] ;
struct_field = IDENT , ":" , expr ;
```
### No-Struct Expression (`expr_ns`)
`expr_ns` is a parallel expression hierarchy identical to `expr` except its primary level (`primary_expr_ns`) does not allow the `struct_lit_body` suffix after an `IDENT`. Struct literals are still permitted when enclosed in parentheses (`"(" , expr , ")"`), because the `(` unambiguously marks the start of a grouped expression.
`if_stmt` and `while_stmt` use `expr_ns` for their condition; all other expression positions use the full `expr`.
---
## Statements
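Statements perform an action and do not produce a value. Simple statements
(`let`, `return`, `break`, `continue`, and expression statements) are terminated
by a semicolon `;`. `if`, `while`, `loop`, and block statements end with the
closing `}` of their block instead.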
### Let Statement
Introduces a new named binding in the current scope.
```
let [mut] <name> [: <type>] [= <expr>] ;
```
| Part | Required | Description |
| ---------- | -------- | --------------------------------------------- |
| `mut` | no | Makes the binding mutable; omit for immutable |
| `<name>` | yes | The identifier being bound |
| `: <type>` | no | Explicit type annotation |
| `= <expr>` | no | Initialiser expression |
| `;` | yes | Statement terminator |
Bindings are **immutable by default**. Attempting to assign to a binding
declared without `mut` is a compile-time error.
At least one of the type annotation or the initialiser must be present so the
compiler can determine the binding's type. This is a semantic constraint, not a
syntactic one — the grammar permits bare `let x;` and the type checker rejects
it if no type can be inferred from context.
#### Examples
```flux
// Immutable, type inferred from initialiser
let x = 42;
// Immutable, explicit type
let y: f64 = 3.14;
// Mutable, type inferred
let mut count = 0;
// Mutable, explicit type, no initialiser (must be assigned before use)
let mut buf: [u8; 128];
// Mutable pointer to u32
let mut ptr: *u32 = &value;
// Shadowing a previous binding is allowed
let x = "hello"; // x is now a string, previous x is gone
```
### Return Statement
Exits the enclosing function immediately, optionally producing a return value.
```
return [<expr>] ;
```
`return;` (no expression) is used when the function's return type is the unit
type `()`. `return <expr>;` returns the value of the expression.
Explicit `return` is only needed for early exits. The idiomatic way to return a
value from a function is the implicit return of its body block.
```flux
return; // unit return
return 42; // return an integer
return x * 2 + 1; // return an expression
```
### Expression Statement
Evaluates an expression for its side effects; the resulting value is
discarded. A semicolon is required.
```
<expr> ;
```
```flux
do_something(x); // call for side effects
count + 1; // legal but silly — value discarded
```
### Statement Grammar Summary
```ebnf
stmt = let_stmt | return_stmt | if_stmt
| while_stmt | loop_stmt | break_stmt | continue_stmt
| block_stmt | expr_stmt ;
let_stmt = "let" , [ "mut" ] , IDENT , [ ":" , type ] , [ "=" , expr ] , ";" ;
return_stmt = "return" , [ expr ] , ";" ;
if_stmt = "if" , expr_ns , block_stmt , [ "else" , else_branch ] ;
else_branch = if_stmt | block_stmt ;
while_stmt = "while" , expr_ns , block_stmt ;
loop_stmt = "loop" , block_stmt ;
break_stmt = "break" , ";" ;
continue_stmt = "continue" , ";" ;
block_stmt = "{" , { stmt } , "}" ;
expr_stmt = expr , ";" ;
```
---
## If Statement
Conditionally executes a block based on a boolean expression.
```
if <cond> <block> [else <else-branch>]
```
The condition `<cond>` must be an expression of type `bool`. The body is
always a `block_stmt` — braces are mandatory.
### Else Branch
The optional `else` branch is either a plain block or another `if` statement,
enabling `else if` chains of arbitrary length.
```flux
if x > 0 {
pos();
}
if x > 0 {
pos();
} else {
non_pos();
}
if x > 0 {
pos();
} else if x < 0 {
neg();
} else {
zero();
}
```
### If Statement Grammar Summary
```ebnf
if_stmt = "if" , expr_ns , block_stmt , [ "else" , else_branch ] ;
else_branch = if_stmt | block_stmt ;
```
---
## While Loop
Repeatedly executes a block as long as a boolean condition holds. The
condition is tested before each iteration; if it is false on entry, the body
never runs.
```
while <cond> <block>
```
```flux
let mut i = 0;
while i < 10 {
process(i);
i = i + 1;
}
```
### While Loop Grammar Summary
```ebnf
while_stmt = "while" , expr_ns , block_stmt ;
```
---
## Loop
Executes a block unconditionally and indefinitely. The loop runs until a
`break` or `return` inside the body transfers control out.
```
loop <block>
```
```flux
loop {
let msg = recv();
if msg.is_quit() {
break;
}
handle(msg);
}
```
### Loop Grammar Summary
```ebnf
loop_stmt = "loop" , block_stmt ;
```
---
## Break and Continue
`break` and `continue` are only valid inside the body of a `while` or `loop`.
The compiler enforces this as a semantic rule.
| Statement | Effect |
| ------------ | -------------------------------------------------------------- |
| `break ;` | Exits the innermost enclosing loop immediately |
| `continue ;` | Skips the rest of the current iteration; jumps to the next one |
For `while`, `continue` jumps back to the condition check. For `loop`,
`continue` jumps back to the top of the body.
```flux
let mut i = 0;
while i < 20 {
i = i + 1;
if i % 2 == 0 {
continue; // skip even numbers
}
if i > 15 {
break; // stop after 15
}
process(i);
}
```
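
Because the grammar accepts `break` and `continue` anywhere a statement may appear, the loop-only rule has to be checked after parsing. One simple way to do that is to track loop depth while walking the statement tree. The Python sketch below uses a made-up tuple encoding for statements; both the encoding and `check_loop_control` are hypothetical and say nothing about how the Flux compiler represents its AST.

```python
def check_loop_control(stmts, loop_depth=0):
    """Return error messages for `break`/`continue` that appear outside a loop.

    Statements are modelled as tuples: ("break",), ("continue",),
    ("while", body), ("loop", body), ("if", then_body, else_body) or ("other",).
    """
    errors = []
    for stmt in stmts:
        kind = stmt[0]
        if kind in ("break", "continue"):
            if loop_depth == 0:
                errors.append(f"`{kind}` outside of a loop")
        elif kind in ("while", "loop"):
            errors += check_loop_control(stmt[1], loop_depth + 1)
        elif kind == "if":                       # an `if` does not open a loop
            errors += check_loop_control(stmt[1], loop_depth)
            errors += check_loop_control(stmt[2], loop_depth)
    return errors
```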
### Break / Continue Grammar Summary
```ebnf
break_stmt = "break" , ";" ;
continue_stmt = "continue" , ";" ;
```
---
## Block Statement
A block groups zero or more statements into a single statement and introduces
a new lexical scope. Blocks do not produce a value.
```
{ <stmt>* }
```
### Scoping
Bindings declared inside a block are not visible outside it. A binding in an
inner scope may shadow a name from an outer scope without affecting it.
```flux
let x = 1;
{
let x = 2; // shadows outer x inside this block only
f(x); // uses 2
}
// x is still 1 here
```
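
Lexical scoping of this kind is often modelled as a stack of name tables: entering a block pushes a table, leaving it pops the table, and lookup searches from the innermost table outwards. A minimal Python sketch of that model; the `Scopes` class is hypothetical and not taken from the Flux compiler:

```python
class Scopes:
    """A stack of name -> value maps; the innermost scope is last."""

    def __init__(self):
        self.stack = [{}]

    def push(self):
        """Enter a block."""
        self.stack.append({})

    def pop(self):
        """Leave a block, discarding its bindings."""
        self.stack.pop()

    def declare(self, name, value):
        """Bind name in the innermost scope, shadowing any outer binding."""
        self.stack[-1][name] = value

    def lookup(self, name):
        """Find the innermost binding of name."""
        for scope in reversed(self.stack):
            if name in scope:
                return scope[name]
        raise NameError(name)
```

Declaring `x` again inside a pushed scope hides the outer `x` only until the matching `pop`, which mirrors the behaviour shown in the Flux example above.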
### Nesting
Blocks may be nested freely to any depth.
```flux
{
let a = compute_a();
{
let b = compute_b();
use(a, b);
}
// b is no longer in scope here
}
```
### Block Grammar Summary
```ebnf
block_stmt = "{" , { stmt } , "}" ;
```
---
## Top-Level Definitions
A Flux source file is a sequence of top-level definitions.
```ebnf
program = { top_level_def } ;
top_level_def = func_def | struct_def ;
```
The leading token unambiguously selects the definition kind: `fn` → function,
`struct` → struct.
---
## Function Definition
Defines a named, callable function.
```
fn <name> ( [<params>] ) [-> <return-type>] <block>
```
| Part | Required | Description |
| ------------------ | -------- | -------------------------------------------------------- |
| `<name>` | yes | The function's identifier |
| `( [<params>] )` | yes | Comma-separated parameter list, may be empty |
| `-> <return-type>` | no | Return type; omitting it means the function returns `()` |
| `<block>` | yes | Function body — a `block_stmt` |
### Parameters
Each parameter is a name with a mandatory type annotation. Parameters are
immutable by default; `mut` makes the local binding mutable within the body.
```
[mut] <name> : <type>
```
```flux
fn add(a: i32, b: i32) -> i32 {
return a + b;
}
fn greet(name: *u8) {
print(name);
}
fn increment(mut x: i32) -> i32 {
x = x + 1;
return x;
}
fn apply(f: *opaque, mut buf: [u8; 64]) -> bool {
return call(f, &buf);
}
```
### Return Type
If `->` is omitted the return type is implicitly `()` (the unit type). An
explicit `-> ()` is also permitted but redundant.
```flux
fn do_work() { // returns ()
side_effect();
}
fn get_value() -> i64 { // returns i64
return 42;
}
```
### Function Definition Grammar Summary
```ebnf
func_def = "fn" , IDENT , "(" , param_list , ")" , [ "->" , type ] , block_stmt ;
param_list = [ param , { "," , param } ] ;
param = [ "mut" ] , IDENT , ":" , type ;
```
---
## Struct Definition
Defines a named product type with zero or more typed fields.
```
struct <name> {
<field>: <type>,
...
}
```
Fields are separated by commas. No trailing comma is permitted. An empty
struct (zero fields) is valid.
### Fields
Each field is a name and a type. Fields may be of any type including pointers,
arrays, and other structs. Field names must be unique within the struct.
```flux
struct Point {
x: f32,
y: f32
}
struct Node {
value: i64,
next: *Node
}
struct Buffer {
data: *u8,
len: u64,
cap: u64
}
struct Unit {}
```
### Member Access
Fields of a struct value are accessed with the `.` operator (defined in the
expression grammar). If the value is behind a pointer, dereference it first
with `*`.
```flux
let p: Point = make_point();
let x = p.x;
let ptr: *Point = get_point_ptr();
let y = (*ptr).y;
```
### Struct Definition Grammar Summary
```ebnf
struct_def = "struct" , IDENT , "{" , field_list , "}" ;
field_list = [ field , { "," , field } ] ;
field = IDENT , ":" , type ;
```