Compiler Roadmap
A Rust-flavored, C-targeting language - built pipeline-first.
Implementation language: Rust
Code generation target: Native object files (.o) via Cranelift JIT/AOT
Phase 1 - Lexer
- Define token enum (int literal, bool literal, ident, keywords, operators, punctuation)
- Implement character-by-character scanner loop
- Handle whitespace & single-line comments (//)
- Produce source spans (file, line, col) on every token
- Unit-test: known inputs → expected token streams
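
A minimal sketch of the scanner loop described in Phase 1, assuming a token set limited to the items listed above. The names `Span`, `TokenKind`, `Token`, and `lex` are hypothetical, and the span omits the file name for brevity; the real lexer may be structured differently.

```rust
/// Position of a token's first character (file name omitted in this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub line: u32,
    pub col: u32,
}

/// Hypothetical token set covering the Phase 1 list.
#[derive(Debug, Clone, PartialEq)]
pub enum TokenKind {
    Int(i64),
    Bool(bool),
    Ident(String),
    Fn,
    Return,
    Op(char),    // + - * /
    Punct(char), // ( ) { } ; , :
}

#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub kind: TokenKind,
    pub span: Span,
}

/// Character-by-character scanner: skips whitespace and `//` comments,
/// and records a span on every token it produces.
pub fn lex(src: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let (mut i, mut line, mut col) = (0usize, 1u32, 1u32);

    while i < chars.len() {
        let c = chars[i];
        let span = Span { line, col };
        if c == '\n' {
            i += 1;
            line += 1;
            col = 1;
        } else if c.is_whitespace() {
            i += 1;
            col += 1;
        } else if c == '/' && chars.get(i + 1) == Some(&'/') {
            // Comment: skip to end of line; the newline branch resets `col`.
            while i < chars.len() && chars[i] != '\n' {
                i += 1;
            }
        } else if c.is_ascii_digit() {
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            let value = text
                .parse::<i64>()
                .map_err(|e| format!("{}:{}: {}", line, col, e))?;
            tokens.push(Token { kind: TokenKind::Int(value), span });
            col += (i - start) as u32;
        } else if c.is_ascii_alphabetic() || c == '_' {
            let start = i;
            while i < chars.len() && (chars[i].is_ascii_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            let kind = match text.as_str() {
                "fn" => TokenKind::Fn,
                "return" => TokenKind::Return,
                "true" => TokenKind::Bool(true),
                "false" => TokenKind::Bool(false),
                _ => TokenKind::Ident(text.clone()),
            };
            tokens.push(Token { kind, span });
            col += (i - start) as u32;
        } else if "+-*/".contains(c) {
            tokens.push(Token { kind: TokenKind::Op(c), span });
            i += 1;
            col += 1;
        } else if "(){};,:".contains(c) {
            tokens.push(Token { kind: TokenKind::Punct(c), span });
            i += 1;
            col += 1;
        } else {
            return Err(format!("{}:{}: unexpected character {:?}", line, col, c));
        }
    }
    Ok(tokens)
}
```

Calling `lex("fn main() { return 1 + 2; }")` yields one token per keyword, identifier, literal, operator, and punctuation mark, each tagged with the line and column where it starts.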
Phase 2 - Parser
- Write grammar for the base subset
  - fn declarations, return, int/bool literals
  - Arithmetic (+, -, *, /)
- Implement recursive-descent parser
- Attach source spans to every AST node
- Emit structured parse errors with span info
- Unit-test: parse valid snippets, expect correct AST shapes
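
As an illustration of the recursive-descent approach in Phase 2, here is a sketch restricted to the arithmetic part of the grammar. The `Tok`, `Expr`, `Parser`, and `parse` names are hypothetical, and spans are left out to keep the example short; it assumes the usual precedence (`*` and `/` bind tighter than `+` and `-`) and left associativity.

```rust
/// Simplified token type for this sketch (no spans, no keywords).
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Int(i64),
    Plus,
    Minus,
    Star,
    Slash,
    LParen,
    RParen,
}

/// Untyped AST for arithmetic expressions.
#[derive(Debug, PartialEq)]
pub enum Expr {
    Int(i64),
    Binary(Box<Expr>, Tok, Box<Expr>),
}

pub struct Parser {
    toks: Vec<Tok>,
    pos: usize,
}

impl Parser {
    pub fn new(toks: Vec<Tok>) -> Self {
        Parser { toks, pos: 0 }
    }

    fn peek(&self) -> Option<&Tok> {
        self.toks.get(self.pos)
    }

    fn next(&mut self) -> Option<Tok> {
        let t = self.toks.get(self.pos).cloned();
        self.pos += 1;
        t
    }

    /// expr := term (("+" | "-") term)*
    pub fn expr(&mut self) -> Result<Expr, String> {
        let mut lhs = self.term()?;
        while let Some(op) = self.peek().cloned() {
            if op != Tok::Plus && op != Tok::Minus {
                break;
            }
            self.pos += 1;
            let rhs = self.term()?;
            lhs = Expr::Binary(Box::new(lhs), op, Box::new(rhs));
        }
        Ok(lhs)
    }

    /// term := atom (("*" | "/") atom)*
    fn term(&mut self) -> Result<Expr, String> {
        let mut lhs = self.atom()?;
        while let Some(op) = self.peek().cloned() {
            if op != Tok::Star && op != Tok::Slash {
                break;
            }
            self.pos += 1;
            let rhs = self.atom()?;
            lhs = Expr::Binary(Box::new(lhs), op, Box::new(rhs));
        }
        Ok(lhs)
    }

    /// atom := INT | "(" expr ")"
    fn atom(&mut self) -> Result<Expr, String> {
        match self.next() {
            Some(Tok::Int(n)) => Ok(Expr::Int(n)),
            Some(Tok::LParen) => {
                let inner = self.expr()?;
                match self.next() {
                    Some(Tok::RParen) => Ok(inner),
                    other => Err(format!("expected ')', found {:?}", other)),
                }
            }
            other => Err(format!("expected expression, found {:?}", other)),
        }
    }
}

/// Parse a full token stream, rejecting trailing tokens.
pub fn parse(toks: Vec<Tok>) -> Result<Expr, String> {
    let mut p = Parser::new(toks);
    let e = p.expr()?;
    if p.pos < p.toks.len() {
        return Err(format!("unexpected trailing token {:?}", p.toks[p.pos]));
    }
    Ok(e)
}
```

Each grammar rule maps to one method, so adding a rule (for example unary minus) would mean adding one more method between `term` and `atom`. A fuller version would carry a span on every `Expr` node and return structured errors rather than strings.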
Phase 3 - Semantic Analysis
- Implement scope-aware symbol table (environment)
- Name resolution pass - resolve all Ident nodes to their declarations
- Hindley-Milner type inference (unification, type variables, occurs check)
- Integer literal sizing and unary minus type promotion logic
- Translate untyped AST directly into a fully-typed AST (Typed AST)
- Validate function return types match declared signature
- Error on use-before-declaration, undeclared symbols, and type mismatches
- Unit-test: HM unification, type mappings, and ill-typed program diagnostics
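
The unification step with type variables and an occurs check, as listed in Phase 3, might look roughly like the sketch below. The `Ty` variants and the `Subst` type are assumptions made for illustration (the roadmap only names `Ty`); a real implementation would also report spans in its diagnostics.

```rust
use std::collections::HashMap;

/// Hypothetical type representation: two base types, type variables,
/// and function types.
#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    Int,
    Bool,
    Var(u32),
    Fn(Vec<Ty>, Box<Ty>),
}

/// Substitution mapping type variables to the types they are bound to.
#[derive(Default)]
pub struct Subst {
    map: HashMap<u32, Ty>,
}

impl Subst {
    /// Apply the substitution, following variable bindings recursively.
    pub fn resolve(&self, ty: &Ty) -> Ty {
        match ty {
            Ty::Var(v) => match self.map.get(v) {
                Some(bound) => self.resolve(bound),
                None => ty.clone(),
            },
            Ty::Fn(params, ret) => Ty::Fn(
                params.iter().map(|p| self.resolve(p)).collect(),
                Box::new(self.resolve(ret)),
            ),
            _ => ty.clone(),
        }
    }

    /// Occurs check: does `var` appear anywhere inside `ty`?
    fn occurs(&self, var: u32, ty: &Ty) -> bool {
        match self.resolve(ty) {
            Ty::Var(v) => v == var,
            Ty::Fn(params, ret) => {
                params.iter().any(|p| self.occurs(var, p)) || self.occurs(var, &ret)
            }
            _ => false,
        }
    }

    /// Make `a` and `b` equal by binding variables, or report a mismatch.
    pub fn unify(&mut self, a: &Ty, b: &Ty) -> Result<(), String> {
        let (a, b) = (self.resolve(a), self.resolve(b));
        match (&a, &b) {
            (Ty::Int, Ty::Int) | (Ty::Bool, Ty::Bool) => Ok(()),
            (Ty::Var(x), Ty::Var(y)) if x == y => Ok(()),
            (Ty::Var(v), other) | (other, Ty::Var(v)) => {
                // Binding t := Fn(t, ...) would create an infinite type.
                if self.occurs(*v, other) {
                    return Err(format!("infinite type: t{} occurs in {:?}", v, other));
                }
                self.map.insert(*v, other.clone());
                Ok(())
            }
            (Ty::Fn(p1, r1), Ty::Fn(p2, r2)) if p1.len() == p2.len() => {
                for (x, y) in p1.iter().zip(p2.iter()) {
                    self.unify(x, y)?;
                }
                self.unify(r1, r2)
            }
            _ => Err(format!("type mismatch: {:?} vs {:?}", a, b)),
        }
    }
}
```

Unifying `Ty::Var(0)` with `Ty::Int` binds the variable, so a later `resolve(&Ty::Var(0))` returns `Ty::Int`; unifying `Ty::Int` with `Ty::Bool` returns an error, which is the kind of ill-typed case the unit tests above are meant to cover.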
Phase 4 - Code Generation via Cranelift
- Integrate cranelift-codegen, cranelift-frontend, and cranelift-object
- Implement CLI with clap (--emit-ir flag, input/output files)
- Map Typed AST types (Ty) to Cranelift IR types
- Lower functions, parameters, and variable definitions to Cranelift IR
- Codegen for arithmetic, unary operations, and return statements
- Run built-in optimization passes (constant folding, e-graphs, DCE)
- Output System V AMD64 ABI-compliant .o machine code files
- End-to-end test: compile a simple fn → link via gcc → run → correct exit code
Planned Features (Backlog)
Control flow
- if/else branching
- while loops
Types & memory
- Typed pointers (*T)
- Opaque pointers (*void / *opaque)
- Raw pointer arithmetic & dereference
- Fixed-size arrays ([T; N])
- Slices (&[T] / []T)
Composite types
- Structs & field access
- Enums (C-style tagged unions)
- Pattern matching (match / switch)
Strings & interop
- String literals & *u8 handling
- Variadic functions (for printf interop)
- extern / FFI declarations
Tooling & backend
- Proper register allocator
- Debug info (DWARF)
- Standard library bootstrap (print, malloc wrapper)