Compiler Roadmap

A Rust-flavored language aimed at C's niche, built pipeline-first.

Implementation language: Rust
Code generation target: x86-64 assembly (a .s file in AT&T syntax for GAS, or Intel syntax for NASM; GAS also accepts Intel syntax via .intel_syntax noprefix)

Phase 1 - Lexer

  • Define token enum (int literal, bool literal, ident, keywords, operators, punctuation)
  • Implement character-by-character scanner loop
  • Handle whitespace & single-line comments (//)
  • Produce source spans (file, line, col) on every token
  • Unit-test: known inputs → expected token streams
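
A minimal sketch of the Phase 1 scanner. It assumes the only keywords so far are fn and return, with true/false lexed as bool literals. The Token and Span types, the -> arrow token, and the file field are illustrative assumptions, not a settled design. The file field is a hypothetical index into a table of source file names.

```rust
/// Where a token starts. `file` is a hypothetical index into a table of source files.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub file: usize,
    pub line: u32,
    pub col: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Int(i64),
    Bool(bool),
    Ident(String),
    Fn,
    Return,
    Arrow,       // ->
    Op(char),    // + - * /
    Punct(char), // ( ) { } ; : ,
}

/// Scans `src` one character at a time, attaching a span to every token.
pub fn lex(src: &str, file: usize) -> Result<Vec<(Token, Span)>, String> {
    let chars: Vec<char> = src.chars().collect();
    let (mut i, mut line, mut col) = (0usize, 1u32, 1u32);
    let mut out = Vec::new();
    while i < chars.len() {
        let c = chars[i];
        let span = Span { file, line, col };
        if c == '\n' {
            i += 1;
            line += 1;
            col = 1;
            continue;
        }
        if c.is_whitespace() {
            i += 1;
            col += 1;
            continue;
        }
        // `//` comment: skip up to (not past) the newline, which the branch above handles.
        if c == '/' && chars.get(i + 1) == Some(&'/') {
            while i < chars.len() && chars[i] != '\n' {
                i += 1;
            }
            continue;
        }
        let start = i;
        let tok = if c.is_ascii_digit() {
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            let n = text
                .parse::<i64>()
                .map_err(|e| format!("{}:{}: bad integer literal: {}", line, col, e))?;
            Token::Int(n)
        } else if c.is_alphabetic() || c == '_' {
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            let word: String = chars[start..i].iter().collect();
            match word.as_str() {
                "fn" => Token::Fn,
                "return" => Token::Return,
                "true" => Token::Bool(true),
                "false" => Token::Bool(false),
                _ => Token::Ident(word.clone()),
            }
        } else if c == '-' && chars.get(i + 1) == Some(&'>') {
            i += 2;
            Token::Arrow
        } else {
            i += 1;
            match c {
                '+' | '-' | '*' | '/' => Token::Op(c),
                '(' | ')' | '{' | '}' | ';' | ':' | ',' => Token::Punct(c),
                _ => return Err(format!("{}:{}: unexpected character {:?}", line, col, c)),
            }
        };
        col += (i - start) as u32;
        out.push((tok, span));
    }
    Ok(out)
}
```

For example, lexing "fn main() -> int" puts the arrow token at line 1, column 11. Unit tests can compare both the token stream and these spans.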

Phase 2 - Parser

  • Write grammar for the base subset
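    • fn declarations with parameters, function calls, return, int/bool literals
    • Identifiers and local variable bindings
    • Arithmetic (+, -, *, /) and comparisons (==, !=, <, <=, >, >=)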
  • Implement recursive-descent parser
  • Attach source spans to every AST node
  • Emit structured parse errors with span info
  • Unit-test: parse valid snippets, expect correct AST shapes
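
For the arithmetic part of the grammar, recursive descent can use one function per precedence level. This gives * and / higher precedence than + and -, and the loops make each level left-associative. The sketch below is minimal and scans bytes directly instead of consuming Phase 1 tokens, so that it stays self-contained. Expr, Span (byte offsets here) and ParseError are hypothetical names.

```rust
/// Byte range `start..end` in the source text.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

#[derive(Debug, PartialEq)]
pub enum Expr {
    Int(i64, Span),
    Binary(char, Box<Expr>, Box<Expr>, Span),
}

impl Expr {
    pub fn span(&self) -> Span {
        match self {
            Expr::Int(_, s) | Expr::Binary(_, _, _, s) => *s,
        }
    }
}

#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub msg: String,
    pub at: usize, // byte offset of the problem
}

struct Parser<'a> {
    src: &'a [u8],
    pos: usize,
}

impl<'a> Parser<'a> {
    /// Skips whitespace and returns the next byte without consuming it.
    fn peek(&mut self) -> Option<u8> {
        while self.pos < self.src.len() && self.src[self.pos].is_ascii_whitespace() {
            self.pos += 1;
        }
        self.src.get(self.pos).copied()
    }

    fn err<T>(&self, msg: &str) -> Result<T, ParseError> {
        Err(ParseError { msg: msg.to_string(), at: self.pos })
    }

    // expr := term (('+' | '-') term)*   (lowest precedence, left-associative)
    fn expr(&mut self) -> Result<Expr, ParseError> {
        let mut lhs = self.term()?;
        while let Some(op @ (b'+' | b'-')) = self.peek() {
            self.pos += 1;
            let rhs = self.term()?;
            let span = Span { start: lhs.span().start, end: rhs.span().end };
            lhs = Expr::Binary(op as char, Box::new(lhs), Box::new(rhs), span);
        }
        Ok(lhs)
    }

    // term := atom (('*' | '/') atom)*
    fn term(&mut self) -> Result<Expr, ParseError> {
        let mut lhs = self.atom()?;
        while let Some(op @ (b'*' | b'/')) = self.peek() {
            self.pos += 1;
            let rhs = self.atom()?;
            let span = Span { start: lhs.span().start, end: rhs.span().end };
            lhs = Expr::Binary(op as char, Box::new(lhs), Box::new(rhs), span);
        }
        Ok(lhs)
    }

    // atom := INT | '(' expr ')'
    fn atom(&mut self) -> Result<Expr, ParseError> {
        match self.peek() {
            Some(b'0'..=b'9') => {
                let start = self.pos;
                while self.pos < self.src.len() && self.src[self.pos].is_ascii_digit() {
                    self.pos += 1;
                }
                // Digits are ASCII, so this slice is valid UTF-8.
                let text = std::str::from_utf8(&self.src[start..self.pos]).unwrap();
                match text.parse::<i64>() {
                    Ok(n) => Ok(Expr::Int(n, Span { start, end: self.pos })),
                    Err(_) => Err(ParseError { msg: "integer literal out of range".into(), at: start }),
                }
            }
            Some(b'(') => {
                self.pos += 1;
                let inner = self.expr()?;
                if self.peek() != Some(b')') {
                    return self.err("expected ')'");
                }
                self.pos += 1;
                Ok(inner) // parentheses only group; the node keeps the inner span
            }
            Some(_) => self.err("expected an integer or '('"),
            None => self.err("unexpected end of input"),
        }
    }
}

pub fn parse(src: &str) -> Result<Expr, ParseError> {
    let mut p = Parser { src: src.as_bytes(), pos: 0 };
    let e = p.expr()?;
    match p.peek() {
        None => Ok(e),
        Some(_) => p.err("unexpected trailing input"),
    }
}
```

With this shape, "8 - 2 - 1" parses as (8 - 2) - 1, and "1 +" reports "unexpected end of input" at byte offset 3.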

Phase 3 - Semantic Analysis

  • Implement scope-aware symbol table
  • Name resolution pass - resolve all Ident nodes to their declarations
  • Type inference / checking for int and bool
  • Validate function return types match declared signature
  • Error on use-before-declaration and undeclared symbols
  • Unit-test: ill-typed programs produce correct diagnostics
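
One common shape for a scope-aware symbol table is a stack of hash maps, one per scope. Lookups search innermost-first, and leaving a scope pops its map. Below is a minimal sketch with a checker for int and bool. The Expr type, including the let-in form used here to open a scope, is a hypothetical stand-in for the real AST.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Type {
    Int,
    Bool,
}

/// Hypothetical AST subset; `Let(name, value, body)` scopes `name` to `body`.
#[derive(Debug)]
pub enum Expr {
    Int(i64),
    Bool(bool),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Eq(Box<Expr>, Box<Expr>),
    Let(String, Box<Expr>, Box<Expr>),
}

/// Scope-aware symbol table: one map per nested scope, innermost last.
pub struct Scopes {
    stack: Vec<HashMap<String, Type>>,
}

impl Scopes {
    pub fn new() -> Self {
        Scopes { stack: vec![HashMap::new()] } // global scope
    }
    pub fn push(&mut self) {
        self.stack.push(HashMap::new());
    }
    pub fn pop(&mut self) {
        self.stack.pop();
    }
    pub fn declare(&mut self, name: &str, ty: Type) {
        self.stack.last_mut().expect("no open scope").insert(name.to_string(), ty);
    }
    /// Innermost declaration wins, so inner bindings shadow outer ones.
    pub fn lookup(&self, name: &str) -> Option<Type> {
        self.stack.iter().rev().find_map(|scope| scope.get(name).copied())
    }
}

pub fn check(e: &Expr, scopes: &mut Scopes) -> Result<Type, String> {
    match e {
        Expr::Int(_) => Ok(Type::Int),
        Expr::Bool(_) => Ok(Type::Bool),
        Expr::Var(name) => scopes
            .lookup(name)
            .ok_or_else(|| format!("undeclared symbol `{}`", name)),
        Expr::Add(a, b) => match (check(a, scopes)?, check(b, scopes)?) {
            (Type::Int, Type::Int) => Ok(Type::Int),
            (ta, tb) => Err(format!("`+` expects int operands, found {:?} and {:?}", ta, tb)),
        },
        Expr::Eq(a, b) => {
            let (ta, tb) = (check(a, scopes)?, check(b, scopes)?);
            if ta == tb {
                Ok(Type::Bool)
            } else {
                Err(format!("cannot compare {:?} with {:?}", ta, tb))
            }
        }
        Expr::Let(name, value, body) => {
            // Check the initializer before declaring `name`, so `let x = x` refers to an
            // outer `x`, or is reported as undeclared if there is none.
            let ty = check(value, scopes)?;
            scopes.push();
            scopes.declare(name, ty);
            let result = check(body, scopes);
            scopes.pop(); // pop even when the body has an error
            result
        }
    }
}

/// A function's body type must match its declared return type.
pub fn check_fn_return(declared: Type, body: &Expr) -> Result<(), String> {
    let actual = check(body, &mut Scopes::new())?;
    if actual == declared {
        Ok(())
    } else {
        Err(format!("declared return type {:?}, but body has type {:?}", declared, actual))
    }
}
```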

Phase 4 - x86-64 Code Generation

  • Choose an intermediate representation: a simple linear IR, or generate code directly from the AST
  • Implement stack-frame layout for local variables
  • Emit System V AMD64 ABI-compliant function prologues / epilogues
  • Codegen for arithmetic & comparison expressions
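  • Codegen for function calls (System V: first six integer/pointer arguments in rdi, rsi, rdx, rcx, r8, r9, the rest on the stack; rsp 16-byte aligned at each call)
  • Codegen for return statements (return value in rax)
  • Write the .s file, then assemble with GAS or NASM and link into an executable
  • End-to-end test: compile a simple fn → run → check the exit code (only its low 8 bits survive, so expected values should stay in 0-255)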
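
Below is a minimal sketch of these Phase 4 pieces. It works directly from a tree with no separate IR and emits AT&T syntax for GAS. It has three parts:

- a stack-machine evaluator that leaves every result in %rax;
- a frame-size calculation;
- a prologue that spills register arguments into stack slots.

The Expr type is an assumption for illustration. So is the slot numbering: parameters come first, at -8(%rbp), -16(%rbp), and so on. Integer literals are limited to 32 bits so that movq can encode them as immediates.

```rust
use std::fmt::Write;

/// System V AMD64 integer argument registers, in order.
const ARG_REGS: [&str; 6] = ["%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"];

/// Hypothetical minimal expression tree for this sketch.
pub enum Expr {
    Int(i32),                        // fits a sign-extended 32-bit immediate
    Local(usize),                    // i-th 8-byte stack slot (parameters come first)
    Bin(char, Box<Expr>, Box<Expr>), // + - * /
}

/// Bytes reserved for `n_locals` 8-byte slots, rounded up to 16 so %rsp stays
/// 16-byte aligned after the prologue (System V requires this at call sites).
pub fn frame_size(n_locals: usize) -> usize {
    (n_locals * 8 + 15) / 16 * 16
}

/// Evaluates `e` into %rax, using the machine stack for temporaries.
fn gen_expr(e: &Expr, out: &mut String) {
    match e {
        Expr::Int(n) => writeln!(out, "    movq ${}, %rax", n).unwrap(),
        Expr::Local(i) => writeln!(out, "    movq {}(%rbp), %rax", -8 * (*i as i64 + 1)).unwrap(),
        Expr::Bin(op, lhs, rhs) => {
            gen_expr(rhs, out);
            out.push_str("    pushq %rax\n");
            gen_expr(lhs, out);
            out.push_str("    popq %rcx\n"); // now %rax = lhs, %rcx = rhs
            out.push_str(match op {
                '+' => "    addq %rcx, %rax\n",
                '-' => "    subq %rcx, %rax\n",
                '*' => "    imulq %rcx, %rax\n",
                '/' => "    cqto\n    idivq %rcx\n", // sign-extend into %rdx:%rax; quotient in %rax
                _ => panic!("unsupported operator {:?}", op),
            });
        }
    }
}

/// Emits one function whose first `n_params` locals arrive in registers.
pub fn gen_function(name: &str, n_params: usize, n_locals: usize, body: &Expr) -> String {
    assert!(n_params <= ARG_REGS.len(), "stack-passed arguments are not handled here");
    assert!(n_params <= n_locals, "each parameter needs a stack slot");
    let mut out = String::new();
    writeln!(out, "    .text\n    .globl {name}\n{name}:").unwrap();
    out.push_str("    pushq %rbp\n    movq %rsp, %rbp\n"); // prologue
    let size = frame_size(n_locals);
    if size > 0 {
        writeln!(out, "    subq ${}, %rsp", size).unwrap();
    }
    // Spill incoming register arguments so the body can treat them as ordinary locals.
    for (i, reg) in ARG_REGS.iter().take(n_params).enumerate() {
        writeln!(out, "    movq {}, {}(%rbp)", reg, -8 * (i as i64 + 1)).unwrap();
    }
    gen_expr(body, &mut out);
    // Epilogue: the result is already in %rax, where System V expects it.
    out.push_str("    movq %rbp, %rsp\n    popq %rbp\n    ret\n");
    out
}
```

A possible end-to-end test on x86-64 Linux:

1. Write `gen_function("main", 0, 0, &Expr::Int(42))` to out.s.
2. Run `gcc out.s -o out && ./out; echo $?`.

This should print 42. Newer GNU linkers may warn about a missing .note.GNU-stack section; that warning does not change the exit code.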

Planned Features (Backlog)

Control flow

  • if / else branching
  • while loops
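
Both constructs lower to compare-and-jump sequences around fresh labels. The sketch below assumes the condition's code has already been generated and leaves 0 or 1 in %rax. Labels, gen_if and gen_while are hypothetical names. The .L prefix marks assembler-local labels in GAS on ELF targets.

```rust
use std::fmt::Write;

/// Hands out unique labels. In GAS on ELF targets, names starting with `.L` are
/// assembler-local and stay out of the object file's symbol table.
pub struct Labels {
    next: usize,
}

impl Labels {
    pub fn new() -> Self {
        Labels { next: 0 }
    }
    pub fn fresh(&mut self, hint: &str) -> String {
        let label = format!(".L{}_{}", hint, self.next);
        self.next += 1;
        label
    }
}

/// `cond` is pre-generated code that leaves 0 (false) or 1 (true) in %rax.
pub fn gen_if(labels: &mut Labels, cond: &str, then_code: &str, else_code: Option<&str>) -> String {
    let else_label = labels.fresh("else");
    let end_label = labels.fresh("endif");
    let mut out = String::from(cond);
    out.push_str("    cmpq $0, %rax\n");
    writeln!(out, "    je {}", else_label).unwrap(); // false: skip the then-branch
    out.push_str(then_code);
    writeln!(out, "    jmp {}", end_label).unwrap();
    writeln!(out, "{}:", else_label).unwrap();
    out.push_str(else_code.unwrap_or(""));
    writeln!(out, "{}:", end_label).unwrap();
    out
}

/// Tests the condition at the top of each iteration.
pub fn gen_while(labels: &mut Labels, cond: &str, body: &str) -> String {
    let top = labels.fresh("while");
    let end = labels.fresh("endwhile");
    let mut out = String::new();
    writeln!(out, "{}:", top).unwrap();
    out.push_str(cond);
    out.push_str("    cmpq $0, %rax\n");
    writeln!(out, "    je {}", end).unwrap();
    out.push_str(body);
    writeln!(out, "    jmp {}", top).unwrap();
    writeln!(out, "{}:", end).unwrap();
    out
}
```

Sharing one Labels counter across the whole compilation unit keeps nested and sibling constructs from colliding.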

Types & memory

  • Typed pointers (*T)
  • Opaque pointers (*void / *opaque)
  • Raw pointer arithmetic & dereference
  • Fixed-size arrays ([T; N])
  • Slices (&[T] / []T)
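
Pointer arithmetic can follow C: p + n on a *T moves n * size_of(T) bytes. This is also why it must be rejected on *opaque, whose pointee size is unknown. The sketch below makes three assumptions:

- int is 8 bytes;
- bool and u8 are 1 byte;
- a slice reference is a (pointer, length) pair.

The Type enum is hypothetical. Note that lea can only scale by 1, 2, 4 or 8.

```rust
/// Hypothetical type representation for the backlog's pointer and array types.
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Int,                     // assumed 8 bytes
    Bool,                    // assumed 1 byte
    U8,
    Opaque,                  // pointee of *opaque / *void: size unknown
    Ptr(Box<Type>),          // *T, 8 bytes on x86-64
    Array(Box<Type>, usize), // [T; N]
    SliceRef(Box<Type>),     // &[T]: (pointer, length) pair, 16 bytes
}

/// Size in bytes, or None for types whose size is unknown.
pub fn size_of(t: &Type) -> Option<usize> {
    match t {
        Type::Int | Type::Ptr(_) => Some(8),
        Type::Bool | Type::U8 => Some(1),
        Type::Opaque => None,
        Type::Array(elem, n) => size_of(elem).map(|s| s * n),
        Type::SliceRef(_) => Some(16),
    }
}

/// Byte offset for `p + n` where `p: ptr_ty`: `n * size_of(T)` for `*T`, as in C.
pub fn ptr_offset(ptr_ty: &Type, n: i64) -> Result<i64, String> {
    match ptr_ty {
        Type::Ptr(pointee) => size_of(pointee)
            .map(|s| n * s as i64)
            .ok_or_else(|| format!("pointer arithmetic on *{:?}: pointee size unknown", pointee)),
        other => Err(format!("pointer arithmetic on non-pointer type {:?}", other)),
    }
}

/// AT&T code for `%rax = %rax + %rcx * elem_size` (pointer in %rax, index in %rcx).
/// `lea` can scale by 1, 2, 4 or 8 directly; other sizes need an explicit multiply.
pub fn gen_ptr_add(elem_size: usize) -> String {
    match elem_size {
        1 | 2 | 4 | 8 => format!("    leaq (%rax,%rcx,{}), %rax\n", elem_size),
        _ => format!("    imulq ${}, %rcx\n    addq %rcx, %rax\n", elem_size),
    }
}
```

Indexing a fixed-size array a[i] can reuse the same path: take the array's base address, then add i scaled by the element size.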

Composite types

  • Structs & field access
  • Enums (tagged unions, laid out C-style as a tag plus a union of payloads)
  • Pattern matching (match / switch)
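
Struct layout can follow the C rules used on x86-64. Each field starts at the next multiple of its alignment, and the total size is rounded up to the largest alignment. The sketch below computes that layout. It also builds a tagged-union layout as a tag followed by a union sized for the largest variant. Fields are given as hypothetical (size, align) pairs, and the tag is assumed to be aligned to its own size.

```rust
#[derive(Debug, PartialEq)]
pub struct Layout {
    pub offsets: Vec<usize>,
    pub size: usize,
    pub align: usize,
}

fn align_up(x: usize, a: usize) -> usize {
    (x + a - 1) / a * a
}

/// C-style layout for fields given as (size, align) pairs, in declaration order.
pub fn struct_layout(fields: &[(usize, usize)]) -> Layout {
    let mut offset = 0;
    let mut align = 1;
    let mut offsets = Vec::with_capacity(fields.len());
    for &(size, a) in fields {
        offset = align_up(offset, a); // insert padding before the field
        offsets.push(offset);
        offset += size;
        align = align.max(a);
    }
    // Trailing padding so arrays of this struct keep every element aligned.
    Layout { offsets, size: align_up(offset, align), align }
}

/// Tagged union laid out like C's `struct { tag; union { variants... } }`.
/// Returned offsets are [tag, payload].
pub fn enum_layout(tag_size: usize, variants: &[(usize, usize)]) -> Layout {
    let payload_size = variants.iter().map(|v| v.0).max().unwrap_or(0);
    let payload_align = variants.iter().map(|v| v.1).max().unwrap_or(1);
    let union_size = align_up(payload_size, payload_align);
    struct_layout(&[(tag_size, tag_size), (union_size, payload_align)])
}
```

For example, a struct of u8, int, bool gets offsets 0, 8, 16 and size 24, so field order affects padding. A match on such an enum can then load the tag at offset 0 and branch on it before reading the payload.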

Strings & interop

  • String literals & *u8 handling
  • Variadic functions (for printf interop)
  • extern / FFI declarations
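
For printf interop, the System V ABI adds one rule for variadic calls. %al must hold an upper bound on the number of vector registers used for arguments, which is 0 when none are floating point. String literals can live in .rodata as NUL-terminated .asciz data.

In the sketch below, StrTable and gen_printf_call are hypothetical names. It assumes the caller keeps %rsp 16-byte aligned at the call.

On extern declarations: GAS treats undefined symbols as external, so an extern declaration may need no assembler output at all.

```rust
use std::fmt::Write;

/// Interns string literals and emits them as NUL-terminated data.
pub struct StrTable {
    entries: Vec<String>,
}

impl StrTable {
    pub fn new() -> Self {
        StrTable { entries: Vec::new() }
    }

    /// Returns the label for `s`, reusing it when the same literal appears again.
    pub fn intern(&mut self, s: &str) -> String {
        let index = match self.entries.iter().position(|e| e == s) {
            Some(i) => i,
            None => {
                self.entries.push(s.to_string());
                self.entries.len() - 1
            }
        };
        format!(".LC{}", index)
    }

    /// `.asciz` appends the trailing NUL byte that C functions expect.
    pub fn emit(&self) -> String {
        let mut out = String::from("    .section .rodata\n");
        for (i, s) in self.entries.iter().enumerate() {
            writeln!(out, ".LC{}:\n    .asciz \"{}\"", i, escape(s)).unwrap();
        }
        out
    }
}

/// Escapes characters that are special inside a GAS string literal.
fn escape(s: &str) -> String {
    let mut out = String::new();
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\t' => out.push_str("\\t"),
            _ => out.push(c),
        }
    }
    out
}

/// Calls printf(fmt, args...) with integer arguments only.
/// Assumes %rsp is 16-byte aligned here, as System V requires at every call.
pub fn gen_printf_call(fmt_label: &str, int_args: &[i32]) -> String {
    const ARG_REGS: [&str; 6] = ["%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"];
    assert!(int_args.len() < ARG_REGS.len(), "this sketch passes at most 5 ints in registers");
    let mut out = String::new();
    writeln!(out, "    leaq {}(%rip), %rdi", fmt_label).unwrap(); // RIP-relative, so PIE-safe
    for (reg, value) in ARG_REGS[1..].iter().zip(int_args) {
        writeln!(out, "    movq ${}, {}", value, reg).unwrap();
    }
    out.push_str("    movl $0, %eax\n"); // %al = number of vector registers used: none
    out.push_str("    call printf@PLT\n");
    out
}
```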

Tooling & backend

  • Proper register allocator
  • Debug info (DWARF)
  • Standard library bootstrap (print, malloc wrapper)
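
One option for the register allocator is linear scan, after Poletto and Sarkar. It works in three moves:

1. Sort live intervals by start point.
2. Free the registers of intervals that have already ended.
3. When no register is free, spill whichever interval ends last.

The sketch below works over hypothetical Interval values. It assigns each virtual register either a physical register or a stack slot index. It leaves out liveness analysis and the spill/reload code itself. Using callee-saved registers such as %rbx and %r12 in the example is an assumption, chosen so that values survive calls.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Location {
    Reg(&'static str),
    Stack(usize), // spill slot index
}

/// Live range of a virtual register, in instruction numbers (inclusive).
#[derive(Debug, Clone)]
pub struct Interval {
    pub vreg: usize,
    pub start: usize,
    pub end: usize,
}

pub fn linear_scan(mut intervals: Vec<Interval>, regs: &[&'static str]) -> HashMap<usize, Location> {
    intervals.sort_by_key(|iv| iv.start);
    let mut free: Vec<&'static str> = regs.iter().rev().copied().collect();
    let mut active: Vec<(Interval, &'static str)> = Vec::new(); // kept sorted by `end`
    let mut result = HashMap::new();
    let mut next_slot = 0;

    for cur in intervals {
        // Expire intervals that ended before `cur` starts; their registers become free.
        active.retain(|(iv, reg)| {
            if iv.end < cur.start {
                free.push(*reg);
                false
            } else {
                true
            }
        });

        let reg = if let Some(reg) = free.pop() {
            reg
        } else if active.last().map_or(false, |(iv, _)| iv.end > cur.end) {
            // Take the register of the active interval that ends last and spill that one.
            let (victim, reg) = active.pop().unwrap();
            result.insert(victim.vreg, Location::Stack(next_slot));
            next_slot += 1;
            reg
        } else {
            // `cur` ends last (or there are no registers at all): spill it.
            result.insert(cur.vreg, Location::Stack(next_slot));
            next_slot += 1;
            continue;
        };
        result.insert(cur.vreg, Location::Reg(reg));
        let pos = active.partition_point(|(iv, _)| iv.end <= cur.end);
        active.insert(pos, (cur, reg));
    }
    result
}
```

Spilling the interval that ends last frees a register for the longest remaining stretch of code. That heuristic is the core of the original algorithm.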