AstroScript's compiler follows a classic multi-phase compilation pipeline.
Source Code (.as)
       │
       ▼
  ┌──────────┐
  │  Lexer   │  (Flex)
  │ lexer.l  │
  └────┬─────┘
       │ tokens
       ▼
  ┌──────────┐
  │  Parser  │  (Bison)
  │ parser.y │
  └────┬─────┘
       │ semantic actions + TAC emission
       ▼
  ┌────────────────┐
  │ Semantic Check │  (parser.y + symbol_table.cpp)
  │ Symbol Metadata│
  └────┬───────────┘
       │
       ▼
  ┌────────────────┐
  │ IR Generation  │  (tac.cpp)
  │ Three-Address  │
  │ Code (TAC)     │
  └────┬───────────┘
       │
       ▼
  ┌────────────────┐
  │ Optimization   │
  │ - Const fold   │
  │ - Algebraic    │
  │ - Redundant mv │
  └────┬───────────┘
       │
       ▼
  ┌────────────────┐
  │ C-Like Output  │
  │ Readable code  │
  │ for playground │
  └────┬───────────┘
       │
       ▼
  ┌────────────────┐
  │ Execution      │
  │ TAC Interpreter│
  └────────────────┘
File: backend/compiler/lexer/lexer.l
Converts source code into tokens. Handles:
- Keywords (mission, telemetry, verify, orbit, etc.)
- Operators (add, minus, mul, :=, etc.)
- Literals (integers, floats, strings)
- Identifiers
- Comments ($$ and $* ... *$)
File: backend/compiler/parser/parser.y
LALR(1) parser that validates syntax and triggers semantic actions. Defines the grammar for all AstroScript constructs including declarations, assignments, control flow, loops, functions, and modules.
Files: backend/compiler/semantic/symbol_table.h, symbol_table.cpp
Maintains symbol metadata storage (name, type, declared line) and final symbol reporting.
Important: scope-aware declaration checks and most semantic validation logic are implemented in parser.y (for example declareScopedName, isDeclaredName, overload resolution, module inheritance checks, and member access checks).
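To make the split between metadata storage and scope-aware checks concrete, here is a minimal C++17 sketch of a scoped symbol store. It is an illustration only: the `Symbol` fields follow the metadata listed above, but `ScopedSymbols`, `declare`, `lookup`, and the type names used in the usage note are hypothetical and do not mirror the actual code in symbol_table.cpp or the helpers in parser.y.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical symbol record: name, type, and declaring line.
struct Symbol {
    std::string name;
    std::string type;
    int declaredLine;
};

// Hypothetical scope stack (assumed design, not the project's real class).
class ScopedSymbols {
public:
    ScopedSymbols() { enterScope(); }  // start with a global scope

    void enterScope() { scopes_.emplace_back(); }

    // The global scope is never popped.
    void exitScope() {
        if (scopes_.size() > 1) scopes_.pop_back();
    }

    // Returns false if the name already exists in the innermost scope.
    bool declare(const Symbol& sym) {
        return scopes_.back().emplace(sym.name, sym).second;
    }

    // Searches from the innermost scope outwards.
    std::optional<Symbol> lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return found->second;
        }
        return std::nullopt;
    }

private:
    std::vector<std::unordered_map<std::string, Symbol>> scopes_;
};
```

With this sketch, declaring `fuel` twice in one scope is rejected, while an inner scope may shadow it; after `exitScope()` a lookup resolves to the outer declaration again.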
Files: backend/compiler/ir/tac.h, tac.cpp
Generates three-address code (TAC) instructions:
- Arithmetic operations
- Control flow (labels, gotos, conditional jumps)
- Function calls and returns
- Array operations
- I/O operations
The following passes are applied to TAC before execution:
- Constant folding — evaluates constant expressions at compile time
- Algebraic simplification — removes identity operations (x+0, x*1, etc.)
- Redundant move elimination — removes self-assignments
File: backend/compiler/ir/tac.cpp
The backend now prints a readable C-like projection generated from optimized TAC. This output is designed for learning and playground comparison, so users can map AstroScript constructs to familiar C-style statements.
Current behavior (March 30, 2026):
- Function and method call arguments are now carried through into translated call sites.
- Side-effecting calls are preserved in output even when temporary results are not reused.
- Constructor/method invocation emitted after object creation now keeps the original argument list in translation.
- The output remains a learning-focused pseudo-C projection, not a strict production transpiler.
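The behavior above can be sketched as a small projection function. This is a hedged illustration, not the code in tac.cpp: the `TacInstr` layout, the opcode strings (`param`, `call`, `ifFalse`, and so on), and the function name `projectToCLike` are assumptions, and the sketch simply attaches all pending `param` instructions to the next `call`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumed instruction layout: operation, two inputs, one output.
struct TacInstr { std::string op, arg1, arg2, result; };

// Hypothetical projection of TAC into C-like lines.
std::vector<std::string> projectToCLike(const std::vector<TacInstr>& code) {
    std::vector<std::string> out;
    std::vector<std::string> pending;  // arguments seen since the last call
    for (const auto& in : code) {
        if (in.op == "param") {
            pending.push_back(in.arg1);
        } else if (in.op == "call") {
            // Carry the buffered arguments into the call site.
            std::string args;
            for (std::size_t i = 0; i < pending.size(); ++i)
                args += (i ? ", " : "") + pending[i];
            std::string callExpr = in.arg1 + "(" + args + ");";
            // A call with no destination is still emitted (side effects).
            out.push_back(in.result.empty() ? callExpr
                                            : in.result + " = " + callExpr);
            pending.clear();
        } else if (in.op == "label") {
            out.push_back(in.result + ":");
        } else if (in.op == "goto") {
            out.push_back("goto " + in.result + ";");
        } else if (in.op == "ifFalse") {
            out.push_back("if (!" + in.arg1 + ") goto " + in.result + ";");
        } else if (in.op == "=") {
            out.push_back(in.result + " = " + in.arg1 + ";");
        } else {
            // Anything else is treated as a binary operation.
            out.push_back(in.result + " = " + in.arg1 + " " + in.op + " " +
                          in.arg2 + ";");
        }
    }
    return out;
}
```

In this sketch, two `param` instructions followed by a `call` to a hypothetical `logStatus` with no result produce the single line `logStatus(t0, b);`, so the call survives even though nothing reads its value.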
The TAC interpreter executes the optimized instructions using a stack-based runtime with:
- Call frames for function scope
- Array storage
- Parameter passing stack
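A rough sketch of how call frames and a parameter stack can drive TAC execution follows. Everything in it is assumed for illustration: the opcode strings, the `p0`, `p1`, ... naming of incoming arguments, the use of `double` for all values, and the class name `TacInterpreter`. Array storage is left out to keep the example short, and the real interpreter may be organized quite differently.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Assumed instruction layout: operation, two inputs, one output.
struct TacInstr { std::string op, arg1, arg2, result; };

// One call frame: local variables plus where to resume in the caller.
struct Frame {
    std::unordered_map<std::string, double> locals;
    std::size_t returnPc = 0;
    std::string returnDst;  // caller variable that receives the result
};

class TacInterpreter {
public:
    explicit TacInterpreter(std::vector<TacInstr> code) : code_(std::move(code)) {
        // Pre-resolve label positions so jumps are simple lookups.
        for (std::size_t i = 0; i < code_.size(); ++i)
            if (code_[i].op == "label") labels_[code_[i].result] = i;
    }

    // Runs from instruction 0; returns the value of the top-level return.
    double run() {
        frames_.assign(1, Frame{});
        params_.clear();
        std::size_t pc = 0;
        while (pc < code_.size()) {
            const TacInstr& in = code_[pc++];
            if (in.op == "label") {
                continue;
            } else if (in.op == "=") {
                set(in.result, value(in.arg1));
            } else if (in.op == "+") {
                set(in.result, value(in.arg1) + value(in.arg2));
            } else if (in.op == "-") {
                set(in.result, value(in.arg1) - value(in.arg2));
            } else if (in.op == "<") {
                set(in.result, value(in.arg1) < value(in.arg2) ? 1.0 : 0.0);
            } else if (in.op == "goto") {
                pc = labels_.at(in.result);
            } else if (in.op == "ifFalse") {
                if (value(in.arg1) == 0.0) pc = labels_.at(in.result);
            } else if (in.op == "param") {
                params_.push_back(value(in.arg1));
            } else if (in.op == "call") {  // arg1 = label, arg2 = arg count
                Frame callee;
                callee.returnPc = pc;
                callee.returnDst = in.result;
                std::size_t argc = std::stoul(in.arg2);
                for (std::size_t i = 0; i < argc; ++i)
                    callee.locals["p" + std::to_string(i)] =
                        params_[params_.size() - argc + i];
                params_.resize(params_.size() - argc);
                frames_.push_back(std::move(callee));
                pc = labels_.at(in.arg1);
            } else if (in.op == "return") {
                double v = value(in.arg1);
                if (frames_.size() == 1) return v;  // top-level return
                Frame done = std::move(frames_.back());
                frames_.pop_back();
                pc = done.returnPc;
                set(done.returnDst, v);
            }
        }
        return 0.0;
    }

private:
    // A name found in the current frame is a variable; otherwise the
    // operand is parsed as a numeric literal (std::stod throws if not).
    double value(const std::string& s) const {
        auto it = frames_.back().locals.find(s);
        return it != frames_.back().locals.end() ? it->second : std::stod(s);
    }
    void set(const std::string& name, double v) { frames_.back().locals[name] = v; }

    std::vector<TacInstr> code_;
    std::unordered_map<std::string, std::size_t> labels_;
    std::vector<Frame> frames_;
    std::vector<double> params_;
};
```

Each `call` pops its arguments off the parameter stack into a fresh frame, and each `return` pops that frame and writes the result into the caller's destination variable, so locals in one function never leak into another.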
backend/compiler/
├── lexer/ # Flex lexer definition
├── parser/ # Bison parser grammar
├── semantic/ # Symbol table
├── ir/ # Three-address code generator
├── include/ # Shared headers (future)
├── main.cpp # Compiler entry point
└── build/ # Generated files and binary
For exact lexer/parser/symbol/TAC line-by-line execution points for conditionals, loops, functions, and classes, see:
docs/compiler-feature-execution-map.md
AstroScript uses a three-address intermediate representation (TAC) as a bridge between parsing and runtime.
- Each IR instruction is a compact 4-field record: operation + up to two inputs + one output.
- Examples of operations include assignment, arithmetic, control flow jumps, function calls, array access, object field access, and built-in math functions.
- Parser semantic actions emit these TAC instructions directly while reducing grammar rules.
So instead of executing syntax-tree nodes directly, the compiler first rewrites source constructs into a uniform low-level instruction list.
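As an illustration of the 4-field record and of emission during parsing, the following sketch lowers an assignment of `a + b * c` to `x` into three instructions. The names `TacInstr`, `TacEmitter`, `emitBinary`, `emitAssign`, and the `t0`, `t1` temporaries are hypothetical, and the sketch assumes multiplication binds tighter than addition; the real definitions live in tac.h and tac.cpp and may differ.

```cpp
#include <string>
#include <vector>

// Assumed 4-field record: operation, up to two inputs, one output.
struct TacInstr {
    std::string op;
    std::string arg1;
    std::string arg2;    // empty when the operation has one input
    std::string result;
};

// Hypothetical emitter of the kind a semantic action could call.
class TacEmitter {
public:
    std::string newTemp() { return "t" + std::to_string(nextTemp_++); }

    // Emits `dst = lhs op rhs` into a fresh temporary and returns its name.
    std::string emitBinary(const std::string& op, const std::string& lhs,
                           const std::string& rhs) {
        std::string dst = newTemp();
        code_.push_back({op, lhs, rhs, dst});
        return dst;
    }

    // Emits `dst = src`.
    void emitAssign(const std::string& src, const std::string& dst) {
        code_.push_back({"=", src, "", dst});
    }

    const std::vector<TacInstr>& code() const { return code_; }

private:
    std::vector<TacInstr> code_;
    int nextTemp_ = 0;
};

// Lowers the assignment of a + b * c to x, innermost expression first,
// in the order a bottom-up parser would reduce it.
std::vector<TacInstr> lowerExample() {
    TacEmitter emitter;
    std::string product = emitter.emitBinary("*", "b", "c");  // t0 = b * c
    std::string sum = emitter.emitBinary("+", "a", product);  // t1 = a + t0
    emitter.emitAssign(sum, "x");                             // x = t1
    return emitter.code();
}
```

Because each reduction returns the name of the temporary it wrote, the enclosing rule can use that name as an operand without ever building a syntax tree.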
Before execution, optimize() applies three passes in order:
- constantFold()
  - Computes expressions early when both operands are compile-time numeric constants.
  - Also folds unary math built-ins when domain-safe (for example avoids folding invalid root(-x) or logarithm(0)).
- algebraicSimplify()
  - Applies identity rules like x + 0 -> x, x * 1 -> x, x * 0 -> 0, x / 1 -> x.
- removeRedundantMoves()
  - Removes no-op moves such as x = x.
These passes are local and semantics-preserving for the supported cases.
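The three passes can be sketched over a flat instruction list as follows. The pass names come from the description above, but the bodies are an assumed, simplified reconstruction: the `TacInstr` layout, the opcode strings (including `root`), the helper names, and the use of `double` arithmetic are all illustrative choices, not the code in tac.cpp.

```cpp
#include <algorithm>
#include <cctype>
#include <cmath>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Assumed instruction layout: operation, two inputs, one output.
struct TacInstr { std::string op, arg1, arg2, result; };

// True when s is a plain numeric literal (digits, optional leading '-').
static bool isNumber(const std::string& s, double& out) {
    if (s.empty()) return false;
    std::size_t i = (s[0] == '-') ? 1 : 0;
    if (i >= s.size() || !std::isdigit(static_cast<unsigned char>(s[i]))) return false;
    char* end = nullptr;
    out = std::strtod(s.c_str(), &end);
    return end == s.c_str() + s.size();
}

// Default stream precision is enough for a sketch; a real pass would
// need to preserve full precision.
static std::string formatNumber(double v) {
    std::ostringstream os;
    os << v;
    return os.str();
}

void constantFold(std::vector<TacInstr>& code) {
    for (auto& in : code) {
        double a = 0, b = 0;
        if (in.op == "root") {  // hypothetical opcode for the root built-in
            // Fold only inside the function's domain; leave root(-x) alone.
            if (isNumber(in.arg1, a) && a >= 0)
                in = {"=", formatNumber(std::sqrt(a)), "", in.result};
            continue;
        }
        if (!isNumber(in.arg1, a) || !isNumber(in.arg2, b)) continue;
        double v = 0;
        if (in.op == "+") v = a + b;
        else if (in.op == "-") v = a - b;
        else if (in.op == "*") v = a * b;
        else if (in.op == "/" && b != 0) v = a / b;  // never fold x / 0
        else continue;
        in = {"=", formatNumber(v), "", in.result};
    }
}

void algebraicSimplify(std::vector<TacInstr>& code) {
    for (auto& in : code) {
        // Only the literal spellings "0" and "1" are matched here.
        if ((in.op == "+" && in.arg2 == "0") ||
            (in.op == "*" && in.arg2 == "1") ||
            (in.op == "/" && in.arg2 == "1"))
            in = {"=", in.arg1, "", in.result};
        else if (in.op == "*" && in.arg2 == "0")
            in = {"=", "0", "", in.result};
    }
}

void removeRedundantMoves(std::vector<TacInstr>& code) {
    code.erase(std::remove_if(code.begin(), code.end(),
                              [](const TacInstr& in) {
                                  return in.op == "=" && in.arg1 == in.result;
                              }),
               code.end());
}

void optimize(std::vector<TacInstr>& code) {
    constantFold(code);
    algebraicSimplify(code);
    removeRedundantMoves(code);
}
```

The order matters in this sketch: simplifying `y + 0` into `y = y` only pays off because the redundant-move pass runs afterwards and deletes the resulting self-assignment.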
- Separation of concerns
  - Front-end (syntax and semantic checks) is separated from runtime execution.
- Target independence
  - The same IR can be interpreted now and can later support additional backends if needed.
- Easier correctness and debugging
  - TAC is explicit (label, goto, ifFalse, call), so control/data flow is visible and testable.
- Better performance and cleaner output
  - Constant/algebraic simplifications reduce runtime work.
- Better pedagogy
  - Students can inspect optimized TAC and C-like translation, then compare with source behavior.
In short: IR is the canonical executable form, and optimization is an early clean-up step that reduces unnecessary runtime effort without changing intended program behavior.