A sophisticated multi-phase compiler written in Rust, designed to demonstrate advanced compiler construction techniques through semantic analysis, intermediate representation optimization, and code execution.
- Project Overview
- Tech Stack
- Architecture & Compilation Pipeline
- Features
- Compiler Phases
- Keywords & C/C++ Relations
- Operator Precedence Table
- Compiler Optimizations
- Test Cases & Examples
- Building & Running
YaarScript is an educational compiler that parses a C-like language and compiles it to Three-Address Code (TAC) with sophisticated intermediate representation optimizations. The compiler demonstrates modern compiler theory concepts including:
- Hybrid parsing architecture combining Recursive Descent and Pratt Parsing
- Multi-phase semantic analysis with scope and type checking
- Advanced IR optimization with constant folding, dead code elimination, and peephole optimization
- Execution engine capable of interpreting compiled TAC instructions
This project serves as a reference implementation for compiler design patterns and offers a complete compilation pipeline from source code to execution.
| Component | Technology | Purpose |
|---|---|---|
| Language | Rust (2024 edition) | Memory safety, performance, and low-level control |
| Module System | Cargo workspace | Project organization and dependency management |
| Parsing | Pratt Parser + Recursive Descent | Efficient expression and statement parsing |
| Intermediate Code | Three-Address Code (TAC) | Platform-independent intermediate representation |
| Analysis | Symbol Table + Type System | Semantic validation and type checking |
| Optimization | Multi-pass IR optimizer | Performance enhancement at compile-time |
| Execution | Custom TAC interpreter | Direct program execution without code generation |
┌─────────────────────────────────────────────────────────────────┐
│ Source Code │
│ (test_input.txt) │
└────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 1. LEXICAL ANALYSIS (Lexer) │
│ - Tokenization: Source → Tokens │
│ - Error Reporting: Invalid symbols │
└────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 2. SYNTAX ANALYSIS (Parser) │
│ - Parsing: Tokens → Abstract Syntax Tree (AST) │
│ - Algorithm: Hybrid (Recursive Descent + Pratt) │
│ - Output: Colored AST visualization │
└────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 3. SEMANTIC ANALYSIS - SCOPE (ScopeAnalyzer) │
│ - Symbol collection and validation │
│ - Forward reference checking │
│ - Error Detection: 8 types of scope errors │
│ - Output: Symbol table (ScopeFrame tree) │
└────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 4. SEMANTIC ANALYSIS - TYPE (TypeChecker) │
│ - Type compatibility validation │
│ - Function signature verification │
│ - Error Detection: 17 types of type errors │
│ - Strict type checking (no implicit conversions) │
└─────────────────────────────┬───────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 5. IR GENERATION (TACGenerator) │
│ - Conversion: AST → Three-Address Code │
│ - Operands: Temps, Vars, Literals, Labels │
│ - Instructions: 13 TAC instruction types │
│ - Features: Enum constant optimization │
└─────────────────────────────┬───────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 6. IR OPTIMIZATION (IROptimizer) - Multi-Pass │
│ - Constant Folding │
│ - Constant Propagation │
│ - Copy Propagation │
│ - Peephole Optimization │
│ - Dead Code Elimination (DCE) │
│ - Iterative fixed-point optimization │
└─────────────────────────────┬───────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 7. EXECUTION (ExecutionEngine) │
│ - TAC interpretation │
│ - Runtime evaluation │
│ - Output printing │
└─────────────────────────────┬───────────────────────────────────┘
│
▼
┌─────────────────────┐
│ Program Output │
└─────────────────────┘
| Phase | Pattern | Implementation |
|---|---|---|
| Parser | Hybrid Parsing | Recursive Descent (statements) + Pratt (expressions) |
| Scope Analyzer | Two-Pass Analysis | Declaration collection + Validation |
| Type Checker | Scope-Aware Traversal | Reuses symbol table with scope navigation |
| TAC Generator | AST Visitor | Linear instruction stream generation |
| IR Optimizer | Multi-Pass Fixed-Point | Repeated passes until fixed-point convergence |
- ✅ Data Types:
int,float,bool,char,void - ✅ Variables: Global and local scope with qualifiers (
const,global) - ✅ Functions: Prototypes, definitions, return types, parameters
- ✅ Enumerations: Named constants with automatic integer mapping
- ✅ Operators:
- Arithmetic:
+,-,*,/,% - Logical:
&&,||,! - Bitwise:
&,|,^,<<,>> - Comparison:
==,!=,<,>,<=,>= - Unary:
-,!,++,--(prefix/postfix) - Assignment:
=
- Arithmetic:
- ✅ Control Flow:
- Conditional:
if,else if,else - Loops:
while,for - Switching:
switchwithcaseanddefault - Jump:
break,return
- Conditional:
- ✅ I/O:
print()function for output
- ✅ Hybrid Parser: Combines Recursive Descent and Pratt parsing for optimal efficiency
- ✅ Comprehensive Error Reporting:
- 1 Lexical error type
- 5+ Syntax error types
- 8 Scope error types
- 17 Type error types
- ✅ Symbol Table Management: Hierarchical scope frames with forward reference support
- ✅ Strict Type Checking: No implicit conversions, explicit type validation
- ✅ TAC Generation: Enum-optimized three-address code
- ✅ Multi-Pass Optimization:
- Constant folding (arithmetic evaluation)
- Constant propagation (literal substitution)
- Copy propagation (variable aliasing)
- Peephole optimization (algebraic simplifications)
- Dead code elimination (recursive analysis)
- ✅ Direct Execution: TAC interpreter for immediate program execution
Documentation: See docs/README
Test Suite: tests/lexer/
Tokenizes source code into a stream of tokens. Handles:
- Keywords, identifiers, literals
- Single/multi-character operators
- String and character literals
- Comments and whitespace
Documentation: See docs/PARSER.md
Test Suite: Various parser test cases
Constructs an Abstract Syntax Tree using:
- Recursive Descent for statements and declarations
- Pratt Parsing for expressions with operator precedence
Key Components:
parse_program(): Entry point, parses declarations and main blockparse_statement(): Parses individual statementsparse_expression(): Pratt parser engine for expressionsast_printer.rs: Colored tree visualization tool
Example:
// Source Code
int x = 5 + 10 * 2;
// AST Representation
VarDecl(int, "x")
├─ InitExpr
└─ BinaryExpr(+)
├─ IntLiteral(5)
└─ BinaryExpr(*)
├─ IntLiteral(10)
└─ IntLiteral(2)Documentation: See docs/SCOPE_ANALYZER.md
Test Suite: tests/scope/
Analyzes symbol declarations and manages lexical scopes through two passes:
- Declaration Collection: Gathers all top-level declarations
- Validation: Performs scope analysis and error detection
Detects 8 Error Types:
UndeclaredVariableAccessed- Variable used but never declaredUndefinedFunctionCalled- Function called but never definedVariableRedefinition- Variable declared multiple timesFunctionPrototypeRedefinition- Duplicate function prototypeConflictingFunctionDefinition- Function with different signatureConflictingDeclaration- Name used for different symbol typesParameterRedefinition- Same parameter name used twiceInvalidForwardReference- Symbol referenced before declaration
Valid Example:
int add(int a, int b); // Forward declaration
int add(int a, int b) { // Definition matches prototype
return a + b;
}
main {
int result = add(10, 20); // Valid function call
print(result);
}Error Example:
main {
int x = 10;
int x = 20; // ERROR: VariableRedefinition
}Documentation: See docs/TYPE_ANALYZER.md
Test Suite: tests/type/
Ensures type safety through strict type checking:
Detects 17 Error Types:
ErroneousVarDecl- Variable declared as voidFnCallParamCount- Wrong number of function argumentsFnCallParamType- Wrong function argument typeErroneousReturnType- Return type mismatchExpressionTypeMismatch- Assignment type mismatchExpectedBooleanExpression- Boolean expected elsewhereErroneousBreak- Break outside loopNonBooleanCondStmt- Non-boolean condition in if/while/forAttemptedBoolOpOnNonBools- Logical operators on non-boolsAttemptedBitOpOnNonInt- Bitwise operators on non-integersAttemptedShiftOnNonInt- Shift operators on non-integersAttemptedAddOpOnNonNumeric- Arithmetic on non-numeric typesReturnStmtInVoid- Return value in void functionNonBooleanSwitchExpression- Non-int/char switch expressionInvalidCaseValueType- Case type mismatchIncrementDecrementOnNonInt- ++/-- on non-integersNotOnNonBool- NOT operator on non-boolean
Valid Example:
int calculateArea(int width, int height) {
int area = width * height;
return area; // Correct return type
}
bool isPassing(float score) {
if (score >= 0.75) { // Boolean condition ✓
return true;
}
return false;
}
main {
int result = calculateArea(5, 10); // Correct arg types ✓
bool pass = isPassing(0.85); // Type compatible ✓
}Error Example:
int add(int a, int b) {
return a + b;
}
main {
int x = add(10, 3.14); // ERROR: Float to int parameter
if (x) { // ERROR: int in boolean context
print("Test");
}
}Documentation: See docs/TAC_GENERATION.md
Test Suite: tests/ir-generation/
Transforms the AST into Three-Address Code:
TAC Instruction Set:
| Instruction | Example | Purpose |
|---|---|---|
Declare |
int x |
Variable declaration with type |
Assign |
x = 10 |
Value assignment |
Binary |
t0 = a + b |
Binary operations |
Unary |
t1 = !x |
Unary operations |
Label |
L0: |
Jump target |
Goto |
goto L0 |
Unconditional jump |
IfTrue |
ifTrue x goto L1 |
Conditional jump (true) |
IfFalse |
ifFalse x goto L0 |
Conditional jump (false) |
Call |
Call(func_name, args) |
Function invocation |
Param |
param x |
Function parameter |
Return |
return x |
Function exit with value |
FuncStart |
FuncStart(name, params) |
Function entry point |
FuncEnd |
FuncEnd |
Function exit marker |
Print |
print x |
Output instruction |
Operand Types:
enum Operand {
Temp(usize), // Virtual register: t0, t1, etc.
Var(String), // Named variable
Int(i64), // Integer literal
Float(f64), // Float literal
Bool(bool), // Boolean literal
Char(char), // Character literal
String(String), // String literal
Label(String), // Jump target
None, // Void/no operand
}Example TAC Generation:
Source:
for (int i = 0; i < 3; i++) {
print(i);
}Generated TAC:
int i = 0
L1:
t4 = i Lt 3
ifFalse t4 goto L2
print i
i = i Plus 1
goto L1
L2:
Documentation: See docs/IR_OPTIMIZATION.md
Test Suite: tests/ir-generation/ir-optimization.txt
Multi-pass optimization engine that improves code efficiency:
Evaluates arithmetic expressions at compile-time:
Input: t0 = 20 * 5
Output: t0 = 100
Substitutes known constants with their literal values:
Input: x = 10
t0 = x + 5
Output: x = 10
t0 = 10 + 5 // Next pass: constant folding → 15
Replaces variables with their source values:
Input: val = original
t0 = val * 2
Output: val = original
t0 = original * 2
Applies algebraic identities and control flow simplifications:
Algebraic Optimizations:
$x + 0 \rightarrow x$ $x - 0 \rightarrow x$ $x - x \rightarrow 0$ $x \times 1 \rightarrow x$ $x \times 0 \rightarrow 0$ $x \div 1 \rightarrow x$
Control Flow:
Input: goto L1
L1:
Output: (goto removed - execution falls through)
Removes unused instructions through mark-and-sweep:
Example:
// Source
main {
int a = 10;
int b = a + 5;
int c = b * 1 + 0; // Optimized to: c = b
int d = 999; // Never used → REMOVED
print(c); // c is used
}Optimization Trace:
Pass 1: Constant Folding
int c = b * 1 + 0 → int c = b
Pass 2: Copy Propagation
int b = a + 5
int c = b → int c = a + 5
Pass 3: Dead Code Elimination
int d = 999 is unused → REMOVED
Pass 4: Constant Propagation
int a = 10
int c = 10 + 5 → int c = 15
Final TAC:
int a = 10
int c = 15
print c
Iterative Fixed-Point: Optimizations repeat until no further changes occur.
YaarScript keywords are inspired by C/C++ with key differences:
| Keyword | Type | C/C++ Equivalent | Purpose |
|---|---|---|---|
int |
Type | int |
64-bit signed integer |
float |
Type | float |
64-bit floating-point |
bool |
Type | bool |
Boolean value (true/false) |
char |
Type | char |
Single character |
void |
Type | void |
No return value |
const |
Qualifier | const |
Read-only variable |
global |
Qualifier | Static global scope | Global variable |
if |
Control | if |
Conditional execution |
else |
Control | else |
Alternative execution |
while |
Control | while |
Loop while condition true |
for |
Control | for |
Iterative loop |
switch |
Control | switch |
Multi-way branch |
case |
Control | case |
Switch case label |
default |
Control | default |
Default switch case |
break |
Control | break |
Break from loop/switch |
return |
Control | return |
Return from function |
enum |
Type | enum |
Named constant type |
main |
Block | main() |
Entry point (special) |
print |
I/O | printf/cout |
Output function |
| Feature | C/C++ | YaarScript | Reason |
|---|---|---|---|
| Type Safety | Implicit conversions allowed | Strict, no implicit conversion | Educational rigor |
| Main Function | Function declaration | Special block | Simplification |
| Prototypes | Optional | Required pattern | Explicit forward refs |
| Enums | Global scope only | Global scope only | Enforced structure |
| Type Qualifiers | Multiple (const, volatile, etc.) |
const, global |
Focused subset |
| Pointers | Fully featured | Not supported | Simplification |
| Memory | Manual allocation (malloc) |
Automatic | Educational focus |
The parser enforces operator precedence from lowest to highest binding power:
| Level | Precedence | Operators | Associativity | Example |
|---|---|---|---|---|
| 0 | Lowest | - | - | - |
| 1 | Assignment | = |
Right-to-left | a = b = c → a = (b = c) |
| 2 | Logical OR | || |
Left-to-right | a || b || c → (a || b) || c |
| 3 | Logical AND | && |
Left-to-right | a && b && c → (a && b) && c |
| 4 | Equality | ==, != |
Left-to-right | a == b != c → (a == b) != c |
| 5 | Comparison | <, >, <=, >= |
Left-to-right | a < b <= c → (a < b) <= c |
| 6 | Bitwise | &, |, ^, <<, >> |
Left-to-right | a & b | c → (a & b) | c |
| 7 | Additive | +, - |
Left-to-right | a + b - c → (a + b) - c |
| 8 | Multiplicative | *, /, % |
Left-to-right | a * b / c → (a * b) / c |
| 9 | Unary | -, !, ++, -- (prefix) |
Right-to-left | !-x → !(-x) |
| 10 | Postfix | ++, -- (postfix) |
Left-to-right | x++-- → (x++)-- |
| 11 | Call | () |
Highest | f(x + 1) → f((x + 1)) |
Complex Precedence Example:
// Parse: a + b * c == d || e
//
// Step-by-step:
// 1. b * c (precedence 8)
// 2. a + (b*c) (precedence 7)
// 3. (a + b*c) == d (precedence 4)
// 4. ((a + b*c) == d) || e (precedence 2)
//
// Final: (a + (b * c)) == d) || eValid Example (tests/scope/valid.txt):
enum Color {
Red,
Green,
Blue
};
int globalCounter = 0;
void printMessage(int code);
int add(int a, int b) {
int result = a + b;
return result;
}
Color getColor(int index) {
if (index == 0) {
return Red;
}
return Blue;
}
main {
int x = 42;
float y = 3.14;
int sum = add(x, 10);
printMessage(sum);
Color c = getColor(1);
}✅ No scope errors - All symbols declared before use, no redefinitions
Error Examples (tests/scope/erros.txt):
// ERROR 1: Variable Redefinition
int x = 10;
int x = 20; // ❌ Redefinition in same scope
// ERROR 2: Undeclared Variable Access
int result = unknownVar + 5; // ❌ unknownVar never declared
// ERROR 3: Undefined Function Call
int value = nonExistentFunc(10); // ❌ Function never defined
// ERROR 4: Conflicting Declaration
enum Size { Small, Large };
int Size = 10; // ❌ Size already enum typeValid Example (tests/type/valid.txt):
enum Status {
Active,
Inactive,
Pending
};
int globalId = 100;
float threshold = 0.75;
int calculateArea(int width, int height) {
int area = width * height;
return area; // ✓ Correct return type (int)
}
bool isPassing(float score) {
if (score >= threshold) { // ✓ Boolean condition
return true;
}
return false;
}
void logStatus(int s) {
switch (s) {
case 1:
print("Status is Active");
break;
case 2:
print("Status is Inactive");
break;
default:
print("Status is Pending");
}
}
main {
int area = calculateArea(5, 10); // ✓ Args match params
bool pass = isPassing(0.85); // ✓ Type compatible
logStatus(area); // ✓ Type compatible
}✅ No type errors - All types match, operations valid
Error Examples (tests/type/error.txt):
enum Color { Red, Green };
// ERROR 1: Erroneous Variable Declaration
void invalidVar; // ❌ Variable cannot be void
int add(int a, int b) {
return a + b;
}
main {
int i = 10;
float f = 3.14;
// ERROR 2: Function Parameter Count Mismatch
add(10); // ❌ Expected 2 args, got 1
// ERROR 3: Function Parameter Type Mismatch
add(10, f); // ❌ Expected int, got float
// ERROR 4: Expression Type Mismatch
i = 3.14; // ❌ Cannot assign float to int
// ERROR 5: Erroneous Break
break; // ❌ break outside loop/switch
// ERROR 6: Non-Boolean Condition Statement
if (i) { // ❌ if expects bool, got int
print("wont work");
}
// ERROR 7: Attempted Boolean Operation on Non-Bools
int result = 10 && 20; // ❌ && expects bool operands
// ERROR 8: Attempted Bitwise Operation on Non-Int
float bitResult = f & 2.2; // ❌ & expects int operands
// ERROR 9: Attempted Shift on Non-Int
float shiftResult = f << 2; // ❌ << expects int operands
// ERROR 10: Attempted Addition on Non-Numeric
bool b = true;
int sum = b + 5; // ❌ Cannot add bool and int
}Example Optimization (tests/ir-generation/ir-optimization.txt):
Source Code:
main {
int a = 10;
int b = a + 5;
int c = b * 1 + 0;
int d = 999; // Never used
print(c);
}TAC Before Optimization:
int a
a = 10
int b
t0 = a Plus 5
b = t0
int c
t1 = b Mult 1
t2 = t1 Plus 0
c = t2
int d
d = 999
print c
TAC After Optimization:
Pass 1: Constant Folding
b * 1 + 0 → b
Pass 2: Constant Propagation
a = 10; b = a + 5 → a = 10; b = 15
Pass 3: Copy Propagation
c = b → c = 15
Pass 4: Dead Code Elimination
d = 999 removed (never used)
Final TAC:
int a
a = 10
int b
b = 15
int c
c = 15
print c
- Rust 2024 edition (or compatible)
- Cargo package manager
cargo build --release# Execute compiler on test file
cargo run -- test_input.txtEdit your test case in test_input.txt, then:
cargo run --releaseCreate test file and check tests/scope/ examples
Use examples from tests/type/
Review tests/ir-generation/ir-optimization.txt
For detailed information about each compiler phase:
| Phase | Documentation | Details |
|---|---|---|
| Lexer | docs/LEXER.md |
Tokenization, error handling |
| Parser | docs/PARSER.md | Hybrid parsing, AST structure, precedence |
| Scope Analyzer | docs/SCOPE_ANALYZER.md | Symbol table, scope frames, error types |
| Type Analyzer | docs/TYPE_ANALYZER.md | Type rules, error detection, type system |
| TAC Generation | docs/TAC_GENERATION.md | Instruction set, code structure, operands |
| IR Optimization | docs/IR_OPTIMIZATION.md | Optimization passes, algorithms, examples |
YaarScript/
├── Cargo.toml # Project manifest
├── README.md # This file
├── test_input.txt # Input program
│
├── src/
│ ├── main.rs # Compiler driver
│ ├── lib.rs # Module exports
│ ├── error.rs # Error reporting
│ │
│ ├── core/ # Core data structures
│ │ ├── token.rs # Token definitions
│ │ └── ast.rs # AST node types
│ │
│ ├── lexer/ # Lexical analysis
│ │ └── lexer.rs
│ │
│ ├── parser/ # Syntax analysis
│ │ ├── parser.rs # Pratt parser
│ │ └── ast_printer.rs # Tree visualization
│ │
│ ├── semantics/ # Semantic analysis
│ │ ├── scope.rs # Scope checking
│ │ └── type_checker.rs # Type checking
│ │
│ ├── ir_pipeline/ # Intermediate representation
│ │ ├── tac.rs # TAC generation
│ │ └── tac_optimizer.rs # IR optimization
│ │
│ └── codegen/ # Code generation
│ └── execution.rs # TAC execution engine
│
├── docs/ # Phase documentation
│ ├── PARSER.md
│ ├── SCOPE_ANALYZER.md
│ ├── TYPE_ANALYZER.md
│ ├── TAC_GENERATION.md
│ └── IR_OPTIMIZATION.md
│
└── tests/ # Test cases by phase
├── scope/
│ ├── valid.txt
│ └── erros.txt
├── type/
│ ├── valid.txt
│ └── error.txt
├── ir-generation/
│ └── ir-optimization.txt
└── execution/
└── exec.txt
This is an educational compiler project designed to teach:
- Compiler architecture and design patterns
- Intermediate representation techniques
- Optimization algorithms
- Semantic analysis strategies
- Type system implementation
Perfect as a:
- Learning resource for compiler construction
- Reference implementation for compiler phases
- Template for custom language design
- Advanced compiler optimization study
Created: February 2026
Language: Rust 2024
Type: Educational Compiler
Status: Feature Complete