
Commit ec8276b

committed
tweaks
1 parent c228386 commit ec8276b

6 files changed

Lines changed: 159 additions & 132 deletions


docs/ARCHITECTURE.md

Lines changed: 21 additions & 20 deletions
@@ -331,21 +331,13 @@ However, we don't emit error nodes for missing delimiters, but rather, recover b
 
 ## Stage 3: Semantic Analysis and AST Lowering
 
-**Semantic Analysis** involves walking the AST (a post-order traversal) and constructing a **high-level intermediate representation (HIR)**. Specifically, to create the nodes of an HIR, it involves three parts:
-- Name Resolution
-- Type Checking
-- Desugaring
+**Semantic Analysis** involves traversing the AST, starting at the root node, and constructing a **high-level intermediate representation (HIR)**. Specifically, semantic analysis involves four phases:
+- **Collection**: register item signatures up front, because crawfish supports top-level and block-level item hoisting
+- **Name Resolution**: resolve identifiers at use-sites to their bindings
+- **Type Checking**: verify the type correctness of expressions and statements
+- **Desugaring**: desugar certain AST constructs along the way, as the HIR is constructed
 
-However, that two-pass design breaks when resolution depends on types ([source](https://www.reddit.com/r/ProgrammingLanguages/comments/w0biir/comment/igdt1ce/)). For instance:
-```
-// Which `foo`? Depends on type of `x`
-x.foo()
-
-// Which `+`? Depends on types of operands (if you have overloading)
-a + b
-```
-
-### Name Resolution
+### Symbol Table
 
 An identifier's **scope** is the part of a program where it's accessible. An identifier may refer to different values in different parts of the program. Crawfish particularly has **static scope**, which means the visibility and accessibility of an identifier are determined by its physical location within the source code, at compile-time.
 
@@ -366,8 +358,6 @@ func outer(x: i32) -> i32 {
 
 This means that scope frames are not always "transparent" (i.e., the semantic analyzer can see outer names). As such, each scope frame has an associated `ScopeKind` to be used as a flag that tells lookup when to start filtering out locals. Most scopes are `ScopeKind::Normal`, which behave like ordinary lexical scopes. However, when the semantic analyzer enters a nested function item, it pushes a `ScopeKind::Item` scope, which acts as an item boundary. During lookup, once resolution crosses an Item scope, outer local bindings (`BindingId::Local`) are no longer visible, while item bindings (`BindingId::Item`) remain visible.
 
-Additionally, since crawfish supports top-level and block-level item hoisting, a simple single-pass AST walk is not sufficient. Inspired by rustc, the compiler uses a two-step approach per scope. First, it walks the immediate contents of a module or block and records all item names into the scope's symbol table. Then, it walks statements and expressions in that scope and resolves each identifier by looking it up in the symbol tables that were already populated ([source](https://rustc-dev-guide.rust-lang.org/name-resolution.html#overall-strategy)).
-
 > [!NOTE]
 > Since most code doesn't nest very deeply, we _may_ further optimize the symbol table (specifically, avoiding allocation churn) by pre-allocating around 4 to 8 empty hashmaps and reusing cleared hashmaps instead of allocating and dropping them ([source](https://www.reddit.com/r/Compilers/comments/1dy9722/comment/lc833ho/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button))
 
@@ -378,16 +368,27 @@ Additionally, since crawfish supports top-level and block-level item hoisting, a
 Our high-level intermediate representation is close to our AST, and should be able to support high-level optimizations such as inlining and constant folding ([source](https://www.cs.cornell.edu/courses/cs4120/2026sp/notes.html?id=ir)).
 
 There are two common approaches for high-level IRs ([source](https://www.reddit.com/r/ProgrammingLanguages/comments/1cj0oj2/comment/l2fqafw/)):
--
--
+- ...
+- ...
 
-Unlike the AST, the HIR doesn't follow a SoA design because it's [less ergonomic for semantic analysis](https://www.reddit.com/r/rust/comments/160kqz9/comment/jxq36ug/).
+Unlike the AST, the HIR doesn't follow a SoA design because it's less ergonomic for semantic analysis, which involves several passes ([source](https://www.reddit.com/r/rust/comments/160kqz9/comment/jxq36ug/)).
 
 ### Semantic Analyzer
 
+The collection phase, which allows forward references, works in two steps per scope: first, scan for item definitions to populate the symbol table; then, walk the statements sequentially ([source](https://rustc-dev-guide.rust-lang.org/name-resolution.html#overall-strategy), [source 2, page 23](https://web.stanford.edu/class/cs143/lectures/lecture09.pdf)).
+
+Name resolution and type checking must be interleaved when resolution depends on types ([source](https://www.reddit.com/r/ProgrammingLanguages/comments/w0biir/comment/igdt1ce/)). For instance:
+```
+// Which `foo`? Depends on type of `x`
+x.foo()
+
+// Which `+`? Depends on types of operands (if you have overloading)
+a + b
+```
+
 ## Stage 4: HIR Lowering
 
-https://www.reddit.com/r/ProgrammingLanguages/comments/1boul8y/comment/kwtxulc/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
+Rather than going from HIR directly to LLVM, it makes sense to [source](https://www.reddit.com/r/ProgrammingLanguages/comments/1boul8y/comment/kwtxulc/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button).
 
 ### Mid-level Intermediate Representation (MIR)

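The `x.foo()` example in the ARCHITECTURE.md changes can be made concrete with a minimal Rust sketch of type-directed method resolution. Everything in it is hypothetical and much simpler than crawfish's real HIR. That includes `Ty`, `Expr`, `Checker`, and a method table keyed by `(receiver type, method name)`. The point is only that the receiver must be typed before the method name can be resolved, so resolution happens inside the type checker.

```rust
use std::collections::HashMap;

/// Hypothetical toy types; not crawfish's real `Ty`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Ty {
    I32,
    Str,
}

/// Hypothetical toy expressions: a variable, or a method call on a receiver.
pub enum Expr {
    Var(&'static str),
    MethodCall {
        receiver: Box<Expr>,
        method: &'static str,
    },
}

/// Hypothetical checker state: local variable types, plus methods keyed by
/// (receiver type, method name) -> (resolved method, return type).
pub struct Checker {
    pub locals: HashMap<&'static str, Ty>,
    pub methods: HashMap<(Ty, &'static str), (&'static str, Ty)>,
}

impl Checker {
    /// Type-checks `expr`. Method calls are resolved *during* type checking,
    /// because which `foo` is meant depends on the receiver's type.
    /// Returns the expression's type and, for a method call, the resolved method.
    pub fn check(&self, expr: &Expr) -> Result<(Ty, Option<&'static str>), String> {
        match expr {
            Expr::Var(name) => self
                .locals
                .get(name)
                .map(|&ty| (ty, None))
                .ok_or_else(|| format!("unresolved name `{name}`")),
            Expr::MethodCall { receiver, method } => {
                // 1. Type the receiver first...
                let (recv_ty, _) = self.check(receiver)?;
                // 2. ...then resolve the method name using that type.
                self.methods
                    .get(&(recv_ty, *method))
                    .map(|&(resolved, ret_ty)| (ret_ty, Some(resolved)))
                    .ok_or_else(|| format!("no method `{method}` on {recv_ty:?}"))
            }
        }
    }
}
```

With `x: I32` and `s: Str` in `locals`, `x.foo()` resolves to the `I32` entry and `s.foo()` resolves to the `Str` entry. An overloaded `a + b` could be handled the same way, by keying the lookup on both operand types.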
src/front_end/semantic_analysis/module_table.rs

Whitespace-only changes.

src/front_end/semantic_analysis/semantic_analyzer.rs

Lines changed: 100 additions & 91 deletions
@@ -3,8 +3,11 @@ use crate::diagnostics::diagnostic_store::Diagnostic;
 use crate::diagnostics::semantic_diagnostics::SemanticDiagnostic;
 use crate::front_end::semantic_analysis::hir::{BindingId, Hir};
 use crate::front_end::semantic_analysis::symbol_table::{DefineError, ScopeKind, SymbolTable};
-use crate::front_end::semantic_analysis::types::{Ty, TypeInterner};
-use crate::front_end::syntactic_analysis::ast::{Ast, FunctionDefinitionId, ItemKind};
+use crate::front_end::semantic_analysis::types::TypeInterner;
+use crate::front_end::syntactic_analysis::ast::{
+    Ast, BlockExpressionId, ConstantDefinitionId, ExpressionId, FunctionDefinitionId, ItemId,
+    ItemKind, StatementId, StatementKind,
+};
 
 pub struct SemanticAnalyzer<'ast> {
     ast: &'ast Ast,
@@ -28,103 +31,127 @@ impl<'ast> SemanticAnalyzer<'ast> {
     }
 
     pub fn analyze(mut self) -> (Hir, Vec<Diagnostic>) {
-        self.collect_source_file_items();
         self.analyze_source_file_items();
         (self.hir, self.diagnostics)
     }
 
-    fn collect_source_file_items(&mut self) {
+    fn analyze_source_file_items(&mut self) {
         self.symbols.enter_scope(ScopeKind::Item);
-        let unknown_type = self.types.intern(Ty::Unknown);
-        for &item in &self.ast.source_file_items {
-            match item.kind() {
-                ItemKind::FunctionDefinition => {
-                    let node = &self.ast.function_definitions[item.index().into()];
-                    let Some(name) = self.ast.resolve_identifier(node.name) else {
-                        continue;
-                    };
-                    let item_binding = self.hir.create_item_binding(name, unknown_type, node.span);
-                    if let Err(DefineError::AlreadyDefined { prev_binding }) = self
-                        .symbols
-                        .add_binding(name, BindingId::Item(item_binding))
-                    {
-                        self.diagnostics.push(Diagnostic::Semantic(
-                            SemanticDiagnostic::DuplicateDefinition {
-                                name: self.string_interner.resolve(name).unwrap().to_string(),
-                                span: node.span,
-                                previous_span: self.hir.get_item_binding_info(prev_binding).span,
-                            },
-                        ));
-                    }
-                }
-                ItemKind::ConstantDefinition => {
-                    let node = &self.ast.constant_definitions[item.index().into()];
-                    let Some(name) = self.ast.resolve_identifier(node.name) else {
-                        continue;
-                    };
-                    let item_binding = self.hir.create_item_binding(name, unknown_type, node.span);
-                    if let Err(DefineError::AlreadyDefined { prev_binding }) = self
-                        .symbols
-                        .add_binding(name, BindingId::Item(item_binding))
-                    {
-                        self.diagnostics.push(Diagnostic::Semantic(
-                            SemanticDiagnostic::DuplicateDefinition {
-                                name: self.string_interner.resolve(name).unwrap().to_string(),
-                                span: node.span,
-                                previous_span: self.hir.get_item_binding_info(prev_binding).span,
-                            },
-                        ));
-                    }
-                }
-                ItemKind::Error => continue,
-            }
+
+        // pre-pass for item hoisting
+        let start = self.ast.source_file.items.start as usize;
+        let len = self.ast.source_file.items.len as usize;
+        for &item_id in &self.ast.source_file_items[start..start + len] {
+            self.collect_item(item_id);
         }
-    }
 
-    fn analyze_source_file_items(&mut self) {
-        for &item in &self.ast.source_file_items {
-            match item.kind() {
+        // analyze
+        let start = self.ast.source_file.items.start as usize;
+        let len = self.ast.source_file.items.len as usize;
+        for &item_id in &self.ast.source_file_items[start..start + len] {
+            match item_id.kind() {
                 ItemKind::FunctionDefinition => {
-                    self.analyze_function_definition(item.index().into());
+                    self.analyze_function_definition(item_id.index().into());
                 }
                 ItemKind::ConstantDefinition => {
-                    self.analyze_constant_definition(item.index().into());
+                    self.analyze_constant_definition(item_id.index().into());
                 }
                 ItemKind::Error => continue,
             }
         }
+
         self.symbols.exit_scope();
     }
 
-    fn analyze_function_definition(&mut self, node: FunctionDefinitionId) {
+    fn analyze_function_definition(&mut self, id: FunctionDefinitionId) {
         todo!()
     }
 
-    fn analyze_constant_definition(&mut self, node: FunctionDefinitionId) {
+    fn analyze_constant_definition(&mut self, id: ConstantDefinitionId) {
         todo!()
     }
-}
 
-// fn lower_block(&mut self, block: &ast::BlockExpressionId) -> hir::NodeId {
-// self.symbols.enter_scope(ScopeKind::Item);
-// for statement in self.ast.block_statements {
-// if let StatementKind::FunctionDefinition(...) | StatementKind::ConstDefinition(...) = stmt {
-// let item_id = self.hir.push_node(...);
-// self.symbols.add_binding(name, BindingId::Item(item_id))?;
-// }
-// }
+    fn analyze_block(&mut self, id: BlockExpressionId) {
+        let block = &self.ast.block_expressions[id];
+        let start = block.statements.start as usize;
+        let len = block.statements.len as usize;
 
-// self.symbol_table.enter_scope(ScopeKind::Normal);
-// for stmt in block.statements {
-// self.lower_statement(stmt);
-// }
-// let tail = block.tail.map(|e| self.lower_expression(e));
+        // pre-pass for item hoisting
+        self.symbols.enter_scope(ScopeKind::Item);
+        for &statement_id in &self.ast.block_statements[start..start + len] {
+            if let StatementKind::ItemStatement = statement_id.kind() {
+                let node = &self.ast.item_statements[statement_id.index().into()];
+                self.collect_item(node.item);
+            }
+        }
 
-// // 3. exit both scopes
+        // main pass: normal scope, walk statements sequentially
+        self.symbols.enter_scope(ScopeKind::Normal);
+        for &statement_id in &self.ast.block_statements[start..start + len] {
+            self.analyze_statement(statement_id);
+        }
+        let tail = block.tail.map(|expr_id| self.analyze_expression(expr_id));
+        self.symbols.exit_scope();
 
-// // emit the block node
-// }
-// }
+        self.symbols.exit_scope();
+    }
+
+    fn analyze_statement(&mut self, id: StatementId) {
+        todo!()
+    }
+
+    fn analyze_expression(&mut self, id: ExpressionId) {
+        todo!()
+    }
+
+    fn collect_item(&mut self, id: ItemId) {
+        match id.kind() {
+            ItemKind::FunctionDefinition => {
+                let node = &self.ast.function_definitions[id.index().into()];
+                let Some(name) = self.ast.resolve_identifier(node.name) else {
+                    return;
+                };
+                let item_binding_id =
+                    self.hir
+                        .create_item_binding(name, self.types.unknown_id, node.span);
+                if let Err(DefineError::AlreadyDefined { prev_binding_id }) = self
+                    .symbols
+                    .add_binding(name, BindingId::Item(item_binding_id))
+                {
+                    self.diagnostics.push(Diagnostic::Semantic(
+                        SemanticDiagnostic::DuplicateDefinition {
+                            name: self.string_interner.resolve(name).unwrap().to_string(),
+                            span: node.span,
+                            previous_span: self.hir.get_item_binding_info(prev_binding_id).span,
+                        },
+                    ));
+                }
+            }
+            ItemKind::ConstantDefinition => {
+                let node = &self.ast.constant_definitions[id.index().into()];
+                let Some(name) = self.ast.resolve_identifier(node.name) else {
+                    return;
+                };
+                let item_binding_id =
+                    self.hir
+                        .create_item_binding(name, self.types.unknown_id, node.span);
+                if let Err(DefineError::AlreadyDefined { prev_binding_id }) = self
+                    .symbols
+                    .add_binding(name, BindingId::Item(item_binding_id))
+                {
+                    self.diagnostics.push(Diagnostic::Semantic(
+                        SemanticDiagnostic::DuplicateDefinition {
+                            name: self.string_interner.resolve(name).unwrap().to_string(),
+                            span: node.span,
+                            previous_span: self.hir.get_item_binding_info(prev_binding_id).span,
+                        },
+                    ));
+                }
+            }
+            ItemKind::Error => return,
+        }
+    }
+}
 
 // fn analyze_let(&mut self, node: &LetStatementNode) -> StatementId {
 // let value = self.lower_expression(node.value);
@@ -151,24 +178,6 @@ impl<'ast> SemanticAnalyzer<'ast> {
 // }
 // }
 
-// fn analyze_top_level_items(&mut self) {
-// for &item_id in &self.ast.top_level_items {
-// if item_id.is_error() {
-// continue;
-// }
-
-// match item_id.kind() {
-// TopLevelItemKind::ConstDefinition => {
-// self.analyze_const_definition(item_id);
-// }
-// TopLevelItemKind::FunctionDefinition => {
-// // TODO: Implement later
-// }
-// TopLevelItemKind::Error => {}
-// }
-// }
-// }
-
 // fn analyze_const_definition(
 // &mut self,
 // item_id: crate::front_end::syntactic_analysis::ast::TopLevelItemId,

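The per-block structure of `analyze_block` can be modeled in isolation: first an item pre-pass, then a sequential walk in a second scope. The sketch below is a simplified, hypothetical stand-in. `Stmt`, `Binding`, and `Resolver` are not part of the crawfish codebase, and names are plain strings rather than interned symbols. Both scopes pushed here are ordinary transparent scopes, so a nested block can still see the enclosing block's locals. In the design described in ARCHITECTURE.md, an item boundary is introduced only when entering a nested function item.

```rust
use std::collections::HashMap;

/// Hypothetical toy statements; not crawfish's real AST.
pub enum Stmt {
    /// An item definition (function or constant): visible throughout its block.
    Item(&'static str),
    /// `let name = ...;`: visible only to later statements.
    Let(&'static str),
    /// A use of an identifier that must resolve.
    Use(&'static str),
    /// A nested block expression.
    Block(Vec<Stmt>),
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Binding {
    Item,
    Local,
}

pub struct Resolver {
    scopes: Vec<HashMap<&'static str, Binding>>,
    pub errors: Vec<String>,
}

impl Resolver {
    pub fn new() -> Self {
        Resolver { scopes: Vec::new(), errors: Vec::new() }
    }

    /// Innermost-first lookup across all open scopes.
    fn lookup(&self, name: &str) -> Option<Binding> {
        self.scopes.iter().rev().find_map(|scope| scope.get(name).copied())
    }

    pub fn analyze_block(&mut self, stmts: &[Stmt]) {
        // Pre-pass: hoist every item in this block into its own scope.
        let mut items = HashMap::new();
        for stmt in stmts {
            if let Stmt::Item(name) = stmt {
                if items.insert(*name, Binding::Item).is_some() {
                    self.errors.push(format!("duplicate definition of `{name}`"));
                }
            }
        }
        self.scopes.push(items);

        // Main pass: locals live in a second scope and become visible only after
        // their `let`, so a `let` may shadow a hoisted item without a duplicate error.
        self.scopes.push(HashMap::new());
        for stmt in stmts {
            match stmt {
                Stmt::Item(_) => {} // already collected in the pre-pass
                Stmt::Let(name) => {
                    self.scopes.last_mut().unwrap().insert(*name, Binding::Local);
                }
                Stmt::Use(name) => {
                    if self.lookup(name).is_none() {
                        self.errors.push(format!("unresolved name `{name}`"));
                    }
                }
                Stmt::Block(inner) => self.analyze_block(inner),
            }
        }
        self.scopes.pop();
        self.scopes.pop();
    }
}
```

In this model, a use of `helper` before the item `helper` resolves because the pre-pass has already recorded it. A use of `x` before `let x` is reported as unresolved.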
src/front_end/semantic_analysis/symbol_table.rs

Lines changed: 14 additions & 18 deletions
@@ -2,9 +2,8 @@ use crate::common::string_interner::Symbol;
 use crate::front_end::semantic_analysis::hir::BindingId;
 use std::collections::HashMap;
 
-#[derive(Debug)]
-pub enum DefineError {
-    AlreadyDefined { prev_binding: BindingId },
+pub struct SymbolTable {
+    scopes: Vec<Scope>,
 }
 
 #[derive(Clone, Debug)]
@@ -20,16 +19,14 @@ pub enum ScopeKind {
     Item,
 }
 
-#[derive(Default)]
-pub struct SymbolTable {
-    scopes: Vec<Scope>,
+#[derive(Debug)]
+pub enum DefineError {
+    AlreadyDefined { prev_binding_id: BindingId },
 }
 
 impl SymbolTable {
     pub fn new() -> Self {
-        let mut st = Self::default();
-        st.enter_scope(ScopeKind::Normal);
-        st
+        SymbolTable { scopes: Vec::new() }
     }
 
     pub fn enter_scope(&mut self, kind: ScopeKind) {
@@ -43,30 +40,29 @@ impl SymbolTable {
         self.scopes.pop().expect("pop on empty scope stack");
     }
 
-    pub fn add_binding(&mut self, name: Symbol, binding: BindingId) -> Result<(), DefineError> {
+    pub fn add_binding(&mut self, name: Symbol, binding_id: BindingId) -> Result<(), DefineError> {
         let scope = self.scopes.last_mut().unwrap();
-        if let Some(&prev) = scope.bindings.get(&name) {
-            return Err(DefineError::AlreadyDefined { prev_binding: prev });
+        if let Some(&prev_binding_id) = scope.bindings.get(&name) {
+            return Err(DefineError::AlreadyDefined { prev_binding_id });
         }
-        scope.bindings.insert(name, binding);
+        scope.bindings.insert(name, binding_id);
         Ok(())
     }
 
     pub fn find_binding(&self, name: Symbol) -> Option<BindingId> {
         let mut block_outer_locals = false;
         for scope in self.scopes.iter().rev() {
-            if let Some(&binding) = scope.bindings.get(&name) {
-                match binding {
-                    BindingId::Item(_) => return Some(binding),
-                    BindingId::Local(_) if !block_outer_locals => return Some(binding),
+            if let Some(&binding_id) = scope.bindings.get(&name) {
+                match binding_id {
+                    BindingId::Item(_) => return Some(binding_id),
+                    BindingId::Local(_) if !block_outer_locals => return Some(binding_id),
                     BindingId::Local(_) => {} // crossed item boundary: block outer locals
                 }
             }
             if scope.kind == ScopeKind::Item {
                 block_outer_locals = true;
             }
         }
-
         None
     }
 }

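A concrete scope stack makes the item-boundary rule in `find_binding` easier to see. The sketch below mirrors the lookup loop from `symbol_table.rs`, but it uses simplified, assumed types: `BindingId` carries a plain `u32`, names are `String`s, and `add_binding` returns the earlier binding instead of a `DefineError`. It also tries out the allocation-churn idea from the ARCHITECTURE.md note by recycling cleared hashmaps. The pool size and the initial capacity are arbitrary. It further assumes that a function's parameters live in the same `Item` scope that marks the function's boundary.

```rust
use std::collections::HashMap;

/// Simplified stand-ins for crawfish's `BindingId` and `ScopeKind` (assumed shapes).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BindingId {
    Item(u32),
    Local(u32),
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ScopeKind {
    Normal,
    Item,
}

struct Scope {
    kind: ScopeKind,
    bindings: HashMap<String, BindingId>,
}

pub struct SymbolTable {
    scopes: Vec<Scope>,
    /// Cleared maps kept for reuse, so entering a scope rarely allocates.
    free_maps: Vec<HashMap<String, BindingId>>,
}

impl SymbolTable {
    pub fn new() -> Self {
        // Pre-create a few maps; 4 maps with room for 8 entries is an arbitrary choice.
        let free_maps = (0..4).map(|_| HashMap::with_capacity(8)).collect();
        SymbolTable { scopes: Vec::new(), free_maps }
    }

    pub fn enter_scope(&mut self, kind: ScopeKind) {
        // Reuse a cleared map if one is available, otherwise create a new one.
        let bindings = self.free_maps.pop().unwrap_or_default();
        self.scopes.push(Scope { kind, bindings });
    }

    pub fn exit_scope(&mut self) {
        let mut scope = self.scopes.pop().expect("pop on empty scope stack");
        scope.bindings.clear(); // clear() keeps the allocation for the next scope
        self.free_maps.push(scope.bindings);
    }

    /// Defines `name` in the innermost scope; on a duplicate, returns the earlier binding.
    pub fn add_binding(&mut self, name: &str, id: BindingId) -> Result<(), BindingId> {
        let scope = self.scopes.last_mut().expect("no open scope");
        if let Some(&prev) = scope.bindings.get(name) {
            return Err(prev);
        }
        scope.bindings.insert(name.to_string(), id);
        Ok(())
    }

    /// Innermost-first lookup. After passing an `Item` scope, outer locals are
    /// skipped, but outer items remain visible.
    pub fn find_binding(&self, name: &str) -> Option<BindingId> {
        let mut block_outer_locals = false;
        for scope in self.scopes.iter().rev() {
            if let Some(&id) = scope.bindings.get(name) {
                match id {
                    BindingId::Item(_) => return Some(id),
                    BindingId::Local(_) if !block_outer_locals => return Some(id),
                    BindingId::Local(_) => {} // hidden behind an item boundary
                }
            }
            if scope.kind == ScopeKind::Item {
                block_outer_locals = true;
            }
        }
        None
    }
}
```

Consider `func outer(x: i32)` with a nested `func inner()`. A lookup of `x` from inside `inner` returns `None`, while `outer` and `inner` themselves still resolve.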