docs/ARCHITECTURE.md
Lines changed: 21 additions & 20 deletions
@@ -331,21 +331,13 @@ However, we don't emit error nodes for missing delimiters, but rather, recover b
331
331
332
332
## Stage 3: Semantic Analysis and AST Lowering
333
333
334
-
**Semantic Analysis** involves walking the AST (a post-order traversal) and constructing a **high-level intermediate representation (HIR)**. Specifically, to create the nodes of an HIR, it involves three parts:
335
-
- Name Resolution
336
-
- Type Checking
337
-
- Desugaring
334
+
**Semantic Analysis** traverses the AST, starting at the root node, and constructs a **high-level intermediate representation (HIR)**. Specifically, semantic analysis involves the following:
335
+
- **Collection**: register item signatures up front, because crawfish supports top-level and block-level item hoisting
336
+
- **Name Resolution**: resolve identifiers at use sites to their bindings
337
+
- **Type Checking**: verify the type correctness of expressions and statements
338
+
- **Desugaring**: as the HIR is constructed, certain AST constructs are desugared along the way
338
339
339
-
However, that two-pass design breaks when resolution depends on types ([source](https://www.reddit.com/r/ProgrammingLanguages/comments/w0biir/comment/igdt1ce/)). For instance:
340
-
```
341
-
// Which `foo`? Depends on type of `x`
342
-
x.foo()
343
-
344
-
// Which `+`? Depends on types of operands (if you have overloading)
345
-
a + b
346
-
```
347
-
348
-
### Name Resolution
340
+
### Symbol Table
349
341
350
342
An identifier's **scope** is the part of a program where it's accessible. An identifier may refer to different values in different parts of the program. Crawfish uses **static scope**, which means the visibility and accessibility of an identifier are determined at compile time by its physical location within the source code.
351
343
@@ -366,8 +358,6 @@ func outer(x: i32) -> i32 {
366
358
367
359
This means that scope frames are not always "transparent" (i.e., the semantic analyzer cannot always see outer names through them). As such, each scope frame has an associated `ScopeKind`, a flag that tells lookup when to start filtering out locals. Most scopes are `ScopeKind::Normal`, which behaves like an ordinary lexical scope. However, when the semantic analyzer enters a nested function item, it pushes a `ScopeKind::Item` scope, which acts as an item boundary. During lookup, once resolution crosses an `Item` scope, outer local bindings (`BindingId::Local`) are no longer visible, while item bindings (`BindingId::Item`) remain visible.
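
The lookup rule described here can be sketched as follows. This is a minimal illustration, not crawfish's actual implementation: only the names `ScopeKind::Normal`, `ScopeKind::Item`, `BindingId::Local`, and `BindingId::Item` come from the text, while the `u32` payloads, the `Scope` and `SymbolTable` types, and their methods are hypothetical. It also assumes that lookup keeps searching outward after skipping a hidden local.

```rust
use std::collections::HashMap;

/// Variant names come from the text; the `u32` payloads are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BindingId {
    Local(u32),
    Item(u32),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScopeKind {
    Normal,
    Item,
}

/// Hypothetical scope frame: a kind flag plus the names defined in it.
pub struct Scope {
    kind: ScopeKind,
    names: HashMap<String, BindingId>,
}

/// Hypothetical symbol table: a stack of scope frames, innermost last.
#[derive(Default)]
pub struct SymbolTable {
    scopes: Vec<Scope>,
}

impl SymbolTable {
    pub fn push(&mut self, kind: ScopeKind) {
        self.scopes.push(Scope { kind, names: HashMap::new() });
    }

    pub fn pop(&mut self) {
        self.scopes.pop();
    }

    /// Defines `name` in the innermost scope (no-op if no scope is open).
    pub fn define(&mut self, name: &str, id: BindingId) {
        if let Some(scope) = self.scopes.last_mut() {
            scope.names.insert(name.to_string(), id);
        }
    }

    /// Walks scopes from innermost to outermost. After an `Item` scope has
    /// been crossed, locals found further out are skipped; items still match.
    pub fn lookup(&self, name: &str) -> Option<BindingId> {
        let mut crossed_item = false;
        for scope in self.scopes.iter().rev() {
            match scope.names.get(name).copied() {
                // An outer local hidden behind an item boundary: keep searching.
                Some(BindingId::Local(_)) if crossed_item => {}
                Some(id) => return Some(id),
                None => {}
            }
            // Names inside the `Item` scope itself (e.g. parameters) were
            // checked above; only scopes outside it are filtered.
            if scope.kind == ScopeKind::Item {
                crossed_item = true;
            }
        }
        None
    }
}
```

With this shape, a nested function's body can still resolve sibling items declared in the enclosing block, but a `let` binding from the enclosing function resolves to `None`.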
368
360
369
-
Additionally, since crawfish supports top-level and block-level item hoisting, a simple single-pass AST walk is not sufficient. Inspired by rustc, the compiler uses a two-step approach per scope. First, it walks the immediate contents of a module or block and records all item names into the scope's symbol table. Then, it walks statements and expressions in that scope and resolves each identifier by looking it up in the symbol tables that were already populated ([source](https://rustc-dev-guide.rust-lang.org/name-resolution.html#overall-strategy)).
370
-
371
361
> [!NOTE]
372
362
> Since most code doesn't nest very deeply, we _may_ further optimize the symbol table (specifically, to avoid allocation churn) by pre-allocating around 4 to 8 empty hashmaps and reusing cleared hashmaps instead of allocating and dropping them ([source](https://www.reddit.com/r/Compilers/comments/1dy9722/comment/lc833ho/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button)).
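
One way the reuse idea in the note might look is sketched below. The `ScopePool` type, its method names, the `u32` stand-in for a binding id, and the initial capacity of 16 are all assumptions made for illustration. The sketch relies on `HashMap::clear` keeping the map's allocated memory, so a reused map does not need to reallocate.

```rust
use std::collections::HashMap;

/// Hypothetical pool of scope maps; `u32` stands in for a binding id.
pub struct ScopePool {
    /// Cleared maps waiting to be reused.
    free: Vec<HashMap<String, u32>>,
    /// Maps for the scopes that are currently open, innermost last.
    active: Vec<HashMap<String, u32>>,
}

impl ScopePool {
    /// Pre-allocates `n` maps. The capacity of 16 is an arbitrary assumption.
    pub fn with_preallocated(n: usize) -> Self {
        ScopePool {
            free: (0..n).map(|_| HashMap::with_capacity(16)).collect(),
            active: Vec::new(),
        }
    }

    /// Opens a scope, reusing a pooled map when one is available.
    pub fn enter(&mut self) {
        let map = self.free.pop().unwrap_or_default();
        self.active.push(map);
    }

    /// Closes the innermost scope and returns its map to the pool.
    pub fn exit(&mut self) {
        if let Some(mut map) = self.active.pop() {
            map.clear(); // removes entries but keeps the allocation
            self.free.push(map);
        }
    }

    /// Defines `name` in the innermost open scope (no-op if none is open).
    pub fn define(&mut self, name: &str, id: u32) {
        if let Some(map) = self.active.last_mut() {
            map.insert(name.to_string(), id);
        }
    }

    /// Searches open scopes from innermost to outermost.
    pub fn lookup(&self, name: &str) -> Option<u32> {
        self.active.iter().rev().find_map(|map| map.get(name).copied())
    }

    /// Number of maps currently waiting in the pool.
    pub fn pooled(&self) -> usize {
        self.free.len()
    }
}
```

Whether this is worth doing depends on profiling; the note itself only says the optimization _may_ be applied.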
373
363
@@ -378,16 +368,27 @@ Additionally, since crawfish supports top-level and block-level item hoisting, a
378
368
Our high-level intermediate representation is close to our AST, and should be able to support high-level optimizations such as inlining and constant folding ([source](https://www.cs.cornell.edu/courses/cs4120/2026sp/notes.html?id=ir)).
379
369
380
370
There are two common approaches for high-level IRs ([source](https://www.reddit.com/r/ProgrammingLanguages/comments/1cj0oj2/comment/l2fqafw/)):
381
-
-
382
-
-
371
+
- ...
372
+
- ...
383
373
384
-
Unlike the AST, the HIR doesn't follow a SoA design because it's [less ergonomic for semantic analysis](https://www.reddit.com/r/rust/comments/160kqz9/comment/jxq36ug/).
374
+
Unlike the AST, the HIR doesn't follow an SoA (struct-of-arrays) design, because that layout is less ergonomic for semantic analysis given the several passes involved ([source](https://www.reddit.com/r/rust/comments/160kqz9/comment/jxq36ug/)).
385
375
386
376
### Semantic Analyzer
387
377
378
+
The collection phase, which allows forward references, involves two steps per scope: first, scan for item definitions to populate the symbol table; then, walk the statements sequentially ([source](https://rustc-dev-guide.rust-lang.org/name-resolution.html#overall-strategy), [source 2, page 23](https://web.stanford.edu/class/cs143/lectures/lecture09.pdf)).
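
The two steps per scope can be sketched on a toy statement type. Everything named here (`Stmt`, its variants, and `resolve_block`) is hypothetical and far simpler than a real AST. The sketch assumes that items are hoisted within their block, that `let` bindings become visible only after their statement, and that a nested function body does not see the enclosing function's locals.

```rust
use std::collections::HashSet;

/// Hypothetical, heavily simplified statements.
pub enum Stmt {
    /// `func name() { ... }`: an item, hoisted within its enclosing scope.
    Func { name: String, body: Vec<Stmt> },
    /// `let name = ...;`: a local, visible only after this statement.
    Let { name: String },
    /// A use of `name`, such as a call or a variable read.
    Use { name: String },
}

/// Resolves one block and returns the names of uses that failed to resolve.
pub fn resolve_block(stmts: &[Stmt], outer_items: &HashSet<String>) -> Vec<String> {
    // Step 1: scan the immediate contents of the block for item definitions,
    // so that later (or earlier) statements can refer to them.
    let mut items = outer_items.clone();
    for stmt in stmts {
        if let Stmt::Func { name, .. } = stmt {
            items.insert(name.clone());
        }
    }

    // Step 2: walk the statements in order, resolving each use against the
    // locals seen so far and the items collected in step 1.
    let mut locals: HashSet<String> = HashSet::new();
    let mut unresolved = Vec::new();
    for stmt in stmts {
        match stmt {
            Stmt::Let { name } => {
                locals.insert(name.clone());
            }
            Stmt::Use { name } => {
                if !locals.contains(name) && !items.contains(name) {
                    unresolved.push(name.clone());
                }
            }
            Stmt::Func { body, .. } => {
                // The nested body sees the collected items but starts with
                // no locals, mirroring the item boundary.
                unresolved.extend(resolve_block(body, &items));
            }
        }
    }
    unresolved
}
```

In this sketch, a call to a function declared later in the same block resolves, while a variable read that precedes its `let` does not.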
379
+
380
+
Name resolution and type checking must be interleaved when resolution depends on types ([source](https://www.reddit.com/r/ProgrammingLanguages/comments/w0biir/comment/igdt1ce/)). For instance:
381
+
```
382
+
// Which `foo`? Depends on type of `x`
383
+
x.foo()
384
+
385
+
// Which `+`? Depends on types of operands (if you have overloading)
Rather than going from HIR directly to LLVM, it makes sense to [source](https://www.reddit.com/r/ProgrammingLanguages/comments/1boul8y/comment/kwtxulc/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button).