WIP: Type checker #96

YerinAlexey · 2024-03-22T16:57:26Z

Fixes #50

This took around two evenings of work, but there it is.

It is not yet integrated into the build system and generators, because everything would break horribly if it were.

A quick design overview:

There are 3 main components to this:

Checked AST
Type table
Scope table

Checked AST

The checked AST is quite similar to the regular parsed AST with a few additions:

All expressions have a result type attached to them
Types in all structures are mandatory and inferred if needed. For functions that don't have a return type, void is used
Variables and types are referenced by IDs in their respective tables
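To make the three points above a bit more concrete, here is a rough sketch of what such a checked AST could look like. All names here (TypeId, VarId, CheckedExpr, CheckedFunction, resolve_return_type) are made up for illustration and are not necessarily the ones used in this PR.

```rust
/// Hypothetical ID into the type table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TypeId(pub usize);

/// Hypothetical ID into the scope table's variable storage.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VarId(pub usize);

/// Every checked expression carries its result type.
#[derive(Debug, Clone, PartialEq)]
pub struct CheckedExpr {
    pub kind: ExprKind,
    pub ty: TypeId,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ExprKind {
    Int(i64),
    Bool(bool),
    /// Variables are referenced by ID, not by name.
    Variable(VarId),
    Add(Box<CheckedExpr>, Box<CheckedExpr>),
}

/// Types are mandatory in the checked AST, so `ret` is not an Option.
#[derive(Debug, Clone, PartialEq)]
pub struct CheckedFunction {
    pub name: String,
    pub params: Vec<(VarId, TypeId)>,
    pub ret: TypeId,
    pub body: Vec<CheckedExpr>,
}

/// A missing return type annotation falls back to `void`.
/// The ID of `void` is passed in by the caller (assumed to come from the type table).
pub fn resolve_return_type(declared: Option<TypeId>, void: TypeId) -> TypeId {
    declared.unwrap_or(void)
}
```

The parsed AST would keep the optional annotation as written in the source; only the checked AST guarantees that every type slot is filled.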

Type table

The type table is actually a tree structure that stores and deduplicates types. Each type is given a unique Id value which can then be used to reference it anywhere in the type checker. The Type structure stores some metadata about the type like size and alignment (necessary for QBE/LLVM codegen) and a Repr (standing for representation) which is what "type" usually refers to.

Basic types are:

any (any type can be converted to it, currently has no defined size; TODO)
int
string
bool
void

The base types are inserted into the type table during its construction and their IDs are available as fields of the types::Table struct.

Those can be combined into composite types:

T[n] - fixed arrays
T[] - dynamic arrays
struct{...} - structures
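The deduplication idea can be sketched roughly like this. It is a simplified illustration, not the code in this PR: it uses a flat Vec plus a HashMap instead of the tree, leaves out size and alignment, and the names Id, Repr, Table::intern and Table::get are assumptions.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Id(pub usize);

/// What "type" usually refers to. Composite types refer to their parts by Id.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Repr {
    Any,
    Int,
    Str,
    Bool,
    Void,
    /// T[n]
    FixedArray(Id, usize),
    /// T[]
    DynamicArray(Id),
    /// struct{...}: field names paired with their types
    Struct(Vec<(String, Id)>),
}

pub struct Table {
    reprs: Vec<Repr>,
    ids: HashMap<Repr, Id>,
    // IDs of the base types, filled in during construction.
    pub any: Id,
    pub int: Id,
    pub string: Id,
    pub boolean: Id,
    pub void: Id,
}

impl Table {
    pub fn new() -> Self {
        let placeholder = Id(0);
        let mut table = Table {
            reprs: Vec::new(),
            ids: HashMap::new(),
            any: placeholder,
            int: placeholder,
            string: placeholder,
            boolean: placeholder,
            void: placeholder,
        };
        table.any = table.intern(Repr::Any);
        table.int = table.intern(Repr::Int);
        table.string = table.intern(Repr::Str);
        table.boolean = table.intern(Repr::Bool);
        table.void = table.intern(Repr::Void);
        table
    }

    /// Returns the existing Id for an equal Repr, or stores it and hands out a new Id.
    pub fn intern(&mut self, repr: Repr) -> Id {
        if let Some(&id) = self.ids.get(&repr) {
            return id;
        }
        let id = Id(self.reprs.len());
        self.reprs.push(repr.clone());
        self.ids.insert(repr, id);
        id
    }

    pub fn get(&self, id: Id) -> &Repr {
        &self.reprs[id.0]
    }
}
```

With a table like this, comparing two types for equality is just comparing their IDs, since structurally equal types always end up with the same Id.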

There's also a special named type, which just relays information about its inner type but can be referred to by name. This is currently only used for structs, but can allow for more generic type aliases like C's typedef or Rust's type a = b;.

Though, I'm not very happy with the design of named types in particular right now.

Scope table

The scope table is a tree-like structure, implemented similarly to the type table, that tracks scopes and their relations, as well as the variables in those scopes. A scope is established by either a function declaration, a loop or a block statement. Scopes also store a loop flag, which makes tracking loops for break/continue much easier.

A scope stores mappings of variables to their type IDs. It could also store additional information for codegen.
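Roughly, the lookup and loop tracking work as in the sketch below. The names (ScopeTable, ScopeId, push, declare, lookup, in_loop) are hypothetical, and the sketch assumes a loop flag never needs to be hidden by a nested function, since it simply walks up to the root.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ScopeId(pub usize);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TypeId(pub usize);

struct Scope {
    parent: Option<ScopeId>,
    /// Set for scopes established by a loop.
    is_loop: bool,
    variables: HashMap<String, TypeId>,
}

pub struct ScopeTable {
    scopes: Vec<Scope>,
}

impl ScopeTable {
    /// Creates a table containing only the root scope.
    pub fn new() -> Self {
        ScopeTable {
            scopes: vec![Scope { parent: None, is_loop: false, variables: HashMap::new() }],
        }
    }

    pub fn root(&self) -> ScopeId {
        ScopeId(0)
    }

    /// Opens a child scope (function, loop or block) under `parent`.
    pub fn push(&mut self, parent: ScopeId, is_loop: bool) -> ScopeId {
        self.scopes.push(Scope { parent: Some(parent), is_loop, variables: HashMap::new() });
        ScopeId(self.scopes.len() - 1)
    }

    pub fn declare(&mut self, scope: ScopeId, name: &str, ty: TypeId) {
        self.scopes[scope.0].variables.insert(name.to_string(), ty);
    }

    /// Walks from `scope` towards the root until the variable is found.
    pub fn lookup(&self, scope: ScopeId, name: &str) -> Option<TypeId> {
        let mut current = Some(scope);
        while let Some(id) = current {
            let s = &self.scopes[id.0];
            if let Some(&ty) = s.variables.get(name) {
                return Some(ty);
            }
            current = s.parent;
        }
        None
    }

    /// True if `scope` or any enclosing scope is a loop, i.e. break/continue is valid.
    pub fn in_loop(&self, scope: ScopeId) -> bool {
        let mut current = Some(scope);
        while let Some(id) = current {
            let s = &self.scopes[id.0];
            if s.is_loop {
                return true;
            }
            current = s.parent;
        }
        false
    }
}
```

Checking a break statement then boils down to a single in_loop call on the current scope, and shadowing falls out of the lookup order for free.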

TODO

garritfra · 2024-03-22T17:25:43Z

Oh wow, nice work!

This check phase is quite complex, but I assume there's no way around a second AST structure if we want proper typing.

(Please don't forget to add tests and fix up the dev docs afterwards, but I'm sure you're aware of that)

YerinAlexey · 2024-03-22T17:47:14Z

This check phase is quite complex, but I assume there's no way around a second AST structure if we want proper typing.

It's not actually that complex. There's just a lot of boilerplate code (especially function parameters that get split onto multiple lines). The logic is really simple once you get to it.

(Please don't forget to add tests and fix up the dev docs afterwards, but I'm sure you're aware of that)

I'll probably add tests after refactoring error handling so failure tests can explicitly specify which error should be produced


YerinAlexey · 2024-03-23T21:03:21Z

The type checker finally supports all language features! I also went through a few rounds of refactoring and I think the new internal API turned out to be a bit better. There are some improvements for parser/AST here, which are independent of the type checker. The most notable change is disallowing uninitialized bindings without an explicit type (let x).

I also relaxed the type restrictions for +, so it can concatenate strings now. I'm not sure what the exact semantics should be, but I ended up allowing string + (int|string|bool). The code also turned out to be a bit convoluted, and I feel like it would be better to move concatenation into a separate operator.
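As a sketch, the rule for + could be written as a single match. The Ty enum and check_add function are made up for illustration, and the sketch reads string + (int|string|bool) literally, so the left operand has to be a string for concatenation and int + string is rejected. That reading is an assumption.

```rust
/// Simplified stand-in for the real type representation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ty {
    Int,
    Str,
    Bool,
    Void,
}

/// Result type of `lhs + rhs`, or an error message if the operands don't fit.
pub fn check_add(lhs: Ty, rhs: Ty) -> Result<Ty, String> {
    match (lhs, rhs) {
        // Plain arithmetic.
        (Ty::Int, Ty::Int) => Ok(Ty::Int),
        // Concatenation: string on the left, int/string/bool on the right.
        (Ty::Str, Ty::Int | Ty::Str | Ty::Bool) => Ok(Ty::Str),
        _ => Err(format!("cannot apply `+` to {:?} and {:?}", lhs, rhs)),
    }
}
```

A separate concatenation operator would let the first arm stand alone and move the second arm into its own function.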

A few last things to be solved:

There needs to be a way of declaring a prototype for a function so that the type checker knows types of its arguments and return type
Handle duplicate objects in a better way. Currently I expect a lot of things to break if something is declared twice
Figure out the semantics of any for non-JS backends

My next goals are adding tests for all this, refactoring error handling (it's inconsistent and lacks useful context), and trying to integrate it with the build system and generators.

Short assignment forms (+=, -=, *=, /=) are now combined with the Assign statement instead of being expressions. To implement that, the convoluted handling of suffix expressions (field/array access and calls) in the parser had to be reworked. Right now all suffix expressions are parsed inside parse_expression. The assignment logic is also deduplicated between parse_statement and the various expression parsers.
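One possible shape for this in the AST is sketched below. AssignOp, the fields of Statement::Assign and assign_op_from_token are assumptions for illustration only; the actual AST in this PR may look different.

```rust
/// Which flavour of assignment was written in the source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AssignOp {
    Set, // =
    Add, // +=
    Sub, // -=
    Mul, // *=
    Div, // /=
}

/// Only the expression forms that can appear as assignment targets, plus literals.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Int(i64),
    Variable(String),
    /// Suffix expression: field access
    Field(Box<Expr>, String),
    /// Suffix expression: array access
    Index(Box<Expr>, Box<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Statement {
    /// A single statement covers both `a = b` and the short forms.
    Assign { target: Expr, op: AssignOp, value: Expr },
}

/// Maps the operator text to an AssignOp; None means "not an assignment".
pub fn assign_op_from_token(token: &str) -> Option<AssignOp> {
    match token {
        "=" => Some(AssignOp::Set),
        "+=" => Some(AssignOp::Add),
        "-=" => Some(AssignOp::Sub),
        "*=" => Some(AssignOp::Mul),
        "/=" => Some(AssignOp::Div),
        _ => None,
    }
}
```

With this shape, a statement parser would parse the target as an ordinary expression (including any suffixes), then look at the next token and build an Assign if it maps to an AssignOp.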


- Introduce a FileTable for easily referencing files instead of cloning paths
- Attach a Position to errors returned from lexer and parser
- Add End token signifying the end of file to provide position when reaching an unexpected EOF
- Move util::string_util::highlight_position_in_file to a method of lexer::Error
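The FileTable/Position idea can be sketched as follows. The fields, method names and the LexError type are assumptions for illustration, not the API introduced by the commit; the point is only that positions carry a small copyable file ID instead of a cloned path.

```rust
use std::path::{Path, PathBuf};

/// Cheap, copyable handle to a file registered in the FileTable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FileId(pub usize);

#[derive(Default)]
pub struct FileTable {
    paths: Vec<PathBuf>,
}

impl FileTable {
    /// Registers a path once and returns its ID; repeated inserts reuse the ID.
    pub fn insert(&mut self, path: PathBuf) -> FileId {
        if let Some(index) = self.paths.iter().position(|p| *p == path) {
            return FileId(index);
        }
        self.paths.push(path);
        FileId(self.paths.len() - 1)
    }

    pub fn path(&self, id: FileId) -> &Path {
        &self.paths[id.0]
    }
}

/// Position of a token or error; refers to the file by ID instead of by path.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Position {
    pub file: FileId,
    pub line: usize,
    pub column: usize,
}

/// Hypothetical lexer/parser error carrying its position.
#[derive(Debug, Clone, PartialEq)]
pub struct LexError {
    pub pos: Position,
    pub message: String,
}

impl LexError {
    /// Formats the error as `path:line:column: message`.
    pub fn render(&self, files: &FileTable) -> String {
        format!(
            "{}:{}:{}: {}",
            files.path(self.pos.file).display(),
            self.pos.line,
            self.pos.column,
            self.message
        )
    }
}
```

An End token fits into this naturally: it gives the parser a Position to attach to an "unexpected end of file" error even though there is no real token there.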


YerinAlexey added 3 commits March 23, 2024 17:19

ast: Use TypedVariable in places where types are required by the grammar

7d53bc7

parser: Consume the semicolon in 'return;'

4ea1701

The parser only checks the following token for a 'return;' construct but doesn't actually consume the semicolon. When the parser starts processing the next statement, it will fail because semicolon is not a valid token in that context.
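The bug and its fix can be illustrated with a toy parser. The grammar here is an assumption made up for the example (only `return;` and `return <int>` exist), and Token, Parser, advance and parse_return are hypothetical names, not the parser in this repository.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Return,
    Semicolon,
    Int(i64),
}

#[derive(Debug, PartialEq)]
pub enum Statement {
    Return(Option<i64>),
}

pub struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    pub fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, pos: 0 }
    }

    /// Looks at the next token without consuming it.
    fn peek(&self) -> Option<&Token> {
        self.tokens.get(self.pos)
    }

    /// Consumes and returns the next token.
    fn advance(&mut self) -> Option<Token> {
        let token = self.tokens.get(self.pos).cloned();
        if token.is_some() {
            self.pos += 1;
        }
        token
    }

    pub fn is_done(&self) -> bool {
        self.pos == self.tokens.len()
    }

    pub fn parse_return(&mut self) -> Result<Statement, String> {
        match self.advance() {
            Some(Token::Return) => {}
            other => return Err(format!("expected `return`, got {:?}", other)),
        }
        if self.peek() == Some(&Token::Semicolon) {
            // Peeking alone leaves the semicolon in the stream, so the next
            // statement would start with it. Consuming it here is the fix.
            self.advance();
            return Ok(Statement::Return(None));
        }
        match self.advance() {
            Some(Token::Int(value)) => Ok(Statement::Return(Some(value))),
            other => Err(format!("expected expression, got {:?}", other)),
        }
    }
}
```

Without the advance call after the peek, parsing `return; return 1` would fail on the second statement, because it would begin at the leftover semicolon.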

ast: Remove redundant capacity field of Expression::Array

4f431ae

YerinAlexey force-pushed the typecheck branch from 0e55691 to 5a0dfa1 · March 23, 2024 20:43

YerinAlexey added 10 commits March 26, 2024 15:06

command/run: Simplify

e483d7a

parser: Require either an expression or type for variable declarations

f6eebee

Get rid of parser-level type inferencer

a486e80

Split Function and Method in the AST, add function prototypes

7f70d6d

lexer: Remove Tab and CarriageReturn tokens

e559a89

Those are already handled under Whitespace

ast: Add location information to Statement and Expression

d4395fe

Run tests using crate::command::run instead of spawning cargo

5597474

This significantly speeds up tests

WIP: Type checker

d20784e

YerinAlexey force-pushed the typecheck branch from 5a0dfa1 to d20784e · March 26, 2024 12:58
