WIP: Type checker #96
Oh wow, nice work! This check phase is quite complex, but I assume there's no way around a second AST structure if we want proper typing. (Please don't forget to add tests and fix up the dev docs afterwards, but I'm sure you're aware of that.)
It's not actually that complex; there's just a lot of boilerplate code (especially function parameters that get split onto multiple lines). The logic is really simple once you get to it.
I'll probably add tests after refactoring error handling, so that failure tests can explicitly specify which error should be produced.
The parser only checks the following token for a 'return;' construct but doesn't actually consume the semicolon. When the parser starts processing the next statement, it will fail because semicolon is not a valid token in that context.
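A minimal sketch of the fix described in this commit message, assuming a hand-written parser over a token vector. `Token`, `Statement`, `Parser` and `parse_return` are hypothetical names used for illustration only, and the return value is reduced to an integer literal to keep the example short:

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Return,
    Semicolon,
    Int(i64),
    End,
}

#[derive(Debug, PartialEq)]
pub enum Statement {
    Return(Option<i64>),
}

pub struct Parser {
    pub tokens: Vec<Token>,
    pub pos: usize,
}

impl Parser {
    /// Look at the next token without consuming it.
    pub fn peek(&self) -> Token {
        self.tokens.get(self.pos).cloned().unwrap_or(Token::End)
    }

    /// Consume and return the next token.
    pub fn next(&mut self) -> Token {
        let token = self.peek();
        self.pos += 1;
        token
    }

    pub fn parse_return(&mut self) -> Result<Statement, String> {
        match self.next() {
            Token::Return => {}
            other => return Err(format!("expected `return`, found {:?}", other)),
        }
        // Bare `return;`: peeking alone is not enough, the semicolon
        // must be consumed or the next statement starts on it.
        if self.peek() == Token::Semicolon {
            self.next();
            return Ok(Statement::Return(None));
        }
        let value = match self.next() {
            Token::Int(v) => v,
            other => return Err(format!("expected a value, found {:?}", other)),
        };
        match self.next() {
            Token::Semicolon => Ok(Statement::Return(Some(value))),
            other => Err(format!("expected `;`, found {:?}", other)),
        }
    }
}
```

With the `self.next()` call in the bare `return;` branch removed, a following statement would begin at the leftover semicolon, which is the failure the message describes.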
The type checker finally supports all language features! I also went through a few rounds of refactoring and I think the new internal API turned out to be a bit better. There are some improvements for parser/AST here, which are independent of the type checker. Most notable change is disallowing uninitialized bindings without an explicit type ( I also relaxed type restrictions for A few last things to be solved:
My next goal is going to be adding tests for all this, refactoring error handling (it's inconsistent and lacks useful context), and trying to integrate it with the build system and generators.
Short assignment forms (+=, -=, *=, /=) are now part of the Assign statement instead of being expressions. To implement that, the convoluted handling of suffix expressions (field/array access and calls) in the parser had to be reworked. All suffix expressions are now parsed inside parse_expression. The assignment logic is also deduplicated between parse_statement and the various expression parsers.
Those are already handled under Whitespace
- Introduce a FileTable for easily referencing files instead of cloning paths
- Attach a Position to errors returned from lexer and parser
- Add End token signifying the end of file to provide position when reaching an unexpected EOF
- Move util::string_util::highlight_position_in_file to a method of lexer::Error
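To illustrate the idea behind the first two points, here is a rough sketch of a file table that hands out small copyable IDs, so positions can refer to a file without cloning its path. Only the names FileTable and Position come from the commit message; `FileId`, the methods and the `Position` fields are assumptions made for this example:

```rust
use std::path::{Path, PathBuf};

/// Cheap, copyable handle to a file registered in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FileId(usize);

#[derive(Default)]
pub struct FileTable {
    paths: Vec<PathBuf>,
}

impl FileTable {
    /// Register a path, reusing the existing ID if it is already known.
    pub fn insert(&mut self, path: PathBuf) -> FileId {
        if let Some(index) = self.paths.iter().position(|p| *p == path) {
            return FileId(index);
        }
        self.paths.push(path);
        FileId(self.paths.len() - 1)
    }

    /// Resolve an ID back to its path, e.g. when printing an error.
    pub fn path(&self, id: FileId) -> &Path {
        &self.paths[id.0]
    }
}

/// A source position that is `Copy` because it stores an ID, not a path.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Position {
    pub file: FileId,
    pub line: usize,
    pub column: usize,
}
```

An error type can then carry a `Position` by value and look the path up in the table only when the error is actually reported.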
This significantly speeds up tests
Fixes #50
This took around two evenings of work, but there it is.
Currently not integrated in build system and generators because everything is going to break horribly in that case.
A quick design overview:
There are 3 main components to this:
Checked AST
The checked AST is quite similar to the regular parsed AST with a few additions:
`void` is used

Type table
The type table is actually a tree structure that stores and deduplicates types. Each type is given a unique `Id` value which can then be used to reference it anywhere in the type checker. The `Type` structure stores some metadata about the type, like size and alignment (necessary for QBE/LLVM codegen), and a `Repr` (standing for representation), which is what "type" usually refers to.

Basic types are:
- `any` (any type can be converted to it, currently has no defined size; TODO)
- `int`
- `string`
- `bool`
- `void`

The base types are inserted into the type table during its construction and their IDs are available as fields of the `types::Table` struct.

Those can be combined into composite types:
- `T[n]` - fixed arrays
- `T[]` - dynamic arrays
- `struct{...}` - structures

There's also a special named type, which just relays information about its inner type but can be referred to by name. This is currently only used for structs, but can allow for more generic type aliases like C's `typedef` or Rust's `type a = b;`.

Though, I'm not very happy with the design of named types in particular right now.

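As a rough illustration of the type table described above, here is a minimal sketch of the deduplication idea. It is not the code from this PR: it swaps the tree structure for a flat vector plus a hash map, leaves out size and alignment, and every name other than `Id`, `Type`, `Repr` and `Table` (including the variant and field names) is an assumption:

```rust
use std::collections::HashMap;

/// Handle to a type stored in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Id(usize);

/// Representation: what "type" usually refers to.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Repr {
    Any,
    Int,
    Str,
    Bool,
    Void,
    FixedArray(Id, usize),     // T[n]
    DynArray(Id),              // T[]
    Struct(Vec<(String, Id)>), // struct{...}
    Named(String, Id),         // a name relaying to an inner type
}

/// Table entry. Size and alignment are left out of this sketch.
#[derive(Debug, Clone)]
pub struct Type {
    pub repr: Repr,
}

pub struct Table {
    types: Vec<Type>,
    lookup: HashMap<Repr, Id>,
    // IDs of the base types, filled in during construction.
    pub any: Id,
    pub int: Id,
    pub string: Id,
    pub boolean: Id,
    pub void: Id,
}

impl Table {
    pub fn new() -> Self {
        let mut table = Table {
            types: Vec::new(),
            lookup: HashMap::new(),
            // Placeholders, overwritten right below.
            any: Id(0),
            int: Id(0),
            string: Id(0),
            boolean: Id(0),
            void: Id(0),
        };
        table.any = table.intern(Repr::Any);
        table.int = table.intern(Repr::Int);
        table.string = table.intern(Repr::Str);
        table.boolean = table.intern(Repr::Bool);
        table.void = table.intern(Repr::Void);
        table
    }

    /// Return the ID for `repr`, inserting it only if it is not stored yet.
    pub fn intern(&mut self, repr: Repr) -> Id {
        if let Some(&id) = self.lookup.get(&repr) {
            return id;
        }
        let id = Id(self.types.len());
        self.types.push(Type { repr: repr.clone() });
        self.lookup.insert(repr, id);
        id
    }

    pub fn get(&self, id: Id) -> &Type {
        &self.types[id.0]
    }
}
```

Because composite representations refer to their element types by `Id`, interning `int[]` twice yields the same ID, and comparing two types in the checker reduces to comparing two IDs.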
Scope table
The scope table is a tree-like structure, implemented similarly to the type table, that tracks scopes and their relations, as well as the variables in those scopes. A scope is established by a function declaration, a loop or a block statement. Scopes also store a loop flag, which makes tracking loops for `break`/`continue` much easier.

A scope stores mappings of variables to their type IDs; it could possibly also store additional information for codegen.
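The scope table could be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: scopes live in a vector and point to their parent, `TypeId` is a stand-in for the type table's ID, all names are hypothetical, and function scopes are assumed to be roots so that the loop check never crosses a function boundary:

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ScopeId(usize);

/// Stand-in for the type table's ID type.
pub type TypeId = usize;

struct Scope {
    parent: Option<ScopeId>,
    is_loop: bool,
    variables: HashMap<String, TypeId>,
}

pub struct ScopeTable {
    scopes: Vec<Scope>,
}

impl ScopeTable {
    pub fn new() -> Self {
        ScopeTable { scopes: Vec::new() }
    }

    /// Open a new scope; `parent` is `None` for a function's root scope.
    pub fn push(&mut self, parent: Option<ScopeId>, is_loop: bool) -> ScopeId {
        self.scopes.push(Scope {
            parent,
            is_loop,
            variables: HashMap::new(),
        });
        ScopeId(self.scopes.len() - 1)
    }

    pub fn declare(&mut self, scope: ScopeId, name: &str, ty: TypeId) {
        self.scopes[scope.0].variables.insert(name.to_string(), ty);
    }

    /// Resolve a variable by walking from `scope` up through its parents.
    pub fn lookup(&self, scope: ScopeId, name: &str) -> Option<TypeId> {
        let mut current = Some(scope);
        while let Some(id) = current {
            let s = &self.scopes[id.0];
            if let Some(&ty) = s.variables.get(name) {
                return Some(ty);
            }
            current = s.parent;
        }
        None
    }

    /// `break`/`continue` are valid only if some enclosing scope is a loop.
    pub fn in_loop(&self, scope: ScopeId) -> bool {
        let mut current = Some(scope);
        while let Some(id) = current {
            let s = &self.scopes[id.0];
            if s.is_loop {
                return true;
            }
            current = s.parent;
        }
        false
    }
}
```

Checking a `break` inside a block nested in a loop then becomes a single `in_loop` call on the block's scope.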
TODO
- `fn declared_elsewhere(x: int): string;`
- `let x: int[] = []` and `let x: string[] = []` when checking the `[]` expression
- `any` type) to code generators
- `String` for errors (hints should also fit here)