WIP: Process comments (and extra nodes) separately from the rest of the code #500
In order to fix #430, this should be done for OCaml attributes too. They are not included with the current heuristic. Attributes can appear almost everywhere in OCaml. To avoid overly complicating the grammar, tree-sitter-ocaml really allows them everywhere, even in some invalid places. If you replace the comments in the example of #489 with attributes, the idempotency rule will still be violated (the input will be parseable by tree-sitter, but the result will not be). But since the code was invalid OCaml to begin with, that's not a huge problem.
I think this heuristic can be avoided by using the languages.toml file.
I mostly follow this and it feels like it's on the right track. I did a commit-wise review, so some of my suggestions against earlier commits may no longer apply, or apply elsewhere.
I appreciate the modularisation of comments and types and the comments you added to the source to describe what's going on. This is quite a tricky operation, so those comments are super-encouraged to save our future selves' sanity 😅
This PR is an attempt at processing the comments separately from the rest of the code, so that formatting queries can be both resilient to the presence of random comments, and simple to write.
Identifying comments
This is a challenge in itself:
- A dedicated query such as `(block_comment) @comment` is not an option. Indeed, it would mean either running the query twice, or accounting for the presence of comments in the rest of the query file, both situations we want to avoid.
- Instead, this PR uses the check `node.is_extra() && node.kind().to_string().contains("comment")`. I think it should capture all types of comments for all supported languages, but I haven't checked that yet.
Anchoring comments
Each comment should be anchored to a non-comment node of the code. Ideally, it should be the node it's supposed to comment, but such semantics can't be deduced from the CST alone. This PR uses the following heuristics instead:
Extracting comments
tree-sitter grammars and queries don't offer lots of tools to edit an existing CST. The only reasonable way I've found is to use `input.replace_range()` to remove the comment's bytes from the input, then `tree.edit()` to mark a node as edited in the CST, then finally `reparse(old_tree, new_input, grammar)` to get the new CST, without comments. The query file would then be applied to this new CST.
Re-inserting comments
This is a part I haven't had time to experiment with, but I think re-insertion should be done after processing all append/prepend directives, but before post-processing. I imagine something like this should work:
- `(anchor).(space).(comment)` (line breaks should already have been taken care of).
- `(comment).(line break).(anchor)`.
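The two templates above could be sketched roughly as follows. This is only an illustration, not code from this PR: the `Position` enum and the `reinsert` function are hypothetical names, and the sketch assumes the first template applies to a comment that follows its anchor and the second to a comment that precedes it. It works on plain strings and leaves out the actual atom handling.

```rust
/// Where a comment sits relative to its anchor.
/// Hypothetical type, introduced only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Position {
    /// The comment precedes its anchor.
    Before,
    /// The comment follows its anchor.
    After,
}

/// Re-attach `comment` to the already formatted text of `anchor`.
/// Hypothetical helper: real re-insertion would work on the formatter's
/// internal representation rather than on strings.
fn reinsert(anchor: &str, comment: &str, position: Position) -> String {
    match position {
        // (anchor).(space).(comment)
        Position::After => format!("{} {}", anchor, comment),
        // (comment).(line break).(anchor)
        Position::Before => format!("{}\n{}", comment, anchor),
    }
}
```

For instance, `reinsert("let x = 1", "(* note *)", Position::After)` yields `let x = 1 (* note *)`, while `Position::Before` puts the comment on its own line above the anchor.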
State of the PR