Add tree-sitter based highlighter #5099

Draft: wants to merge 3 commits into base: master

pjungkamp
Contributor

Description

This introduces a tree-sitter highlighter that maps tree-sitter captures to kakoune faces.
This allows kakoune to highlight some recursive grammars that the regions based highlighting is not able to parse (e.g. Nix, Shell, Python).

  • The default mapping from query captures to kakoune's faces is based on nvim-treesitter.
  • A subset of language injections is supported; these can be added as tree-sitter-injection sub-highlighters.
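To illustrate what mapping captures to faces can look like, here is a minimal C++17 sketch. It is not the code from this PR: the function name `face_for_capture`, the table contents, and the fallback rule (strip the last dotted component of the capture name until a known entry is found) are all assumptions made for illustration.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical helper, not taken from the PR: resolve a tree-sitter capture
// name such as "function.builtin" to a face name. If the full capture name is
// not in the table, drop the last ".component" and retry, so that
// "function.builtin" can fall back to the entry for "function".
std::optional<std::string> face_for_capture(
    const std::map<std::string, std::string>& faces, std::string capture)
{
    while (true)
    {
        auto it = faces.find(capture);
        if (it != faces.end())
            return it->second;

        const auto dot = capture.rfind('.');
        if (dot == std::string::npos)
            return std::nullopt;   // no entry for any prefix of the name
        capture.erase(dot);        // "a.b.c" -> "a.b"
    }
}
```

With a table containing only `function`, `keyword` and `string`, a capture named `keyword.control.import` resolves to the `keyword` entry, while `punctuation.bracket` resolves to nothing and would be left unhighlighted.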

Example - Nix

This is an example from a Nix codebase I recently visited.

The problem here is that a Nix string can contain interpolated Nix expressions, which in turn can contain nested strings with the same delimiter (e.g. "outer string ${ let var = "inner string"; in var }").

  • Regions based highlighting:
    Screenshot from 2024-02-06 14-08-50

  • tree-sitter based highlighting:
    Screenshot from 2024-02-06 14-08-16
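The difference between the two screenshots can be sketched in a few lines of C++17. This is a deliberately simplified model, not a Nix lexer and not code from this PR: it ignores `''` strings and comments, and all function names are made up for illustration. `naive_string_end` stops at the first closing quote, which is roughly what a non-recursive region match does; `nested_string_end` recurses through `${ ... }` interpolations.

```cpp
#include <cstddef>
#include <string_view>

constexpr std::size_t npos = std::string_view::npos;

// Non-recursive scan: the string ends at the first unescaped '"' after `open`.
std::size_t naive_string_end(std::string_view s, std::size_t open)
{
    for (std::size_t i = open + 1; i < s.size(); ++i)
    {
        if (s[i] == '\\') { ++i; continue; }   // skip the escaped character
        if (s[i] == '"') return i;
    }
    return npos;
}

std::size_t nested_string_end(std::string_view s, std::size_t open);

// `i` points just past a "${". Returns the index of the matching '}',
// skipping over any strings nested inside the interpolation.
std::size_t interpolation_end(std::string_view s, std::size_t i)
{
    int depth = 1;
    for (; i < s.size(); ++i)
    {
        if (s[i] == '"')
        {
            i = nested_string_end(s, i);       // jump to the closing quote
            if (i == npos) return npos;
        }
        else if (s[i] == '{') ++depth;
        else if (s[i] == '}' && --depth == 0) return i;
    }
    return npos;
}

// Recursive scan: a '"' inside "${ ... }" does not terminate the outer string.
std::size_t nested_string_end(std::string_view s, std::size_t open)
{
    for (std::size_t i = open + 1; i < s.size(); ++i)
    {
        if (s[i] == '\\') { ++i; continue; }
        if (s[i] == '$' && i + 1 < s.size() && s[i + 1] == '{')
        {
            i = interpolation_end(s, i + 2);   // index of the matching '}'
            if (i == npos) return npos;
            continue;
        }
        if (s[i] == '"') return i;
    }
    return npos;
}
```

On the example string above, `naive_string_end` returns the position of the quote that opens "inner string", while `nested_string_end` returns the position of the final quote.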

Building

This highlighter is optional and can be excluded by passing tree_sitter=no to make.

Related Issues

TODO

  • proper tree edits with cached trees (currently reparses the whole file instead of adjusting trees incrementally)
  • lazily compile grammars (if precompiled grammars are part of the configuration tree, a config can't be shared across architectures)
  • decide on a file path for grammars and queries (currently %val{runtime}/grammars)
  • support more injections, especially combined injections (this would allow us to e.g. highlight the embedded Bash in the example above)

@tototest99

Hello,
Are you aware of kak-tree-sitter and the merge initiative with redundant LSP features?

{ create_tree_sitter_highlighter, &tree_sitter_desc } });
registry.insert({
"tree-sitter-injection",
{ create_tree_sitter_injection_highlighter, &tree_sitter_injection_desc } });
Contributor

Personally, I don't care about highlighting, but I'm interested in commands to select syntax tree nodes etc.
LSP provides some of that but it's not optimized for it.

In general, integrations with external tools live in scripts in rc/.
I wonder if that could work for this feature too?
We can add any missing generic highlighter types like the InjectionHighlighterApplier to support these cases.

I think there is great value in having an obvious boundary between C++ core and scripts. It keeps us honest.
With the shared library approach, tree sitter can do something that other integrations cannot.

I wonder what's the difference to https://github.com/phaazon/kak-tree-sitter ?
As a user, I think it would be great if we concentrate most effort on one approach.

In either case, I think tree sitter integration is highly valuable and I'd probably follow whichever approach gains traction.

Thanks

@hadronized
Contributor

Hello,

I think it’s an interesting approach, but at the same time, it bothers me a bit. Not because I’m the author of kak-tree-sitter and spent a lot of time working on it, but because I love the philosophy behind Kakoune, and I think you’re bending it. The reason is that Kakoune ships with zero dependencies. It uses external tools only by calling out to external programs via %sh{} or kak -p. You could argue that there are the :git commands, but look closer: those are just Kakoune commands wrapping the git, awk, perl etc. commands.

However, I do think that what I made with KTS is super complex and that Kakoune needs more tooling to make integration easier (and I think @krobelus would also benefit from that for kak-lsp). For instance, in KTS, I have to start a daemonized server to hold a parsed representation of buffers for a set of sessions. That means Kakoune must stream its buffers’ content to KTS, which can be expensive. I optimized that by writing directly to a FIFO opened by KTS (i.e. no shell is spawned to do so; it’s just a basic, low-level write to the FIFO). Even with that, I think we could do better, because you need to do that for all « integrations » (KTS, kak-lsp, kak-whatnot, etc.).

@mawww
Owner

mawww commented Feb 10, 2024

This is indeed an impressive PR, but I do not intend to merge it, for a few reasons.

First, I do not want to have multiple competing built-in highlighters to maintain, and I do not want to rely solely on tree-sitter for highlighting. Having both will likely lead to one bitrotting over time.

Second (and most importantly), as noted by @phaazon, this goes against Kakoune's design principles: it introduces a dependency directly in core for functionality that could be implemented externally. I do agree that there are some limitations at the moment. I have not looked at kak-tree-sitter, but I suspect it is a far more complex codebase than what you did in that PR. I hope we can find a way to simplify how external plugins that rely on buffer content work.

@pjungkamp
Contributor Author

Motivation

I just spent some time with typst, a language that suffers extraordinarily badly from the limitations of the regions highlighter. I tried to write kakoune highlighters that worked for both the markdown and code regions in typst, but I ran into so many unfixable highlighting errors that I looked into other types of highlighting.

kak-tree-sitter

The most promising approach that I saw for accurate highlighting is probably tree-sitter. I did check out kak-tree-sitter. I'm using it and love it! But it feels much less responsive than what I'd expect from kakoune, especially when using a power-saving governor and platform profile on my laptop...

The inherent problem there is twofold. It can't use the ts_tree_edit API efficiently to make reparsing large buffers cheap; instead it reparses the entire buffer sent over a pipe. And it has to use a kak -p process to actually report the highlighted ranges back to kakoune.

I do have some ideas for the second problem, e.g. a kak -P <session> flag that in contrast to kak -p <session> keeps running and allows multiple commands to be passed to kakoune.

The first problem, though, led me to write the code in this PR. I checked out kak-tree-sitter's code and the tree-sitter project itself. This draft is just a POC of integrating tree-sitter with kakoune's internal structures. My main goal is to see where I'd have to introduce IPC interfaces to expose the necessary information to drive a more efficient tree-sitter plugin without introducing the tree-sitter dependency.

I wouldn't want to have this tree-sitter highlighter PR merged into master either. It does not fit kakoune's goals. You can't even compile statically because of the dynamic loading of parsers.

Some thoughts

My goal is to supply the ts_tree_edit API with the changes in a buffer. The code in this PR can't do that, it reparses the entire buffer.

The tree-sitter library takes all edit positions as both point-based (row and column) and byte-offset-based coordinates. The kakoune struct Buffer is optimized for line-based editing and only provides point-based views of changes; see Buffer::changes_since. I don't quite see a way to add efficient annotations of byte offsets to struct Buffer.
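To make the mismatch concrete, here is a small C++ sketch of what has to be computed for a single insertion. The structs only mirror the shape of tree-sitter's TSPoint and TSInputEdit as I understand them (three byte offsets plus three points); they are local stand-ins so the example compiles without the library. The buffer model (a vector of lines, each including its trailing newline) and both function names are assumptions, not Kakoune's actual Buffer API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Local stand-ins mirroring the shape of TSPoint / TSInputEdit.
struct Point { std::uint32_t row; std::uint32_t column; };   // column in bytes

struct InputEdit
{
    std::uint32_t start_byte, old_end_byte, new_end_byte;
    Point start_point, old_end_point, new_end_point;
};

// Byte offset of `p` in a buffer stored as lines (each line keeps its '\n').
// This walks every preceding line, which is exactly the cost that a
// line-oriented buffer makes hard to avoid without extra bookkeeping.
std::uint32_t byte_offset(const std::vector<std::string>& lines, Point p)
{
    std::uint32_t offset = 0;
    for (std::uint32_t row = 0; row < p.row && row < lines.size(); ++row)
        offset += static_cast<std::uint32_t>(lines[row].size());
    return offset + p.column;
}

// Describe the insertion of `text` at point `at` in the pre-edit buffer.
InputEdit describe_insertion(const std::vector<std::string>& before,
                             Point at, const std::string& text)
{
    const std::uint32_t start = byte_offset(before, at);

    Point end = at;                     // point just past the inserted text
    for (char c : text)
    {
        if (c == '\n') { ++end.row; end.column = 0; }
        else ++end.column;
    }

    const auto new_end = start + static_cast<std::uint32_t>(text.size());
    // An insertion replaces an empty range, so old end == start.
    return InputEdit{ start, start, new_end, at, at, end };
}
```

The point coordinates fall out of the change description directly; only the byte offsets need the linear walk in `byte_offset`, or some cached prefix sum of line lengths that has to be kept up to date on every edit.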


TLDR: I opened this PR because I was tinkering around and wanted to see who's interested and active on this topic.

@hadronized
Contributor

About partial updates / editing in place: I still have this open on that topic. It’s not something I have started working on, because I want to stabilize performance and features first (and I think it’s more important to have semantic text-objects before going for full optimization), but clearly yes, it can have a negative impact on “how fast you see highlighting”. Also, the speed at which kak -p applies and blocks the editor is probably something that could be worked on completely independently of KTS or anything else.

@krobelus
Contributor

The inherent problem there is that it can't use the ts_tree_edit API efficiently to make reparsing of large buffers cheap, instead reparsing the entire buffer sent over a pipe

If the slow part is parsing, you can probably work around it by computing a diff so you can use the incremental API.
It would probably be more elegant if Kakoune provided the changes in some diff format in a DidChangeIdle hook, something like %val{history} but only since the previous timestamp. But it's hard to tell if that actually makes a difference. A test case would probably help.
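One cheap way to compute such a diff on the plugin side is to strip the common prefix and suffix of the old and new buffer contents and treat whatever remains as a single edit. The C++17 sketch below shows that idea only; it is not what kak-tree-sitter does, and the struct and function names are made up. The result is a byte range of the kind an incremental reparse expects (start, old end, new end).

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Hypothetical result type: one replaced byte range.
struct ByteEdit { std::size_t start, old_end, new_end; };

// Collapse the difference between two versions of a buffer into a single
// edit by skipping their common prefix and common suffix.
ByteEdit single_edit(std::string_view old_text, std::string_view new_text)
{
    const std::size_t max_common = std::min(old_text.size(), new_text.size());

    std::size_t prefix = 0;
    while (prefix < max_common && old_text[prefix] == new_text[prefix])
        ++prefix;

    // The suffix must not overlap the prefix in either version.
    std::size_t suffix = 0;
    while (suffix < max_common - prefix
           && old_text[old_text.size() - 1 - suffix]
              == new_text[new_text.size() - 1 - suffix])
        ++suffix;

    return ByteEdit{ prefix, old_text.size() - suffix, new_text.size() - suffix };
}
```

This still has to scan both versions, so it only helps if parsing, rather than transferring the buffer, is the slow part. Several scattered edits collapse into one large range, and the matching row and column values for the edit would have to be computed separately.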

it has to use a kak -p process to actually report the highlighted ranges back to kakoune.

not necessarily; you can write to Kakoune's socket directly, see https://github.com/tomKPZ/pykak

The tree-sitter library takes all edit positions as both point-based (row and column) and byte-offset-based coordinates. The kakoune struct Buffer is optimized for line-based editing and only provides point-based views of changes; see Buffer::changes_since. I don't quite see a way to add efficient annotations of byte offsets to struct Buffer.

That's an interesting problem indeed.
I think until Kakoune provides buffer diffs, addressing this won't make much of a difference.

@Song-Tianxiang

Can we have something like Vim's text properties?

Development

Successfully merging this pull request may close these issues.

Regions are matched greedily, not recursively
6 participants