Any support for custom UNICODE encodings? #473
Comments
Yes, that would be a good feature to have. Pull requests would be welcome! |
Good to know. I'm still new to Go, so as I ramp up and reach some level of comfort with it, can someone give me a quick overview of how Unicode is handled in limetext? |
It's just using Go native utf8 string and []rune literals. Other encodings would have to be converted to/from the Go native format upon load and save. IIRC there's currently no single point used for all IO operations so that might be a good point to start. Not sure what all the different file IO requirements are, but it might be possible to satisfy them with just a ReadWriteCloser interface. Each encoding would then satisfy that very same IO interface, and hence can just wrap its corresponding Reader and Writer function which accepts data in one encoding, converts the data to the other encoding and forwards it to the next Reader/Writer in the chain. |
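A minimal sketch of the reader/writer chain described above, using only the standard library and UTF-16LE as the example foreign encoding. The type and helper names (`utf16leReader`, `utf16leWriter`, `encodeUTF16LE`, `decodeUTF16LE`) are hypothetical and not part of limetext; the `Close` half of a `ReadWriteCloser` is left out for brevity, and malformed surrogate pairs are simply replaced with U+FFFD.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"unicode/utf16"
	"unicode/utf8"
)

// utf16leReader wraps a reader that yields UTF-16LE and serves UTF-8.
type utf16leReader struct {
	src io.Reader
	buf bytes.Buffer // decoded UTF-8 waiting to be handed out
}

// unit reads one little-endian 16-bit code unit from the source.
func (r *utf16leReader) unit() (rune, error) {
	var b [2]byte
	if _, err := io.ReadFull(r.src, b[:]); err != nil {
		return 0, err
	}
	return rune(b[0]) | rune(b[1])<<8, nil
}

func (r *utf16leReader) Read(p []byte) (int, error) {
	for r.buf.Len() == 0 {
		c, err := r.unit()
		if err != nil {
			return 0, err
		}
		if utf16.IsSurrogate(c) {
			lo, err := r.unit()
			if err != nil {
				lo = 0 // unpaired surrogate at EOF decodes to U+FFFD
			}
			c = utf16.DecodeRune(c, lo)
		}
		r.buf.WriteRune(c)
	}
	return r.buf.Read(p)
}

// utf16leWriter accepts UTF-8 and forwards UTF-16LE to the next writer.
type utf16leWriter struct {
	dst   io.Writer
	carry []byte // trailing bytes of a rune split across Write calls
}

func (w *utf16leWriter) Write(p []byte) (int, error) {
	data := append(w.carry, p...)
	var out []byte
	for len(data) > 0 && utf8.FullRune(data) {
		c, size := utf8.DecodeRune(data)
		data = data[size:]
		for _, u := range utf16.Encode([]rune{c}) {
			out = append(out, byte(u), byte(u>>8))
		}
	}
	w.carry = append([]byte(nil), data...)
	if _, err := w.dst.Write(out); err != nil {
		return 0, err
	}
	return len(p), nil
}

// encodeUTF16LE pushes a Go string through the writer side of the chain.
func encodeUTF16LE(s string) []byte {
	var file bytes.Buffer // stands in for a file on disk
	io.WriteString(&utf16leWriter{dst: &file}, s)
	return file.Bytes()
}

// decodeUTF16LE pulls encoded bytes through the reader side of the chain.
func decodeUTF16LE(b []byte) (string, error) {
	out, err := io.ReadAll(&utf16leReader{src: bytes.NewReader(b)})
	return string(out), err
}

func main() {
	enc := encodeUTF16LE("héllo, 世界")
	dec, err := decodeUTF16LE(enc)
	fmt.Println(len(enc), dec, err)
}
```

Because both wrappers only depend on `io.Reader` and `io.Writer`, the editor side never needs to know which encoding sits underneath; another encoding would be another pair of wrappers with the same shape.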
Or if you were hoping to be able to work directly in an encoded format (mmap-ed files come to mind), you'd have to implement an InnerBufferInterface dealing with that. |
Go native strings are UTF-8, but []rune values are UTF-32. To be really neutral to all encodings, UTF-32 seems ideal. And yeah, I read the utf8 and utf16 codec packages for Go. That's exactly the kind of package/module I envision could be created to enable a new encoding. In fact, I'm thinking of an encoding-design GUI tool, where people map what they need to different binary ranges, and the code is automatically generated. About working directly on the encoded representation, I guess it'll be useful only for very large files. Is that also one of the aims for Limetext? |
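To make the string versus []rune distinction concrete, here is a small sketch; the helper name `byteAndRuneCounts` is made up for illustration. A Go string holds UTF-8 bytes, while converting it to `[]rune` yields one 32-bit code point per element.

```go
package main

import "fmt"

// byteAndRuneCounts returns the UTF-8 byte length of s and the number of
// code points it contains once converted to []rune.
func byteAndRuneCounts(s string) (int, int) {
	return len(s), len([]rune(s))
}

func main() {
	// One character each of 1, 2, 3 and 4 UTF-8 bytes.
	s := "aé世😀"
	nBytes, nRunes := byteAndRuneCounts(s)
	fmt.Println("UTF-8 bytes:", nBytes)
	fmt.Println("runes:", nRunes)
	// Each rune is an int32, so the []rune form occupies 4 bytes per element.
	fmt.Println("bytes held by []rune:", nRunes*4)
}
```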
Type conversion from/to
I was doing something similar for reverse engineering purposes. Might put that up later today or in the weekend if I get a chance if you'd like to use any of it. It pretty much allows you to specify that a range of memory is in a certain format and it'll use QML to display it.
Not personally, but I wouldn't stop someone who finds that useful from submitting a pull request enabling that use case. |
No I get that conversion is built into the language. But from the point of view of LimeText, the in-memory representation of the content of a text file is what I'm trying to understand... Is it
Sounds interesting... what do you mean by "is in a certain format" ? Since I'm still learning the ropes, I think I'll defer working on the very-very-large-file use case for later. |
Just to confirm, this enhancement would be entirely implemented within the limetext/text repository, and not involving the limetext/lime repo... right? |
Neither actually. It's a rope-ish hierarchical data structure with individual nodes dealing with But the details of that are hidden behind the InnerBufferInterface which only deals with positions and For ease of use there's also a Buffer interface which expands the InnerBufferInterface with helper functions that allow you to work with strings if that's preferred.
Well, this is getting off topic for this repo, but what I did was to have a
Depends on what you mean by "this enhancement". If you intend to look into multiple text encodings it would be better to have your own repository for that, or, if you'd like to donate it to the limetext org, a separate repo here, as it'll likely be useful for others who are interested in dealing with different encodings but have no interest at all in limetext otherwise. At the very least it should go into its own package IMO, as it's not strictly related to any existing functionality but rather expands current functionality with a new dimension. |
Yeah I finally got it, after also reading the source and the article linked w.r.t the rope-ish data structure. But it's clear that the internal encoding of the editor is UTF-32, which is good.
Yes most definitely I would host the encoding maps separately. But the architecture/entry point that accepts a plugin with encode/decode functions has to be part of the editor itself. Each editor also needs its own implementation of the encode/decode functions, I don't think DLLs are a good option, can't use them cross platform. |
Not sure why DLLs are mentioned. If they were considered, why not just use gconv? |
And for reference, https://github.com/qiniu/iconv enables one such approach |
Also looks like there are some pure go interfaces and concrete implementations in the golang.org/x/text package. Nice! |
Yeah, eventually I would add these encodings to gconv if/when it gets widespread. EDIT: After checking out iconv, that seems like a nice starting point. They have done much of the work I had in mind, including the codec source code generator given the character map. Maybe you could include a version of iconv in LimeText, in which case writing a new plugin is tantamount to adding a new encoding to iconv? Being so solid and standard, I hope I can persuade more people to take that approach too. Perhaps a separate directory for custom encodings. What I'm trying to do is not invent a new encoding to use in programming languages, which can be done quite easily. I want to be able to use these new encodings with text files, which requires that the editor can be augmented with new encoding plugins. I mentioned DLLs because that's one way plugins are made. It's not ideal because it's tied to Windows. I would prefer each such encoding to be in the form of a spec, which is (manually/automatically) converted into Go source code in a form suitable to be plugged into LimeText. Right now I'm looking for some help on how to enable the plugging-in part in Lime. Then UTF-8 would be the first test case. I hope you got the general order of how I think this should go. Please do point out any deficiencies, factors I have not considered, and your suggestions about what order I should implement these aspects in. |
Bump... |
I'd suggest you create a GitHub repo with your encoding and make it satisfy the https://godoc.org/golang.org/x/text/encoding#Encoding interface. For hooking into lime, we'd need to make sure all our IO operations go via io.Reader and io.Writer, and if they don't, figure out why and whether they can be changed to do so. That way we can just use https://godoc.org/golang.org/x/text/transform#NewReader to support reading files in any encoding and https://godoc.org/golang.org/x/text/transform#NewWriter to support writing files in any encoding. |
Anybody here interested to allow custom encodings that work on UNICODE? To clarify, I mean an alternative unicode encoding to UTF-8, UTF-16, etc...