mod_translation_updater.py: Parses escape codes in .tr files incorrectly #1
Comments
The current behavior is definitely incorrect. Are you sure that regex is inadequate, as opposed to being inadequately used, though? From a quick glance and the issues you have presented, I currently see no reason why this would need "more than regex" (a context-free or even context-sensitive grammar).
The regex used is naive and incorrect. I think this can be captured by a regular language, but it's better to match the engine parser and do it character by character.
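To make the character-by-character idea concrete, here is a minimal sketch of splitting one entry at its separator. It is not the engine's parser and not code from mod_translation_updater.py: the function name `split_tr_line` is hypothetical, and it assumes only that `@` escapes whatever character follows it and that the first unescaped `=` separates source from translation.

```python
def split_tr_line(line):
    """Split a .tr entry at the first unescaped '='.

    Returns (source, translation), or None if no unescaped '=' exists.
    Assumption: '@' always escapes the single character after it.
    """
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "@":
            # Skip the escape marker and the character it escapes.
            i += 2
            continue
        if ch == "=":
            return line[:i], line[i + 1:]
        i += 1
    return None


print(split_tr_line("foo@@=bar"))  # ('foo@@', 'bar')
print(split_tr_line("a@=b=c"))     # ('a@=b', 'c')
print(split_tr_line("no separator"))  # None
```

Returning None rather than silently skipping lets the caller warn about invalid lines. The escapes are left undecoded on purpose, so the original text can be written back unchanged.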
In my defense: I never claimed this script was validating, so garbage in, garbage out. I've written my own *.tr validator separately, named That having said, yes, this I agree that writing the parser in code rather than regex sounds like a better and less error-prone alternative. Although I’m not entirely convinced that solving this problem with a regex is impossible. Maybe a complex regex could do the trick? Python's regex engine is powerful. But even then, code is probably more robust.
Yeah, I forgot about that. To be honest, I really hate that feature of the TR language when But in any case, throwing away information is absolutely not an option; preserving both comment texts and their position is important.
I had a similar situation last year when writing Lua code to parse quoted C strings (I forgot the exact purpose). The approach I recall coming up with was to match I assume a similar approach can be taken here by matching multiple
Minetest version
Summary
The script uses regex to parse translation lines, which is incorrect. Take the following example:
The script will ignore this line as it thinks the @ is escaping the =, but it is actually adding @. Another issue: the script ignores invalid lines without warning!
The script also assumes that a translation is only ever on one line. This is not the case, given @\n:
Steps to reproduce
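The escape problem from the summary can be reproduced with a sketch like the one below. Both the pattern and the sample line are assumptions made for illustration: `NAIVE_SEPARATOR` is a hypothetical "any = not preceded by @" regex, not the script's actual one, and `foo@@=bar` is a made-up entry whose source text ends in an escaped @.

```python
import re

# Hypothetical naive pattern (NOT taken from mod_translation_updater.py):
# treat any '=' that is not directly preceded by '@' as the separator.
NAIVE_SEPARATOR = re.compile(r"(?<!@)=")


def naive_split(line):
    """Split at the first '=' the naive pattern accepts, else return None."""
    parts = NAIVE_SEPARATOR.split(line, maxsplit=1)
    return tuple(parts) if len(parts) == 2 else None


# '@@' is an escaped '@', so the '=' after it is a real separator,
# but the lookbehind only sees the single '@' before the '='.
print(naive_split("foo@@=bar"))  # None: the line would be dropped
print(naive_split("foo=bar"))    # ('foo', 'bar')
```

The lookbehind cannot tell whether the preceding @ is itself escaped, which is why the first line is rejected.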
I've written my own Python .tr file parser with unit tests here. Unit tests should be added to mod_translation_updater.py; it's unacceptable to have a parser without tests.
My implementation throws a lot of information away (i.e. comments), so it couldn't be taken directly, but inspiration can be taken from the approach (which is based on the .cpp code).
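As a rough idea of what such unit tests could look like, here is a sketch covering the multi-line case. The helper `logical_lines` is hypothetical and not taken from either parser. It assumes a physical line continues onto the next one exactly when it ends in an odd number of @ characters, meaning the final @ escapes the newline rather than being half of an @@ pair.

```python
import unittest


def logical_lines(text):
    """Yield entries, joining physical lines that end in an escaped newline."""
    pending = []
    for physical in text.split("\n"):
        pending.append(physical)
        trailing_ats = len(physical) - len(physical.rstrip("@"))
        if trailing_ats % 2 == 0:
            # No dangling '@': the entry is complete.
            yield "\n".join(pending)
            pending = []
    if pending:
        # The input ended in the middle of a continued entry.
        yield "\n".join(pending)


class LogicalLinesTest(unittest.TestCase):
    def test_single_line_entries(self):
        self.assertEqual(list(logical_lines("a=b\nc=d")), ["a=b", "c=d"])

    def test_escaped_newline_joins_lines(self):
        self.assertEqual(list(logical_lines("a@\nb=c\nd=e")), ["a@\nb=c", "d=e"])

    def test_escaped_at_does_not_join(self):
        self.assertEqual(list(logical_lines("x@@\ny=z")), ["x@@", "y=z"])
```

Saved as a module, the tests can be run with `python -m unittest`. The joined entry keeps its original @ and newline characters, so nothing is lost before the actual parsing step.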