-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should open .puz with iso-8859-1 encoding interpretation #12
Comments
On 2021-01-28 06:04, piyo wrote:
Thanks for this enjoyable Emacs package.
Quite welcome!
I notice that puz files are not opened with the correct encoding. At
best (1) some hints and copyright symbol will be garbled.
I thought I fixed that a while ago, What version of the package are you
using?
At worst (2) the puz file's width cannot be read correctly and
crossword-summary will abort.
That I haven't seen before. Now for sure I want to read the rest of the
report...
Recipe for duplicating (1):
1. On Debian 10, Emacs 27.1 self-compiled.
Very nice. I'm using a version of debian, and am stuck with the official
26 package (although I have a development snapshot deb from 2020-09~).
OFF-TOPIC: Are you seeing screen-flicker when using the crossword
package with emacs 27?
2. Start emacs with export LANG=ja_JP.UTF-8 ; export LANGUAGE=ja_JP:ja
; emacs -nw
It looks like I didn't need to create the locale. I guess because of my
development work for emacs-w3m...
Another off-topic question: Do you have non-English crossword puzzle
files? I'd like to get a source of them in order to further test /
develop the package.
3. Download Matt Jones's crossword for 2021-01-21 using
crossword-download.
4. Open crossword-summary.
5. Open the puzzle file by pressing Enter.
6. Notice the copyright symbol is not correct. Also the hint for
Across 24 does not show the é character, but \xe9.
I do see what may be a Japanese character in place of the copyright
symbol, so that's a bug; however, I do get the correct é character in
clue 24 across.
Recipe for duplicating (2):
Cannot re-verify at the moment. It was on an Ubuntu 16.04 with Emacs
27.1 self-compiled.
Maybe a different variant of the JP locale?
Possible Cause:
puz files are opened with an encoding which is user-config dependent.
Sounds right.
Possible solution:
Locally set the coding-system-for-read before using
insert-file-contents.
...
I've tried a version of this out, and it does the job. I don't see any
untoward side effects, so I would like to push the commit, crediting
you. How do you want to listed in the Changelog file (ie. name, email)?
My differences are to use a defcustom instead of defvar to be more
user-friendly if it needs to be changed on-the-fly, and to set the
locally-scoped definition in the pre-existing 'let's.
I expect this to get more complex as I get more feedback from people
using the package for non-English language puzzles. It may be that I
eventually need to do something like add an element to
'crossword-download-puz-alist' for encodings and then upon reads compare
file names to the file-specs in that list in order to get that file's encoding.
Reference:
According to
[1]https://code.google.com/archive/p/puz/wikis/FileFormat.wiki the (de
facto?) encoding of puzzle files is iso-8859-1.
I don't think I ever saw that document before. Thank you. It seems a
much more thorough analysis than anything I've read before. The page
doesn't display in emacs-w3m because the page is so
javascript-dependent, but for this I went ahead and used another
browser, and it looks quite worth the effort.
…--
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0
|
+ Addresses github issue #12, in which the copyright symbol was improperly decoded when the locale was: + LANG=ja_JP.UTF-8 + LANGUAGE=ja_JP:ja + May call for re-evaluation of commit 0f39144 "Decode strings to coding system 'prefer-utf-8-dos'" + Expect more of this to crop up as we get feedback from more users with non-English environments and puzzles...
I see the commit you added for this issue. You may want to git-amend (or magit-reword) it after this reply, sorry.
I have the exact steps to repro this problem (starting from emacs -Q), and it has nothing to do with LC_ALL and LANGUAGE environment variables. However your commit should essentially fix the problem. Here is the code:
crossword.el was version crossword-20210126.1409 from MELPA or Package-Commit: a0bbd80.
This is strange, I either get both wrong, or with the workaround, both correct. Please see the repro steps above including the part where I open the puz file.
Thank you for the credit in that commit, that's fine.
Looks good for now. I would rather reduce the effect of coding-system-for-read but then you probably don't need to read other files in the same function, right? Also I would just make the functionality of reading a puz file into a temporary buffer into a separate function so that people can defadvice just that part.
I was going to speculate but nah. YAGNI, until you do. I suspect supporting other single-byte encodings will be easier than multibyte encodings, though.
It's a link in a link in your README.md.
I am using a terminal emulator, so no. I.e. Windows 10 -> Msys mintty.exe -> Debian 10 -> screen -> Emacs 27.1 -> crossword.el. Glorious wide vision,195 by 60 with HiDPI fonts. ;-P
No. |
On 2021-01-29 13:07, piyo wrote:
I see the commit you added for this issue. You may want to
git-amend (or magit-reword) it after this reply, sorry.
I haven't yet gotten to it. Maybe now is a good time...
Thank you for the credit in that commit, that's fine.
OK. You can change your mind at any time.
Also I would just make the functionality of reading a puz file into
a temporary buffer into a separate function so that people can
defadvice just that part.
Another thing to look at maybe now.
I suspect supporting other single-byte encodings will be easier
than multibyte encodings, though.
For this very reason, I would very much like to get sample puz files
found 'in the wild' for non-English solutions.
Reference: According to
[1][2]https://code.google.com/archive/p/puz/wikis/FileFormat.wiki
It's a link in a link in your README.md.
Ha. I might have seen it in year 2018 and forgotten it, or maybe I just
ignored because it requires JS. Anyway, I did look at it, and found that
it's html was mangling what were clearly intended to be tables and code
blocks, so I reformatted the content as an org-mode file, now bundled in
the git repository.
I was curious and I searched in the language I can use, Japanese.
...
If I could play these with the excellent interface your package
provides I would be happy. But alas it would need multibyte support
in this modern era.
No commitment, but that is of interest to me to do.
I've just a few minutes ago pushed several commits. One is a bug-fix
unrelated to our discussion.
Regards,
…--
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0
|
Thanks for this enjoyable Emacs package.
I notice that puz files are not opened with the correct encoding. At best (1) some hints and copyright symbol will be garbled. At worst (2) the puz file's width cannot be read correctly and crossword-summary will abort.
Recipe for duplicating (1):
Recipe for duplicating (2):
Cannot re-verify at the moment. It was on an Ubuntu 16.04 with Emacs 27.1 self-compiled.
Possible Cause:
puz files are opened with an encoding which is user-config dependent.
Possible solution:
Locally set the coding-system-for-read before using insert-file-contents.
Define a package default variable that holds the desired coding for known puzzle sources.
Workaround 1:
Define the default coding for read for "*.puz" files.
Version:
crossword.el was version crossword-20210126.1409 from MELPA or
Package-Commit: a0bbd80.
Other notes:
Once puz files are decoded correctly, puz-emacs files will be saved with the correct string data.
Reference:
According to https://code.google.com/archive/p/puz/wikis/FileFormat.wiki the (de facto?) encoding of puzzle files is iso-8859-1.
The text was updated successfully, but these errors were encountered: