extra guards for empty lines for Mouth::readToken #2215

dginev · 2023-09-19T22:50:38Z

The regression is really subtle, so this PR definitely deserves a careful look-over. As Tim helpfully diagnosed in the issue, the underlying cause was the set of improvements from commit acaa51d.

The remaining subtlety surfaced when babel.sty loads its English .ini file (babel-en.ini) on a recent texlive. Every time that file read encounters a completely empty line, it seems to terminate the read early. I can't yet explain how that happens with sufficient clarity, but it has to do with the interplay of Gullet::readUntil, Mouth::readToken and Mouth::getNextChar: the undef from the empty line needs to be handled at just the right point, or the token list assembled for the argument is incorrect, which raises the higher-level Errors seen in the regression.

I arrived at the PR contents by experiment: tweaking a variety of contexts and settling on the first reasonable combination that worked. The key change was adding a $$self{nchars} == 0 check to the branch of Mouth::readToken that creates an EOL marker if a line is completely empty. Tests continue to pass, and the regression is resolved.
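To make the failure mode concrete, here is a toy model of a line-based reader. It is not LaTeXML's Mouth: the `read_all` helper, the `$guard_empty` flag and the `<EOL>` marker string are hypothetical, and only the `nchars == 0` idea is taken from the description above. It shows how a completely empty line can be mistaken for end of input unless it is handled explicitly.

```perl
use strict;
use warnings;

# Toy model only (hypothetical names, not LaTeXML code).
# Returns the list of tokens read from @lines.
sub read_all {
  my ($guard_empty, @lines) = @_;
  my @tokens;
  while (@lines) {
    my @chars  = split(//, shift(@lines));
    my $nchars = scalar(@chars);
    if ($nchars == 0) {
      # A completely empty line yields no characters at all.
      # Without the guard, the reader treats that as end of input.
      last unless $guard_empty;
      push(@tokens, '<EOL>');
      next; }
    push(@tokens, @chars); }
  return @tokens; }

my @input = ('ab', '', 'cd');
print join(' ', read_all(0, @input)), "\n";    # prints "a b"
print join(' ', read_all(1, @input)), "\n";    # prints "a b <EOL> c d"
```

With the guard off, everything after the empty line is silently dropped, which is the kind of early termination described for the babel-en.ini read.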

Along the way I also changed a couple of checks that relied on $ch to rely on defined $ch instead, as I kept worrying about 0 chars (the string "0" is false in Perl).
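The concern about 0 chars comes from Perl's truthiness rules: a bare `$ch` test cannot tell a literal zero character from a missing one. A minimal sketch, assuming a simple closure-based reader (`make_reader`, `collect_truthy` and `collect_defined` are hypothetical helpers, not part of LaTeXML):

```perl
use strict;
use warnings;

# Hypothetical stand-in for a character reader: returns one character
# per call, then undef once the input is exhausted.
sub make_reader {
  my ($text) = @_;
  my @chars = split(//, $text);
  return sub { return shift(@chars); }; }

# Truthiness test: stops at "0", since the string "0" is false in Perl.
sub collect_truthy {
  my ($next) = @_;
  my $out = '';
  while (my $ch = $next->()) { $out .= $ch; }
  return $out; }

# Definedness test: stops only when the reader returns undef.
sub collect_defined {
  my ($next) = @_;
  my $out = '';
  while (defined(my $ch = $next->())) { $out .= $ch; }
  return $out; }

print collect_truthy(make_reader('a0b')),  "\n";    # prints "a"
print collect_defined(make_reader('a0b')), "\n";    # prints "a0b"
```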

dginev · 2023-09-19T22:54:28Z

I should also mention that I have tested this against the way the regression manifested itself on texlive 2022 (as I described in the issue), but I haven't checked a texlive 2023 installation, where the test suite was also reported to be failing. I guess I should install that and double-check that the fix is also appropriate there...

brucemiller · 2023-09-28T13:36:35Z

lib/LaTeXML/Core/Mouth.pm

@@ -214,7 +214,7 @@ sub handle_escape {    # Read control sequence
  # Bit I believe that he does NOT mean within control sequences
  my $cs = "\\" . $ch;    # I need this standardized to be able to lookup tokens (A better way???)
  if ((defined $cc) && ($cc == CC_LETTER)) {    # For letter, read more letters for csname.
-    while ((($ch, $cc) = getNextChar($self)) && $ch && ($cc == CC_LETTER)) {
+    while ((($ch, $cc) = getNextChar($self)) && (length($ch) > 0) && ($cc == CC_LETTER)) {


Is this really supposed to be positive length, rather than defined?

True, the length check is extra-paranoid (I believe I was triple-checking that I wasn't misunderstanding the return values).

My fear was that the empty string "" is defined, but empty. But studying getNextChar, there is no code path that could return "". So I can switch this to a "defined" test, thanks.

Done. Also rebased to master - the English babel load continues to succeed.
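For reference, the three candidate conditions from this review thread differ only on the empty string and on "0". A small sketch (the `checks` helper is hypothetical; the positive-length test is wrapped in `defined` here because `length(undef)` returns undef on modern Perls and would warn when compared numerically):

```perl
use strict;
use warnings;

# Report how one value fares under each candidate loop condition,
# as "truthy,defined,positive-length" flags.
sub checks {
  my ($ch) = @_;
  return join(',',
    ($ch ? 1 : 0),
    (defined($ch) ? 1 : 0),
    ((defined($ch) && length($ch) > 0) ? 1 : 0)); }

print checks(undef), "\n";    # prints "0,0,0"
print checks(''),    "\n";    # prints "0,1,0"
print checks('0'),   "\n";    # prints "0,1,1"
print checks('a'),   "\n";    # prints "1,1,1"
```

If the reader never returns "", as noted above for getNextChar, the `defined` test and the positive-length test accept exactly the same values.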

brucemiller · 2023-09-30T20:38:23Z

Slightly scary, but seems right, and doesn't cause any problems that I can detect. Thanks!

dginev requested a review from brucemiller September 19, 2023 22:50

brucemiller reviewed Sep 28, 2023

dginev added 2 commits September 28, 2023 15:09

extra care for catching empty lines at Mouth::readToken; avoid mistreating 0 char as undef

65f7eff

consistently used defined check for char in Mouth

9f2645f

dginev force-pushed the babel-english-regression branch from 29cb020 to 9f2645f Compare September 28, 2023 19:13

brucemiller merged commit 41bad55 into brucemiller:master Sep 30, 2023
13 checks passed

brucemiller deleted the babel-english-regression branch September 30, 2023 20:38

dginev mentioned this pull request Oct 3, 2023

Test fails w/ latest CTAN snapshot (babel/greek) #2175

Closed
