Enhance `#scan_integer` to check for valid character following it #119
Comments
I understand the motivation but I feel that we can find better API for this... |
I'm not sure about the argument name but I think it does make sense to extend the functionality of |
Can we process a generic token and then an integer?

    token = scanner.scan(/[\da-zA-Z]?[a-zA-Z]/)
    return token if token
    integer = scanner.scan_integer
    return integer if integer

It is the other way around: The thing at pos might be a number (e.g. |
Can we reconstruct the processing order? I feel that

    loop do
      # Try a generic token first.
      token = scanner.scan(/[\da-zA-Z]?[a-zA-Z]/)
      if token
        process_token(token)
        next
      end
      # Otherwise fall back to an integer.
      number = scanner.scan_integer
      if number
        process_number(number)
        next
      end
    end

For HexaPDF, tokenization starts at So yes, With the currently implemented |
Recently the new method `#scan_integer` was introduced (see #113) to optimize scanning integer values.

The current implementation works regardless of what follows the integer, i.e. scanning `123`, `123 something`, `123,something`, `123.32` and `123something` all work and would return 123.

However, in (I suspect) many cases an integer is only a valid integer if it is (not) followed by certain characters. One example is the input `123d`, which leads to an error when interpreted as Ruby code.

My use case is PDF syntax. There, a token is an integer only when it is followed by a whitespace character (ASCII decimal 0, 9, 10, 12, 13 and 32) or a delimiter character (`( ) < > [ ] / %`); otherwise it is a generic token. To handle this, the implementation using `#scan_integer` looks like this:

As you can see we

This could be simplified to just a call of `#scan_integer` if this method would optionally check the contents after it. Something like `#scan_integer(separator: SEPARATOR_PATTERN)` or maybe `#scan_integer(separator_chars: STRING)` (where `STRING` contains separator characters, similar to how `String#tr` works).

Would it make sense to include such functionality?
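
For illustration, here is a rough sketch of the requested behavior, emulated with plain `StringScanner#scan` and a lookahead instead of `#scan_integer`. The names `PDF_SEPARATOR` and `scan_separated_integer` are hypothetical and not part of strscan. Treating the end of input as a valid separator is also an assumption.

```ruby
require "strscan"

# Hypothetical constant: the PDF whitespace characters (ASCII 0, 9, 10, 12,
# 13, 32) and the delimiter characters listed in this issue.
PDF_SEPARATOR = /[\x00\t\n\f\r ()<>\[\]\/%]/

# Hypothetical helper, not part of strscan: scans an optionally signed
# decimal integer only if it is followed by a separator (or, by assumption,
# the end of input). Returns the Integer, or nil without moving the scan
# position when the integer is followed by anything else.
def scan_separated_integer(scanner, separator: PDF_SEPARATOR)
  text = scanner.scan(/[+-]?\d+(?=#{separator}|\z)/)
  text && Integer(text, 10)
end
```

With this helper, `123 ` and `123/Name` would yield 123, while `123d` would yield `nil` and leave the position untouched so the caller can fall back to scanning a generic token.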