Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: improve funcId recognition and squash misdetection bugs #636

Merged
merged 12 commits into from
Jul 30, 2024

Conversation

novusnota
Copy link
Member

@novusnota novusnota commented Jul 28, 2024

That took a looooooooot of RegEx iterations.

Closes #635

  • I have updated CHANGELOG.md
    - [ ] I have documented my contribution in Tact Docs: https://github.com/tact-lang/tact-docs/pull/PR-NUMBER
  • I have added tests to demonstrate the contribution is correctly implemented: this usually includes both positive and negative tests, showing the happy path(s) and featuring intentionally broken cases
  • I have run all the tests locally and no test failure was reported
  • I have run the linter, formatter and spellchecker
  • I did not do unrelated and/or undiscussed refactorings

That took a looooooooot of RegEx iterations, including going into
`grammar.ts` and trying to adjust things in the semantic analysis.
Fortunately, that wasn't needed.
@novusnota novusnota requested a review from anton-trunov July 28, 2024 01:44
CHANGELOG.md Outdated Show resolved Hide resolved
src/grammar/test/items-native-fun-funcid.tact Outdated Show resolved Hide resolved
@anton-trunov
Copy link
Member

To be on the safe side, let's take stdlib.fc and mathlib.fc as libraries that are commonly used and see if we can add more function identifiers from those libs into our positive tests to make sure people will be able to FFI with those.

novusnota and others added 2 commits July 28, 2024 11:03
Co-authored-by: Anton Trunov <[email protected]>
Came up with a simple solution for 0x0_ stuff, so our parser can deal
with them too!
@novusnota novusnota requested a review from anton-trunov July 28, 2024 09:31
@anton-trunov anton-trunov self-assigned this Jul 28, 2024
@anton-trunov anton-trunov added this to the v1.4.2 milestone Jul 28, 2024
Copy link
Member

@anton-trunov anton-trunov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR accepts, for instance, + as a valid FunC identifier, but it's actually not.

Something like this should work:

    funcInvalidId = "_" ")" --notUnderscore
                  | ("+" | "-" | "*" | "/" | "~/" | "^/" | "%" | "~%" | "^%" | "/%") ")" --notArithOperator
      // more operators here
                  | ("-"? digit+) ")" --notDecimalNumber
                  | ("-"? "0x" hexDigit+) ")" --notHexadecimalNumber
    funcPlainId = ~funcInvalidId (~(whiteSpace | "(" | ")" | "," | "." | ";" | "~") any)+

@novusnota
Copy link
Member Author

novusnota commented Jul 28, 2024

Nice observation! I'd try putting lookaheads & in front of those ")" so they don't get parsed, but let me try your suggestion first, it should work. UPD: Yeah, it does, clever idea!

The keywords are not allowed and should be excluded too. Basically, contents of https://github.com/ton-blockchain/ton/blob/master/crypto/func/keywords.cpp has to be put in the exceptions list. I'm on it.

UPD2: Order of operators in those inner () alternations really does matter, the ones with bigger prefix shall stand in front of others.

@novusnota
Copy link
Member Author

novusnota commented Jul 28, 2024

There was a suggestion, which didn't pass negative tests, so I hid it.

The thing we've engineered works, but possibly I've found a cleaner solution, which doesn't require the ")" and can be used in the FunC parser too (the current approach in this PR fails there in absence of ")" and stuff). But this one works:

image

It passes similar set of tests we have for function identifiers, but adapted to FunC code:

image

UPD: Negative tests failed terribly, reverting :)

@anton-trunov
Copy link
Member

@novusnota So, what's the current status of this PR?

@novusnota
Copy link
Member Author

novusnota commented Jul 30, 2024

@anton-trunov It's a working and robust solution. The alternative would be to do invalidation of identifiers during syntactic analysis, which enhances the error messages further, but that seems to be a bit out of scope of this PR. Wdyt?

@anton-trunov anton-trunov merged commit 7f192a6 into main Jul 30, 2024
3 checks passed
@anton-trunov anton-trunov deleted the issues/635 branch July 30, 2024 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants