-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How can we avoid ReDoS without trust on PCRE limits #2072
Comments
There are two popular regex algorithms:
Key DifferencesHandling alternativesRegex:
PossessivenessRegex:
Exploitation strategies
Since DFA will consider all the possibilities anyway, a bad regular expression will always eat up memory (not time) irrespective of the input. NFA on the other hand, performs fine unless a specially crafted string is passed to the engine. I am not sure what are your takeaways from this but I believe NFA is better, DFA doesn't even support lazy/greedy matching. Mod_Security currently uses a NFA one. How to spot a bad regular expression?Nested repetition operatorsRegex: Intersecting patterns (Remote)Regex: For a string, Intersecting patterns (Adjacent)Regex: Intersecting patterns (Alternative)Regex: Both alternate patterns start with On top of that, nested repetition operators are being used. ConclusionI do not support automation for checking if ReDOS exists. It's better if the maintainers can see if the rules do not have the properties above. You can also mention me on a PR to take a look, I am hyperactive on Github. |
Hi @zimmerle thanks for reply here too :).
Do you think about for example check the regular expressions when ModSec starts, and notify user if there is a potential wrong syntax? Or what? Could you show some example? |
Hi @zimmerle, I am the Hyperscan maintainer and developer. Hyperscan delivers high performance for both multiple literal matching and regex matching, so it could naturally replace Aho Corasick algorithm and PCRE to get performance boost. There is also a published whitepaper at https://www.hyperscan.io/2020/09/28/optimize-azure-cloud-security-with-intel-hyperscan/ for Microsoft Azure WAF that is now accelerated by Hyperscan. Any plan or progress of adding Hyperscan as a regex matching option? We would like to collaborate if any help is needed. Thanks, |
Hi @xiangwang1, We definitely want to support Hyperscan 🎆. To make the transition smoothly to our users, we want to have a compilation flag that allows the user to change the regular expression engine to whatever they feel comfortable with, including Hyperscan. On issue #2012, the researcher @WGH- offered an additional implementation for RE2. One of the reasons we hold merging the pull request was that we didn't have this selection mechanism available yet. The implementation for the support of Hyperscan as a regular expression engine should be considered easy in terms of effort. All regular expression usage inside ModSecurity is limited to the class Regex - Populating those methods with the correct interface for Hyperscan will do the job. The Aho Corasick replacement could be used as the kernel for the @pm operator. It currently uses this Aho Corasick-ish implementation that we are more than welcome to replace with something more performant and/or use less memory. @xiangwang1 do you think you can help us to fill the Regex class to use the Hyperscan ? 👷 |
Hi @zimmerle, Since current ModSecurity rules are written for PCRE support only, one problem with replacing PCRE with Hyperscan or RE2 is lack of full syntax support such as zero-width assertion, back reference, etc. Do you have any idea or plan to address this issue? If we stick to PCRE syntax, then we can't achieve 100% replacement. Here is what we do for PCRE replacement in other systems like Snort and Suricata:
There are also good slides about this for Rspamd. It's easy to replace Aho Corasick with Hyperscan as they share the same match behavior. This is the major performance boost we get in Snort and Suricata. In general, the more literal rules we have, the more performance boost and memory space saving from Hyperscan. We may think about this replacement as the first step. 😊 |
Just an idea for introduction of Hyperscan (and RE2 too) instead of replacing of PCRE. It would be better to use a new operator (similar to rxGlobal): if somebody wants to use Hyperscan compatible expressions, then it would be the Another solution would be that user can decide in compilation time (I mean The list of regex engines can be expanded with pcre2. |
@xiangwang1 : In the prefiltering mode is the construction of the superset of matches automatic, or is this pattern something that a user needs to provide? If it's automatic, does the following work in practice:
That would mean one can submit PCRE specific patterns and treat the engine(s) like a black box that returns pcre matches. This would of course be very welcome to projects like the OWASP ModSecurity Core Rule Set that is being used in the Azure WAF as well, if I am not mistaken. |
RE2 is largely compatible with PCRE. My pull request (#2012) is built on idea that the patterns that can't be compiled on RE2 can use PCRE as a fallback. Also, some people apparently use CRS outside of ModSecurity, and contributed fixing the patterns to make them RE2-compatible, where it's possible. Hyperscan, however, is a different beast. It shines when you match a set of multiple patterns against the same input. It would require some major changes in ModSecurity code to do that. At the first glance, you would have to group all rules that use regular expression operator against the same transformation/variable, and make a single regular expression "database" (that's what they call it) for it, and basically evaluate all correspondng operators in one go. |
(...)
Sounds like a plan 🚀 👍. That seems to be an easy improvement with a real benefit to our users. I think we can start to work on that one as the change is more stragithforward. As a roadmap I would consider -
Idea for Configure summary
Notice -
QA Compilation matrixCurrently ACMP implementationWhere the ACMP implementation is used?InitializationRun time evaluation |
Regular expression engine replacement
Although I like @WGH-'s run time fallback option, I most say that it could be confusing while supporting n+2 engines. I would go for the compilation option. One option may include the combination of RE+RE2 as suggested on #2012.
@xiangwang1 Compatibility is with older versions is important, however, we understand that sometimes - in order to evolve - we need to avoid legacy. Specially if there are some technical debts involved. Having said that, as long as we keep the fallback option (whatever it is: run time and/or compilation time) and consider it as addition to the correct version, we should be fine to go. The main goal should be on performance and usability. To cite other projects approaches on that matter -
As mentioned by @WGH-
In terms of CRS, Microsoft's waf team may want to share their experience (CC: @allanbomsft, @alonzop !?!?). But, I would be concerned about avoiding road blocks for the users on ModSecurity engine, eventually they update accordingly. I am considering this change as an addition to v3.1, where significant changes are expected anyway. The impact on the changes may be clear with the existence o a PoC. Is that something that we can consider @xiangwang1? |
Yes, I think we can leverage the existing PoC as starting point. |
Closing this due to the recent merge of #2736 . (The conversation in this item has covered some separate matters; individuals should feel free to open separate issues for these as desired.) |
PCRE is one of the most popular regex libraries available out there. It is heavily used in ModSecurity although it may be optional on 3.1 where Hyperscan and RE2 are likely to be added to the soup as well.
ModSecurity assumes that the one who is written the rules are not an attacker, rather someone that is trying to protect the application. Unfortunately, some times, not-so-experienced users may have a problem writing the rules which leads to undesired or unexpected behavior.
A clear example of that is while the user accesses a variable that is not yet set on a given phase; Assuming that it was filled and had an empty value. Another example is while the user accesses a key that wasn't set on a collection, leading to always having an empty value. Those will be addressed soon on v3 error handler — blog post on its way.
Other examples were already fixed on v3 as part of the usability up taken:
Those may lead to unexpected behavior on v2 and silent fail.
On #2071 a subject that was discussed back on 2013 reappear in the surface [#267]. Should ModSecurity apply restrictions on the regex match? The implications are described at #2071.
To illustrate the problem we have: #56, #267, #1176, #1481, and #1290
The main objective of this issue is to incite a discussion on how to handle it well, focused on the regular user.
The text was updated successfully, but these errors were encountered: