Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How can we avoid ReDoS without trust on PCRE limits #2072

Closed
zimmerle opened this issue Apr 17, 2019 · 13 comments
Closed

How can we avoid ReDoS without trust on PCRE limits #2072

zimmerle opened this issue Apr 17, 2019 · 13 comments
Assignees
Labels
3.x Related to ModSecurity version 3.x

Comments

@zimmerle
Copy link
Contributor

zimmerle commented Apr 17, 2019

PCRE is one of the most popular regex libraries available out there. It is heavily used in ModSecurity although it may be optional on 3.1 where Hyperscan and RE2 are likely to be added to the soup as well.

ModSecurity assumes that the one who is written the rules are not an attacker, rather someone that is trying to protect the application. Unfortunately, some times, not-so-experienced users may have a problem writing the rules which leads to undesired or unexpected behavior.

A clear example of that is while the user accesses a variable that is not yet set on a given phase; Assuming that it was filled and had an empty value. Another example is while the user accesses a key that wasn't set on a collection, leading to always having an empty value. Those will be addressed soon on v3 error handler — blog post on its way.

Other examples were already fixed on v3 as part of the usability up taken:

Those may lead to unexpected behavior on v2 and silent fail.

On #2071 a subject that was discussed back on 2013 reappear in the surface [#267]. Should ModSecurity apply restrictions on the regex match? The implications are described at #2071.

To illustrate the problem we have: #56, #267, #1176, #1481, and #1290

The main objective of this issue is to incite a discussion on how to handle it well, focused on the regular user.

@zimmerle
Copy link
Contributor Author

I would also like to mention that @s0md3v was the one that brought our eyes to this very problem. Thank you @s0md3v and @airween for open the pull request #2071.

@s0md3v
Copy link

s0md3v commented Apr 17, 2019

There are two popular regex algorithms:

  • NFA
  • DFA

Key Differences

Handling alternatives

Regex: (b|c|m)at
String: cat

  • NFA favors first alternative. It starts with the first one and skips the other alternatives when a match is found and hence it will stop at cat. An ideal exploit string will have the last alternative to make sure the engine has to go through all the alternatives.
  • DFA tries to match all of them and then selects the longest match and hence it will try to match bat, cat and mat i.e. all possibilities.

Possessiveness

Regex: a(b)?(bc)?
String: abc

  • NFA is possessive, it doesn't want to give up what it has already matched. It sees that a matches, then it moves to b which also matches but then bc fails to match but that's okay because it's optional and hence the result is ab
    .
  • DFA goes for the longest match possible. After the a matches, it sees that both b and bc can be matched and hence it goes for the longest match possible i.e. abc

Exploitation strategies

  • NFA: Write a string that matches everything except the last token so it keeps backtracking.

  • DFA: Give it too many permutations to consider.

Since DFA will consider all the possibilities anyway, a bad regular expression will always eat up memory (not time) irrespective of the input.

NFA on the other hand, performs fine unless a specially crafted string is passed to the engine.

I am not sure what are your takeaways from this but I believe NFA is better, DFA doesn't even support lazy/greedy matching. Mod_Security currently uses a NFA one.

How to spot a bad regular expression?

Nested repetition operators

Regex: (?:/[dflr].*)*
This is a sub-pattern that I found in CRS. Notice how one repetition operator * is being used over another? That's bad.

Intersecting patterns (Remote)

Regex: .*(\d+)
As you might has guessed, it is looking for <anythng>(<more than one digits>).
Now the problem is, .* matches the strings matched by (, \d+ and ).

For a string, function(1234)
NFA will first match function(1234) with .* but (\d+) fails to match so it goes one step back and matches function(1235 but then the remaining ) fails to match (\d+) and keeps going backwards. See the problem?
.* should stop matching once an ( is encountered but it doesn't because NFA is greedy by default.

Intersecting patterns (Adjacent)

Regex: \d+.*
This is similar to the previous one but this time adjacent patterns are intersecting.

Intersecting patterns (Alternative)

Regex: (?:^[\"'`\\\\]*?[^\"'`]+[\"'`])+|(?:^[\"'`\\\\]*?[\d\"'`]+)+
This is another bad regular expression that I found in CRS.

Both alternate patterns start with ^[\"'`\\\\]*? which causes the regex engine to keep looking for both patterns and hence increasing the permutations.
In the second alternate pattern, the tokens [\"'`\\\\]*? and [\d\"'`]+ intersect and match ", ' and `.

On top of that, nested repetition operators are being used.

Conclusion

I do not support automation for checking if ReDOS exists. It's better if the maintainers can see if the rules do not have the properties above. You can also mention me on a PR to take a look, I am hyperactive on Github.

@airween
Copy link
Member

airween commented Apr 18, 2019

Hi @zimmerle

thanks for reply here too :).

The main objective of this issue is to incite a discussion on how to handle it well, focused on the regular user.

Do you think about for example check the regular expressions when ModSec starts, and notify user if there is a potential wrong syntax? Or what? Could you show some example?

@zimmerle zimmerle self-assigned this Apr 22, 2019
@zimmerle zimmerle added the 3.x Related to ModSecurity version 3.x label Jan 18, 2021
@xiangwang1
Copy link

Hi @zimmerle,

I am the Hyperscan maintainer and developer.

Hyperscan delivers high performance for both multiple literal matching and regex matching, so it could naturally replace Aho Corasick algorithm and PCRE to get performance boost.
We conduct performance comparisons in our paper at https://www.usenix.org/system/files/nsdi19-wang-xiang.pdf.

There is also a published whitepaper at https://www.hyperscan.io/2020/09/28/optimize-azure-cloud-security-with-intel-hyperscan/ for Microsoft Azure WAF that is now accelerated by Hyperscan.

Any plan or progress of adding Hyperscan as a regex matching option? We would like to collaborate if any help is needed.

Thanks,
Xiang

@zimmerle
Copy link
Contributor Author

zimmerle commented Feb 5, 2021

Hi @xiangwang1,

We definitely want to support Hyperscan 🎆.

To make the transition smoothly to our users, we want to have a compilation flag that allows the user to change the regular expression engine to whatever they feel comfortable with, including Hyperscan.

On issue #2012, the researcher @WGH- offered an additional implementation for RE2. One of the reasons we hold merging the pull request was that we didn't have this selection mechanism available yet.

The implementation for the support of Hyperscan as a regular expression engine should be considered easy in terms of effort. All regular expression usage inside ModSecurity is limited to the class Regex -

https://github.com/SpiderLabs/ModSecurity/blob/50fc347ed4d0852d54561e15559de07c698fbb16/src/utils/regex.h#L33-L91

Populating those methods with the correct interface for Hyperscan will do the job.

The Aho Corasick replacement could be used as the kernel for the @pm operator. It currently uses this Aho Corasick-ish implementation that we are more than welcome to replace with something more performant and/or use less memory.

@xiangwang1 do you think you can help us to fill the Regex class to use the Hyperscan ? 👷

@xiangwang1
Copy link

Hi @zimmerle,

Since current ModSecurity rules are written for PCRE support only, one problem with replacing PCRE with Hyperscan or RE2 is lack of full syntax support such as zero-width assertion, back reference, etc. Do you have any idea or plan to address this issue?

If we stick to PCRE syntax, then we can't achieve 100% replacement. Here is what we do for PCRE replacement in other systems like Snort and Suricata:

  1. Use Hyperscan prefiltering mode to transform natively unsupported rules to supported ones with a superset of matches.
  2. Use Hyperscan as the filter and only trigger PCRE match if there are any matches found by Hyperscan.

There are also good slides about this for Rspamd.

It's easy to replace Aho Corasick with Hyperscan as they share the same match behavior. This is the major performance boost we get in Snort and Suricata. In general, the more literal rules we have, the more performance boost and memory space saving from Hyperscan. We may think about this replacement as the first step. 😊

@airween
Copy link
Member

airween commented Feb 8, 2021

Just an idea for introduction of Hyperscan (and RE2 too) instead of replacing of PCRE.

It would be better to use a new operator (similar to rxGlobal): if somebody wants to use Hyperscan compatible expressions, then it would be the @rxHyper, for RE2 would be @re2.

Another solution would be that user can decide in compilation time (I mean ./configure --with-regex-engine=pcre|re2|hyperscan) which regex engine he/she wants to use.

The list of regex engines can be expanded with pcre2.

@dune73
Copy link
Member

dune73 commented Feb 8, 2021

@xiangwang1 : In the prefiltering mode is the construction of the superset of matches automatic, or is this pattern something that a user needs to provide?

If it's automatic, does the following work in practice:

  • PCRE specific pattern: prefiltering mode constructs superset, hyperscan is executed, superset match, pcre is invoked, original pattern match is executed, pcre match is being returned.

That would mean one can submit PCRE specific patterns and treat the engine(s) like a black box that returns pcre matches.

This would of course be very welcome to projects like the OWASP ModSecurity Core Rule Set that is being used in the Azure WAF as well, if I am not mistaken.

@WGH-
Copy link
Contributor

WGH- commented Feb 8, 2021

RE2 is largely compatible with PCRE. My pull request (#2012) is built on idea that the patterns that can't be compiled on RE2 can use PCRE as a fallback.

Also, some people apparently use CRS outside of ModSecurity, and contributed fixing the patterns to make them RE2-compatible, where it's possible.

Hyperscan, however, is a different beast. It shines when you match a set of multiple patterns against the same input. It would require some major changes in ModSecurity code to do that. At the first glance, you would have to group all rules that use regular expression operator against the same transformation/variable, and make a single regular expression "database" (that's what they call it) for it, and basically evaluate all correspondng operators in one go.

@zimmerle
Copy link
Contributor Author

zimmerle commented Feb 8, 2021

(...)

It's easy to replace Aho Corasick with Hyperscan as they share the same match behavior. This is the major performance boost we get in Snort and Suricata. In general, the more literal rules we have, the more performance boost and memory space saving from Hyperscan. We may think about this replacement as the first step.

Sounds like a plan 🚀 👍. That seems to be an easy improvement with a real benefit to our users. I think we can start to work on that one as the change is more stragithforward.

As a roadmap I would consider -

  • Having the hyperscan detected on the build system.
    • Make sure that hyperscan info is shown on the configure.
    • Having a definition set, such as: -DWITH_HYPERSCAN conditional to the existence of hyperscan.
  • Having an option --keep-old-acpm-tree to avoid the replacement of the old Aho, this may set: -DUSING_OLD_ACMP_TREES.
  • Having hyperscan as a replacement of Aho if USING_OLD_ACMP_TREES is not set.
  • Make sure this new compilation was added to the QA compilation matrix.
    • Make sure all test from QA are passing.
  • Have it ready for shipment on the selected version (TBD).
  • Blog post and media in coordination with the hyperscan community.

Idea for Configure summary

ModSecurity - v3.0.4-112-gfc75b5c9 for Linux
 
 Mandatory dependencies
   + libInjection                                  ....v3.9.2-46-gbfba51f
   + SecLang tests                                 ....a3d4405
 
 Optional dependencies
   + GeoIP/MaxMind                                 ....found 
      * (MaxMind) v
         /usr/lib//libmaxminddb.so, /usr/include, -DWITH_MAXMIND -I/usr/include
      * (GeoIP) v1.6.12
         -lGeoIP , -I/usr/include/ 
   + LibCURL                                       ....found v7.74.0 
      -lcurl,  -DWITH_CURL_SSLVERSION_TLSv1_2 -DWITH_CURL
   + YAJL                                          ....found v2.1.0
      -lyajl , -DWITH_YAJL -I/usr/include/yajl 
   + LMDB                                          ....disabled
   + LibXML2                                       ....found v2.9.10
      -lxml2 -lz -llzma -licui18n -licuuc -licudata -lm -ldl, -I/usr/include/libxml2 -DWITH_LIBXML2
   + SSDEEP                                        ....found 
      -lfuzzy -L/usr/lib/, -DWITH_SSDEEP -I/usr/include
   + LUA                                           ....found v503
      -llua5.3 -L/usr/lib/, -DWITH_LUA -DWITH_LUA_5_3 -I/usr/include/lua5.3
   + Hyperscan                                     ... found vX.Y.Z
       -lhyperscan, -DWITH_HYPERSCAN

 Other Options
   + Test Utilities                                ....enabled
   + SecDebugLog                                   ....enabled
   + afl fuzzer                                    ....disabled
   + library examples                              ....enabled
   + Building parser                               ....disabled
   + Treating pm operations as critical section    ....disabled
   + Using Hyperscan instead of Aho Corasick  (pm) ....enabled

Notice -

   + Hyperscan                                     ... found vX.Y.Z
       -lhyperscan, -DWITH_HYPERSCAN
   + Using Hyperscan instead of Aho Corasick  (pm) ....enabled

QA Compilation matrix

https://github.com/SpiderLabs/ModSecurity/blob/50fc347ed4d0852d54561e15559de07c698fbb16/.github/workflows/ci.yml#L15-L23

Currently ACMP implementation

https://github.com/SpiderLabs/ModSecurity/blob/50fc347ed4d0852d54561e15559de07c698fbb16/src/utils/acmp.h#L28-L195

Where the ACMP implementation is used?

Initialization

https://github.com/SpiderLabs/ModSecurity/blob/50fc347ed4d0852d54561e15559de07c698fbb16/src/operators/pm.cc#L116-L151

Run time evaluation

https://github.com/SpiderLabs/ModSecurity/blob/50fc347ed4d0852d54561e15559de07c698fbb16/src/operators/pm.cc#L84-L113

@zimmerle
Copy link
Contributor Author

zimmerle commented Feb 8, 2021

Regular expression engine replacement

RE2 is largely compatible with PCRE. My pull request (#2012) is built on idea that the patterns that can't be compiled on RE2 can use PCRE as a fallback.

Although I like @WGH-'s run time fallback option, I most say that it could be confusing while supporting n+2 engines. I would go for the compilation option. One option may include the combination of RE+RE2 as suggested on #2012.

Since current ModSecurity rules are written for PCRE support only, one problem with replacing PCRE with Hyperscan or RE2 is lack of full syntax support such as zero-width assertion, back reference, etc. Do you have any idea or plan to address this issue?

@xiangwang1 Compatibility is with older versions is important, however, we understand that sometimes - in order to evolve - we need to avoid legacy. Specially if there are some technical debts involved.

Having said that, as long as we keep the fallback option (whatever it is: run time and/or compilation time) and consider it as addition to the correct version, we should be fine to go. The main goal should be on performance and usability. To cite other projects approaches on that matter -

As mentioned by @WGH-

Also, some people apparently use CRS outside of ModSecurity, and contributed fixing the patterns to make them RE2-compatible, where it's possible.

In terms of CRS, Microsoft's waf team may want to share their experience (CC: @allanbomsft, @alonzop !?!?). But, I would be concerned about avoiding road blocks for the users on ModSecurity engine, eventually they update accordingly.

I am considering this change as an addition to v3.1, where significant changes are expected anyway.

The impact on the changes may be clear with the existence o a PoC. Is that something that we can consider @xiangwang1?

@xiangwang1
Copy link

The impact on the changes may be clear with the existence o a PoC. Is that something that we can consider @xiangwang1?

Yes, I think we can leverage the existing PoC as starting point.
I personally prefer the compilation option to let users can select the regex engine (PCRE, RE2 and Hyperscan) based on their configured rules and requirements (performance and usability v.s. syntax support).

@martinhsv
Copy link
Contributor

Closing this due to the recent merge of #2736 .

(The conversation in this item has covered some separate matters; individuals should feel free to open separate issues for these as desired.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.x Related to ModSecurity version 3.x
Projects
None yet
Development

No branches or pull requests

7 participants