[ruby-core:113850] [Ruby master Feature#19720] Warning for non-linear Regexps
From:
"Eregon (Benoit Daloze) via ruby-core" <ruby-core@...>
Date:
2023-06-09 10:17:57 UTC
List:
ruby-core #113850
Issue #19720 has been updated by Eregon (Benoit Daloze).
duerst (Martin D=FCrst) wrote in #note-5:
> Introducing such a warning might be a good idea. But there are several is=
sues:
Thanks for the feedback.
> 1) The warning should only be used when asked for with an option (i.e. de=
fault off).
Yes, as I mentioned in the description, it would be opt-in.
> 2) To a very large extent, whether a regular expression is linear or not =
depends on the implementation. (The recent improvements to the CRuby implem=
entation show that very clearly). Does that mean that different implementat=
ions would warn for different regular expressions?
Yes it can differ, although the restrictions for linear engines are fairly =
similar, see https://bugs.ruby-lang.org/issues/19104#note-3.
> 3) In some cases, it may not be possible to conclusively say whether a re=
gular expression will run in linear time or not. The proposed warning text =
makes this clear with the word "might".
> 4) Non-linear can be quadratic, cubic, or exponential, and so on. A quadr=
atic case on data with limited length (e.g. out of a database with fixed fi=
eld lengths) might be absolutely harmless. Even an exponential case on very=
short data can be harmless.
In general, anything non-linear with user input is highly susceptible to Do=
S. Even quadratic is often enough to be problematic.
> 5) In many cases, the only person who would do a DoS attack would be the =
programmer him/herself.
There were about a dozen ReDoS CVEs in the last 1-2 years. I don't think th=
e reporters were the gem maintainers.
I would think in some cases it was reported by people who actually got a Re=
DoS in production (it seems the typical way these things get noticed).
> 6) Overall, careful design and implementation is needed to make sure that=
this doesn't become a "crying wolf" warning that quickly gets deactivated =
and then no longer helps anybody.
I think any company or individual which cares about security should take th=
ese warnings seriously.
We need to make them actionable though so they get resolved.
At least we should address the non-linear Regexps in default and bundled ge=
ms, as well as in RubyGems.
For the RubyGems Regexps above, the issue is the atomic groups.
One idea would be to check if the Regexp without atomic groups is `Regexp.l=
inear_time?` and if so use it, that seems easy.
Another idea would be a way to specify atomic groups for some Regexp(s) can=
be ignored for semantics and are only meant for performance (seems almost =
always the case). That would then use the linear engine if the Regexp with =
atomic groups changed to non-capturing groups can run on that linear engine=
, otherwise use the original regexp with atomic groups and warn.
----------------------------------------
Feature #19720: Warning for non-linear Regexps
https://bugs.ruby-lang.org/issues/19720#change-103499
* Author: Eregon (Benoit Daloze)
* Status: Open
* Priority: Normal
----------------------------------------
I believe the best way to solve ReDoS is to ensure all Regexps used in the =
process are linear.
Using `Regexp.timeout =3D 5.0` or so does not really prevent ReDoS, given e=
nough requests causing that timeout the servers will still be very unrespon=
sive.
To this purpose, we should make it easy to identify non-linear Regexps and =
fix them.
I suggest we either use
1. a performance warning (enabled with `Warning[:performance] =3D true`, #1=
9538) or
2. a new regexp warning category (enabled with `Warning[:regexp] =3D true`).
I think we should warn only once per non-linear Regexp, to avoid too many s=
uch warnings.
We could warn as soon as the Regexp is created, or on first match.
On first match might makes more sense for Ruby implementations which compil=
e the Regexp lazily (since that is costly during startup), and also avoids =
warning for Regexps which are never used (which can be good or bad).
OTOH, if the warning is enabled, we could always compile the Regexp eagerly=
(or at least checks whether it's linear), and that would then provide a be=
tter way to guarantee that all Regexps created so far are linear.
Because warnings are easily customizable, it is also possible to e.g. `rais=
e/abort` on such a warning, if one wants to ensure their application does n=
ot use a non-linear Regexp and so cannot be vulnerable to ReDoS:
```ruby
Warning.extend Module.new {
def warn(message, category: nil, **)
raise message if category =3D=3D :regexp
super
end
}
```
A regexp warning category seems better for that as it makes it easy to filt=
er by category, if a performance warning one would need to match the messag=
e which is less clean.
As a note, TruffleRuby already has a similar warning, as a command-line opt=
ion:
```
$ truffleruby --experimental-options --warn-truffle-regex-compile-fallback =
-e 'Gem'
truffleruby-dev/lib/mri/rubygems/version.rb:176: warning: Regexp /\A\s*([0-=
9]+(?>\.[0-9a-zA-Z]+)*(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?)?\s*\z/ at_start=
=3Dfalse encoding=3DUS-ASCII requires backtracking and will not match in li=
near time
truffleruby-dev/lib/mri/rubygems/requirement.rb:105: warning: Regexp /\A\s*=
(=3D|!=3D|>|<|>=3D|<=3D|~>)?\s*([0-9]+(?>\.[0-9a-zA-Z]+)*(-[0-9A-Za-z-]+(\.=
[0-9A-Za-z-]+)*)?)\s*\z/ at_start=3Dfalse encoding=3DUS-ASCII requires back=
tracking and will not match in linear time
```
So the warning message could be like
`FILE:LINE: warning: Regexp /REGEXP/ requires backtracking and might not ma=
tch in linear time and might cause ReDoS`
or more concise:
`FILE:LINE: warning: Regexp /REGEXP/ requires backtracking and might cause =
ReDoS`
--=20
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-c=
ore.ml.ruby-lang.org/