From: shyouhei@... Date: 2014-04-03T02:42:17+00:00 Subject: [ruby-core:61816] [ruby-trunk - Bug #9694] Bad regexp hangs ruby Issue #9694 has been updated by Shyouhei Urabe. Ruby's regexp engine is NP-complete. It's ultimately impossible to guarantee regexp matches to run fast (if you don't think so please send us a proof). It might be possible to warn your specific bad regexp, but in general it's also impossible to tell which regexp is bad and which isn't. That's the Halintg problem http://en.wikipedia.org/wiki/Halting_problem . So, in short, there is (at least believed to be) no ultimate solution. All what might be possible is to find a reasonable compromise. For instance python does not detect that shorter one: ~~~python zsh % python Python 2.7.5+ (default, Feb 27 2014, 19:37:08) [GCC 4.8.1] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import re >>> re.compile(r'(\w*)*$') <_sre.SRE_Pattern object at 0x1dee030> >>> ~~~ ---------------------------------------- Bug #9694: Bad regexp hangs ruby https://bugs.ruby-lang.org/issues/9694#change-46048 * Author: Nikolay Markov * Status: Open * Priority: Normal * Assignee: * Category: regexp * Target version: * ruby -v: ruby 2.1.1p76 (2014-02-24 revision 45161) [x86_64-darwin13.0] * Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN ---------------------------------------- Here is an extracted problem i ran into recently: ~~~ $ cat test.rb str = ('a' * ARGV[0].to_i) + '?' re = /(\w*)*$/ re.match(str) ~~~ On few chars match returns quite fast, but here's what happens on 14 'a'-s and up: ~~~ $ time RBENV_VERSION=2.1.1 ruby test.rb 14 real 1.392 user 1.364 sys 0.026 pcpu 99.83 $ time RBENV_VERSION=2.1.1 ruby test.rb 15 real 3.979 user 3.949 sys 0.026 pcpu 99.89 $ time RBENV_VERSION=2.1.1 ruby test.rb 16 real 11.995 user 11.954 sys 0.031 pcpu 99.92 ~~~ Ruby versions 1.9.3 and 2.0 behave similarly. I ran into the problem, because one of my colleagues copy-pasted this regexp to test url's somewhere from stackoverflow: /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-:]*)*\/?$/ I know the regexp is useless, however i think it's still a problem if a bad regexp can hang ruby. Python (2.7) says that this regexp is bad and does not compile it. Perl matches without any performance issues -- https://bugs.ruby-lang.org/