From: jakub.wozny@...
Date: 2018-01-29T18:31:52+00:00
Subject: [ruby-core:85232] [Ruby trunk Bug#14418] ruby 2.5 slow regexp execution

Issue #14418 has been updated by jakub.wozny (Jakub Woźny).

OK, below is the regexp that I tested. I used UTF-8 encoding at the beginning:

~~~ ruby
"fußball "*20 =~ /([\S\s]{1000})/i
~~~

Some measurements:

~~~ ruby
(0..20).each { |n| puts Benchmark.measure { "fußball "*n =~ /^([\S\s]{1000})/i } }
0.000000 0.000000 0.000000 ( 0.000481)
0.000000 0.000000 0.000000 ( 0.000079)
0.000000 0.000000 0.000000 ( 0.000246)
0.000000 0.000000 0.000000 ( 0.000751)
0.010000 0.000000 0.010000 ( 0.002447)
0.000000 0.000000 0.000000 ( 0.006554)
0.010000 0.000000 0.010000 ( 0.007416)
0.020000 0.000000 0.020000 ( 0.022623)
0.070000 0.000000 0.070000 ( 0.066888)
0.200000 0.000000 0.200000 ( 0.196393)
0.590000 0.000000 0.590000 ( 0.591980)
1.770000 0.000000 1.770000 ( 1.772828)
5.290000 0.010000 5.300000 ( 5.292948)
15.860000 0.000000 15.860000 ( 15.868370)
~~~

I would expect this code to run as fast as the version without the `/i` flag.
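
A rough way to reproduce the comparison side by side is sketched below. The helper `time_match` and the variable names are illustrative and not part of the report; `n` is kept small because the `/i` timings above grow roughly threefold per step.

```ruby
require "benchmark"

# Hypothetical helper (not from the report): wall-clock seconds for one match attempt.
def time_match(subject, pattern)
  Benchmark.realtime { subject =~ pattern }
end

plain_re  = /^([\S\s]{1000})/   # case-sensitive variant
folded_re = /^([\S\s]{1000})/i  # same pattern with the /i flag

# "\u00DF" is the sharp s from the report, written as an escape so the
# script does not depend on the source file's encoding.
rows = (0..6).map do |n|
  subject = "fu\u00DFball " * n
  [n, time_match(subject, plain_re), time_match(subject, folded_re)]
end

rows.each do |n, plain, folded|
  puts format("n=%-2d plain=%.6fs with_i=%.6fs", n, plain, folded)
end
```

Raising the upper bound of the range should make the gap between the two columns more visible, at the cost of a much longer run on the affected versions.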

Without the flag:

~~~ ruby
"fußball "*20 =~ /([\S\s]{1000})/
(0..20).each { |n| puts Benchmark.measure { "fußball "*n =~ /^([\S\s]{1000})/ } }
0.000000 0.000000 0.000000 ( 0.000036)
0.000000 0.000000 0.000000 ( 0.000009)
0.000000 0.000000 0.000000 ( 0.000011)
0.000000 0.000000 0.000000 ( 0.000016)
0.000000 0.000000 0.000000 ( 0.000018)
0.000000 0.000000 0.000000 ( 0.000029)
0.000000 0.000000 0.000000 ( 0.000020)
0.000000 0.000000 0.000000 ( 0.000021)
0.000000 0.000000 0.000000 ( 0.000023)
0.000000 0.000000 0.000000 ( 0.000024)
0.000000 0.000000 0.000000 ( 0.000016)
0.000000 0.000000 0.000000 ( 0.000027)
0.000000 0.000000 0.000000 ( 0.000022)
0.000000 0.000000 0.000000 ( 0.000023)
0.000000 0.000000 0.000000 ( 0.000024)
0.000000 0.000000 0.000000 ( 0.000023)
0.000000 0.000000 0.000000 ( 0.000024)
0.000000 0.000000 0.000000 ( 0.000026)
0.000000 0.000000 0.000000 ( 0.000025)
0.000000 0.000000 0.000000 ( 0.000026)
0.000000 0.000000 0.000000 ( 0.000053)
~~~

Other test cases:

~~~ ruby
Benchmark.measure { "ß "*20 =~ /^([\S\s]{20})/i } # 0.000000 0.000000 0.000000 ( 0.000431)
Benchmark.measure { "ß "*20 =~ /^([\S\s]{30})/i } # 0.000000 0.000000 0.000000 ( 0.000427)
Benchmark.measure { "ß "*20 =~ /^([\S\s]{40})/i } # 0.000000 0.000000 0.000000 ( 0.000430)
Benchmark.measure { "ß "*20 =~ /^([\S\s]{50})/i } # too long to wait

# without /i flag:
Benchmark.measure { "ß "*20 =~ /^([\S\s]{50})/ } # 0.000000 0.000000 0.000000 ( 0.000043)
~~~

I also tested in another encoding:

~~~ ruby
Benchmark.measure{("fußball ".encode("ISO-8859-1"))*20 =~ /([\S\s]{1000})/i}.to_s
# => " 3.450000 0.000000 3.450000 ( 3.452036)\n"
~~~

With the other encoding, removing **/i** also speeds it up:

~~~ ruby
Benchmark.measure{("fußball ".encode("ISO-8859-1"))*20 =~ /([\S\s]{1000})/}.to_s
# => " 0.010000 0.000000 0.010000 ( 0.000514)\n"
~~~

> Reason I ask mostly is because I assume you output german text and the german umlauts are one huge reason for me to prefer ISO encoding (due to it being simpler for me to handle with
> it in a project, as opposed to Unicode variants).

I have a multilingual app, so I need to stay in Unicode.

----------------------------------------
Bug #14418: ruby 2.5 slow regexp execution
https://bugs.ruby-lang.org/issues/14418#change-69983

* Author: jakub.wozny (Jakub Woźny)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: 2.5
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
I have a simple regexp that performs very slowly.

~~~ ruby
"fußball "*20 =~ /^([\S\s]{1000})/i
~~~

It runs fast if I remove the `/i` flag. I figured out that it also depends on the string length and on the quantifier value (in this case `{1000}`). When you remove `ß` from the string, it also runs fast.

I tested on 2.3.1, 2.4.3 and 2.5.0. I'm not sure whether it is a bug or it just works that way.

--
https://bugs.ruby-lang.org/

Unsubscribe: