
[ruby-core:85232] [Ruby trunk Bug#14418] ruby 2.5 slow regexp execution

From: jakub.wozny@...
Date: 2018-01-29 18:31:52 UTC
List: ruby-core #85232
Issue #14418 has been updated by jakub.wozny (Jakub Woźny).


OK, below is the regexp that I tested. I used UTF-8 encoding at the beginning:

~~~ ruby
"fußball "*20 =~ /([\S\s]{1000})/i
~~~

Some measurements:

~~~ ruby
(0..20).each { |n| puts Benchmark.measure { "fußball "*n =~ /^([\S\s]{1000})/i } }
  0.000000   0.000000   0.000000 (  0.000481)
  0.000000   0.000000   0.000000 (  0.000079)
  0.000000   0.000000   0.000000 (  0.000246)
  0.000000   0.000000   0.000000 (  0.000751)
  0.010000   0.000000   0.010000 (  0.002447)
  0.000000   0.000000   0.000000 (  0.006554)
  0.010000   0.000000   0.010000 (  0.007416)
  0.020000   0.000000   0.020000 (  0.022623)
  0.070000   0.000000   0.070000 (  0.066888)
  0.200000   0.000000   0.200000 (  0.196393)
  0.590000   0.000000   0.590000 (  0.591980)
  1.770000   0.000000   1.770000 (  1.772828)
  5.290000   0.010000   5.300000 (  5.292948)
 15.860000   0.000000  15.860000 ( 15.868370)
~~~


I would expect this code to run as fast as the version without the ```/i``` flag.

~~~ ruby
"fußball "*20 =~ /([\S\s]{1000})/

(0..20).each { |n| puts Benchmark.measure { "fußball "*n =~ /^([\S\s]{1000})/ } }
  0.000000   0.000000   0.000000 (  0.000036)
  0.000000   0.000000   0.000000 (  0.000009)
  0.000000   0.000000   0.000000 (  0.000011)
  0.000000   0.000000   0.000000 (  0.000016)
  0.000000   0.000000   0.000000 (  0.000018)
  0.000000   0.000000   0.000000 (  0.000029)
  0.000000   0.000000   0.000000 (  0.000020)
  0.000000   0.000000   0.000000 (  0.000021)
  0.000000   0.000000   0.000000 (  0.000023)
  0.000000   0.000000   0.000000 (  0.000024)
  0.000000   0.000000   0.000000 (  0.000016)
  0.000000   0.000000   0.000000 (  0.000027)
  0.000000   0.000000   0.000000 (  0.000022)
  0.000000   0.000000   0.000000 (  0.000023)
  0.000000   0.000000   0.000000 (  0.000024)
  0.000000   0.000000   0.000000 (  0.000023)
  0.000000   0.000000   0.000000 (  0.000024)
  0.000000   0.000000   0.000000 (  0.000026)
  0.000000   0.000000   0.000000 (  0.000025)
  0.000000   0.000000   0.000000 (  0.000026)
  0.000000   0.000000   0.000000 (  0.000053)
~~~
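
For anyone who wants to reproduce the comparison side by side, here is a minimal sketch. The helper name `match_seconds` is hypothetical (it is not part of Ruby or of the issue), and `n` is kept small so that the case-insensitive runs finish quickly. Note that `"fußball "*n` is at most a few dozen characters here, so `{1000}` can never match; what is being timed is how long the match takes to fail.

```ruby
require "benchmark"

# Hypothetical helper: wall-clock seconds for a single match attempt.
def match_seconds(string, regexp)
  Benchmark.realtime { string =~ regexp }
end

# Small n on purpose: the /i timings in the report grow quickly with n.
(0..6).each do |n|
  subject   = "fußball " * n
  with_i    = match_seconds(subject, /^([\S\s]{1000})/i)
  without_i = match_seconds(subject, /^([\S\s]{1000})/)
  puts format("n=%2d  /i: %.6fs  no flag: %.6fs", n, with_i, without_i)
end
```

The absolute numbers will depend on the Ruby version and the machine; only the relative difference between the two columns is of interest.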

Other test cases:

~~~ ruby
Benchmark.measure { "ß "*20 =~ /^([\S\s]{20})/i } # 0.000000   0.000000   0.000000 (  0.000431)
Benchmark.measure { "ß "*20 =~ /^([\S\s]{30})/i } # 0.000000   0.000000   0.000000 (  0.000427)
Benchmark.measure { "ß "*20 =~ /^([\S\s]{40})/i } # 0.000000   0.000000   0.000000 (  0.000430)
Benchmark.measure { "ß "*20 =~ /^([\S\s]{50})/i } # too long to wait

#without /i flag:
Benchmark.measure { "ß "*20 =~ /^([\S\s]{50})/ } #0.000000   0.000000   0.000000 (  0.000043)
~~~

I tested in other encodings:

~~~ ruby
Benchmark.measure{("fußball ".encode("ISO-8859-1"))*20 =~ /([\S\s]{1000})/i}.to_s # => "  3.450000   0.000000   3.450000 (  3.452036)\n"
~~~

With the other encoding, removing **/i** also speeds things up:

~~~ ruby
Benchmark.measure{("fußball ".encode("ISO-8859-1"))*20 =~ /([\S\s]{1000})/}.to_s #=> "  0.010000   0.000000   0.010000 (  0.000514)\n"
~~~
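
The same check can be run across encodings in one go. This is only a sketch: `encoded_match_seconds` is a hypothetical helper, and the repeat count is kept low so that the `/i` case returns quickly.

```ruby
require "benchmark"

# Hypothetical helper: transcode the text, then time one match attempt.
def encoded_match_seconds(text, encoding, regexp)
  subject = text.encode(encoding)
  Benchmark.realtime { subject =~ regexp }
end

text = "fußball " * 4
%w[UTF-8 ISO-8859-1].each do |encoding|
  with_i    = encoded_match_seconds(text, encoding, /([\S\s]{1000})/i)
  without_i = encoded_match_seconds(text, encoding, /([\S\s]{1000})/)
  puts format("%-10s  /i: %.6fs  no flag: %.6fs", encoding, with_i, without_i)
end
```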

> Reason I ask mostly is because I assume you output german text and
> the german umlauts are one huge reason for me to prefer ISO encoding
> (due to it being simpler for me to handle with it in a project, as
> opposed to Unicode variants).

I have a multilingual app, so I need to stay in Unicode.




----------------------------------------
Bug #14418: ruby 2.5 slow regexp execution
https://bugs.ruby-lang.org/issues/14418#change-69983

* Author: jakub.wozny (Jakub Woźny)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: 2.5
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
I have a simple regexp that performs very slowly.
~~~ ruby
"fußball "*20 =~ /^([\S\s]{1000})/i
~~~

It works fast if I remove the ```/i``` flag. I figured out that it also depends on the string length and on the quantifier value (in this case it is ```{1000}```).
When you remove ```ß``` from the string, it also works fast.

I tested on 2.3.1, 2.4.3 and 2.5.0.

I'm not sure whether it is a bug or it just works that way.



-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>
