From: fatkodima123@... Date: 2020-08-25T22:09:46+00:00 Subject: [ruby-core:99690] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp Issue #17030 has been updated by fatkodima (Dima Fatko). I have implemented a simple PoC - https://github.com/ruby/ruby/pull/3455. I got the following results. ## Enumerable#grep ```ruby ARR = %w[foobar foobaz bazquux hello world just making this array longer] REGEXP = /o/ FAST_REGEXP = /o/f Benchmark.ips do |x| x.report("select.match?") { ARR.select { |e| e.match?(REGEXP) } } x.report("grep") { ARR.grep(REGEXP) } x.report("fast_grep") { ARR.grep(FAST_REGEXP) } x.compare! end puts "********* MEMORY *********\n" Benchmark.memory do |x| x.report("select.match?") { ARR.select { |e| e.match?(REGEXP) } } x.report("grep") { ARR.grep(REGEXP) } x.report("fast_grep") { ARR.grep(FAST_REGEXP) } x.compare! end ``` ``` Warming up -------------------------------------- select.match? 57.956k i/100ms grep 22.715k i/100ms fast_grep 59.434k i/100ms Calculating ------------------------------------- select.match? 580.339k (� 0.5%) i/s - 2.956M in 5.093260s grep 225.854k (� 0.6%) i/s - 1.136M in 5.028890s fast_grep 532.658k (� 9.0%) i/s - 2.675M in 5.067008s Comparison: select.match?: 580338.8 i/s fast_grep: 532658.1 i/s - same-ish: difference falls within error grep: 225853.7 i/s - 2.57x (� 0.00) slower ********* MEMORY ********* Calculating ------------------------------------- select.match? 120.000 memsize ( 0.000 retained) 1.000 objects ( 0.000 retained) 0.000 strings ( 0.000 retained) grep 536.000 memsize ( 168.000 retained) 3.000 objects ( 1.000 retained) 0.000 strings ( 0.000 retained) fast_grep 200.000 memsize ( 0.000 retained) 1.000 objects ( 0.000 retained) 0.000 strings ( 0.000 retained) Comparison: select.match?: 120 allocated fast_grep: 200 allocated - 1.67x more grep: 536 allocated - 4.47x more ``` ## case-when ```ruby REGEXP = /z/ FAST_REGEXP = /z/f def case_when(str) case str when REGEXP true end end def fast_case_when(str) case str when FAST_REGEXP true end end STR = 'foobarbaz' Benchmark.ips do |x| x.report("case_when") { case_when(STR) } x.report("fast_case_when") { fast_case_when(STR) } x.compare! end puts "********* MEMORY *********\n" Benchmark.memory do |x| x.report("case_when") { case_when(STR) } x.report("fast_case_when") { fast_case_when(STR) } x.compare! end ``` ``` Warming up -------------------------------------- case_when 95.463k i/100ms fast_case_when 456.981k i/100ms Calculating ------------------------------------- case_when 964.438k (� 0.8%) i/s - 4.869M in 5.048469s fast_case_when 4.571M (� 0.6%) i/s - 23.306M in 5.098414s Comparison: fast_case_when: 4571379.8 i/s case_when: 964438.3 i/s - 4.74x (� 0.00) slower ********* MEMORY ********* Calculating ------------------------------------- case_when 168.000 memsize ( 0.000 retained) 1.000 objects ( 0.000 retained) 0.000 strings ( 0.000 retained) fast_case_when 0.000 memsize ( 0.000 retained) 0.000 objects ( 0.000 retained) 0.000 strings ( 0.000 retained) Comparison: fast_case_when: 0 allocated case_when: 168 allocated - Infx more ``` ## Enumerable#any? ```ruby REGEXP = /longer/ FAST_REGEXP = /longer/f ARR = %w[foobar foobaz bazquux hello world just making this array longer] Benchmark.ips do |x| x.report("any?") { ARR.any?(REGEXP) } x.report("fast_any?") { ARR.any?(FAST_REGEXP) } x.compare! end puts "********* MEMORY *********\n" Benchmark.memory do |x| x.report("any?") { ARR.any?(REGEXP) } x.report("fast_any?") { ARR.any?(FAST_REGEXP) } x.compare! end ``` ``` Warming up -------------------------------------- any? 25.840k i/100ms fast_any? 95.381k i/100ms Calculating ------------------------------------- any? 261.095k (� 1.0%) i/s - 1.318M in 5.047859s fast_any? 893.676k (�13.2%) i/s - 4.388M in 5.070820s Comparison: fast_any?: 893675.9 i/s any?: 261095.0 i/s - 3.42x (� 0.00) slower ********* MEMORY ********* Calculating ------------------------------------- any? 168.000 memsize ( 168.000 retained) 1.000 objects ( 1.000 retained) 0.000 strings ( 0.000 retained) fast_any? 0.000 memsize ( 0.000 retained) 0.000 objects ( 0.000 retained) 0.000 strings ( 0.000 retained) Comparison: fast_any?: 0 allocated any?: 168 allocated - Infx more ``` If that seems OK, I will update and finish my PR with tests/docs/etc. ---------------------------------------- Bug #17030: Enumerable#grep{_v} should be optimized for Regexp https://bugs.ruby-lang.org/issues/17030#change-87181 * Author: marcandre (Marc-Andre Lafortune) * Status: Open * Priority: Normal * Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN ---------------------------------------- Currently: ```ruby array.select { |e| e.match?(REGEXP) } # about 3x faster and 6x more memory efficient than array.grep(REGEXP) ``` This is because `grep` calls `Regexp#===` which creates useless `MatchData` -- https://bugs.ruby-lang.org/ Unsubscribe: