[#115565] [Ruby master Feature#20034] [mkmf] Support creating a compilation database for C language tooling — "pounce (Calvin Lee) via ruby-core" <ruby-core@...>

Issue #20034 has been reported by pounce (Calvin Lee).

7 messages 2023/12/01

[#115595] [Ruby master Bug#20043] `defined?` checks for method existence but only sometimes — "tenderlovemaking (Aaron Patterson) via ruby-core" <ruby-core@...>

Issue #20043 has been reported by tenderlovemaking (Aaron Patterson).

10 messages 2023/12/05

[#115598] [Ruby master Bug#20044] Add runtime flag and environment variable for prism — "HParker (Adam Hess) via ruby-core" <ruby-core@...>

Issue #20044 has been reported by HParker (Adam Hess).

7 messages 2023/12/06

[#115647] [Ruby master Bug#20048] UDPSocket#remote_address spec errors — "vo.x (Vit Ondruch) via ruby-core" <ruby-core@...>

Issue #20048 has been reported by vo.x (Vit Ondruch).

9 messages 2023/12/07

[#115648] [Ruby master Feature#20049] Destructive drop_while for Array and Hash — "chucke (Tiago Cardoso) via ruby-core" <ruby-core@...>

Issue #20049 has been reported by chucke (Tiago Cardoso).

8 messages 2023/12/07

[#115649] [Ruby master Bug#20050] Segfault on Ruby 3.2.2 on x86_64 Darwin 20 (maybe in Array#hash) — "martinemde (Martin Emde) via ruby-core" <ruby-core@...>

Issue #20050 has been reported by martinemde (Martin Emde).

11 messages 2023/12/07

[#115671] [Ruby master Feature#20054] Replace the use of `def` in endless method definitions with a new sigil — "sawa (Tsuyoshi Sawada) via ruby-core" <ruby-core@...>

Issue #20054 has been reported by sawa (Tsuyoshi Sawada).

7 messages 2023/12/09

[#115682] [Ruby master Misc#20056] Dir#chdir inconsistency with Dir.chdir — "zverok (Victor Shepelev) via ruby-core" <ruby-core@...>

Issue #20056 has been reported by zverok (Victor Shepelev).

12 messages 2023/12/10

[#115684] [Ruby master Feature#20057] Change behaviour of rb_register_postponed_job for Ruby 3.3 — "kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core" <ruby-core@...>

Issue #20057 has been reported by kjtsanaktsidis (KJ Tsanaktsidis).

8 messages 2023/12/11

[#115688] [Ruby master Bug#20058] `warning: bigdecimal/util is found in bigdecimal` even if the gem spec has the `add_dependency "bigdecimal"` entry — "yahonda (Yasuo Honda) via ruby-core" <ruby-core@...>

Issue #20058 has been reported by yahonda (Yasuo Honda).

10 messages 2023/12/11

[#115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats — "jeremyevans0 (Jeremy Evans) via ruby-core" <ruby-core@...>

Issue #20066 has been reported by jeremyevans0 (Jeremy Evans).

19 messages 2023/12/15

[#115764] [Ruby master Feature#20069] Buffer class in stdlib — "pynix (Pynix wang) via ruby-core" <ruby-core@...>

Issue #20069 has been reported by pynix (Pynix wang).

9 messages 2023/12/16

[#115830] [Ruby master Misc#20075] DevMeeting-2024-01-17 — "mame (Yusuke Endoh) via ruby-core" <ruby-core@...>

Issue #20075 has been reported by mame (Yusuke Endoh).

9 messages 2023/12/21

[#115831] [Ruby master Bug#20076] M:N scheduler crashes on macOS with RUBY_MN_THREADS=1 — "hsbt (Hiroshi SHIBATA) via ruby-core" <ruby-core@...>

Issue #20076 has been reported by hsbt (Hiroshi SHIBATA).

7 messages 2023/12/21

[#115847] [Ruby master Bug#20079] alexandria testsuite began to segfault recently — "mtasaka (Mamoru TASAKA) via ruby-core" <ruby-core@...>

Issue #20079 has been reported by mtasaka (Mamoru TASAKA).

15 messages 2023/12/22

[#115864] [Ruby master Feature#20080] Implement #begin_and_end method on Range — "stuyam (Stuart Yamartino) via ruby-core" <ruby-core@...>

Issue #20080 has been reported by stuyam (Stuart Yamartino).

17 messages 2023/12/22

[#115892] [Ruby master Bug#20085] Fiber.new{ }.resume causes Segmentation fault for Ruby 3.3.0 on aarch64-linux — "oleksii (Oleksii Leonov) via ruby-core" <ruby-core@...>

Issue #20085 has been reported by oleksii (Oleksii Leonov).

27 messages 2023/12/25

[#115912] [Ruby master Bug#20090] Anonymous arguments are now syntax errors in unambiguous cases — "willcosgrove (Will Cosgrove) via ruby-core" <ruby-core@...>

Issue #20090 has been reported by willcosgrove (Will Cosgrove).

8 messages 2023/12/26

[#115919] [Ruby master Feature#20093] Syntax or keyword to reopen existing classs/modules, never to define new classs/modules — "tagomoris (Satoshi Tagomori) via ruby-core" <ruby-core@...>

Issue #20093 has been reported by tagomoris (Satoshi Tagomori).

11 messages 2023/12/27

[#115923] [Ruby master Bug#20094] Inline while loop behavior changed unexpectedly in 3.3.0 — "sisyphus_cg (Sisyphus CG) via ruby-core" <ruby-core@...>

Issue #20094 has been reported by sisyphus_cg (Sisyphus CG).

12 messages 2023/12/27

[#115925] [Ruby master Bug#20096] Ruby 3.2.2 win32/registry: Junk appended to Windows Registry String Value — "jay4rubydev (Jay M) via ruby-core" <ruby-core@...>

SXNzdWUgIzIwMDk2IGhhcyBiZWVuIHJlcG9ydGVkIGJ5IGpheTRydWJ5ZGV2IChKYXkgTSkuDQ0K

8 messages 2023/12/27

[ruby-core:115828] [Ruby master Feature#20031] Regexp using greedy quantifier and unions on a big string uses a lot of memory

From: "mame (Yusuke Endoh) via ruby-core" <ruby-core@...>
Date: 2023-12-21 02:20:48 UTC
List: ruby-core #115828
Issue #20031 has been updated by mame (Yusuke Endoh).

Tracker changed from Bug to Feature
Status changed from Open to Feedback
ruby -v deleted (ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux])
Backport deleted (3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN)

This is not a bug. If anyone creates a patch to improve this, we may consider it. We could reduce the stack size consumed, or add a `/.*(foo|bar)/` pattern-specific optimization, if this pattern is frequent.

----------------------------------------
Feature #20031: Regexp using greedy quantifier and unions on a big string uses a lot of memory
https://bugs.ruby-lang.org/issues/20031#change-105781

* Author: FabienChaynes (Fabien Chaynes)
* Status: Feedback
* Priority: Normal
----------------------------------------
Trying to match on some regexp using a greedy quantifier on any character (`.*`) and unions (`foo|bar`) uses a lot of memory if the string we want to match on is big.

# Reproduction code

```ruby
puts RUBY_DESCRIPTION

unless File.exist?("/tmp/fake_email.eml")
  content = "Accept-Language: fr-FR, en-US#{"0" * 50_000_000}"
  File.write("/tmp/fake_email.eml", content)
end

content = File.read("/tmp/fake_email.eml")

def print_size(context, sike_kb)
  puts "#{context}: #{(sike_kb / 1024.0).round(1)} MB"
end

print_size("baseline", `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`.strip.split.map(&:to_i).last)

print_size("content", content.bytesize / 1024)

REGEXP_UNION = Regexp.union(
  "Accept-Language:",
  "Thread-Topic:",
).freeze
reg = /.*#{REGEXP_UNION}/
p reg

GC.start
print_size("before match?", `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`.strip.split.map(&:to_i).last)
allocated = GC.stat(:total_allocated_objects)

p content.match(reg)

p GC.stat(:total_allocated_objects) - allocated
GC.start
print_size("after match?", `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`.strip.split.map(&:to_i).last)
p "---"
```

Output:
```bash
$ /usr/bin/time -v ruby full_repro.rb
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
baseline: 71.3 MB
content: 47.7 MB
/.*(?-mix:Accept\-Language:|Thread\-Topic:)/
before match?: 71.3 MB
#<MatchData "Accept-Language:">
14
after match?: 71.6 MB
"---"
  Command being timed: "ruby full_repro.rb"
  User time (seconds): 1.15
  System time (seconds): 0.74
  Percent of CPU this job got: 100%
  Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.89
  Average shared text size (kbytes): 0
  Average unshared data size (kbytes): 0
  Average stack size (kbytes): 0
  Average total size (kbytes): 0
  Maximum resident set size (kbytes): 2423108
  Average resident set size (kbytes): 0
  Major (requiring I/O) page faults: 0
  Minor (reclaiming a frame) page faults: 609575
  Voluntary context switches: 52
  Involuntary context switches: 12
  Swaps: 0
  File system inputs: 0
  File system outputs: 0
  Socket messages sent: 0
  Socket messages received: 0
  Signals delivered: 0
  Page size (bytes): 4096
  Exit status: 0
```

strace:
```bash
$ sudo strace -p 588834
strace: Process 588834 attached
[...]
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8e92dd3000
mremap(0x7f8e92dd3000, 249856, 495616, MREMAP_MAYMOVE) = 0x7f8e92d5a000
mremap(0x7f8e92d5a000, 495616, 987136, MREMAP_MAYMOVE) = 0x7f8e92c69000
mremap(0x7f8e92c69000, 987136, 1970176, MREMAP_MAYMOVE) = 0x7f8e92a88000
mremap(0x7f8e92a88000, 1970176, 3936256, MREMAP_MAYMOVE) = 0x7f8e926c7000
mremap(0x7f8e926c7000, 3936256, 7868416, MREMAP_MAYMOVE) = 0x7f8e91f46000
mremap(0x7f8e91f46000, 7868416, 15732736, MREMAP_MAYMOVE) = 0x7f8e91045000
mremap(0x7f8e91045000, 15732736, 31461376, MREMAP_MAYMOVE) = 0x7f8e8a8af000
mremap(0x7f8e8a8af000, 31461376, 62918656, MREMAP_MAYMOVE) = 0x7f8e86cae000
mremap(0x7f8e86cae000, 62918656, 125833216, MREMAP_MAYMOVE) = 0x7f8e7f4ad000
mremap(0x7f8e7f4ad000, 125833216, 251662336, MREMAP_MAYMOVE) = 0x7f8e704ac000
mremap(0x7f8e704ac000, 251662336, 503320576, MREMAP_MAYMOVE) = 0x7f8e524ab000
mremap(0x7f8e524ab000, 503320576, 1006637056, MREMAP_MAYMOVE) = 0x7f8e164aa000
mremap(0x7f8e164aa000, 1006637056, 2013270016, MREMAP_MAYMOVE) = 0x7f8d9e4a9000
mremap(0x7f8d9e4a9000, 2013270016, 4026535936, MREMAP_MAYMOVE) = 0x7f8cae4a8000
mmap(NULL, 12500992, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8e92224000
munmap(0x7f8cae4a8000, 4026535936)      = 0
munmap(0x7f8e92224000, 12500992)        = 0
writev(1, [{iov_base="#<MatchData \"Accept-Language:\">", iov_len=31}, {iov_base="\n", iov_len=1}], 2) = 32
[...]
```

We can see that the memory retained by Ruby is low (71.6 MB), but `content.match(reg)` used 2_423_108 kbytes (~2.4 GB) in the regex engine.

# Findings

To reproduce we need:
- a greedy quantifier on any character (`.*`)
- a union with at least two members (`foo|bar`). More members won't increase the memory consumption further
- to perform the match on a long string

The memory seems to increase linearly against the string size:
|String size |Memory consumed |
|--|--|
| 11.9 MB | 629_988 kbytes |
| 23.8 MB | 1_235_976 kbytes |
| 47.7 MB | 2_423_076 kbytes |

I was able to reproduce it on old Ruby versions as well.

@byroot helped me to put all this together and according to his investigation the memory allocation would be coming from `regexec.c:4100` (https://github.com/ruby/ruby/blob/85092ecd6f5c4d12d0cb1d6dfa7040337a4f558b/regexec.c#L4100):
```
size_t match_cache_buf_length = (num_match_cache_points >> 3) + (num_match_cache_points & 7 ? 1 : 0) + 1;
uint8_t* match_cache_buf = (uint8_t*)xmalloc(match_cache_buf_length * sizeof(uint8_t));
```

It seems surprising to use that much memory compared to the string size. Is it expected?



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

In This Thread

Prev Next