From: "vzdor (Vladimir Zdorovenco) via ruby-core" <ruby-core@...>
Date: 2023-01-21T16:36:57+00:00
Subject: [ruby-core:111952] [Ruby master Bug#19361] String#[Integer] is orders slower for strings with some UTF characters

Issue #19361 has been reported by vzdor (Vladimir Zdorovenco).

----------------------------------------
Bug #19361: String#[Integer] is orders slower for strings with some UTF characters
https://bugs.ruby-lang.org/issues/19361

* Author: vzdor (Vladimir Zdorovenco)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux-gnu]
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
For such strings, String#[] is slower not only than it is for ASCII-only strings, but also than iterating with #each_char.

seq1
```
# s = '*' * 10e4
s = '��' * 10e4
count = 0
size = s.size
while count < size
  s[count]
  count += 1
end
```

seq2
```
ss = '��' * 10e4
s = ss.chars
count = 0
size = s.size
while count < size
  s[count]
  count += 1
end
```

On my computer seq1 runs in 11 seconds and seq2 in 0.5 seconds. The same
happens with the '���' symbol, and I'm sure it is not limited to those symbols.

I would not have expected seq1 to be slower, since I never call s[n] more
than once for any n.
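
To compare the variants side by side, here is a minimal timing sketch. It makes several assumptions: the `index_all` helper and the labels are hypothetical names introduced here, `"\u00e9"` ('é') is only a stand-in for the characters used in seq1 and seq2, and `n` is smaller than the 10e4 above so the run stays short. Absolute numbers will differ between machines and Ruby versions.

```ruby
require 'benchmark'

n = 20_000                 # assumption: smaller than 10e4 to keep the run short
ascii = '*' * n
multibyte = "\u00e9" * n   # assumption: 'é' stands in for the reported characters

# Hypothetical helper: index every position of an indexable object,
# the same way the while loops in seq1 and seq2 do.
def index_all(indexable)
  count = 0
  size = indexable.size
  while count < size
    indexable[count]
    count += 1
  end
  count
end

# Wall-clock seconds for each variant.
results = {
  'ascii String#[]'     => Benchmark.realtime { index_all(ascii) },
  'multibyte String#[]' => Benchmark.realtime { index_all(multibyte) },
  'multibyte chars[]'   => Benchmark.realtime { index_all(multibyte.chars) },
  'multibyte each_char' => Benchmark.realtime { multibyte.each_char { |c| c } },
}

results.each { |label, secs| puts format('%-22s %.4f s', label, secs) }
```

If the 'multibyte String#[]' row grows much faster than the others as `n` increases, one plausible explanation (an assumption, not something confirmed in this report) is that UTF-8 is a variable-width encoding, so finding character n in a non-ASCII string may require scanning from the start of the string on every call.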

It is a Debian package with some patches, but they do not touch string.c.

$ locale
LANG=en_US.UTF-8



