From: Charles Nutter
Date: 2011-01-23T03:42:20+09:00
Subject: [ruby-core:34793] [Ruby 1.9-Bug#4298] Duration of calling String#[] with the same index is strangely related to string length.

Issue #4298 has been updated by Charles Nutter.

Perhaps I'm mistaken, but isn't RSTRING_LEN the byte length of the string? Isn't that always guaranteed to be >= str_strlen, which is the character length? And doesn't this patch break completely if beg is greater than the character length but not greater than the raw byte length?

It seems to me that the problem is more complicated. For many encodings you must either save the new character length each time you mutate the string (doing str_strlen on every mutation) or recalculate it as needed (such as in rb_str_substr). It seems ruby-core has chosen the latter, and the same behavior exists in JRuby (since we have tried to mimic MRI's behavior very closely).

This is a peril of using variable-length encodings like UTF-8. str_strlen does do a coderange check to see whether it can use a simple byte offset, but if it can't, it must walk the characters one at a time.

As an experiment, I modified the benchmark to encode the text as UTF-32BE, so that every character has a constant size. As expected, the performance improves dramatically:

    # encoding: utf-8
    require 'benchmark'

    s1 = "������������������!0123456789".encode('UTF-32BE')
    puts Benchmark.measure { 1000.times { s1[1] } }

    s2 = "������������������!0123456789".encode('UTF-32BE') * 100_000
    puts Benchmark.measure { 1000.times { s2[1] } }

Results:

    ~/projects/jruby $ ruby1.9 strlen_thing.rb
      0.000000   0.000000   0.000000 (  0.000283)
      0.000000   0.000000   0.000000 (  0.000259)

I believe this is one reason Python 3 stores all strings in memory as either UTF-16 (UCS-2) or UTF-32 (UCS-4), chosen at build time, and supports only those two internal representations. It is also one of the original reasons Java and .NET chose to use UTF-16, though that obviously became a small problem when Unicode expanded beyond 16 bits...
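To make the byte-length versus character-length distinction concrete, here is a minimal sketch in plain Ruby. It only approximates the C-level picture: I am assuming that String#bytesize corresponds to what RSTRING_LEN reports and String#length to what str_strlen computes. The helper name length_report and the sample text are hypothetical, chosen only for illustration.

```ruby
# Hypothetical helper: reports both notions of "length" for a string.
# bytesize is the raw byte count; length is the character count, which
# for a non-ASCII variable-width string may require walking the characters.
def length_report(str)
  { :chars => str.length, :bytes => str.bytesize }
end

# Hypothetical sample text with two multibyte characters in UTF-8.
sample = "h\u00e9llo w\u00f6rld"

utf8  = length_report(sample)
utf32 = length_report(sample.encode('UTF-32BE'))

puts "UTF-8:    #{utf8.inspect}"
puts "UTF-32BE: #{utf32.inspect}"

# An index that is past the last character but still smaller than the
# byte length. A bounds check against the byte length would let it through,
# even though there is no character at that position.
beg = utf8[:chars] + 1
puts "beg=#{beg} is below the byte length: #{beg < utf8[:bytes]}"
puts "sample[beg] is nil: #{sample[beg].nil?}"
```

In the UTF-32BE case the byte length is exactly four times the character length, which is why an index can be turned into a byte offset directly instead of scanning from the start of the string.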
----------------------------------------
http://redmine.ruby-lang.org/issues/show/4298

----------------------------------------
http://redmine.ruby-lang.org