From: "byroot (Jean Boussier) via ruby-core"
Date: 2024-06-19T11:03:59+00:00
Subject: [ruby-core:118349] [Ruby master Bug#20585] Size of memory allocated by String.new(:capacity) is different from the specified value

Issue #20585 has been updated by byroot (Jean Boussier).

Most of this comes from: https://github.com/ruby/ruby/pull/8825

Long story short, `capacity` is a bit confusing: since Ruby strings are null terminated, at least one extra byte is always needed. So it's debatable whether the terminating byte is accounted for in the capacity.

I see how when using `String.new(capacity:)`, the goal is to avoid reallocation, so if you precomputed the final string size, counting the terminator against the capacity might defeat the purpose.

The other side of the coin, though, is that if you use sizes like `4096` hoping to fit in a specific size in memory, the extra terminator byte makes it not behave as you'd hoped.

> If the initial string and its bytesize are specified, about twice the size is allocated.

I need to dig more to answer this one.

----------------------------------------
Bug #20585: Size of memory allocated by String.new(:capacity) is different from the specified value
https://bugs.ruby-lang.org/issues/20585#change-108854

* Author: os (Shigeki OHARA)
* Status: Open
* ruby -v: ruby 3.3.2 (2024-05-30 revision e5a195edf6) [x86_64-freebsd14.0]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
IMHO, if :capacity is specified in String.new, capa will be its value.
In fact, Ruby 3.2 seems to allocate the size as specified.

```
% cat string_capacity.rb
unless /\A3\.[23]\./ =~ RUBY_VERSION
  raise NotImplementedError, 'Not Supported Ruby Version'
end

require 'inline'

class String
  def super_inspect
    self.class.superclass.instance_method(:inspect).bind(self).call
  end

  inline do |builder|
    builder.include ''
    builder.add_compile_flags '-Wall'
    builder.c_raw <<~CODE
      VALUE capacity(int argc, VALUE *argv, VALUE self) {
        struct RString *rstring = RSTRING(self);
        if (! (RBASIC(self)->flags & RSTRING_NOEMBED)) {
          return rb_to_symbol(rb_str_new_cstr("EMBED"));
        } else {
          if (RBASIC(self)->flags & ELTS_SHARED) {
            return rb_to_symbol(rb_str_new_cstr("SHARED"));
          } else {
            return LONG2NUM(rstring->as.heap.aux.capa);
          }
        }
        return Qnil; /* NOTREACHED */
      }
    CODE
  end
end
```

```
% irb -I. -rstring_capacity
irb(main):001:0> [RUBY_PLATFORM, RUBY_VERSION]
=> ["x86_64-freebsd14.0", "3.2.4"]
irb(main):002:0> String.new('', capacity: 1024).capacity
=> 1024
irb(main):003:0> String.new('*'*1024, capacity: 1024).capacity
=> 1024
irb(main):004:0>
```

This is what I expect.
However, Ruby 3.3 seems to behave differently.

```
% irb -I. -rstring_capacity
irb(main):001> [RUBY_PLATFORM, RUBY_VERSION]
=> ["x86_64-freebsd14.0", "3.3.2"]
irb(main):002> String.new('', capacity: 1024).capacity
=> 1023
irb(main):003> String.new('*'*1024, capacity: 1024).capacity
=> 2047
irb(main):004>
```

* If only :capacity is specified, one byte less is allocated.
* If the initial string and its bytesize are specified, about twice the size is allocated.

Is this intentional?
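
For anyone who wants to look at the terminator-byte question without compiling the C extension, here is a rough sketch that uses only the standard `objspace` library. It is an assumption-laden approximation, not the method used in the report: `ObjectSpace.memsize_of` returns whatever size the interpreter attributes to the object (on CRuby this typically includes the object slot as well as the heap buffer), so it does not expose `capa` directly. The helper names `reported_memsize` and `buffer_bytes_with_terminator` are hypothetical and exist only for this illustration, and the one-byte terminator assumes a single-byte-terminated encoding such as the default for `String.new`.

```ruby
require 'objspace'

# Hypothetical helper: size in bytes that ObjectSpace attributes to a
# fresh, empty String created with the requested capacity.
def reported_memsize(capacity)
  ObjectSpace.memsize_of(String.new(capacity: capacity))
end

# Hypothetical helper: bytes the buffer needs if the terminator is NOT
# counted inside the capacity (assumes a 1-byte terminator by default).
def buffer_bytes_with_terminator(capacity, termlen = 1)
  capacity + termlen
end

# Print the requested capacity, the capacity plus terminator, and the
# size reported by the running interpreter. Values differ by Ruby version.
[1023, 1024, 4095, 4096].each do |capa|
  puts format('capacity=%-5d capacity+NUL=%-5d memsize_of=%d',
              capa, buffer_bytes_with_terminator(capa), reported_memsize(capa))
end
```

Comparing the `memsize_of` column between Ruby 3.2 and 3.3 should show whether the terminator is being counted inside or outside the requested capacity on a given build, but the exact figures depend on the interpreter and platform.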