From: nobu@...
Date: 2014-07-03T01:26:59+00:00
Subject: [ruby-core:63526] [ruby-trunk - Bug #9847] Cannot create new String when using File.read(size, buffer)

Issue #9847 has been updated by Nobuyoshi Nakada.

Duplicated by Bug #10007: IO#read does not respect String Copy-On-Write in some cases added

----------------------------------------
Bug #9847: Cannot create new String when using File.read(size, buffer)
https://bugs.ruby-lang.org/issues/9847#change-47575

* Author: Kenneth Guerin
* Status: Closed
* Priority: Normal
* Assignee:
* Category: core
* Target version: current: 2.2.0
* ruby -v: 2.1.2p95
* Backport: 2.0.0: DONE, 2.1: DONE
----------------------------------------
This bug was first noticed in version 1.9.2 and is still present in 2.1.2p95.

The attached script does the following to highlight this bug:
- creates a file of 13 fixed-length records of a specific size: each record contains a single repeated letter, A for the first, B for the second, through M
- Test `#1`: reads all records from the file and stores them in an Array of Strings, using `File.read(size)` and storing via '`cache << buffer`'
- Test `#2`: reads all records from the file and stores them in an Array of Strings, using `File.read(size, buffer)` and storing via '`cache << String.new(buffer)`'; buffer is reused for each read, and cache is meant to hold copies
- the test cycle is run with a record length of 23 and a record length of 24, highlighting Ruby's optimization of short strings

Results of running this script:
- with a record size of 23, Tests `#1` & `#2` show the cache containing all records: `[ A, B, ... M ]`
- with a record size of 24, Test `#1` shows the cache containing all records: `[ A, B, ... M ]`
- with a record size of 24, Test `#2` shows the cache containing: `[ M, M, ... M ]`

Diagnosis & Notes:
- with a record size > 23 and reading a file using `File.read(size, buffer)`, buffer is left in a state that prevents a new unique String from being derived from it
- variations of this script showed that `String.new` was creating a new String object on each record read, but the contents of all of the Strings stored in the cache Array were being overwritten on each call to `File.read`
- this shows that `String.new(buffer)` is creating new String objects, but that all of the Strings based on buffer share the same internal memory
- this behavior exists in the "long string" variation of String; short optimized Strings do not share this property

---Files--------------------------------
strbug.rb (2.07 KB)

-- 
https://bugs.ruby-lang.org/
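
For readers without the attachment, the two tests described in the report can be sketched roughly as below. This is not the attached strbug.rb; the method names (`write_records`, `read_fresh`, `read_reused`, `run_cycle`) and the use of a temporary file are assumptions made for illustration. Each record is summarized by its first letter, so a healthy run lists A through M once each.

```ruby
require "tempfile"

LETTERS = ("A".."M").to_a # 13 records, one repeated letter each

# Write 13 fixed-length records of `size` bytes to `path`.
def write_records(path, size)
  File.open(path, "wb") { |f| LETTERS.each { |c| f.write(c * size) } }
end

# Test #1: each read returns a fresh String, stored as-is.
def read_fresh(path, size)
  cache = []
  File.open(path, "rb") do |f|
    while (buffer = f.read(size))
      cache << buffer
    end
  end
  cache
end

# Test #2: one buffer is reused for every read; the cache is
# meant to hold independent copies made with String.new(buffer).
def read_reused(path, size)
  cache = []
  buffer = String.new
  File.open(path, "rb") do |f|
    while f.read(size, buffer) # returns nil at EOF
      cache << String.new(buffer)
    end
  end
  cache
end

# Run both tests for one record size and summarize each cache
# by the first letter of every stored record.
def run_cycle(size)
  Tempfile.create("records") do |tmp|
    tmp.close
    write_records(tmp.path, size)
    {
      fresh:  read_fresh(tmp.path, size).map { |rec| rec[0] }.join,
      reused: read_reused(tmp.path, size).map { |rec| rec[0] }.join
    }
  end
end

[23, 24].each { |size| puts "size=#{size} #{run_cycle(size).inspect}" }
```

On a Ruby build that includes the fix, both summaries should list every letter for both sizes. On an affected build, the report's description suggests the `reused` summary at size 24 would consist of the letter M repeated 13 times.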