From: "Eregon (Benoit Daloze) via ruby-core"
Date: 2023-01-06T13:17:47+00:00
Subject: [ruby-core:111686] [Ruby master Feature#18598] Add String#bytesplice

Issue #18598 has been updated by Eregon (Benoit Daloze).


shugo (Shugo Maeda) wrote in #note-4:
> > * Do not use String and e.g. use an Array of byte values or a C extension
>
> I wouldn't like to implement regular expressions on Array.
>
> > * Use Ropes or similar implemented in Ruby, which would avoid extra copying and might not need to use byte offsets at all
>
> I prefer String for the reasons stated above.

The typical approach is to flatten (or convert) the Rope to String before matching (whether the Rope is in Ruby or from the VM).
I think that is good enough for a text editor.

----------------------------------------
Feature #18598: Add String#bytesplice
https://bugs.ruby-lang.org/issues/18598#change-101085

* Author: shugo (Shugo Maeda)
* Status: Closed
* Priority: Normal
----------------------------------------
I withdrew the proposal of String#bytesplice in #13110 because it may cause problems if the specified offset does not land on a character boundary.

But how about raising IndexError in such cases?

```
# encoding: utf-8

s = "あいうえおかきくけこ"
s.bytesplice(9, 6, "xx")
p s #=> "あいうxxかきくけこ"
s.bytesplice(2, 3, "x") #=> offset 2 does not land on character boundary (IndexError)
s.bytesplice(3, 4, "x") #=> offset 7 does not land on character boundary (IndexError)
```

## Pull request

https://github.com/ruby/ruby/pull/5584

## Spec

```
bytesplice(index, length, str) -> string
bytesplice(range, str) -> string
```

Replaces some or all of the content of +self+ with +str+, and returns +str+.
The portion of the string affected is determined using the same criteria as String#byteslice, except that +length+ cannot be omitted.
If the replacement string is not the same length as the text it is replacing, the string will be adjusted accordingly.
The form that takes an Integer will raise an IndexError if the value is out of range; the Range form will raise a RangeError.
If the beginning or ending offset does not land on a character (codepoint) boundary, an IndexError will be raised.

## Motivation

In the text editor [Textbringer](https://github.com/shugo/textbringer/pull/31/files), the content of a buffer is represented by a String whose encoding is ASCII-8BIT, and `force_encoding(Encoding::UTF_8)` is called when necessary.
This is because point (cursor position) and marks are represented by byte offsets for performance, and currently there is no way to modify UTF-8 strings with byte offsets.

If String#bytesplice is introduced, the content of a text buffer can be represented by a UTF-8 string, and force_encoding can be removed:
https://github.com/shugo/textbringer/pull/31/files

-- 
https://bugs.ruby-lang.org/
______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/
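
For readers who want to experiment with the boundary rule from the Spec section, here is a minimal sketch that emulates it in plain Ruby. It is not the implementation from the pull request: the helper names `char_boundary?` and `bytesplice_sketch` are hypothetical, and the sketch assumes a UTF-8 receiver and non-negative offsets only (the Range form and negative indexes are not handled).

```ruby
# Hypothetical helper: true if +offset+ starts a character in a UTF-8 string.
# UTF-8 continuation bytes always have the bit pattern 10xxxxxx.
def char_boundary?(str, offset)
  return true if offset == 0 || offset == str.bytesize
  (str.getbyte(offset) & 0xC0) != 0x80
end

# Hypothetical emulation of bytesplice(index, length, str) for UTF-8 strings.
# Modifies +str+ in place and returns +replacement+, as the Spec describes.
def bytesplice_sketch(str, index, length, replacement)
  if index < 0 || index > str.bytesize
    raise IndexError, "index #{index} out of string"
  end
  raise IndexError, "negative length #{length}" if length < 0

  # Clamp the end offset to the end of the string, as byteslice does.
  finish = [index + length, str.bytesize].min
  [index, finish].each do |offset|
    next if char_boundary?(str, offset)
    raise IndexError, "offset #{offset} does not land on character boundary"
  end

  # Splice on a binary copy so that indexes are byte offsets,
  # then restore the original encoding.
  bytes = str.b
  bytes[index, finish - index] = replacement.b
  str.replace(bytes.force_encoding(str.encoding))
  replacement
end

s = "\u3042\u3044\u3046\u3048\u304A".dup # five 3-byte characters, 15 bytes
bytesplice_sketch(s, 3, 3, "xx")          # replaces the second character
puts s.bytesize                           # prints 14

begin
  bytesplice_sketch(s, 2, 3, "x")
rescue IndexError => e
  puts e.message # prints "offset 2 does not land on character boundary"
end
```

Checking only the start and end offsets is sufficient here because the bytes in between are removed as a whole; the sketch copies the string twice, which a native implementation would presumably avoid.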