From: sawadatsuyoshi@...
Date: 2020-05-27T04:52:42+00:00
Subject: [ruby-core:98540] [Ruby master Feature#15771] Add `String#split` option to set `split_type string` with a single space separator

Issue #15771 has been updated by sawa (Tsuyoshi Sawada).

Dan0042 (Daniel DeLorme) wrote in #note-10:
> That optimization is nice to have, but I think the point of this ticket is that it's currently not possible to have an arbitrary string separator.

I agree.

Dan0042 (Daniel DeLorme) wrote in #note-10:
> sawa (Tsuyoshi Sawada) wrote in #note-7:
> > My guess is that, perhaps, even now, it is very rare to use a single space string argument with the expectation to match single or multiple spaces.
>
> It looks like using a single space string argument is not so rare:
> https://pastebin.com/pPyEf2GA

I didn't write that using a single space string argument is rare; I wrote that using a single space string argument **with the expectation to match single or multiple spaces** is rare. In fact, my guess is that using a single space string argument is frequent, and that most such uses **expect to match only a single space**. I have not confirmed this. If it turns out to be correct, then that would constitute the use cases that were asked for.

----------------------------------------
Feature #15771: Add `String#split` option to set `split_type string` with a single space separator
https://bugs.ruby-lang.org/issues/15771#change-85831

* Author: 284km (kazuma furuhashi)
* Status: Feedback
* Priority: Normal
----------------------------------------
When `String#split`'s separator is a single space character, it executes under `split_type: awk`. When you want to split literally by a single space `" "`, and not a sequence of space characters, you need to take special care.
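As a minimal sketch of the current behavior (the variable names are illustrative only and not part of the proposal), the difference between the awk-style split and a literal single-space split done through a regexp looks like this:

```ruby
# Sample input: 1, 2, 3, and then 4 spaces around the letters.
s = " a  b   c    "

# A single-space String separator triggers awk-style splitting:
# leading whitespace is ignored and runs of whitespace count as one separator.
awk_fields = s.split(" ")

# Passing a Regexp instead splits on every individual space.
# Without a limit, trailing empty fields are dropped.
literal_fields = s.split(/ /)

# A negative limit keeps the trailing empty fields as well.
literal_all_fields = s.split(/ /, -1)

p awk_fields          # => ["a", "b", "c"]
p literal_fields      # => ["", "a", "", "b", "", "", "c"]
p literal_all_fields  # => ["", "a", "", "b", "", "", "c", "", "", "", ""]
```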
For example, the CSV library works around this behavior like [this](https://github.com/ruby/csv/blob/7ff57a50e81c368029fa9b664700bec4a456b81b/lib/csv/parser.rb#L508-L512):

```ruby
if @column_separator == " ".encode(@encoding)
  @split_column_separator = Regexp.new(@escaped_column_separator)
else
  @split_column_separator = @column_separator
end
```

Unfortunately, using a regexp here makes it slower than using a string. [The following result](https://github.com/284km/benchmarks_no_yatu#stringsplitstring-or-regexp) shows it is about nine times slower.

```sh
$ be benchmark-driver string_split_string-regexp.yml --rbenv '2.6.2'
Comparison:
              string:   3161117.6 i/s
              regexp:    344448.0 i/s - 9.18x slower
```

I want to add a `:literal` option to execute the method under `split_type: string` as follows:

```ruby
" a  b   c    ".split(" ")
# => ["a", "b", "c"]
" a  b   c    ".split(" ", literal: true)
# => ["", "a", "", "b", "", "", "c"]
" a  b   c    ".split(" ", -1)
# => ["a", "b", "c", ""]
" a  b   c    ".split(" ", -1, literal: true)
# => ["", "a", "", "b", "", "", "c", "", "", "", ""]
```

### Implementation

- https://github.com/284km/ruby/tree/split_space
- test code: https://github.com/284km/ruby/blob/split_space/test/ruby/test_string.rb#L1708-L1713

--
https://bugs.ruby-lang.org/