[ruby-core:98550] [Ruby master Feature#15771] Add `String#split` option to set `split_type string` with a single space separator
From:
daniel@...42.com
Date:
2020-05-27 21:19:20 UTC
List:
ruby-core #98550
Issue #15771 has been updated by Dan0042 (Daniel DeLorme).
I think it's worth mentioning nobu's comment from the dev meeting:
> have wanted to deprecate that behavior for years, and made non-nil $/ warned.
So what about finally deprecating this behavior? If the incompatibility turns out to be too severe, it's always possible to revert. sawa's use case, rather than being about the `literal` option, is more about the benefit of treating " " as split_type string. And I remember being very surprised by the behavior of `str.split(" ")` when I started out in Ruby.
Maybe:
* if $VERBOSE, show a warning "use nil or / /" if separator is " "
* if !$VERBOSE, show a warning "use nil" if separator is " " and matches differently from / /
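To make the two rules concrete, here is a rough sketch in plain Ruby. The helper name `space_split_warning` is hypothetical, and it reads "matches differently from / /" as "the awk-style result differs from the `/ /` result", which is only one possible interpretation. It returns the message instead of calling `warn` so that it is easy to check.

```ruby
# Hypothetical helper (not part of Ruby): returns the warning text the
# suggestion above would emit for this call, or nil if no warning applies.
def space_split_warning(str, separator, verbose: $VERBOSE)
  # Only the single-space string separator triggers awk-style splitting.
  return nil unless separator == " "

  # Verbose mode: always point at the explicit alternatives.
  return "use nil or / /" if verbose

  # Non-verbose mode: warn only when the awk-style result differs from / /.
  str.split(" ") != str.split(/ /) ? "use nil" : nil
end

p space_split_warning("a b", " ", verbose: false)   # => nil
p space_split_warning("a  b", " ", verbose: false)  # => "use nil"
p space_split_warning("a b", " ", verbose: true)    # => "use nil or / /"
```

A real implementation would live inside `String#split` itself; this only illustrates when each message would fire.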
----------------------------------------
Feature #15771: Add `String#split` option to set `split_type string` with a single space separator
https://bugs.ruby-lang.org/issues/15771#change-85843
* Author: 284km (kazuma furuhashi)
* Status: Feedback
* Priority: Normal
----------------------------------------
When `String#split`'s separator is a single space character, it executes under `split_type: awk`.
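For readers unfamiliar with the awk-style mode, a short illustration of current behavior follows. The sample string and variable names are made up for this example, and the `nil` case assumes `$;` has been left at its default of `nil`.

```ruby
text = " a  b\tc\n"

# A single-space string separator selects awk-style splitting: leading
# whitespace is skipped and any run of whitespace counts as one separator.
awk_style = text.split(" ")
p awk_style      # => ["a", "b", "c"]

# A nil separator behaves the same way (assuming $; is nil).
nil_style = text.split(nil)
p nil_style      # => ["a", "b", "c"]

# A regexp matching one space splits on each space individually and
# leaves the tab and newline alone.
regexp_style = text.split(/ /)
p regexp_style   # => ["", "a", "", "b\tc\n"]
```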
When you want to split literally by a single space `" "`, and not by a run of whitespace characters, you need to take special care. For example, the CSV library works around this behavior like [this](https://github.com/ruby/csv/blob/7ff57a50e81c368029fa9b664700bec4a456b81b/lib/csv/parser.rb#L508-L512):
```ruby
if @column_separator == " ".encode(@encoding)
  @split_column_separator = Regexp.new(@escaped_column_separator)
else
  @split_column_separator = @column_separator
end
```
Unfortunately, using a regexp here makes it slower than using a string. [The following result](https://github.com/284km/benchmarks_no_yatu#stringsplitstring-or-regexp) shows it is about nine times slower.
```sh
$ be benchmark-driver string_split_string-regexp.yml --rbenv '2.6.2'
Comparison:
string: 3161117.6 i/s
regexp: 344448.0 i/s - 9.18x slower
```
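If you want to get a rough feel for this on your own machine without benchmark-driver, something like the sketch below may help. It is not the linked benchmark: the helper `time_split`, the input line, the `","` separator, and the iteration count are all assumptions chosen for illustration, and the ratio will vary with the Ruby version and hardware.

```ruby
# Hypothetical micro-benchmark helper: returns elapsed seconds for
# `iterations` calls of str.split(separator).
def time_split(str, separator, iterations)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { str.split(separator) }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

line = "aaa,bbb,ccc"   # assumed sample input
iterations = 200_000   # assumed iteration count

string_time = time_split(line, ",", iterations)   # string separator
regexp_time = time_split(line, /,/, iterations)   # equivalent regexp separator

puts format("string: %.3fs  regexp: %.3fs", string_time, regexp_time)
```

A non-space separator is used here on purpose, so that the string case runs as a plain string split rather than in the awk-style mode.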
I want to add a `:literal` option to execute the method under `split_type: string` as follows:
```ruby
" a  b   c    ".split(" ") # => ["a", "b", "c"]
" a  b   c    ".split(" ", literal: true) # => ["", "a", "", "b", "", "", "c"]
" a  b   c    ".split(" ", -1) # => ["a", "b", "c", ""]
" a  b   c    ".split(" ", -1, literal: true) # => ["", "a", "", "b", "", "", "c", "", "", "", ""]
```
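Until something like `literal:` exists, the same results can be approximated in current Ruby by escaping the separator into a regexp, much as the CSV library does. The helper name `split_literal` is hypothetical, and this sketch pays the regexp cost that the proposal is trying to avoid.

```ruby
# Hypothetical stand-in for the proposed `literal: true` behavior:
# always treat `separator` as literal text, never as the awk-style case.
def split_literal(str, separator, limit = 0)
  str.split(Regexp.new(Regexp.escape(separator)), limit)
end

text = " a  b   c    "

p split_literal(text, " ")
# => ["", "a", "", "b", "", "", "c"]

p split_literal(text, " ", -1)
# => ["", "a", "", "b", "", "", "c", "", "", "", ""]
```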
### Implementation
- https://github.com/284km/ruby/tree/split_space
- test code: https://github.com/284km/ruby/blob/split_space/test/ruby/test_string.rb#L1708-L1713