From: k.furuhashi10@...
Date: 2019-04-16T04:42:36+00:00
Subject: [ruby-core:92301] [Ruby trunk Feature#15771] Add `String#split` option to set split_type string when a single space separator

Issue #15771 has been reported by 284km (kazuma furuhashi).

----------------------------------------
Feature #15771: Add `String#split` option to set split_type string when a single space separator
https://bugs.ruby-lang.org/issues/15771

* Author: 284km (kazuma furuhashi)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
In `String#split`, when the separator is a single space character, the split is executed as split_type: awk.

Because of this, the CSV library converts a single-space column separator into a regexp:
https://github.com/ruby/csv/blob/7ff57a50e81c368029fa9b664700bec4a456b81b/lib/csv/parser.rb#L508-L512

```
if @column_separator == " ".encode(@encoding)
  @split_column_separator = Regexp.new(@escaped_column_separator)
else
  @split_column_separator = @column_separator
end
```

Unfortunately, splitting with a regexp is slower than splitting with a string. In the following benchmark, the regexp version is about 9 times slower:
https://github.com/284km/benchmarks_no_yatu#stringsplitstring-or-regexp

```
$ be benchmark-driver string_split_string-regexp.yml --rbenv '2.6.2'
Comparison:
              string:   3161117.6 i/s
              regexp:    344448.0 i/s - 9.18x slower
```

So I want to add a `:literal` option that makes the split run as split_type: string.

# Implementation

- https://github.com/284km/ruby/tree/split_space
- test code: https://github.com/284km/ruby/blob/split_space/test/ruby/test_string.rb#L1708-L1713

This change will result in the following:

```
" a  b   c    ".split(" ")
=> ["a", "b", "c"]
" a  b   c    ".split(" ", -1)
=> ["a", "b", "c", ""]
" a  b   c    ".split(" ", literal: true)
=> ["", "a", "", "b", "", "", "c"]
" a  b   c    ".split(" ", -1, literal: true)
=> ["", "a", "", "b", "", "", "c", "", "", "", ""]
```
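
For comparison, the difference between the awk-style split and a literal single-space split can be reproduced on a current Ruby without the proposed option. The sketch below is only an illustration: `split_literal` is a hypothetical helper name, not part of Ruby or of the linked patch, and it goes through a regexp (the same workaround the CSV library uses), so it shows the intended results but not the speed the proposal is after.

```ruby
# Hypothetical helper (not part of Ruby): emulates the results of the
# proposed `literal: true` by splitting on an escaped regexp, which
# bypasses the awk-style handling of a lone " " string separator.
def split_literal(str, sep, limit = 0)
  str.split(Regexp.new(Regexp.escape(sep)), limit)
end

input = " a  b   c    "

# Current behaviour: a single-space string separator means awk mode,
# so leading whitespace and runs of whitespace are ignored.
p input.split(" ")               # => ["a", "b", "c"]
p input.split(" ", -1)           # => ["a", "b", "c", ""]

# Literal behaviour: every single space is a field boundary.
p split_literal(input, " ")      # => ["", "a", "", "b", "", "", "c"]
p split_literal(input, " ", -1)  # => ["", "a", "", "b", "", "", "c", "", "", "", ""]
```

With the proposed option, the last two calls would presumably be written as `input.split(" ", literal: true)` and `input.split(" ", -1, literal: true)`, taking the string code path instead of the regexp one.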