From: mame@...
Date: 2018-12-30T10:52:52+00:00
Subject: [ruby-core:90824] [Ruby trunk Feature#15485] Refactor String#split

Issue #15485 has been updated by mame (Yusuke Endoh).


zverok (Victor Shepelev) wrote:

> ```ruby
> result = []
> lines.split("\n") { |ln| result << ln if ln.match?(PATTERN) }
> ```
>
> This is very unidiomatic and unlike most other methods that accept both block and no-block forms

(Personally I like this style because it is very clear and explicit. Anyway.)

There is `Object#to_enum` for your use case.

    lines.to_enum(:split, "\n").select {|ln| ln.match?(PATTERN) }

`Object#to_enum` is applicable to all methods that accept a block. This is a more general solution than adding an `enumerator:` keyword to individual methods.

----------------------------------------
Feature #15485: Refactor String#split
https://bugs.ruby-lang.org/issues/15485#change-76006

* Author: zverok (Victor Shepelev)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
In #4780, a new "block form" of `#split` was introduced. It behaves this way:

```ruby
"several\nlong\nlines".split("\n") { |part| puts part if part.start_with?('l') }
# prints:
#   long
#   lines
# => "several\nlong\nlines"
```

The justification is stated as: "If the string is very long, and I only need to play with the split string one by one, this will not create a useless expensive array."

I understand the justification, but strongly believe that the **implementation is unfortunate**. In the current implementation, the only way to "play with the split string one by one" relies on side effects, like this:

```ruby
result = []
lines.split("\n") { |ln| result << ln if ln.match?(PATTERN) }
```

This is very unidiomatic and unlike most other methods that accept both block and no-block forms (it is understandable, as the original ticket is 7 years old and community practices were pretty different back then). Our typical modern solution to the same problem is **enumerators**.
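
As a rough illustration of the two styles being contrasted, the sketch below uses only what already exists: the block form of `String#split` and `Object#to_enum`, as mentioned in the reply above. The helper name `each_split_part` and the sample string are hypothetical, introduced only for this example; they are not part of the proposal.

```ruby
# Hypothetical helper (an assumption for illustration, not part of the ticket):
# wraps the block form of String#split in an Enumerator via Object#to_enum.
def each_split_part(string, separator)
  string.to_enum(:split, separator)
end

SAMPLE_TEXT = "several\nlong\nlines" # sample data, assumed for illustration

# Side-effect style: the block pushes matching parts into an outer array.
collected = []
SAMPLE_TEXT.split("\n") { |part| collected << part if part.start_with?('l') }

# Enumerator style: the same filter written as a method chain.
selected = each_split_part(SAMPLE_TEXT, "\n").select { |part| part.start_with?('l') }

p collected # => ["long", "lines"]
p selected  # => ["long", "lines"]
```

Both variants produce the same result; the difference is only in whether the filtering is expressed through an accumulator outside the block or through ordinary `Enumerable` methods.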

I propose redefining the method as follows:

```ruby
lines.split("\n") # => Array, calculated immediately
lines.split("\n", enumerator: true) # => Enumerator, yielding split results one by one
```

It will allow all kinds of idiomatic processing without creating any intermediate Array, like:

```ruby
lines.split("\n", enumerator: true).take_while { |ln| ln != '__END__' }
lines.split("\n", enumerator: true).grep(PATTERN)
# ...and so on...
```

One more thing to note is that this call sequence underlines the "just an optimization" nature of the change: when you have a "too large string" to process, you just add `enumerator: true` to your code without changing anything else.

PS: We can't change `split` to **always** return an enumerator, because it would break a lot of sane code like `lines.split("\n").join("\r\n")`



-- 
https://bugs.ruby-lang.org/