From: "knu (Akinori MUSHA)"
Date: 2021-11-18T04:59:10+00:00
Subject: [ruby-core:106112] [Ruby master Feature#18262] Enumerator::Lazy#partition

Issue #18262 has been updated by knu (Akinori MUSHA).

I agree this would be a good addition, and I think existing users of `lazy` would understand that the incompatibility this brings is a necessary step to make `partition` more useful.

However, the buffering could be a pitfall for new users. In today's developer meeting, Matz and I agreed to suggest that the behavior should be well documented. If you were dividing a huge (or infinite) list into two where one enumerator yields values far less often than the other, the buffer could become huge. That is not obvious from what you would normally expect of "lazy", so it should be noted in the documentation.

----------------------------------------
Feature #18262: Enumerator::Lazy#partition
https://bugs.ruby-lang.org/issues/18262#change-94707

* Author: zverok (Victor Shepelev)
* Status: Open
* Priority: Normal
----------------------------------------
(Part of my set of proposals about making `.lazy` more useful/popular.)

Currently:
```ruby
file = File.open('very-large-file.txt')
lines_with_errors, lines_without_errors = file.lazy.partition { _1.start_with?('E:') }
lines_with_errors.class
# => Array, the whole file has been read by this point
```
This might not be very practical performance-wise or memory-wise.

I am thinking that returning a pair of lazy enumerators might be a good addition to `Enumerator::Lazy`.

Naive prototype:
```ruby
class Enumerator::Lazy
  def partition(&block)
    buffer1 = []
    buffer2 = []
    source = self
    [
      Enumerator.new { |y|
        loop do
          if buffer1.empty?
            begin
              item = source.next
              if block.call(item)
                y.yield(item)
              else
                buffer2.push(item)
              end
            rescue StopIteration
              break
            end
          else
            y.yield buffer1.shift
          end
        end
      }.lazy,
      Enumerator.new { |y|
        loop do
          if buffer2.empty?
            begin
              item = source.next
              if !block.call(item)
                y.yield(item)
              else
                buffer1.push(item)
              end
            rescue StopIteration
              break
            end
          else
            y.yield buffer2.shift
          end
        end
      }.lazy
    ]
  end
end
```

Testing it:
```ruby
Enumerator.produce(1) { |i| puts "processing #{i}"; i + 1 }.lazy
  .take(30)
  .partition(&:odd?)
  .then { |odd, even| p odd.first(3), even.first(3) }
# Prints:
#   processing 1
#   processing 2
#   processing 3
#   processing 4
#   processing 5
#   [1, 3, 5]
#   [2, 4, 6]
```
As you might notice from the "processing" log, it only fetched as many entries as the produced enumerators required.

The **drawback** would be, as my prototype implementation shows, the need for internal "buffering" (I don't think it is possible to implement lazy partition without it), but it still might be worth a shot?
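
To make the buffering concern concrete, here is a minimal sketch of how the buffer grows when one side of the partition matches only rarely. It reuses the buffering idea from the prototype but does not patch `Enumerator::Lazy`. `buffered_partition` is a hypothetical helper introduced only for this illustration, and it returns the internal buffers as a third value purely so their size can be inspected (a real implementation would presumably keep them private).

```ruby
# Hypothetical helper (not part of Ruby and not the proposed API):
# splits a lazy enumerator into two lazy enumerators that share one source,
# and also returns the internal buffers so their growth can be observed.
def buffered_partition(source, &pred)
  buffers = { true => [], false => [] }

  make_side = lambda do |wanted|
    Enumerator.new do |y|
      # Kernel#loop rescues StopIteration, so the loop simply ends
      # once the shared source is exhausted.
      loop do
        if buffers[wanted].empty?
          item = source.next
          matched = pred.call(item) ? true : false
          if matched == wanted
            y << item
          else
            buffers[!wanted] << item # park the item for the other side
          end
        else
          y << buffers[wanted].shift
        end
      end
    end.lazy
  end

  [make_side.call(true), make_side.call(false), buffers]
end

# Only one number in a thousand lands on the "rare" side.
rare, common, buffers =
  buffered_partition((1..Float::INFINITY).lazy) { |n| (n % 1000).zero? }

p rare.first(3)       # => [1000, 2000, 3000]
p buffers[false].size # => 2997 items are now parked for the other side
p common.first(2)     # => [1, 2]
p buffers[false].size # => 2995
```

Taking three items from the rare side forces 3000 items to be read from the source, and 2997 of them stay in memory until the other enumerator consumes them. With an infinite source and a predicate that never matches, the buffer would grow without bound, which seems to be the case the documentation should warn about.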