From: "Dan0042 (Daniel DeLorme)"
Date: 2021-11-20T04:12:26+00:00
Subject: [ruby-core:106187] [Ruby master Feature#18262] Enumerator::Lazy#partition

Issue #18262 has been updated by Dan0042 (Daniel DeLorme).

I wouldn't say that constant memory is the "whole point" of Enumerator::Lazy. It's more about performing the minimum amount of computation needed, only when needed. Executing the block twice for each element is not minimal, and much more surprising to me than any amount of buffering. What if there are side effects in the block?

----------------------------------------
Feature #18262: Enumerator::Lazy#partition
https://bugs.ruby-lang.org/issues/18262#change-94795

* Author: zverok (Victor Shepelev)
* Status: Open
* Priority: Normal
----------------------------------------
(Part of my set of proposals about making `.lazy` more useful/popular.)

Currently:

```ruby
file = File.open('very-large-file.txt')
lines_with_errors, lines_without_errors = file.lazy.partition { _1.start_with?('E:') }
lines_with_errors.class
# => Array, the whole file has been read by this point
```

This might not be very practical performance-wise and memory-wise.

I am thinking that maybe returning a pair of lazy enumerators might be a good addition to `Enumerator::Lazy`.

Naive prototype:

```ruby
class Enumerator::Lazy
  def partition(&block)
    buffer1 = []
    buffer2 = []
    source = self

    [
      Enumerator.new { |y|
        loop do
          if buffer1.empty?
            begin
              item = source.next
              if block.call(item)
                y.yield(item)
              else
                buffer2.push(item)
              end
            rescue StopIteration
              break
            end
          else
            y.yield buffer1.shift
          end
        end
      }.lazy,
      Enumerator.new { |y|
        loop do
          if buffer2.empty?
            begin
              item = source.next
              if !block.call(item)
                y.yield(item)
              else
                buffer1.push(item)
              end
            rescue StopIteration
              break
            end
          else
            y.yield buffer2.shift
          end
        end
      }.lazy
    ]
  end
end
```

Testing it:

```ruby
Enumerator.produce(1) { |i| puts "processing #{i}"; i + 1 }.lazy
  .take(30)
  .partition(&:odd?)
  .then { |odd, even| p odd.first(3), even.first(3) }
# Prints:
# processing 1
# processing 2
# processing 3
# processing 4
# processing 5
# [1, 3, 5]
# [2, 4, 6]
```

As you might notice from the "processing" log, it only fetched the number of entries required by the produced enumerators.

The **drawback** would be, as my prototype implementation shows, the need for internal "buffering" (I don't think it is possible to implement lazy partition without it), but it still might be worth a shot?
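
To make the trade-off raised in the reply concrete, here is a minimal sketch (not part of the proposal) of what a buffer-free alternative might look like: two independent lazy pipelines built with `select` and `reject`. The helper name `partition_without_buffer` and the `calls` counter are hypothetical, introduced only for illustration. It keeps memory constant, but the block runs twice for every element that both halves consume.

```ruby
# Hypothetical helper, for illustration only: a buffer-free "partition"
# that builds two independent lazy pipelines over the same source.
def partition_without_buffer(source, &block)
  [source.lazy.select(&block), source.lazy.reject(&block)]
end

# Count how many times the block is evaluated for each element.
calls = Hash.new(0)
predicate = ->(i) { calls[i] += 1; i.odd? }

odd, even = partition_without_buffer(1..Float::INFINITY, &predicate)

first_odd  = odd.first(3)   # walks 1..5 through the block
first_even = even.first(3)  # walks 1..6 through the block again

p first_odd   # [1, 3, 5]
p first_even  # [2, 4, 6]
p calls       # elements 1..5 were evaluated twice, 6 once
```

If the block had side effects (logging, counters, I/O), they would be observed twice per element here, whereas the buffering prototype above evaluates each element once. This sketch also assumes the source can be enumerated from the start more than once, which holds for a range but not necessarily for a stream such as an open file.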