From: "zverok (Victor Shepelev)"
Date: 2021-11-20T10:17:24+00:00
Subject: [ruby-core:106189] [Ruby master Feature#18262] Enumerator::Lazy#partition

Issue #18262 has been updated by zverok (Victor Shepelev).


@Eregon your version does not behave the same way for effectful enumerators (where `.lazy` is extremely useful):

```ruby
require 'stringio'

str = StringIO.new(<<~ROWS)
  1: OK
  2: Err
  3: OK
  4: Err
  5: OK
  6: Err
ROWS

err, ok = str.each_line(chomp: true).lazy.partition { _1.include?('Err') }
p [err.first(2), ok.first(2)]
# mine:  [["2: Err", "4: Err"], ["1: OK", "3: OK"]]
# yours: [["2: Err", "4: Err"], ["5: OK"]]
```

...because yours consumes rows of both kinds while producing the `err`s.

I agree that "it consumes much less memory", if not the "whole point", is an implicit expectation of a lazy enumerator. But I believe having a (well-documented) quirk in `partition` is better than not having a lazy `partition` at all.

----------------------------------------
Feature #18262: Enumerator::Lazy#partition
https://bugs.ruby-lang.org/issues/18262#change-94796

* Author: zverok (Victor Shepelev)
* Status: Open
* Priority: Normal
----------------------------------------
(Part of my set of proposals about making `.lazy` more useful/popular.)

Currently:

```ruby
file = File.open('very-large-file.txt')
lines_with_errors, lines_without_errors = file.lazy.partition { _1.start_with?('E:') }
lines_with_errors.class
# => Array, the whole file has been read by this point
```

This might not be very practical, performance-wise or memory-wise.

I am thinking that returning a pair of lazy enumerators might be a good addition to `Enumerator::Lazy`.

Naive prototype:

```ruby
class Enumerator::Lazy
  def partition(&block)
    buffer1 = [] # pulled items that belong to the first (truthy) result
    buffer2 = [] # pulled items that belong to the second (falsy) result
    source = self
    [
      Enumerator.new { |y|
        loop do
          if buffer1.empty?
            begin
              # Nothing buffered for this side: pull the next item from the shared source
              item = source.next
              if block.call(item)
                y.yield(item)
              else
                buffer2.push(item) # park it for the other enumerator
              end
            rescue StopIteration
              break # source exhausted
            end
          else
            y.yield buffer1.shift
          end
        end
      }.lazy,
      Enumerator.new { |y|
        loop do
          if buffer2.empty?
            begin
              item = source.next
              if !block.call(item)
                y.yield(item)
              else
                buffer1.push(item)
              end
            rescue StopIteration
              break
            end
          else
            y.yield buffer2.shift
          end
        end
      }.lazy
    ]
  end
end
```

Testing it:

```ruby
Enumerator.produce(1) { |i| puts "processing #{i}"; i + 1 }.lazy
  .take(30)
  .partition(&:odd?)
  .then { |odd, even|
    p odd.first(3), even.first(3)
  }
# Prints:
# processing 1
# processing 2
# processing 3
# processing 4
# processing 5
# [1, 3, 5]
# [2, 4, 6]
```

As the "processing" log shows, it fetched only as many entries as the produced enumerators required.

The **drawback** would be, as my prototype implementation shows, the need for internal "buffering" (I don't think it is possible to implement a lazy partition without it), but it still might be worth a shot?
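The prototype in the issue repeats the same loop once per side. A minimal sketch, assuming a hypothetical `LazyPartitionSketch.partition` helper (not part of Ruby's standard library and not the proposed API), folds both loops into one loop parameterized by which predicate result a side wants. It also returns the two buffers so the buffering drawback can be observed directly:

```ruby
# Hypothetical helper for illustration only; not a Ruby API.
module LazyPartitionSketch
  # Returns [matching, non_matching, buffers]. Both lazy enumerators pull
  # from ONE shared external cursor over +source+; an item pulled by one side
  # that belongs to the other side is parked in the other side's buffer.
  def self.partition(source, &pred)
    cursor  = source.to_enum
    buffers = { true => [], false => [] } # keyed by the predicate result
    sides = [true, false].map { |want| side(cursor, buffers, want, pred) }
    [*sides, buffers]
  end

  def self.side(cursor, buffers, want, pred)
    Enumerator.new do |y|
      loop do
        if buffers[want].empty?
          # Kernel#loop rescues the StopIteration raised once the cursor is exhausted
          item = cursor.next
          matched = !!pred.call(item)
          if matched == want
            y << item
          else
            buffers[matched] << item # park it for the other side
          end
        else
          y << buffers[want].shift
        end
      end
    end.lazy
  end
  private_class_method :side
end

odd, even, buffers = LazyPartitionSketch.partition(1..10, &:odd?)
p odd.first(3)   # => [1, 3, 5]
p buffers[false] # => [2, 4]
p even.first(3)  # => [2, 4, 6]
```

Draining one side eagerly, for example `odd.to_a` over `1..10`, parks every even number in `buffers[false]` before `even` is consumed at all; that is the memory cost the drawback refers to. Like the prototype, the sketch shares one cursor between both sides, so iterating the same side a second time continues where it stopped rather than restarting.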