[ruby-core:45724] [ruby-trunk - Feature #6612][Open] Add streaming inflate and deflate to Zlib

From: "drbrain (Eric Hodel)" <drbrain@...7.net>
Date: 2012-06-20 06:35:14 UTC
List: ruby-core #45724
Issue #6612 has been reported by drbrain (Eric Hodel).

----------------------------------------
Feature #6612: Add streaming inflate and deflate to Zlib
https://bugs.ruby-lang.org/issues/6612

Author: drbrain (Eric Hodel)
Status: Open
Priority: Normal
Assignee: 
Category: ext
Target version: 2.0.0


=begin
Currently Ruby offers no way to limit how much output is produced when inflating a deflate stream.  For example, 50MB of "0"s compresses to just under 50KB:

  ruby -rzlib -e 'p Zlib.deflate("0" * 50_000_000, Zlib::BEST_COMPRESSION).length' #=> 48611

Inflating this allocates the full 50MB at once, which is undesirable.
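
As a rough illustration of the problem, the sketch below deflates a run of "0"s and then inflates it in a single call, so the whole result exists as one String. The helper name compression_report is made up for this example; it uses only the existing Zlib.deflate and Zlib.inflate.

```ruby
require 'zlib'

# Hypothetical helper (not part of the patch): deflate `size` bytes of "0",
# then inflate the result in one call.
def compression_report(size)
  deflated = Zlib.deflate("0" * size, Zlib::BEST_COMPRESSION)
  inflated = Zlib.inflate(deflated) # the whole output is built as one String
  { deflated_bytes: deflated.bytesize, inflated_bytes: inflated.bytesize }
end

report = compression_report(50_000_000)
puts "deflated: #{report[:deflated_bytes]} bytes"
puts "inflated: #{report[:inflated_bytes]} bytes"
```

The caller has no say in the size of the inflated String; it is decided entirely by the input.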

The attached patch allows Zlib::Inflate#inflate, Zlib::Deflate#deflate, Zlib::ZStream#finish and any other method that ends up calling zstream_expand_buffer or zstream_detach_buffer to be called with a block.  The block receives the output in chunks, which gives the user more control over the amount of memory allocated in their process.  (This patch uses a fixed maximum chunk size of 16384 bytes.)

The new API looks like:

  z = Zlib::Inflate.new
  z.inflate deflate_string do |chunk|
    # write chunk to output stream
  end
  # when a block is given, inflate returns nil
  
  footer = z.finish
  # flush footer to output stream
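
A minimal sketch of how the block form above might be wrapped to write to any IO-like object. It assumes a Ruby where Zlib::Inflate#inflate accepts a block as proposed here; the method name stream_inflate and the StringIO destination are illustrative only.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical wrapper: inflate `deflated` and write each chunk to `out`
# instead of collecting the whole result in one String.
def stream_inflate(deflated, out)
  z = Zlib::Inflate.new
  z.inflate(deflated) { |chunk| out.write(chunk) }
  rest = z.finish # whatever is still buffered after the block form returns
  out.write(rest) if rest
  out
ensure
  z.close if z
end

data = "0" * 1_000_000
out = stream_inflate(Zlib.deflate(data), StringIO.new)
puts out.string.bytesize
```

Here out is a StringIO for demonstration, which still holds everything in memory; a File or socket would let each chunk be released after it is written.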

Here's a comparison of resource usage:

  $ dd if=/dev/zero of=/dev/stdout bs=1m count=1024 | gzip -c > 1G.gz
  $ cat test.rb 
  require 'zlib'
  
  gzipped = File.read '1G.gz'
  
  z = Zlib::Inflate.new Zlib::MAX_WBITS + 32
  
  z.inflate gzipped do |chunk|
    # do nothing with chunk
    # current ruby will ignore this block
  end
  
  z.finish

With existing ruby (trunk 35758) 1GB of memory is allocated:

  $ /usr/bin/time -l ruby20 test.rb
          3.86 real         3.24 user         0.61 sys
  1080475648  maximum resident set size
           0  average shared memory size
           0  average unshared data size
           0  average unshared stack size
      263860  page reclaims
           0  page faults
           0  swaps
           0  block input operations
           0  block output operations
           0  messages sent
           0  messages received
           0  signals received
           1  voluntary context switches
          34  involuntary context switches

With the patch only 37MB are allocated:

  $ /usr/bin/time -l ./ruby20 -I .ext/x86_64-darwin11.4.0 test.rb
          3.47 real         3.43 user         0.03 sys
    36724736  maximum resident set size
           0  average shared memory size
           0  average unshared data size
           0  average unshared stack size
        8981  page reclaims
           0  page faults
           0  swaps
           0  block input operations
           9  block output operations
           0  messages sent
           0  messages received
           0  signals received
           1  voluntary context switches
          90  involuntary context switches
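
The comparison can be approximated at a smaller scale without the 1G.gz file. The sketch below assumes the block form of inflate is available; gzip_bytes and count_inflated_bytes are hypothetical helper names. It counts the inflated bytes without keeping any chunk.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: gzip `data` in memory using the existing GzipWriter.
def gzip_bytes(data)
  io = StringIO.new(String.new) # binary-encoded buffer
  gz = Zlib::GzipWriter.new(io)
  gz.write(data)
  gz.finish # writes the gzip trailer without closing io
  io.string
end

# Hypothetical helper: inflate a gzip stream chunk by chunk and return
# [total bytes, size of the largest chunk seen], discarding each chunk.
def count_inflated_bytes(gzipped)
  z = Zlib::Inflate.new(Zlib::MAX_WBITS + 32) # + 32 auto-detects gzip or zlib
  total = 0
  largest = 0
  z.inflate(gzipped) do |chunk|
    total += chunk.bytesize
    largest = chunk.bytesize if chunk.bytesize > largest
  end
  tail = z.finish
  total += tail.bytesize if tail
  [total, largest]
ensure
  z.close if z
end

total, largest = count_inflated_bytes(gzip_bytes("\0" * 10_000_000))
puts "total inflated bytes: #{total}"
puts "largest chunk seen:   #{largest}"
```

Because each chunk is dropped as soon as it is counted, peak memory should track the chunk size rather than the total output size.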

Some notes about this patch:

When a block is given, zstream_expand_buffer() yields each chunk of inflate (or deflate) output instead of expanding the buffer as it does without a block.  zstream_expand_buffer_into() is also used to reduce duplication.
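
From the Ruby side, the deflate direction might look like the sketch below. It assumes Zlib::Deflate#deflate accepts a block as this patch proposes; stream_deflate is a hypothetical name.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical wrapper: deflate `data`, writing compressed chunks to `out`
# as they are produced rather than building one large String.
def stream_deflate(data, out)
  z = Zlib::Deflate.new
  z.deflate(data) { |chunk| out.write(chunk) }
  out.write(z.finish) # flush whatever is still buffered
  out
ensure
  z.close if z
end

input = "line of text\n" * 100_000
out = stream_deflate(input, StringIO.new(String.new))
puts "#{input.bytesize} bytes in, #{out.string.bytesize} bytes out"
```

For highly compressible input the block may be called rarely or not at all, with most of the output arriving from finish.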

zstream_detach_buffer() yields the buffer and returns nil if a block was given.  OBJ_INFECT was moved into zstream_detach_buffer() as well.

A new stream flag, ZSTREAM_FLAG_GZFILE, is added so that a block given to a gzip method that invokes zstream_expand_buffer() or zstream_detach_buffer() is never yielded to.

To ensure proper handling of ((|z->buf|)) such as resetting the buffer after yielding a chunk, rb_protect is used in zstream_expand_buffer and zstream_run.  In zstream_run, the for loop has been extracted to zstream_run_loop.

In zstream_run_loop, Zlib::BufError is no longer raised when flushing (finishing) the stream.  This allows the stream to be flushed into one string when the inflate block is interrupted.

=end



-- 
http://bugs.ruby-lang.org/
