From: Martin Bosslet <Martin.Bosslet@...>
Date: 2011-11-28T19:19:12+09:00
Subject: [ruby-core:41357] [ruby-trunk - Feature #5677] IO C API

Issue #5677 has been updated by Martin Bosslet.

Eric Wong wrote:

First off, thanks for your comments.

> Martin Bosslet <Martin.Bosslet@googlemail.com> wrote:
> > This is related to the proposal in [ruby-core:41321][1].
> >
> > I'd like to take advantage of streaming IO in an extension I am
> > working on. The problem I'm having is that I don't want to call
> > IO#read on the rb_funcall level because that would kill the
> > performance due to wrapping the bytes into Ruby objects back and
> > forth again.
>
> Is starting with Ruby String objects (with binary encoding) and then
> having read(2)/write(2) hit RSTRING_PTR not possible?

You mean reading String chunks from the underlying IO? I'm afraid not.
The only way I could do that right now is by calling the Ruby methods
for IO#read/write using rb_funcall. But there's a lot of overhead
involved: a VM roundtrip plus lots of short-lived objects that trigger
GC. It would likely end up being slower than the current ASN1.decode,
a situation I'd like to avoid.

> > I saw two solutions to my problem:
> >
> > 1. Duplicating the file descriptor to obtain a pure FILE*
> >    like it is done in ext/openssl/ossl_bio.c[2] and continue
> >    working on the raw FILE*.
>
> That may be from the old 1.8 days when all IO objects wrapped FILE *.
> It might be better to use BIO_new_fd() nowadays instead since 1.9
> generally prefers bare file descriptors (for all fd > 2).

Good point, I will look into using it instead.

> > 2. Since I really only need to read and write on the stream,
> >    I was looking for public Ruby C API that would support me
> >    in the process, and I found
> >
> >    - ssize_t rb_io_bufwrite(VALUE io, const void *buf, size_t size)
> >    - ssize_t rb_io_bufread(VALUE io, void *buf, size_t size)
>
> Is userspace buffering really necessary in your case?
No, not really, but currently it's the only way the C API allows me to
do C-level streaming on an IO.

> If you're working with sockets/pipes, I would reckon not (Ruby already
> defaults to IO#sync=false on sockets/pipes when writing). If you're
> reading (and probably parsing), you would need to do your own read
> buffering anyways, no?

See below.

> > I think both cases are valid use cases, 1. is likely necessary
> > if there is the need to pass a FILE* on to an external C library,
>
> It's not easily possible to share userspace buffers in FILE * with
> userspace buffers in rb_io_t. Userspace buffering is pretty miserable
> and error-prone whenever/wherever IPC is concerned.
>
> > 2. is for cases like mine where there is the need to operate
> > on raw C data types for performance reasons.
>
> It depends on what you're doing, but if performance is a concern you
> should try to work on largish chunks off the file descriptor and
> skip the userspace buffering stages. Userspace buffering can improve
> performance by reducing syscalls, but it can also double the memory
> bandwidth required to do things.

Yes, I would have to do my own buffering during parsing in any case, so
double buffering would be an unnecessary waste of memory. I guess making
a clean cut and working on the file descriptor directly is the best
solution.

Still, I am wondering whether there is a need for a low-level C API for
doing IO on Ruby IO objects, or whether the "clean cut approach" of
using the file descriptor directly is the recommended solution in any
case?

----------------------------------------
Feature #5677: IO C API
http://redmine.ruby-lang.org/issues/5677

Author: Martin Bosslet
Status: Open
Priority: Normal
Assignee:
Category: core
Target version: 2.0.0

This is related to the proposal in [ruby-core:41321][1].

I'd like to take advantage of streaming IO in an extension I am working on.
The problem I'm having is that I don't want to call IO#read on the
rb_funcall level because that would kill the performance due to
wrapping the bytes into Ruby objects back and forth again.

I saw two solutions to my problem:

1. Duplicating the file descriptor to obtain a pure FILE*
   like it is done in ext/openssl/ossl_bio.c[2] and continue
   working on the raw FILE*.

2. Since I really only need to read and write on the stream,
   I was looking for public Ruby C API that would support me
   in the process, and I found

   - ssize_t rb_io_bufwrite(VALUE io, const void *buf, size_t size)
   - ssize_t rb_io_bufread(VALUE io, void *buf, size_t size)

I think both cases are valid use cases, 1. is likely necessary
if there is the need to pass a FILE* on to an external C library,
2. is for cases like mine where there is the need to operate
on raw C data types for performance reasons.

The problem, though, is that only rb_io_bufwrite is public API
in io.h, rb_io_bufread is declared private in internal.h and
rb_cloexec_dup is semi-public in intern.h.

Could we make rb_io_bufread public API in io.h as well? What
about rb_cloexec_dup?

[1] http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/41321
[2] https://github.com/ruby/ruby/blob/trunk/ext/openssl/ossl_bio.c#L17

--
http://redmine.ruby-lang.org
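
For illustration only, here is a minimal sketch of what the "clean cut
approach" from the discussion could look like: reading largish chunks
straight off a file descriptor with read(2) into a caller-owned buffer,
with no stdio or rb_io_t buffering in between. The helper name
read_chunk is hypothetical and not part of Ruby's C API. How the
descriptor is obtained from the Ruby IO object is deliberately left out,
and the sketch assumes a blocking descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Ruby's C API): read up to `size`
 * bytes from `fd` into `buf`, retrying on EINTR and on short reads.
 *
 * Returns the number of bytes read, which is smaller than `size` only
 * if EOF was reached, or -1 on error with errno set by read(2).
 * Assumes a blocking descriptor; on a non-blocking one, EAGAIN would
 * be reported as an error here.
 */
ssize_t
read_chunk(int fd, void *buf, size_t size)
{
    unsigned char *p = buf;
    size_t total = 0;

    assert(buf != NULL || size == 0);

    while (total < size) {
        ssize_t n = read(fd, p + total, size - total);

        if (n > 0) {
            total += (size_t)n;   /* got some data, keep filling */
        } else if (n == 0) {
            break;                /* EOF */
        } else if (errno != EINTR) {
            return -1;            /* real error, errno is preserved */
        }
        /* n < 0 with EINTR: interrupted by a signal, retry */
    }
    return (ssize_t)total;
}
```

A parser would call read_chunk repeatedly with its own buffer and stop
when the return value is 0 (EOF) or -1 (error), which keeps the data in
a single buffer instead of copying it through a second one.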