From: Martin Bosslet <Martin.Bosslet@...>
Date: 2011-11-28T19:19:12+09:00
Subject: [ruby-core:41357] [ruby-trunk - Feature #5677] IO C API


Issue #5677 has been updated by Martin Bosslet.


Eric Wong wrote:

First off, thanks for your comments.

> Martin Bosslet <Martin.Bosslet@googlemail.com> wrote:
>  > This is related to the proposal in [ruby-core:41321][1].
>  > 
>  > I'd like to take advantage of streaming IO in an extension I am
>  > working on. The problem I'm having is that I don't want to call
>  > IO#read on the rb_funcall level because that would kill the
>  > performance due to wrapping the bytes into Ruby objects back and
>  > forth again.
>  
>  Is starting with Ruby String objects (with binary encoding) and then
>  having read(2)/write(2) hit RSTRING_PTR not possible?

You mean reading String chunks from the underlying IO? I'm afraid not.
The only way I could do that right now is by calling the Ruby methods
IO#read/IO#write via rb_funcall. But that involves a lot of overhead: a
VM round trip plus lots of short-lived objects that trigger GC. It would
likely end up being slower than the current ASN1.decode, a situation I'd
like to avoid.
  
>  > I saw two solutions to my problem:
>  > 
>  > 1. Duplicating the file descriptor to obtain a pure FILE*
>  > like it is done in ext/openssl/ossl_bio.c[2] and continue
>  > working on the raw FILE*.
>  
>  That may be from the old 1.8 days when all IO objects wrapped FILE *.
>  It might be better to use BIO_new_fd() nowadays instead since 1.9
>  generally prefers bare file descriptors (for all fd > 2).

Good point, I will look into using it instead.

>  > 2. Since I really only need to read and write on the stream,
>  > I was looking for public Ruby C API that would support me
>  > in the process, and I found
>  > 
>  >  - ssize_t rb_io_bufwrite(VALUE io, const void *buf, size_t size)
>  >  - ssize_t rb_io_bufread(VALUE io, void *buf, size_t size)
>  
>  Is userspace buffering really necessary in your case?

No, not really, but currently it's the only way the C API allows
me to do C-level streaming on an IO. 
  
>  If you're working with sockets/pipes, I would reckon not (Ruby already
>  defaults to IO#sync=true on sockets/pipes when writing).  If you're
>  reading (and probably parsing), you would need to do your own read
>  buffering anyways, no?

See below.

>  > I think both cases are valid use cases, 1. is likely necessary
>  > if there is the need to pass a FILE* on to an external C library,
>  
>  It's not easily possible to share userspace buffers in FILE * with
>  userspace buffers in rb_io_t.  Userspace buffering is pretty miserable
>  and error-prone whenever/wherever IPC is concerned.
>  
>  > 2. is for cases like mine where there is the need to operate
>  > on raw C data types for performance reasons.
>  
>  It depends on what you're doing, but if performance is a concern you
>  should try to work on largish chunks off the file descriptor and
>  skip the userspace buffering stages.  Userspace buffering can improve
>  performance by reducing syscalls, but it can also double the memory
>  bandwidth required to do things.

Yes, I would have to do my own buffering during parsing in any case, so
double buffering would be an unnecessary waste of memory. Making a clean
cut and working on the file descriptor directly seems like the best solution.
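
To make the "clean cut" idea concrete, here is a minimal sketch of a
parser-owned read buffer that pulls largish chunks straight off the file
descriptor with read(2), so there is only one level of buffering. All
names here (struct fd_reader, fd_reader_init, fd_reader_getc, CHUNK_SIZE)
are hypothetical, and the chunk size is an arbitrary assumption. In an
extension the descriptor would presumably come from the rb_io_t behind
the IO object (e.g. via GetOpenFile).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_SIZE 8192   /* assumed chunk size; tune for the workload */

/* Hypothetical single-level read buffer owned by the parser. */
struct fd_reader {
    int fd;
    size_t pos;                      /* index of the next unread byte */
    size_t len;                      /* number of valid bytes in buf  */
    unsigned char buf[CHUNK_SIZE];
};

static void fd_reader_init(struct fd_reader *r, int fd)
{
    assert(r != NULL);
    r->fd = fd;
    r->pos = 0;
    r->len = 0;
}

/* Returns the next byte (0..255), -1 at EOF, or -2 on a read error. */
static int fd_reader_getc(struct fd_reader *r)
{
    assert(r != NULL);

    if (r->pos == r->len) {          /* buffer exhausted: refill it */
        ssize_t n;

        do {
            n = read(r->fd, r->buf, sizeof r->buf);
        } while (n < 0 && errno == EINTR);   /* retry if interrupted */

        if (n == 0)
            return -1;               /* end of stream */
        if (n < 0)
            return -2;               /* genuine error, errno is set */

        r->pos = 0;
        r->len = (size_t)n;
    }
    return r->buf[r->pos++];
}
```

This sketch ignores two things a real extension would have to handle:
bytes that Ruby has already read into the IO object's own buffer are not
seen by read(2) on the raw descriptor, and a non-blocking descriptor
would need EAGAIN handling instead of being treated as an error.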

Still, I am wondering whether there is a need for a low-level C API for doing
IO on Ruby IO objects, or whether the "clean cut" approach of using the file
descriptor directly is the recommended solution in any case.
----------------------------------------
Feature #5677: IO C API
http://redmine.ruby-lang.org/issues/5677

Author: Martin Bosslet
Status: Open
Priority: Normal
Assignee: 
Category: core
Target version: 2.0.0


This is related to the proposal in [ruby-core:41321][1].

I'd like to take advantage of streaming IO in an extension I am
working on. The problem I'm having is that I don't want to call
IO#read on the rb_funcall level because that would kill the
performance due to wrapping the bytes into Ruby objects back and
forth again.

I saw two solutions to my problem:

1. Duplicating the file descriptor to obtain a pure FILE*
like it is done in ext/openssl/ossl_bio.c[2] and continue
working on the raw FILE*.

2. Since I really only need to read and write on the stream,
I was looking for public Ruby C API that would support me
in the process, and I found

 - ssize_t rb_io_bufwrite(VALUE io, const void *buf, size_t size)
 - ssize_t rb_io_bufread(VALUE io, void *buf, size_t size)


I think both cases are valid use cases, 1. is likely necessary
if there is the need to pass a FILE* on to an external C library,
2. is for cases like mine where there is the need to operate
on raw C data types for performance reasons.

The problem, though, is that only rb_io_bufwrite is public API in io.h;
rb_io_bufread is declared private in internal.h, and rb_cloexec_dup is
semi-public in intern.h.

Could we make rb_io_bufread public API in io.h as well? What about
rb_cloexec_dup?

[1] http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/41321
[2] https://github.com/ruby/ruby/blob/trunk/ext/openssl/ossl_bio.c#L17


-- 
http://redmine.ruby-lang.org