From: Tanaka Akira Date: 2013-06-10T20:03:43+09:00 Subject: [ruby-core:55408] Re: [ruby-trunk - Feature #6065] Allow Bignum marshalling/unmarshalling from C API 2013/6/5 Tanaka Akira : > > At first time I see that, I guess there is no application of "nails" argument. > > Now I think I know several applications: > > * Integer#to_s(radix) for radix is power of 2 > For example, i.to_s(2) can be implemented using that with size=1 and > nails=7 and > 0 and 1 are converted to '0' and '1' later. > > * BER-compressed integer for pack (size=1 and nails=1) I implemented rb_integer_pack and rb_integer_unpack in ruby trunk. (It is declared in internal.h. So it is not public now.) I designed them as follows: /* "MS" in MSWORD and MSBYTE means "most significant" */ /* "LS" in LSWORD and LSBYTE means "least significant" */ /* For rb_integer_pack and rb_integer_unpack: */ #define INTEGER_PACK_MSWORD_FIRST 0x01 #define INTEGER_PACK_LSWORD_FIRST 0x02 #define INTEGER_PACK_MSBYTE_FIRST 0x10 #define INTEGER_PACK_LSBYTE_FIRST 0x20 #define INTEGER_PACK_NATIVE_BYTE_ORDER 0x40 /* For rb_integer_unpack: */ #define INTEGER_PACK_FORCE_BIGNUM 0x100 /* Combinations: */ #define INTEGER_PACK_LITTLE_ENDIAN \ (INTEGER_PACK_LSWORD_FIRST | \ INTEGER_PACK_LSBYTE_FIRST) #define INTEGER_PACK_BIG_ENDIAN \ (INTEGER_PACK_MSWORD_FIRST | \ INTEGER_PACK_MSBYTE_FIRST) /* * Export an integer into a buffer. * * This function fills the buffer specified by _words_ and _numwords_ as * abs(val) in the format specified by _wordsize_, _nails_ and _flags_. * * [val] Fixnum, Bignum or another integer like object which has to_int method. * [words] buffer to export abs(val). * [numwords] the size of given buffer as number of words. * [wordsize] the size of word as number of bytes. * [nails] number of padding bits in a word. * Most significant nails bits of each word are filled by zero. * [flags] bitwise or of constants which name starts "INTEGER_PACK_". * It specifies word order and byte order. * * This function returns the signedness and overflow condition as follows: * -2 : negative overflow. val <= -2**(numwords*(wordsize*CHAR_BIT-nails)) * -1 : negative without overflow. -2**(numwords*(wordsize*CHAR_BIT-nails)) < val < 0 * 0 : zero. val == 0 * 1 : positive without overflow. 0 < val < 2**(numwords*(wordsize*CHAR_BIT-nails)) * 2 : positive overflow. 2**(numwords*(wordsize*CHAR_BIT-nails)) <= val * * The least significant words of abs(val) are filled in the buffer when overflow occur. */ int rb_integer_pack(VALUE val, void *words, size_t numwords, size_t wordsize, size_t nails, int flags); /* * Import an integer into a buffer. * * [sign] signedness of the value. * -1 for non-positive. 0 or 1 for non-negative. * [words] buffer to import. * [numwords] the size of given buffer as number of words. * [wordsize] the size of word as number of bytes. * [nails] number of padding bits in a word. * Most significant nails bits of each word are ignored. * [flags] bitwise or of constants which name starts "INTEGER_PACK_". * It specifies word order and byte order. * Also, INTEGER_PACK_FORCE_BIGNUM specifies that the result will be a Bignum * even if it is representable as a Fixnum. * * This function returns the imported integer as Fixnum or Bignum. */ VALUE rb_integer_unpack(int sign, const void *words, size_t numwords, size_t wordsize, size_t nails, int flags); I also implemented two functions to calculate the require buffer size. /* * Calculate a number of bytes to be required to represent * the absolute value of the integer given as _val_. * * [val] an integer. * [nlz_bits_ret] number of leading zero bits in the most significant byte is returned if not NULL. * * This function returns ((val_numbits * CHAR_BIT + CHAR_BIT - 1) / CHAR_BIT) * where val_numbits is the number of bits of abs(val). * This function should not overflow. * * If nlz_bits_ret is not NULL, * (return_value * CHAR_BIT - val_numbits) is stored in *nlz_bits_ret. * In this case, 0 <= *nlz_bits_ret < CHAR_BIT. * */ size_t rb_absint_size(VALUE val, int *nlz_bits_ret); /* * Calculate a number of words to be required to represent * the absolute value of the integer given as _val_. * * [val] an integer. * [word_numbits] number of bits in a word. * [nlz_bits_ret] number of leading zero bits in the most significant word is returned if not NULL. * * This function returns ((val_numbits * CHAR_BIT + word_numbits - 1) / word_numbits) * where val_numbits is the number of bits of abs(val). * If it overflows, (size_t)-1 is returned. * * If nlz_bits_ret is not NULL and overflow is not occur, * (return_value * word_numbits - val_numbits) is stored in *nlz_bits_ret. * In this case, 0 <= *nlz_bits_ret < word_numbits. * */ size_t rb_absint_numwords(VALUE val, size_t word_numbits, size_t *nlz_bits_ret); Any opinions about the API? Note that packing/unpacking BER-compressed integer uses them now and they are much faster (for very big integers). 200% time ./ruby -e '[3**200000].pack("w")' ./ruby -e '[3**200000].pack("w")' 3.14s user 0.00s system 99% cpu 3.147 total trunk% time ./ruby -e '[3**200000].pack("w")' ./ruby -e '[3**200000].pack("w")' 0.02s user 0.01s system 95% cpu 0.029 total 200% time ./ruby -e '("\x81"*100000+"\x00").unpack("w")' ./ruby -e '("\x81"*100000+"\x00").unpack("w")' 4.21s user 0.03s system 99% cpu 4.251 total trunk% time ./ruby -e '("\x81"*100000+"\x00").unpack("w")' ./ruby -e '("\x81"*100000+"\x00").unpack("w")' 0.02s user 0.00s system 88% cpu 0.023 total Integer#to_s(power-of-2) is also bit faster now. 200% time ./ruby -e 'v = 3**100000; 1000.times { v.to_s(16) }' ./ruby -e 'v = 3**100000; 1000.times { v.to_s(16) }' 4.77s user 0.01s system 99% cpu 4.787 total trunk% time ./ruby -e 'v = 3**100000; 1000.times { v.to_s(16) }' ./ruby -e 'v = 3**100000; 1000.times { v.to_s(16) }' 3.16s user 0.01s system 99% cpu 3.184 total Versions: 200% ./ruby -v ruby 2.0.0p214 (2013-06-09 revision 41193) [x86_64-linux] trunk% ./ruby -v ruby 2.1.0dev (2013-06-10 trunk 41214) [x86_64-linux] -- Tanaka Akira