From: Joshua Ballanco
Date: 2012-05-04T00:44:48+09:00
Subject: [ruby-core:44852] Re: [ruby-trunk - Feature #6361] Bitwise string operations

On Thursday, May 3, 2012 at 9:16 AM, "Martin J. Dürst" wrote:
> On 2012/04/30 1:50, Joshua Ballanco wrote:
>
> > I know it seems like this class is just wrapping String and always defaulting to byte-wise operations, but it's more fundamental than that. Because there is no encoding on the bytes, there will never be an encoding error when working with them. This could be extremely useful for applications that combine bytes from multiple sources (e.g. Socket data + a file on disk + immediate strings in code) that could potentially have different encodings. By operating on bytes, you can defer the encoding checks until later, if at all.
>
> I'm not saying I'm totally against this, but "extremely useful" could
> also mean "too useful". There are clearly cases where one needs to put
> things together at the byte level. But there are also quite some cases
> that seem to "just work" when using byte-wise operations, at least as
> long as nothing else but US-ASCII gets used. Things then blow up
> terribly once some other characters get into the mix.

So, as an addendum to the spec, what about adding a flag when doing a string conversion:

    d.string_with_encoding('UTF-8', reject_if_invalid: true)

So that we could ensure that the return value is always either nil or a string with valid encoding.

> Actually, the binary/ASCII-8bit encoding is very close to a Blob. It was
> mostly Akira Tanaka who didn't want to distinguish between "true" binary
> and ASCII-8bit, because that would have made the use of regular
> expressions with binary impossible or convoluted.

My problem with String and ASCII-8BIT/BINARY encoding currently is that you *can't* just set a string's encoding to binary and forget about encodings. You will still run into issues working with binary data using Ruby 1.9 strings. I demonstrated the issue here: http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/40269 (where I, consequently, also made a plea for a Data/Blob type).

> Despite the title of this issue, I didn't see any *bit*wise operations
> (e.g. bitwise and/or/xor/not) proposed. Were you just taking them for
> granted? What about adding these to String, maybe limiting them to
> binary/ASCII-8bit?

I was taking bit-wise operations for granted. Ideally, a Data/Blob type would just represent N groupings of 8 1s and/or 0s, with byte-wise access and bit-wise manipulation. i.e. Less structured than an Array, less restrictive than a String. Just data.
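
A minimal sketch of the ideas discussed in this message, under stated assumptions: Blob is a hypothetical class, and string_with_encoding with its reject_if_invalid flag is only the name proposed in this thread, not an existing Ruby API. The XOR operator is one possible illustration of the bit-wise manipulation mentioned, and the keyword-argument syntax assumes Ruby 2.0 or later.

```ruby
# Hypothetical sketch: Blob and string_with_encoding are illustrative names
# from this thread, not part of Ruby's core library.
class Blob
  def initialize(data = "")
    # Copy the input and tag it as binary so no encoding checks apply.
    @data = data.dup.force_encoding(Encoding::BINARY)
  end

  # Append raw bytes from any source, ignoring the encoding they carried.
  def <<(other)
    @data << other.dup.force_encoding(Encoding::BINARY)
    self
  end

  # Byte-wise access: an Array of Integers in 0..255.
  def bytes
    @data.bytes
  end

  # Return a String tagged with the requested encoding. When
  # reject_if_invalid is true, return nil if the bytes are not valid
  # in that encoding; otherwise the String is returned unchecked.
  def string_with_encoding(encoding, reject_if_invalid: false)
    str = @data.dup.force_encoding(encoding)
    return nil if reject_if_invalid && !str.valid_encoding?
    str
  end

  # Bit-wise XOR against another Blob of the same length.
  def ^(other)
    raise ArgumentError, "length mismatch" unless bytes.size == other.bytes.size
    Blob.new(bytes.zip(other.bytes).map { |a, b| a ^ b }.pack("C*"))
  end
end

d = Blob.new("caf")
d << "\xC3\xA9"                       # raw bytes, e.g. read from a socket
d.string_with_encoding("UTF-8", reject_if_invalid: true)                 # valid UTF-8 String
Blob.new("\xFF").string_with_encoding("UTF-8", reject_if_invalid: true)  # nil
```

The encoding check happens only at conversion time, which matches the "defer the encoding checks until later" argument quoted above; everything before that point operates on untagged bytes.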