[#18436] [ANN] Ruby 1.9.1 feature freeze — "Yugui (Yuki Sonoda)" <yugui@...>

Hi all,

81 messages 2008/09/02
[#18667] Re: [ANN] Ruby 1.9.1 feature freeze — "Yusuke ENDOH" <mame@...> 2008/09/17

Hi,

[#18847] Re: [ANN] Ruby 1.9.1 feature freeze — "Yugui (Yuki Sonoda)" <yugui@...> 2008/09/24

Hi, Yusuke

[#18848] Re: [ANN] Ruby 1.9.1 feature freeze — "Yusuke ENDOH" <mame@...> 2008/09/24

Hi,

[#18886] Re: [ANN] Ruby 1.9.1 feature freeze — Ryan Davis <ryand-ruby@...> 2008/09/25

[#18889] Re: [ANN] Ruby 1.9.1 feature freeze — SASADA Koichi <ko1@...> 2008/09/25

Ryan Davis wrote:

[#18906] Re: [ANN] Ruby 1.9.1 feature freeze — Dave Thomas <dave@...> 2008/09/25

[#18908] Re: [ANN] Ruby 1.9.1 feature freeze — SASADA Koichi <ko1@...> 2008/09/25

Dave Thomas wrote:

[#19032] Re: [ANN] Ruby 1.9.1 feature freeze — Ryan Davis <ryand-ruby@...> 2008/09/30

[#19036] Re: [ANN] Ruby 1.9.1 feature freeze — Jim Weirich <jim.weirich@...> 2008/09/30

[#19039] Re: [ANN] Ruby 1.9.1 feature freeze — Ryan Davis <ryand-ruby@...> 2008/09/30

[#19042] Re: [ANN] Ruby 1.9.1 feature freeze — Dave Thomas <dave@...> 2008/09/30

[#19195] Re: [ANN] Ruby 1.9.1 feature freeze — Ryan Davis <ryand-ruby@...> 2008/10/08

[#19202] Re: [ANN] Ruby 1.9.1 feature freeze — "Austin Ziegler" <halostatue@...> 2008/10/08

On Wed, Oct 8, 2008 at 3:05 AM, Ryan Davis <ryand-ruby@zenspider.com> wrote:

[#19203] Re: [ANN] Ruby 1.9.1 feature freeze — Paul Brannan <pbrannan@...> 2008/10/08

On Wed, Oct 08, 2008 at 09:28:22PM +0900, Austin Ziegler wrote:

[#18452] [ANN] Ruby 1.9.1 feature freeze — "Roger Pack" <rogerpack2005@...>

Would it be possible to have a few patches applied before freeze [if

27 messages 2008/09/04
[#18471] Re: [ANN] Ruby 1.9.1 feature freeze — Yukihiro Matsumoto <matz@...> 2008/09/06

Hi,

[#18490] Re: [ANN] Ruby 1.9.1 feature freeze — Nobuyoshi Nakada <nobu@...> 2008/09/08

Hi,

[#18486] Ruby 1.9 strings & character encoding — "Michael Selig" <michael.selig@...>

Firstly, I apologise if I am going over old ground here - I haven't been

39 messages 2008/09/08
[#18492] Re: Ruby 1.9 strings & character encoding — Yukihiro Matsumoto <matz@...> 2008/09/08

Hi,

[#18494] Re: Ruby 1.9 strings & character encoding — "Michael Selig" <michael.selig@...> 2008/09/08

On Mon, 08 Sep 2008 19:45:36 +1000, Yukihiro Matsumoto

[#18499] Re: Ruby 1.9 strings & character encoding — "NARUSE, Yui" <naruse@...> 2008/09/08

Hi,

[#18500] Re: Ruby 1.9 strings & character encoding — Tim Bray <Tim.Bray@...> 2008/09/08

On Sep 8, 2008, at 10:43 AM, NARUSE, Yui wrote:

[#18515] Re: Ruby 1.9 strings & character encoding — Urabe Shyouhei <shyouhei@...> 2008/09/09

# First off, I'm neutral to this issue

[#18530] Re: Ruby 1.9 strings & character encoding — Tim Bray <Tim.Bray@...> 2008/09/10

On Sep 8, 2008, at 9:06 PM, Urabe Shyouhei wrote:

[#18533] Re: Ruby 1.9 strings & character encoding — Tanaka Akira <akr@...> 2008/09/10

In article <3119E5AB-AEC8-4FEE-B2FA-8C75482E0E9D@sun.com>,

[#18504] Re: Ruby 1.9 strings & character encoding — "Michael Selig" <michael.selig@...> 2008/09/09

On Tue, 09 Sep 2008 03:43:54 +1000, NARUSE, Yui <naruse@airemix.jp> wrote:

[#18572] Working on CSV's Encoding Support — James Gray <james@...>

I'm trying to get the standard CSV library ready for m17n in Ruby

23 messages 2008/09/13
[#18575] Re: Working on CSV's Encoding Support — James Gray <james@...> 2008/09/14

On Sep 13, 2008, at 5:39 PM, James Gray wrote:

[#18576] Re: Working on CSV's Encoding Support — "Michael Selig" <michael.selig@...> 2008/09/14

On Sun, 14 Sep 2008 14:48:47 +1000, James Gray <james@grayproductions.net>

[#18640] Character encodings - a radical suggestion — "Michael Selig" <michael.selig@...>

Hi,

89 messages 2008/09/17
[#18643] Re: Character encodings - a radical suggestion — James Gray <james@...> 2008/09/17

On Sep 16, 2008, at 8:20 PM, Michael Selig wrote:

[#18647] Re: Character encodings - a radical suggestion — "Michael Selig" <michael.selig@...> 2008/09/17

On Wed, 17 Sep 2008 12:51:14 +1000, James Gray <james@grayproductions.net>

[#18658] Re: Character encodings - a radical suggestion — James Gray <james@...> 2008/09/17

On Sep 16, 2008, at 11:21 PM, Michael Selig wrote:

[#18660] Re: Character encodings - a radical suggestion — "NARUSE, Yui" <naruse@...> 2008/09/17

Hi,

[#18663] Re: Character encodings - a radical suggestion — Matthias Wächter <matthias@...> 2008/09/17

On 9/17/2008 3:39 PM, NARUSE, Yui wrote:

[#18666] Re: Character encodings - a radical suggestion — Yukihiro Matsumoto <matz@...> 2008/09/17

Hi,

[#18728] Re: Character encodings - a radical suggestion — Martin Duerst <duerst@...> 2008/09/19

At 00:01 08/09/18, Yukihiro Matsumoto wrote:

[#18729] Re: Character encodings - a radical suggestion — Yukihiro Matsumoto <matz@...> 2008/09/19

Hi,

[#18732] Re: Character encodings - a radical suggestion — "Michael Selig" <michael.selig@...> 2008/09/19

On Fri, 19 Sep 2008 18:24:41 +1000, Yukihiro Matsumoto

[#18734] Re: Character encodings - a radical suggestion — Yukihiro Matsumoto <matz@...> 2008/09/19

Oops, I misfired my mail reader; the following is the right one:

[#18751] Re: Character encodings - a radical suggestion — "Michael Selig" <michael.selig@...> 2008/09/20

On Fri, 19 Sep 2008 19:52:30 +1000, Yukihiro Matsumoto

[#18761] Re: Character encodings - a radical suggestion — Yukihiro Matsumoto <matz@...> 2008/09/20

Hi,

[#18774] Re: Character encodings - a radical suggestion — "Michael Selig" <michael.selig@...> 2008/09/21

On Sun, 21 Sep 2008 02:05:30 +1000, Yukihiro Matsumoto

[#18776] Re: Character encodings - a less radical suggestion — Martin Duerst <duerst@...> 2008/09/22

Hello Michael,

[#18664] Re: Character encodings - a radical suggestion — Yukihiro Matsumoto <matz@...> 2008/09/17

Hi,

[#18762] [Feature #578] add method to disassemble Proc objects — Roger Pack <redmine@...>

Feature #578: add method to disassemble Proc objects

17 messages 2008/09/20

[#18872] [RIP] Guy Decoux. — "Jean-Fran輟is Tr穗" <jftran@...>

Hello,

14 messages 2008/09/24

[#18899] refute_{equal, match, nil, same} is not useful — Fujioka <fuj@...>

Hi,

27 messages 2008/09/25

[#18937] A stupid question... — Dave Thomas <dave@...>

Just what was wrong with Test::Unit? Sure, it was slightly bloated.

25 messages 2008/09/25
[#18941] Re: A stupid question... — "Berger, Daniel" <Daniel.Berger@...> 2008/09/25

> -----Original Message-----

[#19004] Let Ruby be Ruby — Trans <transfire@...> 2008/09/28

[#18986] miniunit problems and release of Ruby 1.9.0-5 — "Yugui (Yuki Sonoda)" <yugui@...>

Hi,

14 messages 2008/09/27

[#19043] Ruby is "stealing" names from operating system API:s — "Johan Holmberg" <johan556@...>

Hi!

13 messages 2008/09/30

[ruby-core:18580] Re: Working on CSV's Encoding Support

From: James Gray <james@...>
Date: 2008-09-14 16:52:41 UTC
List: ruby-core #18580
On Sep 13, 2008, at 5:44 PM, Gregory Brown wrote:

> On Sat, Sep 13, 2008 at 6:32 PM, James Gray  
> <james@grayproductions.net> wrote:
>> * Have the parser always work in ASCII-8BIT.  I imagine this could  
>> work for
>> some cases, but I assume it would do bad things to something like  
>> UTF-16.
>
> Not a good option.  Without knowing the encoding, you're really
> talking about ASCII-7bit being the only common point between the most
> popular encodings, as things like Latin1 display accents in one byte
> wheras UTF-8 uses two.
>
> But I suppose if the parser is only looking for things like commas and
> quotes and things, this might be possible.

Yeah, that's kind of the point of ASCII-8BIT, as I understand it.   
ASCII is treated a ASCII (the common ground you mention), and  
everything else is just bytes.

> Would this mean that smart quotes in UTF-8 would blow up the parser?

Well, they wouldn't match normal quotes, if that's you question:

$ cat smart_quotes.rb
#!/usr/bin/env ruby -w
# encoding: UTF-8

puts %Q{笛ames媒 =~ /\A"/ ? "Match" : "No Match"
$ ruby_dev smart_quotes.rb
No Match

I assume their are values of UTF-8 where it would fail though.  For  
example, if the last byte of a multibyte character looks like a quote  
or comma.

That's why UTF-16 makes a great edge case for testing this stuff.   
Every other byte looks like normal ASCII, so the parser would latch on  
to the wrong things.  It looks like Ruby knows this though, and just  
disallows it:

$ cat utf16_match.rb
#!/usr/bin/env ruby -w
# encoding: UTF-8

data = "abc,xyz"
data.encode!("UTF-16BE")
binary = "[^,]+"
binary.force_encoding("ASCII-8BIT")
re = Regexp.new(binary)

p data.encoding
p binary.encoding
p re.encoding

p data.scan(re)
$ ruby_dev utf16_match.rb
#<Encoding:UTF-16BE>
#<Encoding:ASCII-8BIT>
#<Encoding:US-ASCII>
utf16_match.rb:14:in `scan': incompatible encoding regexp match (US- 
ASCII regexp with UTF-16BE string) (ArgumentError)
	from utf16_match.rb:14:in `<main>'

Ruby is "safely downgrading" my Regexp to US-ASCII there as you can  
see.  This is what Michael Selig explained in his message.  I haven't  
been able to figure out a way to force it to ASCII-8BIT.

> Please let me know when you start working on this, I'd be happy to
> help test / debug / patch.

I started about a week ago.  My hope is that I'm almost done now, as  
soon as I figure out the right way to resolve these issues.

I'll definitely encourage everyone to try it out when I'm finished  
though.

James Edward Gray II

In This Thread