
Re: Encodings of string literals; explicit codepoint escapes?

From: David Flanagan <david@...>
Date: 2007-08-31 17:23:04 UTC
List: ruby-core #12059
For what it's worth, I don't think that the \N{name} escape is 
necessary, even in the standard library.  Unicode names are so long 
(like "ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM") that I think this 
notation would be terribly cumbersome.

Also, \N escapes could easily be approximated (at runtime, instead of 
compile time) with #{} interpolation.  Define a class UN (for Unicode 
Name) and give it a const_missing method to look up (and define 
permanently) codepoints by name.  Then you can write strings like 
"#{UN::COPYRIGHT_SIGN} 2007", which isn't much longer or harder to write 
than "\N{COPYRIGHT SIGN} 2007".
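A minimal sketch of the UN idea above, assuming Ruby 1.9 or later for Integer#chr(Encoding::UTF_8). It uses a module rather than a class (either works, since const_missing is defined on Module). The NAMES table is a hypothetical three-entry stand-in, because Ruby's standard library does not include a Unicode character-name database; underscores in a constant name stand in for the spaces of the Unicode name. Note that const_missing is triggered by a scoped constant reference such as UN::COPYRIGHT_SIGN; a dotted form like UN.COPYRIGHT_SIGN is a method call and would reach method_missing instead.

```ruby
# Sketch: resolve Unicode character names at runtime via const_missing.
module UN
  # Hypothetical, tiny name table; a real version would load the full
  # Unicode name list from some external data file.
  NAMES = {
    "COPYRIGHT SIGN"           => 0x00A9,
    "EURO SIGN"                => 0x20AC,
    "GREEK SMALL LETTER ALPHA" => 0x03B1
  }.freeze

  def self.const_missing(sym)
    # UN::COPYRIGHT_SIGN -> "COPYRIGHT SIGN"
    codepoint = NAMES[sym.to_s.tr("_", " ")]
    # Unknown names fall back to Module#const_missing, which raises NameError.
    return super if codepoint.nil?
    # Define the constant permanently so later lookups skip this hook.
    const_set(sym, codepoint.chr(Encoding::UTF_8))
  end
end

puts "#{UN::COPYRIGHT_SIGN} 2007"   # prints "© 2007"
```

Because const_set stores each character as a real constant, the lookup cost is paid only on first use, which matches the "define permanently" part of the suggestion.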

I've done something similar (for codepoints instead of names) with 
const_missing here: http://www.davidflanagan.com/blog/2007_08.html#000136

	David

Yukihiro Matsumoto wrote:
> Hi,
> 
> In message "Re: Encodings of string literals; explicit codepoint escapes?"
>     on Fri, 31 Aug 2007 15:52:51 +0900, David Flanagan <david@davidflanagan.com> writes:
> 
> |I'm excited to see that strings have encodings now!  Thank you for your 
> |Unicode support!  I have a few questions:
> |
> |1) I gather that string literals are given the encoding specified by the 
> |-K option or by the encoding comment at the top of the file.  Do you 
> |plan any changes to the string literal syntax so that encodings can be 
> |specified for individual literals?  Will I be able to include a UTF-8 
> |encoded string literal within a file that is otherwise in ASCII? I 
> |don't like Python's u"" syntax, but I'm hoping that you'll provide some 
> |more elegant alternative.
> 
> We will probably provide "binary" string literals via b"" or ""b (not
> fixed yet).  If you want a UTF-8 encoded string in an ASCII-coded
> script, you can write the UTF-8 bytes as a binary string and then
> specify UTF-8 later, e.g.
> 
>   # my last name in Japanese
>   m = b"\343\201\276\343\201\244\343\202\202\343\201\250"
>   m.encoding="utf8"
> 
> or a possible alternative in the distant future may be:
> 
>   m = "\343\201\276\343\201\244\343\202\202\343\201\250".utf8
>   m = "\343\201\276\343\201\244\343\202\202\343\201\250"u
>   m = "\343\201\276\343\201\244\343\202\202\343\201\250"e:utf8
> 
> |2) This is really part of the same question: will you extend the string 
> |literal syntax to allow the inclusion of arbitrary codepoints in 
> |ASCII-encoded files using some kind of character escape?  I'm accustomed 
> |to Java's \uxxxx escape sequence and would like to see something like 
> |this.  (I don't know enough about SJIS and EUC to know if that would be 
> |relevant to those encodings or not.)
> |
> |Despite my relative ignorance, I suggest something along these lines:
> |
> |\uxxxx: represents Unicode codepoint U+xxxx
> |\Uxxxxxx: represents Unicode codepoint U+xxxxxx
> |\Exxxx: represents EUC codepoint xxxx
> |\Sxxxx: represents SJIS codepoint xxxx
> |
> |xxxx: is a string of four hex digits.
> 
> We had a meeting yesterday to discuss issues like this,
> and the end result was:
> 
>   \xXX         -> single byte
>   \uXXXX       -> single Unicode character by codepoint (BMP)
>   \u{XXXXXXXX} -> single Unicode character up to 4 bytes
>   \N{name}     -> single character by name
> 
> But you need to require an additional library to get:
> 
>   * characters from Unicode name
>   * Unicode character embedded in non-Unicode encoding strings
> 
> |If a string literal ends with \u, \U, \E, or \S (with no hex digits 
> |following) then the escape specifies the encoding of the string, even 
> |when the string does not contain any characters outside of the ASCII subset.
> 
> This is an interesting idea.  We haven't yet provided a way to specify
> the encoding of literals.  This might serve as inspiration.
> 
> 							matz.
> 
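
Follow-up for readers on released Ruby versions: the b"" literal and the String#encoding= setter proposed above did not ship in that form. Ruby 1.9 and later relabel a string's bytes with String#force_encoding, Ruby 2.0 added String#b for a binary (ASCII-8BIT) copy, and the \xXX, \uXXXX and \u{...} escapes were adopted, while \N{name} was not. The sketch below, assuming Ruby 2.0 or later, rebuilds Matz's example both ways; it illustrates the later API, not what was decided in this thread.

```ruby
# Sketch assuming Ruby 2.0+ (String#b, String#force_encoding, \u escapes).
# Matz's last name as raw UTF-8 bytes in octal escapes; .b makes a binary
# (ASCII-8BIT) copy, roughly what the proposed b"" literal would have given.
raw = "\343\201\276\343\201\244\343\202\202\343\201\250".b
p raw.encoding   # binary encoding: bytes only, no character semantics
p raw.length     # => 12 (one "character" per byte)

# Relabel the same bytes as UTF-8; this plays the role of m.encoding= above.
m = raw.dup.force_encoding(Encoding::UTF_8)
p m.length           # => 4 (four hiragana characters)
p m.valid_encoding?  # => true

# The same string written with the escapes that were adopted.
u = "\u307E\u3064\u3082\u3068"                            # \uXXXX: one BMP codepoint each
v = "\u{307E 3064 3082 3068}"                             # \u{...}: space-separated codepoints
x = "\xE3\x81\xBE\xE3\x81\xA4\xE3\x82\x82\xE3\x81\xA8"    # \xXX: single bytes
p [u == m, v == m, x == m]  # => [true, true, true]
```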

