Re: Encodings of string literals; explicit codepoint escapes?

From: Yukihiro Matsumoto <matz@...>
Date: 2007-08-31 07:39:10 UTC
List: ruby-core #12043
Hi,

In message "Re: Encodings of string literals; explicit codepoint escapes?"
    on Fri, 31 Aug 2007 15:52:51 +0900, David Flanagan <david@davidflanagan.com> writes:

|I'm excited to see that strings have encodings now!  Thank you for your 
|Unicode support!  I have a few questions:
|
|1) I gather that string literals are given the encoding specified by the 
|-K option or by the encoding comment at the top of the file.  Do you 
|plan any changes to the string literal syntax so that encodings can be 
|specified for individual literals?  Will I be able to include a utf-8 
|encoding string literal within a file that is otherwise in ASCII? I 
|don't like Python's u"" syntax, but I'm hoping that you'll provide some 
|more elegant alternative.

We will probably provide "binary" string literals via b"" or ""b (the
syntax is not fixed yet).  If you want a UTF-8 encoded string in an
ASCII-coded script, you can write the UTF-8 bytes as a binary string
literal and specify UTF-8 as its encoding afterwards, e.g.

  # my last name in Japanese
  m = b"\343\201\276\343\201\244\343\202\202\343\201\250"
  m.encoding="utf8"

or a possible alternative in the distant future may be:

  m = "\343\201\276\343\201\244\343\202\202\343\201\250".utf8
  m = "\343\201\276\343\201\244\343\202\202\343\201\250"u
  m = "\343\201\276\343\201\244\343\202\202\343\201\250"e:utf8
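
For comparison, here is a minimal sketch of the same "binary first,
encoding later" idea using String#b and String#force_encoding.  It
assumes Ruby 2.0 or later, where both methods exist; it is not the b""
literal syntax discussed above, and the constant names RAW_NAME and
NAME are made up for this illustration.

```ruby
# Assumption: Ruby 2.0+ (String#b and String#force_encoding are available).
# Octal escapes spell out the UTF-8 bytes, so the script itself stays ASCII.
RAW_NAME = "\343\201\276\343\201\244\343\202\202\343\201\250".b

# Relabel a copy of the same bytes as UTF-8; no bytes are changed.
NAME = RAW_NAME.dup.force_encoding("UTF-8")

puts RAW_NAME.length         # counts bytes: 12
puts NAME.length             # counts characters: 4
puts NAME.encoding           # UTF-8
puts NAME.valid_encoding?    # true
```

force_encoding only changes the label attached to the bytes, which is
why the binary and UTF-8 strings report different lengths for identical
content.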

|2) This is really part of the same question: will you extend the string 
|literal syntax to allow the inclusion of arbitrary codepoints in 
|ASCII-encoded files using some kind of character escape?  I'm accustomed 
|to Java's \uxxxx escape sequence and would like to see something like 
|this.  (I don't know enough about SJIS and EUC to know if that would be 
|relevant to those encodings or not.)
|
|Despite my relative ignorance, I suggest something along these lines:
|
|\uxxxx: represents Unicode codepoint U+xxxx
|\Uxxxxxx: represents Unicode codepoint U+xxxxxx
|\Exxxx: represents EUC codepoint xxxx
|\Sxxxx: represents SJIS codepoint xxxx
|
|xxxx: is a string of four hex digits.

We had a meeting yesterday to discuss issues like this, and the end
result was:

  \xXX         -> single byte
  \uXXXX       -> single Unicode character by codepoint (BMP)
  \u{XXXXXXXX} -> single Unicode character up to 4 bytes
  \N{name}     -> single character by name

But you need to require an additional library to get:

  * characters by Unicode name
  * Unicode characters embedded in strings with a non-Unicode encoding
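
As a rough illustration of the first three escapes in the table above,
the following sketch assumes a Ruby version in which \xXX, \uXXXX and
\u{...} are accepted in double-quoted literals.  The constant names are
hypothetical, and \N{name} is left out because I am not certain it is
available in string literals.

```ruby
# Assumption: a Ruby that accepts \x, \u and \u{...} in double-quoted strings.

# \xXX: one byte per escape; these three bytes are the UTF-8 form of U+3042.
BYTES = "\xE3\x81\x82".dup.force_encoding("UTF-8")

# \uXXXX: one BMP character given by exactly four hex digits.
BMP = "\u3042"

# \u{...}: one character by codepoint, including those outside the BMP.
ASTRAL = "\u{1F600}"

puts BYTES == BMP        # true: same bytes, same encoding
puts BMP.encoding        # UTF-8
puts ASTRAL.length       # 1 character
puts ASTRAL.bytesize     # 4 bytes in UTF-8
```

The explicit force_encoding on BYTES is there so the comparison does not
depend on the source file's encoding.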

|If a string literal ends with \u, \U, \E, or \S (with no hex digits 
|following) then the escape specifies the encoding of the string, even 
|when the string does not contain any characters outside of the ASCII subset.

This is an interesting idea.  We don't have a way to specify the
encoding of literals yet.  This might serve as inspiration.

							matz.
