[#4065] Surprise in Time#sec — Steven Jenkins <steven.jenkins@...>
This bit me:
[#4067] Segfault in Thread#initialize / caller — Florian Groß <florgro@...>
Moin!
[#4076] Ruby/DL — Jamis Buck <jamis_buck@...>
I recently used Ruby/DL to create bindings to the SQLite3 embedded
On Tue, Jan 04, 2005 at 02:53:49AM +0900, Jamis Buck wrote:
>>>>> "P" == Paul Brannan <pbrannan@atdesk.com> writes:
On Wed, Jan 05, 2005 at 03:05:48AM +0900, ts wrote:
>>>>> "P" == Paul Brannan <pbrannan@atdesk.com> writes:
On Thu, Jan 06, 2005 at 01:10:34AM +0900, ts wrote:
>>>>> "P" == Paul Brannan <pbrannan@atdesk.com> writes:
On Thu, Jan 06, 2005 at 06:57:57PM +0900, ts wrote:
>>>>> "P" == Paul Brannan <pbrannan@atdesk.com> writes:
On Fri, Jan 07, 2005 at 12:06:16AM +0900, ts wrote:
>>>>> "P" == Paul Brannan <pbrannan@atdesk.com> writes:
ts wrote:
[#4116] Test::Unit::Collector::Dir won't work with code that modifies $LOAD_PATH — Eric Hodel <drbrain@...7.net>
Any test code that depends upon modifications of $: fails when used
Hi,
On 11 Jan 2005, at 04:14, nobu.nokada@softhome.net wrote:
On 11 Jan 2005, at 09:39, Eric Hodel wrote:
On Sat, 15 Jan 2005 04:06:10 +0900, Eric Hodel <drbrain@segment7.net> wrote:
On Fri, 14 Jan 2005 23:48:58 -0500, Nathaniel Talbott
On Thu, 27 Jan 2005 17:17:14 -0500, Nathaniel Talbott
[#4146] The face of Unicode support in the future — Charles O Nutter <headius@...>
Hello Rubyists!
Hi,
Yukihiro Matsumoto <matz@ruby-lang.org> writes:
Paul Brannan <pbrannan@atdesk.com> writes:
Hi,
On Mon, Jan 10, 2005 at 11:53:48PM +0900, Yukihiro Matsumoto wrote:
Hi,
Yukihiro Matsumoto wrote:
Hi,
On Wed, Jan 12, 2005 at 02:13:35PM +0900, Yukihiro Matsumoto wrote:
Hi,
[#4189] Authenticated proxy support for open-uri — Neil Kohl <nakohl@...>
Hello!
[#4232] Carriage return on shebang — Florian Groß <florgro@...>
Moin.
[#4242] tracer.rb: Do not list pseudo source lines of binary extensions — Florian Groß <florgro@...>
Moin.
[#4243] Patch that enables https in open-uri.rb — Michael Neumann <mneumann@...>
Hi,
In article <41E93F42.9090705@ntecs.de>,
Tanaka Akira wrote:
[#4269] Re: The face of Unicode support in the future — Wes Nakamura <wknaka@...>
Hi,
Hi,
Yukihiro Matsumoto wrote:
Hi,
[#4296] parse_c.rb: allow whitespace after function names — Tilman Sauerbeck <tilman@...>
Hi,
Hi,
Yukihiro Matsumoto <matz@ruby-lang.org> [2005-01-21 17:43]:
[#4311] RFE: Enumerable#group_by, Array#^ — Florian Groß <florgro@...>
Moin.
[#4323] test/unit doesn't rescue a Exception — Tanaka Akira <akr@...17n.org>
test/unit doesn't rescue a Exception in a test method, as follows.
In article <87is5jb46q.fsf@serein.a02.aist.go.jp>,
On 9/1/06, Tanaka Akira <akr@fsij.org> wrote:
On Sep 2, 2006, at 6:34 PM, Nathaniel Talbott wrote:
In article <A604C0B3-95ED-4B9B-866C-79A2C7D5E3C4@segment7.net>,
On Sep 2, 2006, at 9:39 PM, Tanaka Akira wrote:
In article <622DAC7E-55DB-4854-B82B-A037CE9C75EF@segment7.net>,
In article <87ac5hv4bo.fsf@fsij.org>,
On Sep 3, 2006, at 8:21 AM, Tanaka Akira wrote:
[#4332] IO#clearerr missing in action — Eric Hodel <drbrain@...7.net>
I wanted to implement tail(1) in ruby cleanly, but found the best I
[#4335] When will Object#type disappear? — "David A. Black" <dblack@...>
Hi --
Re: The face of Unicode support in the future
On Tue, 18 Jan 2005, Yukihiro Matsumoto wrote:
| In message "Re: The face of Unicode support in the future"
| on Tue, 18 Jan 2005 12:08:34 +0900, Wes Nakamura <wknaka@pobox.com> writes:
|
| |Is there opposition to a separate unicode string class, that would
| |coexist with the current byte-based string class? I find a fixed-width
| |unicode-based string type to be much easier to deal with rather
| |than individual encodings. With the byte-based system you would have to
| |worry about the language of the text in each string, and check
| |encodings before doing something like a string compare.
|
| That's true in C strings (char* or wchar_t*), which you have to
| allocate by yourself, and handle them character-wise, but not for
| strings in Ruby with much higher abstraction in API. The lower level
| processing like allocation and resizing internal buffer, etc. are
| handled automagically.
|
Will this be efficient enough? When using a non-fixed-width encoding,
String#[] won't run in constant time.
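
A minimal sketch of the cost concern above, assuming a Ruby with
encoding-aware strings (1.9 or later, which postdates this thread). The
helper char_at is hypothetical and is not part of any proposal in the
thread; it only illustrates why indexing by character in a variable-width
encoding such as UTF-8 needs a scan of the buffer, whereas a fixed-width
encoding would allow plain offset arithmetic.

```ruby
# Hypothetical helper: find the idx-th character of a UTF-8 byte buffer by
# walking the bytes. UTF-8 continuation bytes look like 10xxxxxx, so every
# other byte starts a new character. The walk is O(n) in the byte length.
def char_at(bytes, idx)
  starts = []
  bytes.each_with_index do |b, i|
    starts << i unless (b & 0xC0) == 0x80
  end
  return nil if idx >= starts.size

  first = starts[idx]
  last  = (starts[idx + 1] || bytes.size) - 1
  bytes[first..last].pack("C*").force_encoding("UTF-8")
end

s   = "a\u30B9b"     # "a", KATAKANA LETTER SU (U+30B9), "b"
raw = s.bytes.to_a   # [0x61, 0xE3, 0x82, 0xB9, 0x62]

char_at(raw, 1)      # the three-byte character U+30B9
char_at(raw, 2)      # "b", which lives at byte offset 4, not 2
```

With a fixed-width representation the second lookup would be a single
multiplication; here the position of "b" depends on the width of every
character before it.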
Since "How to support unicode (and other character sets)" is a problem
that's already facing the jruby developers, I have a few questions
that go into more detail:
1. This method is mentioned:
String#encoding, returns a string specifying the encoding
But I haven't seen this; is there also:
String#encoding=
I assume that setting the encoding would do nothing to the internal
representation of the string (based on char *), it would just affect
how methods that work on strings deal with characters, etc.
2. What is the default encoding for strings? What encoding would
String.new("") have #encoding set to?
3. Are literal strings assumed to be a certain encoding, (encoding of
the script?) or can you specify an encoding at the time of creation?
"string in encoding \x{xxxx} of the script file" (#encoding automatically
set to script's encoding, xxxx taken as bytes in the same
encoding)
Specifying an encoding for a literal may not work since the
literal's encoding could conflict with the script's encoding.
This wouldn't work (if there were an encoding argument) since the
bytes of the literal do not correspond with the desired utf-16 characters:
String.new("\x{30b9} in script that's not utf-16", "utf-16")
This would work:
String.new("\x{e382b9} in script that's ascii", "utf-8")
([e3 82 b9] being the utf-8 equivalent of [30 b9] in utf-16).
(also see 3b)
Maybe it's just easier to assume that all string literals are in
the encoding of the script and any \x{} sequences represent bytes
in the same encoding.
3a. If there is a way of creating literal strings in other encodings,
is there also a way of creating literal regexes in other encodings?
(You could always create them as Regexp.new(string_in_some_encoding)).
3b. In \x{xxxx}, does the number have to be a 4-digit (hex) number?
How would you specify a utf-8 character, which can be more than 2 bytes?
Is the \x{} syntax basically \x{byte byte byte..}?
4. Will String#explode return an array of Fixnums, basically a byte array,
of the raw char * values?
This would mean that s.explode.size is not necessarily == s.size
5. When using String#[idx]= to set a single character, it must take as
an argument a string which has a size of 1 (i.e. one codepoint) but
internally (i.e. #explode) doesn't necessarily have a size of 1?
6. Right now there is Fixnum#chr. Will there be Array#chr(encoding) or
something similar? So you could do something like:
[ 0x30, 0xb9 ].chr("utf-16")
7. Will strings that, when converted to the same encoding, are identical,
give different results for #intern when left in different encodings?
What happens to an interned string with a binary encoding? Is it interned
based on the internal bytes of the string rather than the characters?
Wes
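
For readers coming to this thread later, here is a small sketch of how
questions 1, 3, 4 and 6 above can be explored, assuming a Ruby with
per-string encodings (1.9 or later). It uses calls from that later API
(String#force_encoding, String#encode, String#bytes, Array#pack), not the
String#encoding=, String#explode or Array#chr(encoding) names speculated
about in the message, which remain proposals as far as this thread goes.

```ruby
su = "\u30B9"                                   # one character, U+30B9

# Question 4: the byte count and the character count differ in UTF-8.
utf8_bytes = su.bytes.to_a                      # [0xE3, 0x82, 0xB9]
su.length                                       # 1
su.bytesize                                     # 3

# Question 3: the same character in big-endian UTF-16 is the bytes 30 b9.
utf16_bytes = su.encode("UTF-16BE").bytes.to_a  # [0x30, 0xB9]

# Question 1: relabelling a string leaves its bytes alone; only the way
# methods interpret those bytes changes.
relabelled = su.dup.force_encoding(Encoding::BINARY)
relabelled.bytes.to_a == utf8_bytes             # true
relabelled.length                               # 3

# Question 6: build a string from raw byte values, label it, then transcode.
from_bytes = [0x30, 0xB9].pack("C*").force_encoding("UTF-16BE")
from_bytes.encode("UTF-8") == su                # true
```

Note the difference between relabelling (force_encoding, bytes untouched)
and transcoding (encode, bytes rewritten); the message's question 1
describes the former.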