[#25897] Mail archive searching? — "Martin J. Dürst" <duerst@...>
Why does ruby-dev's official archive
[#25928] Ruby 1.8.6-p383 hangs in dln_load on Snow Leopard — Timothy Hunter <cyclists@...>
An RMagick user reports that Ruby 1.8.6 hangs when requiring RMagick.
On Oct 3, 2009, at 4:26 PM, Timothy Hunter wrote:
On Oct 3, 10:26 pm, Timothy Hunter <cycli...@nc.rr.com> wrote:
[#25936] [Bug:1.9] [rubygems] $LOAD_PATH includes bin directory — Nobuyoshi Nakada <nobu@...>
Hi,
On Sun, Oct 4, 2009 at 11:47 PM, Nobuyoshi Nakada <nobu@ruby-lang.org> wrote:
[#25943] Disabling tainting — Tony Arcieri <tony@...>
Would it make sense to have a flag passed to the interpreter on startup that
Tony Arcieri wrote:
2009/10/6 Tony Arcieri <tony@medioh.com>:
On Tue, Oct 6, 2009 at 3:52 AM, Yugui <yugui@yugui.jp> wrote:
[#25964] mis filed bug reports — Roger Pack <rogerdpack2@...>
If I accidentally file a bug under 1.9 that belongs in 1.8, I assume I
[#25965] [Bug #2180] request: add *Method#source_location to 1.8.x — Roger Pack <redmine@...>
Bug #2180: request: add *Method#source_location to 1.8.x
[#25969] [Bug #2181] Segmentation fault for test/drb/* -- possible bug in Marshal/GC — Nikolai Lugovoi <redmine@...>
Bug #2181: Segmentation fault for test/drb/* -- possible bug in Marshal/GC
[#26012] Segfaults after multiple call of ruby_node_run — Christoph Kappel <unexist@...>
[#26028] [Bug #2189] Math.atanh(1) & Math.atanh(-1) should not raise an error — Marc-Andre Lafortune <redmine@...>
Bug #2189: Math.atanh(1) & Math.atanh(-1) should not raise an error
[#26070] [Bug #2201] Process.spawn fails in 1.9.1 — Roger Pack <redmine@...>
Bug #2201: Process.spawn fails in 1.9.1
[#26087] [Bug #2212] Using a Lambda with Inappropriate Arity for Hash#default_proc= — Run Paint Run Run <redmine@...>
Bug #2212: Using a Lambda with Inappropriate Arity for Hash#default_proc=
[#26126] The fate of my keyword documentation — "David A. Black" <dblack@...>
Hi --
[#26200] [Bug #2243] Random instance variables order — Maxim Chechel <redmine@...>
Bug #2243: Random instance variables order
[#26222] [Bug #2250] IO::for_fd() objects' finalization dangerously closes underlying fds — Mike Pomraning <redmine@...>
Bug #2250: IO::for_fd() objects' finalization dangerously closes underlying fds
[#26232] [Feature #2255] unicode parameters cannot be passed to ruby — Vit Ondruch <redmine@...>
Feature #2255: unicode parameters cannot be passed to ruby
[#26237] [Bug #2256] net\ftp.rb failing on implicit cast of Pathname to string — Sai Fujinaro <redmine@...>
Bug #2256: net\ftp.rb failing on implicit cast of Pathname to string
[#26262] [Feature #2260] better access with GC_DEBUG — Roger Pack <redmine@...>
Feature #2260: better access with GC_DEBUG
[#26299] Which commit fixed Set#hash (Hash#hash, I assume) between 1.9.1 and 1.9.2? — "Shot (Piotr Szotkowski)" <shot@...>
Hello, good people of ruby-core.
[#26303] IO.foreach (and friends) effect on $< and $. — Charles Oliver Nutter <headius@...>
I have a few questions about how the line-by-line IO operations are
[#26336] [Bug #2283] Ruby 1.9.1p243 spinning with 100% CPU; perhaps rb_str_slice_bang-related — Mark Aiken <redmine@...>
Bug #2283: Ruby 1.9.1p243 spinning with 100% CPU; perhaps rb_str_slice_bang-related
[#26361] [Feature #2294] [PATCH] ruby_bind_stack() to embed Ruby in coroutine — Suraj Kurapati <redmine@...>
Feature #2294: [PATCH] ruby_bind_stack() to embed Ruby in coroutine
Issue #2294 has been updated by Anonymous Anonymous.
Hi,
Hi,
Hi,
[#26388] suggestion: gems.ruby-lang.org — Yusuke ENDOH <mame@...>
Hi --
On Wed, Oct 28, 2009 at 3:20 AM, Yusuke ENDOH <mame@tsg.ne.jp> wrote:
Hi,
On Wed, Oct 28, 2009 at 9:00 PM, Yusuke ENDOH <mame@tsg.ne.jp> wrote:
Hi,
[#26390] [Bug #2303] dl.so segfaults on mingw32 — Nikolai Weibull <redmine@...>
Bug #2303: dl.so segfaults on mingw32
[#26429] [Bug #2313] Incomplete encoding conversion? — Adam Salter <redmine@...>
Bug #2313: Incomplete encoding conversion?
[#26447] [Bug #2316] [BUG] cfp consistency error — Cezary Baginski <redmine@...>
Bug #2316: [BUG] cfp consistency error
[#26458] [Bug #2319] gethostbyname fails in windows — Roger Pack <redmine@...>
Bug #2319: gethostbyname fails in windows
[#26459] [Bug #2320] patch to trunk .document to include more readme's etc. — Roger Pack <redmine@...>
Bug #2320: patch to trunk .document to include more readme's etc.
[ruby-core:26176] Re: [Feature #2034] Consider the ICU Library for Improving and Expanding Unicode Support
Hello Perry,

On 2009/10/19 1:03, Perry Smith wrote:
> Issue #2034 has been updated by Perry Smith.
>
> I discovered ICU and ICU4R back in 2007 and I just now moved it to
> Ruby 1.9. I'm a pretty big advocate of using ICU. There is nothing
> that has as many encodings as ICU to my knowledge. It is the only one
> that addresses many of the EBCDIC encodings (of which there are
> 147-some-odd).

It's no surprise that ICU is strong on EBCDIC. ICU started at IBM, and
IBM still contributes a lot :-). (If IBM contributed to Ruby, Ruby might
also be stronger on EBCDIC.)

> The reason I came to use ICU is the application I'm working on needs
> to translate EBCDIC encoded Japanese characters to something a browser
> can use such as utf-8. ICU is the only portable library that I found
> and it is also the only library that had the encodings that I needed.

Can you tell me exactly which encodings you need? And which of them are
table-based? (See also Yui's message.) We can definitely have a look at
them.

One big problem with ICU is that it is UTF-16-based, whereas Ruby
(mainly) uses UTF-8 for Unicode. Fortunately, there are exceptions. I
learned just last week at the Internationalization and Unicode
Conference that ICU now has a purely UTF-8-based sorting routine. I
think it may make sense for Ruby to try to extract it.

> I'm assuming a few things here. One is that this:
>
> http://yokolet.blogspot.com/2009/07/design-and-implementation-of-ruby-m17n.html
>
> is accurate for the most part. In particular, this paper seems to say
> that there is a choice between a UCS model and a CSI model and Ruby 1.9
> has chosen CSI. From my perspective, a CSI model should be an
> envelope around a UCS model.

Can you explain what you mean by "envelope around a UCS model"? The way
I understand it, it's easy to use a UCS model inside Ruby's CSI model;
the main thing you have to do is use the -U option (which makes UTF-8
the default internal encoding). But maybe you meant something
different?
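The "UCS model inside CSI" that the -U option gives you amounts to transcoding at the I/O boundary, so that everything inside the program is UTF-8. A minimal sketch of the same idea, assuming a modern CRuby (`Tempfile.create` with a block needs Ruby 2.1+): instead of relying on -U, it names the internal encoding explicitly in the open mode (`"r:Shift_JIS:UTF-8"`). The helper names `read_legacy_file` and `shift_jis_round_trip` are hypothetical, for illustration only.

```ruby
require "tempfile"

# Hypothetical helper: read a file stored in a legacy (external) encoding
# and return a UTF-8 string. The "ext:int" mode string asks Ruby's IO layer
# to transcode while reading; `ruby -U` gets a similar effect by making
# UTF-8 the default internal encoding.
def read_legacy_file(path, external_encoding)
  File.read(path, mode: "r:#{external_encoding}:UTF-8")
end

# Demo: write raw Shift_JIS bytes to a temporary file, then read them back.
def shift_jis_round_trip
  sjis_bytes = "日本語".encode(Encoding::Shift_JIS)
  Tempfile.create("sjis-demo") do |f|
    f.binmode            # write the bytes as-is, no transcoding on output
    f.write(sjis_bytes)
    f.flush
    read_legacy_file(f.path, "Shift_JIS")
  end
end

text = shift_jis_round_trip
puts text.encoding   # prints UTF-8
puts text
```

The data on disk stays in its original encoding; only the program's internal view is unified, which is the division of labor Martin describes.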
> I believe the implementors of a UCS model fall back and say that if
> the application is going to compare strings they must be in a common
> encoding -- Ruby agrees with this point. And, they also would argue
> that if you want to translate "aaa" into B, it is simply more
> practical to go to a common encoding C first. Then you have only 2N
> encoders instead of N^2 encoders. To me, that argument is very
> sound. If plausible, I would allow specific A to B translators to be
> plugged in.

Ruby allows this. It's actually used, e.g., for Shift_JIS <-> EUC-JP
translation. The reason to use it is that it allows transcoding "gaiji"
(user-defined characters outside the standard character sets), at least
to a certain extent.

> The key place where I believe Ruby's choice of a CSI model wins is the
> fact that there are a lot of places that data can be used and
> manipulated without translation. Keeping and using the CSI model in
> all those places is a clear win. In all those places, the data is
> opaque; it is not interpreted or understood by the application.
>
> Opaque data can be compared for equality as Ruby appears to be doing
> now -- the two strings must have the same encoding and byte for byte
> compare as equal.
>
> Technically, opaque data can be concatenated and spliced as well.
> This is one place that Ruby's 1.9 implementation surprised me a bit.

Yes, you can take the CSI model further and further. But sooner or
later you will always bump into problems where encodings do not match.
(By the way, in Ruby you can concatenate as long as the data is in an
ASCII-compatible encoding and is ASCII-only.)

> It could be that "aaa" + "bbb" yields String that is a list of
> SubStrings. I'll write as x = [ "aaa", "bbb" ].

At the file level, this would be similar to having a file with internal
changes of character encoding. At the very, very early stages of Web
internationalization, some people proposed such a model, but the Web
went a different way.
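Two points in this part of the exchange can be checked from Ruby itself. The sketch below, assuming Ruby 1.9.3 or later (for `Encoding::Converter.search_convpath`), shows that ASCII-only strings in different ASCII-compatible encodings concatenate, that non-ASCII strings in incompatible encodings raise `Encoding::CompatibilityError`, and that a conversion with no direct converter is routed through UTF-8 as the common encoding. `try_concat` and `converter_counts` are illustrative names, not Ruby APIs.

```ruby
# Concatenate two strings; return the result's encoding, or the error class.
def try_concat(a, b)
  (a + b).encoding
rescue Encoding::CompatibilityError => e
  e.class
end

ascii_euc  = "abc".encode(Encoding::EUC_JP)
ascii_sjis = "def".encode(Encoding::Shift_JIS)
latin1     = "über".encode(Encoding::ISO_8859_1)
euc_kanji  = "日本".encode(Encoding::EUC_JP)

p try_concat(ascii_euc, ascii_sjis)  # ASCII-only on both sides: allowed
p try_concat(latin1, euc_kanji)      # non-ASCII on both sides: CompatibilityError

# ISO-8859-1 -> EUC-JP has no direct transcoder, so the search goes
# through UTF-8, i.e. the "common encoding C" approach.
p Encoding::Converter.search_convpath("ISO-8859-1", "EUC-JP")

# With a hub encoding you need 2N converters (to and from the hub);
# direct converters for every ordered pair of N encodings need N*(N-1).
def converter_counts(n)
  { via_hub: 2 * n, pairwise: n * (n - 1) }
end
p converter_counts(10)
```

Direct converters (such as the Shift_JIS <-> EUC-JP case Martin mentions) coexist with the hub, so the path search can prefer them when they exist.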
Most, if not all, text editors went the same way: you can't have a file
with many different encodings at the same time. Sure, file encodings and
internal encodings work a bit differently, but it's not a disadvantage
if those two models match.

> The places where the actual characters are "understood" by an
> application is for sorting (collation) and if, for some external
> reason, they need to be translated to a particular encoding.

There are lots more cases, in particular regular expressions. Even with
Ruby's current model, it took a long time to smooth the edges.

> Sorting not only depends upon the encoding but also the language.

Yes, but please note that sorting depends on encoding in completely
different ways than on language. For language, what counts is not the
language of the text being sorted, but the language of the user.

Let's say you have two words, a Swedish one (översätter, to translate),
and a German one (öffnen, to open). Swedish sorts 'ö' after 'z'; German
sorts 'ö' with 'o', treating the difference between the two as just a
secondary difference (i.e. to order words with 'o' and 'ö', first look
at the rest of the word, and only if the rest of the word is identical,
order the word with 'ö' after the word with 'o').

So some people argue that in an alphabetical list, the two words above
should be ordered (with some others thrown in) as follows:

  abstract
  nominal
  öffnen       (German, so goes into the 'o' section)
  often
  substring
  xylophone
  zebra
  översätter   (Swedish, so goes after 'z')

But this is wrong. There should be (at least) two sort orders for the
above data, one for Swedish and the other for German:

Swedish sort order:

  abstract
  nominal
  often
  substring
  xylophone
  zebra
  öffnen       (all 'ö's go after 'z')
  översätter

German sort order:

  abstract
  nominal
  öffnen       (all 'ö's go with 'o')
  often
  översätter
  substring
  xylophone
  zebra

So there is no need for sorting to know the language of the data.
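The point that the sort order is picked by the reader's language, not by the language of each word, can be illustrated with a deliberately tiny sketch. It is not ICU and not the Unicode Collation Algorithm: `swedish_key` and `german_key` are hypothetical helpers that handle only lowercase a to z plus å, ä, ö, ü, and model just the two rules described above (Swedish: å, ä, ö are letters after z; German: an umlaut counts as its base letter at the primary level and only breaks ties at the secondary level).

```ruby
# Toy collation keys: NOT a real collation implementation.

# Swedish: å, ä, ö are separate letters sorted after z.
SWEDISH_ALPHABET = (("a".."z").to_a + %w[å ä ö]).freeze

def swedish_key(word)
  word.each_char.map do |c|
    SWEDISH_ALPHABET.index(c) || raise(ArgumentError, "unsupported character #{c.inspect}")
  end
end

# German (dictionary order): fold umlauts to the base letter for the
# primary comparison; the umlaut itself only matters as a tie-breaker.
GERMAN_FOLD = { "ä" => "a", "ö" => "o", "ü" => "u" }.freeze

def german_key(word)
  primary   = word.each_char.map { |c| GERMAN_FOLD.fetch(c, c) }.join
  secondary = word.each_char.map { |c| GERMAN_FOLD.key?(c) ? 1 : 0 }
  [primary, secondary]
end

WORDS = %w[zebra öffnen abstract översätter often substring nominal xylophone].freeze

# The same data, sorted once per user language.
puts "Swedish: #{WORDS.sort_by { |w| swedish_key(w) }.join(' ')}"
puts "German:  #{WORDS.sort_by { |w| german_key(w) }.join(' ')}"
```

Neither key function looks at which language a word comes from; each applies one user's rules to the whole list, which reproduces the two orders given above.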
> Sorting could be done with routines specific to an encoding plus
> language but I believe that is impractical to implement. Utopia would
> be the ability to plug (and grow) sort routines that would be specific
> to the encoding and language with a fall back going to a sort routine
> tailored for the language and a common encoding such as UTF-16 and if
> the language was not known (or implemented), fall back to sorting
> based upon just the encoding, and if that was not available, fall back
> to a sort based upon a common encoding.

If you think this is necessary, please start implementing it. In my
opinion, it will take you a lot of time, with very little advantage
over a single-encoding sorting implementation.

> As has been pointed out already, the String#to_i routine needs to be
> encoding savvy. There are probably a few more methods that need to be
> encoding savvy.

Lots of places can be made more encoding-savvy. But overall, I think
concentrating on getting more functionality for UTF-8 strings, and
transcoding to UTF-8 for heavy functionality, is the way to go.

Regards,    Martin.

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
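Martin's closing recommendation (concentrate functionality on UTF-8, and transcode to UTF-8 before heavy string work) can be sketched as a small wrapper. `with_utf8` is a hypothetical name, not a Ruby API, and the sketch makes no claim about how `String#to_i` itself behaves on a UTF-16 receiver; it only shows the transcode-first route.

```ruby
# Hypothetical helper (not a Ruby API): hand a UTF-8 copy of str to the
# block, so the "heavy" operation only ever sees UTF-8 data.
def with_utf8(str)
  yield str.encode(Encoding::UTF_8)
end

utf16 = "12345 日本語".encode(Encoding::UTF_16LE)

p utf16.encoding                              # the stored data stays UTF-16LE
p with_utf8(utf16) { |s| s.to_i }             # number parsing on the UTF-8 copy
p with_utf8(utf16) { |s| s.scan(/\p{Han}/) }  # regexp work on the UTF-8 copy
```

Only one code path (UTF-8) has to be made fully encoding-aware; every other encoding reaches it through a transcoder.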