[ruby-core:60365] Re: Clarification on the behaviour of String#scrub

From: "NARUSE, Yui" <naruse@...>
Date: 2014-01-31 02:26:45 UTC
List: ruby-core #60365

Hi,

> * In what cases are certain replacement values used when no custom one
>   is given?

Current CRuby uses:
Unicode family: U+FFFD
others: "?" (a literal question mark)
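
As a rough illustration of these defaults (a sketch only, assuming a CRuby
that ships String#scrub, i.e. 2.1 or later; the sample strings and variable
names are made up for this example):

```ruby
# Default replacement when neither an argument nor a block is given.
# force_encoding is used so the example does not depend on the source encoding.

# UTF-8 input with a truncated multibyte sequence at the end.
utf8 = "abc\xE3".dup.force_encoding(Encoding::UTF_8)
utf8_scrubbed = utf8.scrub            # invalid tail becomes U+FFFD

# Non-Unicode input (US-ASCII here) with one invalid byte.
ascii = "a\xFFb".dup.force_encoding(Encoding::US_ASCII)
ascii_scrubbed = ascii.scrub          # invalid byte becomes "?"

p utf8_scrubbed
p ascii_scrubbed
```

The result keeps the encoding of the receiver in both cases.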

> * How exactly are groups of invalid sequences determined and replaced?
>   It seems that in some cases two invalid characters are replaced
>   separately whereas in other cases they are replaced as a group.

It follows the Unicode spec (section 5.22, "Best Practice for U+FFFD Substitution"):
http://www.unicode.org/versions/Unicode6.2.0/ch05.pdf
The practice says "The maximal subpart should be replaced".
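
A small sketch of what that means for the two snippets in the question,
assuming UTF-8 input (the `utf8` helper is hypothetical and only there to pin
the encoding). "\xE3\x80" is the beginning of a well-formed three-byte
sequence, so it counts as one maximal subpart; a lone "\x80" can never start
a sequence, so each one is replaced on its own:

```ruby
# Hypothetical helper: build a UTF-8 string from raw bytes.
utf8 = ->(bytes) { bytes.dup.force_encoding(Encoding::UTF_8) }

# Truncated three-byte sequence: one maximal subpart, one replacement.
grouped = utf8.call("\xE3\x80").scrub("-")

# Two stray continuation bytes: two subparts, two replacements.
separate = utf8.call("\x80\x80").scrub("-")

# The same truncated prefix in the middle of a string is still one subpart.
embedded = utf8.call("a\xE3\x80b").scrub("-")

p grouped, separate, embedded
```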

> * When exactly would Encoding::CompatibilityError be raised? When both
>   the input String and replacement are in non matching encodings?

It uses the following logic:

if the replacement string is broken
  raise ArgumentError
else if the coderange of the replacement is 7bit
  if the input is not ASCII compatible
    raise Encoding::CompatibilityError
  end
else
  if the encoding of the input and the encoding of the replacement are different
    raise Encoding::CompatibilityError
  end
end
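
The same decision can be modelled in plain Ruby. This is only a sketch of the
logic described above, not the code in string.c: the method name
`scrub_replacement_error` is hypothetical, and `ascii_only?` is used as a
stand-in for "the coderange is 7bit". It returns the exception class that
would be raised, or nil when the replacement is acceptable:

```ruby
# Hypothetical model of the replacement check described above.
# Returns an exception class, or nil if the replacement would be accepted.
def scrub_replacement_error(input, replacement)
  if !replacement.valid_encoding?
    # The replacement itself is a broken string.
    ArgumentError
  elsif replacement.ascii_only?
    # A 7bit replacement fits any ASCII-compatible input encoding.
    Encoding::CompatibilityError unless input.encoding.ascii_compatible?
  elsif input.encoding != replacement.encoding
    # A non-ASCII replacement must be in the same encoding as the input.
    Encoding::CompatibilityError
  end
end

utf8_input  = "abc".dup.force_encoding(Encoding::UTF_8)
utf16_input = "abc".encode(Encoding::UTF_16LE)
broken_rep  = "\xE3".dup.force_encoding(Encoding::UTF_8)
eucjp_rep   = "\u3042".encode(Encoding::EUC_JP)

p scrub_replacement_error(utf8_input, broken_rep)
p scrub_replacement_error(utf8_input, "-")
p scrub_replacement_error(utf16_input, "-")
p scrub_replacement_error(utf8_input, eucjp_rep)
```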

Thanks,


2014-01-24 Yorick Peterse <yorickpeterse@gmail.com>:
> I am currently working on porting String#scrub and String#scrub! to
> Rubinius (https://github.com/rubinius/rubinius/issues/2901). Looking at
> the source code of this method in MRI
> (https://github.com/ruby/ruby/blob/trunk/string.c#L8022) and the
> corresponding tests there are several different paths the code takes.
> For example, if I'm reading it correctly it will use different
> replacement values depending on the input encoding.
>
> Since my C knowledge and the understanding of the MRI internals is
> limited I'd like to request some clarification on the behaviour of these
> methods. In particular, I'd like to know the following:
>
> * In what cases are certain replacement values used when no custom one
>   is given?
>
> * How exactly are groups of invalid sequences determined and replaced?
>   It seems that in some cases two invalid characters are replaced
>   separately whereas in other cases they are replaced as a group.
>
> * When exactly would Encoding::CompatibilityError be raised? When both
>   the input String and replacement are in non matching encodings?
>
> To clarify the second item, consider the following snippet:
>
>     "\xE3\x80".scrub('-') # => "-"
>
> Here the two sequences get replaced as a group, resulting in only one
> instance of "-". However, in the following snippet they are replaced
> separately:
>
>     "\x80\x80".scrub('-') # => "--"
>
> Maybe I'm not fully understanding Unicode but it would be nice if this
> behaviour was documented somewhere as right now it's not clear whether
> this is intentional or a bug.
>
> The closest thing to a spec of the behaviour I could find is
> https://bugs.ruby-lang.org/issues/6752 but most of this is in Japanese,
> a language I sadly can't read.
>
> Thanks for the info!



-- 
NARUSE, Yui  <naruse@airemix.jp>
