Re: Options for String#encode

From: "NARUSE, Yui" <naruse@...>
Date: 2008-02-22 09:40:09 UTC
List: ruby-core #15645
Hi,

First of all, String#encode should be a simple API.  For complex uses, 
Encoding::Converter or something similar is suitable.  So the question 
is where the border between simple and complex lies.

Martin Duerst wrote:
> I'm now looking for comments on how to name these and further options.
> 
> invalid: What to do for an invalid byte (sequence) in the input

Compared with iconv(3) in SUSv3:
http://www.unix.org/single_unix_specification/
http://www.opengroup.org/onlinepubs/000095399/functions/iconv.html

"invalid" corresponds to the following two cases:
* "If a sequence of input bytes does not form a valid character in the 
specified codeset"
* "If the input buffer ends with an incomplete character or shift sequence"

The spec, in which String#encode doesn't distinguish them, seems 
reasonable.  When this difference is important, another, more complex 
method is suitable.
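
A minimal sketch of how such a "more complex method" can tell the two cases apart, using Encoding::Converter and InvalidByteSequenceError#incomplete_input? as they exist in released Ruby versions.  The helper name classify_input and the choice of EUC-JP as source encoding are only assumptions for illustration:

```ruby
# Hypothetical helper: reports whether the input is fine, contains an
# invalid byte sequence, or merely ends in the middle of a character.
def classify_input(bytes, from: "EUC-JP", to: "UTF-8")
  ec = Encoding::Converter.new(from, to)
  ec.convert(bytes)
  ec.finish # signals end of input; raises if a character is left unfinished
  :ok
rescue Encoding::InvalidByteSequenceError => e
  e.incomplete_input? ? :incomplete : :invalid
end

p classify_input("abc")      # plain ASCII is valid EUC-JP
p classify_input("abc\xA1z") # lead byte 0xA1 followed by a byte that cannot continue it
p classify_input("abc\xA1")  # input ends right after the lead byte
```

String#encode can then stay simple and treat both cases alike, while callers who care about the difference use the converter directly.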

The name of this can be "decoder fallback" or something similar, 
following other UCS-based converters.

> unknown: What to do if the target encoding doesn't include the character

"unknown" corresponds to "If iconv() encounters a character in the 
input buffer that is valid, but for which an identical character does 
not exist in the target codeset".

The name of this can be "encoder fallback" or something similar, 
following other UCS-based converters.
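
A minimal sketch of the two cases side by side.  It uses the invalid: and undef: options that String#encode has in released Ruby versions; those names are not the ones proposed in this thread, and the sample strings are made up for illustration:

```ruby
# "invalid": the input bytes are not a legal sequence in the source encoding.
broken = "caf\xC3".dup.force_encoding("UTF-8") # truncated two-byte sequence
fixed_invalid = broken.encode("ISO-8859-1", invalid: :replace, replace: "?")

# "unknown": the character is valid, but the target encoding cannot express it.
fixed_undef = "caf\u00E9".encode("US-ASCII", undef: :replace, replace: "?")

p fixed_invalid
p fixed_undef
```

Without the options, the first call raises Encoding::InvalidByteSequenceError and the second raises Encoding::UndefinedConversionError.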

> ???: We may need a third option, to indicate a combination of invalid
>      and unknown.

The difference between an illegal byte sequence and an incomplete 
character or shift sequence could become a third option.  But I don't 
think there is a need to distinguish them in String#encode.

> Values for each of the above options could include:
> 
> :ignore - Ignore/drop the problem data.

:ignore has some security issues.
http://support.microsoft.com/kb/940521

This behavior is also available via :substitute with an empty string.
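
As an illustration only (not taken from the linked article), here is one way silently dropping characters can defeat a check made before conversion.  The sketch uses the undef: and replace: options of String#encode in released Ruby versions, with an empty replacement string standing in for :ignore; the input string is hypothetical:

```ruby
# U+00E9 has no mapping in US-ASCII, so it is dropped by the empty replacement.
input   = "<scr\u00E9ipt>"
dropped = input.encode("US-ASCII", undef: :replace, replace: "")

# A filter that inspected the input before conversion would not match,
# yet the converted text contains the forbidden token.
p input.include?("<script>")
p dropped.include?("<script>")
```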

> :substitute (or :subst or so to be shorter) - Use an
>           (encoding-dependent) substitution character.

:substitute is needed and can be the default behavior.  The name of this 
can be :replacement.

cf. EncoderReplacementFallback
http://msdn2.microsoft.com/en-us/library/system.text.encoderreplacementfallback.aspx

> :warn   - Produce a warning, helpful for debugging.

Is this really needed?

> :error  - The current behavior, available just for completeness.

:exception seems better than :error.  This raises not an error but an 
exception.

> :stop   - Stop transcoding, for encode! this will mean
>           losing the rest of the string.

Is this really needed?

> :x_escape - add problem data to the output using \x escapes
> 
> :u_escape - add problem characters to the output using \u escapes
>             (unknown: only)
> 
> :hex_ncr - add problem characters to the output using XML/HTML
>            hex escapes (&#xhhhh;, unknown: only)
> 
> :dec_ncr - add problem characters to the output using XML/HTML
>            dec escapes (&#ddddd;, unknown: only)
> 
> :uri_escape - add problem characters to the output using
>            UTF-8->URI %-encoding conversion (for IRI->URI
>            conversion and similar things, unknown: only)

Needed for performance.
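
A rough sketch of what the "unknown: only" escapes would produce, written with the fallback: option that String#encode has in released Ruby versions.  The ESCAPES table and encode_escaped are hypothetical names; :x_escape is left out because fallback: only sees undefined characters, not invalid bytes:

```ruby
# Hypothetical handlers, named after the options in the proposal.
# Each receives the unconvertible character and returns its escaped form.
ESCAPES = {
  u_escape:   ->(c) { format("\\u%04X", c.ord) },
  hex_ncr:    ->(c) { format("&#x%X;", c.ord) },
  dec_ncr:    ->(c) { format("&#%d;", c.ord) },
  uri_escape: ->(c) { c.encode("UTF-8").bytes.map { |b| format("%%%02X", b) }.join }
}.freeze

def encode_escaped(str, target, style)
  str.encode(target, fallback: ESCAPES.fetch(style))
end

ESCAPES.each_key do |style|
  puts encode_escaped("caf\u00E9", "US-ASCII", style)
end
```

The handler runs once per unconvertible character, which is the per-character Ruby-level overhead that built-in escape options would avoid.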

> :block - Use result of block, with interface to be worked out
>          (only needed to indicate that a block is used for
>           one case but not for the other)

As Gary said, giving a block seems better.  Or simply give a proc or 
lambda.  But the block's parameters need more discussion.

> 'string' - Replace by string (have to work out details about
>            encoding,...)

The encoding of the replacement string will be that of the target.  But 
how to represent the replaced characters during conversion is a problem.  
(Give them as a special codepoint, a byte array, or a struct?)
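
For comparison, a small sketch of how released Ruby versions expose the problem data: as strings on the exception objects, a one-character string for an unknown character and a byte string for an invalid sequence.  The helper name problem_data and the sample inputs are assumptions for illustration:

```ruby
# Hypothetical helper: returns a description of the first conversion
# problem in str, or nil if the conversion succeeds.
def problem_data(str, target)
  str.encode(target)
  nil
rescue Encoding::UndefinedConversionError => e
  { kind: :unknown, char: e.error_char }
rescue Encoding::InvalidByteSequenceError => e
  { kind: :invalid, bytes: e.error_bytes.bytes }
end

p problem_data("\u3042", "US-ASCII") # valid character, no ASCII mapping
p problem_data("abc\xA1z".dup.force_encoding("EUC-JP"), "UTF-8") # bad byte sequence
```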

-- 
NARUSE, Yui  <naruse@airemix.com>
DBDB A476 FDBD 9450 02CD 0EFC BCE3 C388 472E C1EA
