[#32434] signature of exit() on C++ — "KISHIMOTO, Makoto" <ksmakoto@...4u.or.jp>
きしもとです
なかだです。
> > /usr/local/include/ruby-1.9/i686-linux/ruby/config.h
[#32447] ruby 1.9 trunk NKF and KCONV Encoding:ASCII-8BIT — WATANABE Tetsuya <Tetsuya.WATANABE@...>
渡辺哲也です。
[#32448] SEGV on "abcd\xf0".force_encoding("utf-8").reverse — Tanaka Akira <akr@...>
以下のようにすると SEGV します。
[#32452] `split': negative string size (or size too big) (ArgumentError) — Tanaka Akira <akr@...>
"あいうえお".force_encoding("euc-jp").split(//) と (EUC-JP
[#32462] SEGV by test/ruby/test_fiber.rb — Tanaka Akira <akr@...>
test/ruby/test_fiber.rb ですが、以下のように insnhelper.ci
In article <87fxyhpw0t.fsf@fsij.org>,
[#32468] Iconv.list patch for NetBSD/Citrus — "NARUSE, Yui" <naruse@...>
成瀬です。
[#32473] about to_path and to_open — "Yusuke ENDOH" <mame@...>
遠藤と申します。
[#32498] Re: [ruby-cvs:21399] Ruby:r14162 (trunk): * parse.y (expr): redefinable not (!) operator. — SASADA Koichi <ko1@...>
ささだです.
まつもと ゆきひろです
[#32512] Re: [ruby-cvs:21409] Ruby:r14172 (trunk): * transcode.c: new file to provide encoding conversion features. — Nobuyoshi Nakada <nobu@...>
なかだです。
中田さん、こんにちは。
成瀬です。
中田さん、こんにちは。
なかだです。
まつもと ゆきひろです
At 15:33 07/12/11, Yukihiro Matsumoto wrote:
まつもと ゆきひろです
なかだです。
まつもと ゆきひろです
成瀬です。
At Wed, 12 Dec 2007 02:49:09 +0900,
At 02:55 07/12/12, SATOH Fumiyasu wrote:
At 21:50 07/12/10, Nobuyoshi Nakada wrote:
松本さん、中田さん、こんにちは。
なかだです。
[#32518] bug in Array#slice! — Satoshi Nakagawa <snakagawa@...>
中川といいます。
At Mon, 10 Dec 2007 19:27:17 +0900,
[#32550] Binary String — Hidetoshi NAGAI <nagai@...>
永井@知能.九工大です.
まつもと ゆきひろです
永井@知能.九工大です.
まつもと ゆきひろです
永井@知能.九工大です.
なかだです。
永井@知能.九工大です.
In article <20080111.171950.78716471.nagai@ai.kyutech.ac.jp>,
永井@知能.九工大です.
In article <20080111.184442.74744388.nagai@ai.kyutech.ac.jp>,
まつもと ゆきひろです
永井@知能.九工大です.
In article <20080112.004750.74741782.nagai@ai.kyutech.ac.jp>,
永井@知能.九工大です.
In article <20080112.100830.112615025.nagai@ai.kyutech.ac.jp>,
永井@知能.九工大です.
成瀬です。
永井@知能.九工大です.
成瀬です。
永井@知能.九工大です.
成瀬です。
永井@知能.九工大です.
成瀬です。
遊楽庵です。
成瀬です。
まつもと ゆきひろです
In article <E1JFVE8-0000Co-QL@x61.netlab.jp>,
成瀬です。
In article <47975933.8010907@airemix.com>,
まつもと ゆきひろです
こんにちは、なかむら(う)です。
まつもと ゆきひろです
こんにちは、なかむら(う)です。
まつもと ゆきひろです
こんにちは、なかむら(う)です。
まつもと ゆきひろです
西山和広です。
まつもと ゆきひろです
In article <20080115.024201.41653719.nagai@ai.kyutech.ac.jp>,
永井@知能.九工大です.
In article <20080116.102057.41656941.nagai@ai.kyutech.ac.jp>,
永井@知能.九工大です.
In article <20080117.233832.74721189.nagai@ai.kyutech.ac.jp>,
Gimiteといいます。
成瀬です。
Gimiteです。
永井@知能.九工大です.
まつもと ゆきひろです
永井@知能.九工大です.
まつもと ゆきひろです
永井@知能.九工大です.
m17n には近づかないようにしているささだです。
成瀬です。
遊楽庵です。
成瀬です。
まつもと ゆきひろです
成瀬です。
まつもと ゆきひろです
永井@知能.九工大です.
成瀬です。
永井@知能.九工大です.
成瀬です。
永井@知能.九工大です.
長文失礼します。
まつもと ゆきひろです
From: Yukihiro Matsumoto <matz@ruby-lang.org>
まつもと ゆきひろです
成瀬です。
At 04:55 08/01/20, NARUSE, Yui wrote:
成瀬です。
成瀬です。
永井@知能.九工大です.
成瀬です。
遊楽庵と申します。
永井@知能.九工大です.
[#32556] default completion for irb1.9 — Tadashi Saito <shiba@...2.accsnet.ne.jp>
斎藤と申します。
[#32563] transcoder loading — Nobuyoshi Nakada <nobu@...>
なかだです。
[#32567] [nil, [...]] — Tanaka Akira <akr@...>
以下のようにすると作っていないはずの再帰的な配列が出てきます。
[#32588] /(?<foo>...)/ =~ str assigns foo — Tanaka Akira <akr@...>
以下のように named capture の結果を自動的に変数に代入させた
まつもと ゆきひろです
In article <E1J34q8-00027E-EF@localhost>,
[#32610] 1.9.1 issues left (as of 12/15) — Yukihiro Matsumoto <matz@...>
まつもと ゆきひろです
Yukihiro Matsumoto さんは書きました:
まつもと ゆきひろです
You may consider this:
[#32629] faster Bignum#* — "Yusuke ENDOH" <mame@...>
遠藤と申します。
[#32662] encode! は変換しないときに <nil> になってしまう。 — Martin Duerst <duerst@...>
中田さん、こんにちは。
[#32668] syntax errors on ext/tk/sample — "U.Nakamura" <usa@...>
こんにちは、なかむら(う)です。
[#32695] ISO-2022-JP output for transcode — "NARUSE, Yui" <naruse@...>
成瀬です。
なかだです。
成瀬さん、中田さん、こんにちは。
[#32708] Enumerable can't take multiple parameters — GOTOU Yuuzou <gotoyuzo@...>
eachで複数のパラメータをyieldしたときに、Enumerable#colectで、
[#32715] issues left as of 12/25 2:00am JST — Yukihiro Matsumoto <matz@...>
まつもと ゆきひろです
まつもとさん、こんにちは。
まつもと ゆきひろです
まつもと ゆきひろです
まつもと ゆきひろです
まつもと ゆきひろです
[#32726] Can't build on MacOSX 10.4(Tiger) (was Re: Re: 1.9.1 issues left (as of 12/15)) — "MOROHASHI Kyosuke" <moronatural@...>
もろはしです。お世話になっております。
[#32756] make rdoc cause segv on OpenBSD — SASADA Koichi <ko1@...>
ささだです。
[#32763] Re: [ruby-cvs:21913] Ruby:r14676 (trunk): * trunk/common.mk, goruby.c, golf_prelude.rb: for golfers. — Yukihiro Matsumoto <matz@...>
まつもと ゆきひろです
[#32791] Re: [ruby-list:44387] [ANN] Ruby 1.9.0 is released — SASADA Koichi <ko1@...>
ささだです。
まつもとさん、笹田さん、
まつもと ゆきひろです
まつもと ゆきひろです
福島の藤岡です。
木村です。
[#32823] class TimeSpan — "NARUSE, Yui" <naruse@...>
成瀬です。
ActiveSupportにあるNumericの拡張はダメですか??
[#32834] Re: [ ruby-Bugs-16634 ] Tk#bindinfo fails with: NoMethodError: undefined method 'collect' for "":String — Urabe Shyouhei <shyouhei@...>
以下のバグ報告が来ています
[#32843] Windowでのデフォルトエンコーディング — KIMURA Koichi <kimura.koichi@...>
木村です。
こんにちは、なかむら(う)です。
At 13:55 07/12/28, U.Nakamura wrote:
成瀬です。
なかだです。
In article <20071228092137.97233E065F@mail.bc9.jp>,
なかだです。
U.Nakamura wrote:
こんにちは、なかむら(う)です。
木村です。
成瀬です。
[#32848] Fwd: [ruby-cvs:21983] Ruby:r14746 (trunk): * transcode.c (transcode_dispatch): allows transcoding from/to — Martin Duerst <duerst@...>
中田さん、こんにちは。
[#32852] Resolv::DNS#getaddresses doesn't return IPv6 address — "NARUSE, Yui" <naruse@...>
成瀬です。
こんにちは。
成瀬です。
In message <477EF0C9.4060103@airemix.com>
成瀬です
In message <477FAAB4.1060005@airemix.com>
梅本です。
成瀬です。
In article <47809D55.4030208@airemix.com>,
[#32892] *, z = 1 breaks stack consistency — "Yusuke ENDOH" <mame@...>
遠藤と申します。
[#32904] Integer overflow on struct timespec — zunda <zunda616e@...>
zundaと申します
[ruby-dev:32433] String#inspect
In article <87wsrz7wpg.fsf@fsij.org>,
Tanaka Akira <akr@fsij.org> writes:
> 文字列がそのエンコーディングとして正しいかどうかを確認する機
> 能とか。
とうのを実装するには Oniguruma レベルの mbclen あたりで間違っ
たエンコーディングをまともに扱って、その情報を提供させないと
いけないわけですが、やってみるとこんな感じですかね。
とりあえず、その情報を使って String#inspect を変なバイトをちゃ
んと判別してエスケープするようにしてみました。
% ./ruby -e 'p "\xa1x".force_encoding("euc-jp")'|cat -v
"M-!x"
というように文字になってない 8bit バイトが出てくるのが
% ./ruby -e 'p "\xa1x".force_encoding("euc-jp")'|cat -v
"\241x"
というようにエスケープされるようになります。
他にも
% ./ruby -e 'p "\374".force_encoding("utf-8")'
"\000"
というように、嘘つけといいたくなるようなのが
% ./ruby -e 'p "\374".force_encoding("utf-8")'
"\374"
と出るようになります。
Index: encoding.c
===================================================================
--- encoding.c (revision 14084)
+++ encoding.c (working copy)
@@ -495,6 +495,12 @@ rb_enc_mbclen(const char *p, const char
}
int
+rb_enc_precise_mbclen(const char *p, const char *e, rb_encoding *enc)
+{
+ return ONIGENC_PRECISE_MBC_ENC_LEN(enc, (UChar*)p, (UChar*)e);
+}
+
+int
rb_enc_codelen(int c, rb_encoding *enc)
{
int n = ONIGENC_CODE_TO_MBCLEN(enc,c);
Index: include/ruby/encoding.h
===================================================================
--- include/ruby/encoding.h (revision 14084)
+++ include/ruby/encoding.h (working copy)
@@ -71,6 +71,12 @@ rb_encoding * rb_enc_find(const char *na
/* ptr,encoding -> mbclen */
int rb_enc_mbclen(const char*, const char *, rb_encoding*);
+/* ptr,encoding -> mbclen, invalid or needmore */
+int rb_enc_precise_mbclen(const char*, const char *, rb_encoding*);
+#define MBCLEN_CHARFOUND(ret) ONIGENC_MBCLEN_CHARFOUND(ret)
+#define MBCLEN_INVALID(ret) ONIGENC_MBCLEN_INVALID(ret)
+#define MBCLEN_NEEDMORE(ret) ONIGENC_MBCLEN_NEEDMORE(ret)
+
/* code,encoding -> codelen */
int rb_enc_codelen(int, rb_encoding*);
Index: include/ruby/oniguruma.h
===================================================================
--- include/ruby/oniguruma.h (revision 14084)
+++ include/ruby/oniguruma.h (working copy)
@@ -144,7 +144,7 @@ typedef struct {
typedef int (*OnigApplyAllCaseFoldFunc)(OnigCodePoint from, OnigCodePoint* to, int to_len, void* arg);
typedef struct OnigEncodingTypeST {
- int (*mbc_enc_len)(const OnigUChar* p,const OnigUChar* e, struct OnigEncodingTypeST* enc);
+ int (*precise_mbc_enc_len)(const OnigUChar* p,const OnigUChar* e, struct OnigEncodingTypeST* enc);
const char* name;
int max_enc_len;
int min_enc_len;
@@ -282,7 +282,32 @@ ONIG_EXTERN OnigEncodingType OnigEncodin
#define ONIGENC_STEP_BACK(enc,start,s,n) \
onigenc_step_back((enc),(start),(s),(n))
-#define ONIGENC_MBC_ENC_LEN(enc,p,e) (enc)->mbc_enc_len(p,e,enc)
+
+#define ONIGENC_CONSTRUCT_MBCLEN_CHARFOUND(n) (n)
+#define ONIGENC_CONSTRUCT_MBCLEN_INVALID() (-1)
+#define ONIGENC_CONSTRUCT_MBCLEN_NEEDMORE(n) (-1-n)
+
+static inline int onigenc_mbclen_charfound(int r) { return 0 < r ? r : 0; }
+static inline int onigenc_mbclen_needmore(int r) { return r < -1 ? -1 - r : 0; }
+#define ONIGENC_MBCLEN_CHARFOUND(r) onigenc_mbclen_charfound(r)
+#define ONIGENC_MBCLEN_INVALID(r) ((r) == -1)
+#define ONIGENC_MBCLEN_NEEDMORE(r) onigenc_mbclen_needmore(r)
+
+#define ONIGENC_PRECISE_MBC_ENC_LEN(enc,p,e) (enc)->precise_mbc_enc_len(p,e,enc)
+
+static inline int onigenc_mbclen_recover(const OnigUChar* p,const OnigUChar* e, struct OnigEncodingTypeST* enc)
+{
+ int ret = ONIGENC_PRECISE_MBC_ENC_LEN(enc,p,e);
+ int r;
+ if (ONIGENC_MBCLEN_INVALID(ret))
+ return 1;
+ else if ((r = ONIGENC_MBCLEN_NEEDMORE(ret)))
+ return e-p+r;
+ else
+ return ONIGENC_MBCLEN_CHARFOUND(ret);
+}
+
+#define ONIGENC_MBC_ENC_LEN(enc,p,e) onigenc_mbclen_recover(p,e,enc)
#define ONIGENC_MBC_MAXLEN(enc) ((enc)->max_enc_len)
#define ONIGENC_MBC_MAXLEN_DIST(enc) ONIGENC_MBC_MAXLEN(enc)
#define ONIGENC_MBC_MINLEN(enc) ((enc)->min_enc_len)
Index: enc/euc_jp.c
===================================================================
--- enc/euc_jp.c (revision 14084)
+++ enc/euc_jp.c (working copy)
@@ -50,10 +50,85 @@ static const int EncLen_EUCJP[] = {
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1
};
+typedef enum { FAILURE = -2, ACCEPT = -1, S0 = 0, S1, S2 } state_t;
+#define A ACCEPT
+#define F FAILURE
+static const signed char trans[][0x100] = {
+ { /* S0 0 1 2 3 4 5 6 7 8 9 a b c d e f */
+ /* 0 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 1 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 2 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 3 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 4 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 5 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 6 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 7 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 8 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, 1, 2,
+ /* 9 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* a */ F, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ /* b */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ /* c */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ /* d */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ /* e */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ /* f */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, F
+ },
+ { /* S1 0 1 2 3 4 5 6 7 8 9 a b c d e f */
+ /* 0 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 1 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 2 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 3 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 4 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 5 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 6 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 7 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 8 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 9 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* a */ F, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* b */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* c */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* d */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* e */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* f */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, F
+ },
+ { /* S2 0 1 2 3 4 5 6 7 8 9 a b c d e f */
+ /* 0 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 1 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 2 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 3 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 4 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 5 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 6 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 7 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 8 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 9 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* a */ F, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ /* b */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ /* c */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ /* d */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ /* e */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ /* f */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, F
+ },
+
+};
+#undef A
+#undef F
+
static int
mbc_enc_len(const UChar* p, const UChar* e, OnigEncoding enc)
{
- return EncLen_EUCJP[*p];
+ int firstbyte = *p++;
+ state_t s;
+ s = trans[0][firstbyte];
+ if (s < 0) return s == ACCEPT ? ONIGENC_CONSTRUCT_MBCLEN_CHARFOUND(1) :
+ ONIGENC_CONSTRUCT_MBCLEN_INVALID();
+ if (p == e) return ONIGENC_CONSTRUCT_MBCLEN_NEEDMORE(EncLen_EUCJP[firstbyte]-1);
+ s = trans[s][*p++];
+ if (s < 0) return s == ACCEPT ? ONIGENC_CONSTRUCT_MBCLEN_CHARFOUND(2) :
+ ONIGENC_CONSTRUCT_MBCLEN_INVALID();
+ if (p == e) return ONIGENC_CONSTRUCT_MBCLEN_NEEDMORE(EncLen_EUCJP[firstbyte]-2);
+ s = trans[s][*p++];
+ return s == ACCEPT ? ONIGENC_CONSTRUCT_MBCLEN_CHARFOUND(3) :
+ ONIGENC_CONSTRUCT_MBCLEN_INVALID();
}
static OnigCodePoint
Index: enc/utf8.c
===================================================================
--- enc/utf8.c (revision 14084)
+++ enc/utf8.c (working copy)
@@ -59,10 +59,155 @@ static const int EncLen_UTF8[] = {
4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 1, 1
};
+typedef enum { FAILURE = -2, ACCEPT = -1, S0 = 0, S1, S2, S3, S4, S5 } state_t;
+#define A ACCEPT
+#define F FAILURE
+static const signed char trans[][0x100] = {
+ { /* S0 0 1 2 3 4 5 6 7 8 9 a b c d e f */
+ /* 0 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 1 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 2 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 3 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 4 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 5 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 6 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 7 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 8 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 9 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* a */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* b */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* c */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ /* d */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ /* e */ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
+ /* f */ 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, F, F
+ },
+ { /* S1 0 1 2 3 4 5 6 7 8 9 a b c d e f */
+ /* 0 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 1 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 2 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 3 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 4 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 5 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 6 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 7 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 8 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 9 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* a */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* b */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* c */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* d */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* e */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* f */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F
+ },
+ { /* S2 0 1 2 3 4 5 6 7 8 9 a b c d e f */
+ /* 0 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 1 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 2 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 3 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 4 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 5 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 6 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 7 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 8 */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ /* 9 */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ /* a */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ /* b */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ /* c */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* d */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* e */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* f */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F
+ },
+ { /* S3 0 1 2 3 4 5 6 7 8 9 a b c d e f */
+ /* 0 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 1 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 2 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 3 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 4 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 5 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 6 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 7 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 8 */ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
+ /* 9 */ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
+ /* a */ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
+ /* b */ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
+ /* c */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* d */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* e */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* f */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F
+ },
+ { /* S4 0 1 2 3 4 5 6 7 8 9 a b c d e f */
+ /* 0 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 1 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 2 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 3 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 4 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 5 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 6 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 7 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 8 */ 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
+ /* 9 */ 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
+ /* a */ 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
+ /* b */ 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
+ /* c */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* d */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* e */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* f */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F
+ },
+ { /* S5 0 1 2 3 4 5 6 7 8 9 a b c d e f */
+ /* 0 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 1 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 2 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 3 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 4 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 5 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 6 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 7 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 8 */ 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
+ /* 9 */ 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
+ /* a */ 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
+ /* b */ 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
+ /* c */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* d */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* e */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* f */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F
+ }
+};
+#undef A
+#undef F
+
static int
utf8_mbc_enc_len(const UChar* p, const UChar* e, OnigEncoding enc)
{
- return EncLen_UTF8[*p];
+ int firstbyte = *p++;
+ state_t s;
+ s = trans[0][firstbyte];
+ if (s < 0) return s == ACCEPT ? ONIGENC_CONSTRUCT_MBCLEN_CHARFOUND(1) :
+ ONIGENC_CONSTRUCT_MBCLEN_INVALID();
+
+ if (p == e) return ONIGENC_CONSTRUCT_MBCLEN_NEEDMORE(EncLen_UTF8[firstbyte]-1);
+ s = trans[s][*p++];
+ if (s < 0) return s == ACCEPT ? ONIGENC_CONSTRUCT_MBCLEN_CHARFOUND(2) :
+ ONIGENC_CONSTRUCT_MBCLEN_INVALID();
+
+ if (p == e) return ONIGENC_CONSTRUCT_MBCLEN_NEEDMORE(EncLen_UTF8[firstbyte]-2);
+ s = trans[s][*p++];
+ if (s < 0) return s == ACCEPT ? ONIGENC_CONSTRUCT_MBCLEN_CHARFOUND(3) :
+ ONIGENC_CONSTRUCT_MBCLEN_INVALID();
+
+ if (p == e) return ONIGENC_CONSTRUCT_MBCLEN_NEEDMORE(EncLen_UTF8[firstbyte]-3);
+ s = trans[s][*p++];
+ if (s < 0) return s == ACCEPT ? ONIGENC_CONSTRUCT_MBCLEN_CHARFOUND(4) :
+ ONIGENC_CONSTRUCT_MBCLEN_INVALID();
+
+ if (p == e) return ONIGENC_CONSTRUCT_MBCLEN_NEEDMORE(EncLen_UTF8[firstbyte]-4);
+ s = trans[s][*p++];
+ if (s < 0) return s == ACCEPT ? ONIGENC_CONSTRUCT_MBCLEN_CHARFOUND(5) :
+ ONIGENC_CONSTRUCT_MBCLEN_INVALID();
+
+ if (p == e) return ONIGENC_CONSTRUCT_MBCLEN_NEEDMORE(EncLen_UTF8[firstbyte]-5);
+ s = trans[s][*p++];
+ return s == ACCEPT ? ONIGENC_CONSTRUCT_MBCLEN_CHARFOUND(6) :
+ ONIGENC_CONSTRUCT_MBCLEN_INVALID();
}
static int
Index: enc/sjis.c
===================================================================
--- enc/sjis.c (revision 14084)
+++ enc/sjis.c (working copy)
@@ -70,10 +70,62 @@ static const char SJIS_CAN_BE_TRAIL_TABL
#define SJIS_ISMB_FIRST(byte) (EncLen_SJIS[byte] > 1)
#define SJIS_ISMB_TRAIL(byte) SJIS_CAN_BE_TRAIL_TABLE[(byte)]
+typedef enum { FAILURE = -2, ACCEPT = -1, S0 = 0, S1 } state_t;
+#define A ACCEPT
+#define F FAILURE
+static const signed char trans[][0x100] = {
+ { /* S0 0 1 2 3 4 5 6 7 8 9 a b c d e f */
+ /* 0 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 1 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 2 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 3 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 4 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 5 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 6 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 7 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 8 */ F, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ /* 9 */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ /* a */ F, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* b */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* c */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* d */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* e */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ /* f */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, F, F, F
+ },
+ { /* S1 0 1 2 3 4 5 6 7 8 9 a b c d e f */
+ /* 0 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 1 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 2 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 3 */ F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,
+ /* 4 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 5 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 6 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 7 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, F,
+ /* 8 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* 9 */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* a */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* b */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* c */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* d */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* e */ A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,
+ /* f */ A, A, A, A, A, A, A, A, A, A, A, A, A, F, F, F
+ }
+};
+#undef A
+#undef F
+
static int
mbc_enc_len(const UChar* p, const UChar* e, OnigEncoding enc)
{
- return EncLen_SJIS[*p];
+ int firstbyte = *p++;
+ state_t s;
+ s = trans[0][firstbyte];
+ if (s < 0) return s == ACCEPT ? ONIGENC_CONSTRUCT_MBCLEN_CHARFOUND(1) :
+ ONIGENC_CONSTRUCT_MBCLEN_INVALID();
+ if (p == e) return ONIGENC_CONSTRUCT_MBCLEN_NEEDMORE(EncLen_SJIS[firstbyte]-1);
+ s = trans[s][*p++];
+ return s == ACCEPT ? ONIGENC_CONSTRUCT_MBCLEN_CHARFOUND(2) :
+ ONIGENC_CONSTRUCT_MBCLEN_INVALID();
}
static int
Index: string.c
===================================================================
--- string.c (revision 14084)
+++ string.c (working copy)
@@ -2919,10 +2919,19 @@ rb_str_inspect(VALUE str)
str_cat_char(result, '"', enc);
p = RSTRING_PTR(str); pend = RSTRING_END(str);
while (p < pend) {
- int c = rb_enc_codepoint(p, pend, enc);
- int n = rb_enc_codelen(c, enc);
+ int c;
+ int n;
int cc;
+ n = rb_enc_precise_mbclen(p, pend, enc);
+ if (!MBCLEN_CHARFOUND(n)) {
+ c = (unsigned char)*p++;
+ goto escape_byte;
+ }
+
+ c = rb_enc_codepoint(p, pend, enc);
+ n = rb_enc_codelen(c, enc);
+
p += n;
if (c == '"'|| c == '\\' ||
(c == '#' && (cc = rb_enc_codepoint(p,pend,enc),
@@ -2961,8 +2970,10 @@ rb_str_inspect(VALUE str)
}
else {
char buf[5];
- char *s = buf;
+ char *s;
+escape_byte:
+ s = buf;
sprintf(buf, "\\%03o", c & 0377);
while (*s) {
str_cat_char(result, *s++, enc);
Index: test/ruby/test_m17n.rb
===================================================================
--- test/ruby/test_m17n.rb (revision 14084)
+++ test/ruby/test_m17n.rb (working copy)
@@ -36,6 +36,38 @@ class TestM17N < Test::Unit::TestCase
assert_nothing_raised { eval(u(%{"\\u{6666}\xc0\xa0"})) }
end
+ def test_string_inspect
+ assert_equal('"\376"', e("\xfe").inspect)
+ assert_equal('"\216"', e("\x8e").inspect)
+ assert_equal('"\217"', e("\x8f").inspect)
+ assert_equal('"\217\241"', e("\x8f\xa1").inspect)
+ assert_equal('"\357"', s("\xef").inspect)
+ assert_equal('"\300"', u("\xc0").inspect)
+ assert_equal('"\340\200"', u("\xe0\x80").inspect)
+ assert_equal('"\360\200\200"', u("\xf0\x80\x80").inspect)
+ assert_equal('"\370\200\200\200"', u("\xf8\x80\x80\x80").inspect)
+ assert_equal('"\374\200\200\200\200"', u("\xfc\x80\x80\x80\x80").inspect)
+
+ assert_equal('"\376 "', e("\xfe ").inspect)
+ assert_equal('"\216 "', e("\x8e ").inspect)
+ assert_equal('"\217 "', e("\x8f ").inspect)
+ assert_equal('"\217\241 "', e("\x8f\xa1 ").inspect)
+ assert_equal('"\357 "', s("\xef ").inspect)
+ assert_equal('"\300 "', u("\xc0 ").inspect)
+ assert_equal('"\340\200 "', u("\xe0\x80 ").inspect)
+ assert_equal('"\360\200\200 "', u("\xf0\x80\x80 ").inspect)
+ assert_equal('"\370\200\200\200 "', u("\xf8\x80\x80\x80 ").inspect)
+ assert_equal('"\374\200\200\200\200 "', u("\xfc\x80\x80\x80\x80 ").inspect)
+
+
+ assert_equal(e("\"\\241\x8f\xa1\xa1\""), e("\xa1\x8f\xa1\xa1").inspect)
+
+ assert_equal('"\201."', s("\x81.").inspect)
+ assert_equal(s("\"\x81@\""), s("\x81@").inspect)
+
+ assert_equal('"\374"', u("\xfc").inspect)
+ end
+
def test_regexp_too_short_multibyte_character
assert_raise(SyntaxError) { eval('/\xfe/e') }
assert_raise(SyntaxError) { eval('/\x8e/e') }
--
[田中 哲][たなか あきら][Tanaka Akira]