[ruby-dev:36286] Re: The second code point of a composed character

From: Tanaka Akira <akr@...>
Date: 2008-09-13 08:29:33 UTC
List: ruby-dev #36286
In article <E1KeKG9-0004NC-Jb@x61.netlab.jp>,
  Yukihiro Matsumoto <matz@ruby-lang.org> writes:

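> It seems that can't be done. For encodings where one character is not
> one code point (which we don't currently support), I don't think we
> provide code-point-based processing at all.

That is where my reluctance to expose code points comes from.

I did half consider that, if they really had to be exposed, they could
be packed together into a single bignum, but

> If String#each_codepoint or #codepoints were provided, it would become
> possible to take out one character and split it into a sequence of
> codepoints.

if, as this suggests, we accept that a character can have multiple code
points, then that may be fine in its own right.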
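
As a rough illustration of a character carrying more than one code point, the sketch below uses `String#codepoints` and `String#grapheme_clusters` as found in later Ruby releases (Ruby 2.5 or newer is assumed). Neither method is part of the patch in this message, and the variable names are only for illustration.

```ruby
# Assumes Ruby 2.5+, where String#grapheme_clusters exists.
# HIRAGANA LETTER A followed by COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
s = "\u3042\u3099"

codepoints = s.codepoints          # the code points making up the string
clusters   = s.grapheme_clusters   # user-perceived characters

puts codepoints.map { |c| format("U+%04X", c) }.join(" ")  # U+3042 U+3099
puts "code points: #{codepoints.length}"                   # 2
puts "grapheme clusters: #{clusters.length}"               # 1
puts "code points in first cluster: #{clusters.first.codepoints.length}"
```

Here one user-perceived character decomposes into a two-element code point sequence, which is the situation the quoted remark describes.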

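In that case, however, rb_encoding needs to be extended. As an
experiment I tried it with UTF-8-MAC, and it comes out something like this.
(There is an argument that this should be done with UTF-8 rather than
UTF-8-MAC.)

The table for handling Unicode characters (grapheme clusters) is omitted
here. The unabridged patch is at
http://www.a-k-r.org/tmp/utf-8-mac.patch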

For now, at least something like

% ./ruby -e 'p "\u3042\u3099".force_encoding("UTF-8-MAC").length'
1

works.
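
The character-length logic in the patch below can be approximated in pure Ruby. This is only a sketch under stated assumptions: the property table is a tiny hand-written excerpt (the patch generates the full one from GraphemeBreakProperty-5.1.0.txt), only the Control, Extend, L, V and T properties are handled, and the names `TABLE`, `grapheme_properties`, `joins?` and `cluster_count` are hypothetical.

```ruby
# Property bits, using the same values as the GRAPHEME_BIT_* macros in the patch.
CONTROL = 0x001
EXTEND  = 0x002
L       = 0x010
V       = 0x020
T       = 0x040

# Hypothetical excerpt of the table, sorted by code point for binary search.
TABLE = [
  [0x000A, CONTROL],  # LINE FEED
  [0x0301, EXTEND],   # COMBINING ACUTE ACCENT
  [0x1100, L],        # HANGUL CHOSEONG KIYEOK
  [0x1161, V],        # HANGUL JUNGSEONG A
  [0x11A8, T],        # HANGUL JONGSEONG KIYEOK
  [0x3099, EXTEND],   # COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
].freeze

# Returns the property bits for code point c, or 0 when c is not listed.
def grapheme_properties(c)
  entry = TABLE.bsearch { |(cp, _props)| c <=> cp }
  entry ? entry[1] : 0
end

# True when no character boundary falls between two adjacent code points
# whose property bits are p1 and p2 (a subset of the rules in the patch).
def joins?(p1, p2)
  return false if (p1 & CONTROL) != 0 || (p2 & CONTROL) != 0
  return true if (p1 & L) != 0 && (p2 & (L | V)) != 0
  return true if (p1 & V) != 0 && (p2 & (V | T)) != 0
  return true if (p1 & T) != 0 && (p2 & T) != 0
  (p2 & EXTEND) != 0
end

# Counts characters by merging each code point into the previous one
# whenever joins? reports that there is no boundary between them.
def cluster_count(str)
  count = 0
  prev = nil
  str.each_codepoint do |c|
    props = grapheme_properties(c)
    count += 1 if prev.nil? || !joins?(prev, props)
    prev = props
  end
  count
end

puts cluster_count("\u3042\u3099")        # 1: base letter + combining mark
puts cluster_count("\u1100\u1161\u11A8")  # 1: Hangul L + V + T
puts cluster_count("e\u0301\n")           # 2: e + acute accent, then LF
```

The first call mirrors the one-liner above. The LV, LVT, SpacingMark and Prepend rules are left out here to keep the sketch short.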

% svn diff --diff-cmd diff -x '-u -p'
Index: encoding.c
===================================================================
--- encoding.c	(revision 19292)
+++ encoding.c	(working copy)
@@ -389,11 +389,13 @@ enum {
     ENCINDEX_ASCII,
     ENCINDEX_UTF_8,
     ENCINDEX_US_ASCII,
+    ENCINDEX_UTF_8_MAC,
     ENCINDEX_BUILTIN_MAX
 };
 
 extern rb_encoding OnigEncodingUTF_8;
 extern rb_encoding OnigEncodingUS_ASCII;
+extern rb_encoding OnigEncodingUTF_8_MAC;
 
 void
 rb_enc_init(void)
@@ -406,6 +408,7 @@ rb_enc_init(void)
     ENC_REGISTER(ASCII);
     ENC_REGISTER(UTF_8);
     ENC_REGISTER(US_ASCII);
+    ENC_REGISTER(UTF_8_MAC);
 #undef ENC_REGISTER
     enc_table.count = ENCINDEX_BUILTIN_MAX;
 }
@@ -693,7 +696,7 @@ rb_obj_encoding(VALUE obj)
 int
 rb_enc_mbclen(const char *p, const char *e, rb_encoding *enc)
 {
-    int n = ONIGENC_PRECISE_MBC_ENC_LEN(enc, (UChar*)p, (UChar*)e);
+    int n = ONIGENC_PRECISE_MBCHAR_ENC_LEN(enc, (UChar*)p, (UChar*)e);
     if (MBCLEN_CHARFOUND_P(n) && MBCLEN_CHARFOUND_LEN(n) <= e-p)
         return MBCLEN_CHARFOUND_LEN(n);
     else {
@@ -715,6 +718,18 @@ rb_enc_precise_mbclen(const char *p, con
 }
 
 int
+rb_enc_precise_mbcharlen(const char *p, const char *e, rb_encoding *enc)
+{
+    int n;
+    if (e <= p)
+        return ONIGENC_CONSTRUCT_MBCLEN_NEEDMORE(1);
+    n = ONIGENC_PRECISE_MBCHAR_ENC_LEN(enc, (UChar*)p, (UChar*)e);
+    if (e-p < n)
+        return ONIGENC_CONSTRUCT_MBCLEN_NEEDMORE(n-(e-p));
+    return n;
+}
+
+int
 rb_enc_ascget(const char *p, const char *e, int *len, rb_encoding *enc)
 {
     int c, l;
Index: include/ruby/oniguruma.h
===================================================================
--- include/ruby/oniguruma.h	(revision 19292)
+++ include/ruby/oniguruma.h	(working copy)
@@ -151,6 +151,7 @@ typedef int (*OnigApplyAllCaseFoldFunc)(
 
 typedef struct OnigEncodingTypeST {
   int    (*precise_mbc_enc_len)(const OnigUChar* p,const OnigUChar* e, struct OnigEncodingTypeST* enc);
+  int    (*precise_mbchar_enc_len)(const OnigUChar* p,const OnigUChar* e, struct OnigEncodingTypeST* enc);
   const char*   name;
   int           max_enc_len;
   int           min_enc_len;
@@ -240,6 +241,7 @@ ONIG_EXTERN OnigEncodingType OnigEncodin
 #define ONIGENC_MBCLEN_NEEDMORE_LEN(r)          (-1-(r))
 
 #define ONIGENC_PRECISE_MBC_ENC_LEN(enc,p,e)   (enc)->precise_mbc_enc_len(p,e,enc)
+#define ONIGENC_PRECISE_MBCHAR_ENC_LEN(enc,p,e)   (enc)->precise_mbchar_enc_len(p,e,enc)
 
 ONIG_EXTERN
 int onigenc_mbclen_approximate P_((const OnigUChar* p,const OnigUChar* e, struct OnigEncodingTypeST* enc));
Index: include/ruby/encoding.h
===================================================================
--- include/ruby/encoding.h	(revision 19292)
+++ include/ruby/encoding.h	(working copy)
@@ -110,6 +110,7 @@ int rb_enc_mbclen(const char *p, const c
 
 /* -> chlen, invalid or needmore */
 int rb_enc_precise_mbclen(const char *p, const char *e, rb_encoding *enc);
+int rb_enc_precise_mbcharlen(const char *p, const char *e, rb_encoding *enc);
 #define MBCLEN_CHARFOUND_P(ret)     ONIGENC_MBCLEN_CHARFOUND_P(ret)
 #define MBCLEN_CHARFOUND_LEN(ret)     ONIGENC_MBCLEN_CHARFOUND_LEN(ret)
 #define MBCLEN_INVALID_P(ret)       ONIGENC_MBCLEN_INVALID_P(ret)
Index: enc/koi8_u.c
===================================================================
--- enc/koi8_u.c	(revision 19292)
+++ enc/koi8_u.c	(working copy)
@@ -203,6 +203,7 @@ koi8_u_get_case_fold_codes_by_str(OnigCa
 
 OnigEncodingDefine(koi8_u, KOI8_U) = {
   onigenc_single_byte_mbc_enc_len,
+  onigenc_single_byte_mbc_enc_len,
   "KOI8-U",       /* name */
   1,             /* max enc length */
   1,             /* min enc length */
Index: enc/gbk.c
===================================================================
--- enc/gbk.c	(revision 19292)
+++ enc/gbk.c	(working copy)
@@ -196,6 +196,7 @@ gbk_is_allowed_reverse_match(const UChar
 
 OnigEncodingDefine(gbk, GBK) = {
   gbk_mbc_enc_len,
+  gbk_mbc_enc_len,
   "GBK",      /* name */
   2,          /* max enc length */
   1,          /* min enc length */
Index: enc/euc_jp.c
===================================================================
--- enc/euc_jp.c	(revision 19292)
+++ enc/euc_jp.c	(working copy)
@@ -345,6 +345,7 @@ get_ctype_code_range(OnigCtype ctype, On
 
 OnigEncodingDefine(euc_jp, EUC_JP) = {
   mbc_enc_len,
+  mbc_enc_len,
   "EUC-JP",   /* name */
   3,          /* max enc length */
   1,          /* min enc length */
Index: enc/cp949.c
===================================================================
--- enc/cp949.c	(revision 19292)
+++ enc/cp949.c	(working copy)
@@ -196,6 +196,7 @@ cp949_is_allowed_reverse_match(const UCh
 
 OnigEncodingDefine(cp949, CP949) = {
   cp949_mbc_enc_len,
+  cp949_mbc_enc_len,
   "CP949",      /* name */
   2,          /* max enc length */
   1,          /* min enc length */
Index: enc/shift_jis.c
===================================================================
--- enc/shift_jis.c	(revision 19292)
+++ enc/shift_jis.c	(working copy)
@@ -353,6 +353,7 @@ get_ctype_code_range(OnigCtype ctype, On
 
 OnigEncodingDefine(shift_jis, Shift_JIS) = {
   mbc_enc_len,
+  mbc_enc_len,
   "Shift_JIS",   /* name */
   2,             /* max byte length */
   1,             /* min byte length */
Index: enc/utf_8.c
===================================================================
--- enc/utf_8.c	(revision 19292)
+++ enc/utf_8.c	(working copy)
@@ -241,6 +241,3336 @@ mbc_enc_len(const UChar* p, const UChar*
                        ONIGENC_CONSTRUCT_MBCLEN_INVALID();
 }
 
+static OnigCodePoint mbc_to_code(const UChar* p, const UChar* end, OnigEncoding enc);
+
+/* generated from GraphemeBreakProperty-5.1.0.txt
+ * Since CR LF is handled in another layer such as IO with text mode,
+ * CR and LF are merged into CONTROL.  */
+#define GRAPHEME_BIT_CONTROL         0x001
+#define GRAPHEME_BIT_EXTEND          0x002
+#define GRAPHEME_BIT_PREPEND         0x004
+#define GRAPHEME_BIT_SPACINGMARK     0x008
+#define GRAPHEME_BIT_L               0x010
+#define GRAPHEME_BIT_V               0x020
+#define GRAPHEME_BIT_T               0x040
+#define GRAPHEME_BIT_LV              0x080
+#define GRAPHEME_BIT_LVT             0x100
+struct graphme_table_t {
+    OnigCodePoint codepoint;
+    unsigned int properties;
+} graphme_table[] = {
+    { 0x00000, 0x001 }, { 0x00001, 0x001 }, { 0x00002, 0x001 }, { 0x00003, 0x001 },
(table entries omitted)
+    { 0xE01EA, 0x002 }, { 0xE01EB, 0x002 }, { 0xE01EC, 0x002 }, { 0xE01ED, 0x002 },
+    { 0xE01EE, 0x002 }, { 0xE01EF, 0x002 },
+};
+
+static int
+grapheme_cmp(const void *p1, const void *p2)
+{
+    OnigCodePoint c1 = ((struct graphme_table_t *)p1)->codepoint;
+    OnigCodePoint c2 = ((struct graphme_table_t *)p2)->codepoint;
+    if (c1 < c2)
+        return -1;
+    if (c1 > c2)
+        return 1;
+    return 0;
+}
+
+static unsigned int
+get_grapheme_properties(OnigCodePoint c)
+{
+    struct graphme_table_t entry, *found;
+    entry.codepoint = c;
+    found = bsearch(&entry, graphme_table, sizeof(graphme_table)/sizeof(*graphme_table),
+                sizeof(*graphme_table), grapheme_cmp);
+    if (found)
+        return found->properties;
+    return 0;
+}
+
+static int
+mbchar_enc_len(const UChar* p, const UChar* e, OnigEncoding enc ARG_UNUSED)
+{
+    /* 
+     * this implements extended grapheme clusters ("user-perceived characters")
+     * http://www.unicode.org/reports/tr29/
+     */
+    int r1, l1, r2, l2;
+    OnigCodePoint c1, c2;
+    unsigned int p1, p2;
+    r1 = mbc_enc_len(p, e, enc);
+    if (!ONIGENC_MBCLEN_CHARFOUND_P(r1))
+        return r1;
+    l1 = ONIGENC_MBCLEN_CHARFOUND_LEN(r1);
+    c1 = mbc_to_code(p, e, enc);
+    p1 = get_grapheme_properties(c1);
+
+    if (p + l1 == e)
+        return r1;
+    if (p1 & GRAPHEME_BIT_CONTROL)
+        return r1;
+
+    while (p + l1 < e) {
+        r2 = mbc_enc_len(p+l1, e, enc);
+        if (ONIGENC_MBCLEN_INVALID_P(r2))
+            return ONIGENC_CONSTRUCT_MBCLEN_CHARFOUND(l1);
+        if (ONIGENC_MBCLEN_NEEDMORE_P(r2))
+            return r2;
+        l2 = ONIGENC_MBCLEN_CHARFOUND_LEN(r2);
+        c2 = mbc_to_code(p+l1, e, enc);
+        p2 = get_grapheme_properties(c2);
+
+        if (p2 & GRAPHEME_BIT_CONTROL)
+            return ONIGENC_CONSTRUCT_MBCLEN_CHARFOUND(l1);
+        if (((p1 & GRAPHEME_BIT_L) &&   (p2 & (GRAPHEME_BIT_L|
+                                               GRAPHEME_BIT_V|
+                                               GRAPHEME_BIT_LV|
+                                               GRAPHEME_BIT_LVT))) ||
+            ((p1 & (GRAPHEME_BIT_LV|
+                    GRAPHEME_BIT_V)) && (p2 & (GRAPHEME_BIT_V|
+                                               GRAPHEME_BIT_T))) ||
+            ((p1 & (GRAPHEME_BIT_LVT|
+                    GRAPHEME_BIT_T)) && (p2 & GRAPHEME_BIT_T)) ||
+                                        (p2 & (GRAPHEME_BIT_EXTEND|
+                                               GRAPHEME_BIT_SPACINGMARK)) ||
+            (p1 & GRAPHEME_BIT_PREPEND)) {
+            l1 += l2;
+            p1 = p2;
+        }
+        else {
+            break;
+        }
+    }
+    return ONIGENC_CONSTRUCT_MBCLEN_CHARFOUND(l1);
+}
+
 static int
 is_mbc_newline(const UChar* p, const UChar* end, OnigEncoding enc)
 {
@@ -426,6 +3756,7 @@ get_case_fold_codes_by_str(OnigCaseFoldT
 
 OnigEncodingDefine(utf_8, UTF_8) = {
   mbc_enc_len,
+  mbc_enc_len,
   "UTF-8",     /* name */
   6,           /* max byte length */
   1,           /* min byte length */
@@ -442,6 +3773,7 @@ OnigEncodingDefine(utf_8, UTF_8) = {
   left_adjust_char_head,
   onigenc_always_true_is_allowed_reverse_match
 };
+
 ENC_ALIAS("CP65001", "UTF-8")
 
 /*
@@ -450,6 +3782,24 @@ ENC_ALIAS("CP65001", "UTF-8")
  * Link: http://developer.apple.com/qa/qa2001/qa1235.html
  * Link: http://developer.apple.com/jp/qa/qa2001/qa1235.html
  */
-ENC_REPLICATE("UTF8-MAC", "UTF-8")
-ENC_ALIAS("UTF-8-MAC", "UTF8-MAC")
+OnigEncodingDefine(utf_8_mac, UTF_8_MAC) = {
+  mbc_enc_len,
+  mbchar_enc_len,
+  "UTF-8-MAC",     /* name */
+  6,           /* max byte length */
+  1,           /* min byte length */
+  is_mbc_newline,
+  mbc_to_code,
+  code_to_mbclen,
+  code_to_mbc,
+  mbc_case_fold,
+  onigenc_unicode_apply_all_case_fold,
+  get_case_fold_codes_by_str,
+  onigenc_unicode_property_name_to_ctype,
+  onigenc_unicode_is_code_ctype,
+  get_ctype_code_range,
+  left_adjust_char_head,
+  onigenc_always_true_is_allowed_reverse_match
+};
+ENC_ALIAS("UTF8-MAC", "UTF-8-MAC")
 
Index: enc/big5.c
===================================================================
--- enc/big5.c	(revision 19292)
+++ enc/big5.c	(working copy)
@@ -197,6 +197,7 @@ big5_is_allowed_reverse_match(const UCha
 
 OnigEncodingDefine(big5, BIG5) = {
   big5_mbc_enc_len,
+  big5_mbc_enc_len,
   "Big5",     /* name */
   2,          /* max enc length */
   1,          /* min enc length */
Index: enc/euc_tw.c
===================================================================
--- enc/euc_tw.c	(revision 19292)
+++ enc/euc_tw.c	(working copy)
@@ -215,6 +215,7 @@ euctw_is_allowed_reverse_match(const UCh
 
 OnigEncodingDefine(euc_tw, EUC_TW) = {
   euctw_mbc_enc_len,
+  euctw_mbc_enc_len,
   "EUC-TW",   /* name */
   4,          /* max enc length */
   1,          /* min enc length */
Index: enc/iso_8859_10.c
===================================================================
--- enc/iso_8859_10.c	(revision 19292)
+++ enc/iso_8859_10.c	(working copy)
@@ -225,6 +225,7 @@ get_case_fold_codes_by_str(OnigCaseFoldT
 
 OnigEncodingDefine(iso_8859_10, ISO_8859_10) = {
   onigenc_single_byte_mbc_enc_len,
+  onigenc_single_byte_mbc_enc_len,
   "ISO-8859-10", /* name */
   1,             /* max enc length */
   1,             /* min enc length */
Index: enc/iso_8859_11.c
===================================================================
--- enc/iso_8859_11.c	(revision 19292)
+++ enc/iso_8859_11.c	(working copy)
@@ -78,6 +78,7 @@ is_code_ctype(OnigCodePoint code, unsign
 
 OnigEncodingDefine(iso_8859_11, ISO_8859_11) = {
   onigenc_single_byte_mbc_enc_len,
+  onigenc_single_byte_mbc_enc_len,
   "ISO-8859-11",  /* name */
   1,             /* max enc length */
   1,             /* min enc length */
Index: enc/ascii.c
===================================================================
--- enc/ascii.c	(revision 19292)
+++ enc/ascii.c	(working copy)
@@ -31,6 +31,7 @@
 
 OnigEncodingDefine(ascii, ASCII) = {
   onigenc_single_byte_mbc_enc_len,
+  onigenc_single_byte_mbc_enc_len,
   "ASCII-8BIT",/* name */
   1,           /* max byte length */
   1,           /* min byte length */
Index: enc/iso_8859_13.c
===================================================================
--- enc/iso_8859_13.c	(revision 19292)
+++ enc/iso_8859_13.c	(working copy)
@@ -214,6 +214,7 @@ get_case_fold_codes_by_str(OnigCaseFoldT
 
 OnigEncodingDefine(iso_8859_13, ISO_8859_13) = {
   onigenc_single_byte_mbc_enc_len,
+  onigenc_single_byte_mbc_enc_len,
   "ISO-8859-13",  /* name */
   1,             /* max enc length */
   1,             /* min enc length */
Index: enc/iso_8859_14.c
===================================================================
--- enc/iso_8859_14.c	(revision 19292)
+++ enc/iso_8859_14.c	(working copy)
@@ -227,6 +227,7 @@ get_case_fold_codes_by_str(OnigCaseFoldT
 
 OnigEncodingDefine(iso_8859_14, ISO_8859_14) = {
   onigenc_single_byte_mbc_enc_len,
+  onigenc_single_byte_mbc_enc_len,
   "ISO-8859-14",  /* name */
   1,             /* max enc length */
   1,             /* min enc length */
Index: enc/iso_8859_15.c
===================================================================
--- enc/iso_8859_15.c	(revision 19292)
+++ enc/iso_8859_15.c	(working copy)
@@ -221,6 +221,7 @@ get_case_fold_codes_by_str(OnigCaseFoldT
 
 OnigEncodingDefine(iso_8859_15, ISO_8859_15) = {
   onigenc_single_byte_mbc_enc_len,
+  onigenc_single_byte_mbc_enc_len,
   "ISO-8859-15",  /* name */
   1,             /* max enc length */
   1,             /* min enc length */
Index: enc/iso_8859_16.c
===================================================================
--- enc/iso_8859_16.c	(revision 19292)
+++ enc/iso_8859_16.c	(working copy)
@@ -223,6 +223,7 @@ get_case_fold_codes_by_str(OnigCaseFoldT
 
 OnigEncodingDefine(iso_8859_16, ISO_8859_16) = {
   onigenc_single_byte_mbc_enc_len,
+  onigenc_single_byte_mbc_enc_len,
   "ISO-8859-16",  /* name */
   1,             /* max enc length */
   1,             /* min enc length */
Index: enc/us_ascii.c
===================================================================
--- enc/us_ascii.c	(revision 19292)
+++ enc/us_ascii.c	(working copy)
@@ -10,6 +10,7 @@ us_ascii_mbc_enc_len(const UChar* p, con
 
 OnigEncodingDefine(us_ascii, US_ASCII) = {
   us_ascii_mbc_enc_len,
+  us_ascii_mbc_enc_len,
   "US-ASCII",/* name */
   1,           /* max byte length */
   1,           /* min byte length */
Index: enc/windows_1251.c
===================================================================
--- enc/windows_1251.c	(revision 19292)
+++ enc/windows_1251.c	(working copy)
@@ -182,6 +182,7 @@ cp1251_get_case_fold_codes_by_str(OnigCa
 
 OnigEncodingDefine(windows_1251, Windows_1251) = {
   onigenc_single_byte_mbc_enc_len,
+  onigenc_single_byte_mbc_enc_len,
   "Windows-1251",      /* name */
   1,             /* max enc length */
   1,             /* min enc length */
Index: enc/iso_8859_1.c
===================================================================
--- enc/iso_8859_1.c	(revision 19292)
+++ enc/iso_8859_1.c	(working copy)
@@ -256,6 +256,7 @@ is_code_ctype(OnigCodePoint code, unsign
 
 OnigEncodingDefine(iso_8859_1, ISO_8859_1) = {
   onigenc_single_byte_mbc_enc_len,
+  onigenc_single_byte_mbc_enc_len,
   "ISO-8859-1",  /* name */
   1,             /* max enc length */
   1,             /* min enc length */
Index: enc/iso_8859_2.c
===================================================================
--- enc/iso_8859_2.c	(revision 19292)
+++ enc/iso_8859_2.c	(working copy)
@@ -221,6 +221,7 @@ is_code_ctype(OnigCodePoint code, unsign
 
 OnigEncodingDefine(iso_8859_2, ISO_8859_2) = {
   onigenc_single_byte_mbc_enc_len,
+  onigenc_single_byte_mbc_enc_len,
   "ISO-8859-2",  /* name */
   1,             /* max enc length */
   1,             /* min enc length */
Index: enc/euc_kr.c
===================================================================
--- enc/euc_kr.c	(revision 19292)
+++ enc/euc_kr.c	(working copy)
@@ -173,6 +173,7 @@ euckr_is_allowed_reverse_match(const UCh
 
 OnigEncodingDefine(euc_kr, EUC_KR) = {
   euckr_mbc_enc_len,
+  euckr_mbc_enc_len,
   "EUC-KR",   /* name */
   2,          /* max enc length */
   1,          /* min enc length */
Index: enc/iso_8859_3.c
===================================================================
--- enc/iso_8859_3.c	(revision 19292)
+++ enc/iso_8859_3.c	(working copy)
@@ -221,6 +221,7 @@ get_case_fold_codes_by_str(OnigCaseFoldT
 
 OnigEncodingDefine(iso_8859_3, ISO_8859_3) = {
   onigenc_single_byte_mbc_enc_len,
+  onigenc_single_byte_mbc_enc_len,
   "ISO-8859-3",  /* name */
   1,             /* max enc length */
   1,             /* min enc length */
Index: enc/utf_32be.c
===================================================================
--- enc/utf_32be.c	(revision 19292)
+++ enc/utf_32be.c	(working copy)
@@ -175,6 +175,7 @@ utf32be_get_case_fold_codes_by_str(OnigC
 
 OnigEncodingDefine(utf_32be, UTF_32BE) = {
   utf32be_mbc_enc_len,
+  utf32be_mbc_enc_len,
   "UTF-32BE",   /* name */
   4,            /* max byte length */
   4,            /* min byte length */
Index: enc/iso_8859_4.c
===================================================================
--- enc/iso_8859_4.c	(revision 19292)
+++ enc/iso_8859_4.c	(working copy)
@@ -223,6 +223,7 @@ get_case_fold_codes_by_str(OnigCaseFoldT
 
 OnigEncodingDefine(iso_8859_4, ISO_8859_4) = {
   onigenc_single_byte_mbc_enc_len,
+  onigenc_single_byte_mbc_enc_len,
   "ISO-8859-4",  /* name */
   1,             /* max enc length */
   1,             /* min enc length */
Index: enc/emacs_mule.c
===================================================================
--- enc/emacs_mule.c	(revision 19292)
+++ enc/emacs_mule.c	(working copy)
@@ -320,6 +320,7 @@ is_code_ctype(OnigCodePoint code, unsign
  */
 OnigEncodingDefine(emacs_mule, Emacs_Mule) = {
   mbc_enc_len,
+  mbc_enc_len,
   "Emacs-Mule",   /* name */
   4,          /* max enc length */
   1,          /* min enc length */
Index: enc/iso_8859_5.c
===================================================================
--- enc/iso_8859_5.c	(revision 19292)
+++ enc/iso_8859_5.c	(working copy)
@@ -211,6 +211,7 @@ get_case_fold_codes_by_str(OnigCaseFoldT
 
 OnigEncodingDefine(iso_8859_5, ISO_8859_5) = {
   onigenc_single_byte_mbc_enc_len,
+  onigenc_single_byte_mbc_enc_len,
   "ISO-8859-5",  /* name */
   1,             /* max enc length */
   1,             /* min enc length */
Index: enc/utf_16be.c
===================================================================
--- enc/utf_16be.c	(revision 19292)
+++ enc/utf_16be.c	(working copy)
@@ -239,6 +239,7 @@ utf16be_get_case_fold_codes_by_str(OnigC
 
 OnigEncodingDefine(utf_16be, UTF_16BE) = {
   utf16be_mbc_enc_len,
+  utf16be_mbc_enc_len,
   "UTF-16BE",   /* name */
   4,            /* max byte length */
   2,            /* min byte length */
Index: enc/iso_8859_6.c
===================================================================
--- enc/iso_8859_6.c	(revision 19292)
+++ enc/iso_8859_6.c	(working copy)
@@ -78,6 +78,7 @@ is_code_ctype(OnigCodePoint code, unsign
 
 OnigEncodingDefine(iso_8859_6, ISO_8859_6) = {
   onigenc_single_byte_mbc_enc_len,
+  onigenc_single_byte_mbc_enc_len,
   "ISO-8859-6",  /* name */
   1,             /* max enc length */
   1,             /* min enc length */
Index: enc/iso_8859_7.c
===================================================================
--- enc/iso_8859_7.c	(revision 19292)
+++ enc/iso_8859_7.c	(working copy)
@@ -208,6 +208,7 @@ get_case_fold_codes_by_str(OnigCaseFoldT
 
 OnigEncodingDefine(iso_8859_7, ISO_8859_7) = {
   onigenc_single_byte_mbc_enc_len,
+  onigenc_single_byte_mbc_enc_len,
   "ISO-8859-7",  /* name */
   1,             /* max enc length */
   1,             /* min enc length */
Index: enc/iso_8859_8.c
===================================================================
--- enc/iso_8859_8.c	(revision 19292)
+++ enc/iso_8859_8.c	(working copy)
@@ -78,6 +78,7 @@ is_code_ctype(OnigCodePoint code, unsign
 
 OnigEncodingDefine(iso_8859_8, ISO_8859_8) = {
   onigenc_single_byte_mbc_enc_len,
+  onigenc_single_byte_mbc_enc_len,
   "ISO-8859-8",  /* name */
   1,             /* max enc length */
   1,             /* min enc length */
Index: enc/iso_8859_9.c
===================================================================
--- enc/iso_8859_9.c	(revision 19292)
+++ enc/iso_8859_9.c	(working copy)
@@ -214,6 +214,7 @@ get_case_fold_codes_by_str(OnigCaseFoldT
 
 OnigEncodingDefine(iso_8859_9, ISO_8859_9) = {
   onigenc_single_byte_mbc_enc_len,
+  onigenc_single_byte_mbc_enc_len,
   "ISO-8859-9",  /* name */
   1,             /* max enc length */
   1,             /* min enc length */
Index: enc/utf_32le.c
===================================================================
--- enc/utf_32le.c	(revision 19292)
+++ enc/utf_32le.c	(working copy)
@@ -175,6 +175,7 @@ utf32le_get_case_fold_codes_by_str(OnigC
 
 OnigEncodingDefine(utf_32le, UTF_32LE) = {
   utf32le_mbc_enc_len,
+  utf32le_mbc_enc_len,
   "UTF-32LE",   /* name */
   4,            /* max byte length */
   4,            /* min byte length */
Index: enc/gb18030.c
===================================================================
--- enc/gb18030.c	(revision 19292)
+++ enc/gb18030.c	(working copy)
@@ -581,6 +581,7 @@ gb18030_is_allowed_reverse_match(const U
 
 OnigEncodingDefine(gb18030, GB18030) = {
   gb18030_mbc_enc_len,
+  gb18030_mbc_enc_len,
   "GB18030",   /* name */
   4,          /* max enc length */
   1,          /* min enc length */
Index: enc/utf_16le.c
===================================================================
--- enc/utf_16le.c	(revision 19292)
+++ enc/utf_16le.c	(working copy)
@@ -231,6 +231,7 @@ utf16le_get_case_fold_codes_by_str(OnigC
 
 OnigEncodingDefine(utf_16le, UTF_16LE) = {
   utf16le_mbc_enc_len,
+  utf16le_mbc_enc_len,
   "UTF-16LE",   /* name */
   4,            /* max byte length */
   2,            /* min byte length */
Index: enc/koi8_r.c
===================================================================
--- enc/koi8_r.c	(revision 19292)
+++ enc/koi8_r.c	(working copy)
@@ -199,6 +199,7 @@ koi8_r_get_case_fold_codes_by_str(OnigCa
 
 OnigEncodingDefine(koi8_r, KOI8_R) = {
   onigenc_single_byte_mbc_enc_len,
+  onigenc_single_byte_mbc_enc_len,
   "KOI8-R",       /* name */
   1,             /* max enc length */
   1,             /* min enc length */
Index: string.c
===================================================================
--- string.c	(revision 19292)
+++ string.c	(working copy)
@@ -775,7 +775,7 @@ rb_enc_strlen_cr(const char *p, const ch
 		c += q - p;
 		p = q;
 	    }
-	    ret = rb_enc_precise_mbclen(p, e, enc);
+	    ret = rb_enc_precise_mbcharlen(p, e, enc);
 	    if (MBCLEN_CHARFOUND_P(ret)) {
 		*cr |= ENC_CODERANGE_VALID;
 		p += MBCLEN_CHARFOUND_LEN(ret);
-- 
[田中 哲][たなか あきら][Tanaka Akira]
