[#18532] Ruby 1.9 string performance — "Michael Selig" <michael.selig@...>
[ruby-core:18616] Re: Ruby 1.9 string performance
> On Fri, 12 Sep 2008 02:16:51 +1000, NARUSE, Yui <naruse@airemix.jp>
> wrote:
>
>> If you split your patch into small atomic patches,
>> your patch will be merged rapidly.

Here are 3 other patches for String performance. Please apply "codepoint.pat" last, after all the other patches (including the 2 from the previous mail), because it overlaps with them.

Details for ChangeLog:

casecmp.pat:
- Optimize String#casecmp for single-byte character strings

case.pat:
- Optimize String#upcase, downcase & swapcase for single-byte character strings

codepoint.pat:
- Added new rb_enc_codepoint_l() function to encoding.c which returns the codepoint, same as rb_enc_codepoint(), plus returns the character length
- Modified string.c to use it, avoiding extra calls to determine the length of a character
- Changed "single_byte_optimizable()" to a #define (for compilers which don't do "inline" properly)
- All these changes make many methods on multi-byte character strings somewhat faster (maybe 4-5% on UTF-8; I haven't tested other encodings, but I think they should be similar)

I also have some comments and questions:

1) Currently "String#rstrip" on multi-byte character sets seems to work from the start to the end of the string. Can't it work backwards, which would be faster?

2) A recent change to tr_trans() (used by String#tr & others), made to fix the "coderange" issue mentioned earlier, sets the coderange of the result to that of the calling string object, but only if the coderanges of both the "from" string and the "to" string are the same as that of the input string. It is my understanding that the coderange of the result depends only on the "calling" string object and the "to" string, not on the "from" string. If this is right (please tell me if I am not right!), then I think a better implementation is to use something like

    cr = ENC_CODERANGE_AND(ENC_CODERANGE(str), ENC_CODERANGE(to_str));

because this will preserve the "valid" flag if one of the strings is 7-bit ASCII and the other isn't (e.g. UTF-8).

3) rb_str_modify() is actually a slight problem, because it clears the coderange flags. In many cases you then have to set them back the way they were to avoid costly string re-scans. But sometimes you actually want the flags cleared: if they indicated "broken" and an "innocuous" change is then made (e.g. changing a byte), the change may make the string valid again. It seems to me that a neat implementation would be to have a function called, say, "str_modify()", which is almost the same as "rb_str_modify()", except that it clears the coderange only if the coderange says "broken" (forcing a later rescan). If the coderange is valid, it should leave it alone. This new function could then be used in most places in string.c to save mucking around with the coderange flags.

Cheers,
Mike
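
Points 2 and 3 above can be modelled outside of Ruby. What follows is a minimal sketch under stated assumptions: the enum, the struct, and both function names are hypothetical stand-ins, not the definitions in Ruby's headers, and the combining rule is a simplified reading of what the message proposes rather than the actual ENC_CODERANGE_AND macro.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for Ruby's coderange flags (hypothetical names). */
enum coderange { CR_UNKNOWN, CR_7BIT, CR_VALID, CR_BROKEN };

/* Toy string header: only the cached coderange matters for this sketch. */
struct toy_str {
    enum coderange cr;
};

/* Point 2: derive the result's coderange from the receiver and the "to"
 * string only. Anything unknown or broken forces a rescan later. */
static enum coderange
coderange_and(enum coderange a, enum coderange b)
{
    if (a == CR_UNKNOWN || b == CR_UNKNOWN) return CR_UNKNOWN;
    if (a == CR_BROKEN || b == CR_BROKEN) return CR_UNKNOWN;
    if (a == CR_7BIT && b == CR_7BIT) return CR_7BIT;
    return CR_VALID; /* 7-bit mixed with valid multi-byte stays valid */
}

/* Point 3: prepare a string for an in-place change. The cached coderange
 * is dropped only when it says "broken"; otherwise it is left alone. */
static void
toy_str_modify(struct toy_str *s)
{
    assert(s != NULL);
    if (s->cr == CR_BROKEN) s->cr = CR_UNKNOWN;
}
```

With this model, combining a 7-bit receiver with a valid UTF-8 "to" string yields "valid", and modifying a "valid" string keeps its flag, so no rescan is needed in either case.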
Attachments (3)
Index: string.c
===================================================================
--- string.c (revision 19374)
+++ string.c (working copy)
@@ -4041,17 +4041,30 @@
rb_str_modify(str);
enc = STR_ENC_GET(str);
s = RSTRING_PTR(str); send = RSTRING_END(str);
- while (s < send) {
- unsigned int c = rb_enc_codepoint(s, send, enc);
+ if (single_byte_optimizable(str)) {
+ while (s < send) {
+ unsigned int c = *(unsigned char *)s;
- if (rb_enc_islower(c, enc)) {
- /* assuming toupper returns codepoint with same size */
- rb_enc_mbcput(rb_enc_toupper(c, enc), s, enc);
- modify = 1;
+ if (rb_enc_islower(c, enc)) {
+ *s = rb_enc_toupper(c , enc);
+ modify = 1;
+ }
+ s++;
}
- s += rb_enc_codelen(c, enc);
}
+ else {
+ while (s < send) {
+ unsigned int c = rb_enc_codepoint(s, send, enc);
+ if (rb_enc_islower(c, enc)) {
+ /* assuming toupper returns codepoint with same size */
+ rb_enc_mbcput(rb_enc_toupper(c, enc), s, enc);
+ modify = 1;
+ }
+ s += rb_enc_codelen(c, enc);
+ }
+ }
+
ENC_CODERANGE_SET(str, cr);
if (modify) return str;
return Qnil;
@@ -4099,17 +4112,30 @@
rb_str_modify(str);
enc = STR_ENC_GET(str);
s = RSTRING_PTR(str); send = RSTRING_END(str);
- while (s < send) {
- unsigned int c = rb_enc_codepoint(s, send, enc);
+ if (single_byte_optimizable(str)) {
+ while (s < send) {
+ unsigned int c = *(unsigned char *)s;
- if (rb_enc_isupper(c, enc)) {
- /* assuming toupper returns codepoint with same size */
- rb_enc_mbcput(rb_enc_tolower(c, enc), s, enc);
- modify = 1;
+ if (rb_enc_isupper(c, enc)) {
+ *s = rb_enc_tolower(c , enc);
+ modify = 1;
+ }
+ s++;
}
- s += rb_enc_codelen(c, enc);
}
+ else {
+ while (s < send) {
+ unsigned int c = rb_enc_codepoint(s, send, enc);
+ if (rb_enc_isupper(c, enc)) {
+ /* assuming tolower returns codepoint with same size */
+ rb_enc_mbcput(rb_enc_tolower(c, enc), s, enc);
+ modify = 1;
+ }
+ s += rb_enc_codelen(c, enc);
+ }
+ }
+
ENC_CODERANGE_SET(str, cr);
if (modify) return str;
return Qnil;
@@ -4228,20 +4254,37 @@
rb_str_modify(str);
enc = STR_ENC_GET(str);
s = RSTRING_PTR(str); send = RSTRING_END(str);
- while (s < send) {
- unsigned int c = rb_enc_codepoint(s, send, enc);
+ if (single_byte_optimizable(str)) {
+ while (s < send) {
+ unsigned int c = *(unsigned char *)s;
- if (rb_enc_isupper(c, enc)) {
- /* assuming toupper returns codepoint with same size */
- rb_enc_mbcput(rb_enc_tolower(c, enc), s, enc);
- modify = 1;
+ if (rb_enc_isupper(c, enc)) {
+ *s = rb_enc_tolower(c , enc);
+ modify = 1;
+ }
+ else if (rb_enc_islower(c, enc)) {
+ *s = rb_enc_toupper(c , enc);
+ modify = 1;
+ }
+ s++;
}
- else if (rb_enc_islower(c, enc)) {
- /* assuming toupper returns codepoint with same size */
- rb_enc_mbcput(rb_enc_toupper(c, enc), s, enc);
- modify = 1;
+ }
+ else {
+ while (s < send) {
+ unsigned int c = rb_enc_codepoint(s, send, enc);
+
+ if (rb_enc_isupper(c, enc)) {
+ /* assuming toupper returns codepoint with same size */
+ rb_enc_mbcput(rb_enc_tolower(c, enc), s, enc);
+ modify = 1;
+ }
+ else if (rb_enc_islower(c, enc)) {
+ /* assuming toupper returns codepoint with same size */
+ rb_enc_mbcput(rb_enc_toupper(c, enc), s, enc);
+ modify = 1;
+ }
+ s += rb_enc_codelen(c, enc);
}
- s += rb_enc_codelen(c, enc);
}
ENC_CODERANGE_SET(str, cr);
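
The shape of the change in this patch is a plain byte loop when every character is known to occupy one byte, and a decoding loop otherwise. It can be sketched without Ruby's internals. The helper names below are hypothetical, the slow path assumes well-formed UTF-8, and only ASCII letters are upcased. In Ruby the single-byte answer comes from the cached coderange flag or the encoding's maximum character length; the scan in all_7bit() only stands in for that cheap check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for single_byte_optimizable(): true when every
 * byte is 7-bit ASCII. */
static int
all_7bit(const unsigned char *s, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (s[i] & 0x80) return 0;
    }
    return 1;
}

/* Byte length of a UTF-8 character, judged from its lead byte
 * (assumes well-formed input). */
static size_t
utf8_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Upcase ASCII letters in place; returns 1 if anything changed. */
static int
upcase_bang(unsigned char *s, size_t n)
{
    int modify = 0;
    size_t i = 0;

    assert(s != NULL || n == 0);
    if (all_7bit(s, n)) {
        /* Fast path: one byte is one character, no decoding needed. */
        for (; i < n; i++) {
            if (s[i] >= 'a' && s[i] <= 'z') {
                s[i] = (unsigned char)(s[i] - ('a' - 'A'));
                modify = 1;
            }
        }
    }
    else {
        /* Slow path: step character by character. */
        while (i < n) {
            size_t len = utf8_len(s[i]);
            if (len == 1 && s[i] >= 'a' && s[i] <= 'z') {
                s[i] = (unsigned char)(s[i] - ('a' - 'A'));
                modify = 1;
            }
            i += len;
        }
    }
    return modify;
}
```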
Index: string.c
===================================================================
--- string.c (revision 19374)
+++ string.c (working copy)
@@ -2067,19 +2067,33 @@
p1 = RSTRING_PTR(str1); p1end = RSTRING_END(str1);
p2 = RSTRING_PTR(str2); p2end = RSTRING_END(str2);
- while (p1 < p1end && p2 < p2end) {
- unsigned int c1 = rb_enc_codepoint(p1, p1end, enc);
- unsigned int c2 = rb_enc_codepoint(p2, p2end, enc);
+ if (single_byte_optimizable(str1) && single_byte_optimizable(str2)) {
+ while (p1 < p1end && p2 < p2end) {
+ if (*p1 != *p2) {
+ int c1 = rb_enc_toupper(*(unsigned char *)p1, enc);
+ int c2 = rb_enc_toupper(*(unsigned char *)p2, enc);
+ if (c1 > c2) return INT2FIX(1);
+ if (c1 < c2) return INT2FIX(-1);
+ }
+ p1++;
+ p2++;
+ }
+ }
+ else {
+ while (p1 < p1end && p2 < p2end) {
+ unsigned int c1 = rb_enc_codepoint(p1, p1end, enc);
+ unsigned int c2 = rb_enc_codepoint(p2, p2end, enc);
- if (c1 != c2) {
- c1 = rb_enc_toupper(c1, enc);
- c2 = rb_enc_toupper(c2, enc);
- if (c1 > c2) return INT2FIX(1);
- if (c1 < c2) return INT2FIX(-1);
+ if (c1 != c2) {
+ c1 = rb_enc_toupper(c1, enc);
+ c2 = rb_enc_toupper(c2, enc);
+ if (c1 > c2) return INT2FIX(1);
+ if (c1 < c2) return INT2FIX(-1);
+ }
+ len = rb_enc_codelen(c1, enc);
+ p1 += len;
+ p2 += len;
}
- len = rb_enc_codelen(c1, enc);
- p1 += len;
- p2 += len;
}
if (RSTRING_LEN(str1) == RSTRING_LEN(str2)) return INT2FIX(0);
if (RSTRING_LEN(str1) > RSTRING_LEN(str2)) return INT2FIX(1);
Index: encoding.c
===================================================================
--- encoding.c (revision 19374)
+++ encoding.c (working copy)
@@ -768,6 +768,22 @@
         rb_raise(rb_eArgError, "invalid byte sequence in %s", rb_enc_name(enc));
 }

+/* As above, but also return character length */
+unsigned int
+rb_enc_codepoint_l(const char *p, const char *e, int *len, rb_encoding *enc)
+{
+    int r;
+    if (e <= p)
+        rb_raise(rb_eArgError, "empty string");
+    r = rb_enc_precise_mbclen(p, e, enc);
+    if (MBCLEN_CHARFOUND_P(r)) {
+        *len = r;
+        return rb_enc_mbc_to_codepoint(p, e, enc);
+    }
+    else
+        rb_raise(rb_eArgError, "invalid byte sequence in %s", rb_enc_name(enc));
+}
+
 int
 rb_enc_codelen(int c, rb_encoding *enc)
 {
Index: include/ruby/encoding.h
===================================================================
--- include/ruby/encoding.h (revision 19374)
+++ include/ruby/encoding.h (working copy)
@@ -121,6 +121,7 @@
/* -> code or raise exception */
unsigned int rb_enc_codepoint(const char *p, const char *e, rb_encoding *enc);
+unsigned int rb_enc_codepoint_l(const char *p, const char *e, int *len, rb_encoding *enc);
#define rb_enc_mbc_to_codepoint(p, e, enc) ONIGENC_MBC_TO_CODE(enc,(UChar*)(p),(UChar*)(e))
/* -> codelen>0 or raise exception */
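
The idea behind rb_enc_codepoint_l() is to decode a character and report its byte length in the same call, so that the caller need not compute the length again. It can be illustrated for UTF-8 alone. This is a self-contained sketch, not Ruby's encoding machinery: utf8_codepoint_l() and utf8_count_chars() are hypothetical names, errors are signalled by storing 0 in *len where the patch raises ArgumentError, and overlong forms and surrogates are not rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one UTF-8 character from [p, e). Stores its byte length in *len
 * and returns the codepoint. On empty input, a truncated sequence, or a
 * bad byte, stores 0 in *len and returns 0. */
static unsigned int
utf8_codepoint_l(const char *p, const char *e, int *len)
{
    const unsigned char *s = (const unsigned char *)p;
    unsigned int c;
    int n, i;

    assert(len != NULL);
    *len = 0;
    if (e <= p) return 0;

    if (s[0] < 0x80)                { c = s[0];        n = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; n = 4; }
    else return 0;                  /* stray continuation or invalid lead */

    if (e - p < n) return 0;        /* truncated sequence */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    *len = n;
    return c;
}

/* Count characters, advancing by the length obtained from the same call.
 * Returns -1 on malformed input. */
static long
utf8_count_chars(const char *p, const char *e)
{
    long count = 0;
    int len;

    while (p < e) {
        utf8_codepoint_l(p, e, &len);
        if (len == 0) return -1;
        p += len;
        count++;
    }
    return count;
}
```

The loop in utf8_count_chars() mirrors the pattern the string.c hunks below switch to: one decoding call per character, followed by `p += len`.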
Index: string.c
===================================================================
--- string.c.old 2008-09-16 13:00:12.000000000 +1000
+++ string.c 2008-09-16 13:05:11.000000000 +1000
@@ -112,23 +112,9 @@
#define STR_ENC_GET(str) rb_enc_from_index(ENCODING_GET(str))
-static inline int
-single_byte_optimizable(VALUE str)
-{
- rb_encoding *enc;
-
- /* Conservative. It may be ENC_CODERANGE_UNKNOWN. */
- if (ENC_CODERANGE(str) == ENC_CODERANGE_7BIT)
- return 1;
-
- enc = STR_ENC_GET(str);
- if (rb_enc_mbmaxlen(enc) == 1)
- return 1;
-
- /* Conservative. Possibly single byte.
- * "\xa1" in Shift_JIS for example. */
- return 0;
-}
+/* Conservative. Possibly single byte.
+ * "\xa1" in Shift_JIS for example. */
+#define single_byte_optimizable(str) (ENC_CODERANGE(str) == ENC_CODERANGE_7BIT || rb_enc_mbmaxlen(STR_ENC_GET(str)) == 1)
VALUE rb_fs;
@@ -2076,7 +2062,7 @@
static VALUE
rb_str_casecmp(VALUE str1, VALUE str2)
{
- long len;
+ int len;
rb_encoding *enc;
char *p1, *p1end, *p2, *p2end;
@@ -2102,7 +2088,7 @@
}
else {
while (p1 < p1end && p2 < p2end) {
- unsigned int c1 = rb_enc_codepoint(p1, p1end, enc);
+ unsigned int c1 = rb_enc_codepoint_l(p1, p1end, &len, enc);
unsigned int c2 = rb_enc_codepoint(p2, p2end, enc);
if (c1 != c2) {
@@ -2111,7 +2097,6 @@
if (c1 > c2) return INT2FIX(1);
if (c1 < c2) return INT2FIX(-1);
}
- len = rb_enc_codelen(c1, enc);
p1 += len;
p2 += len;
}
@@ -3876,8 +3861,7 @@
}
n = MBCLEN_CHARFOUND_LEN(n);
- c = rb_enc_codepoint(p, pend, enc);
- n = rb_enc_codelen(c, enc);
+ c = rb_enc_codepoint_l(p, pend, &n, enc);
p += n;
if (c == '"'|| c == '\\' ||
@@ -4089,14 +4073,15 @@
}
else {
while (s < send) {
- unsigned int c = rb_enc_codepoint(s, send, enc);
+ int clen;
+ unsigned int c = rb_enc_codepoint_l(s, send, &clen, enc);
if (rb_enc_islower(c, enc)) {
/* assuming toupper returns codepoint with same size */
rb_enc_mbcput(rb_enc_toupper(c, enc), s, enc);
modify = 1;
}
- s += rb_enc_codelen(c, enc);
+ s += clen;
}
}
@@ -4160,14 +4145,15 @@
}
else {
while (s < send) {
- unsigned int c = rb_enc_codepoint(s, send, enc);
+ int clen;
+ unsigned int c = rb_enc_codepoint_l(s, send, &clen, enc);
if (rb_enc_isupper(c, enc)) {
/* assuming tolower returns codepoint with same size */
rb_enc_mbcput(rb_enc_tolower(c, enc), s, enc);
modify = 1;
}
- s += rb_enc_codelen(c, enc);
+ s += clen;
}
}
@@ -4220,25 +4206,26 @@
int modify = 0;
unsigned int c;
int cr = ENC_CODERANGE(str);
+ int clen;
rb_str_modify(str);
enc = STR_ENC_GET(str);
if (RSTRING_LEN(str) == 0 || !RSTRING_PTR(str)) return Qnil;
s = RSTRING_PTR(str); send = RSTRING_END(str);
- c = rb_enc_codepoint(s, send, enc);
+ c = rb_enc_codepoint_l(s, send, &clen, enc);
if (rb_enc_islower(c, enc)) {
rb_enc_mbcput(rb_enc_toupper(c, enc), s, enc);
modify = 1;
}
- s += rb_enc_codelen(c, enc);
+ s += clen;
while (s < send) {
- c = rb_enc_codepoint(s, send, enc);
+ c = rb_enc_codepoint_l(s, send, &clen, enc);
if (rb_enc_isupper(c, enc)) {
rb_enc_mbcput(rb_enc_tolower(c, enc), s, enc);
modify = 1;
}
- s += rb_enc_codelen(c, enc);
+ s += clen;
}
ENC_CODERANGE_SET(str, cr);
@@ -4306,7 +4293,8 @@
}
else {
while (s < send) {
- unsigned int c = rb_enc_codepoint(s, send, enc);
+ int clen;
+ unsigned int c = rb_enc_codepoint_l(s, send, &clen, enc);
if (rb_enc_isupper(c, enc)) {
/* assuming toupper returns codepoint with same size */
@@ -4318,7 +4306,7 @@
rb_enc_mbcput(rb_enc_toupper(c, enc), s, enc);
modify = 1;
}
- s += rb_enc_codelen(c, enc);
+ s += clen;
}
}
@@ -4359,19 +4347,21 @@
static unsigned int
trnext(struct tr *t, rb_encoding *enc)
{
+ int len;
+
for (;;) {
if (!t->gen) {
if (t->p == t->pend) return -1;
if (t->p < t->pend - 1 && *t->p == '\\') {
t->p++;
}
- t->now = rb_enc_codepoint(t->p, t->pend, enc);
- t->p += rb_enc_codelen(t->now, enc);
+ t->now = rb_enc_codepoint_l(t->p, t->pend, &len, enc);
+ t->p += len;
if (t->p < t->pend - 1 && *t->p == '-') {
t->p++;
if (t->p < t->pend) {
- unsigned int c = rb_enc_codepoint(t->p, t->pend, enc);
- t->p += rb_enc_codelen(c, enc);
+ unsigned int c = rb_enc_codepoint_l(t->p, t->pend, &len, enc);
+ t->p += len;
if (t->now > c) continue;
t->gen = 1;
t->max = c;
@@ -4490,8 +4480,8 @@
char *buf = ALLOC_N(char, max), *t = buf;
while (s < send) {
- c0 = c = rb_enc_codepoint(s, send, enc);
- tlen = clen = rb_enc_codelen(c, enc);
+ c0 = c = rb_enc_codepoint_l(s, send, &clen, enc);
+ tlen = clen;
s += clen;
if (c < 256) {
@@ -4557,8 +4547,8 @@
char *buf = ALLOC_N(char, max), *t = buf;
while (s < send) {
- c0 = c = rb_enc_codepoint(s, send, enc);
- tlen = clen = rb_enc_codelen(c, enc);
+ c0 = c = rb_enc_codepoint_l(s, send, &clen, enc);
+ tlen = clen;
if (c < 256) {
c = trans[c];
@@ -4764,8 +4754,8 @@
if (!s || RSTRING_LEN(str) == 0) return Qnil;
send = RSTRING_END(str);
while (s < send) {
- unsigned int c = rb_enc_codepoint(s, send, enc);
- int clen = rb_enc_codelen(c, enc);
+ int clen;
+ unsigned int c = rb_enc_codepoint_l(s, send, &clen, enc);
if (tr_find(c, squeez, del, nodel)) {
modify = 1;
@@ -4867,8 +4857,7 @@
s++;
}
else {
- c = rb_enc_codepoint(s, send, enc);
- clen = rb_enc_codelen(c, enc);
+ c = rb_enc_codepoint_l(s, send, &clen, enc);
if (c != save || (argc > 0 && !tr_find(c, squeez, del, nodel))) {
if (t != s) rb_enc_mbcput(c, t, enc);
@@ -5008,8 +4997,7 @@
s++;
}
else {
- c = rb_enc_codepoint(s, send, enc);
- clen = rb_enc_codelen(c, enc);
+ c = rb_enc_codepoint_l(s, send, &clen, enc);
if (tr_find(c, table, del, nodel)) {
i++;
}
@@ -5131,11 +5119,12 @@
char *bptr = ptr;
int skip = 1;
unsigned int c;
+ int clen;
end = beg;
while (ptr < eptr) {
- c = rb_enc_codepoint(ptr, eptr, enc);
- ptr += rb_enc_mbclen(ptr, eptr, enc);
+ c = rb_enc_codepoint_l(ptr, eptr, &clen, enc);
+ ptr += clen;
if (skip) {
if (rb_enc_isspace(c, enc)) {
beg = ptr - bptr;
@@ -5362,13 +5351,12 @@
}
while (p < pend) {
- unsigned int c = rb_enc_codepoint(p, pend, enc);
+ unsigned int c = rb_enc_codepoint_l(p, pend, &n, enc);
again:
- n = rb_enc_codelen(c, enc);
if (rslen == 0 && c == newline) {
p += n;
- if (p < pend && (c = rb_enc_codepoint(p, pend, enc)) != newline) {
+ if (p < pend && (c = rb_enc_codepoint_l(p, pend, &n, enc)) != newline) {
goto again;
}
while (p < pend && rb_enc_codepoint(p, pend, enc) == newline) {
@@ -5715,10 +5703,11 @@
e = t = RSTRING_END(str);
/* remove spaces at head */
while (s < e) {
- unsigned int cc = rb_enc_codepoint(s, e, enc);
+ int clen;
+ unsigned int cc = rb_enc_codepoint_l(s, e, &clen, enc);
if (!rb_enc_isspace(cc, enc)) break;
- s += rb_enc_codelen(cc, enc);
+ s += clen;
}
if (s > RSTRING_PTR(str)) {
@@ -5787,7 +5776,8 @@
while (s < t && rb_enc_isspace(*(t-1), enc)) t--;
} else {
while (s < e) {
- unsigned int cc = rb_enc_codepoint(s, e, enc);
+ int clen;
+ unsigned int cc = rb_enc_codepoint_l(s, e, &clen, enc);
if (!cc || rb_enc_isspace(cc, enc)) {
if (!space_seen) t = s;
@@ -5796,7 +5786,7 @@
else {
space_seen = Qfalse;
}
- s += rb_enc_codelen(cc, enc);
+ s += clen;
}
if (!space_seen) t = s;
}
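
Question 1 in the message asks whether rstrip could scan from the end of the string instead of from the start. For UTF-8 this is straightforward, because every byte of a multi-byte character has its high bit set, so an ASCII byte seen while walking backwards is always a whole character. A minimal sketch under those assumptions follows; the function names are hypothetical, the input is taken to be UTF-8, and only ASCII whitespace and NUL are stripped. Encodings whose trail bytes can fall in the ASCII range (Shift_JIS is the usual example) would need a different approach.

```c
#include <assert.h>
#include <stddef.h>

/* ASCII whitespace (tab through carriage return, and space) or NUL. */
static int
is_strippable(unsigned char c)
{
    return c == '\0' || c == ' ' || (c >= '\t' && c <= '\r');
}

/* Return the length of s[0..n) after removing trailing ASCII whitespace
 * and NULs, scanning backwards from the end. */
static size_t
utf8_rstrip_len(const char *s, size_t n)
{
    assert(s != NULL || n == 0);
    while (n > 0) {
        unsigned char c = (unsigned char)s[n - 1];
        /* A byte >= 0x80 belongs to a multi-byte character: stop. */
        if (c >= 0x80 || !is_strippable(c)) break;
        n--;
    }
    return n;
}
```

The work done is proportional to the amount of trailing whitespace rather than to the length of the whole string.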