[#14696] Inconsistency in rescuability of "return" — Charles Oliver Nutter <charles.nutter@...>

Why can you not rescue return, break, etc when they are within

21 messages 2008/01/02

[#14738] Enumerable#zip Needs Love — James Gray <james@...>

The community has been building a Ruby 1.9 compatibility tip list on

15 messages 2008/01/03
[#14755] Re: Enumerable#zip Needs Love — Martin Duerst <duerst@...> 2008/01/04

Hello James,

[#14772] Manual Memory Management — Pramukta Kumar <prak@...>

I was thinking it would be nice to be able to free large objects at

36 messages 2008/01/04
[#14788] Re: Manual Memory Management — Marcin Raczkowski <mailing.mr@...> 2008/01/05

I would only like to add that RMgick for example provides free method to

[#14824] Re: Manual Memory Management — MenTaLguY <mental@...> 2008/01/07

On Sat, 5 Jan 2008 15:49:30 +0900, Marcin Raczkowski <mailing.mr@gmail.com> wrote:

[#14825] Re: Manual Memory Management — "Evan Weaver" <evan@...> 2008/01/07

Python supports 'del reference', which decrements the reference

[#14838] Re: Manual Memory Management — Marcin Raczkowski <mailing.mr@...> 2008/01/08

Evan Weaver wrote:

[#14911] Draft of some pages about encoding in Ruby 1.9 — Dave Thomas <dave@...>

Folks:

24 messages 2008/01/10

[#14976] nil encoding as synonym for binary encoding — David Flanagan <david@...>

The following just appeared in the ChangeLog

37 messages 2008/01/11
[#14977] Re: nil encoding as synonym for binary encoding — Yukihiro Matsumoto <matz@...> 2008/01/11

Hi,

[#14978] Re: nil encoding as synonym for binary encoding — Dave Thomas <dave@...> 2008/01/11

[#14979] Re: nil encoding as synonym for binary encoding — David Flanagan <david@...> 2008/01/11

Dave Thomas wrote:

[#14993] Re: nil encoding as synonym for binary encoding — Dave Thomas <dave@...> 2008/01/11

[#14980] Re: nil encoding as synonym for binary encoding — Gary Wright <gwtmp01@...> 2008/01/11

[#14981] Re: nil encoding as synonym for binary encoding — Yukihiro Matsumoto <matz@...> 2008/01/11

Hi,

[#14995] Re: nil encoding as synonym for binary encoding — David Flanagan <david@...> 2008/01/11

Yukihiro Matsumoto writes:

[#15050] how to "borrow" the RDoc::RubyParser and HTMLGenerator — Phlip <phlip2005@...>

Core Rubies:

17 messages 2008/01/13
[#15060] Re: how to "borrow" the RDoc::RubyParser and HTMLGenerator — Eric Hodel <drbrain@...7.net> 2008/01/14

On Jan 13, 2008, at 08:54 AM, Phlip wrote:

[#15062] Re: how to "borrow" the RDoc::RubyParser and HTMLGenerator — Phlip <phlip2005@...> 2008/01/14

Eric Hodel wrote:

[#15073] Re: how to "borrow" the RDoc::RubyParser and HTMLGenerator — Eric Hodel <drbrain@...7.net> 2008/01/14

On Jan 13, 2008, at 20:35, Phlip wrote:

[#15185] Friendlier methods to compare two Time objects — "Jim Cropcho" <jim.cropcho@...>

Hello,

10 messages 2008/01/22

[#15194] Can large scale projects be successful implemented around a dynamic programming language? — Jordi <mumismo@...>

A good article I have found (may have been linked by slashdot, don't know)

8 messages 2008/01/24

[#15248] Symbol#empty? ? — "David A. Black" <dblack@...>

Hi --

24 messages 2008/01/28
[#15250] Re: Symbol#empty? ? — Yukihiro Matsumoto <matz@...> 2008/01/28

Hi,

Re: multibyte strings & bucket-of-bytes efficiency under 1.9.0

From: Martin Duerst <duerst@...>
Date: 2008-01-08 04:48:31 UTC
List: ruby-core #14835
Hello Brent,

Many thanks for your examples. I'm sure others will also have
a look at them.

At 15:21 08/01/07, Brent Roman wrote:
>
>Martin,
>
>I did some analysis of the log parsing application in which
>I observe a 5% slowdown under ruby 1.9.  It comes down
>to regex performance.  In fact, my log reader app spends > 50%
>of its runtime processing regex's.  The strings are all US-ASCII
>encoded.
>
>Per your request, I've distilled the most common regex into
>the attached simple benchmark:
>
>http://www.nabble.com/file/p14659564/regexbench.rb regexbench.rb 
>
>It merely scans a test string repeatedly for an escape sequence
>it does not contain.
>
>Ruby16 takes 9.3 seconds
>Ruby18 takes 9.7 seconds
>Ruby19 takes 12.8 seconds
>
>Ruby19 takes 18.0 seconds if the string encoding is forced to UTF-8
>
>So, in "US-ASCII", regexes are about 25% slower
>
>Are you able to confirm these benchmarks?
>
>Are you surprised by this?
>
>Would you agree that some of this slowdown is the result of the new 
>"encoding aware" regex engine in ruby19?
>Or, is it a "bug" that can be easily fixed?
>
>I included the UTF-8 case for comparison only.  
>It shows a 50% slowdown.
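[Editorial note: the attached regexbench.rb is only linked above, not reproduced. The kind of benchmark Brent describes — repeatedly scanning a string for an escape sequence it does not contain, under different encoding labels — can be approximated with a sketch like this; the test string, pattern, and iteration counts here are illustrative, not Brent's actual values:]

```ruby
require 'benchmark'

# Illustrative stand-ins for the string and regex in regexbench.rb.
line    = "plain log text with no escape sequences in it " * 10
pattern = /\\x[0-9a-f]{2}/   # an escape sequence the string does not contain

Benchmark.bm(9) do |bm|
  bm.report("us-ascii") do
    s = line.dup.force_encoding("US-ASCII")
    50_000.times { s =~ pattern }   # scan always fails to match
  end
  bm.report("utf-8") do
    s = line.dup.force_encoding("UTF-8")
    50_000.times { s =~ pattern }
  end
end
```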

Your numbers for 1.9, both with and without UTF-8, are somewhat higher
than I would expect from simple encoding dispatching alone, but not
high enough to call this a bug.

When looking at an earlier example from Wolfgang, I confirmed the
suspicion expressed in my paper that providing more, less-primitive
functions for encoding dispatching would improve performance.
I think I got about a 20% improvement in one case for
true UTF-8.

The example is as follows:
The function rb_enc_nth in encoding.c is used to find the n'th
character from a particular point in a string. It's used for
cases such as string[i] and so on. What this function currently
does is check for single-byte encodings first, then check for
fixed-width encoding, and in both cases use a simple multiplication.
For all other encodings, it repeatedly uses rb_enc_mbclen to get
the length (in bytes) of the next character, which then
somehow calls the actual primitive for the encoding.
Adding an additional, somewhat less primitive, per-encoding
function that finds the n'th character directly may improve
performance somewhat. The way to implement these functions
is to provide them only for the encodings that really matter,
and to use a generic implementation (falling back to the
lower-level primitive that's currently used) for odd,
rarely-used encodings.
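[Editorial note: the asymmetry rb_enc_nth dispatches on is visible from Ruby: for a single-byte or fixed-width encoding, string[i] reduces to a multiplication, while for variable-width UTF-8 it must scan character by character. A rough illustration, with arbitrary sizes:]

```ruby
require 'benchmark'

n     = 20_000
ascii = "a" * n          # single-byte encoding: s[i] is an O(1) multiply
utf8  = "\u3042" * n     # 3-byte UTF-8 chars: s[i] scans from the start

Benchmark.bm(6) do |bm|
  bm.report("ascii") { 1_000.times { ascii[n - 1] } }
  bm.report("utf8")  { 1_000.times { utf8[n - 1] } }
end
```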

I explained the general principle of this a few months ago
to Matz. I'm sure he wanted to concentrate on getting
1.9 out on time, so adding such (somewhat less primitive) functions wasn't
too urgent. It may be that they get picked up, or not,
depending on how much improvement it might be possible to
show. One problem with examples such as the above is that
it's not too difficult to tweak something for highest performance
for very very long strings. But most Ruby strings are very
short, and it's important to make sure that we don't
decrease short string performance when trying to increase
very long string performance.


>I suspect that the folks observing huge slowdowns in string
>performance are using UTF-8 or other multi-byte encodings.

Another aspect is that there should be a difference between
'real' UTF-8 strings and ASCII strings labeled as UTF-8.
There's a flag for strings that indicates whether they are
actually just all plain ASCII. But either that's not set
in your example, or it's not used, or both (I suspect at
least the latter, because this flag is a Ruby mechanism,
and I don't know whether it's being used in Oniguruma or
not).
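[Editorial note: at the Ruby level, 1.9 exposes the all-ASCII flag Martin mentions as String#ascii_only?, so one can check whether an ASCII string labeled as UTF-8 is recognized as plain ASCII:]

```ruby
# An all-ASCII string carrying a UTF-8 label: the flag is set.
s = "just ascii".dup.force_encoding("UTF-8")
puts s.encoding      # UTF-8
puts s.ascii_only?   # true: all bytes are 7-bit despite the UTF-8 label

# A string with a real non-ASCII character: the flag is off.
t = "caf\u00e9"
puts t.ascii_only?   # false
```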

Regards,    Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     

