[#90399] [Ruby trunk Feature#14813] [PATCH] gc.c: make gc_enter+gc_exit pairs dtrace probes, too — ko1@...
Issue #14813 has been updated by ko1 (Koichi Sasada).
3 messages
2018/12/10
[#90417] [Ruby trunk Bug#15398] TestThread#test_signal_at_join fails on FreeBSD — naruse@...
Issue #15398 has been reported by naruse (Yui NARUSE).
4 messages
2018/12/11
[#90423] Re: [Ruby trunk Bug#15398] TestThread#test_signal_at_join fails on FreeBSD
— Eric Wong <normalperson@...>
2018/12/11
naruse@airemix.jp wrote:
[#90519] Spoofing warnings for mail from bugs.ruby-lang.org — Charles Oliver Nutter <headius@...>
I'm getting a spoofing warning for emails sent from bugs.ruby-lang.org when
4 messages
2018/12/13
[#90522] Re: Spoofing warnings for mail from bugs.ruby-lang.org
— Eric Wong <normalperson@...>
2018/12/13
Charles Oliver Nutter <headius@headius.com> wrote:
[#90533] [Ruby trunk Feature#15413] unmarkable C stack (3rd stack) — normalperson@...
Issue #15413 has been reported by normalperson (Eric Wong).
3 messages
2018/12/14
[#90581] [Ruby trunk Bug#15424] Ruby 2.6.0rc1 & 2.6.0rc2 mutex exception — mat999@...
Issue #15424 has been reported by splitice (Mathew Heard).
3 messages
2018/12/17
[#90595] [Ruby trunk Bug#15430] test_fork_while_parent_locked is failing status on Ruby CI — hsbt@...
Issue #15430 has been reported by hsbt (Hiroshi SHIBATA).
3 messages
2018/12/18
[#90614] [Ruby trunk Bug#15430][Assigned] test_fork_while_parent_locked is failing status on Ruby CI — hsbt@...
Issue #15430 has been updated by hsbt (Hiroshi SHIBATA).
4 messages
2018/12/19
[#90630] Re: [Ruby trunk Bug#15430][Assigned] test_fork_while_parent_locked is failing status on Ruby CI
— Eric Wong <normalperson@...>
2018/12/20
> It still exists. https://rubyci.org/logs/rubyci.s3.amazonaws.com/centos7/ruby-trunk/log/20181218T230003Z.fail.html.gz
[#90820] Re: [ruby-cvs:73697] k0kubun:r66593 (trunk): accept_nonblock_spec.rb: skip spurious failure — Eric Wong <normalperson@...>
k0kubun@ruby-lang.org wrote:
3 messages
2018/12/30
[ruby-core:90778] [Ruby trunk Feature#15166] 2.5 times faster implementation than current gcd implmentation
From:
pdahorek@...
Date:
2018-12-28 18:23:31 UTC
List:
ruby-core #90778
Issue #15166 has been updated by ahorek (Pavel Rosick箪).
your micro-benchmarks aren't always fair, because some algorithms don't handle all edge cases.
for jruby I choose a different algorithm that is slightly slower than the fastest gcdwikipedia7fast32 (~15%) but in my opinion more readable.
here's the PR (gcdwikipedia7fast32 + minor changes) https://github.com/ruby/ruby/pull/2060
and some ruby numbers (benchmark https://github.com/ruby/ruby/pull/1596)
all variants tested on AMD FX 8300 8C
**ruby 2.7.0dev (2018-12-28 trunk 66617) [x86_64-linux]**
```
Time#subsec 2.969M (賊 9.6%) i/s - 14.733M in 5.010950s
Time#- 5.716M (賊11.4%) i/s - 28.103M in 5.000934s
Time#round 400.712k (賊11.9%) i/s - 1.992M in 5.046665s
Time#to_f 6.422M (賊10.5%) i/s - 31.613M in 4.999488s
Time#to_r 2.251M (賊10.4%) i/s - 11.124M in 5.002516s
Rational#+ 5.377M (賊10.1%) i/s - 26.577M in 5.001636s
Rational#- 5.542M (賊 9.5%) i/s - 27.419M in 5.001546s
Rational#* 6.341M (賊 9.5%) i/s - 31.390M in 5.002212s
gcd 6.922M (賊 9.0%) i/s - 34.285M in 5.001389s
```
**trunk + new gcd**
```
Time#subsec 3.348M (賊 8.9%) i/s - 16.592M in 4.999620s / 1.13
Time#- 5.840M (賊11.6%) i/s - 28.728M in 5.000946s / 1.02
Time#round 468.770k (賊12.5%) i/s - 2.319M in 5.028050s / 1.17
Time#to_f 6.713M (賊 9.8%) i/s - 33.214M in 4.999639s / 1.05
Time#to_r 3.191M (賊 7.9%) i/s - 15.884M in 5.010305s / 1.42
Rational#+ 5.893M (賊10.6%) i/s - 29.082M in 4.999884s / 1.10
Rational#- 6.183M (賊11.2%) i/s - 30.443M in 4.999746s / 1.12
Rational#* 7.069M (賊10.5%) i/s - 34.922M in 5.001804s / 1.11
gcd 9.742M (賊10.4%) i/s - 48.159M in 5.007085s / 1.40
```
**trunk + new gcd without __builtin_ctz support**
```
Time#subsec 2.699M (賊 8.9%) i/s - 13.385M in 5.002527s / 0.89
Time#- 5.734M (賊10.6%) i/s - 28.224M in 5.002541s / 1.00
Time#round 392.314k (賊13.8%) i/s - 1.926M in 5.012040s / 0.98
Time#to_f 6.725M (賊10.5%) i/s - 33.163M in 4.999346s / 1.04
Time#to_r 2.366M (賊 9.1%) i/s - 11.705M in 5.004491s / 1.05
Rational#+ 5.429M (賊10.1%) i/s - 26.851M in 5.006358s / 1.01
Rational#- 5.544M (賊 9.8%) i/s - 27.430M in 5.002418s / 0.98
Rational#* 6.225M (賊10.7%) i/s - 30.833M in 5.018386s / 0.98
gcd 7.001M (賊 7.1%) i/s - 34.855M in 5.006972s / 1.01
```
alternative implementations
**jruby 9.2.6.0-SNAPSHOT (2.5.3) 2018-12-27 e51a3e4 Java HotSpot(TM) 64-Bit Server VM 11.0.1+13-LTS on 11.0.1+13-LTS +jit [linux-x86_64]**
```
Time#subsec 5.018M (賊 6.3%) i/s - 24.866M in 4.979170s
Time#- 7.868M (賊 5.6%) i/s - 39.066M in 4.985576s
Time#round 3.461M (賊 8.1%) i/s - 17.138M in 4.998527s
Time#to_f 8.198M (賊 5.2%) i/s - 40.775M in 4.990224s
Time#to_r 4.789M (賊 6.9%) i/s - 23.777M in 4.992261s
Rational#+ 5.217M (賊 6.3%) i/s - 25.944M in 4.995694s
Rational#- 5.701M (賊 7.4%) i/s - 28.329M in 4.998743s
Rational#* 6.290M (賊 6.7%) i/s - 31.283M in 4.997365s
gcd 7.376M (賊 7.2%) i/s - 36.625M in 4.995073s
```
**truffleruby 1.0.0-rc10, like ruby 2.4.4, GraalVM CE Native [x86_64-linux]**
```
Time#subsec 3.541M (賊67.8%) i/s - 13.706M in 4.986699s
Time#- 8.279M (賊 9.4%) i/s - 38.671M in 4.984896s
Time#round 311.696k (賊43.3%) i/s - 502.226k in 4.991276s
Time#to_f 16.719M (賊 9.2%) i/s - 75.067M in 4.981367s
Time#to_r 1.386M (賊21.2%) i/s - 5.045M in 4.993055s
Rational#+ 7.332M (賊14.7%) i/s - 28.100M in 4.982371s
Rational#- 7.354M (賊24.3%) i/s - 22.682M in 4.992218s
Rational#* 7.340M (賊19.3%) i/s - 28.534M in 5.003816s
gcd 68.576M (賊 4.7%) i/s - 326.812M in 4.908116s
```
as you can see Time#to_r and Integer#gcd is about 40% faster which is the best case scenario even when in your micro-benchmark it was 300% faster.
using the new algorithm without __builtin_ctz introduced some perf regressions, but they're within margin of error
I don't think this change will have some impact on real application's performance, all of these cases are just micro-benchmarks...
----------------------------------------
Feature #15166: 2.5 times faster implementation than current gcd implmentation
https://bugs.ruby-lang.org/issues/15166#change-75952
* Author: jzakiya (Jabari Zakiya)
* Status: Assigned
* Priority: Normal
* Assignee: watson1978 (Shizuo Fujita)
* Target version:
----------------------------------------
This is to be more explicit (and accurate) than https://bugs.ruby-lang.org/issues/15161
This is my modified gcd benchmarks code, originally presented by Daniel Lemire (see 15161).
https://gist.github.com/jzakiya/44eae4feeda8f6b048e19ff41a0c6566
Ruby's current implementation of Stein's gcd algorithm is only slightly faster than the
code posted on the wikepedia page, and over 2.5 times slower than the fastest implementation
in the benchmarks.
```
[jzakiya@localhost ~]$ ./gcdbenchmarks
gcd between numbers in [1 and 2000]
gcdwikipedia7fast32 : time = 99
gcdwikipedia4fast : time = 121
gcdFranke : time = 126
gcdwikipedia3fast : time = 134
gcdwikipedia2fastswap : time = 136
gcdwikipedia5fast : time = 139
gcdwikipedia7fast : time = 138
gcdwikipedia2fast : time = 136
gcdwikipedia6fastxchg : time = 144
gcdwikipedia2fastxchg : time = 156
gcd_iterative_mod : time = 210
gcd_recursive : time = 215
basicgcd : time = 211
rubygcd : time = 267
gcdwikipedia2 : time = 321
gcd between numbers in [1000000001 and 1000002000]
gcdwikipedia7fast32 : time = 100
gcdwikipedia4fast : time = 121
gcdFranke : time = 126
gcdwikipedia3fast : time = 134
gcdwikipedia2fastswap : time = 136
gcdwikipedia5fast : time = 138
gcdwikipedia7fast : time = 138
gcdwikipedia2fast : time = 136
gcdwikipedia6fastxchg : time = 144
gcdwikipedia2fastxchg : time = 156
gcd_iterative_mod : time = 210
gcd_recursive : time = 215
basicgcd : time = 211
rubygcd : time = 269
gcdwikipedia2 : time = 323
```
This is Ruby's code per: https://github.com/ruby/ruby/blob/3abbaab1a7a97d18f481164c7dc48749b86d7f39/rational.c#L285-L307
which is basically the wikepedia implementation.
```
inline static long
i_gcd(long x, long y)
{
unsigned long u, v, t;
int shift;
if (x < 0)
x = -x;
if (y < 0)
y = -y;
if (x == 0)
return y;
if (y == 0)
return x;
u = (unsigned long)x;
v = (unsigned long)y;
for (shift = 0; ((u | v) & 1) == 0; ++shift) {
u >>= 1;
v >>= 1;
}
while ((u & 1) == 0)
u >>= 1;
do {
while ((v & 1) == 0)
v >>= 1;
if (u > v) {
t = v;
v = u;
u = t;
}
v = v - u;
} while (v != 0);
return (long)(u << shift);
}
```
This is the fastest implementation from the benchmarks. (I originally, wrongly, cited
the implementation in the article, which is 4|5th fastest in benchmarks, but
still almost 2x faster than the Ruby implementation.)
```
// based on wikipedia's article,
// fixed by D. Lemire, K. Willets
unsigned int gcdwikipedia7fast32(unsigned int u, unsigned int v)
{
int shift, uz, vz;
if ( u == 0) return v;
if ( v == 0) return u;
uz = __builtin_ctz(u);
vz = __builtin_ctz(v);
shift = uz > vz ? vz : uz;
u >>= uz;
do {
v >>= vz;
int diff = v;
diff -= u;
if ( diff == 0 ) break;
vz = __builtin_ctz(diff);
if ( v < u ) u = v;
v = abs(diff);
} while( 1 );
return u << shift;
}
```
The key to speeding up all the algorithms is using the ``__builtin_ctz(x)`` directive
to determine the number of trailing binary '0's.
--
https://bugs.ruby-lang.org/
Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>