From: yugui@... Date: 2018-01-29T06:21:25+00:00 Subject: [ruby-core:85212] [Ruby trunk Feature#14383] Making prime_division in prime.rb Ruby 3 ready. Issue #14383 has been updated by yugui (Yuki Sonoda). Assignee set to yugui (Yuki Sonoda) Target version set to Next Major Faster is better -- with certain conditions. Considering those conditions, I would like to take the following approach. 1. Remove the dependency from `mathn.rb` to `prime.rb` 2. Move `prime.rb` from `lib/` to a bundled gem 3. Apply some of the patches. This is because of the following conditions I would like the change to satisfy. 1. Simplicity of the dependency graph in the standard library * I don't want to let `mathn` have too much dependency. 2. Keeping the focus of the standard library * We are moving more libraries from `lib/` to bundled gems. The solution must be consistent to this trend. 3. Certain degree of backward compatibility * the algorithm must be deterministic. ---------------------------------------- Feature #14383: Making prime_division in prime.rb Ruby 3 ready. https://bugs.ruby-lang.org/issues/14383#change-69958 * Author: jzakiya (Jabari Zakiya) * Status: Open * Priority: Normal * Assignee: yugui (Yuki Sonoda) * Target version: Next Major ---------------------------------------- I have been running old code in Ruby 2.5.0 (released 2017.12.25) to check for speed and compatibility. I still see the codebase in `prime.rb` hardly has changed at all (except for replacing `Math.sqrt` with `Integer.sqrt`). To achieve the Ruby 3 goal to make it at least three times faster than Ruby 2 there are three general areas where Ruby improvements can occur. * increase the speed of its implementation at the machine level * rewrite its existing codebase in a more efficient|faster manner * use faster algorithms to implement routines and functions I want to suggest how to address the later two ways to improve performance of specifically the `prime_division` method in the `prime.rb` library. I've raised and made suggestions to some of these issues here [ruby-issues forum](https://bugs.ruby-lang.org/issues/12676) and now hope to invigorate additional discussion. Hopefully with the release of 2.5.0, and Ruby 3 conceptually closer to reality, more consideration will be given to coding and algorithmic improvements to increase its performance too. **Mathematical correctness** First I'd like to raise what I consider *math bugs* in `prime_division`, in how it handles `0` and `-1` inputs. ``` > -1.prime_division => [[-1,1]] > 0.prime_division Traceback (most recent call last): 4: from /home/jzakiya/.rvm/rubies/ruby-2.5.0/bin/irb:11:in `
' 3: from (irb):85 2: from /home/jzakiya/.rvm/rubies/ruby-2.5.0/lib/ruby/2.5.0/prime.rb:30:in `prime_division' 1: from /home/jzakiya/.rvm/rubies/ruby-2.5.0/lib/ruby/2.5.0/prime.rb:203:in `prime_division' ZeroDivisionError (ZeroDivisionError) ``` First, `0` is a perfectly respectable integer, and is non-prime, so its output should be `[]`, an empty array to denote it has no prime factors. The existing behavior is solely a matter of `prime_division`'s' implementation, and does not take this mathematical reality into account. The output for `-1` is also mathematically wrong because `1` is also non-prime (and correctly returns `[]`), well then mathematically so should `-1`. Thus, `prime_division` treats `-1` as a new prime number, and factorization, that has no mathematical basis. Thus, for mathematical correctness and consistency `-1` and `0` should both return `[]`, as none have prime factors. ``` > -1.prime_division => [] > 0.prime_division => [] > 1.prime_division => [] ``` There's a very simple one-line fix to `prime_division` to do this: ``` # prime.rb class Prime def prime_division(value, generator = Prime::Generator23.new) -- raise ZeroDivisionError if value == 0 ++ return [] if (value.abs | 1) == 1 ``` **Simple Code and Algorithmic Improvements** As stated above, besides the machine implementation improvements, the other areas of performance improvements will come from coding rewrites and better algorithms. Below is the coding of `prime_division`. This coding has existed at least since Ruby 2.0 (the farthest I've gone back). ``` # prime.rb class Integer # Returns the factorization of +self+. # # See Prime#prime_division for more details. def prime_division(generator = Prime::Generator23.new) Prime.prime_division(self, generator) end end class Prime def prime_division(value, generator = Prime::Generator23.new) raise ZeroDivisionError if value == 0 if value < 0 value = -value pv = [[-1, 1]] else pv = [] end generator.each do |prime| count = 0 while (value1, mod = value.divmod(prime) mod) == 0 value = value1 count += 1 end if count != 0 pv.push [prime, count] end break if value1 <= prime end if value > 1 pv.push [value, 1] end pv end end ``` This can be rewritten in more modern and idiomatic Ruby, to become much shorter and easier to understand. ``` require 'prime.rb' class Integer def prime_division1(generator = Prime::Generator23.new) Prime.prime_division1(self, generator) end end class Prime def prime_division1(value, generator = Prime::Generator23.new) # raise ZeroDivisionError if value == 0 return [] if (value.abs | 1) == 1 pv = value < 0 ? [[-1, 1]] : [] value = value.abs generator.each do |prime| count = 0 while (value1, mod = value.divmod(prime); mod) == 0 value = value1 count += 1 end pv.push [prime, count] unless count == 0 break if prime > value1 end pv.push [value, 1] if value > 1 pv end end ``` By merely rewriting it we get smaller|concise code, that's easier to understand, which is slightly faster. A *triple win!* Just paste the above code into a 2.5.0 terminal session, and run the benchmarks below. ``` def tm; s=Time.now; yield; Time.now-s end n = 500_000_000_000_000_000_008_244_213; tm{ pp n.prime_division } [[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]] => 27.02951016 n = 500_000_000_000_000_000_008_244_213; tm{ pp n.prime_division1 } [[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]] => 25.959149721 ``` Again, we get a *triple win* to this old codebase by merely rewriting it. It can be made 3x faster by leveraging the `prime?` method from the `OpenSSL` library to perform a more efficient|faster factoring algorithm, and implementation. ``` require 'prime.rb' require 'openssl' class Integer def prime_division2(generator = Prime::Generator23.new) return [] if (self.abs | 1) == 1 pv = self < 0 ? [-1] : [] value = self.abs prime = generator.next until value.to_bn.prime? or value == 1 while prime (pv << prime; value /= prime; break) if value % prime == 0 prime = generator.next end end pv << value if value > 1 pv.group_by {|prm| prm }.map{|prm, exp| [prm, exp.size] } end end ``` Here we're making much better use of Ruby idioms and libraries (`enumerable` and `openssl`), leading to a much greater performance increase. A bigger *triple win*. Pasting this code into a 2.5.0 terminal session gives the following results. ``` # Hardware: System76 laptop; I7 cpu @ 3.5GHz, 64-bit Linux def tm; s=Time.now; yield; Time.now-s end n = 500_000_000_000_000_000_008_244_213; tm{ pp n.prime_division } [[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]] => 27.02951016 n = 500_000_000_000_000_000_008_244_213; tm{ pp n.prime_division1 } [[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]] => 25.959149721 n = 500_000_000_000_000_000_008_244_213; tm{ pp n.prime_division2 } [[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]] => 9.39650374 ``` `prime_division2` is much more usable for significantly larger numbers and use cases than `prime_division`. I can even do multiple times better than this, if you review the above cited forum thread. My emphasis here is to show there are a lot of possible *low hanging fruit* performance gains ripe for the picking to achieve Ruby 3 performance goals, if we look (at minimum) for simpler|better code rewrites, and then algorithmic upgrades. So the question is, are the devs willing to upgrade the codebase to provide the demonstrated performance increases shown here for `prime_division`? ---Files-------------------------------- bm.rb (1.94 KB) -- https://bugs.ruby-lang.org/ Unsubscribe: