From: jzakiya@... Date: 2018-01-27T20:30:21+00:00 Subject: [ruby-core:85160] [Ruby trunk Feature#14383] Making prime_division in prime.rb Ruby 3 ready. Issue #14383 has been updated by jzakiya (Jabari Zakiya). Hi Yusuke. Ah, we agree, `prime.rb` is not conducive for doing heavy-duty math. :-) Please look at and play with my [primes-utils](https://github.com/jzakiya/primes-utils) gem. It has a minimal universal useful set of methods for doing prime math. Again, I'm in the process of rewriting it and adding more methods. Maybe you want to collaborate with me and give Ruby a much better|serious prime math library. Ruby is a serious language and deserves much better in areas of scientific and numerical computation. This need has been recognized by such projects as [SciRuby](http://sciruby.com/), and I would like Ruby 3 to be better in these areas. I installed your `faster_prime` gem and ran it on my laptop, and started looking at the code. I haven't thoroughly studied it yet, but may I make some suggestions to simplify and speed it up. I don't know if you knew, but starting with Ruby 2.5, it now has `Integer.sqrt`. We had a vigorous discussion on this issue [here](https://bugs.ruby-lang.org/issues/13219). So you can replace your code below: ``` module Utils module_function FLOAT_BIGNUM = Float::RADIX ** (Float::MANT_DIG - 1) # Find the largest integer m such that m <= sqrt(n) def integer_square_root(n) if n < FLOAT_BIGNUM Math.sqrt(n).floor else # newton method a, b = n, 1 a, b = a / 2, b * 2 until a <= b a = b + 1 a, b = b, (b + n / b) / 2 until a <= b a end end ``` with `Integer.sqrt`. BTW, here's a fast|accurate pure Ruby implementation of it. ``` class Integer def sqrt # Newton's method version used in Ruby for Integer#sqrt return nil if (n = self) < 0 # or however you want to handle this case return n if n < 2 b = n.bit_length x = 1 << (b-1)/2 | n >> (b/2 + 1) # optimum initial root estimate while (t = n / x) < x; x = ((x + t) >> 1) end x end end ``` Also in same file, you can do this simplification: ``` def mod_sqrt(a, prime) return 0 if a == 0 case when prime == 2 a.odd? ? 1 : 0 => a & 1 => or better => a[0] # lsb of number .... ``` Finally, in `prime_factorization.rb` we can simplify and speedup code too. Don't count the primes factors when you find then, just stick then in an array, then when finished with factorization process you can use `Enumerable#group_by` to format output of them in one step. Saves code size|complexity, and is faster. I also restructured the code like mine to make it simpler. Now you don't have to do so many tests, and you get rid of all those individual `yields`, which complicates and slow things down. You will have to modify the part of your code that uses `prime_factorization`, but that code should be simpler|faster too. ``` require "faster_prime/utils" module FasterPrime module PrimeFactorization module_function # Factorize an integer def prime_factorization(n) return enum_for(:prime_factorization, n) unless block_given? return if n == 0 if n < 0 yield [-1, 1] n = -n end SMALL_PRIMES.each do |prime| if n % prime == 0 c = 0 begin n /= prime c += 1 end while n % prime == 0 yield [prime, c] end if prime * prime > n yield [n, 1] if n > 1 return end return if n == 1 end if PrimalityTest.prime?(n) yield [n, 1] else d = nil until d [PollardRho, MPQS].each do |algo| begin d = algo.try_find_factor(n) rescue Failed else break end end end pe = Hash.new(0) prime_factorization(n / d) {|p, e| pe[p] += e } prime_factorization(d) {|p, e| pe[p] += e } pe.keys.sort.each do |p| yield [p, pe[p]] end end end end ``` Change to below, with appropriate changes for factoring algorithm inside loop. ``` require "faster_prime/utils" module FasterPrime module PrimeFactorization module_function # Factorize an integer def prime_factorization(n) return enum_for(:prime_factorization, n) unless block_given? # 'factors' will hold the number of individual prime factors return [] if n.abs | 1 == 1 # for n = -1, 0, or 1 factors = n < 0 ? [-1] | [] # if you feel compelled for negative nums n = n.abs SMALL_PRIMES.each do |prime| # extract small prime factors, if any (factors << prime; n /= prime) while n % prime == 0 end # at this point n is either a prime, 1, or a composite of non-small primes until PrimalityTest.prime?(n) or n == 1 # exit when n is 1 or prime # if you're in here then 'n' is a composite thats needs factoring # when you find a factor, stick it in 'factors' and reduce 'n' by it # ultimately 'n' will be reduced to a prime or 1 # do whatever needs to be done to make this work right d = nil until d [PollardRho, MPQS].each do |algo| begin d = algo.try_find_factor(n) rescue Failed else break end end end end # at this point 'n' is either a prime or 1 factors << n if n > 1 # stick 'n' in 'factors' if it's a prime # 'factors' now has all the number of individual prime factors # now use Enumerable#group_by to make life simple and easy :-) # the xx.sort is unnecessary if you find the prime factors sequentially factors.group_by(&:itself).sort.map { |prime, exponents| [prime, exponents.size] } end end ``` This is now a generic template for factorization. To upgrade just use better|faster `PrimalityTest` and `factorization` functions. You also see it has separated each distinct algorithmic function into single area of concern, in a conceptually more functional programming style. This is now also able to be possibly implemented in parallel because of the isolation of functional areas of concern. ---------------------------------------- Feature #14383: Making prime_division in prime.rb Ruby 3 ready. https://bugs.ruby-lang.org/issues/14383#change-69906 * Author: jzakiya (Jabari Zakiya) * Status: Open * Priority: Normal * Assignee: * Target version: ---------------------------------------- I have been running old code in Ruby 2.5.0 (released 2017.12.25) to check for speed and compatibility. I still see the codebase in `prime.rb` hardly has changed at all (except for replacing `Math.sqrt` with `Integer.sqrt`). To achieve the Ruby 3 goal to make it at least three times faster than Ruby 2 there are three general areas where Ruby improvements can occur. * increase the speed of its implementation at the machine level * rewrite its existing codebase in a more efficient|faster manner * use faster algorithms to implement routines and functions I want to suggest how to address the later two ways to improve performance of specifically the `prime_division` method in the `prime.rb` library. I've raised and made suggestions to some of these issues here [ruby-issues forum](https://bugs.ruby-lang.org/issues/12676) and now hope to invigorate additional discussion. Hopefully with the release of 2.5.0, and Ruby 3 conceptually closer to reality, more consideration will be given to coding and algorithmic improvements to increase its performance too. **Mathematical correctness** First I'd like to raise what I consider *math bugs* in `prime_division`, in how it handles `0` and `-1` inputs. ``` > -1.prime_division => [[-1,1]] > 0.prime_division Traceback (most recent call last): 4: from /home/jzakiya/.rvm/rubies/ruby-2.5.0/bin/irb:11:in `
' 3: from (irb):85 2: from /home/jzakiya/.rvm/rubies/ruby-2.5.0/lib/ruby/2.5.0/prime.rb:30:in `prime_division' 1: from /home/jzakiya/.rvm/rubies/ruby-2.5.0/lib/ruby/2.5.0/prime.rb:203:in `prime_division' ZeroDivisionError (ZeroDivisionError) ``` First, `0` is a perfectly respectable integer, and is non-prime, so its output should be `[]`, an empty array to denote it has no prime factors. The existing behavior is solely a matter of `prime_division`'s' implementation, and does not take this mathematical reality into account. The output for `-1` is also mathematically wrong because `1` is also non-prime (and correctly returns `[]`), well then mathematically so should `-1`. Thus, `prime_division` treats `-1` as a new prime number, and factorization, that has no mathematical basis. Thus, for mathematical correctness and consistency `-1` and `0` should both return `[]`, as none have prime factors. ``` > -1.prime_division => [] > 0.prime_division => [] > 1.prime_division => [] ``` There's a very simple one-line fix to `prime_division` to do this: ``` # prime.rb class Prime def prime_division(value, generator = Prime::Generator23.new) -- raise ZeroDivisionError if value == 0 ++ return [] if (value.abs | 1) == 1 ``` **Simple Code and Algorithmic Improvements** As stated above, besides the machine implementation improvements, the other areas of performance improvements will come from coding rewrites and better algorithms. Below is the coding of `prime_division`. This coding has existed at least since Ruby 2.0 (the farthest I've gone back). ``` # prime.rb class Integer # Returns the factorization of +self+. # # See Prime#prime_division for more details. def prime_division(generator = Prime::Generator23.new) Prime.prime_division(self, generator) end end class Prime def prime_division(value, generator = Prime::Generator23.new) raise ZeroDivisionError if value == 0 if value < 0 value = -value pv = [[-1, 1]] else pv = [] end generator.each do |prime| count = 0 while (value1, mod = value.divmod(prime) mod) == 0 value = value1 count += 1 end if count != 0 pv.push [prime, count] end break if value1 <= prime end if value > 1 pv.push [value, 1] end pv end end ``` This can be rewritten in more modern and idiomatic Ruby, to become much shorter and easier to understand. ``` require 'prime.rb' class Integer def prime_division1(generator = Prime::Generator23.new) Prime.prime_division1(self, generator) end end class Prime def prime_division1(value, generator = Prime::Generator23.new) # raise ZeroDivisionError if value == 0 return [] if (value.abs | 1) == 1 pv = value < 0 ? [[-1, 1]] : [] value = value.abs generator.each do |prime| count = 0 while (value1, mod = value.divmod(prime); mod) == 0 value = value1 count += 1 end pv.push [prime, count] unless count == 0 break if prime > value1 end pv.push [value, 1] if value > 1 pv end end ``` By merely rewriting it we get smaller|concise code, that's easier to understand, which is slightly faster. A *triple win!* Just paste the above code into a 2.5.0 terminal session, and run the benchmarks below. ``` def tm; s=Time.now; yield; Time.now-s end n = 500_000_000_000_000_000_008_244_213; tm{ pp n.prime_division } [[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]] => 27.02951016 n = 500_000_000_000_000_000_008_244_213; tm{ pp n.prime_division1 } [[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]] => 25.959149721 ``` Again, we get a *triple win* to this old codebase by merely rewriting it. It can be made 3x faster by leveraging the `prime?` method from the `OpenSSL` library to perform a more efficient|faster factoring algorithm, and implementation. ``` require 'prime.rb' require 'openssl' class Integer def prime_division2(generator = Prime::Generator23.new) return [] if (self.abs | 1) == 1 pv = self < 0 ? [-1] : [] value = self.abs prime = generator.next until value.to_bn.prime? or value == 1 while prime (pv << prime; value /= prime; break) if value % prime == 0 prime = generator.next end end pv << value if value > 1 pv.group_by {|prm| prm }.map{|prm, exp| [prm, exp.size] } end end ``` Here we're making much better use of Ruby idioms and libraries (`enumerable` and `openssl`), leading to a much greater performance increase. A bigger *triple win*. Pasting this code into a 2.5.0 terminal session gives the following results. ``` # Hardware: System76 laptop; I7 cpu @ 3.5GHz, 64-bit Linux def tm; s=Time.now; yield; Time.now-s end n = 500_000_000_000_000_000_008_244_213; tm{ pp n.prime_division } [[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]] => 27.02951016 n = 500_000_000_000_000_000_008_244_213; tm{ pp n.prime_division1 } [[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]] => 25.959149721 n = 500_000_000_000_000_000_008_244_213; tm{ pp n.prime_division2 } [[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]] => 9.39650374 ``` `prime_division2` is much more usable for significantly larger numbers and use cases than `prime_division`. I can even do multiple times better than this, if you review the above cited forum thread. My emphasis here is to show there are a lot of possible *low hanging fruit* performance gains ripe for the picking to achieve Ruby 3 performance goals, if we look (at minimum) for simpler|better code rewrites, and then algorithmic upgrades. So the question is, are the devs willing to upgrade the codebase to provide the demonstrated performance increases shown here for `prime_division`? ---Files-------------------------------- bm.rb (1.94 KB) -- https://bugs.ruby-lang.org/ Unsubscribe: