From: jzakiya@... Date: 2018-01-23T20:46:25+00:00 Subject: [ruby-core:85018] [Ruby trunk Feature#14383] Making prime_division in prime.rb Ruby 3 ready. Issue #14383 has been updated by jzakiya (Jabari Zakiya). The major problem with `prime_division` trying to accommodate negative numbers is that, mathematically, [prime factorization](https://en.wikipedia.org/wiki/Integer_factorization) is really only considered over positive integers > 1. I understand the creators intent, to be able to reconstruct negative integers from their prime factorization, but that's not what's done mathematically. `1|0` are not primes or composites so they can't be factored (they have no factors). You can see the [Numberphile](https://www.youtube.com/watch?v=IQofiPqhJ_s) video explanation of this, or if you prefer, the [wikipedia](https://en.wikipedia.org/wiki/Prime_number) one. From a serious mathematical perspective, `prime_division` (and the whole `prime.rb` lib) is inadequate for doing real high-level math. This is why I created the [primes-utils](https://rubygems.org/gems/primes-utils) gem, so I could do fast math involving `primes`. I also wrote the [Primes-Utils Handbook](https://www.scribd.com/document/266461408/Primes-Utils-Handbook), to provide and explain all the gem's source code. The [Coureutils](https://www.gnu.org/software/coreutils/coreutils.html) library, a part of all `[Li|U]nix` systems, provides a world class factorization function [factor](https://github.com/coreutils/coreutils/blob/master/src/factor.c). You're not going to create a Ruby version that will come close to it. I use `factor` for my default factoring function. Here's an implementation that mimics `prime_division`. ``` class Integer def factors(p=0) # p is unused variable for method consistency return [] if self | 1 == 1 factors = self < 0 ? [-1] : [] factors += `factor #{self.abs}`.split(' ')[1..-1].map(&:to_i) factors.group_by {|prm| prm }.map {|prm, exp| [prm, exp.size] } end end ``` And here's the performance difference against `prime_division2`, the fastest version of the previous functions. ``` n = 500_000_000_000_000_000_008_244_213; tm{ pp n.prime_division2 } [[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]] => 9.39650374 n = 500_000_000_000_000_000_008_244_213; tm{ pp n.factors } [[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]] => 0.007200317 ``` If it were up to me, I would deprecate the whole `prime.rb` library and use the capabilities in `primes-utils`, to give Ruby a much more useful|fast prime math library. In fact, I'm doing a serious rewrite of it for version 3.0, using faster math, with faster implementations. When Ruby ever gets true parallel capabilities it'll be capable of making use of it. But again in general, there are articles|videos showing how to speed up Ruby code. There are projects like [Fast Ruby ](https://github.com/JuanitoFatas/fast-ruby), specifically devoted to identifying and categorizing specific code constructs for speed. These resources can be used for evaluating the core codebase to help rewrite it using the fastest coding constructs. Ruby is 20+ years old now, and I would imagine some (a lot of?) code has probably never been reviewed|considered for rewriting for performance gains. A lot of similarly aged languages have gone through this process (Python, Perl, PHP) and now it's Ruby's turn. So while it's natural to talk and focus on the future machine implementation of Ruby 3, I think you can be getting current language speedups by making the ongoing codebase as efficiently and performantly written now, which will translate to an even faster Ruby 3. This is also a way to get more than just the C guru devs involved in creating Ruby 3. I wouldn't mind evaluating existing libraries for simplicity|speed rewrites if I knew my work would be truly considered. This would provide casual users an opportunity to learn more of the internal workings of Ruby, while contributing to its development, which can only be to Ruby's benefit. How about a `Library of the Week|Month` or `GoSC` rewrite project, or a `Make Ruby Faster Now` project! Lots of ways to make this effort fun and interesting. ---------------------------------------- Feature #14383: Making prime_division in prime.rb Ruby 3 ready. https://bugs.ruby-lang.org/issues/14383#change-69725 * Author: jzakiya (Jabari Zakiya) * Status: Open * Priority: Normal * Assignee: * Target version: ---------------------------------------- I have been running old code in Ruby 2.5.0 (released 2017.12.25) to check for speed and compatibility. I still see the codebase in `prime.rb` hardly has changed at all (except for replacing `Math.sqrt` with `Integer.sqrt`). To achieve the Ruby 3 goal to make it at least three times faster than Ruby 2 there are three general areas where Ruby improvements can occur. * increase the speed of its implementation at the machine level * rewrite its existing codebase in a more efficient|faster manner * use faster algorithms to implement routines and functions I want to suggest how to address the later two ways to improve performance of specifically the `prime_division` method in the `prime.rb` library. I've raised and made suggestions to some of these issues here [ruby-issues forum](https://bugs.ruby-lang.org/issues/12676) and now hope to invigorate additional discussion. Hopefully with the release of 2.5.0, and Ruby 3 conceptually closer to reality, more consideration will be given to coding and algorithmic improvements to increase its performance too. **Mathematical correctness** First I'd like to raise what I consider *math bugs* in `prime_division`, in how it handles `0` and `-1` inputs. ``` > -1.prime_division => [[-1,1]] > 0.prime_division Traceback (most recent call last): 4: from /home/jzakiya/.rvm/rubies/ruby-2.5.0/bin/irb:11:in `
' 3: from (irb):85 2: from /home/jzakiya/.rvm/rubies/ruby-2.5.0/lib/ruby/2.5.0/prime.rb:30:in `prime_division' 1: from /home/jzakiya/.rvm/rubies/ruby-2.5.0/lib/ruby/2.5.0/prime.rb:203:in `prime_division' ZeroDivisionError (ZeroDivisionError) ``` First, `0` is a perfectly respectable integer, and is non-prime, so its output should be `[]`, an empty array to denote it has no prime factors. The existing behavior is solely a matter of `prime_division`'s' implementation, and does not take this mathematical reality into account. The output for `-1` is also mathematically wrong because `1` is also non-prime (and correctly returns `[]`), well then mathematically so should `-1`. Thus, `prime_division` treats `-1` as a new prime number, and factorization, that has no mathematical basis. Thus, for mathematical correctness and consistency `-1` and `0` should both return `[]`, as none have prime factors. ``` > -1.prime_division => [] > 0.prime_division => [] > 1.prime_division => [] ``` There's a very simple one-line fix to `prime_division` to do this: ``` # prime.rb class Prime def prime_division(value, generator = Prime::Generator23.new) -- raise ZeroDivisionError if value == 0 ++ return [] if (value.abs | 1) == 1 ``` **Simple Code and Algorithmic Improvements** As stated above, besides the machine implementation improvements, the other areas of performance improvements will come from coding rewrites and better algorithms. Below is the coding of `prime_division`. This coding has existed at least since Ruby 2.0 (the farthest I've gone back). ``` # prime.rb class Integer # Returns the factorization of +self+. # # See Prime#prime_division for more details. def prime_division(generator = Prime::Generator23.new) Prime.prime_division(self, generator) end end class Prime def prime_division(value, generator = Prime::Generator23.new) raise ZeroDivisionError if value == 0 if value < 0 value = -value pv = [[-1, 1]] else pv = [] end generator.each do |prime| count = 0 while (value1, mod = value.divmod(prime) mod) == 0 value = value1 count += 1 end if count != 0 pv.push [prime, count] end break if value1 <= prime end if value > 1 pv.push [value, 1] end pv end end ``` This can be rewritten in more modern and idiomatic Ruby, to become much shorter and easier to understand. ``` require 'prime.rb' class Integer def prime_division1(generator = Prime::Generator23.new) Prime.prime_division1(self, generator) end end class Prime def prime_division1(value, generator = Prime::Generator23.new) # raise ZeroDivisionError if value == 0 return [] if (value.abs | 1) == 1 pv = value < 0 ? [[-1, 1]] : [] value = value.abs generator.each do |prime| count = 0 while (value1, mod = value.divmod(prime); mod) == 0 value = value1 count += 1 end pv.push [prime, count] unless count == 0 break if prime > value1 end pv.push [value, 1] if value > 1 pv end end ``` By merely rewriting it we get smaller|concise code, that's easier to understand, which is slightly faster. A *triple win!* Just paste the above code into a 2.5.0 terminal session, and run the benchmarks below. ``` def tm; s=Time.now; yield; Time.now-s end n = 500_000_000_000_000_000_008_244_213; tm{ pp n.prime_division } [[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]] => 27.02951016 n = 500_000_000_000_000_000_008_244_213; tm{ pp n.prime_division1 } [[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]] => 25.959149721 ``` Again, we get a *triple win* to this old codebase by merely rewriting it. It can be made 3x faster by leveraging the `prime?` method from the `OpenSSL` library to perform a more efficient|faster factoring algorithm, and implementation. ``` require 'prime.rb' require 'openssl' class Integer def prime_division2(generator = Prime::Generator23.new) return [] if (self.abs | 1) == 1 pv = self < 0 ? [-1] : [] value = self.abs prime = generator.next until value.to_bn.prime? or value == 1 while prime (pv << prime; value /= prime; break) if value % prime == 0 prime = generator.next end end pv << value if value > 1 pv.group_by {|prm| prm }.map{|prm, exp| [prm, exp.size] } end end ``` Here we're making much better use of Ruby idioms and libraries (`enumerable` and `openssl`), leading to a much greater performance increase. A bigger *triple win*. Pasting this code into a 2.5.0 terminal session gives the following results. ``` # Hardware: System76 laptop; I7 cpu @ 3.5GHz, 64-bit Linux def tm; s=Time.now; yield; Time.now-s end n = 500_000_000_000_000_000_008_244_213; tm{ pp n.prime_division } [[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]] => 27.02951016 n = 500_000_000_000_000_000_008_244_213; tm{ pp n.prime_division1 } [[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]] => 25.959149721 n = 500_000_000_000_000_000_008_244_213; tm{ pp n.prime_division2 } [[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]] => 9.39650374 ``` `prime_division2` is much more usable for significantly larger numbers and use cases than `prime_division`. I can even do multiple times better than this, if you review the above cited forum thread. My emphasis here is to show there are a lot of possible *low hanging fruit* performance gains ripe for the picking to achieve Ruby 3 performance goals, if we look (at minimum) for simpler|better code rewrites, and then algorithmic upgrades. So the question is, are the devs willing to upgrade the codebase to provide the demonstrated performance increases shown here for `prime_division`? -- https://bugs.ruby-lang.org/ Unsubscribe: