From: Marc-Andre Lafortune Date: 2009-05-09T11:47:53+09:00 Subject: [ruby-core:23402] [Bug #1448] [patch] Proper handling of recursive arrays Bug #1448: [patch] Proper handling of recursive arrays http://redmine.ruby-lang.org/issues/show/1448 Author: Marc-Andre Lafortune Status: Open, Priority: Normal Category: core, Target version: 1.9.2 ruby -v: ruby 1.9.2dev (2009-05-09 trunk 23371) [i386-darwin9.6.0] Dealing with recursive arrays & hashes can be tricky. The current handling of recursive arrays is much improved over that of Ruby 1.8.6. Array comparison still has some bugs though. For instance: x = []; x << x y = [[x]] x == y # ==> true y == x # ==> false, should be true! Morevoer, recursive arrays that are built the same way are not recognized as equal: z = []; z << z x == z # ==> false, should be true! Needless to say, arrays that have the same elements (e.g. a single element, containing a single element, ...) but built differently way are not recognized as equal: stone = []; stepping = [stone]; stone << stepping x == stepping # ==> false, would be nice to be true! The attached patch fixes all of these problems :-) * How: The function rb_exec_recursive handles the recursivity by pushing and poping the elements it encounters for a given method (for example eql?). For such comparisons, instead of keeping track of the elements it encounters, I modified it so that it keeps track of both the elements being compared. A recursion is detected only when a matching pair is found. This takes care of the first problem. For the next two, we only need to observe that if we have a recursion on the pair (x,y) when comparing x and y, then it is because they are not different! Changing the return value for recursive cases from nil (not comparable) / false (different) to Qundef (unknown yet) makes comparison of complex recursive "trees" work beautifully. I've added some cute samples in rubyspecs (core/array/shared/equal.rb) * Implementation details: Previous recursive_push/pop/check maintained a hash of encountered object ids, setting hash[obj] = true. I modified them so that in "paired" cases, it sets hash[obj] = paired_obj. If a pair (obj, different_paired_obj) is encountered later on, I set hash[obj] to {paired_obj => true, different_paired_obj => true}. This way, there is basically no runtime cost to this technique, except in the complex recursive cases. Only for these complex cases is there a small additional cost for the hash creation/destruction. * Last problem: There is one more problem that my patch doesn't cover (lack of mri-fu): hashes for recursive structures are incorrect. As per the official doc, "a.eql? b" should imply "a.hash == b.hash". On the other hand, we have (before or after my patch): a = [x] x.eql? a # ==> true a.eql? x # ==> true x.hash == a.hash # ==> false, should have same hash The solution is that when calculating the hash for an array, if a recursion is detected, then the hash should return a fixed value (say 0 or -length) _for the original_ array. Currently, 0 is returned but at the level that the recursion is detected. In Ruby pseudo-code, it would look like: static VALUE recursive_hash(VALUE ary, VALUE dummy, int recur) { long i, h; VALUE n; if (recur) { + raise HashingRecursionDetected - return LONG2FIX(0); } h = rb_hash_start(RARRAY_LEN(ary)); for (i=0; i