From: "mike@... (Mike Carlton)"
Date: 2022-02-11T21:08:13+00:00
Subject: [ruby-core:107562] [Ruby master Bug#18582] Hash.group_by not grouping correctly with SortedSets

Issue #18582 has been updated by mike@carltons.us (Mike Carlton).

Thank you very much, Nobu, for your quick response.

For anyone who stumbles upon this page: I used the quick-and-dirty monkey patch below to add the necessary functionality to RBTree (until RBTree itself is updated). With it, SortedSet works correctly for me in Ruby 3.0:

```
require 'rbtree'

class RBTree
  # Define these methods conditionally, so that if a future rbtree release adds them, we don't override them
  unless RBTree.instance_methods(false).include?(:eql?)
    class_eval <<-END, __FILE__, __LINE__+1
      def eql?(other)
        # We could use 'self == other' (RBTree already implements ==), but then
        # SortedSet[1].eql?(SortedSet[1.0]) would be true even though !(1.eql?(1.0)) and !(Set[1].eql?(Set[1.0])).
        # We'll take a chance on a 64-bit hash collision instead.
        self.hash == other.hash
      end
    END
  end

  unless RBTree.instance_methods(false).include?(:hash)
    class_eval <<-END, __FILE__, __LINE__+1
      # Ruby's hash.c applies something like MurmurHash to keys and values, and Ruby
      # starts with a unique seed in each process (so {a:1}.hash differs in every process).
      # We'll do something much simpler, but good enough for our purposes.
      def hash
        result = 0
        self.each do |k, v|
          # result ^= k.hash; result ^= v.hash is not correct: RBTree[a:1,b:2].hash would equal RBTree[a:2,b:1].hash
          # result ^= [ k, v ].hash would create a lot of unnecessary allocations and garbage.
          # Ruby internals use gcc's 128-bit integer type where possible, and Object.hash returns a 64-bit integer,
          # so we take advantage of that and build a 128-bit Integer hash instead of hashing Arrays.
          # In the SortedSet usage the values are always 'true'; we put them in the upper half, where they cancel
          # out whenever the entry count is even, leaving a 64-bit value (doesn't really matter, but it's easier to read).
          result ^= (v.hash << 64) ^ k.hash
        end
        result
      end
    END
  end
end
```

----------------------------------------
Bug #18582: Hash.group_by not grouping correctly with SortedSets
https://bugs.ruby-lang.org/issues/18582#change-96475

* Author: mike@carltons.us (Mike Carlton)
* Status: Third Party's Issue
* Priority: Normal
* ruby -v: ruby 3.0.3p157 (2021-11-24 revision 3fb7d2cadc) [x86_64-darwin20]
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
With Ruby 3.0.3, when SortedSets are used as the group_by value for a Hash, equal SortedSets are not grouped together as they should be.

This works correctly in Ruby 2.7.1 (when the rbtree gem is not present; not tested with the rbtree gem).

This works correctly with Sets as the group_by value in both 2.7.1 and 3.0.3.

This test code:

```ruby
require 'set'
require 'sorted_set' if RUBY_VERSION > '3'

puts RUBY_VERSION

# works when keys are Sets
s1 = Set['fubar']
s2 = Set['fubar']
warn "expected #{s1} to equal #{s2}" unless s1 == s2
grouped = { 'a' => s1, 'b' => s2 }.group_by { |_, v| v }
puts "grouped by Sets: #{grouped}"
warn "expected 1 key in hash grouped by Sets, got #{grouped.keys.size}" unless grouped.keys.size == 1

# 3.0.3 fails when keys are SortedSets
ss1 = SortedSet['fubar']
ss2 = SortedSet['fubar']
warn "expected #{ss1} to equal #{ss2}" unless ss1 == ss2
grouped = { 'a' => ss1, 'b' => ss2 }.group_by { |_, v| v }
puts "grouped by SortedSets: #{grouped}"
warn "expected 1 key in hash grouped by SortedSets, got #{grouped.keys.size}" unless grouped.keys.size == 1
```

prints this under 2.7.1:

```
2.7.1
grouped by Sets: {#<Set: {"fubar"}>=>[["a", #<Set: {"fubar"}>], ["b", #<Set: {"fubar"}>]]}
grouped by SortedSets: {#<SortedSet: {"fubar"}>=>[["a", #<SortedSet: {"fubar"}>], ["b", #<SortedSet: {"fubar"}>]]}
```

but prints this under 3.0.3:

```
3.0.3
grouped by Sets: {#<Set: {"fubar"}>=>[["a", #<Set: {"fubar"}>], ["b", #<Set: {"fubar"}>]]}
grouped by SortedSets: {#<SortedSet: {"fubar"}>=>[["a", #<SortedSet: {"fubar"}>]], #<SortedSet: {"fubar"}>=>[["b", #<SortedSet: {"fubar"}>]]}
expected 1 key in hash grouped by SortedSets, got 2
```

-- 
https://bugs.ruby-lang.org/
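The test shows `ss1 == ss2` passing while `group_by` still yields two keys. That is consistent with how Ruby Hashes match keys: `group_by` builds a Hash keyed by the block's return value, and Hash lookup uses `#hash` and `#eql?`, never `#==`. The minimal sketch below illustrates this with two hypothetical classes (`OnlyEq`, `Keyed`) that are not part of Set, SortedSet, or rbtree, and it needs no gems.

```ruby
# Hypothetical class for illustration: defines only ==, so Hash keys fall back
# to Object#hash and Object#eql?, which are based on object identity.
class OnlyEq
  attr_reader :items

  def initialize(*items)
    @items = items
  end

  # Content comparison; Hash key lookup never calls this
  def ==(other)
    other.is_a?(OnlyEq) && items == other.items
  end
end

# Hypothetical class for illustration: also defines eql? and hash, which is
# what Hash (and therefore group_by) relies on to treat two keys as the same.
class Keyed < OnlyEq
  def eql?(other)
    other.is_a?(Keyed) && items.eql?(other.items)
  end

  # Objects that are eql? must return equal hash values
  def hash
    [self.class, items].hash
  end
end

only_eq = { 'a' => OnlyEq.new('fubar'), 'b' => OnlyEq.new('fubar') }.group_by { |_, v| v }
keyed   = { 'a' => Keyed.new('fubar'),  'b' => Keyed.new('fubar') }.group_by { |_, v| v }

puts only_eq.keys.size # prints 2: == is true, but the keys are distinct objects
puts keyed.keys.size   # prints 1: eql?/hash make the two keys collapse
```

The monkey patch in the message adds the same pair of methods, `eql?` and `hash`, to RBTree, which SortedSet uses internally for storage.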