From: plasticchicken@...
Date: 2014-11-27T19:29:34+00:00
Subject: [ruby-core:66534] [ruby-trunk - Feature #10552] [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies

Issue #10552 has been updated by Brian Hempel.


Thanks for the feedback, David. I can see `map` functionality being useful, but let me offer some arguments against integrating `map`:

1. I was thinking the block could be reserved because in the future it might be nice to change the weighting: some elements might count as 1, but others are less important so each of them only counts as 0.5. However, I can't think of a good use case for that yet.
2. `any?`, `all?`, and `none?` return booleans, not collections. Most of the other enumerable methods that return a collection return elements from the original enumerable. For example, `my_enum.group_by(&:relation)` has elements from `my_enum` in the hash value arrays. It's a small code smell that `my_enum.frequencies(&:relation)` would return a potentially large collection that contains nothing from `my_enum`.
3. `any?`, `all?`, and `none?` can exit early, so there's a performance improvement to `.any?(&:finished?)` compared to `.map(&:finished?).any?`. There would be little performance improvement here because `frequencies` always has to walk the entire collection.
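
The early-exit point in item 3 can be checked with a small sketch. The helper names `block_calls_with_any` and `block_calls_with_map_then_any` are hypothetical, introduced only for illustration; each one reports how many times the block ran.

```ruby
# Hypothetical helpers (not part of the patch): count block invocations.

# Passes the predicate to any? directly, which stops at the first truthy result.
def block_calls_with_any(items)
  calls = 0
  items.any? { |n| calls += 1; n > 1 }
  calls
end

# Maps first, so the predicate runs on every element before any? sees anything.
def block_calls_with_map_then_any(items)
  calls = 0
  items.map { |n| calls += 1; n > 1 }.any?
  calls
end

block_calls_with_any([1, 2, 3, 4, 5])           # block runs 2 times
block_calls_with_map_then_any([1, 2, 3, 4, 5])  # block runs 5 times
```

A frequency count has no such shortcut: every element contributes to the result, so a block form would save only the intermediate array.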

On the other hand, there is one good argument for integrating `map`:

1. `Enumerable#count` takes a block to specify what to count, and `frequencies` is basically `count`, but on all elements at once.
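
To make the comparison with `count` concrete, here is a rough sketch. `count` with a block is existing Ruby; `frequencies_of` is a hypothetical standalone helper, assumed here only to show what a block-taking `frequencies` might return. It is not the patch's implementation.

```ruby
words = %w[cat bird bird horse]

# Existing behavior: count takes a block and tallies one condition.
words.count { |w| w.length == 4 }   # => 2

# Hypothetical helper (an assumption for illustration): tallies every block
# result at once, or the elements themselves when no block is given.
def frequencies_of(enum)
  counts = Hash.new(0)
  enum.each do |item|
    key = block_given? ? yield(item) : item
    counts[key] += 1
  end
  counts
end

frequencies_of(words, &:length)     # => {3 => 1, 4 => 2, 5 => 1}
```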


----------------------------------------
Feature #10552: [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
https://bugs.ruby-lang.org/issues/10552#change-50149

* Author: Brian Hempel
* Status: Open
* Priority: Normal
* Assignee: 
* Category: core
* Target version: 
----------------------------------------
Counting how many times a value appears in some collection has always been a bit clumsy in Ruby. While Ruby has enough constructs to do it in one line, it still requires knowing the folklore of the optimum solution as well as some acrobatic typing:

~~~ruby
%w[cat bird bird horse].each_with_object(Hash.new(0)) { |word, hash| hash[word] += 1 }
# => {"cat" => 1, "bird" => 2, "horse" => 1}
~~~

What if Ruby could count for us? This patch adds two methods to enumerables:

~~~ruby
%w[cat bird bird horse].frequencies
# => {"bird" => 2, "horse" => 1, "cat" => 1}

%w[cat bird bird horse].relative_frequencies
# => {"bird" => 0.5, "horse" => 0.25, "cat" => 0.25}
~~~
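
For readers who want to try the idea without applying the patch, the following is a minimal pure-Ruby sketch. It is an assumption about the behavior described here, not the code in the attached patch. The helper names `frequencies_sketch` and `relative_frequencies_sketch` are made up, and the order among equal counts is left unspecified.

```ruby
# Hypothetical stand-ins for the proposed methods, written as plain functions.

# Counts each distinct element, then orders the hash with the largest count first.
def frequencies_sketch(enum)
  counts = enum.each_with_object(Hash.new(0)) { |item, hash| hash[item] += 1 }
  counts.sort_by { |_, count| -count }.to_h
end

# Divides each count by the total number of elements.
def relative_frequencies_sketch(enum)
  counts = frequencies_sketch(enum)
  total = counts.values.sum.to_f
  counts.map { |item, count| [item, count / total] }.to_h
end
```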

To make programmers happier, the returned hash has the most common values first. This is nice because, for example, finding the most common element of a collection becomes trivial:

~~~ruby
most_common, count = %w[cat bird bird horse].frequencies.first
~~~

Whereas the best you can do with vanilla Ruby is:

~~~ruby
most_common, count = %w[cat bird bird horse].each_with_object(Hash.new(0)) { |word, hash| hash[word] += 1 }.max_by(&:last)

# or...

most_common, count = %w[cat bird bird horse].group_by(&:to_s).map { |word, arr| [word, arr.size] }.max_by(&:last)
~~~

While I don't like the long method names, "frequencies" and "relative frequencies" are the terms used in basic statistics. http://en.wikipedia.org/wiki/Frequency_%28statistics%29


---Files--------------------------------
add_enum_frequencies.patch (5.81 KB)


-- 
https://bugs.ruby-lang.org/