[ruby-core:33963] Re: [feature:trunk] Enumerable#categorize

From: Tanaka Akira <akr@...>
Date: 2010-12-28 18:45:29 UTC
List: ruby-core #33963
2010/12/27 Marc-Andre Lafortune <ruby-core-mailing-list@marc-andre.ca>:
>
> I have an alternate proposition of a modified `categorize` which I
> believe addresses the problems I see with it:
> 1) Complex interface (as was mentioned by others)

I don't think your 'associate' is that simple.
Some parts are simpler than 'categorize'.
Other parts are more complex than 'categorize'.

> 2) By default, `categorize` creates a "grouped hash" (like group_by),
> while there is not (yet) a way to create a normal hash. I would
> estimate that most hash created are not of the form {key => [some
> list]} and I would rather have a nicer way to construct the other
> hashes too. This would make for a nice replacement for most
> "inject({}){...}" and "Hash[enum.map{...}]".

Possible.

There are two reasons why I proposed a method for a "grouped hash" first.
* It doesn't lose information when keys conflict.
* I (and matz) don't have a good (enough) method name for a "normal hash".

I'm not sure that matz will be satisfied with the name 'associate'.
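
The key-conflict point can be illustrated with methods that already exist in Ruby. This is only a sketch: the `pairs` data is made up for illustration, and neither 'categorize' nor 'associate' is used.

```ruby
# Hypothetical sample data: two entries share the key :a.
pairs = [[:a, 1], [:b, 2], [:a, 3]]

# "Normal hash": a later pair silently overwrites an earlier one.
normal = Hash[pairs]

# "Grouped hash": every value for a key is kept in an array.
grouped = pairs.each_with_object({}) {|(k, v), h| (h[k] ||= []) << v }

p normal[:a]   #=> 3
p grouped[:a]  #=> [1, 3]
```

The value 1 is lost in the normal hash but kept in the grouped one.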

> My alternate suggestion is a simple method that uses a block to build
> the key-value pairs and an optional Proc/lambda/symbol to handle key
> conflicts (with the same arguments as the block of `Hash#merge`). I
> would name this simply `associate`, but other names could do well too
> (e.g. `mash` or `graph` or even `to_h`).

'categorize' and 'associate' differ as follows.

* 'associate' creates a normal hash.

  This is an intentional difference.

* 'associate' doesn't create a nested hash.

  'associate' is simpler here.

  I think 'associate' can be extended naturally so that the method creates
  a nested hash when the block returns an array with 3 or more elements.

  For the example in [ruby-talk:372481],
  your 'associate' (without the above extension) handles only that particular
  nest level, but 'categorize' handles any nest level.

  >    dest == orig.categorize(:op=>lambda {|x,y| y }) {|e| e }
  >    dest == orig.associate(:merge){|a, b, c| [a, {b=>c}]}

* 'associate' assumes {|v| v } if the block is not given.

  This simplifies some usages.
  However, it rules out the Ruby 1.9 style of enumerator creation,
  in which a method returns an enumerator when no block is given.
  This means we cannot write enum.associate.with_index {|v, i| ... }.

* 'associate' handles non-array block values.

  This is more complex than 'categorize'.

  I feel it is a somewhat distorted specification,
  especially the "(first)" in "Otherwise the value is the result of the block
  and corresponding key is the (first) yielded item."

  'categorize' could adopt it, but I don't want it to.

* 'associate' doesn't use a hash argument.

  This may be a good idea.

  'categorize' needs a hash argument mainly because
  it must distinguish whether the merge function needs the key or not.
  (A proc specified by :update needs the key.
  A proc specified by :op doesn't need the key.)

  'associate' distinguishes them by symbol or proc.
  The same can be applied to 'categorize'.

  However, a symbol and symbol.to_proc will then behave differently.

* 'associate' doesn't have a way to specify the seed.

  This is a simpler specification than 'categorize',
  but it makes some usages more complex.

  'associate' can be extended to take a second optional argument for the seed.

  In your 'associate' examples for [ruby-talk:347364] and
  [ruby-talk:327908], array and string concatenation is O(n**2).
  (n is the (maximum) number of elements in a category.)

  >    p dest == orig.associate(:+){|h, v| [h, [v]]}
  a = [v1]
  a = a + [v2]
  a = a + [v3]
  ...

  >    orig.associate(->(k, a, b){"#{a} #{b}"})
  s = v1
  s = "#{s} #{v2}"
  s = "#{s} #{v3}"
  ...

  To avoid this inefficiency, a destructive concatenation method
  can be used:

  >    # or if duping the string is required (??):
  >    orig.associate(->(k, a, b){a << " " << b}){|x, y| [x, y.dup]}

  However, the dup is required so as not to modify the receiver, orig.

  I think a seed is a simple way to avoid both O(n**2) and receiver
  modification without extra objects, as follows.

  >     orig.categorize(:seed=>nil, :op=>lambda {|x,y| !x ? y.dup : (x << " " << y) }) {|e| e }
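
The same idea can be sketched with plain Hash operations, without either proposed method. The data in `orig` is made up for illustration. The logic mirrors the `!x ? y.dup : (x << " " << y)` branch above: the first value for each key is copied once and later values are appended in place, so the strings in `orig` are left untouched.

```ruby
# Hypothetical input: [key, string] pairs.
orig = [["a", "x"], ["a", "y"], ["b", "z"], ["a", "w"]]

h = {}
orig.each do |k, v|
  if h.key?(k)
    # Later values: append in place to the private copy (no new string per step).
    h[k] << " " << v
  else
    # First value for this key: copy it so that orig is not modified.
    h[k] = v.dup
  end
end

p h["a"]      #=> "x y w"
p orig[0][1]  #=> "x"
```

Each key costs one dup, and the appends avoid rebuilding the whole string at every step.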

> It could of course be argued that both `associate` and `categorize`
> should be added. That may very be;

Yes.

Actually I want one more method for counting.
(I want 3 methods: grouped hash, normal hash, count hash)
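
For reference, a "count hash" can currently be built by hand like this. It is only a sketch with made-up data; no name for a dedicated method is implied.

```ruby
# Hypothetical sample data.
words = %w[apple banana apple cherry banana apple]

# Count occurrences; missing keys default to 0.
counts = Hash.new(0)
words.each {|w| counts[w] += 1 }

p counts["apple"]   #=> 3
p counts["cherry"]  #=> 1
```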

> I just feel that `associate` should
> be added in priority over `categorize`.

matz felt similarly.  [ruby-dev:42643]

But we couldn't find a good name for the normal hash creation method,
so the discussion is pending.

>    * [ruby-talk:344723]
>
>    a=[1,2,5,13]
>    b=[1,1,2,2,2,5,13,13,13]
>    # to
>    dest =
>     [[0, 0], [0, 1], [1, 2], [1, 3], [1, 4], [2, 5], [3, 6], [3, 7], [3, 8]]
>
>    # This can be implemented as:
>     h = a.categorize.with_index {|e, i| [e,i] }
>     b.map.with_index {|e, j| h[e] ? h[e].map {|i| [i,j] } : [] }.flatten(1)
>    # or
>     h = a.each_with_index.associate
>     b.map.with_index{|e, i| [h[e], i] }

Your solution depends on 'a' having no duplicate elements.
Since [ruby-talk:344723] asks about an INNER JOIN,
I think 'a' may have duplicate elements.

  a=[1,1]
  b=[1,1]
  # to
  dest = [[0, 0], [1, 0], [0, 1], [1, 1]]
  h = a.categorize.with_index {|e, i| [e,i] }
  p dest == b.map.with_index {|e, j| h[e] ? h[e].map {|i| [i,j] } : [] }.flatten(1)
  #=> true
  h = a.each_with_index.associate
  p dest == b.map.with_index{|e, i| [h[e], i] }
  #=> false
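
The 'categorize' half of this comparison can be tried today with a much-reduced stand-in. The method name `categorize_sketch` is hypothetical, and it covers only the default grouped-hash behaviour (no :seed, :op or :update, no nesting). It also follows the enumerator convention, returning an enumerator when no block is given, which is what allows `.with_index`.

```ruby
module Enumerable
  # Hypothetical, reduced stand-in for the proposed 'categorize':
  # the block returns [key, value] and values are collected per key.
  def categorize_sketch
    # Ruby 1.9 convention: return an enumerator when no block is given.
    return to_enum(:categorize_sketch) unless block_given?
    h = {}
    each do |e|
      k, v = yield(e)
      (h[k] ||= []) << v
    end
    h
  end
end

a = [1, 1]
b = [1, 1]
dest = [[0, 0], [1, 0], [0, 1], [1, 1]]

# Map each element of a to the list of indexes where it occurs.
h = a.categorize_sketch.with_index {|e, i| [e, i] }

# Inner join: pair every matching index in a with each index in b.
joined = b.map.with_index {|e, j| h[e] ? h[e].map {|i| [i, j] } : [] }.flatten(1)
p joined == dest  #=> true
```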
-- 
Tanaka Akira
