如何合并散列数组以得到值数组的散列

时间:2021-10-22 11:41:14

This is the opposite of Turning a Hash of Arrays into an Array of Hashes in Ruby.

这与将数组的散列转换为Ruby中的散列的数组是相反的。

Elegantly and/or efficiently turn an array of hashes into a hash where the values are arrays of all values:

优雅和/或有效地将一个散列数组转换为散列,其中的值是所有值的数组:

hs = [
  { a:1, b:2 },
  { a:3, c:4 },
  { b:5, d:6 }
]
collect_values( hs )
#=> { :a=>[1,3], :b=>[2,5], :c=>[4], :d=>[6] }

This terse code almost works, but fails to create an array when there are no duplicates:

这个简洁的代码几乎可以工作,但是当没有重复的时候却不能创建一个数组:

def collect_values( hashes )
  hashes.inject({}){ |a,b| a.merge(b){ |_,x,y| [*x,*y] } }
end
collect_values( hs )
#=> { :a=>[1,3], :b=>[2,5], :c=>4, :d=>6 }

This code works, but can you write a better version?

这段代码可以工作,但是你能写出更好的版本吗?

def collect_values( hashes )
  # Requires Ruby 1.8.7+ for Object#tap
  Hash.new{ |h,k| h[k]=[] }.tap do |result|
    hashes.each{ |h| h.each{ |k,v| result[k]<<v } }
  end
end

Solutions that only work in Ruby 1.9 are acceptable, but should be noted as such.

只在Ruby 1.9中工作的解决方案是可以接受的,但应该注意这一点。


Here are the results of benchmarking the various answers below (and a few more of my own), using three different arrays of hashes:

下面是对下面各种答案(以及我自己的一些答案)进行基准测试的结果,使用三个不同的散列数组:

  • one where each hash has distinct keys, so no merging ever occurs:
    [{:a=>1}, {:b=>2}, {:c=>3}, {:d=>4}, {:e=>5}, {:f=>6}, {:g=>7}, ...]

    一个每个散列有不同的钥匙,所以从来没有合并发生:[{:1 = > },{:b = > 2 },{:c = > 3 },{:d = > 4 },{:e = > 5 },{:f = > 6 },{:7 g = > },…]

  • one where every hash has the same key, so maximum merging occurs:
    [{:a=>1}, {:a=>2}, {:a=>3}, {:a=>4}, {:a=>5}, {:a=>6}, {:a=>7}, ...]

    在每一个散列有相同的一个关键,所以最大的合并发生:[{:1 = > },{:= > 2 },{:3 = > },{:4 = > },{:5 = > },{:= > 6 },{:7 = > },…]

  • and one that is a mix of unique and shared keys:
    [{:c=>1}, {:d=>1}, {:c=>2}, {:f=>1}, {:c=>1, :d=>1}, {:h=>1}, {:c=>3}, ...]
  • 和一个独特的和共享密钥:[{:c = > 1 },{:d = > 1 },{:c = > 2 },{:f = > 1 },{:c = > 1,:d = > 1 },{:h = > 1 },{:c = > 3 },…]
               user     system      total        real
Phrogz 2a  0.577000   0.000000   0.577000 (  0.576000)
Phrogz 2b  0.624000   0.000000   0.624000 (  0.620000)
Glenn 1    0.640000   0.000000   0.640000 (  0.641000)
Phrogz 1   0.671000   0.000000   0.671000 (  0.668000)
Michael 1  0.702000   0.000000   0.702000 (  0.700000)
Michael 2  0.717000   0.000000   0.717000 (  0.726000)
Glenn 2    0.765000   0.000000   0.765000 (  0.764000)
fl00r      0.827000   0.000000   0.827000 (  0.836000)
sawa       0.874000   0.000000   0.874000 (  0.868000)
Tokland 1  0.873000   0.000000   0.873000 (  0.876000)
Tokland 2  1.077000   0.000000   1.077000 (  1.073000)
Phrogz 3   2.106000   0.093000   2.199000 (  2.209000)

The fastest code is this method that I added:

最快的代码是我添加的方法:

def collect_values(hashes)
  {}.tap{ |r| hashes.each{ |h| h.each{ |k,v| (r[k]||=[]) << v } } }
end

I've accepted "glenn mcdonald's answer" as it was competitive in terms of speed, reasonably terse, but (most importantly) because it pointed out the danger of using a Hash with a self-modifying default proc for convenient construction, as this may introduce bad changes when the user is indexing it later on.

我接受了“glenn mcdonald's answer”,因为它在速度上具有竞争力,而且相当简洁,但(最重要的)是因为它指出了使用带有自修改默认proc的散列进行方便构建的危险,因为这可能会在用户稍后进行索引时引入糟糕的更改。

Finally, here's the benchmark code, in case you want to run your own comparisons:

最后,这里是基准代码,如果您想进行您自己的比较:

require 'prime'   # To generate the third hash
require 'facets'  # For tokland1's map_by
AZSYMBOLS = (:a..:z).to_a
TESTS = {
  '26 Distinct Hashes'   => AZSYMBOLS.zip(1..26).map{|a| Hash[*a] },
  '26 Same-Key Hashes'   => ([:a]*26).zip(1..26).map{|a| Hash[*a] },
  '26 Mixed-Keys Hashes' => (2..27).map do |i|
    factors = i.prime_division.transpose
    Hash[AZSYMBOLS.values_at(*factors.first).zip(factors.last)]
  end
}

def phrogz1(hashes)
  Hash.new{ |h,k| h[k]=[] }.tap do |result|
    hashes.each{ |h| h.each{ |k,v| result[k]<<v } }
  end
end
def phrogz2a(hashes)
  {}.tap{ |r| hashes.each{ |h| h.each{ |k,v| (r[k]||=[]) << v } } }
end
def phrogz2b(hashes)
  hashes.each_with_object({}){ |h,r| h.each{ |k,v| (r[k]||=[]) << v } }
end
def phrogz3(hashes)
  result = hashes.inject({}){ |a,b| a.merge(b){ |_,x,y| [*x,*y] } }
  result.each{ |k,v| result[k] = [v] unless v.is_a? Array }
end
def glenn1(hs)
  hs.reduce({}) {|h,pairs| pairs.each {|k,v| (h[k] ||= []) << v}; h}
end
def glenn2(hs)
  hs.map(&:to_a).flatten(1).reduce({}) {|h,(k,v)| (h[k] ||= []) << v; h}
end
def fl00r(hs)
  h = Hash.new{|h,k| h[k]=[]}
  hs.map(&:to_a).flatten(1).each{|v| h[v[0]] << v[1]}
  h
end
def sawa(a)
  a.map(&:to_a).flatten(1).group_by{|k,v| k}.each_value{|v| v.map!{|k,v| v}}
end
def michael1(hashes)
  h = Hash.new{|h,k| h[k]=[]}
  hashes.each_with_object(h) do |h, result|
    h.each{ |k, v| result[k] << v }
  end
end
def michael2(hashes)
  h = Hash.new{|h,k| h[k]=[]}
  hashes.inject(h) do |result, h|
    h.each{ |k, v| result[k] << v }
    result
  end
end
def tokland1(hs)
  hs.map(&:to_a).flatten(1).map_by{ |k, v| [k, v] }
end
def tokland2(hs)
  Hash[hs.map(&:to_a).flatten(1).group_by(&:first).map{ |k, vs|
    [k, vs.map{|o|o[1]}]
  }]
end

require 'benchmark'
N = 10_000
Benchmark.bm do |x|
  x.report('Phrogz 2a'){ TESTS.each{ |n,h| N.times{ phrogz2a(h) } } }
  x.report('Phrogz 2b'){ TESTS.each{ |n,h| N.times{ phrogz2b(h) } } }
  x.report('Glenn 1  '){ TESTS.each{ |n,h| N.times{ glenn1(h)   } } }
  x.report('Phrogz 1 '){ TESTS.each{ |n,h| N.times{ phrogz1(h)  } } }
  x.report('Michael 1'){ TESTS.each{ |n,h| N.times{ michael1(h) } } }
  x.report('Michael 2'){ TESTS.each{ |n,h| N.times{ michael2(h) } } }
  x.report('Glenn 2  '){ TESTS.each{ |n,h| N.times{ glenn2(h)   } } }
  x.report('fl00r    '){ TESTS.each{ |n,h| N.times{ fl00r(h)    } } }
  x.report('sawa     '){ TESTS.each{ |n,h| N.times{ sawa(h)     } } }
  x.report('Tokland 1'){ TESTS.each{ |n,h| N.times{ tokland1(h) } } }
  x.report('Tokland 2'){ TESTS.each{ |n,h| N.times{ tokland2(h) } } }
  x.report('Phrogz 3 '){ TESTS.each{ |n,h| N.times{ phrogz3(h)  } } }

end

8 个解决方案

#1


19  

Take your pick:

随便你挑,

hs.reduce({}) {|h,pairs| pairs.each {|k,v| (h[k] ||= []) << v}; h}

hs.map(&:to_a).flatten(1).reduce({}) {|h,(k,v)| (h[k] ||= []) << v; h}

I'm strongly against messing with the defaults for hashes, as the other suggestions do, because then checking for a value modifies the hash, which seems very wrong to me.

我强烈反对像其他建议那样对散列使用默认值,因为检查值会修改散列,这在我看来是非常错误的。

#2


3  

h = Hash.new{|h,k| h[k]=[]}
hs.map(&:to_a).flatten(1).each{|v| h[v[0]] << v[1]}

#3


1  

How's this?

这是如何?

def collect_values(hashes)
  h = Hash.new{|h,k| h[k]=[]}
  hashes.each_with_object(h) do |h, result|
    h.each{ |k, v| result[k] << v }
  end
end

Edit - Also possible with inject, but IMHO not as nice:

编辑——也可以通过注入,但IMHO没有那么好:

def collect_values( hashes )
  h = Hash.new{|h,k| h[k]=[]}
  hashes.inject(h) do |result, h|
    h.each{ |k, v| result[k] << v }
    result
  end
end

#4


1  

Same with some other answers using map(&:to_a).flatten(1). The problem is how to modify the values of the hash. I used the fact that arrays are mutable.

使用map(&:to_a).flatten(1)也一样。问题是如何修改散列的值。我利用了数组是可变的这一事实。

def collect_values a
  a.map(&:to_a).flatten(1).group_by{|k, v| k}.
  each_value{|v| v.map!{|k, v| v}}
end

#5


1  

Facet's Enumerable#map_by comes in handy for these cases. This implementation will be no doubt slower than others, but modular and compact code is always easier to maintain:

Facet的可枚举的#map_by对于这些情况非常有用。这个实现无疑比其他的要慢,但是模块化和紧凑的代码总是更容易维护:

require 'facets'
hs.flat_map(&:to_a).map_by { |k, v| [k, v] }
#=> {:b=>[2, 5], :d=>[6], :c=>[4], :a=>[1, 3]

#6


1  

I thought it might be interesting to compare the winner:

我认为比较获胜者可能会很有趣:

def phrogz2a(hashes)
  {}.tap{ |r| hashes.each{ |h| h.each{ |k,v| (r[k]||=[]) << v } } }
end

with a slight variant:

轻微的变体:

def phrogz2ai(hashes)
  Hash.new {|h,k| h[k]=[]}.tap {|r| hashes.each {|h| h.each {|k,v| r[k] << v}}}
end

because one can often employ either approach (typically to create an empty array or hash).

因为通常可以使用这两种方法(通常创建空数组或散列)。

Using Phrogz's benchmark code, here's how they compare here:

使用Phrogz的基准代码,以下是他们的比较:

            user     system      total        real
Phrogz 2a   0.440000   0.010000   0.450000 (  0.444435)
Phrogz 2ai  0.580000   0.010000   0.590000 (  0.580248)

#7


0  

What about this one?

这一个怎么样?

hs.reduce({}, :merge)

shortest! But performance is pretty bad:

最短!但是表现很差:

       user     system      total        real
 Phrogz 2a  0.240000   0.010000   0.250000 (  0.247337)
 Phrogz 2b  0.280000   0.000000   0.280000 (  0.274985)
 Glenn 1    0.290000   0.000000   0.290000 (  0.290370)
 Phrogz 1   0.310000   0.000000   0.310000 (  0.315548)
 Michael 1  0.360000   0.000000   0.360000 (  0.356760)
 Michael 2  0.360000   0.000000   0.360000 (  0.360119)
 Glenn 2    0.370000   0.000000   0.370000 (  0.369354)
 fl00r      0.390000   0.000000   0.390000 (  0.385883)
 sawa       0.410000   0.000000   0.410000 (  0.408190)
 Tokland 1  0.410000   0.000000   0.410000 (  0.410097)
 Tokland 2  0.490000   0.000000   0.490000 (  0.497325)
 Ich        1.410000   0.000000   1.410000 (  1.413176) # <<-- new
 Phrogz 3   1.760000   0.010000   1.770000 (  1.762979)

#8


-2  

[{'a' => 1}, {'b' => 2}, {'c' => 3}].reduce Hash.new, :merge

#1


19  

Take your pick:

随便你挑,

hs.reduce({}) {|h,pairs| pairs.each {|k,v| (h[k] ||= []) << v}; h}

hs.map(&:to_a).flatten(1).reduce({}) {|h,(k,v)| (h[k] ||= []) << v; h}

I'm strongly against messing with the defaults for hashes, as the other suggestions do, because then checking for a value modifies the hash, which seems very wrong to me.

我强烈反对像其他建议那样对散列使用默认值,因为检查值会修改散列,这在我看来是非常错误的。

#2


3  

h = Hash.new{|h,k| h[k]=[]}
hs.map(&:to_a).flatten(1).each{|v| h[v[0]] << v[1]}

#3


1  

How's this?

这是如何?

def collect_values(hashes)
  h = Hash.new{|h,k| h[k]=[]}
  hashes.each_with_object(h) do |h, result|
    h.each{ |k, v| result[k] << v }
  end
end

Edit - Also possible with inject, but IMHO not as nice:

编辑——也可以通过注入,但IMHO没有那么好:

def collect_values( hashes )
  h = Hash.new{|h,k| h[k]=[]}
  hashes.inject(h) do |result, h|
    h.each{ |k, v| result[k] << v }
    result
  end
end

#4


1  

Same with some other answers using map(&:to_a).flatten(1). The problem is how to modify the values of the hash. I used the fact that arrays are mutable.

使用map(&:to_a).flatten(1)也一样。问题是如何修改散列的值。我利用了数组是可变的这一事实。

def collect_values a
  a.map(&:to_a).flatten(1).group_by{|k, v| k}.
  each_value{|v| v.map!{|k, v| v}}
end

#5


1  

Facet's Enumerable#map_by comes in handy for these cases. This implementation will be no doubt slower than others, but modular and compact code is always easier to maintain:

Facet的可枚举的#map_by对于这些情况非常有用。这个实现无疑比其他的要慢,但是模块化和紧凑的代码总是更容易维护:

require 'facets'
hs.flat_map(&:to_a).map_by { |k, v| [k, v] }
#=> {:b=>[2, 5], :d=>[6], :c=>[4], :a=>[1, 3]

#6


1  

I thought it might be interesting to compare the winner:

我认为比较获胜者可能会很有趣:

def phrogz2a(hashes)
  {}.tap{ |r| hashes.each{ |h| h.each{ |k,v| (r[k]||=[]) << v } } }
end

with a slight variant:

轻微的变体:

def phrogz2ai(hashes)
  Hash.new {|h,k| h[k]=[]}.tap {|r| hashes.each {|h| h.each {|k,v| r[k] << v}}}
end

because one can often employ either approach (typically to create an empty array or hash).

因为通常可以使用这两种方法(通常创建空数组或散列)。

Using Phrogz's benchmark code, here's how they compare here:

使用Phrogz的基准代码,以下是他们的比较:

            user     system      total        real
Phrogz 2a   0.440000   0.010000   0.450000 (  0.444435)
Phrogz 2ai  0.580000   0.010000   0.590000 (  0.580248)

#7


0  

What about this one?

这一个怎么样?

hs.reduce({}, :merge)

shortest! But performance is pretty bad:

最短!但是表现很差:

       user     system      total        real
 Phrogz 2a  0.240000   0.010000   0.250000 (  0.247337)
 Phrogz 2b  0.280000   0.000000   0.280000 (  0.274985)
 Glenn 1    0.290000   0.000000   0.290000 (  0.290370)
 Phrogz 1   0.310000   0.000000   0.310000 (  0.315548)
 Michael 1  0.360000   0.000000   0.360000 (  0.356760)
 Michael 2  0.360000   0.000000   0.360000 (  0.360119)
 Glenn 2    0.370000   0.000000   0.370000 (  0.369354)
 fl00r      0.390000   0.000000   0.390000 (  0.385883)
 sawa       0.410000   0.000000   0.410000 (  0.408190)
 Tokland 1  0.410000   0.000000   0.410000 (  0.410097)
 Tokland 2  0.490000   0.000000   0.490000 (  0.497325)
 Ich        1.410000   0.000000   1.410000 (  1.413176) # <<-- new
 Phrogz 3   1.760000   0.010000   1.770000 (  1.762979)

#8


-2  

[{'a' => 1}, {'b' => 2}, {'c' => 3}].reduce Hash.new, :merge