Ruby: How to find and return a duplicate value in an array?

arr is an array of strings, e.g.: ["hello", "world", "stack", "overflow", "hello", "again"].

What would be an easy and elegant way to check whether arr has duplicates and, if so, return one of them (no matter which)?

Examples:

["A", "B", "C", "B", "A"]    # => "A" or "B"
["A", "B", "C"]              # => nil

18 answers

#1

204

a = ["A", "B", "C", "B", "A"]
a.detect{ |e| a.count(e) > 1 }

#2

184

You can do this in a few ways, with the first option being the fastest:

ary = ["A", "B", "C", "B", "A"]

ary.group_by{ |e| e }.select { |k, v| v.size > 1 }.map(&:first)

ary.sort.chunk{ |e| e }.select { |e, chunk| chunk.size > 1 }.map(&:first)

And an O(N^2) option (i.e. less efficient):

ary.select{ |e| ary.count(e) > 1 }.uniq

#3

Simply find the first instance where the index of the object (counting from the left) does not equal the index of the object (counting from the right).

arr.detect {|e| arr.rindex(e) != arr.index(e) }

If there are no duplicates, the return value will be nil.
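As a rough illustration of why comparing index and rindex works, here is a small sketch using the array from the question. The helper name first_dup_by_index is made up for this example and is not part of the original answer.

```ruby
# Hypothetical helper name; it only wraps the one-liner shown above.
def first_dup_by_index(arr)
  # An element is duplicated when its first and last positions differ.
  arr.detect { |e| arr.index(e) != arr.rindex(e) }
end

words = ["hello", "world", "stack", "overflow", "hello", "again"]

p words.index("hello")           # => 0 (first occurrence)
p words.rindex("hello")          # => 4 (last occurrence)
p first_dup_by_index(words)      # => "hello"
p first_dup_by_index(%w[A B C])  # => nil
```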

I believe this is also the fastest solution posted in the thread so far, since it doesn't rely on creating additional objects, and #index and #rindex are implemented in C. The big-O runtime is N^2 and thus slower than Sergio's, but the wall-clock time could be much faster because the "slow" parts run in C.

#4

detect only finds one duplicate. find_all will find them all:

a = ["A", "B", "C", "B", "A"]
a.find_all { |e| a.count(e) > 1 }

#5

Here are two more ways of finding a duplicate.

Use a set

require 'set'

def find_a_dup_using_set(arr)
  s = Set.new
  arr.find { |e| !s.add?(e) }
end

find_a_dup_using_set arr
  #=> "hello"

Use select in place of find to return an array of all duplicates.
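A minimal sketch of that select variant follows. The method name find_all_dups_using_set is an assumption made for illustration. Note that an element occurring three times is reported twice, so .uniq may be wanted on the result.

```ruby
require 'set'

# Hypothetical name: the select-based variant of find_a_dup_using_set.
def find_all_dups_using_set(arr)
  s = Set.new
  # Set#add? returns nil when the element is already present,
  # so every occurrence after the first one is selected.
  arr.select { |e| !s.add?(e) }
end

p find_all_dups_using_set(%w[A B C B A A])       # => ["B", "A", "A"]
p find_all_dups_using_set(%w[A B C B A A]).uniq  # => ["B", "A"]
p find_all_dups_using_set(%w[A B C])             # => []
```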

Use Array#difference

class Array
  def difference(other)
    h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
    reject { |e| h[e] > 0 && h[e] -= 1 }
  end
end

def find_a_dup_using_difference(arr)
  arr.difference(arr.uniq).first
end

find_a_dup_using_difference arr
  #=> "hello"

Drop .first to return an array of all duplicates.

Both methods return nil if there are no duplicates.
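To see what the difference method above computes, here is a standalone sketch of the same logic. The name multiset_difference is hypothetical and is used here to avoid reopening Array. Since Ruby 2.6 there is also a built-in Array#difference, which behaves like Array#- and removes every occurrence, so the patch above would replace it.

```ruby
# Hypothetical standalone version of the Array#difference defined above.
# Removes one occurrence from arr for each occurrence found in other.
def multiset_difference(arr, other)
  counts = other.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
  # While counts[e] is positive, decrement it and reject the element.
  arr.reject { |e| counts[e] > 0 && counts[e] -= 1 }
end

arr = ["A", "B", "C", "B", "A", "A"]

p multiset_difference(arr, arr.uniq)  # => ["B", "A", "A"]
p arr - arr.uniq                      # => [] (Array#- drops all occurrences)
```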

I proposed that Array#difference be added to the Ruby core. More information is in my answer here.

Benchmark

Let's compare suggested methods. First, we need an array for testing:

CAPS = ('AAA'..'ZZZ').to_a.first(10_000)
def test_array(nelements, ndups)
  arr = CAPS[0, nelements-ndups]
  arr = arr.concat(arr[0,ndups]).shuffle
end
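As a quick sanity check of what test_array produces, the sketch below repeats the definitions so it runs on its own and then inspects one generated array. The variable name sample is an assumption; the order of elements is random because of shuffle.

```ruby
CAPS = ('AAA'..'ZZZ').to_a.first(10_000)

def test_array(nelements, ndups)
  arr = CAPS[0, nelements - ndups]    # unique strings
  arr.concat(arr[0, ndups]).shuffle   # repeat the first ndups of them, then shuffle
end

sample = test_array(100, 10)

p sample.size                     # => 100
p sample.uniq.size                # => 90
p sample.size - sample.uniq.size  # => 10 elements appear twice
```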

and a method to run the benchmarks for different test arrays:

require 'fruity'

def benchmark(nelements, ndups)
  arr = test_array nelements, ndups
  puts "\n#{ndups} duplicates\n"    
  compare(
    Naveed:    -> {arr.detect{|e| arr.count(e) > 1}},
    Sergio:    -> {(arr.inject(Hash.new(0)) {|h,e| h[e] += 1; h}.find {|k,v| v > 1} ||
                     [nil]).first },
    Ryan:      -> {(arr.group_by{|e| e}.find {|k,v| v.size > 1} ||
                     [nil]).first},
    Chris:     -> {arr.detect {|e| arr.rindex(e) != arr.index(e)} },
    Cary_set:  -> {find_a_dup_using_set(arr)},
    Cary_diff: -> {find_a_dup_using_difference(arr)}
  )
end

I did not include @JjP's answer because only one duplicate is to be returned, and when his/her answer is modified to do that it is the same as @Naveed's earlier answer. Nor did I include @Marin's answer, which, while posted before @Naveed's answer, returned all duplicates rather than just one (a minor point, but there's no point evaluating both, as they are identical when returning just one duplicate).

I also modified other answers that returned all duplicates to return just the first one found, but that should have essentially no effect on performance, as they computed all duplicates before selecting one.

First suppose the array contains 100 elements:

benchmark(100, 0)
0 duplicates
Running each test 64 times. Test will take about 2 seconds.
Cary_set is similar to Cary_diff
Cary_diff is similar to Ryan
Ryan is similar to Sergio
Sergio is faster than Chris by 4x ± 1.0
Chris is faster than Naveed by 2x ± 1.0

benchmark(100, 1)
1 duplicates
Running each test 128 times. Test will take about 2 seconds.
Cary_set is similar to Cary_diff
Cary_diff is faster than Ryan by 2x ± 1.0
Ryan is similar to Sergio
Sergio is faster than Chris by 2x ± 1.0
Chris is faster than Naveed by 2x ± 1.0

benchmark(100, 10)
10 duplicates
Running each test 1024 times. Test will take about 3 seconds.
Chris is faster than Naveed by 2x ± 1.0
Naveed is faster than Cary_diff by 2x ± 1.0 (results differ: AAC vs AAF)
Cary_diff is similar to Cary_set
Cary_set is faster than Sergio by 3x ± 1.0 (results differ: AAF vs AAC)
Sergio is similar to Ryan

Now consider an array with 10,000 elements:

benchmark(10000, 0)
0 duplicates
Running each test once. Test will take about 4 minutes.
Ryan is similar to Sergio
Sergio is similar to Cary_set
Cary_set is similar to Cary_diff
Cary_diff is faster than Chris by 400x ± 100.0
Chris is faster than Naveed by 3x ± 0.1

benchmark(10000, 1)
1 duplicates
Running each test once. Test will take about 1 second.
Cary_set is similar to Cary_diff
Cary_diff is similar to Sergio
Sergio is similar to Ryan
Ryan is faster than Chris by 2x ± 1.0
Chris is faster than Naveed by 2x ± 1.0

benchmark(10000, 10)
10 duplicates
Running each test once. Test will take about 11 seconds.
Cary_set is similar to Cary_diff
Cary_diff is faster than Sergio by 3x ± 1.0 (results differ: AAE vs AAA)
Sergio is similar to Ryan
Ryan is faster than Chris by 20x ± 10.0
Chris is faster than Naveed by 3x ± 1.0

benchmark(10000, 100)
100 duplicates
Cary_set is similar to Cary_diff
Cary_diff is faster than Sergio by 11x ± 10.0 (results differ: ADG vs ACL)
Sergio is similar to Ryan
Ryan is similar to Chris
Chris is faster than Naveed by 3x ± 1.0

Note that find_a_dup_using_difference(arr) would be much more efficient if Array#difference were implemented in C, which would be the case if it were added to the Ruby core.

#6

Ruby Array objects have a great method, select.

select {|item| block } → new_ary
select → an_enumerator

The first form is what interests you here. It allows you to select objects which pass a test.

Ruby Array objects have another method, count.

count → int
count(obj) → int
count { |item| block } → int

In this case, you are interested in duplicates (objects which appear more than once in the array). The appropriate test is a.count(obj) > 1.

If a = ["A", "B", "C", "B", "A"], then

a.select{|item| a.count(item) > 1}.uniq
=> ["A", "B"]

You state that you only want one object. So pick one.

#7

I know this thread is about Ruby specifically, but I landed here looking for how to do this within the context of Ruby on Rails with ActiveRecord and thought I would share my solution too.

class ActiveRecordClass < ActiveRecord::Base
  #has two columns, a primary key (id) and an email_address (string)
end

ActiveRecordClass.group(:email_address).having("count(*) > 1").count.keys

The above returns an array of all email addresses that are duplicated in this example's database table (which in Rails would be "active_record_classes").

#8

find_all() returns an array containing all elements of enum for which block is not false.

To get duplicate elements

>> arr = ["A", "B", "C", "B", "A"]
>> arr.find_all { |x| arr.count(x) > 1 }

=> ["A", "B", "B", "A"]

Or the unique duplicate elements

>> arr.find_all { |x| arr.count(x) > 1 }.uniq
=> ["A", "B"]

#9

Alas, most of the answers are O(n^2).

Here is an O(n) solution,

a = %w{the quick brown fox jumps over the lazy dog}
h = Hash.new(0)
a.find { |each| (h[each] += 1) == 2 } # => "the"

What is the complexity of this?

Runs in O(n) and breaks on first match
Uses O(n) memory, but only the minimal amount
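To make the early exit visible, here is a small sketch that also counts how many elements were examined before the search stopped. The helper first_dup_with_steps and its return shape are assumptions made for illustration only.

```ruby
# Hypothetical helper: returns [duplicate_or_nil, elements_examined].
def first_dup_with_steps(arr)
  seen = Hash.new(0)
  steps = 0
  dup = arr.find do |e|
    steps += 1
    (seen[e] += 1) == 2   # stop as soon as an element is seen a second time
  end
  [dup, steps]
end

a = %w{the quick brown fox jumps over the lazy dog}

p first_dup_with_steps(a)            # => ["the", 7]
p first_dup_with_steps(%w{a b c d})  # => [nil, 4]
```

With no duplicates the whole array is scanned, which is the O(n) worst case described above.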

Now, depending on how frequent duplicates are in your array, these bounds might actually be even better. For example, if the array of size n has been sampled from a population of k << n different elements, the complexity for both runtime and space becomes O(k). However, it is more likely that the original poster is validating input and wants to make sure there are no duplicates. In that case both runtime and memory complexity are O(n), since we expect the elements to have no repetitions for the majority of inputs.

#10

Something like this will work

arr = ["A", "B", "C", "B", "A"]
arr.inject(Hash.new(0)) { |h,e| h[e] += 1; h }.
    select { |k,v| v > 1 }.
    collect { |x| x.first }

That is, put all values into a hash where the key is the array element and the value is the number of occurrences. Then select all elements which occur more than once. Easy.
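On Ruby 2.7 or later, the counting step can probably be shortened with Enumerable#tally, which builds the same element-to-count hash. This is a sketch of an alternative, not the code from the answer, and the variable names counts and dups are chosen here for illustration.

```ruby
arr = ["A", "B", "C", "B", "A"]

# tally (Ruby 2.7+) counts occurrences: "A" and "B" map to 2, "C" to 1.
counts = arr.tally

# Keep only the elements that occur more than once.
dups = counts.select { |_, v| v > 1 }.keys

p dups  # => ["A", "B"]
```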

#11

a = ["A", "B", "C", "B", "A"]
a.each_with_object(Hash.new(0)) {|i,hash| hash[i] += 1}.select{|_, count| count > 1}.keys

This is an O(n) procedure.

Alternatively, you can use either of the following lines. They are also O(n), but use only one iteration:

a.each_with_object(Hash.new(0).merge dup: []){|x,h| h[:dup] << x if (h[x] += 1) == 2}[:dup]

a.inject(Hash.new(0).merge dup: []){|h,x| h[:dup] << x if (h[x] += 1) == 2;h}[:dup]

#12

If you are comparing two different arrays (instead of one against itself) a very fast way is to use the intersect operator & provided by Ruby's Array class.

# Given
a = ['a', 'b', 'c', 'd']
b = ['e', 'f', 'c', 'd']

# Then this...
a & b # => ['c', 'd']
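One caveat worth sketching: & reports elements common to two arrays, but it does not reveal duplicates inside a single array, because intersecting an array with itself just yields its unique elements. The variable name dups_source below is made up for this example.

```ruby
a = ['a', 'b', 'c', 'd']
b = ['e', 'f', 'c', 'd']

p a & b                      # => ["c", "d"]

dups_source = ['A', 'B', 'C', 'B', 'A']
p dups_source & dups_source  # => ["A", "B", "C"] (same as dups_source.uniq)
```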

#13

Here is my take on it for a big set of data, such as a legacy dBase table, to find duplicate parts:

# Assuming ps is an array of 20000 part numbers & we want to find duplicates
# actually had to do it recently.
# having a result hash with part number and number of times part is 
# duplicated is much more convenient in the real world application
# Takes about 6  seconds to run on my data set
# - not too bad for an export script handling 20000 parts

h = {};

# or for readability

h = {} # result hash
ps.select{ |e| 
  ct = ps.count(e) 
  h[e] = ct if ct > 1
}; nil # so that the huge result of select doesn't print in the console
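Because ps.count(e) rescans the whole array for every element, the approach above is O(n^2). A single counting pass should build the same result hash in O(n). The sketch below assumes a tiny made-up ps array in place of the real 20000 part numbers.

```ruby
# Hypothetical stand-in for the real list of part numbers.
ps = %w[P100 P200 P100 P300 P200 P100]

# One pass to count every part number.
counts = Hash.new(0)
ps.each { |e| counts[e] += 1 }

# Keep only the parts that occur more than once.
h = counts.select { |_, ct| ct > 1 }

p h.to_a  # => [["P100", 3], ["P200", 2]]
```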

#14

r = [1, 2, 3, 5, 1, 2, 3, 1, 2, 1]

r.group_by(&:itself).map { |k, v| v.size > 1 ? [k] + [v.size] : nil }.compact.sort_by(&:last).map(&:first)
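Since the one-liner above comes without explanation, here is a step-by-step sketch of what it appears to do: group equal values, keep the groups with more than one member, and return the duplicated values ordered from least to most frequent. The intermediate names groups and pairs are introduced only for this illustration.

```ruby
r = [1, 2, 3, 5, 1, 2, 3, 1, 2, 1]

# Group equal values together (Object#itself needs Ruby 2.2+).
groups = r.group_by(&:itself)

# Turn each group into [value, count] and drop values seen only once.
pairs = groups.map { |k, v| [k, v.size] }.select { |_, n| n > 1 }
p pairs   # => [[1, 4], [2, 3], [3, 2]]

# Sort by count (ascending) and keep just the values.
result = pairs.sort_by(&:last).map(&:first)
p result  # => [3, 2, 1]
```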

#15

each_with_object is your friend!

input = [:bla,:blubb,:bleh,:bla,:bleh,:bla,:blubb,:brrr]

# to get the counts of the elements in the array:
> input.each_with_object({}){|x,h| h[x] ||= 0; h[x] += 1}
=> {:bla=>3, :blubb=>2, :bleh=>2, :brrr=>1}

# to get only the counts of the non-unique elements in the array:
> input.each_with_object({}){|x,h| h[x] ||= 0; h[x] += 1}.reject{|k,v| v < 2}
=> {:bla=>3, :blubb=>2, :bleh=>2}

#16

a = ["A", "B", "C", "B", "A"]
b = a.select {|e| a.count(e) > 1}.uniq
c = a - b
d = b + c

Results

 d
=> ["A", "B", "C"]

#17

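def firstRepeatedWord(string)
  h_data = Hash.new(0)
  string.split(" ").each { |x| h_data[x] += 1 }
  # Pick the most frequent word, but return nil when no word occurs more than once.
  word, count = h_data.max_by { |_, v| v }
  count.to_i > 1 ? word : nil
end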

#18

[1,2,3].uniq!.nil? => true
[1,2,3,3].uniq!.nil? => false

Note that the above is destructive: uniq! modifies the array in place.
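If the array must stay untouched, a non-destructive check along the same lines could compare sizes before and after uniq. The helper name duplicates? is an assumption for this sketch. Like the uniq! check, it only reports whether a duplicate exists and does not return it.

```ruby
# Hypothetical helper: true when arr contains at least one repeated element.
def duplicates?(arr)
  arr.uniq.size != arr.size   # uniq returns a new array, arr is not modified
end

a = [1, 2, 3, 3]

p duplicates?(a)          # => true
p a                       # => [1, 2, 3, 3] (unchanged)
p duplicates?([1, 2, 3])  # => false
```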
