How to count duplicate elements in a Ruby array


68

I have a sorted array:

[
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

I would like to get something like this, though it does not have to be a hash:

[
  {:error => 'FATAL <error title="Request timed out.">', :count => 2},
  {:error => 'FATAL <error title="There is insufficient system memory to run this query.">', :count => 1}
]

Answers:


131

The following code prints what you asked for. I'll let you decide how to actually use it to build the hash you are looking for:

# sample array
a=["aa","bb","cc","bb","bb","cc"]

# make the hash default to 0 so that += will work correctly
b = Hash.new(0)

# iterate over the array, counting duplicate entries
a.each do |v|
  b[v] += 1
end

b.each do |k, v|
  puts "#{k} appears #{v} times"
end

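Note: I just noticed that you said the array is already sorted. The code above does not require sorted input. Taking advantage of that property might produce faster code.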
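
Because the question's array is already sorted, equal elements sit next to each other, and one way to use that property is Enumerable#chunk, which groups consecutive runs. The following is only a minimal sketch: the variable names `errors` and `counts` are illustrative, and whether this is actually faster than the hash approach would need to be measured.

```ruby
# Sample data in the same shape as the question's sorted array.
errors = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

# chunk groups *consecutive* elements for which the block returns the same
# value, so the counts are only correct when equal elements are adjacent
# (for example, when the input is sorted).
counts = errors.chunk { |e| e }.map do |error, run|
  { :error => error, :count => run.size }
end

counts.each { |row| puts "#{row[:count]}: #{row[:error]}" }
```

This yields an array of hashes in the :error/:count layout the question asks for, without building an intermediate counting hash.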


I don't actually need to print it; the hash alone did the trick. Thanks!
泽利科菲

3
I know I'm late, but wow. Hash defaults. That's a really cool trick. Thanks!
抹茶

4
And if you want to find the maximum number of occurrences (and do it in a single line): a.inject(Hash.new(0)) { |hash, val| hash[val] += 1; hash }.entries.max_by { |entry| entry.last } .... gotta love it!
codecraig

2
You should learn Enumerable to avoid a procedural coding style.
菲尔·皮罗

68

You can do this very concisely (in one line) with inject:

a = ['FATAL <error title="Request timed out.">',
      'FATAL <error title="Request timed out.">',
      'FATAL <error title="There is insufficient ...">']

b = a.inject(Hash.new(0)) {|h,i| h[i] += 1; h }

b.to_a.each {|error,count| puts "#{count}: #{error}" }

which produces (on Ruby 1.9+, where hashes preserve insertion order):

2: FATAL <error title="Request timed out.">
1: FATAL <error title="There is insufficient ...">

12
In Ruby 1.9+, you can use each_with_object instead of inject: a.each_with_object(Hash.new(0)) { |o, h| h[o] += 1 }
Andrew Marshall

1
@Andrew - thanks, I prefer the name each_with_object because it matches the other similar method names on Ruby enumerables better.
马特·哈金斯

Note that each_with_object simplifies the code slightly, because the block does not need to return the accumulator.
the Tin Man
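
To make the point in the comments above concrete, here is a small side-by-side sketch of the two calls. The variable names `with_inject` and `with_object` are only illustrative.

```ruby
a = ["aa", "bb", "cc", "bb", "bb", "cc"]

# inject: the block's return value becomes the accumulator for the next
# iteration, so the hash has to be returned explicitly (the trailing "; h").
with_inject = a.inject(Hash.new(0)) { |h, i| h[i] += 1; h }

# each_with_object: the same object is handed to every iteration and is
# returned at the end, so the block's return value is ignored.
# Note that the block arguments come in the opposite order: (element, memo).
with_object = a.each_with_object(Hash.new(0)) { |i, h| h[i] += 1 }

puts with_inject == with_object  # prints "true"
```

Dropping the "; h" from the inject version would make the block return an Integer, and the next iteration would then fail when it tries to index into it with a string.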

29

If you have an array like this:

words = ["aa","bb","cc","bb","bb","cc"]

and you need to count the duplicate elements, a one-line solution is:

result = words.each_with_object(Hash.new(0)) { |word,counts| counts[word] += 1 }
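
If the array-of-hashes layout from the question is wanted rather than a plain counting hash, the result can be mapped into it afterwards. This is a sketch only: the name `rows` is made up for illustration, and the :error/:count keys are simply borrowed from the question.

```ruby
words = ["aa", "bb", "cc", "bb", "bb", "cc"]

# Count occurrences exactly as in the one-liner above.
result = words.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }

# Turn every key/value pair into a small hash, mirroring the layout
# shown in the question.
rows = result.map { |word, count| { :error => word, :count => count } }

rows.each { |row| puts "#{row[:error]} appears #{row[:count]} times" }
```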

23

A different way to solve the problem above is to use Enumerable#group_by.

[1, 2, 2, 3, 3, 3, 4].group_by(&:itself).map { |k,v| [k, v.count] }.to_h
# {1=>1, 2=>2, 3=>3, 4=>1}

Broken down into its separate method calls:

a = [1, 2, 2, 3, 3, 3, 4]
a = a.group_by(&:itself) # {1=>[1], 2=>[2, 2], 3=>[3, 3, 3], 4=>[4]}
a = a.map { |k,v| [k, v.count] } # [[1, 1], [2, 2], [3, 3], [4, 1]]
a = a.to_h # {1=>1, 2=>2, 3=>3, 4=>1}

Enumerable#group_by was added in Ruby 1.8.7; the &:itself shorthand additionally requires Ruby 2.2 or later.


2
I like (&:itself), it is just right.
丹·贝查德

One-line elegance. This should be the accepted answer!
zor-el

17

How about:

things = [1, 2, 2, 3, 3, 3, 4]
things.uniq.map{|t| [t,things.count(t)]}.to_h

It feels cleaner and says more directly what we are actually trying to do.

I suspect that for large collections it also performs better than the approaches that iterate over every value.

Benchmark:

a = (1...1000000).map { rand(100)}
                       user     system      total        real
inject                 7.670000   0.010000   7.680000 (  7.985289)
array count            0.040000   0.000000   0.040000 (  0.036650)
each_with_object       0.210000   0.000000   0.210000 (  0.214731)
group_by               0.220000   0.000000   0.220000 (  0.218581)

So it is considerably faster, at least on this input, which contains only 100 distinct values.
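
The post does not show the script behind these numbers, so the following is only a guess at how such a comparison could be set up with Ruby's standard Benchmark module. The data sizes, the labels and the extra "many distinct" data set are assumptions added for illustration; that second set is there because uniq plus count rescans the array once per distinct value.

```ruby
require 'benchmark'

# Random sample data; the sizes are arbitrary and smaller than in the post.
few_distinct  = Array.new(100_000) { rand(100) }    # at most 100 distinct values
many_distinct = Array.new(5_000)   { rand(5_000) }  # mostly distinct values

# The four approaches discussed in this thread, keyed by a display label.
approaches = {
  'inject'           => lambda { |a| a.inject(Hash.new(0)) { |h, v| h[v] += 1; h } },
  'array count'      => lambda { |a| a.uniq.map { |t| [t, a.count(t)] }.to_h },
  'each_with_object' => lambda { |a| a.each_with_object(Hash.new(0)) { |v, h| h[v] += 1 } },
  'group_by'         => lambda { |a| a.group_by(&:itself).map { |k, v| [k, v.count] }.to_h }
}

[['few distinct', few_distinct], ['many distinct', many_distinct]].each do |label, data|
  puts label
  Benchmark.bm(18) do |x|
    approaches.each { |name, fn| x.report(name) { fn.call(data) } }
  end
end
```

Timings depend heavily on the Ruby version and on how many distinct values the data contains, so no expected numbers are given here.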


Don't things.uniq and things.count(t) each iterate over the array?
桑索什

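It is entirely possible that they do in the implementation, so maybe I have described this wrongly. Either way, the performance improvement is real, I think...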
Carpela

8

Personally I would do it this way:

# myprogram.rb
a = ['FATAL <error title="Request timed out.">',
'FATAL <error title="Request timed out.">',
'FATAL <error title="There is insufficient system memory to run this query.">']
puts a

Then run the program and pipe its output to uniq -c:

ruby myprogram.rb | uniq -c

Output:

 2 FATAL <error title="Request timed out.">
 1 FATAL <error title="There is insufficient system memory to run this query.">

8

From Ruby >= 2.2 you can use itself (Hash#transform_values additionally requires Ruby 2.4 or later): array.group_by(&:itself).transform_values(&:count)

In more detail:

array = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
];

array.group_by(&:itself).transform_values(&:count)
 => { "FATAL <error title=\"Request timed out.\">"=>2,
      "FATAL <error title=\"There is insufficient system memory to run this query.\">"=>1 }

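On Ruby 2.7 or newer, which postdates this answer, Enumerable#tally gives the same counts in a single call. The sketch below compares it with the group_by version; the names `counts` and `same` are illustrative only.

```ruby
array = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

# Enumerable#tally (Ruby 2.7+) returns a hash of element => occurrence count.
counts = array.tally

# The group_by / transform_values version from the answer, for comparison.
same = array.group_by(&:itself).transform_values(&:count)

puts counts == same  # prints "true"
```
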

3
a = [1,1,1,2,2,3]
a.uniq.inject([]){|r, i| r << { :error => i, :count => a.select{ |b| b == i }.size } }
=> [{:count=>3, :error=>1}, {:count=>2, :error=>2}, {:count=>1, :error=>3}]

4
Oh, don't do that. You are re-iterating through the whole array for every value!
格伦麦当劳

There are good solutions up there. I just want to mention that Array#count exists: a = [1,1,1,2,2,3]; a.uniq.inject([]) { |r, i| r << { :error => i, :count => a.count(i) } }
罗纳德先生

1

If you want to use this often, I suggest doing the following:

# lib/core_extensions/array/duplicates_counter
module CoreExtensions
  module Array
    module DuplicatesCounter
      def count_duplicates
        self.each_with_object(Hash.new(0)) { |element, counter| counter[element] += 1 }.sort_by{|k,v| -v}.to_h
      end
    end
  end
end

Load it:

Array.include CoreExtensions::Array::DuplicatesCounter

It can then be used anywhere:

the_ar = %w(a a a a a a a  chao chao chao hola hola mundo hola chao cachacho hola)
the_ar.count_duplicates
{
           "a" => 7,
        "chao" => 4,
        "hola" => 4,
       "mundo" => 1,
    "cachacho" => 1
}

0

A simple implementation:

(errors_hash = {}).default = 0
array_of_errors.each { |error| errors_hash[error] += 1 }

2
That first line can be written more clearly as errors_hash = Hash.new(0)
the Tin Man

0

Here is a sample array:

a=["aa","bb","cc","bb","bb","cc"]
  1. Select all the unique keys.
  2. For each key, collect its occurrences into a hash, to get something like this: {'bb' => ['bb', 'bb']}
    res = a.uniq.inject({}) { |accu, uni| accu.merge({ uni => a.select { |i| i == uni } }) }
    {"aa"=>["aa"], "bb"=>["bb", "bb", "bb"], "cc"=>["cc", "cc"]}

Now you can do things like:

res['aa'].size 

-3
def find_most_occurred_item(arr)
    return 'Array has unique elements already' if arr.uniq == arr
    # Count how often each element occurs.
    m = arr.inject(Hash.new(0)) { |h, v| h[v] += 1; h }
    # The highest count among all elements.
    max_count = m.values.max
    # Describe every element that reaches the highest count (ties included).
    most = m.select { |_, v| v == max_count }
    most.map { |k, v| "#{k} appears #{v} times" }.join("\n")
end

puts find_most_occurred_item([1, 2, 3,4,4,4,3,3])
Licensed under cc by-sa 3.0 with attribution required.