How to count duplicate elements in a Ruby array

68

I have a sorted array:

[
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

I would like to get something like this, though it does not have to be a hash:

[
  {:error => 'FATAL <error title="Request timed out.">', :count => 2},
  {:error => 'FATAL <error title="There is insufficient system memory to run this query.">', :count => 1}
]

ruby arrays

— Željko Filipin
source

131

The following code prints what you asked for. I'll let you decide how to actually use it to build the hash you are looking for:

# sample array
a=["aa","bb","cc","bb","bb","cc"]

# make the hash default to 0 so that += will work correctly
b = Hash.new(0)

# iterate over the array, counting duplicate entries
a.each do |v|
  b[v] += 1
end

b.each do |k, v|
  puts "#{k} appears #{v} times"
end

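Note: I just noticed that you said the array is already sorted. The code above does not require sorting. Taking advantage of that property might yield faster code.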
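To get the array-of-hashes shape from the question, the counts can be reshaped with map. Because the input is sorted, equal elements are adjacent, so counting runs directly is another option. The following is a minimal sketch, not a measured optimization: the names errors, by_hash and by_runs are illustrative, and chunk_while assumes Ruby 2.3 or later.

```ruby
errors = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

# Approach 1: count with a default-0 hash, then reshape each pair.
counts = Hash.new(0)
errors.each { |e| counts[e] += 1 }
by_hash = counts.map { |error, count| { :error => error, :count => count } }

# Approach 2: relies on the array being sorted, so duplicates are adjacent.
# chunk_while groups consecutive elements for which the block is true.
by_runs = errors.chunk_while { |a, b| a == b }
                .map { |run| { :error => run.first, :count => run.size } }
```

Both variables end up holding the same array of hashes. Whether the run-based version is actually faster would need to be benchmarked on real data.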

— 灭绝
source

I don't actually need to print it; the hash alone is enough. Thanks!

— Željko Filipin

3

I know I'm late to this, but wow. Hash default values. That is a really cool trick. Thanks!

— 抹茶

4

If you want to find the element with the most occurrences (and do it in one line): a.inject(Hash.new(0)) { |hash, val| hash[val] += 1; hash }.entries.max_by { |entry| entry.last } .... gotta love it!
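Spelled out over several lines, that one-liner might look like the sketch below. The sample array is the one from the answer; the variable names most_common and occurrences are illustrative.

```ruby
a = ["aa", "bb", "cc", "bb", "bb", "cc"]

# Build the counts hash; the block must return the hash for inject.
counts = a.inject(Hash.new(0)) { |hash, val| hash[val] += 1; hash }

# max_by on a hash yields [key, value] pairs; pick the pair with the largest count.
most_common, occurrences = counts.max_by { |_val, count| count }

puts "#{most_common} appears #{occurrences} times"
# prints "bb appears 3 times"
```

If several elements share the highest count, max_by returns only one of them.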

— codecraig

2

You should learn Enumerable to avoid a procedural coding style.

— 菲尔·皮罗

68

You can do this very concisely (in one line) using inject:

a = ['FATAL <error title="Request timed out.">',
      'FATAL <error title="Request timed out.">',
      'FATAL <error title="There is insufficient ...">']

b = a.inject(Hash.new(0)) {|h,i| h[i] += 1; h }

b.to_a.each {|error,count| puts "#{count}: #{error}" }

This will produce (on Ruby 1.9+, where hashes preserve insertion order):

2: FATAL <error title="Request timed out.">
1: FATAL <error title="There is insufficient ...">

— 弗拉德
source

12

In Ruby 1.9+, you can use each_with_object instead of inject: a.each_with_object(Hash.new(0)) { |o, h| h[o] += 1 }.

— Andrew Marshall

1

@Andrew - thanks, I prefer the name each_with_object because it matches the other similarly named methods on Ruby's Enumerable.

— Matt Huggins

Note that each_with_object simplifies the code slightly, because the block does not have to return the accumulator.
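A side-by-side sketch of the two forms, using the sample array from the accepted answer (the variable names with_inject and with_ewo are illustrative):

```ruby
a = ["aa", "bb", "cc", "bb", "bb", "cc"]

# inject: the block's return value becomes the next accumulator, hence "; h".
# Block arguments are (accumulator, element).
with_inject = a.inject(Hash.new(0)) { |h, i| h[i] += 1; h }

# each_with_object: the same object is passed on every iteration and the
# block's return value is ignored. Block arguments are (element, object).
with_ewo = a.each_with_object(Hash.new(0)) { |i, h| h[i] += 1 }
```

Both produce the same hash. Note the swapped block argument order, and that dropping "; h" from the inject version would hand an Integer rather than the hash to the next iteration.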

— the Tin Man

29

If you have an array like this:

words = ["aa","bb","cc","bb","bb","cc"]

and you need to count the duplicate elements, a one-line solution is:

result = words.each_with_object(Hash.new(0)) { |word,counts| counts[word] += 1 }

— Manish Shrivastava
source

23

A different way to approach this problem is to use Enumerable#group_by.

[1, 2, 2, 3, 3, 3, 4].group_by(&:itself).map { |k,v| [k, v.count] }.to_h
# {1=>1, 2=>2, 3=>3, 4=>1}

Broken down into the individual method calls:

a = [1, 2, 2, 3, 3, 3, 4]
a = a.group_by(&:itself) # {1=>[1], 2=>[2, 2], 3=>[3, 3, 3], 4=>[4]}
a = a.map { |k,v| [k, v.count] } # [[1, 1], [2, 2], [3, 3], [4, 1]]
a = a.to_h # {1=>1, 2=>2, 3=>3, 4=>1}

Enumerable#group_by was added in Ruby 1.8.7; Object#itself requires Ruby 2.2 or later.

— 薰
source

2

I like (&:itself); it is just right here.

— Dan Bechard

Elegant one-liner. This should be the accepted answer!

— zor-el

17

How about:

things = [1, 2, 2, 3, 3, 3, 4]
things.uniq.map{|t| [t,things.count(t)]}.to_h

It feels cleaner and more descriptive of what we are actually trying to do.

I suspect it would also perform better on large collections than the approaches that iterate over every value.

Benchmark results:

a = (1...1000000).map { rand(100)}
                       user     system      total        real
inject                 7.670000   0.010000   7.680000 (  7.985289)
array count            0.040000   0.000000   0.040000 (  0.036650)
each_with_object       0.210000   0.000000   0.210000 (  0.214731)
group_by               0.220000   0.000000   0.220000 (  0.218581)

So it is considerably faster.
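The answer does not show the benchmark harness itself. A comparison along these lines could be set up with Ruby's standard Benchmark module, as in the sketch below. The array size is reduced to keep the run short, and the lambdas are assumptions about what each label measured, not the original benchmark code.

```ruby
require 'benchmark'

# Assumption: 100_000 elements instead of the roughly 1_000_000 used above.
a = (1...100_000).map { rand(100) }

# One lambda per approach; each returns a hash of element => count.
approaches = {
  'inject'           => -> { a.inject(Hash.new(0)) { |h, v| h[v] += 1; h } },
  'array count'      => -> { a.uniq.map { |t| [t, a.count(t)] }.to_h },
  'each_with_object' => -> { a.each_with_object(Hash.new(0)) { |v, h| h[v] += 1 } },
  'group_by'         => -> { a.group_by(&:itself).map { |k, v| [k, v.count] }.to_h }
}

results = {}
Benchmark.bm(18) do |x|
  approaches.each do |label, code|
    # Keep each result so the approaches can be checked against each other.
    x.report(label) { results[label] = code.call }
  end
end
```

Timings depend heavily on the number of distinct values. a.uniq followed by one a.count per distinct value makes roughly (distinct values + 1) passes over the array, which is cheap with 100 distinct values but grows as the number of distinct values grows.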

— Carpela
source

Don't things.uniq and things.count(t) both iterate over the array?

— Santhosh

It's entirely possible that they do, so perhaps I described it incorrectly. Either way, the performance gain is real, I think...

— Carpela

8

Personally, I would do it this way:

# myprogram.rb
a = ['FATAL <error title="Request timed out.">',
'FATAL <error title="Request timed out.">',
'FATAL <error title="There is insufficient system memory to run this query.">']
puts a

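Then run the program and pipe its output through uniq -c (this works because the array is sorted; uniq only collapses adjacent identical lines):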

ruby myprogram.rb | uniq -c

Output:

 2 FATAL <error title="Request timed out.">
 1 FATAL <error title="There is insufficient system memory to run this query.">

— 担
source

8

Object#itself is available from Ruby 2.2 and Hash#transform_values from Ruby 2.4, so on Ruby >= 2.4 you can write: array.group_by(&:itself).transform_values(&:count)

In more detail:

array = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
];

array.group_by(&:itself).transform_values(&:count)
 => { "FATAL <error title=\"Request timed out.\">"=>2,
      "FATAL <error title=\"There is insufficient system memory to run this query.\">"=>1 }

— Ana María Martínez Gómez
source

8

Ruby versions >= 2.7 have Enumerable#tally.

For example:

["a", "b", "c", "b"].tally 

#=> { "a" => 1, "b" => 2, "c" => 1 }
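Applied to the array from the question, tally gives the counts directly, and a map reshapes them into the requested array of hashes. A minimal sketch assuming Ruby 2.7 or later; the names errors, counts and report are illustrative.

```ruby
errors = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

# tally returns a hash of element => number of occurrences.
counts = errors.tally

# Reshape into the array-of-hashes form asked for in the question.
report = counts.map { |error, count| { :error => error, :count => count } }
```

Here report holds one hash per distinct error, in order of first appearance.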

— Santhosh
source

3

a = [1,1,1,2,2,3]
a.uniq.inject([]){|r, i| r << { :error => i, :count => a.select{ |b| b == i }.size } }
=> [{:count=>3, :error=>1}, {:count=>2, :error=>2}, {:count=>1, :error=>3}]

— Milan Novota
source

4

Oh, don't do that. You are re-iterating over the whole array for every value!

— glenn mcdonald

There are good solutions here. I just want to mention that Array#count exists: a = [1,1,1,2,2,3]; a.uniq.inject([]) { |r, i| r << { :error => i, :count => a.count(i) } }

— 罗纳德先生

1

If you want to use it often, I suggest doing this:

# lib/core_extensions/array/duplicates_counter
module CoreExtensions
  module Array
    module DuplicatesCounter
      def count_duplicates
        self.each_with_object(Hash.new(0)) { |element, counter| counter[element] += 1 }.sort_by{|k,v| -v}.to_h
      end
    end
  end
end

Load it:

Array.include CoreExtensions::Array::DuplicatesCounter

Then use it anywhere:

the_ar = %w(a a a a a a a  chao chao chao hola hola mundo hola chao cachacho hola)
the_ar.count_duplicates
{
           "a" => 7,
        "chao" => 4,
        "hola" => 4,
       "mundo" => 1,
    "cachacho" => 1
}

— Arnold Roa
source

0

A simple implementation:

(errors_hash = {}).default = 0
array_of_errors.each { |error| errors_hash[error] += 1 }

— Evan Senter
source

2

The first line can be written more clearly as errors_hash = Hash.new(0)

— the Tin Man

0

Here is the sample array:

a=["aa","bb","cc","bb","bb","cc"]

Select all the unique keys.
For each key, accumulate the matching elements into a hash, to get something like this: {'bb' => ['bb', 'bb']}

    res = a.uniq.inject({}) { |accu, uni| accu.merge({ uni => a.select { |i| i == uni } }) }
    {"aa"=>["aa"], "bb"=>["bb", "bb", "bb"], "cc"=>["cc", "cc"]}

Now you can do things like:

res['aa'].size

— 元功夫
source

-3

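def find_most_occurred_item(arr)
  return 'Array has unique elements already' if arr.uniq == arr

  # Count the occurrences of each element.
  counts = arr.inject(Hash.new(0)) { |h, v| h[v] += 1; h }
  # Highest occurrence count; several elements may share it.
  max_count = counts.values.max
  # Return one line per element that reaches the highest count.
  counts.select { |_item, count| count == max_count }
        .map { |item, count| "#{item} appears #{count} times" }
end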

puts find_most_occurred_item([1, 2, 3,4,4,4,3,3])

— 沙杰·沙
source