How do I find the most common elements in a list?

Given the following list

['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 
 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 
 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 
 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 
 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 
 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 
 'Moon', 'to', 'rise.', '']

I am trying to count how many times each word appears and display the top 3.

However, I only want the top three among the words whose first letter is capitalized; all words that do not start with a capital letter should be ignored.

I am sure there is a better way than this, but my idea was to do the following:

1. Put the first word in the list into another list called uniquewords.
2. Delete the first word and all its duplicates from the original list.
3. Add the new first word to uniquewords.
4. Delete the first word and all its duplicates from the original list.
5. Repeat until the original list is empty.
6. Count how many times each word in uniquewords appears in the original list.
7. Find the top 3 and print them.
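
The steps above can be sketched roughly as follows. This is only an illustration of the plan as described, not a recommended approach: the function name `top_three_capitalized` and the short sample list are assumptions made for the example, and the sketch works on a copy of the list so that the original is still available for counting in the last step.

```python
def top_three_capitalized(words):
    """Follow the plan from the question: collect unique words, then count each one."""
    remaining = list(words)  # work on a copy so the original list survives
    uniquewords = []
    while remaining:
        first = remaining[0]
        uniquewords.append(first)
        # delete the first word and all of its duplicates
        remaining = [w for w in remaining if w != first]

    # keep only words whose first letter is capitalized, then count them
    counts = [(words.count(w), w) for w in uniquewords if w[:1].isupper()]
    counts.sort(key=lambda pair: pair[0], reverse=True)
    return [(w, n) for n, w in counts[:3]]


# hypothetical short sample, not the full list from the question
sample = ['Jellicle', 'Cats', 'are', 'Jellicle', 'Cats', 'and', 'And', 'Jellicle', 'and', '']
result = top_three_capitalized(sample)
print(result)  # [('Jellicle', 3), ('Cats', 2), ('And', 1)]
```

Calling `words.count` once per unique word rescans the whole list each time, which is one reason the answers below tend to count in a single pass instead.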

9 Solutions

#1

If you are using a version of Python earlier than 2.7 (which has no collections.Counter), or you have a very good reason to roll your own word counter (I'd like to hear it!), you could try the following approach using a dict.

Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> word_list = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']
>>> word_counter = {}
>>> for word in word_list:
...     if word in word_counter:
...         word_counter[word] += 1
...     else:
...         word_counter[word] = 1
... 
>>> popular_words = sorted(word_counter, key = word_counter.get, reverse = True)
>>> 
>>> top_3 = popular_words[:3]
>>> 
>>> top_3
['Jellicle', 'Cats', 'and']
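
The session above counts every word, so lowercase words such as 'and' can reach the top three. A minimal sketch of the same dict approach with the question's capital-letter filter added might look like this. The shortened `word_list` is an assumption for the example, and the sketch is written for Python 3.

```python
# hypothetical shortened sample, not the full list from the question
word_list = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,',
             'Jellicle', 'Cats', 'And', 'Jellicle', '']

word_counter = {}
for word in word_list:
    # word[:1] is '' for an empty string, so empty entries are skipped safely
    if not word[:1].isupper():
        continue
    if word in word_counter:
        word_counter[word] += 1
    else:
        word_counter[word] = 1

# sort the words by their counts, highest first, and keep three
popular_words = sorted(word_counter, key=word_counter.get, reverse=True)
top_3 = popular_words[:3]
print(top_3)  # ['Jellicle', 'Cats', 'And']
```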

Top Tip: The interactive Python interpreter is your friend whenever you want to play with an algorithm like this. Just type it in and watch it go, inspecting elements along the way.

#2

In Python 2.7 and above there is a class called Counter which can help you:

from collections import Counter
words_to_count = (word for word in word_list if word[:1].isupper())
c = Counter(words_to_count)
print(c.most_common(3))

Result:

[('Jellicle', 6), ('Cats', 5), ('And', 2)]

I am quite new to programming so please try and do it in the most barebones fashion.

You could instead do this using a dictionary with the key being a word and the value being the count for that word. First iterate over the words, adding each one to the dictionary with a count of 1 if it is not present, or else increasing its count if it is. Then, to find the top three, you can either use a simple O(n*log(n)) sorting algorithm and take the first three elements from the result, or you can use an O(n) algorithm that scans the list once, remembering only the top three elements.
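
As a rough illustration of the second option, the sketch below counts with a plain dictionary and then scans the counts once, keeping only the three best entries seen so far. The function names `count_words` and `top_three` and the sample list are assumptions for the example, written for Python 3.

```python
def count_words(words):
    """Build a word -> count dictionary in a single pass."""
    counts = {}
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts


def top_three(counts):
    """Scan the counts once, remembering only the three largest."""
    best = []  # holds at most three (count, word) pairs, largest first
    for word, count in counts.items():
        best.append((count, word))
        # sorting at most four items is constant work, so the scan stays O(n)
        best.sort(key=lambda pair: pair[0], reverse=True)
        del best[3:]
    return [(word, count) for count, word in best]


# hypothetical sample list
words = ['Jellicle', 'Cats', 'are', 'Jellicle', 'Cats', 'are', 'Jellicle', 'and']
result = top_three(count_words(words))
print(result)  # [('Jellicle', 3), ('Cats', 2), ('are', 2)]
```

The standard library's `heapq.nlargest(3, counts.items(), key=...)` does a similar job if you would rather not write the scan by hand.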

An important observation for beginners is that by using builtin classes that are designed for the purpose you can save yourself a lot of work and/or get better performance. It is good to be familiar with the standard library and the features it offers.

#3

To just return a list containing the most common words:

from collections import Counter
words = ["i", "love", "you", "i", "you", "a", "are", "you", "you", "fine", "green"]
most_common_words = [word for word, word_count in Counter(words).most_common(3)]
print(most_common_words)

This prints (the order of words with equal counts can vary between Python versions):

['you', 'i', 'a']

The 3 in "most_common(3)" specifies the number of items to return. Counter(words).most_common() returns a list of tuples, each with the word as the first member and its frequency as the second member. The tuples are ordered by frequency, from most to least common.

most_common = [item for item in Counter(words).most_common()]
print(str(most_common))

This prints:

[('you', 4), ('i', 2), ('a', 1), ('are', 1), ('green', 1), ('love', 1), ('fine', 1)]

The "word for word, word_count in" part of the list comprehension extracts only the first member of each tuple.
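
One detail worth checking on your own interpreter is how ties are ordered. The output shown above comes from an older Python; on Python 3.7 and later, elements with equal counts are returned in the order they were first encountered. A small sketch, assuming Python 3.7+ and reusing the same `words` list (the name `pairs` is introduced only for the example):

```python
from collections import Counter

words = ["i", "love", "you", "i", "you", "a", "are", "you", "you", "fine", "green"]

# each entry is a (word, count) tuple, most frequent first
pairs = Counter(words).most_common(3)

# tuple unpacking keeps the word and discards the count
most_common_words = [word for word, word_count in pairs]

print(pairs)              # [('you', 4), ('i', 2), ('love', 1)] on Python 3.7+
print(most_common_words)  # ['you', 'i', 'love'] on Python 3.7+
```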

#4

nltk is convenient for a lot of language processing stuff. It has methods for frequency distribution built in. Something like:

import nltk
fdist = nltk.FreqDist(your_list)  # creates a frequency distribution from a list
most_common = fdist.max()         # returns the single most frequent element
top_three = fdist.most_common(3)  # list of (word, count) pairs; in NLTK 3, keys() is no longer sorted by frequency

#5

Isn't it just this?

word_list=['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 
 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 
 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 
 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 
 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 
 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 
 'Moon', 'to', 'rise.', ''] 

from collections import Counter
c = Counter(word_list)
c.most_common(3)

Which should output

[('Jellicle', 6), ('Cats', 5), ('are', 3)]

#6

A simple two-line solution that does not require any extra modules is the following:

lst = ['Jellicle', 'Cats', 'are', 'black', 'and','white,',
       'Jellicle', 'Cats','are', 'rather', 'small;', 'Jellicle', 
       'Cats', 'are', 'merry', 'and','bright,', 'And', 'pleasant',    
       'to','hear', 'when', 'they', 'caterwaul.','Jellicle', 
       'Cats', 'have','cheerful', 'faces,', 'Jellicle',
       'Cats','have', 'bright', 'black','eyes;', 'They', 'like',
       'to', 'practise','their', 'airs', 'and', 'graces', 'And', 
       'wait', 'for', 'the', 'Jellicle','Moon', 'to', 'rise.', '']

lst_sorted=sorted([ss for ss in set(lst) if len(ss)>0 and ss.istitle()], 
                   key=lst.count, 
                   reverse=True)
print(lst_sorted[0:3])

Output:

['Jellicle', 'Cats', 'And']

The list comprehension in square brackets collects all unique strings in the list that are not empty and are title-cased according to str.istitle(), which for these words means they start with a capital letter. The sorted() function then orders them by how often they appear in the list (using lst.count as the key), most frequent first.
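
Note that `istitle()` is a little stricter than "first letter capitalized". The sketch below compares it with the `word[:1].isupper()` test used in another answer. The sample words 'McDonald' and 'USA' are hypothetical and do not occur in the question's list, where both tests select the same words.

```python
# hypothetical samples chosen to show where the two tests disagree
samples = ['Jellicle', 'McDonald', 'USA', 'white,', '']

for word in samples:
    # istitle(): an uppercase letter may only follow a non-letter
    # [:1].isupper(): looks at the first character only
    print(repr(word), word.istitle(), word[:1].isupper())

title_cased = [w for w in samples if w.istitle()]
capitalized = [w for w in samples if w[:1].isupper()]

print(title_cased)  # ['Jellicle']
print(capitalized)  # ['Jellicle', 'McDonald', 'USA']
```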

#7

The simple way of doing this would be (assuming your list is in 'l'):

>>> counter = {}
>>> for i in l: counter[i] = counter.get(i, 0) + 1
>>> sorted([ (freq,word) for word, freq in counter.items() ], reverse=True)[:3]
[(6, 'Jellicle'), (5, 'Cats'), (3, 'to')]

Complete sample:

>>> l = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']
>>> counter = {}
>>> for i in l: counter[i] = counter.get(i, 0) + 1
... 
>>> counter
{'and': 3, '': 1, 'merry': 1, 'rise.': 1, 'small;': 1, 'Moon': 1, 'cheerful': 1, 'bright': 1, 'Cats': 5, 'are': 3, 'have': 2, 'bright,': 1, 'for': 1, 'their': 1, 'rather': 1, 'when': 1, 'to': 3, 'airs': 1, 'black': 2, 'They': 1, 'practise': 1, 'caterwaul.': 1, 'pleasant': 1, 'hear': 1, 'they': 1, 'white,': 1, 'wait': 1, 'And': 2, 'like': 1, 'Jellicle': 6, 'eyes;': 1, 'the': 1, 'faces,': 1, 'graces': 1}
>>> sorted([ (freq,word) for word, freq in counter.items() ], reverse=True)[:3]
[(6, 'Jellicle'), (5, 'Cats'), (3, 'to')]

By simple I mean that it works in nearly every version of Python.
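
Because the sample sorts (freq, word) tuples in reverse, ties between equally frequent words are broken in reverse alphabetical order, which is why 'to' wins over 'and' and 'are'. If alphabetical tie-breaking is preferred, a sort key along these lines could be used. This is a sketch with an assumed short sample list and assumed names `ranked` and `top_3`, written for Python 3.

```python
# hypothetical short sample with a three-way tie
words = ['to', 'and', 'Cats', 'are', 'Cats', 'to', 'and', 'are', 'Cats']

counter = {}
for w in words:
    counter[w] = counter.get(w, 0) + 1

# negative count sorts the highest frequency first; the word itself breaks ties A to Z
ranked = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
top_3 = ranked[:3]
print(top_3)  # [('Cats', 3), ('and', 2), ('are', 2)]
```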

If you don't understand some of the functions used in this sample, you can always do this in the interpreter (after pasting the code above):

>>> help(counter.get)
>>> help(sorted)

#8

The answer from @Mark Byers is best, but if you are on a version of Python < 2.7 (but at least 2.5, which is pretty old these days), you can replicate the Counter class functionality very simply via defaultdict. (For Python < 2.5, three extra lines of code are needed before d[i] += 1, as in @Johnnysweb's answer.)

from collections import defaultdict
class Counter():
    ITEMS = []
    def __init__(self, items):
        d = defaultdict(int)
        for i in items:
            d[i] += 1
        self.ITEMS = sorted(d.iteritems(), reverse=True, key=lambda i: i[1])
    def most_common(self, n):
        return self.ITEMS[:n]

Then, you use the class exactly as in Mark Byers's answer, i.e.:

words_to_count = (word for word in word_list if word[:1].isupper())
c = Counter(words_to_count)
print c.most_common(3)
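
The class above targets Python 2.5 and 2.6, where `iteritems()` exists. For comparison only, the same idea might be written for Python 3 as below. The name `SimpleCounter` is made up here to avoid shadowing `collections.Counter`, and the sample list is an assumption.

```python
from collections import defaultdict


class SimpleCounter:
    """Tiny stand-in for collections.Counter, supporting only most_common()."""

    def __init__(self, items):
        counts = defaultdict(int)  # missing keys start at 0
        for item in items:
            counts[item] += 1
        # store (item, count) pairs sorted by count, highest first
        self._items = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)

    def most_common(self, n):
        return self._items[:n]


# hypothetical short sample
word_list = ['Jellicle', 'Cats', 'are', 'Jellicle', 'Cats', 'And', 'Jellicle', '']
words_to_count = (word for word in word_list if word[:1].isupper())
c = SimpleCounter(words_to_count)
print(c.most_common(3))  # [('Jellicle', 3), ('Cats', 2), ('And', 1)]
```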

#9

If you are using Counter, or have created your own Counter-style dict, and want to show the name of each item and its count, you can iterate over the most common entries like so:

from collections import Counter

# most_common(10) returns a list of (word, count) tuples
top_10_words = Counter(my_long_list_of_words).most_common(10)
# Iterate over the (word, count) tuples
for word in top_10_words:
        # print the word
        print(word[0])
        # print the count
        print(word[1])

or to iterate through this in a template:

{% for word in top_10_words %}
        <p>Word: {{ word.0 }}</p>
        <p>Count: {{ word.1 }}</p>
{% endfor %}

Hope this helps someone
