Python: counting word occurrences in a file

Time: 2022-04-04 23:54:36

I have a file in which each line contains a city name followed by a state name. I am supposed to count how many times a state name occurs and return that value.

For example, if my file contained:

Los Angeles   California
San Diego     California
San Francisco California
Albany        New York
Buffalo       New York
Orlando       Florida

I am supposed to return how many times each state name occurs. Here is what I have for California:

for line in f:  # f is the already-opened input file
    California_count = line.find("California")
    if California_count != -1:
        total = line.count("California")  # replaces total with this line's count
print(total)

This only gives me the value 1, which I assume is because "California" occurs once per line. How do I get it to return 3 instead of 1?
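
The value printed is 1 because `total` is reassigned on every matching line, so after the loop it holds only the count from the last line that contains "California". A minimal sketch of the accumulating version follows; an `io.StringIO` object stands in for the opened file so the snippet runs on its own, and in practice you would loop over the real file instead:

```python
import io

# Stand-in for the opened input file (the question does not name the file)
f = io.StringIO(
    "Los Angeles   California\n"
    "San Diego     California\n"
    "San Francisco California\n"
    "Albany        New York\n"
    "Buffalo       New York\n"
    "Orlando       Florida\n"
)

total = 0  # initialise once, before the loop
for line in f:
    # add this line's count to the running total instead of overwriting it
    total += line.count("California")

print(total)  # 3
```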

4 answers

#1

total = 0

with open('input.txt') as f:
    for line in f:
        found = line.find('California')
        # find() returns -1 when absent; a match at index 0 is also skipped
        if found != -1 and found != 0:
            total += 1

print(total)

output:

3
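
The same count can also be written as a generator expression passed to `sum()`. This sketch keeps the answer's condition (`find()` returns -1 when the word is absent and 0 when the line starts with it, and both cases are skipped), with `io.StringIO` standing in for `input.txt`:

```python
import io

# Stand-in for input.txt
f = io.StringIO(
    "Los Angeles   California\n"
    "San Diego     California\n"
    "San Francisco California\n"
    "Albany        New York\n"
    "Buffalo       New York\n"
    "Orlando       Florida\n"
)

# find() > 0 is the same test as "!= -1 and != 0"
total = sum(1 for line in f if line.find("California") > 0)
print(total)  # 3
```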

#2

Use a dictionary to store the counters:

data = """Los Angeles   California
San Diego     California
San Francisco California
Albany        New York
Buffalo       New York
Orlando       Florida""".splitlines()

counters = {}
for line in data:
    city, state = line[:14], line[14:]
    # city, state = line.split('\t') # if separated by tabulator
    if state not in counters:
        counters[state] = 1
    else:
        counters[state] += 1
print(counters)
# {'California': 3, 'New York': 2, 'Florida': 1}
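
If you read the lines from a file instead of using `splitlines()`, each `line` keeps its trailing `'\n'`, so `line[14:]` would produce keys such as `'California\n'`. A minimal sketch that strips the whitespace, assuming the same fixed 14-character city column, with `io.StringIO` standing in for the file:

```python
import io

# Stand-in for the input file; each line ends with '\n' just like a real file
f = io.StringIO(
    "Los Angeles   California\n"
    "San Diego     California\n"
    "San Francisco California\n"
    "Albany        New York\n"
    "Buffalo       New York\n"
    "Orlando       Florida\n"
)

counters = {}
for line in f:
    # the city fills the first 14 characters; strip() removes the padding and '\n'
    city, state = line[:14].strip(), line[14:].strip()
    if state not in counters:
        counters[state] = 1
    else:
        counters[state] += 1

print(counters)  # {'California': 3, 'New York': 2, 'Florida': 1}
```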

You can simplify it by using collections.defaultdict:

from collections import defaultdict
counter = defaultdict(int)
for line in data:
    city, state = line[:14], line[14:]
    counter[state] += 1

print(counter)
# defaultdict(<class 'int'>, {'California': 3, 'New York': 2, 'Florida': 1})
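
Without any import, `dict.get()` with a default of 0 gives the same effect as `defaultdict(int)`. A short sketch using the same sample lines and the same 14-character column:

```python
data = """Los Angeles   California
San Diego     California
San Francisco California
Albany        New York
Buffalo       New York
Orlando       Florida""".splitlines()

counter = {}
for line in data:
    state = line[14:]
    # get() returns 0 for a state that has not been seen yet
    counter[state] = counter.get(state, 0) + 1

print(counter)  # {'California': 3, 'New York': 2, 'Florida': 1}
```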

Or use collections.Counter with a generator expression:

from collections import Counter
states = Counter(line[14:] for line in data)
print(states)
# Counter({'California': 3, 'New York': 2, 'Florida': 1})
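
Since the question asks for the number itself, the resulting `Counter` can be indexed by state name; a name that never appears gives 0 rather than a `KeyError`, and `most_common()` lists states from most to least frequent. A short sketch ("Texas" is only an example of a state missing from the data):

```python
from collections import Counter

data = """Los Angeles   California
San Diego     California
San Francisco California
Albany        New York
Buffalo       New York
Orlando       Florida""".splitlines()

states = Counter(line[14:] for line in data)

print(states["California"])   # 3
print(states["Texas"])        # 0
print(states.most_common(1))  # [('California', 3)]
```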

#3

Assuming that the spaces in your post are meant to be tabs, the following code will give you a dict containing the counts for all of the states in the file.

#!/usr/bin/env python3

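counts = {}
with open('states.txt', 'r') as statefile:
    for line in statefile:
        # the state is the second tab-separated field; rstrip() drops the newline
        state = line.split('\t')[1].rstrip()
        if state not in counts:
            counts[state] = 1  # the first occurrence counts as 1, not 0
        else:
            counts[state] += 1
print(counts)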
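
If the file really is tab-separated, the standard-library `csv` module can do the splitting via `delimiter='\t'`. A minimal sketch, assuming exactly one tab between city and state; `io.StringIO` stands in for `states.txt` (for a real file, the `csv` documentation recommends opening it with `newline=''`):

```python
import csv
import io
from collections import Counter

# Stand-in for states.txt with tab-separated columns
f = io.StringIO(
    "Los Angeles\tCalifornia\n"
    "San Diego\tCalifornia\n"
    "San Francisco\tCalifornia\n"
    "Albany\tNew York\n"
    "Buffalo\tNew York\n"
    "Orlando\tFlorida\n"
)

# each row is a list such as ['Los Angeles', 'California'];
# rows with fewer than two fields (e.g. blank lines) are skipped
counts = Counter(row[1].strip() for row in csv.reader(f, delimiter="\t") if len(row) >= 2)

print(counts["California"])  # 3
print(dict(counts))          # {'California': 3, 'New York': 2, 'Florida': 1}
```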

#4

Alternatively, you could use the re module and match the state name with a regular expression:

import re

states = """
Los Angeles   California
San Diego     California
San Francisco California
Albany        New York
Buffalo       New York
Orlando       Florida
"""

found = re.findall('[cC]alifornia', states)

# every match is one occurrence, so the count is the length of the list
total = len(found)

print(total)
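
Note that `[cC]alifornia` also matches inside longer words such as "Californian" and only tolerates a case difference in the first letter. A sketch that uses `\b` word boundaries and `re.IGNORECASE`, and takes the count directly with `len()`:

```python
import re

states = """
Los Angeles   California
San Diego     California
San Francisco California
Albany        New York
Buffalo       New York
Orlando       Florida
"""

# \b requires a word boundary on both sides; IGNORECASE accepts any capitalisation
total = len(re.findall(r"\bcalifornia\b", states, flags=re.IGNORECASE))
print(total)  # 3
```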
