使用Python获取每个CSV列中的字符串计数

时间:2022-02-01 11:09:59

I have a CSV file like this:

我有一个像这样的CSV文件:

Header1,Header2,Header3,Header4
AA,12,ABCS,A1
BDDV,34,ABCS,BB2
ABCS,5666,gf,KK0

where a column can have only letters/words, or just numbers or both. I have multiple files like this and the columns are not necessarily the same in each. I'd like to get the counts of each element in a column that has only letters and no numbers in it.

列只能包含字母/单词,或只包含数字或两者。我有这样的多个文件,每个列中的列不一定相同。我想得到列中只包含字母而没有数字的每个元素的计数。

My desired output is

我想要的输出是

Header1- [('AA', 1),('BDDV',1),('ABCS',1)] Header3- [('ABCS', 2),('gf', 1)]

Here, though both the columns have 'ABCS', I'd like to count them separately for each column.

这里,尽管两列都有'ABCS',但我想为每列分别计算它们。

I can get the count by hardcoding the column number like below:

我可以通过硬编码列号来得到计数,如下所示:

import csv
import collections

count_number = collections.Counter()
with open('filename.csv') as input_file:
    r = csv.reader(input_file, delimiter=',')
    headers = next(r)
    for row in r:
        count_number[row[1]] += 1

print count_number.most_common()

but I'm confused on how to do it with respect to columns.

但我对如何做到这一点感到困惑。

2 个解决方案

#1


1  

This can work using a Counter for each header:

这可以使用每个标头的计数器:

#!/usr/bin/env python
from collections import Counter, defaultdict
import csv

header_counter = defaultdict(Counter)

with open('filename.csv') as input_file:
    r = csv.reader(input_file, delimiter=',')
    # read headers
    headers = next(r)
    for row in r:
        # count values for each row to add in header context
        row_val = sum([w.isdigit() for w in row])
        # zip each row with headers to know where to count
        for header, val in zip(headers, row):
            # count only non-digits
            if not any(map(str.isdigit, val)):
                header_counter[header].update({val: row_val})

for k, v in header_counter.iteritems():
    print k, v

Output:

Header3 Counter({'ABCS': 2, 'gf': 1})
Header1 Counter({'AA': 1, 'BDDV': 1, 'ABCS': 1})

#2


1  

Partial solution only (you still need to filter columns with digits on the second iteration of your CSV reader).

仅部分解决方案(您仍需要在CSV读取器的第二次迭代中过滤带有数字的列)。

import csv
import collections

with open('filename.csv') as input_file:
  r = csv.reader(input_file, delimiter=',')
  headers = next(r)
  count_number = [collections.Counter() for I in Len(headers)]

  for row in r:
    for i, val in enumerate(row):
      count_number[i][val] += 1

print [cr.most_common() for cr in count_number]

#1


1  

This can work using a Counter for each header:

这可以使用每个标头的计数器:

#!/usr/bin/env python
from collections import Counter, defaultdict
import csv

header_counter = defaultdict(Counter)

with open('filename.csv') as input_file:
    r = csv.reader(input_file, delimiter=',')
    # read headers
    headers = next(r)
    for row in r:
        # count values for each row to add in header context
        row_val = sum([w.isdigit() for w in row])
        # zip each row with headers to know where to count
        for header, val in zip(headers, row):
            # count only non-digits
            if not any(map(str.isdigit, val)):
                header_counter[header].update({val: row_val})

for k, v in header_counter.iteritems():
    print k, v

Output:

Header3 Counter({'ABCS': 2, 'gf': 1})
Header1 Counter({'AA': 1, 'BDDV': 1, 'ABCS': 1})

#2


1  

Partial solution only (you still need to filter columns with digits on the second iteration of your CSV reader).

仅部分解决方案(您仍需要在CSV读取器的第二次迭代中过滤带有数字的列)。

import csv
import collections

with open('filename.csv') as input_file:
  r = csv.reader(input_file, delimiter=',')
  headers = next(r)
  count_number = [collections.Counter() for I in Len(headers)]

  for row in r:
    for i, val in enumerate(row):
      count_number[i][val] += 1

print [cr.most_common() for cr in count_number]