使用Python IF特定数据解析CSV文件

时间:2022-12-14 18:23:31

The data files look like this:

数据文件如下所示:

"2015","21","2","RICK","D","w","1","1","f","8","","00","","","","","S"
"2015","56","5","RICK","E","g","1","1","k","8","","15","","","","","F"

I need to add the third field to total only IF the last field is "S". Otherwise, the line is just skipped.

如果最后一个字段是“S”,我需要将第三个字段添加到total。否则,只是跳过该行。

I tried importing CSV and using the following:

我尝试导入CSV并使用以下内容:

for line in csv.reader(file, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True):
if line[16] == "S":
    total = total + line[2]

This tells me "IndexError: list index out of range". Maybe there is a better way. I thought Import CSV would do most the work for me. What's the best way? At this point, I would take any way that works.

这告诉我“IndexError:列表索引超出范围”。也许有更好的方法。我认为Import CSV会为我做大部分工作。什么是最好的方式?在这一点上,我会采取任何有效的方式。

Printing a line displays the following:

打印一行显示以下内容:

['"2015"', '"43"', '"2"', '"ZETA"', '"W"', '"x"', '"1"', '"1"', '"d"', '"2"', '""', '"31"', '""', '""', '""', '""', '"N"']

4 个解决方案

#1


It's possible that the file does not consistently have 17 columns. One way for this to happen would be if there is an extra newline at the end of the file.

该文件可能不会始终具有17列。发生这种情况的一种方法是在文件末尾有一个额外的换行符。

Here's how to detect which line causes the problem.

以下是检测哪条线导致问题的方法。

reader = csv.reader(file, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True)
for line_num, line in enumerate(reader, start=1):
    try:
        if line[16] == "S":
            total = total + line[2]
    except IndexError:
        # show offending line
        print(line_num, line)
        # reraise to halt execution
        raise

#2


pandas can do this easily:

熊猫可以很容易地做到这一点:

In [52]:
# read the csv into a dataframe
df = pd.read_csv(r'c:\data\sample.txt', quotechar="\"", header=None)
df
Out[52]:
     0   1   2     3  4  5   6   7  8   9   10  11  12  13  14  15 16
0  2015  21   2  RICK  D  w   1   1  f   8 NaN   0 NaN NaN NaN NaN  S
1  2015  56   5  RICK  E  g   1   1  k   8 NaN  15 NaN NaN NaN NaN  F
In [55]:
# we can filter the values and then call count()
df.loc[df[16] == 'S',16].count()
Out[55]:
1
In [56]:
# we can also show the count for all unique values
df[16].value_counts()
Out[56]:
S    1
F    1
dtype: int64

#3


=Assigns values from right side operands to left side operand

=将右侧操作数的值分配给左侧操作数

if line[16] = "S": should be if line[16] == "S":

如果line [16] =“S”:应该是line [16] ==“S”:

hzhang@dell-work ~ $ cat sample.csv 
"2015","21","2","RICK","D","w","1","1","f","8","","00","","","","","S"
"2015","56","5","RICK","E","g","1","1","k","8","","15","","","","","F"
hzhang@dell-work ~ $ cat test.py 
import csv
with open("sample.csv", "rb") as csvfile:
    csvreader = csv.reader(csvfile, delimiter=",")
    total = 0
    for line in csvreader:
        if line[16] =="S":
            total = total + int(line[2])

    print "total is:{}".format(total)
hzhang@dell-work ~ $ python test.py 
total is:2

Based on your code:

根据您的代码:

import csv
file = open("sample.csv")
total = 0
for line in csv.reader(file, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True):
    if line[16] == "S":
        total = total + int(line[2])

file.close()
print "total:{}".format(total)
hzhang@dell-work ~ $ python test.py 
total:2

Please makes sure all your input lines have 17 fields, and converts 3rd column of each before you sum them up.

请确保所有输入行都有17个字段,并在总结之前转换每个字段的第3列。

to check which lines do not have 17 fields. if len(line) != 17: print line

检查哪些行没有17个字段。如果len(line)!= 17:打印行

#4


You might consider using negative array indices to access items from the ends of arrays:

您可以考虑使用负数组索引来访问数组末尾的项目:

total = 0
for line in cvs.reader(...):
    if line[-1] == "S":
        total += int(line[2])

#1


It's possible that the file does not consistently have 17 columns. One way for this to happen would be if there is an extra newline at the end of the file.

该文件可能不会始终具有17列。发生这种情况的一种方法是在文件末尾有一个额外的换行符。

Here's how to detect which line causes the problem.

以下是检测哪条线导致问题的方法。

reader = csv.reader(file, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True)
for line_num, line in enumerate(reader, start=1):
    try:
        if line[16] == "S":
            total = total + line[2]
    except IndexError:
        # show offending line
        print(line_num, line)
        # reraise to halt execution
        raise

#2


pandas can do this easily:

熊猫可以很容易地做到这一点:

In [52]:
# read the csv into a dataframe
df = pd.read_csv(r'c:\data\sample.txt', quotechar="\"", header=None)
df
Out[52]:
     0   1   2     3  4  5   6   7  8   9   10  11  12  13  14  15 16
0  2015  21   2  RICK  D  w   1   1  f   8 NaN   0 NaN NaN NaN NaN  S
1  2015  56   5  RICK  E  g   1   1  k   8 NaN  15 NaN NaN NaN NaN  F
In [55]:
# we can filter the values and then call count()
df.loc[df[16] == 'S',16].count()
Out[55]:
1
In [56]:
# we can also show the count for all unique values
df[16].value_counts()
Out[56]:
S    1
F    1
dtype: int64

#3


=Assigns values from right side operands to left side operand

=将右侧操作数的值分配给左侧操作数

if line[16] = "S": should be if line[16] == "S":

如果line [16] =“S”:应该是line [16] ==“S”:

hzhang@dell-work ~ $ cat sample.csv 
"2015","21","2","RICK","D","w","1","1","f","8","","00","","","","","S"
"2015","56","5","RICK","E","g","1","1","k","8","","15","","","","","F"
hzhang@dell-work ~ $ cat test.py 
import csv
with open("sample.csv", "rb") as csvfile:
    csvreader = csv.reader(csvfile, delimiter=",")
    total = 0
    for line in csvreader:
        if line[16] =="S":
            total = total + int(line[2])

    print "total is:{}".format(total)
hzhang@dell-work ~ $ python test.py 
total is:2

Based on your code:

根据您的代码:

import csv
file = open("sample.csv")
total = 0
for line in csv.reader(file, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True):
    if line[16] == "S":
        total = total + int(line[2])

file.close()
print "total:{}".format(total)
hzhang@dell-work ~ $ python test.py 
total:2

Please makes sure all your input lines have 17 fields, and converts 3rd column of each before you sum them up.

请确保所有输入行都有17个字段,并在总结之前转换每个字段的第3列。

to check which lines do not have 17 fields. if len(line) != 17: print line

检查哪些行没有17个字段。如果len(line)!= 17:打印行

#4


You might consider using negative array indices to access items from the ends of arrays:

您可以考虑使用负数组索引来访问数组末尾的项目:

total = 0
for line in cvs.reader(...):
    if line[-1] == "S":
        total += int(line[2])