Python 3.4: importing CSV files with different delimiters

Time: 2022-09-23 07:46:58

I'm using Python 3.4 and trying to import CSV files. Some of them use commas as delimiters, others semicolons, and others tabs.

Is it possible to let Python detect the proper delimiter to use? I have read the post "python: import csv file (delimiter “;” or “,”)" but cannot get the appropriate result.

My code thus far:

import csv

class Data(object):
    def __init__(self, csv_file):
        self.raw_data = []
        self.read(csv_file)

    def read(self, csv_file):
        with open(csv_file, newline='') as csvfile:
            dialect = csv.Sniffer().sniff(csvfile.read(), delimiters=',;')
            csvfile.seek(0)
            f = csv.reader(csvfile, dialect)
            for row in f:
                self.raw_data.append(row)
            print(self.raw_data)

mycsv = Data('comma_separate.csv')

comma_separate.csv contains:

afsfaf@faf.com, $161,321, True, 1
asafasf@fafa.net, $95.00, False, 3
adaafa3@aca.com, $952025, False, 3

Right now my output is:

['afsfaf@faf.com, $161,321, True, 1'], ['asafasf@fafa.net, $95.00, False, 3'], ['adaafa3@aca.com, $952025, False, 3']

My desired output is:

['afsfaf@faf.com', '$161,321', 'True', '1'], ['asafasf@fafa.net', '$95.00', 'False', '3'], ['adaafa3@aca.com', '$952025', 'False', '3']

2 solutions

#1

The problem seems to be the first line of your CSV file, which is used to determine the delimiter. The program works as expected if you change that line to:

afsfaf@faf.com, $161.321, True, 1

I guess the reason is that the sniffer expects the same number of fields on every line of your CSV file. The comma inside $161,321 gives the first line one more comma than the other lines.
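The mismatch is easy to check by counting commas per line. This is only a minimal sketch: the sample lines are copied from the question, and the variable names are illustrative.

```python
# Lines of comma_separate.csv as posted in the question.
original = [
    "afsfaf@faf.com, $161,321, True, 1",
    "asafasf@fafa.net, $95.00, False, 3",
    "adaafa3@aca.com, $952025, False, 3",
]

# Same data with the first amount written as $161.321.
modified = ["afsfaf@faf.com, $161.321, True, 1"] + original[1:]

# Number of commas on each line: a consistent delimiter should give
# the same count everywhere.
original_counts = [line.count(",") for line in original]
modified_counts = [line.count(",") for line in modified]

print(original_counts)  # [4, 3, 3]
print(modified_counts)  # [3, 3, 3]
```

With the original first line the counts differ, so a comma does not look like a consistent delimiter for that file.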

#2

Using sniff() without passing the possible delimiters works for me:

import csv

class Data(object):
    def __init__(self, csv_file):
        self.raw_data = []
        self.read(csv_file)

    def read(self, csv_file):
        with open(csv_file, newline='') as csvfile:
            # Let the sniffer consider every character as a possible delimiter
            dialect = csv.Sniffer().sniff(csvfile.read())
            csvfile.seek(0)
            f = csv.reader(csvfile, dialect)
            for row in f:
                self.raw_data.append(row)

            print(csvfile.name)
            print(self.raw_data)


for f in ['tab_separate.tsv', 'comma_separate.csv', 'comma_separate2.csv']:
    mycsv = Data(f)

Output:

tab_separate.tsv
[['afsfaf@faf.com', '$161,321', 'True', '1'], ['asafasf@fafa.net', '$95.00', 'False', '3'], ['adaafa3@aca.com', '$952025', 'False', '3']]
comma_separate.csv
[['afsfaf@faf.com,', '$161,321,', 'True,', '1'], ['asafasf@fafa.net,', '$95.00,', 'False,', '3'], ['adaafa3@aca.com,', '$952025,', 'False,', '3']]
comma_separate2.csv
[['afsfaf@faf.com', '$161,321', 'True', '1'], ['asafasf@fafa.ne', '$95.00', 'False', '3'], ['adaafa3@aca.com', '$952025', 'False', '3']]
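Note that in the comma_separate.csv output above every field except the last keeps a trailing comma, which suggests the sniffer picked the space rather than the comma as the delimiter. In that file the real separator is the two-character sequence comma plus space, and the csv module only accepts one-character delimiters. A minimal sketch that sidesteps csv for this one layout, assuming no field ever contains ", " itself (the variable names are illustrative):

```python
# Lines of comma_separate.csv as posted in the question.
lines = [
    "afsfaf@faf.com, $161,321, True, 1",
    "asafasf@fafa.net, $95.00, False, 3",
    "adaafa3@aca.com, $952025, False, 3",
]

# Split on comma followed by a space, so the comma inside "$161,321"
# (which has no space after it) stays part of the field.
rows = [line.split(", ") for line in lines]

print(rows)
```

This yields the rows asked for in the question, but it is not a general CSV parser: it ignores quoting and depends on the space after each separating comma.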

Comma input:

afsfaf@faf.com, $161,321, True, 1
asafasf@fafa.net, $95.00, False, 3
adaafa3@aca.com, $952025, False, 3

Tab input:

afsfaf@faf.com  $161,321    True    1
asafasf@fafa.net    $95.00  False   3
adaafa3@aca.com $952025 False   3

Semicolon input:

afsfaf@faf.com;$161,321;True;1
asafasf@fafa.ne;$95.00;False;3
adaafa3@aca.com;$952025;False;3
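If the sniffer keeps guessing wrong, one alternative is to test a short list of candidate delimiters yourself and keep the one that splits every line into the same number of fields. The following is only a sketch of that idea, not part of the csv module: `detect_delimiter` and its `candidates` parameter are hypothetical names, and the samples are the inputs shown above.

```python
import csv


def detect_delimiter(sample, candidates=(",", ";", "\t")):
    """Return the candidate that splits every non-empty line into the
    same number of fields (more than one), or None if none does."""
    best, best_width = None, 1
    lines = sample.splitlines()
    for delimiter in candidates:
        # Set of distinct field counts produced by this delimiter.
        widths = {len(row) for row in csv.reader(lines, delimiter=delimiter) if row}
        if len(widths) == 1:
            width = widths.pop()
            # Prefer the delimiter that produces the most columns.
            if width > best_width:
                best, best_width = delimiter, width
    return best


semicolon_sample = (
    "afsfaf@faf.com;$161,321;True;1\n"
    "asafasf@fafa.ne;$95.00;False;3\n"
    "adaafa3@aca.com;$952025;False;3\n"
)
tab_sample = (
    "afsfaf@faf.com\t$161,321\tTrue\t1\n"
    "asafasf@fafa.net\t$95.00\tFalse\t3\n"
    "adaafa3@aca.com\t$952025\tFalse\t3\n"
)
comma_sample = (
    "afsfaf@faf.com, $161,321, True, 1\n"
    "asafasf@fafa.net, $95.00, False, 3\n"
    "adaafa3@aca.com, $952025, False, 3\n"
)

print(repr(detect_delimiter(semicolon_sample)))  # ';'
print(repr(detect_delimiter(tab_sample)))        # '\t'
print(repr(detect_delimiter(comma_sample)))      # None
```

The comma file from the question returns None because the unquoted comma in $161,321 makes the first line one field longer than the rest. Quoting that value in the file, or writing the amount without a thousands separator, removes the ambiguity.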
