I'm trying to use Python's CSV sniffer tool, as suggested in many answers, to guess whether a given CSV file is delimited by ; or ,.
It works fine with basic files. But when a value contains a delimiter, the value is surrounded by double quotes (as the standard requires), and the sniffer throws _csv.Error: Could not determine delimiter.
Has anyone experienced that before?
Here is a minimal failing CSV file:
column1,column2
0,"a, b"
And the proof of concept:
Python 3.5.1 (default, Dec 7 2015, 12:58:09)
[GCC 5.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> f = open("example.csv", "r")
>>> f.seek(0);
0
>>> csv.Sniffer().sniff(f.read(), delimiters=';,')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/csv.py", line 186, in sniff
    raise Error("Could not determine delimiter")
_csv.Error: Could not determine delimiter
I have total control over the generation of the input CSV file, but sometimes it is modified by a third party using MS Office and the delimiter is replaced by semicolons, so I have to use this guessing approach. I know I could stop using commas in the input file, but I would first like to know whether I'm doing something wrong.
1 solution
#1
12
You are giving the sniffer too much input. Your sample file does work if you run:
csv.Sniffer().sniff(f.readline())
which uses only the header row to determine the delimiter character. If you want to understand why the Sniffer heuristics fail for more data, there is no substitute for reading the csv.py library source code.
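For illustration, the header-only approach can be wrapped in a small helper. This is only a sketch: guess_delimiter, its candidates and default parameters, and the fallback to a default delimiter are hypothetical names and choices, not part of the csv module. It also assumes the header row itself contains no quoted fields with embedded delimiters, and the sniffer's behavior may differ between Python versions.

```python
import csv
import io


def guess_delimiter(text, candidates=";,", default=","):
    """Guess the delimiter by sniffing only the first (header) line.

    Hypothetical helper: the name, parameters, and fallback are
    illustrative assumptions, not an API of the csv module.
    """
    lines = text.splitlines()
    header = lines[0] if lines else ""
    try:
        # Restrict the sniffer to the candidate delimiters we expect.
        return csv.Sniffer().sniff(header, delimiters=candidates).delimiter
    except csv.Error:
        # The sniffer could not decide (e.g. single-column header),
        # so fall back to the caller-supplied default.
        return default


# The minimal failing file from the question, as an in-memory string.
sample = 'column1,column2\n0,"a, b"\n'

delimiter = guess_delimiter(sample)

# Parse the whole file with the guessed delimiter; the quoted field
# keeps its embedded comma.
rows = list(csv.reader(io.StringIO(sample), delimiter=delimiter))
```

With a file on disk, the same idea would be to read the first line with f.readline(), sniff it, then call f.seek(0) before handing the file to csv.reader.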