Using multi-character delimiters in Python Pandas to_csv

It appears that the pandas read_csv function only allows single character delimiters/separators. Is there some way to allow for a string of characters to be used like, "*|*" or "%%" instead?

3 solutions

#1

Pandas now supports multi-character delimiters:

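import pandas as pd
# A multi-character sep is treated as a regular expression, so * and | are escaped.
pd.read_csv(csv_file, sep=r"\*\|\*", engine="python")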
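
Escaping the delimiter by hand is easy to get wrong, so here is a minimal sketch that builds the pattern with re.escape instead. The helper name read_delimited and the sample data are made up for illustration; only pd.read_csv, its sep and engine parameters, and re.escape are real.

```python
import io
import re

import pandas as pd


def read_delimited(source, delimiter, **kwargs):
    """Read text whose columns are split by a literal multi-character delimiter.

    Hypothetical helper, not part of pandas.
    """
    # pandas treats a separator longer than one character as a regular
    # expression, so re.escape makes characters such as * and | literal.
    # Regex separators are handled by the Python parsing engine.
    return pd.read_csv(source, sep=re.escape(delimiter), engine="python", **kwargs)


# Illustrative in-memory data using "%%" as the column delimiter.
sample = io.StringIO("a%%b%%c\n1%%2%%3\n4%%5%%6\n")
df = read_delimited(sample, "%%")
print(df)
```

Note that regex separators tend to ignore quoting, so this approach suits data where the delimiter never appears inside a field.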

#2

As Padraic Cunningham writes in the comment above, it's unclear why you want this. The Wiki entry for the CSV spec says the following about delimiters:

... separated by delimiters (typically a single reserved character such as comma, semicolon, or tab; sometimes the delimiter may include optional spaces),

Unsurprisingly, neither the csv module nor pandas supports what you're asking.

However, if you really want to do so, you're pretty much down to using Python's string manipulations. The following example shows how to turn the dataframe to a "csv" with $$ separating lines, and %% separating columns.

'$$'.join('%%'.join(str(r) for r in rec) for rec in df.to_records())

Of course, you don't have to turn it into a string like this prior to writing it into a file.
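
As a rough sketch of that last point, the rows can be written to a file handle one at a time instead of joining everything into one string first. The helper name write_delimited and the sample DataFrame are assumptions for illustration. Unlike the to_records() one-liner above, this version leaves the index out by passing index=False to itertuples.

```python
import io

import pandas as pd


def write_delimited(df, handle, col_sep="%%", row_sep="$$"):
    """Write df row by row using arbitrary string separators.

    Hypothetical helper, not part of pandas.
    """
    first = True
    # index=False keeps the DataFrame index out of the output.
    for row in df.itertuples(index=False):
        if not first:
            # Separate rows without leaving a trailing separator at the end.
            handle.write(row_sep)
        handle.write(col_sep.join(str(value) for value in row))
        first = False


# Illustrative data; any open text file handle works in place of StringIO.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
buffer = io.StringIO()
write_delimited(df, buffer)
print(buffer.getvalue())
```

No header row is written here, and values are converted with str(), so fields that contain the separator themselves are not escaped.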

#3

The solution would be to use read_table instead of read_csv. Suppose the file contains:

1*|*2*|*3*|*4*|*5
12*|*12*|*13*|*14*|*15
21*|*22*|*23*|*24*|*25

So, we could read this with:

pd.read_table('file.csv', header=None, sep=r'\*\|\*', engine='python')