Converting a tab-delimited txt file to a csv file using Python

So I want to convert a simple tab-delimited text file into a csv file. If I read the txt file into a string and split it with string.split('\n'), I get a list in which each item is a string with '\t' between each column. I was thinking I could just replace the '\t' with a comma, but it won't treat the strings within the list as strings and let me use string.replace. Here is the start of my code, which still needs a way to parse the tab "\t".

import csv
import sys

txt_file = r"mytxt.txt"
csv_file = r"mycsv.csv"

in_txt = open(txt_file, "r")
out_csv = csv.writer(open(csv_file, 'wb'))

file_string = in_txt.read()

file_list = file_string.split('\n')

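for row in file_list:
    # each row is still one string with '\t' between the columns,
    # so writerow() currently writes one character per column
    out_csv.writerow(row)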
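
For comparison, the approach in the question can be made to work by splitting each line on '\t' instead of replacing it: the list items are ordinary strings, so str.split('\t') turns each one into a list of fields that csv.writer can then quote properly. This is a minimal Python 3 sketch, assuming no field contains an embedded tab or newline; the helper name tsv_lines_to_csv is hypothetical. A plain replace('\t', ',') would also silently break any field that itself contains a comma.

```python
import csv
import io


def tsv_lines_to_csv(text):
    """Convert tab-separated text to CSV text by splitting each line on '\t'.

    Assumes no field contains an embedded tab or newline.
    """
    out = io.StringIO()
    writer = csv.writer(out)  # quotes fields that contain commas, quotes, etc.
    for line in text.splitlines():
        # a list of column strings, not one string, so each column is one field
        writer.writerow(line.split('\t'))
    return out.getvalue()


sample = "name\tcity\nSmith, J.\tOslo\n"
print(tsv_lines_to_csv(sample))
```

Using splitlines() also avoids the trailing empty string that split('\n') produces when the file ends with a newline.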

2 solutions

#1

The csv module supports tab-delimited files. Supply the delimiter argument to csv.reader:

import csv

txt_file = r"mytxt.txt"
csv_file = r"mycsv.csv"

# 'with' closes both files even if the program doesn't terminate right away
# the 'b' is necessary on Windows (Python 2):
# it prevents \x1a, Ctrl-Z, from ending the stream prematurely
# and also stops Python converting to / from different line terminators
# On other platforms, it has no effect
with open(txt_file, "rb") as f_in, open(csv_file, "wb") as f_out:
    in_txt = csv.reader(f_in, delimiter='\t')
    out_csv = csv.writer(f_out)
    out_csv.writerows(in_txt)
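
The 'rb'/'wb' advice above is for Python 2. In Python 3 the csv documentation instead recommends opening files in text mode with newline='', so a Python 3 version of the same conversion might look like the sketch below. The function name tab_to_csv and the default UTF-8 encoding are assumptions; change the encoding to whatever your file actually uses.

```python
import csv


def tab_to_csv(txt_path, csv_path, encoding="utf-8"):
    """Copy a tab-delimited file to a comma-delimited CSV file (Python 3)."""
    # newline='' leaves line endings to the csv module, so quoted fields
    # containing embedded newlines survive the round trip unchanged.
    with open(txt_path, newline="", encoding=encoding) as f_in, \
         open(csv_path, "w", newline="", encoding=encoding) as f_out:
        reader = csv.reader(f_in, delimiter="\t")
        writer = csv.writer(f_out)
        writer.writerows(reader)


# Usage with the file names from the question:
# tab_to_csv("mytxt.txt", "mycsv.csv")
```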

#2

Why you should always use 'rb' mode when reading files with the csv module in Python 2:

Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.

What's in the sample file: any old rubbish, including control characters obtained by extracting blobs or whatever from a database, or injudicious use of the CHAR function in Excel formulas, or ...

>>> open('demo.txt', 'rb').read()
'h1\t"h2a\nh2b"\th3\r\nx1\t"x2a\r\nx2b"\tx3\r\ny1\ty2a\x1ay2b\ty3\r\n'

On Windows, Python 2 follows CP/M and MS-DOS conventions when it reads files in text mode: \r\n is recognised as the line separator and is served up as \n, and \x1a, aka Ctrl-Z, is recognised as an END-OF-FILE marker.

>>> open('demo.txt', 'r').read()
'h1\t"h2a\nh2b"\th3\nx1\t"x2a\nx2b"\tx3\ny1\ty2a' # WHOOPS
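
In Python 3, as far as I know, the Ctrl-Z end-of-file behaviour shown above no longer applies, because the io module reads the raw bytes and decodes them itself; text mode does, however, still translate \r\n to \n by default (universal newlines). This sketch writes the same bytes as demo.txt to a throwaway temporary file and compares the default mode with newline=''. Latin-1 is used only because it maps every byte to one character.

```python
import os
import tempfile

# Same bytes as demo.txt above, written in binary so nothing is translated.
DATA = b'h1\t"h2a\nh2b"\th3\r\nx1\t"x2a\r\nx2b"\tx3\r\ny1\ty2a\x1ay2b\ty3\r\n'

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(DATA)

with open(path, encoding="latin-1") as f:              # default: \r\n -> \n
    translated = f.read()
with open(path, encoding="latin-1", newline="") as f:  # newline='': untouched
    untranslated = f.read()

os.remove(path)
print(repr(translated))
print(repr(untranslated))
```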

Reading with the csv module from a file opened with 'rb' works as expected:

>>> import csv
>>> list(csv.reader(open('demo.txt', 'rb'), delimiter='\t'))
[['h1', 'h2a\nh2b', 'h3'], ['x1', 'x2a\r\nx2b', 'x3'], ['y1', 'y2a\x1ay2b', 'y3']]

but text mode doesn't: the \r\n inside the quoted field comes back as \n, and the last row is cut off at the \x1a:

>>> list(csv.reader(open('demo.txt', 'r'), delimiter='\t'))
[['h1', 'h2a\nh2b', 'h3'], ['x1', 'x2a\nx2b', 'x3'], ['y1', 'y2a']]
>>>
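
The Python 3 counterpart of the comparison above is newline='' versus the default newline=None. A sketch under the same assumptions as before (a temporary copy of the demo.txt bytes, Latin-1 decoding); read_rows is a hypothetical helper. With newline='' the rows should match the 'rb' result above, while the default mode turns the embedded \r\n into \n but, unlike Python 2 on Windows, keeps the row containing \x1a intact.

```python
import csv
import os
import tempfile

# The bytes of demo.txt from the session above.
DATA = b'h1\t"h2a\nh2b"\th3\r\nx1\t"x2a\r\nx2b"\tx3\r\ny1\ty2a\x1ay2b\ty3\r\n'


def read_rows(path, newline):
    """Parse a tab-delimited file with csv.reader using the given newline mode."""
    with open(path, encoding="latin-1", newline=newline) as f:
        return list(csv.reader(f, delimiter="\t"))


fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(DATA)

rows_raw = read_rows(path, newline="")            # recommended for csv in Python 3
rows_translated = read_rows(path, newline=None)   # default universal newlines
os.remove(path)

print(rows_raw)
print(rows_translated)
```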
