Splitting a string on tabs in a file

Time: 2021-07-22 15:40:55

I have a file that contains values separated by tabs ("\t"). I am trying to create a list and store all the values from the file in it, but I have run into a problem. Here is my code.

line = "abc\tdef\tghi"
values = line.split("\t")

It works fine as long as there is only one tab between each value. But if there is more than one tab, it copies the extra tab into values as well. In my case, the extra tab will mostly be after the last value in the file. Can anybody help me with that?

5 solutions

#1


53  

You can use regex here:

>>> import re
>>> strs = "foo\tbar\t\tspam"
>>> re.split(r'\t+', strs)
['foo', 'bar', 'spam']

update:

You can use str.rstrip to get rid of trailing '\t' and then apply regex.

>>> yas = "yas\t\tbs\tcda\t\t"
>>> re.split(r'\t+', yas.rstrip('\t'))
['yas', 'bs', 'cda']
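
Since the question is about values read from a file, the same idea can be applied line by line. What follows is only a minimal sketch: the helper name `split_tabs` and the `io.StringIO` object standing in for the real file are assumptions made for illustration, not part of the original answer.

```python
import io
import re

def split_tabs(line):
    """Split a line on runs of tabs, ignoring trailing tabs and the newline."""
    # rstrip removes the trailing newline and any trailing tabs first,
    # so re.split does not produce an empty string at the end.
    return re.split(r'\t+', line.rstrip('\t\r\n'))

# io.StringIO stands in for a real file object (hypothetical sample data).
fake_file = io.StringIO("yas\t\tbs\tcda\t\t\nfoo\tbar\t\tspam\n")

values = []
for line in fake_file:
    values.extend(split_tabs(line))

print(values)  # ['yas', 'bs', 'cda', 'foo', 'bar', 'spam']
```

Note that a completely empty line would still yield `['']` with this helper, so you may want to skip blank lines before calling it.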

#2


4  

You can use regexp to do this:

>>> import re
>>> patt = re.compile(r"[^\t]+")
>>> s = "a\t\tbcde\t\tef"
>>> patt.findall(s)
['a', 'bcde', 'ef']
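
One practical difference from `re.split` is how leading and trailing tabs are handled. The short comparison below is a sketch on a made-up input string (the sample value of `s` is an assumption, chosen to start and end with a tab):

```python
import re

s = "\ta\t\tbcde\t\tef\t"

# re.split keeps an empty string wherever the input starts or ends
# with the separator.
split_result = re.split(r'\t+', s)

# findall only returns the non-empty runs of non-tab characters,
# so no stripping is needed beforehand.
findall_result = re.findall(r'[^\t]+', s)

print(split_result)    # ['', 'a', 'bcde', 'ef', '']
print(findall_result)  # ['a', 'bcde', 'ef']
```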

#3


1  

Another regex-based solution:

>>> import re
>>> strs = "foo\tbar\t\tspam"
>>> r = re.compile(r'([^\t]*)\t*')
>>> r.findall(strs)[:-1]  # the last match is always an empty string
['foo', 'bar', 'spam']

#4


0  

Split on tab, but then remove all blank matches.

text = "hi\tthere\t\t\tmy main man"
print([part for part in text.split("\t") if part != ""])

Outputs:

['hi', 'there', 'my main man']

#5


0  

Python has support for CSV files in the eponymous csv module. The name is somewhat misleading, since the module supports much more than just comma-separated values.

If you need to go beyond basic word splitting, you should take a look at it, for example because you need to deal with quoted values...
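
As a rough illustration of that suggestion, here is a sketch using `csv.reader` with `delimiter='\t'`. The sample data and the `io.StringIO` stand-in for a file are assumptions; with a real file you would pass the object returned by `open(path, newline='')` instead.

```python
import csv
import io

# Hypothetical sample data: tab-separated, with one quoted field
# that itself contains a tab.
data = 'name\tcomment\nalice\t"likes\ttabs"\n'
rows = list(csv.reader(io.StringIO(data), delimiter='\t'))
print(rows)  # [['name', 'comment'], ['alice', 'likes\ttabs']]

# The csv module does not collapse repeated tabs: each extra tab
# produces an empty field, which can be filtered out afterwards.
raw = next(csv.reader(io.StringIO("a\t\tb\t\n"), delimiter='\t'))
cleaned = [field for field in raw if field != ""]
print(raw)      # ['a', '', 'b', '']
print(cleaned)  # ['a', 'b']
```

Whether empty fields should be dropped depends on the data: in a real table an empty field may be a meaningful missing value rather than a stray tab.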

