I have a config file containing a string that is tab separated. I want to retrieve that string and convert it into a nice list. However, I am seeing behavior that I do not see when I do the same thing directly in IPython.
[myvars]
myString = "a\tb\tc\td"
.....
.....<many more variables>
My Python code has this:
param_dict = dict(config.items("myvars"))
str1 = param_dict["myString"]
print str1
print str1.split()
And it prints out this:
"a\tb\tc\td"
['"a\\tb\\tc\\td"']
But when I do the same thing in my Python console, I get what I expect:
Python 2.7.6 (default, Mar 22 2014, 22:59:38)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> "a\tb\tc\td".split()
['a', 'b', 'c', 'd']
>>> k = "a\tb\tc\td"
>>> k.split()
['a', 'b', 'c', 'd']
What is going on here? Can someone help me out? I cannot change the format of the config file variables, and I want to read the variable and split it into a nice list.
Thanks.
3 Answers
#1
6
The backslash is being read in as a literal character here. You don't see this when you print the plain string, but you do if you print its repr.
In [11]: myString = "a\\tb\\tc\\td"
In [12]: print(myString)
a\tb\tc\td
In [13]: print(repr(myString))
'a\\tb\\tc\\td'
On Python 2, you can use decode('string_escape') to convert the literal backslash-t sequence (shown as \\t in the repr) into a real tab (\t):
In [14]: myString.decode('string_escape')
Out[14]: 'a\tb\tc\td'
Once they are tabs you can split on them:
In [15]: myString.split()
Out[15]: ['a\\tb\\tc\\td']
In [16]: myString.decode('string_escape').split()
Out[16]: ['a', 'b', 'c', 'd']
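Note that `decode('string_escape')` on a str exists only on Python 2. As a hedged sketch for Python 3, assuming the value is pure ASCII, the `unicode_escape` codec can do the same job. The variable names below are illustrative, not taken from the question:

```python
# The value as it sits in memory: backslash and "t" are two separate characters.
my_string = "a\\tb\\tc\\td"

# Round-trip through bytes so the unicode_escape codec can interpret the
# backslash sequences. This assumes ASCII-only input; other text would
# need different handling.
decoded = my_string.encode("ascii").decode("unicode_escape")

# Now the separators are real tab characters, so splitting works.
parts = decoded.split("\t")
print(parts)
```

In the question's case the value read from the file also carries its surrounding double quotes, so those would need to be stripped first, for example with `.strip('"')`.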
#2
3
From what I can see, you are assuming that the string in your file is tab separated, when it is actually separated by the two characters "\" and "t", which are only a representation of a tab. The repr shows this with escaped backslashes: "a\\tb" instead of "a\tb".
As no whitespace character is present, split doesn't know where to split the string.
You can specify a different delimiter in split, here the two characters \ and t:
str1.split("\\t")
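One detail to watch: the value printed in the question still includes its surrounding double quotes, so splitting alone would leave a stray quote on the first and last items. A minimal sketch, assuming the value always starts and ends with a double quote (the name `raw` is illustrative):

```python
# The value as read from the config file, quotes included.
raw = '"a\\tb\\tc\\td"'

# Remove the surrounding double quotes, then split on the literal
# two-character sequence: a backslash followed by "t".
parts = raw.strip('"').split("\\t")
print(parts)
```

This approach never converts anything to a real tab, so it behaves the same on Python 2 and Python 3.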
#3
3
That happens because the value in your script is not "a\tb\tc\td" (with real tabs); it is really "a\\tb\\tc\\td" (with literal backslashes). If you print "a\\tb\\tc\\td", though, it outputs a\tb\tc\td:
print myString
Output: a\tb\tc\td
print repr(myString)
Output: 'a\\tb\\tc\\td'
You may use the decode function to convert the string from 'a\\tb\\tc\\td' to 'a\tb\tc\td', and then split it or do whatever else you need:
import re
myString = "a\\tb\\tc\\td"
# I prefer to use regular expressions to deal with strings.
# This removes every non-word character, including the tabs:
joined = re.sub(r'\W', '', myString.decode('string_escape'))
print joined
Output: abcd
# Or you can use split on the original string instead:
parts = myString.decode('string_escape').split()
print parts
Output: ['a', 'b', 'c', 'd']
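To tie this back to the config file, here is a hedged end-to-end sketch for Python 3. It assumes the value is always a validly quoted, Python-style string literal, in which case `ast.literal_eval` removes the quotes and interprets the escapes in one step. `CONFIG_TEXT` is a stand-in for the real file, which would normally be loaded with `config.read(...)`:

```python
import ast
import configparser  # this module is named ConfigParser on Python 2

# Stand-in for the real config file. The raw string keeps the backslashes
# exactly as they would appear on disk.
CONFIG_TEXT = r'''
[myvars]
myString = "a\tb\tc\td"
'''

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

# The parser returns the text verbatim: quotes and backslashes included.
raw = config.get("myvars", "myString")
print(repr(raw))

# Parse the text as a Python string literal, then split on real tabs.
parts = ast.literal_eval(raw).split("\t")
print(parts)
```

`literal_eval` raises an exception if the value is not a valid literal, so a file with unquoted values would need one of the other approaches.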