Python: strip a string from a config file

Date: 2022-03-31 00:32:14

I have a config file that has a string which is tab separated. I want to retrieve that string and then convert it to a nice list. However, I am seeing some unexpected behavior that I do not see when I do the same thing directly in IPython.

[myvars]
myString = "a\tb\tc\td"
.....
.....<many more variables>

My Python code has this:

param_dict = dict(config.items("myvars"))
str1 = param_dict["myString"]
print str1
print str1.split()

And it prints out this:

"a\tb\tc\td"
['"a\\tb\\tc\\td"']
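
The same behavior can be reproduced on Python 3 with the standard configparser module (the Python 2 module in the question is named ConfigParser). This is a sketch, assuming a section named myvars as in the config above and using read_string in place of reading the real file; it shows that the parser returns the value exactly as written, including the double quotes and the literal backslash-t pairs:

```python
import configparser

# Raw string so that \t stays as a backslash followed by "t",
# exactly as the characters appear in the config file on disk.
CONFIG_TEXT = r'''
[myvars]
myString = "a\tb\tc\td"
'''

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# get() passes the option name through optionxform (lowercasing by default)
# before the lookup, so "myString" still finds the stored key "mystring".
value = parser.get("myvars", "myString")

print(value)          # "a\tb\tc\td"
print(repr(value))    # '"a\\tb\\tc\\td"'
print("\t" in value)  # False: no real tab characters were produced
```

Note that the double quotes are part of the value: configparser does not treat them as string delimiters.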

But when I do the same thing in my Python console, I get what I expect:

Python 2.7.6 (default, Mar 22 2014, 22:59:38) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> "a\tb\tc\td".split()
['a', 'b', 'c', 'd']
>>> k = "a\tb\tc\td"
>>> k.split()
['a', 'b', 'c', 'd']

What is going on here? Can someone help me out? I cannot change the format of the config file variables, and I want to get the variable out and split it into a nice list.

Thanks.

3 answers

#1 (score: 6)

The backslash is being read in literally here. You don't see it when you print the plain string, but you do if you print its repr:

In [11]: myString = "a\\tb\\tc\\td"

In [12]: print(myString)
a\tb\tc\td

In [13]: print(repr(myString))
'a\\tb\\tc\\td'
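
The difference between the two kinds of string can be checked directly. A minimal sketch in Python 3 syntax (the variable names are illustrative) comparing the escaped string read from the file with one containing real tabs:

```python
escaped = "a\\tb\\tc\\td"  # backslash + "t" pairs, as read from the file
raw = r"a\tb\tc\td"        # the same characters written as a raw literal
tabbed = "a\tb\tc\td"      # real tab characters, as typed at the console

print(escaped == raw)             # True
print(len(escaped), len(tabbed))  # 10 7
print(escaped.split())            # ['a\\tb\\tc\\td']
print(tabbed.split())             # ['a', 'b', 'c', 'd']
```

At the console, Python's own parser turns "\t" inside a normal string literal into a tab; text read from a file never goes through that step.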

You can use decode('string_escape') to turn each backslash-t pair (shown as \\t in the repr) into a real tab character (\t):

In [14]: myString.decode('string_escape')
Out[14]: 'a\tb\tc\td'

Once they are tabs you can split on them:

In [15]: myString.split()
Out[15]: ['a\\tb\\tc\\td']

In [16]: myString.decode('string_escape').split()
Out[16]: ['a', 'b', 'c', 'd']
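
decode('string_escape') only exists in Python 2; str has no decode method in Python 3. A sketch of the same idea for Python 3, assuming the value is plain ASCII (the unicode_escape codec treats the input as Latin-1, so non-ASCII text would be mangled) and also removing the surrounding double quotes that appear in the question's output. The helper name parse_tab_list is hypothetical:

```python
import codecs


def parse_tab_list(value):
    """Turn a quoted, backslash-escaped config value into a list of fields."""
    # Drop the double quotes that the config parser keeps as part of the value.
    unquoted = value.strip().strip('"')
    # Interpret backslash escapes such as \t as real characters.
    # Assumes ASCII input: unicode_escape decodes the bytes as Latin-1.
    unescaped = codecs.decode(unquoted, "unicode_escape")
    # Split on the (now real) tab characters only.
    return unescaped.split("\t")


print(parse_tab_list('"a\\tb\\tc\\td"'))  # ['a', 'b', 'c', 'd']
```

Splitting on "\t" rather than calling split() with no argument keeps empty fields (two adjacent tabs give an empty string); use split() if you would rather collapse them.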

#2 (score: 3)

From what I see, you are mistakenly assuming that the string in your file is tab separated. It is actually separated by the two characters "\" and "t", which together are the escape-sequence representation of a tab. The repr shows this with escaped backslashes: "a\\tb" instead of "a\tb".

Because the string contains no whitespace character, split() with no arguments has nothing to split on.

You can specify a different delimiter for split, here the two characters \ and t:

str1.split("\\t")
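
Applied to the value from the question, which still carries its double quotes, split("\\t") alone leaves '"a' and 'd"' as the first and last items. A minimal Python 3 sketch that strips the quotes first and then splits on the two-character delimiter (the literal value below is copied from the question's output):

```python
str1 = '"a\\tb\\tc\\td"'  # value as returned by the config parser

# "\\t" is a two-character delimiter: a backslash followed by "t".
print(str1.split("\\t"))  # ['"a', 'b', 'c', 'd"']

# Strip the surrounding quotes first, then split.
fields = str1.strip('"').split("\\t")
print(fields)  # ['a', 'b', 'c', 'd']
```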

#3 (score: 3)

That happens because in your script you don't have "a\tb\tc\td"; you really have "a\\tb\\tc\\td". If you print "a\\tb\\tc\\td", however, it outputs a\tb\tc\td, which hides the difference:

print myString        # Output: a\tb\tc\td
print repr(myString)  # Output: 'a\\tb\\tc\\td'

You can use decode('string_escape') to convert the string from 'a\\tb\\tc\\td' to 'a\tb\tc\td' and then split it or do whatever else you need:

import re
myString = "a\\tb\\tc\\td"

# I prefer to use regular expressions to deal with strings.
# \W matches any non-word character, so this removes the tabs entirely:
joined = re.sub(r'\W', '', myString.decode('string_escape'))
print joined
# Output: abcd

# Or you can use split (on the original string, not on `joined`):
parts = myString.decode('string_escape').split()
print parts
# Output: ['a', 'b', 'c', 'd']
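
Note that re.sub(r'\W', '', ...) discards the separators (and any other non-word characters), so the field boundaries are lost. If you want to stay with regular expressions, re.split can split on the escaped sequence directly, without decoding first. A Python 3 sketch of that swapped-in alternative, again assuming the surrounding double quotes should be dropped:

```python
import re

str1 = '"a\\tb\\tc\\td"'  # value as returned by the config parser

# The pattern r'\\t' matches a literal backslash followed by "t";
# strip('"') removes the quotes the parser kept.
fields = re.split(r'\\t', str1.strip('"'))
print(fields)  # ['a', 'b', 'c', 'd']

# Also accept real tabs, in case the file ever contains them:
mixed = 'a\tb\\tc'
print(re.split(r'\\t|\t', mixed))  # ['a', 'b', 'c']
```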