Unescaping this string using decode() and regex

Time: 2022-04-16 20:46:24

I have the following string and I'm trying to figure out the best practice for unescaping it.

The solution has to be somewhat flexible in that I'm receiving this input from an API and I can't be absolutely certain that the current character structure (\n as opposed to \r) will always be the same.

'"If it ain\'t broke, don\'t fix it." \nWent in for a detailed car wash.\nThe attendants raved-up my engine when taking the car into the tunnel. NOTE: my car is...'

This regex seems like it should work:

text_excerpt = re.sub(r'[\s"\\]', ' ', raw_text_excerpt).strip()

I've also read that decode() might work (and would be a better solution generally).

raw_text_excerpt.decode('string_unescape')

Tried something along those lines and it didn't work. Any suggestions? Is regex best here?

1 Solution

#1


The codec you're looking for is string-escape:

>>> print "\\'".decode("string-escape")
'

I'm not sure what version they added it in, though... could be an older version you're using that doesn't have it. I'm running:

Python 2.6.6 (r266:84292, Mar 25 2011, 19:36:32) 
[GCC 4.5.2] on linux2
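
Note that the string-escape codec only exists in Python 2. For anyone on Python 3, a rough equivalent is the unicode_escape codec via codecs.decode(). The sketch below is not from the original answer: the helper names unescape and normalize_whitespace are made up for illustration, the sample is a shortened copy of the string in the question, and it assumes the input is plain ASCII (unicode_escape can mangle non-ASCII characters). Collapsing whitespace afterwards is one way to avoid caring whether the API sends \n or \r.

```python
import codecs


def unescape(raw_text):
    # Hypothetical helper: turn literal backslash escapes (\' \n \r ...)
    # into the characters they stand for.
    # Assumption: raw_text is ASCII-only; unicode_escape may garble other text.
    return codecs.decode(raw_text, "unicode_escape")


def normalize_whitespace(text):
    # Hypothetical helper: collapse any run of whitespace (spaces, \n, \r, \t)
    # into a single space and trim both ends.
    return " ".join(text.split())


# Shortened sample from the question; the raw-string prefix keeps the
# backslashes as literal characters, as they would arrive from the API.
raw_text_excerpt = r'"If it ain\'t broke, don\'t fix it." \nWent in for a detailed car wash.'

text_excerpt = normalize_whitespace(unescape(raw_text_excerpt))
print(text_excerpt)
```

Doing the unescape first and the whitespace cleanup second keeps the quotes and apostrophes intact, which the character-class regex in the question would replace with spaces.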
