Python raw strings and trailing backslashes

Time: 2022-02-21 00:08:24

I ran across something once upon a time and wondered if it was a Python "bug" or at least a misfeature. I'm curious if anyone knows of any justifications for this behavior. I thought of it just now reading "Code Like a Pythonista," which has been enjoyable so far. I'm only familiar with the 2.x line of Python.

Raw strings are strings that are prefixed with an r. This is great because I can use backslashes in regular expressions and I don't need to double everything everywhere. It's also handy for writing throwaway scripts on Windows, so I can use backslashes there also. (I know I can also use forward slashes, but throwaway scripts often contain content cut&pasted from elsewhere in Windows.)

So great! Unless, of course, you really want your string to end with a backslash. There's no way to do that in a 'raw' string.

In [9]: r'\n'
Out[9]: '\\n'

In [10]: r'abc\n'
Out[10]: 'abc\\n'

In [11]: r'abc\'
------------------------------------------------
   File "<ipython console>", line 1
     r'abc\'
           ^
SyntaxError: EOL while scanning string literal


In [12]: r'abc\\'
Out[12]: 'abc\\\\'

So one backslash before the closing quote is an error, but two backslashes gives you two backslashes! Certainly I'm not the only one that is bothered by this?
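
For anyone who wants to reproduce the odd/even pattern without typing the literals by hand, here is a minimal sketch. The `parses` helper and the variable names are made up for illustration, and it runs on Python 3 rather than the 2.x used in the session above. It builds the source text of each literal and asks `compile()` whether it is a valid expression.

```python
# Hypothetical helper: does this source text parse as an expression?
def parses(source):
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

# chr(92) is a backslash; building the text this way keeps the example
# itself free of confusing escapes.
BS = chr(92)
one_trailing = "r'abc" + BS + "'"        # source text: r'abc\'
two_trailing = "r'abc" + BS * 2 + "'"    # source text: r'abc\\'
three_trailing = "r'abc" + BS * 3 + "'"  # source text: r'abc\\\'

results = {
    "one": parses(one_trailing),
    "two": parses(two_trailing),
    "three": parses(three_trailing),
}
print(results)
```

An odd number of trailing backslashes leaves the literal unterminated, while an even number parses and keeps every backslash in the value.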

Thoughts on why 'raw' strings are 'raw, except for backslash-quote'? I mean, if I wanted to embed a single quote in there I'd just use double quotes around the string, and vice versa. If I wanted both, I'd just triple quote. If I really wanted three quotes in a row in a raw string, well, I guess I'd have to deal, but is this considered "proper behavior"?

This is particularly problematic with folder names in Windows, where the backslash is the path delimiter.
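
One way to sidestep the trailing backslash for folder names is to let the path module add the separator. This is only a sketch: the folder and file names are hypothetical, and it uses `ntpath` (the Windows flavour of `os.path`) so that it behaves the same on any platform.

```python
import ntpath  # Windows flavour of os.path, importable on any platform

base = r'C:\Users\me\scripts'   # hypothetical folder pasted from Explorer
folder = ntpath.join(base, '')  # joining with '' appends the separator
script = ntpath.join(base, 'cleanup.py')  # hypothetical file name

print(folder)
print(script)
```

On Windows itself, `os.path.join` resolves to the same function, so the raw literal never has to end in a backslash.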

4 solutions

#1


18  

It's a FAQ.

And in response to "you really want your string to end with a backslash. There's no way to do that in a 'raw' string.": the FAQ shows how to work around it.

>>> r'ab\c' '\\' == 'ab\\c\\'
True
>>>
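
The comparison above relies on adjacent string literals being joined into one string. A small sketch of the same trick applied to a folder name (the path is hypothetical):

```python
# The raw literal carries the body of the path; the ordinary literal '\\'
# supplies the single trailing backslash. Adjacent literals are joined
# into one string.
folder = r'C:\temp\logs' '\\'

print(folder)
```

Only the last character is written with a doubled backslash; everything before it stays readable as a raw string.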

#2


4  

Raw strings are meant mostly for readably writing the patterns for regular expressions, which never need a trailing backslash; it's an accident that they may come in handy for Windows (where you could use forward slashes in most cases anyway -- the Microsoft C library which underlies Python accepts either form!). It's not considered acceptable to make it (nearly) impossible to write a regular expression pattern containing both single and double quotes, just to reinforce the accident in question.

("Nearly" because triple-quoting would almost always help... but it could be a little bit of a pain sometimes).

So, yes, raw strings were designed to behave that way (forbidding odd numbers of trailing backslashes), and it is considered perfectly "proper behavior" for them to respect the design decisions Guido made when he invented them;-).
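
To make the regular-expression case concrete, here is a rough sketch of a pattern that needs both kinds of quote inside a single-quoted raw string. The pattern and sample text are invented for illustration.

```python
import re

# Inside the raw literal, \' keeps the literal open and reaches the regex
# engine as backslash + quote, which the engine reads as a plain quote.
quoted_word = re.compile(r'["\'](\w+)["\']')

sample = 'say "hello" and \'bye\''
words = quoted_word.findall(sample)
print(words)
```

The backslash stays in the string, and the regex engine is what discards it. That is harmless for patterns but is exactly why a raw string cannot end in a lone backslash.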

#3


3  

Another way to work around this is:

 >>> print r"Raw \with\ trailing backslash\\"[:-1]
 Raw \with\ trailing backslash\
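
The session above uses the Python 2 `print` statement. A sketch of the same slice trick written so that it also runs on Python 3 (variable names are just for illustration):

```python
doubled = r"Raw \with\ trailing backslash\\"  # literal ends in two backslashes
trimmed = doubled[:-1]                         # slice off one of them

print(trimmed)
```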

#4


0  

Thoughts on why 'raw' strings are 'raw, except for backslash-quote'? I mean, if I wanted to embed a single quote in there I'd just use double quotes around the string, and vice versa.

But that would then raise the question as to why raw strings are 'raw, except for embedded quotes?'

You have to have some escape mechanism, otherwise you can never use the outer quote characters inside the string at all. And then you need an escape mechanism for the escape mechanism.
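
A quick sketch of what that escape mechanism does in practice (names are illustrative). In a raw string the backslash stops the quote from ending the literal, but the backslash itself is kept:

```python
quote_escaped = r'\''   # a complete literal: backslash followed by a quote
pair = r'\\'            # a complete literal: two backslashes

print(len(quote_escaped))
print(len(pair))
```

Both values are two characters long, which is why there is no way to get a single trailing backslash out of a raw literal alone.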
