Square brackets in Python regular expressions (re.sub)

Time: 2020-12-13 21:44:52

The Issue

I'm migrating wiki pages from the FlexWiki engine to the FOSwiki engine using Python regular expressions to handle the differences between the two engines' markup languages.

The FlexWiki markup and the FOSwiki markup, for reference.

Most of the conversion works very well, except for renamed links (links whose displayed text differs from their target). Both wikis support renamed links in their markup.

For example, FlexWiki uses:

"Link To Wikipedia":[http://www.wikipedia.org/]

FOSwiki uses:

[[http://www.wikipedia.org/][Link To Wikipedia]]

Both produce a hyperlink that displays "Link To Wikipedia" but points to http://www.wikipedia.org/.

I'm using the regular expression

renameLink = re.compile ("\"(?P<linkText>[^\"]+)\":\[(?P<linkTarget>[^\[\]]+)\]")

to parse the link elements out of the FlexWiki markup. Run against something like

"Link Text":[LinkTarget]

it reliably produces the groups

<linkText> = Link Text
<linkTarget> = LinkTarget
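To check what the pattern captures on its own, the named groups can be inspected directly. A minimal sketch, using the question's pattern rewritten as a raw string so that Python does not interpret any backslashes before `re` sees them:

```python
import re

# The question's renamed-link pattern, written as a raw string.
renameLink = re.compile(r'"(?P<linkText>[^"]+)":\[(?P<linkTarget>[^\[\]]+)\]')

match = renameLink.search('"Link Text":[LinkTarget]')
groups = match.groupdict()  # maps each named group to its captured text

print(groups["linkText"])    # Link Text
print(groups["linkTarget"])  # LinkTarget
```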

My issue occurs when I try to use re.sub to insert the parsed content into the FOSwiki markup.

My experience with regular expressions isn't anything to write home about, but I'm under the impression that, given the groups

<linkText> = Link Text
<linkTarget> = LinkTarget

a line like

line = renameLink.sub ( "[[\g<linkTarget>][\g<linkText>]]" , line )

should produce

[[LinkTarget][Link Text]]

However, the output written to the text files is

[[LinkTarget [[Link Text]]

which breaks the renamed links.

After a little bit of fiddling I managed a workaround, where

line = renameLink.sub ( "[[\g<linkTarget>][ [\g<linkText>]]" , line )

produces

[[LinkTarget][ [[Link Text]]

which, when displayed in FOSwiki, looks like

[[Link Text

This WORKS, but isn't very pretty.

There are probably thousands of instances of these renamed links in the pages I'm trying to convert, so fixing it by hand isn't any good. For the record I've run the script under Python 2.5.4 and Python 2.7.3, and gotten the same results.

Am I missing something really obvious with the syntax? Or is there an easy workaround?

Solution

There wasn't anything wrong with the original expression.

I started going through the other regexes in my script and commented out lines I thought might overlap with the renamed-link expression. That appears to have done the trick, and as a semi-permanent fix I've split the link-focused expressions and the other expressions into separate scripts, which I run one after another.

I guess the moral here is to double-check that you don't have overlapping expressions.
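One way to find the rule that clobbers the link output, without commenting lines out, is to apply each substitution in order and record the text after every step. In this sketch only `rename_link` comes from the question; `single_bracket_link` is a hypothetical rule (for example, one converting plain `[Target]` links) invented to show how a later pass can rewrite brackets that an earlier pass produced:

```python
import re

# Ordered (name, compiled pattern, replacement) rules.
# rename_link is the question's pattern; single_bracket_link is hypothetical.
RULES = [
    ("rename_link",
     re.compile(r'"(?P<linkText>[^"]+)":\[(?P<linkTarget>[^\[\]]+)\]'),
     r"[[\g<linkTarget>][\g<linkText>]]"),
    ("single_bracket_link",
     re.compile(r"\[([^\[\]]+)\]"),
     r"[[\1]]"),
]


def trace_rules(text, rules):
    """Apply each rule in order and record the text after every step."""
    steps = [("input", text)]
    for name, pattern, replacement in rules:
        text = pattern.sub(replacement, text)
        steps.append((name, text))
    return steps


if __name__ == "__main__":
    for name, result in trace_rules('"Link Text":[LinkTarget]', RULES):
        print("%-20s %s" % (name, result))
```

In this trace the link conversion is correct after the first step, and the damage only appears at the second one, where `[LinkTarget]` and `[Link Text]` are each wrapped again. Running bracket-related rules in a separate pass, or making later patterns skip text already in `[[...][...]]` form, avoids this kind of interaction.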

Attempted Solutions (Just see Solution above)

String addition

line = renameLink.sub ( "[[\g<linkTarget>]" + "[\g<linkText>]]" , line )

produces

[[LinkTarget [[Link Text]]

It doesn't matter how you slice the concatenation, the result is the same.

Escaping the square brackets, e.g.

line = renameLink.sub ( "\[\[\g<linkTarget>\]\[\g<linkText>\]\]" , line )

produces

\[ [[LinkTarget\]] [Link Text\]\]
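Run in isolation, the square brackets in a `re.sub` replacement template need no escaping: they are ordinary characters there, and a backslash in front of a bracket is kept as a literal backslash. A minimal check, using the question's pattern (with the `linkText` group name) as raw strings; `to_foswiki` is a hypothetical helper showing that a function can build the replacement instead of a template:

```python
import re

renameLink = re.compile(r'"(?P<linkText>[^"]+)":\[(?P<linkTarget>[^\[\]]+)\]')
line = '"Link Text":[LinkTarget]'

# "Escaping" brackets in the template only inserts literal backslashes.
escaped = renameLink.sub(r"\[\[\g<linkTarget>\]\[\g<linkText>\]\]", line)


def to_foswiki(match):
    """Hypothetical helper: build the FOSwiki link from the named groups."""
    return "[[%s][%s]]" % (match.group("linkTarget"), match.group("linkText"))


# A callable replacement bypasses template parsing entirely.
converted = renameLink.sub(to_foswiki, line)

print(escaped)    # \[\[LinkTarget\]\[Link Text\]\]
print(converted)  # [[LinkTarget][Link Text]]
```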

3 solutions

#1

a line like

line = renameLink.sub ( "[[\g<linkTarget>][\g<linkText>]]" , line )

should produce

[[LinkTarget][Link Text]]

And it does. Example:

import re
line = '"Link Text":[LinkTarget]'
renameLink = re.compile(r'"(?P<linkText>[^"]+)":\[(?P<linkTarget>[^\[\]]+)\]')
print(renameLink.sub(r"[[\g<linkTarget>][\g<linkText>]]", line))

Output:

[[LinkTarget][Link Text]]

You probably have an issue somewhere other than this expression.

#2

FlexWiki-to-FOSwiki

Code:

import re
text = '"Link To Wikipedia":[http://www.wikipedia.org/]'
print(re.sub(r'"([^"]+)":\[([^\]]+)\]', r'[[\2][\1]]', text))

Output:

[[http://www.wikipedia.org/][Link To Wikipedia]]

See and test the code here.

#3

I tried exactly what you described, using Python 2.7.1.

Here is the result

>>> import re
>>> text = '"Link To Wikipedia":[http://www.wikipedia.org/]'
>>> renameLink = re.compile ("\"(?P<linkText>[^\"]+)\":\[(?P<linkTarget>[^\[\]]+)\]")
>>> s = renameLink.match(text)
>>> lnkname, lnk = s.groups()
>>> substr = "[[%s][%s]]" % (lnk, lnkname)
>>> renameLink.sub(substr, text)
'[[http://www.wikipedia.org/][Link To Wikipedia]]'

It works fine.