I have built a "Currency Tagger" in Python which identifies all currency expressions and replaces them with a tagged string.
Example,
replace "I have $20 in my pocket"
with "I have <Currency>$20</Currency> in my pocket"
One of the tasks requires me to substitute the string identified as currency with the tagged string. I am using re.sub() to do this.
It works perfectly for every form of string except those of the form "$4.4B" or "$4.4M".
I tried running a simple example in my Python console and found that re.sub() behaves inconsistently when the pattern contains a dollar sign.
For example,
>>> import re
>>> text = "I have #20 in my pocket"
>>> re.sub("#20", "$20", text)
'I have $20 in my pocket'
>>> text = "I have $20 in my pocket"
>>> re.sub("$20", "#20", text)
'I have $20 in my pocket'
In the above example, replacing "#20" with "$20" works (the first case), but replacing "$20" with "#20" leaves the text unchanged (the second case).
Any help would be greatly appreciated. This very silly bug has cropped up and is stalling major work.
2 Answers
#1
6
$ is a special character. So if you want to replace it, use
re.sub(r"\$20", "#20", text)
^^
You will have to escape it. Also use a raw string (the r prefix) to avoid escaping problems.
$ means end of string, so your regex could never match: it was looking for "20" after the end of the string.
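When the currency string comes from an earlier tagging step rather than being typed by hand, escaping it manually is error-prone. A minimal sketch of one alternative, using re.escape() to build the pattern from the literal text; the sample sentence and variable names here are illustrative assumptions, not part of the original question:

```python
import re

text = "I have $20 in my pocket"

# Unescaped: "$" is an end-of-string anchor, so "$20" never matches.
unescaped = re.sub("$20", "#20", text)

# Escaped by hand in a raw string: matches the literal "$20".
escaped = re.sub(r"\$20", "#20", text)

# re.escape() escapes every special character ("$" and "." here),
# so an arbitrary detected string can be used safely as a pattern.
literal = "$4.4B"  # hypothetical string found by the tagger
sentence = "Revenue was $4.4B last year"  # hypothetical input
tagged = re.sub(
    re.escape(literal),
    # A function replacement avoids any backslash processing
    # in the replacement text.
    lambda m: "<Currency>" + m.group(0) + "</Currency>",
    sentence,
)

print(unescaped)
print(escaped)
print(tagged)
```

Passing a function as the replacement keeps the matched text intact even if it contains characters that re.sub() would otherwise interpret in a replacement string.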
#2
0
Unless you are using regular expressions (and you don't seem to be), there is no reason to use the "re" module.
Just use the .replace() method of strings:
text.replace("#20", "$20")
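For the failing direction in the question, the same approach might look like the sketch below. It assumes the currency string is already known as plain text; str.replace() treats "$" as an ordinary character, so no escaping is needed. The variable names are illustrative:

```python
text = "I have $20 in my pocket"

# Plain substring replacement: "$" has no special meaning here.
replaced = text.replace("$20", "#20")

# The same call can wrap the currency string in tags.
tagged = text.replace("$20", "<Currency>$20</Currency>")

# Caveat: replace() matches substrings anywhere, including inside
# a longer amount such as "$200".
partial = "I have $200".replace("$20", "<Currency>$20</Currency>")

print(replaced)
print(tagged)
print(partial)
```

Because str.replace() has no notion of word boundaries, a regular expression may still be the better fit when amounts such as "$20" and "$200" can appear in the same text.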