I have a few thousand strings that have one of these two forms:
SomeT1tle-ThatL00ks L1k3.this - $3.57 KnownWord
SomeT1tle-ThatL00ks L1k3.this - $ 3.57 KnownWord
SomeT1tle-ThatL00ks L1k3.that - 4.5% KnownWord
SomeT1tle-ThatL00ks L1k3.that - 4.5%KnownWord
The SomeT1tle-ThatL00ks L1k3.this part may contain uppercase and lowercase characters, digits, periods, dashes, and spaces. It is always followed by a space-dash-space pattern.
I want to pull out the Title (the part before the space-dash-space separator) and the Amount, which is right before KnownWord.
So for these two strings I'd like:
SomeT1tle-ThatL00ks L1k3.this, $3.57
and
SomeT1tle-ThatL00ks L1k3.that, 4.5%
This code works (using Perl-compatible regular expressions):
$my_string = "SomeT1tle-ThatL00ks L1k3.this - $3.57 KnownWord";
$pattern_title = "/^(.*?)\x20\x2d\x20/";
$pattern_amount = "/([0-9.$%]+) KnownWord$/";
preg_match_all($pattern_title, $my_string, $matches_title);
preg_match_all($pattern_amount, $my_string, $matches_amount);
echo $matches_title[1][0] . " " . $matches_amount[1][0] . "<br>";
I tried putting both patterns together:
$pattern_together_doesnt_work = "/^(.*?)\x20\x2d\x20([0-9.$%]+) KnownWord$/";
but the first part of the pattern always matches the whole thing, even with the "lazy" part (.*? rather than .*). I can't negative-match spaces and dashes, because the title itself can contain either.
Any hints?
1 Answer
#1
1
Use this pattern
/^(.*?)\x20\x2d\x20([0-9.$%]+) KnownWord$/
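For what it's worth, here is a minimal sketch of how the combined pattern might be applied. Note that index 0 of the match array always holds the entire match, so the title is in group 1 and the amount in group 2 (with preg_match_all that would be $matches[1][0] and $matches[2][0]). The helper name extract_title_and_amount is hypothetical, and the slightly loosened amount part (optional space after "$", optional space before KnownWord) is my own assumption to cover all four sample strings from the question, not part of the pattern above.

```php
<?php
// Hypothetical helper, not from the original post: returns [title, amount]
// for one input line, or null when the line does not match.
function extract_title_and_amount(string $line): ?array
{
    // Single quotes stop PHP from treating "$" as the start of a variable.
    // Group 1: title, matched lazily up to the first " - " after which
    //          the rest of the pattern can still match.
    // Group 2: amount: optional "$", optional space, digits/periods, optional "%".
    // \s* lets both "4.5% KnownWord" and "4.5%KnownWord" match (assumption).
    $pattern = '/^(.*?) - (\$? ?[0-9.]+%?)\s*KnownWord$/';

    if (preg_match($pattern, $line, $m) !== 1) {
        return null; // no match (or a regex error)
    }

    // $m[0] is the whole match; $m[1] and $m[2] are the capture groups.
    // Remove the optional space so "$ 3.57" and "$3.57" come out the same.
    return [$m[1], str_replace(' ', '', $m[2])];
}

// The four sample strings from the question.
$samples = [
    'SomeT1tle-ThatL00ks L1k3.this - $3.57 KnownWord',
    'SomeT1tle-ThatL00ks L1k3.this - $ 3.57 KnownWord',
    'SomeT1tle-ThatL00ks L1k3.that - 4.5% KnownWord',
    'SomeT1tle-ThatL00ks L1k3.that - 4.5%KnownWord',
];

foreach ($samples as $sample) {
    $result = extract_title_and_amount($sample);
    echo $result === null ? "no match\n" : $result[0] . ', ' . $result[1] . "\n";
}
```

If your real data never has the "$ 3.57" or "4.5%KnownWord" spacing variants, the stricter pattern above should be enough and the str_replace call can be dropped.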