Regex to extract city names (.NET)

Time: 2022-09-23 21:24:02

I am looking for an expression to extract city names from addresses. I am trying to use it in WebHarvy, which uses the .NET flavor of regex.

Example address

1234 Savoy Dr Ste 123
New Houston, TX 77036-3320

or

1234 Savoy Dr Ste 510
Texas, TX 77036-3320

So the city name can be one or two words.

The expression I am trying is

(\w|\w\s\w)+(?=,\s\w{2})

When I try this on RegexStorm it seems to work fine, but when I use it in WebHarvy, it only captures the 'n' from the city name New Houston and the 'n' from Austin.

Where am I going wrong?

1 Answer

#1

In WebHarvy, if a regex contains a capturing group, the contents of that group are returned. Thus, you do not need a lookahead.

Another point is that you need to match one or more word characters, optionally followed by a run of whitespace and one or more further word characters. Your regex contains a repeated capturing group whose contents are overwritten on each iteration, so once a match is found, Group 1 contains only the last character it captured, n.
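
The overwriting can be reproduced outside WebHarvy. Below is a minimal sketch in Python, whose `re` module also reports only the last iteration of a repeated capturing group; the helper name `whole_match_and_group` is made up for illustration and is not part of WebHarvy or .NET.

```python
import re

# The pattern from the question: a repeated capturing group plus a lookahead.
QUESTION_PATTERN = re.compile(r"(\w|\w\s\w)+(?=,\s\w{2})")

def whole_match_and_group(line):
    """Return (whole match, Group 1) for the first match in `line`, or None."""
    m = QUESTION_PATTERN.search(line)
    if m is None:
        return None
    # group(0) is the entire match; group(1) holds only the LAST repetition.
    return m.group(0), m.group(1)

print(whole_match_and_group("New Houston, TX 77036-3320"))
print(whole_match_and_group("Texas, TX 77036-3320"))
```

The whole match is the full city name, which would explain why a tester that displays the whole match looks correct, while a tool that returns Group 1 shows a single character.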

Use

(\w+(?:[^\S\r\n]+\w+)?),\s\w{2}

See the regex demo here

The [^\S\r\n]+ part matches one or more whitespace characters other than CR and LF, so the city name cannot span a line break. Alternatively, you may use [\p{Zs}\t]+ to match one or more horizontal whitespace characters.
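
As a rough check of the suggested pattern (written with balanced parentheses), here is a minimal Python sketch run against the two sample addresses from the question. The function name `extract_city` is hypothetical. Python's built-in `re` does not support `\p{Zs}`, so only the `[^\S\r\n]+` variant is shown.

```python
import re

# Group 1 wraps the whole city name, so no repetition overwrites it.
# [^\S\r\n]+ means "whitespace that is not CR or LF": the name stays on one line.
CITY_PATTERN = re.compile(r"(\w+(?:[^\S\r\n]+\w+)?),\s\w{2}")

def extract_city(address):
    """Return the one- or two-word city name before ', XX', or None."""
    m = CITY_PATTERN.search(address)
    return m.group(1) if m else None

print(extract_city("1234 Savoy Dr Ste 123\nNew Houston, TX 77036-3320"))
print(extract_city("1234 Savoy Dr Ste 510\nTexas, TX 77036-3320"))
```

Because the optional second word must be separated by horizontal whitespace only, the suite number at the end of the first line is never joined to the city name on the second line.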
