Regex for handling dots (.) in a word

Time: 2022-07-15 16:51:58

I'm having a hard time with a regex.

Its only requirement is that if there is a dot (.) in the word, there must be a letter on either side of the dot. There can be any number of dots in the word and any number of letters between the dots. There just has to be a letter on either side of each dot.

I have mostly figured it out, but I am having an issue with dots that are separated by only one letter (see the example below).

Currently I have this expression:

^(\s*[0-9A-Za-z]{1,}[.]{0,1}[0-9A-Za-z]{1,}\s*)+$

This works for the following:

  1. dot.InWord
  2. Multiple.dots.In.Word
  3. d.ot.s
  4. t.wo.Le.tt.er.sB.et.we.en.do.ts

However, it does not work for words where the dots are separated by only one letter, for example:

  1. d.o.t.s.O.n.l.y.S.e.p.e.r.a.t.e.d.B.y.O.n.e.L.e.t.t.e.r

Anyone know how I could solve this?
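
A likely explanation for the failure: each repetition of the outer group must begin with a letter (after optional whitespace). In d.o.t.s the first repetition consumes d.o, and the next repetition is then left facing .t, which it cannot start with. When two or more letters sit between dots, the engine can backtrack and split them across repetitions (t.w, then o.L, and so on), which is why the first four examples pass. The small Java harness below reproduces this with java.util.regex. The class and method names are hypothetical, chosen only for illustration, and Java string literals need every backslash doubled.

```java
import java.util.List;
import java.util.regex.Pattern;

public class OriginalPatternDemo {
    // The question's original pattern, written as a Java string literal.
    static final Pattern ORIGINAL =
        Pattern.compile("^(\\s*[0-9A-Za-z]{1,}[.]{0,1}[0-9A-Za-z]{1,}\\s*)+$");

    // True if the whole input matches the original pattern.
    static boolean accepts(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        List<String> inputs = List.of(
            "dot.InWord",
            "Multiple.dots.In.Word",
            "d.ot.s",
            "t.wo.Le.tt.er.sB.et.we.en.do.ts",
            "d.o.t.s.O.n.l.y.S.e.p.e.r.a.t.e.d.B.y.O.n.e.L.e.t.t.e.r");
        for (String s : inputs) {
            System.out.println(accepts(s) + "  " + s);
        }
    }
}
```

Run as written, the first four inputs print true and the last one prints false.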

EDIT:

BHustus's solution below is the better solution.

However, I did take what BHustus has shown and combined it with what I had before to come up with a less "confusing" pattern just in case anyone else was interested.

^(\s*[\d\w]+([.]?[\d\w]+)+\s*)+$

The key was to put the . and the word after it in their own group and repeat that group: ([.]?[\d\w]+)+
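
For comparison, here is a sketch in Java that runs the combined pattern against the test cases. Alongside it is a hypothetical alternative that expresses the same idea with the dot made mandatory inside the repeated group: \w+(?:\.\w+)* for one word, with words separated by \s+. Two caveats about the combined pattern, assuming standard java.util.regex semantics:

- [\d\w] is the same as \w, because \w already includes digits.
- Because [.]? is optional, ([.]?[\d\w]+)+ is a nested quantifier that gives the engine many ways to split a run of letters. It can therefore backtrack exponentially on long inputs that end up not matching.

The two patterns are also not identical. The combined one needs at least two characters per word, so it rejects a single-letter word such as a.

```java
import java.util.regex.Pattern;

public class DotWordPatterns {
    // The combined pattern from the edit above.
    static final Pattern COMBINED =
        Pattern.compile("^(\\s*[\\d\\w]+([.]?[\\d\\w]+)+\\s*)+$");

    // Hypothetical alternative: a word is \w+ followed by zero or more ".\w+" pieces,
    // and words are separated by whitespace. The dot is mandatory inside the repeated group.
    static final Pattern UNROLLED =
        Pattern.compile("^\\s*\\w+(?:\\.\\w+)*(?:\\s+\\w+(?:\\.\\w+)*)*\\s*$");

    static boolean combined(String s) { return COMBINED.matcher(s).matches(); }

    static boolean unrolled(String s) { return UNROLLED.matcher(s).matches(); }

    public static void main(String[] args) {
        String[] inputs = {
            "dot.InWord", "Multiple.dots.In.Word", "d.ot.s",
            "t.wo.Le.tt.er.sB.et.we.en.do.ts",
            "d.o.t.s.O.n.l.y.S.e.p.e.r.a.t.e.d.B.y.O.n.e.L.e.t.t.e.r",
            ".dot", "dot.", "dot..dot", "a"
        };
        for (String s : inputs) {
            System.out.printf("%-60s combined=%b unrolled=%b%n", s, combined(s), unrolled(s));
        }
    }
}
```

Both patterns accept the five examples from the question and reject .dot, dot. and dot..dot. They differ only on a.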

Thanks.

2 solutions

#1


([\w]+\.)+[\w]+(?=[\s]|$)

To explain:

The first group, in the parentheses, matches one or more word characters followed by one period. In most engines \w is shorthand for [A-Za-z0-9_], so it also matches an underscore, and some engines include Unicode letters as well. The + means "match the preceding item one or more times", shorthand for {1,}. After one or more cycles of [\w]+\. have matched, the final [\w]+ ensures that there is at least one word character after the last dot, and it consumes characters until it reaches a non-word character. Finally, (?=[\s]|$) is a lookahead assertion that requires either whitespace immediately ahead ([\s]) or the end of the string ($), with | being the regex "OR" operator. If the lookahead fails, the pattern does not match.

Online demo, showing all your test cases
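
The sketch below shows how this pattern behaves in Java. The class and method names are illustrative only. Because the pattern has no ^ anchor, Matcher.find() reports a match whenever a valid dotted word appears anywhere in the input. For example, x..a.b is accepted through its a.b substring, while Matcher.matches() requires the whole input to match. The outer group uses +, so at least one dot is required, and a plain word such as hello is not matched.

```java
import java.util.regex.Pattern;

public class AnswerPatternDemo {
    // The answer's pattern. In java.util.regex, \w means [a-zA-Z_0-9] (underscore included).
    static final Pattern ANSWER = Pattern.compile("([\\w]+\\.)+[\\w]+(?=[\\s]|$)");

    // find(): true if a dotted word occurs anywhere in the input.
    static boolean found(String s) { return ANSWER.matcher(s).find(); }

    // matches(): true only if the entire input is a single dotted word.
    static boolean wholeMatch(String s) { return ANSWER.matcher(s).matches(); }

    public static void main(String[] args) {
        String[] inputs = {
            "d.o.t.s.O.n.l.y.S.e.p.e.r.a.t.e.d.B.y.O.n.e.L.e.t.t.e.r",
            "hello", "snake_case.word", "x..a.b"
        };
        for (String s : inputs) {
            System.out.println(s + "  find=" + found(s) + "  matches=" + wholeMatch(s));
        }
    }
}
```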

#2


Do you have to use a Regex? The accepted answer's Regex is pretty difficult to read. How about a simple loop?

static boolean isValidWord(String str)
{
    for (int i = 0; i < str.length(); i++)
    {
        char ch = str.charAt(i);
        if (ch == '.')
        {
            if (i == 0) return false;                   // no dot at start of string
            if (i == str.length() - 1) return false;    // no dot at end of string
            if (str.charAt(i + 1) == '.') return false; // no consecutive dots
        }
        else if (!Character.isLetterOrDigit(ch))
        {
            return false; // allow only letters and digits
        }
    }
    return true;
}
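
The loop above treats the whole string as a single word, whereas the regex patterns in the question also allow several whitespace-separated words. Below is a minimal sketch of one way to extend it. Both isValidWord and isValidText are hypothetical names, and rejecting empty input is an assumed choice. This version checks the neighbours of each dot directly, which states the requirement "a letter on either side of the dot" more literally than the consecutive-dots check. Note that Character.isLetterOrDigit also accepts non-ASCII letters such as é, unlike [A-Za-z0-9].

```java
public class DotWordLoop {
    // A word is valid if it contains only letters/digits and dots,
    // and every dot has a letter or digit immediately before and after it.
    static boolean isValidWord(String word) {
        if (word.isEmpty()) return false; // assumed: empty words are rejected
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            if (ch == '.') {
                boolean before = i > 0 && Character.isLetterOrDigit(word.charAt(i - 1));
                boolean after = i < word.length() - 1
                        && Character.isLetterOrDigit(word.charAt(i + 1));
                if (!before || !after) return false;
            } else if (!Character.isLetterOrDigit(ch)) {
                return false;
            }
        }
        return true;
    }

    // Splits the trimmed text on runs of whitespace and checks every word.
    static boolean isValidText(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return false;
        for (String word : trimmed.split("\\s+")) {
            if (!isValidWord(word)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidText("dot.In Word  two.words")); // true
        System.out.println(isValidText("dot..dot"));               // false
    }
}
```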
