How to select words with a specific number of characters using a regular expression

I have a text like the one below.

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum 
has been the industry's standard dummy text ever since the fivec harword 1500s, when an unknown printer 
took a galley of type and scrambled it to make a type specimen fivec harword book. It has survived not
only five centuries, but also the leap into electronic typesetting, remaining essentially 
unchanged. It was popularised in the 1960s with the release of fivec harword Letraset sheets containing 
Lorem Ipsum passages, and more recently with desktop publishing software like Aldus 
PageMaker including versions of Lorem Ipsum.

Here's what I need the regex to do:

1- Select a five-character word.

2- Select the space that follows it.

3- Select the seven-character word that follows the space.

It should capture every occurrence of the string "fivec harword". How can I do that?

3 solutions

#1

Use this one:

\b\w{5}\s\w{7}\b

Explanation:

The regular expression:

(?-imsx:\b\w{5}\s\w{7}\b)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
  \w{5}                    word characters (a-z, A-Z, 0-9, _) (5
                           times)
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  \w{7}                    word characters (a-z, A-Z, 0-9, _) (7
                           times)
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
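As a quick check of the pattern above, here is a minimal Python sketch using the standard `re` module. The question does not say which regex flavor is in use, so Python is an assumption, and the `sample` string is a shortened excerpt of the question's text chosen for illustration.

```python
import re

# Shortened excerpt of the question's text (illustrative, not the full passage).
sample = (
    "standard dummy text ever since the fivec harword 1500s, "
    "a type specimen fivec harword book, and fivec harword Letraset sheets"
)

# Five word characters, one whitespace character, seven word characters,
# with a word boundary required on each side.
pattern = re.compile(r"\b\w{5}\s\w{7}\b")

matches = pattern.findall(sample)
print(matches)
```

Pairs such as "dummy text" or "type specimen" are skipped because the second `\b` forces the second word to be exactly seven characters long.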

#2

This should do the trick:

(^|\W)\w{5}\s\w{7}($|\W)

(^|\W) the start of the string or a non-word character

\w{5} a run of exactly 5 word characters

\s a single whitespace character

\w{7} a run of exactly 7 word characters

($|\W) the end of the string or a non-word character

If you specifically want whitespace around the string (as opposed to punctuation etc.), replace both \W with \s.
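One thing worth knowing about this approach is that `(^|\W)` and `($|\W)` consume the delimiter characters, so they become part of the match. The Python sketch below (Python is an assumption, as the question names no flavor) shows that effect, a capturing-group variant that returns only the two words, and a lookaround variant that is offered here as an alternative and is not part of the original answer. The variable names are illustrative.

```python
import re

# The whole match includes the surrounding delimiter characters.
raw = re.search(r"(^|\W)\w{5}\s\w{7}($|\W)", "the fivec harword book").group(0)

text = "fivec harword fivec harword"

# Non-capturing delimiters plus one capturing group, so findall returns
# only the two words and not the characters around them.
consuming = re.compile(r"(?:^|\W)(\w{5}\s\w{7})(?:$|\W)")

# Zero-width alternative: lookarounds check the delimiters without consuming them.
zero_width = re.compile(r"(?<!\w)\w{5}\s\w{7}(?!\w)")

consumed_hits = consuming.findall(text)
zero_width_hits = zero_width.findall(text)

print(repr(raw))
print(consumed_hits)
print(zero_width_hits)
```

Because the consuming version uses up the space after the first occurrence, it finds only one of the two back-to-back occurrences in `text`, while the lookaround version finds both.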

#3

Try this:

\b[a-zA-Z]{5}\s[a-zA-Z]{7}\b

\b a word boundary

[a-zA-Z] any ASCII letter

{5} exactly 5 repetitions of the preceding expression

\s a single whitespace character
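To illustrate how a letters-only class differs from `\w`, here is a small Python sketch (Python and the sample string are assumptions made for illustration). It compares a pattern of the form `\b[a-zA-Z]{5}\s[a-zA-Z]{7}\b` with the `\w`-based pattern on a string whose first five-character word contains digits.

```python
import re

# "abc12" is five word characters, but only three of them are letters.
text = "abc12 harword and fivec harword"

letters_only = re.compile(r"\b[a-zA-Z]{5}\s[a-zA-Z]{7}\b")
word_chars = re.compile(r"\b\w{5}\s\w{7}\b")

letters_hits = letters_only.findall(text)
word_hits = word_chars.findall(text)

print(letters_hits)
print(word_hits)
```

The letters-only pattern rejects "abc12 harword", so it may be the better fit when digits and underscores should not count as part of a word.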
