Get all words matching a pattern from a text

Time: 2022-11-28 22:05:54

If I have a text, how can I get all the strings matching a certain pattern from it? For example, I want all the strings with this pattern:

The word CER followed by any number, followed by one of these characters: ; , . or a space.

For example, from the text below I would like to get the following results in an array: CER123, CER23, CER01, CER24

Lorem CER123 ipsum dolor sit amet, quod copiosae CER23,CER01;CER24 insolens et usu, vis CER34ERD ut saperet civibus accommodare.


I've tried this:


preg_match_all("/CER[0-9*]?/",$content,$m);

but it returns: CER1, CER2, CER0, CER2 and CER3

1 solution

#1



Use +, add a capturing group and create a character class for the characters you require after the value you need to get:


preg_match_all('/(CER[0-9]+)[;,.\s]/',$content,$m);
                 ^        ^^^^^^^^^

See the regex demo


Pattern explanation:

  • (CER[0-9]+) - Group 1, capturing the part of the text you need:
    • CER - the literal character sequence CER (NOTE: if CER must start a word, i.e. appear after a non-word char or at the start of the string, add a leading word boundary: \bCER. Do not add a trailing \b here: CER is immediately followed by digits, which are word characters, so \bCER\b[0-9]+ can never match)
    • [0-9]+ (=\d+) - 1 or more digits
  • [;,.\s] - any single char from the character class: ;, ,, . or \s (whitespace; replace \s with a literal space if you only mean a regular space).
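To see why the attempt in the question returned only one digit per match, it may help to run both quantifiers side by side. This is only an illustrative sketch; the variable names $attempt and $digits are made up here. Inside [0-9*] the * is a literal asterisk rather than a quantifier, and the ? after the class allows at most one character from it.

```php
<?php
// Sample text from the question.
$content = "Lorem CER123 ipsum dolor sit amet, quod copiosae CER23,CER01;CER24 insolens et usu, vis CER34ERD ut saperet civibus accommodare.";

// Attempt from the question: [0-9*] matches ONE digit or a literal '*',
// and '?' makes that single character optional.
preg_match_all('/CER[0-9*]?/', $content, $attempt);

// '+' placed outside the class means "one or more digits".
preg_match_all('/CER[0-9]+/', $content, $digits);

print_r($attempt[0]); // CER1, CER2, CER0, CER2, CER3
print_r($digits[0]);  // CER123, CER23, CER01, CER24, CER34
```

Note that without the trailing [;,.\s] class the second pattern still picks up CER34 from CER34ERD, which is why the answer requires a delimiter after the digits.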

PHP demo:

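$re = '/(CER[0-9]+)[;,.\s]/';
$content = "Lorem CER123 ipsum dolor sit amet, quod copiosae CER23,CER01;CER24 insolens et usu, vis CER34ERD ut saperet civibus accommodare.";
// Group 1 holds each CER value; the trailing delimiter is matched but not captured.
preg_match_all($re, $content, $m);
print_r($m[1]); // CER123, CER23, CER01, CER24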
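One possible variation, which is not part of the original answer: the pattern above needs a delimiter character after the digits, so a value at the very end of the string would be missed. The sketch below assumes you also want that case, and uses a lookahead so the delimiter is checked but not consumed, plus a leading \b so CER must start a word. The sample string is invented for illustration.

```php
<?php
// Hypothetical sample text: includes a value at the end of the string
// and one glued to a preceding letter (XCER12).
$content = "CER5 first, XCER12 skipped, then CER23,CER01;CER24 and CER34ERD, ending with CER77";

// \b            : CER must start a word, so XCER12 is not matched
// (?=[;,.\s]|$) : lookahead requiring a delimiter or the end of the string,
//                 without making it part of the match
$re = '/\bCER[0-9]+(?=[;,.\s]|$)/';

preg_match_all($re, $content, $m);
print_r($m[0]); // CER5, CER23, CER01, CER24, CER77
```

Because the lookahead consumes nothing, the whole match in $m[0] is already the value you want and no capturing group is needed.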
