Alternative to regex for extracting information from a string

Time: 2021-04-18 07:15:52

I'm attempting to extract information from a string which will always be in the same format.

The format will always be:

To:
                     Name here
Date:
                     26/08/2014 14:52
Order Number:
                     123456
Service Required:
                     Plumbing
Service Response:
                     48 Hour
Service Limit:
                     110.00

123 TEST ROAD
LEEDS
LS1 1HL

Contact:
                     Mr J Smith - 0777 123456
Telephone:
                     01921 123456

Work Details:

Notes here etc 

I have tried exploding the string on spaces and looping through the resulting array, but I cannot structure it in a way that gives me the information I need.

E.g. I try to retrieve "Name here" from after "To:" without also retrieving "Date:" etc. The eventual idea is to create a variable for each piece of information so I can enter it into a database.

Any help, suggestions or ideas are really welcome.

Thanks for reading.

4 solutions

#1


1  

You could use regex easily.

With this regex you can get the name:

To:\s+(.*)

The idea of this regex is to look for the key you want and capture the value that follows it. For instance, the regex above looks for To: followed by whitespace (including the line break and indentation) and stores the rest of the next line in a capturing group.

You just need to replace "To" with whichever key you want; change it to "Date" and you will get the date.

Note that this only works for single-line values.

Implementing this regex in PHP is straightforward:

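$re = '/To:\s+(.*)/';
$str = "YOUR STRING HERE";
if (preg_match($re, $str, $matches)) {
    // $matches[1] holds the first capturing group: the value after "To:"
    $name = trim($matches[1]);
}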
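To swap the key as described above, the same Key:\s+(.*) pattern can be built for several keys in a loop. This is a minimal sketch: the helper name extractField() and the shortened sample text are assumptions for illustration, and preg_quote() is used so that a key containing regex metacharacters is matched literally.

```php
<?php
// Hypothetical helper: return the single-line value that follows "$key:", or null if absent.
function extractField(string $text, string $key): ?string
{
    // preg_quote() escapes regex metacharacters in the key; '/' is the delimiter
    $pattern = '/' . preg_quote($key, '/') . ':\s+(.*)/';
    if (preg_match($pattern, $text, $matches)) {
        return trim($matches[1]);
    }
    return null;
}

// Shortened sample in the question's format (assumed)
$text = "To:\n     Name here\nDate:\n     26/08/2014 14:52\nOrder Number:\n     123456\n";

foreach (['To', 'Date', 'Order Number'] as $key) {
    echo $key . ' => ' . extractField($text, $key) . "\n";
}
```

Because (.*) stops at the end of a line, a multi-line value such as the service limit followed by the address would come back with only its first line.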

On the other hand, the address data below follows a different pattern:

123 TEST ROAD
LEEDS
LS1 1HL

That needs a different regex pattern; to fetch it you could use the following (with the m modifier, so that ^ and $ match at line boundaries):

^(\w+[\w\s]+)(?!:)$
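A quick way to try the address pattern from PHP is sketched below, on an assumed shortened sample. Because [\w\s] also matches newlines, a single match can cover the whole address block and end with a line break, hence the trim(). On the full message the same pattern would also match the unindented "Notes here etc" line, so treat this as a starting point rather than a finished extractor.

```php
<?php
// Shortened sample (assumed): labels, indented values, and the unindented address block
$data = implode("\n", [
    "Service Limit:",
    "     110.00",
    "",
    "123 TEST ROAD",
    "LEEDS",
    "LS1 1HL",
    "",
    "Contact:",
    "     Mr J Smith - 0777 123456",
]);

// 'm' lets ^ and $ match per line; indented lines fail ^\w, label lines never reach $
preg_match_all('/^(\w+[\w\s]+)(?!:)$/m', $data, $matches);

// [\w\s] also matches "\n", so this one match spans all three address lines
$address = trim($matches[1][0]);
echo $address, "\n";
```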

#2


2  

If you don't want to use a regex, and since you are looking for the content of the first field, you can use a double explode:

// Split at the first two colons, take the middle piece, then its second line (the value)
$firstfield = trim(explode("\n", explode(':', $data, 3)[1])[1]);

var_dump($firstfield);
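Going further without a regex, the whole message can be parsed line by line: an unindented line ending in a colon is treated as a label, and the non-blank lines after it are collected as that label's value. This is a minimal sketch that assumes exactly that layout; the function name parseMessage() is hypothetical. The resulting associative array can then be mapped onto database columns.

```php
<?php
// Hypothetical helper: turn the message into label => value pairs without any regex.
function parseMessage(string $data): array
{
    $fields = [];
    $current = null;
    foreach (explode("\n", str_replace("\r\n", "\n", $data)) as $line) {
        $trimmed = trim($line);
        if ($trimmed === '') {
            continue; // blank lines carry no data
        }
        if ($line === ltrim($line) && substr($trimmed, -1) === ':') {
            // An unindented line ending in ':' starts a new field
            $current = substr($trimmed, 0, -1);
            $fields[$current] = [];
        } elseif ($current !== null) {
            // Any other line (indented value or address line) belongs to the latest field
            $fields[$current][] = $trimmed;
        }
    }
    // Join multi-line values (e.g. the limit plus the address) with newlines
    return array_map(function (array $lines) {
        return implode("\n", $lines);
    }, $fields);
}

$data = "To:\n     Name here\nDate:\n     26/08/2014 14:52\nService Limit:\n     110.00\n\n"
    . "123 TEST ROAD\nLEEDS\nLS1 1HL\n\nContact:\n     Mr J Smith - 0777 123456\n"
    . "Work Details:\n\nNotes here etc ";

$fields = parseMessage($data);
print_r($fields);
```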

Otherwise, to obtain all fields and their values with a regex, you can use this:

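// ^ inside the lookahead: only a label at the start of a line ends a value,
// so the colon in "14:52" is not mistaken for the next label
$pattern = '~^(\w+(?: \w+)*):\s*(.+?)\s*(?=^(?1):|\z)~ms';

preg_match_all($pattern, $data, $m, PREG_SET_ORDER);

$results = [];
foreach ($m as $v) {
    $results[$v[1]] = $v[2]; // field name => value
}

echo $results['To'];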
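Since the goal is to put each field into a database, here is a sketch that runs the key/value pattern over a sample message and maps the labels onto column names. The column names in $columns are hypothetical placeholders, and the ^ inside the lookahead is an assumption made so that a colon inside a value (the time in "14:52") does not end the value early.

```php
<?php
// Sample message in the question's format (values indented, address unindented)
$data = implode("\n", [
    "To:", "     Name here",
    "Date:", "     26/08/2014 14:52",
    "Order Number:", "     123456",
    "Service Limit:", "     110.00", "",
    "123 TEST ROAD", "LEEDS", "LS1 1HL", "",
    "Contact:", "     Mr J Smith - 0777 123456",
    "Work Details:", "", "Notes here etc ",
]);

// Label at line start, then lazily everything up to the next line-start label or the end
$pattern = '~^(\w+(?: \w+)*):\s*(.+?)\s*(?=^(?1):|\z)~ms';
preg_match_all($pattern, $data, $m, PREG_SET_ORDER);

$results = [];
foreach ($m as $v) {
    $results[$v[1]] = $v[2];
}

// Hypothetical mapping from labels to database column names
$columns = ['To' => 'customer_name', 'Date' => 'order_date', 'Order Number' => 'order_number'];
$row = [];
foreach ($columns as $label => $column) {
    $row[$column] = $results[$label] ?? null;
}
print_r($row);
```

The $row array can then be bound to a prepared INSERT statement, one placeholder per column.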

#3


0  

A regex is not that difficult. Try this one.

 # '~(?msi)^To:\s*(.*?)\s*^Date:\s*(.*?)\s*^Order\ Number:\s*(.*?)\s*^Service\ Required:\s*(.*?)\s*^Service\ Response:\s*(.*?)\s*^Service\ Limit:\s*(.*?)\s*^Contact:\s*(.*?)\s*^Telephone:\s*(.*?)\s*^Work\ Details:\s*(.*?)\s*~'

 (?msi)
 ^ To: \s* 
 ( .*? )                            # (1)
 \s* 
 ^ Date: \s* 
 ( .*? )                            # (2)
 \s* 
 ^ Order\ Number: \s* 
 ( .*? )                            # (3)
 \s* 
 ^ Service\ Required: \s* 
 ( .*? )                            # (4)
 \s* 
 ^ Service\ Response: \s* 
 ( .*? )                            # (5)
 \s* 
 ^ Service\ Limit: \s* 
 ( .*? )                            # (6)
 \s* 
 ^ Contact: \s* 
 ( .*? )                            # (7)
 \s* 
 ^ Telephone: \s* 
 ( .*? )                            # (8)
 \s* 
 ^ Work\ Details: \s* 
 ( .*? )                            # (9)
 \s* 

Output

 **  Grp 1 -  ( pos 26 , len 9 ) 
Name here  
 **  Grp 2 -  ( pos 65 , len 16 ) 
26/08/2014 14:52  
 **  Grp 3 -  ( pos 119 , len 6 ) 
123456  
 **  Grp 4 -  ( pos 167 , len 8 ) 
Plumbing  
 **  Grp 5 -  ( pos 217 , len 7 ) 
48 Hour  
 **  Grp 6 -  ( pos 263 , len 39 ) 
110.00

123 TEST ROAD
LEEDS
LS1 1HL  
 **  Grp 7 -  ( pos 337 , len 24 ) 
Mr J Smith - 0777 123456  
 **  Grp 8 -  ( pos 396 , len 12 ) 
01921 123456  
 **  Grp 9 -  ( pos 427 , len 0 )  EMPTY 
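To use this pattern from PHP and get one variable per field, something like the sketch below should work. One change is an assumption of mine: the pattern ends with \z instead of a bare \s*, which forces group 9 to run to the end of the text so the work details are captured rather than coming back empty as in the output above. The destructuring into named variables is just one way to get a variable per field.

```php
<?php
// Full sample in the question's format (values indented, address unindented)
$data = implode("\n", [
    "To:", "     Name here",
    "Date:", "     26/08/2014 14:52",
    "Order Number:", "     123456",
    "Service Required:", "     Plumbing",
    "Service Response:", "     48 Hour",
    "Service Limit:", "     110.00", "",
    "123 TEST ROAD", "LEEDS", "LS1 1HL", "",
    "Contact:", "     Mr J Smith - 0777 123456",
    "Telephone:", "     01921 123456", "",
    "Work Details:", "", "Notes here etc ",
]);

// Same labels and order as the answer's pattern, with \z added at the end
$pattern = '~(?msi)^To:\s*(.*?)\s*^Date:\s*(.*?)\s*^Order\ Number:\s*(.*?)\s*'
    . '^Service\ Required:\s*(.*?)\s*^Service\ Response:\s*(.*?)\s*'
    . '^Service\ Limit:\s*(.*?)\s*^Contact:\s*(.*?)\s*^Telephone:\s*(.*?)\s*'
    . '^Work\ Details:\s*(.*?)\s*\z~';

if (preg_match($pattern, $data, $m)) {
    // Skip $m[0] (the whole match) and give each captured group a name
    [, $to, $date, $orderNumber, $serviceRequired, $serviceResponse,
        $serviceLimit, $contact, $telephone, $workDetails] = $m;
    echo $date, "\n", $workDetails, "\n";
}
```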

#4


0  

Later update (the work done)

Here's the fully working script. I think you'll appreciate how flexible it is!

$s = "To:
          Name here
Date:
          26/08/2014 14:52
Order Number:
          123456
Service Required:
          Plumbing
Service Response:
          48 Hour
Service Limit:
          110.00

123 TEST ROAD
LEEDS
LS1 1HL

Contact:
          Mr J Smith - 0777 123456
Telephone:
          01921 123456

Work Details:

Notes here etc ";


// Each pair is (label that starts the field, label that follows it)
$a = array(
  array("To:", "Date:"),
  array("Date:", "Order Number:"),
  array("Order Number:", "Service Required:"),
  array("Service Limit:", "Contact:"),
  //etc
);

foreach ($a as $anchors) {
  // Keep everything after the first label...
  $t = explode($anchors[0], " " . $s);
  // ...and cut it off where the next label begins
  $t = explode($anchors[1], $t[1]);
  $r = trim($t[0]);
  echo $anchors[0] . " [" . $r . "]\n";
}

Which will produce:

augusto@cubo:~/Documents$ php script.php
To: [Name here]
Date: [26/08/2014 14:52]
Order Number: [123456]
Service Limit: [110.00

123 TEST ROAD
LEEDS
LS1 1HL]

Older answer (the concept): this doesn't seem too difficult.

You have plenty of good anchors to work with! explode() will be a good friend.

// 'anchor-before' and 'anchor-after' are placeholders for the labels around the value
$tmp = explode('anchor-before', $string);
$tmp = explode('anchor-after', $tmp[1]);
$res = trim($tmp[0]);
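The same anchor idea can also be written with strpos() and substr(), which makes it easy to return null when a label is missing instead of reading an undefined array index. A minimal sketch; the function name textBetween() and the short sample string are hypothetical.

```php
<?php
// Hypothetical helper: trimmed text between the first $before and the next $after,
// or null if either anchor is missing.
function textBetween(string $haystack, string $before, string $after): ?string
{
    $start = strpos($haystack, $before);
    if ($start === false) {
        return null;
    }
    $start += strlen($before); // skip past the opening anchor itself
    $end = strpos($haystack, $after, $start);
    if ($end === false) {
        return null;
    }
    return trim(substr($haystack, $start, $end - $start));
}

$s = "To:\n     Name here\nDate:\n     26/08/2014 14:52\nOrder Number:\n     123456\n";
echo textBetween($s, 'To:', 'Date:'), "\n";
echo textBetween($s, 'Date:', 'Order Number:'), "\n";
```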
