To use or not to use regular expressions?

Time: 2021-08-16 00:14:33

I just asked a question about using a regular expression to allow numbers between -90.0 and +90.0. I got some answers on how to implement the regular expression, but most of them also mentioned that this would be better handled without a regular expression, or that using one would be overkill. So how do you decide when to use a regular expression and when not to? Is there a checklist you can follow?

5 answers

#1


42  

Regular expressions are a text processing tool for character-based tests. More formally, regular expressions are good at handling regular languages and bad at almost anything else.

In practice, this means that regular expressions are not well suited for tasks that require discovering meaning (semantics) in text that goes beyond the character level. This would require a full-blown parser.

In your particular case: recognizing a number in a text is an exercise that regular expressions are good at (decimal numbers can be trivially described using a regular language). This works on the character level.

But doing anything more advanced with the number that depends on its numerical value (i.e. its semantics) requires interpretation. Regular expressions are bad at this. So finding a number in text is easy. Finding a number in text that is greater than 11 but smaller than 1004 (or that is divisible by 3) is hard: it requires recognizing the meaning of the number.
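A minimal sketch of that split, applied to the -90.0 to +90.0 range from the question: the regular expression only does the character-level job of recognizing something shaped like a number, and ordinary arithmetic checks the value. The names `NUMBER` and `find_latitudes` are made up for this illustration, and the pattern assumes plain decimal notation (no exponents, no thousands separators).

```python
import re

# Character-level step: recognize an optionally signed decimal number.
NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")


def find_latitudes(text):
    """Return the numbers found in `text` whose value lies in [-90.0, 90.0]."""
    # Semantic step: convert each match to a float, then compare numerically.
    values = (float(match.group()) for match in NUMBER.finditer(text))
    return [value for value in values if -90.0 <= value <= 90.0]
```

Encoding the range itself in the pattern is possible, but the pattern would have to enumerate digit combinations, and changing the bounds would mean rewriting it.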

#2


3  

I would say that regular expressions are most effective on strings. For other data types, the operations built for that data type will usually be more intuitive and give better results.

For example, if you know that you're dealing with DateTime, then you can use the Parse and TryParse methods with the different formats, and that will usually be more reliable than your own regular expressions.
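The answer refers to .NET's DateTime parsing methods. As a rough analogue in Python (a swapped-in example, not the answerer's code), `datetime.strptime` plays a similar role. The sketch below assumes an ISO-style `YYYY-MM-DD` input; `looks_like_date` and `parse_date` are hypothetical helper names.

```python
import re
from datetime import datetime

# A hand-written pattern only checks the shape: four digits, two, two.
DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}")


def looks_like_date(text):
    """Character-level check; it cannot tell whether the date exists."""
    return DATE_SHAPE.fullmatch(text) is not None


def parse_date(text):
    """Type-aware check: return a date, or None if the text is not a real date."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return None
```

A string such as `2021-02-30` passes the shape check but is rejected by the parser, which is the reliability gap the answer is pointing at.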

In your example, you are dealing with numbers, so deal with them as numbers.

Regex is very powerful, but it isn't the easiest code to read and to debug. When another reliable solution is at hand, you should probably go for that.

#3


2  

Without meaning to be circular or obtuse, you should use regular expressions when you have a string that contains information structured in a regular language, and you want to turn this string into an object model.
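One way to picture "string in a regular language to object model" is a pattern with named groups feeding a small data class. This is only a sketch: the `Coordinate` class, the `COORD` pattern, and the `latitude, longitude` input format are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Coordinate:
    latitude: float
    longitude: float


# Two signed decimals separated by a comma, with optional surrounding spaces.
COORD = re.compile(
    r"\s*(?P<lat>[+-]?\d+(?:\.\d+)?)\s*,\s*(?P<lon>[+-]?\d+(?:\.\d+)?)\s*"
)


def parse_coordinate(text):
    """Turn text like '45.5, -122.6' into a Coordinate, or None if it does not match."""
    match = COORD.fullmatch(text)
    if match is None:
        return None
    return Coordinate(float(match["lat"]), float(match["lon"]))
```

The pattern handles structure only; any check on the values (for example a latitude limit) would still happen on the resulting object.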

#4


0  

The answer is straightforward:

If you can solve your problem without regular expressions (just with string functions), don't use regular expressions. As one book I've read put it: regular expressions are violence against the computer.

If it's too complicated to do with the language's string functions, use regular expressions.
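To illustrate the rule with two made-up tasks: getting a file extension is comfortable with string functions alone, while splitting on any mix of commas and whitespace gets awkward without a pattern. Both function names are hypothetical.

```python
import re


def extension_plain(filename):
    """String functions are enough here: take what follows the last dot."""
    _, dot, ext = filename.rpartition(".")
    return ext.lower() if dot else ""


# Splitting on runs of commas and/or whitespace is clumsy with str.split alone,
# so a short pattern is the simpler tool for this one.
SEPARATORS = re.compile(r"[,\s]+")


def split_fields(line):
    """Split on any run of commas or whitespace, dropping empty fields."""
    return [field for field in SEPARATORS.split(line) if field]
```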

#5


0  

Basic use cases for RegEx:

  1. You need "Key Value Pairs": both keys and values are embedded within other noisy text and can't be accessed or isolated otherwise.

  2. You need to automate extraction of these values by looping over multiple documents.

  3. The number and combination of key-value pairs may be discovered as you progress parsing through the text.
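The three points above can be sketched as follows. The `key=value` syntax, the `PAIR` pattern, and `extract_pairs` are assumptions for illustration; real documents would need a pattern fitted to their own format.

```python
import re

# A key is an identifier; a value runs until whitespace, a comma, or a semicolon.
PAIR = re.compile(r"(?P<key>[A-Za-z_]\w*)=(?P<value>[^\s,;]+)")


def extract_pairs(documents):
    """Collect every key=value pair found in a sequence of noisy text documents."""
    found = {}
    for document in documents:
        for match in PAIR.finditer(document):
            # Keys are not known in advance; they are discovered while scanning.
            found.setdefault(match["key"], []).append(match["value"])
    return found
```

For example, `extract_pairs(["job started user=alice id=42; ok", "retry id=43, user=bob"])` groups the values under `user` and `id` without either key being listed beforehand.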
