Okay, I barely understand regex basics, but why couldn't it have been designed to use keywords (like SQL) instead of cryptic wildcard characters and symbols?
Is it for performance, since a regex is interpreted/parsed at runtime (not compiled)?
Or maybe for speed of writing? Once you learn some "simple" character combinations, it becomes easier to type one character instead of a keyword.
14 answers
#1
34
You really want this?
Pattern findGamesPattern = Pattern.With.Literal(@"<div")
.WhiteSpace.Repeat.ZeroOrMore
.Literal(@"class=""game""").WhiteSpace.Repeat.ZeroOrMore.Literal(@"id=""")
.NamedGroup("gameId", Pattern.With.Digit.Repeat.OneOrMore)
.Literal(@"-game""")
.NamedGroup("content", Pattern.With.Anything.Repeat.Lazy.ZeroOrMore)
.Literal(@"<!--gameStatus")
.WhiteSpace.Repeat.ZeroOrMore.Literal("=").WhiteSpace.Repeat.ZeroOrMore
.NamedGroup("gameState", Pattern.With.Digit.Repeat.OneOrMore)
.Literal("-->");
Ok, but it's your funeral, man.
Download the library that does this here:
http://flimflan.com/blog/ReadableRegularExpressions.aspx
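For comparison, here is a rough hand translation of the fluent pattern above into an ordinary regex, written in Python. It is only a sketch: it assumes that `Anything` corresponds to `.`, that the named groups map onto Python's `(?P<name>...)` syntax, and the sample HTML string is made up.

```python
import re

# Hand-translated from the fluent builder; not produced by the library itself.
FIND_GAMES = re.compile(
    r'<div\s*class="game"\s*id="(?P<gameId>\d+)-game"'
    r'(?P<content>.*?)'
    r'<!--gameStatus\s*=\s*(?P<gameState>\d+)-->',
    re.DOTALL,  # assumption: let the content span several lines
)

html = '<div class="game" id="42-game">Chess<!--gameStatus = 3-->'  # made-up sample
match = FIND_GAMES.search(html)
game_id = match.group("gameId")
content = match.group("content")
game_state = match.group("gameState")
print(game_id, content, game_state)
```

The whole pattern fits in three short string fragments, which is the trade-off this answer is poking at.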
#2
10
Regular expressions have a mathematical (actually, language theory) background and are coded somewhat like a mathematical formula. You can define them by a set of rules, for example:
- every character is a regular expression, representing itself
- if a and b are regular expressions, then a?, a|b and ab are regular expressions, too
- ...
Using a keyword-based language would be a great burden for simple regular expressions. Most of the time, you will just use a simple text string as a search pattern:
grep -R 'main' *.c
Or maybe very simple patterns:
grep -c ':-[)(]' seidl.txt
Once you get used to regular expressions, this syntax is very clear and precise. In more complicated situations you will probably use something else since a large regular expression is obviously hard to read.
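To make the construction rules above concrete, here is a small sketch using Python's `re` module. The test strings and the sample lines are made up for illustration; the last part is only a rough counterpart of the `grep -c` call, not a reimplementation of grep.

```python
import re

# (pattern, text that should match, text that should not)
RULES = [
    (r"a",   "a",  "b"),    # a character represents itself
    (r"a?",  "",   "aa"),   # a?  : an optional a
    (r"a|b", "b",  "c"),    # a|b : either one
    (r"ab",  "ab", "a"),    # ab  : a followed by b
]
results = [
    (bool(re.fullmatch(p, yes)), bool(re.fullmatch(p, no)))
    for p, yes, no in RULES
]

# Rough counterpart of grep -c ':-[)(]': count the lines that contain a smiley.
lines = ["hello :-)", "no smiley here", "sad :-( and happy :-)"]  # made-up input
smiley_lines = sum(1 for line in lines if re.search(r":-[)(]", line))
```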
#3
8
Perl 6 is taking a pretty revolutionary step forward in regex readability. Consider an address of the form: 100 E Main St Springfield MA 01234
Here's a moderately-readable Perl 5 compatible regex to parse that (many corner cases not handled):
m/
([1-9]\d*)\s+
((?:N|S|E|W)\s+)?
(\w+(?:\s+\w+)*)\s+
(ave|ln|st|rd)\s+
([[:alpha:]]+(?:\s+[[:alpha:]]+)*)\s+
([A-Z]{2})\s+
(\d{5}(?:-\d{4})?)
/ix;
This Perl 6 regex has the same behavior:
grammar USMailAddress {
rule TOP { <addr> <city> <state> <zip> }
rule addr { <[1..9]>\d* <direction>?
<streetname> <streettype> }
token direction { N | S | E | W }
token streetname { \w+ [ \s+ \w+ ]* }
token streettype {:i ave | ln | rd | st }
token city { <alpha> [ \s+ <alpha> ]* }
token state { <[A..Z]>**{2} }
token zip { \d**{5} [ - \d**{4} ]? }
}
A Perl 6 grammar is a class, and the tokens are all invokable methods. Use it like this:
if $addr ~~ m/^<USMailAddress::TOP>$/ {
say "$<city>, $<state>";
}
This example comes from a talk I presented at the Frozen Perl 2009 workshop. The Rakudo implementation of Perl 6 is complete enough that this example works today.
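For readers without Perl at hand, named groups give a little of the same effect in a single regex. The following Python sketch is an approximation of the Perl 5 pattern above, not a port of the grammar; the group names are chosen for this example.

```python
import re

# Same overall shape as the Perl 5 regex, with names instead of numbered groups.
ADDRESS = re.compile(r"""
    (?P<number>[1-9]\d*)\s+
    (?:(?P<direction>[NSEW])\s+)?
    (?P<street>\w+(?:\s+\w+)*)\s+
    (?P<type>ave|ln|st|rd)\s+
    (?P<city>[A-Za-z]+(?:\s+[A-Za-z]+)*)\s+
    (?P<state>[A-Z]{2})\s+
    (?P<zip>\d{5}(?:-\d{4})?)
""", re.VERBOSE | re.IGNORECASE)

match = ADDRESS.fullmatch("100 E Main St Springfield MA 01234")
city = match.group("city")
state = match.group("state")
print(f"{city}, {state}")
```

Unlike the grammar, the pieces here cannot be reused or called on their own; they are only labels on one large pattern.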
#4
6
Well, if you had keywords, how would you easily differentiate them from actually matched text? How would you handle whitespace?
Source text: Company: A Dept.: B
Standard regex:
Company:\s+(.+)\s+Dept.:\s+(.+)
Or even:
Company: (.+) Dept.: (.+)
Keyword regex (trying really hard not to build a strawman...):
"Company:" whitespace.oneplus group(any.oneplus) whitespace.oneplus "Dept.:" whitespace.oneplus group(any.oneplus)
Or simplified:
"Company:" space group(any.oneplus) space "Dept.:" space group(any.oneplus)
No, it's probably not better.
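As a quick check of the standard version, here is a minimal Python sketch run against the source text. One deliberate change from the regex as written: the period in "Dept." is escaped so that it only matches a literal dot.

```python
import re

source = "Company: A Dept.: B"
# \. matches a literal period; an unescaped . would match any character.
pattern = re.compile(r"Company:\s+(.+)\s+Dept\.:\s+(.+)")
company, dept = pattern.fullmatch(source).groups()
print(company, dept)
```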
#5
5
Because it corresponds to formal language theory and its mathematical notation.
#6
4
It's Perl's fault...!
Actually, more specifically, Regular Expressions come from early Unix development, and concise syntax was a lot more highly valued then. Storage, processing time, physical terminals, etc. were all very limited, rather unlike today.
The history of Regular Expressions on Wikipedia explains more.
There are alternatives to Regex, but I'm not sure any have really caught on.
EDIT: Corrected by John Saunders: Regular Expressions were popularised by Unix, but first implemented by the QED editor. The same design constraints applied, even more so, to earlier systems.
#7
3
Actually, no, the world did not begin with Unix. If you read the Wikipedia article, you'll see that:
In the 1950s, mathematician Stephen Cole Kleene described these models using his mathematical notation called regular sets. The SNOBOL language was an early implementation of pattern matching, but not identical to regular expressions. Ken Thompson built Kleene's notation into the editor QED as a means to match patterns in text files. He later added this capability to the Unix editor ed, which eventually led to the popular search tool grep's use of regular expressions.
#8
2
This is much earlier than Perl. The Wikipedia entry on Regular Expressions attributes the first implementations of regular expressions to Ken Thompson of UNIX fame, who implemented them in QED and then in the ed editor. I guess that the commands had short names for performance reasons, long before anything ran client-side. Mastering Regular Expressions is a great book on the subject; it describes the option to annotate a regular expression (with the /x flag) to make it easier to read and understand.
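Python's `re.VERBOSE` flag plays the same role as /x: whitespace in the pattern is ignored and `#` starts a comment. The time pattern below is a made-up example to show the annotation style, not something taken from the book.

```python
import re

# A 24-hour time such as "09:30" or "23:59:07".
TIME = re.compile(r"""
    (?P<hour>[01]\d|2[0-3])        # hour: 00-23
    :                              # separator
    (?P<minute>[0-5]\d)            # minute: 00-59
    (?: : (?P<second>[0-5]\d) )?   # optional seconds
""", re.VERBOSE)

with_seconds = TIME.fullmatch("23:59:07")
without_seconds = TIME.fullmatch("09:30")
invalid = TIME.fullmatch("24:00")
```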
#9
1
Because the idea of regular expressions--like many things that originate from UNIX--is that they are terse, favouring brevity over readability. This is actually a good thing. I've ended up writing regular expressions (against my better judgement) that are 15 lines long. If that had a verbose syntax it wouldn't be a regex, it'd be a program.
#10
1
It's actually pretty easy to implement a "wordier" form of regex -- please see my answer here. In a nutshell: write a handful of functions that return regex strings (and take parameters if necessary).
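A minimal sketch of that idea in Python follows. Every helper name here is invented for illustration and is not taken from the linked answer; each function simply returns a regex string that can be concatenated with the others.

```python
import re

# Hypothetical helpers: each one returns a plain regex string.
def literal(text: str) -> str:
    return re.escape(text)

def one_or_more(pattern: str) -> str:
    return f"(?:{pattern})+"

def zero_or_more(pattern: str) -> str:
    return f"(?:{pattern})*"

def named_group(name: str, pattern: str) -> str:
    return f"(?P<{name}>{pattern})"

WHITESPACE = r"\s"
DIGIT = r"\d"

# Reads as: the text "id=", optional whitespace, then digits captured as "id".
pattern = (
    literal("id=")
    + zero_or_more(WHITESPACE)
    + named_group("id", one_or_more(DIGIT))
)
match = re.search(pattern, "user id=  1234 logged in")
user_id = match.group("id")
```

Because the helpers only build strings, the result can still be handed to any function that accepts a normal pattern.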
#11
1
I don't think keywords would give any benefit. Regular expressions as such are complex but also very powerful.
What I think is more confusing is that every supporting library invents its own syntax instead of using (or extending) the classic Perl regex (e.g. \1, $1, {1}, ... for replacements and many more examples).
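As one data point, Python's `re` module writes backreferences in a replacement string as `\1` or `\g<1>`, where other libraries use `$1` or similar. The date string below is made up for this sketch.

```python
import re

text = "2009-02-07"
date = r"(\d{4})-(\d{2})-(\d{2})"

# Two spellings of the same replacement in Python.
swapped = re.sub(date, r"\3/\2/\1", text)
same = re.sub(date, r"\g<3>/\g<2>/\g<1>", text)
```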
#12
1
I know it's answering your question the wrong way around, but RegexBuddy has a feature that explains your regular expression in plain English. This might make it a bit easier to learn.
#13
1
If the language you are using supports POSIX regexes, you can use their named character classes.
An example:
\d
would be the same as
[[:digit:]]
The bracket notation is much clearer about what it is matching. I would still learn the "cryptic wildcard characters and symbols", since you will still see them in other people's code and need to understand them.
There are more examples in the table on regular-expressions.info's page.
#14
1
For some reason, my previous answer got deleted. Anyway, I think Ruby Regexp Machine would fit the bill, at http://www.rubyregexp.sf.net. It is my own project, but I think it should work.