How do I match a regular expression correctly?

Time: 2021-02-13 00:46:47

I have a list of objects output from ldapsearch as follows:

dn: cn=HPOTTER,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=HGRANGER,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=RWEASLEY,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=DMALFOY,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=SSNAPE,ou=FACULTY,ou=HOGWARTS,o=SCHOOL
dn: cn=ADUMBLED,ou=FACULTY,ou=HOGWARTS,o=SCHOOL

So far, I have the following regex:

/\bcn=\w*,/g

Which returns results like this:

cn=HPOTTER,
cn=HGRANGER,
cn=RWEASLEY,
cn=DMALFOY,
cn=SSNAPE,
cn=ADUMBLED,

I need a regex that returns results like this:

HPOTTER
HGRANGER
RWEASLEY
DMALFOY
SSNAPE
ADUMBLED

What do I need to change in my regex so the pattern (the cn= and the comma) is not included in the results?

EDIT: I will be using sed to do the pattern matching, and piping the output to other command line utilities.

7 answers

#1


Sounds more like a simple parsing problem and not regex. An ANTLR grammar would sort this out in no time.

#2


You need a capture group. Modify the regex to:

/\bcn=\(\w*\),/g

This captures the matched name in a group. How you extract the group's value depends on your language. (With sed, you refer to it as \1.)

Note that in most regex flavors you don't have to escape the parentheses (), but since you're using sed, whose default syntax is basic regular expressions, you need to escape them as shown above.
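
As a rough sketch of how that group could be used from the command line: the sample lines below are copied from the question, and the surrounding .* on both sides, the -n flag, and the trailing p are my additions so that only the captured name is printed. It assumes GNU sed (\b and \w are GNU extensions) and one dn entry per line.

```shell
# Sample input: two of the dn lines from the question, one entry per line.
sample='dn: cn=HPOTTER,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=SSNAPE,ou=FACULTY,ou=HOGWARTS,o=SCHOOL'

# -n turns off automatic printing; the trailing p prints only the lines
# where the substitution succeeded. \1 is the text captured by \( ... \).
# Assumption: GNU sed, because \b and \w are GNU extensions.
names=$(printf '%s\n' "$sample" | sed -n 's/.*\bcn=\(\w*\),.*/\1/p')

printf '%s\n' "$names"
```

Because the whole line is matched and replaced by \1, nothing but the name is left to pipe into the next utility.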

For an excellent resource on Regular Expressions I suggest: Mastering Regular Expressions

#3


OK, the place where you asked the more specific question was closed as "exact duplicate" of this, so I'm copying my answer from there to here:

If you want to use sed, you can use something like the following:

sed -e 's/dn: cn=\([^,]*\),.*$/\1/'

You have to use [^,]* because in sed, .* is "greedy", meaning it matches as much as it can while still letting the rest of the pattern match. That means if you use \(.*\), in your pattern, the group will run up to the last comma, not up to the first comma.
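
To see the difference, here is a small side-by-side sketch run on one line from the question. The variable names (line, greedy, bounded) are just for illustration.

```shell
line='dn: cn=HPOTTER,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL'

# Greedy .* inside the group runs to the LAST comma on the line.
greedy=$(printf '%s\n' "$line" | sed -e 's/dn: cn=\(.*\),.*$/\1/')

# [^,]* cannot cross a comma, so the group stops at the FIRST one.
bounded=$(printf '%s\n' "$line" | sed -e 's/dn: cn=\([^,]*\),.*$/\1/')

printf 'greedy:  %s\n' "$greedy"    # greedy:  HPOTTER,ou=STUDENTS,ou=HOGWARTS
printf 'bounded: %s\n' "$bounded"   # bounded: HPOTTER
```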

#4


Check out Expresso. I have used it in the past to build my regexes, and it is a good learning aid too.

#5


The quick and dirty method is to use submatches, assuming your engine supports them:

/\bcn=(\w*),/g

Then you would want to get the first submatch.

#6


Without knowing what language you're using, we can't tell for sure, but in most regular expression engines, if you use parentheses, such as

/\bcn=(\w*),/g

then the first captured group (often \1) will contain exactly the text you are looking for. To be more specific, we need to know what language you are using.
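
Since the question's edit mentions sed, here is one way the unescaped-parentheses form might look there. This is only a sketch: it assumes a sed that accepts -E (extended regular expressions), the .* on both sides and the -n/p pair are my additions to print just the group, and I have written \w as the portable class [[:alnum:]_].

```shell
sample='dn: cn=DMALFOY,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=ADUMBLED,ou=FACULTY,ou=HOGWARTS,o=SCHOOL'

# -E switches sed to extended regular expressions, where ( ) form a
# capture group without backslashes. [[:alnum:]_] stands in for \w.
names=$(printf '%s\n' "$sample" | sed -E -n 's/.*cn=([[:alnum:]_]*),.*/\1/p')

printf '%s\n' "$names"
```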

#7


If your regex engine supports lookaheads and lookbehinds, then you can use

/(?<=\bcn=)\w*(?=,)/g

That will match

HPOTTER
HGRANGER
RWEASLEY
DMALFOY
SSNAPE
ADUMBLED

But not the cn= or the , on either side. The comma and cn= still have to be there for the match; they just aren't included in the result.
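
sed's regular expressions have no lookarounds, so on the command line this pattern needs a different tool. A possible sketch, assuming GNU grep built with PCRE support (the -P option), which is not available everywhere:

```shell
sample='dn: cn=HGRANGER,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=RWEASLEY,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL'

# -o prints only the matched text, one match per output line.
# -P selects Perl-compatible syntax, which provides (?<=...) and (?=...).
# Assumption: GNU grep compiled with PCRE; other greps may reject -P.
names=$(printf '%s\n' "$sample" | grep -oP '(?<=\bcn=)\w*(?=,)')

printf '%s\n' "$names"
```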
