How do I write a regular expression for this case?

Time: 2022-06-23 13:50:19

For example, I have a string:

/div1/div2[/div3[/div4]]/div5/div6[/div7]

Now I want to split the string on "/" but ignore any "/" that appears inside "[ ]".

The result should be:

  1. div1
  2. div2[/div3[/div4]]
  3. div5
  4. div6[/div7]

How can I get this result using a regular expression? My programming language is JavaScript.

7 solutions

#1


This works...

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string testCase = "/div1/div2[/div3[/div4]]/div5/div6[/div7]";
        //string pattern = "(?<Match>/div\\d(?:\\[(?>\\[(?<null>)|\\](?<-null>)|.?)*(?(null)(?!))\\])?)";
        string pattern = "(?<Match>div\\d(?:\\[(?>\\[(?<null>)|\\](?<-null>)|.?)*(?(null)(?!))\\])?)";
        Regex rx = new Regex(pattern);
        MatchCollection matches = rx.Matches(testCase);
        foreach (Match match in matches)
            Console.WriteLine(match.Value);
        Console.ReadLine();
    }
}

Courtesy of... http://retkomma.wordpress.com/2007/10/30/nested-regular-expressions-explained/

#2


You can't do this with regular expressions because it's recursive. (That answers your question, now to see if I can solve the problem elegantly...)

Edit: aem tipped me off! :D

This works as long as every [ is immediately followed by /. It does not verify that the string is in the correct format.

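// Requires "using System.Linq;" for Select and ToArray.
// Hide the slash that follows each "[" so it is not treated as a delimiter.
string temp = text.Replace("[/", "[");
// Split on the remaining slashes, then restore the hidden ones.
string[] elements = temp.Split('/').Select(element => element.Replace("[", "[/")).ToArray();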
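
Since the question asks about JavaScript, here is a rough sketch of a non-regex alternative for input where a [ is not always followed by /. It walks the string once and keeps a bracket depth counter. The name `splitTopLevel` is made up for this illustration, and like the snippet above it does not validate that the brackets are balanced.

```javascript
// Hypothetical helper (not from the thread): split on "/" only at bracket depth 0.
function splitTopLevel(path) {
  const parts = [];
  let depth = 0;    // number of currently open "[" brackets
  let current = ""; // token being accumulated

  for (const ch of path) {
    if (ch === "/" && depth === 0) {
      // A root-level slash ends the current token; skip the empty leading one.
      if (current !== "") parts.push(current);
      current = "";
      continue;
    }
    if (ch === "[") depth += 1;
    if (ch === "]") depth -= 1;
    current += ch;
  }
  if (current !== "") parts.push(current);
  return parts;
}

console.log(splitTopLevel("/div1/div2[/div3[/div4]]/div5/div6[/div7]"));
// [ 'div1', 'div2[/div3[/div4]]', 'div5', 'div6[/div7]' ]
```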

#3


You can first translate the two-character sequence [/ into another character or sequence that you know won't appear in the input, then split the string on / boundaries, then re-translate the translated sequence back into [/ in the result strings. This doesn't even require regular expressions. :)

For instance, if you know that [ won't appear on its own in your input sequences, you could replace [/ with [ in the initial step.
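
The translate, split, translate-back idea could look roughly like this in JavaScript. This is only a sketch: the function name `splitWithPlaceholder` and the choice of "\u0000" as the placeholder are assumptions made for illustration, and it only works if that character never appears in the input.

```javascript
// Assumption: "\u0000" never occurs in the input, so it is safe as a placeholder.
const PLACEHOLDER = "\u0000";

// Hypothetical helper illustrating the placeholder approach.
function splitWithPlaceholder(path) {
  return path
    .split("[/").join("[" + PLACEHOLDER)               // hide the slash after every "["
    .split("/")                                         // split on the remaining slashes
    .filter((part) => part !== "")                      // drop the empty piece before the leading "/"
    .map((part) => part.split(PLACEHOLDER).join("/")); // restore the hidden slashes
}

console.log(splitWithPlaceholder("/div1/div2[/div3[/div4]]/div5/div6[/div7]"));
// [ 'div1', 'div2[/div3[/div4]]', 'div5', 'div6[/div7]' ]
```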

#4


Judging by your posting history, I'll guess you're talking about C# (.NET) regexes. In that case, this should work:

Regex.Split(target, @"(?<!\[)/");

This assumes every non-delimiter / is immediately preceded by a left square bracket, as in your sample data.

You should always specify which regex flavor you're working with. This technique, for example, requires a flavor that supports lookbehinds. Off the top of my head, that includes Perl, PHP, Python and Java, but not JavaScript.

EDIT: Here's a demonstration in Java:

public class Test
{
  public static void main(String[] args)
  {
    String str = "/div1/div2[/div3[/div4]]/div5/div6[/div7]";
    String[] parts = str.split("(?<!\\[)/");
    for (String s : parts)
    {
      System.out.println(s);
    }
  }
}

output:

div1
div2[/div3[/div4]]
div5
div6[/div7]
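
Lookbehind was not part of JavaScript when this answer was written, but engines that implement ES2018 (current versions of Node.js, for example) accept it, so the same split can be sketched in JavaScript. The function name `splitOnUnbracketedSlash` is made up for illustration, and the sketch relies on the same assumption that every non-delimiter / directly follows a [.

```javascript
// Hypothetical helper: split on every "/" that is not immediately preceded by "[".
// Needs an engine with ES2018 lookbehind support.
function splitOnUnbracketedSlash(path) {
  return path
    .split(/(?<!\[)\//)
    .filter((part) => part !== ""); // the leading "/" yields an empty first piece
}

console.log(splitOnUnbracketedSlash("/div1/div2[/div3[/div4]]/div5/div6[/div7]"));
// [ 'div1', 'div2[/div3[/div4]]', 'div5', 'div6[/div7]' ]
```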

Of course, I'm relying on some simplifying assumptions here. I trust you'll let me know if any of my assumptions are wrong, Mike. :)

EDIT: Still waiting on a ruling from Mike about the assumptions, but Chris Lutz brought up a good point in his comment to 280Z28. At the root level in the sample string, there are two places where you see two contiguous /divN tokens, but at every other level the tokens are always isolated from each other by square brackets. My solution, like 280Z28's, assumes that will always be true, but what if the data looked like this?

/div1/div2[/div3/div8[/div4]/div9]/div5/div6[/div7]  

Now we've got two places where a non-delimiter slash is not preceded by a left square bracket, but the basic idea is the same. Starting from any point at the root level, if you scan forward looking for square brackets, the first one you find will always be a left (or opening) bracket. If you scan backward, you'll always find a right (or closing) bracket first. Unless both of those conditions hold, you're not at the root level. Translating that to lookarounds, you get this:

/(?![^\[\]]*\])(?<!\[[^\[\]]*)

I know it's getting pretty gnarly, but I'll take this over that godawful recursion stuff any day of the week. ;) Another nice thing is that you don't have to know anything about the tokens except that they start with slashes and don't contain any square brackets. By the way, this regex contains a lookbehind that can match any number of characters; the list of regex flavors that support that is very short indeed, but .NET can do it.
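
JavaScript engines that implement ES2018 lookbehind also allow it to match any number of characters, so the same pattern can be tried there. This is a sketch only: `splitAtRootLevel` is a made-up name, and the pattern remains a heuristic and not a real bracket matcher. For example, in an input such as /a[/b[/c]/d[/e]] the slash before d sits between a ] and a [, so it passes both lookaround checks and is wrongly treated as a delimiter.

```javascript
// Hypothetical helper: the lookaround pattern from above, written as a JavaScript regex literal.
// A slash is a delimiter unless the next bracket ahead is "]" or the nearest bracket behind is "[".
function splitAtRootLevel(path) {
  return path
    .split(/\/(?![^\[\]]*\])(?<!\[[^\[\]]*)/)
    .filter((part) => part !== ""); // the leading "/" yields an empty first piece
}

console.log(splitAtRootLevel("/div1/div2[/div3/div8[/div4]/div9]/div5/div6[/div7]"));
// [ 'div1', 'div2[/div3/div8[/div4]/div9]', 'div5', 'div6[/div7]' ]
```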

#5


An experimental example using PHP and a split approach; only tested on the sample string.

$str = "/div1/div2[/div3[/div4]]/div5/div6[/div7]/div8";
// split on "/"
$s = explode("/", $str);
foreach ($s as $k => $v) {
    // if no [ or ] in the item
    if (strpos($v, "[") === FALSE && strpos($v, "]") === FALSE) {
        print "\n";
        print $v . "\n";
    } else {
        print $v . "/";
    }
}

output:

div1
div2[/div3[/div4]]/
div5
div6[/div7]/
div8

Note: each bracketed item ends with a trailing "/", so a bit of trimming is needed to get the desired result.
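
A JavaScript variant of the same split-then-rejoin idea might look like the sketch below. It differs from the PHP above in that it counts brackets and so does not append a trailing "/". The name `splitAndRejoin` is made up for illustration, and the code has only been reasoned through against the sample string.

```javascript
// Hypothetical helper: split on every "/", then glue pieces back together
// until the "[" and "]" counts balance.
function splitAndRejoin(path) {
  const result = [];
  let pending = null; // partially rebuilt token, or null when nothing is pending
  let depth = 0;      // open brackets seen so far in the pending token

  for (const piece of path.split("/")) {
    pending = pending === null ? piece : pending + "/" + piece;
    for (const ch of piece) {
      if (ch === "[") depth += 1;
      else if (ch === "]") depth -= 1;
    }
    if (depth === 0) {
      if (pending !== "") result.push(pending);
      pending = null;
    }
  }
  // Keep whatever is left over if the brackets never balanced.
  if (pending !== null && pending !== "") result.push(pending);
  return result;
}

console.log(splitAndRejoin("/div1/div2[/div3[/div4]]/div5/div6[/div7]/div8"));
// [ 'div1', 'div2[/div3[/div4]]', 'div5', 'div6[/div7]', 'div8' ]
```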

#6


s/\/(div\d{0,}(?:\[.*?\])?)/$1\n/

#7


Without knowing which regex engine you are targeting, I can only guess what would work for you. If you are using .NET, have a look here: http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx

If you're using Perl, have a look here: http://metacpan.org/pod/Regexp::Common::balanced
