What I have
string ImageRegPattern = @"http://[\w\.\/]*\.jpg|http://[\w\.\/]*\.png|http://[\w\.\/]*\.gif";
string a ="http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";
What I want
string[] s;
s[0] = "http://www.dsa.com/asd/jpg/good.jpg";
s[1] = "This is a good day";
s[2] = "http://www.a.com/b.png";
s[3] = "We are the Best friendshttp://www.c.com";
Bonus:
It would be even better if the URL could also be split off, like below, but if not, that's OK.
s[3] = "We are the Best friends";
s[4] = "http://www.c.com";
What's the problem
I tried to split the string with the code below:
string[] s= Regex.Split(sourceString, ImageRegPattern, RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
But the result is not what I want: Split removes every substring that matches ImageRegPattern, and I want those matches to stay in the result. I checked the Regex page on MSDN, and there doesn't seem to be a method that does this. So how can I do it?
4 answers
#1 (4 votes)
You need something like this method, which finds all the matches first, and then collects them into a list along with the unmatched strings between them.
UPDATE: Added a conditional to handle the case where no matches are found.
using System.Collections.Generic;
using System.Text.RegularExpressions;
// Place the method below inside a class.
private static IEnumerable<string> InclusiveSplit(string source, string pattern)
{
    List<string> parts = new List<string>();
    int currIndex = 0;
    // First, find all the matches. These are your separators.
    MatchCollection matches =
        Regex.Matches(source, pattern,
            RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
    // If there are no matches, there's nothing to split, so just return a
    // collection with just the source string in it.
    if (matches.Count < 1)
    {
        parts.Add(source);
    }
    else
    {
        foreach (Match match in matches)
        {
            // If the match begins after our current index, we need to add the
            // portion of the source string between the last match and the
            // current match.
            if (match.Index > currIndex)
            {
                parts.Add(source.Substring(currIndex, match.Index - currIndex));
            }
            // Add the matched value, of course, to make the split inclusive.
            parts.Add(match.Value);
            // Update the current index so we know if the next match has an
            // unmatched substring before it.
            currIndex = match.Index + match.Length;
        }
        // Finally, check if there is a bit of unmatched string at the end of the
        // source string.
        if (currIndex < source.Length)
            parts.Add(source.Substring(currIndex));
    }
    return parts;
}
The output for your example input will be like so:
[0] "http://www.dsa.com/asd/jpg/good.jpg"
[1] "This is a good day"
[2] "http://www.a.com/b.png"
[3] "We are the Best friendshttp://www.c.com"
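The same algorithm can also be written as an iterator with yield return, which needs no special case for "no matches" (the tail check then returns the whole string). Below is a minimal sketch as a C# 9 top-level program. InclusiveSplitLazy and bonusPattern are hypothetical names, not part of the answer above, and the bonus pattern assumes that a non-image URL always ends at whitespace or at the end of the string, which holds for the sample input but not for arbitrary text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical iterator variant of InclusiveSplit: yields the text before each
// match, then the match itself, then whatever follows the last match.
static IEnumerable<string> InclusiveSplitLazy(string source, string pattern, RegexOptions options)
{
    int currIndex = 0;
    foreach (Match match in Regex.Matches(source, pattern, options))
    {
        if (match.Index > currIndex)
            yield return source.Substring(currIndex, match.Index - currIndex);
        yield return match.Value;
        currIndex = match.Index + match.Length;
    }
    // Covers both the unmatched tail and the "no matches at all" case.
    if (currIndex < source.Length)
        yield return source.Substring(currIndex);
}

string imagePattern = @"http://[\w\.\/]*\.jpg|http://[\w\.\/]*\.png|http://[\w\.\/]*\.gif";
// Assumption for the bonus: a plain (non-image) URL ends at whitespace or at the end of the string.
string bonusPattern = imagePattern + @"|http://[\w\.\/]*(?=\s|$)";
string a = "http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";
var options = RegexOptions.IgnoreCase;

List<string> parts = InclusiveSplitLazy(a, imagePattern, options).ToList();
List<string> bonusParts = InclusiveSplitLazy(a, bonusPattern, options).ToList();

foreach (string part in bonusParts)
    Console.WriteLine(part);
```

With the original pattern, parts holds the four items listed above; with bonusPattern, the last item is further split into "We are the Best friends" and "http://www.c.com". One behavioral difference from the answer's method: for an empty source string the iterator yields nothing instead of a single empty string.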
#2 (1 vote)
One does not simply underestimate the power of regex:
(.*?)([A-Z][\w\s]+(?=http|$))
Explanation:
- (.*?): group and lazily match everything up to the capital letter that starts the text; this group holds the URL.
- (: start the second group.
- [A-Z]: match one capital letter.
- [\w\s]+: match one or more word characters (letters, digits, _) or whitespace characters (\n, \r, \t, \f, \v, space).
- (?=http|$): lookahead; check that what follows is http or the end of the string.
- ): close the group; this group holds the text.
Note: This solution is for matching the string, not splitting it.
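A minimal sketch of how this pattern might be consumed with Regex.Matches, collecting group 1 (the URL) and group 2 (the text) of each match, written as a C# 9 top-level program. It has to run without RegexOptions.IgnoreCase, otherwise [A-Z] also matches lowercase letters. Tracing the sample input by hand, the trailing http://www.c.com is not returned at all, because no capital letter follows it; the pattern also assumes every text segment starts with an uppercase letter.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string a = "http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";

// Deliberately case-sensitive: with IgnoreCase, [A-Z] would also match a-z.
var regex = new Regex(@"(.*?)([A-Z][\w\s]+(?=http|$))");

var parts = new List<string>();
foreach (Match m in regex.Matches(a))
{
    // Group 1: the URL in front of the text (may be empty).
    if (m.Groups[1].Length > 0)
        parts.Add(m.Groups[1].Value);
    // Group 2: the text starting with a capital letter.
    parts.Add(m.Groups[2].Value);
}

foreach (string part in parts)
    Console.WriteLine(part);
```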
#3 (0 votes)
I think you need a multi-step process: first insert a delimiter around each image URL, then split on it with String.Split:
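string resultString = Regex.Replace(rawString, @"(http://.*?/\w+\.(jpg|png|gif))", "|$1|", RegexOptions.IgnoreCase);
// Drop the leading delimiter so the first array element isn't empty.
if (resultString.StartsWith("|"))
    resultString = resultString.Substring(1);
string[] a = resultString.Split('|');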
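A runnable sketch of the same two steps as a C# 9 top-level program. It swaps | for the control character U+001F as the delimiter, on the assumption that this character never occurs in the input (a | inside the text would otherwise cause extra splits), and it uses StringSplitOptions.RemoveEmptyEntries instead of trimming the leading delimiter by hand, so a URL at the very end of the string is handled as well.

```csharp
using System;
using System.Text.RegularExpressions;

string rawString = "http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";

// Assumption: U+001F (ASCII "unit separator") never appears in the input.
const char Delimiter = '\u001F';

// Step 1: wrap every image URL in delimiters.
// $1 is the whole URL; group 2 is only the extension.
string marked = Regex.Replace(
    rawString,
    @"(http://.*?/\w+\.(jpg|png|gif))",
    Delimiter + "$1" + Delimiter,
    RegexOptions.IgnoreCase);

// Step 2: split on the delimiter; RemoveEmptyEntries drops the empty pieces
// produced when a URL sits at the very start or end of the string.
string[] parts = marked.Split(new[] { Delimiter }, StringSplitOptions.RemoveEmptyEntries);

foreach (string part in parts)
    Console.WriteLine(part);
```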
#4 (0 votes)
The obvious answer here is of course not to use split, but rather matching the image patterns and retrieving them. That being said, it's not impossible to use split.
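// (?:...) keeps the groups non-capturing; with capturing groups, Regex.Split would also insert each group's captured text into the result.
string ImageRegPattern = @"(?=(?:http://[\w./]*?\.jpg|http://[\w./]*?\.png|http://[\w./]*?\.gif))|(?<=(?:\.jpg|\.png|\.gif))";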
This matches every position in the string that is either followed by an image URL or preceded by .jpg, .gif or .png. These matches are zero-width, so Split cuts the string at those positions without removing any characters.
I really don't recommend doing it this way, I'm just saying you can.
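A minimal sketch comparing two Split-based approaches on the sample input, as a C# 9 top-level program: the lookaround pattern from this answer (with non-capturing groups), and the asker's original pattern wrapped in one pair of capturing parentheses. The second relies on documented Regex.Split behavior: text captured by capturing parentheses is included in the result array. Both produce an empty first element because the input begins at a split point, so empty strings are filtered out.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string a = "http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";

// 1) Zero-width split points: before an image URL, or right after its extension.
string splitPoints = @"(?=(?:http://[\w./]*?\.jpg|http://[\w./]*?\.png|http://[\w./]*?\.gif))|(?<=(?:\.jpg|\.png|\.gif))";
string[] lookaroundParts = Regex.Split(a, splitPoints, RegexOptions.IgnoreCase)
    .Where(p => p.Length > 0)   // a split point at index 0 yields a leading ""
    .ToArray();

// 2) Capture the whole separator so Regex.Split keeps it in the output.
string imagePattern = @"http://[\w\.\/]*\.jpg|http://[\w\.\/]*\.png|http://[\w\.\/]*\.gif";
string[] capturedParts = Regex.Split(a, "(" + imagePattern + ")", RegexOptions.IgnoreCase)
    .Where(p => p.Length > 0)   // a match at index 0 also yields a leading ""
    .ToArray();

foreach (string part in capturedParts)
    Console.WriteLine(part);
```

The capturing-group version keeps the asker's pattern unchanged apart from the added parentheses; avoid combining it with RegexOptions.ExplicitCapture, which turns those parentheses into a non-capturing group.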