In a program I'm reading in some data files, part of which are formatted as a series of records each in square brackets. Each record contains a section title and a series of key/value pairs.
在一个程序中,我正在读取一些数据文件,其中的一部分被格式化为一系列的记录,每个都在方括号中。每个记录包含一个节标题和一系列键/值对。
I originally wrote code to loop through and extract the values, but decided it could be done more elegantly using regular expressions. Below is my resulting code (I just hacked it out for now in a console app - so know the variable names aren't that great, etc.
我最初编写代码来循环并提取值,但我认为可以使用正则表达式更优雅地完成它。下面是我的结果代码(我刚刚在一个控制台应用程序中把它删掉了——所以要知道变量名不是很好,等等。
Can you suggest improvements? I feel it shouldn't be necessary to do two matches and a substring, but can't figure out how to do it all in one big step:
你能提出改进建议吗?我觉得不需要做两个匹配和一个子字符串,但是我不能在一个大步骤中搞清楚怎么做:
string input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";
MatchCollection matches=Regex.Matches(input, @"\[[^\]]*\]");
foreach (Match match in matches)
{
string subinput = match.Value;
int firstSpace = subinput.IndexOf(' ');
string section = subinput.Substring(1, firstSpace-1);
Console.WriteLine(section);
MatchCollection newMatches = Regex.Matches(subinput.Substring(firstSpace + 1), @"\s*(\w+)\s*=\s*(\w+)\s*");
foreach (Match newMatch in newMatches)
{
Console.WriteLine("{0}={1}", newMatch.Groups[1].Value, newMatch.Groups[2].Value);
}
}
4 个解决方案
#1
7
I prefer named captures, nice formatting, and clarity:
我更喜欢命名捕获、漂亮的格式和清晰性:
string input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";
MatchCollection matches = Regex.Matches(input, @"\[
(?<sectionName>\S+)
(\s+
(?<key>[^=]+)
=
(?<value>[^ \] ]+)
)+
]", RegexOptions.IgnorePatternWhitespace);
foreach(Match currentMatch in matches)
{
Console.WriteLine("Section: {0}", currentMatch.Groups["sectionName"].Value);
CaptureCollection keys = currentMatch.Groups["key"].Captures;
CaptureCollection values = currentMatch.Groups["value"].Captures;
for(int i = 0; i < keys.Count; i++)
{
Console.WriteLine("{0}={1}", keys[i].Value, values[i].Value);
}
}
#2
5
You should take advantage of the collections to get each key. So something like this then:
您应该利用集合来获取每个键。大概是这样的:
string input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";
Regex r = new Regex(@"(\[(\S+) (\s*\w+\s*=\s*\w+\s*)*\])", RegexOptions.Compiled);
foreach (Match m in r.Matches(input))
{
Console.WriteLine(m.Groups[2].Value);
foreach (Capture c in m.Groups[3].Captures)
{
Console.WriteLine(c.Value);
}
}
Resulting output:
输出结果:
section1
key1=value1
key2=value2
section2
key1=value1
key2=value2
key3=value3
section3
key1=value1
#3
2
You should be able to do something with nested groups like this:
您应该能够对嵌套的组进行如下操作:
pattern = @"\[(\S+)(\s+([^\s=]+)=([^\s\]]+))*\]"
I haven't tested it in C# or looped through the matches, but the results look right on rubular.com
我还没有在c#中测试它或者在比赛中循环测试它,但是结果在rubular.com上看起来是对的
#4
-1
This will match all the key/value pairs ...
这将匹配所有的键/值对…
var input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";
var ms = Regex.Matches(input, @"section(\d+)\s*(\w+=\w+)\s*(\w+=\w+)*");
foreach (Match m in ms)
{
Console.WriteLine("Section " + m.Groups[1].Value);
for (var i = 2; i < m.Groups.Count; i++)
{
if( !m.Groups[i].Success ) continue;
var kvp = m.Groups[i].Value.Split( '=' );
Console.WriteLine( "{0}={1}", kvp[0], kvp[1] );
}
}
#1
7
I prefer named captures, nice formatting, and clarity:
我更喜欢命名捕获、漂亮的格式和清晰性:
string input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";
MatchCollection matches = Regex.Matches(input, @"\[
(?<sectionName>\S+)
(\s+
(?<key>[^=]+)
=
(?<value>[^ \] ]+)
)+
]", RegexOptions.IgnorePatternWhitespace);
foreach(Match currentMatch in matches)
{
Console.WriteLine("Section: {0}", currentMatch.Groups["sectionName"].Value);
CaptureCollection keys = currentMatch.Groups["key"].Captures;
CaptureCollection values = currentMatch.Groups["value"].Captures;
for(int i = 0; i < keys.Count; i++)
{
Console.WriteLine("{0}={1}", keys[i].Value, values[i].Value);
}
}
#2
5
You should take advantage of the collections to get each key. So something like this then:
您应该利用集合来获取每个键。大概是这样的:
string input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";
Regex r = new Regex(@"(\[(\S+) (\s*\w+\s*=\s*\w+\s*)*\])", RegexOptions.Compiled);
foreach (Match m in r.Matches(input))
{
Console.WriteLine(m.Groups[2].Value);
foreach (Capture c in m.Groups[3].Captures)
{
Console.WriteLine(c.Value);
}
}
Resulting output:
输出结果:
section1
key1=value1
key2=value2
section2
key1=value1
key2=value2
key3=value3
section3
key1=value1
#3
2
You should be able to do something with nested groups like this:
您应该能够对嵌套的组进行如下操作:
pattern = @"\[(\S+)(\s+([^\s=]+)=([^\s\]]+))*\]"
I haven't tested it in C# or looped through the matches, but the results look right on rubular.com
我还没有在c#中测试它或者在比赛中循环测试它,但是结果在rubular.com上看起来是对的
#4
-1
This will match all the key/value pairs ...
这将匹配所有的键/值对…
var input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";
var ms = Regex.Matches(input, @"section(\d+)\s*(\w+=\w+)\s*(\w+=\w+)*");
foreach (Match m in ms)
{
Console.WriteLine("Section " + m.Groups[1].Value);
for (var i = 2; i < m.Groups.Count; i++)
{
if( !m.Groups[i].Success ) continue;
var kvp = m.Groups[i].Value.Split( '=' );
Console.WriteLine( "{0}={1}", kvp[0], kvp[1] );
}
}