I have a string containing text like the following:
Name: John\n Surname: Smith\n Address: XXX\n
The fields can appear in a different order.
I want to get the name value, the surname value, and the address value.
So the question is: how do I get the substring that starts after "Name: " and ends before "\n", so that I get "John", while keeping the code readable?
I tried the Substring function, but it required modifying the string to get the correct index of the "\n" part. I would prefer not to modify the original string, so the code stays readable.
3 Answers
#1
3
You can convert this string to a dictionary (i.e. a set of key-value pairs). First split the input string on the newline character into an array of strings. Then split each string in that array on the colon into two parts, the key and the value:
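// Requires: using System; using System.Linq;
var input = "Name: John\n Surname: Smith\n Address: XXX\n";
var dictionary = input.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
    .Select(s => s.Split(':'))                          // "Name: John" -> ["Name", " John"]
    .ToDictionary(p => p[0].Trim(), p => p[1].Trim()); // trim whitespace around key and value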
Then read the values by their keys:
var name = dictionary["Name"]; // gives you John
Note: if the address or another field is allowed to contain a colon, you can use the string.Join option from @Joel Coehoorn's comment when selecting the dictionary value.
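As an alternative to re-joining the pieces, the split itself can be limited to two parts so that any later colons stay in the value. The following is only a minimal sketch under that assumption; `FieldParser` and `Parse` are hypothetical names that do not appear in the answer, and the `Where` filter for lines without a colon is an added safeguard.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FieldParser
{
    // Hypothetical helper: parses "Key: Value" lines into a dictionary.
    public static Dictionary<string, string> Parse(string input)
    {
        return input
            .Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
            // Limit the split to 2 parts so colons inside the value are preserved.
            .Select(line => line.Split(new[] { ':' }, 2))
            // Skip lines that contain no colon at all.
            .Where(parts => parts.Length == 2)
            .ToDictionary(parts => parts[0].Trim(), parts => parts[1].Trim());
    }

    static void Main()
    {
        var fields = Parse("Name: John\n Surname: Sm:ith\n Address: XX:X\n");
        Console.WriteLine(fields["Address"]); // XX:X
    }
}
```

Note that ToDictionary throws an ArgumentException if the same key appears twice, so this sketch assumes each field name occurs only once in the input.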
Or you can use a regex instead of splitting and joining strings. Just find the pattern matches in your input:
// Requires: using System.Linq; using System.Text.RegularExpressions;
var input = "Name: John\n Surname: Sm:ith\n Address: XX:X\n";
var dictionary = Regex.Matches(input, @"\s*([^:]+): ([^\n]+)\n").Cast<Match>()
    .ToDictionary(m => m.Groups[1].Value, m => m.Groups[2].Value);
var address = dictionary["Address"]; // XX:X
#2
1
I would use Regex in this type of situation for two reasons:
- It is easier to maintain. Code built on Substring, Split, and IndexOf easily gets convoluted as the function's responsibilities grow.
- It offers more flexibility for future changes.
Below is the code to parse it:
// Requires: using System; using System.Text.RegularExpressions;
static string ExtractParam(string input, string arg) {
    // Capture everything between "<arg>:" and the next "\n".
    var match = Regex.Match(input, $@"\b{arg}:\s*(.*?)\n");
    return match.Success ? match.Groups[1].Value : null;
}

static void Main() {
    var input = "Name: John\n Surname: Smith\n Address: XXX\n";
    var name = ExtractParam(input, "Name");
    var surname = ExtractParam(input, "Surname");
    var address = ExtractParam(input, "Address");
    Console.WriteLine($"Name: {name}\n Surname: {surname}\n Address: {address}\n");
}
The regex is simple to understand:
\b : Match a word boundary
\s* : Eat up any unwanted whitespace
.*? : Match any string in a non-greedy way
() : Parentheses capture the part we want to return
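The pattern above can be hardened a little. The sketch below is a hypothetical variant, not the answer's own code: `ExtractParamSafe` is an invented name, `Regex.Escape` is added so that a field name containing regex metacharacters is matched literally, `[ \t]*` replaces `\s*` so that an empty value does not spill over into the next line, and `(?:\n|$)` lets the last field match even when the input has no trailing "\n".

```csharp
using System;
using System.Text.RegularExpressions;

static class ParamExtractor
{
    // Hypothetical variant of ExtractParam: escapes the field name and
    // accepts a value that ends at a newline or at the end of the input.
    public static string ExtractParamSafe(string input, string arg)
    {
        var pattern = $@"\b{Regex.Escape(arg)}:[ \t]*(.*?)[ \t]*(?:\n|$)";
        var match = Regex.Match(input, pattern);
        return match.Success ? match.Groups[1].Value : null;
    }

    static void Main()
    {
        // Note: no trailing "\n" after the last field.
        var input = "Name: John\n Surname: Smith\n Address: XXX";
        Console.WriteLine(ExtractParamSafe(input, "Address"));        // XXX
        Console.WriteLine(ExtractParamSafe(input, "Phone") == null);  // True
    }
}
```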
#3
0
@Vikhram's answer was really good :)
I am going to give you another idea. My program works a bit differently: it finds the indexes where the string contains "\n", and then prints the substring from one "\n" to the next "\n".
string test = "Name: John\n Surname: Smith\n Address: XXX\n";
// Positions of the first three line breaks.
int fst_index = test.IndexOf("\n");
int snd_index = test.IndexOf("\n", fst_index+1);
int trd_index = test.IndexOf("\n", snd_index+1);
// Each substring starts at the "\n" itself, so the output begins with a line break.
Console.WriteLine(test.Substring(fst_index, snd_index-fst_index));
Console.WriteLine("SPACE ?");
Console.WriteLine(test.Substring(snd_index, trd_index - snd_index));
Console.WriteLine("SPACE ?");
If you are going to use this with longer text, you will need a loop.
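A loop version of the same IndexOf idea might look like the sketch below. `LineSplitter` and `SplitLines` are hypothetical names; unlike the snippet above, this version starts each segment after the "\n", includes the first line, and trims surrounding whitespace.

```csharp
using System;
using System.Collections.Generic;

static class LineSplitter
{
    // Hypothetical helper: collects every segment terminated by "\n".
    public static List<string> SplitLines(string text)
    {
        var lines = new List<string>();
        int start = 0;
        int end;
        // IndexOf returns -1 once there are no more line breaks.
        while ((end = text.IndexOf('\n', start)) >= 0)
        {
            lines.Add(text.Substring(start, end - start).Trim());
            start = end + 1; // continue right after the "\n"
        }
        // Keep any trailing text that is not followed by "\n".
        if (start < text.Length)
            lines.Add(text.Substring(start).Trim());
        return lines;
    }

    static void Main()
    {
        foreach (var line in SplitLines("Name: John\n Surname: Smith\n Address: XXX\n"))
            Console.WriteLine(line);
    }
}
```

Each collected line can then be split on its first colon to get the field name and value.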