How to get a substring that starts and ends with specific strings

Date: 2021-03-09 20:57:50

I have a string containing text like the following:

Name: John\n Surname: Smith\n Address: XXX\n

The fields can appear in a different order.

I want to get the name value, the surname value, and the address value.

So the question is: how can I get the substring that starts after "Name: " and ends before "\n", so that I get "John", while keeping the code readable?

I tried the Substring function, but it required modifying the string to get the correct index of the "\n" part. I would prefer not to modify the original string, so that the code stays readable.

3 answers

#1


3  

You can convert this string to a dictionary (i.e. a set of key-value pairs). First split the initial string on the newline character into an array of strings. Then split each string in that array on the colon into two parts, key and value:

var input = "Name: John\n Surname: Smith\n Address: XXX\n";
var dictionary = input.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
                      .Select(s => s.Split(':'))
                      .ToDictionary(p => p[0].Trim(), p => p[1].Trim());

Then read values by their keys:

var name = dictionary["Name"]; // gives you John

Note: if the address or some other field is allowed to contain a colon, you can use the string.Join option from the comment by @Joel Coehoorn when selecting the dictionary value.
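As a rough sketch of one way to keep colons inside values, the split can be limited to two parts so that only the first colon separates key from value. This is a substitute for the string.Join approach, not the code from that comment; the `FieldParser` class and `Parse` method names are hypothetical, and duplicate keys would still make `ToDictionary` throw.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FieldParser
{
    // Hypothetical helper: parses "Key: Value" lines into a dictionary.
    public static Dictionary<string, string> Parse(string input)
    {
        return input
            .Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
            // Split on the first colon only, so "XX:X" stays in one piece.
            .Select(line => line.Split(new[] { ':' }, 2))
            // Skip lines that contain no colon at all.
            .Where(parts => parts.Length == 2)
            .ToDictionary(parts => parts[0].Trim(), parts => parts[1].Trim());
    }
}
```

For example, `FieldParser.Parse("Name: John\n Address: XX:X\n")["Address"]` would keep the colon inside the address value.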

Or you can use a regex instead of splitting and joining strings. Just find the pattern matches in your input:

var input = "Name: John\n Surname: Sm:ith\n Address: XX:X\n";
var dictionary = Regex.Matches(input, @"\s*([^:]+): ([^\n]+)\n").Cast<Match>()
                      .ToDictionary(m => m.Groups[1].Value, m => m.Groups[2].Value);
var address = dictionary["Address"]; // XX:X

#2


1  

I would use Regex in this type of situation for two reasons:

  1. It is easier to maintain. Code built on Substring, Split, and IndexOf easily gets convoluted as the function's responsibilities grow.
  2. It offers more flexibility for future changes.

Below is the code to parse it:

static string ExtractParam(string input, string arg) {
    // Escape arg so that any regex metacharacters in the key are matched literally.
    var match = Regex.Match(input, $@"\b{Regex.Escape(arg)}:\s*(.*?)\n");
    return match.Success ? match.Groups[1].Value : null;
}

static void Main() {
    var input = "Name: John\n Surname: Smith\n Address: XXX\n";

    var name = ExtractParam(input, "Name");
    var surname = ExtractParam(input, "Surname");
    var address = ExtractParam(input, "Address");

    Console.WriteLine($"Name: {name}\n Surname: {surname}\n Address: {address}\n");
}

The regex is simple to read:

\b   : Match a word boundary
\s*  : Eat up any unwanted whitespace
.*?  : Match any string in a non-greedy way
()   : Parentheses capture the part we want to return
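The pattern above requires a trailing "\n" after every value. As a hedged variation, the sketch below anchors on line boundaries with `RegexOptions.Multiline` instead, so the last field still matches when the input does not end with a newline. The `FieldRegex` class and `ExtractField` method names are hypothetical, and the assumption that each field sits on its own line comes from the sample input in the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class FieldRegex
{
    // Hypothetical helper: returns the value for a key, or null if the key is absent.
    public static string ExtractField(string input, string key)
    {
        // ^ and $ match at line boundaries because of RegexOptions.Multiline.
        // \r? tolerates Windows line endings; [ \t]* trims spaces around the value.
        var pattern = $@"^\s*{Regex.Escape(key)}:[ \t]*(.*?)[ \t]*\r?$";
        var match = Regex.Match(input, pattern, RegexOptions.Multiline);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Because the key must follow the start of a line, asking for "Name" does not accidentally match inside "Surname".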

#3


0  

@Vikhram's answer was really good :)

Here is another idea. My program works a bit differently: it finds the indexes of the "\n" characters in the string and prints the text between each "\n" and the next one.

        string test = "Name: John\n Surname: Smith\n Address: XXX\n";

        int fst_index = test.IndexOf("\n");
        int snd_index = test.IndexOf("\n", fst_index+1);
        int trd_index = test.IndexOf("\n", snd_index+1);

        // Start one past each "\n" so the newline itself is not printed.
        Console.WriteLine(test.Substring(fst_index + 1, snd_index - fst_index - 1));
        Console.WriteLine("SPACE ?");

        Console.WriteLine(test.Substring(snd_index + 1, trd_index - snd_index - 1));
        Console.WriteLine("SPACE ?");

If you are going to use this with a longer text, you will have to use a loop.
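A minimal sketch of what such a loop might look like, assuming the same "\n"-separated input as above. The `LineScanner` class and `SplitLines` method names are hypothetical; the loop only uses `IndexOf` and `Substring`, and it never modifies the original string.

```csharp
using System;
using System.Collections.Generic;

static class LineScanner
{
    // Hypothetical helper: returns each non-empty line, trimmed of surrounding whitespace.
    public static List<string> SplitLines(string text)
    {
        var lines = new List<string>();
        int start = 0;
        while (start < text.Length)
        {
            // Find the next newline; if there is none, read to the end of the text.
            int end = text.IndexOf('\n', start);
            if (end < 0) end = text.Length;

            string line = text.Substring(start, end - start).Trim();
            if (line.Length > 0) lines.Add(line);

            // Continue just past the newline that was found.
            start = end + 1;
        }
        return lines;
    }
}
```

Each returned line still has the form "Key: Value", so it can be split on the first colon afterwards.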
