What regex pattern should I use to extract the number from an XML input?

Time: 2021-05-09 18:57:02

My input text is as follows:

<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">2</string>

What regex pattern should I use to extract the number from the above input?

var pattern = "<string ?>?</string>"; // how to write this?
var match = Regex.Match(input, pattern, RegexOptions.IgnoreCase);

Thanks,

4 answers

#1


2  

Another approach using LinqToXml:

using System.Xml.Linq;

var ele = XElement.Parse("<string xmlns=\"http://schemas.microsoft.com/2003/10/Serialization/\">2</string>");
var valueString = ele.Value; // valueString == "2"
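As a minimal sketch (assuming .NET 5+ with C# 9 top-level statements), the parsed element also exposes its namespace, and `XElement`'s explicit conversion operators let you cast the content straight to `int` instead of calling `int.Parse` on `Value`:

```csharp
using System;
using System.Xml.Linq;

// The same response body as in the question.
const string input =
    "<string xmlns=\"http://schemas.microsoft.com/2003/10/Serialization/\">2</string>";

XElement ele = XElement.Parse(input);

// The element is in the serialization namespace, so its full name is
// {http://schemas.microsoft.com/2003/10/Serialization/}string.
string localName = ele.Name.LocalName;
string ns = ele.Name.NamespaceName;

// XElement defines explicit conversions, so the text content can be cast
// straight to int; this throws FormatException if the content is not a number.
int number = (int)ele;

Console.WriteLine($"{localName} ({ns}): {number}");
```

Unlike the regex answers, this does not care about attribute order, whitespace, or namespace prefixes, because the XML parser handles them.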

Update

And for regex: I would extend the solution from @Oded with (?<=startRegex) and (?=endRegex) (lookbehind and lookahead), so that the unnecessary <string> tags are left out of the match value.

(?<=<string[^>]+>)([0-9]+)(?=</string>)
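A short sketch of how the lookaround version behaves in .NET (assuming C# 9 top-level statements); the point is that `Match.Value` already contains only the digits, so no group lookup is needed:

```csharp
using System;
using System.Text.RegularExpressions;

const string input =
    "<string xmlns=\"http://schemas.microsoft.com/2003/10/Serialization/\">2</string>";

// .NET allows variable-length lookbehind, so (?<=<string[^>]+>) is accepted here.
// Lookarounds must match, but the text they match is not part of Match.Value.
const string pattern = @"(?<=<string[^>]+>)([0-9]+)(?=</string>)";
Match match = Regex.Match(input, pattern, RegexOptions.IgnoreCase);

// Because the tags are excluded, Match.Value is just the digits.
string value = match.Success ? match.Value : string.Empty;
Console.WriteLine(value);
```

Note that `[^>]+` requires at least one character between `<string` and `>`, so a bare `<string>2</string>` would not match; `[^>]*` would accept both forms.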

#2


5  

This pattern should do the trick:

"<string[^>]+>([0-9]+)</string>"

Breakdown:

<string   - Match the string <string
[^>]+     - Followed by one or more characters that are not >
>         - Followed by >
(         - Start capturing group
[0-9]+    - Followed by one or more of the digits 0-9
)         - End capturing group
</string> - Followed by the string </string>
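To make the breakdown concrete, here is a small sketch (assuming C# 9 top-level statements) of what each group holds after a successful match: group 0 is the whole match including both tags, and group 1 is the `([0-9]+)` capture:

```csharp
using System;
using System.Text.RegularExpressions;

const string input =
    "<string xmlns=\"http://schemas.microsoft.com/2003/10/Serialization/\">2</string>";

const string pattern = "<string[^>]+>([0-9]+)</string>";
Match match = Regex.Match(input, pattern);

// Groups[0] is the entire matched text, tags included.
string whole = match.Groups[0].Value;
// Groups[1] is the capturing group ([0-9]+), i.e. only the digits.
string digits = match.Groups[1].Value;

Console.WriteLine(whole);
Console.WriteLine(digits);
```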

If the example is the whole string, you may want to anchor it using ^ and $ at the start and end respectively.
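A quick sketch of the effect of anchoring (assuming C# 9 top-level statements): with `^` and `$` the pattern only matches when the element is the entire input, so any leading text makes it fail:

```csharp
using System;
using System.Text.RegularExpressions;

// ^ and $ force the element to span the whole input.
var anchored = new Regex("^<string[^>]+>([0-9]+)</string>$");

const string exact =
    "<string xmlns=\"http://schemas.microsoft.com/2003/10/Serialization/\">2</string>";
const string withExtra = "prefix " + exact;

bool exactMatches = anchored.IsMatch(exact);
// ^ cannot match because the input starts with "prefix ".
bool extraMatches = anchored.IsMatch(withExtra);

Console.WriteLine($"{exactMatches} {extraMatches}");
```

In .NET, `$` also matches just before a final newline; if a trailing `\n` must be rejected as well, `\z` is the stricter end anchor.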

Note that I am using [0-9] and not \d, because in .NET \d matches any Unicode decimal digit, not only the ASCII digits 0-9.
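A small sketch of the difference (assuming C# 9 top-level statements), using U+0662 ARABIC-INDIC DIGIT TWO, which is in the Unicode decimal-digit category. It also shows that `RegexOptions.ECMAScript` restricts `\d` to `[0-9]`:

```csharp
using System;
using System.Text.RegularExpressions;

// \u0662 is ARABIC-INDIC DIGIT TWO, a Unicode decimal digit (category Nd).
const string input =
    "<string xmlns=\"http://schemas.microsoft.com/2003/10/Serialization/\">\u0662</string>";

// Default .NET behavior: \d matches any Unicode decimal digit.
bool matchesBackslashD = Regex.IsMatch(input, @"<string[^>]+>(\d+)</string>");
// [0-9] only matches the ASCII digits.
bool matchesAsciiRange = Regex.IsMatch(input, "<string[^>]+>([0-9]+)</string>");
// With ECMAScript-compliant behavior, \d is equivalent to [0-9].
bool matchesEcmaScript = Regex.IsMatch(input, @"<string[^>]+>(\d+)</string>", RegexOptions.ECMAScript);

Console.WriteLine($"{matchesBackslashD} {matchesAsciiRange} {matchesEcmaScript}");
```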

#3


1  

Here is a non-regex way of doing it.

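string str = "<string xmlns=\"http://schemas.microsoft.com/2003/10/Serialization/\">2</string>";
// Position of the '>' that closes the opening tag.
int startIndex = str.IndexOf('>');
// Position of the '<' that starts the closing tag.
int endIndex = str.LastIndexOf('<');
// The number is everything strictly between the two.
int numberLength = (endIndex - startIndex) - 1;
string result = str.Substring(startIndex + 1, numberLength); // result == "2"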
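The snippet above assumes both characters are present; if `IndexOf` returns -1, `Substring` throws. A slightly more defensive sketch (assuming C# 9 top-level statements; `TextBetweenTags` is a hypothetical helper name) guards the indexes and uses `int.TryParse` so non-numeric content does not throw:

```csharp
using System;

// Hypothetical helper: returns the text between the first '>' and the last '<',
// or null when either character is missing or they are out of order.
static string? TextBetweenTags(string s)
{
    int start = s.IndexOf('>');
    int end = s.LastIndexOf('<');
    if (start < 0 || end <= start)
    {
        return null;
    }
    return s.Substring(start + 1, end - start - 1);
}

string str = "<string xmlns=\"http://schemas.microsoft.com/2003/10/Serialization/\">2</string>";
string? text = TextBetweenTags(str);

// int.TryParse returns false (instead of throwing) for null or non-numeric text.
bool ok = int.TryParse(text, out int number);

Console.WriteLine(ok ? number.ToString() : "not a number");
```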

#4


1  

You can use this method to extract the number:

    // Requires: using System.Text.RegularExpressions;
    /// <summary>
    /// Example for how to extract the number from an xml string.
    /// </summary>
    /// <param name="xml">The XML text to search.</param>
    /// <returns>The extracted digits, or an empty string if there is no match.</returns>
    private string ExtractNumber(string xml)
    {
        // Extracted number.
        string number = string.Empty;

        // Example input:
        // <string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">2</string>

        // The regular expression for the match.
        // You can use the parentheses to isolate the desired number into a group match: "(\d+?)"
        var pattern = @"<string.*?>(\d+?)</string>";

        // Match the desired part of the xml.
        var match = Regex.Match(xml, pattern);

        // Verify that the match succeeded.
        if (match.Success)
        {
            // Finally, use the group value to isolate the number.
            number = match.Groups[1].Value;
        }

        return number;
    }

This is the way that I used to solve this problem.
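A variation on the same idea, sketched under a few assumptions (C# 9 top-level statements; `TryExtractNumber` is a hypothetical name): it builds the `Regex` once so it can be reused, uses a named group instead of `Groups[1]`, swaps `.*?` for `[^>]*` and `\d` for `[0-9]` as in the other answer, and returns `null` rather than an empty string when nothing usable is found:

```csharp
using System;
using System.Text.RegularExpressions;

// Built once and reused; RegexOptions.Compiled trades a slower first use
// for faster matching afterwards. "number" is a named capturing group.
var numberPattern = new Regex(@"<string[^>]*>(?<number>[0-9]+)</string>", RegexOptions.Compiled);

// Hypothetical helper: returns null when there is no <string> element containing
// only ASCII digits, or when the digits do not fit in an int.
int? TryExtractNumber(string xml)
{
    Match match = numberPattern.Match(xml);
    if (!match.Success)
    {
        return null;
    }
    return int.TryParse(match.Groups["number"].Value, out int n) ? n : (int?)null;
}

int? found = TryExtractNumber(
    "<string xmlns=\"http://schemas.microsoft.com/2003/10/Serialization/\">2</string>");
int? missing = TryExtractNumber("<string>abc</string>");

Console.WriteLine(found?.ToString() ?? "none");
Console.WriteLine(missing?.ToString() ?? "none");
```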
