I am trying to extract the text between an XML tag's opening and closing parts. The text inside the tag is multilingual. For example:
<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">
तुम्हारा नाम क्या है
</string>
I searched Google and found a few regexes, but they didn't work. Here is one I tried:
String str = "<string xmlns="+
"http://schemas.microsoft.com/2003/10/Serialization/"+">"+
"तुम्हारा नाम क्या है"+"</string>";
final Pattern pattern = Pattern.compile("<String xmlns="+
"http://schemas.microsoft.com/2003/10/Serialization/"+">(.+?)</string>");
final Matcher matcher = pattern.matcher(str);
matcher.find();
System.out.println(matcher.group(1));
The given string format is:
<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">
तुम्हारा नाम क्या है
</string>
and the expected output is:
तुम्हारा नाम क्या है
Instead, it gives me an error.
2 solutions
#1
4
This pattern matches the expected part, and $1 gives you the expected result:
/<string .*?>(.*?)<\\/string>/
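As a rough sketch of how this pattern could be applied in Java (the class name `RegexExtract` and the helper `extract` are made up for illustration): the slashes around the pattern are delimiters from other regex flavors and are dropped, and `Pattern.DOTALL` is assumed so that `.` also matches the line breaks around the text. Note that the pattern in the question starts with `<String` while the input starts with `<string`, so `find()` returns false there and the unguarded `group(1)` call throws an `IllegalStateException`.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtract {
    // Same pattern as above, minus the / delimiters.
    // DOTALL lets '.' match the newlines around the element text.
    private static final Pattern STRING_TAG =
            Pattern.compile("<string .*?>(.*?)</string>", Pattern.DOTALL);

    // Hypothetical helper: returns the trimmed element text, or empty if nothing matched.
    static Optional<String> extract(String xml) {
        Matcher m = STRING_TAG.matcher(xml);
        // group(1) may only be called after find() has returned true.
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        String xml = "<string xmlns=\"http://schemas.microsoft.com/2003/10/Serialization/\">\n"
                + "तुम्हारा नाम क्या है\n"
                + "</string>";
        System.out.println(extract(xml).orElse("(no match)"));
    }
}
```

Returning an `Optional` keeps the "no match" case explicit instead of letting it surface as an exception.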
But I highly recommend that you stop doing this with a regex. Find an XML parser for Java and simply grab the content of the <string> tag.
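One possible way to follow the parser advice, sketched with the DOM parser that ships with the JDK. This assumes the `<string>` element is the document root, as in the sample input; the class name `DomExtract` and the helper `extract` are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DomExtract {
    // Hypothetical helper: parses the XML and returns the root element's text.
    static String extract(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // the sample element carries an xmlns attribute
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        // Assumption: <string> is the root element, so its text content is what we want.
        return doc.getDocumentElement().getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<string xmlns=\"http://schemas.microsoft.com/2003/10/Serialization/\">\n"
                + "तुम्हारा नाम क्या है\n"
                + "</string>";
        System.out.println(extract(xml));
    }
}
```

Because the parser works on characters rather than bytes here, the multilingual text needs no special handling.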
#2
0
Don’t use regular expressions for parsing XML. It will work in a few cases, but eventually it will fail. See "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for a full explanation.
The easiest way to extract an element’s string content is with XPath:
String contents =
    XPathFactory.newInstance().newXPath().evaluate(
        "//*[local-name()='string']",
        new InputSource(new StringReader(str)));
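For completeness, here is a self-contained sketch of the same XPath call with the imports it needs. The wrapper class `XPathExtract` and the method `extract` are names invented for this example, and the `trim()` is an added assumption to drop the line breaks that surround the text in the sample input.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathExtract {
    // Hypothetical helper wrapping the XPath call shown above.
    static String extract(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // local-name() ignores the namespace, so no NamespaceContext is needed.
        String contents = xpath.evaluate(
                "//*[local-name()='string']",
                new InputSource(new StringReader(xml)));
        return contents.trim();
    }

    public static void main(String[] args) throws XPathExpressionException {
        String str = "<string xmlns=\"http://schemas.microsoft.com/2003/10/Serialization/\">\n"
                + "तुम्हारा नाम क्या है\n"
                + "</string>";
        System.out.println(extract(str));
    }
}
```

Matching on `local-name()` sidesteps the default namespace declared on the element, which a plain `//string` expression would not match.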