I have a string str="<u>rag</u>". Now I want to get only the string "rag". How can I get it using regex?
My code is below.
I got the output="".
Thanks in advance.
C# code:
string input="<u>ragu</u>";
string regex = "(\\<.*\\>)";
string output = Regex.Replace(input, regex, "");
5 solutions
#1
4
Using regex to parse HTML is not recommended.
Regex is meant for regular patterns. HTML is not regular in its format (except XHTML). For example, HTML files are valid even if a closing tag is missing! This could break your code.
Use an HTML parser like HtmlAgilityPack.
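As a rough illustration of the parser approach, here is a minimal sketch that swaps in the built-in System.Xml.Linq XML parser instead of HtmlAgilityPack, so it runs without extra packages. It only applies when the fragment is well-formed XML (as <u>rag</u> is); for real-world HTML with unclosed tags, a dedicated HTML parser is still the better choice. The class and method names (MarkupText, InnerText) are made up for this example.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper, not part of the original answers.
public static class MarkupText
{
    // Parses a well-formed fragment and returns its concatenated text content.
    // Assumption: the input has a single root element and every tag is closed;
    // XElement.Parse throws System.Xml.XmlException otherwise.
    public static string InnerText(string fragment) =>
        XElement.Parse(fragment).Value;
}
```

Calling MarkupText.InnerText("<u>rag</u>") returns "rag" without any pattern matching on the tags themselves.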
WARNING {Don't try this in your code}
To solve your regex problem:
<.*>
matches < followed by zero or more characters (i.e. u>rag</u) up to the last >, so the whole string is replaced and the output is empty.
You should replace it with this regex:
<.*?>
.* is greedy, i.e. it eats as many characters as it can match.
.*? is lazy, i.e. it eats as few characters as possible.
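To see the greedy/lazy difference side by side, here is a small sketch that runs both patterns on the same input. The class and method names (GreedyVsLazyDemo, StripGreedy, StripLazy) are invented for illustration and do not come from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative helper (not from the original answers) that contrasts
// a greedy and a lazy quantifier on the same input.
public static class GreedyVsLazyDemo
{
    // Greedy: ".*" runs to the LAST '>' in the input, so a single match
    // covers "<u>rag</u>" in full and everything is removed.
    public static string StripGreedy(string input) =>
        Regex.Replace(input, "<.*>", string.Empty);

    // Lazy: ".*?" stops at the FIRST '>' after each '<', so each tag is
    // matched separately and the text between the tags is kept.
    public static string StripLazy(string input) =>
        Regex.Replace(input, "<.*?>", string.Empty);
}
```

For the input "<u>rag</u>", StripGreedy returns an empty string (the output the question reports), while StripLazy returns "rag".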
#2
7
const string HTML_TAG_PATTERN = "<.*?>";
// Regex.Replace returns a new string; it does not modify str in place.
string output = Regex.Replace(str, HTML_TAG_PATTERN, string.Empty);
#3
1
You don't need to use regex for that.
string input = "<u>rag</u>".Replace("<u>", "").Replace("</u>", "");
Console.WriteLine(input);
#4
0
Sure you can:
string input = "<u>ragu</u>";
string regex = "(\\<[/]?[a-z]\\>)";
string output = Regex.Replace(input, regex, "");
#5
0
Your code was almost correct, a small modification makes it work:
string input = "<u>ragu</u>";
string regex = @"<.*?\>";
string output = Regex.Replace(input, regex, string.Empty);
Output is 'ragu'.
EDIT: this solution may not be the best. Interesting remark from user the-land-of-devils-srilanka: do not use regex to parse HTML. Indeed, see also RegEx match open tags except XHTML self-contained tags.