I have a string like this:
This <span class="highlight">is</span> a very "nice" day!
What should my regex pattern in VB look like to find the quotes within the tag? I want to replace them with something else, for example:
This <span class=^highlight^>is</span> a very "nice" day!
Something like <(")[^>]+> doesn't work :(
Thanks
5 solutions
#1
10
It depends on your regex flavor, but this works for most of them:
"(?=[^<]*>)
EDIT: For anyone curious how this works, the pattern translates into English as "Find a quote that is followed by a > before the next <".
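To make the lookahead concrete, here is a minimal sketch in Python rather than VB; the pattern itself is unchanged, and .NET's `Regex.Replace` should accept it as written. The function name `replace_quotes_in_tags` and the `^` replacement are only illustrative.

```python
import re

# Pattern from the answer: a double quote that is followed by ">"
# before any "<" appears, i.e. a quote that sits inside a tag.
QUOTE_IN_TAG = re.compile(r'"(?=[^<]*>)')


def replace_quotes_in_tags(text, replacement="^"):
    """Replace every double quote that the lookahead places inside a tag."""
    return QUOTE_IN_TAG.sub(replacement, text)


sample = 'This <span class="highlight">is</span> a very "nice" day!'
print(replace_quotes_in_tags(sample))
```

One caveat worth knowing: the lookahead only checks for a `>` ahead of the next `<`, so a literal `>` in plain text (as in `say "hi" if 2 > 1`) makes the quotes before it look as if they were inside a tag.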
#2
2
Regexes are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.
If you are using VB.NET, you should be able to use the HTML Agility Pack.
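As a rough illustration of the parser approach, the sketch below uses Python's standard-library `html.parser` as a stand-in; it is not the HTML Agility Pack API, and the class and function names are made up for this example. The parser decides what counts as a tag, and quotes are replaced only in the raw text of start tags.

```python
from html.parser import HTMLParser


class TagQuoteReplacer(HTMLParser):
    """Rebuild the input, replacing double quotes inside start tags only."""

    def __init__(self, replacement="^"):
        # convert_charrefs=False keeps entities such as &amp; untouched.
        super().__init__(convert_charrefs=False)
        self.replacement = replacement
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the raw source text of the start tag.
        raw = self.get_starttag_text()
        self.parts.append(raw.replace('"', self.replacement))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> arrive here; treat them the same way.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Note: the parser lowercases tag names, so </SPAN> becomes </span>.
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")


def replace_quotes_with_parser(html, replacement="^"):
    parser = TagQuoteReplacer(replacement)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


sample = 'This <span class="highlight">is</span> a very "nice" day!'
print(replace_quotes_with_parser(sample))
```

Unlike a lookahead-only regex, this leaves text such as `say "hi" if 2 > 1` alone, because the parser never reports it as a tag.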
#3
-1
Try this: <span class="([^"]+?)?">
#4
-1
This should get you the first attribute value in a tag:
<[^">]+"(?<value>[^"]*)"[^>]*>
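A quick way to try this pattern, sketched in Python: .NET writes named groups as `(?<value>…)`, while Python's `re` spells them `(?P<value>…)`, so that is the only change made to the pattern here. The helper name `first_attribute_value` is hypothetical.

```python
import re

# Same pattern as above, with the named group in Python syntax.
FIRST_ATTR_VALUE = re.compile(r'<[^">]+"(?P<value>[^"]*)"[^>]*>')


def first_attribute_value(text):
    """Return the first double-quoted attribute value in a tag, or None."""
    match = FIRST_ATTR_VALUE.search(text)
    return match.group("value") if match else None


sample = 'This <span class="highlight">is</span> a very "nice" day!'
print(first_attribute_value(sample))
```

This captures the value rather than the quotes themselves, so replacing the quotes would still need an extra step that rebuilds the tag around the captured text.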
#5
-1
If your intention is to replace ALL quotation marks within tags, you could use the following regular expression:
(<[^>"]*)(")([^>]*>)
That will isolate the substrings before and after your quotation mark. Note that this does not attempt to match opening and closing quotation marks. It simply matches a quotation mark within a tag.
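One thing to watch when using this for replacement: because each match consumes the rest of the tag, a single pass appears to replace only the first quotation mark in each tag. The sketch below (Python, with an illustrative function name) repeats the substitution until nothing changes; it assumes the replacement text does not itself contain a double quote, otherwise the loop would never finish.

```python
import re

# Pattern from the answer: text before the quote, the quote, rest of the tag.
QUOTE_WITH_CONTEXT = re.compile(r'(<[^>"]*)(")([^>]*>)')


def replace_all_quotes_in_tags(text, replacement="^"):
    """Apply the substitution repeatedly until no tag contains a quote."""
    # A callable replacement avoids backslash-escape issues in `replacement`.
    def rebuild(match):
        return match.group(1) + replacement + match.group(3)

    while True:
        text, count = QUOTE_WITH_CONTEXT.subn(rebuild, text)
        if count == 0:
            return text


sample = 'This <span class="highlight">is</span> a very "nice" day!'
print(replace_all_quotes_in_tags(sample))
```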