How do I find quotes inside a tag?

Time: 2020-12-10 22:28:28

I have a string like this:

This <span class="highlight">is</span> a very "nice" day!

What should my regex pattern in VB look like to find the quotes within the tag? I want to replace them with something else, for example:

This <span class=^highlight^>is</span> a very "nice" day!

Something like <(")[^>]+> doesn't work :(

Thanks

5 solutions

#1


10  

It depends on your regex flavor, but this works for most of them:

"(?=[^<]*>)

EDIT: For anyone curious how this works, the pattern translates into English as "Find a quote that is followed by a > before the next <".
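
To make the lookahead concrete, here is a minimal sketch in Python rather than VB (the pattern uses no flavor-specific syntax, so it should carry over to .NET's Regex.Replace unchanged). The helper name replace_quotes_in_tags and the ^ replacement are assumptions taken from the question's desired output, not part of the answer.

```python
import re

# Pattern from the answer: a double quote that is followed by '>'
# before the next '<' appears.
QUOTE_IN_TAG = re.compile(r'"(?=[^<]*>)')


def replace_quotes_in_tags(text: str, replacement: str = "^") -> str:
    """Replace every double quote that the lookahead places inside a tag."""
    return QUOTE_IN_TAG.sub(replacement, text)


sample = 'This <span class="highlight">is</span> a very "nice" day!'
print(replace_quotes_in_tags(sample))
```

One limitation worth knowing: the pattern only looks ahead, so a bare > in ordinary text fools it. For the input say "a > b" the first quote is replaced even though there is no tag at all.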

#2


2  

Regexes are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.

If you are using VB.net you should be able to use HTMLAgilityPack.
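
As a rough illustration of the parser-based approach, the sketch below uses Python's standard-library html.parser as a stand-in; it is not HTMLAgilityPack and does not show that library's API. The class name TagQuoteSwapper and the function swap_quotes_in_tags are made up for this example. The idea is to let the parser decide what counts as a tag and to touch quotes only in the raw text of start tags.

```python
from html.parser import HTMLParser


class TagQuoteSwapper(HTMLParser):
    """Re-emit the input, replacing double quotes only inside start tags."""

    def __init__(self, replacement="^"):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.replacement = replacement
        self.parts = []

    def _emit_tag(self):
        # get_starttag_text() returns the raw text of the latest start tag.
        raw = self.get_starttag_text() or ""
        self.parts.append(raw.replace('"', self.replacement))

    def handle_starttag(self, tag, attrs):
        self._emit_tag()

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, unchanged in shape.
        self._emit_tag()

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def swap_quotes_in_tags(html, replacement="^"):
    parser = TagQuoteSwapper(replacement)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


sample = 'This <span class="highlight">is</span> a very "nice" day!'
print(swap_quotes_in_tags(sample))
```

Unlike a lookahead regex, this leaves text such as say "a > b" alone, because the parser never reports a tag there. The sketch drops comments and doctype declarations, which a fuller version would also need to re-emit.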

#3


-1  

Try this: <span class="([^"]+?)?">

#4


-1  

This should get you the first attribute value in a tag:

<[^">]+"(?<value>[^"]*)"[^>]*>

#5


-1  

If your intention is to replace ALL quotation marks within tags, you could use the following regular expression:

(<[^>"]*)(")([^>]*>)

That will isolate the substrings before and after your quotation mark. Note that this does not attempt to match opening and closing quotation marks. It simply matches a quotation mark within a tag.
