Can't figure out how to use a regex to get a pattern contained in an HTML tag [duplicate]

I just started learning about regexes and can't figure out how to extract Gizmo from this HTML tag:

<meta content="Gizmo" property="og:title" />

I'm stuck at (?<Name>meta content=), which is basically nothing, and I don't know where to go from there.

1 solution

It's well known that you shouldn't use regex to parse HTML (it has been said a million times); you should use an HTML parser instead.
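
For comparison, here is a minimal sketch of the parser route, assuming Python and its standard-library `html.parser` module (the original question does not name a language, and the class and function names below are made up for illustration). It collects the `content` attribute of any `meta` tag whose `property` is `og:title`:

```python
from html.parser import HTMLParser


class OgTitleParser(HTMLParser):
    """Collects the content attribute of <meta property="og:title"> tags."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        # Self-closing tags such as <meta ... /> are routed here as well.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if attr_map.get("property") == "og:title":
            self.titles.append(attr_map.get("content"))


def extract_og_titles(html):
    """Hypothetical helper: return every og:title value found in html."""
    parser = OgTitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


print(extract_og_titles('<meta content="Gizmo" property="og:title" />'))
```

Unlike a regex, this does not depend on the order of the attributes or on the exact spacing inside the tag.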

On the other hand, if you want to use regex for this, you are pretty close. You can use:

(?<Name>meta content=".*?")

By the way, if you want to grab just the word Gizmo, you also have to use a capturing group within your Name group:

(?<Name>meta content="(.*?)")
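
As a rough illustration of what each group ends up holding, here is a sketch assuming Python's `re` module. Note that Python spells named groups `(?P<Name>...)` instead of `(?<Name>...)`, so the pattern is adapted accordingly; the variable names are only for illustration:

```python
import re

html = '<meta content="Gizmo" property="og:title" />'

# Same pattern as above, with Python's (?P<Name>...) spelling for the named group.
pattern = re.compile(r'(?P<Name>meta content="(.*?)")')

match = pattern.search(html)
whole = match.group("Name")  # the named group (group 1): meta content="Gizmo"
inner = match.group(2)       # the nested unnamed group (group 2): Gizmo

print(whole)
print(inner)
```

The named group still spans the whole `meta content="..."` text, so the bare word has to be read from the nested group.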

On the other hand, if you don't care about capturing meta content and just want to capture the value inside content, you can use:

content="(?<Name>.*?)"
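
A short sketch of this last pattern in use, again assuming Python's `re` module and its `(?P<Name>...)` spelling for named groups:

```python
import re

html = '<meta content="Gizmo" property="og:title" />'

# The named group now wraps only the text between the quotes.
pattern = re.compile(r'content="(?P<Name>.*?)"')

match = pattern.search(html)
# search() returns None when nothing matches, so guard before reading the group.
name = match.group("Name") if match else None

print(name)
```

Keep in mind that this matches the first `content="..."` attribute anywhere in the input, not only the one on the og:title tag.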
