How to use RegEx to match a list of strings in C#?

Posted: 2022-09-02 10:09:44

I need to find all the regex matches for a list of strings. For example, given the string "This foo is a foobar", I need to match every instance of either "foo" or "bar". What would the correct pattern be for this? Also, what input sanitization would I need to do to prevent the inputted text from breaking the pattern?

1 answer

#1


I'm a little unsure what your actual question is. To match "foo" or "bar", the pattern is simply "foo|bar". If you want to apply it to a list of strings, you would most likely check each string individually; you could join the strings first and check the result, but I'm not sure that would be of much use.

Each Match object's Value property (the same as Groups[0].Value) already gives you the entire matched text. If you also want to pull out a specific part of the match, surround that part of the pattern in parentheses. For example, "([fg]oo|[bt]ar)" matches "foo", "goo", "bar", or "tar", and the Groups property of the Match object lets you retrieve the capture, so you can determine exactly which word matched. Groups[1] is the first captured value (that is, the value in the first set of parentheses in your pattern), and Groups[0] is the entire match. You can also name your captures, as in "(?<word>[fg]oo|[bt]ar)", and refer to them by name with Groups["word"]. I would recommend reading through the documentation on regular expression language elements.
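A minimal sketch of the per-string approach, assuming the goal is to list every "foo" or "bar" occurrence in each string of a list. The class name FooBarMatcher and the sample inputs are hypothetical; Regex.Matches, Match.Value, and Match.Index are standard System.Text.RegularExpressions members.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FooBarMatcher
{
    // Alternation: matches either the literal "foo" or the literal "bar".
    private static readonly Regex Pattern = new Regex("foo|bar");

    // Returns the text of every match in a single string, in the order found.
    public static List<string> FindAll(string input)
    {
        var results = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            // m.Value is the whole matched text (same as m.Groups[0].Value).
            results.Add(m.Value);
        }
        return results;
    }

    public static void Main()
    {
        // Hypothetical list of inputs; each string is checked on its own.
        var inputs = new List<string> { "This foo is a foobar", "nothing here", "barfoo" };
        foreach (string s in inputs)
        {
            foreach (Match m in Pattern.Matches(s))
            {
                Console.WriteLine($"\"{s}\": '{m.Value}' at index {m.Index}");
            }
        }
    }
}
```

For "This foo is a foobar" this reports "foo" at index 5, "foo" at index 14, and "bar" at index 17; "foobar" yields two separate matches.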

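The numbered and named capture groups described above can be sketched like this. GroupDemo and its method names are hypothetical; the two patterns are the ones from the answer, and Groups[1] / Groups["word"] are the standard GroupCollection indexers.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Numbered group: Groups[1] holds whatever the parentheses matched.
    private static readonly Regex Numbered = new Regex("([fg]oo|[bt]ar)");

    // Named group: the same capture, retrievable as Groups["word"].
    private static readonly Regex Named = new Regex("(?<word>[fg]oo|[bt]ar)");

    // Returns the first captured word via its group number, or "" if none.
    public static string FirstNumbered(string input)
    {
        Match m = Numbered.Match(input);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    // Returns the first captured word via its group name, or "" if none.
    public static string FirstNamed(string input)
    {
        Match m = Named.Match(input);
        return m.Success ? m.Groups["word"].Value : string.Empty;
    }

    public static void Main()
    {
        foreach (Match m in Named.Matches("goo, tar, boo"))
        {
            // Groups[0] is the entire match; here it equals the captured word
            // because the group spans the whole pattern.
            string whole = m.Groups[0].Value;
            string word = m.Groups["word"].Value;
            Console.WriteLine($"{whole} / {word}");
        }
    }
}
```

Running Main prints "goo / goo" and "tar / tar"; "boo" is skipped because it fits neither alternative.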
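As for sanitizing the input: if the input is only the text you are searching, nothing in it can "break" the regex. It might prevent a match, but that's really what regexes are all about, isn't it? If, however, you build the pattern itself from the list of strings (for example, by joining them with "|"), pass each string through Regex.Escape first. Otherwise, characters such as ".", "*", "+", or "(" are interpreted as regex syntax, which can change what the pattern matches or make it invalid.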
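One way to build the alternation from user-supplied words is sketched below, assuming every word should be matched literally. WordListMatcher and its methods are hypothetical names; Regex.Escape and string.Join are standard .NET methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordListMatcher
{
    // Builds an alternation from user-supplied words. Each word goes through
    // Regex.Escape so characters such as '.', '+', '(' or '*' are matched
    // literally instead of being treated as regex syntax.
    public static Regex BuildPattern(IEnumerable<string> words)
    {
        string alternation = string.Join("|", words.Select(Regex.Escape));
        return new Regex(alternation);
    }

    // Returns the text of every match of any word in the given text.
    public static List<string> FindAll(IEnumerable<string> words, string text)
    {
        return BuildPattern(words).Matches(text).Cast<Match>().Select(m => m.Value).ToList();
    }

    public static void Main()
    {
        var words = new[] { "foo", "bar", "c++", "(x)" };
        // Without escaping, "c++" would be rejected as an invalid pattern
        // (nested quantifier) and "(x)" would become a group matching just "x".
        foreach (string m in FindAll(words, "c++ and (x) and foobar"))
        {
            Console.WriteLine(m);
        }
    }
}
```

Two caveats: an empty word list produces an empty pattern, which matches at every position, and because .NET tries alternatives left to right, a word like "foo" listed before "foobar" wins at the same position, so sorting the words longest first is a common adjustment.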
