Trying to use a wildcard in C# to grab information from a webpage source, but I cannot seem to figure out what to use as the wildcard character. Nothing I've tried works!
The wildcard only needs to allow for numbers, but as the page is generated the same every time, I may as well allow for any characters.
Regex statement in use:
Regex guestbookWidgetIDregex = new Regex("GuestbookWidget(' INSERT WILDCARD HERE ', '(.*?)', 500);", RegexOptions.IgnoreCase);
If anyone can figure out what I'm doing wrong, it would be greatly appreciated!
2 Solutions
#1
10
The wildcard character is `.`. To match any number of arbitrary characters, use `.*` (which means zero or more `.`) or `.+` (which means one or more `.`).
Note that you need to escape your parentheses as `\\(` and `\\)` (or `\(` and `\)` in an `@""` string).
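As a rough illustration of the escaping point, the sketch below writes one pattern twice, once as a regular string literal and once as an `@""` literal, and then shows what happens when the parentheses are left unescaped. The `Widget('42')` input and the class and member names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedParensDemo
{
    // The same regex written two ways: a regular string literal needs "\\(",
    // while an @-quoted literal only needs "\(".
    public const string Regular = "Widget\\('.*'\\)";
    public const string Verbatim = @"Widget\('.*'\)";

    public static bool Matches(string pattern, string input) =>
        Regex.IsMatch(input, pattern);

    public static void Main()
    {
        string input = "Widget('42')"; // hypothetical sample text

        Console.WriteLine(Regular == Verbatim);            // True
        Console.WriteLine(Matches(Verbatim, input));       // True

        // Unescaped parentheses open a capturing group instead of
        // matching the literal "(" and ")" in the input.
        Console.WriteLine(Matches("Widget('.*')", input)); // False
    }
}
```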
#2
7
On the dot
In regular expressions, the dot `.` matches almost any character. The only characters it doesn't normally match are the newline characters. For the dot to match all characters, you must enable what is called single-line mode (aka "dot all").
In C#, this is specified using `RegexOptions.Singleline`. You can also embed this as `(?s)` in the pattern.
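A small sketch of the difference, using a two-line input chosen for the example (the class and method names are illustrative only):

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotAndNewlines
{
    // True when "a", any single character, then "b" occurs in the input.
    public static bool DotMatches(string input, RegexOptions options) =>
        Regex.IsMatch(input, "a.b", options);

    public static void Main()
    {
        string input = "a\nb"; // the character between 'a' and 'b' is a newline

        Console.WriteLine(DotMatches(input, RegexOptions.None));       // False
        Console.WriteLine(DotMatches(input, RegexOptions.Singleline)); // True

        // The inline (?s) modifier turns on the same mode from inside the pattern.
        Console.WriteLine(Regex.IsMatch(input, "(?s)a.b"));            // True
    }
}
```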
References
- regular-expressions.info/The Dot Matches (Almost) Any Character
On metacharacters and escaping
The `.` isn't the only regex metacharacter. The full set is:
( ) { } [ ] ? * + - ^ $ . | \
Depending on where they appear, if you want these characters to be taken literally (e.g. `.` as a period), you may need to do what is called "escaping". This is done by preceding the character with a `\`.
Of course, `\` is also the escape character for C# string literals. To get a literal `\`, you need to double it in your string literal (i.e. `"\\"` is a string of length one). Alternatively, C# also has what are called `@`-quoted string literals, where backslash escape sequences are not processed. Thus, the following two strings are equal:
"c:\\Docs\\Source\\a.txt"
@"c:\Docs\Source\a.txt"
Since `\` is used a lot in regular expressions, `@`-quoting is often used to avoid excessive doubling.
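To make the two layers of escaping concrete, here is a minimal sketch. The file-name inputs and the class and method names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimStrings
{
    // Matches only the exact file name "a.txt"; the dot is escaped,
    // so it stands for a literal period rather than "any character".
    public static bool IsATxt(string s) => Regex.IsMatch(s, @"^a\.txt$");

    public static void Main()
    {
        // "\\" in a regular literal is a single backslash character.
        Console.WriteLine("\\".Length); // 1

        // Doubled backslashes and the @-quoted form produce the same string.
        Console.WriteLine("c:\\Docs\\Source\\a.txt" == @"c:\Docs\Source\a.txt"); // True

        Console.WriteLine(IsATxt("a.txt")); // True
        Console.WriteLine(IsATxt("aXtxt")); // False

        // Without the escape, the dot accepts any character in that position.
        Console.WriteLine(Regex.IsMatch("aXtxt", @"^a.txt$")); // True
    }
}
```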
References
- regular-expressions.info/Metacharacters
- MSDN - C# Programmer's Reference - string
On character classes
Regular expression engines allow you to define character classes, e.g. `[aeiou]` is a character class containing the 5 vowel letters. You can also use the `-` metacharacter to define a range, e.g. `[0-9]` is a character class containing all 10 digit characters.
Since digit characters are so frequently used, regex also provides a shorthand notation for them, which is `\d`. In C#, this will also match decimal digits from other Unicode character sets, unless you're using `RegexOptions.ECMAScript`, where it's strictly just `[0-9]`.
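The Unicode behavior is easy to miss, so here is a short sketch of it. The Arabic-Indic sample string and the helper name `IsDigits` are assumptions made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitClasses
{
    // True when the whole string consists of one or more \d characters.
    public static bool IsDigits(string s, RegexOptions options = RegexOptions.None) =>
        Regex.IsMatch(s, @"^\d+$", options);

    public static void Main()
    {
        string ascii = "2010";
        string arabicIndic = "\u0662\u0660\u0661\u0660"; // "2010" in Arabic-Indic digits

        Console.WriteLine(IsDigits(ascii));                                // True
        Console.WriteLine(IsDigits(arabicIndic));                          // True
        Console.WriteLine(IsDigits(arabicIndic, RegexOptions.ECMAScript)); // False

        // An explicit range is always limited to the ASCII digits.
        Console.WriteLine(Regex.IsMatch(arabicIndic, "^[0-9]+$"));         // False
    }
}
```

If only ASCII digits should be accepted, spelling out `[0-9]` avoids depending on the options in effect.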
References
- regular-expressions.info/Character Classes
- MSDN - Character Classes - Decimal Digit Character
Related questions
- .NET regex: What is the word character `\w`
Putting it all together
It looks like the following will work for you:
      @-quoting        digits_              _____anything but ', captured
          |                   / \    /     \
new Regex(@"GuestbookWidget\('\d*', '([^']*)', 500\);", RegexOptions.IgnoreCase);
                           \/                     \/
                        escape (               escape )
Note that I've modified the pattern slightly so that it uses a negated character class instead of reluctant wildcard matching. This causes a slight difference in behavior if you allow `'` to be escaped in your input string, but neither pattern handles this case perfectly. If you're not allowing `'` to be escaped, however, this pattern is definitely better.
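For completeness, here is a minimal sketch of using that pattern to pull out the captured value. The page fragment is invented for the example (the real page source isn't shown in the question), and `Extract` is just an illustrative helper name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GuestbookWidgetId
{
    private static readonly Regex Pattern = new Regex(
        @"GuestbookWidget\('\d*', '([^']*)', 500\);",
        RegexOptions.IgnoreCase);

    // Returns the text captured by group 1 (the second quoted argument),
    // or null when the pattern does not occur in the input.
    public static string Extract(string pageSource)
    {
        Match m = Pattern.Match(pageSource);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Hypothetical fragment of the page source.
        string page = "<script>GuestbookWidget('12345', 'abc-DEF_789', 500);</script>";

        Console.WriteLine(Extract(page));                             // abc-DEF_789
        Console.WriteLine(Extract("no widget here") ?? "(no match)"); // (no match)
    }
}
```

Group 0 is always the whole match, so the first parenthesized group is read through `Groups[1]`.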
References
- regular-expressions.info/An Alternative to Laziness and Capturing Groups