Write a regex to skip a line if a specific set of characters is present?

Time: 2022-09-28 23:14:13

I am trying to write a regex in Python to parse a file with contents like this:

static const PropertyID PROPERTY_X = 10225;
//static const PropertyID PROPERTY_Y = 10226;
   //static const PropertyID PROPERTY_Z = 10227;

I want to extract the property name and number for only the non-commented properties. This is the expression I wrote:

tuples = re.findall(r"[^/]*static[ \t]*const[ \t]*PropertyID[ \t]*(\w+)[ \t]*=[ \t]*(\d+).*",fileContents)

where fileContents holds the contents of the file as a string.

But this regex also matches the commented-out lines (the ones containing //). How can I make it skip the commented lines?

3 Answers

#1


1  

You could specify that, after the start of the line, only whitespace may appear before the first static. Pass re.MULTILINE so that ^ matches at the start of every line rather than only at the start of the string:

tuples = re.findall(r"^\s*static[ \t]*const[ \t]*PropertyID[ \t]*(\w+)[ \t]*=[ \t]*(\d+).*", fileContents, re.MULTILINE)
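A minimal runnable sketch of this anchoring idea, assuming the three sample lines from the question are held in a string called file_contents (an illustrative name) and that the flag re.MULTILINE is used so that ^ anchors at each line:

```python
import re

# Sample text from the question; file_contents is an illustrative name.
file_contents = (
    "static const PropertyID PROPERTY_X = 10225;\n"
    "//static const PropertyID PROPERTY_Y = 10226;\n"
    "   //static const PropertyID PROPERTY_Z = 10227;\n"
)

# re.MULTILINE makes ^ match at the start of every line, not only at the
# start of the whole string. [ \t]* allows indentation but nothing else
# before "static", so a line that begins with // cannot match.
pattern = re.compile(
    r"^[ \t]*static[ \t]+const[ \t]+PropertyID[ \t]+(\w+)[ \t]*=[ \t]*(\d+)",
    re.MULTILINE,
)

tuples = pattern.findall(file_contents)
print(tuples)  # [('PROPERTY_X', '10225')]
```

Using [ \t] instead of \s before static keeps the leading whitespace on a single line; that is a design choice of this sketch, not something the answer requires.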

#2


2  

Try:

r"(?m)^(?!//)static\s+const\s+PropertyID\s+(\S+)\s+=\s+(\d+);"

A couple of notes:

^ matches the beginning of a line (the (?m) prefix turns on multiline mode)

(?!//) is a negative lookahead, asserting that the current position is NOT followed by //

\s is any whitespace character

\S is any non-whitespace character
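As written, the pattern only matches when static sits at the very start of a line. The following sketch is a variation, not the answerer's exact pattern: it assumes indented declarations should also be accepted, so both the lookahead and the prefix allow leading spaces or tabs. The PROPERTY_W line is a made-up addition to the sample for illustration.

```python
import re

sample = (
    "static const PropertyID PROPERTY_X = 10225;\n"
    "//static const PropertyID PROPERTY_Y = 10226;\n"
    "   //static const PropertyID PROPERTY_Z = 10227;\n"
    "    static const PropertyID PROPERTY_W = 10228;\n"  # hypothetical indented line
)

# (?m)            multiline mode: ^ matches at the start of each line
# (?![ \t]*//)    negative lookahead: reject lines whose first
#                 non-blank characters are //
# [ \t]*          then consume any indentation before "static"
lookahead_pattern = re.compile(
    r"(?m)^(?![ \t]*//)[ \t]*static\s+const\s+PropertyID\s+(\S+)\s+=\s+(\d+);"
)

matches = lookahead_pattern.findall(sample)
print(matches)  # [('PROPERTY_X', '10225'), ('PROPERTY_W', '10228')]
```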

#3


0  

If you're parsing C code, you can use something like pycparser. Regular expressions aren't well suited to parsing a programming language, and in the general case they can't do it at all.

Alternatively, I think this code is simpler for what you're doing:

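string = "   //static const PropertyID PROPERTY_Z = 10227;"
# str.split() with no arguments splits on runs of whitespace and drops
# leading/trailing whitespace, so no regex is needed here.
results = string.split()
# results == ['//static', 'const', 'PropertyID', 'PROPERTY_Z', '=', '10227;']

# A first token that starts with // or /* means the line is commented out.
if results[0].startswith("//") or results[0].startswith("/*"):
    pass  # skip this line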
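To apply the same token-based check to a whole file, one possible sketch is shown below. The function name parse_properties and the variable names are hypothetical, and the sketch assumes each declaration sits on its own line with spaces around the = sign, as in the question's sample. It does not handle block comments that span several lines or trailing comments.

```python
def parse_properties(text):
    """Return (name, number) pairs for lines that are not commented out."""
    found = []
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines and lines whose first token opens a comment.
        if not tokens or tokens[0].startswith(("//", "/*")):
            continue
        # Expect exactly: static const PropertyID NAME = NUMBER;
        if (
            len(tokens) == 6
            and tokens[:3] == ["static", "const", "PropertyID"]
            and tokens[4] == "="
        ):
            found.append((tokens[3], tokens[5].rstrip(";")))
    return found


sample = (
    "static const PropertyID PROPERTY_X = 10225;\n"
    "//static const PropertyID PROPERTY_Y = 10226;\n"
    "   //static const PropertyID PROPERTY_Z = 10227;\n"
)
pairs = parse_properties(sample)
print(pairs)  # [('PROPERTY_X', '10225')]
```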
