I'm working in Python with the following strings:
Memoria RAM - 1.5GB
Memoria RAM - 1 GB
This is the regex I use to extract the value:
(\d{1,4})((,|.)(\d{1,2})){0,1}
The result is:
MATCH 1 --> 1.5.5
MATCH 2 --> 1
Of course, only the second one is correct. The expected output is:
MATCH 1 --> 1.5
MATCH 2 --> 1
Why does my regex capture an extra ".5"? How can I fix it?
3 solutions
#1
1
I've tried this example and it works (when using group(0)):
Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:43:06) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> result = re.search('(\d{1,4})((,|.)(\d{1,2})){0,1}', 'Memoria RAM - 1.5GB')
>>> result.group(0)
'1.5'
However, if you check groups(), you'll get:
>>> result.groups()
('1', '.5', '.', '5')
Why?
You're capturing:
1) The "1": (\d{1,4});
2) The "." or ",": (,|.) (by the way, this should be (,|\.), because an unescaped "." matches any character except a newline, so you should use \.);
3) The "5": (\d{1,2});
4) The ".5", because of the parentheses around points 2 and 3: ((,|.)(\d{1,2})).
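To make the numbering concrete, here is a small sketch that prints each group by index. The names `pattern`, `match` and `joined` are only illustrative. The last step is a guess at how the "1.5.5" in the question came about: presumably all captured groups were concatenated.

```python
import re

# Pattern copied from the question; `pattern` and `match` are illustrative names.
pattern = re.compile(r'(\d{1,4})((,|.)(\d{1,2})){0,1}')
match = pattern.search('Memoria RAM - 1.5GB')

# group(0) is the whole match; groups 1..4 are numbered by their opening parenthesis.
for index in range(pattern.groups + 1):
    print(index, repr(match.group(index)))

# Joining every captured group repeats the text captured by the nested groups.
joined = ''.join(match.groups())
print(joined)
```

Group 2 (the outer parentheses) already contains what groups 3 and 4 capture, so joining all groups repeats the ".5".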
So you should remove the parentheses in point 4, like this:
>>> result = re.search('(\d{1,4})(,|\.)(\d{1,2}){0,1}', 'Memoria RAM - 1.5GB')
>>> result.group(0)
'1.5'
>>> result.groups()
('1', '.', '5')
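One thing worth checking with this version: once the outer parentheses are gone, the {0,1} applies only to the last group, so the separator becomes mandatory. A rough sketch of the difference, where `strict` and `optional` are made-up names and the second pattern is just one possible way to keep the decimal part optional:

```python
import re

# Separator is required here, so a bare integer does not match.
strict = re.compile(r'(\d{1,4})(,|\.)(\d{1,2}){0,1}')
# Assumed alternative: wrap separator + fraction in an optional non-capturing group.
optional = re.compile(r'(\d{1,4})(?:(,|\.)(\d{1,2}))?')

print(strict.search('Memoria RAM - 1 GB'))               # None
print(optional.search('Memoria RAM - 1 GB').groups())    # ('1', None, None)
print(optional.search('Memoria RAM - 1.5GB').groups())   # ('1', '.', '5')
```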
#2
0
If you only need to capture each part of the integer/decimal number the way your regex does, just make sure the decimal part is optional and use a non-capturing group:
(\d{1,4})(?:([,.])(\d{1,2}))?
I also replaced (,|.) with [,.], since I guess your intention was to match either a comma or a dot, not a comma or any character other than a newline.
IDEONE demo:
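import re
# Integer part, then an optional (non-capturing) separator + fraction part
p = re.compile(r'(\d{1,4})(?:([,.])(\d{1,2}))?')
test_str = "Memoria RAM - 1.5GB \nMemoria RAM - 1 GB"
# findall returns one tuple of groups per match; join each tuple back into one string
print(["".join(x) for x in p.findall(test_str)])
# => ['1.5', '1']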
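To see why the unescaped dot matters, here is a quick comparison on a made-up input (`sample`, `loose` and `tight` are hypothetical names, not from the question):

```python
import re

loose = re.compile(r'(\d{1,4})((,|.)(\d{1,2})){0,1}')   # "." matches any character
tight = re.compile(r'(\d{1,4})(?:([,.])(\d{1,2}))?')    # only "," or "." as separator

sample = 'Slots: 2x4 GB'  # invented test string
print(loose.search(sample).group(0))  # '2x4'
print(tight.search(sample).group(0))  # '2'
```

With the original pattern the "x" is accepted as a separator; the character class rejects it.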
Alternatively, you can just use a regex to match the numbers:
\d+(?:\.\d+)?
Here is an IDEONE demo:
import re
p = re.compile(r'\d+(?:\.\d+)?')
test_str = "Memoria RAM - 1.5GB \nMemoria RAM - 1 GB"
print (p.findall(test_str))
# => ['1.5', '1']
If you need to match the numbers only before GB, use a look-ahead:
\d+(?:\.\d+)?(?=\s*GB)
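A short sketch of the look-ahead version in action. The input is extended with an invented "DDR3 1600 MHz" prefix (an assumption, not from the question) to show that numbers not followed by GB are skipped:

```python
import re

# (?=\s*GB) requires optional whitespace and "GB" after the number, without consuming it.
gb_number = re.compile(r'\d+(?:\.\d+)?(?=\s*GB)')
sample = "DDR3 1600 MHz - Memoria RAM - 1.5GB \nMemoria RAM - 1 GB"  # invented prefix
print(gb_number.findall(sample))
# => ['1.5', '1']
```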
#3
0
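import re
# Sample input (assumed; st is not defined in the original answer)
st = "Memoria RAM - 1.5GB \nMemoria RAM - 1 GB"
# (?<!\S) - not preceded by a non-space character
result = re.findall(r'(?<!\S)\d\.\d+|(?<!\S)\d', st)
print(result)
# => ['1.5', '1']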
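To illustrate what the look-behind buys you, here is the same pattern with and without it on a made-up string (`sample` and both result names are hypothetical):

```python
import re

sample = "DDR3 RAM - 1.5GB"  # invented input; the "3" is glued to "DDR"
with_lookbehind = re.findall(r'(?<!\S)\d\.\d+|(?<!\S)\d', sample)
without_lookbehind = re.findall(r'\d\.\d+|\d', sample)
print(with_lookbehind)     # ['1.5']
print(without_lookbehind)  # ['3', '1.5']
```

Note that both alternatives start with a single \d, so as written the pattern only handles one digit before the decimal point.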