Extracting float values with regex in Python

These are my strings (I'm working in Python):

Memoria RAM - 1.5GB 
Memoria RAM - 1 GB

This is the regex I use to extract the value:

(\d{1,4})((,|.)(\d{1,2})){0,1}

The result is:

MATCH 1 --> 1.5.5 
MATCH 2 --> 1

Of course, only the second one is correct. The expected output is:

MATCH 1 --> 1.5
MATCH 2 --> 1

Why does my regex capture an extra ".5"? How can I fix it?

3 solutions

#1

I've tried this example and it works (when using group(0)):

Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:43:06) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> result = re.search(r'(\d{1,4})((,|.)(\d{1,2})){0,1}', 'Memoria RAM - 1.5GB')
>>> result.group(0)
'1.5'

However, if you check groups(), you'll get:

>>> result.groups()
('1', '.5', '.', '5')
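Numbering the groups makes the overlap visible. The minimal sketch below (Python 3, using the question's first string) prints each group with its index. It also shows that concatenating every captured group yields "1.5.5", which is one plausible reading of the question's "MATCH 1 --> 1.5.5" output; the question does not say how that output was produced, so treat this as an assumption.

```python
import re

# The question's original pattern, written as a raw string.
pattern = re.compile(r'(\d{1,4})((,|.)(\d{1,2})){0,1}')
m = pattern.search('Memoria RAM - 1.5GB')

# Groups are numbered by the position of their opening parenthesis,
# so the outer group around separator + decimals is group 2.
for index, value in enumerate(m.groups(), start=1):
    print(index, repr(value))

# Joining every group repeats the decimals: '1' + '.5' + '.' + '5'.
joined = ''.join(m.groups())
print(joined)  # prints 1.5.5
```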

Why?

You're capturing:

1) The "1" (\d{1,4}), which is group 1;

2) The "." or "," (,|.), which is group 3. By the way, this should be (,|\.): an unescaped "." matches any character except a newline, so you need \. to match a literal dot;

3) The "5" (\d{1,2}), which is group 4;

4) The ".5" (group 2), because you put parentheses around points 2 and 3: ((,|.)(\d{1,2})).

So you should remove the parentheses from point 4, like this:

>>> result = re.search(r'(\d{1,4})(,|\.)(\d{1,2}){0,1}', 'Memoria RAM - 1.5GB')
>>> result.group(0)
'1.5'
>>> result.groups()
('1', '.', '5')
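One caveat with the pattern above: because the separator (,|\.) is now mandatory, it no longer matches the second string, "Memoria RAM - 1 GB", and re.search returns None there. A minimal sketch of an alternative that keeps three groups but makes the whole decimal part optional by wrapping it in a non-capturing group (?:...):

```python
import re

strings = ['Memoria RAM - 1.5GB', 'Memoria RAM - 1 GB']

# Pattern from the answer above: the separator is required.
required_sep = re.compile(r'(\d{1,4})(,|\.)(\d{1,2}){0,1}')
# Variant: separator and decimals are optional together, and the
# wrapping group is non-capturing, so it adds no fourth group.
optional_decimals = re.compile(r'(\d{1,4})(?:(,|\.)(\d{1,2}))?')

for s in strings:
    m1 = required_sep.search(s)
    m2 = optional_decimals.search(s)
    print(repr(s))
    print('  required separator:', m1.group(0) if m1 else None)
    print('  optional decimals: ', m2.group(0), m2.groups())
```

For "1 GB" the variant matches "1" and reports the unmatched groups as None.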

#2

If you need to capture each part of the integer/decimal number separately, as your regex does, make the decimal part optional and wrap it in a non-capturing group:

(\d{1,4})(?:([,.])(\d{1,2}))?

I also replaced (,|.) with [,.], since I assume you meant to match either a comma or a dot, not a comma or any character except a newline.
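The difference shows up when a digit is followed by some other character and then more digits. In this sketch the input "1x5" is a made-up example, not from the question: with (,|.) the unescaped dot accepts the "x", while inside the character class [,.] the dot is literal, so only a comma or a dot is accepted.

```python
import re

text = '1x5'  # hypothetical input, chosen only to show the difference

# Unescaped '.' in the alternation matches any character but a newline.
loose = re.search(r'(\d{1,4})(?:(,|.)(\d{1,2}))?', text)
# In a character class, '.' is literal: only ',' or '.' can match.
strict = re.search(r'(\d{1,4})(?:([,.])(\d{1,2}))?', text)

print(loose.group(0))   # 1x5
print(strict.group(0))  # 1
```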

IDEONE demo:

import re
p = re.compile(r'(\d{1,4})(?:([,.])(\d{1,2}))?')
test_str = "Memoria RAM - 1.5GB \nMemoria RAM - 1 GB"
# findall returns one (integer, separator, decimals) tuple per match
print(["".join(x) for x in p.findall(test_str)])
# => ['1.5', '1']

Alternatively, you can use a regex that matches the whole number in one go:

\d+(?:\.\d+)?

If you need to match only the numbers that come before GB, use a look-ahead:

\d+(?:\.\d+)?(?=\s*GB)
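To see what the look-ahead changes, here is a minimal sketch with a first line that also contains a number not followed by GB (the ", 2 slots" part is hypothetical, added only for illustration). (?=\s*GB) checks what follows without consuming it, so the "2" is skipped and "GB" is not part of the match.

```python
import re

# Hypothetical input: ", 2 slots" is invented for illustration.
text = "Memoria RAM - 1.5GB, 2 slots\nMemoria RAM - 1 GB"

any_number = re.compile(r'\d+(?:\.\d+)?')
before_gb = re.compile(r'\d+(?:\.\d+)?(?=\s*GB)')

print(any_number.findall(text))  # ['1.5', '2', '1']
print(before_gb.findall(text))   # ['1.5', '1']
```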

Here is an IDEONE demo of the plain number regex:

import re
p = re.compile(r'\d+(?:\.\d+)?')
test_str = "Memoria RAM - 1.5GB \nMemoria RAM - 1 GB"
print(p.findall(test_str))
# => ['1.5', '1']

#3

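import re
# st holds the two lines from the question
st = "Memoria RAM - 1.5GB \nMemoria RAM - 1 GB"
result = re.findall(r'(?<!\S)\d\.\d+|(?<!\S)\d', st)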

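(?<!\S) is a negative look-behind: the number must not be preceded by a non-space character, so it must start at the beginning of the string or right after whitespace.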

print(result)
# => ['1.5', '1']
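A minimal sketch of what the look-behind does, plus one limitation of this pattern. Because the first alternative is \d\.\d+, only one digit is allowed before the dot, so a made-up value such as "12.5GB" yields just "1". The variant (?<!\S)\d+(?:\.\d+)? is a hypothetical generalisation, not part of the original answer, that handles multi-digit integer parts.

```python
import re

original = r'(?<!\S)\d\.\d+|(?<!\S)\d'
variant = r'(?<!\S)\d+(?:\.\d+)?'  # hypothetical generalisation

st = "Memoria RAM - 1.5GB \nMemoria RAM - 1 GB"
print(re.findall(original, st))  # ['1.5', '1']

# Digits glued to letters are rejected by the look-behind.
print(re.findall(original, 'RAM2 1.5GB'))  # ['1.5']

# Hypothetical two-digit value: the original keeps only the first digit.
print(re.findall(original, 'Memoria RAM - 12.5GB'))  # ['1']
print(re.findall(variant, 'Memoria RAM - 12.5GB'))   # ['12.5']
```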
