Delete rows of a column that contain any numeric substrings

I notice that when an element of a column in a Pandas DataFrame consists of several numeric substrings separated by spaces, the method isnumeric returns False.

For example:

row 1, column 1 has the following: 0002 0003 1289
row 2, column 1 has the following: 89060 324 123431132
row 3, column 1 has the following: 890GB 32A 34311TT
row 4, column 1 has the following: 82A 34311TT
row 5, column 1 has the following: 82A 34311TT 889 9999C

Clearly, rows 1 and 2 consist entirely of numbers, but isnumeric returns False for both of them.
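A minimal sketch of why this happens, using a plain Python string rather than a DataFrame (the variable names are only illustrative): the space between substrings is not a numeric character, so the check fails on the whole cell even though every token passes on its own.

```python
cell = "0002 0003 1289"

# The whole cell contains spaces, and a space is not a numeric character,
# so the check on the full string fails.
whole_cell_numeric = cell.isnumeric()

# Each whitespace-separated token, checked on its own, is numeric.
tokens = cell.split()
per_token = [token.isnumeric() for token in tokens]

print(whole_cell_numeric)  # False
print(per_token)           # [True, True, True]
```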

I found a workaround that involves separating each substring into its own column and then creating a boolean column for each, adding the booleans together to reveal whether a row is all numeric or not. This, however, is tedious and my function doesn't look tidy. I also do not want to strip and replace the whitespace (to squeeze all the substrings into just one number) because I need to preserve the original substrings.

Does anyone know of a simpler solution/technique that will correctly tell me that an element made up of one or more numeric substrings is all numeric? My ultimate goal is to delete these numeric-only rows.

2 solutions

#1

I think you need a list comprehension with split and all to check that every substring is numeric:

mask = ~df['a'].apply(lambda x: all([s.isnumeric() for s in x.split()]))

mask = [not all([s.isnumeric() for s in x.split()]) for x in df['a']]
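As a rough end-to-end sketch, the mask can then be used for boolean indexing to drop the numeric-only rows. The column name 'a' follows the snippet above; the sample data is taken from the question, and the helper name has_non_numeric_token is made up for illustration.

```python
import pandas as pd

# Sample data from the question, stored in a column named 'a'.
df = pd.DataFrame({'a': ['0002 0003 1289',
                         '89060 324 123431132',
                         '890GB 32A 34311TT',
                         '82A 34311TT',
                         '82A 34311TT 889 9999C']})


def has_non_numeric_token(cell):
    """Return True if at least one whitespace-separated token is not numeric.

    Note: an empty string has no tokens, so all() is True for it and the
    row is treated as numeric-only (and therefore dropped).
    """
    return not all(token.isnumeric() for token in cell.split())


# True for rows to keep, False for numeric-only rows.
mask = df['a'].apply(has_non_numeric_token)

# Boolean indexing keeps only the rows where the mask is True.
filtered = df[mask]
print(filtered)
```

The original substrings are left untouched, since split is only used to build the mask.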

If you want to check whether at least one substring is numeric, use any:

mask = ~df['a'].apply(lambda x: any([s.isnumeric() for s in x.split()]))

mask = [not any([s.isnumeric() for s in x.split()]) for x in df['a']]

#2

Here is one way using pd.Series.map, any with a generator expression, str.isdecimal and str.split.

import pandas as pd

df = pd.DataFrame({'col1': ['0002 0003 1289', '89060 324 123431132', '890GB 32A 34311TT',
                            '82A 34311TT', '82A 34311TT 889 9999C']})

df['numeric'] = df['col1'].map(lambda x: any(i.isdecimal() for i in x.split()))

Note that isdecimal is stricter than isdigit. In Python 2.7, plain str objects have no isdecimal or isnumeric method (only unicode objects do), so you may need to use str.isdigit there instead.
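To make the difference in strictness concrete, here is a small Python 3 sketch comparing the three methods on a few characters. The sample characters are chosen only for illustration and do not come from the question's data.

```python
# Characters chosen for illustration: ordinary digits, a superscript two
# (U+00B2) and the vulgar fraction one half (U+00BD).
samples = ['123', '\u00b2', '\u00bd']

# Map each sample to (isdecimal, isdigit, isnumeric).
results = {s: (s.isdecimal(), s.isdigit(), s.isnumeric()) for s in samples}

for s, flags in results.items():
    print(repr(s), flags)

# '123' (True, True, True)
# '²'   (False, True, True)
# '½'   (False, False, True)
```

For data like the sample rows in the question, which contain only ASCII digits and letters, all three methods give the same answer.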

To remove the rows where the result is False:

df = df[df['col1'].map(lambda x: any(i.isdecimal() for i in x.split()))]

Result

First part of logic:

                    col1 numeric
0         0002 0003 1289    True
1    89060 324 123431132    True
2      890GB 32A 34311TT   False
3            82A 34311TT   False
4  82A 34311TT 889 9999C    True
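Row 4 is flagged True because any only requires one numeric substring ("889"). As a hedged sketch of how the two tests differ on the same data, the snippet below computes both; the column names any_numeric and all_numeric are made up for this comparison.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['0002 0003 1289', '89060 324 123431132', '890GB 32A 34311TT',
                            '82A 34311TT', '82A 34311TT 889 9999C']})

# Split each cell once into a list of whitespace-separated tokens.
tokens = df['col1'].str.split()

# True if at least one token consists only of decimal digits.
df['any_numeric'] = tokens.map(lambda parts: any(p.isdecimal() for p in parts))

# True only if every token consists only of decimal digits.
df['all_numeric'] = tokens.map(lambda parts: all(p.isdecimal() for p in parts))

print(df)
```

With all, only rows 0 and 1 are flagged, which corresponds to the numeric-only rows described in the question.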
