I have a table with over 10,000 rows and over 400 columns. For the columns whose names contain the string 'xyz', I need to find the max value of each row (across these 'xyz' columns only) and create 2 new columns.
The 1st new column would contain the max value of each row across these 'xyz' columns.
The 2nd new column would contain the name of the column the max value came from. I'm stuck on creating the 2nd column. I've tried a few things that don't work, such as:
Match = df[CompCol].isin[SpecList].all(axis=1)
How should I approach the 2nd column?
2 solutions
#1
Does this work for you?
import pandas as pd
df = pd.DataFrame([(1,2,3,4),(2,1,1,4)], columns = ['xyz1','xyz2','xyz3','abc'])
# keep only the columns whose name contains 'xyz'
cols = [k for k in df.columns if 'xyz' in k]
# pair each value with its column name and take the largest pair in each row
df['maxval'] = df[cols].apply(lambda s: max(zip(s, s.keys()))[0], axis=1)
df['maxcol'] = df[cols].apply(lambda s: max(zip(s, s.keys()))[1], axis=1)
df
Out[753]:
   xyz1  xyz2  xyz3  abc  maxval maxcol
0     1     2     3    4       3   xyz3
1     2     1     1    4       2   xyz1
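One thing to watch with the tuple approach: when two 'xyz' columns share the row maximum, max() on (value, name) pairs falls back to comparing the names, so the alphabetically last column is reported. The sketch below uses a made-up one-row frame (an assumption, not data from the question) to show how that differs from pandas' idxmax, which reports the first matching column.

```python
import pandas as pd

# Hypothetical row where xyz1 and xyz3 tie for the maximum.
df = pd.DataFrame([(3, 1, 3, 4)], columns=['xyz1', 'xyz2', 'xyz3', 'abc'])
cols = [k for k in df.columns if 'xyz' in k]

# Tuples compare element by element, so equal values fall back to the label.
by_tuple = df[cols].apply(lambda s: max(zip(s, s.keys()))[1], axis=1)

# idxmax reports the first column that reaches the maximum.
by_idxmax = df[cols].idxmax(axis=1)

print(by_tuple[0], by_idxmax[0])
```

If ties are possible in the real data, it is probably worth deciding which of the two behaviours is wanted before picking an approach.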
#2
Another way, using 'regex' and 'idxmax'.
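import pandas as pd
df = pd.DataFrame({'xyz1': [10, 20, 30, 40], 'xyz2': [11, 12,13,14],'xyz3':[1,2,3,44],'abc':[100,101,102,103]})
# row-wise max over the columns whose name matches 'xyz'
df['maxval']= df.filter(regex='xyz').apply(max, axis=1)
# name of the column that holds that max
df['maxval_col'] = df.filter(regex='xyz').idxmax(axis=1)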
   abc  xyz1  xyz2  xyz3  maxval maxval_col
0  100    10    11     1      11       xyz2
1  101    20    12     2      20       xyz1
2  102    30    13     3      30       xyz1
3  103    40    14    44      44       xyz3
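For a table of the size described in the question, a possible variant is to select the 'xyz' columns once and use the built-in row-wise max instead of apply(max, axis=1). This is only a sketch on the same sample data: filter(like='xyz') is a plain substring match, and the intermediate name xyz is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'xyz1': [10, 20, 30, 40],
    'xyz2': [11, 12, 13, 14],
    'xyz3': [1, 2, 3, 44],
    'abc': [100, 101, 102, 103],
})

# Select the columns whose label contains 'xyz' once and reuse the result.
xyz = df.filter(like='xyz')

# Row-wise maximum across the selected columns.
df['maxval'] = xyz.max(axis=1)

# Label of the column holding that maximum (the first one on ties).
df['maxval_col'] = xyz.idxmax(axis=1)

print(df)
```

Selecting the columns before adding the new ones also means the second lookup cannot accidentally pick up a freshly created column whose name happens to match.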