How can I find the minimum value for each index across multiple worksheets?
Suppose,
worksheet 1
index A B C
0 2 3 4.28
1 3 4 5.23
worksheet 2
index A B C
0 9 6 5.9
1 1 3 4.1
worksheet 3
index A B C
0 9 6 6.0
1 1 3 4.3
...................(Worksheet 4,Worksheet 5)...........
By comparing the C column across the worksheets, I want a result where the DataFrame looks like:
index min(c)
0 4.28
1 4.1
2 Answers
#1
3
from functools import reduce
import numpy as np
reduce(np.fmin, [ws1.C, ws2.C, ws3.C])
index
0 4.28
1 4.10
Name: C, dtype: float64
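For anyone who wants to run the snippet above without an Excel file, here is a minimal sketch that rebuilds the three sample worksheets from the question as DataFrames. The variable names ws1, ws2 and ws3 follow the answer; building them by hand (rather than reading a workbook) is an assumption made only for illustration.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Hand-built stand-ins for the three worksheets shown in the question.
ws1 = pd.DataFrame({"A": [2, 3], "B": [3, 4], "C": [4.28, 5.23]}).rename_axis("index")
ws2 = pd.DataFrame({"A": [9, 1], "B": [6, 3], "C": [5.9, 4.1]}).rename_axis("index")
ws3 = pd.DataFrame({"A": [9, 1], "B": [6, 3], "C": [6.0, 4.3]}).rename_axis("index")

# reduce applies np.fmin pairwise: fmin(fmin(ws1.C, ws2.C), ws3.C).
result = reduce(np.fmin, [ws1.C, ws2.C, ws3.C])
print(result)
```

Because np.fmin is element-wise, the result keeps the index of the inputs and holds one minimum per index label.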
This generalizes nicely with a comprehension:
reduce(np.fmin, [w.C for w in [ws1, ws2, ws3, ws4, ws5]])
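One practical difference worth knowing when picking np.fmin for this reduction: np.fmin skips a NaN when the other value is a number, while np.minimum propagates it. The sketch below uses two made-up Series (not data from the question) to show the difference, assuming a worksheet may have a missing cell in column C.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first sheet has a missing value at index 1.
a = pd.Series([4.28, np.nan], name="C")
b = pd.Series([5.90, 4.10], name="C")

with_fmin = np.fmin(a, b)        # NaN is ignored when the other value is a number
with_minimum = np.minimum(a, b)  # NaN propagates into the result

print(with_fmin)
print(with_minimum)
```

With np.fmin, index 1 still gets 4.1 from the second sheet; with np.minimum it would be NaN.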
If you insist on your column name:
from functools import reduce
reduce(np.fmin, [ws1.C, ws2.C, ws3.C]).to_frame('min(C)')
min(C)
index
0 4.28
1 4.10
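You can also use pd.concat on a dictionary and take the minimum over level 1 of the resulting MultiIndex. The original form of this answer used pd.Series.min with the level=1 parameter; that parameter was deprecated in pandas 1.3 and removed in 2.0, so groupby(level=1).min() is used here instead:
pd.concat(dict(enumerate([w.C for w in [ws1, ws2, ws3]]))).groupby(level=1).min()
# equivalently
# pd.concat(dict(enumerate([w.C for w in [ws1, ws2, ws3]])), axis=1).min(axis=1)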
index
0 4.28
1 4.10
Name: C, dtype: float64
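As a self-contained sketch of the concat approach: the dictionary keys become the outer level of a MultiIndex and the original index becomes level 1, so grouping on level 1 gives one minimum per index label. The sheet names used as keys here are illustrative, and groupby(level=1).min() is used because it works across pandas versions.

```python
import pandas as pd

# Hand-built stand-ins for the worksheets; the keys are illustrative names.
sheets = {
    "ws1": pd.DataFrame({"C": [4.28, 5.23]}).rename_axis("index"),
    "ws2": pd.DataFrame({"C": [5.9, 4.1]}).rename_axis("index"),
    "ws3": pd.DataFrame({"C": [6.0, 4.3]}).rename_axis("index"),
}

# Stack the C columns: level 0 = sheet key, level 1 = original index.
stacked = pd.concat({name: ws["C"] for name, ws in sheets.items()})

# One minimum per original index label.
result = stacked.groupby(level=1).min()
print(result)
```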
Note:
dict(enumerate([w.C for w in [ws1, ws2, ws3]]))
is another way of saying
{0: ws1.C, 1: ws2.C, 2: ws3.C}
#2
3
You need read_excel with the parameter sheet_name=None (spelled sheetname in older pandas versions) to get a dict of DataFrames keyed by sheet name (an OrderedDict in older versions), and then a list comprehension with reduce and numpy.fmin:
dfs = pd.read_excel('file.xlsx', sheet_name=None)
print (dfs)
OrderedDict([('Sheet1', A B C
0 2 3 4.28
1 3 4 5.23), ('Sheet2', A B C
0 9 6 5.9
1 1 3 4.1), ('Sheet3', A B C
0 9 6 6.0
1 1 3 4.3)])
from functools import reduce
import numpy as np
df = reduce(np.fmin, [v['C'] for v in dfs.values()])
print (df)
0 4.28
1 4.10
Name: C, dtype: float64
Solution with concat:
df = pd.concat([v['C'] for k,v in dfs.items()],axis=1).min(axis=1)
print (df)
0 4.28
1 4.10
dtype: float64
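The concat solution can be tried without a workbook by substituting a plain dict for the result of read_excel. In this sketch the dict dfs and its sheet names are assumptions that mirror the printed output above; passing a dict to pd.concat with axis=1 labels each column with its sheet name, which makes the intermediate table easier to inspect.

```python
import pandas as pd

# Stand-in for the dict of DataFrames that read_excel returns for all sheets.
dfs = {
    "Sheet1": pd.DataFrame({"A": [2, 3], "B": [3, 4], "C": [4.28, 5.23]}),
    "Sheet2": pd.DataFrame({"A": [9, 1], "B": [6, 3], "C": [5.9, 4.1]}),
    "Sheet3": pd.DataFrame({"A": [9, 1], "B": [6, 3], "C": [6.0, 4.3]}),
}

# One column per sheet, aligned on the row index.
wide = pd.concat({name: df["C"] for name, df in dfs.items()}, axis=1)

# Row-wise minimum across the sheets.
result = wide.min(axis=1)
print(wide)
print(result)
```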
If you need to define the index in read_excel:
dfs = pd.read_excel('file.xlsx', sheet_name=None, index_col='index')
print (dfs)
OrderedDict([('Sheet1', A B C
index
0 2 3 4.28
1 3 4 5.23), ('Sheet2', A B C
index
0 9 6 5.9
1 1 3 4.1), ('Sheet3', A B C
index
0 9 6 6.0
1 1 3 4.3)])
df = pd.concat([v['C'] for k,v in dfs.items()], axis=1).min(axis=1)
print (df)
index
0 4.28
1 4.10
dtype: float64