**This has to be fast code, as I have a lot of data.**
I have a DataFrame whose index contains repeated labels, e.g.:
The index is the following:
A
A
A
B
B
C
C
C
D
D
D
And the column df['random'] has some values, e.g.:
1 2 3 4 5 6 7 8 100 101 102
Now I want to create a new column in the same DataFrame that, for each unique index label, sums the values and divides each original value by the sum for its label.
E.g. for df['adjusted_random'], the first entry of A should be 1/6, the second entry 2/6, the third entry 3/6, the fourth entry 4/9 (B sums to 9), etc.
Could somebody please help?
1 Answer
#1
1
New Answer
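import numpy as np

def argunsort(s):
    # Invert the permutation s, so that u[s[k]] == k.
    n = s.size
    u = np.empty(n, dtype=np.int64)
    u[s] = np.arange(n)
    return u

def gsum(g, v):
    g, v = np.asarray(g), np.asarray(v)
    n = g.size
    a = g.argsort(kind='mergesort')  # stable sort that brings equal labels together
    i = argunsort(a)                 # maps sorted positions back to the original order
    gs, vs = g[a], v[a]
    lg = np.append(np.where(gs[:-1] != gs[1:])[0], n - 1)  # last position of each group
    cn = np.diff(np.append(-1, lg))                        # group sizes
    cs = vs.cumsum()
    sm = np.diff(np.append(0, cs[lg]), 1)                  # group sums
    # Divide the sorted values (vs, not v) so rows line up with the repeated sums,
    # then restore the original order.
    return (vs / np.repeat(sm, cn))[i]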
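For comparison, here is a shorter NumPy route to the same per-group normalization. It is only a sketch of a different technique (np.unique with return_inverse plus np.bincount), not the method above, and the name normalize_by_group is made up for illustration. It assumes the values are numeric and that no group sums to zero.

```python
import numpy as np

def normalize_by_group(groups, values):
    """Divide each value by the sum of the values sharing its group label."""
    groups = np.asarray(groups)
    values = np.asarray(values, dtype=float)
    # inverse[k] is the integer code of groups[k] among the sorted unique labels
    _, inverse = np.unique(groups, return_inverse=True)
    inverse = inverse.ravel()
    # bincount with weights adds up the values that fall under each code
    sums = np.bincount(inverse, weights=values)
    # index the per-group sums back out to one entry per original row
    return values / sums[inverse]

labels = np.array(["B", "A", "B", "A", "C"])  # deliberately unsorted
vals = np.array([4, 1, 5, 3, 6])
print(normalize_by_group(labels, vals))
```

Because the result is indexed by the original row positions, no unsort step is needed, which makes it a convenient cross-check on unsorted input.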
Demonstration
df.insert(1, 'adjusted_random', gsum(df.index.values, df.random.values))
df
   random  adjusted_random
A       1         0.166667
A       2         0.333333
A       3         0.500000
B       4         0.444444
B       5         0.555556
C       6         0.285714
C       7         0.333333
C       8         0.380952
D     100         0.330033
D     101         0.333333
D     102         0.336634
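The thread never shows how df is built, so here is one way to reproduce it, assuming the 11 rows shown in the demonstration output. The plain-Python loop is only an independent cross-check of the expected fractions; the names totals and expected are illustrative, not part of the answer.

```python
import pandas as pd

# Index labels and values matching the 11 rows of the demonstration output.
df = pd.DataFrame(
    {"random": [1, 2, 3, 4, 5, 6, 7, 8, 100, 101, 102]},
    index=list("AAABBCCCDDD"),
)

# Plain-Python cross-check: total per label, then value / total.
labels = df.index.tolist()
values = df["random"].tolist()
totals = {}
for label, value in zip(labels, values):
    totals[label] = totals.get(label, 0) + value
expected = [value / totals[label] for label, value in zip(labels, values)]

print(totals)
print([round(x, 6) for x in expected])
```

The rounded values should line up with the adjusted_random column above.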
Timing
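The original timing results are not reproduced here. As a rough way to measure on your own machine, the sketch below times a groupby transform against a factorize-plus-bincount version using timeit. The data sizes, the frame df_big, and both function names are assumptions chosen for illustration, and the numbers will vary with hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows, n_groups = 100_000, 1_000  # hypothetical sizes, adjust to your data
df_big = pd.DataFrame(
    {"random": rng.integers(1, 100, size=n_rows)},
    index=rng.integers(0, n_groups, size=n_rows),
)

def with_transform(df):
    # pandas only: broadcast each group's sum back to its rows
    return df["random"] / df.groupby(level=0)["random"].transform("sum")

def with_bincount(df):
    # NumPy on integer codes: factorize the index, then sum values per code
    codes, _ = pd.factorize(df.index)
    values = df["random"].to_numpy(dtype=float)
    sums = np.bincount(codes, weights=values)
    return pd.Series(values / sums[codes], index=df.index, name="random")

for fn in (with_transform, with_bincount):
    seconds = min(timeit.repeat(lambda: fn(df_big), number=3, repeat=3)) / 3
    print(f"{fn.__name__}: {seconds * 1e3:.1f} ms per call")
```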
Old Answer
Use groupby with sum
df.random / df.groupby(level=0).random.sum()
A 0.166667
A 0.333333
A 0.500000
B 0.444444
B 0.555556
C 0.285714
C 0.333333
C 0.380952
D 0.330033
D 0.333333
D 0.336634
Name: random, dtype: float64
Create new column
df.assign(adjusted_random=df.random / df.groupby(level=0).random.sum())
   random  adjusted_random
A       1         0.166667
A       2         0.333333
A       3         0.500000
B       4         0.444444
B       5         0.555556
C       6         0.285714
C       7         0.333333
C       8         0.380952
D     100         0.330033
D     101         0.333333
D     102         0.336634
Alternatives
df.random.div(df.groupby(level=0).random.transform('sum'))
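df.random.div(df.random.sum(level=0)) # @NickilMaveli
# sum(level=0) was removed in pandas 2.0; df.groupby(level=0).random.sum() gives the same totals there.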
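The two styles of alternative differ in how the group totals reach each row: transform returns a Series as long as df, while a plain per-label sum relies on index alignment inside div. The sketch below checks that both agree on the sample data. It assumes a recent pandas, where the level argument to sum is no longer available, so groupby(level=0).sum() stands in for it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"random": [1, 2, 3, 4, 5, 6, 7, 8, 100, 101, 102]},
    index=list("AAABBCCCDDD"),
)

# Same length as df: each row already carries its group's total.
via_transform = df["random"].div(df.groupby(level=0)["random"].transform("sum"))

# One total per label; div aligns it against the repeated index labels.
group_totals = df.groupby(level=0)["random"].sum()  # stands in for sum(level=0)
via_alignment = df["random"].div(group_totals)

print(np.allclose(via_transform.to_numpy(), via_alignment.to_numpy()))
```

The transform form is the safer default when the index is not sorted, since its result always keeps the row order of df.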