Suppose I have a df like this:
  stringOfInterest trend
0                C    up
1                D  down
2                E  down
3              C,O    up
4              C,P    up
I want to plot this df as a bar graph using pandas. To obtain the proper grouped bar plots, I would like to group the data by the column df["trend"] and then count the occurrences of each letter in df["stringOfInterest"]. As can be seen, some of these strings contain multiple letters separated by a ",".
Using
df.groupby("trend").stringOfInterest.value_counts().unstack(0)
produces the expected result (the counts shown come from a larger data set than the five-row sample above):
trend             down    up
stringOfInterest
-                  7.0   8.0
C                  3.0  11.0
C,O                NaN   2.0
C,P                1.0   1.0
D                  1.0   2.0
E                 15.0  14.0
E,T                1.0   NaN
However, I would like to count the occurrences of the individual letters (C, E, D, ...). On the original df this can be achieved like this:
s = df.stringOfInterest.str.split(",", expand = True).stack()
s.value_counts()
This typically generates something like this:
C 3
E 2
D 1
O 1
P 1
T 1
Unfortunately, this cannot be used here after the groupby() in combination with unstack().
Maybe I am on the wrong track and some more elegant way would be preferred.
To clarify the plotting: for each letter (stringOfInterest), there must be two bars showing the number of "up" and "down" trends.
1 solution
#1
Based on this answer: "Pandas expand rows from list data available in column".
Would something like this help?
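import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(
    {'stringOfInterest': {0: 'C', 1: 'D', 2: 'E', 3: 'C,O', 4: 'C,P'},
     'trend': {0: 'up', 1: 'down', 2: 'down', 3: 'up', 4: 'up'}})

# One column per split letter, indexed by trend; stack() turns those columns into rows.
df2 = (pd.DataFrame(df.stringOfInterest.str.split(',').tolist(), index=df.trend)
       .stack()
       .reset_index()
       .groupby('trend')[0]
       .value_counts()
       .unstack()   # trends as rows, letters as columns
       ).T          # transpose: letters as rows, one column per trend

df2.plot(kind='bar')  # one group of bars per letter, one bar per trend
plt.show()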
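For what it's worth, the same per-letter table can probably be built a bit more directly. The following is only a sketch, assuming a pandas version that has DataFrame.explode (0.25 or later); the names long_df and counts are made up for illustration and are not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "stringOfInterest": ["C", "D", "E", "C,O", "C,P"],
    "trend": ["up", "down", "down", "up", "up"],
})

# Split each string into a list, then give every letter its own row.
# explode() repeats the matching "trend" value for each new row.
long_df = (
    df.assign(stringOfInterest=df["stringOfInterest"].str.split(","))
      .explode("stringOfInterest")
)

# Cross-tabulate letters against trends; pairs that never occur are counted as 0.
counts = pd.crosstab(long_df["stringOfInterest"], long_df["trend"])
print(counts)
```

Calling counts.plot(kind='bar') on the result should then draw two bars ("down" and "up") for each letter, which is the layout the question asks for.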
Another approach
We could also zip the columns together and expand.
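import pandas as pd
from collections import Counter

# Build one (trend, letter) pair per letter in each comma-separated string.
data = [(x, i) for x, y in zip(df.trend, df.stringOfInterest.str.split(',')) for i in y]
# The tuple keys become a two-level index, so each (trend, letter) pair gets its own bar.
pd.Series(Counter(data)).plot(kind='bar')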
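Note that the Counter approach plots one bar per (trend, letter) pair rather than two bars per letter. If grouped bars are wanted, one possible tweak is to unstack the trend level before plotting. This is a sketch under the same sample data; the names pairs and counts are illustrative only.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "stringOfInterest": ["C", "D", "E", "C,O", "C,P"],
    "trend": ["up", "down", "down", "up", "up"],
})

# Count every (trend, letter) pair.
pairs = Counter(
    (trend, letter)
    for trend, letters in zip(df["trend"], df["stringOfInterest"].str.split(","))
    for letter in letters
)

# Tuple keys become a two-level index; unstack(0) moves the trend level
# into the columns, and missing pairs are filled with 0.
counts = pd.Series(pairs).unstack(0).fillna(0).astype(int)
print(counts)
```

counts.plot(kind='bar') should then give one group of bars per letter, with one bar for each trend.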