My dataset:
I have data in the following format (here, imported from a CSV file). You can find an example dataset as CSV here.
PAIR PREFERENCE
1 5
1 3
1 2
2 4
2 1
2 3
… and so on. In total, there are 19 pairs, and PREFERENCE takes discrete values ranging from 1 to 5.
What I'm trying to achieve:
What I need is a stacked histogram, e.g. one column per pair, each 100% high, showing the distribution of the PREFERENCE values within that pair.
Something similar to the "100% stacked columns" in Excel, or (although not quite the same) a so-called "mosaic plot".
What I tried:
I figured it'd be easiest using ggplot2, but I don't even know where to start. I know I can create a simple bar chart with something like:
ggplot(d, aes(x=factor(PAIR), y=factor(PREFERENCE))) + geom_bar(position="fill")
… that, however, doesn't get me very far. So I tried the following, which gets me somewhat closer to what I'm trying to achieve, but I suppose it still uses the count of PREFERENCE? Note that the ylab is "count" here, and the values range up to 19.
qplot(factor(PAIR), data=d, geom="bar", fill=factor(PREFERENCE_FIXED))
Results in:
- So, what do I have to do to get the stacked bars to represent a histogram?
- Or do they actually do this already?
- If so, what do I have to change to get the labels right (e.g. have percentages instead of the "count")?
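One way to check what the bars should show is to tabulate the per-pair shares directly. The following is only a sketch: the data frame `d` here is a hypothetical stand-in built from the six rows shown above, not the full 19-pair dataset.

```r
# Hypothetical stand-in for the real data: only the six rows shown above.
d <- data.frame(
  PAIR       = c(1, 1, 1, 2, 2, 2),
  PREFERENCE = c(5, 3, 2, 4, 1, 3)
)

# Cross-tabulate counts: one row per PAIR, one column per PREFERENCE value.
counts <- table(PAIR = d$PAIR, PREFERENCE = d$PREFERENCE)

# Convert each row to proportions, so every pair sums to 1 (i.e. 100%).
shares <- prop.table(counts, margin = 1)

print(round(shares, 2))
```

If a stacked bar for a pair is 100% high, its segments should match the corresponding row of `shares`.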
By the way, this is not really related to this question, and only marginally related to this (i.e. probably same idea, but not continuous values, instead grouped into bars).
1 solution
#1
8
Maybe you want something like this:
ggplot() +
  geom_bar(data = dat,
           aes(x = factor(PAIR), fill = factor(PREFERENCE)),
           position = "fill")
where I've read your data into dat. This outputs something like this:
The y label is still "count", but you can change that manually by adding:
+ scale_x_discrete("Pairs") + scale_y_continuous("Votes")
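If the axis should also read as percentages rather than fractions between 0 and 1, one option is to pass a label formatter to the y scale. This is a minimal sketch, assuming the scales package is available (it is installed alongside ggplot2); the small `dat` below is a hypothetical stand-in for the real data, and the axis and legend titles are only illustrative.

```r
library(ggplot2)

# Hypothetical stand-in for the real data: only the six rows from the question.
dat <- data.frame(
  PAIR       = c(1, 1, 1, 2, 2, 2),
  PREFERENCE = c(5, 3, 2, 4, 1, 3)
)

p <- ggplot(dat, aes(x = factor(PAIR), fill = factor(PREFERENCE))) +
  # position = "fill" rescales each bar so its segments sum to 1
  geom_bar(position = "fill") +
  scale_x_discrete("Pairs") +
  # scales::percent formats 0.25 as "25%" on the axis
  scale_y_continuous("Share of votes", labels = scales::percent) +
  labs(fill = "Preference")

print(p)
```

The bars themselves are unchanged; only the tick labels and titles differ from the version above.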