Is there a way of creating scatterplots with marginal histograms, just like in the sample below, in ggplot2? In Matlab it is the scatterhist() function, and there exist equivalents for R as well. However, I haven't seen it for ggplot2.
I started an attempt by creating the single graphs but don't know how to arrange them properly.
library(ggplot2)
x <- rnorm(300)
y <- rt(300, df = 2)
xy <- data.frame(x, y)
# opts() and theme_blank() no longer exist in ggplot2; theme() and element_blank() replace them
blank_axes <- theme(axis.text = element_blank(), axis.title = element_blank(), axis.ticks = element_blank())
xhist <- qplot(x, geom = "histogram") + scale_x_continuous(limits = c(min(x), max(x))) + blank_axes + theme(aspect.ratio = 5/16, panel.background = element_rect(fill = "white"))
yhist <- qplot(y, geom = "histogram") + coord_flip() + theme(panel.background = element_rect(fill = "white", colour = "black"))
# limits for the flipped histogram come from y, not x
yhist <- yhist + scale_x_continuous(limits = c(min(y), max(y))) + blank_axes + theme(aspect.ratio = 16/5)
scatter <- qplot(x, y, data = xy) + scale_x_continuous(limits = c(min(x), max(x))) + scale_y_continuous(limits = c(min(y), max(y)))
none <- qplot(x, y, data = xy) + geom_blank()
and arranging them with the function posted here. But to make a long story short: is there a way of creating these graphs?
7 answers
#1
81
The gridExtra package should work here. Start by making each of the ggplot objects:
hist_top <- ggplot()+geom_histogram(aes(rnorm(100)))
empty <- ggplot()+geom_point(aes(1,1), colour="white")+
theme(axis.ticks=element_blank(),
panel.background=element_blank(),
axis.text.x=element_blank(), axis.text.y=element_blank(),
axis.title.x=element_blank(), axis.title.y=element_blank())
scatter <- ggplot()+geom_point(aes(rnorm(100), rnorm(100)))
hist_right <- ggplot()+geom_histogram(aes(rnorm(100)))+coord_flip()
Then use the grid.arrange function:
library(gridExtra)
grid.arrange(hist_top, empty, scatter, hist_right, ncol=2, nrow=2, widths=c(4, 1), heights=c(1, 4))
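The objects above each draw their own random numbers, so the histograms do not describe the points in the scatterplot. A minimal sketch of the same 2 x 2 layout with every panel built from one data frame follows; the data frame `d`, the `bins = 30` setting, and the use of `theme_void()` for the filler cell are illustrative choices, not part of the answer.

```r
library(ggplot2)
library(gridExtra)

set.seed(1)
d <- data.frame(x = rnorm(100), y = rnorm(100))  # hypothetical example data

# Central scatterplot
scatter <- ggplot(d, aes(x, y)) + geom_point()

# Marginal histograms built from the same columns as the scatterplot
hist_top   <- ggplot(d, aes(x)) + geom_histogram(bins = 30)
hist_right <- ggplot(d, aes(y)) + geom_histogram(bins = 30) + coord_flip()

# Blank filler for the top-right cell
empty <- ggplot() + theme_void()

# arrangeGrob() builds the layout without drawing it; grid.draw() renders it
layout <- arrangeGrob(hist_top, empty, scatter, hist_right,
                      ncol = 2, nrow = 2, widths = c(4, 1), heights = c(1, 4))
grid::grid.draw(layout)
```

Keeping the layout in a variable makes it possible to save or redraw it later; the panels are still not guaranteed to line up exactly, because axis text and titles take different amounts of space in each one.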
#2
104
This is not a completely responsive answer but it is very simple. It illustrates an alternate method to display marginal densities and also how to use alpha levels for graphical output that supports transparency:
scatter <- qplot(x,y, data=xy) +
scale_x_continuous(limits=c(min(x),max(x))) +
scale_y_continuous(limits=c(min(y),max(y))) +
geom_rug(col=rgb(.5,0,0,alpha=.2))
scatter
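qplot() is deprecated in recent ggplot2 releases, so here is a rough equivalent of the rug plot written with ggplot(). It is only a sketch: the data are regenerated inside the block so that it runs on its own, and the explicit axis limits are dropped because the default scales already span the data.

```r
library(ggplot2)

set.seed(1)
xy <- data.frame(x = rnorm(300), y = rt(300, df = 2))

scatter <- ggplot(xy, aes(x, y)) +
  geom_point() +
  # semi-transparent dark red ticks along both axes show the marginal distributions
  geom_rug(colour = rgb(.5, 0, 0, alpha = .2))

print(scatter)
```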
#3
65
This might be a bit late, but I decided to make a package (ggExtra) for this since it involved a bit of code and can be tedious to write. The package also tries to address some common issues, such as ensuring that the plots stay in line with one another even if there is a title or the text is enlarged.
The basic idea is similar to what the answers here gave, but it goes a bit beyond that. Here is an example of how to add marginal histograms to a random set of 1000 points. Hopefully this makes it easier to add histograms/density plots in the future.
Link to the ggExtra package
library(ggplot2)
df <- data.frame(x = rnorm(1000, 50, 10), y = rnorm(1000, 50, 10))
p <- ggplot(df, aes(x, y)) + geom_point() + theme_classic()
ggExtra::ggMarginal(p, type = "histogram")
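For grouped data, a sketch along the following lines may help. It assumes an installed ggExtra version in which ggMarginal() provides the `groupColour` and `groupFill` arguments; the choice of the iris data set and the legend placement are illustrative.

```r
library(ggplot2)
library(ggExtra)

# Scatterplot with the colour aesthetic mapped to the grouping column
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width, colour = Species)) +
  geom_point() +
  theme_classic() +
  theme(legend.position = "bottom")

# Marginal densities, one per group, coloured and filled to match the points
pm <- ggMarginal(p, type = "density", groupColour = TRUE, groupFill = TRUE)
print(pm)
```

`type` also accepts "histogram" and "boxplot", and `margins = "x"` or `margins = "y"` restricts the marginal plot to one axis.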
#4
42
One addition, just to save some searching time for people doing this after us.
Legends, axis labels, axis text, and ticks make the plots drift out of alignment with each other, so the combined figure looks ugly and inconsistent.
You can correct this by using some of these theme settings,
+theme(legend.position = "none",
axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
plot.margin = unit(c(3,-5.5,4,3), "mm"))
and align scales,
+scale_x_continuous(breaks = 0:6,
limits = c(0,6),
expand = c(.05,.05))
so the results will look OK:
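As a sketch of how these pieces fit together, the example below reuses one x scale for a scatterplot and its top histogram. Instead of hand-tuned negative margins it equalises the column widths of the two plot grobs, which is a different technique from the one in this answer. The names `d`, `shared_x`, `g_top` and `g_main` are illustrative.

```r
library(ggplot2)
library(gridExtra)

set.seed(1)
d <- data.frame(x = runif(200, 0, 6), y = rnorm(200))  # hypothetical example data

# One x scale specification, reused so both panels cover the same range
shared_x <- function() {
  scale_x_continuous(breaks = 0:6, limits = c(0, 6), expand = c(.05, .05))
}

scatter <- ggplot(d, aes(x, y)) + geom_point() + shared_x()

hist_top <- ggplot(d, aes(x)) +
  geom_histogram(binwidth = 0.5, boundary = 0) +
  shared_x() +
  theme(legend.position = "none",
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank())

# Convert to gtables and give both the same column widths so the panels line up
g_top  <- ggplotGrob(hist_top)
g_main <- ggplotGrob(scatter)
max_widths <- grid::unit.pmax(g_top$widths, g_main$widths)
g_top$widths  <- max_widths
g_main$widths <- max_widths

grid.arrange(g_top, g_main, ncol = 1, heights = c(1, 4))
```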
#5
25
Just a very minor variation on BondedDust's answer, in the general spirit of marginal indicators of distribution.
Edward Tufte has called this use of rug plots a 'dot-dash plot', and has an example in VDQI of using the axis lines to indicate the range of each variable. In my example the axis labels and grid lines also indicate the distribution of the data. The labels are located at the values of Tukey's five number summary (minimum, lower-hinge, median, upper-hinge, maximum), giving a quick impression of the spread of each variable.
These five numbers are thus a numerical representation of a boxplot. It's a bit tricky because the unevenly spaced grid-lines suggest that the axes have a non-linear scale (in this example they are linear). Perhaps it would be best to omit grid lines or force them to be in regular locations, and just let the labels show the five number summary.
x<-rnorm(300)
y<-rt(300,df=10)
xy<-data.frame(x,y)
require(ggplot2); require(grid)
# make the basic plot object
ggplot(xy, aes(x, y)) +
# set the locations of the x-axis labels as Tukey's five numbers
scale_x_continuous(limits=c(min(x), max(x)),
breaks=round(fivenum(x),1)) +
# ditto for y-axis labels
scale_y_continuous(limits=c(min(y), max(y)),
breaks=round(fivenum(y),1)) +
# specify points
geom_point() +
# specify that we want the rug plot
geom_rug(size=0.1) +
# improve the data/ink ratio
theme_minimal(base_size = 18)
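To make the axis breaks concrete, here is a small sketch of what fivenum() returns for a toy vector; the vector and the variable names are illustrative.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# fivenum() returns minimum, lower hinge, median, upper hinge, maximum
fn <- fivenum(x)
print(fn)  # 1 3 5 7 9

# Rounded to one decimal, as passed to `breaks` in the plot code
brks <- round(fn, 1)
```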
#6
7
As there was no satisfying solution for this kind of plot when comparing different groups, I wrote a function to do this.
It works for both grouped and ungrouped data and accepts additional graphical parameters:
marginal_plot(x = iris$Sepal.Width, y = iris$Sepal.Length)
marginal_plot(x = Sepal.Width, y = Sepal.Length, group = Species, data = iris, bw = "nrd", lm_formula = NULL, xlab = "Sepal width", ylab = "Sepal length", pch = 15, cex = 0.5)
#7
1
I've found a package (ggpubr) that seems to work very well for this problem, and it offers several ways to display the data.
The link to the package is here, and in this link you will find a nice tutorial to use it. For completeness, I attach one of the examples I reproduced.
I first installed the package (it requires devtools):
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")
For the particular case of displaying different histograms for different groups, the tutorial says of ggExtra: "One limitation of ggExtra is that it can’t cope with multiple groups in the scatter plot and the marginal plots. In the R code below, we provide a solution using the cowplot package." In my case, I had to install the latter package:
install.packages("cowplot")
And I followed this piece of code:
我遵循这段代码:
library(ggpubr)
# Scatter plot colored by groups ("Species")
sp <- ggscatter(iris, x = "Sepal.Length", y = "Sepal.Width",
color = "Species", palette = "jco",
size = 3, alpha = 0.6)+
border()
# Marginal density plot of x (top panel) and y (right panel)
xplot <- ggdensity(iris, "Sepal.Length", fill = "Species",
palette = "jco")
yplot <- ggdensity(iris, "Sepal.Width", fill = "Species",
palette = "jco")+
rotate()
# Cleaning the plots
sp <- sp + rremove("legend")
yplot <- yplot + clean_theme() + rremove("legend")
xplot <- xplot + clean_theme() + rremove("legend")
# Arranging the plot using cowplot
library(cowplot)
plot_grid(xplot, NULL, sp, yplot, ncol = 2, align = "hv",
rel_widths = c(2, 1), rel_heights = c(1, 2))
Which worked fine for me:
Iris set marginal histograms scatterplot