R scatter plot: symbol color represents the number of overlapping points

Time: 2021-03-03 15:00:49

Scatter plots can be hard to interpret when many points overlap, because the overlap obscures the density of data in a particular region. One solution is to use semi-transparent colors for the plotted points, so that an opaque region indicates that many observations are present at those coordinates.

Below is an example of my black and white solution in R:

MyGray <- rgb(t(col2rgb("black")), alpha=50, maxColorValue=255)
x1 <- rnorm(n=1E3, sd=2)
x2 <- x1*1.2 + rnorm(n=1E3, sd=2)
dev.new(width=3.5, height=5)
par(mfrow=c(2,1), mar=c(2.5,2.5,0.5,0.5), ps=10, cex=1.15)
plot(x1, x2, ylab="", xlab="", pch=20, col=MyGray)
plot(x1, x2, ylab="", xlab="", pch=20, col="black")

However, I recently came across this article in PNAS, which took a similar approach but used heat-map coloration, rather than opacity, to indicate how many points were overlapping. The article is Open Access, so anyone can download the .pdf and look at Figure 1, which contains a relevant example of the graph I want to create. The methods section of the paper indicates that the analyses were done in Matlab.

For the sake of convenience, here is a small portion of Figure 1 from the above article:

[Image: excerpt of Figure 1 from the article]

How would I create a scatter plot in R that uses color, not opacity, as an indicator of point density?

For starters, R users can access this Matlab color scheme through the function tim.colors() in the fields package (install.packages("fields")).

Is there an easy way to make a figure similar to Figure 1 of the above article, but in R? Thanks!

3 solutions

#1


29  

One option is to use densCols() to extract kernel densities at each point. Mapping those densities to the desired color ramp, and plotting points in order of increasing local density gets you a plot much like those in the linked article.

## Data in a data.frame
x1 <- rnorm(n=1E3, sd=2)
x2 <- x1*1.2 + rnorm(n=1E3, sd=2)
df <- data.frame(x1,x2)

## Use densCols() output to get density at each point
x <- densCols(x1,x2, colramp=colorRampPalette(c("black", "white")))
df$dens <- col2rgb(x)[1,] + 1L

## Map densities to colors
cols <-  colorRampPalette(c("#000099", "#00FEFF", "#45FE4F", 
                            "#FCFF00", "#FF9400", "#FF3100"))(256)
df$col <- cols[df$dens]

## Plot it, reordering rows so that densest points are plotted on top
plot(x2~x1, data=df[order(df$dens),], pch=20, col=col, cex=2)
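
The col2rgb() step above recovers the density only indirectly, by reading the red channel of a black-to-white ramp. As a more explicit alternative (a sketch, not part of the original answer), the density can be estimated on a grid with MASS::kde2d() and looked up for each point. The helper name point_density, the grid size of 100, and the fixed seed are assumptions chosen for illustration.

```r
library(MASS)  # provides kde2d(); assumed to be installed (it ships with most R installs)

set.seed(42)   # fixed seed only so the sketch is reproducible
x1 <- rnorm(n = 1e3, sd = 2)
x2 <- x1 * 1.2 + rnorm(n = 1e3, sd = 2)

## Hypothetical helper: 2-D kernel density looked up at the grid cell of each point
point_density <- function(x, y, n = 100) {
  kd <- MASS::kde2d(x, y, n = n)                     # density on an n-by-n grid over the data range
  ix <- findInterval(x, kd$x, all.inside = TRUE)     # row index into kd$z (x direction)
  iy <- findInterval(y, kd$y, all.inside = TRUE)     # column index into kd$z (y direction)
  kd$z[cbind(ix, iy)]
}

dens <- point_density(x1, x2)

## Map densities to 256 colors, using the same ramp as the answer
cols <- colorRampPalette(c("#000099", "#00FEFF", "#45FE4F",
                           "#FCFF00", "#FF9400", "#FF3100"))(256)
idx <- cut(dens, breaks = 256, labels = FALSE)       # integer bin 1..256 per point

## Plot low-density points first so the densest points end up on top
ord <- order(dens)
plot(x1[ord], x2[ord], pch = 20, cex = 2, col = cols[idx[ord]],
     xlab = "x1", ylab = "x2")
```

Because the lookup is by grid cell, nearby points share a density value; a larger n gives a finer color gradation at the cost of a slower kde2d() call.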

#2


5  

You can get a similar effect with hexagonal binning: divide the region into hexagons, then color each hexagon based on the number of points it contains. The hexbin package has functions to do this, and the ggplot2 package does as well.
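
This answer gives no code, so here is a minimal sketch of both routes. It assumes the hexbin and ggplot2 packages are installed; the data are regenerated from the question, and the bin count of 30, the seed, and the color ramp (borrowed from answer #1) are illustrative choices rather than anything the answer specifies.

```r
## Assumes both packages are installed: install.packages(c("hexbin", "ggplot2"))
library(hexbin)
library(ggplot2)

set.seed(42)   # fixed seed only so the sketch is reproducible
x1 <- rnorm(n = 1e3, sd = 2)
x2 <- x1 * 1.2 + rnorm(n = 1e3, sd = 2)

## Heat-map style ramp; colorRampPalette() returns a function of n
heat <- colorRampPalette(c("#000099", "#00FEFF", "#45FE4F",
                           "#FCFF00", "#FF9400", "#FF3100"))

## hexbin package: bin the points, then plot cell counts with the custom ramp
hb <- hexbin(x1, x2, xbins = 30)
plot(hb, colramp = heat)

## ggplot2: geom_hex() does the binning itself (it uses hexbin internally)
p <- ggplot(data.frame(x1, x2), aes(x1, x2)) +
  geom_hex(bins = 30) +
  scale_fill_gradientn(colours = heat(256))
print(p)
```

Unlike the densCols() approach, this colors cells rather than individual points, so the individual symbols are no longer visible.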

#3


3  

You can use smoothScatter for this.

colramp = colorRampPalette(c('white', 'blue', 'green', 'yellow', 'red'))
smoothScatter(x1, x2, colramp=colramp)
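
The snippet above relies on x1 and x2 from the question. A standalone variant, as a sketch, could look like the following; the values chosen for nbin and nrpoints and the fixed seed are assumptions for illustration, not recommendations from the answer.

```r
set.seed(42)   # fixed seed only so the sketch is reproducible
x1 <- rnorm(n = 1e3, sd = 2)
x2 <- x1 * 1.2 + rnorm(n = 1e3, sd = 2)

colramp <- colorRampPalette(c("white", "blue", "green", "yellow", "red"))

## nbin:     resolution of the density grid (default 128)
## nrpoints: how many points from the sparsest regions are drawn as
##           individual dots (default 100); 0 draws the density image only
smoothScatter(x1, x2, colramp = colramp, nbin = 256, nrpoints = 0,
              xlab = "x1", ylab = "x2")
```

Note that smoothScatter() draws a smoothed density image rather than colored symbols, so it approximates the look of the figure without plotting each observation.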
