How do I color data points in R based on a pattern match?

Time: 2022-06-25 00:02:07

I have a data frame with this form:

             V1 V2                       V3          V4       V5         V6       V7           V8
1 0610007C21Rik  -   chr5:31351012-31356737 1.33732e-05 0.752381  0.9965090 0.000000 1.777419e-05
2 0610007L01Rik  - chr5:130695613-130717165 1.67168e+00 1.673120  0.0000000 3.453930 4.997847e-01
3 0610007P08Rik  -  chr13:63916627-64000808 7.06033e-01 0.000000  0.0815767 0.318051 1.000000e+00
4 0610007P14Rik  -  chr12:87157066-87165495 0.00000e+00 0.000000  0.0000000 5.494230          NaN
5 0610007P22Rik  -  chr17:25377114-25379603 4.99696e+00 0.908254  0.9076130 3.639250 8.461946e-01
6 0610009B22Rik  -  chr11:51499151-51502136 6.53363e-01 8.500980 13.5797000 0.000000 7.137192e-02

I am plotting log2(V4) vs. log2(V5) with this command:

plot(log2(df[,4]) ~ log2(df[,5]), xlim=c(0,10), ylim=c(0,10))

I want to color points based on a pattern match in V1. For instance, how can I color 0610007C21Rik and 0610007L01Rik green and 0610007P22Rik and 0610007P14Rik red? I have tried adding another column to the data frame with a color specified, but there's got to be an easier way.

2 Answers

#1


Here's a base R solution:

Define your colours as a named vector, with one entry for each value of df$V1 that you want to colour. Note the quotes around each name: they are needed because the names start with a digit.

col.list <- c(
              "0610007C21Rik"="green",
              "0610007L01Rik"="green",
              "0610007P22Rik"="red",
              "0610007P14Rik"="red"
             )

Then plot away using df$V1 to look up the values in the col.list vector you have just defined.

plot(
     log2(df[,4]) ~ log2(df[,5]), 
     xlim=c(0,10),
     ylim=c(0,10),
     col=col.list[paste(df$V1)]
    )
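To see what the lookup returns on its own, here is a minimal sketch with toy data. The name "GeneX" is made up for illustration, and stringsAsFactors = TRUE is set only to mimic a data frame read in with factor columns, which is presumably why paste() is used above.

```r
# Toy data: the first four names come from the question; "GeneX" is made up.
df <- data.frame(
  V1 = c("0610007C21Rik", "0610007L01Rik", "0610007P22Rik",
         "0610007P14Rik", "GeneX"),
  stringsAsFactors = TRUE
)

col.list <- c(
  "0610007C21Rik" = "green",
  "0610007L01Rik" = "green",
  "0610007P22Rik" = "red",
  "0610007P14Rik" = "red"
)

# Indexing with a factor directly would use its integer codes, not its labels.
# as.character() (or paste()) makes the lookup go by name instead.
point.cols <- col.list[as.character(df$V1)]

unname(point.cols)
# "green" "green" "red" "red" NA
```

Any name that is not in col.list looks up as NA, and base graphics does not draw a point whose colour is NA.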

To address the OP's comment, give every name that is missing from col.list a fallback colour (black here) by using this in the plot call:

... col=ifelse(df$V1 %in% names(col.list),col.list[paste(df$V1)],"black")

This makes the full call look like:

plot( 
      log2(df[,4]) ~ log2(df[,5]),
      xlim=c(0,10),
      ylim=c(0,10),
      col=ifelse(df$V1 %in% names(col.list),col.list[paste(df$V1)],"black")
    )
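If the names to highlight share a pattern rather than forming a short fixed list, a regular expression may be more convenient than a lookup table. The following is only a sketch of that idea using grepl(); the patterns and the variable names genes and point.cols are assumptions chosen for illustration, not part of the answer above.

```r
# Gene names from the question's sample data.
genes <- c("0610007C21Rik", "0610007L01Rik", "0610007P08Rik",
           "0610007P14Rik", "0610007P22Rik", "0610009B22Rik")

# Start with a default colour, then overwrite wherever a pattern matches.
point.cols <- rep("black", length(genes))
point.cols[grepl("^0610007(C21|L01)Rik$", genes)] <- "green"
point.cols[grepl("^0610007P(14|22)Rik$", genes)]  <- "red"

point.cols
# "green" "green" "black" "red" "red" "black"
```

The resulting vector can be passed as col=point.cols in the plot call, provided it was built from df$V1 so that it lines up with the plotted rows.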

#2


Have a look at the ggplot2 package.

If you post the output of dput() for your data frame, it will be easier for people to help with code.

Here is one example with made-up data that looks a bit like yours. There are better ways to log transform, however.

df <- data.frame(sample(LETTERS[1:5],20, replace=TRUE), abs(rnorm(20)/100), abs(runif(20)*10))
colnames(df) <- c('V1','V4','V5')


library(ggplot2)

p <- ggplot(df, aes(log2(V4) , log2(V5)))
p + geom_point(aes(colour = V1))
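The example above lets ggplot2 pick one colour per value of V1. To assign specific colours to specific names, as the question asks, one possible approach is to build a grouping column and map it with scale_colour_manual(). This is a sketch under assumptions: the group column, its labels, and the choice of which letters count as "green" or "red" are all hypothetical.

```r
library(ggplot2)

set.seed(1)  # only so the made-up data is reproducible
df <- data.frame(
  V1 = sample(LETTERS[1:5], 20, replace = TRUE),
  V4 = abs(rnorm(20) / 100),
  V5 = abs(runif(20) * 10)
)

# Hypothetical grouping: A and B are "green", C is "red", the rest "other".
df$group <- ifelse(df$V1 %in% c("A", "B"), "green",
            ifelse(df$V1 %in% "C", "red", "other"))

# Map each group label to an explicit colour.
p <- ggplot(df, aes(log2(V4), log2(V5))) +
  geom_point(aes(colour = group)) +
  scale_colour_manual(values = c(green = "green", red = "red", other = "black"))
p
```

The %in% tests could be replaced with grepl() if the groups are defined by a pattern rather than by a fixed set of names.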
