R: Using "identify" to find column names in a boxplot

Time: 2022-02-17 13:02:15

In R, I'm drawing a rather large boxplot from a data.frame with approximately 150 columns. I know that there are some "anomalous" columns where the distribution is too different from the rest of the data set and I want to identify which ones precisely.

Rather unsurprisingly, there is not enough room for the labels, and even if there were, checking them by hand would probably be inconvenient. So I thought I could use R's identify() function to locate the offending columns. However, identify() needs x and y coordinates, and so far I have been unable to get it to work.

I tried

boxplot(dd.noctr$TGS, outline=F)
identify(xy.coords(dd.noctr$TGS)$x, y=xy.coords(dd.noctr$TGS)$y)

where dd.noctr$TGS is my data (a matrix or data.frame), only to get the warning

warning: no point within 0.25 inches

meaning that no point was identified.
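
A likely explanation for the warning: when xy.coords() is given a matrix or data frame with two or more columns, it takes x from the first column and y from the second and ignores the rest. The points identify() searches therefore sit at (column 1, column 2) values rather than at the box positions 1, 2, 3 and so on along the x axis, so clicks on the boxes rarely land within 0.25 inches of any of them. The sketch below checks this on a small made-up data frame (TGS here is illustrative data, not the original dd.noctr$TGS).

```r
# Made-up example data: three numeric columns with different centres
set.seed(1)
TGS <- data.frame(A = rnorm(100), B = rnorm(100, 5), C = rnorm(100, 10))

# xy.coords() on a multi-column data frame uses column 1 as x and
# column 2 as y; column C is ignored entirely
xy <- xy.coords(TGS)
all.equal(unname(xy$x), TGS$A)  # TRUE
all.equal(unname(xy$y), TGS$B)  # TRUE

# boxplot(TGS) would draw its boxes at x = 1, 2, 3 (one per column),
# so these (A, B) points are generally nowhere near the boxes you click on
```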

Is there an alternative solution to identify column names (not single points)?
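
For the column-level question, one non-interactive option is sketched below. It is a swapped-in technique, not something from the thread: summarise each column by its median, then apply boxplot.stats() (the same 1.5 * IQR rule that boxplot() uses) to those medians. Because subsetting keeps names, the flagged medians come back labelled with their column names. The simulated matrix and the choice of the median as the summary are assumptions for illustration; a column with an unusual spread but a typical median would need a different summary, such as the IQR.

```r
# Simulated data (assumption): 20 similar columns, two of them shifted
set.seed(42)
m <- matrix(rnorm(100 * 20), nrow = 100,
            dimnames = list(NULL, paste0("V", 1:20)))
m[, "V7"]  <- m[, "V7"] + 10   # shifted upwards
m[, "V15"] <- m[, "V15"] - 10  # shifted downwards

# One summary value per column, named by column
col_medians <- apply(m, 2, median)

# boxplot.stats() returns the values outside the whiskers in $out;
# names are preserved, so they identify the anomalous columns
flagged <- names(boxplot.stats(col_medians)$out)
flagged
```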

2 solutions

#1


This solution is a bit clunky, so there is probably a better one.

  1. Set up some example data with three columns:

    TGS = data.frame(A = rnorm(100), B = rnorm(100), C=rnorm(100))
    
  2. Next, plot the boxplot:

    boxplot(TGS, outline = FALSE)
    
  3. Now call identify():

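    # x: each point sits at its column's box position (1 to ncol)
    # y: the data values, stacked column by column
    # labels: the name of the column each point belongs to
    identify(x = rep(1:ncol(TGS), each = nrow(TGS)),
             y = as.vector(unlist(TGS)),
             labels = rep(colnames(TGS), each = nrow(TGS)))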
    

    The labels are the column names. This only works if you click close to the vertical centre line of a box, because every point of that column is placed on that line (at x = column index).
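
identify() also returns the indices of the points that were clicked, so the same rep() construction used for the labels can turn those indices into column names you can keep. A minimal sketch: index_to_column() is a hypothetical helper, and the interactive part is left in comments because identify() needs an open screen device and mouse clicks.

```r
# Hypothetical helper: map point indices (as returned by identify())
# back to column names, assuming the points were laid out column by
# column, i.e. y = unlist(df) and labels = rep(colnames(df), each = nrow(df))
index_to_column <- function(idx, df) {
  rep(colnames(df), each = nrow(df))[idx]
}

TGS <- data.frame(A = rnorm(100), B = rnorm(100), C = rnorm(100))

# Interactive use (needs a screen device; finish with the device's
# stop action, e.g. a right-click or Esc):
# boxplot(TGS, outline = FALSE)
# idx <- identify(x = rep(seq_len(ncol(TGS)), each = nrow(TGS)),
#                 y = unlist(TGS, use.names = FALSE),
#                 labels = rep(colnames(TGS), each = nrow(TGS)))
# unique(index_to_column(idx, TGS))

# Non-interactive check with made-up indices: 1 is in A, 150 in B, 300 in C
index_to_column(c(1, 150, 300), TGS)  # "A" "B" "C"
```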

#2


If you want a list of outliers, you can use the 'out' component of the list returned by boxplot().

Example: create a data frame with a few random values with mean 20 and add some outliers. This code displays the outliers:

 df1 = data.frame(A = c(rnorm(15,20,3),7,8,35,32))   # 15 values from rnorm() plus 4 extreme values
 bplot=boxplot(df1)
 bplot$out
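
The list returned by boxplot() also has a 'group' component, parallel to 'out', that gives the box number each outlier belongs to, and a 'names' component with the box labels. Indexing 'names' by 'group' therefore reports which columns contain outliers, which is closer to what the question asks. A minimal sketch on small made-up data; plot = FALSE returns the statistics without drawing anything.

```r
# Made-up data: column A has no outliers, B has one high value,
# C has one low value
df2 <- data.frame(A = 1:10,
                  B = c(1:9, 100),
                  C = c(1:9, -50))

# plot = FALSE computes the boxplot statistics without drawing
bplot2 <- boxplot(df2, plot = FALSE)

bplot2$out                   # 100 -50
bplot2$group                 # 2 3
bplot2$names[bplot2$group]   # "B" "C"

# Columns containing at least one outlier
unique(bplot2$names[bplot2$group])
```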
