Add a column to a data table with values from a normal curve

I have been trying to figure out how to solve a problem in R for several hours. Hopefully, someone can help out:

I have the following data table (only a sample shown, called xout):

       factorx Freq cumFreq   relative 
1    (-2,-1.9]   13      13 0.00132626 
2  (-1.9,-1.8]   18      31 0.00183636 
3  (-1.8,-1.7]   22      53 0.00224444 
4  (-1.7,-1.6]   18      71 0.00183636 
5  (-1.6,-1.5]   22      93 0.00224444 
6  (-1.5,-1.4]   31     124 0.00316262

I am trying to add a new column with the relative frequency from a normal curve. To do that, I tried to split the factorx column into two numeric columns, min and max, so that I could pass the values to the dnorm function. All of my attempts at string manipulation in R have failed. I tried:

gsub("[^/d]","",strsplit(toString(xout$factorx),",")))

but that failed. I am quite new to R, so I am sure there are better ways.

2 Solutions

#1

If you definitely want to use sub, here is one way to do it. You can capture the part you want with a group, (.*), in the regular expression and then refer to it as \\1 in the replacement.

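# lower bound: capture everything between "(" and the comma
min <- as.numeric(sub("\\((.*),.*$", "\\1", xout$factorx))
min
# [1] -2.0 -1.9 -1.8 -1.7 -1.6 -1.5

# upper bound: capture everything between the comma and "]"
max <- as.numeric(sub(".*,(.*)\\]$", "\\1", xout$factorx))
max
# [1] -1.9 -1.8 -1.7 -1.6 -1.5 -1.4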

Alternatively, you could combine substr and strsplit with sapply, as follows:

# convert to character first (so that `nchar` and `substr` can be used)
xout$factorx <- as.character(xout$factorx)
# strip the ( and ], split on ",", then convert each piece to numeric
sapply(strsplit(substr(xout$factorx, 2, nchar(xout$factorx)-1), ","), as.numeric)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] -2.0 -1.9 -1.8 -1.7 -1.6 -1.5
[2,] -1.9 -1.8 -1.7 -1.6 -1.5 -1.4

The min and max values are in the first and second rows of the matrix, respectively.
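
As a rough sketch of how those two rows could be attached to the data frame as columns: the sample data below is a hypothetical three-row stand-in for xout, and the column names lower and upper are my own choice (used instead of min and max to avoid masking the base functions).

```r
# Hypothetical sample mirroring the interval labels shown in the question
xout <- data.frame(
  factorx = c("(-2,-1.9]", "(-1.9,-1.8]", "(-1.8,-1.7]"),
  Freq    = c(13, 18, 22),
  stringsAsFactors = FALSE
)

# Strip the leading "(" and trailing "]", split on the comma, convert to numeric
inner  <- substr(xout$factorx, 2, nchar(xout$factorx) - 1)
bounds <- sapply(strsplit(inner, ","), as.numeric)

# bounds is a 2 x n matrix: row 1 holds the lower bounds, row 2 the upper bounds
xout$lower <- bounds[1, ]
xout$upper <- bounds[2, ]
```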

Another variation of sub: You could first remove the ( and ] using sub and then use strsplit.

sapply(strsplit(sub("\\((.*)\\]", "\\1", xout$factorx), ","), as.numeric)
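
Once the bounds are numeric, the column the question asks for could be built along these lines. This is only a sketch: the sample data is a hypothetical stand-in for xout, the names lower, upper, mu, sigma, normal and normal_approx are mine, and the standard normal parameters (mean 0, sd 1) are an assumption, since the question does not say which normal curve is meant.

```r
# Hypothetical sample mirroring the table in the question
xout <- data.frame(
  factorx  = c("(-2,-1.9]", "(-1.9,-1.8]", "(-1.8,-1.7]"),
  relative = c(0.00132626, 0.00183636, 0.00224444),
  stringsAsFactors = FALSE
)

# Parse the interval bounds with the sub() patterns from this answer
lower <- as.numeric(sub("\\((.*),.*$", "\\1", xout$factorx))
upper <- as.numeric(sub(".*,(.*)\\]$", "\\1", xout$factorx))

# ASSUMPTION: standard normal curve; replace with the mean and sd of your data
mu    <- 0
sigma <- 1

# Probability mass of each bin under the normal curve
xout$normal <- pnorm(upper, mu, sigma) - pnorm(lower, mu, sigma)

# Approximation via dnorm: density at the bin midpoint times the bin width
xout$normal_approx <- dnorm((lower + upper) / 2, mu, sigma) * (upper - lower)
```

The pnorm difference gives the area under the curve for each bin, while the dnorm version is the midpoint approximation of that same area; for narrow bins such as these the two should be close.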

#2

Couldn't you just do

data.frame(xout, newCol=c(1,2,3,4,...))

Of course, the vector you give could be anything.

Example: Add new column with Freq * 4:

data.frame(xout, FreqFour=xout[[2]]*4)

Resulting in

       factorx Freq cumFreq   relative FreqFour
1    (-2,-1.9]   13      13 0.00132626       52
2  (-1.9,-1.8]   18      31 0.00183636       72
3  (-1.8,-1.7]   22      53 0.00224444       88
4  (-1.7,-1.6]   18      71 0.00183636       72
5  (-1.6,-1.5]   22      93 0.00224444       88
6  (-1.5,-1.4]   31     124 0.00316262      124
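
An equivalent way to get the same column, sketched here on a hypothetical three-row stand-in for xout, is to assign it by name with `$`, which modifies the data frame in place rather than building a new one:

```r
# Hypothetical sample with only the Freq column from the question
xout <- data.frame(Freq = c(13, 18, 22))

# Add the new column by name; refers to Freq by name instead of position
xout$FreqFour <- xout$Freq * 4
```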