Running several simple regressions in R

I have a data set with 188 rows and 65 columns of World Development Indicators and birth statistics. I am trying to build a regression model by purposeful selection. The first step is to look at each of the simple linear models individually.

My goal is to run a regression in R for each of my variables against my response. I know I can run lm(x$v30 ~ x$v1), which gives the regression for one of the variables. However, I am hoping to do this in one step and pull all of the p-values into a table or write them to a CSV.

I was following this, but it does not give the p-values in a convenient form: R loop for Regression

1 solution

#1

First, I don't recommend doing this unless you know what you are doing. Otherwise, read up on issues such as selection bias and the false discovery rate.
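To make the false discovery rate concern concrete, here is a minimal sketch of adjusting the p-values from several simple regressions with p.adjust() from base R. The built-in iris data is only a stand-in, the names raw_p and adj_p are made up for illustration, and Benjamini-Hochberg is one possible adjustment, not necessarily the right one for your analysis.

```r
# Sketch: fit one simple regression per predictor and adjust the slope p-values.
# Assumes every predictor is numeric, so row 2 of the coefficient table is its slope.
predictors <- colnames(iris)[1:3]

raw_p <- sapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = "Petal.Width"), data = iris)
  summary(fit)$coefficients[2, "Pr(>|t|)"]
})

# Benjamini-Hochberg adjustment, which targets the false discovery rate
adj_p <- p.adjust(raw_p, method = "BH")

data.frame(raw = raw_p, BH = adj_p)
```

With 60-odd candidate predictors, comparing the adjusted values against your threshold is more conservative than comparing the raw ones.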

In the following, I use the iris dataset and regress the fourth column (Petal.Width) on each of the first three columns in turn. You can easily adapt this to your own data.

Using the broom package isn't mandatory. If you don't want it, remove the tidy() call inside the lapply() function.

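library(broom)

# Fit Petal.Width ~ <predictor> for each of the first three columns,
# then tidy each fit into a table of coefficients
list_out <- lapply(colnames(iris)[1:3], function(i)
             tidy(lm(as.formula(paste("Petal.Width ~", i)), data = iris)))
list_out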

# [[1]]
#           term   estimate  std.error statistic      p.value
# 1  (Intercept) -3.2002150 0.25688579 -12.45773 8.141394e-25
# 2 Sepal.Length  0.7529176 0.04353017  17.29645 2.325498e-37
#
# [[2]]
#          term   estimate std.error statistic      p.value
# 1 (Intercept)  3.1568723 0.4130820  7.642242 2.474053e-12
# 2 Sepal.Width -0.6402766 0.1337683 -4.786461 4.073229e-06
#
# [[3]]
#           term   estimate   std.error statistic      p.value
# 1  (Intercept) -0.3630755 0.039761990 -9.131221 4.699798e-16
# 2 Petal.Length  0.4157554 0.009582436 43.387237 4.675004e-86

Put them into a single data.frame:

do.call(rbind, list_out)

#          term   estimate   std.error  statistic      p.value
# 1  (Intercept) -3.2002150 0.256885790 -12.457735 8.141394e-25
# 2 Sepal.Length  0.7529176 0.043530170  17.296454 2.325498e-37
# 3  (Intercept)  3.1568723 0.413081984   7.642242 2.474053e-12
# 4  Sepal.Width -0.6402766 0.133768277  -4.786461 4.073229e-06
# 5  (Intercept) -0.3630755 0.039761990  -9.131221 4.699798e-16
# 6 Petal.Length  0.4157554 0.009582436  43.387237 4.675004e-86
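If you only want the p-values in a table or a CSV, as the question asks, something along these lines might work. This is a sketch that swaps broom for base R's summary(), again with iris as a stand-in. The names response, predictors, p_table and out_file are placeholders, and it assumes every predictor is numeric so that the second coefficient row is the slope.

```r
# Sketch: one simple regression per predictor, keeping only the slope's p-value.
response   <- "Petal.Width"        # stand-in for your response column (e.g. v30)
predictors <- colnames(iris)[1:3]  # stand-in for your predictor columns

pvals <- sapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = response), data = iris)
  # Row 1 of the coefficient matrix is the intercept, row 2 is the predictor
  summary(fit)$coefficients[2, "Pr(>|t|)"]
})

# One row per predictor
p_table <- data.frame(term = predictors, p.value = unname(pvals))

# Hypothetical output path; tempfile() avoids writing into the working directory
out_file <- tempfile(fileext = ".csv")
write.csv(p_table, out_file, row.names = FALSE)
```

For your own data, replace iris with your data frame, set response to the response column name, and set predictors to the remaining column names, for example setdiff(colnames(x), "v30").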
