I've read the answers to this question and they are quite helpful, but I need help particularly in R.
I have an example data set in R as follows:
x <- c(32,64,96,118,126,144,152.5,158)
y <- c(99.5,104.8,108.5,100,86,64,35.3,15)
I want to fit a model to these data so that y = f(x). I want it to be a third-order polynomial model.
How can I do that in R?
Additionally, can R help me find the best-fitting model?
4 answers
#1
To fit a third-order polynomial in x (terms up to x^3), you can use
lm(y ~ x + I(x^2) + I(x^3))
or
lm(y ~ poly(x, 3, raw=TRUE))
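You could fit a higher-order polynomial and get a near-perfect fit (with the 8 distinct x values in the question, a 7th-order polynomial passes through every point exactly, and a 10th-order one cannot be fitted at all), but should you?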
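A quick sketch of what that overfitting looks like on the question's data (the object names `fit7` and `res10` are illustrative): a degree-7 polynomial has 8 coefficients for 8 points, so its residuals are numerically zero, while `poly()` stops with an error for any degree at or above the number of unique x values.

```r
x <- c(32, 64, 96, 118, 126, 144, 152.5, 158)
y <- c(99.5, 104.8, 108.5, 100, 86, 64, 35.3, 15)

# Degree 7 = 8 coefficients for 8 points: an exact interpolation
fit7 <- lm(y ~ poly(x, 7))
max(abs(residuals(fit7)))   # numerically zero
summary(fit7)$r.squared     # essentially 1

# Degree 10 is not possible with only 8 distinct x values
res10 <- try(lm(y ~ poly(x, 10)), silent = TRUE)
inherits(res10, "try-error")
```

A perfect in-sample fit like this says nothing about how the curve behaves between or beyond the observed points.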
EDIT: poly(x, 3) is probably a better choice (see @hadley below).
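To make the difference between the two forms concrete, here is a sketch on the question's data (names such as `fit_raw`, `fit_orth` and `new_x` are illustrative). `raw = TRUE` gives coefficients you can read directly as y = a + b*x + c*x^2 + d*x^3, while the default orthogonal basis is better conditioned; both describe the same fitted curve.

```r
x <- c(32, 64, 96, 118, 126, 144, 152.5, 158)
y <- c(99.5, 104.8, 108.5, 100, 86, 64, 35.3, 15)

# Raw powers: coefficients are a, b, c, d of y = a + b*x + c*x^2 + d*x^3
fit_raw  <- lm(y ~ poly(x, 3, raw = TRUE))
# Orthogonal polynomials: numerically stabler, coefficients not on the x^k scale
fit_orth <- lm(y ~ poly(x, 3))

coef(fit_raw)

# Same fitted values either way
all.equal(fitted(fit_raw), fitted(fit_orth))

# predict() reuses the stored polynomial basis, so new x values work for both
new_x <- data.frame(x = c(50, 100, 150))
predict(fit_orth, newdata = new_x)
```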
#2
Which model is the "best fitting model" depends on what you mean by "best". R has tools to help, but you need to provide the definition for "best" to choose between them. Consider the following example data and code:
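x <- 1:10
y <- x + c(-0.5,0.5)              # alternately 0.5 below and above the line y = x
plot(x,y, xlim=c(0,11), ylim=c(-1,12))
fit1 <- lm( y~offset(x) -1 )      # y = x exactly: offset only, nothing estimated
fit2 <- lm( y~x )                 # straight line
fit3 <- lm( y~poly(x,3) )         # cubic polynomial
fit4 <- lm( y~poly(x,9) )         # degree-9 polynomial: passes through all 10 points
library(splines)                  # provides ns() natural splines
fit5 <- lm( y~ns(x, 3) )          # natural spline with 3 df
fit6 <- lm( y~ns(x, 9) )          # natural spline with 9 df: also interpolates
fit7 <- lm( y ~ x + cos(x*pi) )   # line plus a term alternating -1/+1 at the integers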
xx <- seq(0,11, length.out=250)
lines(xx, predict(fit1, data.frame(x=xx)), col='blue')
lines(xx, predict(fit2, data.frame(x=xx)), col='green')
lines(xx, predict(fit3, data.frame(x=xx)), col='red')
lines(xx, predict(fit4, data.frame(x=xx)), col='purple')
lines(xx, predict(fit5, data.frame(x=xx)), col='orange')
lines(xx, predict(fit6, data.frame(x=xx)), col='grey')
lines(xx, predict(fit7, data.frame(x=xx)), col='black')
Which of those models is the best? Arguments could be made for any of them (but I for one would not want to use the purple one, the degree-9 polynomial, for interpolation).
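As one possible definition of "best" (an assumption for illustration, not the only reasonable one), you could pick the model that predicts held-out points most accurately. This sketch computes leave-one-out cross-validation error for polynomial degrees 1 to 5 on the question's data, using the exact shortcut e_i / (1 - h_ii) for linear models instead of refitting n times. The helper name `loocv_mse` is hypothetical. Higher degrees are left out because, as the fit approaches interpolation, leverages approach 1 and the shortcut breaks down.

```r
x <- c(32, 64, 96, 118, 126, 144, 152.5, 158)
y <- c(99.5, 104.8, 108.5, 100, 86, 64, 35.3, 15)

# Hypothetical helper: leave-one-out mean squared prediction error for an lm fit
loocv_mse <- function(fit) {
  mean((residuals(fit) / (1 - hatvalues(fit)))^2)
}

degrees <- 1:5
cv <- sapply(degrees, function(d) loocv_mse(lm(y ~ poly(x, d))))
names(cv) <- paste0("degree_", degrees)
cv
degrees[which.min(cv)]   # degree with the smallest LOOCV error
```

With only 8 points, each left-out point matters a lot, so treat the ranking as a rough guide.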
#3
Regarding the question "can R help me find the best-fitting model?": there is probably a function for this, assuming you can state the set of models to test, but this is a good first approach for the set of polynomials of degree 1 to n-1:
polyfit <- function(i) AIC(lm(y ~ poly(x, i)))  # AIC of a degree-i polynomial fit
as.integer(optimize(polyfit,interval = c(1,length(x)-1))$minimum)
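Because only integer degrees are meaningful, an alternative to `optimize()` (my substitution, not the answer's method) is to evaluate every candidate degree and take the minimum. This sketch stops at `length(x) - 2`: at degree `length(x) - 1` the polynomial interpolates the data, the residual variance is essentially zero, and AIC heads towards minus infinity, so that degree would always "win". The names `max_degree`, `aic_by_degree` and `best_degree` are illustrative.

```r
x <- c(32, 64, 96, 118, 126, 144, 152.5, 158)
y <- c(99.5, 104.8, 108.5, 100, 86, 64, 35.3, 15)

# Evaluate every integer degree instead of searching a continuous interval
max_degree <- length(x) - 2
aic_by_degree <- sapply(seq_len(max_degree), function(d) AIC(lm(y ~ poly(x, d))))
names(aic_by_degree) <- seq_len(max_degree)
aic_by_degree

# Degree with the lowest AIC
best_degree <- which.min(aic_by_degree)
best_degree
```

With so few points, AIC's penalty is weak near the top of this range, so it is worth inspecting the whole `aic_by_degree` vector rather than trusting only the minimum.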
Notes
- The validity of this approach will depend on your objectives, the assumptions of optimize() and AIC(), and whether AIC is the criterion you want to use.
- polyfit() may not have a single minimum. Check this with something like:
for (i in 1:(length(x) - 1)) print(polyfit(i))
- I used the as.integer() function because it is not clear to me how I would interpret a non-integer polynomial degree.
- For testing an arbitrary set of mathematical equations, consider the 'Eureqa' program reviewed by Andrew Gelman.
Update
Also see the stepAIC() function (in the MASS package) to automate model selection.
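A minimal sketch of stepAIC() on the question's data, assuming the candidate terms are the raw powers up to x^3 (an illustrative choice) and starting from the intercept-only model so that terms can be added as well as removed. The names `dat`, `null_fit` and `sel` are illustrative.

```r
library(MASS)  # provides stepAIC()

x <- c(32, 64, 96, 118, 126, 144, 152.5, 158)
y <- c(99.5, 104.8, 108.5, 100, 86, 64, 35.3, 15)
dat <- data.frame(x = x, y = y)

# Start from the intercept-only model; search no further than the cubic
null_fit <- lm(y ~ 1, data = dat)
sel <- stepAIC(null_fit,
               scope = list(lower = ~ 1, upper = ~ x + I(x^2) + I(x^3)),
               direction = "both", trace = FALSE)
formula(sel)   # terms kept by the AIC search
```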
#4
The easiest way to find the best fit in R is to code the model as:
lm.1 <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4) + ...)
Then use stepwise (backward) AIC regression:
lm.s <- step(lm.1)
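A runnable version of the idea above on the question's data. How many terms the `...` stands for is up to you; this sketch assumes an arbitrary cap at x^4. Note that step() treats each I(x^k) as a separate term, so it may keep a higher power while dropping a lower one. If you want only nested polynomials, compare poly(x, d) fits for each d instead.

```r
x <- c(32, 64, 96, 118, 126, 144, 152.5, 158)
y <- c(99.5, 104.8, 108.5, 100, 86, 64, 35.3, 15)

# Starting model: raw powers up to x^4 (an arbitrary cap for illustration)
lm.1 <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4))

# Backward elimination by AIC; trace = 0 suppresses the step-by-step printout
lm.s <- step(lm.1, direction = "backward", trace = 0)
formula(lm.s)
summary(lm.s)
```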