I have a simple dataset and I am trying to fit a power trend line to it. The sample data is very small and is as follows:
structure(list(Discharge = c(250, 300, 500, 700, 900), Downstream = c(0.3,
0.3, 0.3, 0.3, 0.3), Age = c(1.32026239202165, 1.08595138888889,
0.638899189814815, 0.455364583333333, 0.355935185185185)), .Names = c("Discharge",
"Downstream", "Age"), row.names = c(NA, 5L), class = "data.frame")
Data looks as follows:
> new
Discharge Downstream Age
1 250 0.3 1.3202624
2 300 0.3 1.0859514
3 500 0.3 0.6388992
4 700 0.3 0.4553646
5 900 0.3 0.3559352
I tried to plot the above data using ggplot2:
ggplot(new)+geom_point(aes(x=Discharge,y=Age))
I can add a linear fit using geom_smooth(method="lm"), but I am not sure what code I need to show a power trend line.
The output is as follows:
How can I add a power regression line, as Excel does? The Excel figure is shown below:
3 Answers
#1
10
Use nls (nonlinear least squares) as your smoother, e.g.:
ggplot(DD,aes(x = Discharge,y = Age)) +
geom_point() +
stat_smooth(method = 'nls', formula = 'y~a*x^b', start = list(a = 1,b=1),se=FALSE)
Noting Doug Bates's comments on R-squared values and nonlinear models, you could use the ideas in "Adding Regression Line Equation and R2 on graph" to append the regression line equation:
# note that you have to give it sensible starting values
# and I haven't worked out why the values passed to geom_smooth work!
power_eqn = function(df, start = list(a = 300, b = -1)){
# fit Age as a function of Discharge so the label matches the plotted axes
m = nls(Age ~ a*Discharge^b, start = start, data = df)
# unname() keeps coefficient names out of the deparsed label
eq <- substitute(italic(y) == a ~italic(x)^b,
list(a = format(unname(coef(m)[1]), digits = 2),
b = format(unname(coef(m)[2]), digits = 2)))
as.character(as.expression(eq))
}
ggplot(DD,aes(x = Discharge,y = Age)) +
geom_point() +
stat_smooth(method = 'nls', formula = 'y~a*x^b', start = list(a = 1,b=1),se=FALSE) +
geom_text(x = 600, y = 1, label = power_eqn(DD), parse = TRUE)
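The comment about sensible starting values can be made concrete. One possible approach, sketched here as an assumption rather than as part of the original answer, is to take the starting values from a log-log linear fit and pass them to nls. The names loglog, start_vals and fit are illustrative only, and the data frame is rebuilt from the question's data.

```r
# Data from the question (Downstream column omitted, it is constant)
DD <- data.frame(
  Discharge = c(250, 300, 500, 700, 900),
  Age = c(1.32026239202165, 1.08595138888889, 0.638899189814815,
          0.455364583333333, 0.355935185185185)
)

# Log-log linear fit: log(Age) = log(a) + b * log(Discharge)
loglog <- lm(log(Age) ~ log(Discharge), data = DD)

# Back-transform the coefficients into starting values for nls
start_vals <- list(a = exp(coef(loglog)[[1]]), b = coef(loglog)[[2]])

# Nonlinear least squares fit of the power model on the original scale
fit <- nls(Age ~ a * Discharge^b, data = DD, start = start_vals)
coef(fit)
```

Because the log-log estimates are usually already close to the nonlinear least squares solution, nls tends to converge in a few iterations from there.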
#2
15
While mnel's answer is correct for a nonlinear least squares fit, note that Excel isn't actually doing anything nearly that sophisticated. It's really just log-transforming the response and predictor variables, and doing an ordinary (linear) least squares fit. To reproduce this in R, you would do:
lm(log(Age) ~ log(Discharge), data=df)
Call:
lm(formula = log(Age) ~ log(Discharge), data = df)

Coefficients:
   (Intercept)  log(Discharge)
         5.927          -1.024
As a check, the coefficient for log(Discharge) is identical to that from Excel, while exp(5.927) ~ 375.05.
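To spell out the check: taking logs of y = a * x^b gives log(y) = log(a) + b * log(x), so the intercept of the log-log fit is log(a) and the slope is b. A minimal sketch, assuming df holds the question's data (the names a, b and power_pred are illustrative):

```r
# Data from the question
df <- data.frame(
  Discharge = c(250, 300, 500, 700, 900),
  Age = c(1.32026239202165, 1.08595138888889, 0.638899189814815,
          0.455364583333333, 0.355935185185185)
)

m <- lm(log(Age) ~ log(Discharge), data = df)

# Back-transform the intercept to get the multiplier; the slope is the exponent
a <- exp(coef(m)[["(Intercept)"]])
b <- coef(m)[["log(Discharge)"]]

# The power curve a * x^b should agree with the exponentiated fitted values
power_pred <- a * df$Discharge^b
all.equal(power_pred, unname(exp(fitted(m))))
```

Note that this fit minimises squared error on the log scale, so its coefficients will generally differ slightly from those of an nls fit on the original scale.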
While I'm not sure how to use this as a trendline in ggplot2, you can do it in base graphics thusly:
m <- lm(log(Age) ~ log(Discharge), data=df)
newdf <- data.frame(Discharge=seq(min(df$Discharge), max(df$Discharge), length.out=100))
plot(Age ~ Discharge, data=df)
# back-transform the log-scale predictions to the original Age scale
lines(newdf$Discharge, exp(predict(m, newdf)))
text(600, .8, substitute(b0*x^b1, list(b0=exp(coef(m)[1]), b1=coef(m)[2])))
text(600, .75, substitute(plain("R-square: ") * r2, list(r2=summary(m)$r.squared)))
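For ggplot2, one way that should work is to compute the back-transformed predictions yourself and draw them with geom_line. This is a sketch under that assumption, not something from the original answer. The requireNamespace guard is only there so the script also runs where ggplot2 is not installed.

```r
# Data from the question
df <- data.frame(
  Discharge = c(250, 300, 500, 700, 900),
  Age = c(1.32026239202165, 1.08595138888889, 0.638899189814815,
          0.455364583333333, 0.355935185185185)
)

# Excel-style power trend: ordinary least squares on the log-log scale
m <- lm(log(Age) ~ log(Discharge), data = df)

# Predict on a fine grid and back-transform to the original Age scale
newdf <- data.frame(
  Discharge = seq(min(df$Discharge), max(df$Discharge), length.out = 100)
)
newdf$Age <- exp(predict(m, newdata = newdf))

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = Discharge, y = Age)) +
    geom_point() +
    geom_line(data = newdf)  # inherits the x and y mappings from ggplot()
  print(p)
}
```

Because newdf uses the same column names as df, geom_line can reuse the aesthetics set in ggplot() without a second aes() call.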
#3
1
2018 Update: The "start" argument now seems to be deprecated. It is not in the stat_smooth documentation either.
If you want to choose starting values, you now need to use the "method.args" option.
See changes below:
ggplot(DD,aes(x = Discharge,y = Age)) +
geom_point() +
stat_smooth(method = 'nls', formula = 'y~a*x^b', method.args = list(start = c(a = 1, b = 1)), se = FALSE) +
geom_text(x = 600, y = 1, label = power_eqn(DD), parse = TRUE)