I'm new to R. I want to run a large number of regressions, taking the data from one data frame and storing the output in a new data frame. I'd also like a loop that automatically picks up the next input and skips duplicates.
I've attached a photo of what my data looks like.
This is my code to run the regressions:
#inputs
Airport <- "ABZ"
#choose target airport & nation GDP
df <- subset(Elasticities_Study, Airport_Code == Airport)
#log-log (base-10 logs of passengers and GDP)
df <- data.frame(Year = df$Year, Region = df$Region,
                 Airport = df$Airport_Code,
                 Passengers = log10(df$Passengers), GDP = log10(df$GDP))
#regression
fit <- lm(Passengers ~ GDP, data = df)
#store the coefficient
coefficient <- coefficients(fit)
elasticity <- coefficient["GDP"]
#store the p_value (p-value of the overall F-test)
p <- function(fit) {
  if (!inherits(fit, "lm")) stop("Not an object of class 'lm'")
  f <- summary(fit)$fstatistic
  p <- pf(f[1], f[2], f[3], lower.tail = FALSE)
  attributes(p) <- NULL
  return(p)
}
p_value <- p(fit)
#store the r_squared
r_squared <- summary(fit)$r.squared
#save regression output into data frame
Regression_Output <- data.frame(df$Region[1], df$Airport[1],
                                elasticity, p_value, r_squared)
colnames(Regression_Output) <- c("Region", "Airport", "Elasticity",
                                 "P_Value", "R_Squared")
Could someone please help? Thanks!
2 Answers
#1
0
Sorry, I couldn't comment on the answer above (this is really more of a comment, but I don't have the permissions for that yet).
You can use a variation of the loop Ben_its wrote, but instead of using a numeric value for i, have the loop iterate over the airport codes. It's messy code, but it's how my brain works.
First create a blank data.frame to store your results. Then create a vector of all your airport codes, and set the loop to iterate over that vector.
Within the loop, the first step is to subset the data by the current airport code. On the first pass it keeps only the rows where Airport_Code equals the first item in your vector of codes. It then runs your regression and pulls out whichever values you want from the summary. Once it has those values, it stores them in new_data, which is rbind-ed to the blank data.frame created for the results. That last line simply rebuilds the results data.frame by taking its previous contents and appending the new row you just computed from the subset for that airport code. The loop then moves on to the next airport code in your codes vector.
your_data <- Elasticities_Study  # your source data
## Create an empty data.frame to store solutions. The loop uses rbind to add one row of solutions per iteration
results <- data.frame()
## unique() keeps each airport code once, so duplicates are skipped
codes <- unique(your_data$Airport_Code)
for (i in codes) {
  ## use the loop variable to subset the whole dataset by that airport code
  byairportcode <- subset(your_data, Airport_Code == i)
  ## All operations listed in your post, building one row of results
  ## (p() is the p-value helper defined in the question)
  fit <- lm(log10(Passengers) ~ log10(GDP), data = byairportcode)
  new_data <- data.frame(Region = byairportcode$Region[1], Airport = i,
                         Elasticity = unname(coef(fit)[2]),
                         P_Value = p(fit), R_Squared = summary(fit)$r.squared)
  ## combine this new row with the results collected so far
  results <- rbind(results, new_data)
}
You shouldn't need to eliminate any duplicates, because the data has already been subset by airport code: as long as each code appears only once in your codes vector, each row in your results frame holds the regression results for exactly one airport code.
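In case a runnable version helps, here is a minimal sketch of the same loop on made-up data. The `toy` data frame and its values are hypothetical stand-ins for the real data. Collecting the rows in a list and calling rbind once at the end is a swapped-in variation on growing `results` inside the loop, not what the answer above does.

```r
# Hypothetical stand-in for Elasticities_Study (values are made up)
toy <- data.frame(
  Airport_Code = rep(c("AAA", "BBB"), each = 4),
  Region       = rep(c("North", "South"), each = 4),
  Year         = rep(2001:2004, times = 2),
  Passengers   = c(100, 120, 150, 170, 900, 950, 1100, 1150),
  GDP          = c(10, 11, 13, 14, 50, 52, 57, 60),
  stringsAsFactors = FALSE
)

codes <- unique(toy$Airport_Code)        # each airport code exactly once
rows  <- vector("list", length(codes))   # preallocated list, one slot per code
names(rows) <- codes

for (code in codes) {
  sub <- toy[toy$Airport_Code == code, ]                  # rows for this airport
  fit <- lm(log10(Passengers) ~ log10(GDP), data = sub)   # log-log regression
  rows[[code]] <- data.frame(
    Region     = sub$Region[1],
    Airport    = code,
    Elasticity = unname(coef(fit)[2]),    # slope on log10(GDP)
    R_Squared  = summary(fit)$r.squared,
    stringsAsFactors = FALSE
  )
}

results <- do.call(rbind, rows)   # bind all one-row data frames in one call
```

Binding once at the end avoids copying `results` on every iteration, which starts to matter with many airports.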
#2
0
Consider using by() to slice the large data frame by Airport Code and pass each subset, as a data frame, into a function such as your regression processing.
The final output will be a list of data frames, equal in length to the number of distinct Airport Codes, with each item named by its corresponding airport code.
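To see that shape on something small first, here is a rough sketch of by() on a made-up data frame. The `toy` data and the summary function are hypothetical and only illustrate how the result is split and named.

```r
# Hypothetical toy data: two airport codes with 2 and 3 rows
toy <- data.frame(
  Airport_Code = c("AAA", "AAA", "BBB", "BBB", "BBB"),
  Passengers   = c(100, 120, 900, 950, 1100),
  stringsAsFactors = FALSE
)

# by() splits `toy` on the grouping vector and calls FUN once per group,
# passing each group as its own data frame `d`
out <- by(toy, toy$Airport_Code, FUN = function(d) {
  data.frame(Airport = d$Airport_Code[1],
             Rows = nrow(d),
             Mean_Passengers = mean(d$Passengers))
})

names(out)     # the distinct airport codes
length(out)    # one element per distinct airport code
out[["BBB"]]   # the data frame returned for airport "BBB"
```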
Assign Functions
# store the p_value (p-value of the overall F-test)
p <- function(fit) {
  if (!inherits(fit, "lm")) stop("Not an object of class 'lm'")
  f <- summary(fit)$fstatistic
  p <- pf(f[1], f[2], f[3], lower.tail = FALSE)
  attributes(p) <- NULL
  return(p)
}
# run regressions
regression_process <- function(df) {
  #log-log (base-10 logs of passengers and GDP)
  df <- data.frame(Year = df$Year, Region = df$Region, Airport = df$Airport_Code,
                   Passengers = log10(df$Passengers), GDP = log10(df$GDP))
  #regression
  fit <- lm(Passengers ~ GDP, data = df)
  #store the coefficient
  coefficient <- coefficients(fit)
  elasticity <- coefficient["GDP"]
  p_value <- p(fit)
  #store the r_squared
  r_squared <- summary(fit)$r.squared
  #save regression output into data frame
  data.frame(Region = df$Region[1], Airport = df$Airport[1], Elasticity = elasticity,
             P_Value = p_value, R_Squared = r_squared)
}
Run by() (creates a named list with one data frame per Airport Code)
regression_list <- by(Elasticities_Study, Elasticities_Study$Airport_Code,
                      FUN = regression_process)
regression_list$ABE # FIRST REGRESSION DATAFRAME ELEMENT
regression_list$ABI # SECOND REGRESSION DATAFRAME ELEMENT
regression_list$ABJ # THIRD REGRESSION DATAFRAME ELEMENT
...
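Since the question asks for a single output data frame, one possible follow-up is to stack the list with do.call(rbind, ...). This is a sketch on made-up data: `toy` and `fit_one` are hypothetical stand-ins for Elasticities_Study and regression_process.

```r
# Hypothetical stand-in for Elasticities_Study (values are made up)
toy <- data.frame(
  Airport_Code = rep(c("AAA", "BBB"), each = 4),
  Passengers   = c(100, 120, 150, 170, 900, 950, 1100, 1150),
  GDP          = c(10, 11, 13, 14, 50, 52, 57, 60),
  stringsAsFactors = FALSE
)

# Simplified stand-in for regression_process: one row per airport
fit_one <- function(d) {
  fit <- lm(log10(Passengers) ~ log10(GDP), data = d)
  data.frame(Airport    = d$Airport_Code[1],
             Elasticity = unname(coef(fit)[2]),
             R_Squared  = summary(fit)$r.squared,
             stringsAsFactors = FALSE)
}

regression_list <- by(toy, toy$Airport_Code, FUN = fit_one)

# Stack the per-airport data frames into a single data frame
Regression_Output <- do.call(rbind, regression_list)
rownames(Regression_Output) <- NULL   # drop the airport-code row names
```

After this, `Regression_Output` has one row per airport code, which is easier to sort, filter, or write out than a list.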