1. Generating date sequences
seq(as.Date("2015/12/14"), by = "week", length.out = 62)   # 62 dates, one week apart
seq(as.Date("2015/12/14"), by = "3 days", length.out = 62) # 62 dates, three days apart
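As a quick check of how `by` and `length.out` interact, the sketch below prints a few short sequences. The `by = "month"` and `to =` variants are additions for illustration and are not part of the notes above.

```r
start <- as.Date("2015-12-14")

# Three dates, one week apart
weekly <- seq(start, by = "week", length.out = 3)
print(weekly)

# by also accepts "day", "month", "quarter" and "year", optionally with a multiplier
monthly <- seq(start, by = "month", length.out = 3)
print(monthly)

# Give an end date with `to` instead of a count with `length.out`
until <- seq(start, to = as.Date("2015-12-31"), by = "3 days")
print(until)
```

Using `length.out` fixes the number of dates, while `to` fixes the last possible date and lets the count follow from the step size.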
2. Functions for inspecting data
> str(tsdata_tmp)
'data.frame': 1116 obs. of 6 variables:
 $ corname    : chr "日本" "日本" "日本" "日本" ...
 $ cityname   : chr "东京" "东京" "东京" "东京" ...
 $ date       : chr "2015-12-21" "2015-12-28" "2016-01-04" "2016-01-11" ...
 $ weeknum    : int 1 2 3 4 5 6 7 8 9 1 ...
 $ ciiquantity: int 9386 8521 5224 7770 10610 12100 11413 15695 10926 309 ...
 $ y_stlf     : num 8593 8312 6515 7452 7965 ...
> str(advancedbooking_tmp)
'data.frame': 1539 obs. of 5 variables:
 $ cityid     : int 228 228 228 228 228 228 228 228 228 228 ...
 $ cityname   : chr "东京" "东京" "东京" "东京" ...
 $ date       : chr "2015/11/30" "2015/11/30" "2015/11/30" "2015/11/30" ...
 $ weeknum    : int 3 1 2 5 6 4 8 7 6 3 ...
 $ ciiquantity: int 0 0 0 0 0 0 0 0 0 0 ...
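The `str()` output shows both `date` columns stored as `chr`, in two different layouts (`"2015-12-21"` and `"2015/11/30"`). Below is a minimal sketch of converting such columns to `Date` before doing date arithmetic or joining on them. The small data frames `ts_small` and `ab_small` are hypothetical stand-ins, not the real data.

```r
# Hypothetical miniature versions of the two data frames
ts_small <- data.frame(date = c("2015-12-21", "2015-12-28"),
                       stringsAsFactors = FALSE)
ab_small <- data.frame(date = c("2015/11/30", "2015/12/07"),
                       stringsAsFactors = FALSE)

# An explicit format documents the expected layout; strings that do not match become NA
ts_small$date <- as.Date(ts_small$date, format = "%Y-%m-%d")
ab_small$date <- as.Date(ab_small$date, format = "%Y/%m/%d")

str(ts_small)
str(ab_small)

# Once both columns are Date, differences are plain day counts
gap <- ts_small$date[1] - ab_small$date[1]
print(gap)
```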
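3. Common functions for building time features when modeling in R
library(lubridate)
DataSet$quarter <- quarter(DataSet$date)
DataSet$month <- month(DataSet$date)
DataSet$week <- week(DataSet$date) # week of the year: complete 7-day periods since January 1, plus one
isoweek('2017-01-01') # ISO 8601 week of the year (weeks run Monday to Sunday); returns 52 here
?lubridate::week # open the help page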
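`week()` and `isoweek()` count weeks differently, which matters most around New Year. The sketch below builds the same features for three hand-picked dates; it assumes the lubridate package is installed, and the `features` data frame is made up for illustration.

```r
library(lubridate)  # assumes the lubridate package is installed

dates <- as.Date(c("2016-12-31", "2017-01-01", "2017-01-02"))

features <- data.frame(
  date    = dates,
  quarter = quarter(dates),
  month   = month(dates),
  week    = week(dates),     # (day of year - 1) %/% 7 + 1
  isoweek = isoweek(dates)   # ISO 8601 week, Monday to Sunday
)
print(features)
```

Here `week()` gives 53, 1, 1 while `isoweek()` gives 52, 52, 1, because Sunday 2017-01-01 still belongs to the last ISO week of 2016. Which one suits a model depends on whether the weekly data are aligned to calendar years or to Monday-based weeks.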
4. Merging Data
Adding Columns
To merge two data frames (datasets) horizontally, use the merge function. In most cases, you join two data frames by one or more common key variables (i.e., an inner join).
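# merge two data frames by ID
total <- merge(dataframeA, dataframeB, by = "ID") # if an ID occurs more than once, merge() returns every matching pair of rows, so keep the key unique unless that is what you want
# merge two data frames by ID and Country
total <- merge(dataframeA, dataframeB, by = c("ID", "Country")) # same caveat, applied to the (ID, Country) combination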
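`merge()` does not reject repeated key values; it pairs every matching row from one side with every matching row from the other. A small sketch with made-up data (the data frames `a` and `b` and their columns are hypothetical) shows the effect and one way to check for it beforehand:

```r
a <- data.frame(ID = c(1, 2, 2), x = c("a1", "a2", "a3"))
b <- data.frame(ID = c(2, 2, 3), y = c("b1", "b2", "b3"))

# ID 2 appears twice on each side, so the inner join has 2 * 2 = 4 rows
total <- merge(a, b, by = "ID")
print(total)

# Check for repeated keys before merging
dup_in_a <- anyDuplicated(a$ID) > 0
dup_in_b <- anyDuplicated(b$ID) > 0
print(c(a = dup_in_a, b = dup_in_b))
```

If the row count after a merge is larger than expected, repeated keys are a likely cause.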
Inner join: merge(df1, df2) will work for these examples because R automatically joins the frames by common variable names, but you would most likely want to specify merge(df1, df2, by = "CustomerId") to make sure that you were matching on only the fields you desired. You can also use the by.x and by.y parameters if the matching variables have different names in the different data frames.
Outer join: merge(x = df1, y = df2, by = "CustomerId", all = TRUE) # keeps unmatched rows from both data frames
Left outer: merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE) # keeps every row of df1
Right outer: merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE) # keeps every row of df2
Cross join: merge(x = df1, y = df2, by = NULL) # pairs every row of df1 with every row of df2
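The join types listed above can be compared on a small example. The data frames `df1`, `df2`, and `df3` below are made up for illustration; only the `merge()` arguments come from the text.

```r
# Hypothetical example data
df1 <- data.frame(CustomerId = 1:4, Product = c("A", "B", "C", "D"))
df2 <- data.frame(CustomerId = c(2, 4, 6), State = c("X", "Y", "Z"))

inner <- merge(df1, df2, by = "CustomerId")
outer <- merge(df1, df2, by = "CustomerId", all = TRUE)
left  <- merge(df1, df2, by = "CustomerId", all.x = TRUE)
right <- merge(df1, df2, by = "CustomerId", all.y = TRUE)
cross <- merge(df1, df2, by = NULL)

# Row counts per join type
counts <- sapply(list(inner = inner, outer = outer, left = left,
                      right = right, cross = cross), nrow)
print(counts)

# by.x / by.y when the key has a different name on each side
df3 <- data.frame(CustId = c(2, 4, 6), State = c("X", "Y", "Z"))
renamed <- merge(df1, df3, by.x = "CustomerId", by.y = "CustId")
print(renamed)
```

Rows without a match are filled with `NA` in the columns that come from the other data frame, and with `by.x`/`by.y` the key column in the result keeps the name from the first data frame.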
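5. Modifying data frame values
To replace the negative values in one column of a data frame with the values from another column in the same row, you can do the following:
neg <- which(output_results$pred < 0) # rows with a negative prediction; which() skips NA
output_results$pred[neg] <- output_results$act_quantity[neg] # overwrite them with act_quantity from the same row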
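A minimal sketch of this replacement on made-up data follows. The values in `output_results` are hypothetical, and building the row index with `which()` is one possible approach, chosen here so that `NA` predictions are left untouched rather than ending up in the index.

```r
# Hypothetical prediction results; the NA stands for a missing prediction
output_results <- data.frame(
  pred         = c(120, -15, NA, -3),
  act_quantity = c(110, 40, 55, 7)
)

# Row numbers where pred is negative (NA comparisons are dropped by which())
neg <- which(output_results$pred < 0)

# Replace those predictions with act_quantity from the same row
output_results$pred[neg] <- output_results$act_quantity[neg]
print(output_results)
```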