data2 = pd.DataFrame(data1['kwh'])
data2
kwh
date
2012-04-12 14:56:50 1.256400
2012-04-12 15:11:55 1.430750
2012-04-12 15:27:01 1.369910
2012-04-12 15:42:06 1.359350
2012-04-12 15:57:10 1.305680
2012-04-12 16:12:10 1.287750
2012-04-12 16:27:14 1.245970
2012-04-12 16:42:19 1.282280
2012-04-12 16:57:24 1.365710
2012-04-12 17:12:28 1.320130
2012-04-12 17:27:33 1.354890
2012-04-12 17:42:37 1.343680
2012-04-12 17:57:41 1.314220
2012-04-12 18:12:44 1.311970
2012-04-12 18:27:46 1.338980
2012-04-12 18:42:51 1.357370
2012-04-12 18:57:54 1.328700
2012-04-12 19:12:58 1.308200
2012-04-12 19:28:01 1.341770
2012-04-12 19:43:04 1.278350
2012-04-12 19:58:07 1.253170
2012-04-12 20:13:10 1.420670
2012-04-12 20:28:15 1.292740
2012-04-12 20:43:15 1.322840
2012-04-12 20:58:18 1.247410
2012-04-12 21:13:20 0.568352
2012-04-12 21:28:22 0.317865
2012-04-12 21:43:24 0.233603
2012-04-12 21:58:27 0.229524
2012-04-12 22:13:29 0.236929
2012-04-12 22:28:34 0.233806
2012-04-12 22:43:38 0.235618
2012-04-12 22:58:43 0.229858
2012-04-12 23:13:43 0.235132
2012-04-12 23:28:46 0.231863
2012-04-12 23:43:55 0.237794
2012-04-12 23:59:00 0.229634
2012-04-13 00:14:02 0.234484
2012-04-13 00:29:05 0.234189
2012-04-13 00:44:09 0.237213
2012-04-13 00:59:09 0.230483
2012-04-13 01:14:10 0.234982
2012-04-13 01:29:11 0.237121
2012-04-13 01:44:16 0.230910
2012-04-13 01:59:22 0.238406
2012-04-13 02:14:21 0.250530
2012-04-13 02:29:24 0.283575
2012-04-13 02:44:24 0.302299
2012-04-13 02:59:25 0.322093
2012-04-13 03:14:30 0.327600
2012-04-13 03:29:31 0.324368
2012-04-13 03:44:31 0.301869
2012-04-13 03:59:42 0.322019
2012-04-13 04:14:43 0.325328
2012-04-13 04:29:43 0.306727
2012-04-13 04:44:46 0.299012
2012-04-13 04:59:47 0.303288
2012-04-13 05:14:48 0.326205
2012-04-13 05:29:49 0.344230
2012-04-13 05:44:50 0.353484
...
65701 rows × 1 columns
I have this DataFrame with this index and one column. I want to do a simple prediction using linear regression with sklearn. I'm very confused and I don't know how to set X and y (I want the X values to be the time and the y values to be kwh). I'm new to Python, so any help is valuable. Thank you.
2 Solutions
#1
14
The first thing you have to do is split your data into two arrays, X and y. Each element of X will be a date, and the corresponding element of y will be the associated kwh. Because scikit-learn works on numbers, each date has to be converted to a numeric value first (for example, seconds since a reference time).
Once you have that, you will want to use sklearn.linear_model.LinearRegression to do the regression. Its documentation is on the scikit-learn website.
As with every sklearn model, there are two steps. First you must fit the model to your data. Then, put the dates for which you want to predict the kwh in another array, X_predict, and predict the kwh using the predict method.
from sklearn.linear_model import LinearRegression
X = []  # put your dates here, as numbers, one row per sample: [[t0], [t1], ...]
y = []  # put your kwh values here, one per row of X
model = LinearRegression()
model.fit(X, y)
X_predict = []  # put the dates for which you want to predict kwh here, in the same format as X
y_predict = model.predict(X_predict)
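As a minimal sketch of how the placeholders above might be filled in, the example below rebuilds a few rows of the question's data and encodes each date as seconds since 1970-01-01. The helper name to_seconds, the EPOCH constant, and the choice of seconds as the unit are assumptions made for illustration, not part of the original answer; any consistent numeric encoding should work as long as training and prediction use the same one.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# A few readings copied from the question; the real frame has 65701 rows.
data2 = pd.DataFrame(
    {"kwh": [1.256400, 1.430750, 1.369910, 1.359350, 1.305680]},
    index=pd.to_datetime([
        "2012-04-12 14:56:50",
        "2012-04-12 15:11:55",
        "2012-04-12 15:27:01",
        "2012-04-12 15:42:06",
        "2012-04-12 15:57:10",
    ]),
)
data2.index.name = "date"

# Reference time for the numeric encoding (an assumption of this sketch).
EPOCH = pd.Timestamp("1970-01-01")


def to_seconds(timestamps):
    """Hypothetical helper: timestamps -> (n_samples, 1) array of seconds since EPOCH."""
    index = pd.DatetimeIndex(timestamps)
    return (index - EPOCH).total_seconds().to_numpy().reshape(-1, 1)


X = to_seconds(data2.index)   # 2-D: one row per reading, one column (time)
y = data2["kwh"].to_numpy()   # 1-D: one kwh value per reading

model = LinearRegression()
model.fit(X, y)

# Encode the dates to predict exactly the same way as the training dates.
X_predict = to_seconds(["2012-04-12 16:12:10"])
y_predict = model.predict(X_predict)
print(y_predict)
```

A straight line through time is a very rough model for data like this, which drops sharply in the evening, so the prediction is only as good as that linearity assumption.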
#2
0
The predict() method takes a 2-dimensional array as its argument. So, if you want to predict a value with simple linear regression, you have to pass the value inside a 2-dimensional array. The timestamp must also be a number, encoded the same way as the training data (here, assuming seconds since 1970-01-01), for example:
model.predict([[pd.Timestamp('2012-04-13 05:55:30').timestamp()]])
For multiple linear regression, pass one value per feature in the inner list:
model.predict([[pd.Timestamp('2012-04-13 05:44:50').timestamp(), 0.327433]])
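To make the shape requirement concrete, here is a small sketch using made-up numbers rather than the question's data (the arrays and variable names are assumptions for illustration). It shows that predict() accepts one row per sample and one column per feature, and that a flat 1-dimensional input is rejected.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Simple regression on synthetic data: y = 2 * x + 1, one feature.
x = np.array([0.0, 1.0, 2.0, 3.0])
X_simple = x.reshape(-1, 1)          # shape (4, 1): 2-D, one column
y_simple = 2.0 * x + 1.0
simple_model = LinearRegression().fit(X_simple, y_simple)

# One sample with one feature -> [[value]]
simple_pred = simple_model.predict([[4.0]])

# A flat list is 1-D, so scikit-learn raises a ValueError.
try:
    simple_model.predict([4.0])
    rejected_1d = False
except ValueError:
    rejected_1d = True

# Multiple regression on synthetic data: y = x1 + 3 * x2, two features.
X_multi = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_multi = X_multi[:, 0] + 3.0 * X_multi[:, 1]
multi_model = LinearRegression().fit(X_multi, y_multi)

# One sample with two features -> [[value1, value2]]
multi_pred = multi_model.predict([[2.0, 2.0]])

print(simple_pred, rejected_1d, multi_pred)
```

Several samples can be predicted at once by adding more inner lists, one per sample.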