I have been trying to import data from Yahoo Finance via pandas and convert it to arrays with .as_matrix(). When I feed the data into the classifier to train it, I get this error:
ValueError: Found array with dim 4. Estimator expected <= 2.
Here is my code:
from sklearn import tree
import pandas as pd
import pandas_datareader.data as web
df = web.DataReader('goog', 'yahoo', start='2012-5-1', end='2016-5-20')
close_price = df[['Close']]
ma_50 = (pd.rolling_mean(close_price, window=50))
ma_100 = (pd.rolling_mean(close_price, window=100))
ma_200 = (pd.rolling_mean(close_price, window=200))
#adding buys and sell based on the values
df['B/S']= (df['Close'].diff() < 0).astype(int)
close_buy = df[['Close']+['B/S']]
closing = df[['Close']].as_matrix()
buy_sell = df[['B/S']]
close_buy = pd.DataFrame.dropna(close_buy, 0, 'any')
ma_50 = pd.DataFrame.dropna(ma_50, 0, 'any')
ma_100 = pd.DataFrame.dropna(ma_100, 0, 'any')
ma_200 = pd.DataFrame.dropna(ma_200, 0, 'any')
close_buy = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
ma_50 = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
ma_100 = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
ma_200 = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
buy_sell = (df.loc['2013-02-15':'2016-05-21']).as_matrix
print(ma_100)
clf = tree.DecisionTreeClassifier()
x = [[close_buy,ma_50,ma_100,ma_200]]
y = [buy_sell]
clf.fit(x,y)
1 solution
#1
I found a couple of bugs that need fixing.
- Missing parentheses. This line assigns the method itself instead of calling it:
buy_sell = (df.loc['2013-02-15':'2016-05-21']).as_matrix
It should be:
buy_sell = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
- The nested list
[[close_buy,ma_50,ma_100,ma_200]]
is what gives you your 4 dimensions. Instead, I'd use np.concatenate, which takes a list of arrays and joins them either lengthwise or widthwise; the parameter axis=1 specifies widthwise. This makes x an 822 x 28 matrix: 822 observations of 28 features. If this isn't what you were going for, then clearly I didn't hit the mark. But those dimensions line up with your y.
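To make the dimension problem concrete, here is a minimal sketch using only NumPy. The four zero-filled arrays are stand-ins I made up to match the shapes described above (822 rows, 7 columns each); only the shapes matter, not the values.

```python
import numpy as np

# Four stand-in arrays shaped like the sliced frames: 822 rows, 7 columns each.
# The values are placeholders; this only illustrates the shapes.
blocks = [np.zeros((822, 7)) for _ in range(4)]

# Wrapping the list in a second pair of brackets adds two leading axes,
# so the result is 4-dimensional, which is what the estimator rejects.
nested = np.asarray([blocks])
print(nested.ndim, nested.shape)

# Joining along axis=1 places the blocks side by side and stays 2-dimensional.
x = np.concatenate(blocks, axis=1)
print(x.ndim, x.shape)
```

The estimator wants a 2-D array of shape (n_samples, n_features), which is why the concatenated version is accepted and the nested list is not.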
The corrected script:
from sklearn import tree
import numpy as np
import pandas as pd
import pandas_datareader.data as web
df = web.DataReader('goog', 'yahoo', start='2012-5-1', end='2016-5-20')
close_price = df[['Close']]
ma_50 = (pd.rolling_mean(close_price, window=50))
ma_100 = (pd.rolling_mean(close_price, window=100))
ma_200 = (pd.rolling_mean(close_price, window=200))
#adding buys and sell based on the values
df['B/S']= (df['Close'].diff() < 0).astype(int)
close_buy = df[['Close']+['B/S']]
closing = df[['Close']].as_matrix()
buy_sell = df[['B/S']]
close_buy = pd.DataFrame.dropna(close_buy, 0, 'any')
ma_50 = pd.DataFrame.dropna(ma_50, 0, 'any')
ma_100 = pd.DataFrame.dropna(ma_100, 0, 'any')
ma_200 = pd.DataFrame.dropna(ma_200, 0, 'any')
close_buy = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
ma_50 = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
ma_100 = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
ma_200 = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
buy_sell = (df.loc['2013-02-15':'2016-05-21']).as_matrix() # Fixed
print(ma_100)
clf = tree.DecisionTreeClassifier()
x = np.concatenate([close_buy,ma_50,ma_100,ma_200], axis=1) # Fixed
y = buy_sell # Brackets not necessary... I don't think
clf.fit(x,y)
This ran for me:
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
random_state=None, splitter='best')
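A side note for anyone running this on a recent pandas: pd.rolling_mean and DataFrame.as_matrix have both been removed, and the script above also overwrites the moving averages with slices of df, so they never reach the classifier. Below is a rough sketch of how the features could be built with the current API. It assumes the intent was one row per day with the close and the three moving averages as features and B/S as the label, which is my guess rather than something the question states. The prices are synthetic so the snippet runs without a download.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices stand in for the Yahoo download (assumption).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2012-05-01", periods=1000)
df = pd.DataFrame({"Close": 500 + rng.normal(0, 1, len(dates)).cumsum()},
                  index=dates)

# Series.rolling(...).mean() replaces the removed pd.rolling_mean.
for window in (50, 100, 200):
    df[f"ma_{window}"] = df["Close"].rolling(window=window).mean()

# Same label as in the question: 1 when the close fell from the previous day.
df["B/S"] = (df["Close"].diff() < 0).astype(int)

# Drop the warm-up rows where the 200-day average is still NaN.
df = df.dropna()

# to_numpy() replaces the removed as_matrix().
X = df[["Close", "ma_50", "ma_100", "ma_200"]].to_numpy()  # 2-D: (n_samples, 4)
y = df["B/S"].to_numpy()                                   # 1-D: (n_samples,)
print(X.shape, y.shape)
```

With X and y shaped like this, tree.DecisionTreeClassifier().fit(X, y) receives a 2-D feature matrix and a 1-D label vector, so no bracket wrapping or concatenation is needed.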