ValueError: x and y must be the same size

Time: 2021-01-27 16:12:39
import numpy as np
import pandas as pd
import matplotlib.pyplot as pt

data1 = pd.read_csv('stage1_labels.csv')

X = data1.iloc[:, :-1].values
y = data1.iloc[:, 1].values

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
label_X = LabelEncoder()
X[:,0] = label_X.fit_transform(X[:,0])
encoder = OneHotEncoder(categorical_features = [0])
X = encoder.fit_transform(X).toarray()

from sklearn.cross_validation import train_test_split
X_train, X_test, y_train,y_test = train_test_split(X, y, test_size = 0.4, random_state = 0)

#fitting Simple Regression to training set

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

#predicting the test set results
y_pred = regressor.predict(X_test)

#Visualization of the training set results
pt.scatter(X_train, y_train, color = 'red')
pt.plot(X_train, regressor.predict(X_train), color = 'green')
pt.title('salary vs yearExp (Training set)')
pt.xlabel('years of experience')
pt.ylabel('salary')
pt.show()

I need help understanding the error I get when executing the above code. The error is:

raise ValueError("x and y must be the same size")

I have a .csv file with 1398 rows and 2 columns. I have taken 40% of the data as the test set, as can be seen in the above code.

Please help

Regards, Amitesh

2 answers

#1


4  

Print the shape of X_train. What do you see? I'd bet X_train is 2-D (a matrix with a single column), while y_train is 1-D (a vector). As a result, the two have different sizes.

I think using X_train[:,0] for plotting (which is where the error originates) should solve the problem.
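
The shape check suggested above can be sketched as follows. The arrays here are hypothetical stand-ins, not the asker's data: a one-hot encoded feature matrix with three columns and a 1-D target. If the encoding in the question produced more than one column, the total element counts differ in the same way, which would explain the size complaint from the plotting call.

```python
import numpy as np

# Hypothetical stand-ins for the arrays in the question (assumed values):
# a one-hot encoded feature matrix with 3 columns and a 1-D target vector.
X_train = np.eye(3)[[0, 1, 2, 0, 1, 2]]
y_train = np.array([0, 1, 0, 1, 1, 0])

print(X_train.shape, y_train.shape)  # (6, 3) (6,)
print(X_train.size, y_train.size)    # 18 6

# Selecting a single column yields a 1-D array with one value per row,
# so it has the same number of elements as y_train.
x_plot = X_train[:, 0]
print(x_plot.shape)                  # (6,)
same_size = x_plot.size == y_train.size
```

With matching sizes, a call such as pt.scatter(x_plot, y_train) no longer has mismatched inputs.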

#2


0  

Slicing with [:, :-1] will give you a 2-dimensional array (including all rows and all columns excluding the last column).

Slicing with [:, 1] will give you a 1-dimensional array (all rows of the second column). To make this array 2-dimensional as well, use [:, 1:2] or [:, 1].reshape(-1, 1) or [:, 1][:, None] instead of [:, 1]. This will make x and y comparable.
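
To illustrate the slicing forms just described, here is a small sketch on a made-up 4x2 array (the name data and its values are assumptions standing in for data1.values). It only compares shapes and is not tied to the asker's file.

```python
import numpy as np

# Hypothetical 4x2 array standing in for data1.values (assumed values)
data = np.arange(8).reshape(4, 2)

X = data[:, :-1]     # all columns except the last: 2-D, shape (4, 1)
y_1d = data[:, 1]    # integer index drops the axis: 1-D, shape (4,)

# Three equivalent ways to keep the second column 2-dimensional
y_a = data[:, 1:2]
y_b = data[:, 1].reshape(-1, 1)
y_c = data[:, 1][:, None]

print(X.shape, y_1d.shape)              # (4, 1) (4,)
print(y_a.shape, y_b.shape, y_c.shape)  # (4, 1) (4, 1) (4, 1)
```

All three 2-D variants hold the same values; which one to use is mostly a matter of taste.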


An alternative to making both arrays 2-dimensional is to make them both 1-dimensional. To do this, use [:, 0] (instead of [:, :1]) to select the first column and [:, 1] to select the second column.
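
The 1-dimensional alternative might look like the sketch below, again on a made-up 4x2 array (data is a hypothetical stand-in for data1.values).

```python
import numpy as np

# Hypothetical 4x2 array standing in for data1.values (assumed values)
data = np.arange(8).reshape(4, 2)

x = data[:, 0]   # first column as a 1-D array
y = data[:, 1]   # second column as a 1-D array

print(x.shape, y.shape)  # (4,) (4,)
print(x.size == y.size)  # True
```

This form is convenient for plotting. Note that scikit-learn estimators still expect a 2-D feature array in fit, so for the regression step x would need to be reshaped, for example with x.reshape(-1, 1).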
