_transform() takes 2 positional arguments but 3 were given

Time: 2022-03-05 23:23:54

I am trying to build a pipeline with a variable transformation, as shown below:

import numpy as np
import pandas as pd
import sklearn
from sklearn import linear_model
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline

The DataFrame:

df = pd.DataFrame({'y': [4,5,6], 'a':[3,2,3], 'b' : [2,3,4]})

I want to derive a new variable to use for prediction:

class Complex():
    def __init__(self, X1, X2):
        self.a = X1
        self.b = X2
    def transform(self, X1, X2): 
        age = pd.DataFrame(self.a - self.b)
        return age
    def fit_transform(self, X1, X2):
        self.fit( X1, X2)
        return self.transform(X1, X2)

    def fit(self, X1, X2):
        return self

Then I build the pipeline:

X = df[['a', 'b']]
y = df['y']
regressor = linear_model.SGDRegressor()
pipeline = Pipeline([
        ('transform', Complex(X['a'], X['b'])) ,
        ('model_fitting', regressor)
    ])
pipeline.fit(X, y)

and I get this error when I call predict:

pred = pipeline.predict(X)
pred
TypeError                                 Traceback (most recent call last)
<ipython-input-555-7a07ccb0c38a> in <module>()
----> 1 pred = pipeline.predict(X)
      2 pred

C:\Program Files\Anaconda3\lib\site-packages\sklearn\utils\metaestimators.py in <lambda>(*args, **kwargs)
     52 
     53         # lambda, but not partial, allows help() to work with update_wrapper
---> 54         out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
     55         # update the docstring of the returned function
     56         update_wrapper(out, self.fn)

C:\Program Files\Anaconda3\lib\site-packages\sklearn\pipeline.py in predict(self, X)
    324         for name, transform in self.steps[:-1]:
    325             if transform is not None:
--> 326                 Xt = transform.transform(Xt)
    327         return self.steps[-1][-1].predict(Xt)
    328 

TypeError: transform() missing 1 required positional argument: 'X2'

What am I doing wrong? I can see the mistake is in class Complex(). How do I fix it?

1 solution

#1


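The problem is that Pipeline calls transform with a single argument, X, an array-like of shape [n_samples, n_features]. Your transform(self, X1, X2) expects two, so the call transform.transform(Xt) in Pipeline.predict fails with "missing 1 required positional argument: 'X2'".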

See the Examples section in the documentation of sklearn.pipeline.Pipeline: it uses sklearn.feature_selection.SelectKBest as a transform, and its source shows that transform expects X to be a single array rather than separate variables like X1 and X2.
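
To see why the call fails, here is a minimal sketch of the calling convention shown in the traceback. It assumes nothing beyond plain Python: the question's Complex is reduced to lists instead of pandas objects, and run_transform_steps is a hypothetical helper that mimics only the loop from Pipeline.predict (Xt = transform.transform(Xt)), not the real scikit-learn code.

```python
class Complex:
    """The question's transformer, with plain lists instead of pandas."""

    def __init__(self, X1, X2):
        self.a = X1
        self.b = X2

    def transform(self, X1, X2):  # two data arguments: the root of the error
        return [a - b for a, b in zip(self.a, self.b)]


def run_transform_steps(steps, X):
    """Hypothetical helper mimicking the loop in Pipeline.predict:
    each intermediate step receives only the current data Xt."""
    Xt = X
    for name, step in steps[:-1]:
        if step is not None:
            Xt = step.transform(Xt)  # a single positional argument
    return Xt


X = {'a': [3, 2, 3], 'b': [2, 3, 4]}
steps = [('transform', Complex(X['a'], X['b'])), ('model_fitting', None)]

message = None
try:
    run_transform_steps(steps, X)
except TypeError as exc:
    message = str(exc)

print(message)
```

The printed message matches the last line of the traceback; Python 3.10 and later prefix it with the class name, as in Complex.transform().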

In short, your code can be fixed like this:


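import pandas as pd
from sklearn import linear_model
from sklearn.pipeline import Pipeline

df = pd.DataFrame({'y': [4, 5, 6], 'a': [3, 2, 3], 'b': [2, 3, 4]})

class Complex():
    # Pipeline passes a single 2-D input X ([n_samples, n_features]) to transform
    def transform(self, X):
        # New feature: column 'a' minus column 'b', as a one-column DataFrame
        return pd.DataFrame(X['a'] - X['b'])

    def fit(self, X, y=None):
        # Nothing to learn; return self so calls can be chained
        return self

    # Pipeline.fit calls fit_transform(X, y); y is accepted but not used
    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

X = df[['a', 'b']]
y = df['y']
regressor = linear_model.SGDRegressor()
pipeline = Pipeline([
    ('transform', Complex()),
    ('model_fitting', regressor),
])
pipeline.fit(X, y)

pred = pipeline.predict(X)
print(pred)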
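
Since this feature needs nothing learned from the data, another option (not from the original answer) is scikit-learn's sklearn.preprocessing.FunctionTransformer, which wraps a plain function as a pipeline step. A sketch, assuming the same column names as the question; the helper a_minus_b and the output column name 'a_minus_b' are hypothetical:

```python
import pandas as pd
from sklearn import linear_model
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

df = pd.DataFrame({'y': [4, 5, 6], 'a': [3, 2, 3], 'b': [2, 3, 4]})
X = df[['a', 'b']]
y = df['y']


def a_minus_b(X):
    # Hypothetical helper: the new feature as a one-column 2-D DataFrame
    return (X['a'] - X['b']).to_frame('a_minus_b')


pipeline = Pipeline([
    ('transform', FunctionTransformer(a_minus_b)),  # stateless step, nothing to fit
    ('model_fitting', linear_model.SGDRegressor()),
])
pipeline.fit(X, y)

pred = pipeline.predict(X)
print(pred)
```

Because FunctionTransformer supplies fit, transform and fit_transform itself, the signature problem from the question cannot arise: the wrapped function only ever receives X.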
