Actually, scikit-learn does provide such functionality, though it can be a bit tricky to implement. Here is a complete working example of an averaging regressor built on top of three models. First of all, let's import all the required packages:
from sklearn.base import TransformerMixin
from sklearn.datasets import make_regression
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge
Then, we need to turn our three regression models into transformers. This will allow us to merge their predictions into a single feature vector using FeatureUnion:
class RidgeTransformer(Ridge, TransformerMixin):
    def transform(self, X, *_):
        return self.predict(X)


class RandomForestTransformer(RandomForestRegressor, TransformerMixin):
    def transform(self, X, *_):
        return self.predict(X)


class KNeighborsTransformer(KNeighborsRegressor, TransformerMixin):
    def transform(self, X, *_):
        return self.predict(X)
Now, let's define a builder function for our Frankenstein model:
def build_model():
    ridge_transformer = Pipeline(steps=[
        ('scaler', StandardScaler()),
        ('poly_feats', PolynomialFeatures()),
        ('ridge', RidgeTransformer())
    ])

    pred_union = FeatureUnion(
        transformer_list=[
            ('ridge', ridge_transformer),
            ('rand_forest', RandomForestTransformer()),
            ('knn', KNeighborsTransformer())
        ],
        n_jobs=2
    )

    model = Pipeline(steps=[
        ('pred_union', pred_union),
        ('lin_regr', LinearRegression())
    ])

    return model
Finally, let's fit the model:
print('Build and fit a model...')
model = build_model()
X, y = make_regression(n_features=10, n_targets=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print('Done. Score:', score)
Output:
Build and fit a model...
Done. Score: 0.9600413867438636
Why complicate things this way? Well, this approach lets us optimize the model's hyperparameters with scikit-learn's standard modules such as GridSearchCV or RandomizedSearchCV. Also, it is now easy to save a pre-trained model to disk and load it back later.
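For instance, here is a minimal sketch of both ideas; the nested parameter names follow the '<step>__<param>' convention of Pipeline and FeatureUnion, while the grid values and the file name are only illustrative assumptions, not tuned settings:
from joblib import dump, load
from sklearn.model_selection import GridSearchCV

# Illustrative grid over nested parameters of the stacked model
param_grid = {
    'pred_union__ridge__poly_feats__degree': [1, 2, 3],
    'pred_union__rand_forest__n_estimators': [50, 100],
    'pred_union__knn__n_neighbors': [3, 5, 7]
}

grid = GridSearchCV(build_model(), param_grid=param_grid, cv=3)
grid.fit(X_train, y_train)
print('Best params:', grid.best_params_)

# Persist the fitted ensemble and restore it later
dump(grid.best_estimator_, 'stacked_model.joblib')
restored_model = load('stacked_model.joblib')
print('Restored score:', restored_model.score(X_test, y_test))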