How does SelectKBest work?

15

I am looking at this tutorial: https://www.dataquest.io/mission/75/improving-your-submission

In section 8, "Finding the best features", it shows the following code.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.feature_selection import SelectKBest, f_classif

predictors = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked", "FamilySize", "Title", "FamilyId"]

# Perform feature selection
selector = SelectKBest(f_classif, k=5)
selector.fit(titanic[predictors], titanic["Survived"])

# Get the raw p-values for each feature, and transform from p-values into scores
scores = -np.log10(selector.pvalues_)

# Plot the scores.  See how "Pclass", "Sex", "Title", and "Fare" are the best?
plt.bar(range(len(predictors)), scores)
plt.xticks(range(len(predictors)), predictors, rotation='vertical')
plt.show()

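What is k=5 doing, since it is never used (the graph still lists all of the features, whether I use k=1 or k="all")? How does it determine the best features, and are they independent of the method one wants to use (logistic regression, random forests, or anything else)?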

python scikit-learn

— user

It selects features according to the k highest scores.

— Srini

11

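The SelectKBest class just scores the features using a function (in this case f_classif, but it could be another one) and then "removes all but the k highest scoring features". http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html#sklearn.feature_selection.SelectKBest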

So it is a kind of wrapper; the important thing here is the function you use to score the features.
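The "score, then keep the top k" behaviour can be reproduced by hand. This is a minimal sketch, assuming synthetic data from make_classification in place of the Titanic frame used in the question; the names manual_top_k and selected are only illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in data (an assumption; the question uses the Titanic set).
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           random_state=0)

k = 5
selector = SelectKBest(f_classif, k=k).fit(X, y)

# Call the scoring function directly: one F-statistic and p-value per feature.
f_scores, p_values = f_classif(X, y)

# Keep the column indices of the k highest scores.
manual_top_k = np.sort(np.argsort(f_scores)[-k:])

# Indices that the fitted selector would keep.
selected = selector.get_support(indices=True)

print(manual_top_k)
print(selected)
```

Swapping f_classif for another scoring function such as chi2 changes the scores, and therefore possibly the selected columns, but not the wrapper logic.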

For other feature selection techniques in sklearn, read: http://scikit-learn.org/stable/modules/feature_selection.html

And yes, f_classif and chi2 are independent of the predictive method you use.
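One way to check this independence is to put the same selector in front of two different models and compare which columns survive. This is only a sketch under assumptions: the data is synthetic, and the two classifiers and their settings are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data (an assumption, not the Titanic set).
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           random_state=0)

# Same univariate selector, two different downstream models.
pipe_lr = make_pipeline(SelectKBest(f_classif, k=5),
                        LogisticRegression(max_iter=1000)).fit(X, y)
pipe_rf = make_pipeline(SelectKBest(f_classif, k=5),
                        RandomForestClassifier(n_estimators=50,
                                               random_state=0)).fit(X, y)

# make_pipeline names each step after its lowercased class name.
support_lr = pipe_lr.named_steps["selectkbest"].get_support(indices=True)
support_rf = pipe_rf.named_steps["selectkbest"].get_support(indices=True)

# The selector only sees X and y, so both pipelines keep the same columns.
print(support_lr)
print(support_rf)
```

The selector never sees the downstream estimator, which is why the chosen columns cannot depend on it.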

— Galileo

2

The k parameter matters if you use selector.fit_transform(), which returns a new array whose feature set has been reduced to the best 'k'.
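A small sketch of the difference, again assuming synthetic data rather than the Titanic frame: after fitting, pvalues_ holds one value per input feature regardless of k, which is why the plot in the question always shows every feature, while the transformed array has only k columns.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in data with 10 features (an assumption).
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           random_state=0)

selector = SelectKBest(f_classif, k=5)
X_reduced = selector.fit_transform(X, y)

# Every feature is scored, whatever k is.
n_scored = selector.pvalues_.shape[0]
print(n_scored)          # 10

# Only the transformed output is cut down to k columns.
print(X_reduced.shape)   # (200, 5)

# k="all" scores the features but keeps every column.
X_all = SelectKBest(f_classif, k="all").fit_transform(X, y)
print(X_all.shape)       # (200, 10)
```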

— Chris Thompson