Why is xgboost so much faster than sklearn's GradientBoostingClassifier?
I'm trying to train a gradient boosting model on 50k examples with 100 numeric features. XGBClassifier grows 500 trees on my machine in 43 seconds, while GradientBoostingClassifier handles only 10 trees (!) in 1 minute and 2 seconds :( I didn't bother trying to grow 500 trees, as that would take hours. I'm using the same learning_rate and max_depth settings, see below.

What makes XGBoost so much faster? Does it use some novel implementation of gradient boosting that the sklearn folks don't know about? Or is it "cutting corners" and growing shallower trees?

p.s. I'm aware of this discussion: https://www.kaggle.com/c/higgs-boson/forums/t/10335/xgboost-post-competition-survey, but couldn't find the answer there...

    XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
                  gamma=0, learning_rate=0.05, max_delta_step=0, max_depth=10,
                  min_child_weight=1, missing=None, n_estimators=500,
                  nthread=-1, objective='binary:logistic', reg_alpha=0,
                  reg_lambda=1, scale_pos_weight=1, seed=0, silent=True,
                  subsample=1)

    GradientBoostingClassifier(init=None, learning_rate=0.05, loss='deviance',
                               max_depth=10, max_features=None,
                               max_leaf_nodes=None, min_samples_leaf=1,
                               min_samples_split=2, min_weight_fraction_leaf=0.0,
                               n_estimators=10, presort='auto', random_state=None,
                               subsample=1.0, verbose=0, warm_start=False)
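For anyone who wants to reproduce the comparison, here is a minimal benchmark sketch. The synthetic dataset (make_classification), the timing harness, and the trimmed parameter lists are my assumptions; only learning_rate, max_depth, and the two tree counts come from the settings above.

    import time

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from xgboost import XGBClassifier

    # Synthetic stand-in for the real data: 50k rows, 100 numeric features
    X, y = make_classification(n_samples=50_000, n_features=100, random_state=0)

    # XGBoost with 500 trees (n_jobs/random_state are the modern spellings
    # of the nthread=-1 / seed=0 shown in the post)
    xgb = XGBClassifier(n_estimators=500, learning_rate=0.05, max_depth=10,
                        n_jobs=-1, random_state=0)
    start = time.perf_counter()
    xgb.fit(X, y)
    print(f"XGBClassifier, 500 trees: {time.perf_counter() - start:.1f}s")

    # sklearn with only 10 trees, since 500 would take hours
    gbm = GradientBoostingClassifier(n_estimators=10, learning_rate=0.05,
                                     max_depth=10, random_state=0)
    start = time.perf_counter()
    gbm.fit(X, y)
    print(f"GradientBoostingClassifier, 10 trees: "
          f"{time.perf_counter() - start:.1f}s")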
scikit-learn
xgboost
gbm