Data Science parameter-estimation

1

Why is xgboost so much faster than sklearn GradientBoostingClassifier?

I am trying to train a gradient boosting model on 50 examples with 100 numeric features. On my machine, XGBClassifier handles 500 trees in 43 seconds, while GradientBoostingClassifier handles only 10 trees (!) in 1 minute and 2 seconds :( I did not bother trying to grow 500 trees, as it would take hours. I am using the same learning_rate and max_depth settings for both, see below.

What makes XGBoost so much faster? Does it use some novel implementation of gradient boosting that the sklearn guys do not know about? Or is it "cutting corners" and growing shallower trees?

P.S. I am aware of this discussion: https://www.kaggle.com/c/higgs-boson/forums/t/10335/xgboost-post-competition-survey, but I could not find the answer there...

XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1, gamma=0, learning_rate=0.05, max_delta_step=0, max_depth=10, min_child_weight=1, missing=None, n_estimators=500, nthread=-1, objective='binary:logistic', reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=0, silent=True, subsample=1)

GradientBoostingClassifier(init=None, learning_rate=0.05, loss='deviance', max_depth=10, max_features=None, max_leaf_nodes=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=10, presort='auto', random_state=None, subsample=1.0, verbose=0, warm_start=False)
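The comparison described in the question can be reproduced with a small timing harness. The sketch below is only illustrative: the synthetic dataset, its size, and the helper name `time_fit` are assumptions, not part of the original post. It also uses the current parameter names `n_jobs` and `random_state` in place of the older `nthread` and `seed` shown in the question, and it skips the XGBoost run if the package is not installed.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

try:  # xgboost is optional here; the sketch still runs without it
    from xgboost import XGBClassifier
except ImportError:
    XGBClassifier = None


def time_fit(model, X, y):
    """Return the wall-clock seconds needed to fit `model` on (X, y)."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


# Synthetic stand-in for the asker's data (sizes are assumptions).
X, y = make_classification(n_samples=2000, n_features=100, random_state=0)

# Settings shared by both models, mirroring the question.
shared = dict(learning_rate=0.05, max_depth=10, n_estimators=10, subsample=1.0)

timings = {
    "sklearn": time_fit(GradientBoostingClassifier(random_state=0, **shared), X, y)
}
if XGBClassifier is not None:
    timings["xgboost"] = time_fit(
        XGBClassifier(n_jobs=-1, random_state=0, **shared), X, y
    )

for name, seconds in timings.items():
    print(f"{name}: {seconds:.2f} s for {shared['n_estimators']} trees")
```

Keeping `n_estimators` identical for both models makes the per-tree cost directly comparable, which the 500-versus-10 setup in the question does not.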

29 scikit-learn xgboost gbm data-mining classification data-cleaning machine-learning reinforcement-learning bigdata dataset nlp language-model stanford-nlp neural-network deep-learning randomized-algorithms beginner career loss-function software-recommendation naive-bayes-classifier feature-selection r random-forest cross-validation python churn clustering k-means sentiment-analysis programming nltk gensim visualization data csv descriptive-statistics supervised-learning text-mining orange parameter-estimation pandas scraping unsupervised-learning

2

Parametrized regression of a rotation angle

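Suppose I have a top-down picture of an arrow, and I want to predict the angle this arrow makes. The angle lies between $0$ and $360$ degrees, or between $0$ and $2\pi$. The problem is that this target is circular: $0$ and $360$ degrees are exactly the same. This is an invariance I would like to build into the target, which should help generalization significantly (this is my assumption). I do not see a clean way of doing this. Are there any papers that try to tackle this problem (or similar ones)?

I do have some ideas, along with their potential downsides:

Use a sigmoid or tanh activation, scale it to the $[0, 2\pi)$ range, and incorporate the circular property into the loss function. I think this will struggle, because if the output is on the border (the worst prediction), only a tiny bit of noise decides whether the weights get pushed one way or the other. Also, values closer to the borders at $0$ and $2\pi$ will be harder to reach, because the absolute pre-activation value would need to approach infinity.

Regress to two values, $x$ and $y$, and compute the loss from the angle these two values make. I think this has more potential, but the norm of this vector is unbounded, which could cause numerical instability and make it blow up or go to zero during training. This could potentially be solved by using some odd regularizer that keeps the norm from drifting too far from 1.

Other options would involve sine and cosine functions, but I feel that having multiple pre-activations map to the same output would also make optimization and generalization very difficult.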
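The second idea (regressing to $x$ and $y$) can be sketched in a few lines of plain Python. This is only an illustration of the idea as described in the question, not a recommended solution: the function names, the $1 - \cos$ form of the loss, the small epsilon, and the penalty weight are all assumptions made for the example.

```python
import math


def encode_angle(theta):
    """Map an angle in radians to a point (x, y) on the unit circle."""
    return math.cos(theta), math.sin(theta)


def decode_angle(x, y):
    """Recover an angle in [0, 2*pi) from any non-zero (x, y) vector."""
    return math.atan2(y, x) % (2 * math.pi)


def circular_loss(pred_xy, target_theta, eps=1e-12):
    """1 - cos(angle difference): near 0 when aligned, near 2 when opposite."""
    x, y = pred_xy
    norm = math.hypot(x, y) + eps  # eps avoids division by zero (assumption)
    tx, ty = encode_angle(target_theta)
    return 1.0 - (x * tx + y * ty) / norm


def norm_penalty(pred_xy, weight=0.1):
    """Hypothetical regularizer that pulls the predicted norm towards 1."""
    x, y = pred_xy
    return weight * (math.hypot(x, y) - 1.0) ** 2


# 359 degrees and 1 degree are only 2 degrees apart on the circle,
# so the loss is small even though the raw values differ by 358.
pred = encode_angle(math.radians(359))
print(circular_loss(pred, math.radians(1)))
print(norm_penalty((3.0, 4.0)))  # norm 5 is far from 1, so the penalty is larger
```

Because the loss depends only on the direction of the predicted vector, the wrap-around at $0$ and $2\pi$ disappears. The separate norm penalty addresses the unbounded-norm concern raised above.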

15 neural-network deep-learning loss-function parameter-estimation

4

Which comes first: algorithm benchmarking, feature selection, or parameter tuning?

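When trying to do classification, my current approach is to:

1. Try out various algorithms first and benchmark them.
2. Perform feature selection based on the best algorithm from 1 above.
3. Tune the parameters using the selected features and algorithm.

However, I often cannot convince myself that there isn't a better algorithm than the one I selected, had the other algorithms been optimized with their best parameters and most suitable features. At the same time, running a search over all algorithms * parameters * features is too time-consuming.

Any suggestions on the right approach or order?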
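One way to avoid fixing the order is to search over algorithm, feature selection, and parameters jointly, at the cost of the runtime mentioned above. The sketch below is a minimal scikit-learn illustration under stated assumptions: the synthetic data, the two candidate classifiers, the `SelectKBest` selector, and every grid value are placeholders chosen for the example, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data standing in for a real classification task (assumption).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# Feature selection and the classifier live in one pipeline, so selection
# is refit inside every cross-validation fold.
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression()),  # placeholder, swapped out by the grid
])

# Each dict is one algorithm together with its own parameter grid.
param_grid = [
    {
        "select__k": [5, 10, 20],
        "clf": [LogisticRegression(max_iter=1000)],
        "clf__C": [0.1, 1.0, 10.0],
    },
    {
        "select__k": [5, 10, 20],
        "clf": [RandomForestClassifier(n_estimators=50, random_state=0)],
        "clf__max_depth": [3, None],
    },
]

search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
```

The grid grows multiplicatively with each added option, so in practice a randomized search over the same pipeline may be the more affordable variant.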

11 feature-selection parameter-estimation

Questions tagged «parameter-estimation»