Data Science: software-recommendation

5

I have built my model. Now I want to draw the network architecture diagram for my research paper. An example is shown below:

77 machine-learning neural-network deep-learning svm software-recommendation
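One tool-agnostic way to produce such a diagram is to describe the layers as a Graphviz graph and render it with the `dot` command-line tool. This is a minimal sketch under that assumption, not an approach taken from the question. `layers_to_dot` is a hypothetical helper, and the layer names and sizes in the example are illustrative placeholders.

```python
def layers_to_dot(layers):
    """Return a Graphviz DOT string that draws `layers` as a left-to-right chain of boxes.

    `layers` is a list of (name, detail) pairs; neither string may contain double quotes.
    """
    lines = [
        "digraph network {",
        "  rankdir=LR;",                          # lay the network out left to right
        "  node [shape=box, style=rounded];",
    ]
    for i, (name, detail) in enumerate(layers):
        # One node per layer; "\n" inside a DOT label renders as a line break.
        lines.append(f'  L{i} [label="{name}\\n{detail}"];')
    for i in range(len(layers) - 1):
        # Connect each layer to the next one.
        lines.append(f"  L{i} -> L{i + 1};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example architecture; replace with your own layers.
    print(layers_to_dot([
        ("Input", "28x28x1"),
        ("Conv2D", "32 filters, 3x3"),
        ("MaxPool", "2x2"),
        ("Dense", "10, softmax"),
    ]))
```

Saving the printed text as `network.dot` and running `dot -Tpdf network.dot -o network.pdf` (assuming Graphviz is installed) yields a vector figure suitable for a paper.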

1

Why is xgboost so much faster than sklearn GradientBoostingClassifier?

I'm trying to train a gradient boosting model on 50 examples with 100 numeric features. On my machine, XGBClassifier handles 500 trees in 43 seconds, while GradientBoostingClassifier handles only 10 trees (!) in 1 minute 2 seconds :( I didn't bother trying to grow 500 trees, as it would take hours. I'm using the same learning_rate and max_depth settings; see below. What makes XGBoost so much faster? Does it use some novel implementation of gradient boosting that the sklearn developers don't know about? Or is it "cutting corners" and growing shallower trees? P.S. I'm aware of this discussion: https://www.kaggle.com/c/higgs-boson/forums/t/10335/xgboost-post-competition-survey, but couldn't find the answer there...
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1, gamma=0, learning_rate=0.05, max_delta_step=0, max_depth=10, min_child_weight=1, missing=None, n_estimators=500, nthread=-1, objective='binary:logistic', reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=0, silent=True, subsample=1)
GradientBoostingClassifier(init=None, learning_rate=0.05, loss='deviance', max_depth=10, max_features=None, max_leaf_nodes=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=10, presort='auto', random_state=None, subsample=1.0, verbose=0, warm_start=False)

29 scikit-learn xgboost gbm
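Because the two runs in the question grow different numbers of trees, comparing time per tree is fairer than comparing totals: 43 s / 500 trees is about 0.086 s per tree, versus 62 s / 10 trees = 6.2 s per tree, roughly a 70x gap. A minimal timing sketch, assuming scikit-learn is installed and using synthetic `make_classification` data as a stand-in for the question's data set; `time_fit` is a hypothetical helper. Only parameter names that are stable across scikit-learn versions are passed, since `presort` and `loss='deviance'` from the question have been removed or renamed in newer releases.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier


def time_fit(model, X, y):
    """Fit `model` on (X, y) and return the wall-clock training time in seconds."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


# Synthetic stand-in for the question's data: 100 numeric features, binary target.
X, y = make_classification(n_samples=2000, n_features=100, random_state=0)

# Same learning_rate and max_depth as in the question; only a few trees to keep the run short.
gbc = GradientBoostingClassifier(learning_rate=0.05, max_depth=10, n_estimators=5, random_state=0)
seconds = time_fit(gbc, X, y)
print(f"GradientBoostingClassifier: {seconds / gbc.n_estimators:.3f} s per tree")
```

If xgboost is installed, passing `xgboost.XGBClassifier(learning_rate=0.05, max_depth=10, n_estimators=5)` to the same `time_fit` gives a per-tree figure measured on identical data.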

6

Sharing Jupyter notebooks within a team

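I want to set up a server that supports a data science team as a central point for storing, versioning, sharing, and possibly executing Jupyter notebooks. Some desired properties: Different users can access the server and open and execute notebooks stored by themselves or by other team members. The interesting question here is how the system should behave if user X executes cells in a notebook authored by user Y; I guess the notebook should not be changed. The solution should be self-hosted. Notebooks should be stored on the server, on Google Drive, or in a self-hosted ownCloud instance. (Bonus) Notebooks are under git version control (git may be self-hosted; it must not be tied to GitHub or similar services). I looked into JupyterHub and Binder. With the former, I don't understand how to allow cross-user access. The latter seems to support only GitHub as notebook storage. Do you have experience with either solution?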

22 software-recommendation
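For the (bonus) git requirement, one widely used practice is to commit notebooks without their outputs, so that diffs show only code and markdown changes; the `nbstripout` tool automates this as a git filter. The sketch below shows the idea using only the standard library, assuming the nbformat 4 JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`). `strip_outputs` is a hypothetical helper, not part of Jupyter.

```python
import json


def strip_outputs(notebook_json: str) -> str:
    """Return a copy of an .ipynb document (nbformat 4 JSON) with all code-cell outputs removed."""
    nb = json.loads(notebook_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop rendered results (plots, tables, tracebacks)
            cell["execution_count"] = None  # drop the "In [n]" counter, which changes on every run
    # Markdown and raw cells are left untouched.
    return json.dumps(nb, indent=1, ensure_ascii=False) + "\n"
```

A pre-commit step could apply it in place, e.g. `path.write_text(strip_outputs(path.read_text(encoding="utf-8")), encoding="utf-8")` with `path` a `pathlib.Path` to the notebook.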

4

Python library for segmented regression (a.k.a. piecewise regression)

I'm looking for a Python library that can perform segmented regression (a.k.a. piecewise regression). Example:

16 python linear-regression library software-recommendation
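Without a dedicated library, segmented regression with a single breakpoint can be fitted with plain NumPy least squares: model y = b0 + b1*x + b2*max(x - c, 0), where b2 is the change in slope after the breakpoint c, and choose c by grid search over candidate values. This is a minimal sketch under those assumptions (one breakpoint, a continuous fit, candidates supplied by the caller); `fit_piecewise` is a hypothetical helper.

```python
import numpy as np


def fit_piecewise(x, y, candidates):
    """Fit y = b0 + b1*x + b2*max(x - c, 0) for each breakpoint c in `candidates`.

    Returns (best_c, coefficients) for the breakpoint with the smallest residual sum of squares.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for c in candidates:
        # Design matrix: intercept, slope, and a hinge term that bends the line at c.
        A = np.column_stack([np.ones_like(x), x, np.maximum(x - c, 0.0)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = float(np.sum((A @ coef - y) ** 2))
        if best is None or rss < best[0]:
            best = (rss, c, coef)
    return best[1], best[2]
```

A finer candidate grid locates the breakpoint more precisely at proportionally higher cost; more than one breakpoint would need one hinge column per breakpoint and a search over combinations.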

7

Python library that can compute a confusion matrix for multi-label classification

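I'm looking for a Python library that can compute a confusion matrix for multi-label classification. FYI: scikit-learn does not support multi-label confusion matrices. See also: What is the difference between multiclass and multilabel problems?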

9 python software-recommendation multilabel-classification
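A common way to define a confusion matrix for multi-label data is one binary 2x2 matrix per label. Newer scikit-learn releases (0.21 and later) provide this as `sklearn.metrics.multilabel_confusion_matrix`; the NumPy sketch below computes the same per-label layout, assuming both inputs are binary indicator arrays of shape (n_samples, n_labels). `multilabel_confusion` is a hypothetical helper.

```python
import numpy as np


def multilabel_confusion(y_true, y_pred):
    """Per-label 2x2 confusion matrices for binary indicator arrays of shape (n_samples, n_labels).

    Returns an array of shape (n_labels, 2, 2); entry [k] is [[TN, FP], [FN, TP]] for label k.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    # Count each outcome column-wise, i.e. separately for every label.
    tp = np.sum(y_true & y_pred, axis=0)
    fp = np.sum(~y_true & y_pred, axis=0)
    fn = np.sum(y_true & ~y_pred, axis=0)
    tn = np.sum(~y_true & ~y_pred, axis=0)
    # Row 0 of each matrix is the actual-negative row, row 1 the actual-positive row.
    return np.stack([np.stack([tn, fp], axis=1), np.stack([fn, tp], axis=1)], axis=1)
```

Summing the matrices over labels gives micro-averaged counts, from which overall precision and recall follow.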

2

Multivariate linear regression in Python

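I'm looking for a Python package that implements multivariate linear regression. (Terminology note: multivariate regression deals with the case of more than one dependent variable, while multiple regression deals with the case of one dependent variable but more than one independent variable.)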

9 python regression library software-recommendation
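For point estimates, multivariate ordinary least squares needs no special package: `numpy.linalg.lstsq` accepts a matrix of responses and solves for all dependent variables at once (scikit-learn's `LinearRegression` likewise accepts a 2-D target). The coefficients coincide with fitting each dependent variable separately; what a full multivariate treatment adds is inference that accounts for correlated errors across responses. A minimal sketch, assuming numeric arrays with no missing values; `fit_multivariate_ols` is a hypothetical helper.

```python
import numpy as np


def fit_multivariate_ols(X, Y):
    """Ordinary least squares with several dependent variables.

    X: (n_samples, n_features) predictors; Y: (n_samples, n_targets) responses.
    Returns B of shape (n_features + 1, n_targets); row 0 holds the intercepts.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    B, *_ = np.linalg.lstsq(A, Y, rcond=None)  # one column of coefficients per target
    return B
```

Predictions for new data are `np.column_stack([np.ones(len(X_new)), X_new]) @ B`.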

Questions tagged «software-recommendation»