数据科学 xgboost

1

是否有关于我应使用的LSTM电池的最小，最大和“合理”数量的经验法则（或实际规则）？具体来说，我与TensorFlow和property 有关的BasicLSTMCell有关num_units。请假设我有以下定义的分类问题： t - number of time steps n - length of input vector in each time step m - length of output vector (number of classes) i - number of training examples 例如，训练示例的数量应该大于： 4*((n+1)*m + m*m)*c c单元数在哪里？我基于此：如何计算LSTM网络的参数数量？据我了解，这应该给出参数的总数，该总数应少于训练示例的数量。

12 rnn machine-learning r predictive-modeling random-forest python language-model sentiment-analysis encoding machine-learning deep-learning neural-network dataset caffe classification xgboost multiclass-classification unbalanced-classes time-series descriptive-statistics python r clustering machine-learning python deep-learning tensorflow machine-learning python predictive-modeling probability scikit-learn svm machine-learning python classification gradient-descent regression research python neural-network deep-learning convnet keras python tensorflow machine-learning deep-learning tensorflow python r bigdata visualization rstudio pandas pyspark dataset time-series multilabel-classification machine-learning neural-network ensemble-modeling kaggle machine-learning linear-regression cnn convnet machine-learning tensorflow association-rules machine-learning predictive-modeling training model-selection neural-network keras deep-learning deep-learning convnet image-classification predictive-modeling prediction machine-learning python classification predictive-modeling scikit-learn machine-learning python random-forest sampling training recommender-system books python neural-network nlp deep-learning tensorflow python matlab information-retrieval search search-engine deep-learning convnet keras machine-learning python cross-validation sampling machine-learning

3

是否有适用于python的好的即用型语言模型？

我正在为一个应用程序制作原型，我需要一个语言模型来计算一些生成的句子的困惑度。我可以随时使用经过训练的python语言模型吗？简单的东西 model = LanguageModel('en') p1 = model.perplexity('This is a well constructed sentence') p2 = model.perplexity('Bunny lamp robert junior pancake') assert p1 < p2 我看过一些框架，但找不到我想要的。我知道我可以使用类似： from nltk.model.ngram import NgramModel lm = NgramModel(3, brown.words(categories='news')) 这在Brown Corpus上使用了很好的图林概率分布，但是我正在一些大型数据集（例如1b单词数据集）上寻找精心设计的模型。我可以真正相信一般领域的结果（不仅是新闻）

11 python nlp language-model r statistics linear-regression machine-learning classification random-forest xgboost python sampling data-mining orange predictive-modeling recommender-system statistics dimensionality-reduction pca machine-learning python deep-learning keras reinforcement-learning neural-network image-classification r dplyr deep-learning keras tensorflow lstm dropout machine-learning sampling categorical-data data-imputation machine-learning deep-learning machine-learning-model dropout deep-network pandas data-cleaning data-science-model aggregation python neural-network reinforcement-learning policy-gradients r dataframe dataset statistics prediction forecasting r k-means python scikit-learn labels python orange cloud-computing machine-learning neural-network deep-learning rnn recurrent-neural-net logistic-regression missing-data deep-learning autoencoder apache-hadoop time-series data preprocessing classification predictive-modeling time-series machine-learning python feature-selection autoencoder deep-learning keras tensorflow lstm word-embeddings predictive-modeling prediction machine-learning-model machine-learning classification binary theory machine-learning neural-network time-series lstm rnn neural-network deep-learning keras tensorflow convnet computer-vision

3

XGboost-由模型选择

我正在使用XGboost预测保险索赔的2类目标变量。我有一个在另一个数据集上运行的模型（交叉验证训练，超参数调整等...）。我的问题是：有没有办法知道为什么一个给定的要求会受到一个类别的影响，即解释模型选择的特征？目的是能够向第三方人员证明机器所做的选择是合理的。感谢您的回答。

10 xgboost

1

梯度提升树：“变量越大越好”？

从XGBoost 的教程中，我认为当每棵树长大时，将扫描所有变量以选择拆分节点，然后选择具有最大增益拆分的变量。所以我的问题是，如果我将一些噪声变量添加到数据集中，这些噪声变量会影响变量选择（对于每棵树生长）吗？我的逻辑是，由于这些噪声变量根本不会给出最大的增益分配，因此将永远不会选择它们，因此它们不会影响树的生长。如果答案是肯定的，那么“ XGBoost变量越多越好”是真的吗？我们不考虑培训时间。同样，如果答案是肯定的，那么“我们不需要从模型中滤除非重要变量”是否成立。谢谢！

10 xgboost self-study

1

XGBoost线性回归输出不正确

我是XGBoost的新手，请原谅我的无知。这是python代码： import pandas as pd import xgboost as xgb df = pd.DataFrame({'x':[1,2,3], 'y':[10,20,30]}) X_train = df.drop('y',axis=1) Y_train = df['y'] T_train_xgb = xgb.DMatrix(X_train, Y_train) params = {"objective": "reg:linear"} gbm = xgb.train(dtrain=T_train_xgb,params=params) Y_pred = gbm.predict(xgb.DMatrix(pd.DataFrame({'x':[4,5]}))) print Y_pred 输出为： [ 24.126194 24.126194] 如您所见，输入数据只是一条直线。所以我期望的输出是[40,50]。我在这里做错了什么？

10 python linear-regression xgboost

1

梯度提升库的分布式意味着什么？

我正在查看XGBoost文档，并指出XGBoost是一个优化的分布式梯度提升库。什么是分布式？祝你今天愉快

9 xgboost distributed boosting

Questions tagged «xgboost»