Data Science: time-series

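Is there a rule of thumb (or an actual rule) for the minimum, maximum and "reasonable" number of LSTM cells I should use? Specifically, I am asking about BasicLSTMCell from TensorFlow and its num_units property.

Please assume that I have a classification problem defined by:

    t - number of time steps
    n - length of input vector in each time step
    m - length of output vector (number of classes)
    i - number of training examples

Is it the case, for example, that the number of training examples should be larger than:

    4*((n+1)*m + m*m)*c

where c is the number of cells? I based this on "How to calculate the number of parameters of an LSTM network?". As I understand it, this should give the total number of parameters, which should be less than the number of training examples.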

Tags: rnn
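
For comparison with the formula above: the parameter count usually given for a single LSTM layer without peephole connections (which is what BasicLSTMCell implements) treats num_units as the hidden-state size h. Each of the four gates has an n×h input kernel, an h×h recurrent kernel and h biases, giving 4·(h·(n+h)+h). The sketch below computes that and, as an assumption for illustration, adds a dense output layer mapping the final hidden state to the m classes; the function names are hypothetical.

```python
def lstm_layer_params(n, h):
    """Weights in one LSTM layer (no peepholes) with input size n and num_units h.

    Each of the 4 gates has an n*h input kernel, an h*h recurrent kernel and h biases.
    """
    return 4 * (h * (n + h) + h)


def dense_params(h, m):
    """Weights in an assumed dense output layer mapping the h-dim state to m classes."""
    return h * m + m


def total_params(n, h, m):
    """LSTM layer plus dense classifier head."""
    return lstm_layer_params(n, h) + dense_params(h, m)


if __name__ == "__main__":
    n, m = 10, 3  # example sizes, not taken from the question
    for h in (8, 32, 128):
        print(f"num_units={h:4d}: {total_params(n, h, m):7d} trainable parameters")
```

The number of time steps t does not appear in the count, because the same weights are reused at every step.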

Is there a good out-of-the-box language model for Python?

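I am prototyping an application and I need a language model to compute the perplexity of some generated sentences.

Is there any trained language model in Python I can readily use? Something simple like:

    model = LanguageModel('en')
    p1 = model.perplexity('This is a well constructed sentence')
    p2 = model.perplexity('Bunny lamp robert junior pancake')
    assert p1 < p2

I have looked at some frameworks but couldn't find what I want. I know I can use something like:

    from nltk.corpus import brown
    from nltk.model.ngram import NgramModel
    lm = NgramModel(3, brown.words(categories='news'))

This uses a Good-Turing probability distribution on the Brown Corpus, but I am looking for a well-crafted model trained on some big dataset, such as the 1b words dataset. Something whose results I can actually trust for a general domain (not only news).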

Tags: python nlp language-model
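
The question asks for a pretrained model, but the perplexity computation itself is model-agnostic: PP = exp(-(1/N) · Σ log p(w_k | context)). As a minimal, self-contained illustration of the interface sketched in the question, here is a toy add-one (Laplace) smoothed bigram model. The class, its constructor and the two training sentences are hypothetical stand-ins; this is not a substitute for a model trained on a large corpus such as the 1b words dataset.

```python
import math
from collections import Counter


def perplexity(token_probs):
    """PP = exp(-(1/N) * sum(log p_k)) over the N per-token probabilities."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))


class LaplaceBigramLM:
    """Toy add-one-smoothed bigram model; a hypothetical stand-in, not a pretrained LM."""

    def __init__(self, sentences):
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        vocab = {"</s>"}
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            vocab.update(tokens[1:])
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens[:-1], tokens[1:]))
        # +1 reserves one slot of probability mass for any unseen (unknown) word
        self.vocab_size = len(vocab) + 1

    def prob(self, prev, word):
        """Add-one smoothed P(word | prev)."""
        return (self.bigram_counts[(prev, word)] + 1) / (
            self.context_counts[prev] + self.vocab_size
        )

    def perplexity(self, sentence):
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        return perplexity([self.prob(a, b) for a, b in zip(tokens[:-1], tokens[1:])])


model = LaplaceBigramLM(["this is a well constructed sentence", "this is a sentence"])
p1 = model.perplexity("This is a well constructed sentence")
p2 = model.perplexity("Bunny lamp robert junior pancake")
assert p1 < p2
```

A pretrained model would replace `prob` with its own conditional probabilities; the `perplexity` function stays the same.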

How to merge monthly, daily and weekly data?

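Google Trends returns weekly data, so I have to find a way to merge it with my daily/monthly data.

What I have done so far is to break each series down into daily data, for example from:

    2013-03-03 - 2013-03-09  37

to:

    2013-03-03  37
    2013-03-04  37
    2013-03-05  37
    2013-03-06  37
    2013-03-07  37
    2013-03-08  37
    2013-03-09  37

But this adds a lot of complexity to my problem. I was trying to predict Google searches from the values of the last 6 months, i.e. 6 values in monthly data. Daily data would mean working with 180 past values. (I have 10 years of data, so 120 points in monthly data, 500+ in weekly data and 3500+ in daily data.)

The other approach would be to "merge" daily data into weekly/monthly data, but some questions arise from this process. Some data can be summed because their sum represents something. Rainfall, for example: the rainfall in a given week is the sum of the rainfall on each day of that week. In my case I am dealing with prices, financial rates and other things. For prices, it is common in my field to take the traded volume into account, so the weekly data would be a weighted average. For financial rates it is a bit more complex: some formulas are involved to build weekly rates from daily rates. For the other things, I don't know the underlying properties. I think those properties are important to avoid meaningless indicators (an average of financial rates would be nonsense, for example).

So, three questions:

1. For known and unknown properties, how should I proceed to go from daily to weekly/monthly data?
2. I feel that breaking weekly/monthly data into daily data the way I have done is somewhat wrong, because I am introducing quantities that have no meaning in real life. So, almost the same question: for known and unknown properties, how should I proceed to go from weekly/monthly to daily data?
3. Last but not least: given two time series with different time steps, which is better, using the smallest or the largest time step? I think this is a trade-off between the amount of data and the complexity of the model, but I can't see any strong argument for choosing one over the other.

Edit: if you know a tool (in R, Python or even Excel) that does this easily, it would be much appreciated.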

Tags: time-series
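
In Python, pandas covers both directions described above. The sketch below uses hypothetical daily data: an additive quantity (rain) is summed per week, a price is aggregated as a volume-weighted average, and a weekly series indexed by its first day is expanded to daily values by forward-filling, which reproduces the expansion shown in the question. Weeks are assumed to run Sunday to Saturday, as in the 2013-03-03 to 2013-03-09 example.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data covering two Sunday-to-Saturday weeks.
days = pd.date_range("2013-03-03", periods=14, freq="D")
daily = pd.DataFrame(
    {
        "rain": np.ones(14),              # additive: weekly value = sum of days
        "price": np.arange(1.0, 15.0),    # weekly value = volume-weighted mean
        "volume": np.arange(1.0, 15.0),   # traded volume used as the weight
    },
    index=days,
)

# Daily -> weekly. "W-SAT" bins end on Saturday and are labelled by that Saturday.
weekly_rain = daily["rain"].resample("W-SAT").sum()
weighted_sum = (daily["price"] * daily["volume"]).resample("W-SAT").sum()
weekly_price = weighted_sum / daily["volume"].resample("W-SAT").sum()

# Weekly -> daily: repeat each weekly value over its 7 days (index = week start).
trends = pd.Series([37.0, 40.0], index=pd.to_datetime(["2013-03-03", "2013-03-10"]))
all_days = pd.date_range("2013-03-03", "2013-03-16", freq="D")
trends_daily = trends.reindex(all_days, method="ffill")

print(weekly_rain)
print(weekly_price)
print(trends_daily)
```

Which rule applies to a given series (sum, weighted average, or the financial-rate formulas mentioned in the question) is domain knowledge that the code cannot supply. When going from weekly to daily, `Series.interpolate()` is one alternative to forward-filling.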

Feature extraction techniques: summarizing a sequence of data

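I am often building a model (classification or regression) in which some of the predictor variables are sequences, and I have been looking for technique recommendations for summarizing them as predictors in the best way possible.

As a concrete example, say a model is being built to predict whether a customer will leave the company in the next 90 days (at any time between t and t+90; hence a binary outcome). One of the available predictors is the level of the customer's financial balance over the periods t_0 to t-1. Maybe this represents monthly observations for the prior 12 months (i.e. 12 measurements).

I am looking for ways to construct features from this series. I use descriptives of each customer's series, such as the mean, high, low and standard deviation, and fit an OLS regression to get the trend. Are there other methods of calculating features? Other measures of change or volatility?

ADD: As mentioned in a response below, I also considered (but forgot to add here) using Dynamic Time Warping (DTW) and then hierarchical clustering on the resulting distance matrix, creating some number of clusters and then using the cluster membership as a feature. Scoring test data would likely have to follow a process in which DTW is done between the new cases and the cluster centroids, matching each new data series to its closest centroid...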

Tags: machine-learning feature-selection time-series
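
The descriptive features listed in the question (mean, high, low, standard deviation, OLS trend), a few additional change/volatility measures, and the DTW-to-centroid matching described in the ADD can be sketched in plain Python. The extra features and all function names are illustrative assumptions. The centroids are assumed to come from whatever clustering step is used beforehand, which is not shown.

```python
import math
import statistics


def summary_features(series):
    """Summary features for one customer's sequence (e.g. 12 monthly balances)."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(series)
    # OLS slope of value on time index 0..n-1 = linear trend
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(series))
    den = sum((x - x_mean) ** 2 for x in range(n))
    diffs = [b - a for a, b in zip(series, series[1:])]
    return {
        "mean": y_mean,
        "max": max(series),
        "min": min(series),
        "std": statistics.stdev(series),
        "trend": num / den,
        # Illustrative extra measures of change / volatility
        "mean_abs_change": statistics.fmean([abs(d) for d in diffs]),
        "std_of_changes": statistics.stdev(diffs) if len(diffs) > 1 else 0.0,
        "last_minus_first": series[-1] - series[0],
    }


def dtw_distance(a, b):
    """Classic dynamic time warping distance with |a_i - b_j| as the local cost."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def nearest_centroid(series, centroids):
    """Index of the centroid closest to `series` under DTW (for scoring new cases)."""
    return min(range(len(centroids)), key=lambda k: dtw_distance(series, centroids[k]))


balances = [float(v) for v in range(1, 13)]  # hypothetical 12 monthly balances
print(summary_features(balances))
print(nearest_centroid([0, 1, 2], [[5, 5, 5], [0, 1, 2]]))
```

This DTW is the quadratic-time textbook version without a warping window; for many long series a constrained or library implementation would be faster.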

Best languages for scientific computing [closed]

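Closed. This question needs to be more focused. It is not currently accepting answers. Closed 5 years ago.

It seems that most languages have some number of scientific computing libraries available:

- Python has SciPy
- Rust has SciRust
- C++ has several, including ViennaCL and Armadillo
- Java has Java Numerics and Colt, as well as several others

Not to mention languages like R and Julia, which were designed explicitly for scientific computing.

With so many options, how do you choose the best language for a task? Additionally, which languages will be the most performant? Python and R seem to have the most traction in the space, but logically a compiled language seems like it would be a better choice. And will anything ever outperform Fortran? Compiled languages also tend to have GPU acceleration, while interpreted languages like R and Python don't.

What should I take into account when choosing a language? Which languages provide the best balance of utility and performance? And are there any languages with significant scientific computing resources that I've missed?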

Tags: efficiency statistics tools knowledge-base

Trying to use TensorFlow to predict financial time series data

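I'm new to ML and TensorFlow (I started a few hours ago), and I'm trying to use it to predict the next few data points in a time series. I'm taking my input and doing this with it:

         /----------- x ------------\
        .-------------------------------.
        | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
        '-------------------------------'
             \----------- y ------------/

What I thought I was doing is using x as the input data and y as the desired output for that input, so that given 0-6 I could get 1-7 (the 7 in particular). However, when I run my graph with x as the input, the prediction I get looks more like x than y.

Here's the code (based on this article and this article):

    import tensorflow as tf
    import numpy as np
    import matplotlib.pyplot as plot
    import pandas as pd
    import csv

    def load_data_points(filename):
        print("Opening CSV …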

Tags: machine-learning python time-series tensorflow rnn
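
The x/y layout in the diagram is a one-step-ahead shift. A minimal NumPy sketch of building such pairs from a longer series follows; the function name and the window length are assumptions. It also computes a naive "persistence" baseline (predict that the next value equals the current one). Its predictions are exactly x, so they look like x rather than y by construction, which makes it a useful reference when a trained model's output resembles its input.

```python
import numpy as np


def make_windows(series, window):
    """Pair each length-`window` slice x with the same slice shifted one step ahead (y)."""
    series = np.asarray(series, dtype=float)
    xs, ys = [], []
    for start in range(len(series) - window):
        xs.append(series[start:start + window])
        ys.append(series[start + 1:start + window + 1])
    return np.stack(xs), np.stack(ys)


x, y = make_windows(range(8), window=7)
print(x)  # [[0. 1. 2. 3. 4. 5. 6.]]
print(y)  # [[1. 2. 3. 4. 5. 6. 7.]]

# Persistence baseline: the prediction for every position is the current value, i.e. x.
persistence_mse = float(np.mean((x - y) ** 2))
print(persistence_mse)  # 1.0
```

Comparing a model's error against `persistence_mse` on the same windows shows whether it has learned anything beyond repeating its input.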

Keras LSTM with 1D time series

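I'm learning how to use Keras, and I've had reasonable success with my labelled dataset using the examples in Chollet's Deep Learning with Python. The dataset is ~1000 time series of length 3125 with 3 potential classes.

I'd like to go beyond the basic Dense layers, which give me about a 70% prediction rate, and the book goes on to discuss LSTM and RNN layers. All the examples seem to use datasets with multiple features for each time series, so I'm struggling to work out how to implement this with my data.

If, for example, I have 1000x3125 time series, how do I feed that into something like the SimpleRNN or LSTM layer? Am I missing some fundamental knowledge of what these layers do?

Current code:

    import pandas as pd
    import numpy as np
    import os
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.layers import LSTM, Dropout, SimpleRNN, Embedding, Reshape
    from keras.utils import to_categorical
    from keras import regularizers
    from sklearn.model_selection import train_test_split
    import matplotlib.pyplot as plt

    def readData():
        # …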

Tags: python deep-learning time-series lstm rnn
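
Keras recurrent layers such as SimpleRNN and LSTM take 3-D input of shape (samples, timesteps, features). A univariate series is simply the features = 1 case, so a 1000x3125 array only needs a trailing axis of size 1. The NumPy-only sketch below uses placeholder data (zeros and random labels, not the question's dataset) and builds one-hot targets equivalent to what `keras.utils.to_categorical` returns.

```python
import numpy as np

# Placeholder with the shape from the question: 1000 series, 3125 steps each.
x_2d = np.zeros((1000, 3125), dtype="float32")
labels = np.random.randint(0, 3, size=1000)  # hypothetical class labels 0..2

# Recurrent layers expect (samples, timesteps, features); a univariate series
# has exactly one feature per time step, so add a trailing axis of size 1.
x_3d = x_2d[..., np.newaxis]
print(x_3d.shape)  # (1000, 3125, 1)

# One-hot targets for the 3 classes.
y = np.eye(3, dtype="float32")[labels]
print(y.shape)  # (1000, 3)
```

The first recurrent layer of the model would then be given an input shape of (3125, 1).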

Can reinforcement learning be applied to time series forecasting?

Tags: time-series reinforcement-learning forecasting

Classifying multivariate time series

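I have a dataset composed of time series (8 points) with about 40 dimensions (so each time series is 8 x 40). The corresponding output (the possible outcome of the category) is 0 or 1.

What would be the best approach to designing a classifier for time series with multiple dimensions?

My initial strategy was to extract features from these time series: the mean, standard deviation and maximum variation of each dimension. I obtained a dataset that I used to train a random forest. Aware of how naive this is, and after getting poor results, I am now looking for a better model.

My leads are the following: classify the series of each dimension (using the KNN algorithm and DWT), reduce the dimensionality with PCA, and use a final classifier on the multidimensional categories. Being new to ML, I don't know whether I'm completely wrong.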

Tags: classification time-series pca
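
The initial strategy described above (per-dimension mean, standard deviation and maximum variation) maps a batch of shape (samples, 8, 40) to a flat feature matrix with NumPy. "Maximum variation" is read here as the range (max minus min) of each dimension; that interpretation, the function name and the random data are assumptions.

```python
import numpy as np


def per_dimension_features(X):
    """Flatten a batch of multivariate series into per-dimension summary features.

    X has shape (samples, timesteps, dims), e.g. (N, 8, 40) in the question.
    Returns shape (samples, 3 * dims): mean, std and (max - min) for each dimension.
    """
    mean = X.mean(axis=1)
    std = X.std(axis=1)
    variation = X.max(axis=1) - X.min(axis=1)  # "max variation" read as the range
    return np.concatenate([mean, std, variation], axis=1)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8, 40))  # 5 hypothetical samples
features = per_dimension_features(X)
print(features.shape)  # (5, 120)
```

The resulting matrix can be passed to a random forest as in the question, or reduced with PCA first, following the leads listed; neither step is shown here.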

Questions tagged «time-series»