Computer Science data-mining

5

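While discussing some introductory topics today, including the use of genetic algorithms, I was told that research in this area has really slowed down. The reason given was that most people are now focusing on machine learning and data mining. Update: Is this accurate? If so, what advantages do ML/DM have over GA?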

45 machine-learning data-mining evolutionary-computing history

4

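I am trying to understand clustering methods. What I think I understand is this: in supervised learning, the classes/labels that the data is assigned to are known before the computation. The labels, classes or categories are therefore used to "learn" the parameters that really matter for those clusters. In unsupervised learning, the dataset is assigned to segments without the clusters being known in advance. Does this mean that, if I do not even know which parameters are crucial for the segmentation, I should prefer supervised learning?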

28 machine-learning data-mining clustering

2

Identifying events related to dates in a paragraph

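Is there an algorithmic approach to determine that a date given in a paragraph corresponds to a particular event (phrase) in that paragraph? As an example, consider the following paragraph: In June 1970, the great leader took the oath of office. But it was only after the death of the state secretary in May 1972 that he took over the reins of the country. Until mid-1980 he enjoyed popular support, but thereafter his influence began to wane. Is there an algorithm (deterministic or stochastic) #that can generate 2-tuples (date, event), where the paragraph implies that the event occurred on that date? In the case above: (June 1970, great leader takes oath) (May 1972, takes over the reins) or, even better, (May 1972, great leader takes over the reins) (1980, influence declines) #later addition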

13 algorithms data-mining natural-language-processing

5

Data science vs. operations research

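As the title suggests, the general question is: what is the difference between DS and OR/optimization? Conceptually, I understand that DS tries to extract knowledge from the available data and mainly uses statistical and machine learning techniques. OR, on the other hand, uses the data in order to make decisions based on it, for example by optimizing some objective function (criterion) over the data (input). I would like to know how these two paradigms compare. Is one a subset of the other? Are they considered complementary fields? Is there an example in which one field complements the other, or in which they are used together? In particular, I am interested in the following: are there examples of OR techniques being used to solve data science problems?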

11 optimization data-mining

4

Relationship and difference between information retrieval and information extraction?

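From Wikipedia: Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on metadata or on full-text indexing. From Wikipedia: Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most cases, this activity concerns processing human language texts by means of natural language processing (NLP). Recent activities in multimedia document processing, such as automatic annotation and content extraction from images/audio/video, could be seen as information extraction. What are the relationship and the difference between information retrieval and information extraction? Thanks!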

11 data-mining natural-language-processing

1

Inferring refinement types

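At work I have been tasked with inferring some type information about a dynamic language. I rewrite sequences of statements into nested let expressions, like so:
    return x; Z => x
    var x; Z => let x = undefined in Z
    x = y; Z => let x = y in Z
    if x then T else F; Z => if x then { T; Z } else { F; Z }
Since I am starting from general type information and trying to deduce more specific types, the natural choice is refinement types. For example, the conditional operator returns the union of the types of its true and false branches. In simple cases this works very well. However, I ran into a snag when trying to infer the type of the following: function …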

11 programming-languages logic type-theory type-inference machine-learning data-mining clustering order-theory reference-request information-theory entropy algorithms algorithm-analysis space-complexity lower-bounds formal-languages computability formal-grammars context-free parsing complexity-theory time-complexity terminology turing-machines nondeterminism semantics operational-semantics machine-models simulation graphs probability-theory data-structures distributed-systems hash-tables history meta-programming compilers search-algorithms regular-languages satisfiability sat-solvers factoring randomized-algorithms streaming-algorithm in-place numerical-analysis automata finite-automata regular-expressions efficiency coding-theory graph-theory education books proof-techniques greedy-algorithms matroids np-complete intuition traveling-salesman probabilistic-algorithms weighted-graphs priority-queues pushdown-automata binary-trees spanning-trees asymptotics landau-notation network-flow undecidability rice-theorem computational-geometry

5

Word frequency with O(n) complexity

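During an interview for a Java developer position, I was asked the following: Write a function that takes two parameters: a String representing a text document and an integer giving the number of items to return. Implement the function so that it returns a list of Strings ordered by word frequency, with the most frequently occurring word first. Your solution should run in $O(n)$ time, where $n$ is the number of characters in the document. The following is what I answered (in pseudocode); because of the sort it is not $O(n)$ but $O(n \log n)$ time. I cannot figure out how to do it in $O(n)$ time. wordFrequencyMap = new HashMap<String, Integer>(); words = inputString.split(' '); for (String word : words) { count = wordFrequencyMap.get(word); count = (count == null) ? 1 : …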

11 algorithms sorting strings data-mining
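
One possible way to avoid the sort in the question above is to bucket words by their count, since no count can exceed the number of words in the document. The sketch below is not the poster's code (which is truncated in the excerpt); the class name `WordFrequency` and the method `topWords` are hypothetical, and the linear bound is an expected-time bound that assumes constant-time hash map operations. Words with equal counts are returned in no particular order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not taken from the original question.
class WordFrequency {

    // Returns up to k words, most frequent first.
    static List<String> topWords(String text, int k) {
        Map<String, Integer> counts = new HashMap<>();
        int maxCount = 0;

        // Count every whitespace-separated word.
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // an empty document yields one empty token
            }
            int c = counts.merge(word, 1, Integer::sum);
            maxCount = Math.max(maxCount, c);
        }

        // buckets.get(c) collects the words that occur exactly c times.
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i <= maxCount; i++) {
            buckets.add(new ArrayList<>());
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            buckets.get(e.getValue()).add(e.getKey());
        }

        // Walk the buckets from the highest count down until k words are collected.
        List<String> result = new ArrayList<>();
        for (int c = maxCount; c >= 1 && result.size() < k; c--) {
            for (String w : buckets.get(c)) {
                if (result.size() == k) {
                    break;
                }
                result.add(w);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topWords("the cat the dog the cat", 2));
    }
}
```

The bucket list has at most one slot per word in the document, so building and scanning it stays linear in the input size, which is what replaces the comparison sort.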

1

Looking for a ranking algorithm that favors newer entries

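I am working on a ranking system that ranks entries based on the votes they receive over a period of time. I am looking for an algorithm that computes a score similar to an average, but I want it to favor newer scores over older ones. I was thinking of something along the lines of: $\frac{\mathrm{score}_1 + 2\cdot \mathrm{score}_2 + \dots + n\cdot \mathrm{score}_n}{1 + 2 + \dots + n}$ I would like to know whether there are other algorithms that are commonly used in this situation, and if so, could you please explain them?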

9 algorithms data-mining
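
To make the formula in the question above concrete, here is a minimal sketch of the linearly weighted average it describes. The class name `RecencyWeightedScore` and the method `linearWeighted` are hypothetical. It assumes the scores are ordered oldest first, so that the i-th score gets weight i, and it returns 0.0 for an empty input, which is a choice made here and not something the question specifies.

```java
// Hypothetical helper, not taken from the original question.
class RecencyWeightedScore {

    // scores[0] is the oldest score, scores[scores.length - 1] the newest.
    static double linearWeighted(double[] scores) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (int i = 0; i < scores.length; i++) {
            int weight = i + 1; // weights 1, 2, ..., n as in the formula
            weightedSum += weight * scores[i];
            weightTotal += weight;
        }
        // Assumption: an entry with no scores gets 0.0.
        return weightTotal == 0.0 ? 0.0 : weightedSum / weightTotal;
    }

    public static void main(String[] args) {
        // Same three scores in a different order; the plain mean is 2.0 in both cases.
        System.out.println(linearWeighted(new double[] {0, 0, 6})); // recent high score
        System.out.println(linearWeighted(new double[] {6, 0, 0})); // old high score
    }
}
```

With the scores 0, 0, 6 the result is 18/6 = 3.0, while 6, 0, 0 gives 6/6 = 1.0, so the same votes rank higher when the good score is the most recent one.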

Questions tagged «data-mining»