
9

I am using linear_model.LinearRegression from scikit-learn as a predictive model. It works well. I have a problem evaluating the predicted results with the accuracy_score metric. This is my true data:
    array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])
My predicted data:
    array([ 0.07094605, 0.1994941 , 0.19270157, 0.13379635, 0.04654469, 0.09212494, 0.19952108, 0.12884365, 0.15685076, -0.01274453, 0.32167554, 0.32167554, -0.10023553, 0.09819648, -0.06755516, 0.25390082, 0.17248324])
My code:
    accuracy_score(y_true, y_pred, normalize=False)
Error message:
    ValueError: Can't handle mix of binary and continuous target
Help? Thank you.

73 python machine-learning scikit-learn linear-regression prediction
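A minimal sketch for the accuracy_score question above. accuracy_score compares class labels, while LinearRegression returns continuous values, so the sketch first converts the predictions to 0/1 labels. The 0.5 threshold is an assumption (the question does not state one), and mean_squared_error is shown only as one possible metric if the continuous predictions are to be judged directly.

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error

y_true = np.array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([0.07094605, 0.1994941, 0.19270157, 0.13379635, 0.04654469,
                   0.09212494, 0.19952108, 0.12884365, 0.15685076, -0.01274453,
                   0.32167554, 0.32167554, -0.10023553, 0.09819648, -0.06755516,
                   0.25390082, 0.17248324])

# Assumption: 0.5 is the decision threshold; it is not given in the question.
threshold = 0.5
y_pred_labels = (y_pred >= threshold).astype(int)

# Both arrays now hold binary labels, so accuracy_score accepts them.
n_correct = accuracy_score(y_true, y_pred_labels, normalize=False)

# A regression metric works on the continuous predictions as they are.
mse = mean_squared_error(y_true, y_pred)
print(n_correct, mse)
```

With this threshold every prediction becomes 0, because all predicted values are below 0.5, so 11 of the 17 samples count as correct. If the target really is binary, a classifier such as LogisticRegression may be a better fit than thresholding a regression output.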

27

Why is pydot unable to find GraphViz's executables in Windows 8?

I have installed GraphViz 2.32 on Windows 8 and added C:\Program Files (x86)\Graphviz2.32\bin to the system PATH variable. pydot still cannot find its executables.
    Traceback (most recent call last):
      File "<pyshell#26>", line 1, in <module>
        graph.write_png('example1_graph.png')
      File "build\bdist.win32\egg\pydot.py", line 1809, in <lambda>
        lambda path, f=frmt, prog=self.prog : self.write(path, format=f, prog=prog))
      File "build\bdist.win32\egg\pydot.py", line 1911, in write
        dot_fd.write(self.create(prog, format))
      File "build\bdist.win32\egg\pydot.py", line 1953, in create
        'GraphViz\'s executables …

71 graphviz scikit-learn pygraphviz pydot
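A small diagnostic sketch for the pydot question above. It does not use pydot at all; it only checks whether the running Python process can resolve the Graphviz "dot" program through PATH. It assumes Python 3.3 or later for shutil.which, which is newer than the setup in the question.

```python
import os
import shutil

# Look up the Graphviz "dot" program the same way the OS resolves commands.
dot_path = shutil.which("dot")

if dot_path is None:
    print("dot was not found on PATH for this Python process")
    # Show the PATH entries that mention Graphviz, if any.
    entries = os.environ.get("PATH", "").split(os.pathsep)
    print([p for p in entries if "graphviz" in p.lower()])
else:
    print("dot found at:", dot_path)
```

A shell or IDE that was already open when PATH was edited keeps the old environment, so it may help to restart it before running the check.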

6

Scikit-learn SVC decision_function and predict

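I am trying to understand the relationship between decision_function and predict, which are instance methods of SVC (http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html). So far I have gathered that the decision function returns pairwise scores between classes. I was under the impression that predict chooses the class that maximizes its pairwise score, but I tested this and got different results. Here is the code I used to try to understand the relationship between the two. First I generate the pairwise score matrix, then I print the class with the maximal pairwise score, which differs from the class predicted by clf.predict.
    result = clf.decision_function(vector)[0]
    counter = 0
    num_classes = len(clf.classes_)
    pairwise_scores = np.zeros((num_classes, num_classes))
    # Fill a symmetric matrix with the score of each class pair (r, j).
    for r in range(num_classes):
        for j in range(r + 1, num_classes):
            pairwise_scores[r][j] = result[counter]
            pairwise_scores[j][r] = -result[counter]
            counter += 1
    # argmax returns a flat index; integer division recovers the row (class).
    index = np.argmax(pairwise_scores)
    predicted_class = index // num_classes
    print(predicted_class)
    print(clf.predict(vector)[0])
Does anyone know the relationship between predict and decision_function?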

71 python numpy svm scikit-learn
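A sketch related to the SVC question above. As far as I understand, a multiclass SVC prediction comes from counting the wins of each class over all class pairs (one-vs-one voting), not from the single largest pairwise score. The example uses the iris dataset and a linear kernel purely as illustrative assumptions, and requests the pairwise scores with decision_function_shape="ovo".

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# "ovo" makes decision_function return one column per class pair.
clf = SVC(kernel="linear", decision_function_shape="ovo").fit(X, y)

scores = clf.decision_function(X)  # shape (n_samples, n_pairs)
n_classes = len(clf.classes_)
votes = np.zeros((X.shape[0], n_classes), dtype=int)

col = 0
for i in range(n_classes):
    for j in range(i + 1, n_classes):
        # Positive score: the pair (i, j) votes for class i, otherwise class j.
        i_wins = scores[:, col] > 0
        votes[i_wins, i] += 1
        votes[~i_wins, j] += 1
        col += 1

# The class with the most pairwise wins; ties go to the lowest class index.
voted = clf.classes_[np.argmax(votes, axis=1)]
agreement = np.mean(voted == clf.predict(X))
print(agreement)
```

The value printed is the fraction of samples where the vote count and clf.predict pick the same class. Taking the argmax of a single pairwise score, as in the question, can disagree with this because one large margin does not outweigh several smaller wins.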

9

How to get the most informative features for scikit-learn classifiers?

Classifiers in machine learning packages such as liblinear and nltk offer a method show_most_informative_features(), which is really helpful for debugging features:
    viagra = None    ok : spam = 4.5 : 1.0
    hello = True     ok : spam = 4.5 : 1.0
    hello = None     spam : ok = 3.3 : 1.0
    viagra = True    spam : ok = 3.3 : 1.0
    casino = True    spam : ok = 2.0 …

70 python machine-learning classification scikit-learn
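One possible approach to the question above, sketched for a linear model: sort the learned coefficients and report the feature names at both ends. The toy data, the feature names, and the choice of LogisticRegression are all hypothetical; scikit-learn does not provide a show_most_informative_features() method, so this only approximates the idea.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: rows are documents, columns are word indicators.
feature_names = np.array(["viagra", "casino", "hello", "meeting"])
X = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
])
y = np.array(["spam", "spam", "spam", "spam", "ok", "ok", "ok", "ok"])

clf = LogisticRegression().fit(X, y)

# For a binary problem coef_ has one row; positive weights favor classes_[1].
weights = clf.coef_[0]
order = np.argsort(weights)
top_k = 2
top_negative = feature_names[order[:top_k]]        # push toward classes_[0]
top_positive = feature_names[order[::-1][:top_k]]  # push toward classes_[1]

print(clf.classes_[1], "<-", list(top_positive))
print(clf.classes_[0], "<-", list(top_negative))
```

The same sorting idea should carry over to other estimators that expose coef_, with the feature names taken from whatever vectorizer produced the columns.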

10

Imputing categorical missing values in scikit-learn

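I have pandas data with some columns of text type. These text columns contain some NaN values. What I am trying to do is impute those NaNs with sklearn.preprocessing.Imputer (replacing NaN with the most frequent value). The problem is in the implementation. Suppose there is a pandas DataFrame df with 30 columns, 10 of which are categorical. Once I run:
    from sklearn.preprocessing import Imputer
    imp = Imputer(missing_values='NaN', strategy='most_frequent', axis=0)
    imp.fit(df)
Python generates an error: 'could not convert string to float: 'run1'', where 'run1' is an ordinary (non-missing) value from the first column with categorical data. Any help would be very welcome.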

70 python pandas scikit-learn imputation
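A minimal sketch for the imputation question above, done in pandas rather than with Imputer, which expects numeric input. The frame and the column names are hypothetical. Later scikit-learn releases replaced Imputer with sklearn.impute.SimpleImputer, which may be an alternative, but it is not used here.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two text columns with gaps and one numeric column.
df = pd.DataFrame({
    "run": ["run1", "run2", np.nan, "run1"],
    "site": ["a", np.nan, "b", "b"],
    "value": [1.0, 2.0, 3.0, 4.0],
})
cat_cols = ["run", "site"]

# mode() returns the most frequent value per column; row 0 holds the first mode.
most_frequent = df[cat_cols].mode().iloc[0]

# Fill each categorical column with its own most frequent value.
df[cat_cols] = df[cat_cols].fillna(most_frequent)
print(df)
```

Listing the categorical columns explicitly keeps the numeric columns untouched, so they can still be imputed separately with a numeric strategy.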

3

Transforming multiple categorical columns

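In my dataset I have two categorical columns that I would like to enumerate. Both columns contain countries, some of which overlap (appear in both columns). I would like to give the same number to the same country in column1 and column2. My data looks like:
    import pandas as pd
    d = {'col1': ['NL', 'BE', 'FR', 'BE'], 'col2': ['BE', 'NL', 'ES', 'ES']}
    df = pd.DataFrame(data=d)
    df
Currently I am transforming the data like this:
    from sklearn.preprocessing import LabelEncoder
    df.apply(LabelEncoder().fit_transform)
However, this makes no distinction between FR and ES. Is there another simple way to get the following output?
    o = {'col1': [2,0,1,0], 'col2': [0,2,4,4]}
    output = pd.DataFrame(data=o)
    output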

10 python python-3.x pandas scikit-learn categorical-data
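A sketch of one way to approach the question above: fit a single LabelEncoder on the values of both columns together, then reuse it for each column. The codes follow the encoder's sorted order (BE, ES, FR, NL), so the numbers differ from the desired output in the question, but every country gets the same code in both columns.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

d = {'col1': ['NL', 'BE', 'FR', 'BE'], 'col2': ['BE', 'NL', 'ES', 'ES']}
df = pd.DataFrame(data=d)

# Fit a single encoder on the values of both columns together.
encoder = LabelEncoder()
encoder.fit(pd.concat([df['col1'], df['col2']]))

# Reuse the same fitted encoder for each column.
encoded = pd.DataFrame({col: encoder.transform(df[col]) for col in df.columns})
print(encoded)
```

Calling fit_transform per column, as in the question, builds a separate mapping for each column, which is why FR and ES end up with the same number there.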

2

Using GridSearchCV with IsolationForest for finding outliers

I want to use IsolationForest for finding outliers. I want to find the best model parameters with GridSearchCV. The problem is that I always get the same error:
    TypeError: If no scoring is specified, the estimator passed should have a 'score' method. The estimator IsolationForest(behaviour='old', bootstrap=False, contamination='legacy', max_features=1.0, max_samples='auto', n_estimators=100, n_jobs=None, random_state=None, verbose=0, warm_start=False) does not.
It seems to be a problem because IsolationForest does not have a score method. Is there a way to fix this? Also, is there a way to find a score for the isolation forest? This is my code:
    import pandas as pd
    from sklearn.ensemble import IsolationForest
    from sklearn.model_selection import GridSearchCV

    df = pd.DataFrame({'first': [-112,0,1,28,5,6,3,5,4,2,7,5,1,3,2,2,5,2,42,84,13,43,13],
                       'second': [42,1,2,85,2,4,6,8,3,5,7,3,64,1,4,1,2,4,13,1,0,40,9],
                       'third': [3,4,7,74,3,8,2,4,7,1,53,6,5,5,59,0,5,12,65,4,3,4,11], …

10 python scikit-learn
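A hedged sketch for the question above: the error goes away once GridSearchCV receives an explicit scoring argument. The example assumes that labels for known outliers are available, which the question does not say; the synthetic data, the labels, the parameter grid, and the choice of F1 on the outlier class are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Hypothetical labeled data: 1 marks inliers, -1 marks outliers,
# matching the values IsolationForest.predict returns.
rng = np.random.RandomState(0)
X = np.vstack([0.3 * rng.randn(100, 2), rng.uniform(-4, 4, size=(10, 2))])
y = np.array([1] * 100 + [-1] * 10)

# Score each candidate by F1 on the outlier class.
outlier_f1 = make_scorer(f1_score, pos_label=-1)

search = GridSearchCV(
    IsolationForest(random_state=0),
    param_grid={"n_estimators": [50, 100], "contamination": [0.05, 0.1]},
    scoring=outlier_f1,
    # Stratified folds keep some outliers in every validation split.
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Without any labels there is no obvious objective for the search, so the scoring function would have to encode whatever notion of a good outlier detector applies to the data at hand.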

Questions tagged «scikit-learn»