Sklearn plot confusion matrix with labels



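I want to plot a confusion matrix to visualize the classifier's performance, but it shows only the numbers of the labels, not the labels themselves: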

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

y_test=['business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business']

# str dtype (not bytes '|S8') so the labels compare equal to y_test in Python 3
pred=np.array(['health', 'business', 'business', 'business', 'business',
       'business', 'health', 'health', 'business', 'business', 'business',
       'business', 'business', 'business', 'business', 'business',
       'health', 'health', 'business', 'health'])

cm = confusion_matrix(y_test, pred)
plt.matshow(cm)
plt.title('Confusion matrix of the classifier')
plt.colorbar()
plt.show()

How can I add the labels (health, business, etc.) to the confusion matrix?

Answers:



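As hinted in this question, you have to "open up" the lower-level artist API by storing the figure and axes objects returned by the matplotlib functions you call (the fig, ax and cax variables below). You can then replace the default x- and y-axis tick labels using set_xticklabels/set_yticklabels.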

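import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

labels = ['business', 'health']
cm = confusion_matrix(y_test, pred, labels=labels)
print(cm)
fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(cm)
plt.title('Confusion matrix of the classifier')
fig.colorbar(cax)
# Put exactly one tick on each cell before naming them; the older
# [''] + labels trick relied on matplotlib's default tick positions
# and misaligns labels in newer matplotlib versions
ax.set_xticks(range(len(labels)))
ax.set_yticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()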

Note that I pass the labels list to the confusion_matrix function to make sure the rows and columns are in the same order as the tick labels.

This produces a plot with the class names as tick labels on both axes.
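To see why passing labels matters, here is a small sketch with made-up toy data (not from the question). Without labels, confusion_matrix orders rows and columns by the sorted unique labels; with labels, it follows the list exactly, so the same list can safely be reused for the tick labels.

```python
from sklearn.metrics import confusion_matrix

# Toy data, invented for illustration
y_true = ["business", "business", "health", "health"]
y_pred = ["business", "health", "health", "health"]

# Default: rows/columns follow the sorted unique labels ('business', 'health')
cm_default = confusion_matrix(y_true, y_pred)

# Explicit order: rows/columns follow this list exactly
labels = ["health", "business"]
cm_custom = confusion_matrix(y_true, y_pred, labels=labels)

print(cm_default)  # rows = true label, columns = predicted label
print(cm_custom)
```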


If you have more than a few categories, Matplotlib labels the axes incorrectly; you have to force it to label every cell: from matplotlib.ticker import MultipleLocator; ax.xaxis.set_major_locator(MultipleLocator(1)); ax.yaxis.set_major_locator(MultipleLocator(1))
rescdsk 2014

As a newbie, could you tell me whether the size of the 3 boxes implies accuracy?
Boris 2015

How can I show the numbers on them? Colors alone don't always get the point across.

Hi... @metakermit, could you tell me how to show the numbers in the colored plot?
Humaun Rashid Nayan
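One way to answer the last two comments, sketched here rather than taken from the original answer: loop over the cells and write each count with ax.text. The data is the question's; switching to white text above half the maximum is just a readability heuristic.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["business", "health"]
y_true = ["business"] * 20
y_pred = ["health", "business", "business", "business", "business",
          "business", "health", "health", "business", "business", "business",
          "business", "business", "business", "business", "business",
          "health", "health", "business", "health"]
cm = confusion_matrix(y_true, y_pred, labels=labels)

fig, ax = plt.subplots()
im = ax.matshow(cm, cmap="Blues")
fig.colorbar(im)
ax.set_xticks(np.arange(len(labels)))
ax.set_yticks(np.arange(len(labels)))
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
ax.set_xlabel("Predicted")
ax.set_ylabel("True")

# Write the count in every cell; white text on dark cells for contrast
thresh = cm.max() / 2
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        ax.text(j, i, str(cm[i, j]), ha="center", va="center",
                color="white" if cm[i, j] > thresh else "black")
# plt.show()  # uncomment to display interactively
```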


Update:

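In scikit-learn 0.22, there is a new function to plot the confusion matrix directly. (plot_confusion_matrix was later deprecated in scikit-learn 1.0 and removed in 1.2; its replacements are ConfusionMatrixDisplay.from_estimator and ConfusionMatrixDisplay.from_predictions.)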

See the documentation: sklearn.metrics.plot_confusion_matrix


Old answer:

I think it's worth mentioning the use of seaborn.heatmap here.

import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, pred, labels=['business', 'health'])

ax = plt.subplot()
sns.heatmap(cm, annot=True, ax=ax)  # annot=True to annotate cells

# labels, title and ticks
ax.set_xlabel('Predicted labels')
ax.set_ylabel('True labels')
ax.set_title('Confusion Matrix')
# Row 0 is drawn at the top, so both axes follow the same labels order
ax.xaxis.set_ticklabels(['business', 'health'])
ax.yaxis.set_ticklabels(['business', 'health'])

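A variant sketch that folds in the suggestions from the comments below (fmt='g', cmap='Greens'): passing the label list to heatmap's xticklabels/yticklabels parameters ties the tick names to the same list used to build the matrix, which makes it harder to mix up the order.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

labels = ["business", "health"]
y_true = ["business"] * 20
y_pred = ["health", "business", "business", "business", "business",
          "business", "health", "health", "business", "business", "business",
          "business", "business", "business", "business", "business",
          "health", "health", "business", "health"]
cm = confusion_matrix(y_true, y_pred, labels=labels)

fig, ax = plt.subplots()
# fmt="g" avoids scientific notation; the same list names both axes
sns.heatmap(cm, annot=True, fmt="g", cmap="Greens",
            xticklabels=labels, yticklabels=labels, ax=ax)
ax.set_xlabel("Predicted labels")
ax.set_ylabel("True labels")
ax.set_title("Confusion Matrix")
```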


Suggestion: pass fmt='g' to the heatmap call to keep the numbers from being shown in scientific notation.
polm23 '18

Suggestion: pass cmap='Greens' to the heatmap call for a more intuitive color scale.
EliadL

How can you be sure you didn't mix up the labels?
Monica

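@RevolucionforMonica When you get the confusion_matrix, the X-axis tick labels are 1, 0 and the Y-axis tick labels are 0, 1 (in increasing order of axis values). If the classifier is clf, you can get the class order via clf.classes_, which in this case should match ["health", "business"] (assuming business is the positive class).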
akilat90
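A quick way to check the class order for yourself, sketched with invented toy data: a fitted scikit-learn classifier stores its classes in classes_, which is built with np.unique and is therefore sorted. Passing it as labels keeps the matrix and the tick labels in the same order.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Toy 1-D data, invented for illustration
X = np.array([[0.0], [0.1], [0.9], [1.0]])
y = np.array(["health", "health", "business", "business"])

clf = LogisticRegression().fit(X, y)
print(clf.classes_)  # ['business' 'health'] (sorted, not order of appearance)

# Use the same order for the matrix and for the tick labels
cm = confusion_matrix(y, clf.predict(X), labels=clf.classes_)
```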


I found a function that plots a confusion matrix generated by sklearn:

import numpy as np


def plot_confusion_matrix(cm,
                          target_names,
                          title='Confusion matrix',
                          cmap=None,
                          normalize=True):
    """
    given a sklearn confusion matrix (cm), make a nice plot

    Arguments
    ---------
    cm:           confusion matrix from sklearn.metrics.confusion_matrix

    target_names: given classification classes such as [0, 1, 2]
                  the class names, for example: ['high', 'medium', 'low']

    title:        the text to display at the top of the matrix

    cmap:         the gradient of the values displayed from matplotlib.pyplot.cm
                  see http://matplotlib.org/examples/color/colormaps_reference.html
                  plt.get_cmap('jet') or plt.cm.Blues

    normalize:    If False, plot the raw numbers
                  If True, plot the proportions

    Usage
    -----
    plot_confusion_matrix(cm           = cm,                  # confusion matrix created by
                                                              # sklearn.metrics.confusion_matrix
                          normalize    = True,                # show proportions
                          target_names = y_labels_vals,       # list of names of the classes
                          title        = best_estimator_name) # title of graph

    Citation
    --------
    http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html

    """
    import matplotlib.pyplot as plt
    import numpy as np
    import itertools

    accuracy = np.trace(cm) / np.sum(cm).astype('float')
    misclass = 1 - accuracy

    if cmap is None:
        cmap = plt.get_cmap('Blues')

    plt.figure(figsize=(8, 6))
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()

    if target_names is not None:
        tick_marks = np.arange(len(target_names))
        plt.xticks(tick_marks, target_names, rotation=45)
        plt.yticks(tick_marks, target_names)

    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]


    thresh = cm.max() / 1.5 if normalize else cm.max() / 2
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        if normalize:
            plt.text(j, i, "{:0.4f}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")
        else:
            plt.text(j, i, "{:,}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")


    plt.ylabel('True label')
    plt.xlabel('Predicted label\naccuracy={:0.4f}; misclass={:0.4f}'.format(accuracy, misclass))
    plt.tight_layout()  # after setting the labels so they are included in the layout
    plt.show()

The resulting plot shows the class names on both axes and the value printed in each cell.
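One caveat with normalize=True: the function divides each row by its sum, so a class with no true samples (health in the question's data) gives 0/0 and its row becomes NaN. A small helper like the one below, named here for illustration only, keeps such rows at zero instead; you could normalize with it before plotting.

```python
import numpy as np

def normalize_rows(cm):
    """Row-normalize a confusion matrix; rows with no true samples stay all zero."""
    cm = np.asarray(cm, dtype=float)
    row_sums = cm.sum(axis=1, keepdims=True)
    # where= skips empty rows, so they keep the zeros from `out` instead of NaN
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums != 0)

cm = np.array([[14, 6], [0, 0]])  # the question's matrix
print(normalize_rows(cm))
```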



You might be interested in https://github.com/pandas-ml/pandas-ml/

It provides a Python pandas implementation of a confusion matrix.

Some features:

  • plot the confusion matrix
  • plot the normalized confusion matrix
  • class statistics
  • overall statistics

Here is an example:

In [1]: from pandas_ml import ConfusionMatrix
In [2]: import matplotlib.pyplot as plt

In [3]: y_test = ['business', 'business', 'business', 'business', 'business',
        'business', 'business', 'business', 'business', 'business',
        'business', 'business', 'business', 'business', 'business',
        'business', 'business', 'business', 'business', 'business']

In [4]: y_pred = ['health', 'business', 'business', 'business', 'business',
       'business', 'health', 'health', 'business', 'business', 'business',
       'business', 'business', 'business', 'business', 'business',
       'health', 'health', 'business', 'health']

In [5]: cm = ConfusionMatrix(y_test, y_pred)

In [6]: cm
Out[6]:
Predicted  business  health  __all__
Actual
business         14       6       20
health            0       0        0
__all__          14       6       20

In [7]: cm.plot()
Out[7]: <matplotlib.axes._subplots.AxesSubplot at 0x1093cf9b0>

In [8]: plt.show()


In [9]: cm.print_stats()
Confusion Matrix:

Predicted  business  health  __all__
Actual
business         14       6       20
health            0       0        0
__all__          14       6       20


Overall Statistics:

Accuracy: 0.7
95% CI: (0.45721081772371086, 0.88106840959427235)
No Information Rate: ToDo
P-Value [Acc > NIR]: 0.608009812201
Kappa: 0.0
Mcnemar's Test P-Value: ToDo


Class Statistics:

Classes                                 business health
Population                                    20     20
P: Condition positive                         20      0
N: Condition negative                          0     20
Test outcome positive                         14      6
Test outcome negative                          6     14
TP: True Positive                             14      0
TN: True Negative                              0     14
FP: False Positive                             0      6
FN: False Negative                             6      0
TPR: (Sensitivity, hit rate, recall)         0.7    NaN
TNR=SPC: (Specificity)                       NaN    0.7
PPV: Pos Pred Value (Precision)                1      0
NPV: Neg Pred Value                            0      1
FPR: False-out                               NaN    0.3
FDR: False Discovery Rate                      0      1
FNR: Miss Rate                               0.3    NaN
ACC: Accuracy                                0.7    0.7
F1 score                               0.8235294      0
MCC: Matthews correlation coefficient        NaN    NaN
Informedness                                 NaN    NaN
Markedness                                     0      0
Prevalence                                     1      0
LR+: Positive likelihood ratio               NaN    NaN
LR-: Negative likelihood ratio               NaN    NaN
DOR: Diagnostic odds ratio                   NaN    NaN
FOR: False omission rate                       1      0

Wait, how did you get this to work? With the latest pandas_ml, it gives me a blank confusion matrix (all 0s), and the labels are True/False instead of business and health.
wordforthewise

Same here, it's blank.
Elham

I get AttributeError: module 'sklearn.metrics' has no attribute 'jaccard_similarity_score' with scikit-learn 0.23.1 and pandas-ml 0.6.1. I've tried other versions too, with no luck.
Petra
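Given the compatibility problems reported above, a sketch of the count table using plain pandas may help; this is pd.crosstab, not pandas-ml's API. margins_name='__all__' only mimics the pandas-ml layout. Unlike pandas-ml, crosstab omits rows with no observations, so there is no health row here.

```python
import pandas as pd

y_test = ["business"] * 20
y_pred = ["health", "business", "business", "business", "business",
          "business", "health", "health", "business", "business", "business",
          "business", "business", "business", "business", "business",
          "health", "health", "business", "health"]

# Rows = actual class, columns = predicted class, plus row/column totals
ct = pd.crosstab(pd.Series(y_test, name="Actual"),
                 pd.Series(y_pred, name="Predicted"),
                 margins=True, margins_name="__all__")
print(ct)
```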

from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix

# feature_vectors and y are your own feature matrix and label vector
test_size = 0.33
seed = 7
X_train, X_test, y_train, y_test = model_selection.train_test_split(feature_vectors, y, test_size=test_size, random_state=seed)

model = LogisticRegression()
model.fit(X_train, y_train)
result = model.score(X_test, y_test)
print("Accuracy: %.3f%%" % (result*100.0))
y_pred = model.predict(X_test)
print("F1 Score: ", f1_score(y_test, y_pred, average="macro"))
print("Precision Score: ", precision_score(y_test, y_pred, average="macro"))
print("Recall Score: ", recall_score(y_test, y_pred, average="macro")) 

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

def cm_analysis(y_true, y_pred, labels, ymap=None, figsize=(10,10)):
    """
    Generate matrix plot of confusion matrix with pretty annotations.
    Diagonal cells show "percent\ncount/row total"; off-diagonal cells show
    "percent\ncount" and are left blank when the count is zero.
    To save the image, uncomment the plt.savefig line and supply a filename.
    args: 
      y_true:    true label of the data, with shape (nsamples,)
      y_pred:    prediction of the data, with shape (nsamples,)
      labels:    string array, name the order of class labels in the confusion matrix.
                 use `clf.classes_` if using scikit-learn models.
                 with shape (nclass,).
      ymap:      dict: any -> string, length == nclass.
                 if not None, map the labels & ys to more understandable strings.
                 Caution: original y_true, y_pred and labels must align.
      figsize:   the size of the figure plotted.
    """
    if ymap is not None:
        y_pred = [ymap[yi] for yi in y_pred]
        y_true = [ymap[yi] for yi in y_true]
        labels = [ymap[yi] for yi in labels]
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    cm_sum = np.sum(cm, axis=1, keepdims=True)
    cm_perc = cm / cm_sum.astype(float) * 100
    annot = np.empty_like(cm).astype(str)
    nrows, ncols = cm.shape
    for i in range(nrows):
        for j in range(ncols):
            c = cm[i, j]
            p = cm_perc[i, j]
            if i == j:
                s = cm_sum[i, 0]  # row total as a scalar (cm_sum has shape (nrows, 1))
                annot[i, j] = '%.1f%%\n%d/%d' % (p, c, s)
            elif c == 0:
                annot[i, j] = ''
            else:
                annot[i, j] = '%.1f%%\n%d' % (p, c)
    cm = pd.DataFrame(cm, index=labels, columns=labels)
    cm.index.name = 'Actual'
    cm.columns.name = 'Predicted'
    fig, ax = plt.subplots(figsize=figsize)
    sns.heatmap(cm, annot=annot, fmt='', ax=ax)
    #plt.savefig(filename)
    plt.show()

cm_analysis(y_test, y_pred, model.classes_, ymap=None, figsize=(10,10))


Based on https://gist.github.com/hitvoice/36cf44689065ca9b927431546381a3f7

Note that using the rocket_r colormap reverses the colors, which arguably looks more natural.


Thanks, but what is the rocket_r option you mentioned?
Hamman Samuel

In the sns.heatmap() call, pass the argument cmap='rocket_r' to reverse the color scale.
Sai Prabhanjan Reddy

from sklearn.metrics import confusion_matrix
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# model is assumed to be a compiled Keras classifier; test_y is one-hot encoded
model.fit(train_x, train_y, validation_split=0.1, epochs=50, batch_size=4)
y_pred = model.predict(test_x, batch_size=15)
cm = confusion_matrix(test_y.argmax(axis=1), y_pred.argmax(axis=1))
index = ['neutral', 'happy', 'sad']
columns = ['neutral', 'happy', 'sad']
cm_df = pd.DataFrame(cm, index=index, columns=columns)
plt.figure(figsize=(10, 6))
sns.heatmap(cm_df, annot=True)
plt.show()

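The key step in the answer above is argmax(axis=1), which turns one-hot targets and per-class probabilities into integer class indices before calling confusion_matrix. Here is a self-contained sketch with small hypothetical arrays standing in for test_y and model.predict(test_x). Passing labels=[0, 1, 2] keeps all three rows even if a class never appears.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

class_names = ["neutral", "happy", "sad"]

# Hypothetical stand-ins: one-hot targets and predicted class probabilities
test_y = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.6, 0.3],
                   [0.3, 0.4, 0.3]])

# argmax over the class axis gives one integer label per sample
y_true_idx = test_y.argmax(axis=1)
y_pred_idx = y_prob.argmax(axis=1)

cm = confusion_matrix(y_true_idx, y_pred_idx, labels=[0, 1, 2])
cm_df = pd.DataFrame(cm, index=class_names, columns=class_names)
print(cm_df)
```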



To add to @akilat90's update about sklearn.metrics.plot_confusion_matrix:

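You can use the ConfusionMatrixDisplay class in sklearn.metrics directly, without having to pass a classifier to plot_confusion_matrix. It also has a display_labels argument that lets you specify the labels shown in the plot.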

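The ConfusionMatrixDisplay constructor doesn't offer many options for customizing the plot, but you can access the matplotlib axes object through the ax_ attribute after calling its plot() method. I've added a second example to show this.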

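I found it annoying to have to rerun the classifier over a large amount of data just to generate the plot with plot_confusion_matrix. I'm producing other plots from the predicted data, so I don't want to waste time re-predicting every time. This approach solves that problem as well.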

Example:

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

cm = confusion_matrix(y_true, y_preds, normalize='all')
cmd = ConfusionMatrixDisplay(cm, display_labels=['business','health'])
cmd.plot()


Example using ax_:

cm = confusion_matrix(y_true, y_preds, normalize='all')
cmd = ConfusionMatrixDisplay(cm, display_labels=['business','health'])
cmd.plot()
cmd.ax_.set(xlabel='Predicted', ylabel='True')

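Another option, sketched with the question's data: create the axes yourself and pass them to plot(ax=...), which also accepts cmap and values_format. You then hold the axes object directly and can adjust labels, title or size without going through ax_ (which still refers to the same axes).

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

y_true = ["business"] * 20
y_preds = ["health", "business", "business", "business", "business",
           "business", "health", "health", "business", "business", "business",
           "business", "business", "business", "business", "business",
           "health", "health", "business", "health"]

cm = confusion_matrix(y_true, y_preds, labels=["business", "health"])
cmd = ConfusionMatrixDisplay(cm, display_labels=["business", "health"])

fig, ax = plt.subplots(figsize=(5, 4))
cmd.plot(ax=ax, cmap="Blues", values_format="d")  # draw into our own axes
# Override the default axis labels after plotting
ax.set(xlabel="Predicted", ylabel="True", title="Confusion matrix")
```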


Great, thanks! Question: is it possible to customize the "True label" and "Predicted label" text of the axis labels?
caydin

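I hadn't realized this before, but you can access the matplotlib axes object through cmd.ax_, which gives you a lot of control over the plot. To customize the axis labels, use something like this: cmd.ax_.set(xlabel='foo', ylabel='bar'). I'll update my answer.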
themaninthewoods

Thanks a lot! But it looks like cmd.ax_.set disables display_labels=['business','health']
caydin

I'm also getting AttributeError: 'ConfusionMatrixDisplay' object has no attribute 'ax_'
caydin

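Ah, you're right! Thanks for pointing out these issues. In my excitement at finding a solution, I made some mistakes in the update. Please see the latest version; it should work now.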
themaninthewoods
Licensed under cc by-sa 3.0 with attribution required.