Answers:
Use sparse categorical crossentropy when your classes are mutually exclusive (i.e. each sample belongs to exactly one class), and categorical crossentropy when one sample can belong to multiple classes, or when the labels are soft probabilities (like [0.5, 0.3, 0.2]).
The formula for categorical crossentropy ($S$ - samples, $C$ - classes, $s \in c$ - sample belongs to class $c$) is:

$$-\frac{1}{N} \sum_{s \in S} \sum_{c \in C} 1_{s \in c} \log p(s \in c)$$

For the case when classes are mutually exclusive, you do not need to sum over them: for each sample, the only non-zero term is $-\log p(s \in c)$ for the true class $c$.
This saves time and memory. Consider the case of 10,000 mutually exclusive classes: you compute just 1 log instead of summing 10,000 terms for each sample, and store just one integer per label instead of 10,000 floats.
The formula is the same in both cases, so there should be no impact on accuracy.
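A minimal sketch of this equivalence, assuming TensorFlow/Keras (the probability values below are made up for illustration):

    import numpy as np
    import tensorflow as tf

    # Model outputs (already softmax probabilities) for 2 samples, 3 classes
    probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1]], dtype=np.float32)

    onehot = np.array([[1, 0, 0],
                       [0, 1, 0]], dtype=np.float32)  # one-hot targets
    ints = np.array([0, 1])                           # same targets as integers

    cce = tf.keras.losses.categorical_crossentropy(onehot, probs)
    scce = tf.keras.losses.sparse_categorical_crossentropy(ints, probs)

    print(cce.numpy())   # [0.35667497 0.22314353] == [-log 0.7, -log 0.8]
    print(scce.numpy())  # identical values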
Comment: My network has three outputs, o1, o2, and o3, with 167, 11, and 7 classes respectively. I've read your answer that it makes no difference, but is there any difference if I use sparse_categorical_crossentropy or not? Can I go with categorical_crossentropy for the last two and sparse_categorical_crossentropy for the first one, since the first output has 167 classes?
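For what it's worth, Keras does allow mixing the two losses across the outputs of a multi-output model, as long as each output's targets match its loss. A hedged sketch (only the class counts 167/11/7 come from the comment; the input shape, hidden size, and output names are assumptions):

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    inp = layers.Input(shape=(32,))
    h = layers.Dense(64, activation="relu")(inp)
    o1 = layers.Dense(167, activation="softmax", name="o1")(h)
    o2 = layers.Dense(11, activation="softmax", name="o2")(h)
    o3 = layers.Dense(7, activation="softmax", name="o3")(h)

    model = Model(inp, [o1, o2, o3])
    model.compile(
        optimizer="adam",
        loss={
            "o1": "sparse_categorical_crossentropy",  # targets: (N,) integer labels
            "o2": "categorical_crossentropy",         # targets: (N, 11) one-hot
            "o3": "categorical_crossentropy",         # targets: (N, 7) one-hot
        },
    )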
The Answer, In a Nutshell
If your targets are one-hot encoded, use categorical_crossentropy. Examples of one-hot encodings:
[1,0,0]
[0,1,0]
[0,0,1]
But if your targets are integers, use sparse_categorical_crossentropy. Examples of integer encodings (for the sake of completeness; a conversion sketch follows the list):
1
2
3
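If your targets are integers but you want to use categorical_crossentropy anyway, you can convert them with tf.keras.utils.to_categorical. A small sketch (the 3-class count is an assumption matching the example above; note that Keras expects 0-based integer labels):

    import numpy as np
    import tensorflow as tf

    labels = np.array([0, 1, 2])  # integer targets
    onehot = tf.keras.utils.to_categorical(labels, num_classes=3)
    print(onehot)
    # [[1. 0. 0.]
    #  [0. 1. 0.]
    #  [0. 0. 1.]]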
Question: when should I use categorical_crossentropy versus sparse_categorical_crossentropy? And what does the from_logits argument mean?
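On the from_logits question: in Keras it tells the loss whether the model outputs raw, unnormalized scores (logits, from_logits=True) or softmax probabilities (from_logits=False, the default). With from_logits=True the loss applies the softmax internally, which is more numerically stable. A minimal sketch with made-up logits:

    import tensorflow as tf

    logits = tf.constant([[2.0, 1.0, 0.1]])  # raw scores, no softmax applied
    labels = tf.constant([0])

    # from_logits=True: the loss applies softmax internally
    loss_from_logits = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    print(loss_from_logits(labels, logits).numpy())  # ~0.4170

    # Equivalent: apply softmax yourself and use the default from_logits=False
    probs = tf.nn.softmax(logits)
    loss_from_probs = tf.keras.losses.SparseCategoricalCrossentropy()
    print(loss_from_probs(labels, probs).numpy())    # same value, up to float error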