How to implement the Softmax function in Python

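From Udacity's deep learning class, the softmax of y_i is simply the exponential divided by the sum of the exponentials of the whole Y vector:

$$S(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}}$$

where S(y_i) is the softmax function of y_i, e is the exponential, and j runs over the columns of the input vector Y.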
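As a quick arithmetic check of the formula, here it is applied by hand to the scores [3.0, 1.0, 0.2] used in the code that follows. Values are rounded to four decimals:

```latex
\begin{aligned}
e^{3.0} &\approx 20.0855, \quad e^{1.0} \approx 2.7183, \quad e^{0.2} \approx 1.2214 \\
\sum_j e^{y_j} &\approx 20.0855 + 2.7183 + 1.2214 = 24.0252 \\
S(y) &\approx \left( \tfrac{20.0855}{24.0252},\ \tfrac{2.7183}{24.0252},\ \tfrac{1.2214}{24.0252} \right) \approx (0.8360,\ 0.1131,\ 0.0508)
\end{aligned}
```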

I tried the following:

import numpy as np

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

scores = [3.0, 1.0, 0.2]
print(softmax(scores))

which returns:

[ 0.8360188   0.11314284  0.05083836]

But the suggested solution was:

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)

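which produces the same output as the first implementation, even though the first implementation explicitly subtracts the max from each column and then divides by the sum.

Can someone show mathematically why? Is one correct and the other one wrong?

Are the implementations similar in terms of code and time complexity? Which is more efficient?

— alvas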
I'm curious why you attempted to implement it this way with the max function. What made you think of it that way?

— BBischof

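I don't know. I thought treating the maximum as 0, sort of like moving the graph to the left and clipping at 0, would help. Then my range shrinks from -inf to +inf down to -inf to 0. I guess I was overthinking it. hahahaaa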

— alvas '16

I still have one sub-question that doesn't seem to be answered below: what is the significance of axis = 0 in the answer suggested by Udacity?

— Parva Thakkar

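If you take a look at the numpy documentation, it explains what sum(x, axis=0) does, and similarly axis=1. In short, it gives the direction in which to sum an array of arrays. Here it tells numpy to sum along the vectors, which corresponds to the denominator in the softmax function.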

— BBischof
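To make the axis discussion concrete, here is a short sketch that uses the 2-D score array from the Udacity quiz (it also appears in an answer further down) to compare the three kinds of reduction. As in that quiz, it assumes each column holds one sample's scores:

```python
import numpy as np

# The 2-D score array from the Udacity quiz (one column per sample).
scores2D = np.array([[1, 2, 3, 6],
                     [2, 4, 5, 6],
                     [3, 8, 7, 6]], dtype=float)

e = np.exp(scores2D)

total_all = e.sum()          # axis=None: a single scalar over every element
per_column = e.sum(axis=0)   # collapses the rows: one sum per column, shape (4,)
per_row = e.sum(axis=1)      # collapses the columns: one sum per row, shape (3,)

# Dividing by the per-column sums broadcasts along the last axis,
# so every column of the result sums to 1.
col_softmax = e / per_column

print(total_all.shape, per_column.shape, per_row.shape)  # () (4,) (3,)
print(col_softmax.sum(axis=0))  # each entry is 1 up to rounding
```

Dividing by `total_all` instead makes the whole matrix sum to 1, not each column. That is why the 1-D and 2-D cases in the question behave differently.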

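It's like every other week there's a more correct answer, to the point where my math isn't good enough to decide who's correct =) Could any math whiz who hasn't posted an answer help decide which one is correct?

— alvas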

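They're both correct, but yours is preferred from the point of view of numerical stability.

You start with

$$\frac{e^{x - \max(x)}}{\sum e^{x - \max(x)}}$$

Using the fact that a^{b - c} = a^b / a^c, we get

$$= \frac{e^{x}}{e^{\max(x)} \cdot \sum \left( e^{x} / e^{\max(x)} \right)}$$

$$= \frac{e^{x}}{\sum e^{x}}$$

which is what the other answer says. You could replace max(x) with any variable and it would cancel out.

— Trevor Merrifield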
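The cancellation argument can be checked numerically. This minimal sketch uses a hypothetical helper, `shifted_softmax`, that computes the softmax after subtracting an arbitrary constant c. It then compares the results for several values of c, including max(x):

```python
import numpy as np

def shifted_softmax(x, c):
    """Softmax of a 1-D array computed as exp(x - c) / sum(exp(x - c))."""
    e_x = np.exp(np.asarray(x, dtype=float) - c)
    return e_x / e_x.sum()

scores = np.array([3.0, 1.0, 0.2])

# The shift c cancels between numerator and denominator, so every choice
# gives the same probabilities (up to floating-point rounding).
for c in (0.0, scores.max(), -5.0, 42.0):
    print(c, shifted_softmax(scores, c))
```

Mathematically the choice of c is free. In floating point, c = max(x) is the convenient choice because it makes the largest exponent exactly 0.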
Reformatting your answer @TrevorM for further clarification: e^(x - max(x)) / sum(e^(x - max(x))). Using a^(b - c) = (a^b)/(a^c), we have = e^x / {e^max(x) * sum(e^x / e^max(x))} = e^x / sum(e^x)

— shanky_thebearer

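@Trevor Merrifield, I don't think the first approach had any "unnecessary term". In fact, it is better than the second approach. I have added this point as a separate answer.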

— Shagun Sodhani '16

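@Shagun You are correct. The two are mathematically equivalent, but I hadn't considered numerical stability.

— Trevor Merrifield

Hope you don't mind: I edited out "unnecessary term" in case people don't read the comments (or the comments disappear). This page gets quite a bit of traffic from search engines, and this is currently the first answer people see.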

— Alex Riley

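I wonder why you subtract max(x) and not max(abs(x)) (fixing the sign after determining the value). If all your values are below zero and very large in absolute value, and only one value (the maximum) is close to zero, subtracting the maximum will not change anything. Wouldn't it still be numerically unstable?

— Cerno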
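The negative cases raised in this comment can be tried directly. In this sketch the function names are illustrative and the data is made up. When every score is around -1000, `np.exp` underflows to 0 for all of them in the naive version, and the division becomes 0/0. Subtracting the max shifts the largest score to 0, so the denominator is always at least 1. In the mixed case from the comment, subtracting the max changes nothing, but nothing is unstable either: the largest term is exp(0) = 1, and the tiny terms that underflow to 0 are the correctly rounded result.

```python
import numpy as np

def naive_softmax(x):
    e_x = np.exp(x)
    return e_x / e_x.sum()

def stable_softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

x_far = np.array([-1000.0, -1001.0, -1002.0])  # all scores very negative
x_mixed = np.array([-1000.0, -1001.0, 0.0])    # the case from the comment

with np.errstate(invalid="ignore"):  # silence the 0/0 warning
    print(naive_softmax(x_far))      # [nan nan nan]: every exp(x_i) underflowed to 0
print(stable_softmax(x_far))         # same as the softmax of [0, -1, -2]
print(stable_softmax(x_mixed))       # [0. 0. 1.]
```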

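(Well... lots of confusion here, both in the question and in the answers...)

To start with, the two solutions (i.e. yours and the suggested one) are not equivalent. They happen to be equivalent only for the special case of 1-D score arrays. You would have discovered this if you had also tried the 2-D score array in the example provided by the Udacity quiz.

Results-wise, the only actual difference between the two solutions is the axis=0 argument. To see that this is the case, let's try your solution (your_softmax) against one where the only difference is the axis argument: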

import numpy as np

# your solution:
def your_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

# correct solution:
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0) # only difference

As I said, for a 1-D score array the results are indeed identical:

scores = [3.0, 1.0, 0.2]
print(your_softmax(scores))
# [ 0.8360188   0.11314284  0.05083836]
print(softmax(scores))
# [ 0.8360188   0.11314284  0.05083836]
your_softmax(scores) == softmax(scores)
# array([ True,  True,  True], dtype=bool)

However, here are the results for the 2-D score array given in the Udacity quiz as a test example:

scores2D = np.array([[1, 2, 3, 6],
                     [2, 4, 5, 6],
                     [3, 8, 7, 6]])

print(your_softmax(scores2D))
# [[  4.89907947e-04   1.33170787e-03   3.61995731e-03   7.27087861e-02]
#  [  1.33170787e-03   9.84006416e-03   2.67480676e-02   7.27087861e-02]
#  [  3.61995731e-03   5.37249300e-01   1.97642972e-01   7.27087861e-02]]

print(softmax(scores2D))
# [[ 0.09003057  0.00242826  0.01587624  0.33333333]
#  [ 0.24472847  0.01794253  0.11731043  0.33333333]
#  [ 0.66524096  0.97962921  0.86681333  0.33333333]]

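The results are different. The second one is indeed identical to the one expected in the Udacity quiz, where all columns sum to 1, which is not the case with the first (wrong) result.

So all the fuss was actually about an implementation detail: the axis argument. According to the numpy.sum documentation:

The default, axis=None, will sum all of the elements of the input array

whereas here we want to sum along the first axis (axis=0), i.e. down each column. For a 1-D array, the sum along its only axis and the sum of all the elements happen to be identical, hence your identical results in that case...

The axis issue aside, your implementation (i.e. your choice to subtract the max first) is actually better than the suggested solution! In fact, it is the recommended way of implementing the softmax function; see here for the justification (numerical stability, also pointed out by some other answers here).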

— Desertnaut
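Well, if you are just talking about multi-dimensional arrays, the first solution can easily be fixed by adding the axis argument to both max and sum. However, the first implementation is still better, since you can easily overflow when taking exp

— Louis Yang

@LouisYang I'm not following; which is the "first" solution? Which one does not use exp? What else has been modified here other than adding an axis argument?

— desertnaut

The first solution refers to the solution from @alvas. The difference is that the suggested solution in alvas's question is missing the part that subtracts the max. This can easily cause overflow. For example, exp(1000) / (exp(1000) + exp(1001)) and exp(-1) / (exp(-1) + exp(0)) are mathematically the same, but the first one will overflow.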

— Louis Yang
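The example in this comment can be reproduced in a few lines. In this sketch the function names are illustrative. In float64, exp(1000) overflows to inf, so the naive version computes inf / inf = nan. The max-shifted version computes exactly the second, well-behaved expression:

```python
import numpy as np

def naive_softmax(x):
    e_x = np.exp(x)
    return e_x / e_x.sum()

def stable_softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

big = np.array([1000.0, 1001.0])   # exp(1000) overflows float64 to inf
small = np.array([-1.0, 0.0])      # the same scores shifted by -1001

with np.errstate(over="ignore", invalid="ignore"):  # silence overflow and inf/inf warnings
    print(naive_softmax(big))      # [nan nan]
print(stable_softmax(big))         # equals the next line
print(naive_softmax(small))
```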

@LouisYang Still not sure I understand why your comment is needed; all of this has already been addressed explicitly in the answer.

— desertnaut

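@LouisYang Please don't let the (subsequent) popularity of the thread fool you, and try to imagine the context in which my own answer was offered: a puzzled OP ("both give the same result") and a (still!) accepted answer claiming that "both are correct" (well, they are not). The answer was never meant to be "the most correct and efficient way to compute softmax in general"; it was only meant to explain why, in the specific Udacity quiz under discussion, the two solutions are not equivalent.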

— desertnaut

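So this is really a comment on desertnaut's answer, but I can't comment yet because of my reputation. As he pointed out, your version is only correct if your input consists of a single sample. If your input consists of several samples, it is wrong. However, desertnaut's solution is also wrong. The problem is that he takes a 1-dimensional input in one case and a 2-dimensional input in the other. Let me show you.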

import numpy as np

# your solution:
def your_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

# desertnaut solution (copied from his answer): 
def desertnaut_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0) # only difference

# my (correct) solution:
def softmax(z):
    assert len(z.shape) == 2
    s = np.max(z, axis=1)
    s = s[:, np.newaxis] # necessary step to do broadcasting
    e_x = np.exp(z - s)
    div = np.sum(e_x, axis=1)
    div = div[:, np.newaxis] # ditto
    return e_x / div

Let's take desertnaut's example:

x1 = np.array([[1, 2, 3, 6]]) # notice that we put the data into 2 dimensions(!)

Here is the output:

your_softmax(x1)
array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037047]])

desertnaut_softmax(x1)
array([[ 1.,  1.,  1.,  1.]])

softmax(x1)
array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037047]])

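You can see that desertnaut's version fails in this situation. (It would not if the input were just one-dimensional, like np.array([1, 2, 3, 6]).)

Now let's use 3 samples, since that is the reason we use a 2-dimensional input. The following x2 is not the same as the one from desertnaut's example.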

x2 = np.array([[1, 2, 3, 6],  # sample 1
               [2, 4, 5, 6],  # sample 2
               [1, 2, 3, 6]]) # sample 1 again(!)

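This input is a batch of 3 samples, but samples one and three are essentially the same. We now expect 3 rows of softmax activations, where the first row should be the same as the third and also the same as our activation of x1!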

your_softmax(x2)
array([[ 0.00183535,  0.00498899,  0.01356148,  0.27238963],
       [ 0.00498899,  0.03686393,  0.10020655,  0.27238963],
       [ 0.00183535,  0.00498899,  0.01356148,  0.27238963]])


desertnaut_softmax(x2)
array([[ 0.21194156,  0.10650698,  0.10650698,  0.33333333],
       [ 0.57611688,  0.78698604,  0.78698604,  0.33333333],
       [ 0.21194156,  0.10650698,  0.10650698,  0.33333333]])

softmax(x2)
array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037047],
       [ 0.01203764,  0.08894682,  0.24178252,  0.65723302],
       [ 0.00626879,  0.01704033,  0.04632042,  0.93037047]])

I hope you can see that this is only the case with my solution.

softmax(x1) == softmax(x2)[0]
array([[ True,  True,  True,  True]], dtype=bool)

softmax(x1) == softmax(x2)[2]
array([[ True,  True,  True,  True]], dtype=bool)

Additionally, here are the results of TensorFlow's softmax implementation:

import tensorflow as tf
import numpy as np
batch = np.asarray([[1,2,3,6],[2,4,5,6],[1,2,3,6]])
x = tf.placeholder(tf.float32, shape=[None, 4])
y = tf.nn.softmax(x)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(y, feed_dict={x: batch})

Result:

array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037045],
       [ 0.01203764,  0.08894681,  0.24178252,  0.657233  ],
       [ 0.00626879,  0.01704033,  0.04632042,  0.93037045]], dtype=float32)

— ChuckFive
That would have been one hell of a comment ;-)

— Michael Benjamin

np.exp(z) / np.sum(np.exp(z), axis=1, keepdims=True) gives the same result as your softmax function. The steps with s are unnecessary.

— PabTorre

Instead of s = s[:, np.newaxis], s = s.reshape(z.shape[0], 1) should also work.

— Debashish

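So many incorrect/ineffective solutions on this page. Please do yourselves a favour and use PabTorre's

— Miss Palmer

@PabTorre Did you mean axis=-1? axis=1 won't work for one-dimensional input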

— DiehardTheTryhard
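This sketch combines the suggestions in these comments. It keeps the max subtraction for stability, reduces over axis=-1 so the same code works for a 1-D vector and for a batch of rows, and uses keepdims=True instead of the manual np.newaxis reshapes. The inputs are x1 and x2 from the answer above:

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along `axis` (default: the last axis)."""
    z = np.asarray(z, dtype=float)
    # keepdims=True leaves a length-1 axis, so max and sum broadcast back over z
    shifted = z - np.max(z, axis=axis, keepdims=True)
    e_z = np.exp(shifted)
    return e_z / np.sum(e_z, axis=axis, keepdims=True)

x1 = np.array([[1, 2, 3, 6]])
x2 = np.array([[1, 2, 3, 6],
               [2, 4, 5, 6],
               [1, 2, 3, 6]])

print(softmax(x1))
print(softmax(x2))
print(softmax(np.array([1, 2, 3, 6])))  # 1-D input works too
```

Unlike the unshifted one-liner, this version does not overflow for large scores.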

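I would say that while both are mathematically correct, the first one is better implementation-wise. When computing softmax, the intermediate values may become very large, and dividing two large numbers can be numerically unstable. These notes (from Stanford) mention a normalization trick, which is essentially what you are doing.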

— Shagun Sodhani
The effects of catastrophic cancellation should not be underestimated.

— Cesar '16

sklearn also offers an implementation of softmax:

from sklearn.utils.extmath import softmax
import numpy as np

x = np.array([[ 0.50839931,  0.49767588,  0.51260159]])
softmax(x)

# output
array([[ 0.3340521 ,  0.33048906,  0.33545884]])

— Roman Orac
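How exactly does this answer the specific question, which is about the implementation itself and not about its availability in some third-party library?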

— desertnaut

I was looking for a third-party implementation to verify the results of both approaches. That is how this comment helped.

— Eugenio F. Martinez Pacheco '18

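From the mathematical point of view, both sides are equal.

You can easily prove this. Let m = max(x). Your function softmax returns a vector whose i-th coordinate is equal to

$$\frac{e^{x_i - m}}{\sum_j e^{x_j - m}} = \frac{e^{x_i} e^{-m}}{e^{-m} \sum_j e^{x_j}} = \frac{e^{x_i}}{\sum_j e^{x_j}}$$

Note that this works for any m, because e^m != 0 for all (even complex) numbers m.

From the computational complexity point of view, they are also equivalent: both run in O(n) time, where n is the size of the vector.
From the numerical stability point of view, the first solution is preferred, because e^x grows very fast and overflows even for moderately large values of x. Subtracting the maximum value gets rid of this overflow. To see what I am talking about in practice, try x = np.array([1000, 5]) with both of your functions. One will return the correct probabilities; the other will overflow to nan.
Your solution works only for vectors (the Udacity quiz wants you to compute it for matrices as well). To fix it, you need to use sum(axis=0)

— Salvador Dali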
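When would it be useful to be able to compute softmax over a matrix rather than a vector? I.e., what models output a matrix? Can it have even more dimensions?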

— mrgloom

Do you mean "from the numerical stability point of view, the second solution is preferred..." in the point about the first solution?

— Dataman

EDIT: As of version 1.2.0, scipy includes softmax as a special function:

https://scipy.github.io/devdocs/generation/scipy.special.softmax.html

I wrote a function that applies the softmax over any axis:

import numpy as np

def softmax(X, theta = 1.0, axis = None):
    """
    Compute the softmax of each element along an axis of X.

    Parameters
    ----------
    X: ND-Array. Probably should be floats. 
    theta (optional): float parameter, used as a multiplier
        prior to exponentiation. Default = 1.0
    axis (optional): axis to compute values along. Default is the 
        first non-singleton axis.

    Returns an array the same size as X. The result will sum to 1
    along the specified axis.
    """

    # make X at least 2d
    y = np.atleast_2d(X)

    # find axis
    if axis is None:
        axis = next(j[0] for j in enumerate(y.shape) if j[1] > 1)

    # multiply y against the theta parameter, 
    y = y * float(theta)

    # subtract the max for numerical stability
    y = y - np.expand_dims(np.max(y, axis = axis), axis)

    # exponentiate y
    y = np.exp(y)

    # take the sum along the specified axis
    ax_sum = np.expand_dims(np.sum(y, axis = axis), axis)

    # finally: divide elementwise
    p = y / ax_sum

    # flatten if X was 1D
    if np.ndim(X) == 1: p = p.flatten()  # np.ndim also accepts plain lists

    return p

As other users have mentioned, subtracting the max is good practice. I wrote a detailed post about it here.

— Nolan Conaway
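The `theta` parameter in the function above multiplies the scores before exponentiation, so it acts like an inverse temperature. Here is a small self-contained sketch. The name `softmax_theta` is illustrative, and it only handles the last axis. A larger theta pushes the output toward a one-hot vector at the argmax, and theta = 0 gives a uniform distribution:

```python
import numpy as np

def softmax_theta(x, theta=1.0):
    """Stable softmax over the last axis of theta * x."""
    y = np.asarray(x, dtype=float) * theta
    y = y - np.max(y, axis=-1, keepdims=True)
    e_y = np.exp(y)
    return e_y / e_y.sum(axis=-1, keepdims=True)

scores = np.array([3.0, 1.0, 0.2])
for theta in (0.0, 0.5, 1.0, 5.0):
    print(theta, softmax_theta(scores, theta).round(4))
```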
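Here you can find out why they subtract the max.

From there:

"When you're writing code for computing the Softmax function in practice, the intermediate terms may be very large due to the exponentials. Dividing large numbers can be numerically unstable, so it is important to use a normalization trick."

— Sadegh Salehi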
A more concise version is:

def softmax(x):
    return np.exp(x) / np.exp(x).sum(axis=0)

— Pimin Konstantin Kefaloukos
This could run into arithmetic overflow

— minhle_r7

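To offer an alternative solution, consider the cases where your arguments are so large in magnitude that exp(x) would underflow (in the negative case) or overflow (in the positive case). Here you want to stay in log space as long as possible, exponentiating only at the end, where you can trust the result to be well-behaved.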

import scipy.special as sc
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    return np.exp(x - sc.logsumexp(x))

— PikalaxALT
To make it equivalent to the poster's code, you need to pass axis=0 as an argument to logsumexp.

— Björn Lindqvist

Alternatively, one could unpack extra args to pass to logsumexp.

— PikalaxALT
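If SciPy is not available, the same log-space idea can be sketched with NumPy alone. The helper names below are illustrative and are not a library API. This `logsumexp` uses the usual max shift. Its `keepdims=True` option addresses the axis point raised in the comments, because the reduced result then broadcasts back correctly for any axis, not only axis=0:

```python
import numpy as np

def logsumexp(x, axis=None, keepdims=False):
    """log(sum(exp(x))) along `axis`, computed stably by factoring out the max."""
    x = np.asarray(x, dtype=float)
    m = np.max(x, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))
    return out if keepdims else np.squeeze(out, axis=axis)

def log_softmax(x, axis=-1):
    """Log of the softmax: x minus logsumexp(x), never leaving log space."""
    x = np.asarray(x, dtype=float)
    return x - logsumexp(x, axis=axis, keepdims=True)

def softmax(x, axis=-1):
    # Exponentiate only at the very end.
    return np.exp(log_softmax(x, axis=axis))

x = np.array([[1000.0, 1001.0, 1002.0],
              [-1000.0, -1001.0, -1002.0]])
print(softmax(x, axis=-1))
print(log_softmax(x, axis=-1))   # finite values, no overflow or underflow
```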

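I needed something compatible with the output of a dense layer from Tensorflow.

The solution from @desertnaut does not work in this case because I have batches of data. Therefore, I came up with another solution that should work in both cases: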

def softmax(x, axis=-1):
    e_x = np.exp(x - np.max(x)) # same code
    return e_x / e_x.sum(axis=axis, keepdims=True)

Results:

logits = np.asarray([
    [-0.0052024,  -0.00770216,  0.01360943, -0.008921], # 1
    [-0.0052024,  -0.00770216,  0.01360943, -0.008921]  # 2
])

print(softmax(logits))

#[[0.2492037  0.24858153 0.25393605 0.24827873]
# [0.2492037  0.24858153 0.25393605 0.24827873]]

Reference: Tensorflow softmax

— Lucas Casagrande
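Just keep in mind that the answer refers to the very specific setting described in the question; it was never meant to be "how to compute the softmax in general, under any circumstances, or in whatever data format you like"...

— desertnaut

I see. I put this here because the question refers to "Udacity's deep learning class", and that solution would not work if you use Tensorflow to build your model. Your solution is cool and clean, but it only works in a very specific scenario. Thanks anyway.

— Lucas Casagrande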
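One detail in the batched version above: `np.max(x)` without an axis subtracts a single global max from every row. That is still mathematically correct, since any constant cancels. However, if one row's scores sit far below the global max, all of that row's exponentials can underflow to 0, and that row becomes 0/0. This small sketch uses made-up logits and compares the global max against a per-row max:

```python
import numpy as np

def softmax_global_max(x, axis=-1):
    e_x = np.exp(x - np.max(x))                              # one max for the whole array
    return e_x / e_x.sum(axis=axis, keepdims=True)

def softmax_row_max(x, axis=-1):
    e_x = np.exp(x - np.max(x, axis=axis, keepdims=True))   # one max per row
    return e_x / e_x.sum(axis=axis, keepdims=True)

logits = np.array([[0.0, 1.0],
                   [-1000.0, -999.0]])   # second row is far below the global max

with np.errstate(invalid="ignore"):
    print(softmax_global_max(logits))    # second row: [nan nan]
print(softmax_row_max(logits))           # both rows about [0.2689 0.7311]
```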

I would suggest this:

def softmax(z):
    z_norm=np.exp(z-np.max(z,axis=0,keepdims=True))
    return(np.divide(z_norm,np.sum(z_norm,axis=0,keepdims=True)))

It works for single (stochastic) inputs as well as for batches.
For more details, see: https://medium.com/@ravish1729/analysis-of-softmax-function-ad058d6a564d

— Ravish Kumar Sharma
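Note that `axis=0` in this version normalizes each column. It therefore assumes samples are stored as columns, with shape (n_classes, n_samples). This quick sketch uses made-up data to show that it agrees with a row-wise softmax applied to the transposed array:

```python
import numpy as np

def softmax_cols(z):
    # Same idea as the answer above: one max and one sum per column (axis=0).
    z_norm = np.exp(z - np.max(z, axis=0, keepdims=True))
    return z_norm / np.sum(z_norm, axis=0, keepdims=True)

def softmax_rows(z):
    z_norm = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return z_norm / np.sum(z_norm, axis=-1, keepdims=True)

# 3 classes x 2 samples, one sample per column
z = np.array([[1.0, 3.0],
              [2.0, 8.0],
              [3.0, 7.0]])

print(softmax_cols(z).sum(axis=0))                        # each column sums to 1
print(np.allclose(softmax_cols(z), softmax_rows(z.T).T))  # True
```

If your batches are stored one sample per row instead, which is the layout most frameworks use, the reduction axis has to change accordingly.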
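To maintain numerical stability, max(x) should be subtracted. Here is the code for the softmax function:

import numpy as np

def softmax(x):
    # Work on a float copy so the caller's array is not modified in place
    x = np.asarray(x, dtype=float)
    if len(x.shape) > 1:
        # 2-D input: subtract each row's max, then normalize each row
        tmp = np.max(x, axis = 1)
        x = x - tmp.reshape((x.shape[0], 1))
        x = np.exp(x)
        tmp = np.sum(x, axis = 1)
        x = x / tmp.reshape((x.shape[0], 1))
    else:
        # 1-D input: subtract the global max, then normalize
        tmp = np.max(x)
        x = x - tmp
        x = np.exp(x)
        tmp = np.sum(x)
        x = x / tmp

    return x

— Rahul Ahuja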
This has already been answered in detail above: max is subtracted to avoid overflow. I am adding one more implementation here, in Python 3.

import numpy as np
def softmax(x):
    mx = np.amax(x,axis=1,keepdims = True)
    x_exp = np.exp(x - mx)
    x_sum = np.sum(x_exp, axis = 1, keepdims = True)
    res = x_exp / x_sum
    return res

x = np.array([[3,2,4],[4,5,6]])
print(softmax(x))

— 废话
Everybody seems to be posting their solution, so I'll post mine:

def softmax(x):
    e_x = np.exp(x.T - np.max(x, axis = -1))
    return (e_x / e_x.sum(axis=0)).T

I get exactly the same results as with the one imported from sklearn:

from sklearn.utils.extmath import softmax

— Julian
import tensorflow as tf
import numpy as np

def softmax(x):
    return (np.exp(x).T / np.exp(x).sum(axis=-1)).T

logits = np.array([[1, 2, 3], [3, 10, 1], [1, 2, 5], [4, 6.5, 1.2], [3, 6, 1]])

sess = tf.Session()
print(softmax(logits))
print(sess.run(tf.nn.softmax(logits)))
sess.close()

— King
Welcome to SO. An explanation of how your code answers the question is always helpful.

— Nick

Based on all the responses and the CS231n notes, allow me to summarise:

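def softmax(x, axis):
    # Subtract the max without modifying the caller's array (no in-place -=)
    x = x - np.max(x, axis=axis, keepdims=True)
    e_x = np.exp(x)  # compute the exponentials only once
    return e_x / e_x.sum(axis=axis, keepdims=True)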

Usage:

x = np.array([[1, 0, 2,-1],
              [2, 4, 6, 8], 
              [3, 2, 1, 0]])
softmax(x, axis=1).round(2)

Output:

array([[0.24, 0.09, 0.64, 0.03],
       [0.  , 0.02, 0.12, 0.86],
       [0.64, 0.24, 0.09, 0.03]])

— remykarem
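I would like to add a little more understanding of the problem. Subtracting the max of the array is correct here. But if you run the code from the other post, you will find that it does not give the right answer when the array is 2-D or higher-dimensional.

Here are some suggestions:

To get the max, try doing it along the x-axis; you will get a 1-D array.
Reshape your max array back to the original shape.
Apply np.exp to get the exponential values.
Do np.sum along the axis.
Get the final result.

Follow these steps with vectorization and you will get the correct answer. Since this is related to college homework, I cannot post the exact code here, but I am happy to give more suggestions if anything is unclear.

— Xu Hao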
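It is not related to any college homework, only to an ungraded practice quiz in a non-accredited course, where the correct answer is provided in the next step...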

— desertnaut
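Since the comment above notes that this is not graded homework, here is one possible reading of those five steps as code. It is a sketch that assumes one sample per row, so "along the axis" means axis=1. The name `softmax_by_steps` is illustrative, and the input reuses two of the sample rows from earlier answers:

```python
import numpy as np

def softmax_by_steps(x):
    x = np.asarray(x, dtype=float)
    # 1. Max along the sample axis gives a 1-D array, one max per row.
    row_max = np.max(x, axis=1)
    # 2. Reshape it into a column so it broadcasts against x.
    row_max = row_max.reshape(x.shape[0], 1)
    # 3. Exponentiate the shifted scores.
    e_x = np.exp(x - row_max)
    # 4. Sum along the same axis (reshaped again for broadcasting).
    row_sum = np.sum(e_x, axis=1).reshape(x.shape[0], 1)
    # 5. Divide to get the final result.
    return e_x / row_sum

x = np.array([[1, 2, 3, 6],
              [2, 4, 5, 6]])
print(softmax_by_steps(x))
```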

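The purpose of the softmax function is to preserve the ratios of the vector, rather than squashing the end-points with a sigmoid as the values saturate (i.e. tend to +/- 1 for tanh, or to 0 and 1 for the logistic function). Because it preserves more information about the rate of change at the end-points, it is better suited to neural nets with 1-of-N output encoding. If we squashed the end-points, it would be harder to tell the 1-of-N output classes apart, because we could not say which one is the "largest" or "smallest" once they have been squished. It also makes the total output sum to 1: the clear winner will be close to 1, while other numbers that are close to each other will each be about 1/p, where p is the number of output neurons with similar values.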
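The ratio property can be checked numerically. Because softmax(x)_i / softmax(x)_j = e^{x_i} / e^{x_j} = e^{x_i - x_j}, the ratios between outputs depend only on the differences between the scores. This short sketch uses the scores from the question:

```python
import numpy as np

x = np.array([3.0, 1.0, 0.2])
s = np.exp(x) / np.exp(x).sum()            # plain softmax, no shift

# Ratio of the first two outputs versus exp of the difference of the scores.
print(s[0] / s[1], np.exp(x[0] - x[1]))    # both about 7.389 (= e^2)

# The same scores minus their max: every output, and so every ratio, is unchanged.
x_shifted = x - np.max(x)
s_shifted = np.exp(x_shifted) / np.exp(x_shifted).sum()
print(np.allclose(s, s_shifted))           # True
```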

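The purpose of subtracting the max value from the vector is that when you compute e^y you may get very large values that clip the float at its maximum, leading to a tie, which is not the case in this example. This becomes a BIG problem if you subtract the max value to make the numbers negative: then you have negative exponents that rapidly shrink the values and alter the ratio, which is what happened in the poster's question and produced the incorrect answer.

The answer provided by Udacity is HORRIBLE. The first thing we need to do is calculate e^y_j for all vector components, KEEP THOSE VALUES, then sum them up and divide. Where Udacity messed up is that they calculate e^y_j TWICE! Here is the correct answer: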

def softmax(y):
    e_to_the_y_j = np.exp(y)
    return e_to_the_y_j / np.sum(e_to_the_y_j, axis=0)

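The goal was to achieve similar results with Numpy and Tensorflow. The only change from the original answer is the axis parameter of np.sum.

Initial approach: axis=0. This, however, does not give the intended results when the input has N dimensions.

Modified approach: axis=len(e_x.shape)-1, i.e. always sum over the last dimension. This gives results similar to tensorflow's softmax function.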

def softmax_fn(input_array):
    """
    | **@author**: Prathyush SP
    |
    | Calculate Softmax for a given array
    :param input_array: Input Array
    :return: Softmax Score
    """
    e_x = np.exp(input_array - np.max(input_array))
    # keepdims=True keeps the summed axis so the division broadcasts per row
    return e_x / e_x.sum(axis=len(e_x.shape)-1, keepdims=True)

— kingspp
Here is a generalized solution using numpy, with a correctness comparison against tensorflow and scipy:

Data preparation:

import numpy as np

np.random.seed(2019)

batch_size = 1
n_items = 3
n_classes = 2
logits_np = np.random.rand(batch_size,n_items,n_classes).astype(np.float32)
print('logits_np.shape', logits_np.shape)
print('logits_np:')
print(logits_np)

Output:

logits_np.shape (1, 3, 2)
logits_np:
[[[0.9034822  0.3930805 ]
  [0.62397    0.6378774 ]
  [0.88049906 0.299172  ]]]

Softmax using tensorflow:

import tensorflow as tf

logits_tf = tf.convert_to_tensor(logits_np, np.float32)
scores_tf = tf.nn.softmax(logits_np, axis=-1)

print('logits_tf.shape', logits_tf.shape)
print('scores_tf.shape', scores_tf.shape)

with tf.Session() as sess:
    scores_np = sess.run(scores_tf)

print('scores_np.shape', scores_np.shape)
print('scores_np:')
print(scores_np)

print('np.sum(scores_np, axis=-1).shape', np.sum(scores_np,axis=-1).shape)
print('np.sum(scores_np, axis=-1):')
print(np.sum(scores_np, axis=-1))

Output:

logits_tf.shape (1, 3, 2)
scores_tf.shape (1, 3, 2)
scores_np.shape (1, 3, 2)
scores_np:
[[[0.62490064 0.37509936]
  [0.4965232  0.5034768 ]
  [0.64137274 0.3586273 ]]]
np.sum(scores_np, axis=-1).shape (1, 3)
np.sum(scores_np, axis=-1):
[[1. 1. 1.]]

Softmax using scipy:

from scipy.special import softmax

scores_np = softmax(logits_np, axis=-1)

print('scores_np.shape', scores_np.shape)
print('scores_np:')
print(scores_np)

print('np.sum(scores_np, axis=-1).shape', np.sum(scores_np, axis=-1).shape)
print('np.sum(scores_np, axis=-1):')
print(np.sum(scores_np, axis=-1))

Output:

scores_np.shape (1, 3, 2)
scores_np:
[[[0.62490064 0.37509936]
  [0.4965232  0.5034768 ]
  [0.6413727  0.35862732]]]
np.sum(scores_np, axis=-1).shape (1, 3)
np.sum(scores_np, axis=-1):
[[1. 1. 1.]]

Softmax using numpy (https://nolanbconaway.github.io/blog/2017/softmax-numpy):

def softmax(X, theta = 1.0, axis = None):
    """
    Compute the softmax of each element along an axis of X.

    Parameters
    ----------
    X: ND-Array. Probably should be floats.
    theta (optional): float parameter, used as a multiplier
        prior to exponentiation. Default = 1.0
    axis (optional): axis to compute values along. Default is the
        first non-singleton axis.

    Returns an array the same size as X. The result will sum to 1
    along the specified axis.
    """

    # make X at least 2d
    y = np.atleast_2d(X)

    # find axis
    if axis is None:
        axis = next(j[0] for j in enumerate(y.shape) if j[1] > 1)

    # multiply y against the theta parameter,
    y = y * float(theta)

    # subtract the max for numerical stability
    y = y - np.expand_dims(np.max(y, axis = axis), axis)

    # exponentiate y
    y = np.exp(y)

    # take the sum along the specified axis
    ax_sum = np.expand_dims(np.sum(y, axis = axis), axis)

    # finally: divide elementwise
    p = y / ax_sum

    # flatten if X was 1D
    if len(X.shape) == 1: p = p.flatten()

    return p


scores_np = softmax(logits_np, axis=-1)

print('scores_np.shape', scores_np.shape)
print('scores_np:')
print(scores_np)

print('np.sum(scores_np, axis=-1).shape', np.sum(scores_np, axis=-1).shape)
print('np.sum(scores_np, axis=-1):')
print(np.sum(scores_np, axis=-1))

Output:

scores_np.shape (1, 3, 2)
scores_np:
[[[0.62490064 0.37509936]
  [0.49652317 0.5034768 ]
  [0.64137274 0.3586273 ]]]
np.sum(scores_np, axis=-1).shape (1, 3)
np.sum(scores_np, axis=-1):
[[1. 1. 1.]]

— mrgloom
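The softmax function is an activation function that turns numbers into probabilities that sum to 1. It outputs a vector representing a probability distribution over a list of outcomes. It is also a core element of deep learning classification tasks.

The softmax function is used when we have multiple classes.

It is useful for finding the class with the maximum probability.

The softmax function is ideally used in the output layer, where we are trying to obtain the probabilities that define the class of each input.

Its values range from 0 to 1.

The softmax function turns the logits [2.0, 1.0, 0.1] into the probabilities [0.66, 0.24, 0.10] (roughly [0.7, 0.2, 0.1]), and the probabilities sum to 1. Logits are the raw scores output by the last layer of a neural network, before activation. To understand the softmax function, we have to look at the output of the (n-1)th layer.

In fact, the softmax function is a soft (smooth) version of the arg max function. Rather than returning the largest value of the input, it puts most of the probability mass at the position of the largest value.

For example:

Before softmax

X = [13, 31, 5]

After softmax

array([1.52299795e-08, 9.99999985e-01, 5.10908895e-12])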
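The example above can be reproduced with a stable softmax. The sketch below also checks the "soft argmax" reading: the largest probability sits at the same index as the largest input, and all probabilities sum to 1.

```python
import numpy as np

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

X = np.array([13, 31, 5])
p = softmax(X)

print(p)                               # about [1.523e-08, 1.0, 5.109e-12]
print(np.argmax(p) == np.argmax(X))    # True: same position as the max input
print(np.isclose(p.sum(), 1.0))        # True
```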

Code:

import numpy as np

# your solution:
def your_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

# correct solution:
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)  # only difference

— Krishna Vel