
Keras Loss Functions

  1. Definition
  2. mean_squared_error: mean squared error (mse)
  3. mean_absolute_error (mae)
  4. mean_absolute_percentage_error (mape)
  5. mean_squared_logarithmic_error
  6. squared_hinge
  7. hinge
  8. categorical_hinge
  9. logcosh
  10. categorical_crossentropy
  11. sparse_categorical_crossentropy
  12. binary_crossentropy
  13. kullback_leibler_divergence
  14. poisson
  15. cosine_proximity

Definition

The loss function is the objective that the model optimizes, so it is also called the objective function or optimization scoring function. In Keras, the loss argument of model.compile specifies which loss function to use, and there are two ways to specify it:

model.compile(loss='mean_squared_error', optimizer='sgd')

or

from keras import losses
model.compile(loss=losses.mean_squared_error, optimizer='sgd')

You can pass the name of an existing loss function, or a TensorFlow/Theano symbolic function. This symbolic function returns a scalar for each data point and takes the following two arguments:

y_true: the true labels (a TensorFlow/Theano tensor)
y_pred: the predictions (a TensorFlow/Theano tensor of the same shape as y_true)

The actual optimized objective is the mean of the output array across all data points.
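For example, a custom loss written in this form can be passed to compile just like the built-in ones. The sketch below is illustrative only: a Huber-style loss; the name my_huber_loss and the delta value are not part of Keras, and model is assumed to be an already-defined model as in the snippets above.

from keras import backend as K

def my_huber_loss(y_true, y_pred):
    # quadratic for small errors, linear for large ones (delta is the cut-over point)
    delta = 1.0
    err = K.abs(y_pred - y_true)
    quadratic = 0.5 * K.square(err)
    linear = delta * err - 0.5 * delta ** 2
    return K.mean(K.switch(err < delta, quadratic, linear), axis=-1)

model.compile(loss=my_huber_loss, optimizer='sgd')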

mean_squared_error: mean squared error (mse)

def mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)

LaTeX:

L=\frac{1}{n}\sum_{i=1}^{n}(y_{true}^{(i)}-y_{pred}^{(i)})^2
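As a quick sanity check of the formula (a minimal sketch; the numbers are made up, and K.constant / K.eval are used only to evaluate the loss tensor directly):

from keras import backend as K
from keras import losses

y_true = K.constant([[0., 1., 2.]])
y_pred = K.constant([[0.5, 1.0, 1.5]])
print(K.eval(losses.mean_squared_error(y_true, y_pred)))
# [0.16666667]  ==  (0.25 + 0.0 + 0.25) / 3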

mean_absolute_error (mae)

def mean_absolute_error(y_true, y_pred):
    return K.mean(K.abs(y_pred - y_true), axis=-1)

LaTeX:

L=\frac{1}{n}\sum_{i=1}^{n}|y_{true}^{(i)}-y_{pred}^{(i)}|

mean_absolute_percentage_error (mape)

def mean_absolute_percentage_error(y_true, y_pred):
    diff = K.abs((y_true - y_pred) / K.clip(K.abs(y_true),
                                            K.epsilon(),
                                            None))
    return 100. * K.mean(diff, axis=-1)

LaTeX:

L=\frac{1}{n}\sum^n_{i=1}\left|\frac{y_{true}^{(i)}-y_{pred}^{(i)}}{y_{true}^{(i)}}\right|\cdot 100

mean_squared_logarithmic_error

def mean_squared_logarithmic_error(y_true, y_pred):
    first_log = K.log(K.clip(y_pred, K.epsilon(), None) + 1.)
    second_log = K.log(K.clip(y_true, K.epsilon(), None) + 1.)
    return K.mean(K.square(first_log - second_log), axis=-1)

LaTeX:

L=\frac{1}{n}\sum^n_{i=1}(log(y_{true}^{(i)}+1)-log(y_{pred}^{(i)}+1))^2

squared_hinge

def squared_hinge(y_true, y_pred):
    return K.mean(K.square(K.maximum(1. - y_true * y_pred, 0.)), axis=-1)

LaTeX:

L=\frac{1}{n}\sum^n_{i=1}(max(0,1-y_{pred}^{(i)}\cdot y_{true}^{(i)}))^2

hinge

def hinge(y_true, y_pred):
    return K.mean(K.maximum(1. - y_true * y_pred, 0.), axis=-1)

LaTeX:

L=\frac{1}{n}\sum^n_{i=1}max(0,1-y_{pred}^{(i)}\cdot y_{true}^{(i)})
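Note that, as is usual for hinge losses, hinge and squared_hinge expect the binary targets y_true to be encoded as -1 and +1 rather than 0 and 1; with 0/1 targets the product y_true * y_pred is zero for the negative class, so that class contributes a constant term and no gradient signal.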

categorical_hinge

def categorical_hinge(y_true, y_pred):
    pos = K.sum(y_true * y_pred, axis=-1)
    neg = K.max((1. - y_true) * y_pred, axis=-1)
    return K.maximum(0., neg - pos + 1.)
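LaTeX (not given in the original; derived from the code above, with j indexing the classes of a one-hot y_true):

L=max(0,1+\max_{j}((1-y_{true,j})\cdot y_{pred,j})-\sum_{j}y_{true,j}\cdot y_{pred,j})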

logcosh

def logcosh(y_true, y_pred):
    """Logarithm of the hyperbolic cosine of the prediction error.

    `log(cosh(x))` is approximately equal to `(x ** 2) / 2` for small `x` and
    to `abs(x) - log(2)` for large `x`. This means that 'logcosh' works mostly
    like the mean squared error, but will not be so strongly affected by the
    occasional wildly incorrect prediction.

    # Arguments
        y_true: tensor of true targets.
        y_pred: tensor of predicted targets.

    # Returns
        Tensor with one scalar loss entry per sample.
    """
    def _logcosh(x):
        return x + K.softplus(-2. * x) - K.log(2.)
    return K.mean(_logcosh(y_pred - y_true), axis=-1)
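LaTeX (not given in the original; the helper _logcosh(x) = x + softplus(-2x) - log 2 is simply a numerically stable way of computing log(cosh(x))):

L=\frac{1}{n}\sum^n_{i=1}log(cosh(y_{pred}^{(i)}-y_{true}^{(i)}))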

categorical_crossentropy

def categorical_crossentropy(y_true, y_pred):
    return K.categorical_crossentropy(y_true, y_pred)

Note: when using the categorical_crossentropy loss, your targets should be in categorical format (i.e., if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all zeros except for a 1 at the index corresponding to the sample's class). To convert integer targets into categorical targets, you can use the Keras utility function to_categorical:

from keras.utils.np_utils import to_categorical
categorical_labels = to_categorical(int_labels, num_classes=None)
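For example, with three classes (the labels below are made up for illustration):

from keras.utils.np_utils import to_categorical

int_labels = [0, 2, 1]                              # integer class indices
categorical_labels = to_categorical(int_labels, num_classes=3)
# [[1., 0., 0.],
#  [0., 0., 1.],
#  [0., 1., 0.]]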

sparse_categorical_crossentropy

def sparse_categorical_crossentropy(y_true, y_pred):
    return K.sparse_categorical_crossentropy(y_true, y_pred)


# Backend (TensorFlow) implementation:
def sparse_categorical_crossentropy(target, output, from_logits=False):
    """Categorical crossentropy with integer targets.

    # Arguments
        target: An integer tensor.
        output: A tensor resulting from a softmax
            (unless `from_logits` is True, in which
            case `output` is expected to be the logits).
        from_logits: Boolean, whether `output` is the
            result of a softmax, or is a tensor of logits.

    # Returns
        Output tensor.
    """
    # Note: tf.nn.sparse_softmax_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
        output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
        output = tf.log(output)

    output_shape = output.get_shape()
    targets = cast(flatten(target), 'int64')
    logits = tf.reshape(output, [-1, int(output_shape[-1])])
    res = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=targets,
        logits=logits)
    if len(output_shape) >= 3:
        # if our output includes timestep dimension
        # or spatial dimensions we need to reshape
        return tf.reshape(res, tf.shape(output)[:-1])
    else:
        return res
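In contrast to categorical_crossentropy, this loss takes integer class indices directly, so the to_categorical step above is not needed. A minimal sketch (the model, the data, and the 10-class setting are assumptions for illustration):

# y_train holds integer class indices such as [3, 1, 9, ...],
# matching a model whose softmax output has 10 units.
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)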

binary_crossentropy

def binary_crossentropy(y_true, y_pred):
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)


# Backend (TensorFlow) implementation:
def binary_crossentropy(target, output, from_logits=False):
    """Binary crossentropy between an output tensor and a target tensor.

    # Arguments
        target: A tensor with the same shape as `output`.
        output: A tensor.
        from_logits: Whether `output` is expected to be a logits tensor.
            By default, we consider that `output`
            encodes a probability distribution.

    # Returns
        A tensor.
    """
    # Note: tf.nn.sigmoid_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        # transform back to logits
        _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
        output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
        output = tf.log(output / (1 - output))

    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target,
                                                   logits=output)
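LaTeX (not given in the original; this is the standard per-sample form of the quantity computed above, with y_pred a probability):

L=-\frac{1}{n}\sum^n_{i=1}(y_{true}^{(i)}\cdot log(y_{pred}^{(i)})+(1-y_{true}^{(i)})\cdot log(1-y_{pred}^{(i)}))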

kullback_leibler_divergence

def kullback_leibler_divergence(y_true, y_pred):
    y_true = K.clip(y_true, K.epsilon(), 1)
    y_pred = K.clip(y_pred, K.epsilon(), 1)
    return K.sum(y_true * K.log(y_true / y_pred), axis=-1)
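LaTeX (not given in the original; read off the code above, and note it is a sum over the last axis rather than a mean):

L=\sum^n_{i=1}y_{true}^{(i)}\cdot log(\frac{y_{true}^{(i)}}{y_{pred}^{(i)}})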

poisson

def poisson(y_true, y_pred):
    return K.mean(y_pred - y_true * K.log(y_pred + K.epsilon()), axis=-1)

LaTeX:

L=\frac{1}{n}\sum^n_{i=1}(y_{pred}^{(i)}-y_{true}^{(i)}\cdot log(y_{pred}^{(i)}))

cosine_proximity

def cosine_proximity(y_true, y_pred):
    y_true = K.l2_normalize(y_true, axis=-1)
    y_pred = K.l2_normalize(y_pred, axis=-1)
    return -K.sum(y_true * y_pred, axis=-1)

LaTeX:

L=-\frac{\sum^n_{i=1}y_{true}^{(i)}\cdot y_{pred}^{(i)}}{\sqrt{\sum^n_{i=1}(y_{true}^{(i)})^2}\cdot\sqrt{\sum^n_{i=1}(y_{pred}^{(i)})^2}}
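Because of the leading minus sign, this loss lies in [-1, 1] and reaches its minimum of -1 when y_pred points in exactly the same direction as y_true.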

Abbreviations:

mse = MSE = mean_squared_error
mae = MAE = mean_absolute_error
mape = MAPE = mean_absolute_percentage_error
msle = MSLE = mean_squared_logarithmic_error
kld = KLD = kullback_leibler_divergence
cosine = cosine_proximity
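Any of these aliases can be passed as a string to compile, for example (model is assumed, as in the earlier snippets):

model.compile(loss='mse', optimizer='sgd')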

Reference: Zhihu
