Tensorflow Keras Compile Options Binary_crossentropy

In this tutorial, we will focus on how to select Accuracy Metrics, Activation & Loss functions in Binary Classification Problems.

First, we will review the types of Classification Problems, Activation & Loss functions, label encodings, and accuracy metrics.

Furthermore, we will discuss how the target encoding can impact the selection of Activation & Loss functions.

Moreover, we will talk about how to select the accuracy metric correctly.

Then, for each type of classification problem, we will apply several Activation & Loss functions and observe their effects on performance.

We will experiment with all the concepts by designing and evaluating a deep learning model using Transfer Learning on the horses and humans dataset.

In the end, we will summarize the experiment results.

I split the tutorial into three parts. In this first part, we will focus on Binary Classification. In the next parts, we will focus on multi-class classification and multi-label classification.

If you would like to learn more about Deep Learning with practical coding examples, please subscribe to my YouTube Channel or follow my blog on Medium. Do not forget to turn on notifications so that you will be notified when new parts are uploaded.

You can access this Colab Notebook using the link given in the video description below.

Furthermore, you can watch this notebook on YouTube as well!

If you are ready, let's get started!

You can watch this notebook on the Murat Karakaya Akademi YouTube channel.

Types of Classification Tasks

In general, there are three main types/categories of Classification Tasks in machine learning:

A. binary classification
two target classes

B. multi-class classification
more than two exclusive targets, only one class can be assigned to an input

C. multi-label classification
more than two non-exclusive targets, one input can be labeled with multiple target classes.

We will see the details of each classification task along with an example dataset and Keras model below.

Types of Label Encoding

In general, we can use different encodings for true (actual) labels (y values):

  • a floating number
    (e.g. in binary classification: 1 or 0)
  • one-hot encoding
    (e.g. in multi-class classification: [0 0 1 0 0])
  • a vector (array) of integers
    (e.g. in multi-label classification: [14 225 3])

We will cover all possible encodings in the following examples.

Types of Activation Functions for Classification Tasks

In Keras, there are several Activation Functions. Below I summarize two of them:

  • Sigmoid or Logistic Activation Function:
    The sigmoid function maps any input to an output ranging from 0 to 1. For small values (<-5), sigmoid returns a value close to zero, and for large values (>5) the result of the function gets close to 1. Sigmoid is equivalent to a two-element Softmax, where the second element is assumed to be zero. Therefore, sigmoid is generally used for binary classification.

Example: Assume the last layer of the model is as follows:

outputs = keras.layers.Dense(1, activation=tf.keras.activations.sigmoid)(x)

              # Let the final layer output vector be:
y_pred_logit = tf.constant([-20, -1.0, 0.0, 1.0, 20], dtype=tf.float32)
print("y_pred_logit:", y_pred_logit.numpy())
# and the final layer activation function is sigmoid:
y_pred_prob = tf.keras.activations.sigmoid(y_pred_logit)
print("y_pred:", y_pred_prob.numpy())
print("sum of all the elements in y_pred: ", y_pred_prob.numpy().sum())
y_pred_logit: [-20.  -1.   0.   1.  20.]
y_pred: [2.0611537e-09 2.6894143e-01 5.0000000e-01 7.3105860e-01 1.0000000e+00]
sum of all the elements in y_pred:  2.5
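We can verify the "sigmoid is a two-element softmax" claim above with plain Python, without TensorFlow. This is a minimal sketch; the helper functions `sigmoid` and `softmax2` are illustrative names of mine, not part of the Keras API:

```python
import math

def sigmoid(x):
    # Logistic function: maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def softmax2(a, b):
    # Two-element softmax, stabilized by subtracting the max logit
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb)

# sigmoid(x) equals the first element of a softmax over [x, 0]
for logit in [-20.0, -1.0, 0.0, 1.0, 20.0]:
    assert abs(sigmoid(logit) - softmax2(logit, 0.0)) < 1e-12
```

So fixing the second logit at a constant zero turns a two-class softmax into exactly the sigmoid used above.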
  • Softmax function:
    Softmax converts a real vector to a vector of categorical probabilities. The elements of the output vector are in range (0, 1) and sum to 1. Each vector is handled independently. Softmax is often used as the activation for the last layer of a classification network because the result could be interpreted as a probability distribution. Therefore, Softmax is mostly used for multi-class or multi-label classification.

For example: Assume the last layer of the model is as follows (3 units to match the 3-class example below):

outputs = keras.layers.Dense(3, activation=tf.keras.activations.softmax)(x)

              # Assume the last layer output is:
y_pred_logit = tf.constant([[-20, -1.0, 4.5], [0.0, 1.0, 20]], dtype=tf.float32)
print("y_pred_logit:\n", y_pred_logit.numpy())
# and the last layer activation function is softmax:
y_pred_prob = tf.keras.activations.softmax(y_pred_logit)
print("y_pred:", y_pred_prob.numpy())
print("sum of all the elements in each vector in y_pred: ",
y_pred_prob.numpy()[0].sum(), " ",
y_pred_prob.numpy()[1].sum())
y_pred_logit:
[[-20.   -1.    4.5]
 [  0.    1.   20. ]]
y_pred: [[2.2804154e-11 4.0701381e-03 9.9592990e-01]
 [2.0611537e-09 5.6027964e-09 1.0000000e+00]]
sum of all the elements in each vector in y_pred:  1.0   1.0
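The per-vector normalization that tf.keras.activations.softmax performs can be sketched in plain Python as follows (a simplified illustration, not the library implementation):

```python
import math

def softmax(vec):
    # Subtract the max before exponentiating for numerical stability,
    # then normalize so the outputs sum to 1
    m = max(vec)
    exps = [math.exp(v - m) for v in vec]
    total = sum(exps)
    return [e / total for e in exps]

row = softmax([-20.0, -1.0, 4.5])
assert abs(sum(row) - 1.0) < 1e-12        # a proper probability distribution
assert all(0.0 < p < 1.0 for p in row)    # every element lies in (0, 1)
```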

These two activation functions are the most used ones for classification tasks in the last layer.

PLEASE NOTE THAT if we don't specify any activation function at the last layer, no activation is applied to the outputs of the layer (i.e. "linear" activation: a(x) = x).

Types of Loss Functions for Classification Tasks

In Keras, there are several Loss Functions. Below, I summarize the ones used in Classification tasks:

  • BinaryCrossentropy:
    Computes the cross-entropy loss between true labels and predicted labels. We use this cross-entropy loss when there are only two label classes (assumed to be 0 and 1). For each example, there should be a single floating-point value per prediction.
  • CategoricalCrossentropy:
    Computes the cross-entropy loss between the labels and predictions. We use this cross-entropy loss function when there are two or more label classes. We expect labels to be provided in a one-hot representation. If you want to provide labels as integers, please use SparseCategoricalCrossentropy loss. There should be # classes floating point values per feature.
  • SparseCategoricalCrossentropy:
    Computes the cross-entropy loss between the labels and predictions. We use this cross-entropy loss function when there are two or more label classes. We expect labels to be provided as integers. If you want to provide labels using a one-hot representation, please use CategoricalCrossentropy loss. There should be # classes floating point values per feature for y_pred and a single floating-point value per feature for y_true.
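To make the first of these concrete, here is the batch-mean math behind BinaryCrossentropy as a pure-Python sketch. The clipping constant eps mirrors Keras' epsilon and is an assumption of this illustration, not the exact library code:

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Mean of -[y*log(p) + (1-y)*log(1-p)] over the batch,
    # with probabilities clipped away from 0 and 1 to avoid log(0)
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Confident, correct predictions give a smaller loss than hesitant ones
assert binary_crossentropy([1.0, 0.0], [0.9, 0.1]) < \
       binary_crossentropy([1.0, 0.0], [0.6, 0.4])
```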

Important:

  1. In Keras, these three Cross-Entropy functions expect two inputs: correct / true / actual labels (y) and predicted labels (y_pred):
  • As mentioned above, correct (actual) labels can be encoded as floating numbers, one-hot, or an array of integer values.
  • However, the predicted labels should be presented as a probability distribution.
  • If the predicted labels are not converted to a probability distribution by the last layer of the model (using a sigmoid or softmax activation function), we need to inform these three Cross-Entropy functions by setting their from_logits = True.

2. If the parameter from_logits is set to True in any cross-entropy function, then the function expects ordinary numbers (logits) as predicted label values and applies a sigmoid transformation on these predicted label values to convert them into a probability distribution. For details, you can check the tf.keras.backend.binary_crossentropy source code. The below code is taken from the TF source code:

if from_logits: return nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output)
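We can check this equivalence numerically in plain Python. The formula in bce_from_logits below is the stable identity implemented by sigmoid_cross_entropy_with_logits; the helper names are mine, for illustration only:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_from_probs(y, p):
    # Cross-entropy on an already-sigmoid-transformed probability
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def bce_from_logits(y, x):
    # Stable closed form: max(x, 0) - x*y + log(1 + exp(-|x|))
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))

# Both routes agree, but the logits form never computes log(sigmoid(x))
for y in (0.0, 1.0):
    for x in (-3.0, -0.5, 0.0, 2.0):
        assert abs(bce_from_logits(y, x) - bce_from_probs(y, sigmoid(x))) < 1e-9
```

This is also why from_logits=True can be more numerically stable: for large negative logits, sigmoid underflows toward 0 and log(0) blows up, while the closed form stays finite.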

3. Both categorical cross-entropy and sparse categorical cross-entropy have the same loss function, which we have mentioned above. The only difference is the format of the true labels:

  • If correct (actual) labels are one-hot encoded, use categorical_crossentropy. Examples (for a 3-class classification): [1,0,0], [0,1,0], [0,0,1]
  • But if correct (actual) labels are integers, use sparse_categorical_crossentropy. Examples for the above 3-class classification problem: [1], [2], [3]
  • The usage entirely depends on how we load our dataset.
  • One advantage of using sparse categorical cross-entropy is that it saves storage in memory as well as time in computation, because it simply uses a single integer for a class, rather than a whole one-hot vector.
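The claim that only the label format differs can be sketched for a single example in plain Python (illustrative helpers, not the Keras implementations):

```python
import math

def categorical_crossentropy(one_hot, probs):
    # -sum(y_i * log(p_i)); only the "hot" class contributes
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)

def sparse_categorical_crossentropy(class_index, probs):
    # Same quantity, but the label is a single integer index
    return -math.log(probs[class_index])

probs = [0.1, 0.7, 0.2]
loss_onehot = categorical_crossentropy([0.0, 1.0, 0.0], probs)
loss_sparse = sparse_categorical_crossentropy(1, probs)
assert abs(loss_onehot - loss_sparse) < 1e-12  # identical losses
```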

I will explain the above concepts by designing models in three parts.

Types of Accuracy Metrics

Keras has several accuracy metrics. In classification, we can use the following ones:

  • Accuracy: Calculates how often predictions
    equal
    labels.

y_true = [[1],    [1],   [0],    [0]]
y_pred = [[0.99], [1.0], [0.01], [0.0]]

print("Which predictions equal to labels:", np.equal(y_true, y_pred).reshape(-1,))
m = tf.keras.metrics.Accuracy()
m.update_state(y_true, y_pred)
print("Accuracy: ", m.result().numpy())
Which predictions equal to labels: [False  True False  True]
Accuracy:  0.5
  • Binary Accuracy:
    Calculates how often predictions
    match
    binary labels.

y_true = [[1],    [1],    [0],   [0]]
y_pred = [[0.49], [0.51], [0.5], [0.51]]

m = tf.keras.metrics.binary_accuracy(y_true, y_pred, threshold=0.5)
print("Which predictions match with binary labels:", m.numpy())

m = tf.keras.metrics.BinaryAccuracy()
m.update_state(y_true, y_pred)
print("Binary Accuracy: ", m.result().numpy())

Which predictions match with binary labels: [0. 1. 1. 0.]
Binary Accuracy:  0.5
  • Categorical Accuracy:
    Calculates how often predictions
    match
    one-hot labels.

# assume 3 classes exist
y_true = [[0, 0, 1], [0, 1, 0]]
y_pred = [[0.1, 0.9, 0.8], [0.05, 0.95, 0.3]]

m = tf.keras.metrics.categorical_accuracy(y_true, y_pred)
print("Which predictions match with one-hot labels:", m.numpy())
m = tf.keras.metrics.CategoricalAccuracy()
m.update_state(y_true, y_pred)
print("Categorical Accuracy:", m.result().numpy())

Which predictions match with one-hot labels: [0. 1.]
Categorical Accuracy: 0.5
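The three metrics above differ only in how a "hit" is counted. The following simplified pure-Python versions reproduce the 0.5 results from the Keras snippets; they are sketches that ignore tensor shapes, sample weighting, and other details the real metrics handle:

```python
def accuracy(y_true, y_pred):
    # A hit only when prediction and label are exactly equal
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_accuracy(y_true, y_prob, threshold=0.5):
    # Threshold the probability, then compare against the 0/1 label
    return sum((p > threshold) == (t == 1)
               for t, p in zip(y_true, y_prob)) / len(y_true)

def categorical_accuracy(y_true_onehot, y_pred_probs):
    # Compare argmax of the prediction with argmax of the one-hot label
    return sum(p.index(max(p)) == t.index(max(t))
               for t, p in zip(y_true_onehot, y_pred_probs)) / len(y_true_onehot)

assert accuracy([1, 1, 0, 0], [0.99, 1.0, 0.01, 0.0]) == 0.5
assert binary_accuracy([1, 1, 0, 0], [0.49, 0.51, 0.5, 0.51]) == 0.5
assert categorical_accuracy([[0, 0, 1], [0, 1, 0]],
                            [[0.1, 0.9, 0.8], [0.05, 0.95, 0.3]]) == 0.5
```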

Part A: Binary Classification (two target classes)

For a binary classification task, I will use the "horses_or_humans" dataset, which is available in TF Datasets.

A.1. True (Actual) Labels are encoded with a single floating number (1./0.)

First, let's load the data from TensorFlow Datasets

              ds_raw_train, ds_raw_test = tfds.load('horses_or_humans',
                  split=['train','test'], as_supervised=True)
print("Number of samples in train : ", ds_raw_train.cardinality().numpy(),
      " in test : ", ds_raw_test.cardinality().numpy())

Number of samples in train :  1027  in test :  256

def show_samples(dataset):
    fig = plt.figure(figsize=(14, 14))
    columns = 3
    rows = 3

    print(columns*rows, "samples from the dataset")
    i = 1
    for a, b in dataset.take(columns*rows):
        fig.add_subplot(rows, columns, i)
        plt.imshow(a)
        # plt.imshow(a.numpy())
        plt.title("image shape:" + str(a.shape) + " Label:" + str(b.numpy()))
        i = i + 1
    plt.show()

show_samples(ds_raw_test)

9 samples from the dataset

Notice that:

  • There are
    only two label classes:
    horses and humans.
  • For each sample, there is a
    single floating-point value per label:
    (0 → horse, 1 → human)

Let's resize and scale the images so that we can save time in training

              # VGG16 expects min 32 x 32
def resize_scale_image(image, label):
    image = tf.image.resize(image, [32, 32])
    image = image / 255.0
    return image, label

ds_train_resize_scale = ds_raw_train.map(resize_scale_image)
ds_test_resize_scale = ds_raw_test.map(resize_scale_image)
show_samples(ds_test_resize_scale)

9 samples from the dataset


Prepare the data pipeline by setting batch size & buffer size using tf.data

              batch_size = 64

# buffer_size = ds_train_resize_scale.cardinality().numpy()/10
# ds_resize_scale_batched = ds_raw.repeat(3).shuffle(buffer_size=buffer_size).batch(64)

ds_train_resize_scale_batched = ds_train_resize_scale.batch(batch_size, drop_remainder=True)
ds_test_resize_scale_batched = ds_test_resize_scale.batch(batch_size, drop_remainder=True)

print("Number of batches in train: ", ds_train_resize_scale_batched.cardinality().numpy())
print("Number of batches in test: ", ds_test_resize_scale_batched.cardinality().numpy())

Number of batches in train:  16
Number of batches in test:  4

To train fast, let's use Transfer Learning by importing VGG16

              base_model = keras.applications.VGG16(
    weights='imagenet',       # Load weights pre-trained on ImageNet.
    input_shape=(32, 32, 3),  # VGG16 expects min 32 x 32
    include_top=False)        # Do not include the ImageNet classifier at the top.
base_model.trainable = False

Create the classification model

              inputs = keras.Input(shape=(32, 32, 3))
x = base_model(inputs, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
initializer = tf.keras.initializers.GlorotUniform(seed=42)

activation = None  # tf.keras.activations.sigmoid or softmax

outputs = keras.layers.Dense(1,
    kernel_initializer=initializer,
    activation=activation)(x)
model = keras.Model(inputs, outputs)

Pay attention:

  • The last layer has only 1 unit. So the output (y_pred) will be a single floating point, like the true (actual) label (y_true).
  • For the last layer, the activation function can be:
  • None
  • sigmoid
  • softmax
  • When no activation function is used in the model's last layer, we need to set from_logits=True in the cross-entropy loss functions, as we discussed above. Thus, the cross-entropy loss functions will apply a sigmoid transformation on the predicted label values:
  • if from_logits: return nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output)

Compile the model

              model.compile(optimizer=keras.optimizers.Adam(),
    loss=keras.losses.BinaryCrossentropy(from_logits=True),  # default from_logits=False
    metrics=[keras.metrics.BinaryAccuracy()])

Important:
We need to use keras.metrics.BinaryAccuracy() for measuring the accuracy since it calculates how often predictions match binary labels.

  • As we mentioned above, Keras does not define a single accuracy metric, but several different ones, among them: accuracy, binary_accuracy and categorical_accuracy.
  • What happens under the hood is that, if you mistakenly select categorical cross-entropy as your loss function in binary classification and if you do not specify a particular accuracy metric, by just writing

metrics="Accuracy"

Keras (wrongly...) infers that you are interested in the categorical_accuracy, and this is what it returns, while in fact you are interested in the binary_accuracy, since our problem is a binary classification.

In summary:

  • to get model.fit() and model.evaluate() to run correctly (without mixing up the loss function and the classification problem at hand), we need to specify the actual accuracy metric!
  • if the true (actual) labels are encoded as binary (0./1.), you need to use keras.metrics.BinaryAccuracy() for measuring the accuracy, since it calculates how often predictions match binary labels.

Try & See

Now, we can try and see the performance of the model by using a combination of activation and loss functions.

Each epoch takes almost 15 seconds on Colab TPU accelerator.

              model.fit(ds_train_resize_scale_batched, validation_data=ds_test_resize_scale_batched, epochs=20)
              Epoch 1/20
16/16 [==============================] - 17s 1s/step - loss: 0.7149 - binary_accuracy: 0.4824 - val_loss: 0.6762 - val_binary_accuracy: 0.5039
...
...
Epoch 19/20
16/16 [==============================] - 17s 1s/step - loss: 0.3041 - binary_accuracy: 0.8730 - val_loss: 0.5146 - val_binary_accuracy: 0.8125
Epoch 20/20
16/16 [==============================] - 17s 1s/step - loss: 0.2984 - binary_accuracy: 0.8809 - val_loss: 0.5191 - val_binary_accuracy: 0.8125

model.evaluate(ds_test_resize_scale_batched)

4/4 [==============================] - 2s 556ms/step - loss: 0.5191 - binary_accuracy: 0.7266

[0.519140362739563, 0.7265625]

Obtained Results*:

*When you run this notebook, most probably you will not get the exact numbers; rather, you will observe very similar values, due to the stochastic nature of ANNs.

Note that:

  • Generally, we use softmax activation instead of sigmoid with the cross-entropy loss because softmax activation distributes the probability throughout each output node.
  • But, for binary classification, we use sigmoid rather than softmax.
  • The practical reason is that softmax is specially designed for multi-class and multi-label classification tasks.
  • Sigmoid is equivalent to a two-element Softmax, where the second element is assumed to be zero. Therefore, sigmoid is generally used for binary classification.
  • The above results support this recommendation.

Why do BinaryCrossentropy loss functions with from_logits=True lead to good accuracy without any activation function?

Because using from_logits=True tells the BinaryCrossentropy loss function to apply its own sigmoid transformation over the inputs:

if from_logits: return nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output)

In the Keras documentation: "Using from_logits=True may be more numerically stable."

In summary:

We can conclude that, if the task is binary classification and true (actual) labels are encoded as a single floating number (0./1.), we have two options to go with:

  • Option 1: activation = sigmoid, loss = BinaryCrossentropy(), accuracy metric = BinaryAccuracy()
  • Option 2: activation = None, loss = BinaryCrossentropy(from_logits=True), accuracy metric = BinaryAccuracy()

A.2. True (Actual) Labels are one-hot encoded: [1 0] or [0 1]

Normally, in binary classification problems, we do not use one-hot encoding for y_true values. However, I would like to investigate the effects of doing so. In your real-life applications, it is up to you how to encode your y_true. You can think of this section as an experiment.

First, convert the true (actual) label encoding to one-hot

              def one_hot(image, label):
    label = tf.one_hot(label, depth=2)
    return image, label

ds_train_resize_scale_one_hot = ds_train_resize_scale.map(one_hot)
ds_test_resize_scale_one_hot = ds_test_resize_scale.map(one_hot)
show_samples(ds_test_resize_scale_one_hot)

9 samples from the dataset


Notice that:

  • There are
    only two label classes:
    horses and humans.
  • Labels are now
    one-hot encoded:

[1. 0.] → horse,
[0. 1.] → human

Prepare the data pipeline by setting the batch size

              ds_train_resize_scale_one_hot_batched=ds_train_resize_scale_one_hot.batch(64)
ds_test_resize_scale_one_hot_batched=ds_test_resize_scale_one_hot.batch(64)

Create the classification model

              inputs = keras.Input(shape=(32, 32, 3))
x = base_model(inputs, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)

initializer = tf.keras.initializers.GlorotUniform(seed=42)
activation = None  # tf.keras.activations.sigmoid or softmax

outputs = keras.layers.Dense(2,
    kernel_initializer=initializer,
    activation=activation)(x)

model = keras.Model(inputs, outputs)

Pay attention:

  • The last layer now has 2 units instead of 1. Thus the output will support one-hot encoding of the true (actual) label. Remember that the one-hot vector has
    two floating-point numbers
    in binary classification: [1. 0.] or [0. 1.]
  • For the last layer, the activation function can be:
  • None
  • sigmoid
  • softmax
  • When no activation function is used, we need to set from_logits=True in the cross-entropy functions, as we discussed above.

Compile the model

              model.compile(optimizer=keras.optimizers.Adam(),
    loss=keras.losses.CategoricalCrossentropy(from_logits=True),  # default from_logits=False
    metrics=[keras.metrics.CategoricalAccuracy()])

Important:
We need to use keras.metrics.CategoricalAccuracy() for measuring the accuracy since it calculates how often predictions match one-hot labels. Do NOT use just metrics=['accuracy'] as a performance metric! Because, as explained above in detail:

  • Keras does not define a single accuracy metric, but several different ones, among them: accuracy, binary_accuracy and categorical_accuracy.
  • What happens under the hood is that, if you mistakenly select binary cross-entropy as your loss function when y_true is encoded one-hot, and do not specify a particular accuracy metric, instead providing only:
  • metrics="Accuracy"
  • Keras (wrongly...) infers that you are interested in the binary_accuracy, and this is what it returns, while in fact you are interested in the categorical_accuracy (because of the one-hot encoding!).

In summary,

  • to get model.fit() and model.evaluate() to run correctly (without mixing up the loss function and the classification problem at hand), we need to specify the actual accuracy metric!
  • if the true (actual) labels are one-hot encoded, you need to use keras.metrics.CategoricalAccuracy() for measuring the accuracy, since it calculates how often predictions match one-hot labels.

Try & See

You can try and see the performance of the model by using a combination of activation and loss functions.

Each epoch takes almost 15 seconds on Colab TPU accelerator.

              model.fit(ds_train_resize_scale_one_hot_batched, validation_data=ds_test_resize_scale_one_hot_batched, epochs=20)
              Epoch 1/20
17/17 [==============================] - 17s 1s/step - loss: 0.8083 - categorical_accuracy: 0.4956 - val_loss: 0.7656 - val_categorical_accuracy: 0.4648
...
...
Epoch 19/20
17/17 [==============================] - 17s 997ms/step - loss: 0.2528 - categorical_accuracy: 0.9182 - val_loss: 0.5972 - val_categorical_accuracy: 0.7031
Epoch 20/20
17/17 [==============================] - 17s 1s/step - loss: 0.2476 - categorical_accuracy: 0.9211 - val_loss: 0.6044 - val_categorical_accuracy: 0.6992

model.evaluate(ds_test_resize_scale_one_hot_batched)

4/4 [==============================] - 2s 557ms/step - loss: 0.6044 - categorical_accuracy: 0.6992

Obtained Results*:

  • *When you run this notebook, most probably you will not get the exact numbers; rather, you will observe very similar values, due to the stochastic nature of ANNs.

Why do Binary and Categorical cross-entropy loss functions lead to similar accuracy?

I would like to remind you that when we tested the two loss functions with the true labels encoded as one-hot, the calculated loss values were very similar. Thus, the model converges by using the loss function results, and since both functions generate similar loss values, the resulting trained models have similar accuracy, as seen above.

Why do Sigmoid and Softmax activation functions lead to similar accuracy?

  • Since we use one-hot encoding for the true labels, sigmoid generates two floating numbers ranging from 0 to 1, but the sum of these two numbers does not necessarily equal 1 (they are not a probability distribution).
  • On the other hand, softmax generates two floating numbers ranging from 0 to 1, and the sum of these two numbers is exactly equal to 1.
  • Normally, the Binary and Categorical cross-entropy loss functions expect a probability distribution over the input values (when from_logits = False, as by default).
  • However, the sigmoid activation function output is not a probability distribution over these two outputs.
  • Nonetheless, the Binary and Categorical cross-entropy loss functions can consume sigmoid outputs and generate similar loss values.
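A quick pure-Python illustration of the first two bullets above, using a hypothetical two-unit last-layer output (illustrative helpers, not Keras code):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(vec):
    m = max(vec)
    exps = [math.exp(v - m) for v in vec]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 0.5]                      # hypothetical two-unit last-layer output

sig_out = [sigmoid(v) for v in logits]   # element-wise, each unit independent
soft_out = softmax(logits)               # normalized jointly over both units

assert abs(sum(soft_out) - 1.0) < 1e-12  # softmax always sums to exactly 1
assert sum(sig_out) > 1.0                # sigmoid outputs need not sum to 1
```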

Why 0.6992?

I have run the models for 20 epochs, starting with the same initial weights, to isolate the initial-weight effects on the performance. Here, 4 models reach the exact accuracy of 0.6992, and the rest similarly reach the exact accuracy of 0.7148. One reason might be that it is just chance. Another reason could be that all the loss calculations end up with the same values, so that the gradients are exactly the same. But it is not likely. I checked several times, and the process seems to be right. Please try it yourself at home :))

According to the above experiment results, if the task is binary classification and true (actual) labels are encoded as one-hot, we might have two options:

  • Option A: activation = None, loss = BinaryCrossentropy(from_logits=True), accuracy metric = CategoricalAccuracy()
  • Option B: activation = sigmoid, loss = BinaryCrossentropy(), accuracy metric = CategoricalAccuracy()

Binary Classification Summary

In a nutshell, in binary classification:

  • we use the floating numbers 0. or 1. to encode the class labels,
  • BinaryAccuracy is the correct accuracy metric,
  • (generally recommended) the last layer activation function is Sigmoid and the loss function is BinaryCrossentropy,
  • but we observed that the last layer activation function None with the loss function BinaryCrossentropy(from_logits=True) could also work.

So the summary of the experiments is below:

Next: Part B: Multi-Class Classification (more than two target classes)

You can follow me on these social networks:

YouTube

Facebook

Instagram

LinkedIn

Github

Kaggle

Medium

Source: https://medium.com/deep-learning-with-keras/which-activation-loss-functions-part-a-e16f5ad6d82a



