CHAPTER 4
The goal of binary classification is to make a prediction where the variable to predict can take on one of just two discrete values. For example, you might want to predict the sex (male or female) of a person based on their age, political party affiliation, annual income, and so on. Binary classification works somewhat differently than multiclass classification, where the variable to predict can be one of three or more possible discrete values.

Figure 4-1: Binary Classification using Keras
The screenshot in Figure 4-1 shows a demonstration of binary classification. The demo program begins by loading 178 training data items, 59 validation data items, and 60 test data items into memory. Each item represents a patient who has heart disease (1) or not (0). There are 13 predictor variables in the raw data. After normalization and encoding, there are 18 input variables.
Behind the scenes, the demo program creates an 18-(10-10)-1 deep neural network, that is, one with 18 input values (one for each predictor value), two hidden layers both with 10 nodes, and a single output node. The demo program trains the neural network model using 2,000 epochs. During training, the loss and accuracy values for both the training data and the validation data are displayed.
After training completes, the trained model achieves a prediction accuracy of 83.33 percent on the test data (50 of 60 correct, 10 incorrect). The demo concludes by making a prediction for a new, hypothetical, previously unseen patient. The predicted probability is 0.0197, and because the value is less than 0.5, the output maps to 0, which in turn maps to a prediction of "no heart disease."
The demo program uses the Cleveland Heart Disease dataset, a well-known classification benchmark dataset for statistics and machine learning. There are a total of 303 items. The raw data looks like this:
56.0, 1, 2, 120.0, 236.0, 0, 0, 178.0, 0, 0.8, 1, 3, 3, 0
62.0, 0, 4, 140.0, 268.0, 0, 2, 160.0, 0, 3.6, 3, 1, 6, 3
63.0, 1, 4, 130.0, 254.0, 0, 1, 147.0, 0, 1.4, 2, 2, ?, 2
53.0, 1, 1, 140.0, 203.0, 1, 2, 155.0, 1, 3.1, 3, 0, 7, 1
[0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] HD
The first 13 values on each line are the predictor values. The last value is 0 to 4, where 0 indicates no heart disease and 1 to 4 indicate heart disease of some kind. Predictor [0] is patient age. Predictor [1] is a Boolean sex (0 = female, 1 = male). Predictor [2] is categorical chest pain type encoded as 1 to 4.
Predictor [3] is blood pressure. Predictor [4] is cholesterol. Predictor [5] is a Boolean related to blood sugar (0 = low, 1 = high). Predictor [6] is categorical electrocardiographic result encoded as (0, 1, 2). Predictor [7] is maximum heart rate. Predictor [8] is a Boolean for angina (0 = no, 1 = yes). Predictor [9] is ST ("S-wave, T-wave") graph depression.
Predictor [10] is a categorical ST metric encoded as (1, 2, 3). Predictor [11] is a categorical count of colored fluoroscopy vessels encoded as (0, 1, 2, 3). Predictor [12] is a categorical value related to thalassemia encoded as (3, 6, 7).
The first step in data preparation is to deal with the six data items that have one or more missing values. I took the simplest approach, which is to just delete any rows with missing data, leaving 297 data items. In my opinion, alternatives, such as supplying an average column value, are usually not a good idea.

Figure 4-2: Partial Cleveland Heart Disease Data
The raw data was prepared by min-max normalizing the five numeric predictor variable values, by (-1, +1) encoding the three Boolean predictors, and by 1-of-(N-1) encoding the five categorical predictors. The class values-to-predict were encoded so that 0 means no indication of heart disease, and 1 means indication of some form of disease. I replaced the comma delimiters with tab characters.
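To make the preparation concrete, here is a minimal sketch of how one numeric column and one categorical column could be processed. The helper function and the particular 1-of-(N-1) mapping shown are illustrative assumptions, not code from the demo:

import numpy as np

def min_max_normalize(col):
  # scale a numeric column so every value falls in [0.0, 1.0]
  mn, mx = np.min(col), np.max(col)
  return (col - mn) / (mx - mn)

ages = np.array([56.0, 62.0, 63.0, 53.0], dtype=np.float32)
print(min_max_normalize(ages))  # values scaled to [0.0, 1.0]: 0.3, 0.9, 1.0, 0.0

# one common 1-of-(N-1) scheme for a categorical predictor with N = 3 values:
# value 0 -> ( 1,  0)
# value 1 -> ( 0,  1)
# value 2 -> (-1, -1)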
After dealing with missing values, normalization, and encoding, the 297-item dataset was randomly split into three files: a 178-item (60 percent) set for training, a 59-item (20 percent) set for validation, and a 60-item (20 percent) set for testing.
Because the Cleveland Heart Disease dataset has 13 dimensions, it's not possible to easily visualize it in a two-dimensional graph. But you can get a rough idea of the data from the partial graph in Figure 4-2. The graph shows only patient age and blood pressure for the first 160 items of the full dataset. As you can see, it's not possible to get a good prediction model using a simple linear technique such as logistic regression or a linear support vector machine.
The complete program that generated the output shown in Figure 4-1 is shown in Code Listing 4-1. The program begins with comments that document the program file name (the _bnn suffix is not a standard convention; it just stands for binary neural network) and the versions of Python, TensorFlow, and Keras used, and then imports the NumPy, Keras, TensorFlow, and OS packages:
# cleveland_bnn.py
# Python 3.5.2, TensorFlow 1.7.0, Keras 2.1.5
import numpy as np
import keras as K
import tensorflow as tf
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
In a non-demo scenario, you'd want to include additional details in the comments. Because Keras and TensorFlow are under rapid development, it's a good idea to document which versions are being used. Version incompatibilities can be a significant problem when working with Keras and open-source software.
Code Listing 4-1: Cleveland Heart Disease Binary Classification Program
# cleveland_bnn.py
# Python 3.5.2, TensorFlow 1.7.0, Keras 2.1.5
# ==================================================================================

import numpy as np
import keras as K
import tensorflow as tf
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'

class MyLogger(K.callbacks.Callback):
  def __init__(self, n):
    self.n = n

  def on_epoch_end(self, epoch, logs={}):
    if epoch % self.n == 0:
      t_loss = logs.get('loss')
      t_accu = logs.get('acc')
      v_loss = logs.get('val_loss')
      v_accu = logs.get('val_acc')
      print("epoch = %4d  t_loss = %0.4f  t_acc = %0.2f%%  v_loss = %0.4f \
 v_acc = %0.2f%%" % (epoch, t_loss, t_accu*100, v_loss, v_accu*100))

# ==================================================================================

def main():
  # 0. get started
  print("\nCleveland binary classification dataset using Keras/TensorFlow ")
  np.random.seed(1)
  tf.set_random_seed(2)

  # 1. load data
  print("Loading Cleveland data into memory \n")
  train_file = ".\\Data\\cleveland_train.txt"
  valid_file = ".\\Data\\cleveland_validate.txt"
  test_file = ".\\Data\\cleveland_test.txt"

  train_x = np.loadtxt(train_file, usecols=range(0,18),
    delimiter="\t", skiprows=0, dtype=np.float32)
  train_y = np.loadtxt(train_file, usecols=[18],
    delimiter="\t", skiprows=0, dtype=np.float32)
  valid_x = np.loadtxt(valid_file, usecols=range(0,18),
    delimiter="\t", skiprows=0, dtype=np.float32)
  valid_y = np.loadtxt(valid_file, usecols=[18],
    delimiter="\t", skiprows=0, dtype=np.float32)
  test_x = np.loadtxt(test_file, usecols=range(0,18),
    delimiter="\t", skiprows=0, dtype=np.float32)
  test_y = np.loadtxt(test_file, usecols=[18],
    delimiter="\t", skiprows=0, dtype=np.float32)

  # 2. define model
  init = K.initializers.RandomNormal(mean=0.0, stddev=0.01, seed=1)
  simple_adadelta = K.optimizers.Adadelta()
  X = K.layers.Input(shape=(18,))
  net = K.layers.Dense(units=10, kernel_initializer=init,
    activation='relu')(X)
  net = K.layers.Dropout(0.25)(net)  # dropout for layer above
  net = K.layers.Dense(units=10, kernel_initializer=init,
    activation='relu')(net)
  net = K.layers.Dropout(0.25)(net)  # dropout for layer above
  net = K.layers.Dense(units=1, kernel_initializer=init,
    activation='sigmoid')(net)
  model = K.models.Model(X, net)
  model.compile(loss='binary_crossentropy', optimizer=simple_adadelta,
    metrics=['acc'])

  # 3. train model
  bat_size = 8
  max_epochs = 2000
  my_logger = MyLogger(int(max_epochs/5))
  print("Starting training ")
  h = model.fit(train_x, train_y, batch_size=bat_size, verbose=0,
    epochs=max_epochs, validation_data=(valid_x,valid_y),
    callbacks=[my_logger])
  print("Training finished \n")

  # 4. evaluate model
  eval = model.evaluate(test_x, test_y, verbose=0)
  print("Evaluation on test data: loss = %0.4f  accuracy = %0.2f%% \n" \
    % (eval[0], eval[1]*100) )

  # 5. save model
  print("Saving model to disk \n")
  mp = ".\\Models\\cleveland_model.h5"
  model.save(mp)

  # 6. use model
  unknown = np.array([[0.75, 1, 0, 1, 0, 0.49, 0.27, 1, -1, -1, 0.62, -1, 0.40,
    0, 1, 0.23, 1, 0]], dtype=np.float32)  # .0197
  predicted = model.predict(unknown)
  print("Using model to predict heart disease for features: ")
  print(unknown)
  print("\nPredicted (0=no disease, 1=disease) is: ")
  print(predicted)

# ==================================================================================

if __name__=="__main__":
  main()
The program imports the entire Keras package and assigns an alias K. An alternative approach is to import only the modules you need, for example:
from keras.models import Sequential
from keras.layers import Dense, Activation
Even though Keras uses TensorFlow as its backend engine, you don't need to explicitly import TensorFlow, except in order to set its random seed. The OS package is imported only so that an annoying TensorFlow startup warning message will be suppressed.
The program structure consists of a single main function, with a program-defined helper class, MyLogger, for custom logging. The class definition is:
class MyLogger(K.callbacks.Callback):
  def __init__(self, n):
    self.n = n

  def on_epoch_end(self, epoch, logs={}):
    if epoch % self.n == 0:
      t_loss = logs.get('loss')
      t_accu = logs.get('acc')
      v_loss = logs.get('val_loss')
      v_accu = logs.get('val_acc')
      print("epoch = %4d  t_loss = %0.4f  t_acc = %0.2f%%  v_loss = %0.4f \
 v_acc = %0.2f%%" % (epoch, t_loss, t_accu*100, v_loss, v_accu*100))
The MyLogger class is used only to display loss and accuracy metrics that are computed automatically, so the class initializer doesn't need to accept references to the training data. Most program-defined callbacks would pass that information like this:
def __init__(self, n, data_x, data_y):
  self.n = n
  self.data_x = data_x
  self.data_y = data_y
The on_epoch_end() method pulls the current loss and accuracy metrics from the built-in logs dictionary and displays them. The demo program does this only so that the logging display messages can be reduced to once every 400 epochs instead of every epoch. Keras computes loss and accuracy for every training batch and averages the values over all batches at the end of each epoch. If you want more granular information, you can use the on_batch_end() method.
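For example, a minimal sketch of a per-batch logger (this class is not part of the demo program; the class name and message format are just illustrative) could look like this:

class MyBatchLogger(K.callbacks.Callback):
  def on_batch_end(self, batch, logs={}):
    # logs holds the metrics for the batch that just completed
    print("batch = %5d  loss = %0.4f  acc = %0.2f%%" % \
      (batch, logs.get('loss'), logs.get('acc')*100))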
The main() function begins with:
def main():
  # 0. get started
  print("\nCleveland binary classification dataset using Keras/TensorFlow ")
  np.random.seed(1)
  tf.set_random_seed(2)

  # 1. load data
  print("Loading Cleveland data into memory \n")
  train_file = ".\\Data\\cleveland_train.txt"
  valid_file = ".\\Data\\cleveland_validate.txt"
  test_file = ".\\Data\\cleveland_test.txt"
. . .
In most situations you want your results to be reproducible. The Keras library uses the NumPy and TensorFlow global random-number generators, so it's good practice to set both seed values. The values used in the program, 1 and 2, are arbitrary. However, be aware that Keras program results typically aren't completely reproducible, due to the order in which parallelized tasks perform their numeric rounding.
The program assumes that the training, validation, and test data files are located in a subdirectory named Data. The purpose of the validation data is to monitor its loss and accuracy during training to prevent training the model too much, which could result in an overfitted model.
The basic idea is illustrated in the graph in Figure 4-3. The graph indicates that over time, loss/error on the training data will decline steadily. If you measure the loss/error on a hold-out set of validation data, you may be able to identify when model overfitting is starting to occur, and then you can stop training (known as "early stopping").
Although the train-validate-test idea is good in principle, in practice it usually doesn't work so well. The problem is that the loss/error values rarely drop in the nice, smooth way shown on the graph. Instead, the values tend to jump erratically, which makes identifying the start of model-overfitting very difficult.

Figure 4-3: Train-Validate-Test in Theory
Additionally, holding out data for validation purposes reduces the amount of data you have for training. For these reasons, train-validate-test isn't very common. The demo program shows you how to use the technique because there are times when it's useful, and so that you can understand it if you see it used.
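If you do want to automate early stopping rather than eyeballing the validation metrics, Keras has a built-in EarlyStopping callback that you can pass to fit(). A minimal sketch, building on the demo's variables (the patience value of 50 epochs is just an illustrative choice):

early_stop = K.callbacks.EarlyStopping(monitor='val_loss', patience=50)
h = model.fit(train_x, train_y, batch_size=bat_size, verbose=0,
  epochs=max_epochs, validation_data=(valid_x, valid_y),
  callbacks=[my_logger, early_stop])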
The demo program doesn't have any information about the structure of the data files. I recommend that you include comments in your program that explain things such as how many predictor variables there are, types of encoding and normalization used, and so on. This kind of information is easy to remember when you’re writing your program, but difficult to remember a couple of weeks later.
The training, validation, and test data is read into memory with these statements:
train_x = np.loadtxt(train_file, usecols=range(0,18),
  delimiter="\t", skiprows=0, dtype=np.float32)
train_y = np.loadtxt(train_file, usecols=[18],
  delimiter="\t", skiprows=0, dtype=np.float32)
valid_x = np.loadtxt(valid_file, usecols=range(0,18),
  delimiter="\t", skiprows=0, dtype=np.float32)
valid_y = np.loadtxt(valid_file, usecols=[18],
  delimiter="\t", skiprows=0, dtype=np.float32)
test_x = np.loadtxt(test_file, usecols=range(0,18),
  delimiter="\t", skiprows=0, dtype=np.float32)
test_y = np.loadtxt(test_file, usecols=[18],
  delimiter="\t", skiprows=0, dtype=np.float32)
In general, Keras needs feature data and label data stored in separate NumPy array-of-array style matrices. There are many ways to read data into memory, but the loadtxt() function is versatile enough to meet most problem scenarios. A common alternative approach is to use the read_csv() function from the Pandas ("Python Data Analysis Library") package. For example:
import pandas as pd
train_x = pd.read_csv(train_file, usecols=range(0,18),
  delimiter="\t", header=None, dtype=np.float32).values
train_y = pd.read_csv(train_file, usecols=[18],
  delimiter="\t", header=None, dtype=np.float32).values
Notice that usecols can accept a list such as [0,1,2,3] or a Python range such as range(0,4). If you use the range() function, be careful to remember that the first parameter is inclusive, but the second parameter is exclusive.
In addition to the comma character, common values for the delimiter parameter are "\t" (tab) and " " (single space). The default value is None, which means any whitespace.
The default dtype parameter value is numpy.float, which is an alias for the Python float type and is the same as numpy.float64. The default data type for almost all Keras functions is numpy.float32, so the program specifies that type. The idea is that for the majority of machine learning problems, the precision gained by using 64-bit values isn't worth the memory and performance penalty.
The demo program defines an 18-(10-10)-1 deep neural network using this code:
# 2. define model
init = K.initializers.RandomNormal(mean=0.0, stddev=0.01, seed=1)
simple_adadelta = K.optimizers.Adadelta()
X = K.layers.Input(shape=(18,))
net = K.layers.Dense(units=10, kernel_initializer=init,
  activation='relu')(X)
net = K.layers.Dropout(0.25)(net)  # dropout for layer above
net = K.layers.Dense(units=10, kernel_initializer=init,
  activation='relu')(net)
net = K.layers.Dropout(0.25)(net)  # dropout for layer above
net = K.layers.Dense(units=1, kernel_initializer=init,
  activation='sigmoid')(net)
model = K.models.Model(X, net)
model.compile(loss='binary_crossentropy', optimizer=simple_adadelta,
  metrics=['acc'])
The demo sets up random Gaussian initial weights. Deep neural networks are often very sensitive to the initial values of the weights and biases, so Keras has several different initialization functions you can use.
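For example, you could replace the demo's RandomNormal initializer with the Glorot uniform initializer (the Keras default for Dense layers), or with a small-range uniform initializer. A sketch:

init = K.initializers.glorot_uniform(seed=1)
# or, a uniform distribution over a small fixed range:
init = K.initializers.RandomUniform(minval=-0.05, maxval=0.05, seed=1)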
The training optimizer object is Adadelta() ("adaptive delta"), one of many advanced variations of basic stochastic gradient descent. Selecting a Keras optimizer can be a bit intimidating. Table 4-1 lists five of the most commonly used optimizers.
Table 4-1: Five Common Keras Optimizers
| Optimizer | Description |
|---|---|
| SGD(lr=0.01, momentum=0.0, decay=0.0, nesterov=False) | Basic optimizer for simple neural networks |
| RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0) | Often used with recurrent neural networks; very similar to Adadelta |
| Adagrad(lr=0.01, epsilon=None, decay=0.0) | General-purpose adaptive algorithm |
| Adadelta(lr=1.0, rho=0.95, epsilon=None, decay=0.0) | Advanced version of Adagrad, similar to RMSprop |
| Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False) | Excellent general-purpose adaptive algorithm |
One of the strengths of the Keras optimizers is that they all have sensible default parameter values, so you can try different optimizers very easily.
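For example, switching the demo from Adadelta to Adam only requires changing the optimizer object passed to compile(); nothing else needs to change. A sketch:

simple_adam = K.optimizers.Adam()  # all default parameter values
model.compile(loss='binary_crossentropy', optimizer=simple_adam,
  metrics=['acc'])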
The demo program specifies each layer separately, and then combines them using Model(). An alternative approach is to use a Sequential() model. The two approaches create the same neural network but differ quite a bit in syntax; the choice is one of personal preference.
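For reference, a Sequential() version of the demo network would look something like the following sketch. I believe this defines the same 18-(10-10)-1 architecture with dropout, but the demo itself uses the Model() approach shown earlier:

model = K.models.Sequential()
model.add(K.layers.Dense(units=10, input_dim=18,
  kernel_initializer=init, activation='relu'))
model.add(K.layers.Dropout(0.25))
model.add(K.layers.Dense(units=10, kernel_initializer=init,
  activation='relu'))
model.add(K.layers.Dropout(0.25))
model.add(K.layers.Dense(units=1, kernel_initializer=init,
  activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=simple_adadelta,
  metrics=['acc'])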
The demo uses Dropout() layers. The purpose of dropout is to reduce the likelihood of model overfitting. From a syntax point of view, you place a Dropout() layer immediately after the layer you wish to apply it to. The single parameter is the fraction of nodes in the affected layer to randomly drop on each training iteration (0.25 means 25 percent). The advantage of using Dropout() is that it's often effective in combating overfitting. The disadvantage is that you have additional hyperparameters to deal with: where to apply dropout, and what dropout rate to use.
Note that it's possible to apply dropout to a neural network input layer; this is sometimes called jittering. However, using dropout on a neural network input layer is quite rare, and you should use it somewhat cautiously.
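If you did want to experiment with input-layer dropout, the syntax follows the same pattern: place the Dropout() layer immediately after the Input() layer. A sketch (the 0.10 rate is just an illustrative value):

X = K.layers.Input(shape=(18,))
drop_X = K.layers.Dropout(0.10)(X)  # jittering: randomly drop 10% of inputs
net = K.layers.Dense(units=10, kernel_initializer=init,
  activation='relu')(drop_X)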
The demo program uses relu (rectified linear unit) activation for the hidden nodes. The relu activation function is often more resistant to the vanishing gradient problem, which causes training to stall out, than tanh or sigmoid activation.
The output layer, with its single node, uses sigmoid activation. This coerces the output node to a single value between 0.0 and 1.0, which can be interpreted as the probability that the target class is 1 (presence of heart disease in this problem). Put another way, if the output node value is less than 0.5, the prediction is class = 0 (no heart disease); otherwise, the prediction is class = 1 (heart disease).
The model is compiled with binary_crossentropy as the loss function. For multiclass classification, you can use categorical_crossentropy or mean_squared_error, but for binary classification problems, you can use binary_crossentropy or mean_squared_error.
The metrics parameter to compile() is optional. The demo program passes a Python list containing just 'acc' to indicate that classification accuracy (percentage correct predictions) should be computed for each batch during training.
After training data has been read into memory and the neural network has been created, the program trains the network using these statements:
# 3. train model
bat_size = 8
max_epochs = 2000
my_logger = MyLogger(int(max_epochs/5))
print("Starting training ")
h = model.fit(train_x, train_y, batch_size=bat_size, verbose=0,
  epochs=max_epochs, validation_data=(valid_x,valid_y),
  callbacks=[my_logger])
print("Training finished \n")
The batch size is a hyperparameter, and a good value must be determined by trial and error. The max_epochs value is also a hyperparameter. Larger values typically lead to lower loss and higher accuracy on the training data, but at the risk of overfitting, which reduces accuracy on the test data.
Training is configured to display loss and accuracy on the training data and the validation data every 2000 / 5 = 400 epochs. In a non-demo scenario, you'd want to see information displayed much more often.
The fit() function returns an object that holds complete logging information. This is sometimes useful for analysis of a model that refuses to train. The demo program does not use the h object, so it could have been omitted.
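If you do want to inspect the logged values, the returned object exposes a history dictionary keyed by metric name. For example (a sketch, not part of the demo):

print(h.history['loss'][-1])     # training loss after the final epoch
print(h.history['val_acc'][-1])  # validation accuracy after the final epoch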
After training, the model is evaluated:
# 4. evaluate model
eval = model.evaluate(test_x, test_y, verbose=0)
print("Evaluation on test data: loss = %0.4f  accuracy = %0.2f%% \n" \
  % (eval[0], eval[1]*100) )
The evaluate() function returns a Python list. The value at index [0] is always the value of the required loss function specified in the compile() call, binary_crossentropy in this case. The other values in the list are the optional metrics specified in compile(). In this example, the shortcut string 'acc' was passed, so the value at index [1] holds the classification accuracy. The program multiplies by 100 to convert accuracy from a proportion (like 0.8234) to a percentage (like 82.34 percent).
In most situations you'll want to save a trained model, especially if the training took hours or even longer. The demo program saves the trained model like so:
# 5. save model
print("Saving model to disk \n")
mp = ".\\Models\\cleveland_model.h5"
model.save(mp)
Keras saves a trained model using the hierarchical data format (HDF) version 5. It's a binary format, so saved models can't be inspected with a text editor. In addition to saving an entire model, you can save just the model weights and biases, which is sometimes useful. You can also save just the model architecture, without the weights.
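A sketch of saving just the weights, or just the architecture as a JSON string, looks like this (the file names here are hypothetical, not files created by the demo):

model.save_weights(".\\Models\\cleveland_weights.h5")  # weights and biases only
arch_json = model.to_json()  # architecture only, no weights
with open(".\\Models\\cleveland_arch.json", "w") as f:
  f.write(arch_json)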
A saved Keras model can be loaded from a different program like this:
print("Loading a saved model")
mp = ".\\Models\\cleveland_model.h5"
model = K.models.load_model(mp)
An alternative to saving the fully trained model is to save different versions of the model as they're trained. You could add the save code to the on_epoch_end() method of the program-defined MyLogger object, for example:
def on_epoch_end(self, epoch, logs={}):
  if epoch % self.n == 0:
    m_name = ".\\Models\\cleveland_" + str(epoch) + "_model.h5"
    self.model.save(m_name)
Keras also has a built-in ModelCheckpoint callback, which has parameters that allow you to do things such as saving only if a specified metric improves (lower loss, higher accuracy).
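For example, a ModelCheckpoint that saves the model only when the validation loss improves might be set up like this (a sketch; the file name is arbitrary):

checkpoint = K.callbacks.ModelCheckpoint(".\\Models\\cleveland_best.h5",
  monitor='val_loss', save_best_only=True, verbose=0)
h = model.fit(train_x, train_y, batch_size=bat_size, verbose=0,
  epochs=max_epochs, validation_data=(valid_x, valid_y),
  callbacks=[my_logger, checkpoint])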
The demo program concludes by making a prediction:
# 6. use model
unknown = np.array([[0.75, 1, 0, 1, 0, 0.49, 0.27, 1, -1, -1, 0.62, -1, 0.40,
  0, 1, 0.23, 1, 0]], dtype=np.float32)
predicted = model.predict(unknown)
print("Using model to predict heart disease for features: ")
print(unknown)
print("\nPredicted (0=no disease, 1=disease) is: ")
print(predicted)
Because the model was trained using normalized and encoded data, you must pass input values that have been normalized and encoded in the same way. Notice that the feature predictor values are passed as a NumPy array-of-arrays object.
The output prediction is raw in the sense that it's just a value between 0.0 and 1.0, and therefore, it's up to you to interpret the meaning. You can do so programmatically along the lines of:
labels = ["no indication of heart disease", "indication of heart disease"]
if predicted < 0.50:
  result = labels[0]
else:
  result = labels[1]
print(result)
Note that it is possible to perform binary classification by encoding the two classes-to-predict as (1, 0) and (0, 1), and then treating the problem as multiclass classification (softmax output layer activation and categorical cross entropy loss).
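A sketch of that alternative design, assuming the class labels have been one-hot encoded into two columns, looks like this (only the output layer and the compile() call change from the demo):

# labels encoded as (1, 0) = no disease, (0, 1) = disease
net = K.layers.Dense(units=2, kernel_initializer=init,
  activation='softmax')(net)
model = K.models.Model(X, net)
model.compile(loss='categorical_crossentropy', optimizer=simple_adadelta,
  metrics=['acc'])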
To perform binary classification, you encode the target labels using 0-or-1 encoding. The number of nodes in the input layer is determined by the structure of your normalized and encoded data. The number of output nodes should be one, and the activation function on the output layer should be set to sigmoid so the node value is between 0.0 and 1.0, where a value less than 0.5 indicates a prediction of class = 0; otherwise, the prediction is class = 1.
The loss function should be set to binary_crossentropy, but you can use mean_squared_error if you wish. In general you should pass accuracy to the optional metrics list of the compile() function.
Free parameters for binary classification include the number of hidden layers and the number of nodes in each hidden layer, optimizer algorithm (Adagrad, Adadelta, and Adam are often good choices), batch size, and the maximum number of training epochs to use.
You can find the training, validation, and test data used by the demo program here.
The demo program uses a custom, program-defined callback class. See information about Keras built-in callbacks here.
This chapter described five of the most commonly used training optimizer algorithms. See information about all optimizers here.