Joshua Payne | 5 minute read
Code from this article was based on a tutorial found here. This article assumes a basic intuitive understanding of neural networks. For background, check this out.
Using this dataset, I created a neural network capable of classifying breast tumors. The features are measured characteristics of cell nuclei within the tumor, including perimeter, concavity, and smoothness. The labels are 0 or 1, representing benign and malignant diagnoses respectively. With my network, I mapped the relationship between these two variables.
Here’s how I did it
We’ll first import the following libraries.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt
We’ll then use pandas to read in our data, assigning the features to the variable x and the labels to the variable y. In pandas, these objects are called DataFrames, which are essentially tables.
x = pd.read_csv('https://raw.githubusercontent.com/antaloaalonso/Classification-Model-YT-Video/master/X_data.csv')
y = pd.read_csv('https://raw.githubusercontent.com/antaloaalonso/Classification-Model-YT-Video/master/Y_data.csv')
We’ll then scale our features data as part of the preprocessing stage.
x = preprocessing.scale(x)
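To see what preprocessing.scale actually does, here’s a minimal sketch on made-up numbers (not our tumor features): it standardizes each column so it has zero mean and unit variance, which keeps features with large raw magnitudes (like area) from dominating features with small ones (like smoothness).

```python
import numpy as np
from sklearn import preprocessing

# Toy data: 3 samples, 2 features with very different scales
toy = np.array([[1.0, 200.0],
                [2.0, 400.0],
                [3.0, 600.0]])

scaled = preprocessing.scale(toy)

# Each column now has mean ~0 and standard deviation ~1
print(scaled.mean(axis=0))  # close to [0. 0.]
print(scaled.std(axis=0))   # close to [1. 1.]
```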
It’s now time to split our data into testing and training data. Training data is what our neural network uses to learn how to map our features to our labels, and testing data is what we use to see our model in action on data samples it hasn’t seen before. 20% of our data will be testing data, and 80% will be training data.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
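As a quick sanity check on what test_size=0.2 means, here’s a sketch on toy stand-in data (the variable names here are our own, not from the article’s dataset): with 10 samples, 2 land in the test set and 8 in the training set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins: 10 samples with 2 features each, and 10 labels
x_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

x_tr, x_te, y_tr, y_te = train_test_split(x_demo, y_demo, test_size=0.2)

print(len(x_tr), len(x_te))  # 8 2
```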
Let’s now convert our training and testing data into numpy arrays so that we can use them with a Keras neural network.
x_train = np.array(x_train)
y_train = np.array(y_train)
x_test = np.array(x_test)
y_test = np.array(y_test)
Let’s now build our actual model. We’re simply building a feedforward neural network, so the Sequential model easily suffices. Our model is composed of dense layers with ReLU activation functions and 20 nodes each, and ends with a single sigmoid node representing the final classification prediction. To learn more about what the activation functions sigmoid and ReLU mean here, check this out.
Our input shape represents the shape of our feature arrays.
model = Sequential()
model.add(Dense(20, activation='relu', input_shape=(30,)))
model.add(Dense(20, activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
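For intuition on the two activation functions used above, here’s a minimal NumPy sketch (the relu and sigmoid helpers are our own illustrations, not Keras internals): ReLU zeroes out negative inputs and passes positives through, while sigmoid squashes any real number into the range (0, 1), which is why it suits the final probability-style output node.

```python
import numpy as np

def relu(z):
    # ReLU: max(0, z) -- negatives become 0, positives pass through
    return np.maximum(0, z)

def sigmoid(z):
    # Sigmoid: squashes any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

print(relu(np.array([-2.0, 3.0])))  # [0. 3.]
print(sigmoid(0.0))                 # 0.5
```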
We’ll now compile our model. The Adam optimizer is highly effective, and binary cross-entropy is a go-to loss function for two-class classification problems. It compares our sigmoid output, a probability between 0 and 1, against the dataset’s actual label to measure error. The output represents the predicted or actual diagnosis for breast cancer: 0 for benign and 1 for malignant. We’ll use the accuracy metric so we can understand how accurate our model is at classifying these types of breast tumors.
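Binary cross-entropy penalizes confident wrong predictions far more heavily than mildly wrong ones. Here’s a hand-rolled sketch of the formula it’s built on (illustrative only; Keras computes this internally):

```python
import numpy as np

def binary_crossentropy(y_true, p_pred):
    # BCE = -[y*log(p) + (1-y)*log(1-p)], averaged over samples
    p = np.clip(p_pred, 1e-7, 1 - 1e-7)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# A confident correct prediction (true label 1, predicted 0.95) has low loss...
print(binary_crossentropy(np.array([1.0]), np.array([0.95])))  # ~0.05
# ...while a confident wrong prediction (true label 1, predicted 0.05) is punished hard
print(binary_crossentropy(np.array([1.0]), np.array([0.05])))  # ~3.0
```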
Afterward, we’ll fit our model to our training data, passing over the data 500 times with a validation split of 0.3. This means that 30% of our training data becomes validation data, which the model uses to measure its validation accuracy during training. These samples are different from our testing samples, which we hold out for our own predictions outside of the model’s training. We’ll then save the history of our neural network.
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=500, validation_split=0.3)
history_dict = history.history
It’s now time to plot our training loss and validation loss measured during the model’s training on a graph, to better understand how our network is operating.
loss_values = history_dict['loss']
val_loss_values = history_dict['val_loss']
plt.figure()
plt.plot(loss_values, 'bo', label='training loss')
plt.plot(val_loss_values, 'r', label='validation loss')
plt.legend()
plt.show()
After training the model over 500 iterations, here were the metrics for the final epoch and our graph.
Epoch 500/500 317/317 [==============================] - 0s 155us/sample - loss: 0.0851 - accuracy: 0.9621 - val_loss: 0.1539 - val_accuracy: 0.9270
With low loss and high accuracy on both the training and validation data, our model was highly successful.
Let’s see it in action!
We’ll use the first data sample in our testing data, x_test[0], and see what our model predicts as its label. Here’s what that sample looks like when printed:
x_test[0] = [1.096e+01 1.762e+01 7.079e+01 3.656e+02 9.687e-02 9.752e-02 5.263e-02 2.788e-02 1.619e-01 6.408e-02 1.507e-01 1.583e+00
1.165e+00 1.009e+01 9.501e-03 3.378e-02 4.401e-02 1.346e-02 1.322e-02 3.534e-03 1.162e+01 2.651e+01 7.643e+01 4.075e+02 1.428e-01
2.510e-01 2.123e-01 9.861e-02 2.289e-01 8.278e-02]
However, when making predictions with Keras, we need commas between the values, and we need to pass a list of lists. Let’s make a new variable that accommodates these requirements.
x_test_1 = [[1.096e+01, 1.762e+01, 7.079e+01, 3.656e+02, 9.687e-02, 9.752e-02, 5.263e-02, 2.788e-02, 1.619e-01, 6.408e-02, 1.507e-01,
1.583e+00, 1.165e+00, 1.009e+01, 9.501e-03, 3.378e-02, 4.401e-02, 1.346e-02, 1.322e-02, 3.534e-03, 1.162e+01, 2.651e+01, 7.643e+01,
4.075e+02, 1.428e-01, 2.510e-01, 2.123e-01, 9.861e-02, 2.289e-01, 8.278e-02]]
An output of 0 means the tumor is predicted to be benign, and an output of 1 means a malignant prediction. By creating a classes variable, we can make our output say the type of tumor rather than 0 or 1.
classes = ['benign', 'malignant']
We can now actually use our model! The model outputs a probability, so we threshold it at 0.5 to get the numeric label, 0 or 1, and use that as an index into the classes variable. If 0 is predicted, benign is the prediction, and if 1 is predicted, malignant is the prediction.
prediction = classes[int(model.predict(x_test_1)[0][0] > 0.5)]
Let’s see what our model predicted!
print(prediction) >>> benign
We can now check whether this prediction was accurate, because we know what the label actually is in the dataset. The model predicted the label (y) of the first testing data sample, and we have that label in our testing dataset to cross-reference.
print(classes[int(y_test[0])]) >>> benign
The actual diagnosis was benign, meaning our model successfully predicted whether the input data belonged to a benign or malignant breast cancer tumor!
- Cell nuclei present in medical images of benign and malignant tumors were analyzed
- Using a neural network with Keras, we were able to classify breast tumors based on input data
- We achieved an accuracy of 96%, and a validation accuracy of 92%!
- Our model accurately predicted that our first testing data sample was a benign tumor