Pancreatic Cancer Detection with Biometrics through Machine Learning

@Joshua Payne | 5 minute read


A/N: This article assumes a good understanding of neural networks. For background, check this out.

A few weeks ago, my partner and I arrived at a hackathon. We were passionate about artificial intelligence and knew instantly that we wanted to center a project around it.

As moonshot thinkers, we immediately started brainstorming ideas we could tackle with this emerging technology. From carbon capture to bias and surveillance, we scoured the Internet for global problems needing solutions.

World Problems

Eventually, we settled on an interesting topic at the intersection of two exponential technologies: precision medicine and artificial intelligence. In essence, machine learning algorithms analyze patient data to help doctors decide which medicine to prescribe. The same drug isn't equally effective for every patient, and this approach helps address that.

Different Medicine for Different People

To us, this sounded awesome! What if we could build something that would help address an issue in the medical field with AI? We ended up settling on pancreatic cancer detection after coming across some startling statistics.

The status quo for pancreatic cancer is terrifying.

As implied by the name, pancreatic cancer begins in the tissues of your pancreas and typically spreads rapidly to nearby organs. The latter process is called metastasis and makes cancer difficult to treat effectively.

Pancreatic cancer is rarely detected in its early stages and is often deadly by the time it is found. The five-year survival rate is just 9%.

Imagine you were in the Rungrado May Day stadium, the biggest football stadium in the world.

Rungrado May Day stadium, located in North Korea

It has 114 000 seats. If it were fully packed and everyone inside had pancreatic cancer, only around 10 000 people would survive five years.

Early detection of pancreatic cancer

Patients whose pancreatic cancer is diagnosed early have a better chance of being eligible for surgery and of surviving. In fact, the survival rate for people whose pancreatic cancer is detected early is 34%. Patients who are diagnosed later are often not eligible for surgery and are at a significantly higher risk of death.

No current solution

Pancreatic cancer is hard to diagnose early. There is currently no standard diagnostic tool or early detection method for pancreatic cancer.

Unreliable symptoms

Ways to find pancreatic cancer in the earliest stages are urgently needed. Symptoms are not always obvious and develop over time.

Time is valuable

Most pancreatic cancer patients are diagnosed at stage four, when the five-year survival rate is 3%.

Notify for surgery

Diagnosing pancreatic cancer in time for surgery can increase a patient's chance of survival tenfold. Surgery is the only reliable way to remove tumors.

Cancer antigens as biomarkers

Cancer antigens are substances that produce immune responses in people and are extremely helpful for detecting tumors.

We came across promising data that indicated a correlation between pancreatic cancer and two cancer antigens, CA 19-9 and CA 125. The dataset records measurements of these two antigens for patients who do or do not have pancreatic cancer.

Leveraging machine learning

Using this dataset, we made a neural network to predict pancreatic cancer! In practice, it would take a patient's measurements for the two antigens and use those data points to predict whether they have pancreatic cancer.

Using ML for cancer detection offers several potential benefits. Diagnosing pancreatic cancer earlier could reduce the chance of death, and diagnosing it more accurately could reduce misdiagnosis.

As we're using numerical, labeled data with only two features, a feedforward neural network is a natural fit.
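For intuition, here's a minimal sketch of what a feedforward network does with two input values. This is not our trained model: the weights are random, the layer sizes are arbitrary, and the input numbers are made up (and assumed to be already scaled). Each hidden layer applies a weighted sum followed by ReLU, and the final sigmoid squashes the result into a probability between 0 and 1.

```python
import numpy as np

def relu(z):
    # Keep positive values, zero out negative ones
    return np.maximum(0.0, z)

def sigmoid(z):
    # Map any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, hidden_layers, output_layer):
    """Run a batch x of shape (n_samples, n_features) through the network.

    hidden_layers is a list of (weights, bias) pairs; output_layer is one pair.
    """
    a = x
    for weights, bias in hidden_layers:
        a = relu(a @ weights + bias)
    out_weights, out_bias = output_layer
    return sigmoid(a @ out_weights + out_bias).ravel()

# Illustrative random weights (assumption): 2 inputs -> 20 hidden units -> 1 output
rng = np.random.default_rng(0)
hidden_layers = [(rng.normal(size=(2, 20)), np.zeros(20))]
output_layer = (rng.normal(size=(20, 1)), np.zeros(1))

# Two made-up samples with two scaled readings each (not real patient data)
x_demo = np.array([[0.1, -0.3], [1.2, 0.8]])
probs = forward(x_demo, hidden_layers, output_layer)
print(probs)  # one probability per sample
```

Training is the process of adjusting those weights so the output probabilities line up with the known labels.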

Our code

We first import pandas to read in all of our data as a dataframe.

import pandas as pd
url = 'https://raw.githubusercontent.com/arielycliu/PancreaticBiomarkers/master/wiedat2b.csv'
df = pd.read_csv(url)
# Dataset is stored in a Pandas Dataframe
# https://research.fhcrc.org/diagnostic-biomarkers-center/en/datasets.html
# https://research.fhcrc.org/content/dam/stripe/diagnostic-biomarkers-statistical-center/files/wiedat2b_desc.txt

Afterward, we import the following libraries.

from google.colab import files  # only available inside Google Colab
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dropout, Dense, Activation, Flatten
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras import losses
import numpy as np

Next, we shuffle the rows once and separate our features and labels into x and y variables.

# Shuffle whole rows so each label stays paired with its features
df = df.sample(frac=1).reset_index(drop=True)
x = df[["y1", "y2"]]
y = df["d"]

We’ll then separate these dataframes into testing and training data.

# First 99 rows for training
x_train = np.array(x[:99])
y_train = np.array(y[:99])
# Last 42 rows for testing
x_test = np.array(x[-42:])
y_test = np.array(y[-42:])
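One thing to watch when shuffling and splitting: the features and labels have to be reordered with the same permutation, or each label ends up attached to the wrong patient. Here's a small sketch of a split that keeps them aligned. The function name `shuffled_split` and the toy data are ours for illustration, not part of the original project.

```python
import numpy as np

def shuffled_split(features, labels, n_train, seed=0):
    """Shuffle features and labels with one shared permutation, then split."""
    features = np.asarray(features)
    labels = np.asarray(labels)
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    # A single permutation of row indices is applied to both arrays
    order = np.random.default_rng(seed).permutation(len(features))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return (features[train_idx], labels[train_idx],
            features[test_idx], labels[test_idx])

# Toy data (made up): the label is the parity of the first feature,
# so it's easy to check that rows and labels still match after shuffling
demo_x = np.array([[i, 10 * i] for i in range(10)])
demo_y = np.arange(10) % 2
x_tr, y_tr, x_te, y_te = shuffled_split(demo_x, demo_y, n_train=7)
print(x_tr.shape, x_te.shape)
```

Fixing the seed makes the split reproducible, which helps when comparing different model variants on the same data.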

And here’s our model!

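layer_sizes = [20]   # units per hidden layer
dense_layers = [5]   # number of hidden layers
for dense_layer in dense_layers:
    for layer_size in layer_sizes:
        model = Sequential()
        # Each sample is a vector of two antigen values
        model.add(Flatten(input_shape=x_train[0].shape))
        # Stack of fully connected hidden layers with ReLU activations
        for _ in range(dense_layer):
            model.add(Dense(layer_size))
            model.add(Activation('relu'))
        # A single sigmoid unit outputs the probability of the positive class
        model.add(Dense(1))
        model.add(Activation('sigmoid'))
        model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
        model.fit(x_train, y_train, epochs=35)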
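The accuracy printed during training only reflects the training rows, so it's worth scoring the held-out test rows too. Below is a rough sketch of how that could look. It assumes the label 1 means cancer and 0 means control (our assumption about the `d` column), and that the probabilities would come from something like `model.predict(x_test).ravel()`. The helper `binary_metrics` and the demo numbers are made up for illustration.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Compute accuracy, sensitivity and specificity from predicted probabilities."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1  # cancer correctly flagged
        elif predicted == 1 and truth == 0:
            fp += 1  # false alarm
        elif predicted == 0 and truth == 0:
            tn += 1  # control correctly cleared
        else:
            fn += 1  # missed cancer
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }

# Made-up labels and probabilities, not results from our model
demo_true = [1, 1, 1, 0, 0, 0]
demo_prob = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1]
demo_metrics = binary_metrics(demo_true, demo_prob)
print(demo_metrics)
```

For a diagnostic task, sensitivity (how many real cases are caught) and specificity (how many healthy patients are correctly cleared) often say more than accuracy alone.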

We settled on this architecture after iterating through variants with different layer counts and layer sizes, recording the accuracy and loss of each one.

20 nodes and 5 dense layers had the highest accuracy and lowest loss
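The search over variants can be sketched roughly like this. The `train_and_score` argument is a hypothetical callable that would build and fit a model for one configuration and return its accuracy and loss (ideally measured on data the model wasn't trained on). The stand-in scorer below just returns made-up numbers so the sketch runs without TensorFlow.

```python
from itertools import product

def grid_search(layer_counts, layer_sizes, train_and_score):
    """Try every (layer count, layer size) pair and keep the best one."""
    results = []
    for n_layers, size in product(layer_counts, layer_sizes):
        accuracy, loss = train_and_score(n_layers, size)
        results.append({
            "dense_layers": n_layers,
            "layer_size": size,
            "accuracy": accuracy,
            "loss": loss,
        })
    # Prefer the highest accuracy; break ties with the lowest loss
    best = max(results, key=lambda r: (r["accuracy"], -r["loss"]))
    return best, results

def fake_train_and_score(n_layers, size):
    # Stand-in scorer with made-up numbers (not a real model or real results)
    accuracy = 1.0 - abs(n_layers - 3) * 0.1 - abs(size - 16) * 0.01
    return accuracy, 1.0 - accuracy

best, results = grid_search([1, 3, 5], [8, 16, 32], fake_train_and_score)
print(best)
```

Swapping `fake_train_and_score` for a function that trains the Keras model would turn this into the kind of loop the `layer_sizes` and `dense_layers` lists were set up for.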

The model was then saved so that it could be used for separate sample predictions.

# Note: Keras 3 and later expect the path to end in .keras or .h5
model.save('final_neural_network.model')

And that's it! With the power of machine learning, we created a model with 74% accuracy on the task of pancreatic cancer detection. If implemented, the impact could be enormous: with better approaches to early diagnosis, we can save lives.


Key Takeaways

We ended up winning Best Data Hack! For more information on our project, check this out.