@Joshua Payne | 5 minute read
A/N: This article assumes a good understanding of neural networks. For background, check this out.
A few weeks ago, my partner and I arrived at a hackathon. We were passionate about artificial intelligence and knew instantly that we wanted to center our project around it.
As moonshot thinkers, we immediately started brainstorming ideas we could tackle with this emerging technology. From carbon capture to bias and surveillance, we scoured the Internet for global problems needing solutions.
Eventually, we settled on an interesting topic at the intersection of two exponential technologies: precision medicine and artificial intelligence. In essence, machine learning algorithms analyze patient data to help doctors decide which medicine to prescribe. The same drug is not equally effective for every patient, and this approach helps address that problem.
To us, this sounded awesome! What if we could build something that would help address an issue in the medical field with AI? We ended up settling on pancreatic cancer detection after coming across some startling statistics.
The status quo for pancreatic cancer is terrifying.
As implied by the name, pancreatic cancer begins in the tissues of your pancreas and typically spreads rapidly to nearby organs. The latter process is called metastasis and makes cancer difficult to treat effectively.
Pancreatic cancer is rarely detected in its early stages but is often deadly later on. The five-year survival rate is a staggering 9%.
Imagine you were in the Rungrado May Day stadium, the biggest football stadium in the world.
It has 114 000 seats. If it were fully packed and everyone inside had pancreatic cancer, only around 10 000 people would survive the next five years.
Early detection of pancreatic cancer
Patients whose pancreatic cancer is diagnosed early have a better chance of being eligible for surgery and of surviving. In fact, the survival rate for people with early detection of pancreatic cancer is 34%. Patients who are diagnosed later are often not eligible for surgery and are at a significantly higher risk of death.
No current solution
Pancreatic cancer is hard to diagnose early. There is currently no standard diagnostic tool or early detection method for pancreatic cancer.
Ways to find pancreatic cancer in the earliest stages are urgently needed. Symptoms are not always obvious and develop over time.
Time is valuable
Most pancreatic cancer patients are diagnosed at stage four, when the five-year survival rate is 3%.
Notify for surgery
Diagnosing pancreatic cancer in time for surgery can increase a patient’s chance of survival tenfold. Surgery is the only reliable way to remove tumors.
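To put the survival figures quoted in this article side by side, here is a small sketch that reuses the stadium example. It only restates numbers already given above (9% overall, 34% with early detection, 3% at stage four, 114 000 seats); the variable and function names are our own.

```python
# Five-year survival rates quoted in this article, as fractions.
SURVIVAL_RATES = {
    "overall": 0.09,
    "diagnosed early": 0.34,
    "diagnosed at stage four": 0.03,
}
STADIUM_SEATS = 114_000  # capacity figure from the stadium example


def expected_survivors(seats: int, rate: float) -> int:
    """Expected number of five-year survivors in a full stadium."""
    return round(seats * rate)


survivors = {
    name: expected_survivors(STADIUM_SEATS, rate)
    for name, rate in SURVIVAL_RATES.items()
}
# Ratio between the early-detection and stage-four survival rates.
early_vs_late = (
    SURVIVAL_RATES["diagnosed early"] / SURVIVAL_RATES["diagnosed at stage four"]
)

for name, count in survivors.items():
    print(f"{name}: about {count:,} of {STADIUM_SEATS:,}")
print(f"early vs. stage four: roughly {early_vs_late:.0f}x")
```

The ratio between the early-detection and stage-four rates comes out a little above eleven, which is in line with the roughly tenfold improvement mentioned above.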
Cancer antigens as biomarkers
Cancer antigens are substances that produce immune responses in people and are extremely helpful for detecting tumors.
We came across promising data that indicated a correlation between pancreatic cancer and two cancer antigens, CA 19-9 and CA 125. The dataset records measurements for these two antigens from patients who do or do not have pancreatic cancer.
Leveraging machine learning
Using this dataset, we made a neural network to predict pancreatic cancer! In practice, it’d take a patient's responses to the two antigens and use those data points to predict whether they have pancreatic cancer.
Using ML as an approach for cancer detection introduces several benefits. By diagnosing pancreatic cancer earlier with machine learning, we reduce the chances of death. By diagnosing pancreatic cancer more accurately with our model, we avoid misdiagnosis.
Because the data is numerical and labeled, a feedforward neural network is a natural fit.
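To make "feedforward" concrete, here is a minimal, library-free sketch of a forward pass: two input features go through one hidden layer with ReLU and a single sigmoid output that can be read as a probability. The 2 -> 3 -> 1 layout, the weights, and the input values are all made up for illustration; they are not the trained model from this project.

```python
import math


def relu(values):
    """Element-wise ReLU: negative values become zero."""
    return [max(0.0, v) for v in values]


def sigmoid(z):
    """Squash a real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


# Hypothetical parameters for a 2 -> 3 -> 1 network (not trained values).
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
B1 = [0.0, 0.1, -0.1]
W2 = [[0.7, -0.5, 0.2]]
B2 = [0.05]


def predict_probability(features):
    """Data flows one way only: input -> hidden layer -> output."""
    hidden = relu(dense(features, W1, B1))
    (logit,) = dense(hidden, W2, B2)
    return sigmoid(logit)


# Two made-up, already-scaled antigen readings for one patient.
p = predict_probability([1.2, 0.7])
print(f"predicted probability: {p:.3f}")
```

Training amounts to adjusting the weights and biases so that this output lands close to 1 for patients with cancer and close to 0 for those without.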
We first import pandas to read in all of our data as a dataframe.
    import pandas as pd

    url = 'https://raw.githubusercontent.com/arielycliu/PancreaticBiomarkers/master/wiedat2b.csv'
    df = pd.read_csv(url)
    # Dataset is stored in a Pandas Dataframe
    # https://research.fhcrc.org/diagnostic-biomarkers-center/en/datasets.html
    # https://research.fhcrc.org/content/dam/stripe/diagnostic-biomarkers-statistical-center/files/wiedat2b_desc.txt
Afterward, we import the following libraries.
    from google.colab import files  # only available inside Google Colab
    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import losses
    from tensorflow.keras.callbacks import TensorBoard
    from tensorflow.keras.layers import Activation, Dense, Dropout, Flatten
    from tensorflow.keras.models import Sequential
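Next, we shuffle the rows and separate our features and labels into x and y variables. The dataframe is shuffled once, before the split, so every label stays paired with its own features.

    # Shuffle all rows together so features and labels stay aligned
    df = df.sample(frac=1).reset_index(drop=True)
    x = df[["y1", "y2"]]  # features: the two antigen columns
    y = df["d"]           # label column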
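When shuffling, features and labels need the same permutation; otherwise labels end up attached to the wrong patients. Here is a minimal, library-free sketch of that idea, using made-up rows in the same (y1, y2, d) layout. The values and the seed are assumptions for illustration only.

```python
import random

# Toy records in the same layout as the dataset columns: (y1, y2, d).
rows = [(10.0, 5.0, 0), (250.0, 30.0, 1), (8.0, 12.0, 0), (900.0, 45.0, 1)]

rng = random.Random(0)           # fixed seed so the shuffle is repeatable
order = list(range(len(rows)))
rng.shuffle(order)               # draw ONE permutation of the row indices

features = [rows[i][:2] for i in order]  # apply it to the features
labels = [rows[i][2] for i in order]     # and the same one to the labels

# Every shuffled (features, label) pair still matches an original row.
recombined = [f + (d,) for f, d in zip(features, labels)]
print(sorted(recombined) == sorted(rows))
```

In pandas, shuffling the whole dataframe with `df.sample(frac=1)` before selecting the columns has the same effect, since each row moves as a unit.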
We’ll then separate these dataframes into testing and training data.
    # First 99 rows for training, converted to NumPy arrays
    x_train = np.array(x[:99])
    y_train = np.array(y[:99])

    # Last 42 rows held out for testing
    x_test = x[-42:]
    y_test = y[-42:]
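The hard-coded 99 and 42 work for this dataset, but the same split can be expressed as a fraction so it adapts to other sizes. The helper below is a hypothetical sketch (the name `train_test_split_by_fraction` and the 0.3 default are our own choices); it assumes the data has already been shuffled.

```python
def train_test_split_by_fraction(features, labels, test_fraction=0.3):
    """Split already-shuffled data; the last `test_fraction` becomes the test set."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    n_test = round(len(features) * test_fraction)
    n_train = len(features) - n_test
    return (
        features[:n_train], labels[:n_train],
        features[n_train:], labels[n_train:],
    )


# Stand-in data with 99 + 42 = 141 rows, matching the split used above.
toy_features = [[float(i), float(i) / 2] for i in range(141)]
toy_labels = [i % 2 for i in range(141)]

x_tr, y_tr, x_te, y_te = train_test_split_by_fraction(toy_features, toy_labels)
print(len(x_tr), len(x_te))
```

Because it relies only on slicing, the helper works on plain lists here and would also work on NumPy arrays or dataframes.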
And here’s our model!
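    layer_sizes = ...   # candidate layer widths (values elided)
    dense_layers = ...  # candidate numbers of hidden layers (values elided)

    for dense_layer in dense_layers:
        for layer_size in layer_sizes:
            model = Sequential()
            # input_shape is the shape of ONE sample, so drop the batch dimension
            model.add(Flatten(input_shape=x_train.shape[1:]))

            # Hidden layers: Dense followed by ReLU
            for l in range(dense_layer):
                model.add(Dense(layer_size))
                model.add(Activation('relu'))

            # Output layer: a single sigmoid unit for binary classification
            model.add(Dense(1))
            model.add(Activation('sigmoid'))

            model.compile(optimizer='adam',
                          loss='binary_crossentropy',
                          metrics=['accuracy'])
            model.fit(x_train, y_train, epochs=35)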
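The model is compiled with binary cross-entropy as its loss. As a rough illustration of what that loss measures, here is a library-free sketch; the clipping constant `eps` and the example values are our own assumptions, and Keras' implementation may differ in details.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0]
good_loss = binary_cross_entropy(labels, [0.9, 0.1])  # confident and right
bad_loss = binary_cross_entropy(labels, [0.1, 0.9])   # confident and wrong
print(f"good: {good_loss:.3f}  bad: {bad_loss:.3f}")
```

Confident wrong predictions are penalized far more heavily than confident correct ones, which is what pushes the sigmoid output toward the true label during training.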
We settled on this architecture after iterating through variants with different numbers of layers and layer sizes, recording accuracy and loss for each one.
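One way to keep track of such a search is to store the metrics for every combination and pick the best afterwards. The sketch below is hypothetical: `grid_search` is our own helper, the candidate values `[4, 8]` and `[1, 2]` are placeholders rather than the ones we actually used, and `fake_train_and_score` returns made-up numbers in place of real training.

```python
from itertools import product


def grid_search(layer_sizes, dense_layers, train_and_score):
    """Try every (depth, width) pair and record what `train_and_score` returns."""
    results = {}
    for depth, width in product(dense_layers, layer_sizes):
        results[(depth, width)] = train_and_score(depth, width)
    # Pick the combination with the highest recorded accuracy.
    best = max(results, key=lambda key: results[key]["accuracy"])
    return best, results


def fake_train_and_score(depth, width):
    """Stand-in for building and fitting a model; the metrics are made up."""
    accuracy = 0.5 + 0.01 * width / (depth + 1)
    return {"accuracy": accuracy, "loss": 1.0 - accuracy}


best, results = grid_search([4, 8], [1, 2], fake_train_and_score)
for (depth, width), metrics in results.items():
    print(depth, width, round(metrics["accuracy"], 3))
print("best (depth, width):", best)
```

In a real run, `train_and_score` would build the Keras model for the given depth and width, fit it, and return the metrics measured on held-out data.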
The model was then saved so that it could be used for separate sample predictions.
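For a single prediction, the sigmoid output has to be turned into a yes/no answer, and accuracy alone does not show which kind of mistake the model makes. The sketch below assumes a 0.5 threshold and uses made-up labels and outputs purely for illustration; they are not results from our model.

```python
def classify(probabilities, threshold=0.5):
    """Turn sigmoid outputs into 0/1 predictions."""
    return [1 if p >= threshold else 0 for p in probabilities]


def evaluate(y_true, y_pred):
    """Accuracy plus sensitivity and specificity from the confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of actual cancer cases that were caught.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Share of cancer-free patients correctly cleared.
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Made-up labels and model outputs, for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_prob = [0.91, 0.40, 0.77, 0.12, 0.65, 0.83, 0.30, 0.55]

metrics = evaluate(y_true, classify(y_prob))
print(metrics)
```

For a screening tool, sensitivity matters a great deal, since a missed cancer is usually costlier than a false alarm; lowering the threshold trades specificity for sensitivity.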
And that’s it! With the power of machine learning, we created a model with 74% accuracy for the task of pancreatic cancer detection. If implemented, the impact could be enormous: with better approaches to early diagnosis, we can save lives.
- Pancreatic cancer is rarely detected early, despite the immense survival advantage
- Cancer antigens produce biometric data that can indicate the presence of tumors
- Neural networks can map CA 19-9 and CA 125 levels to the chance of pancreatic cancer!
We ended up winning Best Data Hack! For more information on our project, check this out.