Teaching a Computer to Recognize Dogs and Cats
@Joshua Payne | 10 minute read
A/N: This article assumes a good understanding of convolutional neural network architecture. For background, check this out.
A/N: The code used in this article is based on a tutorial from Sentdex. A basic understanding of Keras is encouraged.
Are you a cat🐱 or a dog🐶 person?

For some of us, we’re firmly placed on one side of this question. Cats are cute, agile, and independent furballs, while dogs are playful, personable, and defensive! A̵l̵s̵o̵,̵ ̵t̵h̵e̵y̵’̵r̵e̵ ̵w̵a̵y̵ ̵b̵e̵t̵t̵e̵r̵ ̵😉̵.̵
Nonetheless, we’re able to prefer either of these animals because, as human beings, we’ve learned to differentiate between them. After years of learning what dogs and cats look like, with experience, we’ve come to favor one or the other.
Now, imagine replicating this process onto a computer. I tried to do so, and instantly hit roadblocks.

The logic seems to check out, but this program is obviously not robust. How do we look for whiskers? How do we look for a tail? It’s easy for us to do, but how do we instruct a computer to scan for these features?
Well, for ordinary code, it’s virtually impossible to develop a program that can identify cats from unfamiliar photos with reasonable accuracy.
So why use ordinary code?
With deep learning, we don’t need to.
Convolutional neural networks simplify this process.
With our conventional programming example, to teach a computer to recognize features in cats, even the most bare-bones program would probably need thousands of lines of code.
With a CNN, I got this number down to 70.
Let’s first refresh ourselves on the intuition behind CNNs.
Humans have years of experience learning what a cat and dog look like, probably from our childhoods (throwback to picture books and Dora). In contrast, our computers haven’t had the opportunity to learn to recognize these creatures.
With artificial intelligence through CNNs, we overcome this issue by letting them learn! Our neural network learns from the data provided by our dataset. By iteratively adjusting kernel values to lower loss, our model learns how to recognize features in images. The features detected in an input image help the network predict which animal the image contains. So, for example, if whiskers were detected in an image, our convolutional neural network would likely predict that it was looking at a cat.
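To make this concrete before we touch any deep learning libraries, here’s a minimal, hand-rolled sketch (plain NumPy, with made-up pixel values) of what a single kernel does: it slides over the image and responds strongly wherever its pattern, here a vertical edge, appears. During training, a CNN learns kernel values like these on its own by nudging them to lower the loss.

import numpy as np

# A tiny 5x5 grayscale "image": dark on the left, bright on the right
image = np.array([
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
])

# A 3x3 kernel that responds strongly to vertical edges
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
])

# Slide the kernel over every 3x3 patch and record its response
feature_map = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        patch = image[i:i+3, j:j+3]
        feature_map[i, j] = np.sum(patch * kernel)

print(feature_map)
# [[ 0. 27. 27.]
#  [ 0. 27. 27.]
#  [ 0. 27. 27.]]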
Now, let’s actually build this!
To begin, we’ll use the Cats and Dogs dataset from Kaggle. I downloaded it as a folder called PetImages. It consists of 12,499 photos each of dogs and cats.

We now need to make these images accessible to our future code (for our CNN).
Loading in and Preprocessing the Data
Firstly, let’s import the following libraries.
import numpy as np
import os
import cv2
We’ll now initialize variables to represent the file path of the dataset and the categories of our data. These categories are the sub-folders of our main folder, aka our dataset, PetImages.
DATADIR = "/Users/joshua/Desktop/PetImages"
CATEGORIES = ["Dog", "Cat"]
IMG_SIZE is the width and height, in pixels, that each photo will be resized to. Our training data will be a list of the resized photos, represented as pixel values, alongside their class (dog or cat) in the form of 0 or 1.
IMG_SIZE = 50
training_data = []
Now, let’s make the function to actually load in and preprocess our data.
Our function is called create_training_data, a self-explanatory name. That said, later on this dataset will itself be split into training and testing data, so technically not all of the ‘training data’ will be employed as training data.
def create_training_data():
    for category in CATEGORIES:
        path = os.path.join(DATADIR, category)  # path to the Dog or Cat sub-folder
        class_num = CATEGORIES.index(category)  # 0 for Dog, 1 for Cat
        for img in os.listdir(path):
            try:
                # Read the image in grayscale and resize it to 50x50
                img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)
                new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
                training_data.append([new_array, class_num])
            except Exception:
                pass  # skip unreadable or corrupt files
Let’s break down how create_training_data() works.
for category in CATEGORIES:
    path = os.path.join(DATADIR, category)
    class_num = CATEGORIES.index(category)
For every category, ‘Dog’ and ‘Cat’, the variables path and class_num are defined.
With ‘path = os.path.join(DATADIR, category)’, the ‘os.path.join’ function combines both of the provided strings into one file directory path. In my case, ‘/Users/joshua/Desktop/PetImages’ and ‘Cat’ or ‘Dog’ are combined. This would output:
/Users/joshua/Desktop/PetImages/Dog
/Users/joshua/Desktop/PetImages/Cat
This now makes a path directly to the two sub-folders, Dog and Cat, in our PetImages dataset, so our code can access both categories of photos!
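You can verify this yourself in a Python shell; os.path.join simply glues path segments together with the right separator for your operating system:

import os

print(os.path.join("/Users/joshua/Desktop/PetImages", "Dog"))
# /Users/joshua/Desktop/PetImages/Dog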
Nested inside this for-loop is another for-loop. The outer loop runs twice, once for the Dog sub-folder and once for the Cat sub-folder, while the inner loop visits every photo inside the current sub-folder, meaning each dog photo or each cat photo.
for img in os.listdir(path):
For each of these photos, we try to read it in as an array (img_array) in grayscale (cv2.IMREAD_GRAYSCALE).
The ‘os.path.join’ function is used again to build the path to each file in the sub-folder. ‘/Users/joshua/Desktop/PetImages/Cat/1.jpg’ would point to the first cat photo (1.jpg).
The image array is then resized to the dimensions of our IMG_SIZE variable from before (IMG_SIZE = 50, so the photo becomes 50 by 50). This array, new_array, is then added to our training data alongside the class number defined in our first for-loop. Remember, this class number (class_num) represents whether the photo is a dog (0) or a cat (1).
If we encounter a file that can’t be read or resized (the dataset contains a few corrupt images), we ignore it (pass).
try:
    img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)
    new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
    training_data.append([new_array, class_num])
except Exception:
    pass
Let’s tie everything together now.
In summary, our function goes through each file in the Dog and Cat directories, and tries to add each in the form of a grayscale, resized array to our training data.
We can now call our function to actually create our training data, through running the code inside of it.
create_training_data()
Now, because our function went through PetImages in order, it added all of the dog photos to our training data first, and then all of the cat photos. That means the top half of our training data list is all dog photos and the bottom half is all cat photos. If we later split off part of this data as validation data, it would consist almost entirely of one class.
To avoid this, we need to shuffle our training data.
import random
random.shuffle(training_data)
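To check that the shuffle actually mixed the classes, we can peek at the first few labels; before shuffling they would all be 0 (Dog), and afterwards they should be an unpredictable mix. (The exact output will vary from run to run.)

# Peek at the first ten labels: 0 = Dog, 1 = Cat
print([label for _, label in training_data[:10]])
# e.g. [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]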
Our training data needs to be separated into features and labels, as this is the basis for all supervised learning models.
Each item in our training data list holds an image array and a class number. The array corresponds to the features, and the class number is the label. We loop through each training_data item and add its features and label to X and y.
X = []
y = []
for features, label in training_data:
    X.append(features)
    y.append(label)
Next, converting X and y into numpy arrays and reshaping X:
X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
y = np.array(y)
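The -1 lets NumPy infer the number of samples, and the trailing 1 is the single grayscale color channel that Keras’s Conv2D layers expect. A quick sanity check:

print(X.shape)  # (num_samples, 50, 50, 1) -- one grayscale channel per image
print(y.shape)  # (num_samples,)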
To access these variables from different programs, we can save them and their data as pickle files.
import pickle

pickle_out = open("X.pickle", "wb")
pickle.dump(X, pickle_out)
pickle_out.close()

pickle_out = open("y.pickle", "wb")
pickle.dump(y, pickle_out)
pickle_out.close()
We now have our entire dataset loaded in, represented by the variables X and y for, respectively, the features and labels. These arrays have been saved to our computer as pickle files (X.pickle, y.pickle).
Building the CNN
Alright! Now that we have all of our data preprocessed and ready to go, let’s build our actual neural network!
In a new file, we’ll now import the following:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Flatten, Conv2D, MaxPooling2D
import pickle
After that, to use our data, let’s load our X and y pickle files. (‘rb’ opens the files for reading in binary mode.)
X = pickle.load(open("X.pickle", "rb"))
y = pickle.load(open("y.pickle", "rb"))
For further preprocessing, we’ll divide our features (pixel values) by 255 to get values solely between 0 and 1.
X = X/255.0
We can now start building our neural network! We’ll use the Sequential model.
model = Sequential()
We’ll then add our first convolutional and pooling layers:
model.add(Conv2D(64, (3, 3), input_shape=X.shape[1:]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
We’re using 64 kernels, each with dimensions of 3 by 3. The input shape comes from our features, X, because we’re teaching our neural network to map them to the labels, y.
The resulting feature maps then pass through a ReLU activation and are max pooled over each 2x2 local receptive field.
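If max pooling feels abstract, here’s a tiny NumPy illustration with made-up activation values: each 2x2 region of the feature map is collapsed to its strongest activation, halving the map’s width and height.

import numpy as np

# A made-up 4x4 feature map of activations
feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 2],
    [2, 0, 1, 3],
])

# Group into 2x2 blocks and keep the maximum of each block
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[4 2]
#  [2 5]]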
This block will be repeated two more times, because separate model optimization showed that doing so led to higher accuracy. Stay tuned for articles on optimization and neural architecture search in the future 😉.
for _ in range(2):
    model.add(Conv2D(64, (3, 3)))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
We’ll now flatten our pooled maps into a single layer, which will feed into our densely connected layer. These neurons will then undergo the ReLU activation.
model.add(Flatten())
model.add(Dense(64))
model.add(Activation("relu"))
And now, our output layer, which consists of a single final neuron! This represents the prediction of the input image being a cat or a dog.
model.add(Dense(1))
model.add(Activation("sigmoid"))
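The architecture is now complete. A useful sanity check at this point is model.summary(); with our 50x50 grayscale input, the output shapes should shrink roughly as sketched in the comments below (each 3x3 ‘valid’ convolution trims 2 pixels from each spatial dimension, and each 2x2 pool halves them). The real output also lists the exact parameter counts per layer.

model.summary()
# Expected output shapes for a 50x50x1 input:
#   Conv2D       -> (48, 48, 64)   # 50 - 2
#   MaxPooling2D -> (24, 24, 64)   # 48 / 2
#   Conv2D       -> (22, 22, 64)   # 24 - 2
#   MaxPooling2D -> (11, 11, 64)   # 22 / 2
#   Conv2D       -> (9, 9, 64)     # 11 - 2
#   MaxPooling2D -> (4, 4, 64)     # floor(9 / 2)
#   Flatten      -> (1024,)        # 4 * 4 * 64
#   Dense        -> (64,)
#   Dense        -> (1,)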
Let’s compile our model and fit it to our feature and label data. We use a batch size of 32 to pass 32 samples through at a time. Additionally, validation_split = 0.3 holds out 30% of our data as validation (testing) data; Keras takes this split from the end of the arrays, which is another reason shuffling earlier mattered.
model.compile(loss="binary_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])

model.fit(X, y, batch_size=32, epochs=10, validation_split=0.3)
And save!
model.save("64x3-CNN.model")
Here’s what the entire model looks like:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Flatten, Conv2D, MaxPooling2D
import pickle

# Load the preprocessed features and labels
X = pickle.load(open("X.pickle", "rb"))
y = pickle.load(open("y.pickle", "rb"))

# Scale pixel values to between 0 and 1
X = X / 255.0

model = Sequential()

# First convolutional block
model.add(Conv2D(64, (3, 3), input_shape=X.shape[1:]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

# Two more identical convolutional blocks
for _ in range(2):
    model.add(Conv2D(64, (3, 3)))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))

# Flatten, one dense layer, then a single sigmoid output neuron
model.add(Flatten())
model.add(Dense(64))
model.add(Activation("relu"))
model.add(Dense(1))
model.add(Activation("sigmoid"))

model.compile(loss="binary_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])

model.fit(X, y, batch_size=32, epochs=10, validation_split=0.3)

model.save("64x3-CNN.model")
And… run! After 10 epochs, here were my final metrics.
loss: 0.2443 - accuracy: 0.8954 - val_loss: 0.4773 - val_accuracy: 0.8051
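As a side note, if you’d rather inspect these metrics programmatically than read them off the console, model.fit returns a History object whose .history dictionary holds the per-epoch values. Capturing it when you call fit (instead of the bare call above) gives you:

# Capture the History object returned by fit
history = model.fit(X, y, batch_size=32, epochs=10, validation_split=0.3)

print(history.history["accuracy"])      # training accuracy, one entry per epoch
print(history.history["val_accuracy"])  # validation accuracy, one entry per epoch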
Seeing Our Model in Action!
We’ve made our CNN, but we haven’t seen it predict anything yet. Let’s do that now, and see our model in practice!
Firstly, we’ll need a photo of a dog our model hasn’t seen before. Let’s use this.

We’ll first import these libraries:
import cv2
import tensorflow as tf
Our model’s prediction is represented as 0 or 1, so we’ll create this variable to later help us convert this prediction number into either ‘Cat’ or ‘Dog’.
CATEGORIES = ["Dog", "Cat"]
This function will preprocess our image so it’s suitable for our ConvNet.
def prepare(filepath):
    IMG_SIZE = 50  # must match the size the model was trained on
    img_array = cv2.imread(filepath, cv2.IMREAD_GRAYSCALE)
    new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
    # Scale to 0-1, matching the preprocessing used during training
    return new_array.reshape(-1, IMG_SIZE, IMG_SIZE, 1) / 255.0
We’ll now load in our model, saved when we trained it previously:
model = tf.keras.models.load_model("64x3-CNN.model")
Let’s now actually use our model to predict what the image is! We’ll use our prepare function on our photo, and our model will predict what animal it is in the form of 0 or 1. This value will be stored in our prediction variable.
prediction = model.predict([prepare('dog.jpg')])
Our prediction variable is a two-dimensional array (one row per input image, one column for the single output neuron), so to actually get the value out, we’d use this.
prediction[0][0]
This outputs 0, but it doesn’t directly tell us whether the photo is a cat or a dog. Let’s fix this using our ‘CATEGORIES’ variable from before: we print the item in the list at the index our model predicted (0 here).
print(CATEGORIES[int(prediction[0][0])])
This outputs ‘Dog’! Meaning our code successfully recognized dog.jpg!
And that’s it! That’s all the code we need to harness the power of AI. We’ve now taught a computer to recognize cats and dogs in unfamiliar photos with 80% accuracy, in 70 lines of code. Despite seeming impossible at the beginning of the article, we’ve managed to do it anyway!
But this isn’t nearly the be-all and end-all of CNN applications. Imagine how this could be scaled to other tasks: recognizing criminals from surveillance footage, detecting abnormalities in organs; the list truly goes on.

By enabling computers to recognize and classify images, we’ve opened the door to a boundless number of solutions to real-world issues. Humans are only beginning to find areas to leverage convolutional neural networks — where else is this possible?
Key Takeaways
- CNNs simplify and enable digital image classification
- By training our model on the Cats and Dogs dataset, we’ve taught it how to differentiate between the two
- TensorFlow and Keras allow for concise neural net programming
- We achieved 80% validation accuracy!