
Training the Image Classifier

Written by Audrey Tam & Matthijs Hollemans

Update Note: This chapter has been updated to iOS 13, Xcode 11 and Swift 5.1

In the previous chapter, you saw how to use a trained model to classify images with Core ML and Vision. However, using other people’s models is often not sufficient — the models may not do exactly what you want or you may want to use your own data and categories — and so it’s good to know how to train your own models.

In the chapters that follow, you’ll learn to create your own models with the common tools and libraries that machine learning experts use. Apple provides Create ML, a machine learning framework for creating models in Xcode. In this chapter, you’ll learn how to train the snacks model using Create ML.

The dataset

Before you can train a model, you need data. You may recall from the introduction that machine learning is all about training a model to learn “rules” by looking at a lot of examples.

Since you’re building an image classifier, it makes sense that the examples are going to be images. You’re going to train an image classifier that can tell apart 20 different types of snacks.

Here are the possible categories, again:

Healthy: apple, banana, carrot, grape, juice, orange,
         pineapple, salad, strawberry, watermelon

Unhealthy: cake, candy, cookie, doughnut, hot dog,
           ice cream, muffin, popcorn, pretzel, waffle

Double-click starter/snacks-download-link.webloc to download the snacks dataset and unzip it into your default download location, then move the snacks folder into the dataset folder. The snacks folder contains the images on which you’ll train the model.

This dataset has almost 7,000 images — roughly 350 images for each of these categories.

The snacks dataset

The dataset is split into three folders: train, test and val. For training the model, you’ll use only the roughly 4,800 images from the train folder, known as the training set.

The images from the val and test folders (950 each) are used to measure how well the model works once it has been trained. These are known as the validation set and the test set, respectively. It’s important that you don’t use images from the validation set or test set for training; otherwise, you won’t be able to get a reliable estimate of the quality of your model. We’ll talk more about this later in the chapter.
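If you ever rebuild or reshuffle the dataset, it’s worth checking that no validation or test image also sits in the training set. Here’s a minimal Swift sketch of such a check, assuming the image files have already been loaded as `Data`; `overlappingItems(training:heldOut:)` is a hypothetical helper, not part of Create ML, and comparing raw bytes only catches exact copies, not resized or re-encoded ones.

```swift
import Foundation

/// Returns the indices of items in `heldOut` whose bytes exactly match some
/// item in `training`. A non-empty result means the held-out split leaks
/// into training. (Hypothetical helper for illustration.)
func overlappingItems(training: [Data], heldOut: [Data]) -> [Int] {
    // Data is Hashable, so a Set gives fast membership checks.
    let trainingSet = Set(training)
    return heldOut.indices.filter { trainingSet.contains(heldOut[$0]) }
}

// Tiny stand-in "images" made of a few bytes each.
let trainImages = [Data([1, 2, 3]), Data([4, 5, 6])]
let testImages = [Data([7, 8, 9]), Data([4, 5, 6])]
print(overlappingItems(training: trainImages, heldOut: testImages))  // [1]
```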

Here are a few examples of training images:

Selected training images

As you can see, the images come in all kinds of shapes and sizes. The name of the folder will be used as the class name — also called the label or the target.
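Because the folder name is the label, you can recover the class of any image from its path alone. A minimal sketch, assuming paths relative to the snacks folder such as "train/apple/0a1b.jpg" (these file names are made up); `labelCounts(for:)` is a hypothetical helper that also tells you how many images each class has. Note that labels may contain spaces, like "hot dog".

```swift
/// Derives each image's label from the name of the folder that contains it
/// and counts how many images each label has.
func labelCounts(for paths: [String]) -> [String: Int] {
    var counts: [String: Int] = [:]
    for path in paths {
        let components = path.split(separator: "/")
        // Need at least "<label>/<file>"; skip anything shorter.
        guard components.count >= 2 else { continue }
        let label = String(components[components.count - 2])
        counts[label, default: 0] += 1
    }
    return counts
}

let paths = [
    "train/apple/0a1b.jpg",
    "train/apple/9f3c.jpg",
    "train/hot dog/b2c4.jpg",
]
// Sorted for a stable printout; dictionary order is otherwise unspecified.
print(labelCounts(for: paths).sorted { $0.key < $1.key })
```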

Create ML

You will now use Create ML to train a multi-class classifier on the snacks dataset.

Precision and recall

Create ML computes precision and recall for each class, which is useful for understanding which classes perform better than others. In my results, these values are mostly 100% or very close. But what do they mean?
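For a given class, precision answers “of the images the model labeled as this class, how many really were?” (true positives divided by true positives plus false positives), while recall answers “of the images that really belong to this class, how many did the model find?” (true positives divided by true positives plus false negatives). A minimal sketch that computes both from lists of true and predicted labels; the function name and the sample labels are made up for illustration.

```swift
/// Per-class precision and recall from parallel arrays of true and predicted
/// labels. A metric is nil when its denominator is zero (undefined).
func precisionAndRecall(actual: [String], predicted: [String], for label: String)
    -> (precision: Double?, recall: Double?) {
    precondition(actual.count == predicted.count, "labels and predictions must line up")
    var truePositives = 0, falsePositives = 0, falseNegatives = 0
    for (truth, guess) in zip(actual, predicted) {
        if guess == label && truth == label {
            truePositives += 1    // predicted the class, and it was right
        } else if guess == label {
            falsePositives += 1   // predicted the class, but it was something else
        } else if truth == label {
            falseNegatives += 1   // missed an image that belongs to the class
        }
    }
    let predictedCount = truePositives + falsePositives
    let actualCount = truePositives + falseNegatives
    let precision: Double? = predictedCount > 0 ? Double(truePositives) / Double(predictedCount) : nil
    let recall: Double? = actualCount > 0 ? Double(truePositives) / Double(actualCount) : nil
    return (precision, recall)
}

let actual    = ["apple", "apple", "banana", "cake", "apple", "apple"]
let predicted = ["apple", "banana", "banana", "apple", "apple", "banana"]
let metrics = precisionAndRecall(actual: actual, predicted: predicted, for: "apple")
// "apple" was predicted 3 times (2 correct) and occurs 4 times (2 found):
// precision 2/3, recall 1/2.
print(metrics.precision!, metrics.recall!)
```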

How we created the dataset

Collecting good training data can be very time consuming! It’s often considered to be the most expensive part of machine learning. Despite — or because of — the wealth of data available on the internet, you’ll often need to curate your dataset: You must manually go through the data items to remove or clean up bad data or to correct classification errors.

Transfer learning

So what’s happening in the playground? Create ML is currently busy training your model using transfer learning. As you may recall from the first chapter, transfer learning is a clever way to quickly train models by reusing knowledge from another model that was originally trained on a different task.
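Conceptually, transfer learning splits the model in two: a pretrained feature extractor that stays frozen, and a small classifier on top that is the only part being trained. The sketch below shows that split under simplifying assumptions: the extractor is a hypothetical stand-in closure (Create ML uses Apple’s built-in Vision feature extractor), and the trainable part is a nearest-centroid classifier chosen for brevity, not the logistic regression Create ML trains. Because the extractor never changes, its output can be computed once per image and reused.

```swift
/// A frozen feature extractor maps raw input to a feature vector. In real
/// transfer learning it is a pretrained network; here it is a stand-in closure.
typealias FeatureExtractor = ([Double]) -> [Double]

/// Squared Euclidean distance between two feature vectors.
func squaredDistance(_ a: [Double], _ b: [Double]) -> Double {
    return zip(a, b).reduce(0.0) { $0 + ($1.0 - $1.1) * ($1.0 - $1.1) }
}

/// The only trainable part: one centroid (mean feature vector) per label.
struct CentroidHead {
    private(set) var centroids: [String: [Double]] = [:]

    mutating func train(features: [[Double]], labels: [String]) {
        var sums: [String: [Double]] = [:]
        var counts: [String: Int] = [:]
        for (feature, label) in zip(features, labels) {
            let running = sums[label] ?? [Double](repeating: 0, count: feature.count)
            sums[label] = zip(running, feature).map { $0.0 + $0.1 }
            counts[label, default: 0] += 1
        }
        centroids = [:]
        for (label, sum) in sums {
            let n = Double(counts[label] ?? 1)
            centroids[label] = sum.map { $0 / n }
        }
    }

    /// Predicts the label whose centroid is closest to the given features.
    func predict(_ features: [Double]) -> String? {
        return centroids.min {
            squaredDistance($0.value, features) < squaredDistance($1.value, features)
        }?.key
    }
}

// Hypothetical "pretrained" extractor: mean brightness. It is never updated.
let extract: FeatureExtractor = { pixels in [pixels.reduce(0, +) / Double(pixels.count)] }
let images: [[Double]] = [[0.9, 0.8], [0.95, 0.85], [0.1, 0.2], [0.05, 0.15]]
let labels = ["banana", "banana", "grape", "grape"]

// Extract features once, then train only the head on the cached features.
let cached = images.map(extract)
var head = CentroidHead()
head.train(features: cached, labels: labels)
print(head.predict(extract([0.88, 0.9])) ?? "none")  // banana
```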

A closer look at the training loop

Transfer learning takes less time than training a neural network from scratch. However, before we can clearly understand how transfer learning works, we have to gain a little insight into what it means to train a neural network first. It’s worth recalling an image we presented in the first chapter:
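Stripped to its essentials, the training loop is: make predictions, measure the loss against the true labels, nudge the parameters a small step in the direction that reduces the loss, and repeat. A minimal sketch of that loop using plain gradient descent on a one-variable linear model y = w * x + b with mean squared error; this toy model is an assumption for illustration and is far simpler than a neural network, but the loop has the same shape.

```swift
/// Fits y = w * x + b with batch gradient descent on mean squared error.
func trainLinearModel(xs: [Double], ys: [Double],
                      learningRate: Double = 0.05,
                      epochs: Int = 2_000) -> (w: Double, b: Double) {
    var w = 0.0, b = 0.0
    let n = Double(xs.count)
    for _ in 0..<epochs {
        // 1. Predict, and 2. compare with the labels, accumulating the
        //    gradient of the mean squared error with respect to w and b.
        var gradW = 0.0, gradB = 0.0
        for (x, y) in zip(xs, ys) {
            let error = (w * x + b) - y
            gradW += 2 * error * x / n
            gradB += 2 * error / n
        }
        // 3. Update the parameters a small step against the gradient.
        w -= learningRate * gradW
        b -= learningRate * gradB
    }
    return (w, b)
}

let xs: [Double] = [0, 1, 2, 3, 4]
let ys = xs.map { 2 * $0 + 1 }   // data generated by y = 2x + 1
let fitted = trainLinearModel(xs: xs, ys: ys)
print(fitted)  // very close to w = 2.0, b = 1.0
```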

What is feature extraction?

You may recall that machine learning happens on “features,” where we’ve defined a feature to be any kind of data item that we find interesting. You could use the photo’s pixels as features but, as the previous chapter demonstrated, the individual RGB values don’t say much about what sort of objects are in the image.
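To make “feature” concrete: a feature extractor is a function that turns an image of any size into a fixed-length vector of numbers that describes it better than raw pixels do. The sketch below is a deliberately simple, hand-written extractor (a coarse brightness histogram), purely as an illustration; VisionFeaturePrint_Screen is a trained neural network and produces far richer features. The `Pixel` type and the default of four bins are assumptions for this example.

```swift
/// An 8-bit RGB pixel (hypothetical representation for this sketch).
struct Pixel { let r: UInt8, g: UInt8, b: UInt8 }

/// Hand-crafted feature extractor: a normalized histogram of pixel brightness
/// with `bins` buckets. Images of any size map to a vector of the same length.
func brightnessHistogram(_ pixels: [Pixel], bins: Int = 4) -> [Double] {
    var histogram = [Double](repeating: 0, count: bins)
    guard !pixels.isEmpty else { return histogram }
    for p in pixels {
        // Average of the three channels, in 0...255.
        let brightness = (Int(p.r) + Int(p.g) + Int(p.b)) / 3
        // Map 0...255 onto bucket indices 0..<bins.
        let bucket = min(brightness * bins / 256, bins - 1)
        histogram[bucket] += 1
    }
    // Divide by the pixel count so the features don't depend on image size.
    return histogram.map { $0 / Double(pixels.count) }
}

let tiny = [Pixel(r: 255, g: 255, b: 255), Pixel(r: 0, g: 0, b: 0),
            Pixel(r: 250, g: 240, b: 230), Pixel(r: 10, g: 20, b: 30)]
print(brightnessHistogram(tiny))  // [0.5, 0.0, 0.0, 0.5]
```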

VisionFeaturePrint_Screen extracts features

Logistic regression

By the time you’re done reading the previous section, Create ML has (hopefully) finished training your model. The status shows training took 2 minutes, 47 seconds — most of that time was spent on extracting features.

Linear and logistic regression
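Logistic regression is a linear model: for each class it computes a weighted sum of the features plus a bias, then turns those scores into probabilities. With many classes, the usual way to do that is the softmax function. A minimal sketch of the prediction step, with made-up weights and only two features; training would adjust `weights` and `biases` (typically to minimize cross-entropy loss), which isn’t shown here.

```swift
import Foundation  // exp

/// Multinomial logistic regression: one weight vector and bias per class.
struct LogisticRegression {
    let labels: [String]
    let weights: [[Double]]   // weights[c][i]: weight of feature i for class c
    let biases: [Double]

    /// Softmax turns raw scores into probabilities that sum to 1.
    static func softmax(_ scores: [Double]) -> [Double] {
        // Subtracting the largest score avoids overflow in exp
        // without changing the result.
        let maxScore = scores.max() ?? 0
        let exps = scores.map { exp($0 - maxScore) }
        let total = exps.reduce(0, +)
        return exps.map { $0 / total }
    }

    /// Linear score per class, followed by softmax.
    func probabilities(for features: [Double]) -> [String: Double] {
        var scores: [Double] = []
        for (w, b) in zip(weights, biases) {
            var score = b
            for (weight, feature) in zip(w, features) { score += weight * feature }
            scores.append(score)
        }
        let probs = LogisticRegression.softmax(scores)
        return Dictionary(uniqueKeysWithValues: zip(labels, probs))
    }
}

let model = LogisticRegression(
    labels: ["apple", "cake"],
    weights: [[2.0, -1.0], [-2.0, 1.0]],   // made-up weights for illustration
    biases: [0.0, 0.0])
let probs = model.probabilities(for: [1.0, 0.5])
print(probs["apple"]!)  // about 0.953 (scores are 1.5 and -1.5)
```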

Looking for validation

Even though there are 4,838 images in the snacks/train dataset, Create ML uses only 95% of them for training.
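Holding back a small slice of the training images gives Create ML a validation set to check the model against while it trains. The sketch below shows one way to make such a split, assuming a simple random shuffle; it isn’t Create ML’s actual code, and `trainValidationSplit` is a hypothetical name. With 4,838 images, 95% rounds to 4,596 for training, leaving 242 for validation.

```swift
/// Randomly splits `items` into training and validation portions.
func trainValidationSplit<T>(_ items: [T], trainingFraction: Double = 0.95)
    -> (training: [T], validation: [T]) {
    precondition((0.0...1.0).contains(trainingFraction), "fraction must be in 0...1")
    // Shuffle first so the split isn't biased by the order of files on disk.
    let shuffled = items.shuffled()
    let trainingCount = Int((Double(items.count) * trainingFraction).rounded())
    return (Array(shuffled[..<trainingCount]), Array(shuffled[trainingCount...]))
}

let imageIDs = Array(0..<4_838)   // stand-ins for the 4,838 training images
let split = trainValidationSplit(imageIDs)
print(split.training.count, split.validation.count)  // 4596 242
```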

Overfitting happens

Overfitting is a term you hear a lot in machine learning. It means that the model has started to memorize specific training images instead of learning patterns that carry over to new images. For example, the image train/ice cream/b0fff2ec6c49c718.jpg has a person in a blue shirt enjoying a sundae:
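One practical way to spot overfitting is to watch training and validation accuracy side by side: when training accuracy keeps climbing while validation accuracy stalls, the model is likely memorizing. A small sketch of that check over per-iteration accuracy histories; `firstOverfitIteration` is a hypothetical helper, the accuracy numbers are made up, and the 5-percentage-point tolerance is an arbitrary assumption, not a Create ML setting.

```swift
/// Given the accuracy after each training iteration (fractions in 0...1),
/// returns the first iteration index at which training accuracy leads
/// validation accuracy by more than `tolerance`, or nil if it never does.
func firstOverfitIteration(training: [Double], validation: [Double],
                           tolerance: Double = 0.05) -> Int? {
    let count = min(training.count, validation.count)
    return (0..<count).first { training[$0] - validation[$0] > tolerance }
}

// Made-up histories: validation accuracy stalls at 90% while training keeps rising.
let trainingAccuracy:   [Double] = [0.80, 0.88, 0.94, 0.98]
let validationAccuracy: [Double] = [0.78, 0.86, 0.90, 0.90]
print(firstOverfitIteration(training: trainingAccuracy,
                            validation: validationAccuracy) ?? -1)  // 3
```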


More metrics and the test set

Now that you’ve trained the model, it’s good to know how well it does on new images that it has never seen before. You already got a little taste of that from the validation accuracy during training, but the dataset also comes with a collection of test images that you haven’t used yet. These are stored in the snacks/test folder, and are organized by class name, just like the training data.
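Evaluating on the test set comes down to running the model on every test image and comparing each prediction with the label from its folder. Besides overall accuracy, it helps to count which classes get mistaken for which. A minimal sketch over parallel arrays of true and predicted labels; `evaluate(actual:predicted:)` is a hypothetical helper and the labels are made up.

```swift
/// Accuracy plus confusion counts: confusions[actual][predicted] = how often.
func evaluate(actual: [String], predicted: [String])
    -> (accuracy: Double, confusions: [String: [String: Int]]) {
    precondition(actual.count == predicted.count && !actual.isEmpty,
                 "need the same non-zero number of labels and predictions")
    var correct = 0
    var confusions: [String: [String: Int]] = [:]
    for (truth, guess) in zip(actual, predicted) {
        if truth == guess { correct += 1 }
        confusions[truth, default: [:]][guess, default: 0] += 1
    }
    return (Double(correct) / Double(actual.count), confusions)
}

let actual    = ["cake", "cake", "muffin", "muffin", "apple"]
let predicted = ["cake", "muffin", "muffin", "muffin", "apple"]
let report = evaluate(actual: actual, predicted: predicted)
print(report.accuracy)                              // 0.8
print(report.confusions["cake"]?["muffin"] ?? 0)    // 1: one cake was taken for a muffin
```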

Examining your output model

It’s time to look at your actual Core ML model: select the Output tab, then drag the snacks/test folder onto where it says Drag or Add Files. Quick as a flash, your model classifies the test images. You can inspect each one to see the predicted class(es) and the model’s confidence level:
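A classifier produces a confidence for every class, and showing the predicted class(es) amounts to ranking those confidences from highest to lowest. A minimal sketch of that ranking, assuming the scores arrive as a `[String: Double]` dictionary (with Vision you would read `identifier` and `confidence` from each `VNClassificationObservation` instead); `topPredictions` is a hypothetical helper and the numbers are made up.

```swift
/// Returns the `k` most confident labels, highest confidence first.
func topPredictions(_ confidences: [String: Double], k: Int = 3)
    -> [(label: String, confidence: Double)] {
    return confidences
        .sorted { $0.value > $1.value }
        .prefix(k)
        .map { (label: $0.key, confidence: $0.value) }
}

let output: [String: Double] = ["waffle": 0.72, "pretzel": 0.15, "cookie": 0.08, "cake": 0.05]
for p in topPredictions(output, k: 2) {
    print("\(p.label): \(Int((p.confidence * 100).rounded()))%")
}
// waffle: 72%
// pretzel: 15%
```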

Classifying on live video

The example project in this chapter’s resources is a little different from the app you worked with in the previous chapter: it works on live video from the camera. The VideoCapture class uses AVCaptureSession to read video frames from the iPhone’s camera at 30 frames per second. The ViewController acts as the delegate for this VideoCapture class and is called with a CVPixelBuffer object 30 times per second. It uses Vision to make a prediction and then shows the result on the screen in a label.
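The hookup between VideoCapture and ViewController is the classic delegate pattern: the capture object holds a weak reference to its delegate and calls it once per frame. The sketch below mirrors that shape in plain Swift so it runs without a camera; the protocol, class and method names are hypothetical (the project’s real ones may differ), and a `Frame` struct stands in for CVPixelBuffer.

```swift
/// Stand-in for a camera frame (the real app receives a CVPixelBuffer).
struct Frame { let index: Int }

/// Hypothetical delegate protocol; class-bound so the reference can be weak.
protocol FrameCaptureDelegate: AnyObject {
    func capture(_ capture: FrameCapture, didCapture frame: Frame)
}

final class FrameCapture {
    weak var delegate: FrameCaptureDelegate?   // weak to avoid a retain cycle
    let framesPerSecond = 30

    /// Simulates `seconds` of video by emitting one frame per tick.
    func run(seconds: Int) {
        for i in 0..<(seconds * framesPerSecond) {
            delegate?.capture(self, didCapture: Frame(index: i))
        }
    }
}

final class FrameCounter: FrameCaptureDelegate {
    private(set) var framesSeen = 0
    private(set) var lastFrameIndex: Int?

    func capture(_ capture: FrameCapture, didCapture frame: Frame) {
        // A real app would run a Vision request on the frame here
        // and update a label with the result.
        framesSeen += 1
        lastFrameIndex = frame.index
    }
}

let camera = FrameCapture()
let counter = FrameCounter()
camera.delegate = counter
camera.run(seconds: 2)
print(counter.framesSeen)  // 60
```

At 30 frames per second, each callback has roughly 33 milliseconds (1000 / 30) before the next frame arrives.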

The classifier on live video


In this chapter, you got a taste of training your own Core ML model with Create ML. Partly due to the limited dataset, the default settings got only about 90% accuracy. Increasing max iterations increased training accuracy, but validation accuracy was stuck at ~90%, indicating that overfitting might be happening. Augmenting the data with flipped images reduced the gap between training and validation accuracies, but you’ll need more iterations to increase the accuracies.
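Flipping is one of the simplest augmentations: a mirrored photo of a banana is still a banana, so each flipped copy gives the model a slightly different training example. A minimal sketch on an image stored as rows of pixel values; the row-major `[[Pixel]]` layout and both function names are assumptions for illustration, since Create ML applies its augmentation options for you.

```swift
/// Mirrors an image left to right. `image[row][column]` holds one pixel.
func flippedHorizontally<Pixel>(_ image: [[Pixel]]) -> [[Pixel]] {
    return image.map { Array($0.reversed()) }
}

/// Doubles a labeled dataset by adding a mirrored copy of every image.
/// The label is unchanged, because flipping doesn't change what the snack is.
func augmentedWithFlips<Pixel>(_ dataset: [(image: [[Pixel]], label: String)])
    -> [(image: [[Pixel]], label: String)] {
    let mirrored = dataset.map { (image: flippedHorizontally($0.image), label: $0.label) }
    return dataset + mirrored
}

let tiny: [[Int]] = [[1, 2, 3],
                     [4, 5, 6]]
print(flippedHorizontally(tiny))  // [[3, 2, 1], [6, 5, 4]]
let augmented = augmentedWithFlips([(image: tiny, label: "banana")])
print(augmented.count)  // 2
```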


Create your own dataset of labelled images, and use Create ML to train a model.

Key points

  • You can use macOS playgrounds to test out Create ML, and play with the different settings, to create simple machine learning models.
  • Create ML allows you to create small models that leverage the built-in Vision feature extractor already installed on iOS 12+ devices.
  • Ideally, you want the validation accuracy to be similar to the training accuracy.
  • There are several ways to deal with overfitting: include more images or augment your data; with augmented data, you may also need to increase the training iterations.
  • Precision and recall are useful metrics when evaluating your model.


© 2020 Razeware LLC
