Machine Learning by Tutorials

Chapter 11: Data Collection for Sequence Classification
Written by Chris LaPollo

You worked exclusively with images throughout the first section of this book, and for good reason — knowing how to apply machine learning to images lets you add many exciting and useful features to your apps. Techniques like classification and object detection can help you answer questions like “Is this snack healthy?” or “Which of these objects is a cookie?”

But you’ve focused on individual images — even when processing videos, you processed each frame individually with complete disregard for the frames that came before or after it. Given the following series of images, can the techniques you’ve learned so far tell me where my cookies went?

The Case of the Disappearing Cookies

Each of the above images tells only part of the story. Rather than considering them individually, you need to reason over them as a sequence, applying what you see in earlier frames to help interpret later ones.

There are many such tasks that involve working with sequential data, such as:

  • Extracting meaning from videos. Maybe you want to make an app that translates sign language, or one that searches for clips based on the events they depict.

  • Working with audio, for example converting speech to text, or songs to sheet music.

  • Understanding text, such as these sentences you’ve been reading, which are sequences of words, themselves sequences of letters (assuming you’re reading this in a language that uses letters, that is).

  • And countless others. From weather data to stock prices to social media feeds, there are endless streams of sequential data.

With so many types of data and almost as many techniques for working with it, this chapter can’t possibly cover everything. You’ll learn ways to deal with text in later chapters, and some of the techniques shown here are applicable to multiple domains. But to keep things practical, this chapter focuses on a specific type of sequence classification — human activity detection. That is, using sensor data from a device worn or held by a person to identify what that person is physically doing. You’ve probably already experienced activity detection on your devices, maybe checking your daily step count on your iPhone or closing rings on your Apple Watch. Those just scratch the surface of what’s possible.

In this chapter, you’ll learn how to collect sensor data from Apple devices and prepare it for training a machine learning model. Then you’ll use that data in the next chapter, along with Turi Create’s task-focused API for activity detection, to build a neural network that recognizes user activity from device motion data. Finally, you’ll use your trained neural net to recognize player actions in a game.

Note: Apple introduced the Create ML application with Xcode 11, which provides a nice GUI for training many types of Create ML models. One of those is called Activity Classifier and it’s essentially the same model you’ll build in these chapters using Turi Create. So why not use the Create ML app here?

We made that decision partly because we wrote these chapters before the Create ML app existed, and switching would have meant rewriting quite a bit of content without describing any truly new functionality. It’s also because the GUI option is self-explanatory once you understand the underlying Turi Create code. And the Create ML app is a bit less flexible than using Turi Create directly, a consequence of needing to support such a (delightfully) simple graphical interface.

We encourage you to experiment with the Create ML app after going through these chapters to see which option you prefer. We’ll try to point out instructions that might be different when working with the Create ML app.

The game you’ll make is similar to the popular Bop It toy, but instead of calling out various physical bits to bop and twist, it will call out gestures for the player to make with their iPhone. Perform the correct action before time runs out! The gestures detected include a chopping motion, a shaking motion and a driving motion (imagine turning a steering wheel).

We chose this project because collecting data and testing it should be comfortably within the ability of most readers. However, you can use what you learn here for more than just gesture recognition — these techniques let you track or react to any activity identifiable from sensor data available on an Apple device.

Modern hardware comes packed with sensors — depending on the model, you might have access to an accelerometer, gyroscope, pedometer, magnetometer, altimeter or GPS. You may even have access to the user’s heart rate!

With so much data available, there are countless possibilities for behaviors you can detect, including sporadic actions like standing up from a chair or falling off a ladder, as well as activities that occur over longer durations like jogging or sleeping. And machine learning is the perfect tool to make sense of it all. But before you can fire up those neural nets, you’ll need a dataset to train them.

Building a dataset

So you’ve got an app you want to power using machine learning. You do the sensible thing and scour the internet for a suitable, freely available dataset that meets your needs.

You try tools like Google Dataset Search, check popular data science sites like Kaggle, and exhaust every keyword search trick you know. If you find something — great, move on to the next section! But if your search for a dataset turns up nothing, all is not lost — you can build your own.

Collecting and labeling data is the kind of thing professors make their graduate students do: time-consuming, tedious work that may make you want to cry. When labeling human activity data, it’s not uncommon to record video of the activity session, go through it manually to decide when specific activities occur, and then label the data using timecodes synced between the data recordings and the video. That may sound like fun to some people, but those people are wrong and should never be trusted.

This chapter takes a different approach — the data collection app automatically adds labels. They may not be as exact — manual labeling lets you pinpoint precise moments when test subjects begin or end an activity — but in many cases, they’re good enough.

To get started, download the resources for this chapter if you haven’t already done so, and open the GestureDataRecorder starter project in Xcode.

Note: The chapter resources include data files you can use unchanged, so you aren’t required to collect more here. However, the experience will help later when working on your own projects. Plus, adding more data to the provided dataset should improve the model you make in the next chapter.

Take a look through the project to see what’s there. ViewController.swift contains most of the app’s code, and it’s the only file you’ll be asked to change. Notice the ActivityType enum, which identifies the different gestures the app will recognize:

enum ActivityType: Int {
	case none, driveIt, shakeIt, chopIt
}

If you run the app now, it will seem like it’s working but it won’t actually collect or save any data. The following image shows the app’s interface:

Gesture Data Recorder app interface

GestureDataRecorder probably won’t win any design awards, but that’s OK — it’s just a utility app that records sensor data. Users enter their ID, choose what activity and how many short sessions of that activity to record, and then hit Start Session to begin collecting data. The app speaks instructions to guide users through the recording process. And the Instructions button lets users see videos demonstrating the activities.

Note: For some datasets, it may be better to randomize activities during a session, rather than having users choose one for the entire thing. My test subjects didn’t seem to enjoy having to pay that much attention, though.

Why require a user ID? You’ll learn more about this later, but it’s important to be able to separate samples in your dataset by their sources. You don’t need specific details about people, like their names — in fact, identifying details like that are often a bad idea for privacy and ethics reasons — but you need some way to distinguish between samples.

GestureDataRecorder takes a simple but imperfect approach to this problem: It expects users to provide a unique identifier and then saves data for each user in separate files. To support this, the app makes users enter an ID number and then includes that in the names of the files it saves. If any files using that ID already exist on this device, the app requests confirmation and then appends new data to those files. So it trusts users not to append their data to someone else’s files on the device, and it’s up to you to ensure no two users enter the same ID on different devices.
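Per-user IDs pay off when you split the data. As a minimal plain-Python sketch (not part of GestureDataRecorder or Turi Create), the hypothetical split_users helper below assigns whole users, rather than individual samples, to the train, validation and test sets, so no person’s recordings show up in more than one set. The fractions and the fixed seed are illustrative assumptions.

```python
import random

def split_users(user_ids, valid_frac=0.15, test_frac=0.15, seed=0):
    """Assign each user ID to exactly one of 'train', 'valid' or 'test'.

    Hypothetical helper: splitting by user keeps all of a person's samples
    together, so the model is never evaluated on someone it trained on.
    """
    users = sorted(set(user_ids))          # dedupe and fix a stable order
    random.Random(seed).shuffle(users)     # reproducible shuffle
    n_test = round(len(users) * test_frac)
    n_valid = round(len(users) * valid_frac)
    return {
        "train": users[n_test + n_valid:],
        "valid": users[n_test:n_test + n_valid],
        "test": users[:n_test],
    }

splits = split_users(range(1, 21))  # 20 made-up user IDs
print({name: len(ids) for name, ids in splits.items()})
# prints {'train': 14, 'valid': 3, 'test': 3}
```

Splitting users instead of rows trades a little control over exact set sizes for an honest estimate of how the model handles people it has never seen.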

The starter code supports the interface and other business logic for the app — you’ll add the motion-related bits now so you get to know how that all works.

Accessing device sensors with Core Motion

You’ll use Core Motion to access readings from the phone’s motion sensors, so import it by adding the following line along with the other imports in ViewController.swift:

import CoreMotion

// ViewController properties: the motion manager and the queue that receives its updates
let motionManager = CMMotionManager()
let queue = OperationQueue()

/* TODO: REMOVE THIS LINE
...
TODO: REMOVE THIS LINE */
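
// In the Config struct (enableMotionUpdates() reads it as Config.samplesPerSecond)
static let samplesPerSecond = 25.0

// Buffers one comma-separated line per motion sample until it's saved
var activityData: [String] = []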
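
// Append the buffered lines to this user's data file
// (appendLinesToURL(fileURL:) is a starter-project helper, not a Foundation API)
do {
  try self.activityData.appendLinesToURL(fileURL: dataURL)
  print("Data appended to \(dataURL)")
} catch {
  print("Error appending data: \(error)")
}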

func process(data motionData: CMDeviceMotion) {
  // 1: Label the sample with the current activity while recording, otherwise .none
  let activity = isRecording ? currentActivity : .none
  // 2: Build one comma-separated line: a per-action session ID, the activity's
  //    raw value, then attitude, rotation rate, gravity and user acceleration
  let sample = """
  \(sessionId!)-\(numberOfActionsRecorded),\
  \(activity.rawValue),\
  \(motionData.attitude.roll),\
  \(motionData.attitude.pitch),\
  \(motionData.attitude.yaw),\
  \(motionData.rotationRate.x),\
  \(motionData.rotationRate.y),\
  \(motionData.rotationRate.z),\
  \(motionData.gravity.x),\
  \(motionData.gravity.y),\
  \(motionData.gravity.z),\
  \(motionData.userAcceleration.x),\
  \(motionData.userAcceleration.y),\
  \(motionData.userAcceleration.z)
  """
  // 3: Buffer the line until it's saved
  activityData.append(sample)
}
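Each call to process(data:) produces one line with 14 comma-separated fields. The plain-Python sketch below splits such a line back into named, typed values, which is a handy sanity check on files the app writes. Only sessionId, activity and gravX/gravY/gravZ appear as column names elsewhere in this chapter; the other field names, the extra session/actionNumber keys and the example line are assumptions for illustration.

```python
# Field order follows the string built in process(data:). Names other than
# sessionId, activity, gravX, gravY and gravZ are assumed for illustration.
FIELDS = [
    "sessionId", "activity",
    "roll", "pitch", "yaw",
    "rotX", "rotY", "rotZ",
    "gravX", "gravY", "gravZ",
    "accelX", "accelY", "accelZ",
]

def parse_sample(line):
    """Split one recorded line into a dict of typed values."""
    values = line.strip().split(",")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    row = dict(zip(FIELDS, values))
    row["activity"] = int(row["activity"])   # ActivityType raw value
    for name in FIELDS[2:]:                   # all sensor readings are doubles
        row[name] = float(row[name])
    # The first field is "<session ID>-<action number>"; split on the last dash
    # in case the session ID itself contains dashes.
    session, action = row["sessionId"].rsplit("-", 1)
    row["session"], row["actionNumber"] = session, int(action)
    return row

# A made-up line in the same format:
example = "7-2,3,0.1,-0.2,1.5,0.01,0.02,0.03,0.0,-1.0,0.0,0.5,0.1,-0.2"
sample = parse_sample(example)
print(sample["activity"], sample["gravY"], sample["actionNumber"])  # 3 -1.0 2
```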
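func enableMotionUpdates() {
  // 1: Request Config.samplesPerSecond updates per second
  motionManager.deviceMotionUpdateInterval =
    1 / Config.samplesPerSecond
  // 2: Start each session with an empty buffer
  activityData = []
  // 3: Deliver updates on the background queue; .xArbitraryZVertical makes
  //    the z axis vertical and leaves the x axis's heading arbitrary
  motionManager.startDeviceMotionUpdates(
    using: .xArbitraryZVertical,
    to: queue,
    withHandler: { [weak self] motionData, error in
      // 4: Stop if the view controller is gone or no data arrived, logging any error
      guard let self = self, let motionData = motionData else {
        let errorText = error?.localizedDescription ?? "Unknown"
        print("Device motion update error: \(errorText)")
        return
      }
      // 5: Turn the reading into a labeled sample
      self.process(data: motionData)
  })
}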
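enableMotionUpdates() asks Core Motion for updates every 1 / Config.samplesPerSecond seconds, so 25.0 samples per second means a requested interval of 0.04 seconds. The snippet below is just that arithmetic in Python, handy for translating between seconds of motion and rows of data. SAMPLES_PER_SECOND mirrors the Swift constant and the helper names are made up; treat the results as approximate, since the interval is a request and actual delivery timing can vary.

```python
SAMPLES_PER_SECOND = 25.0  # mirrors Config.samplesPerSecond in the app

# The value assigned to deviceMotionUpdateInterval, in seconds.
update_interval = 1 / SAMPLES_PER_SECOND

def samples_for(seconds, rate=SAMPLES_PER_SECOND):
    """Approximate number of samples recorded over `seconds`."""
    return round(seconds * rate)

def seconds_for(num_samples, rate=SAMPLES_PER_SECOND):
    """Approximate time span covered by `num_samples` consecutive samples."""
    return num_samples / rate

print(update_interval)   # 0.04
print(samples_for(2))    # 50
print(seconds_for(100))  # 4.0
```

By this arithmetic, a 100-row slice of the dataset covers roughly four seconds of motion.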

case Utterances.sessionStart:
  // Start streaming motion data as the session begins
  enableMotionUpdates()
  queueNextActivity()

func disableMotionUpdates() {
  motionManager.stopDeviceMotionUpdates()
}

case Utterances.sessionComplete:
  // Stop streaming motion data once the session ends
  disableMotionUpdates()
  ...

Collecting some data

Now go collect some data, ideally from multiple people. Invite your friends over, serve some nice canapés and make it a phone shaking party. If your friends are anything like my kids, they’ll be willing to record data at least once before losing interest.

Analyzing and preparing your data

So you’ve got some data. You’ve collected it yourself or acquired it from elsewhere, but either way, your next step is to look at it. Don’t try reading every number — that way lies madness — but do some analysis to see exactly what you’re working with.

# In Terminal: activate the Turi Create environment and launch Jupyter
conda activate turienv
jupyter notebook

# In a notebook cell: show plots inline and load the three datasets
%matplotlib inline
import turicreate as tc
import activity_detector_utils as utils
train_sf = utils.sframe_from_folder("data/train")
valid_sf = utils.sframe_from_folder("data/valid")
test_sf = utils.sframe_from_folder("data/test")
train_sf.head()
First three samples of training set

# 1: Map the integer activity codes recorded by the app to readable names
activity_values_to_names = {
  0: 'rest_it',
  1: 'drive_it',
  2: 'shake_it',
  3: 'chop_it'
}
# 2: Replace an SFrame's activity column, in place, with those names
def replace_activity_names(sframe):
  sframe['activity'] = sframe['activity'].apply(
    lambda val: activity_values_to_names[val])
# 3: Convert all three datasets
replace_activity_names(train_sf)
replace_activity_names(valid_sf)
replace_activity_names(test_sf)
Partial list of features from first three samples of training set, with activity as strings
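The dictionary’s keys depend on how Swift numbers ActivityType: an Int enum without explicit raw values assigns 0, 1, 2 and 3 in declaration order, so none becomes 0 and is relabeled rest_it here. The short sketch below lines the two up and builds a reverse lookup; activity_names_to_values is a name introduced here for illustration.

```python
# Case order of the Swift enum:
#   enum ActivityType: Int { case none, driveIt, shakeIt, chopIt }
# Implicit Int raw values count up from 0 in declaration order.
swift_cases = ["none", "driveIt", "shakeIt", "chopIt"]

activity_values_to_names = {
    0: 'rest_it',
    1: 'drive_it',
    2: 'shake_it',
    3: 'chop_it'
}

# Reverse lookup: turn a readable name back into the value the app records.
activity_names_to_values = {
    name: value for value, name in activity_values_to_names.items()
}

for raw_value, case_name in enumerate(swift_cases):
    print(f"{raw_value} ActivityType.{case_name} -> {activity_values_to_names[raw_value]}")

print(activity_names_to_values['chop_it'])  # 3
```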

utils.plot_gesture_activity(test_sf)
Plot of testing dataset

utils.plot_gesture_activity(test_sf, activity="drive_it")
utils.plot_gesture_activity(
  test_sf[11950:12050], activity="drive_it")
100 samples of ‘shake_it’, ‘chop_it’ and ‘drive_it’ activities from test dataset
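Choosing row ranges such as 11950:12050 to zoom in on a single gesture is easier when you know where each labeled run begins and ends. This plain-Python sketch uses itertools.groupby to report each contiguous run of identical labels as (label, start, end) with end exclusive, matching slice syntax. The function name and sample labels are made up; to try it on real data you might pass list(test_sf['activity']), which loads the whole column into memory.

```python
from itertools import groupby

def activity_runs(labels):
    """Return (label, start, end) for each contiguous run; end is exclusive."""
    runs = []
    start = 0
    for label, group in groupby(labels):
        length = sum(1 for _ in group)   # size of this run
        runs.append((label, start, start + length))
        start += length
    return runs

# A short made-up sequence standing in for an 'activity' column.
labels = ["rest_it"] * 3 + ["drive_it"] * 4 + ["rest_it"] * 2
for label, start, end in activity_runs(labels):
    print(f"{label}: rows {start}:{end}")
# rest_it: rows 0:3
# drive_it: rows 3:7
# rest_it: rows 7:9
```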

Removing bad data

Now you’ll see one way to find and remove errors from your dataset. If you run the code suggested earlier to plot all the drive_it activity data in the test set, you’ll see a plot like the following one.

‘drive_it’ samples in test dataset

utils.plot_gesture_activity(
  test_sf[22200:22300], activity="drive_it")
Mislabeled data in test dataset

utils.plot_gesture_activity(
  test_sf[21200:21500], activity="drive_it",
  features=["gravX", "gravY", "gravZ"])
Gravity values for ‘drive_it’ gesture. Left: Incorrectly oriented. Right: Correctly oriented.
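The gravity plots make a misoriented recording easy to spot by eye. If you have many sessions to check, one rough heuristic (hypothetical, not part of the chapter’s utils module) is to average each session’s gravity vector and flag sessions whose dominant signed axis differs from the majority. It assumes most recordings of a gesture hold the phone the same way, so treat its output as a list of sessions to plot and inspect, not a verdict.

```python
from collections import Counter, defaultdict

AXES = ("gravX", "gravY", "gravZ")

def mean_gravity_by_session(rows):
    """Average gravX/gravY/gravZ over all rows of each sessionId."""
    totals = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = Counter()
    for row in rows:
        sid = row["sessionId"]
        for i, axis in enumerate(AXES):
            totals[sid][i] += row[axis]
        counts[sid] += 1
    return {sid: tuple(t / counts[sid] for t in total)
            for sid, total in totals.items()}

def orientation(mean_g):
    """Signed axis that gravity mostly points along, e.g. '-gravY'."""
    i = max(range(3), key=lambda k: abs(mean_g[k]))
    return ("-" if mean_g[i] < 0 else "+") + AXES[i]

def odd_orientation_sessions(rows):
    """Sessions whose dominant gravity direction differs from the most common one."""
    labels = {sid: orientation(g)
              for sid, g in mean_gravity_by_session(rows).items()}
    typical, _ = Counter(labels.values()).most_common(1)[0]
    return sorted(sid for sid, label in labels.items() if label != typical)

# Made-up drive_it rows: three sessions hold the phone one way, one doesn't.
def fake_rows(sid, g, n=5):
    return [{"sessionId": sid, "gravX": g[0], "gravY": g[1], "gravZ": g[2]}] * n

rows = (fake_rows("1-0", (0.1, -0.9, 0.2)) + fake_rows("2-0", (0.0, -1.0, 0.1))
        + fake_rows("3-0", (-0.1, -0.95, 0.0)) + fake_rows("4-0", (0.0, 0.1, -0.99)))
print(odd_orientation_sessions(rows))  # ['4-0']
```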

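# 1: Grab the session IDs of the two bad recordings found in the plots
bad_session_1 = test_sf[21350]["sessionId"]
bad_session_2 = test_sf[22250]["sessionId"]
# 2: Drop every row belonging to either session
test_sf = test_sf.filter_by(
  [bad_session_1, bad_session_2],
  column_name='sessionId', exclude=True)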
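filter_by(..., exclude=True) keeps every row whose value in the named column is not in the given list. Assuming the sessionId column holds the first field written by process(data:), which combines the session ID with an action counter, each value identifies one recorded gesture, so this removes just the bad recordings rather than a user’s whole session. Here’s the same filtering in plain Python on made-up rows; drop_sessions is a hypothetical name.

```python
def drop_sessions(rows, bad_session_ids):
    """Keep only rows whose sessionId is not in bad_session_ids."""
    bad = set(bad_session_ids)   # set lookup is O(1) per row
    return [row for row in rows if row["sessionId"] not in bad]

rows = [
    {"sessionId": "4-1", "activity": "drive_it"},
    {"sessionId": "4-2", "activity": "drive_it"},
    {"sessionId": "5-1", "activity": "shake_it"},
]
cleaned = drop_sessions(rows, ["4-2"])
print([row["sessionId"] for row in cleaned])  # ['4-1', '5-1']
```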

Optional: Removing non-activity data

What about motions that have nothing to do with gestures? You know, all those sensor readings that arrive between the gestures? Take a look at that data by plotting the rest_it activity. Here’s how you do so for the test set:

utils.plot_gesture_activity(test_sf, activity="rest_it")
‘rest_it’ samples in test dataset

train_sf = train_sf.filter_by(
  ["rest_it"], 'activity', exclude=True)
test_sf = test_sf.filter_by(
  ["rest_it"], 'activity', exclude=True)
valid_sf = valid_sf.filter_by(
  ["rest_it"], 'activity', exclude=True)

Balancing your classes

After you are satisfied you’ve cleaned your data, there’s one final thing you should check: How many examples of each class do you have? Run the following code to count the examples in each dataset:

utils.count_activities(train_sf)
utils.count_activities(valid_sf)
utils.count_activities(test_sf)
Activity counts for train, validation and test sets
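utils.count_activities comes with the chapter’s helper module, and its exact behavior isn’t shown here. For sequence data, one reasonable notion of an “example” is a single recorded gesture rather than a single row, so the plain-Python sketch below counts distinct sessionId values per activity and reports how lopsided the classes are. The function names and toy rows are hypothetical.

```python
from collections import Counter

def count_recordings(rows):
    """Count distinct recordings (sessionId values) for each activity label."""
    recordings = {(row["sessionId"], row["activity"]) for row in rows}
    return Counter(activity for _, activity in recordings)

def imbalance_ratio(counts):
    """Largest class count divided by the smallest; 1.0 means perfectly balanced."""
    return max(counts.values()) / min(counts.values())

rows = (
    [{"sessionId": "1-0", "activity": "drive_it"}] * 50   # 50 rows, 1 recording
    + [{"sessionId": "1-1", "activity": "drive_it"}] * 40
    + [{"sessionId": "2-0", "activity": "shake_it"}] * 60
    + [{"sessionId": "2-1", "activity": "chop_it"}] * 30
    + [{"sessionId": "2-2", "activity": "chop_it"}] * 30
    + [{"sessionId": "2-3", "activity": "chop_it"}] * 30
)
counts = count_recordings(rows)
print(counts)
print(imbalance_ratio(counts))  # 3.0
```

Counting rows instead would tell a different story here: shake_it has the fewest recordings but not the fewest rows.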

train_sf.save('data/cleaned_train_sframe')
test_sf.save('data/cleaned_test_sframe')
valid_sf.save('data/cleaned_valid_sframe')

Key points

  • Core Motion provides access to motion sensors on iOS and watchOS devices.
  • When building a dataset, prefer collecting less data from more sources over more data from fewer sources.
  • Inspect and clean your data before training any models to avoid wasting time on potentially invalid experiments. Be sure to check all your data — training, validation and testing.
  • Try to keep all the data from a single source within just one of the train, validation or test sets.
  • Prefer a balanced class representation. In cases where that’s not possible, evaluate your model with techniques other than accuracy, such as precision and recall.
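To see why accuracy can mislead on imbalanced classes, here’s a small plain-Python sketch of per-class precision and recall on made-up labels. Scoring precision as 0.0 when a class is never predicted is a convention chosen here (libraries handle that case differently), not a rule.

```python
def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    # Convention: report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up labels: 8 'shake_it' examples and only 2 'chop_it' examples.
y_true = ["shake_it"] * 8 + ["chop_it"] * 2
# A lazy model that always answers 'shake_it' still scores 80% accuracy...
y_pred = ["shake_it"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                                      # 0.8
# ...but never finds a single 'chop_it'.
print(precision_recall(y_true, y_pred, "chop_it"))   # (0.0, 0.0)
print(precision_recall(y_true, y_pred, "shake_it"))  # (0.8, 1.0)
```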

Where to go from here?

You have a bunch of motion data sequences organized into training, validation and test sets. Now it’s time to make a model that can recognize specific gestures in them. In the next chapter, you’ll use Turi Create to do just that.


© 2020 Razeware LLC
