Natural Language Processing on iOS with Turi Create

In this Natural Language Processing tutorial, you’ll learn how to train a Core ML model from scratch, then use that model within an iOS application. By Michael Katz.


The Logistic Classifier

Turi Create will make a logistic classifier for this type of analysis, which actually works a little differently than a linear regression.

To oversimplify a bit: instead of interpolating a single value, a logistic classifier will compute a probability (from 0 to 1) for each class by multiplying how much each word contributes to that class by the number of times that word appears, ultimately adding all of that up across all of the words.
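As a rough sketch of that description for the two-author case: multiply each word’s weight by its count, add everything up, and squash the total into a probability with the logistic function. The weights, the bias and the function name below are made-up values for illustration, not numbers Turi Create would learn from the corpus.

```python
import math
from collections import Counter

# Hypothetical per-word weights: positive values pull toward Yeats,
# negative values pull toward Keats. A real model learns these from data.
WEIGHTS = {"old": 1.2, "grey": 0.8, "sleep": 0.5, "happy": -1.0, "england": -1.5}
BIAS = 0.0  # hypothetical intercept


def probability_of_yeats(text):
    """Return P(Yeats) for a text; P(Keats) is 1 minus this value."""
    counts = Counter(text.lower().split())
    # Weighted sum: each word's weight times the number of times it appears.
    z = BIAS + sum(WEIGHTS.get(word, 0.0) * n for word, n in counts.items())
    # The logistic function maps any real number into the range 0 to 1.
    return 1.0 / (1.0 + math.exp(-z))


print(probability_of_yeats("old and grey"))
print(probability_of_yeats("happy england"))
```

Words without a weight contribute nothing, so a text made up entirely of unknown words comes out at exactly 0.5 for each author.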

Take the first line of that first Yeats poem: “When you are old and grey and full of sleep”
And the first line of the first Keats poem: “Happy is England! I could be content”

If these two lines were the total input, each of these words would contribute wholly to its author, because no word appears in both lines. If the Keats line were, instead, “Happy are England”, then the word “are” would contribute 50/50 to each author.

Word    Keats Yeats
-------------------
And       0     1
Are       0     1
Be        1     0
Content   1     0
Could     1     0
England   1     0
Full      0     1
Grey      0     1
Happy     1     0
I         1     0
Is        1     0
Of        0     1
Old       0     1
Sleep     0     1
When      0     1
You       0     1

Now if you take the poem you saw earlier, “On Being Asked for a War Poem”, as the input, only one word — I — appears in the training list, so the model would predict that Keats wrote the poem at 100% and that Yeats wrote the poem at 0%.
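The toy example above can be reproduced in a few lines of Python. This is a sketch of the oversimplified counting scheme only, not of what Turi Create actually computes. The function names are invented for illustration, and the input is just the first line of the war poem.

```python
import re
from collections import Counter

TRAINING_LINES = {
    "Yeats": "When you are old and grey and full of sleep",
    "Keats": "Happy is England! I could be content",
}


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def build_contributions(training_lines):
    """Map each word to the share of credit it gives every author."""
    owners = {}
    for author, line in training_lines.items():
        for word in set(tokenize(line)):
            owners.setdefault(word, set()).add(author)
    # A word used by one author counts fully for that author;
    # a word used by both is split evenly between them.
    return {
        word: {author: 1.0 / len(authors) for author in authors}
        for word, authors in owners.items()
    }


def predict(text, contributions, authors):
    """Add up share * count per author, then normalize so the totals sum to 1."""
    totals = {author: 0.0 for author in authors}
    for word, count in Counter(tokenize(text)).items():
        for author, share in contributions.get(word, {}).items():
            totals[author] += share * count
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No known words at all: nothing to base a guess on.
        return {author: 1.0 / len(authors) for author in authors}
    return {author: value / grand_total for author, value in totals.items()}


contributions = build_contributions(TRAINING_LINES)
result = predict("I think it better that in times like these",
                 contributions, list(TRAINING_LINES))
print(result)
```

The only known word in that line is “I”, so all of the credit goes to Keats. Changing the Keats training line to “Happy are England” makes `contributions["are"]` split 0.5 and 0.5 between the two authors.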

Hopefully this illustrates why a large data set is required to accurately train models!

Using Turi Create

Core ML is iOS’s machine learning engine, supporting multiple types of models based on different machine learning SDKs like scikit-learn and Keras. Apple’s open-source library, Turi Create, reduces the overhead in learning how to use these libraries, and handles choosing the best type of model for a given task. It does this either by having a pre-chosen model type for the task or by running several models against each other to see which performs best.

Turi Create is app-specific, rather than model-specific. This means you specify the type of problem you want to solve, rather than choosing the type of model you want to use. This way, it can choose the right model for the job.

Like most machine learning tools, the ones that are compatible with Core ML are written in Python. To get started, very little understanding of Python is necessary. Having said that, knowing Python is useful if you want to expand how you train models or customize the input data, or if you run into trouble.

Setting Up Python

The following instructions assume you already have Python installed, which is likely if you have a Mac running the latest Xcode.

Run the following command in Terminal to check if you have Python installed already:

python -V

If Python is installed, you’ll see its version number. If it isn’t, you’ll need to follow these instructions to download and install Python: https://wiki.python.org/moin/BeginnersGuide/Download.

You’ll also need pip installed on your machine, which comes with the Python installation. Run the following command to make sure it’s installed:

which pip

If the result isn’t a path ending in /bin/pip, you’ll need to install it from https://pip.pypa.io/en/stable/installing/.
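If you prefer to run the same checks from inside Python, something like the following sketch works. It assumes Python 3.3 or later, because `shutil.which` is not available in Python 2, and the function name is made up for this example.

```python
import importlib.util
import shutil
import sys


def describe_python_setup():
    """Collect the interpreter version and where pip was found, if anywhere."""
    return {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        # shutil.which mirrors the shell's `which`: a path, or None if missing.
        "pip_path": shutil.which("pip"),
        # find_spec returns None when the module cannot be imported.
        "pip_importable": importlib.util.find_spec("pip") is not None,
    }


for name, value in describe_python_setup().items():
    print(name, "=", value)
```

A `pip_path` of `None` means the shell would not find a pip command either.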

Finally, it’s recommended that you use virtualenv to install Turi Create. This isn’t generally part of the default Mac setup, but it can be installed from the Terminal by using:

pip install virtualenv

If you get any permission errors, prefix the command with sudo:

sudo pip install virtualenv

If you get any SSL errors, you’ll need to add the --trusted-host command line option.

pip install --trusted-host pypi.python.org virtualenv

Virtualenv is a tool for creating virtual Python environments. This means you can install a series of tools and libraries in isolation in a named environment. With virtual environments, you can build and run an app with a known set of dependencies, and then go and create a separate environment for a new app that has a different set of tools, possibly with versions that would otherwise conflict with the first environment.

From an iOS perspective, think of it as being able to have an environment with Xcode 8.2, Cocoapods 1.0 and Fastlane 2.4 to build one app, and then be able to launch another environment with Xcode 9.1, Cocoapods 1.2 and Fastlane 2.7 to build another app, without those two conflicting. This is just one more reminder of the sophistication of open-source developer tools with large communities.

Installing Turi Create

With Python in hand, for the first step, you’ll create a new virtual environment in which to install Turi Create.

Open a Terminal window, and cd into the directory where you downloaded this tutorial’s materials. For reference, corpus.json should be in the current folder before continuing.

From there, enter the following command:

virtualenv venv

This creates a new virtual environment named venv in your project directory.

When you have completed that, activate the environment:

source venv/bin/activate

When there is an active environment, you’ll see a (venv) prepended to the terminal prompt. If you need to get out of the virtual environment, run the deactivate command.
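The prompt is the quickest indicator, but you can also ask the interpreter itself. The helper below is a small sketch with an invented name. It relies on the fact that older virtualenv releases set `sys.real_prefix`, while newer releases make `sys.prefix` differ from `sys.base_prefix`.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; newer releases and the
    # standard-library venv module make sys.prefix differ from sys.base_prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


print("virtual environment active:", in_virtual_environment())
```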

Finally, make sure the environment is still activated and install Turi Create:

pip install -U turicreate

If you have any issues with installation, you can run a more explicit install command:

python2.7 -m pip install turicreate 

This installs the latest version of the Turi Create library, along with all its dependencies. Now it’s time to actually start using Python!

Using Turi Create to Train a Model

First, in a new Terminal window with the virtual environment active, launch Python in the same directory as your corpus.json file:

python

You can also use a more interactive environment like IPython, which provides better history and tab-completion features, but that’s outside the scope of this tutorial.

Next, run the following command:

import turicreate as tc

This will import the Turi Create module and make it accessible from the symbol tc.

Next, load the JSON data:

data = tc.SFrame.read_json('corpus.json', orient='records')

This will load the data from the JSON file into an SFrame, which is the data container for Turi Create. Its data is organized in columns like a spreadsheet and has powerful functions for manipulation. This is important for massaging data to get the best input for training a model. It’s also optimized for loading from disk storage, which is important for large data sets that can easily overwhelm RAM.

Type in data to see what you pulled out. The generated output shows the size and data types contained within, as well as the first few rows of data.

Columns:
    author  str
    text    str
    title   str
Rows: 518
Data:
+----------------------+-------------------------------+
|        author        |              text             |
+----------------------+-------------------------------+
| William Butler Yeats | When you are old and grey ... |
| William Butler Yeats | Had I the heavens' embroid... |
| William Butler Yeats | Were you but lying cold an... |
| William Butler Yeats | Wine comes in at the mouth... |
| William Butler Yeats | That crazed girl improvisi... |
| William Butler Yeats | Turning and turning in the... |
| William Butler Yeats | I made my song a coat\nCov... |
| William Butler Yeats | I will arise and go now, a... |
| William Butler Yeats | I think it better that in ... |
|      John Keats      | Happy is England! I could ... |
+----------------------+-------------------------------+
+-------------------------------+
|             title             |
+-------------------------------+
|        When You Are Old       |
| He Wishes For The Cloths O... |
| He Wishes His Beloved Were... |
|        A Drinking Song        |
|         A Crazed Girl         |
|       The Second Coming       |
|             A coat            |
|   The Lake Isle Of Innisfree  |
| On being asked for a War Poem |
| Happy Is England! I Could ... |
+-------------------------------+
[518 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

Note: If you get an error loading the data, make sure you launched Python in the same directory as the JSON file, or specify a full path to it.
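Before training, it can help to know how many poems each author contributes. The sketch below uses only the standard library and Python 3. It assumes corpus.json is a single JSON array of objects with an author key, which is what orient='records' and the columns above suggest. The function name is made up for this example.

```python
import json
from collections import Counter


def poems_per_author(path):
    """Count the records in a records-style JSON file, grouped by author."""
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)  # assumed: a list of dictionaries
    return Counter(record["author"] for record in records)
```

Calling `poems_per_author('corpus.json').most_common()` lists the authors from most to fewest poems, which makes lopsided training data easy to spot.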

Now that you have the data, for the next step, you’ll create a model by running:

model = tc.text_classifier.create(data, 'author', features=['text'])

This creates a text classifier from the loaded data, using the author column as the class label and the text column as the input feature. To build a more accurate classifier, you can compute and then provide additional features such as meter, line length and rhyme scheme.

This command creates the model and trains it on data. It will reserve about 5% of the rows as a validation set. This means that 95% of the data is for training, and then the remaining data will be used to test the accuracy of the trained model.
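To make the idea of a validation split concrete, here is a plain-Python sketch of a random split of roughly 95% and 5%. It illustrates the concept only and is not Turi Create’s implementation. The function name, parameters and seed are invented for this example.

```python
import random


def split_rows(rows, validation_fraction=0.05, seed=None):
    """Randomly assign each row to a training list or a validation list."""
    rng = random.Random(seed)
    training, validation = [], []
    for row in rows:
        # Each row lands in the validation list with the given probability,
        # so the split is "about" 5%, not exactly 5%.
        if rng.random() < validation_fraction:
            validation.append(row)
        else:
            training.append(row)
    return training, validation


# 518 stand-in rows, matching the row count of the corpus.
training, validation = split_rows(range(518), seed=1)
print(len(training), len(validation))
```

Because the assignment is random, the validation list will hold a different handful of rows for each seed, which is one reason accuracy figures can vary between training runs.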

Due to the poor quality of the training data (that is, there are a large number of words for only a handful of examples per author), the training may fail or get terminated before the maximum 10 iterations are complete. If that happens, just re-run the command. The training is not deterministic, so trying again might lead to a different result, depending on the starting values for the coefficients.

Finally, run this command to export the model in the Core ML format:

model.export_coreml('Poets.mlmodel')

Voilà! With four lines of Python, you’ve built and trained an ML model ready to use from an iOS app.