Natural Language Processing on iOS with Turi Create

In this Natural Language Processing tutorial, you’ll learn how to train a Core ML model from scratch, then use that model in an iOS app. By Michael Katz.


Using Core ML

Now that you have a Core ML model, the next step is to use it in the app.

Import the Model

Core ML lets you use a pre-trained model in your app to make predictions or perform classifications on user input. To use the model, drag the generated Poets.mlmodel into the project navigator. If you skipped the model-generation section of this tutorial, or had trouble creating the model, you can use the one included at the root of the project zip (Download Materials link at top or bottom of the tutorial).

Xcode automatically parses the model file and shows you the important information in the editor panel.

The first section, Machine Learning Model, tells you about the model’s metadata, which Turi Create automatically created for you when generating the model.

The most important line here is the Type. This tells you what kind of model it is. In this case, it’s a Pipeline Classifier. A classifier takes the input and tries to assign a label to it — in this case, an “author best match”. The pipeline part means the model is a series of mathematical transforms applied to the input data to calculate the class probabilities.

The next section, Model Class, shows the generated Swift class to be used inside the app. This class is the code wrapper for the model, and it’s covered in the next step of the tutorial.

The third section, Model Evaluation Parameters, describes the inputs and outputs of the model.

Here, there is one input, text, which is a dictionary of string keys (individual words) to double values (the number of times that word appears in the input poem).

There are also two outputs. The first, author, is the most likely match for the poem’s author. The other output, authorProbability, is the percent confidence of a match for each known author.

You’ll see that, for some inputs, even though there is only one “best match”, that match itself might have a very small probability, or there might be two or three matches that are all reasonably close.
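To see how close those matches can get, you can sort the probability dictionary yourself. Here’s a minimal sketch — the probability values below are made up for illustration; the real numbers come from the model’s authorProbability output:

```swift
// Hypothetical values of the kind authorProbability returns.
let authorProbability: [String: Double] = [
  "Emily Dickinson": 0.41,
  "Walt Whitman": 0.38,
  "William Blake": 0.21
]

// Rank the authors from most to least likely.
let ranked = authorProbability.sorted { $0.value > $1.value }
let best = ranked[0]

// A "best match" can still be a weak one, so check its confidence
// before presenting it as a sure thing.
let isConfident = best.value >= 0.5
```

Here, Emily Dickinson is the top match, but with only 41% confidence and Walt Whitman close behind, a UI might want to hedge rather than declare a winner.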

Now, click on the arrow next to Poets in the Model Class section. This will open Poets.swift, an automatically generated Swift file. This contains a series of classes that form a convenience wrapper for accessing the model. In particular, it has a simple initializer, a prediction(text:) function that does the actual evaluation by the model, and two classes that wrap the input and output so that you can use standard Swift values in the calling code, instead of worrying about the Core ML data types.
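To give you a sense of that wrapper, here’s a rough sketch of its shape. This is illustrative only: the real Poets.swift is generated by Xcode, builds on Core ML’s MLModel and feature-provider types, and the prediction body below is a placeholder.

```swift
// Illustrative sketch of the generated wrapper's shape (not the real file).
class PoetsInput {
  let text: [String: Double]  // word -> number of occurrences
  init(text: [String: Double]) { self.text = text }
}

class PoetsOutput {
  let author: String                       // best-match label
  let authorProbability: [String: Double]  // label -> confidence
  init(author: String, authorProbability: [String: Double]) {
    self.author = author
    self.authorProbability = authorProbability
  }
}

class Poets {
  // In the generated file, this evaluates the compiled Core ML model.
  func prediction(text: [String: Double]) throws -> PoetsOutput {
    fatalError("Placeholder: the generated code runs the model here")
  }
}
```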

NSLinguisticTagger

Before you can use the model, you need the input text, which comes from a free-form text box; you’ll need to convert it into something compatible with PoetsInput. Even though Turi Create handles creating the BOW (Bag of Words) from the SFrame training input, Core ML does not yet have that capability built in. That means you need to transform the text into a dictionary of word counts manually.

You could write a function that takes the input text, splits it at the spaces, trims punctuation and then counts the remainder. Or, even better, use a context-aware text processing API: NSLinguisticTagger.
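A naive version of that hand-rolled approach might look like the following sketch. The hypothetical naiveWordCounts function only splits on whitespace and trims leading and trailing punctuation, so hyphenated words, contractions and non-Roman scripts would still trip it up — which is exactly why the tagger-based approach below is better:

```swift
import Foundation

// A naive word counter: split on whitespace, trim punctuation, count.
// Shown only to motivate NSLinguisticTagger; it misses many edge cases.
func naiveWordCounts(text: String) -> [String: Double] {
  var counts: [String: Double] = [:]
  let words = text
    .components(separatedBy: .whitespacesAndNewlines)
    .map { $0.trimmingCharacters(in: .punctuationCharacters) }
    .filter { !$0.isEmpty }
  for word in words {
    counts[word, default: 0] += 1
  }
  return counts
}
```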

NSLinguisticTagger is the Cocoa API for processing natural language. As of iOS 11, its functionality is backed by its own Core ML model, which is much more complicated than the one shown here.

It’s hard to make sure a character-parsing algorithm is smart enough to work around all the edge cases in a language — apostrophes and hyphens, for example. Even though this app just covers poets from America and the United Kingdom writing in English, there’s no reason the model couldn’t also include poems written in other languages. Introducing parsing for multiple languages, especially non-Roman character languages, can get very difficult very quickly. Fortunately, you can leverage NSLinguisticTagger to simplify this.

In PoemViewController.swift, add the following helper function to the private extension:

func wordCounts(text: String) -> [String: Double] {
  // 1
  var bagOfWords: [String: Double] = [:]
  // 2
  let tagger = NSLinguisticTagger(tagSchemes: [.tokenType], options: 0)
  // 3
  let range = NSRange(text.startIndex..., in: text)
  // 4
  let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace]
  // 5
  tagger.string = text
  // 6
  tagger.enumerateTags(in: range, unit: .word, scheme: .tokenType, options: options) { _, tokenRange, _ in
    let word = (text as NSString).substring(with: tokenRange)
    bagOfWords[word, default: 0] += 1
  }

  return bagOfWords
}

The function returns a count of how many times each word appears in the input string. Here’s a breakdown of each step:

  1. Initializes your bag-of-words dictionary.
  2. Creates an NSLinguisticTagger set up to tag all the tokens (words, punctuation and whitespace) in a string.
  3. Creates an NSRange covering the whole string, since the tagger operates over ranges.
  4. Sets the options to skip punctuation and whitespace when tagging the string.
  5. Sets the tagger’s string to the text parameter.
  6. Enumerates the word tags found in the string. For each word, the closure increments that word’s count in the dictionary, using the word itself as the key.

Using the model

With the word counts in hand, you can now feed them into the model. Replace the contents of analyze(text:) with the following:

func analyze(text: String) {
  // 1
  let counts = wordCounts(text: text)
  // 2
  let model = Poets()
  
  // 3
  do {
    // 4
    let prediction = try model.prediction(text: counts)
    updateWithPrediction(poet: prediction.author,
                         probabilities: prediction.authorProbability)
  } catch {
    // 5
    print(error)
  }
}

This function:

  1. Stores the output of wordCounts(text:) in a constant.
  2. Creates an instance of the Core ML model.
  3. Wraps the prediction logic in a do/catch block because it can throw an error.
  4. Passes the word counts to the prediction(text:) function, which runs the model.
  5. Logs any error thrown by the prediction.
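If you’re curious how those probabilities might become display text, here’s a small illustrative sketch. The real updateWithPrediction(poet:probabilities:) already lives in the starter project; the formattedProbabilities helper below is hypothetical:

```swift
// Turn the probability dictionary into display lines, most likely author first.
func formattedProbabilities(_ probabilities: [String: Double]) -> [String] {
  return probabilities
    .sorted { $0.value > $1.value }
    .map { "\($0.key): \(Int(($0.value * 100).rounded()))%" }
}
```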

Build and run, then enter a poem and let the model do its magic!


The result is great, but you can chalk that one up to good training! Another poem may not produce the desired result. For example, this Joyce Kilmer classic does not.

In this case, the model leans heavily towards Emily Dickinson, since there are far more of her poems in the training set than of any other author. This is the downside of machine learning: the results are only as good as the data used to train the models.