14
Natural Language Classification
Written by Alexis Gallagher

Earlier in the book, you learned how to classify images — for example, judging whether they were of cats or dogs. You’ve also classified sequences of sensor data as device motions. Text is just another kind of data, and you can classify it as well. But what does a class of text look like?

Is this email legitimate or spam? Are customer messages praising your great work or demanding action to address complaints? What’s the topic of an article, patent or court document? These are just a few examples of text classification tasks.

There are a wide variety of techniques for extracting useful information from text, all falling under the general term natural language processing (NLP). This chapter focuses on using NLP for classification, specifically using the methods Apple provides as part of its operating systems. You may be familiar with NSLinguisticTagger, which has been available since iOS 5. It supports several NLP tasks and was covered in the “Natural Language Processing” chapter of our iOS 11 by Tutorials book, when Apple rewrote the class to take advantage of Core ML. This chapter does not use that class.

In iOS 12, and in each of its other device OS revisions that same year, Apple introduced the new Natural Language framework, which is meant to improve upon and replace NSLinguisticTagger. That’s the framework you’ll use here, along with Create ML to train your own models.

In this chapter, you’ll build an app to read movie reviews. Along the way, you’ll perform several NLP tasks:

  • Language identification
  • Named entity recognition
  • Lemmatization
  • Sentiment analysis

Don’t worry if any of those terms are unfamiliar to you — you’ll get to know them all soon.

A special thanks to Michael Katz and the editorial team of iOS 11 by Tutorials. Michael wrote that book’s “Natural Language Processing” chapter, on which this chapter is heavily based. Specifically, we reuse much of the starter project and general structure from that chapter, but we implement things differently here. This chapter also covers some additional topics, such as training custom models, so we recommend going through it even if you’ve already read that book.

Getting started

Open the SMDB starter project in Xcode. Build and run to check out the app, which starts out looking like this (pull down on the list to reveal the Search bar):

The SMDB app

The Search feature doesn’t work yet, but you’ll fix that soon. The app contains the following four tabs:

  • All: Shows a list of every movie review loaded from the “server.” (To keep things simple, SMDB actually loads from a JSON file included with the project.) You’ll add “heart-eyes” and “sad-face” emojis to the positive and negative reviews, respectively.

  • By Movie: Lists movie names where users can tap a name to only see reviews for that movie. You’ll eventually include tomato ratings showing each movie’s average review sentiment.

  • By Actor: Currently empty. You’ll make it show a list of names automatically discovered from the reviews, along with emoji showing the average sentiment for reviews mentioning each name. Users will be able to tap a name and see all the reviews that mention it.

  • By Language: Currently empty, it will soon list languages detected in the reviews. Users will then be able to tap a language to read all the reviews written in it.

You’ll add these missing features inside NLPHelper.swift, so open it now. It includes empty stubs for the functions that you’ll implement. Notice that it also imports the Natural Language framework, giving you access to well-trained machine-learning models for several NLP tasks. The first one you’ll take a look at is language identification.

Language identification

Your first classification task will be identifying the language of a piece of text. This is a common first step with NLP because different languages often need to be handled differently. For example, English and Chinese sentences are not tokenized in the same way.

func getLanguage(text: String) -> NLLanguage? {
  NLLanguageRecognizer.dominantLanguage(for: text)
}
Languages identified in reviews

Additional language identification options

The NLLanguageRecognizer performs just one task: identifying languages used in text. If you need it, then you’ll most often use it as you did here, via its convenience function dominantLanguage(for:). However, there are situations that call for more control, and, in those cases, you’ll need to create an NLLanguageRecognizer object and call some of its other methods.
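As a rough illustration of that extra control, the sketch below creates a recognizer, optionally restricts the languages it may return, and asks for several ranked guesses instead of a single answer. The helper name candidateLanguages(for:limitedTo:maxCount:) and the sample sentence are assumptions made for this example; they are not part of the SMDB project.

```swift
import NaturalLanguage

// Hypothetical helper (not part of SMDB): returns up to `maxCount`
// candidate languages with their estimated probabilities, most likely first.
func candidateLanguages(
  for text: String,
  limitedTo constraints: [NLLanguage] = [],
  maxCount: Int = 3
) -> [(language: NLLanguage, probability: Double)] {
  let recognizer = NLLanguageRecognizer()
  // Optionally restrict the set of languages the recognizer may return.
  if !constraints.isEmpty {
    recognizer.languageConstraints = constraints
  }
  // Feed the text to the recognizer, then read back its ranked guesses.
  recognizer.processString(text)
  return recognizer
    .languageHypotheses(withMaximum: maxCount)
    .sorted { $0.value > $1.value }
    .map { (language: $0.key, probability: $0.value) }
}

let guesses = candidateLanguages(
  for: "Una película maravillosa, con un reparto excelente.",
  limitedTo: [.english, .spanish],
  maxCount: 2)
for guess in guesses {
  print(guess.language.rawValue, guess.probability)
}
```

Ranked hypotheses are mostly useful for short or mixed-language text, where a single dominant language can be misleading.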

Finding named entities

Sometimes, you’ll want to find names mentioned in a piece of text. Maybe you want to sort articles based on who they are about, organize restaurant reviews based on the cities they mention, or extract important information from a document, which often includes names of people, places and organizations. This is called named entity recognition (NER), and it’s a common NLP task with many use cases. It’s also a form of text classification.

func getPeopleNames(text: String, block: (String) -> Void) {
  // 1
  let tagger = NLTagger(tagSchemes: [.nameType])
  tagger.string = text
  // 2
  let options: NLTagger.Options = [
    .omitWhitespace, .omitPunctuation, .omitOther, .joinNames]
  // 3
  tagger.enumerateTags(
    in: text.startIndex..<text.endIndex, unit: .word,
    scheme: .nameType, options: options) { tag, tokenRange in
    // 4
    if tag == .personalName {
      block(String(text[tokenRange]))
    }
    return true
  }
}
Names identified in reviews
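The same tagging scheme also reports places and organizations, which the introduction to this section mentions. As a sketch, assuming you wanted all three kinds of names at once, you could group the matches by tag. The function namedEntities(in:) is a hypothetical helper, not one of the stubs in NLPHelper.swift.

```swift
import NaturalLanguage

// Hypothetical helper: groups the named entities found in `text` by kind.
func namedEntities(in text: String) -> [NLTag: [String]] {
  // The three built-in name tags this sketch collects.
  let wanted: Set<NLTag> = [.personalName, .placeName, .organizationName]
  var found: [NLTag: [String]] = [:]

  let tagger = NLTagger(tagSchemes: [.nameType])
  tagger.string = text
  let options: NLTagger.Options = [
    .omitWhitespace, .omitPunctuation, .omitOther, .joinNames]

  tagger.enumerateTags(
    in: text.startIndex..<text.endIndex, unit: .word,
    scheme: .nameType, options: options) { tag, tokenRange in
    // Keep only tokens tagged as a person, place or organization.
    if let tag = tag, wanted.contains(tag) {
      found[tag, default: []].append(String(text[tokenRange]))
    }
    return true
  }
  return found
}

let entities = namedEntities(in: "Tom Hanks filmed in Paris for Pixar.")
for (tag, names) in entities {
  print(tag.rawValue, names)
}
```

Which names are actually recognized depends on Apple's built-in model, so treat the printed output as something to inspect rather than rely on.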

Adding a search feature

In this next section, you’ll use NLTagger for another task: lemmatization. That’s the process of identifying the root version of a word. For example, consider the sentences, “I am running” and “I was running.” Reducing each term to its root, both sentences become the same: “I be run.” Sure, it no longer reads as correct, but it encapsulates most of the information contained in both sentences.

// 1
func getSearchTerms(text: String, language: String? = nil,
                    block: (String) -> Void) {
  // 2
  let tagger = NLTagger(tagSchemes: [.lemma])
  tagger.string = text
  let options: NLTagger.Options = [
    .omitWhitespace, .omitPunctuation, .omitOther, .joinNames]
  tagger.enumerateTags(
    in: text.startIndex..<text.endIndex, unit: .word,
    scheme: .lemma, options: options) { tag, tokenRange in
    if let tag = tag {
      // 3
      let lemma = tag.rawValue.lowercased()
      block(lemma)
    }
    return true
  }
}
Search results for 'sing'

func findMatches(_ searchText: String) {
  var matches: Set<Review> = []
  // 1
  getSearchTerms(
    text: searchText,
    language: Locale.current.languageCode) { word in
    // 2
    if let founds = ReviewsManager.instance.searchTerms[word] {
         matches.formUnion(founds)
    }
  }
  reviews = matches.filter { baseReviews.contains($0) }
}

if let language = language {
  tagger.setLanguage(NLLanguage(rawValue: language),
                     range: text.startIndex..<text.endIndex)
}

let token = String(text[tokenRange]).lowercased()

if let tag = tag {
  ...
} else {
  block(token)
}

if lemma != token {
  block(token)
}
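The findMatches(_:) code above looks terms up in ReviewsManager.instance.searchTerms, whose implementation is not shown here. The following is only a sketch of how such a lookup table could work: an inverted index that maps each search term to the IDs of the texts containing it. SearchIndex, its members and simpleTerms(text:block:) are hypothetical names, and simpleTerms is a plain-Swift stand-in for the lemma-producing getSearchTerms so that the example runs without the Natural Language framework.

```swift
// Hypothetical inverted index: each search term maps to the IDs of the
// documents that contain it.
struct SearchIndex {
  private(set) var termsToIDs: [String: Set<Int>] = [:]

  // `extractTerms` plays the role of getSearchTerms(text:language:block:).
  mutating func add(
    id: Int, text: String,
    extractTerms: (String, (String) -> Void) -> Void
  ) {
    // Collect the terms first, then record this document under each one.
    var terms: [String] = []
    extractTerms(text) { terms.append($0) }
    for term in terms {
      termsToIDs[term, default: []].insert(id)
    }
  }

  // Returns the IDs of every document indexed under `term`.
  func matches(for term: String) -> Set<Int> {
    termsToIDs[term] ?? []
  }
}

// Stand-in term extractor: lowercases and splits on non-letters.
// It does NOT lemmatize; in the app, NLTagger's .lemma scheme does that.
func simpleTerms(text: String, block: (String) -> Void) {
  text.lowercased()
    .split(whereSeparator: { !$0.isLetter })
    .forEach { block(String($0)) }
}

var index = SearchIndex()
index.add(id: 1, text: "She sings beautifully", extractTerms: simpleTerms)
index.add(id: 2, text: "The singing was flat", extractTerms: simpleTerms)
print(index.matches(for: "sings"))
```

With the stand-in extractor, "sings" and "singing" land under different keys. Indexing lemmas as well as the original tokens, as the snippets above do, is what lets a query for one form find reviews containing another.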

Sentiment analysis

Could we really cover machine learning for natural language without mentioning sentiment analysis? Sentiment analysis is the task of evaluating a piece of text and determining if it is, overall, expressing a positive or negative sentiment about its subject. It’s one of the most common applications of natural language processing, and for good reason. Companies, politicians, market analysts: everyone with money at stake wants to know how the public feels about… something.

// 1
func analyzeSentiment(text: String) -> Double? {
  // 2
  let tagger = NLTagger(tagSchemes: [.sentimentScore])
  tagger.string = text
  // 3
  let (tag, _) = tagger.tag(at: text.startIndex, 
                           unit: .paragraph, 
                           scheme: .sentimentScore)
  // 4
  guard let sentiment = tag,
     let score = Double(sentiment.rawValue) 
     else { return nil }
  return score
}

print("review text: \(review.text)\nscore: \(String(describing: analyzeSentiment(text: review.text)))\n\n")
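Because analyzeSentiment(text:) returns an optional score, any code that summarizes scores has to decide what to do with reviews that could not be scored. The sketch below shows one possible approach, assuming scores fall between -1.0 and +1.0: skip missing scores when averaging, and bucket a score into a coarse label. Both helper names and the 0.25 cutoff are assumptions chosen for illustration, not values from the SMDB project.

```swift
// Hypothetical helper: averages the scores that are present, skipping
// reviews that could not be scored. Returns nil if nothing was scored.
func averageSentiment(_ scores: [Double?]) -> Double? {
  let valid = scores.compactMap { $0 }
  guard !valid.isEmpty else { return nil }
  return valid.reduce(0, +) / Double(valid.count)
}

// Hypothetical helper: maps a score in -1.0...1.0 to a coarse label.
// The default cutoff of 0.25 is an arbitrary assumption.
func sentimentLabel(
  for score: Double, neutralBand: Double = 0.25
) -> String {
  if score > neutralBand { return "positive" }
  if score < -neutralBand { return "negative" }
  return "neutral"
}

let average = averageSentiment([0.5, nil, -1.0])
print(String(describing: average))
print(sentimentLabel(for: 0.8))
```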

Building a sentiment classifier

While it is convenient that Apple provides its own sentiment analysis API, it is instructive to build your own sentiment classifier. Why? Because classifying text by sentiment is just one example of the much more general problem of text classification. Spam detection, prioritizing support requests, and identifying document topics are all variations of that same problem. This section demonstrates how to build a relatively simple sentiment analysis system, labeling chunks of text with a positive or negative sentiment, rather than grading them from -1.0 to +1.0. Remember, you can use these techniques for all sorts of classification tasks.

Training a text classifier with Create ML

You’ll use Create ML to train an MLTextClassifier model. This class is meant to classify larger chunks of text rather than individual words, although it is technically capable of doing both. You’ll see a different model later in this chapter that is better suited to classifying word tokens.

import CreateML
import PlaygroundSupport
// 1
let projectDir = "TextClassification/"
let dataDir = "MovieReviews/"
let trainUrl =
  playgroundSharedDataDirectory.appendingPathComponent(
    projectDir + dataDir + "train", isDirectory: true)
let testUrl =
  playgroundSharedDataDirectory.appendingPathComponent(
    projectDir + dataDir + "test", isDirectory: true)
// 2
let trainData =
  MLTextClassifier.DataSource.labeledDirectories(at: trainUrl)
let testData =
  MLTextClassifier.DataSource.labeledDirectories(at: testUrl)
Dataset folder structure

let sentimentClassifier = try!
  MLTextClassifier(
    trainingData: trainData,
    parameters:
      MLTextClassifier.ModelParameters(language: .english))
Text classifier training output

// 1
let metrics = sentimentClassifier.evaluation(on: testData)
// 2
if metrics.isValid {
  print("Error rate (lower is better): \(metrics.classificationError)")
} else if let error = metrics.error {
  print("Error evaluating model: \(error)")
} else {
  print("Unknown error evaluating model")
}
Error rate on test set
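To make the reported number concrete: a classification error rate is the fraction of test items whose predicted label differs from the true label. The sketch below computes that fraction by hand. The function name and the "pos"/"neg" labels are assumptions for illustration; Create ML computes the metric for you.

```swift
// Hypothetical helper: fraction of predictions that differ from the
// true labels. Returns nil for empty or mismatched inputs.
func classificationError(
  predicted: [String], actual: [String]
) -> Double? {
  guard !actual.isEmpty, predicted.count == actual.count else {
    return nil
  }
  // Count the positions where the prediction disagrees with the truth.
  let wrong = zip(predicted, actual).filter { $0.0 != $0.1 }.count
  return Double(wrong) / Double(actual.count)
}

// One of four predictions is wrong, so the error rate is 0.25.
let errorRate = classificationError(
  predicted: ["pos", "neg", "pos", "pos"],
  actual: ["pos", "neg", "neg", "pos"])
print(String(describing: errorRate))
```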

// 1 (Optional)
let metadata = MLModelMetadata(
  author: "Your Name",
  shortDescription:
    "A model trained to classify movie review sentiment",
  version: "1.0")
// 2
try! sentimentClassifier.write(
  to: playgroundSharedDataDirectory.appendingPathComponent(
    projectDir + "SentimentClassifier.mlmodel"),
  metadata: metadata)

Exploring other model types

You initialized MLTextClassifier with default parameters, specifying only that the language was English. But you can and should explore other configurations.

Use your text classifier in an app

Open your SMDB project in Xcode. Drag SentimentClassifier.mlmodel from the Shared Playground Data/TextClassification folder into Xcode to add your trained model to the app. Or, if you’d like to use the model we trained, you can find it in the projects/starter/models/ folder in the chapter resources.

Looking at the mlmodel file

func getSentimentClassifier() -> NLModel? {
  try! NLModel(mlModel: SentimentClassifier().model)
}
func predictSentiment(
  text: String, sentimentClassifier: NLModel) -> String? {
  sentimentClassifier.predictedLabel(for: text)
}
Reviews with emoji showing sentiment

private func findSentiment(_ review: Review,
                           sentimentClassifier: NLModel?) {
  guard let sentimentClassifier = sentimentClassifier,
    review.language ==
      sentimentClassifier.configuration.language else {
    return
  }
  ...
}
Tomatoes showing average sentiment

Emoji showing average sentiment

Comparing the analyzers

Before we finish, let’s make one more enhancement to the UI: update it to show the sentiment analysis from Apple’s built-in analyzer, so we can compare the result to our own classifier and provide the user more information.

cell.setSentiment(sentiment: review.sentiment, score: analyzeSentiment(text: review.text))

func setSentiment(sentiment: Int?, score: Double? = nil) {
  // 1
  let classified: String
  if let sentiment = sentiment {
    classified = sentimentMapping[sentiment] ?? ""
  } else {
    classified = ""
  }
  // 2
  let scored: String
  if let score = score {
    scored = "(: \(String(score)))"
  } else {
    scored = ""
  }
  // 3
  sentimentLabel.text = classified + " " + scored
}
Emoji vs Apple sentiment

Custom word classifiers

You’re done with the SMDB app for now, but you’ll come back to it again in the next chapter. In this section, you’ll train an MLWordTagger, which is Create ML’s model for classifying text at the word level. You’ll use it to create a custom tagging scheme for NLTagger.

[
 ...
  {
    "tokens": ["The", "Apple", "TV", "is", "great", "for",
               "watching", "TV", "and", "movies", ",",
               "and", "you", "can", "play", "games",
               "on", "it", ",", "too", "!"],
    "tags": ["_", "AppleProduct", "AppleProduct", "_", "_", "_",
             "_", "_", "_", "_", "_",
             "_", "_", "_", "_", "_",
             "_", "_", "_", "_", "_"]
  },
  {
    "tokens": ["Apple", "adding", "Windows", "support", "for",
               "iTunes", "helped", "the", "iPod",
               "succeed", "."],
    "tags": ["_", "_", "_", "_", "_",
             "AppleProduct", "_", "_", "AppleProduct",
             "_", "_"]
  },
 ...
]
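In the JSON above, each entry holds two parallel arrays: every token has exactly one tag at the same position. A miscounted entry is easy to introduce when labeling by hand, so it can be worth checking the file before training. The sketch below shows one way to do that with Codable. TaggedSentence and misalignedEntries(in:) are hypothetical names, and the inline sample is made-up data in which the second entry is deliberately broken.

```swift
import Foundation

// Mirrors one entry of the training JSON: parallel token and tag arrays.
struct TaggedSentence: Codable {
  let tokens: [String]
  let tags: [String]
}

// Hypothetical helper: decodes the training data and returns the indices
// of entries whose token and tag counts differ.
func misalignedEntries(in json: Data) throws -> [Int] {
  let sentences = try JSONDecoder().decode(
    [TaggedSentence].self, from: json)
  return sentences.indices.filter {
    sentences[$0].tokens.count != sentences[$0].tags.count
  }
}

// Made-up sample: the second entry has two tokens but only one tag.
let sample = Data("""
[
  {"tokens": ["The", "iPod", "succeeded", "."],
   "tags": ["_", "AppleProduct", "_", "_"]},
  {"tokens": ["Apple", "TV"],
   "tags": ["AppleProduct"]}
]
""".utf8)

let badEntries = try misalignedEntries(in: sample)
print(badEntries)  // prints [1]
```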
import Foundation
import PlaygroundSupport
import CreateML
import CoreML
import NaturalLanguage
let trainUrl =
  Bundle.main.url(
    forResource: "custom_tags", withExtension: "json")!
let trainData = try MLDataTable(contentsOf: trainUrl)
let model = try MLWordTagger(
  trainingData: trainData,
  tokenColumn: "tokens", labelColumn: "tags",
  parameters: MLWordTagger.ModelParameters(language: .english))
let projectDir = "TextClassification/"

// Optionally add metadata before saving model
let savedModelUrl =
  playgroundSharedDataDirectory.appendingPathComponent(
    projectDir + "AppleProductTagger.mlmodel")

try model.write(to: savedModelUrl)
let compiledModelUrl =
  try MLModel.compileModel(at: savedModelUrl)
let appleProductModel =
  try NLModel(contentsOf: compiledModelUrl)
// 1
let appleProductTagScheme = NLTagScheme("AppleProducts")
// 2
let appleProductTagger = NLTagger(tagSchemes: [appleProductTagScheme])
// 3
appleProductTagger.setModels(
  [appleProductModel], forTagScheme: appleProductTagScheme)
let testStrings = [
  "I enjoy watching Netflix on my Apple TV, but I wish I had a bigger TV.",
  "The Face ID on my new iPhone works really fast!",
  "What's up with the keyboard on my MacBook Pro?",
  "Do you prefer the iPhone or the Pixel?"
]
let appleProductTag = NLTag("AppleProduct")
let options: NLTagger.Options = [
  .omitWhitespace, .omitPunctuation, .omitOther]
  
for str in testStrings {
  print("Checking \(str)")
  appleProductTagger.string = str
  appleProductTagger.enumerateTags(
    in: str.startIndex..<str.endIndex, 
    unit: .word,
    scheme: appleProductTagScheme, 
    options: options) { tag, tokenRange in
    
    if tag == appleProductTag {
      print("Found Apple product: \(str[tokenRange])")
    }
    return true
  }
}
Word classifier training output

Word classifier test results

The remaining bits

The Natural Language framework supports a few other things not specifically covered in this chapter. The three you’ll most likely use are gazetteers, part-of-speech tagging, and tokenization.
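As a brief sketch of two of those features, the code below tokenizes a string into words with NLTokenizer and tags each word's part of speech with NLTagger's built-in .lexicalClass scheme. The helper names words(in:) and partsOfSpeech(in:) are assumptions for this example. Gazetteers are not shown.

```swift
import NaturalLanguage

// Hypothetical helper: splits text into word tokens with NLTokenizer.
func words(in text: String) -> [String] {
  let tokenizer = NLTokenizer(unit: .word)
  tokenizer.string = text
  return tokenizer
    .tokens(for: text.startIndex..<text.endIndex)
    .map { String(text[$0]) }
}

// Hypothetical helper: pairs each word with its part-of-speech tag,
// using the built-in .lexicalClass tagging scheme.
func partsOfSpeech(in text: String) -> [(word: String, tag: String)] {
  var result: [(word: String, tag: String)] = []
  let tagger = NLTagger(tagSchemes: [.lexicalClass])
  tagger.string = text
  tagger.enumerateTags(
    in: text.startIndex..<text.endIndex, unit: .word,
    scheme: .lexicalClass,
    options: [.omitWhitespace, .omitPunctuation]) { tag, range in
    if let tag = tag {
      result.append((word: String(text[range]), tag: tag.rawValue))
    }
    return true
  }
  return result
}

let sentence = "Movies are fun"
print(words(in: sentence))
for pair in partsOfSpeech(in: sentence) {
  print(pair.word, pair.tag)
}
```

The tags come from Apple's built-in model, so the exact labels printed for each word may vary between OS versions.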

Key points

  • Use Apple’s new Natural Language framework to take advantage of fast, well-trained machine-learning models for NLP.
  • NLLanguageRecognizer can identify the language used in a piece of text.
  • NLTagger and NLTagScheme allow you to chunk text into specific, labeled types. There are several built-in tagging schemes available, and you can specify your own.
  • NLTokenizer can break up text into documents, paragraphs, sentences or words.
  • Use Create ML and MLTextClassifier to train your own models to classify larger chunks of text, like sentences, paragraphs or documents.
  • Use Create ML and MLWordTagger to train models to classify text at the word level.
  • NLModel wraps Create ML models like MLTextClassifier and MLWordTagger in a way that ensures inputs are preprocessed in your app the same way they were during training. It’s also the required type for custom tagging schemes used with NLTagger.

Where to go from here?

This chapter covered most of what Apple makes easy via the Natural Language framework. You can find a completed version of the project in the chapter resources at projects/final/SMDB. When you’re ready, go on to the next chapter, where you’ll learn how to implement more advanced NLP features that involve creating custom models in Keras. You’ll continue working with this app, adding the ability to translate Spanish-language reviews into English.

© 2020 Razeware LLC
