Scanner Tutorial for macOS

Use NSScanner to analyze strings, from natural language to computer languages. In this NSScanner tutorial, you’ll learn how to extract information from emails. By Hai Nguyen.


Extracting Cost-Related Information

A good example of parsing unstructured data is determining whether the email’s body contains cost-related information. To do this, you’ll use Scanner to search for occurrences of the dollar sign: $.

Still in ParserEngine.swift, add the following method inside the ParserEngine class:

func costInfoByExtractingFrom(_ string: String) -> [Double] {
  // 1.
  var results = [Double]()
  
  // 2.
  let dollar = CharacterSet(charactersIn: "$")
  
  // 3.
  let scanner = Scanner(string: string)
  scanner.charactersToBeSkipped = dollar
  
  // 4.
  while !scanner.isAtEnd && scanner.scanUpToCharacters(from: dollar, into: nil) {
    results += [scanner.scanDouble()].flatMap { $0 }
  }
  
  return results
}

The code is fairly straightforward:

  1. Defines an empty array to store the cost values.
  2. Creates a CharacterSet object with a $ character.
  3. Initializes a Scanner instance and configures it to ignore the $ character.
  4. Loops through the content of string; each time it finds a $, it reads the number that follows with your scanDouble() helper method and appends it to the results array.
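
The method above depends on the scanDouble() helper defined earlier in this tutorial. As a minimal standalone sketch of the same idea, the version below uses only the Scanner methods that Foundation has shipped since macOS 10.15 (scanUpToString(_:), scanString(_:) and scanDouble()). The name extractCosts(from:) is hypothetical and not part of the project; the sketch also turns off automatic skipping so that each step of the scan is explicit.

```swift
import Foundation

/// Hypothetical standalone variant of costInfoByExtractingFrom(_:).
/// Assumes the Scanner API available on macOS 10.15 and later.
func extractCosts(from text: String) -> [Double] {
  var results: [Double] = []

  let scanner = Scanner(string: text)
  // Skip nothing automatically, so every move of the cursor is visible below.
  scanner.charactersToBeSkipped = nil

  while !scanner.isAtEnd {
    // Move the cursor to the next "$" (a no-op if it is already there).
    _ = scanner.scanUpToString("$")

    // Consume the "$" itself; if there is none, the text is exhausted.
    guard scanner.scanString("$") != nil else { break }

    // Read the number that directly follows the "$", if any.
    if let value = scanner.scanDouble() {
      results.append(value)
    }
  }

  return results
}

let costs = extractCosts(from: "Selling for $25.50 or $30 obo")
print(costs)
```

Because skipping is disabled, a $ that is followed by a space or by text yields no value, and the loop simply moves on to the next $.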

Parsing the Message

Another example of parsing unstructured data is finding keywords in a given body of text. Your search strategy is to look at every word and check it against a set of keywords to see if it matches. You’ll use the whitespace and newline characters as delimiters to pick out the words in the message as you scan.


Add the following code at the end of ParserEngine class:

// 1.
let keywords: Set<String> = ["apple", "macs", "software", "keyboard",
                             "printers", "printer", "video", "monitor",
                             "laser", "scanner", "disks", "cost", "price",
                             "floppy", "card", "phone"]

/// Returns the set of keywords found in the given string.
func keywordsByExtractingFrom(_ string: String) -> Set<String> {
  // 2.
  var results: Set<String> = []
  
  // 3.
  let scanner = Scanner(string: string)
  
  // 4.
  while !scanner.isAtEnd, let word = scanner.scanUpTo(" ")?.lowercased()  {
    if keywords.contains(word) {
      results.insert(word)
    }
  }
  
  return results
}

Here’s what this code does:

  1. Defines the keywords set that you’ll match against.
  2. Creates a Set of String to store the found keywords.
  3. Initializes a Scanner instance. You’ll use the default charactersToBeSkipped, which are the whitespace and newline characters.
  4. For every word found, checks whether it’s one of the predefined keywords. If it is, inserts it into results.
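
The word-by-word strategy can also be sketched without the tutorial's scanUpTo(_:) helper. The version below is an illustration under stated assumptions, not the project's code: extractKeywords(from:matching:) is a hypothetical name, it uses scanUpToCharacters(from:) from the macOS 10.15+ Scanner API so that both spaces and newlines end a word, and it trims surrounding punctuation, which the tutorial's method does not do.

```swift
import Foundation

/// Hypothetical standalone variant of keywordsByExtractingFrom(_:).
func extractKeywords(from text: String, matching keywords: Set<String>) -> Set<String> {
  var results: Set<String> = []

  // The default charactersToBeSkipped (whitespace and newlines) is applied
  // before each scan, so every token starts at the beginning of a word.
  let scanner = Scanner(string: text)

  while !scanner.isAtEnd,
        let token = scanner.scanUpToCharacters(from: .whitespacesAndNewlines) {
    // Assumption: strip punctuation such as "," or "." around the word.
    let word = token.lowercased().trimmingCharacters(in: .punctuationCharacters)
    if keywords.contains(word) {
      results.insert(word)
    }
  }

  return results
}

let found = extractKeywords(
  from: "My Printer broke.\nIs the monitor, at that PRICE, ok?",
  matching: ["printer", "monitor", "price"]
)
print(found.sorted())
```

A Set is a natural fit for results because a keyword that appears several times in the message is still reported once.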

There — you have all of the necessary methods to acquire the desired information. Time to put them to good use and create HardwarePost instances for the 49 data files.

Connecting the Parser With Data Samples

Open HardwarePost.swift and add this initializer to the HardwarePost structure:

init(fromData data: Data) {
  // 1.
  let parser = ParserEngine()
  
  // 2.
  let string = String(data: data, encoding: String.Encoding.utf8) ?? ""
  
  // 3.
  let scanner = Scanner(string: string)
  
  // 4.
  let metadata = scanner.scanUpTo("\n\n") ?? ""
  let (sender, email, subject, date, organization, lines) = parser.fieldsByExtractingFrom(metadata)
  
  // 5.
  self.sender = sender
  self.email = email
  self.subject = subject
  self.date = date
  self.organization = organization
  self.numberOfLines = lines
  
  // 6.
  let startIndex = string.characters.index(string.startIndex, offsetBy: scanner.scanLocation)  // A
  let message = string[startIndex..<string.endIndex]  // B
  self.message = message.trimmingCharacters(in: .whitespacesAndNewlines)  // C
  
  // 7.
  costs = parser.costInfoByExtractingFrom(message)
  keywords = parser.keywordsByExtractingFrom(message)
}

Here's how HardwarePost initializes its properties:

  1. Simply creates a ParserEngine object named parser.
  2. Converts data into a String.
  3. Initializes an instance of Scanner to parse the Metadata and Message segments, which are separated by "\n\n".
  4. Scans up to the first \n\n to grab the metadata string, then invokes the parser's fieldsByExtractingFrom(_:) method to obtain all of the metadata fields.
  5. Assigns the parsing results to the HardwarePost properties.
  6. Prepares the message content:
    A. Gets the current reading cursor from scanner with scanLocation and converts it to String.CharacterView.Index, so you can take a substring of string by range.
    B. Assigns the remaining part of the string, which scanner has yet to read, to the new message variable.
    C. Since message still begins with the \n\n where the scanner stopped reading, trims it and assigns the result to the HardwarePost instance's message property.
  7. Invokes the parser's methods with message to retrieve values for the costs and keywords properties.

At this point, you can create HardwarePost instances directly from the files' data. You are only a few more steps from displaying the final product!
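
The split between the metadata and the message in the initializer can be tried out in isolation. The following is a minimal sketch rather than the tutorial's implementation: splitPost(_:) is a hypothetical function, and it assumes the macOS 10.15+ Scanner API, where currentIndex already returns a String.Index and no conversion from an integer offset is needed.

```swift
import Foundation

/// Hypothetical helper: splits a raw post at the first blank line.
func splitPost(_ raw: String) -> (metadata: String, message: String) {
  let scanner = Scanner(string: raw)
  // Keep leading whitespace and newlines so the cursor position is exact.
  scanner.charactersToBeSkipped = nil

  // Everything before the first "\n\n" is the metadata segment.
  let metadata = scanner.scanUpToString("\n\n") ?? ""

  // The rest still begins with "\n\n", so trim it before returning.
  let rest = raw[scanner.currentIndex...]
  let message = rest.trimmingCharacters(in: .whitespacesAndNewlines)

  return (metadata, message)
}

let post = splitPost("From: a@b.c\nSubject: Hi\n\nHello there.\nBye.\n")
print(post.metadata)
print(post.message)
```

If the input contains no blank line at all, the whole text ends up in metadata and message is empty.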

Displaying Parsed Data

Open PostCell.swift and add the following method inside the PostCell class implementation:

func configure(_ post: HardwarePost) {
  
  senderLabel.stringValue = post.sender
  emailLabel.stringValue = post.email
  dateLabel.stringValue = post.date
  subjectLabel.stringValue = post.subject
  organizationLabel.stringValue = post.organization
  numberOfLinesLabel.stringValue = "\(post.numberOfLines)"
  
  // 1.
  costLabel.stringValue = post.costs.isEmpty ? "NO" : 
                                               post.costs.map { "\($0)" }.lazy.joined(separator: "; ")
  
  // 2.
  keywordsLabel.stringValue = post.keywords.isEmpty ? "No keywords found" : 
                                                      post.keywords.joined(separator: "; ")
}

This code assigns the post values to the cell labels. costLabel and keywordsLabel require special treatment because they can be empty. Here's what happens:

  1. If the costs array is empty, it sets the costLabel string value to NO; otherwise, it concatenates the cost values with "; " as a separator.
  2. Similarly, sets the keywordsLabel string value to No keywords found for an empty set of post.keywords; otherwise, it joins the keywords with "; " as a separator.
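
The two fallback rules can be checked outside the cell by pulling them into plain functions. This is only a sketch: costText(for:) and keywordText(for:) are hypothetical names, and the keywords are sorted here, unlike in configure(_:), because a Set has no defined order and the label text would otherwise vary between runs.

```swift
/// Hypothetical helper mirroring the costLabel rule.
func costText(for costs: [Double]) -> String {
  return costs.isEmpty ? "NO" : costs.map { "\($0)" }.joined(separator: "; ")
}

/// Hypothetical helper mirroring the keywordsLabel rule.
/// Assumption: sorting is added to make the output deterministic.
func keywordText(for keywords: Set<String>) -> String {
  return keywords.isEmpty ? "No keywords found" : keywords.sorted().joined(separator: "; ")
}

print(costText(for: [25.5, 30]))
print(keywordText(for: ["printer", "monitor"]))
```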

You're almost there! Open DataSource.swift. Delete the DataSource initializer init() and add the following code into the class:

let hardwarePosts: [HardwarePost] // 1.

override init() {
  self.hardwarePosts = Bundle.main                                                // 2.
    .urls(forResourcesWithExtension: nil, subdirectory: "comp.sys.mac.hardware")? // 3.
    .flatMap( { try? Data(contentsOf: $0) }).lazy                                 // 4.                                                                    
    .map(HardwarePost.init) ?? []                                                 // 5.
  
  super.init()
}

This is what the code does:

  1. Stores the HardwarePost instances.
  2. Obtains a reference to the application's main Bundle.
  3. Retrieves the URLs of the sample files inside the comp.sys.mac.hardware directory.
  4. Lazily acquires an array of Data instances by reading the file contents with the throwing Data(contentsOf:) initializer, try? and flatMap(_:). Here try? turns a failed read into nil, and flatMap(_:) returns an array containing only the elements that are not nil.
  5. Finally, transforms each Data result into a HardwarePost object and assigns the resulting array to the DataSource hardwarePosts property.
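
The pattern of dropping failed results can be illustrated without touching the file system. In the sketch below, parseInt(_:) and ParseError are hypothetical stand-ins for Data(contentsOf:) and its errors. It uses compactMap(_:), which is the name Swift 4.1 and later give to this nil-filtering form of flatMap(_:).

```swift
/// Hypothetical error type for the example.
struct ParseError: Error {}

/// Hypothetical throwing function standing in for Data(contentsOf:).
func parseInt(_ text: String) throws -> Int {
  guard let value = Int(text) else { throw ParseError() }
  return value
}

let inputs = ["1", "x", "3"]

// try? converts a thrown error into nil; compactMap drops the nils.
let values = inputs.compactMap { try? parseInt($0) }

// A second map transforms the surviving values, as step 5 does with HardwarePost.init.
let labels = values.map { "#\($0)" }

print(values)
print(labels)
```

The trade-off of this pattern is that unreadable inputs disappear silently, which suits bundled sample files but hides errors you might want to report elsewhere.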

Now you need to set up the table view's data source and delegate so that your app can show your hard work.

Open DataSource.swift. Find numberOfRows(in:) and replace it with the following:

func numberOfRows(in tableView: NSTableView) -> Int {
    return hardwarePosts.count
}

numberOfRows(in:) is part of the table view’s data source protocol; it sets the number of rows of the table view.

Next, find tableView(_:viewForTableColumn:row:) and replace the comment that says: //TODO: Set up cell view with the code below:

cell.configure(hardwarePosts[row]) 

The table view invokes its delegate tableView(_:viewForTableColumn:row:) method to set up every individual cell. It gets a reference to the post for that row and invokes PostCell's configure(_:) method to display the data.

Now you need to show the post in the text view when you select a post on the table view. Replace the initial implementation of tableViewSelectionDidChange(_:) with the following:

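func tableViewSelectionDidChange(_ notification: Notification) {
  // selectedRow is -1 when the selection is cleared, so check it before indexing.
  guard let tableView = notification.object as? NSTableView,
        tableView.selectedRow >= 0 else {
    return
  }
  textView.string = hardwarePosts[tableView.selectedRow].message
}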

tableViewSelectionDidChange(_:) is called when the table view’s selection has changed. When that happens, this code gets the hardware post for the selected row and displays the message in the text view.

Build and run your project.


All of the parsed fields are now neatly displayed in the table. Select a cell on the left, and you'll see the corresponding message on the right. Good job!
