Scanner Tutorial for macOS

Use NSScanner to analyze strings from natural form to computer languages. In this NSScanner tutorial, you’ll learn how to extract information from emails. By Hai Nguyen.


Creating the Data Structure

Navigate to File\New\File… (or simply press Command+N). Select macOS > Source > Swift File and click Next. Set the file’s name to HardwarePost.swift, then click Create.

Open HardwarePost.swift and add the following structure:

struct HardwarePost {
  // MARK: Properties
  
  // Values of the metadata fields, stored here once they are extracted
  let email: String
  let sender: String
  let subject: String
  let date: String
  let organization: String
  let numberOfLines: Int
  let message: String
  
  let costs: [Double]         // cost related information
  let keywords: Set<String>   // set of distinct keywords
}

This code defines the HardwarePost structure that stores the parsed data. Swift gives every structure a default memberwise initializer based on its stored properties, but you’ll come back to this later to implement your own custom initializer.
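To see what the memberwise initializer looks like before you replace it, here is a minimal sketch you could try in a playground. The sample values are made up for illustration and are not taken from the data sets.

```swift
struct HardwarePost {
  let email: String
  let sender: String
  let subject: String
  let date: String
  let organization: String
  let numberOfLines: Int
  let message: String

  let costs: [Double]
  let keywords: Set<String>
}

// The memberwise initializer takes one argument per stored property,
// in declaration order. All values below are placeholder sample data.
let post = HardwarePost(
  email: "someone@example.com",
  sender: "Sample Sender",
  subject: "Sample subject",
  date: "Sample date",
  organization: "Sample organization",
  numberOfLines: 12,
  message: "Sample message body",
  costs: [199.0, 25.5],
  keywords: ["monitor", "monitor", "cable"] // duplicates collapse in a Set
)

print(post.sender, post.keywords.count)
```

Because keywords is a Set, repeated keywords are stored only once.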

Are you ready for parsing in action with Scanner? Let’s do this.

Creating the Data Parser

Navigate to File\New\File… (or simply press Command+N), select macOS > Source > Swift File and click Next. Set the file’s name to ParserEngine.swift, then click Create.

Open ParserEngine.swift and create the ParserEngine class by adding the following code:

final class ParserEngine {

}

Extracting Metadata Fields

Consider the following sample metadata segment:

Metadata-Segment

Here’s where Scanner comes in and separates the fields and their values. The image below gives you a general visual representation of this structure.

Field-Structure-Illustration

Open ParserEngine.swift and implement this code inside the ParserEngine class:

// 1.
typealias Fields = (sender: String, email: String, subject: String, date: String, organization: String, lines: Int)

/// Returns a collection of predefined fields' extracted values
func fieldsByExtractingFrom(_ string: String) -> Fields {
  // 2.
  var (sender, email, subject, date, organization, lines) = ("", "", "", "", "", 0)
  
  // 3.
  let scanner = Scanner(string: string)
  scanner.charactersToBeSkipped = CharacterSet(charactersIn: " :\n")
  
  // 4.
  while !scanner.isAtEnd {                  // A
    let field = scanner.scanUpTo(":") ?? "" // B
    let info = scanner.scanUpTo("\n") ?? "" // C
    
    // D
    switch field {
    case "From": (email, sender) = fromInfoByExtractingFrom(info) // E
    case "Subject": subject = info
    case "Date": date = info
    case "Organization": organization = info
    case "Lines": lines = Int(info) ?? 0
    default: break
    }
  }
  
  return (sender, email, subject, date, organization, lines)
}

Don’t panic! The Xcode error of an unresolved identifier will go away right in the next section.

Here’s what the above code does:

  1. Defines a Fields type alias for the tuple of parsed fields.
  2. Creates variables that will hold the returning values.
  3. Initializes a Scanner instance and sets its charactersToBeSkipped property to a space, a colon and a linefeed, so the scanner skips colons as well as whitespace before each token.
  4. Obtains the values of all the wanted fields by repeating the process below:
    A. Uses while to loop through string's content until it reaches the end.
    B. Invokes one of the helper functions you created earlier to get the field's title before :.
    C. Continues scanning up to the end of the line, where the linefeed character \n is located, and assigns the result to info.
    D. Uses switch to find the matching field and stores info in the proper variable.
    E. Analyzes the From field by calling fromInfoByExtractingFrom(_:). You’ll implement the method after this section.
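The loop described above can be tried on its own. The following is only a sketch: it assumes macOS 10.15 or later, where Foundation provides scanUpToString(_:), and uses that method in place of the scanUpTo(_:) helper from earlier in the tutorial. The parseFields(_:) function and the sample text are hypothetical.

```swift
import Foundation

/// Hypothetical helper: collects "Field: value" lines into a dictionary.
func parseFields(_ text: String) -> [String: String] {
  var fields: [String: String] = [:]

  let scanner = Scanner(string: text)
  // Skip spaces, colons and linefeeds before each scan.
  scanner.charactersToBeSkipped = CharacterSet(charactersIn: " :\n")

  while !scanner.isAtEnd {
    // Field title: everything before the first colon.
    guard let field = scanner.scanUpToString(":") else { break }
    // Field value: the rest of the line. Colons inside the value are kept,
    // because skipping only happens before a scan starts.
    let info = scanner.scanUpToString("\n") ?? ""
    fields[field] = info
  }

  return fields
}

// Made-up sample text in the same "Field: value" shape.
let sample = """
Subject: Re: Monitor question
Organization: Sample University
Lines: 12
"""

let fields = parseFields(sample)
print(fields["Subject"] ?? "", fields["Lines"] ?? "")
```

A dictionary is used here only to keep the sketch short; the tutorial's tuple gives you named, typed fields instead.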

Remember the tricky part of the From field? Hang tight, because you’re going to need help from regular expressions to overcome this challenge.

Note: Regular expressions are a great tool to manipulate strings with patterns, and this NSRegularExpression Tutorial gives a good overview of how to use them.

At the end of ParserEngine.swift, add the following String extension:

private extension String {
  
  func isMatched(_ pattern: String) -> Bool {
    return NSPredicate(format: "SELF MATCHES %@", pattern).evaluate(with: self)
  }
}

This extension defines a private helper method to find whether the string matches a given pattern using regular expressions.

It creates an NSPredicate object with a MATCHES operator using the regular expression pattern. Then it invokes evaluate(with:) to check whether the string matches the pattern. Note that MATCHES succeeds only when the pattern matches the entire string, not just part of it.
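A quick way to get a feel for the helper is to run it against a few of the From values. This sketch repeats the extension without the private modifier so that it runs in a playground; the constant names are hypothetical.

```swift
import Foundation

extension String {
  func isMatched(_ pattern: String) -> Bool {
    return NSPredicate(format: "SELF MATCHES %@", pattern).evaluate(with: self)
  }
}

// Pattern used later for the "name <email>" form.
let anglePattern = ".*[\\s]*<{1}(.*)"

// Contains an angle bracket, so the whole string fits the pattern.
let withName = "Thomas Kephart <kephart@snowhite.eeap.cwru.edu>".isMatched(anglePattern)

// No angle bracket anywhere, so the pattern cannot match.
let bareEmail = "mbuntan@staff.tc.umn.edu".isMatched(anglePattern)

// The pattern "<" alone fits only the string "<", because MATCHES
// compares against the entire string.
let partial = "<BR4416A@auvm.american.edu>".isMatched("<")

print(withName, bareEmail, partial)
```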

Note: You can read more about NSPredicate in the official Apple documentation.

Now add the following method inside the ParserEngine implementation, just after the fieldsByExtractingFrom(_:) method:

fileprivate func fromInfoByExtractingFrom(_ string: String) -> (email: String, sender: String) {
  let scanner = Scanner(string: string)
  
  // 1.
  /*
   * ROGOSCHP@MAX.CC.Uregina.CA (Are we having Fun yet ???)
   * oelt0002@student.tc.umn.edu (Bret Oeltjen)
   * (iisi owner)
   * mbuntan@staff.tc.umn.edu ()
   * barry.davis@hal9k.ann-arbor.mi.us (Barry Davis)
   */
  if string.isMatched(".*[\\s]*\\({1}(.*)") { // A
    scanner.charactersToBeSkipped = CharacterSet(charactersIn: "() ") // B
    
    let email = scanner.scanUpTo("(")  // C
    let sender = scanner.scanUpTo(")") // D
    
    return (email ?? "", sender ?? "")
  }
  
  // 2.
  /*
   * "Jonathan L. Hutchison" <jh6r+@andrew.cmu.edu>
   * <BR4416A@auvm.american.edu>
   * Thomas Kephart <kephart@snowhite.eeap.cwru.edu>
   * Alexander Samuel McDiarmid <am2o+@andrew.cmu.edu>
   */
  if string.isMatched(".*[\\s]*<{1}(.*)") {
    scanner.charactersToBeSkipped = CharacterSet(charactersIn: "<> ")
    
    let sender = scanner.scanUpTo("<")
    let email = scanner.scanUpTo(">")
    
    return (email ?? "", sender ?? "")
  }
  
  // 3.
  return (string, "unknown") // only an email is present, so the sender is unknown
}

After examining the 49 data sets, you end up with three cases to consider:

  • email (name)
  • name <email>
  • email with no name

Here’s what the code does:

Field-Value-Illustration

  1. Matches string with the first pattern, email (name). If it doesn’t match, continues to the next case.
    A. Looks for zero or more occurrences of any character (.*), followed by zero or more whitespace characters ([\\s]*), followed by exactly one open parenthesis (\\({1}) and finally zero or more further characters, captured as a group ((.*)).
    B. Sets the Scanner object’s charactersToBeSkipped to include "(", ")" and the space character.
    C. Scans up to ( to get the email value.
    D. Scans up to ), which gives you the sender name. Together, steps C and D extract the text before ( and the text between ( and ).
  2. Checks whether the given string matches the pattern name <email>. The if body is practically the same as the first scenario, except that you deal with angle brackets.
  3. Finally, if neither of the two patterns is matched, this is the case where you only have an email. You’ll simply return the string for the email and “unknown” for sender.
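The scanning steps for the first two patterns can also be tried in isolation. This is a sketch under a few assumptions: it targets macOS 10.15 or later and uses Foundation's scanUpToString(_:) in place of the tutorial's scanUpTo(_:) helper, it checks for the bracket with contains(_:) instead of the regular expressions, and the function name splitEmailAndName(_:) is hypothetical. The trimming step is an addition, because scanning up to a bracket keeps the space in front of it.

```swift
import Foundation

/// Hypothetical stand-alone version of the three cases.
func splitEmailAndName(_ string: String) -> (email: String, sender: String) {
  let scanner = Scanner(string: string)

  // Case 1: email (name)
  if string.contains("(") {
    scanner.charactersToBeSkipped = CharacterSet(charactersIn: "() ")
    let email = scanner.scanUpToString("(") ?? ""  // text before "("
    let sender = scanner.scanUpToString(")") ?? "" // text between "(" and ")"
    return (email.trimmingCharacters(in: .whitespaces), sender)
  }

  // Case 2: name <email>
  if string.contains("<") {
    scanner.charactersToBeSkipped = CharacterSet(charactersIn: "<> ")
    let sender = scanner.scanUpToString("<") ?? "" // text before "<"
    let email = scanner.scanUpToString(">") ?? ""  // text between "<" and ">"
    return (email, sender.trimmingCharacters(in: .whitespaces))
  }

  // Case 3: only an email, so the sender is unknown.
  return (string, "unknown")
}

let first = splitEmailAndName("oelt0002@student.tc.umn.edu (Bret Oeltjen)")
let second = splitEmailAndName("Thomas Kephart <kephart@snowhite.eeap.cwru.edu>")
let third = splitEmailAndName("mbuntan@staff.tc.umn.edu")

print(first, second, third)
```

Skipped characters are only consumed before a scan begins, which is why the space inside "Bret Oeltjen" survives while the surrounding brackets do not.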

At this point, you can build the project. The previous compile error is gone.

Starter-Initial-Screen

Note: NSDataDetector would be a better solution for known data types such as phone numbers, addresses and emails. You can check out this blog post about email validation with NSDataDetector.
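As a rough idea of what that alternative might look like, the sketch below uses the link detector, which reports email addresses as URLs with the mailto scheme. The function name detectedEmails(in:) is hypothetical, and this is not part of the tutorial's project.

```swift
import Foundation

/// Hypothetical helper: returns the email addresses NSDataDetector finds in `text`.
func detectedEmails(in text: String) -> [String] {
  // Creating the detector can throw, so fall back to an empty result.
  let linkType = NSTextCheckingResult.CheckingType.link.rawValue
  guard let detector = try? NSDataDetector(types: linkType) else {
    return []
  }

  let range = NSRange(text.startIndex..., in: text)
  return detector.matches(in: text, options: [], range: range).compactMap { match -> String? in
    // Keep only links that are email addresses.
    guard let url = match.url, url.scheme == "mailto" else { return nil }
    // Drop the "mailto:" prefix to get the bare address.
    return String(url.absoluteString.dropFirst("mailto:".count))
  }
}

let found = detectedEmails(in: "Thomas Kephart <kephart@snowhite.eeap.cwru.edu>")
print(found)
```

Unlike the Scanner approach, this finds the address wherever it appears in the text, but it does not give you the sender's name.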

You’ve been working with Scanner to analyze and retrieve information from a patterned string. In the next two sections, you’ll learn how to parse unstructured data.
