Tesseract OCR Tutorial for iOS

In this tutorial, you’ll learn how to read and manipulate text extracted from images using Tesseract OCR. By Lyndsey Scott.


Loading the Image

First, you’ll create a way to access images from the device’s camera or photo library.

Open ViewController.swift and insert the following into takePhoto(_:):

// 1
let imagePickerActionSheet =
  UIAlertController(title: "Snap/Upload Image",
                    message: nil,
                    preferredStyle: .actionSheet)

// 2
if UIImagePickerController.isSourceTypeAvailable(.camera) {
  let cameraButton = UIAlertAction(
    title: "Take Photo",
    style: .default) { (alert) -> Void in
      // TODO: Add more code here...
  }
  imagePickerActionSheet.addAction(cameraButton)
}

// 3
let libraryButton = UIAlertAction(
  title: "Choose Existing",
  style: .default) { (alert) -> Void in
    // TODO: Add more code here...
}
imagePickerActionSheet.addAction(libraryButton)

// 4
let cancelButton = UIAlertAction(title: "Cancel", style: .cancel)
imagePickerActionSheet.addAction(cancelButton)

// 5
present(imagePickerActionSheet, animated: true)

Here, you:

  1. Create an action sheet alert that will appear at the bottom of the screen.
  2. If the device has a camera, add a Take Photo button to the action sheet.
  3. Add a Choose Existing button to the action sheet.
  4. Add a Cancel button to the action sheet. Selecting this button dismisses the alert without performing any action, since it’s of type .cancel.
  5. Finally, present the alert.

Immediately below import UIKit, add:

import MobileCoreServices

This gives ViewController access to the kUTTypeImage abstract image identifier, which you’ll use to limit the image picker’s media type.

Now within the cameraButton UIAlertAction’s closure, replace the // TODO comment with:

// 1
self.activityIndicator.startAnimating()
// 2
let imagePicker = UIImagePickerController()
// 3
imagePicker.delegate = self
// 4
imagePicker.sourceType = .camera
// 5
imagePicker.mediaTypes = [kUTTypeImage as String]
// 6
self.present(imagePicker, animated: true, completion: {
  // 7
  self.activityIndicator.stopAnimating()
})

So when the user taps cameraButton, this code:

  1. Reveals the view controller’s activity indicator.
  2. Creates an image picker.
  3. Assigns the current view controller as that image picker’s delegate.
  4. Tells the image picker to present as a camera interface to the user.
  5. Limits the image picker’s media type so the user can only capture still images.
  6. Displays the image picker.
  7. Hides the activity indicator once the image picker finishes animating into view.

Similarly, within libraryButton’s closure, add:

self.activityIndicator.startAnimating()
let imagePicker = UIImagePickerController()
imagePicker.delegate = self
imagePicker.sourceType = .photoLibrary
imagePicker.mediaTypes = [kUTTypeImage as String]
self.present(imagePicker, animated: true, completion: {
  self.activityIndicator.stopAnimating()
})

This is identical to the code you just added to cameraButton’s closure aside from imagePicker.sourceType = .photoLibrary. Here, you set the image picker to present the device’s photo library as opposed to the camera.

Next, to process the captured or selected image, insert the following into imagePickerController(_:didFinishPickingMediaWithInfo:):

// 1
guard let selectedPhoto =
  info[.originalImage] as? UIImage else {
    dismiss(animated: true)
    return
}
// 2
activityIndicator.startAnimating()
// 3
dismiss(animated: true) {
  self.performImageRecognition(selectedPhoto)
}

Here, you:

  1. Check to see whether info’s .originalImage key contains an image value. If it doesn’t, the image picker removes itself from view and the rest of the method doesn’t execute.
  2. If info’s .originalImage does in fact contain an image, display an activity indicator while Tesseract does its work.
  3. After the image picker animates out of view, pass the image into performImageRecognition.
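While you’re in the delegate methods, it’s also worth handling the case where the user taps Cancel in the picker. A minimal sketch of the companion UIImagePickerControllerDelegate method (the starter project may already include something like this):

```swift
// Called when the user taps Cancel in the image picker.
// Simply dismisses the picker; no image processing occurs.
func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
  dismiss(animated: true)
}
```

Without this, tapping Cancel on some configurations leaves the picker on screen, so it’s a good safety net.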

You’ll code performImageRecognition in the next section of the tutorial, but, for now, just open Info.plist. Hover your cursor over the top cell, Information Property List, then click the + button twice when it appears.

In the Key fields of those two new entries, add Privacy – Camera Usage Description to one and Privacy – Photo Library Usage Description to the other. Select type String for each. Then in the Value column, enter whatever text you’d like to display to the user when requesting permission to access their camera and photo library, respectively.
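If you prefer editing the raw XML instead (right-click Info.plist and choose Open As ▸ Source Code), those two friendly names correspond to the keys below. The string values here are only placeholders; use whatever wording suits your app:

```xml
<key>NSCameraUsageDescription</key>
<string>Used to photograph text for OCR</string>
<key>NSPhotoLibraryUsageDescription</key>
<string>Used to select images containing text for OCR</string>
```

iOS shows these strings in the permission prompts, and the app will crash at launch of the picker if the relevant key is missing.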

Add Privacy settings to Info.plist

Build and run your project. Tap the Snap/Upload Image button and you should see the UIAlertController you just created.

UIAlertController

Test out the action sheet options and grant the app access to your camera and/or library when prompted. Confirm the photo library and camera display as expected.

Note: If you’re running on a simulator, there’s no physical camera available, so you won’t see the “Take Photo” option.

All good? If so, it’s finally time to use Tesseract!

Implementing Tesseract OCR

First, add the following below import MobileCoreServices to make the Tesseract framework available to ViewController:

import TesseractOCR

Now, in performImageRecognition(_:), replace the // TODO comment with the following:

// 1
if let tesseract = G8Tesseract(language: "eng+fra") {
  // 2
  tesseract.engineMode = .tesseractCubeCombined
  // 3
  tesseract.pageSegmentationMode = .auto
  // 4
  tesseract.image = image
  // 5
  tesseract.recognize()
  // 6
  textView.text = tesseract.recognizedText
}
// 7
activityIndicator.stopAnimating()

Since this is the meat of this tutorial, here’s a detailed breakdown, line by line:

  1. Initialize tesseract with a new G8Tesseract object that will use both English (“eng”) and French (“fra”) trained language data. Note that the poem’s French accented characters aren’t in the English character set, so it’s necessary to include the French-trained data in order for those accents to appear.
  2. Tesseract offers three different OCR engine modes: .tesseractOnly, which is the fastest but least accurate method; .cubeOnly, which is slower but more accurate since it employs more artificial intelligence; and .tesseractCubeCombined, which runs both .tesseractOnly and .cubeOnly. .tesseractCubeCombined is the slowest, but since it’s the most accurate, you’ll use it in this tutorial.
  3. Tesseract assumes, by default, that it’s processing a uniform block of text, but your sample image has multiple paragraphs. Tesseract’s pageSegmentationMode lets the Tesseract engine know how the text is divided. In this case, set pageSegmentationMode to .auto to allow for fully automatic page segmentation and thus the ability to recognize paragraph breaks.
  4. Assign the selected image to the tesseract instance.
  5. Tell Tesseract to get to work recognizing your text.
  6. Put Tesseract’s recognized text output into your textView.
  7. Hide the activity indicator since the OCR is complete.

Now, it’s time to test out this first batch of new code!

Processing Your First Image

In Finder, navigate to Love In A Snap/Resources/Lenore.png to find the sample image.

Poem Image

Lenore.png is an image of a love poem addressed to a “Lenore,” but with a few edits you can turn it into a poem that is sure to get the attention of the one you desire! :]

Although you could print a copy of the image, then snap a picture with the app to perform the OCR, you’ll make it easy on yourself and add the image directly to your device’s camera roll. This eliminates the potential for human error, further lighting inconsistencies, skewed text, and flawed printing, among other things. After all, the image is already dark and blurry as it is.

Note: If you’re using a simulator, simply drag-and-drop the image file onto the simulator to add it to its photo library.

Build and run your app. Tap Snap/Upload Image, tap Choose Existing, then choose the sample image from the photo library to run it through OCR.

Note: You can safely ignore the hundreds of compilation warnings the TesseractOCR library produces.

Running OCR

Uh oh! Nothing appears! That’s because the current image size is too big for Tesseract to handle. Time to change that!