Vision Tutorial for iOS: What’s New With Face Detection?

Learn what’s new with Face Detection and how the latest additions to Vision framework can help you achieve better results in image segmentation and analysis. By Tom Elliott.

Saving to Camera Roll

Next, in captureOutput(_:didOutput:from:), immediately before initializing detectFaceRectanglesRequest, add the following:

if isCapturingPhoto {
  isCapturingPhoto = false
  savePassportPhoto(from: imageBuffer)
}

Here, you reset the isCapturingPhoto flag if needed and call a method to save the passport photo with the data from the image buffer.
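
In case you're unsure where this lands, the delegate method now looks roughly like the sketch below. Only the isCapturingPhoto block is new; the surrounding lines stand in for the existing starter code and may differ slightly from your project.

func captureOutput(
  _ output: AVCaptureOutput,
  didOutput sampleBuffer: CMSampleBuffer,
  from connection: AVCaptureConnection
) {
  // Existing code: grab the pixel buffer for this video frame.
  guard let imageBuffer = sampleBuffer.imageBuffer else {
    return
  }

  // New: consume the capture flag and save the passport photo.
  if isCapturingPhoto {
    isCapturingPhoto = false
    savePassportPhoto(from: imageBuffer)
  }

  // Existing code: create detectFaceRectanglesRequest and the other Vision
  // requests, then perform them with the sequence handler as before.
  // ...
}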

Finally, write the implementation for savePassportPhoto(from:):

// 1
guard let model = model else {
  return
}

// 2
imageProcessingQueue.async { [self] in
  // 3
  let originalImage = CIImage(cvPixelBuffer: pixelBuffer)
  var outputImage = originalImage

  // 4
  if model.hideBackgroundModeEnabled {
    // 5
    let detectSegmentationRequest = VNGeneratePersonSegmentationRequest()
    detectSegmentationRequest.qualityLevel = .accurate

    // 6
    try? sequenceHandler.perform(
      [detectSegmentationRequest],
      on: pixelBuffer,
      orientation: .leftMirrored
    )

    // 7
    if let maskPixelBuffer = detectSegmentationRequest.results?.first?.pixelBuffer {
      outputImage = removeBackgroundFrom(image: originalImage, using: maskPixelBuffer)
    }
  }

  // 8
  let coreImageWidth = outputImage.extent.width
  let coreImageHeight = outputImage.extent.height

  let desiredImageHeight = coreImageWidth * 4 / 3

  // 9
  let yOrigin = (coreImageHeight - desiredImageHeight) / 2
  let photoRect = CGRect(x: 0, y: yOrigin, width: coreImageWidth, height: desiredImageHeight)

  // 10
  let context = CIContext()
  if let cgImage = context.createCGImage(outputImage, from: photoRect) {
    // 11
    let passportPhoto = UIImage(cgImage: cgImage, scale: 1, orientation: .upMirrored)

    // 12
    DispatchQueue.main.async {
      model.perform(action: .savePhoto(passportPhoto))
    }
  }
}

It looks like a lot of code! Here's what's happening:

  1. First, return early if the model hasn't been set up.
  2. Next, dispatch to a background queue to keep the UI snappy.
  3. Create a Core Image representation of the input image and a variable to store the output image.
  4. Then, if the user has requested the background to be removed...
  5. Create a new person segmentation request, this time without a completion handler. You want the best possible quality for the passport photo, so set the quality to accurate. This works here because you're only processing a single image, and you're performing it on a background thread.
  6. Perform the segmentation request.
  7. Read the results synchronously. If a mask pixel buffer exists, remove the background from the original image by calling removeBackgroundFrom(image:using:), passing it the more accurate mask. You'll find a refresher sketch of that helper after this list.
  8. At this point, outputImage contains the passport photo with the desired background. The next step is to calculate the width and height for the passport photo, which uses a 3:4 (width to height) aspect ratio. Remember, the passport photo may not have the same aspect ratio as the camera feed.
  9. Calculate the frame of the photo, using the full width and the vertical center of the image.
  10. Convert the output image (a Core Image object) to a Core Graphics image.
  11. Then, create a UIImage from the Core Graphics image.
  12. Dispatch back to the main thread and ask the model to perform the save photo action.
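
As a refresher, removeBackgroundFrom(image:using:) was built earlier in the tutorial. A minimal sketch of that kind of helper, assuming a plain white replacement background and Core Image's blend-with-mask filter, looks like this; the version you wrote earlier is the source of truth:

import CoreImage.CIFilterBuiltins

func removeBackgroundFrom(image: CIImage, using maskPixelBuffer: CVPixelBuffer) -> CIImage {
  var maskImage = CIImage(cvPixelBuffer: maskPixelBuffer)

  // The segmentation mask is smaller than the video frame, so scale it up
  // to match the input image before blending.
  let scaleX = image.extent.width / maskImage.extent.width
  let scaleY = image.extent.height / maskImage.extent.height
  maskImage = maskImage.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))

  // Keep the person where the mask is white; show solid white elsewhere.
  let blendFilter = CIFilter.blendWithMask()
  blendFilter.inputImage = image
  blendFilter.backgroundImage = CIImage(color: .white).cropped(to: image.extent)
  blendFilter.maskImage = maskImage
  return blendFilter.outputImage ?? image
}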

Done!

Build and run. Align your face properly and take a photo, both with and without background hiding enabled. After taking a photo, a thumbnail appears on the right-hand side of the footer. Tap the thumbnail to open a detail view of the image. If you open the Photos app, you'll also find your photos saved to the camera roll.

Note how the quality of the background replacement is better in the still image than it was in the video feed, thanks to the .accurate quality level used for the still.

Primary view showing a thumbnail

Photo detail view

Where to Go From Here?

In this tutorial, you learned how to use the updated Vision framework in iOS 15 to query roll, pitch and yaw in real time. You also learned about the new person segmentation APIs.

There are still ways to improve the app. For example, you could use Core Image's smile detector to prevent the user from capturing a photo while smiling. Or you could invert the segmentation mask to check whether the real background is already white when background hiding is disabled.
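
As a rough idea of the smile check, Core Image's CIDetector can report whether a detected face is smiling. The sketch below assumes you'd run it on the still image before saving; containsSmile(in:) is an illustrative name, not part of the sample project:

import CoreImage

// A minimal smile check using Core Image's face detector. How and where you
// call this from the capture flow is up to you.
func containsSmile(in image: CIImage) -> Bool {
  let detector = CIDetector(
    ofType: CIDetectorTypeFace,
    context: nil,
    options: [CIDetectorAccuracy: CIDetectorAccuracyHigh]
  )
  let features = detector?.features(in: image, options: [CIDetectorSmile: true]) ?? []
  return features
    .compactMap { $0 as? CIFaceFeature }
    .contains { $0.hasSmile }
}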

You could also look at publishing hasDetectedValidFace through a Combine stream. By throttling the stream, you could stop the UI from flickering rapidly when a face is on the edge of being acceptable.
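
Here's a sketch of that idea, assuming hasDetectedValidFace is exposed as a @Published property on the model; the type and method names below are illustrative, not part of the sample project:

import Combine

// Inside whichever object observes the model:
private var subscriptions = Set<AnyCancellable>()

func observeFaceValidity(on model: CameraViewModel) {
  model.$hasDetectedValidFace
    .removeDuplicates()
    // Limit how often the value can change downstream so a borderline face
    // doesn't make the interface flicker.
    .throttle(for: .milliseconds(300), scheduler: DispatchQueue.main, latest: true)
    .sink { isValid in
      // Update UI state here, for example enabling the shutter button.
      print("Valid face: \(isValid)")
    }
    .store(in: &subscriptions)
}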

The Apple documentation is a great resource for learning more about the Vision framework. If you want to learn more about Metal, try this excellent tutorial to get you started.

We hope you enjoyed this tutorial. If you have any questions or comments, please join the forum discussion below!