
Saliency Analysis in iOS using Vision

In this tutorial, you’ll learn how to use the Vision framework in iOS to perform saliency analysis and use it to create an effect on a live video feed.

Version

  • Swift 5, iOS 13, Xcode 11

Do you know what the creepy thing about robots is? You can never tell where they’re looking. They have no pupils to give them away. It’s like a person wearing sunglasses. Are they staring at you or something else?

Finally, Apple has said enough is enough and given us the technology to see what an iPhone thinks is interesting to look at. It’s called saliency analysis, and you too can harness its power!

In this tutorial, you’ll build an app that uses saliency analysis to filter the camera feed from an iOS device to create a spotlight effect around interesting objects.

Along the way, you’ll learn how to use the Vision framework to:

  • Create requests to perform saliency analysis.
  • Use the observations returned to generate a heat map.
  • Filter a video stream using the heat maps as an input.
Note: As this tutorial uses the camera and APIs introduced in iOS 13, you’ll need a minimum of Xcode 11 and a device running iOS 13.0 or later. You can’t use the simulator for this because you need a live feed of video from a physical camera.

Get ready to see the world through your iPhone’s eyes!

Getting Started

Click the Download Materials button at the top or bottom of this tutorial. Open the starter project and explore the code to get a feel for how it works.

The starter project sets up the camera and displays its output to the screen, unmodified. Additionally, there’s a label at the top of the screen that describes the screen output. Initially, Original is displayed, as the camera feed is unaltered.

Tapping on the screen changes the label to Heat Map. But nothing in the camera feed changes.

You’ll fix that shortly. First though, what is saliency analysis?

Saliency Analysis

Saliency analysis uses algorithms to determine what is interesting or important to humans in an image. In essence, it works out what it is about an image that catches someone’s eye.

Once you have picked out the important areas in a photo, you could then use this information to automate cropping or provide filter effects highlighting them.

If you perform saliency analysis in real-time on a video feed, you could also use the information to help focus on the key areas.

The Vision framework provided by Apple has two different types of saliency analysis: attention-based and object-based.

Attention-based saliency tries to determine what areas a person might look at. Object-based saliency, on the other hand, seeks to highlight entire objects of interest. Although related, the two are quite different.

Roll up your sleeves and crack your knuckles. It’s time to code. :]

Attention-Based Heat Maps

Both Vision APIs used for saliency analysis return a heat map. There are a variety of ways to visualize heat maps. Those returned by Vision requests are in grayscale. Additionally, the heat map is defined on a much coarser grid than the photo or video feed. According to Apple’s documentation, you will get back either a 64 x 64 or a 68 x 68 pixel heat map depending on whether or not you make the API calls in real-time.

Note: Although the documentation says to expect a 64 x 64 pixel heat map when calling the APIs in real time, the code used in this tutorial to perform the Vision requests on a video feed still resulted in an 80 x 68 pixel heat map.

The functions that return the width and height of a CVPixelBuffer reported 68 x 68. However, the data included in the CVPixelBuffer was actually 80 x 68. This may be a bug and could change in the future.
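
If you're curious, you can check this on your own device by logging the buffer's reported dimensions against its bytes-per-row. Here's a minimal sketch; it assumes you have the observation's pixelBuffer in hand, for example inside the handleSaliency method you'll write shortly:

// A quick sketch for inspecting a heat map buffer's layout.
// Assumes pixelBuffer is the CVPixelBuffer from a VNSaliencyImageObservation.
let reportedWidth = CVPixelBufferGetWidth(pixelBuffer)
let reportedHeight = CVPixelBufferGetHeight(pixelBuffer)
let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)

// The heat map stores one 32-bit Float per pixel, so bytes-per-row divided
// by the size of a Float is the number of floats actually stored per row.
let floatsPerRow = bytesPerRow / MemoryLayout<Float>.size
print("Reported: \(reportedWidth) x \(reportedHeight), floats per row: \(floatsPerRow)")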

If you’ve never used the Vision framework, check out our Face Detection Tutorial Using the Vision Framework for iOS for some information about how Vision requests work.

In CameraViewController.swift, add the following code to the end of captureOutput(_:didOutput:from:):

// 1
let req = VNGenerateAttentionBasedSaliencyImageRequest(
  completionHandler: handleSaliency)
    
do {
  // 2
  try sequenceHandler.perform(
    [req],
    on: imageBuffer,
    orientation: .up)
    
} catch {
  // 3
  print(error.localizedDescription)
}

With this code, you:

  1. Generate an attention-based saliency Vision request.
  2. Use the VNSequenceRequestHandler to perform the request on the CVImageBuffer created at the beginning of the method.
  3. Catch and print the error, if there was one.

There! Your first step toward understanding your robotic iPhone!

You’ll notice that Xcode is unhappy and doesn’t seem to know what handleSaliency is. Even though Apple has made great strides in computer vision, it still hasn’t found a way to make Xcode write your code for you.

You’ll need to write handleSaliency, which will take a completed vision request and do something useful with the result.

At the end of the same file, add a new extension to house your Vision-related methods:

extension CameraViewController {
}

Then, in this extension, add the handleSaliency completion handler you passed to the Vision request:

func handleSaliency(request: VNRequest, error: Error?) {
  // 1
  guard
    let results = request.results as? [VNSaliencyImageObservation],
    let result = results.first
    else { return }

  // 2
  guard let targetExtent = currentFrame?.extent else {
    return
  }
  
  // 3
  var ciImage = CIImage(cvImageBuffer: result.pixelBuffer)

  // 4
  let heatmapExtent = ciImage.extent
  let scaleX = targetExtent.width / heatmapExtent.width
  let scaleY = targetExtent.height / heatmapExtent.height

  // 5
  ciImage = ciImage
    .transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))
      
  // 6
  showHeatMap(with: ciImage)
}

Here, you:

  1. Ensure that the results are VNSaliencyImageObservation objects and extract the first result from the array of observations returned by the Vision request.
  2. Grab the image extent from the current frame. This is essentially the size of the current frame.
  3. Create a CIImage from the CVPixelBuffer that represents the heat map.
  4. Calculate the scale factor between the current frame and the heat map.
  5. Scale the heat map to the current frame’s size.
  6. Display the heat map using showHeatMap.

Now, just above handleSaliency(request:error:), add the handy helper method that will display the heat map:

func showHeatMap(with heatMap: CIImage) {
  // 1
  guard let frame = currentFrame else {
    return
  }
  
  let yellowHeatMap = heatMap
    // 2
    .applyingFilter("CIColorMatrix", parameters:
      ["inputBVector": CIVector(x: 0, y: 0, z: 0, w: 0),
       "inputAVector": CIVector(x: 0, y: 0, z: 0, w: 0.7)])
    // 3
    .composited(over: frame)

  // 4
  display(frame: yellowHeatMap)
}

In this method, you:

  1. Unwrap the currentFrame optional.
  2. Apply the CIColorMatrix Core Image filter to the heat map. You’re zeroing out the blue component and multiplying the alpha component of each pixel by 0.7, which results in a partly transparent yellow heat map. See the sketch after this list for how CIColorMatrix combines its vectors.
  3. Add the yellow heat map on top of the original frame.
  4. Use the provided helper method to display the resulting image.
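
If it helps to see exactly what CIColorMatrix is doing, here's the same filter call sketched out with every vector written explicitly. The filter computes each output channel as the dot product of the input pixel (r, g, b, a) with the corresponding vector, plus a bias; everything below other than inputBVector and inputAVector is simply the filter's default value:

// Equivalent to the call above, with the default vectors spelled out.
// For example, output.r = dot(input, inputRVector).
let explicitYellowHeatMap = heatMap
  .applyingFilter("CIColorMatrix", parameters: [
    "inputRVector": CIVector(x: 1, y: 0, z: 0, w: 0),    // keep red
    "inputGVector": CIVector(x: 0, y: 1, z: 0, w: 0),    // keep green
    "inputBVector": CIVector(x: 0, y: 0, z: 0, w: 0),    // zero out blue
    "inputAVector": CIVector(x: 0, y: 0, z: 0, w: 0.7),  // alpha becomes 0.7 * alpha
    "inputBiasVector": CIVector(x: 0, y: 0, z: 0, w: 0)  // no bias added
  ])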

Before you build and run the app, you’ll need to make one final change.

Go back to captureOutput(_:didOutput:from:) and replace the following line of code:

display(frame: currentFrame)

with:

if mode == .original {
  display(frame: currentFrame)
  return
}

This code ensures that you only show the unfiltered frame when you’re in the Original mode. It also returns from the method, so you don’t waste precious computing cycles (and battery!) doing any Vision requests. :]

All right, it’s time! Build and run the app and then tap to put it into Heat Map mode.

Improving the Heat Map

While the heat map you have is pretty cool, there are two problems with it:

  1. The brightest spots can be quite dim if the algorithm isn’t very confident in its results.
  2. It looks pixelated.

The good news is that both issues are fixable.

Normalizing the Heat Map

You’ll solve the first problem by normalizing the heat map.

In CVPixelBufferExtension.swift, add the following normalizing method to the existing CVPixelBuffer extension:

func normalize() {
  // 1
  let bytesPerRow = CVPixelBufferGetBytesPerRow(self)
  let totalBytes = CVPixelBufferGetDataSize(self)

  let width = bytesPerRow / MemoryLayout<Float>.size
  let height = totalBytes / bytesPerRow
    
  // 2
  CVPixelBufferLockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
  
  // 3
  let floatBuffer = unsafeBitCast(
    CVPixelBufferGetBaseAddress(self), 
    to: UnsafeMutablePointer<Float>.self)
  
  // 4  
  var minPixel: Float = 1.0
  var maxPixel: Float = 0.0
    
  // 5
  for i in 0 ..< width * height {
    let pixel = floatBuffer[i]
    minPixel = min(pixel, minPixel)
    maxPixel = max(pixel, maxPixel)
  }
    
  // 6
  let range = maxPixel - minPixel
    
  // 7
  for i in 0 ..< width * height {
    let pixel = floatBuffer[i]
    floatBuffer[i] = (pixel - minPixel) / range
  }
    
  // 8
  CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
}

Phew! That was a lot of code. Here you:

  1. Extract the width and height of the CVPixelBuffer. You could do this using CVPixelBufferGetWidth and CVPixelBufferGetHeight. However, since you're going to iterate over the raw data, it's better to derive them from bytes-per-row and total-data-size so you know you're operating within the bounds of the allocated memory.
  2. Lock the base address of the pixel buffer. This is required before accessing pixel data with the CPU.
  3. Cast the base address of the CVPixelBuffer to Float pointer, since you know that the heat map data is floating point data.
  4. Initialize some variables to keep track of the minimum and maximum pixel values found.
  5. Loop through each pixel in the CVPixelBuffer and save the smallest and largest values. As the CVPixelBuffer data is mapped linearly in memory, you can just loop over the number of pixels in the buffer.
  6. Calculate the range of the pixel values.
  7. Loop through each pixel again and normalize its value to fall between 0.0 and 1.0. See the note after this list about guarding against a zero range.
  8. Unlock the base address of the pixel buffer.
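
One edge case worth knowing about: if every pixel in the heat map has the same value, range is zero and the division in step 7 produces NaN values. That's unlikely with a real camera feed, but if you want to be defensive you could bail out early, along these lines (a hypothetical addition, not part of the tutorial's code):

// Hypothetical guard to add just before the second loop in normalize():
// if the buffer is flat, skip the division and leave the values untouched.
guard range > 0 else {
  CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
  return
}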

Before you try this out, you need to call normalize from your Vision request pipeline.

Open CameraViewController.swift, and find handleSaliency(request:error:) again. Just above the line where you declare and initialize ciImage, add this line:

result.pixelBuffer.normalize()

As normalize updates the CVPixelBuffer in place, make sure to call it before using result.pixelBuffer elsewhere.

Build and run the app again to see your more prominent heat map.

Not bad, right?

Blurring the Heat Map

Now, it's time to tackle the second problem: Pixelation. The pixelation happens because the heat map is 80 x 68 and you're scaling it up to the resolution of the video feed.

To fix this, apply a Gaussian blur to the heat map after scaling it up. Open CameraViewController.swift and find handleSaliency(request:error:) again. Then replace the following lines:

ciImage = ciImage
  .transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))

With:

ciImage = ciImage
  .transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))
  .applyingGaussianBlur(sigma: 20.0)
  .cropped(to: targetExtent)

You're applying a Gaussian blur with a sigma of 20.0 directly after scaling the heat map. Since the blur expands the image's extent beyond the original bounds, you crop the result back to the original frame's extent.
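
One thing to be aware of: a plain Gaussian blur treats the area outside the image as transparent, so the outer edge of the scaled heat map fades slightly. It's barely noticeable here, but a common Core Image trick is to clamp the image to an infinite extent before blurring and then crop, roughly like this (an optional variation, not required for this tutorial):

// Optional variation: clamp before blurring so edge pixels are extended
// instead of fading out, then crop back to the frame's extent as before.
ciImage = ciImage
  .transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))
  .clampedToExtent()
  .applyingGaussianBlur(sigma: 20.0)
  .cropped(to: targetExtent)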

Build and run again and see your new and improved heat map!

Object-Based Heat Maps

Now that you're an expert in attention-based heat maps, it's time for you to experiment with object-based ones.

The object-based heat maps will attempt to segment entire objects that are deemed interesting. The Vision framework will try to conform the heat map to the shape of an object.

Additionally, you'll write the code in a way that will allow you to flip quickly between attention-based and object-based saliency. Doing so will allow you to see the difference between the two saliency methods easily.

Open CameraViewController.swift again. Go to captureOutput(_:didOutput:from:) and find the line where you create the VNGenerateAttentionBasedSaliencyImageRequest.

Replace that line with the following code:

// 1
let req: VNImageBasedRequest

// 2
var selectedSegmentIndex = 0
    
// 3
DispatchQueue.main.sync {
  selectedSegmentIndex = saliencyControl.selectedSegmentIndex
}
    
// 4
switch selectedSegmentIndex {
case 0:
  req = 
    VNGenerateAttentionBasedSaliencyImageRequest(completionHandler: handleSaliency)
case 1:
  req = 
    VNGenerateObjectnessBasedSaliencyImageRequest(completionHandler: handleSaliency)
default:
  fatalError("Unhandled segment index!")
}

With this code change, you:

  1. Declare a constant of type VNImageBasedRequest. Both types of saliency analysis request inherit from this class, so you can use this constant to store either. It can be a constant rather than a variable because every path through the switch below assigns it exactly once.
  2. Initialize a variable to store the selected segment index of the UISegmentedControl. It has to be initialized, not just declared; otherwise, the compiler complains that it's captured by the closure below before being initialized.
  3. Read the selectedSegmentIndex property of the predefined UISegmentedControl on the main thread to avoid accessing UI elements on a background thread.
  4. Create either a VNGenerateAttentionBasedSaliencyImageRequest or a VNGenerateObjectnessBasedSaliencyImageRequest based on which segment was selected.

Before you can build and run, make the UISegmentedControl visible at the appropriate time.

Find handleTap(_:) and add the following line at the top of the method:

saliencyControl.isHidden = false

Then, under the .heatMap case, add this line:

saliencyControl.isHidden = true

The complete method should look like this:

@IBAction func handleTap(_ sender: UITapGestureRecognizer) {
  saliencyControl.isHidden = false
    
  switch mode {
  case .original:
    mode = .heatMap
  case .heatMap:
    mode = .original
    saliencyControl.isHidden = true
  }
    
  modeLabel.text = mode.rawValue
}

You make saliencyControl visible by default, then hide it again when you're about to transition back to the .original state.

Build and run. Switch to heat map mode. You should see a segmented control at the bottom of the screen, which allows you to change between attention-based and object-based saliency.
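
By the way, the heat map isn't the only output of object-based saliency. The observation also exposes bounding boxes for the detected objects through its salientObjects property, an array of VNRectangleObservation values in normalized coordinates. The tutorial doesn't use them, but if you'd like a peek, a sketch like this inside handleSaliency(request:error:) would log them (the logging is purely illustrative):

// Hypothetical peek at the observation's bounding boxes. salientObjects
// may be nil or empty, depending on the request type and the scene.
if let salientObjects = result.salientObjects {
  for object in salientObjects {
    // boundingBox is normalized (0...1) with the origin at the bottom-left.
    let rect = VNImageRectForNormalizedRect(
      object.boundingBox,
      Int(targetExtent.width),
      Int(targetExtent.height))
    print("Salient object at \(rect), confidence: \(object.confidence)")
  }
}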

Spotlight Effect Using the Saliency Heat Maps

One use for saliency analysis is to create effects based on the heat maps to apply to images or video feeds. You're going to create one that highlights the salient areas and darkens everything else.

Still in CameraViewController.swift, just below showHeatMap(with:), add the following method:

func showFlashlight(with heatMap: CIImage) {
  // 1
  guard let frame = currentFrame else {
    return
  }
    
  // 2
  let mask = heatMap
    .applyingFilter("CIColorMatrix", parameters:
      ["inputAVector": CIVector(x: 0, y: 0, z: 0, w: 2)])

  // 3
  let spotlight = frame
    .applyingFilter("CIBlendWithMask", parameters: ["inputMaskImage": mask])

  // 4
  display(frame: spotlight)
}

In this method, you:

  1. Unwrap the current frame, which is a CIImage.
  2. Use a Core Image filter to multiply the alpha channel of the heat map by 2, thereby producing a brighter and slightly larger heated area in the heat map.
  3. Apply another Core Image filter, CIBlendWithMask, to mask out any pixels from the frame where the heat map is black. See the note after this list for a variation that dims the surroundings instead of removing them.
  4. Display the filtered image.
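
As written, CIBlendWithMask blends the frame over an empty background, so the non-salient areas end up effectively black in the rendered output. If you'd rather dim the surroundings than remove them entirely, you could pass a darkened copy of the frame as the background image. Here's one way that might look inside showFlashlight(with:) (a sketch; the names are illustrative):

// Hypothetical variation: blend over a dimmed copy of the frame instead of
// an empty background, so the surroundings stay faintly visible.
let dimmedFrame = frame
  .applyingFilter("CIColorControls", parameters: [kCIInputBrightnessKey: -0.5])

let softSpotlight = frame
  .applyingFilter("CIBlendWithMask", parameters: [
    "inputBackgroundImage": dimmedFrame,
    "inputMaskImage": mask
  ])

display(frame: softSpotlight)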

This method will be your special effects workhorse. To enable it, add a new case to the ViewMode enum at the top of the file:

case flashlight = "Spotlight"

Xcode should now be complaining. Replace the switch statement in handleTap with the following:

switch mode {
case .original:
  mode = .heatMap
case .heatMap:
  mode = .flashlight
case .flashlight:
  mode = .original
  saliencyControl.isHidden = true
}

This handles the new .flashlight case and makes it the mode that follows .heatMap in the tap cycle.

Finally, at the bottom of handleSaliency(request:error:), replace the call to showHeatMap(with:), with the following code:

switch mode {
case .heatMap:
  showHeatMap(with: ciImage)
case .flashlight:
  showFlashlight(with: ciImage)
default:
  break
}

Here you choose the appropriate display method depending on the mode the app is in.

Build and run your app and check out the spotlight effect using both attention-based and object-based saliency!

Where to Go From Here?

Congratulations! You've learned a lot, done a ton of coding and created a cool effect with it. What now?

You could create more effects using the heat maps and potentially even combine saliency data with depth map data to create even cooler effects. If you'd like some ideas for other effects, check out our tutorials Image Depth Maps Tutorial for iOS: Getting Started and Video Depth Maps Tutorial for iOS: Getting Started.

You could also try to create an auto-crop or auto-focus feature using the saliency analysis data. The robotic iPhone world is your oyster!

If you have any questions or comments, please join the forum discussion below.
