Image Depth Maps Tutorial for iOS: Getting Started

Learn how you can use the powerful image manipulation frameworks on iOS to work with image depth maps in only a few lines of code.

Version

  • Swift 5, iOS 13, Xcode 11
Update note: Owen Brown updated this tutorial for Swift 5, iOS 13 and Xcode 11.4. Yono Mittlefehldt wrote the original.

Let’s be honest. We, the human race, will eventually create robots that will take over the world, right? One thing that will be super important to our future robot masters will be good depth perception. Without it, how will they know if it’s really a human they’ve just imprisoned or just a cardboard cutout of a human? One way they can do this is by using depth maps.

Before robots can use depth maps, however, they need to be programmed to understand them. That’s where you come in! In this tutorial, you’ll learn about the APIs Apple provides for image depth maps. Throughout the tutorial, you’ll:

  • Learn how the iPhone generates depth information.
  • Read depth data from images.
  • Combine this depth data with filters to create neat effects.

So what are you waiting for? Your iPhone wants to start seeing in 3D!

Getting Started

Download the starter project by clicking the Download Materials button at the top or bottom of the tutorial.

Before you begin, you need to run Xcode 11 or later. Running this tutorial on a device directly is highly recommended. To do so, you need an iPhone running iOS 13 or later.

Once that’s done, you can explore the materials for this tutorial. The bundled images include depth information to use with the tutorial.

If you prefer and you have a dual-camera iPhone, you can take your own images to use with this tutorial. To take pictures that include depth data, the iPhone needs to be running iOS 11 or later. Don’t forget to use Portrait mode in the Camera app.

Build and run. You’ll see this:

Screenshot of App after first running on an iPhone

Tapping on one image cycles to the next. If you add your own pictures, make sure they have the .jpg file extension.

In this tutorial, you’ll fill in the functionality of the Depth, Mask and Filtered segments that you can see right at the bottom of the screen. Feel free to tap on them. They don’t do much right now. They will soon!

Reading Depth Data

The most important class in the iOS SDK for depth data is AVDepthData.

Different image formats store depth data slightly differently. In images in the HEIC format, it’s stored as metadata, but JPGs store it as a second image within the same JPG.

You generally use AVDepthData to extract this auxiliary data from an image, so that’s the first step you’ll take in this tutorial. Open SampleImage.swift and add the following method at the bottom of SampleImage:

static func depthDataMap(forItemAt url: URL) -> CVPixelBuffer? {
  // 1
  guard let source = CGImageSourceCreateWithURL(url as CFURL, nil) else {
    return nil
  }

  // 2
  let cfAuxDataInfo = CGImageSourceCopyAuxiliaryDataInfoAtIndex(
    source,
    0,
    kCGImageAuxiliaryDataTypeDisparity
  )
  guard let auxDataInfo = cfAuxDataInfo as? [AnyHashable : Any] else {
    return nil
  }

  // 3
  let cfProperties = CGImageSourceCopyPropertiesAtIndex(source, 0, nil)
  guard 
    let properties = cfProperties as? [CFString: Any],
    let orientationValue = properties[kCGImagePropertyOrientation] as? UInt32,
    let orientation = CGImagePropertyOrientation(rawValue: orientationValue)
    else {
      return nil
  }

  // 4
  guard var depthData = try? AVDepthData(
    fromDictionaryRepresentation: auxDataInfo
  ) else {
    return nil
  }

  // 5
  if depthData.depthDataType != kCVPixelFormatType_DisparityFloat32 {
    depthData = depthData.converting(
      toDepthDataType: kCVPixelFormatType_DisparityFloat32
    )
  }

  // 6
  return depthData.applyingExifOrientation(orientation).depthDataMap
}

OK, that was quite a bit of code, but here’s what you did:

  1. First, you create a CGImageSource that represents the input file.
  2. From the image source at index 0, you copy the disparity data from its auxiliary data. You’ll learn more about what that means later, but you can think of it as depth data for now. The index is 0 because there’s only one image in the image source.
  3. The image’s orientation is stored as separate metadata. To correctly align the depth data, you extract this orientation using CGImageSourceCopyPropertiesAtIndex(_:_:_:). Now you can apply it later.
  4. You create an AVDepthData from the auxiliary data you read in.
  5. You ensure the depth data is the format you need — 32-bit floating point disparity information — and convert it if it isn’t.
  6. Finally, you apply the correct orientation and return this depth data map.

Now that you’ve set up the depth data, it’s time to put it to good use!

Implementing the Depth Data

Now before you can run this, you need to update depthData(forItemAt:). Replace its implementation with the following:

// 1
guard let depthDataMap = depthDataMap(forItemAt: url) else { return nil }

// 2
depthDataMap.normalize()

// 3
let ciImage = CIImage(cvPixelBuffer: depthDataMap)
return UIImage(ciImage: ciImage)

With this code:

  1. Using your new depthDataMap(forItemAt:), you read the depth data into a CVPixelBuffer.
  2. You then normalize the depth data using a provided extension to CVPixelBuffer. This makes sure all the pixels are between 0.0 and 1.0, where 0.0 are the farthest pixels and 1.0 are the nearest pixels.
  3. You then convert the depth data to a CIImage and then a UIImage and return it.
Note: If you’re interested in how normalize() works, look in CVPixelBufferExtension.swift. It loops through every value in the 2D array and keeps track of the minimum and maximum values seen. It then loops through all the values again and uses the min and max values to calculate a new value that is between 0.0 and 1.0.
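
To make the two passes concrete, here is a minimal sketch of the same min-max idea on a plain Swift array. It is only an illustration: normalizedValues(_:) is a hypothetical helper, not the code in CVPixelBufferExtension.swift, which works on the pixel buffer in place. The sketch also assumes the input contains no NaN values.

```swift
// Hypothetical helper for illustration; the project's normalize() works on a
// CVPixelBuffer in place rather than on an array.
func normalizedValues(_ values: [Float]) -> [Float] {
  // First pass: find the smallest and largest values.
  guard let minValue = values.min(), let maxValue = values.max() else {
    return []
  }
  let range = maxValue - minValue
  // A flat map has no range to stretch; return zeros to avoid dividing by zero.
  guard range > 0 else {
    return values.map { _ in 0 }
  }
  // Second pass: rescale every value into 0.0...1.0.
  return values.map { ($0 - minValue) / range }
}

let disparities: [Float] = [0.5, 1.0, 2.5]
print(normalizedValues(disparities))
```

The smallest input always maps to 0.0 and the largest to 1.0, which is why the farthest pixel ends up black and the nearest one white.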

Build and run and tap the Depth segment of the segmented control at the bottom.

Screenshot of app displaying a color image of bikes in a row Screenshot of app displaying a depth map of bikes in a row

Awesome! This is essentially a visual representation of the depth data. The whiter the pixel, the closer it is. The darker the pixel, the farther away it is. The normalization that you did ensures that the farthest pixel is solid black and the nearest pixel is solid white. Everything else is somewhere in that range of gray.

Great job!

How Does the iPhone Do This?

In a nutshell, the iPhone’s dual cameras are imitating stereoscopic vision.

Try this. Hold your index finger close in front of your nose, pointing upward. Close your left eye. Without moving your finger or head, simultaneously open your left eye and close your right eye.

Now quickly switch back and forth closing one eye and opening the other. Pay attention to the relative location of your finger to objects in the background. See how your finger seems to make large jumps left and right compared to objects further away?

Finger looks like it's to the right of the background Finger looks like it's to the left of the background

The closer an object is to your eyes, the larger the change in its relative position compared to the background. Does this sound familiar? It’s a parallax effect!

The iPhone’s dual cameras are like its eyes, looking at two images taken at a slight offset from one another. It matches features in the two images and calculates how many pixels they have moved. This change in pixels is called disparity.

Image Disparity — the distance between the left- and right-eye views

Depth vs Disparity

So far, you’ve mostly used the term depth data, but in your code you requested kCGImageAuxiliaryDataTypeDisparity data. What gives?

Depth and disparity are essentially inversely proportional.

Disparity Depth Formula: Depth = 1/Disparity

The farther away an object is, the greater the object’s depth. The disparity is the distance between the equivalent object in the two images. According to the formula above, as this distance approaches zero, the depth approaches infinity.
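
If it helps to see the relationship as code, here is a small sketch. The two functions are hypothetical helpers, not part of the starter project, and they assume physical units (disparity in 1/meters, depth in meters) rather than the normalized values you displayed earlier.

```swift
// Hypothetical helpers for illustration. Both assume values in physical
// units (disparity in 1/meters, depth in meters), not normalized values.
func depth(fromDisparity disparity: Double) -> Double {
  // Floating-point division by zero yields infinity rather than trapping.
  return 1.0 / disparity
}

func disparity(fromDepth depth: Double) -> Double {
  return 1.0 / depth
}

print(depth(fromDisparity: 2.0))   // a large shift means a nearby object
print(depth(fromDisparity: 0.0))   // no measurable shift: infinitely far away
```

Because the two quantities are reciprocals, converting back and forth loses nothing except at the extremes, where a disparity of zero corresponds to an infinite depth.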

If you played around with the starter project, you might have noticed a slider at the bottom of the screen that’s visible when you select the Mask and Filtered segments.

You’re going to use this slider, along with the depth data, to make a mask for the image at a certain depth. Then you’ll use this mask to filter the original image and create some neat effects!

Creating a Mask

Open DepthImageFilters.swift and find createMask(for:withFocus:). Then add the following code to the top:

let s1 = MaskParams.slope
let s2 = -MaskParams.slope
let filterWidth = 2 / MaskParams.slope + MaskParams.width
let b1 = -s1 * (focus - filterWidth / 2)
let b2 = -s2 * (focus + filterWidth / 2)

These constants are going to define how you convert the depth data into an image mask.

Think of the depth data map as the following function:

Graph showing Pixel Value on the Y axis, Disparity on the X axis and a straight line at a 45-degree angle plotted between them

The pixel value of your depth map image is equal to the normalized disparity. Remember, a pixel value of 1.0 is white and a disparity value of 1.0 is closest to the camera. On the other side of the scale, a pixel value of 0.0 is black and a disparity value of 0.0 is farthest from the camera.

To create a mask from the depth data, you’ll change this function to be something much more interesting. It will essentially pick out a certain depth. To illustrate that, consider the following version of the same pixel value to disparity function:

Graph showing the focal point

This is showing a focal point of 0.75 disparity, with a peak of width 0.1 and slope 4.0 on either side. createMask(for:withFocus:) will use some funky math to create this function.

This means that the whitest pixels (value 1.0) will be those with a disparity of 0.75 ± 0.05 (focal point ± width / 2). The pixels will then quickly fade to black for disparity values above and below this range. The larger the slope, the faster they’ll fade to black.

You’ll set the mask up in two parts — the left side and the right side. You’ll then combine them.
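
Before building this with Core Image, it can help to see the whole function for a single pixel in plain Swift. The sketch below is an assumption-laden model, not the project's code: slope and width are hypothetical stand-ins for MaskParams, set to the values from the example graph, and maskValue(forDisparity:focus:) is a made-up name.

```swift
// Hypothetical stand-ins for the project's MaskParams; the values match the
// example graph (slope 4.0, width 0.1), not necessarily the starter project.
let slope = 4.0
let width = 0.1

// Limits a value to the range 0.0...1.0.
func clamp(_ value: Double) -> Double {
  return min(max(value, 0.0), 1.0)
}

// Mask value for one pixel with the given normalized disparity.
func maskValue(forDisparity disparity: Double, focus: Double) -> Double {
  let s1 = slope
  let s2 = -slope
  let filterWidth = 2 / slope + width
  let b1 = -s1 * (focus - filterWidth / 2)
  let b2 = -s2 * (focus + filterWidth / 2)

  let left = clamp(s1 * disparity + b1)   // rising edge below the focal point
  let right = clamp(s2 * disparity + b2)  // falling edge above the focal point
  return min(left, right)                 // keep the darker of the two
}

print(maskValue(forDisparity: 0.75, focus: 0.75))  // at the focal point
print(maskValue(forDisparity: 0.2, focus: 0.75))   // far from the focal point
```

With a focus of 0.75, the left line reaches 1.0 at a disparity of about 0.7 and the right line starts falling at about 0.8, which gives the 0.75 ± 0.05 plateau described above.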

Setting up the Left Side of the Mask

After the constants you previously added, add the following:

let depthImage = image.depthData.ciImage!
let mask0 = depthImage
  .applyingFilter("CIColorMatrix", parameters: [
    "inputRVector": CIVector(x: s1, y: 0, z: 0, w: 0),
    "inputGVector": CIVector(x: 0, y: s1, z: 0, w: 0),
    "inputBVector": CIVector(x: 0, y: 0, z: s1, w: 0),
    "inputBiasVector": CIVector(x: b1, y: b1, z: b1, w: 0)])
  .applyingFilter("CIColorClamp")

This filter multiplies every pixel by the slope s1 and adds the bias b1. Since the mask is grayscale, you apply the same slope and bias to all color channels. After CIColorClamp clamps the values to be between 0.0 and 1.0, the filter chain applies the following function:

Graph showing the color clamp

The larger s1 is, the steeper the slope of the line will be. The constant b1 moves the line left or right.

Setting up the Right Side of the Mask

To take care of the other side of the mask function, add the following:

let mask1 = depthImage
  .applyingFilter("CIColorMatrix", parameters: [
    "inputRVector": CIVector(x: s2, y: 0, z: 0, w: 0),
    "inputGVector": CIVector(x: 0, y: s2, z: 0, w: 0),
    "inputBVector": CIVector(x: 0, y: 0, z: s2, w: 0),
    "inputBiasVector": CIVector(x: b2, y: b2, z: b2, w: 0)])
  .applyingFilter("CIColorClamp")

Since the slope s2 is negative, the filter applies the following function:

negative slope

Combining the Two Masks

Now, put the two masks together. Add the following code:

let combinedMask = mask0.applyingFilter("CIDarkenBlendMode", parameters: [
  "inputBackgroundImage": mask1
])
let mask = combinedMask.applyingFilter("CIBicubicScaleTransform", parameters: [
  "inputScale": image.depthDataScale
])

You combine the masks by using the CIDarkenBlendMode filter, which chooses the lower of the two values of the input masks.

Then you scale the mask to match the image size since the data map is a lower resolution.

Finally, replace the return line with:

return mask

Build and run your project. Tap the Mask segment and play with the slider.

You’ll see something like this:

Table - Mask Filter

As you move the slider from left to right, the mask is picking out pixels from far to near. So when the slider is all the way to the left, the white pixels will be those that are far away. And when the slider is all the way to the right, the white pixels will be those that are near.

Your First Depth-Inspired Filter

Next, you’ll create a filter that mimics a spotlight. The spotlight will shine on objects at a chosen depth and fade to black from there.

Because you’ve already put in the hard work of reading in the depth data and creating the mask, it’s going to be super simple.

Return to DepthImageFilters.swift and add the following method at the bottom of the DepthImageFilters class:

func createSpotlightImage(
  for image: SampleImage,
  withFocus focus: CGFloat
) -> UIImage? {
  // 1
  let mask = createMask(for: image, withFocus: focus)

  // 2
  let output = image.filterImage.applyingFilter("CIBlendWithMask", parameters: [
    "inputMaskImage": mask
  ])

  // 3
  guard let cgImage = context.createCGImage(output, from: output.extent) else {
    return nil
  }

  // 4
  return UIImage(cgImage: cgImage)
}

Here’s what you did in these lines:

  1. You got the depth mask from createMask(for:withFocus:).
  2. You used CIBlendWithMask and passed in the mask you created in the previous line. The filter essentially sets the alpha value of a pixel to the corresponding mask pixel value. So when the mask pixel value is 1.0, the image pixel is completely opaque, and when the mask pixel value is 0.0, the image pixel is completely transparent. Since the UIView behind the UIImageView has a black color, black is what you see coming from behind the image.
  3. You created a CGImage using the CIContext.
  4. You then created a UIImage and returned it.
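
The blend in step 2 can be modeled per color channel as a linear interpolation weighted by the mask. The following sketch is a simplified model under that assumption; blend(foreground:background:mask:) is a hypothetical function, and treating the missing background as black (0.0) stands in for the black view behind the image.

```swift
// Hypothetical per-channel model of a mask blend: a linear interpolation
// between a foreground and a background value, weighted by the mask.
func blend(foreground: Double, background: Double, mask: Double) -> Double {
  return foreground * mask + background * (1.0 - mask)
}

// Spotlight: no background image, so a black backdrop (0.0) shows through.
let lit = blend(foreground: 0.8, background: 0.0, mask: 1.0)
let dimmed = blend(foreground: 0.8, background: 0.0, mask: 0.25)
let hidden = blend(foreground: 0.8, background: 0.0, mask: 0.0)
print(lit, dimmed, hidden)
```

A mask value of 1.0 leaves the pixel untouched, 0.0 replaces it with the backdrop, and anything in between dims it proportionally.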

To see this filter in action, you first need to tell DepthImageViewController to call this method when appropriate.

Open DepthImageViewController.swift and go to createImage(for:mode:filter:).

Look for the switch case that handles (.filtered, .spotlight) and replace its return statement with the following:

return depthFilters.createSpotlightImage(for: image, withFocus: focus)

Build and run. Tap the Filtered segment and ensure that you select Spotlight at the top. Play with the slider. You should see this:

Wall - Spotlight Filter

Congratulations! You’ve written your first depth-inspired image filter.

But you’re just getting warmed up. You want to write another one, right?

Color Highlight Filter

Return to DepthImageFilters.swift and add another new method:

func createColorHighlight(
  for image: SampleImage,
  withFocus focus: CGFloat
) -> UIImage? {
  let mask = createMask(for: image, withFocus: focus)
  let grayscale = image.filterImage.applyingFilter("CIPhotoEffectMono")
  let output = image.filterImage.applyingFilter("CIBlendWithMask", parameters: [
    "inputBackgroundImage" : grayscale,
    "inputMaskImage": mask
  ])

  guard let cgImage = context.createCGImage(output, from: output.extent) else {
    return nil
  }

  return UIImage(cgImage: cgImage)
}

This should look familiar. It’s almost exactly the same as createSpotlightImage(for:withFocus:), which you just wrote. The difference is that this time, you set the background image to be a grayscale version of the original image.

This filter will show full color at the focal point based on the slider position, and fade to gray from there.

Open DepthImageViewController.swift and, in the same switch statement, replace the code for (.filtered, .color) with the following:

return depthFilters.createColorHighlight(for: image, withFocus: focus)

This calls your new filter method and displays the result.

Build and run to see the magic:

Building - Color Filter

Don’t you hate it when you take a picture only to discover later that the camera focused on the wrong object? What if you could change the focus after the fact?

That’s exactly what the depth-inspired filter you’ll write next does!

Change the Focal Length

Under createColorHighlight(for:withFocus:) in DepthImageFilters.swift, add one last method:

func createFocalBlur(
  for image: SampleImage,
  withFocus focus: CGFloat
) -> UIImage? {
  // 1
  let mask = createMask(for: image, withFocus: focus)

  // 2
  let invertedMask = mask.applyingFilter("CIColorInvert")

  // 3
  let output = image.filterImage.applyingFilter(
    "CIMaskedVariableBlur", 
    parameters: [
      "inputMask" : invertedMask,
      "inputRadius": 15.0
    ])

  // 4
  guard let cgImage = context.createCGImage(output, from: output.extent) else {
    return nil
  }

  // 5
  return UIImage(cgImage: cgImage)
}

This filter is a little different from the other two.

  1. First, you get the initial mask that you’ve used previously.
  2. You then use CIColorInvert to invert the mask.
  3. Then you apply CIMaskedVariableBlur, a filter that was new with iOS 11. It will blur using a radius equal to inputRadius multiplied by the mask’s pixel value. When the mask pixel value is 1.0, the blur is at its max, which is why you needed to invert the mask first.
  4. Once again, you generate a CGImage using CIContext.
  5. You use that CGImage to create a UIImage and return it.
Note: If you have performance issues, try decreasing the inputRadius. Gaussian blurs are computationally expensive and the bigger the blur radius, the more computations occur.
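
To see why step 2 inverts the mask, consider the blur radius for a single pixel. This is a rough per-pixel model, not how Core Image is implemented; blurRadius(forMaskValue:inputRadius:) is a hypothetical name used only for illustration.

```swift
// Hypothetical per-pixel model of the masked blur.
// maskValue is the value from createMask(for:withFocus:), before inversion.
func blurRadius(forMaskValue maskValue: Double, inputRadius: Double) -> Double {
  let inverted = 1.0 - maskValue   // what inverting does to a gray value
  return inputRadius * inverted    // the radius scales with the inverted mask
}

print(blurRadius(forMaskValue: 1.0, inputRadius: 15.0))  // at the focal point
print(blurRadius(forMaskValue: 0.0, inputRadius: 15.0))  // far from the focal point
```

Pixels at the focal point have a mask value of 1.0, so after inversion they get no blur, while pixels far from it get the full inputRadius.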

Before you can run, you need to once again update the switch statement back in DepthImageViewController.swift. To use your shiny new method, change the code under (.filtered, .blur) to:

return depthFilters.createFocalBlur(for: image, withFocus: focus)

Build and run.

Bike - Blur Filter

It’s… so… beautiful!


More About AVDepthData

Remember how you scaled the mask in createMask(for:withFocus:)? You had to do this because the depth data captured by the iPhone is a lower resolution than the sensor resolution. It’s closer to 0.5 megapixels than the 12 megapixels the camera can take.
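
As a rough illustration of that scale factor, here is a sketch that divides the photo width by the depth map width. The helper name is hypothetical, the starter project's depthDataScale may be computed differently, and the pixel sizes are example numbers only.

```swift
// Hypothetical helper; the starter project exposes a similar value as
// image.depthDataScale, whose implementation may differ.
func depthMapScale(imageWidth: Double, depthMapWidth: Double) -> Double {
  return imageWidth / depthMapWidth
}

// Illustrative sizes only: a 4032-pixel-wide photo with a 768-pixel-wide depth map.
let scale = depthMapScale(imageWidth: 4032, depthMapWidth: 768)
print(scale)
```

A factor like this is what you would pass as inputScale so the mask lines up with the full-size image.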

Another important thing to know is the data can be filtered or unfiltered. Unfiltered data may have holes represented by NaN, which stands for Not a Number — a possible value in floating point data types. If the phone can’t correlate two pixels or if something obscures just one of the cameras, it uses these NaN values for disparity.

Pixels with a value of NaN display as black. Since multiplying by NaN will always result in NaN, these black pixels will propagate to your final image, and they’ll look like holes.

As this can be a pain to deal with, Apple gives you filtered data, when available, to fill in these gaps and smooth out the data.

If you’re unsure, check isDepthDataFiltered to find out if you’re dealing with filtered or unfiltered data.
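
If you do end up with unfiltered data, one simple approach is to detect the holes and replace them before filtering. The sketch below works on a plain array of disparity values; both helpers are hypothetical, and filling holes with 0.0 (treated here as far away) is just one possible choice.

```swift
// Hypothetical helpers for illustration, working on a plain array of
// disparity values rather than on a pixel buffer.
func containsHoles(_ values: [Float]) -> Bool {
  return values.contains { $0.isNaN }
}

// Replaces each hole with a fallback value (0.0 is treated as "far away" here).
func fillingHoles(in values: [Float], with fallback: Float = 0.0) -> [Float] {
  return values.map { $0.isNaN ? fallback : $0 }
}

let unfiltered: [Float] = [0.2, .nan, 0.9]
print(containsHoles(unfiltered))
print((Float.nan * 2).isNaN)   // NaN propagates through arithmetic
print(fillingHoles(in: unfiltered))
```

Note that NaN never compares equal to anything, including itself, so you need isNaN rather than == to find the holes.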

Where to Go From Here?

Download the final project using the Download Materials button at the top or bottom of this tutorial.

There are tons more Core Image filters available. Check Apple’s Core Image Filter Reference for a complete list. Many of these filters create interesting effects when you combine them with depth data.

Additionally, you can capture depth data with video, too! Think of the possibilities.

I hope you had fun building these image filters. If you have any questions or comments, please join the forum discussion below!
