Image Depth Maps Tutorial for iOS: Getting Started

Learn how to use the powerful image manipulation frameworks on iOS to work with image depth maps in only a few lines of code. By Owen L Brown.

Update note: Owen Brown updated this tutorial for Swift 5, iOS 13 and Xcode 11.4. Yono Mittlefehldt wrote the original.

Let’s be honest. We, the human race, will eventually create robots that will take over the world, right? One thing that will be super important to our future robot masters will be good depth perception. Without it, how will they know if it’s really a human they’ve just imprisoned or just a cardboard cutout of a human? One way they can do this is by using depth maps.

Before robots can use depth maps, however, they need to be programmed to understand them. That’s where you come in! In this tutorial, you’ll learn about the APIs Apple provides for image depth maps. Throughout the tutorial, you’ll:

  • Learn how the iPhone generates depth information.
  • Read depth data from images.
  • Combine this depth data with filters to create neat effects.

So what are you waiting for? Your iPhone wants to start seeing in 3D!

Getting Started

Download the starter project by clicking the Download Materials button at the top or bottom of the tutorial.

Before you begin, you need Xcode 11 or later. Running the app on a physical device is highly recommended. To do so, you need an iPhone running iOS 13 or later.

Once that's done, explore the materials for this tutorial. The bundled images already include the depth information you'll need.

If you prefer and you have a dual-camera iPhone, you can take your own images to use with this tutorial. To take pictures that include depth data, the iPhone needs to be running iOS 11 or later. Don’t forget to use Portrait mode in the Camera app.

Build and run. You’ll see this:

Screenshot of App after first running on an iPhone

Tapping on one image cycles to the next. If you add your own pictures, make sure they have the .jpg file extension.

In this tutorial, you’ll fill in the functionality of the Depth, Mask and Filtered segments that you can see right at the bottom of the screen. Feel free to tap on them. They don’t do much right now. They will soon!

Reading Depth Data

The most important class for depth data in the iOS SDK is AVDepthData.

Different image formats store depth data in slightly different ways. HEIC images store it as metadata, while JPGs store it as a second image within the same file.

You generally use AVDepthData to extract this auxiliary data from an image, so that’s the first step you’ll take in this tutorial. Open SampleImage.swift and add the following method at the bottom of SampleImage:

static func depthDataMap(forItemAt url: URL) -> CVPixelBuffer? {
  // 1
  guard let source = CGImageSourceCreateWithURL(url as CFURL, nil) else {
    return nil
  }

  // 2
  let cfAuxDataInfo = CGImageSourceCopyAuxiliaryDataInfoAtIndex(
    source,
    0,
    kCGImageAuxiliaryDataTypeDisparity
  )
  guard let auxDataInfo = cfAuxDataInfo as? [AnyHashable: Any] else {
    return nil
  }

  // 3
  let cfProperties = CGImageSourceCopyPropertiesAtIndex(source, 0, nil)
  guard 
    let properties = cfProperties as? [CFString: Any],
    let orientationValue = properties[kCGImagePropertyOrientation] as? UInt32,
    let orientation = CGImagePropertyOrientation(rawValue: orientationValue)
    else {
      return nil
  }

  // 4
  guard var depthData = try? AVDepthData(
    fromDictionaryRepresentation: auxDataInfo
  ) else {
    return nil
  }

  // 5
  if depthData.depthDataType != kCVPixelFormatType_DisparityFloat32 {
    depthData = depthData.converting(
      toDepthDataType: kCVPixelFormatType_DisparityFloat32
    )
  }

  // 6
  return depthData.applyingExifOrientation(orientation).depthDataMap
}

OK, that was quite a bit of code, but here’s what you did:

  1. First, you create a CGImageSource that represents the input file.
  2. From the image source at index 0, you copy the disparity data from its auxiliary data. You’ll learn more about what that means later, but you can think of it as depth data for now. The index is 0 because there’s only one image in the image source.
  3. The image’s orientation is stored as separate metadata. To correctly align the depth data, you extract this orientation using CGImageSourceCopyPropertiesAtIndex(_:_:_:). Now you can apply it later.
  4. You create an AVDepthData from the auxiliary data you read in.
  5. You ensure the depth data is the format you need — 32-bit floating point disparity information — and convert it if it isn’t.
  6. Finally, you apply the correct orientation and return this depth data map.

Now that you’ve set up the depth data, it’s time to put it to good use!

Implementing the Depth Data

Now before you can run this, you need to update depthData(forItemAt:). Replace its implementation with the following:

// 1
guard let depthDataMap = depthDataMap(forItemAt: url) else { return nil }

// 2
depthDataMap.normalize()

// 3
let ciImage = CIImage(cvPixelBuffer: depthDataMap)
return UIImage(ciImage: ciImage)

With this code:

  1. Using your new depthDataMap(forItemAt:), you read the depth data into a CVPixelBuffer.
  2. You then normalize the depth data using a provided extension to CVPixelBuffer. This ensures all the pixel values fall between 0.0 and 1.0, where 0.0 represents the farthest pixels and 1.0 the nearest.
  3. You then convert the depth data to a CIImage and then a UIImage and return it.

Note: If you're interested in how normalize() works, look in CVPixelBufferExtension.swift. It loops through every value in the 2D array and keeps track of the minimum and maximum values seen. It then loops through all the values again and uses the min and max to rescale each value to between 0.0 and 1.0. A sketch of this approach appears below.
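For reference, here's a minimal sketch of how such a two-pass normalization might be written. The method name normalizeDisparity() is hypothetical, and the sketch assumes the buffer holds 32-bit float disparity values; the bundled implementation in CVPixelBufferExtension.swift may differ in detail.

import CoreVideo

extension CVPixelBuffer {
  // Hypothetical sketch of a min/max normalization pass.
  // Assumes a kCVPixelFormatType_DisparityFloat32 buffer.
  func normalizeDisparity() {
    CVPixelBufferLockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
    defer { CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0)) }

    guard let baseAddress = CVPixelBufferGetBaseAddress(self) else { return }
    let width = CVPixelBufferGetWidth(self)
    let height = CVPixelBufferGetHeight(self)
    let bytesPerRow = CVPixelBufferGetBytesPerRow(self)

    // First pass: find the smallest and largest values in the buffer.
    var minValue = Float.greatestFiniteMagnitude
    var maxValue = -Float.greatestFiniteMagnitude
    for y in 0..<height {
      let row = (baseAddress + y * bytesPerRow).assumingMemoryBound(to: Float32.self)
      for x in 0..<width {
        minValue = min(minValue, row[x])
        maxValue = max(maxValue, row[x])
      }
    }

    // Second pass: rescale every value into the 0.0...1.0 range.
    let range = maxValue - minValue
    guard range > 0 else { return }
    for y in 0..<height {
      let row = (baseAddress + y * bytesPerRow).assumingMemoryBound(to: Float32.self)
      for x in 0..<width {
        row[x] = (row[x] - minValue) / range
      }
    }
  }
}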

Build and run and tap the Depth segment of the segmented control at the bottom.

Screenshots of the app: a color photo of bikes in a row and the corresponding depth map

Awesome! This is essentially a visual representation of the depth data. The whiter the pixel, the closer it is; the darker the pixel, the farther away it is. The normalization you performed ensures that the farthest pixel is solid black and the nearest pixel is solid white. Everything else falls somewhere in that range of gray.

Great job!

How Does the iPhone Do This?

In a nutshell, the iPhone’s dual cameras are imitating stereoscopic vision.

Try this. Hold your index finger close in front of your nose, pointing upward. Close your left eye. Without moving your finger or head, simultaneously open your left eye and close your right eye.

Now quickly switch back and forth closing one eye and opening the other. Pay attention to the relative location of your finger to objects in the background. See how your finger seems to make large jumps left and right compared to objects further away?

Images: the finger appears to the right of the background in one view and to the left in the other

The closer an object is to your eyes, the larger the change in its relative position compared to the background. Does this sound familiar? It’s a parallax effect!

The iPhone's dual cameras are like its eyes, looking at two images taken at a slight offset from one another. It matches corresponding features in the two images and calculates how many pixels they have shifted between them. This change in pixels is called disparity.

Disparity: the shift of a feature between the left- and right-eye views
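Because disparity is roughly the reciprocal of depth, AVDepthData can convert between the two representations for you. As an illustrative sketch, this hypothetical helper turns the disparity data you loaded earlier into distances in meters:

import AVFoundation

// Illustrative helper: converts disparity-based AVDepthData to true depth.
// kCVPixelFormatType_DepthFloat32 stores distances in meters, while the
// disparity formats store values related to 1/distance.
func depthInMeters(from depthData: AVDepthData) -> CVPixelBuffer {
  let converted = depthData.converting(
    toDepthDataType: kCVPixelFormatType_DepthFloat32
  )
  return converted.depthDataMap
}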