Image Recognition With ML Kit

See how to use the new ML Kit library from Google to easily enable machine learning capabilities in your Android app and perform image recognition.

Version

  • Kotlin 1.2, Android 4.4, Android Studio 3

A few months back Google introduced ML Kit, a new API to help developers add machine learning (ML) capabilities into their apps. Thanks to ML Kit, adding ML to your app is super easy and no longer restricted to ML experts.

In this tutorial, you’ll learn how to use Google’s ML Kit in your Android apps by creating an app capable of detecting food in your photographs. By the end of this tutorial, you’ll have learned:

  • What Image Recognition is and how it is useful.
  • How to set up ML Kit with your Android app and Firebase.
  • How to run image recognition on-device and on-cloud.
  • How to use the results from running image recognition with ML Kit.

Note: This tutorial assumes you have basic knowledge of Kotlin and Android. If you’re new to Android, check out our catalog of Android tutorials. If you know Android, but are unfamiliar with Kotlin, take a look at Kotlin for Android: An Introduction.

If you’ve never used Firebase before, check out the Firebase Tutorial for Android.

Getting Started

Instagram is a site regularly used by food bloggers. People love taking food pictures to share with family and friends. But how do you know if the food is delicious or not?

The project you’re working on, Delicious Food, will allow you to take a picture of some food with your camera and identify if the food is as good as it looks.

Start by downloading the materials for this tutorial using the Download materials button at the top or bottom of this tutorial. With the Android Studio 3.1.4 or greater welcome screen showing, open the project by clicking Open an existing Android Studio project and select the build.gradle file in the root of the DeliciousFoodStarterProject project.

If you explore the project, you will find two layout files (activity_main.xml and activity_splash.xml) in the layout folder and three Kotlin files (MainActivity.kt, SplashActivity.kt and Extensions.kt) in the java folder.

The interface is already built for you, so you will only focus on writing code for this tutorial inside the MainActivity.kt file.

Build and run the app on a device or emulator.

Right now, it is an empty canvas with a button at the bottom. That is about to change! :]

Before diving into the project, first a little about image recognition.

Understanding Image Recognition

Image recognition, in the context of ML, is the ability of software to identify objects, places, people, writing and actions in images. Computers can use machine vision technologies, in combination with a camera and artificial intelligence software, to achieve image recognition. It is used to perform a large number of machine-based visual tasks, such as labeling the content of images with meta-tags.

Various types of labeling are possible, and include:

  • Image Labeling to classify common elements in pictures.
  • Text Recognition to process and recognize text from pictures.
  • Face Detection to help you know if a face is smiling, tilted, or frowning in pictures.
  • Barcode Scanning to read data encoded in standard barcode formats like QR Codes.
  • Landmark Recognition to identify popular places in images.

Current and Future Uses of Image Recognition

Image Labeling on Social Networks

If you have a Facebook or Instagram account, you might be familiar with face recognition.

Whenever you upload a photo, Facebook immediately suggests tagging some of your friends. Besides the tagging feature, image recognition describes image content for visually impaired people who use screen readers. It also helps to recognize inappropriate or offensive images.

There are privacy concerns around using people’s pictures to train ML and Artificial Intelligence (AI) technologies. Facebook states it only uses public pictures and not pictures from private accounts, but most users are not even aware of that usage.

Security and privacy aside, it’s always interesting to know how ML and AIs work behind the scenes.

Organization of Pictures

Another popular use of image recognition is the automated organization of photo albums. Have you ever traveled to another country and ended up with hundreds of pictures stored on your phone?

Google Photos is a great example of such an app to store images. It helps you organize your pictures in albums by identifying common places, objects, friends or even pets.

Image recognition improves the user experience of organizing photos inside the app, enabling better discovery with the ability to accurately search through images. This is possible thanks to new discoveries in ML technologies, which identify patterns and groups of objects.

Image recognition is also used commercially to organize pictures on stock photography websites, which provide photographers a platform to sell their content.

A problem stock photo websites faced before ML is that many photographers are not tech savvy or have thousands of pictures to upload, and manual image classification is very time-consuming and tedious.

Image recognition is thus critical for stock photography websites. It makes life easier for contributors by providing instant keyword suggestions and categories. It also helps users by making visual content available for search engines.

Self-Driving Cars

In the past couple of years, self-driving cars have evolved dramatically. Companies like Uber use computer vision technologies to create different versions of self-driving vehicles, ranging from delivery trucks to taxis.

Computer vision and AI are the main technologies used to power self-driving cars. Image recognition helps to predict the speed and location of other objects in motion on the roads.

Augmented Reality

Augmented Reality has long been one of the most researched topics due to its uses in fields like gaming and user experience (UX).

With the help of image recognition, you can superimpose digital information on top of objects that you can see in the world, providing rich user experiences and unprecedented interaction.

Pokémon Go, for example, uses augmented reality to put Pokémon in the landscape of places like the Eiffel Tower or the Empire State Building.

Now that you have some background on the possible use cases for image recognition, it’s time to learn about on-device and on-cloud APIs in Firebase.

On-Device vs. On-Cloud APIs

On-device APIs can process data quickly without the need for an Internet connection. This is useful if you don’t want to consume the mobile data of your users and you need fast processing.

The main drawback is the lower confidence of the results. Confidence is a value indicating how certain the ML algorithm is about the answer it provides. On-device APIs only have so much information to consult, so don’t be surprised if your device thinks that photo of a hot dog is a hamburger.

On-cloud APIs offer much more powerful processing capabilities thanks to Google Cloud Platform’s ML technology, but these APIs require an Internet connection to work. In the case of using the Google Cloud Platform, this requires a payment after the first 1,000 requests.

Google provides a comparison of the on-device and on-cloud APIs; the key differences are summarized below.

On-device APIs offer 400+ labels, covering the most commonly found concepts in photos (like ‘food’). On-cloud APIs offer more than 10,000 labels across many categories, making it more likely that you get an accurate result.

Overall, the recommendation on which to use, on-device or on-cloud, is that you first carefully analyze your project needs. If you believe that you will need a high level of accuracy and money is not an issue, then go for the on-cloud APIs. Otherwise, stick with the on-device APIs; they are usually enough for most projects.
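
If you want to keep both options open, one simple approach is to choose the API at runtime based on connectivity. The sketch below is not part of the starter project; it assumes the two detection methods you’ll implement later in this tutorial and falls back to on-device processing when there is no network:

// Hypothetical helper, not in the starter project: use the on-cloud API when a
// network is available, otherwise fall back to the on-device API.
// Requires android.content.Context and android.net.ConnectivityManager imports.
private fun detectDeliciousFood(bitmap: Bitmap) {
  val connectivityManager =
      getSystemService(Context.CONNECTIVITY_SERVICE) as ConnectivityManager
  val isOnline = connectivityManager.activeNetworkInfo?.isConnected == true

  if (isOnline) {
    detectDeliciousFoodOnCloud(bitmap)  // more labels, higher accuracy
  } else {
    detectDeliciousFoodOnDevice(bitmap) // works offline, fewer labels
  }
}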

Note: Since this project is rather simple, all the code lives inside the MainActivity.kt file. However, for more complex projects, you will normally use design patterns like MVC or MVVM. Remember that activities should focus on interacting with the UI rather than your API or database.

Setting Up Firebase and ML Kit

ML Kit is part of Firebase, so you will need to create a Firebase app to use it. Don’t worry, it is quite simple.

To start:

  • Open the Firebase Console. You will need to sign in with your Google Account (or sign up for one) and create a new project.
  • Click on the Add project button.
  • Next, add the name of your project and the country/region. Enter DeliciousFood as the project name and choose your own country as the region. Check the boxes below the region field to agree to the Firebase and Google Analytics terms, and then click Create project.
  • You should see a confirmation screen telling you that your project is ready. Click Continue.

Now that your project is ready, you need to add it to your Android app.

  • Click on Add Firebase to your Android app.
  • Next, you need to provide a package name. You can find your package name in Android Studio in the app/build.gradle file. Open the gradle file and your package name can be found under a variable named applicationId:
    applicationId "com.raywenderlich.deliciousfood"
  • Copy and paste the package name into the Android Package Name textfield in the Firebase Console, then click Register App.
  • Download the generated google-services.json file and place it under your app/ folder.
  • In your project level build.gradle file, add the following line in the dependencies block:
    classpath 'com.google.gms:google-services:4.0.1'
  • Next, add the Firebase Core and the ML Kit dependencies to your app level build.gradle file in the dependencies block:

    implementation 'com.google.firebase:firebase-core:16.0.1'
    implementation 'com.google.firebase:firebase-ml-vision:16.0.0'
    implementation 'com.google.firebase:firebase-ml-vision-image-label-model:15.0.0'
    
  • At the bottom of your app/build.gradle file, apply the following plugin:

    apply plugin: 'com.google.gms.google-services'
  • Sync your Gradle files now to ensure everything is working.

  • Finally, open MainActivity.kt and add the following line inside the onCreate() method, replacing // TODO: Init Firebase:

    FirebaseApp.initializeApp(this)

    Be sure to import the class if you need to. This ensures Firebase is initialized when your app is launched.
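
After this step, onCreate() in MainActivity.kt should look roughly like the sketch below. The setContentView() call and the remaining TODO comment are assumptions about the starter project’s structure:

override fun onCreate(savedInstanceState: Bundle?) {
  super.onCreate(savedInstanceState)
  setContentView(R.layout.activity_main)
  // Initialize Firebase as soon as the activity is created
  FirebaseApp.initializeApp(this)
  // TODO: Configure Camera (you'll replace this later in the tutorial)
}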

Build and run the app. While the app is running, you should be able to see logging in the console reporting that everything is set up. Head back to the Firebase Console and click Next through the steps until the verify installation step is shown. Firebase will check to make sure everything is working and report back with a congratulations message once it detects your app.

Click Continue to console and you will see Firebase detecting the number of users running your app. That means you!

Taking a Delicious Picture

Configuring the Camera

Now that you have Firebase all set up, you can proceed to the first coding part of this tutorial: taking a picture with Android.

Setting up the camera in an Android app can be a tricky process. You need to handle runtime permissions, storage location, file formats and much more.

Fortunately, there are many Android libraries that can assist you in handling all those complexities via an easier API. For this tutorial, you are going to use an Android library called ParaCamera since it is easy to use and configure.

To use it, open your app level build.gradle file and add the following line to your dependencies section:

implementation 'com.mindorks:paracamera:0.2.2'

Sync your files to verify that everything is working properly.

Next, open the MainActivity.kt file and add the following just above onCreate() method:

  private lateinit var camera: Camera
  private val PERMISSION_REQUEST_CODE = 1

Note: Make sure you import the Camera class from the com.mindorks.paracamera.Camera package, not the usual Android Camera class.

You’ll set up the camera property shortly, and don’t worry about the request code until later.

Add the following code to the end of onCreate(), replacing // TODO: Configure Camera to initialize and configure the camera:

camera = Camera.Builder()
        .resetToCorrectOrientation(true)//1
        .setTakePhotoRequestCode(Camera.REQUEST_TAKE_PHOTO)//2
        .setDirectory("pics")//3
        .setName("delicious_${System.currentTimeMillis()}")//4
        .setImageFormat(Camera.IMAGE_JPEG)//5
        .setCompression(75)//6
        .build(this)

Taking each commented section in turn:

  1. Rotates the camera bitmap to the correct orientation using its metadata.
  2. Sets the request code for your onActivityResult() method.
  3. Sets the directory in which your pictures will be saved.
  4. Sets the name of each picture taken according to the system time.
  5. Sets the image format to JPEG.
  6. Sets a compression rate of 75% to use less system resources.

Now, implement the takePicture() method, replacing the stubbed // TODO: provide an implementation message.

fun takePicture(view: View) {
  if (!hasPermission(android.Manifest.permission.WRITE_EXTERNAL_STORAGE) ||
      !hasPermission(android.Manifest.permission.CAMERA)) {
    // If the permissions are not granted yet, request them
    requestPermissions()
  } else {
    // All permissions granted, go ahead and take a picture using the camera
    try {
      camera.takePicture()
    } catch (e: Exception) {
      // Show a toast if something goes wrong while taking the picture
      Toast.makeText(this.applicationContext, getString(R.string.error_taking_picture),
          Toast.LENGTH_SHORT).show()
    }
  }
}

This is the method you are going to call when the user presses the TAKE PICTURE button.

It checks whether the app has been granted the CAMERA and WRITE_EXTERNAL_STORAGE permissions. If it has, it calls the library’s camera.takePicture() method to take a picture. Otherwise, it requests those permissions by calling requestPermissions(), which you’ll implement next. The hasPermission() method, which verifies whether the user has granted a given permission, is already implemented for you in the starter project.
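
For reference, a helper like hasPermission() is usually a thin wrapper around ContextCompat.checkSelfPermission(). The starter project’s implementation may differ slightly; a minimal sketch looks like this:

// Minimal sketch of a permission-check helper; the starter project's actual
// implementation may differ.
private fun hasPermission(permission: String): Boolean {
  return ContextCompat.checkSelfPermission(this, permission) ==
      PackageManager.PERMISSION_GRANTED
}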

Requesting Permissions

Your app will need the CAMERA and WRITE_EXTERNAL_STORAGE permissions to take pictures and save files.

Back in the day, just adding those permissions in the Android Manifest was considered enough, but now things are handled a bit differently.

Permissions are classified into two categories: normal and dangerous. Dangerous permissions are the ones that request access to private data such as the Calendar, Contacts, or Internal Storage, and they require runtime permission from the user.

You can consult this table for a full list of the dangerous permissions.

You need to request the appropriate permissions to save files and take pictures.

Implement the requestPermissions() method, replacing the stubbed // TODO: provide an implementation message.

private fun requestPermissions() {
  if (ActivityCompat.shouldShowRequestPermissionRationale(this,
          android.Manifest.permission.WRITE_EXTERNAL_STORAGE)) {
    mainLayout.snack(getString(R.string.permission_message), Snackbar.LENGTH_INDEFINITE) {
      action(getString(R.string.OK)) {
        ActivityCompat.requestPermissions(this@MainActivity,
            arrayOf(android.Manifest.permission.WRITE_EXTERNAL_STORAGE,
                android.Manifest.permission.CAMERA), PERMISSION_REQUEST_CODE)
      }
    }
  } else {
    ActivityCompat.requestPermissions(this,
        arrayOf(android.Manifest.permission.WRITE_EXTERNAL_STORAGE,
            android.Manifest.permission.CAMERA), PERMISSION_REQUEST_CODE)
  }
}

requestPermissions() asks for the CAMERA and WRITE_EXTERNAL_STORAGE permissions. If the user previously rejected those permissions, it first displays a snackbar explaining why they’re needed, with an OK action that triggers the request; otherwise, it requests them directly. The snack() and action() calls are extension functions defined in Extensions.kt in the starter project.
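
If you’re curious what those extension functions might look like, here is a rough sketch, assuming they simply wrap the standard Snackbar API; the actual bodies in Extensions.kt may differ:

// Rough sketch of the Snackbar helper extensions; check Extensions.kt in the
// starter project for the real implementations. Requires View and Snackbar imports.
fun View.snack(message: String, length: Int = Snackbar.LENGTH_LONG,
               configure: Snackbar.() -> Unit) {
  val snackbar = Snackbar.make(this, message, length)
  snackbar.configure()
  snackbar.show()
}

fun Snackbar.action(text: String, listener: (View) -> Unit) {
  setAction(text) { view -> listener(view) }
}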

Now, implement onRequestPermissionsResult() method, replacing // TODO: provide an implementation:

override fun onRequestPermissionsResult(requestCode: Int,
    permissions: Array<String>, grantResults: IntArray) {
  when (requestCode) {
    PERMISSION_REQUEST_CODE -> {
      // If request is cancelled, the result arrays are empty.
      if (grantResults.isNotEmpty()
          && grantResults[0] == PackageManager.PERMISSION_GRANTED
          && grantResults[1] == PackageManager.PERMISSION_GRANTED) {
        try {
          camera.takePicture()
        } catch (e: Exception) {
          Toast.makeText(this.applicationContext, getString(R.string.error_taking_picture),
              Toast.LENGTH_SHORT).show()
        }
      }
      return
    }
  }
}

The code above calls camera.takePicture() once the user has granted all the permissions needed to use the camera.

Note: A detailed explanation of permission handling is beyond the scope of this tutorial, but you can check out this document for more information on requesting permissions at runtime.

Finishing the Camera Code

The last step to be able to take pictures is to implement the onActivityResult() method to process the data from the camera, convert it to a Bitmap and display it in the ImageView. To do that, replace // TODO: provide an implementation in onActivityResult() with the following code block:

override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
  super.onActivityResult(requestCode, resultCode, data)

  if (resultCode == Activity.RESULT_OK) {
    if (requestCode == Camera.REQUEST_TAKE_PHOTO) {
      val bitmap = camera.cameraBitmap
      if (bitmap != null) {
        imageView.setImageBitmap(bitmap)
        detectDeliciousFoodOnDevice(bitmap)
      } else {
        Toast.makeText(this.applicationContext, getString(R.string.picture_not_taken),
            Toast.LENGTH_SHORT).show()
      }
    }
  }
}

Build and run the app and click the TAKE PICTURE button to see the camera in action.

Hurray! The camera is working properly. Now you can move on to detecting delicious food pictures. :]

Note: If you don’t want the pictures to be saved, override onDestroy() and call the camera.deleteImage() method:

override fun onDestroy() {
  super.onDestroy()
  camera.deleteImage()
}

Detecting Food On-Device

It’s time to see what happens when the user takes a picture and returns to the app’s main activity.

Although it is a nice picture, you are still not receiving any information about its contents. That’s because you still haven’t implemented the ML Kit methods to detect food. Don’t worry, that’s about to change.

Image recognition with ML Kit uses two main classes: FirebaseVisionLabel and FirebaseVisionLabelDetector.

  1. FirebaseVisionLabel is an object that contains the text label assigned to an image along with the confidence for that result.
  2. FirebaseVisionLabelDetector is an object that receives a FirebaseVisionImage (created, for example, from a Bitmap) and returns a list of FirebaseVisionLabels.

In your case, you will supply the FirebaseVisionLabelDetector with a picture, and it will return an array of FirebaseVisionLabels with the objects found.
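
Each FirebaseVisionLabel exposes the detected text label and its confidence, which is handy for debugging. For example, inside a success listener you could log every result like this (the "MLKit" log tag is just an example, and you’d need to import android.util.Log):

// "labels" is the List<FirebaseVisionLabel> delivered to the success listener.
labels.forEach { label ->
  Log.d("MLKit", "Detected '${label.label}' with confidence ${label.confidence}")
}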

Implement the detectDeliciousFoodOnDevice() method, replacing // TODO: provide an implementation:

private fun detectDeliciousFoodOnDevice(bitmap: Bitmap) {
  //1
  progressBar.visibility = View.VISIBLE
  val image = FirebaseVisionImage.fromBitmap(bitmap)
  val options = FirebaseVisionLabelDetectorOptions.Builder()
      .setConfidenceThreshold(0.8f)
      .build()
  val detector = FirebaseVision.getInstance().getVisionLabelDetector(options)

  //2
  detector.detectInImage(image)
      //3
      .addOnSuccessListener {

        progressBar.visibility = View.INVISIBLE

        if (hasDeliciousFood(it.map { it.label.toString() })) {
          displayResultMessage(true)
        } else {
          displayResultMessage(false)
        }

      }//4
      .addOnFailureListener {
        progressBar.visibility = View.INVISIBLE
        Toast.makeText(this.applicationContext, getString(R.string.error),
            Toast.LENGTH_SHORT).show()

      }
}

Step-by-step:

  1. You display a progress bar and create the necessary FirebaseVisionImage and FirebaseVisionLabelDetector objects. The threshold represents the minimum level of confidence that you will accept for the results.
  2. You call the detectInImage() method of the FirebaseVisionLabelDetector object and add an onSuccessListener and onFailureListener.
  3. If successful, you call the hasDeliciousFood() method with the array of FirebaseVisionLabel objects mapped to an array of Strings to check if there is food on the picture, and then display a message accordingly.
  4. On failure, you display a toast with the error message.

The code block you just added uses some utility methods already included in the starter project. The logic behind them is simple (a sketch of them follows the list below):

  1. hasDeliciousFood() receives a list of Strings and returns true if it finds the word “Food.”
  2. displayResultMessage() receives a Boolean and displays a nice DELICIOUS FOOD or NOT DELICIOUS FOOD message accordingly.
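
You don’t need to write these yourself, but here is a rough sketch of how they might be implemented. The responseTxt view and the string resources are assumptions made for illustration; check the starter project for the real code:

// Hypothetical implementations for illustration only; the starter project's
// versions may differ. responseTxt and the string resources are assumptions.
private fun hasDeliciousFood(items: List<String>): Boolean =
    items.any { it.equals("Food", ignoreCase = true) }

private fun displayResultMessage(hasDeliciousFood: Boolean) {
  responseTxt.text = if (hasDeliciousFood) {
    getString(R.string.delicious_food)
  } else {
    getString(R.string.not_delicious_food)
  }
}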

Build and run your app, and take a picture of some food to see your app in action.

Huzzah, your app is a toast detecting marvel!

Detecting Food via On-Cloud Processing

When you run image labeling in the cloud, you receive more detailed and accurate predictions, with more than 10,000 labels available across many categories. You also save processing power on your device since the cloud does all the work for you, but you depend on a stable Internet connection for cloud processing to work.

You can set up on-cloud processing for your Android app as explained below.

Enabling On-Cloud Image Recognition via ML Kit

There are a few extra steps that you have to follow if you want to use the on-cloud APIs. It is easy, and your project will be ready if you decide that the on-device APIs are not enough.

Open the Firebase Console and, in the navigation bar on the bottom left of the site, select Upgrade to see the current plan for your project.

You will need to be on the Blaze Plan if you want to use the on-cloud APIs. It is a paid plan, but the first 1,000 requests are free and that should be enough to complete this tutorial.

Note: If you don’t want to submit your payment information, you can stick with the Spark Plan and skip the Detecting Food via On-Cloud Processing section of this tutorial.


Your last step is to enable the Cloud Vision APIs in the Firebase Console.

Click Develop in the navigation bar on the left to show a dropdown menu, then click ML Kit on the left menu.

Choose Get started on the main screen.

Next, click Cloud API usage on the next screen.

Choose Enable on the Google Cloud Platform page.

The code to use the on-cloud APIs is basically the same as the on-device APIs. The main difference is that your image will be processed in the cloud, receiving more accurate predictions.

Now, implement the detectDeliciousFoodOnCloud() method, replacing // TODO: provide an implementation:

private fun detectDeliciousFoodOnCloud(bitmap: Bitmap) {
  progressBar.visibility = View.VISIBLE
  val image = FirebaseVisionImage.fromBitmap(bitmap)
  val options = FirebaseVisionCloudDetectorOptions.Builder()
      .setMaxResults(10)
      .build()
  val detector = FirebaseVision.getInstance()
      //1
      .getVisionCloudLabelDetector(options)

  detector.detectInImage(image)
      .addOnSuccessListener {

        progressBar.visibility = View.INVISIBLE

        if (hasDeliciousFood(it.map { it.label.toString() })) {
          displayResultMessage(true)
        } else {
          displayResultMessage(false)
        }

      }
      .addOnFailureListener {
        progressBar.visibility = View.INVISIBLE
        Toast.makeText(this.applicationContext, getString(R.string.error),
            Toast.LENGTH_SHORT).show()

      }
}

As you can see, the on-cloud code is pretty much the same as the on-device code. The only difference is that you are using a FirebaseVisionCloudLabelDetector, obtained via getVisionCloudLabelDetector(), instead of a FirebaseVisionLabelDetector; see the point marked with comment //1 in the code block above.

Also, since the number of results could be very high, you should set a limit using the setMaxResults() method. In your case, it’s been set to 10.

Finally, change the call detectDeliciousFoodOnDevice(bitmap) to detectDeliciousFoodOnCloud(bitmap) in your onActivityResult() method.

override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
  if (resultCode == Activity.RESULT_OK) {
    ...
    detectDeliciousFoodOnCloud(bitmap)
    ...
  }
}

Build and run your app.

Sweet! Your image recognition is now occurring in the cloud.

Where to Go From Here?

That’s it! You have built your very own Delicious Food detector with image recognition using ML Kit (on-device and on-cloud).

Remember that ML Kit is still in beta, and Google might add some extra APIs in the future with more ML capabilities, so stay tuned. :]

Feel free to download the final project and play around with it; the Download Materials button can be found at the top or bottom of this tutorial.

Note: Remember that you will need to set up your own Firebase Project and add your google-services.json file for the final project to work. This is all covered under the Setting Up Firebase and ML Kit section above.

There are some other resources related to ML Kit that you may want to take a look at, including a similar tutorial that uses ML Kit for text recognition.

I hope you enjoyed this tutorial, and if you have any questions or comments, please join the forum discussion below!
