Congratulations! If you’ve made it this far, you’ve developed a strong foundation for absorbing machine learning material. However, before we can move forward, we need to address the 10,000 pound snake in the room… Python. Until this point, you’ve made do with Xcode and Swift. If you’re going to get serious about machine learning, though, it’s best you prepare yourself to learn some Python. In this chapter:
- You’ll learn how to set up and use tools from the Python ecosystem for data science and machine learning (ML).
- You’ll install Anaconda, a very popular distribution of Python (and R).
- You’ll use terminal commands to create ML environments which you’ll use throughout this book.
- Finally, you’ll use Jupyter Notebooks, which are very similar to Swift Playgrounds, to explore the Python language, data science libraries, and Turi Create, Apple’s open source Python toolkit for training Core ML models.
The starter folder for this chapter contains:
- A notebook folder: The sample Jupyter Notebook data files.
- .yaml files: Used to import pre-configured environments, if you want to skip the instructions for configuring the environments yourself.
Python is the dominant programming language used for data science and machine learning. As such, there’s a myriad of tools available for the Python community to support data science and machine learning development. These include:
Packages and environments
Python is already installed on macOS. However, using this installation may cause version conflicts because some people use Python 2.7 while others use Python 3.x, which are incompatible branches of the same language. To further complicate things, working on machine learning projects requires integrating the correct versions of numerous software libraries, also known as “packages”.
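Because the two branches are incompatible, it can help to confirm which interpreter a script is actually running under. The following is a minimal sketch using only the standard library; the function names describe_python and is_python3 are made up for illustration.

```python
import sys


def describe_python(version_info=sys.version_info):
    """Return a short label such as 'Python 3.7' for a version tuple."""
    return "Python %d.%d" % (version_info[0], version_info[1])


def is_python3(version_info=sys.version_info):
    """True when the major version is 3 or later."""
    return version_info[0] >= 3


# Report on the interpreter that is running this code.
print(describe_python(), "| Python 3:", is_python3())
```

Running this inside and outside a Conda environment is a quick way to see whether you’re using the system Python or the one Conda installed.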
The data science community developed Conda to make life easier. Conda handles Python language versions, Python packages, and associated native libraries. It’s both an environment manager and a package manager. And, if you need a package that Conda doesn’t know about, you can use pip within a conda environment to grab the package.
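Before you pip install anything, it’s worth checking which environment is active so the package lands where you expect. As a rough sketch, assuming Conda exports the CONDA_DEFAULT_ENV and CONDA_PREFIX variables on activation (as current releases do), Python code can inspect them. The helper name active_conda_env is hypothetical.

```python
import os


def active_conda_env(environ=None):
    """Return (name, prefix) of the active Conda environment, or None.

    Assumes Conda exports CONDA_DEFAULT_ENV and CONDA_PREFIX on activation.
    """
    environ = os.environ if environ is None else environ
    name = environ.get("CONDA_DEFAULT_ENV")
    if not name:
        return None
    return name, environ.get("CONDA_PREFIX", "")


# Prints None when no Conda environment is active.
print(active_conda_env())
```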
In a browser, navigate to https://www.anaconda.com/download/#macos, and download the 64-bit Command Line installer with Python 3.7, as highlighted in the image below:
Using Anaconda Navigator
Anaconda comes with a desktop GUI that you can use to create environments and install packages in an environment. However, in this book, you’ll do everything from the command line. Given this fact, it’s worth going over some basic Conda commands, which you’ll do in the next section.
Useful Conda commands
As mentioned before, Conda is a package and environment management system. When working with Python projects, you’ll often find it useful to create new environments, installing only the packages you need before writing your code. In this section, you’ll explore useful commands that you’ll reuse many times when working with Python and Conda.
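Create a new environment:
conda create -n <env name>
Clone an existing environment to create a new one:
conda create -n <new env name> --clone <existing env name>
Create an environment from a YAML file:
conda env create -f <.yaml file>
Activate an environment:
conda activate <env name>
Install packages in the active environment:
conda install <pkg names>
Install packages in a non-active environment:
conda install -n <env name> <pkg names>
Install the pip packages listed in a requirements file:
pip install -r requirements.txt
Start Jupyter in a directory:
jupyter notebook <directory path>
Delete an environment, using either of these commands:
conda remove -n <env name> --all
conda env remove -n <env name>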
Listing environments or packages
List the environments you’ve created; the one with the * is the currently active environment:
conda info --envs
conda env list
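List the packages, or a specific package, in the active environment:
(activeenv) $ conda list
(activeenv) $ conda list <package name>
List the packages, or a specific package, in a non-active environment:
conda list -n <env name>
conda list -n <env name> <package name>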
Setting up a base ML environment
In this section, you’ll set up some environments. If you prefer a quicker start, create an environment from mlenv.yaml and skip down to the Jupyter Notebooks section. You can do this by importing mlenv.yaml into Anaconda Navigator or by running the following command from a Terminal window:
conda env create -f starter/mlenv.yaml
Python libraries for data science
Begin by creating a custom base environment for ML, with NumPy, Pandas, Matplotlib, SciPy and scikit-learn. You’ll be using these data science libraries in this book, but they’re not automatically included in new Conda environments.
conda create -n mlenv python=3.7
conda activate mlenv
conda install numpy pandas matplotlib seaborn scipy scikit-learn scikit-image ipython jupyter
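Once the install finishes, you can check from Python that the libraries are visible in the active environment. This is a minimal sketch using the standard library’s importlib; the helper name missing_modules and the list of import names are illustrative. Note that two packages are imported under a different name than the one you install.

```python
import importlib.util

# Import names, which differ from the Conda package names in two cases:
# scikit-learn is imported as sklearn, and scikit-image as skimage.
EXPECTED_MODULES = ["numpy", "pandas", "matplotlib", "seaborn",
                    "scipy", "sklearn", "skimage", "IPython"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means every library was found.
print("Missing:", missing_modules(EXPECTED_MODULES))
```

The check only looks the modules up without importing them, so it runs quickly even for large libraries.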
An important note about package versions
Technology moves fast, and the world of Python is no exception. Chances are that by the time you read this book, newer versions of the packages we’re using are available. These newer versions may not be fully compatible with the older ones.
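One way to notice a potentially breaking upgrade is to compare the major version number of an installed package against the one you expected. The sketch below is deliberately simplistic: it assumes dotted numeric versions such as 5.8 or 3.7.1, and it treats a changed major number as a warning sign, which is a convention rather than a guarantee. Both helper names are hypothetical.

```python
def parse_version(text):
    """Turn '5.8' or '3.7.1' into a tuple of ints.

    Parsing stops at the first part that is not purely numeric,
    so '1.0rc1' becomes (1,).
    """
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def same_major(installed, expected):
    """True when both versions parse and share the same major number."""
    a, b = parse_version(installed), parse_version(expected)
    return bool(a) and bool(b) and a[0] == b[0]


print(same_major("5.8", "5.4"))
print(same_major("6.0", "5.8"))
```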
With Jupyter Notebooks, which are a lot like Swift Playgrounds, you can write and run code, and you can write and render markdown to explain the code.
From Terminal, first activate your environment and then start Jupyter:
$ conda activate mlenv
$ jupyter notebook
/anaconda3/envs/mlenv/bin/jupyter_mac.command ; exit;
Pandas and Matplotlib
The notebook has a single empty cell. In that cell, type the following lines:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_json('corpus.json', orient='records')
data.head()

authors = data.author
freq = authors.value_counts()
freq

plt.hist(freq, bins=100)
plt.show()
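If you don’t have corpus.json at hand, the idea behind value_counts() can be sketched with the standard library: count how often each value occurs, then list the values from most to least frequent. The author names below are made up, and collections.Counter stands in for Pandas here.

```python
from collections import Counter

# Hypothetical stand-in for the data.author column.
authors = ["Austen", "Dickens", "Austen", "Twain", "Austen", "Dickens"]

# Like value_counts(): one entry per distinct value, most frequent first.
freq = Counter(authors).most_common()
print(freq)

# A quick text bar chart of the counts (not the same plot as plt.hist).
for name, count in freq:
    print("%-8s %s" % (name, "#" * count))
```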
Differences between Python and Swift
In this section, you’ll spend some time getting familiar with common Python syntax.
if a == b:
    print('a and b are equal')
    if a > c:
        print('and a is also greater than c')

if authors is None:
    print('authors is None')
else:
    print('authors is not None')
authors is not None
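Python’s None plays roughly the role of Swift’s nil, but Python has no optionals, and an if statement accepts any value, not just a Boolean. The following sketch shows how the two kinds of check differ; the function name describe is made up for illustration.

```python
def describe(value):
    """Classify a value the way Python's `if` statements see it."""
    if value is None:          # identity test, closest to Swift's `== nil`
        return "None"
    if not value:              # empty containers, 0 and '' are all falsy
        return "empty or zero"
    return "has a value"


for sample in (None, [], [1, 2], "", "text"):
    print(repr(sample), "->", describe(sample))
```

Testing with `is None` first matters: an empty list is falsy but is not None, so the order of the checks decides which message you get.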
def mysum(x, y):
    result = x + y
    return result

print(mysum(1, 3))

mylist = [1, 2]
mylist.append(3)
if mylist:
    print('mylist is not empty')

for value in mylist:
    print(value)

print('List length: %d' % len(mylist))

mylist is not empty
1
2
3
List length: 3

for value in mylist:
    print(value)
    print('List length: %d' % len(mylist))

1
List length: 3
2
List length: 3
3
List length: 3
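The two loops above differ only in indentation, yet they print a different number of lines. To make that concrete, this sketch collects the lines into lists instead of printing them, so you can compare the results directly. The function names are hypothetical.

```python
def length_after_loop(values):
    lines = []
    for value in values:
        lines.append(str(value))
    # Not indented under the for statement: runs once, after the loop.
    lines.append('List length: %d' % len(values))
    return lines


def length_inside_loop(values):
    lines = []
    for value in values:
        lines.append(str(value))
        # Indented under the for statement: runs on every iteration.
        lines.append('List length: %d' % len(values))
    return lines


print(len(length_after_loop([1, 2, 3])))
print(len(length_inside_loop([1, 2, 3])))
```

Where Swift uses braces to mark the body of a loop, Python relies on the indentation alone, so a stray tab changes what the program does.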
Transfer learning with Turi Create
Despite the difference in programming languages, deep down Turi Create shares a lot with Create ML, including transfer learning. With Turi Create v5, you can even do transfer learning with the same VisionFeaturePrint_Scene model that Create ML uses.
Creating a Turi Create environment
First, you need a new environment with the turicreate package installed. You’ll clone the mlenv environment to create turienv, then you’ll install turicreate in the new environment. Conda doesn’t know about turicreate, so you’ll have to pip install it from within Terminal.
conda create -n turienv --clone mlenv
#
# To activate this environment, use:
# > conda activate turienv
#
# To deactivate an active environment, use:
# > conda deactivate
#
conda activate turienv
pip install -U turicreate==5.8
List pip-installed packages
In Terminal, use this command to list all of the packages in the active environment or a specific package:
conda list
conda list coremltools

# packages in environment at /Users/amt1/anaconda3/envs/mlenv:
#
# Name                    Version                   Build  Channel
coremltools               3.0                       <pip>
Turi Create notebook
Note: If you skipped the manual environment setup and imported turienv.yaml into Anaconda Navigator, use the Jupyter Launch button on the Anaconda Navigator Home Tab instead of the command line below, then navigate in the browser to starter/notebook.
jupyter notebook <drag the starter/notebook folder in Finder to here>
import turicreate as tc
import matplotlib.pyplot as plt
train_data = tc.image_analysis.load_images("snacks/train", with_path=True)
# Grab the full path of the first training example
path = train_data[0]["path"]
print(path)

# Find the class label: the name of the folder that contains the image
import os
os.path.basename(os.path.split(path)[0])
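To see why the folder name works as the class label, here’s a small standalone sketch that needs no dataset. It assumes images are stored as root/label/file, and the example path and the helper name label_from_path are made up for illustration.

```python
import os


def label_from_path(path):
    """Return the name of the folder that directly contains the file."""
    folder, _filename = os.path.split(path)   # ('snacks/train/apple', 'img001.jpg')
    return os.path.basename(folder)           # last component of the folder path


# Hypothetical image path laid out like the snacks dataset.
example = "snacks/train/apple/img001.jpg"
print(label_from_path(example))
```

os.path.split() returns a (folder, filename) pair, which is why the folder part has to be picked out before calling os.path.basename().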
Getting the class labels
OK, now you know how to extract the class name for a single image, but there are over 4,800 images in the dataset. As a Swift programmer, your initial instinct may be to use a for loop, but if you’re really Swift-y, you’ll be itching to use a functional approach instead. SFrame has a handy apply() method that, like Swift’s forEach, lets you apply a function to every row in the frame:
train_data["path"].apply(lambda path: ...do something with path...)
train_data["label"] = train_data["path"].apply(lambda path: os.path.basename(os.path.split(path)[0]))
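If you want to try the idea without Turi Create, plain Python’s map() behaves similarly: it calls the function once per element and gives back the results. In this sketch, an ordinary list of made-up paths stands in for the path column.

```python
import os
from collections import Counter

# Hypothetical paths laid out like the snacks dataset: <root>/<label>/<file>.
paths = [
    "snacks/train/apple/img001.jpg",
    "snacks/train/apple/img002.jpg",
    "snacks/train/banana/img003.jpg",
]

# Plain-Python analogue of apply(): call the lambda once per element.
labels = list(map(lambda path: os.path.basename(os.path.split(path)[0]), paths))
print(labels)

# Count images per class to sanity-check the extracted labels.
print(Counter(labels))
```

Counting the labels afterwards is a cheap way to spot a wrong folder layout, because a mistake usually shows up as one unexpected class name.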
Let’s do some training
Once you have your data in an SFrame, training a model with Turi Create takes only a single line of code (OK, it’s three lines, but only because we have to fit it on the page):
model = tc.image_classifier.create(train_data, target="label",
                                   model="VisionFeaturePrint_Scene",
                                   verbose=True, max_iterations=50)
model = tc.load_model("HealthySnacks.model")
After 15 iterations, validation accuracy is close to training accuracy at ~90%. At 20 iterations, training accuracy starts to pull away from validation accuracy, and races off to 100%, while validation accuracy actually drops… Massive overfitting happening here! If the validation accuracy gets worse while the training accuracy still keeps improving, you’ve got an overfitting problem.
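The rule of thumb in the last sentence can be written down as a simple check. The sketch below uses made-up accuracy curves, not the numbers from this training run, and the function name first_overfit_iteration is hypothetical. It flags the first point where validation accuracy falls while training accuracy still climbs.

```python
def first_overfit_iteration(train_acc, val_acc):
    """Return the first index where validation accuracy drops while
    training accuracy still rises, or None if that never happens."""
    for i in range(1, min(len(train_acc), len(val_acc))):
        if val_acc[i] < val_acc[i - 1] and train_acc[i] > train_acc[i - 1]:
            return i
    return None


# Made-up accuracy curves, one value per checkpoint.
train_acc = [0.70, 0.85, 0.90, 0.95, 1.00]
val_acc = [0.68, 0.84, 0.90, 0.89, 0.87]

print(first_overfit_iteration(train_acc, val_acc))
```

Real curves are noisy, so a single dip isn’t proof of overfitting; in practice you’d look for a sustained gap before deciding to stop training earlier.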
Run these commands to load the test dataset and get the class labels:
test_data = tc.image_analysis.load_images("snacks/test", with_path=True)
test_data["label"] = test_data["path"].apply(lambda path: os.path.basename(os.path.split(path)[0]))
len(test_data)
metrics = model.evaluate(test_data)
print("Accuracy: ", metrics["accuracy"])
print("Precision: ", metrics["precision"])
print("Recall: ", metrics["recall"])
print("Confusion Matrix:\n", metrics["confusion_matrix"])
Accuracy: 0.8697478991596639
Precision: 0.8753552272362406
Recall: 0.8695450680272108
Confusion Matrix:
+--------------+-----------------+-------+
| target_label | predicted_label | count |
+--------------+-----------------+-------+
| ice cream    | candy           | 1     |
| apple        | banana          | 3     |
| orange       | pineapple       | 2     |
| apple        | strawberry      | 1     |
| pineapple    | banana          | 1     |
| strawberry   | salad           | 2     |
| popcorn      | waffle          | 1     |
| carrot       | salad           | 2     |
| orange       | watermelon      | 1     |
| popcorn      | popcorn         | 36    |
+--------------+-----------------+-------+
[107 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
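The confusion matrix holds enough information to recompute the other metrics by hand, which is a good way to understand what they mean. The sketch below works on a tiny made-up matrix with two classes, in the same (target_label, predicted_label, count) layout as above. It computes precision and recall per class, whereas Turi Create reports a single figure for each, presumably an average over all classes.

```python
# Hypothetical confusion-matrix rows: (target_label, predicted_label, count).
rows = [
    ("apple", "apple", 8),
    ("apple", "banana", 2),
    ("banana", "banana", 9),
    ("banana", "apple", 1),
]


def accuracy(rows):
    """Fraction of all images whose predicted label matches the target."""
    correct = sum(n for target, predicted, n in rows if target == predicted)
    total = sum(n for _, _, n in rows)
    return correct / total


def recall(rows, label):
    """Of all images that truly are `label`, the fraction predicted as such."""
    hits = sum(n for t, p, n in rows if t == label and p == label)
    actual = sum(n for t, _, n in rows if t == label)
    return hits / actual


def precision(rows, label):
    """Of all images predicted as `label`, the fraction that truly are."""
    hits = sum(n for t, p, n in rows if t == label and p == label)
    predicted = sum(n for _, p, n in rows if p == label)
    return hits / predicted


print("Accuracy:", accuracy(rows))
print("Recall (apple):", recall(rows, "apple"))
print("Precision (apple):", precision(rows, "apple"))
```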
Exporting to Core ML
In the next cell, Shift-Enter this command:
Class                         : ImageClassifier

Schema
------
Number of classes             : 20
Number of feature columns     : 1
Input image shape             : (3, 299, 299)

Training summary
----------------
Number of examples            : 4590
Training loss                 : 1.2978
Training time (sec)           : 174.5081
Shutting down Jupyter
To shut down Jupyter, click the Logout button in this browser window and also in the window showing your ML directory.
Deactivating the active environment
If you activated turienv at the terminal command line, enter this command to deactivate it:
conda deactivate
Docker and Colab
There are two other high-level tools for supporting machine learning in Python: Docker and Google Colaboratory. These can be useful for developing machine learning projects, but we’re not covering them in detail in this book.
Docker is like a virtual machine but simpler. It’s a container-based system that lets you package environments into reusable, modular images, and it’s a fundamental building block for scaling services and applications on the Internet efficiently. Installing Docker gives you access to a large number of ML resources distributed in Docker images, either as Jupyter notebooks like hwchong/kerastraining4coreml or as Python projects like the bamos/openface face recognition model. Our Beginning Machine Learning with Keras & Core ML (bit.ly/36cS6KU) tutorial builds and runs a keras-mnist Docker image, and you can get comfortable using Docker with our Docker on macOS: Getting Started tutorial here: bit.ly/2os0KnY.
Google Research’s Colaboratory at colab.research.google.com is a Jupyter Notebook environment that runs in a browser. It comes with many of the machine learning libraries you’ll need already installed. Its best feature is that you can set the runtime type of a notebook to GPU to use Google’s GPU for free. It even lets you use Google’s TPUs (tensor processing units).
from google.colab import drive
drive.mount('/content/drive/')
!ls "/content/drive/My Drive/machine-learning/snacks"
- Get familiar with Python. Its widespread adoption among academics in the machine learning field means that if you want to keep up to date with machine learning, you’ll have to get on board.
- Get familiar with Conda. It will make working with Python significantly more pleasant. It allows you to try Python libraries in a controlled environment without damaging any existing environment.
- Get familiar with Jupyter notebooks. Like Swift playgrounds, they provide a means to quickly test all things Python, especially when used in combination with Conda.
Where to go from here?
You’re all set to continue learning about machine learning for image classification using Python tools. The next chapter shows you a few more Turi Create tricks. After that, you’ll be ready to learn how to create your own deep learning model in Keras.