10. YOLO & Semantic Segmentation
Written by Matthijs Hollemans

You’ve seen how easy it was to add a bounding box predictor to the model: simply add a new output layer that predicts four numbers. But it was also pretty limited — this model only predicts the location for a single object. It doesn’t work so well when there are multiple objects of interest in the image.

You might think that you could just add more of these output layers, or perhaps predict 8 numbers for two bounding boxes, or 12 for three bounding boxes, etc. Good try, but unfortunately that doesn’t work so well in practice.

Each bounding box predictor will end up learning the same thing and, as a result, make the same predictions. Instead of finding the locations of multiple objects, such a model will predict the same bounding box multiple times. And chances are, these bounding boxes will not actually enclose any of the objects but all end up somewhere in the middle of the image as a compromise.

To make a proper object detector, you need to encourage the different bounding box predictors to learn different things.

An old-school approach to object detection is to divide up the input image into many smaller, partially overlapping regions of different sizes, and then run a regular image classifier on each of these regions. This definitely works, but it gives a lot of duplicate detections. Even worse: It’s really slow. You need to run the classifier many, many, many times for each image.

A slightly smarter approach is to first try and figure out which parts of the image are potential regions of interest. This is the approach taken by the popular R-CNN family of models. The classifier is still run on multiple image regions, but now only on regions that are at least somewhat likely to have an object in them.

To predict which regions are potentially interesting, the “Faster R-CNN” model uses a Region Proposal Network, which sounds impressive but is really just a bunch of layers on top of the feature extractor — hey, what did you expect? Unfortunately, even though it has “Faster” in its name, this model is still on the slow side and not really suitable for mobile devices.

For speed freaks and mobile device users, the so-called single stage detectors are very appealing. As the name implies, these model types just run the classifier once on the input image and do all of the work in a single pass. Examples of single-stage object detectors are YOLO (You Only Look Once), SSD (Single Shot MultiBox Detector) and DetectNet.

Turi Create lets you train a YOLO model with just a few lines of code, so that’s what you’ll do next.

Single stage detectors

The simplest form of a single stage detector, and the one you’ll be training, looks like this:

Again, there’s a feature extractor plus a few layers on top. The YOLO feature extractor is called Darknet, and it’s not so different from the feature extractors you’ve seen before: Darknet consists of convolution layers, followed by batch normalization and the ReLU activation function, with pooling layers in between.

Note: The activation function used by Darknet is actually a variation of ReLU, known as leaky ReLU. Where a regular ReLU completely removes any values that are less than zero, the leaky version makes negative values a lot smaller but still lets them “leak through.”
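To make the difference concrete, here is a small Python sketch of both activation functions. The slope of 0.1 for negative inputs is an assumption (a value commonly used for leaky ReLU in Darknet-style networks), not something this chapter specifies.

```python
def relu(x):
    """Regular ReLU: any value below zero is replaced by zero."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.1):
    """Leaky ReLU: negative values are multiplied by a small slope (alpha),
    so they still "leak through" instead of being removed."""
    return x if x > 0 else alpha * x


for value in (-2.0, -0.5, 0.0, 1.5):
    print(value, relu(value), leaky_relu(value))
```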

The extra layers are all convolutional. Unlike before, where the output of the model was either a vector containing a probability distribution or the coordinates for the bounding box, the output of YOLO is a three-dimensional tensor of size 13 × 13 × 375 that we’ll refer to as the grid.

YOLO takes a 416×416 pixel image as input. That’s larger than what you typically use for classification, so that small details don’t get lost. There are five pooling layers in Darknet that each halve the spatial dimensions of the image, for a total reduction factor of 32. Since 416/32 = 13, the final grid is 13×13 cells.
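The arithmetic above can be checked with a few lines of Python. The `cell_for_pixel` helper is hypothetical; it just shows which 32×32 cell a given pixel of the input image falls into.

```python
input_size = 416          # YOLO's input resolution (pixels per side)
num_pooling_layers = 5    # each pooling layer halves width and height

reduction = 2 ** num_pooling_layers    # total downsampling factor: 32
grid_size = input_size // reduction    # cells per side of the output grid: 13


def cell_for_pixel(x, y, cell_size=reduction):
    """Return (column, row) of the grid cell responsible for pixel (x, y)."""
    return x // cell_size, y // cell_size


print(reduction, grid_size)       # 32 13
print(cell_for_pixel(200, 100))   # (6, 3)
```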

Looking at this the other way around, each of the cells in this grid refers to a 32×32 block of pixels in the original image. Each cell is therefore responsible for detecting objects in or around that particular 32×32 region of the input image.

Each cell in the grid is responsible for its own region in the original image

YOLO, therefore, has 13×13 = 169 different bounding box predictors, and each of these is assigned to look only at a specific location in the image. Actually, this isn’t entirely true: Each grid cell has not just one but 15 different predictors, for a total of 169×15 = 2,535 bounding box predictors across the entire image. That’s quite an upgrade over the simple model you made previously!

Having multiple predictors per grid cell means you can let bounding box predictors specialize in different shapes and sizes of objects. Each cell will have a predictor that looks for small objects, a different predictor that looks for large objects, one that looks for wide but flat objects, one that looks for narrow but tall objects, and so on.

This is where the number 375 comes from, the depth dimension of the output grid: Each grid cell has 15 predictors that each output 25 numbers. Why 25? This is made up of the probability distribution over our snack classes, so that’s 20 numbers. It also includes four numbers for the bounding box coordinates. Finally, YOLO also predicts a confidence score for the bounding box: how likely it thinks this bounding box actually contains an object. So there are two confidences being predicted here: one for the class, and one for the bounding box.
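Here is the same bookkeeping in Python. The `split_cell` helper is a hypothetical illustration: the order in which each predictor’s 25 numbers are stored is assumed here, not taken from Turi Create.

```python
GRID_SIZE = 13
PREDICTORS_PER_CELL = 15
NUM_CLASSES = 20
# 4 box coordinates + 1 box confidence + 1 probability per snack class
VALUES_PER_PREDICTOR = 4 + 1 + NUM_CLASSES

depth = PREDICTORS_PER_CELL * VALUES_PER_PREDICTOR         # 375
total_boxes = GRID_SIZE * GRID_SIZE * PREDICTORS_PER_CELL  # 2,535


def split_cell(cell_values):
    """Split the 375 numbers of one grid cell into 15 predictor dicts.

    Hypothetical layout: each predictor's 25 numbers are assumed to be stored
    contiguously as [x, y, w, h, box_confidence, class_0, ..., class_19].
    """
    if len(cell_values) != depth:
        raise ValueError(f"expected {depth} values, got {len(cell_values)}")
    predictors = []
    for p in range(PREDICTORS_PER_CELL):
        start = p * VALUES_PER_PREDICTOR
        chunk = cell_values[start:start + VALUES_PER_PREDICTOR]
        predictors.append({
            "box": chunk[0:4],
            "box_confidence": chunk[4],
            "class_scores": chunk[5:],
        })
    return predictors


print(VALUES_PER_PREDICTOR, depth, total_boxes)  # 25 375 2535
```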

Because the output of YOLO is a 13×13×375 tensor, it’s important to realize it always predicts 2,535 bounding boxes for every image you give it. Even if the image doesn’t contain any recognizable objects at all, YOLO still outputs 2,535 bounding boxes — whether you want them or not.

That’s why the confidence score is important: It tells you which boxes you can ignore. In an image with no or just a few objects, the vast majority of predicted boxes will have low confidence scores. So at least YOLO is kind enough to tell you which of these 2,535 predictions are rubbish.
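As a minimal sketch, assuming each prediction is a dictionary with a `confidence` key (a hypothetical layout), filtering by confidence is a one-liner. The 0.25 default is the example threshold used in this chapter.

```python
def keep_confident(predictions, threshold=0.25):
    """Drop predictions whose box confidence is below `threshold`."""
    return [p for p in predictions if p["confidence"] >= threshold]


predictions = [
    {"label": "apple", "confidence": 0.91},
    {"label": "apple", "confidence": 0.12},
    {"label": "juice", "confidence": 0.40},
]
print(keep_confident(predictions))
```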

Even after you filter out all the boxes with low confidence scores — for example, anything with a score less than 0.25 — you’ll still end up with too many predictions. This kind of situation is typical:

I’m only counting one dog and cat, not three!

These are all bounding boxes that the model feels good about since they have high scores, but as a consumer of an object detection model, you really want to have only a single bounding box for each object in the image. This sort of thing happens because nearby cells may all make a prediction for the same object — especially when the object is larger than 32×32 pixels.

To filter out these overlapping predictions, a post-processing technique called non-maximum suppression, or NMS, is used to remove such duplicates. The NMS algorithm keeps the predictions with the highest confidence scores and removes any other boxes that overlap the ones with higher scores by more than a certain threshold, say an IOU (intersection over union) of 45% or more. The model created by Turi Create automatically takes care of this post-processing step for you, so you don’t have to worry about any of this.
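The following sketch shows greedy NMS with an IOU test, using the 45% threshold mentioned above. It is a generic, class-agnostic version written for illustration, not Turi Create’s own implementation (which may, for example, run NMS separately for each class).

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it overlaps every already-kept box by less than iou_threshold.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) < iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))   # [0, 2]
```

The second box overlaps the first with an IOU of about 0.68, so it is suppressed; the third box doesn’t overlap anything and survives.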

Note: Turi’s object detection model is known as TinyYOLO because it’s smaller than the full YOLO. The full version of YOLO has multiple output grids of varying dimensions in order to handle different object sizes better, but this model is also larger and slower. Another popular single-stage detector is SSD. Architecturally, YOLO and SSD are very similar in design and differ only in the details. SSD does not have its own feature extractor and can be used with many different convnets. Particularly suitable for use on mobile is the combination of SSD and MobileNet.

Hello Turi, my old friend

Switch to the turienv Python environment and create a new Jupyter notebook. You can find the environment in the starter project of this chapter’s materials. Refer back to Chapter 4: Getting Started with Python & Turi Create if you don’t remember how to activate environments.

import os, sys, math
import pandas as pd
import turicreate as tc

Turi gets its training data from an SFrame object. You will need to add the ground-truth bounding boxes in a new column named annotations. Unlike with Keras, where each row from the Pandas DataFrame was a separate annotation, in the Turi SFrame there is only one row per image. The annotations column will have all the ground-truth boxes for that image.

[ {'coordinates': {'height': 129, 'width': 151, 'x': 75, 'y': 186},
   'label': 'juice'},
  {'coordinates': {'height': 130, 'width': 170, 'x': 228, 'y': 191},
   'label': 'juice'},
  {'coordinates': {'height': 129, 'width': 153, 'x': 76, 'y': 191},
   'label': 'juice'} ],

There is a separate dictionary for each annotation. It has two keys: coordinates, which is itself a dictionary that holds the bounding box coordinates, and label, which is the class name of the object inside the bounding box. The above annotations are for a single image that has three bounding boxes.

def load_images_with_annotations(images_dir, annotations_file):
    # Load the images into a Turi SFrame.
    data = tc.image_analysis.load_images(images_dir, with_path=True)

    # Load the annotations CSV file into a Pandas dataframe.
    csv = pd.read_csv(annotations_file)

First, you create a new SFrame by loading all the images from the specified folder. This is the same as what you did back in Chapter 4, “Getting Started with Python & Turi Create.” The new SFrame contains two columns: image with the image object and path with the image’s folder and filename.

The second line loads the CSV file into a Pandas DataFrame like you did in the previous chapter. Now, you will combine these two sources of data into a single SFrame that Turi can use for training. The function continues:

    all_annotations = []
    for i, item in enumerate(data):
        # Grab image info from the SFrame.
        img_path = item["path"]
        img_width = item["image"].width
        img_height = item["image"].height

        # Find the corresponding row(s) in the CSV's dataframe.
        image_id = os.path.basename(img_path)[:-4]
        rows = csv[csv["image_id"] == image_id]

The for loop looks at all images in the SFrame and then tries to find the corresponding annotations from the CSV’s DataFrame. The match is performed on the image_id field.

Each item in the SFrame includes the full path of its image, so you can use os.path.basename() to get the name of the file from the full path, and then use Python’s special [:-4] indexing syntax to strip off the last four characters, which say .jpg.

Then csv["image_id"] == image_id finds all the rows in the Pandas DataFrame that match this ID. But that doesn’t give you what you’re looking for just yet. It returns a new Pandas object with the same number of rows as csv, but with every row having the value True or False, depending on whether or not the row matches the predicate.

To get just the rows with the specified image ID, you need to filter csv again based on this True/False list by writing csv[csv["image_id"] == image_id]. The variable rows is now a small new DataFrame with the actual annotations for just this image.

        img_annotations = []
        for row in rows.itertuples():
            xmin = int(round(row[2] * img_width))
            xmax = int(round(row[3] * img_width))
            ymin = int(round(row[4] * img_height))
            ymax = int(round(row[5] * img_height))

            # Convert to center coordinate and width/height:
            width = xmax - xmin
            height = ymax - ymin
            x = xmin + math.floor(width / 2)
            y = ymin + math.floor(height / 2)
            class_name = row[6]

            img_annotations.append({"coordinates":
                  {"height": height, "width": width, "x": x, "y": y},
                  "label": class_name})

This looks like a lot of code, but it simply reads the bounding box coordinates from the rows and converts them into the format that Turi expects. Recall that the CSV file stores the coordinates as normalized numbers between 0 and 1, but Turi wants them in pixel space, so you need to multiply them by the image width and height. Also, Turi describes the bounding boxes using a center coordinate and a width and height. A bit of math is needed to convert the bounding boxes from one format to the other.
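To see the conversion on concrete numbers, here is the same math pulled out into a standalone function, plus a hypothetical inverse (`from_turi_coordinates`) that is handy for sanity-checking a converted box. The image size and box in the example are made up.

```python
import math


def to_turi_coordinates(x_min, x_max, y_min, y_max, img_width, img_height):
    """Normalized corner coordinates (0 to 1) -> Turi-style center/size dict in pixels.

    Mirrors the conversion inside load_images_with_annotations().
    """
    xmin = int(round(x_min * img_width))
    xmax = int(round(x_max * img_width))
    ymin = int(round(y_min * img_height))
    ymax = int(round(y_max * img_height))
    width = xmax - xmin
    height = ymax - ymin
    return {"height": height, "width": width,
            "x": xmin + math.floor(width / 2),
            "y": ymin + math.floor(height / 2)}


def from_turi_coordinates(coords):
    """Hypothetical inverse: Turi-style dict -> pixel corners (xmin, xmax, ymin, ymax)."""
    xmin = coords["x"] - math.floor(coords["width"] / 2)
    ymin = coords["y"] - math.floor(coords["height"] / 2)
    return xmin, xmin + coords["width"], ymin, ymin + coords["height"]


box = to_turi_coordinates(0.1, 0.5, 0.2, 0.7, img_width=500, img_height=400)
print(box)                          # {'height': 200, 'width': 200, 'x': 150, 'y': 180}
print(from_turi_coordinates(box))   # (50, 250, 80, 280)
```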

Once all the annotations for the current image have been converted and added to the img_annotations list, you append it to the list of all annotations:

        if len(img_annotations) > 0:
            all_annotations.append(img_annotations)
        else:
            all_annotations.append(None)

If there were no annotations, you still need to append something to all_annotations, so that this list has exactly the same number of rows as the SFrame. In this case, you append None, which is Python’s version of Swift’s nil.

Finally, once you’ve looped through all images, the all_annotations list contains all their ground-truth bounding boxes in Turi format. You put this into an SArray object and assign it to a new column in the SFrame named "annotations":

    data["annotations"] = tc.SArray(data=all_annotations, dtype=list)
    return data.dropna()

There’s one more thing to do. Recall that not all images will have annotations. For such images, the annotations field in the SFrame will be None. You don’t want to include these images during training.

The easiest way to remove these images from the SFrame is to call data.dropna(). This filters out all rows with missing values.

And that’s it for load_images_with_annotations().

data_dir = "snacks"
train_dir = os.path.join(data_dir, "train")

train_data = load_images_with_annotations(train_dir,
                    data_dir + "/annotations-train.csv")

It might take a short while to load all the images. When it’s done, len(train_data) should print the number of training images that have annotations.

train_data.head() should show the following:

The SFrame now contains the annotations dictionaries

To view the annotations for a specific training image in more detail, you could print a row such as train_data[0], but Turi’s built-in visualization tool is even better:

util = tc.object_detector.util
train_data["image_with_ground_truth"] = util.draw_bounding_boxes(
  train_data["image"],
  train_data["annotations"])
train_data.explore()

This adds another column to the SFrame named image_with_ground_truth that does exactly what it says: It contains the images with the ground-truth bounding boxes drawn on top.

Viewing the ground-truth boxes on the training images

Training the model

It just takes a single line of code and a whole lot of patience:

model = tc.object_detector.create(train_data, feature="image",
                                  annotations="annotations")

Setting 'batch_size' to 32
Using GPU to create model (GeForce GTX 1080 Ti)
Setting 'max_iterations' to 13000
+--------------+--------------+--------------+
| Iteration    | Loss         | Elapsed Time |
+--------------+--------------+--------------+
| 1            | 11.276       | 12.7         |
| 36           | 10.892       | 22.8         |
| 71           | 10.506       | 32.8         |
| 107          | 10.517       | 43.1         |
...
| 12999        | 2.106        | 3755.3       |
+--------------+--------------+--------------+

Needless to say, having a GPU for training this model is a must. The author trained on Linux with an NVIDIA GPU, but Turi Create can also use your Mac’s AMD GPU if you have a recent Mac running macOS Mojave. Even on the powerful 1080 Ti, it still took over an hour to train this model. Training on the CPU takes ages: at about eight seconds per iteration, doing 13,000 iterations would take about 29 hours.
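The timings can be sanity-checked with quick arithmetic. The GPU figures come from the training log above; the roughly eight seconds per CPU iteration is a rough assumption, not a measured value.

```python
GPU_ELAPSED_SECONDS = 3755.3   # "Elapsed Time" in the last row of the training log
ITERATIONS = 13000             # max_iterations chosen by Turi Create

gpu_seconds_per_iteration = GPU_ELAPSED_SECONDS / ITERATIONS
gpu_hours = GPU_ELAPSED_SECONDS / 3600

# Assumption: a CPU needs roughly 8 seconds per iteration (rough estimate).
CPU_SECONDS_PER_ITERATION = 8.0
cpu_hours = ITERATIONS * CPU_SECONDS_PER_ITERATION / 3600

print(f"GPU: {gpu_seconds_per_iteration:.2f} s/iteration, {gpu_hours:.2f} hours total")
print(f"CPU: about {cpu_hours:.0f} hours total")
# GPU: 0.29 s/iteration, 1.04 hours total
# CPU: about 29 hours total
```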

model.save("SnackDetector.model")

model.export_coreml("SnackDetector.mlmodel")

How good is it?

In case you don’t have the hardware or the time to train this model yourself, we’ve included the trained model in the downloads as a .zip file in the final folder, SnackDetector.model.zip. Unzip this model to your working directory and then load it into the notebook:

model = tc.load_model("SnackDetector.model")

test_dir = os.path.join(data_dir, "test")
test_data = load_images_with_annotations(test_dir,
                   data_dir + "/annotations-test.csv")

Then call model.evaluate() on the test_data SFrame:

scores = model.evaluate(test_data)

This predicts the bounding boxes for every image in the test set and then compares those predictions against the ground-truths. It may take a few minutes if you’re running this on a CPU. When it’s done, scores looks something like this:

{'average_precision_50': {
  'apple': 0.52788541232511876,
  'banana': 0.41939129680862453,
  'cake': 0.38973319479991153,
  'candy': 0.36857447872282678,
  ...
  'watermelon': 0.37970409310715819},
 'mean_average_precision_50': 0.38825907147323535}

Note: The Pascal VOC dataset (host.robots.ox.ac.uk/pascal/VOC) is one of the standard datasets that people use to benchmark object detectors. At the time of writing, a variant of Faster R-CNN was the runner-up on the Pascal VOC leaderboard, while YOLO v2 and SSD scored lower. Another popular object detection dataset is COCO: cocodataset.org/#detection-leaderboard.

For more insight into how well your model is doing, take a look at the actual predictions for the test images, using model.predict():

test_data["predictions"] = model.predict(test_data)

This adds a new column to the test_data SFrame. The data in this column looks very similar to the annotations column from train_data, but in addition to the predicted coordinates and class label, there is now also the confidence score:

[{'confidence': 0.7225357099539148,
  'coordinates': {'height': 73.92794444010806,
                  'width': 90.45315889211807,
                  'x': 262.2198759929745,
                  'y': 155.496952970812},
  'label': 'dog',
  'type': 'rectangle'},
 ...]

test_data["image_with_predictions"] = tc.object_detector.util.draw_bounding_boxes(
    test_data["image"], test_data["predictions"])
test_data.explore()

Viewing the predicted bounding boxes

That’s not half bad! The YOLO model does a pretty good job at finding, and properly classifying, the objects in the test images. It doesn’t always find all objects, and sometimes its predictions are plain wrong, but overall this is a very good result. By the way, if Turi didn’t find any objects in a test image, the predictions column contains an empty list [].

The demo app

This is a book about machine learning on iOS, and it’s been a while since we’ve seen the inside of Xcode, so let’s put the trained YOLO model into an app. The book downloads contain a demo app named ObjectDetection.

The YOLO model in Core ML

Also, note that the SnackDetector model has three inputs and two outputs. In addition to the regular image input for a 416×416 pixel image, there are two new inputs named iouThreshold and confidenceThreshold. These values are used by NMS to decide which bounding boxes it should keep.

Even though you saw earlier that YOLO produces a single tensor of size 13×13×375, the Core ML model actually has two outputs. That’s because the Core ML pipeline applies NMS to the predictions from YOLO and only outputs the best bounding boxes. That’s also why the first dimension of these outputs is 0, meaning unknown: NMS returns a different number of boxes depending on how many objects are in the image. For convenience, Core ML provides the class predictions and coordinates as separate outputs.

Most of the source code in ViewController.swift is exactly the same as in the previous example apps. You have the VNCoreMLModel and VNCoreMLRequest objects, the same as before. You still perform the request using the VNImageRequestHandler. The only thing that’s different is the result objects returned by Vision.

Previously the result objects were of type VNClassificationObservation, but now they are VNRecognizedObjectObservation objects. This is a new class that was added to Vision with iOS 12, and it exists specifically to handle the results from object detection models such as Turi Create’s YOLO model. All your app needs to do is handle these VNRecognizedObjectObservation instances. In the demo app, we draw a rectangle around any detected objects.

The fun stuff happens in processObservations(for:error:), which is called from the completion handler for the Vision request. This function receives an array of zero or more VNRecognizedObjectObservation instances. If the array is empty, no objects were found. In that case, the app removes any previous rectangles from the screen.

The logic for interpreting the Vision results lives inside the show(predictions:) method. This simply loops through the VNRecognizedObjectObservation instances, converts the predicted coordinates to screen coordinates, and draws rectangles for the detected objects using the BoundingBoxView class.

The VNRecognizedObjectObservation class has a labels property containing a list of regular VNClassificationObservation instances, sorted by confidence with the most likely class first, telling you the most likely classes for the object inside the bounding box. The app simply takes the first VNClassificationObservation from the list, as that is the best prediction, and puts its identifier and confidence into the rectangle’s label.

There is also a boundingBox property, a CGRect object that tells you where in the image the object is located. This uses normalized coordinates again, but with the origin of the CGRect in the lower-left corner. That’s a little awkward, but it’s just how Vision does things. In order to draw a rectangle around the object, you need to transform the normalized coordinates to screen coordinates and also flip the y-axis.
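The coordinate transform can be sketched in Python (the app itself does this in Swift with CGRect). This sketch assumes the image fills the view exactly; with aspect-fit scaling you would also have to account for the letterboxing offsets. The `Rect` type and `vision_to_screen` function are hypothetical.

```python
from collections import namedtuple

Rect = namedtuple("Rect", ["x", "y", "width", "height"])


def vision_to_screen(bbox, view_width, view_height):
    """Convert a normalized rectangle with a lower-left origin (like Vision's
    boundingBox) into pixel coordinates with a top-left origin (like UIKit).

    Assumes the image is stretched to fill the view exactly.
    """
    width = bbox.width * view_width
    height = bbox.height * view_height
    x = bbox.x * view_width
    # Flip the y-axis: bbox.y measures up from the bottom edge to the
    # rectangle's bottom, so its top edge sits at 1 - y - height.
    y = (1.0 - bbox.y - bbox.height) * view_height
    return Rect(x, y, width, height)


print(vision_to_screen(Rect(0.25, 0.5, 0.5, 0.25), view_width=400, view_height=800))
# Rect(x=100.0, y=200.0, width=200.0, height=200.0)
```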

The YOLO model in action on the iPhone

One limitation of Vision was that it could only provide a value for the model’s image input, but as of iOS 13, you can also pass in values for the other inputs to override the defaults from the model. You do this on the VNCoreMLModel object:

if #available(iOS 13.0, *) {
  visionModel.inputImageFeatureName = "image"
  visionModel.featureProvider = try MLDictionaryFeatureProvider(
      dictionary: [
    "iouThreshold": MLFeatureValue(double: 0.45),
    "confidenceThreshold": MLFeatureValue(double: 0.25),
  ])
}

With a bit of effort, it’s also possible to make YOLO work on iOS 11. In that case, Vision does not give you the convenient VNRecognizedObjectObservation instances but a VNCoreMLFeatureValueObservation with the contents of the 13×13×375 grid. You’ll have to decode these numbers into actual bounding boxes and perform NMS yourself to find the best boxes. The YOLO models made with Turi Create aren’t compatible with iOS 11, but there are also Keras versions of YOLO available. For an example of how to run a YOLO model on iOS 11, see github.com/hollance/YOLO-CoreML-MPSNNGraph.
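For a sense of what decoding the raw grid involves, here is a sketch of the standard YOLOv2 decoding formulas for a single predictor: a sigmoid for the center offset and box confidence, an exponential scaling of an anchor box for the size, and a softmax over the class scores. The layout of the 25 numbers and the anchor sizes are assumptions; check the actual model you convert before relying on them. After decoding all 2,535 boxes, you would still filter by score and run NMS.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(values):
    largest = max(values)                      # subtract max for numerical stability
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def decode_predictor(raw, col, row, anchor_w, anchor_h, cell_size=32):
    """Decode one predictor's 25 raw outputs into a pixel-space box (YOLOv2-style).

    Assumptions: raw = [tx, ty, tw, th, to, class logits...], and the anchor
    size is given in grid-cell units.
    """
    tx, ty, tw, th, to = raw[:5]
    class_probs = softmax(raw[5:])
    best = max(range(len(class_probs)), key=lambda c: class_probs[c])
    return {
        "x": (col + sigmoid(tx)) * cell_size,          # box center, in pixels
        "y": (row + sigmoid(ty)) * cell_size,
        "width": anchor_w * math.exp(tw) * cell_size,
        "height": anchor_h * math.exp(th) * cell_size,
        "class_index": best,
        # box confidence times class probability, as in YOLOv2
        "score": sigmoid(to) * class_probs[best],
    }


print(decode_predictor([0.0] * 25, col=6, row=3, anchor_w=2.0, anchor_h=1.0))
```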

Note: The architecture used by Turi Create, “Tiny YOLO v2”, is already a few years old. You can download more recent YOLOv3 and YOLOv3-Tiny models from developer.apple.com/machine-learning/models/. These are trained on the 80 classes from the COCO dataset. To make a version of YOLOv3 that uses your own classes, you’ll need to train it yourself, for example using one of the many open source implementations.

Note: The latest version of Turi Create can also do one-shot object detection. The term “one-shot” usually refers to training with only a single example image for each class, instead of a large set of training images. That’s great for many real-life scenarios where you don’t always have hundreds of training images. Turi does this using synthetic data augmentation, where it overlays the training image on a selection of real-world images. Another advantage of this approach is that you don’t need to provide any bounding box annotations. See the WWDC 2019 session “Drawing Classification and One-Shot Object Detection in Turi Create” for more details: developer.apple.com/videos/play/wwdc2019/420/.

Semantic segmentation

You’ve seen how to do classification of the image as a whole, as well as classification of the contents of bounding boxes, but it’s also possible to make a separate classification prediction for each individual pixel in the image. This is called semantic segmentation. Here’s what it looks like:

DeepLab on top of MobileNetV2

To avoid this, the version of MobileNet used by DeepLab has an output stride of 8 instead of 32. This means that, instead of five times, it only halves the spatial dimensions three times, shrinking the input by a factor of 8 instead of 32. For the 513×513 input, this makes the output of the feature extractor 65×65 pixels. The segmentation layers that follow the feature extractor then do their work on this 65×65-pixel tensor, which still has plenty of detail.
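The size arithmetic can be checked with a short sketch. It assumes each downsampling step uses stride 2 with “same” padding, which rounds up at every step, as TensorFlow does.

```python
import math


def downsampled_size(input_size, num_halvings):
    """Spatial size after repeated stride-2 steps with 'same' padding,
    which rounds up (ceil) at each step."""
    size = input_size
    for _ in range(num_halvings):
        size = math.ceil(size / 2)
    return size


print(downsampled_size(513, 3))   # output stride 8  -> 65
print(downsampled_size(513, 5))   # output stride 32 -> 17
print(downsampled_size(416, 5))   # YOLO's 416 input -> 13
```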

Converting the model

You’re going to be using a pre-trained version of DeepLab that is made freely available as part of the TensorFlow Models repository, at: github.com/tensorflow/models/tree/master/research/deeplab.

Part of the frozen inference graph in Netron

So how come these models know how to perform the segmentation task, as opposed to classification or object detection or something else? The reason is the training data. During training, the output of the network is compared to the ground-truth segmentation masks from the training data, using a suitable loss function. The training process causes the loss to become lower and lower, and the longer you train, the more the output of the network starts to resemble the ground-truth masks.
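The chapter doesn’t name the loss function. A common choice for semantic segmentation, and the one assumed in this sketch, is cross-entropy computed for every pixel and then averaged. The nested-list layout is a hypothetical stand-in for a real tensor.

```python
import math


def pixelwise_cross_entropy(probs, mask):
    """Average cross-entropy over all pixels of one image.

    probs[c][y][x]: predicted probability of class c at pixel (x, y)
    mask[y][x]:     ground-truth class index at pixel (x, y)
    """
    height, width = len(mask), len(mask[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            true_class = mask[y][x]
            total += -math.log(probs[true_class][y][x])
    return total / (height * width)


# A 1x2 image with 2 classes: pixel 0 is background (0), pixel 1 is class 1.
probs = [[[0.9, 0.2]],    # class 0 probabilities
         [[0.1, 0.8]]]    # class 1 probabilities
mask = [[0, 1]]
print(pixelwise_cross_entropy(probs, mask))
```

The loss shrinks as the predicted probability of each pixel’s true class approaches 1, which is what pushes the network’s output to resemble the ground-truth masks.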

Usually coremltools is the go-to package for converting models to Core ML. You used it to convert your Keras model in the previous chapter. The bad news is that coremltools does not directly support TensorFlow graphs.

The good news is that Apple and Google have collaborated to build tfcoreml, a TensorFlow to Core ML converter. This is an additional Python package that you need to install alongside coremltools.

To make sure you have compatible versions of these packages, you can use the kerasenv environment, since that already has coremltools and TensorFlow installed. If you don’t already have it installed, you can find kerasenv.yaml in the starter folder. Use conda env create -f kerasenv.yaml to create the environment, and then activate it.

But this environment doesn’t have tfcoreml yet, so use pip to install the latest version:

$ pip install -U tfcoreml

You may also need to install the very latest version of coremltools. If using the one from pip gives errors, here’s a handy trick for installing the very latest version straight from GitHub:

$ pip install -U git+https://github.com/apple/coremltools.git

import tfcoreml as tf_converter

input_path = "deeplabv3_mnv2_pascal_trainval/frozen_inference_graph.pb"
output_path = "DeepLab.mlmodel"
input_tensor = "ImageTensor:0"
input_name = "ImageTensor__0"
output_tensor = "ResizeBilinear_3:0"

tf_converter.convert(tf_model_path=input_path,
         mlmodel_path=output_path,
         output_feature_names=[output_tensor],
         input_name_shape_dict={input_tensor : [1, 513, 513, 3]},
         image_input_names=input_name)

That’s all you need to convert the model. Pretty simple, right? The trick is getting all the arguments to tf_converter.convert() correct.

Most importantly, you need to tell tfcoreml what the model’s input and output tensors are. For DeepLab, these are "ImageTensor:0" and "ResizeBilinear_3:0", respectively. You can find these tensors in the graph using Netron.

TensorFlow operations can have multiple outputs, and you need to specify which one you want to use. The :0 in the name tells TensorFlow that you want to use the tensor from the operation’s first output.

Think of it as special syntax for indexing an array at element 0. It’s only a small detail, but if you forget the :0 after the name, tf-coreml won’t be able to find the tensor in the graph.

The input_name_shape_dict argument tells tf-coreml what the size of the input image will be: 513×513 pixels. The full shape [1, 513, 513, 3] also specifies a batch of one image with three color channels.

Just like with the Keras conversion you’ve done before, image_input_names is used to tell Core ML that it should treat the input as a proper image instead of an array of numbers. Note that tf-coreml changes :0 to __0 in the names of the model’s inputs and outputs, so "ImageTensor:0" is now "ImageTensor__0".
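These naming rules are easy to express in code. The two helpers below are hypothetical and only mirror the conventions described in this section; they are not part of tf-coreml.

```python
def split_tensor_name(name):
    """Split a TensorFlow tensor name like "ImageTensor:0" into
    (operation name, output index)."""
    op_name, _, index = name.rpartition(":")
    return op_name, int(index)


def coreml_feature_name(name):
    """Apply the renaming described above: ':' becomes '__'."""
    return name.replace(":", "__")


print(split_tensor_name("ResizeBilinear_3:0"))   # ('ResizeBilinear_3', 0)
print(coreml_feature_name("ImageTensor:0"))      # ImageTensor__0
```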

$ python3 convert_deeplab.py

The summary page for DeepLab.mlmodel

DeepLab’s output is a multi-array of size 21×513×513. Multi-array is the term that Core ML uses for tensor. When you need to work with tensor objects in Core ML, it’s always through the MLMultiArray class. You’ll get a taste of that shortly.

The demo app

The downloads for this chapter include an app called Segmentation. The code is very similar to that of the HealthySnacks app from a few chapters ago, except now there are two pairs of camera/photo library buttons, allowing you to select a background image and a foreground image.

The author on a well-deserved fake holiday (left) and the corresponding segmentation mask (right)

required init?(coder aDecoder: NSCoder) {
  let outputs = deepLab.model.modelDescription.outputDescriptionsByName
  guard let output = outputs["ResizeBilinear_3__0"],
        let constraint = output.multiArrayConstraint else {
    fatalError("Expected 'ResizeBilinear_3__0' output")
  }
  deepLabHeight = constraint.shape[1].intValue
  deepLabWidth = constraint.shape[2].intValue

  super.init(coder: aDecoder)
}

Here, deepLab is an instance of DeepLab, the class that Xcode automatically generates from the .mlmodel file. It has a model property that is an MLModel object. If you don’t want to use Vision, you can also use the MLModel instance to make predictions directly.

More importantly for our purposes, you can ask this object about the configuration of the model. Here you grab the modelDescription, which contains all the info you see in Xcode, and then from that the outputDescriptionsByName. This is a dictionary describing the model’s outputs.

This model has only one output, with the name "ResizeBilinear_3__0". Note that this used to be called "ResizeBilinear_3:0" in the TensorFlow graph, but tf-coreml renamed it. This output is of type multi-array, which means you get access to the entire 21×513×513 tensor of output values. By grabbing the multiArrayConstraint property, you can look at the size and data type of this array. Here, you care about the size, given by the shape property.

Because the Core ML API was designed to work with both Swift and Objective-C, using some of these things can be a little clumsy. For example, shape is an array of NSNumber objects, and so you need to use .intValue to turn these into integers. The shape array has three values in it, [channels, height, width], and you put the height and width into properties so you can use them later.
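To see how [channels, height, width] maps onto the flat memory behind a multi-array, here is a sketch that computes row-major strides for DeepLab’s output shape. It assumes the array is tightly packed; MLMultiArray exposes a strides property, and real code should read those values rather than compute them. The function names are hypothetical.

```python
def row_major_strides(shape):
    """Strides (in elements) of a tightly packed, row-major array."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def flat_index(c, y, x, strides):
    """Position of element [c, y, x] in the flat buffer."""
    return c * strides[0] + y * strides[1] + x * strides[2]


shape = [21, 513, 513]              # [channels, height, width]
channels, height, width = shape     # the Swift code reads shape[1] and shape[2]
strides = row_major_strides(shape)
print(strides)                      # [263169, 513, 1]
print(flat_index(1, 0, 0, strides)) # 263169
```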

After Vision successfully performs the request, the prediction results arrive as an array of VNCoreMLFeatureValueObservation objects. You get one of these objects for every output that is of type multi-array. The actual predictions are inside an MLMultiArray object. This is how you obtain that MLMultiArray from the Vision results:

func processObservations(for request: VNRequest, error: Error?) {
  if let results = request.results as? [VNCoreMLFeatureValueObservation],
     !results.isEmpty,
     let multiArray = results[0].featureValue.multiArrayValue {

    DispatchQueue.main.async {
      self.show(results: multiArray)
    }
  }
}

The show(results) method then uses this MLMultiArray to draw either the segmentation mask or a combined image in which the background has been replaced. It uses one of two helper methods to do the actual work: createMaskImage for the mask, or a second helper for the combined image. Both of these helper methods follow the same approach:

Allocate an array of type UInt8 that will hold the output pixels. The size of this array is deepLabWidth * deepLabHeight * 4 because it will be an RGBA image.

Loop through all the 513×513 pixels in the MLMultiArray that holds DeepLab's predictions.

For each pixel, find the index of the winning class. This is done by looking at the 21 values for that pixel, one per class, and picking the largest one (the argmax).

To combine the two images, the color for the output pixel is read from the foreground image if the winning class for that pixel is not 0, the special background class. Otherwise, it is read from the background image. It's easy to change this logic to only keep certain classes in the image, such as only cats and dogs.

In the other mode, where the app shows only the segmentation mask, you take the color of the output pixel from a lookup table, using the winning class as the index.

Finally, convert the pixel array into a UIImage.
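The steps above can be sketched without Core ML on a plain Swift array. This is a minimal, hypothetical version, not the demo app's code: it assumes the scores sit in a contiguous [classes, height, width] array of Doubles (real code should index through the MLMultiArray's strides instead), and OutputMode, makePixels, the palette, and the two images are illustrative names.

```swift
// How each output pixel is colored. Both cases are hypothetical stand-ins
// for what the demo app's helper methods do.
enum OutputMode {
  // One RGBA color per class index, e.g. 21 entries for Pascal VOC.
  case mask(palette: [[UInt8]])
  // Two RGBA images with the same width and height as the scores.
  case composite(foreground: [UInt8], background: [UInt8])
}

// Assumes `scores` is a contiguous [classes, height, width] array.
func makePixels(scores: [Double], classes: Int, height: Int, width: Int,
                mode: OutputMode) -> [UInt8] {
  // Allocate the RGBA output: 4 bytes per pixel.
  var pixels = [UInt8](repeating: 255, count: width * height * 4)
  let cStride = height * width, yStride = width

  // Loop through all the pixels.
  for y in 0..<height {
    for x in 0..<width {
      // Argmax: find the class with the largest score for this pixel.
      var largestValue = -Double.infinity
      var largestClass = 0
      for c in 0..<classes {
        let value = scores[c*cStride + y*yStride + x]
        if value > largestValue {
          largestValue = value
          largestClass = c
        }
      }

      let offset = (y*width + x) * 4
      switch mode {
      case .composite(let foreground, let background):
        // Class 0 is background: take that pixel from the background image.
        let source = largestClass != 0 ? foreground : background
        for i in 0..<4 { pixels[offset + i] = source[offset + i] }
      case .mask(let palette):
        // Look up the color of the winning class.
        let color = palette[largestClass]
        for i in 0..<4 { pixels[offset + i] = color[i] }
      }
    }
  }
  // Converting `pixels` into a UIImage is left out to keep this platform-neutral.
  return pixels
}

// Tiny usage example: a 1×2 image with 2 classes.
let scores: [Double] = [0.9, -1.0,   // class 0 (background)
                        0.1,  2.0]   // class 1
let mask = makePixels(scores: scores, classes: 2, height: 1, width: 2,
                      mode: .mask(palette: [[0, 0, 0, 255], [255, 0, 0, 255]]))
print(mask)  // [0, 0, 0, 255, 255, 0, 0, 255]
```

Keeping the argmax separate from the coloring rule makes it easy to swap in a different rule, such as keeping only a few chosen classes.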

You'll now look at how some of these steps work in detail. The MLMultiArray API is a little clumsy to work with. For most model types, such as classification and object detection, Vision takes care of these details for you, but if your model outputs a multi-array, you'll have to get your hands dirty.

In the following code, features is the variable that holds the MLMultiArray object.

let classes = features.shape[0].intValue
let height = features.shape[1].intValue
let width = features.shape[2].intValue
var pixels = [UInt8](repeating: 255, count: width * height * 4)

Just like the model's output description has a shape, so does the actual MLMultiArray. You use the width and height from this shape to allocate the pixels array.

To read a value from the MLMultiArray, you can write the following:

let value = features[[c, y, x] as [NSNumber]].doubleValue

Here, c is the channel number (0-20), y is the vertical coordinate (0-512), and x is the horizontal coordinate (also 0-512). Remember that the height dimension comes before the width!

Accessing the multi-array in this manner works fine, but it's very slow. Note that you're first creating an array [c, y, x] to hold the three indices. Because this is an Objective-C API, you need to pass this as an array of NSNumbers. MLMultiArray uses this NSNumber array as a subscript, reads the value from memory, and returns it as a new NSNumber object that you then have to convert back to a Double before you can actually use it. Doing all of that for each of the 21×513×513 values adds up.

A better approach is to use a pointer to directly access the MLMultiArray's memory. After all, it's just a big array of Double values. Swift developers aren't generally encouraged to use pointers, but here it's not a big deal:

let featurePointer = UnsafeMutablePointer<Double>(
                           OpaquePointer(features.dataPointer))
let cStride = features.strides[0].intValue
let yStride = features.strides[1].intValue
let xStride = features.strides[2].intValue

First, you cast features.dataPointer, a raw pointer, into an UnsafeMutablePointer for Double values. This only works if the multi-array really holds Double values, which you can check with its dataType property. To find out where in this memory the value you want to read is located, you need to do a little bit of math. That's what the strides are for. Like shape, this is an array of NSNumbers.

The stride for a given dimension tells you how far apart in memory successive values along that dimension are. Here, cStride is the stride of the first dimension, which holds the channels. It is 263169 because one channel of data is 513×513 = 263169 pixel values. The yStride is the distance between two successive rows in the image and is 513 because one row contains 513 pixels. And xStride is the distance between the pixels in the same row, which is 1 because they're right next to each other in memory.
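MLMultiArray fills in the strides for you, but the arithmetic behind those three numbers is easy to check by hand. Here is a minimal sketch, assuming a contiguous row-major [channels, height, width] layout; contiguousStrides and offset are hypothetical helpers, not Core ML API. That layout isn't guaranteed for every multi-array, so real code should keep reading features.strides.

```swift
// Strides for a contiguous, row-major tensor: the last dimension has
// stride 1, and each earlier stride is the product of all later sizes.
func contiguousStrides(for shape: [Int]) -> [Int] {
  var strides = [Int](repeating: 1, count: shape.count)
  var running = 1
  for i in stride(from: shape.count - 1, through: 0, by: -1) {
    strides[i] = running
    running *= shape[i]
  }
  return strides
}

// Flat offset of element (c, y, x), the same formula used with featurePointer.
func offset(c: Int, y: Int, x: Int, strides: [Int]) -> Int {
  return c * strides[0] + y * strides[1] + x * strides[2]
}

let deepLabStrides = contiguousStrides(for: [21, 513, 513])
print(deepLabStrides)  // [263169, 513, 1]
```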

Now, you can forget about these numbers immediately. What's important to remember is that the strides are used to index the MLMultiArray's memory directly when you're using pointers. Conveniently enough, MLMultiArray has already calculated the correct stride values for you.

To read the value at c, y, x, you can now write:

let value = featurePointer[c*cStride + y*yStride + x*xStride]

That's all you need to do to directly read the Double value from the MLMultiArray's memory. It doesn't get much faster than that!
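The pointer technique is easy to try outside Core ML. In this sketch, which is an assumption-based stand-in rather than the app's code, a freshly allocated UnsafeMutableRawPointer plays the role of features.dataPointer, and a tiny [2, 3, 4] tensor replaces DeepLab's 21×513×513 output; readWithPointer is a hypothetical name.

```swift
// Reads element (c, y, x) of a small contiguous [2, 3, 4] tensor through a
// typed pointer, using the same cast and stride formula as above.
// The raw allocation is a hypothetical stand-in for features.dataPointer.
func readWithPointer(c: Int, y: Int, x: Int) -> Double {
  let channels = 2, height = 3, width = 4
  let count = channels * height * width
  let cStride = height * width, yStride = width, xStride = 1

  // Untyped memory, like the UnsafeMutableRawPointer that dataPointer returns.
  let raw = UnsafeMutableRawPointer.allocate(
    byteCount: count * MemoryLayout<Double>.stride,
    alignment: MemoryLayout<Double>.alignment)
  defer { raw.deallocate() }

  // Fill the tensor so that the value at (c, y, x) is c*100 + y*10 + x.
  let values = raw.bindMemory(to: Double.self, capacity: count)
  values.initialize(repeating: 0, count: count)
  for cc in 0..<channels {
    for yy in 0..<height {
      for xx in 0..<width {
        values[cc*cStride + yy*yStride + xx*xStride] = Double(cc*100 + yy*10 + xx)
      }
    }
  }

  // The cast used in this chapter: raw memory viewed as Doubles.
  let featurePointer = UnsafeMutablePointer<Double>(OpaquePointer(raw))
  return featurePointer[c*cStride + y*yStride + x*xStride]
}

print(readWithPointer(c: 1, y: 2, x: 3))  // 123.0
```

Allocating and filling the memory is only there to make the example self-contained; with a real MLMultiArray you would use its dataPointer and strides as they are.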

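for y in 0..<height {
  for x in 0..<width {

    // Take the argmax for this pixel, the index of the largest class.
    // Start at -infinity rather than 0: the model's outputs are raw
    // scores (no softmax is applied), and these can be negative.
    var largestValue = -Double.infinity
    var largestClass = 0
    for c in 0..<classes {
      let value = featurePointer[c*cStride + y*yStride + x*xStride]
      if value > largestValue {
        largestValue = value
        largestClass = c
      }
    }

    // . . . use largestClass to fill in this pixel's RGBA values
  }
}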

There are three nested loops: You loop through all the rows (y) and all the columns (x) of the multi-array to look at all the image locations. Each “pixel” is really made up of 21 values, one per class, and you loop through these (c) to find the largest one. Then you can use the value of largestClass to do something interesting with that pixel.

All the softmax does is turn the values into probabilities that add up to 1. Because it applies the same increasing function (exp) to every value and divides them all by the same sum, it doesn't change the relative order of these numbers: If you sorted the values from before the softmax, and the probabilities from after the softmax, they'd be in exactly the same order. Since you only care about which class has the largest value, and not what probability that value represents, you can skip the softmax and save some time.
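A quick way to convince yourself that skipping the softmax is safe is to compare the argmax before and after it. This is a minimal sketch with made-up scores for a single pixel; argmax and softmax are written out by hand here rather than taken from any library.

```swift
import Foundation

// Index of the largest element; starts at -infinity so that
// all-negative scores are handled correctly.
func argmax(_ values: [Double]) -> Int {
  var bestIndex = 0
  var bestValue = -Double.infinity
  for (i, v) in values.enumerated() where v > bestValue {
    bestIndex = i
    bestValue = v
  }
  return bestIndex
}

// Softmax with the usual max-subtraction for numerical stability.
func softmax(_ scores: [Double]) -> [Double] {
  let m = scores.max() ?? 0
  let exps = scores.map { exp($0 - m) }
  let sum = exps.reduce(0, +)
  return exps.map { $0 / sum }
}

// Made-up raw scores for one pixel and four classes.
let rawScores: [Double] = [-3.2, 1.5, 0.7, -0.1]
let probabilities = softmax(rawScores)
print(argmax(rawScores), argmax(probabilities))  // 1 1
```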

Note: With semantic segmentation, pixels only know which class they belong to. There is also a technique called instance segmentation, where the pixels not only know their class but also which specific object they belong to. For example, if a photo shows two people standing side by side, semantic segmentation only sees a single blob of pixels that are all of class “person.” Instance segmentation would be able to distinguish between person 1 and person 2.

Challenges

Challenge 1: Create a dataset for object detection

If you collected your own classification dataset for one of the previous challenges, then use a tool such as RectLabel to add bounding box annotations for these images. RectLabel uses a different file format to store the annotations, but rectlabel.com has code examples that show how to use these files with Turi Create.

Challenge 2: Train MobileNet+SSD on the snacks dataset

The size of the YOLO model you trained with Turi Create is 64.6 MB. That’s pretty hefty! This is reaching the upper limit of what is acceptable on mobile devices. There are object detection models that are much smaller than YOLO but still give very good results, such as SSD on top of MobileNet (about 26 MB).

To convert the TensorFlow model to Core ML, you can use tf-coreml and the following code: github.com/hollance/SSDMobileNet_CoreML.

Challenge 3: Change the semantic segmentation demo app

Change the semantic segmentation demo app to only keep pixels that belong to cats and dogs — or whatever your favorites are from the 20 Pascal VOC classes.

Key points

Where to go from here?

Congrats, you’ve reached the end of Section I, Machine Learning with Images! Of course, we hope this is really only the beginning of your journey into the wonderful world of computer vision and deep learning.

To be honest, it can be hard to get into reading research papers. It's quite normal for much of a paper not to make sense to you at first reading. Don't worry! Simply read a few other papers on the same topic. Then go back to the parts that didn't make sense to you yet. Gradually, you'll get comfortable with the way these papers are written. And once you grasp the language of a certain subfield, such as generative models, new papers in that subfield become easier to read.

Note: We strongly recommend watching the fast.ai videos once you've finished this book. This is one of the best courses about computer vision, natural language processing, and other applications of deep learning, and it's free! Not only will you gain a deeper understanding of machine learning, but this course is also filled with handy tips and tricks, and advice on how to get state-of-the-art results. 5 out of 5 stars!
