7. Going Convolutional
Written by Matthijs Hollemans

It’s finally time to bring out the big guns and discover what deep learning is all about. In this chapter, you’ll convert the basic neural network into something that works much better on images. The secret ingredient is the convolutional layer.

Got GPU?

Having a GPU is no longer a luxury. Unfortunately, at the time of writing, Keras and TensorFlow do not yet support Mac GPUs. Modern Macs ship with GPUs from Intel or AMD, while deep learning tools usually cater only to GPUs from NVIDIA. Older Macs may still have an NVIDIA GPU on board, but these are often too old. Using an external eGPU enclosure with an NVIDIA card is an option, but it is not officially supported.

Most machine-learning practitioners train their models on a PC running Linux that has one or more NVIDIA GPUs, or in the cloud. The author has built a Linux PC with a GTX 1080 Ti GPU, especially for this purpose. If you’re serious about deep learning, this is an expense worth making.

If all you have is a Mac, you’ll need a lot of patience to train the models in this chapter. Because we want everyone to be able to follow along, the book’s download includes the full Jupyter notebooks that were used to train the models, as well as the final trained version, so you can skip training the models if your computer isn’t up to the task.

Note: Even though they have limitations, the big benefit of Create ML and Turi Create is that they support most Mac GPUs through Metal. No big surprise there, as both are provided by Apple. Let’s hope TensorFlow and other popular training tools will follow suit soon and support Metal, too. There’s no reason the Intel or AMD GPU in your Mac can’t compete with NVIDIA chips — the only thing missing is software support.

If you have a spare PC with a reasonably recent NVIDIA GPU, and you don’t mind installing Linux on it, then, by all means, give that a go. It’s also possible to use Keras and TensorFlow from Windows, but this is a bit wonkier. We suggest using Ubuntu from ubuntu.com, the most popular Linux for machine learning.

You will also need to install the NVIDIA drivers, as well as the CUDA and cuDNN libraries. See developer.nvidia.com for more details. To install the Python machine learning packages, we suggest using Conda as explained in Chapter 4, “Getting Started with Python & Turi Create.” The process is very similar on Linux and Windows.

Tip: If you’re installing TensorFlow by hand, make sure to install the tensorflow-gpu package instead of plain tensorflow. You can change this in kerasenv.yaml or run pip install -U tensorflow-gpu. Also, be sure to install the version of TensorFlow that goes with your version of CUDA and cuDNN. If these versions don’t match up, TensorFlow won’t work. Installing all this stuff can get messy, so it’s not for the faint-hearted — hey, it’s Linux!

Your head in the clouds?

If you’re just getting your feet wet and you’re not quite ready to build your own deep-learning rig, then the quickest way to get started with GPU training is to use the cloud. You can even use some of these cloud services for free!

Often, there is a starter tier where you get a certain amount of compute hours for free. Some machine-learning-in-the-cloud services make it really easy to run Jupyter notebooks, but pay attention: make sure to shut down your Jupyter notebook when you’re done with it. Usually, you only pay for what you use, but as long as you keep the Jupyter server running, even if it just sits there doing nothing, it uses up compute time. And when your free hours run out, this can get costly quite quickly.

Google Colaboratory at colab.research.google.com is a free Jupyter notebook environment. You can use Colab to run your notebooks on a cloud GPU and even a TPU (Google’s own high-performance Tensor Processing Unit hardware) for free. That’s a pretty good deal. Your notebooks are stored in Google Drive. The easiest way to upload the dataset is through Google Cloud Storage, although they offer a few different options. Colab is best used from Chrome.

Convolution layers

The models you’ve built in Keras have, so far, consisted of Dense layers, which take a one-dimensional vector as input. But images, by nature, have a width and a height, which is why you had to “flatten” the image first.

The spatial relationships between the pixels (which pixels are above, below and next to any other pixel), as well as the relationship between the pixel’s color information, are obviously important if you want to understand what the image represents. But a lot of this information is lost when flattening into a vector. That’s why the models that used only Dense layers didn’t work so well.

Convolution, say what now?

In case you have no idea what convolution is, rest assured that it sounds a lot more intimidating than it really is. Again, what it comes down to are dot products.

The convolution window slides over the image, left to right, top to bottom

For every pixel at coordinate i,j in the input image, this is the math that happens for a 3×3 convolution window:

y[i,j] = w[0,0]*x[i-1,j-1] + w[0,1]*x[i-1,j] + w[0,2]*x[i-1,j+1]
       + w[1,0]*x[i,  j-1] + w[1,1]*x[i,  j] + w[1,2]*x[i,  j+1]
       + w[2,0]*x[i+1,j-1] + w[2,1]*x[i+1,j] + w[2,2]*x[i+1,j+1]
       + bias

As you might recognize by now, this is nothing more than a dot product between the weight values w and the pixel we’re looking at, x[i,j], as well as the eight pixels that surround it. As usual, you also add a bias value.

Therefore, the output value for coordinate i,j is a weighted sum of the input pixel at that same coordinate and the pixels that surround it. The larger the window, the more surrounding pixels this dot product includes. But 3×3 windows are the most common, and so the output value is the weighted sum of nine pixels.

This formula is repeated for every pixel in the input image. We vary i between 0 and the image height, and j between 0 and the image width, and compute the dot product at each pixel coordinate i,j. That’s why we say a convolution “slides” over the image.
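The formula above, applied at every pixel coordinate, can be sketched in a few lines of plain Python. This is only an illustration, not how Keras implements it: the function name `convolve3x3` is made up here, the image is a single-channel list of rows, and pixels that fall outside the image are assumed to be zero so that the output keeps the input's size.

```python
def convolve3x3(image, weights, bias=0.0):
    """Slide a 3x3 window over a single-channel image (a list of rows).

    Assumption: pixels outside the image count as zero, so the output
    has the same height and width as the input.
    """
    height, width = len(image), len(image[0])
    output = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            total = bias
            # Dot product between the 3x3 weights and the 3x3 neighborhood.
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < height and 0 <= x < width:
                        total += weights[di + 1][dj + 1] * image[y][x]
            output[i][j] = total
    return output


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
# A kernel with a single 1 in the center copies the input unchanged.
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
# A kernel of all ones adds each pixel to its (up to eight) neighbors.
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

print(convolve3x3(image, identity))
print(convolve3x3(image, box)[1][1])  # 1+2+3+5+6+7+9+10+11
```

Real convolution layers do the same arithmetic, but with learned weights and with heavily optimized GPU code instead of nested Python loops.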

Each step computes a single output value from the 3×3 window at the center pixel

The output of the convolution layer is, therefore, a new image, with large values for those pixels that match the template well and low values for the pixels that don’t. In other words, the convolution output measures how much the input pixels “respond” to the learned template. Since a dot product always returns a scalar, a single number, the convolution’s output image only has one channel instead of the usual three.

It might seem that learning to detect a 3×3 pattern isn’t really that impressive since such patterns are very small, only nine pixels in total. How useful could that be? By itself, not much. The trick behind deep learning is that we put many of these convolution layers in a row so that they can detect patterns that cover successively larger areas of the input image. It turns out that this works better than simply using large window sizes.

That wasn’t so bad, was it? Convolution is just another bunch of dot products. The main difference with the dot products performed in the Dense layers that you’ve seen before is that this only looks at a small portion of the image at a time, through a 3×3 window, and that it keeps the spatial structure of the image intact. The convolution kernel has a width and height, just like the image. Therefore the convolution layer is more suitable for learning from images.

As with Dense layers, a convolution layer is usually followed by an activation function, to turn the result from the dot product into something that is non-linear and therefore more interesting. ReLU is the most common activation function used with conv layers.

Multiple filters

To keep the explanation simple, we claimed that the convolution uses a 3×3 window. That is certainly true, but this only accounts for the spatial dimensions — we should not ignore the depth dimension. Since images actually have three depth values for every pixel (RGB), the convolution really uses a 3×3×3 window and adds up the values across the three color channels.

The convolution kernel is really three-dimensional

The input to this convolution kernel is a tensor of size (height, width, 3). The output is a new “image” that has the same width and height as the input image, but a depth of 1. In other words, the output is a (height, width, 1) tensor.

You could think of this new (height, width, 1) tensor as a grayscale image since it now only has one color component instead of three. The “color” of this output image represents how much the input image responds to the template or pattern that this convolution has learned.

The number of filters in the convolution layer determines the depth of its output image

If you count this all up, then this first layer has learned 3×3×3×32 weights. Read this as kernelHeight × kernelWidth × inputChannels × outputChannels. The layer also has learned 32 bias values, one for each output channel.
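As a quick check of that arithmetic, here is a small sketch that counts the learned parameters of a convolution layer. The helper name `conv_param_count` is hypothetical; the layer sizes are those of the first two Conv2D layers in this chapter's model (3×3 kernels, an RGB input, 32 filters each).

```python
def conv_param_count(kernel_height, kernel_width, in_channels, out_channels):
    """Weights (kernelHeight x kernelWidth x inputChannels x outputChannels)
    plus one bias value per output channel."""
    weights = kernel_height * kernel_width * in_channels * out_channels
    biases = out_channels
    return weights + biases


# First conv layer: 3x3 kernel, 3 input channels (RGB), 32 filters.
first = conv_param_count(3, 3, 3, 32)
# Second conv layer: 3x3 kernel, 32 input channels, 32 filters.
second = conv_param_count(3, 3, 32, 32)
print(first, second)
```

These two numbers are the ones Keras reports in the Param # column of model.summary() for the first two Conv2D layers.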

Remember, each filter learns to detect a unique pattern in the input data. When you use a convnet as the feature extractor in a classification model, the features given to the logistic regression are the responses of the input image to these pattern detectors. A good feature extractor will learn patterns that tell you something meaningful about the images. The better the patterns it has learned, the more useful features it will produce, and the more accurate predictions the logistic regression can make.

Your first convnet in Keras

In a new Jupyter notebook, create the following cells. You can also follow along with ConvNet.ipynb.

Note: This chapter assumes you’re using the Keras environment that you set up in the previous chapter. If you skipped that chapter, simply run conda env create --file=starter/kerasenv.yaml from the command line. If your computer has an NVIDIA GPU, make sure to use tensorflow-gpu instead of the regular one. You will also need the snacks dataset, which you can download by double-clicking snacks-download-link.webloc.

import numpy as np
from keras.models import Sequential
from keras.layers import *
from keras import optimizers

%matplotlib inline
import matplotlib.pyplot as plt

image_width = 224
image_height = 224
num_classes = 20

Note: You can also keep the input width and height undefined, as None, in which case the convnet will accept input images of any size. You might think that larger images would give more accurate results, but this is not always the case. The patterns a model has learned from 224×224 images may not actually appear in a much larger image of a thousand pixels or more on each side. Of course, you could train on images that large, but large images can easily take up more RAM than is on your GPU, making it harder to train these models. For classification models, 224×224 is a good compromise between memory usage and accuracy. Vision FeaturePrint uses 299×299, while SqueezeNet uses 227×227. Feel free to experiment with different image sizes in this chapter.

model = Sequential()
model.add(Conv2D(32, 3, padding="same", activation="relu",
                 input_shape=(image_height, image_width, 3)))
model.add(Conv2D(32, 3, padding="same", activation="relu"))
model.add(MaxPooling2D(2))
model.add(Conv2D(64, 3, padding="same", activation="relu"))
model.add(Conv2D(64, 3, padding="same", activation="relu"))
model.add(MaxPooling2D(2))
model.add(Conv2D(128, 3, padding="same", activation="relu"))
model.add(Conv2D(128, 3, padding="same", activation="relu"))
model.add(MaxPooling2D(2))
model.add(Conv2D(256, 3, padding="same", activation="relu"))
model.add(Conv2D(256, 3, padding="same", activation="relu"))
model.add(GlobalAveragePooling2D())
model.add(Dense(num_classes))
model.add(Activation("softmax"))

You’ve seen the Dense and Activation("softmax") layers before. The Flatten layer is gone. This model also has a few new layer types. Let’s have a look at Conv2D first. As you probably guessed from the name, these are the convolution layers.

A Conv2D layer takes the following arguments:

The number of filters. In the first Conv2D layer this is 32. As in most convnet designs, convolutional layers get more filters (output channels) as you go deeper into the network. The last Conv2D layer in this model has 256 filters. Recall that each filter learns to detect a unique pattern, and so the more filters you have the more patterns the model can detect. The number of filters is often a power of two, but this is not a hard-and-fast rule, so feel free to break it.

The very first layer in a Keras model must also specify an input_shape. This tells Keras the size of the images to expect. Height goes before width in these tensors!

In between the convolution layers are MaxPooling2D layers. Pooling is a technique that makes the data smaller as it flows through the network, also known as downsampling. Most convnets are built from a combination of convolution layers and pooling layers. You’ll learn more about pooling later in this chapter.

The flow of the tensors

You can see what happens to the shape of the data in the model.summary(). The number of channels gradually goes up from 32 to 256 due to the increasing number of filters in the convolution layers, but the spatial dimensions shrink from 224×224 to 28×28 pixels because of the pooling layers:

_______________________________________________________________
Layer (type)                 Output Shape              Param #
===============================================================
conv2d_1 (Conv2D)            (None, 224, 224, 32)      896     
_______________________________________________________________
conv2d_2 (Conv2D)            (None, 224, 224, 32)      9248    
_______________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 112, 112, 32)      0      
_______________________________________________________________
conv2d_3 (Conv2D)            (None, 112, 112, 64)      18496   
_______________________________________________________________
conv2d_4 (Conv2D)            (None, 112, 112, 64)      36928   
_______________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 56, 56, 64)        0         
_______________________________________________________________
conv2d_5 (Conv2D)            (None, 56, 56, 128)       73856     
_______________________________________________________________
conv2d_6 (Conv2D)            (None, 56, 56, 128)       147584    
_______________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 28, 28, 128)       0         
_______________________________________________________________
conv2d_7 (Conv2D)            (None, 28, 28, 256)       295168    
_______________________________________________________________
conv2d_8 (Conv2D)            (None, 28, 28, 256)       590080    
_______________________________________________________________
global_average_pooling2d_1 ( (None, 256)               0         
_______________________________________________________________
dense_1 (Dense)              (None, 20)                5140      
_______________________________________________________________
activation_1 (Activation)    (None, 20)                0         
===============================================================
Total params: 1,177,396
Trainable params: 1,177,396
Non-trainable params: 0
_______________________________________________________________

Because you put input_shape=(image_height, image_width, 3) on the first layer, the input to this model is expected to be a tensor of size (None, 224, 224, 3), or an image of size 224×224 pixels and 3 channels (RGB). Recall that Keras adds a dimension for the batch size at the front of the tensor. Usually, you keep that batch size unspecified when defining the model, which is why it shows up as None in the summary.

The output of the first conv layer, named conv2d_1 by Keras, is a tensor of size (None, 224, 224, 32). Instead of 3 channels, this layer’s output “image” now has 32 channels. That’s because this convolution layer has 32 filters, and there is one channel in the output image for each filter.

The second conv layer, conv2d_2, takes that (None, 224, 224, 32) tensor as input and produces a new tensor of the same size. Because the second layer also has 32 filters, it also outputs 32 channels.

It’s important to note that, even though the input tensor and the output tensor for this layer both have 32 channels, these are totally unrelated numbers. The number of filters in the layer is independent of the number of channels in the input tensor. It just so happens that these two numbers are the same for this particular layer. However, the number of output channels for a layer is always the same as the number of filters it has.
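To see how the shapes in the summary come about, the following sketch traces a (height, width, channels) shape through the convolution and pooling layers of this chapter's model. The function `trace_shapes` and the `("conv", filters)` / `("pool", window)` tuples are made-up notation for this example; it assumes "same" padding with a stride of 1 for every convolution and leaves out the batch dimension.

```python
def trace_shapes(input_shape, layers):
    """Return the (height, width, channels) shape after each layer.

    Assumptions: a "conv" layer uses "same" padding and stride 1, so it
    only changes the channel count to its number of filters; a "pool"
    layer divides height and width by its window size.
    """
    height, width, channels = input_shape
    shapes = []
    for kind, value in layers:
        if kind == "conv":
            channels = value
        elif kind == "pool":
            height //= value
            width //= value
        else:
            raise ValueError("unknown layer kind: " + kind)
        shapes.append((height, width, channels))
    return shapes


layers = [
    ("conv", 32), ("conv", 32), ("pool", 2),
    ("conv", 64), ("conv", 64), ("pool", 2),
    ("conv", 128), ("conv", 128), ("pool", 2),
    ("conv", 256), ("conv", 256),
]
shapes = trace_shapes((224, 224, 3), layers)
for layer, shape in zip(layers, shapes):
    print(layer, shape)
```

Only the number of filters decides the channel count of each output; the number of input channels never shows up in the output shape.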

Each filter reads all input channels and produces one output channel

More about pooling

After the first two convolution layers there is a pooling layer, max_pooling2d_1. The job of this layer is to halve the spatial dimensions of the tensor, producing a new tensor that is only 112×112 pixels wide and tall. The number of channels stays the same, 32.

Max pooling reduces each 2×2 pixels to a single number

It is standard for pooling layers to use a 2×2 window size. Just like with convolution, this window slides over the input image, but with a step size of 2. This step size, or stride, is what determines the spatial dimensions of the output tensor. A stride of 2 means the width and height both get chopped in half. Usually, the stride is the same as the window size, but sometimes you’ll see other combinations, such as a window size of 3 with a stride of 2, in which case successive windows overlap each other a little.
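Max pooling with a window size and a stride can be sketched as follows. This is a plain-Python illustration on a single channel, not the Keras implementation; the function name `max_pool` and the sample values are made up for the example.

```python
def max_pool(image, window=2, stride=2):
    """Keep the largest value in each window of a single-channel image."""
    height, width = len(image), len(image[0])
    output = []
    # Move the window's top-left corner in steps of `stride`.
    for top in range(0, height - window + 1, stride):
        row = []
        for left in range(0, width - window + 1, stride):
            row.append(max(
                image[y][x]
                for y in range(top, top + window)
                for x in range(left, left + window)
            ))
        output.append(row)
    return output


image = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 9, 8],
    [2, 6, 7, 3],
]
pooled = max_pool(image)
print(pooled)  # each 2x2 block is reduced to its largest value
```

With the default window of 2 and stride of 2, the 4×4 input becomes 2×2, so both width and height are halved. Passing a window larger than the stride makes successive windows overlap.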

Why use these pooling layers? The main reason is to reduce the size of the data you’re working with, by keeping only the most interesting parts of the image. As you’ve seen, convolution measures how much certain parts of the image respond to a learned pattern. Because max pooling only keeps the largest and therefore most important values, it helps to make the predictions less sensitive to small variations in the data. Reducing the input also increases the receptive field, or the portion of the input image that is covered by the convolution layers that follow the pooling layer.

More importantly, pooling also helps to fight the phenomenon known as the curse of dimensionality that says that very high dimensional spaces are a lot harder to deal with than lower dimensional spaces. A (224, 224, 32) tensor means that each possible tensor value can be represented as a point in a 224×224×32 = 1,605,632-dimensional space. Try wrapping your head around that! Pooling with a window size of 2 will reduce the number of dimensions by a factor of 4, reducing the effects of that curse.

The detected features

Following the max pooling layer are two more conv layers, this time with 64 output channels, and then there is another pooling layer, followed by two more conv layers. The model repeats this pattern a few times. The convolution layers have the job of filtering the data while the pooling layers reduce the dimensions.

The learned weights for the first conv layer

However, with a bit of trickery, it is possible to visualize what parts of the original RGB input image respond the most to the patterns from the second layer. This gives us an idea of what kind of patterns this second layer looks for. It turns out that, where the first layer detects mostly simple colors and lines, the second layer will look for somewhat higher level patterns, such as circles and corners.

These patterns are also bigger: the second layer still looks at a 3×3 block of pixels in its own input tensor, but each of those pixels was made from a 3×3 window in the original RGB image, so the second layer actually sees a larger region from the original image. This is called the receptive field of the layer. The receptive field of the second layer, i.e., how much it sees from the original RGB image, is 5×5 pixels. The pooling layers also increase the receptive field.
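The growth of the receptive field can be worked out with a little bookkeeping. The sketch below uses the common recurrence in which each layer adds (kernel size − 1) times the current spacing between neighboring positions, measured in input pixels; the function name `receptive_field` and the `(kernel_size, stride)` tuples are illustrative choices, not something taken from Keras.

```python
def receptive_field(layers):
    """Receptive field along one axis, in input pixels, after each layer.

    Each layer is a (kernel_size, stride) pair. `jump` is the distance in
    input pixels between two neighboring positions in the current tensor.
    """
    field, jump = 1, 1
    fields = []
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump
        jump *= stride
        fields.append(field)
    return fields


conv = (3, 1)  # 3x3 convolution with stride 1
pool = (2, 2)  # 2x2 max pooling with stride 2

two_convs = receptive_field([conv, conv])
with_pooling = receptive_field([conv, conv, pool, conv])
print(two_convs)
print(with_pooling)
```

After two 3×3 convolutions the field is 5 pixels wide, matching the 5×5 region described above. Once a pooling layer doubles the spacing, the next 3×3 convolution widens the field by 4 pixels instead of 2, which is why pooling makes the receptive field grow faster.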

Each convolution layer learns to see even higher-level, more abstract patterns. And because the receptive field grows bigger, convolution layers deeper in the network see more of the input image than early layers. This allows the later layers to learn to recognize real-world concepts like “this shape looks like a certain animal”, “this is a human face”, “this object is pointed to the left” and so on.

Feeling hot hot hot

Back to that very last convolution layer that outputs a 28×28×256 tensor. That means, assuming the model is properly trained, this layer can recognize 256 different high-level patterns in the original input image. Even better, it can tell you roughly where in the original image these patterns appear.

A channel from the final tensor represented as a heatmap

You’re looking at just one of the 256 output channels, or feature maps, contained in this tensor. Displayed like this, it’s also often called a heatmap because it’s “hot” (yellow) where the pixels responded a lot to the pattern but “cold” elsewhere (dark blue). In this example, the pixels in the top-right corner are hot, meaning that the pattern from this filter was detected in that location in the original RGB input image.

Honey, I shrunk the tensors!

It’s possible to Flatten the 28×28×256 tensor and train a logistic regression on top of it, but flattening would turn the tensor into a 200,704-element vector. Recall from the last chapter that the logistic regression already had a hard enough time with just 3,072 features, let alone two hundred thousand…

The GlobalAveragePooling2D layer calculates the average value of each 28×28 feature map, which is just a scalar number. It puts these numbers into a vector of 256 elements, one for each channel in the tensor. Then the logistic regression will use that vector as the feature vector. 256 features are much more manageable than 200,704!
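Global average pooling is simple enough to sketch by hand. The example below averages every channel of a tiny (height, width, channels) tensor stored as nested lists; the function name `global_average_pool` and the sample numbers are made up, and the batch dimension is left out.

```python
def global_average_pool(tensor):
    """Average each channel of a (height, width, channels) nested list."""
    height, width = len(tensor), len(tensor[0])
    channels = len(tensor[0][0])
    sums = [0.0] * channels
    for row in tensor:
        for pixel in row:
            for c, value in enumerate(pixel):
                sums[c] += value
    return [s / (height * width) for s in sums]


# A tiny 2x2 tensor with 3 channels instead of 28x28 with 256 channels.
tensor = [
    [[1, 0, 2], [3, 0, 2]],
    [[5, 0, 2], [7, 4, 2]],
]
features = global_average_pool(tensor)
print(features)          # one average per channel
print(28 * 28 * 256)     # number of values a Flatten layer would produce
```

The length of the result depends only on the number of channels, not on the width or height of the feature maps.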

Global average pooling

That’s why the final two layers in the model are a Dense layer followed by a softmax activation. These serve the same purpose as before: to perform logistic regression on top of the extracted features. The convolutional layers and pooling layers now act as a feature extractor that converts the original pixels into features that are more suitable for use with logistic regression.
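As a rough sketch of what those last two layers compute, the code below applies one dot product per class to a feature vector and then turns the results into probabilities with softmax. The weights, biases and feature values are invented for illustration, and the weights are laid out as one row per class for readability, which is not necessarily how Keras stores them.

```python
import math


def dense(features, weights, biases):
    """One dot product per class; weights[k] holds the weights of class k."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(logits):
    """Turn arbitrary scores into probabilities that sum to 1."""
    largest = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


features = [4.0, 1.0, 2.0]       # e.g. the output of global average pooling
weights = [[0.5, -1.0, 0.0],     # hypothetical weights for class 0
           [-0.5, 1.0, 0.5]]     # hypothetical weights for class 1
biases = [0.0, 0.1]

logits = dense(features, weights, biases)
probabilities = softmax(logits)
print(logits)
print(probabilities)
```

The class with the largest score before softmax also gets the largest probability afterwards; softmax only rescales the scores so they can be read as probabilities.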

Note: The global pooling layer flattens the four-dimensional image tensor into a two-dimensional feature vector (remember that the first dimension is the batch size). Unlike a Flatten layer, which always expects a fixed number of input features, a global pooling layer can work on images of any arbitrary input size. Models with a Flatten layer are therefore much less flexible and are not considered to be “fully” convolutional. Most modern convnets use a global pooling layer so that they can accept images of any size. But remember: for the best results, images used for inference shouldn’t be too much bigger or smaller than the images the model was trained on.

Training the model

The model you’ve built in the previous sections is a typical convnet design and, although not necessarily optimal, it’s a good start. Let’s see how well this model learns.

model.compile(loss="categorical_crossentropy",
              optimizer=optimizers.Adam(lr=1e-3),
              metrics=["accuracy"])

images_dir = "snacks/"
train_data_dir = images_dir + "train/"
val_data_dir = images_dir + "val/"
test_data_dir = images_dir + "test/"

def normalize_pixels(image):
    return image / 127.5 - 1

from keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(
  preprocessing_function=normalize_pixels)

batch_size = 64

train_generator = datagen.flow_from_directory(
                    train_data_dir,
                    target_size=(image_height, image_width),
                    batch_size=batch_size,
                    class_mode="categorical",
                    shuffle=True)

val_generator = datagen.flow_from_directory(
                    val_data_dir,
                    target_size=(image_height, image_width),
                    batch_size=batch_size,
                    class_mode="categorical",
                    shuffle=False)

test_generator = datagen.flow_from_directory(
                    test_data_dir,
                    target_size=(image_height, image_width),
                    batch_size=batch_size,
                    class_mode="categorical",
                    shuffle=False)

index2class = {v:k for k,v in
  train_generator.class_indices.items()}
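Two small pieces of the code above are easy to check in isolation. The sketch below repeats normalize_pixels on plain numbers (in the notebook it receives whole image arrays) to show that it maps pixel values from 0 to 255 into the range −1 to 1, and it inverts a made-up class_indices dictionary the same way index2class is built. The three class names and their indices are placeholders, not the real output of the generator.

```python
def normalize_pixels(image):
    # Maps 0 -> -1.0, 127.5 -> 0.0 and 255 -> 1.0.
    return image / 127.5 - 1


print(normalize_pixels(0), normalize_pixels(127.5), normalize_pixels(255))

# Stand-in for train_generator.class_indices (hypothetical contents).
class_indices = {"apple": 0, "banana": 1, "cake": 2}

# Swap keys and values so a predicted index can be turned back into a name.
index2class = {v: k for k, v in class_indices.items()}
print(index2class[1])
```

Because the same datagen object feeds the training, validation and test generators, all three sets of images receive identical normalization.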

Training is exactly the same as before: You can simply call model.fit_generator(). However, let’s add some useful code that lets you keep track of the training progress.

Keras returns a History object from fit_generator() that has the loss and any other metrics you asked for. Since it’s common to run fit_generator() a few times in a row, you’ll combine these History objects into an overall history and then plot some curves.

histories = []

Then, in a cell of its own, call fit_generator(). You’ll append the History object that this returns to the array:

history = model.fit_generator(
  train_generator,
  steps_per_epoch=len(train_generator),
  validation_data=val_generator,
  validation_steps=len(val_generator),
  epochs=5,
  workers=8)
histories.append(history)

You may wonder why you’d want to run fit_generator() more than once, but it’s useful to train for a few epochs to see how the model is doing before you commit to training for more epochs. Also, you may want to change some hyperparameters after a while, such as the learning rate. You’ll see an example of this soon.

Note: Training this kind of model takes up a lot of processing power. On a GTX 1080 Ti GPU, a single epoch takes tens of seconds to train. On less powerful GPUs or on the CPU it will take much longer. On the author’s iMac, it takes tens of minutes per epoch with all cores blazing. We suggest that you don’t try training this model yourself unless you have a Linux machine with a GPU or you don’t mind using cloud compute.

Going dooooown?

To make a plot of the loss over time, do the following:

def combine_histories():
    history = {
        "loss": [],
        "val_loss": [],
        "acc": [],
        "val_acc": []
    }

    for h in histories:
        for k in history.keys():
            history[k] += h.history[k]
    return history

history = combine_histories()

def plot_loss(history):
    fig = plt.figure(figsize=(10, 6))
    plt.plot(history["loss"])
    plt.plot(history["val_loss"])
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend(["Train", "Validation"])
    plt.show()

plot_loss(history)

This again uses the Matplotlib library. It puts the graph for the training loss (called just "loss" here) and the validation loss ("val_loss") into a single plot. After training for a while on a fast machine, the curves look something like this:

The training and validation loss curves

The History objects also track accuracy. That’s a more interpretable metric than the loss, so plot that, too.

def plot_accuracy(history):
    fig = plt.figure(figsize=(10, 6))
    plt.plot(history["acc"])
    plt.plot(history["val_acc"])
    plt.xlabel("Epoch")
    plt.ylabel("Accuracy")
    plt.legend(["Train", "Validation"])
    plt.show()

plot_accuracy(history)

The training and validation accuracy over time

Learning rate annealing

One trick you can use to give the accuracy a little boost is to change the learning rate. It is currently 1e-3 or 0.001 (set when you compiled the model), and you can change it by doing the following:

import keras.backend as K
K.set_value(model.optimizer.lr,
            K.get_value(model.optimizer.lr) / 10)

K refers to the Keras backend package, which is a wrapper around TensorFlow. You cannot set the optimizer’s learning rate directly and must do it in this special way.

Now train for several more epochs. If you call combine_histories() again and plot the loss, you’ll see a visible change in the curve at the epoch where the learning rate was changed:

The loss after lowering the learning rate

For a classifier, the initial loss should roughly be np.log(num_classes). If the loss at any point becomes much larger than this, and only keeps increasing over time, you need to lower the learning rate.
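The np.log(num_classes) rule of thumb follows from the cross-entropy loss of a model that has not learned anything yet and therefore gives every class the same probability. Here is a minimal sketch using only the standard library; the helper `cross_entropy` is written for this example and the true class index is arbitrary.

```python
import math


def cross_entropy(probabilities, true_index):
    """Categorical cross-entropy for a single example."""
    return -math.log(probabilities[true_index])


num_classes = 20
# An untrained model is assumed to spread its belief evenly over all classes.
uniform = [1.0 / num_classes] * num_classes

initial_loss = cross_entropy(uniform, true_index=7)
print(initial_loss, math.log(num_classes))
```

For the 20 classes in this chapter, that works out to a starting loss of roughly 3.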

It’s better… but not good enough yet

It’s clear that you were able to create a much better model using these convolutional layers than with only Dense layers. The final test set accuracy for this model is about 40%, compared to only 15% from the last chapter. That’s a big improvement!

Note: Even though the model you trained isn’t optimal, this very straightforward architecture of conv layers and pooling layers can work quite well in practice; it’s very similar to the famous VGG network that is widely used in the deep-learning community. One way to get better results with this architecture is to first train the model on a much bigger dataset such as ImageNet, which has over 1 million images, and then adapt it to your own images. But that takes a very long time… It’s often easier to build on top of an existing pre-trained model that you can simply download for free, without having to do the laborious ImageNet training yourself. Yes, we’ll be talking about transfer learning again!

Key points

Where to go from here?

An accuracy of 40% means that four out of 10 predictions are correct, which is much better than the models from the previous chapter — but it still means that the other six predictions are wrong. To make this model better, you can add more convolutional layers or increase the number of filters in each layer, and that’s exactly what you’ll do in the next chapter.
