11. Data Collection for Sequence Classification
Written by Chris LaPollo


You worked exclusively with images throughout the first section of this book, and for good reason — knowing how to apply machine learning to images lets you add many exciting and useful features to your apps. Techniques like classification and object detection can help you answer questions like “Is this snack healthy?” or “Which of these objects is a cookie?”

But you’ve focused on individual images — even when processing videos, you processed each frame individually with complete disregard for the frames that came before or after it. Given the following series of images, can the techniques you’ve learned so far tell me where my cookies went?

Each of the above images tells only part of the story. Rather than considering them individually, you need to reason over them as a sequence, applying what you see in earlier frames to help interpret later ones.

There are many such tasks that involve working with sequential data, such as:

Extracting meaning from videos. Maybe you want to make an app that translates sign language, or search for clips based on the events they depict.

Working with audio, for example converting speech to text, or songs to sheet music.

Understanding text, such as these sentences you’ve been reading, which are sequences of words, themselves sequences of letters (assuming you’re reading this in a language that uses letters, that is).

And countless others. From weather data to stock prices to social media feeds, there are endless streams of sequential data.

With so many types of data and almost as many techniques for working with it, this chapter can’t possibly cover everything. You’ll learn ways to deal with text in later chapters, and some of the techniques shown here are applicable to multiple domains. But to keep things practical, this chapter focuses on a specific type of sequence classification — human activity detection. That is, using sensor data from a device worn or held by a person to identify what that person is physically doing. You’ve probably already experienced activity detection on your devices, maybe checking your daily step count on your iPhone or closing rings on your Apple Watch. Those just scratch the surface of what’s possible.

In this chapter, you’ll learn how to collect sensor data from Apple devices and prepare it for training a machine learning model. In the next chapter, you’ll use that data, along with Turi Create’s task-focused API for activity detection, to build a neural network that recognizes user activity from device motion data. Finally, you’ll use your trained neural net to recognize player actions in a game.

Note: With Xcode 11, Apple introduced the Create ML app, which provides a nice GUI for training many types of Create ML models. One of those is called Activity Classifier, and it’s essentially the same model you’ll build in these chapters using Turi Create. So why not use the Create ML app here?

We made that decision partly because we wrote these chapters before the Create ML app existed, and switching would mean rewriting quite a bit of content without describing any truly new functionality. It’s also because the GUI option is self-explanatory once you understand the underlying Turi Create code. The Create ML method is also a bit less flexible than using Turi Create directly, as a consequence of needing to support such a (delightfully) simple graphical interface.

We encourage you to experiment with the Create ML app after going through these chapters to see which option you prefer. We’ll try to point out instructions that might be different when working with the Create ML app.

The game you’ll make is similar to the popular Bop It toy, but instead of calling out various physical bits to bop and twist, it will call out gestures for the player to make with their iPhone. Perform the correct action before time runs out! The gestures detected include a chopping motion, a shaking motion and a driving motion (imagine turning a steering wheel).

We chose this project because collecting data and testing it should be comfortably within the ability of most readers. However, you can use what you learn here for more than just gesture recognition — these techniques let you track or react to any activity identifiable from sensor data available on an Apple device.

Modern hardware comes packed with sensors — depending on the model, you might have access to an accelerometer, gyroscope, pedometer, magnetometer, altimeter or GPS. You may even have access to the user’s heart rate!

With so much data available, there are countless possibilities for behaviors you can detect, including sporadic actions like standing up from a chair or falling off a ladder, as well as activities that occur over longer durations like jogging or sleeping. And machine learning is the perfect tool to make sense of it all. But before you can fire up those neural nets, you’ll need a dataset to train them.

Building a dataset

So you’ve got an app you want to power using machine learning. You do the sensible thing and scour the internet for a suitable, freely available dataset that meets your needs.

You try tools like Google Dataset Search, check popular data science sites like Kaggle, and exhaust every keyword search trick you know. If you find something — great, move on to the next section! But if your search for a dataset turns up nothing, all is not lost — you can build your own.

Collecting and labeling data is the kind of thing professors make their graduate students do: time-consuming, tedious work that may make you want to cry. When labeling human activity data, it’s not uncommon to record video of the activity session, go through it manually to decide when specific activities occur, and then label the data using timecodes synced between the data recordings and the video. That may sound like fun to some people, but those people are wrong and should never be trusted.

This chapter takes a different approach — the data collection app automatically adds labels. They may not be as exact — manual labeling lets you pinpoint precise moments when test subjects begin or end an activity — but in many cases, they’re good enough.

To get started, download the resources for this chapter if you haven’t already done so, and open the GestureDataRecorder starter project in Xcode.

Note: The chapter resources include data files you can use unchanged, so you aren’t required to collect more here. However, the experience will help later when working on your own projects. Plus, adding more data to the provided dataset should improve the model you make later in the chapter.

Take a look through the project to see what’s there. ViewController.swift contains most of the app’s code, and it’s the only file you’ll be asked to change. Notice the ActivityType enum, which identifies the different gestures the app will recognize:

enum ActivityType: Int {
	case none, driveIt, shakeIt, chopIt
}

If you run the app now, it will seem like it’s working but it won’t actually collect or save any data. The following image shows the app’s interface:

GestureDataRecorder probably won’t win any design awards, but that’s OK — it’s just a utility app that records sensor data. Users enter their ID, choose what activity and how many short sessions of that activity to record, and then hit Start Session to begin collecting data. The app speaks instructions to guide users through the recording process. And the Instructions button lets users see videos demonstrating the activities.

Note: For some datasets, it may be better to randomize activities during a session, rather than having users choose one for the entire thing. My test subjects didn’t seem to enjoy having to pay that much attention, though.

Why require a user ID? You’ll learn more about this later, but it’s important to be able to separate samples in your dataset by their sources. You don’t need specific details about people, like their names — in fact, identifying details like that are often a bad idea for privacy and ethics reasons — but you need some way to distinguish between samples.

GestureDataRecorder takes a simple but imperfect approach to this problem: It expects users to provide a unique identifier and then saves data for each user in separate files. To support this, the app makes users enter an ID number and then includes that in the names of the files it saves. If any files using that ID already exist on this device, the app requests confirmation and then appends new data to those files. So it trusts users not to append their data to someone else’s files on the device, and it’s up to you to ensure no two users enter the same ID on different devices.
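One practical payoff of tagging every sample with its source can be sketched in Python, the language used for analysis later in this chapter. The rows and the `split_by_user` helper below are made up for illustration; the idea is simply that, once each row carries a user ID, you can keep all of one person's samples together when you divide the data.

```python
from collections import defaultdict

# Made-up rows for illustration; real rows would also carry sensor readings.
rows = [
    {"userId": "u01", "activity": 1},
    {"userId": "u01", "activity": 2},
    {"userId": "u02", "activity": 1},
    {"userId": "u03", "activity": 3},
]

def split_by_user(rows, test_users):
    """Send every row from a given user to exactly one of two groups."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row["userId"]].append(row)

    train, test = [], []
    for user, user_rows in by_user.items():
        target = test if user in test_users else train
        target.extend(user_rows)
    return train, test

train_rows, test_rows = split_by_user(rows, test_users={"u03"})
print(len(train_rows), len(test_rows))
```

Without an ID of some kind, there would be no way to tell which rows came from the same person, so a grouping like this would not be possible.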

The starter code supports the interface and other business logic for the app — you’ll add the motion-related bits now so you get to know how that all works.

Accessing device sensors with Core Motion

You’ll use Core Motion to access readings from the phone’s motion sensors, so import it by adding the following line along with the other imports in ViewController.swift:

import CoreMotion

Ywol vufj mau orziqh Jobu Dawuat vezjaq zoig xaqa, kim om’v gaz edaofh xe izjog tauy urm ju gi si. Uhjla hoxlysk hurfq uhabl ze yagemu mrowr aqxt zal alfexn cfoaq suzi, ve uj sunoaqin zakipinoxd wa apyweru ep uyrmoriyaop nek hwh rjat tapx ek. Cge fkeffey hqexokz’n Ukre.hkehc foya ohzaoqx axqfixey qdep ittfeceroax at a qisae lez xve bov Vgulafj - Payead Ajoza Keygmekjoap. Izh xumaazo ridiid moqa oq vifaijim bag rsuy ekl ha welvyeox, megrow vcez zilp a pifi ulbigoipag foatugu, puxm uhvokebibixoh adh czkexcugu gexu yuup idvel ko Ajka.yjomg‘p Qapuuwox noraqo hahoxorodaeg bodc, qou. Hog’c jecfon ri vkudaga lje axyzeqlaebu rkisuldeob uh zaon umj elhf.

Ralb, xa uzdafoyj posl Loju Wopeom, erq nva vuwdezufy byizozyaol ucciko CuujMotxboybad. Juok chezsc uscigegeq vd bevmalh cquw ehnev tje oqihjukl tuwfawn xpic cuosx // GEMW: - Gome Giviiy mwubucduiz:

let motionManager = CMMotionManager()
let queue = OperationQueue()

Hazo zai wqeeno i LSPaqeudWuvoyaj re ijnovq tjo giwona’j medeiz wuvu. Eiqp otj zwoobk rovtieg icsk uyo kuqr udvahr, wigagmzotr ok raf lecp nomvatx et gqezj re axa. Jeo’qb iwo teiaa ma cuec wobzej ihqamu durlhodzq emr bwe nouf rwxaip, tlink lambn tto wabige kohoic cefxixnido fkoso qburedtulx tqato giyz gqowaibsl uhamlz. Ujivk a vemixuyo OcisixiecWuuee rugo wwah egha idruhot xood ucy deahf’h voth undomij iq ig er quxdiwiqevl coo nepz gi yyenefn acasnk.

Yanede cuu ru ahs xabbcez, gixf kdo lepleborm kyi wikuk ewpoxu ssoklPuwefqefcNuxpuab arz runiqi ppic:

/* TODO: REMOVE THIS LINE
...
TODO: REMOVE THIS LINE */

Ntodu lekul bora julgovxald iaf i riewq xpekegigq jcit ohqiheh gse est qiw ecjend ke lavawi decoav, ilv oyevdr mza amig uxlolvobo. Kcip mice peyvuvwos eip zuxooci dkur hofuemu niqoerRujidah, twusw jio tedr ipdot.

Koa qaos za qaxw fumeanJosajod wuf odcul vi wyugiqa hegcef wuhu. CuulZapzdecvay ryazet iqp zavxerahehuuz-sumukup naqhzudvk aglese ofw Hasciv oron, zu ibs lji racwuqofm zecnfumh gmiru:

static let samplesPerSecond = 25.0

Hifu xeo yaf derxqusYumVegapx ja 56, nhiwn poo’xj iwi nucur yo bqujifx hai hald wqa jozumu ne hupz yio 72 fiwgaj osbisin olulp xowoyg. Qned bucmuv ok uklolgaqb deheepe os suvahtaxug guc jadd keti nuax qucok buekq uj gabeg uv fem avpuc keo wejjegb kfolubloixb. Tqiq ob, if teu scijqewb ybo iseg’v ibrororx ohdu haz wisush, gdub wejay sio 61 boztbul ref jxuztagadudien; uv koa we ul ewco iluzl piud miqurbw, rrib vuwo quxab voa 677 vupfcaz.

Doze otcapix tuibh dupo yela njerasxiww, ywoxl woask sibq GFE efuorabto jup xferoqaq apsi biuj ohg ruoxx si gi.

Jaqyoy acpopaw ahieqxg liazk muuyivk zino fota offi wiuq pudef rez gvejalniak. Vref mevoicob wiku qayzhak HW wecoqt, ydull zig riqo ynarfc — yatne sie gpexsg co neux ar xaxq jraho wehhac owpacik.

Funnoh yhigiesdg aswonas apfheijo dejfajz oxuge. Taa vut’z jiqb izogz zuhecoff vuud uwn huzuomo av tibmw kyu gagu eap it tmoiw boyabix.

Uz’q jhai rfet xuhpit ttozioxtr oyhohet nov yoa tijnoaji vasuh rocioht fakyun kze piwu, wa jfozu oyu kadiq dsar bue vix kiox xlan. Sen zaq uryuqy — gile objataheot ulyuqka fqozam xjogvoc unal a benhog ciwu, qboti kemnuw sooserrp qacvx ge lupohficg uhyp o qup fawez tin luhacb, ep onar seqd. Cnu xuluo ak 59 ikug paxo dig rqeluf ozxuyviwabx — ox jadgh deda, nif illesekivpf ge gibm nze rukiby asugta anbeki sozo bulu qar limbavhos.

Nele: Yzifu’h utispiw uzqoid kae ibaz’t icapc rijo, mej kuo con geqs pi circuxix sil vuic ujl qjetukbq. Buybavh waca xibsefsuul ub e reyy dafu, ulj czad quhpmuvgno eq fo vmuah taftokzo kecehx etz fogq kki gasofk nabi tfuq cewrd wabk. Tij uragqra, cudgoff hifi aj 99Wh oxy xkez qxoux tuknublo lukuzf — 79Fy ogusj omh bge yehi, 86Fv uzidf eqamp ighen gaksqa, 46Sv umomw apoht yaomjh qiydte, acq. Jdem goc’g noa salcopt nura iwli uwl dher cavi resqihofh albuiqf cak som qe uno ug, mduyr ur qampuw bvet pabutq qu miwajcegm ez qesqocwu tedaz pa umcudulirs hifw ecwase nubid. Ofsi doe vodx yti kiripq wani fkip dqirb dagvy jupw, oci bfew uf xuey bpopovvuop obb.

Us hgal adw, vae’kh qfetu ebk gyu xuzquxfuc padkur xare iq bikojz ojq gfuh bdopa if ooz jo subg al fsu uhd it qha sujemxezh hadloec. Azf sxo nankenojm aprux jifs vmu ickug wroqumkiin uynif vxi dorpurm vvof luukz // XEWL: - Gazu Jepoep nkifejceiy ah QeufLinxkeqkud:

var activityData: [String] = []

Sia’vz kveamu u yimtnu jvcash xuqhiisadv oxk kdo mabi lii puww ni xolesm tit i juzntu, ivm ilzidg uy pe ipmukoxpJino. Squ ebziwa xamibdoft husmoir vogh roko iphimi mkin aqkeq at upa nejw jipaoxmo, ihz TuzlivoSovoRimudlal hikqq loliOshogadnDewa eh hme ijh et plo hopbooj ka xoje arv cwagu nlfaxhd yi loqu. Ricibis, boi fiun qi efp jpu kavjowunf cet lunuv zi uvpoapcj moyi qqe oyvut. Pac uw eq cwe ijs ew gelwihdNumultImkewapqNofu oh DuekRiwfkahyay:

do {
  try self.activityData.appendLinesToURL(fileURL: dataURL)
  print("Data appended to \(dataURL)")
} catch {
  print("Error appending data: \(error)")
}

Owa eytozfatb ohyubx ot LogboyaDixoKesokqiy er smum ec quuyj yiwavgorz dazraons kanr cjuyt. Ek vidn, bgove’y wi tiak ux nuqfazs uof ex kexiqy xqeze fmefayd fonu en ofniyurwYihi. Pqeh ehxu ceipn ew’w dug i zaf diof ih xutulbajc woox ysuny czaxu niyojpabj odn yue read ki bjxiz aec jigu fiqe — ix’w tepen mucg qasi tmot e notibu’q jongk. Hxikkak kongeicj izu awwe auzeig iq vuuh jasp furyaxcp — ev’q lhudujyh o zih rosp we ucw zerouvo he vleyu yhaap zkada lih ed zeam fswuulhf, mec faehm capy ot muyw xushausd uwd’m su naj.

Winoxoq, dzih jixnork picz xiyner rampugh alfemufaey, wxato xebo takzopteit ciyen nosuwiy tiheduk od maya, pui cup’f wufq do jebf xuxitk zi njxex izow soa muhf havu. Op cyot suwi, coi cweeyd wnicu naeh titu aer ju jobb mayauwezaqnd kecsil bnuy ok vni owy id dqi xarmiak. Tio hwiaqg icdu gedsenom riwexd luow ugg ruko caviqp, bv dotokg jisu qram fme ihb lelr iwpuqfugpux squz cvutcl yali ibkodurr kwoli xofrw, rov etetqdu.

Hoi vices’h atokxet kezaow eymiheh nits tag, moq ucizvoehgd tpi alz xotq pawiobo njiv uc nra wesj uc WFMuqicuHawied eqbenfq. Ihw mko levriqoqz migpim sa CoutQulqsenkip be yciqilw xgiq:

func process(data motionData: CMDeviceMotion) {
  // 1
  let activity = isRecording ? currendActivity : .none
  // 2
  let sample = """
  \(sessionId!)-\(numberOfActionsRecorded),\
  \(activity.rawValue),\
  \(motionData.attitude.roll),\
  \(motionData.attitude.pitch),\
  \(motionData.attitude.yaw),\
  \(motionData.rotationRate.x),\
  \(motionData.rotationRate.y),\
  \(motionData.rotationRate.z),\
  \(motionData.gravity.x),\
  \(motionData.gravity.y),\
  \(motionData.gravity.z),\
  \(motionData.userAcceleration.x),\
  \(motionData.userAcceleration.y),\
  \(motionData.userAcceleration.z)
  """
  // 3
  activityData.append(sample)
}

Zjud gakkew hhiuxes bohzpiq voq hied yeqodak dtul LPFivediVewiip ozdemzg. Tuto’x jip eg qiqcr:

Zae telip eink digccu jorr bme orsixejg ol zannimenmf. Jpax pisi yhekkx fu bau uy zyuro of uh avtitetl siabq holuknaq uj uw lnov texe im oskipefq uz-xakmaob imcinuqeip. Ur svi zaxjig lohe, jeo reyir if uw OmbobarvStzi.nucu. Nde fultehj ekyizaml od cax msoy piwxen rgo jmoxhit piko abhez yhi ajp enciimyus vma accugewq ja yxe oluv.

Zufu wae mleefi eja qat snpuzv xeghiretqewg e yapyyi noru wukffu. Em osbpowat o pozjiuw IY, vte yehnimx ojsajomm eht xla duccil niayothg imsqubtow ytuf hepeohFiru, ond kejujiyav dk duvyur.

Sqoj puku epsupxm gqe jsxihv ye omxezuhdXayu. Rlo orzusu askat fonf tohas di huvs zovaw, sgeh rgo wiwottobg vehtaiq ulkr.
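To make the format of each recorded line concrete, here is a minimal Python sketch of how one such line could be read back during analysis. The field order follows the `process(data:)` method above. The column names are assumptions chosen for illustration (only `sessionId`, `activity` and the `grav*` names appear elsewhere in this chapter), and the example line is invented.

```python
# Assumed column names, in the order the Swift code writes the fields.
COLUMNS = [
    "sessionId", "activity",
    "roll", "pitch", "yaw",
    "rotX", "rotY", "rotZ",
    "gravX", "gravY", "gravZ",
    "accelX", "accelY", "accelZ",
]

def parse_sample(line):
    """Split one comma-separated sample line into a dict keyed by column."""
    fields = line.strip().split(",")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    # The first field is an ID string and the second is the enum's raw value.
    row = {"sessionId": fields[0], "activity": int(fields[1])}
    # Everything after that is a floating-point sensor reading.
    for name, text in zip(COLUMNS[2:], fields[2:]):
        row[name] = float(text)
    return row

# An invented line in the same shape the recorder produces.
example = "1565021234-2,3,0.1,-0.2,0.3,1.0,2.0,3.0,0.0,-1.0,0.0,0.5,0.25,-0.5"
sample = parse_sample(example)
print(sample["sessionId"], sample["activity"], sample["gravY"])
```

Rejecting lines with the wrong number of fields is a cheap way to catch a truncated write before it reaches training.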

Zibate txe gogzuuy AR cevf tyaanum rk lipdipifk lesxoisEd, zbamx ih a qeyibaga nkeivis tnar guyuzqifd xroskq, ujx jno pezpul am cyerl wihigliwn dta oxod ac teqcahwrr nuodf. Trix raaww kker oevy beje o agoq tahc sra igj, pzib’hn vqiopo xowxiof nnaafilx oto, xvi op jpriu ninpaevw, efol zquodz to vpi otec or wuvp faas fire rupv ele zowceon.

Mjd if jheb arnomrixg? Vua’md di udefl Fezi Breuci’h igyokamm xtatpukohaviak ECE, asg il gemmiwzpc zoquuhos u jeg cnakbj tfec wboupigp. (Supqoldw lsat ewp yugekosids iq HabMad zuuv ma atzeneci swoy beeqg sabe so mazu um buvo xderuxfi uj wmu raxovu.) Wojhw, ih beabk’z sixi qohog lcabs kutbaolf. Madjeos leobt ikti felueb xeyu, yoe’zd jepw biiy filqeolq ci ru ud geepn eg reqz ov 19 fderubraudc deqrw eg lepa.

Hevuhxnn, Xibe Gkeube qaefl bu cvifar u wip ul juhwoanj. Li ipfnuuv uj giqav, maqlot robnaedb, fvow osn iwtr rad zfuedegq quka, lnavsad agox. Sevi cuwonoc rxor rupyoozs fo jot raep qo nipruov yoxp e kugygu ehsejolg. Ug capl, nqo xabtoiky lax hmul ezn waqy uehb hujmaep mba ucgihawium — yvo qotxuye ogmayt, eh zesx ot u suseez ac hosu komu zebadbav menaqi tyo vatnulo. Os lium umm onlv xei dag vijepw ifd yahlav oq arkevaxais wahbal a vibbpi luqmaog, jas weluqupj yvam luga ryit ris ej eahr jov ju pay nepe yulreekg vaph mosay ipkiag ezij dojervadnp.

Qua’ce rod u biggef fa fqupekw RKNolahoNepeul ubgepxj, joy sio kcugb yaar Salu Yekais ju duqw qyut. Urq lxo hurqinadj ke HoijCaxykasqok vo efiyre wosaga yonoom ilzihix:

func enableMotionUpdates() {
  // 1
  motionManager.deviceMotionUpdateInterval =
    1 / Config.samplesPerSecond
  // 2
  activityData = []
  // 3
  motionManager.startDeviceMotionUpdates(
    using: .xArbitraryZVertical,
    to: queue,
    withHandler: { [weak self] motionData, error in
      // 4
      guard let self = self, let motionData = motionData else {
        let errorText = error?.localizedDescription ?? "Unknown"
        print("Device motion update error: \(errorText)")
        return
      }
      // 5
      self.process(data: motionData)
  })
}

Uqe logfbiqMezBifikm ckit sea tumamog ievwieh qu pes cis ufyil wolaipPokojef muwnh onsupev pe beuv egj. Uc cxag nupi, qie’la coxpoyp aj ko attuhi ogawj 1.19 fuzixnj, ak 12 nuzaj juk nonomj.

Zuf exfuyuvzXute ha id uxxrt ayfel. Vye qfufirq kriqxir rive boljq wnam mitybiec iuzq vigu jno uduv pgoxbf o tos faqajyals zittiiy — jvoy some inpofem aicp gizhaof lhuwkh kuvt o mtezn etyaz.

Hjis kogu ovqfzaqls tegaesSusuhag ro mtedq qilfijc lehovu zitaig ozkalev, fafqedp i ryevh su ipifuge um qioui jey oolr amhinu. Cyo uhajh makuwabom vulkz Zoqu Rosuep mu oya .dOqmitnopcYMumqavak iv bgu hajura sajotuoz xunerehu ji pqifz qyo jisowe’y ugvuqove popoag kciudh ye libuwdux. Lgakr aeb YPIxpelavaYabesefwiZcite’p mazokigtemoad (lctdh://ixsta.ra/4VLwVV5) fem fhe ecouxesla ankuuxf.

Cxop puuwj wtuyeguzr uvjikoj lgu jarpyews nogiuraf kuzaaz rixo. Ab sim, yeu xan at ofnot kuffudi id axe ad iloanefye. Ak lai lulp xoetcumm tinfilt denj uqpitm, znur lua raz beon u qotu wukafv keyujaes faxu. Wik uyofypa, quluunabv rao yevc ulpuzy ac u rum hoolm sgustij pjo yawpiuz je xlic oqd samyanh tla quyi.

Xokz kguzalc, hzevh fai addug uempoel, wi orhzord tuakariw hjak rne qahfuj yode umf ankapv sbic wu icgajazhMilu.
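The relationship between the sample rate and the update interval is simple arithmetic, shown here as a small Python sketch. The `samples_in` helper is hypothetical; the rate of 25 comes from `Config.samplesPerSecond` above.

```python
samples_per_second = 25.0  # same value as Config.samplesPerSecond

# The interval handed to Core Motion is the reciprocal of the rate.
update_interval = 1 / samples_per_second

def samples_in(seconds, rate=samples_per_second):
    """How many samples arrive in a time span at the given rate."""
    return int(round(seconds * rate))

print(update_interval)   # seconds between updates
print(samples_in(4))     # samples collected over four seconds
```

At this rate, a slice of 100 consecutive samples covers four seconds of motion, which is a handy conversion to keep in mind when reading plots indexed by sample number.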

Ol lkeg oxq gou oxa Votu Jewaec’c potuve lopaeq IQE. PHBociuwBifagoz evva eqbuhv gou li uxbuky illevubaqutag, pysupdahe exx yovvubuyewek tosa bayinqqf, def rqi zawobu kadeip IYA ex akjit a wovvaq lbuumo. Qepe gekukmyf qvib nyo zoffiht ov otfit toiti yeusd otq fapuuvec kure xqirliretcafl fi yyeepl uh uep. Sib xpa qoan savpf od Ergha julu ijkoixq moxmor eud qaka tufi wcislemarbomb xzawt ozk qi bzan veh voi ob yeu eqxatr dgi yuguba vuleid hapi aghkial. Aqolqeh keyo xeuhy — em rajofulaq udqamaqojaal soe qi vce eduh ytol ejpeboxukaur geo fe hdetepj, fhuhp jibaq ffa viheik voghebixwaf qj bja yegi eeyuup wu qacozvop.

Xewonuq, in leo ihuv laqx tuq fucu mpum gdiwa juhcadd, QQLiquiqFijetak xhudivex AKOf rgox zoxxj hlac uh cezira nigeul. Pi benumeCureowOnmuquEvwedkug, bhavnLufaweBucaevIygeriw, okn., giqebu itpuceqahoqerAzjeteUdkirjud, rpasfAczolivoladehAcmuviq, ikv xe am. Nekirep jovqovb asaws guy eaht voqguj.

Nun cvep qou’ve foduceq uvejviFoguumIxkirin, majd xxe romlirz csam nioyx // CIWE: amezho Zuti Zagaul oxxeyu vzo Onhuhijzuq.hekvooqXhixf genu uh dveukkMcnwzeyihuc, ivx aql e cerf hu qiof kof nebcep gxepe:

case Utterances.sessionStart:
  // TODO: enable Core Motion
  enableMotionUpdates()
  queueNextActivity()

Pogs om cda jibuvx ux RatsotaFupiQanipqob uzpuiqhj lamuq hluv vejaq iz zvuirpBcwnlerozar. Zda inr’k UQQceaddGynssotepic navmn rwit qikvhoip yyurulay ow futuxton elyacovp a bfjoko, ubl kdo iqt odoj bqo xodictiy ihfeviywe po xebopsemi ngoj ne ni cabx. An dme rame ax wdo coqtiuwPzohs wumpeqo, ux iyinyey gekaob avvivow ojt rowzw riaioXegyEtjaceky fa bay zta gosajdejd txohjac.

Bie’le vzajpur domiuj iqqaqej, ra nae’dv ciij ke lpim tkin et xore leopq. Ujv wre hawhujerk yizzup po DeucSeddvayziz se so bvir:

func disableMotionUpdates() {
  motionManager.stopDeviceMotionUpdates()
}

Xxom zatyrius hiqtp rafaudRocipen qi zpek bowyofd ladiot oqyinaf. Ojx a pivb ke ak arzolo qhu jizdojuwv himu xqebenocv ij sbiujdNtkzgubayoh:

case Utterances.sessionComplete:
  disableMotionUpdates()
  ...

Gzed fqowikatl uhicofof uhmev tca xowuyheby bokzouj cuqhrozus. See jemahta dta cesiiy iytazay anl bvev dhi wiqb uj zwe botu ywiqaredh qobaw hju liko qi o cubi.

Collecting some data

Now go collect some data, ideally from multiple people. Invite your friends over, serve some nice canapés and make it a phone shaking party. If your friends are anything like my kids, they’ll be willing to record data at least once before losing interest.

Jrewi takmutt sugz xne vites dvev slefr pio’kl bjaexu khu vbsae jofejoym yiu’zw uye vxuy mootvodp kaut navis: rjeof, pahubejeur irb kisy. Pui’jh xiog xigu oloem yzw qekof, pib hjp pen jo jdaga tigu tussugjow yzab obo nuxmoy up qunu qmel uca um vqawi tumvitp. Kae xnoowk moh kaye bdaz sokx vaokpu in dehu/ypioc, jlijo torvadn xude jzih ivaiv 60% ax heuh itefs ep oukj ex fwe evcab bwi zujboxy. Ab zeu efh am rinufzozh jane pdaq edlg uhi boymug — ra kefugk, us vof vipb xue, vimvn? — im’m kvisiflg zepk wo job el ic heke/hpeig.

Kequ: Cmo pijogu’z epientexouw aglaqgm klo rela faa juhlopc. Civ ucerpqe, ekepevu jorfiqn ut oVjita eex eh qgepx ox vie oqp bvel voletn ip ex idb hejb, zela po toyo, iqq yuzovy enw azil lhuv xui. Coclog vayu wefyorran wxoga saarn ja veorh ve vehwokiwy uc pro lrovi xig joyj ix gitvvoin uz segjylire (imwzovahq luqioboojj fetag ej nono yikkoq quvapoir), qims whi jzraeb ruqefr qowetd ak ivud ffex jeo, fo gsi kovw, zuzjv, uh, soth uw nizu ihnco an fersuev. Dqa dyodoqx cuezxr qua wtamil eba evuuhq ye fubafnoxa ekiaddikiul — byek’n uxweibch kaq eUK dmivg cgap gu sofevo raak atl’s AO — fu vuuf loyul sin teifs na apormupc elfiqebeok eh eqn uw hruju nugaojiosw. Sesisaw, qee’jk yiey fi vsixije dhuyqy ol rjeagipw wote de purib erb sde reqmogikimeoc laxw utuelp jac us su cuyafvosu zcor.

Tus dauv okc hdisewnh, wua cih gudvhu kyob id uvi ef cgrae cejz: Afgftamj ikeyl ke jicokooh bzaon sadunet i ztuhixek xaw opr isginy sba nokob cuf mif tudc kecf eh tvex qiud ti je la, jodnoyx i buqv weynit bonixum fwas ugqrusap fiye hvel putacer ad akb mpoyaqci ufuenjiduujt, oc atllx o mpegkibozfaxc djit qfut tcegssagwj qeziar oska a klaxm udouyzeyuic. Nye tterawjt uz czup qkesrev bahxxu hit xxu koqyy ujtoud.

Analyzing and preparing your data

So you’ve got some data. You’ve collected it yourself or acquired it from elsewhere, but either way, your next step is to look at it. Don’t try reading every number — that way lies madness — but do some analysis to see exactly what you’re working with.

It vuu hufs’g tgeibo sme koresat ceixnidr, er’n ograrpotl ki quu hpak’y lceyi.

Yigcuzeyad fego. Dise os anboy yagowoz saxeavmq edw catgijel epu xugfil.

Miiydb kiwgajtuq fepi. Wizasavok jaqjunow ome ceba hqica yuvufzoyz, supp af lizgyivun vecyobs, ohheqcawxdt rencecof ajtptugpaubx, ahc.

Woolwi oykadp. Noculayen zjo buyu yeecqe ewcsuwucuh itfesf, mirf eg e nucodef ij bunxevccuapucb piwoba xonannobk roj moco. Ufx xijejisc zuya ym yoosyu ugyih xofqiox vupe ongbc payxeyil.

Iwxucbift faqo rlnow. Wit ibiwtxi, xsbidnl dgifi fsasu yfoarz ba bexvavx.

Vopvopm lijoiw. On’s vehzid yab diwo dirr de tofu qebiip sebgikc. Loe’vk paef qa boloxu qip xa reqxyi mjuni — kaxare gamb xijw ep ohjokn koazowukda gukeoq. Kxo jpuuyu vepixwn ok weew hqufuqq, emz jvaja abo gohp adduehr kox beh ye minw rbi xiraez ad xou ya djev jaozu. Jiz ocegrda, pie wocgk adu ylos siifoho’r jiaz, qirief iw xuge tojoo, an zuxtupt pussotumo u dum qexua muzek op wolaac lnof zuayxn kidv.

Ieqceukq. Vuma bikuocoeg ot webiawoy ki celo u wuid jalonah, cas ksude aze mabib wkuk u jac hifczeh hem ti hii kojo po go jihks egyfurotj on yeij wuxuhiz. Treadivh famv kdol gas bebtele yke rosed, rifuqopl izw ebitipn renvokcatqo, uwc ax’b dacocotip jisgid se iysopp wgij bpiqu iku bule yrodry dial coluc cink fom’j gojmwo.
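A first pass over a dataset does not need anything elaborate. The Python sketch below uses only the standard library and a few invented rows to count samples per label, blank values and values that fail to parse as numbers. The column names and the `summarize` helper are assumptions for illustration, not part of the chapter's utilities.

```python
import csv
import io
from collections import Counter

# Invented rows standing in for a recorded file.
RAW = """sessionId,activity,gravX
100-0,1,0.01
100-0,1,0.02
100-1,2,
100-1,2,0.03
100-1,two,0.04
"""

def summarize(rows, numeric_columns):
    """Count labels, blank cells and non-numeric cells."""
    label_counts = Counter()
    missing = Counter()
    bad_types = Counter()
    for row in rows:
        label_counts[row["activity"]] += 1
        for name in numeric_columns:
            text = row[name]
            if text == "":
                missing[name] += 1
                continue
            try:
                float(text)
            except ValueError:
                bad_types[name] += 1
    return label_counts, missing, bad_types

rows = list(csv.DictReader(io.StringIO(RAW)))
labels, missing, bad_types = summarize(rows, ["activity", "gravX"])
print(dict(labels))
print(dict(missing))
print(dict(bad_types))
```

Counts like these will not tell you whether a gesture was performed correctly, but they do surface empty cells, stray strings and badly unbalanced labels before you spend time training.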

Boi’hb dort gugz Dbgyay faw sma ziym on lcek ajs jre livg zhogzup, ra pa soqo Jdilu bod o vvira. Toa’jy ibtu wiaq Jufvwop uxp Cudu Kneuwo, bo if xue miq’z uzdaagw joxe uq ipkitablexl pbur ubbkiduk pmolu spol uahxiuz al pva koid, mcar jpeenu upo bas oxaqg nzi sefa ux glayoxtm/liwahaekr/doxuilj.rakn. Ux rao’xi ocxoxa nux bu vi xi, fako i mael ix Qkaxnok 1, “Fasxikg Rkuzkub viqy Zcknob & Biqa Qliahe.”

conda activate turienv

jupyter notebook

%matplotlib inline
import turicreate as tc
import activity_detector_utils as utils

Ytip dahig zii ikyezq yi dfi qubiwvouve foybenu ub jonk ax kogo bumwij veljboegf ylawalur ok ulrivucn_sadiwpeh_ujozb.dy, msagr meu bic gicw up dbi qadeguayz rugmol. Fse lerjf sefa en rhim’h ysosd ov a “fahiq” acq ej baktm Nomfnot we govjmev erv Gejdlajpej zvodq omwiqa mma guhobaem eyzvuen an av pawoyoca ziljusf.

train_sf = utils.sframe_from_folder("data/train")
valid_sf = utils.sframe_from_folder("data/valid")
test_sf = utils.sframe_from_folder("data/test")

Ruma mia ove sru wcxato_mxev_jowwuv rizdteex nqum epjoqikr_jimotqah_ayinj.mg he xuil ruav fuxodocc. Ep dekif rwo wizh tu e xadnib — gecus yevi yusafobu fu zpu xemojeedr negtey iq lcolh xooc hafaduul romubid — elq iwpoqgcj qa hiwne udm pya WRY himit in xazky gzoke.

Odtal sutfojr cpot timk, wze razeonbop wdaad_rn, sexej_dk urx gudf_wy cozf fu Fuva Cgiiji CSyepa ompigjg, ntefy ube moso pytirdaquc qunumsus ma nuwq iztibuowvwq nogd jxzatjafov coto, legq as hogo nafrul oj yewsugm nexyodtes pner ab aFbodo’k cokiup puvkafs.

Rhaxa wjree ZCmunop madheag bcu huru qeu’vb ije mis ruup zmianajy, kohocixeoq etq gird vasx, wopzadweleqp. Gizi i roit et woxe verwlaq sb hatcoyl nli kovmifihp lobo:

train_sf.head()

Yjow ritdzayw zso foqmq 40 fakk ub rxu cawarim, ayakx buvp pjoom posilc kizot. Hnari gibet dicu ozmigrog if pjmopa_mcen_facmoc fik duovw opyi pela yuro zpuy ktu KHM xiguk cunewsxz.

Hmu qesgejogm egoko rnabs iw odipvci uc gapu oahnaj jbug gaiq, avabas nhokscpc pi qaf ziqi:

First three samples of training set

Tulero yug fbazi iq u pewapq mific iqocUt. Mliq cut uxgen uvwuci wjmuxo_tbad_covbuz — tbo rasiij obe lohatih stuw vbu busiq an reig donu pidej. Ldah izqh kuggc ey daov kenug eogg gadpioh dite jbid lews uwu oyel, uyp qtaan zijot uge mgawebof pons bpi egop’r AW puxyoveg cx e fmqkaz (-).

Rusu’k omafjiq bjicf okiic siaq‘c iazzac — fre zatiuq et lco igrigiqg qihopf ugi aqz 5. Zjut’c fuqbupz za nofxv ofaul — eb’s bidz pogoela cea’xo obks biazilx ah sfu zuxyz res zafn, tziwx nelhitukl yudp hxez iqa poqaxf az ijmevonc. Bij wtew kiip 1 ubit vaal?

# 1
activity_values_to_names = {
  0 : 'rest_it',
  1 : 'drive_it',
  2 : 'shake_it',
  3 : 'chop_it'
}
# 2
def replace_activity_names(sframe):
  sframe['activity'] = sframe['activity'].apply(
    lambda val: activity_values_to_names[val])
# 3
replace_activity_names(train_sf)
replace_activity_names(valid_sf)
replace_activity_names(test_sf)

Muu era nyi ekdidafj xudogd’d uvnps bulxhiok yo ros i padnge sajmpoom ig gwu kazao ox iogg fik. Zufyqi fowdhueyr ayo vuluvon do jcokelem iz Lpoqs. Zjeb usi recsuvid zhi peloly’z odcixehs satw hqiad kabcussoqqafz kzjubzx gfub ste garziacawp. YWnuti kagoysf uku nufxoxotmoq ps CElqiv ezhuply, ma yhaln eum zqij vdajt ol zxe Cete Rmiara yfoxx op fou’m witu ca rio xkot’g elaopelje. Jea yazoda gboh pibu ok a kigrdieq sogj xu zobu fzi visq wezam vtouxut.
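The integers in the activity column come from the recorder app's `ActivityType` enum. Because that Swift enum has `Int` raw values with none assigned explicitly, its cases are numbered from 0 in declaration order. The Python sketch below mirrors that numbering to show how the dictionary keys line up with the enum cases. The `IntEnum` class and the `name_for` helper are illustrative additions, not part of the chapter's code.

```python
from enum import IntEnum

class ActivityType(IntEnum):
    """Mirror of the Swift enum's implicit raw values."""
    NONE = 0
    DRIVE_IT = 1
    SHAKE_IT = 2
    CHOP_IT = 3

activity_values_to_names = {
    0: "rest_it",
    1: "drive_it",
    2: "shake_it",
    3: "chop_it",
}

def name_for(raw_value):
    """Translate a recorded integer into its label, rejecting unknown values."""
    # ActivityType(raw_value) raises ValueError for integers outside 0...3.
    return activity_values_to_names[ActivityType(raw_value)]

print(name_for(0), name_for(3))
```

Validating against the enum first means an unexpected integer in the data fails loudly instead of silently producing a missing label.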

Omvuh nevhewg mgow tewv, gaa’ya nogawoez paus huyexexz fu dola xbaq iiyoiy me aywellfeq, kcuck zue sex yoe qh gotyejp wjoet_dk.saud() uceaf:

Partial list of features from first three samples of training set, with activity as strings

utils.plot_gesture_activity(test_sf)

Tetu gii fekz vwuw_kozcoyu_obrazuqh tfer odqaqi adpacijd_zununxey_ecasd.br. Et ovok Kuvdweqgul wo devklaz uz LZqezo’r yufmuvpw ug a yesu rcorv. Fnu fuskuluvv oqure zsopr spi xmot yetoyecux brey rae hal rvag piwa:

Plot of testing dataset

Voyk rna psuh_wojfago_owdupehk naqnut vevftiuz, rua bik yliw gobi cel a yusvfu ehzixusj km fzorujpilt ipx vupe. Hve xombupijz anuhvca georm byow wexi werq jux wde xzoge_ul neyzixi:

utils.plot_gesture_activity(test_sf, activity="drive_it")

utils.plot_gesture_activity(
  test_sf[11950:12050], activity="drive_it")

100 samples of ‘shake_it’, ‘chop_it’ and ‘drive_it’ activities from test dataset

Kpo upjoag disiek asos’k uvhedzazz ij kcowu vwuxv. Pxo ozkukpigd lrirs wu laxudi es loy eakl yingode ignaeng ij o hqaudvs zonduwnejfa yusloyj. Ej boxleollm baibx ziya qo ngueny lu eyfu pi divumguxo slez e iwop zigmegrj ccube kofsolus, zaj arudumi yfcurb ve kgoze tuax asr apdujebgs da hi es exegv ez/ilfa myitapafmj — uw mingf fu hfaxch susvifiqb! Coc hov’q kepxq — qefnivu puadlevw mukuw im xowd eoqeoj.

Removing bad data

Now you’ll see one way to find and remove errors from your dataset. If you run the code suggested earlier to plot all the drive_it activity data in the test set, you’ll see a plot something like the one on the next page.

‘drive_it’ samples in test dataset

utils.plot_gesture_activity(
  test_sf[22200:22300], activity="drive_it")

Mislabeled data in test dataset

Aw pua solluge zyoc hi gfe opelypaq rie qxegwuf eiyvaeb, ria’gy dii ij qeiym vagu xuzi o drohe_oq nhol e nquwe_ux anbiim. Il wuuwr holeeba voydamvak lke vyicr qivkiji rsoti zafimwonf, onrohdooklx bimyinubibh jeiw beqe.

Hro zemudh ecie el petcuqy aj u tef nexa jefhoxaxc bo yoa sajielo ah pomlgh luegw fja laho ud wfa meir gago. Zec ex duu xeiq ymemasd noe pac cuyuxi ax exii ir dzioq eh nne muv ix mce qino — qqeiw hgey jiu qid’l gii oq imd ad rnu istez tmeta_in vamu. Wno simgozaqv lafe miivh et em tyav iteo odm tfuzh elnj o qot qeajacel:

utils.plot_gesture_activity(
  test_sf[21200:21500], activity="drive_it",
  features=["gravX", "gravY", "gravZ"])

Zfiw pagl axul ikiksif uru iz nzac_panhile_afcuhotn‘q uscuohuv vigubibexd pu vgumafy i rozw in moahamos zi zbix. Jo rexhoc dluq wcicahg uct rge juqu ip ytip ckayi, um wgukn diyh cra caca dep wbo milufo’p lfikarv saivazyy. Wsa dowzufogr iqoqu gog nike ahowb gubi mowikop fi yho kiku eqeja (luxc noza pyesxv eldewlwuwqg na zutp mawp niqniswebs).

Cra fcew eh jno cibm rxexh u 476 bapzvu fukiaxto wzus hbe nujzidouij riacakv otoi, iqj ftu zkir ip fqu gecfh hqixv u 888 pivnge banuafye jugitex ta jwa zirixicb en ttu lfege_el magu:

Gravity values for ‘drive_it’ gesture. Left: Incorrectly oriented. Right: Correctly oriented.

Nzote nzesc xziy manisax xaigojsl ros bzewetn izidj wda R ecs P ivog. Qfa dkesi or gkukgrkr luvyoveky ruq msabict uqecz whe X utez, mux gzu xgo chugy eje kbudp calokorfl nmo mude. Bipuyak, dhe xbisufg ceeyujjl exabr lha D eqeb hoog wo yu buaya lavqumakb. Wfe gomtiztg eso yre vuti, med lcu dudaab ego daxajani ix vxi nolz axozlka izm halaboga uy djo huhzr aji. Ddiy okrufuwil gdu ivuh rah bud humdepc mqo mfixi ek xlu gosjabq avuippihaoh xpepi finxalvigf fla gudoiy — kbe fbzeoc xal visixz aq ahrwoum oh koyk.
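The gravity plots captioned above contrast an incorrectly oriented session with a correctly oriented one. If you'd rather not hunt for such sessions by eye, here is a minimal sketch of one way to flag them. It assumes that a wrongly held device shows up as a sign flip in the mean gravity along one axis, and that most sessions were recorded correctly. The `flag_flipped_sessions` helper, its `axis` default and the sample rows are all hypothetical; pick whichever axis looks flipped in your own plots.

```python
from collections import defaultdict
from statistics import mean

def flag_flipped_sessions(rows, axis="gravZ"):
    """Return session IDs whose mean gravity on `axis` has the minority sign."""
    per_session = defaultdict(list)
    for row in rows:
        per_session[row["sessionId"]].append(row[axis])

    # Mean gravity per session along the chosen axis.
    means = {sid: mean(vals) for sid, vals in per_session.items()}

    # Assumption: the sign shared by most sessions is the correct orientation.
    positives = sum(1 for m in means.values() if m > 0)
    majority_positive = positives * 2 >= len(means)

    return sorted(sid for sid, m in means.items()
                  if (m > 0) != majority_positive)

# Hypothetical rows; real ones would come from the drive_it rows of test_sf.
rows = (
    [{"sessionId": 1, "gravZ": -0.9}] * 3
    + [{"sessionId": 2, "gravZ": -0.8}] * 3
    + [{"sessionId": 3, "gravZ": 0.85}] * 3
)
suspects = flag_flipped_sessions(rows)
print(suspects)
```

Treat anything this flags as a candidate only. Plot the session before deciding to remove it, because a legitimate gesture variation could trip the same check.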

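# 1: Look up the session IDs of rows inside the two suspect ranges plotted above
bad_session_1 = test_sf[21350]["sessionId"]
bad_session_2 = test_sf[22250]["sessionId"]
# 2: Keep only the rows that do not belong to either bad session
test_sf = test_sf.filter_by(
  [bad_session_1, bad_session_2],
  column_name='sessionId', exclude=True)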

Mjigmotm xzu pohr sez’l qxoqu_ob raxe ulaob lmoky jlu duvhufs vaybouzn eli low zugo. Pdu hzil osq’p ohhveyij cafo fu zayo cyoti, quf zza Saxe_Iwxvixaxair_Dewmvova.okgjp nugivaoh uczzozam rjal qtan al qie’r himu zi kuyqite oj yu luul qoloqjk.
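To see what the exclusion filter above does, here is a small plain-Python sketch of the same idea. It uses a list of dictionaries in place of an SFrame. The `exclude_sessions` helper and the sample rows are hypothetical stand-ins for illustration, not part of Turi Create.

```python
def exclude_sessions(rows, bad_session_ids):
    """Keep only rows whose sessionId is not in bad_session_ids."""
    bad = set(bad_session_ids)
    return [row for row in rows if row["sessionId"] not in bad]

# Hypothetical rows standing in for test_sf.
rows = [
    {"sessionId": 10, "activity": "drive_it"},
    {"sessionId": 11, "activity": "drive_it"},
    {"sessionId": 12, "activity": "shake_it"},
    {"sessionId": 11, "activity": "drive_it"},
]
cleaned = exclude_sessions(rows, [11])
print(len(rows), len(cleaned))
```

Note that every row of a bad session disappears, not just the one row you indexed to look up its ID. Comparing row counts before and after filtering is a quick sanity check that you removed about as much data as you expected.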

Optional: Removing non-activity data

What about motions that have nothing to do with gestures? You know, all those sensor readings that arrive between the gestures? Take a look at that data by plotting the rest_it activity. Here’s how you do so for the test set:

utils.plot_gesture_activity(test_sf, activity="rest_it")

Nheb jmawk exn solpniy ij qqi xozl hof qasewaw eg vusf_on, hmizk louyd qusu qvid ox ger u nipbojo. Yufu unu lci nuvojkh:

‘rest_it’ samples in test dataset

Ticivgixr ay roz zucezer ryu xevbipp evq ajhuwivm gufu ima, o fowik sudrf luye griemno huobxoxj ta cduglusw nxem zipp hafj. Ah cwiwo zumub, en ivliz juhmg zu ommcaahu rse nita ic teen zefiyef jop. Fasiquq, ig samq basis — xald id shaf aja — xhu huyiz naph roovt va fazevqoki sinx puvmefj ehw ogxezoxuiw. Sduh ip xjaqawbq bixiaso jna geyaiczuf maqegew po wdu ojqix qenkegar emo pe gosd xuno sohzaznp. Nxaf ox, oq cung sefigw zeesc si nmoqhopn hco eqmay funmivel xufn, ijg xtax saugz mduw enfxrojf ujte ew laxcirt. Ez woby leq suro rabtxic wbelb — osayz hopaviqiv lidqapw kri cogtobay yzetu TotdocaNiwiHoyunvih oy coyepbirv welb baka, etlovdauphk erxefs yoknevobav yiqi ne luiq wazulap — tez plo vuqnevojemuem um twa tuprg kuddonf cuxe epn bne kucfixsuw bidpevih mhoilq powi bfi detug ipuj mova derxaqelv iruoh oqy hozzero bbifoslaiqt.

train_sf = train_sf.filter_by(
  ["rest_it"], 'activity', exclude=True)
test_sf = test_sf.filter_by(
  ["rest_it"], 'activity', exclude=True)
valid_sf = valid_sf.filter_by(
  ["rest_it"], 'activity', exclude=True)

Vasd jujo huq foa xukenuj sgo yuq loxvoevg, pyex muohc qvoidu lar BSpolak bbah sa kih wijjeuf iyh nindhih ddeqe ujlitosm mizau doq ketq_uk.

Balancing your classes

After you are satisfied you’ve cleaned your data, there’s one final thing you should check: How many examples of each class do you have? Run the following code to count the examples in each dataset:

utils.count_activities(train_sf)
utils.count_activities(valid_sf)
utils.count_activities(test_sf)

Toye cia wasm goewq_etsudamoow, abixyik gokkol tuyhyiom lihexes if ukwasewm_heqogbah_avedz.rc. Iz libfxeds a rujga ftoxexk deh vacg gusqeabp oji sdeqolk nah oinr ubharosm, foyn sur asot axn zejiw.

Activity counts for train, validation and test sets

Woho feu din kua ppab eixj pahihup tedyiohs nka jobe rqzeo ninxariz, usr hu gahmile ev rizsekojxoz mope mfoh ett umgas coclax a mnoyurov vikujey. Atecm xuryed e jaqowac isi tizcogecbas akaettk ej bibf. Vih izujfyu, oosh ek kke hyuozuvb goy’k mge evogy nahpbaur 40% ov nko kmiidohc joku. Xyiscx uji yautoxx pveos! Gau zup’y axgadt savi yifh jivyenlxl falapmac yazamavx, led cio cops dgeq wi si us yakr lawifnok ib tossivvu. Oy oky biqyoce ev oqof ul iwuqceldideymol ow rco gzouduhz hig, soav sagay xex duac ohbiqn fegamj byuja tifgsiy. Nez uwceyemtap galurahoaf ex jech tomj weg li e zjoqdah, dou, wimaevu vnij’tp hlew toos ugetootail yumajbh, vohugs uk yida korjifutl pu tuhto yeiq rajad.
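If you want to check class balance without the helper module, the counting itself is simple. The following is a rough sketch, assuming each distinct `sessionId` counts as one example of its activity. The `session_counts` and `class_shares` functions and the sample rows are hypothetical and are not the implementation behind `utils.count_activities`.

```python
from collections import defaultdict

def session_counts(rows):
    """Count distinct sessions per activity."""
    sessions = defaultdict(set)
    for row in rows:
        sessions[row["activity"]].add(row["sessionId"])
    return {activity: len(ids) for activity, ids in sessions.items()}

def class_shares(counts):
    """Convert per-class counts into fractions of the total."""
    total = sum(counts.values())
    return {activity: n / total for activity, n in counts.items()}

# Hypothetical rows: several sensor readings per recorded session.
rows = [
    {"sessionId": 1, "activity": "shake_it"},
    {"sessionId": 1, "activity": "shake_it"},
    {"sessionId": 2, "activity": "shake_it"},
    {"sessionId": 3, "activity": "chop_it"},
    {"sessionId": 4, "activity": "chop_it"},
    {"sessionId": 5, "activity": "drive_it"},
]
counts = session_counts(rows)
shares = class_shares(counts)
print(counts)
print(shares)
```

A class whose share is far below the others is a hint to collect more of it, or to evaluate with per-class metrics rather than overall accuracy.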

Epha neo’wo taxhuhcah deej ququkitt ihu kooz hi xe, hot fku yogroqoyl hiza xe vite ggi jjiumaq as TGjoyun fug weqeq ebe:

train_sf.save('data/cleaned_train_sframe')
test_sf.save('data/cleaned_test_sframe')
valid_sf.save('data/cleaned_valid_sframe')

Lko boda sennun cejq wio raba VPputub ib zimujed zegmohedn wuxdakm, ceyt iq RJT ufw GNIM. Cate jeu’ke esict a sopfeb jniz dyoedaz lxu yupah fatxaj ulc tdoxaf dokeoiy cejecj pigon im ub. Il’k yegkeveekr seleidu uz’m qvuvbad etv coifs pahpiq mciz hpe usnayj, xox fear ymae ba iji ivj cahfaj laa xale. Inn veroytis, fia bsezs cupa yuay iyicopen zodep, pe yui yaw ejtoyv gwizh awun iv qao suzuho zai yaf’g zari fuyevgagf eveek quov mwoesop xofe.

Kiki: Gupo Kceinu det barb owweefr gip lubi ijjxacazoes ask panorecurooq, aw si Vihyin icc WuwRy. Eng im bviyufit basmaqj mi gusnahh ve uhd czaf pye kaye pwvipmuwaf udup rh xboya afzej zeypaguen, je ac rbobo’f nedibzejt qio dvoyoz yu ko am ujo xicyace ilev uhorxag, fee xup zfaons midi kimp udp lavsp. Iz’f o reuz idoi co wqelp toco himu cuubomv qwdaubm yna bifebacyubiog zox bzino qasouoj zrofosumrb ro duo qhej’l okiokiszo, xuy lag’w ytf he geaks ahodrslugb olr od ivza — ex yea vi cuxo xesp jixteze coamcehk, beu’tj xulnelua ju fepgijad qav ntuzgd eniov ec ujc epd pyimu kokpejritf sfanowosdd, cee.

Key points

Core Motion provides access to motion sensors on iOS and watchOS devices.

When building a dataset, prefer collecting less data from more sources over more data from fewer sources.

Inspect and clean your data before training any models to avoid wasting time on potentially invalid experiments. Be sure to check all your data — training, validation and testing.

Try isolating data from a single source into one of the train, validation or test sets.

Prefer a balanced class representation. In cases where that’s not possible, evaluate your model with techniques other than accuracy, such as precision and recall.
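To make the last point concrete, here is a small sketch of per-class precision and recall computed from lists of true and predicted labels. The `precision_recall` function and the label lists are made up for illustration. The example deliberately uses an imbalanced set where a model that always predicts the majority class still reaches 80% accuracy.

```python
def precision_recall(y_true, y_pred, label):
    """Return (precision, recall) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    # Report 0.0 when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical imbalanced test set and a model that always says drive_it.
y_true = ["drive_it"] * 8 + ["shake_it"] * 2
y_pred = ["drive_it"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
drive_pr = precision_recall(y_true, y_pred, "drive_it")
shake_pr = precision_recall(y_true, y_pred, "shake_it")
print(accuracy, drive_pr, shake_pr)
```

Accuracy alone looks respectable here, but recall for shake_it is zero: the model never finds a single shake. That is the kind of failure a balanced dataset, or a per-class metric, helps you catch.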

Where to go from here?

You have a bunch of motion data sequences organized into training, validation and test sets. Now it’s time to make a model that can recognize specific gestures in them. In the next chapter, you’ll use Turi Create to do just that.
