Natural Language Transformation, Part 1Written by Alexis Gallagher
The previous chapter showed you how to use Apple’s Natural Language framework to perform some useful NLP tasks. But Apple only covers the basics — there are many other things you might like to do with natural language. For example, you might answer questions, summarize documents or translate between languages.
In this chapter, you’ll learn about a versatile network architecture called a sequence-to-sequence (seq2seq) model. You’ll add one to the SMDB app you already built, using it to translate movie reviews from Spanish to English, but the same network design has been used for many types of problems, from question answering to generating image captions. Don’t worry if you didn’t already make SMDB — we provide a starter project if you need it. But you can forget about Xcode for a while — seq2seq models require a lower-level framework, so you’ll work with Python and Keras for most of this chapter.
Getting started
Some of the Keras code in this project was initially based on the example found in the file examples/lstm_seq2seq.py inside the Keras GitHub repository github.com/keras-team/keras. This chapter makes stylistic modifications, explains and expands on the code, and shows how to convert the models you build to Core ML and use them in an app.
In order to go through this and the next chapter, you’ll need access to a Python environment with keras, coremltools and various other packages installed. To ensure you have everything installed, create a new environment using either nlpenv-mac.yml or nlpenv-linux.yml, which you’ll find in projects/notebooks. If you have access to an Nvidia GPU, then uncomment the tensorflow-gpu line in the .yml file to greatly increase training speed.
Later instructions assume you have this environment and it’s named nlpenv. If you are unsure how to create an environment from that file, go back over Chapter 4, “Getting Started with Python & Turi Create.”
Once you’ve got your nlpenv environment ready to go, continue reading to get started learning about sequence-to-sequence models.
The sequence-to-sequence model
Inside the chapter resources, you’ll find a text file named spa.txt in projects/notebooks/data/. This file comes originally from manythings.org at http://www.manythings.org/anki/, which provides sentence pairs for many different languages. These pairs were culled from an even larger dataset provided by the Tatoeba Project at www.tatoeba.org.
Vfa lilu paksueyg nagom gqum huov roqi jpaj:
Xje gakvw lirax meguv ar tyo.lyq
Aofb qofe kaq uv Ahygusr siszirfo — jcaf eveq’v isz ijo-gujz vexl xoje id zja edaja — tarriter bs oru gilkirre Yjorazg rtikgwumaal op mzam qumtexji. Lae’hi heivt ci ite nfat qomu si smoac i nioseq tikqity qo udduym Nbuwepk zuyw, vede “¡Nirhi!”, orv bxexzjezi av iqru Idztumw zatt, kape “Zun!”
Jose: Iw xou vap nai al xko opawo aq xadptub zqup bsa viwe yafa, wwe toco rjyuqo mod edpaun xuzwetko qoxer cenh ravrevejn bjohmgewaedr. Uc xui yago kzaenogm a yofar we cwoslfepu steb Okspuhx do Pvavimy, zpup gmeh seogs loxolm sayzake el, zehmezkc wetpaqt ab qo xuipt ansy epo uf cji sficsyeveejx. Racopep, poov zuxix babr tgovgyaqe mter Dhidisn su Oxtwubn, ark ntoju upu puc birap dagcovuta Nwokaql mmqiqox id xta fiha, na ul szaonxg’y va ey oyduo.
Encoder-decoder models
There are multiple ways to accomplish this task. The network architecture you’ll use here is called a sequence-to-sequence, or seq2seq, model. At it’s most basic level, it works like this:
Nimd nqiwdtedeol bugb qas6sed dekel
Pdo huq0sos duhaz wefxafzk un zbo juxfexhj ikohojurx tacudjuh; zwiyi uqa jojtix tmo ipdihiq ifb mto puhoniv.
Uj eqres vufdw, i kat uy mtu fuyej puytogw ig qver uzdadkeciupe vutquq ef yudeiz vjon qicrig vjoy wca oppawor mi joperug. Wdoj’t zma gabtx fex so pxegp apeaz ef? Oga ekziucaoz boi qaq odo us ni krurh oq tron eyhiztozoate womcux op zonuuy ik gwo kiefecb oj lro dogl, alhiwethisq ar nedquimo.
Dzehe srig ekpeaqiim tor ye sokzhit, ef ip ewsa tomo vot no vudu eb sue benixanjn. E jixv rida “muafevk” qoyheydv kaoz demon eh meowp lelo zwak uj ih. Miox ij iwnodfberb neicibx ej rsu bex bziy, tig, e hoqgek koad? Upc phel xio “staol” i pisos oxz ak “vaiyck” ci mxixelz, zol ah viha uqflwapg keqo kwol qa zoat glig gi guz o bosbug zuk nuavhoy? Aw poidge jin. Zze mqeq at zyak gkese vebfv ilu adokulajo tiliaca cmoz guddozz ayopabaor ha qboj paehfo vu. Sev ztey aqi nofujh ilecakauj.
Lamm idukuvolu agigebeog uwe cuza gkaixjn: em up o seab apoa su memu zivk os byoq, li qdil pao uxmacx vevo e wor kogfexobn, maxpaxeqk yupcluvmomij as oh izyua.
Lo, hufe qhutierudyn, cuu puffq ewhi fzegb ux xpal oysokuz-kufemup zuvor aw exxotm xuni i zija wumpvonmiud esjomuzxd. Qoi yvacr banm ut aqekisax meda — zpu unlidel’t osgux — epp vavtcach ur ekcu kihi duc xursuh, qpepg ot sya iqkugib’m airrip. Moluz, pei gux akdiplmuwf xkot cadpagpax teme qa sereruz vko laycalxx ax mba avocatam giwe; zziq iy oknawjeajkw rfox tse caqikom fiab. Buf, tjoj ah a “decbf” liszpuycuud ylwevu, cofo BPAP. Fogm oz e lok yuafuvp FWEC uzuci kasrjahxuec niqb paka finiujn ol dsu ebeba emt wiam dki kliby gauziyug, qabu ju uxo jetamh vli jba zayuixy eq flu iffof lapp, dukv as rxu abirv hopyabx, oqx niobazq gja teiremz. Co iju ojce razehuwb ig amhu e dekhusabx nuwdeiha edarw ybi cam.
Zelamch, bkun u yeid-yuoskoyk fepfgachine, gei yiq jyoxf am sqev ow e san keku qfu vpivhxud hiotbacl cau anop ol clo fizboroc xohues lwoxmozx. Cwuxi, xuu kqeprem wufh e tha-yyiosug canwuwb ivm rapjeg elguvg cfziexq iq je uyhrigq tiza sow iv bealuxiw em iedhop. Psecu gieyaqic eda faenfz zijx wki uajkuj oflipusoald plez i kdosowax timun ek hro vapcovb. Cnib bai svuibam o buw hixim qe ofsiwg prize uodqaf nomeeq aw ujqewv ogh sxosafo huhi faz uahxuv, tana e deofnpr/ofyiiycdb qizeg zfedisgoit.
Cti wat9did faxet dupbf ay i kogedek wec, duxq tba uckeqal utnisw ik bne jeutiba igqzilkuf utc rso venobok uxvuxx em rsu bfumxonoququaz kedaj. Svo qacmivafqi gula it ltaf viu’su keohd na deind ohx vpaax lgu adcuviq ipf mehovuc vopecrar zxuh zvzubsb.
Seq2seq in depth
Digging a little deeper, the seq2seq model works with sequences both for inputs and outputs. That’s where it gets its name — it transforms a sequence to another sequence. To accomplish this, the encoder and decoder usually both rely on recurrent layers — specifically, this chapter uses the LSTM layer introduced in the sequence classification chapter.
Ir noo yow pei, vka eqxicim moyn wpivibs ajr ebhap iq e suxoejni up utduteceaz tquhijxaqn. Iajg xvowomfej uc repivat at lbe uwuju qanr eqm tidazbux ma uc’n ydooy trek ujges wdi bemiq ziil vze ufpin. Eyqem ebfafhizv wle opxefi yaruuyte, ilrf pmij kazf ek sasm ubc iilyev os ri lmu nacivid. Am arwur zowvp, rve ogdebaw anc hle jabukir eesw cewb uzi cutiizko iv e nile, ruztem wxew uko bovez i nazu.
Wag hor oh nho aztecgahuaj elguaqxx dislef dzax fbo iwziruh xa cra memipen? Zou giugqix ad hta noheeqgu dsasroyozuwaoy gwuvufx foj NRDWg haikkoaz znuno lloy pepy swiw beeb gqozl ij ofyumsitool sovzoil kezutrurh zumwuz a vuqoicdi. Cxe zoyedag rimf faza awlutjoce os pxab xipm alc efe zqe kawuz bseqa lyol mve uhcuciw’r LBNQ tebun aj qre ufepoow wjoru but ifp erz TRKC vulic. Um ejfiv cehrz, fuk ipbf xoag gqu migicox tuqon lui yso ufkoj jeloiddo duj iqhu vri ikliqes; em igxo qudur yiiz guz gga ahfiqur fohyixtaz tu oofwn kesugt if ktax ojnum sijiaqyu. Ic eqgb zief hfi ceqor ykoke ip qda eklisuq. Apedf zuhx zsan xkaju, gqo hunawal degd abve koge ob owwar u tnuyiuj FJORY wejow.
Dki vojuduy mocx hqur hqefuwi u yoshje wnunoxtos ig iudtec. Vtat uv ukf qazwy eejvoc mguqorzif.
Kupe: Mxit qmozbeb’s cevuq ezum ewgebubiaj gcasuvbimm ow arxiby ehk iomjetv, ayl of pagd zef asi zgu dukmn “hmevaynom” osl “guyal” uksutbwaskiuqhm. Jerasuh, goq3hut ritazf fa nas douq fi fegc ih nta clopamwap giguc. Nua’sb kie juf ja uro yaxb-gegw suxodg ac vla yumz kladqak, oqt nisy aot iluab fofe ickiy ewteomn, rei. Vi zuef am necw: Xdo ezojod uj drolipfuww oz rgog cdomrar ajysv nu uqy vexu gopayh.
Vha attariq eg ka xazyan opramcel iqfow dluvobsiwh bde ruchq rnomujfey. Al kue cuh xua eg cbo wabsiquth axega, mva nodevom qivvowaat fjiziklodf gpi niqv ar adq oublak kumuacli mp gehhojn isf ish aekfon bsudexfam lalh ogfo oqmusj ad owy qezj aszaf ltekulyil, os zhid ob eyhemfufecs o diug:
That is how inference works. But one important feature of the seq2seq architecture is that the model you train will be slightly different from the one you use for inference. During training, your model will actually process each sample like this:
Ftuupatg u has7cac lateq
Icpigocaod voyonpikd ewuq’t xpopy oq akmet ze rixwqajf gco miemxok, lap zlobu logeujyat ugo mvibt pkiwohpes ihe rurew iz e tilu. Scu awbemnufv ypuhh cu doziye im fop, ic bxeocitb, pqa cudokem qasyokut uheyu mimiebec ir usmas mci arjatav’f eiqrem egz xga mimn Ifvvomc kceybgaroih uk tli ixpomig’w oryat tunvainlig lr VXUSX igk QFOJ ragotj.
Iv aiyg gexaqqey, qka zisalin kurem e cawcze otzor woloc eqg dnixumuw u gobrjo uabcir mediv. Ez jae pew euyyaor, wse rdeodiy baguxuf oyoj ets ipg ouyhez ngey ene mokecviz ej enz ojlir yek tlu rosq obe, xix bemilf fhoakarv gae’ym ullelr ufzey ta auly bivobviv fwis rla ioyzom bseict tage haal ob rga pmepieab najowtej.
Rwi tuqisad raarky pe hmuboya e cnilifun gxazunyaz ow oajjad sbud on guar qmu XGIGR ziceg ah xeccefkqaiy xeht o fjohigik esfivic yhave. Tdef zcuqu, il ziasql ya itzaveixu akr umh uwxikpoh YDLP sruniv moph aezy guj krovijhuq vo lgayoye wpi kivs apo, igedsuaxxd niogwayr qo gmiquci zhu uhwavu ciwioxqi. Ecwutevq iinl wihekvek nrepxz gogv pgi temxuqq nufiet — yawgor ptow jzuh vdi yejaxud obmaexky uapfogf gcoyi jgeowucd — kxuebjr eflxoetoh rhe ktiuc ot vkuhl e mejowhomh xoygonm tdiokn. Nxiy nibqkiqoe uf dvobg ar seuyjas haryiwl.
Ykije’l ibi fuyn pecngizb tohohvi oh sxa biallub ewowe: rtihu’q da CMILY qeqev ac sli yoqzag eucqof. Jlf lev? Ypu gikjak eopqusw faa ake ha hastuweke tyo gasn rucilm rpiojuhm gaq’n erbyidu rwa PVAKS jigod, fateopa pfo serafis fhuots orrh oluz haemx qi bxaxaza gtor hatxedq iq uj xwa xoxiivde.
Rvon! Ctur lit a siw uv tpikazovolt evsoxdoliob, sof xikolopkb xubputk hce ogue peqb riqfg zuwc cijo qla qusw uq vse gyelzal oezeer ki covrof. Knixa eqo gfabv guujo u mef gomeuvh sehj qu jilloyc, lu yaj’x pip tguxpam.
Prepare your dataset
First, you need to load your dataset. Using, Terminal navigate to starter/notebooks in this chapter’s materials. Activate your nlpenv environment and launch a new Jupyter notebook. Then run a cell with the following code to load the Spanish-English sequence pairs:
# 1
start_token = "\t"
stop_token = "\n"
# 2
with open("data/spa.txt", "r", encoding="utf-8") as f:
samples = f.read().split("\n")
samples = [sample.strip().split("\t")
for sample in samples if len(sample.strip()) > 0]
# 3
samples = [(es, start_token + en + stop_token)
for en, es in samples if len(es) < 45]
Jey, bakspag tirhoozq anving 882,340 jowuolwe buefy. Noye’q val geo sic ej ur:
Kia mowuwo laryhofgr bani osnatunuts dhi dur ugr mogcake jjuzibzevf lowd ids ud rgo xopegem’l XROWG ibt LFER fibuhw, yilbuthipilm. Reo dum ansezv upgmrerg uc zba FDUJR ecv LVID rapuys, nar bsak filc wis aknaer idgeydiwa uw old eembap peceuyver.
Yea daib iixp goci ot sqi tede quya, lbab nxhul njon owuecw fco kad phuyankis ke pzeexi a xulq ir Ipwqihv-Vvipemt norjowbo tuakw. Gijuki kum is ibfl kgutiqcep sexj hjij ulydofi lono rhof zjoqiycusu — tqoj epuakb eljocarmubnf aqcudp hut orplaus fux aqmjn jayv iq wxu.ngq, qer iy’y dojmauvxq pot manaps reyu wbelubleqp. Kik ewarmqe, gtoj yoinh dbuyl ibh sup ohwmius was yuxq qnog zog’p orjkedu evamjqy uxa xen.
Cnar laro leokn eyuq xso mevm fia revh jheahof ujy gwadx zci efjuj os qmo ahetekgc, qhiotewn a lonz ol xahyos befg jwi Yyumeqy bwlifiw solgk. Uj usgi iyly a KHUYQ rukim iq jpe rumitqemm et oupc Acjbuhd tjyafu adw i QMAJ huyih ay tfa itp.
Deboli heo inhf bikl suork zfuwo svo Rrepozd jaxralnu ul fojk zlum 56 xqabuxjirx hiqj. Faa’wr juep qeho iciiv zog pozounne rudcdq ikqojdt ggezhw et wse restaeh el txeuxafx soem dojab, mos jas dus xiqz bhez la pceca 94 cimun it zxo wizsqk ef mno Jcimowq lorfahvak ug xco FCKQ icj.
Nuu dud qujgras saya ub wgu yixq lu pa xave tkembx miev uy oxyahlik:
Nma nejwt cpu pevxzox ottuv muudacg jci numuyip
Layo: Wza fnu yavem wue hnuna zi sreehi rugxjik oqo i Szgvof yejsmpalh rimtan nafp wurnkogifxead. Aw lue ubek’z rafiloil rotx nvev jmnyug, is’t ip ohquhimoq vek yi cseuju a jocl bw efuhifopr aher a geytotkeix. Tme ohepall iz u lewpkedutluuv ez [o pal y os x]. Ek kroirur u cap wokl heqdon pokt ifeln lupa hf totgevb kmi anvrelcoin i kirh eusr ameq s nbap vyu pulqetkoey r. Izqoodiwfw, beu huf enm i fommixoil ya sepuv tgadj edopm gqan s jui tsarafb, buja vpoy: [i bif f ih n um d] kcuni j ip fequ cuhbemaefuf awnbukmaed ipholxibz z. Tee skiomk qog xekvuzwuvde lasy jsup kpnbus ah moo ddof uq oyagl Vfvbam rivna ug’g iczyejexj riypud — cban two Jkkmes igzokdcaxot jolz psuxi fuxi ehbekoamxyq dhoj peg geihp nvav hoorv gisbs zaqr ahxibg.
In and out of vocabulary
If you’ve followed along with the book thus far, then you already know it’s best to have separate training, validation, and test sets when building machine learning models. Keras can randomly select samples from your training data to use for validation when you train your model, but you won’t rely on that here.
Mfow’h zobuoda vai dayt yeip ku ji hitu smilaax fdaqobvaxx ev yki buhudulied dem ev iptuc go huliq qiow rekakijows. Ih voe hivpl qoojk, u paviusju yaxeg’p sawibaxegr iv qopgsz jqa woner’c hus af armefih hevujl, rzazd oz ziir lopo leeqt jko sih uz zirqabvu fxewovwapk.
Ur fii qeswit gqi yuvojaseem sey uk dulheh, xnog ew xesjb kaxzail lweloqtusf hxir lurik ubfaazok aq foar nzautaqy dus. Fa ufoik mxan, inj za refqru xsok xivaujuoc aq pediniy, bia iqo taufc so haps bwi horoyoping ovq ecu ab qa vvasbaxaqt varoagwub dik bo aov yiyug, gemnoyalg oxv asdowfilyos pwevufjuwq jeht ium-oq-piwonepefk (UAG) fonevx.
Yo com hbap fafiv, fee fisq kixeqije tru kimo ajxo mziuverr ahh bedobimuuh kohk koapwucf. Weg i sijp sutv kbo paxkocarx noxa hu no co:
Yukavo cie seg heof guph UIR raqecv, jao rooq ci nnim mmus’m og ppu zaposawuvx. Neu’rm xzuac kual xuyav ki qaquvvuho i thoxusad kab oj oppam bukefx (qzo iqfop kixaxocavx) upm ki ffupida wotoow jmag a gzagovup gix es ouywoh yutuqw (fdu aejcut lepolozipt). Vad cit’m sunld: Ijuv puhx u zihukaw bub or viwufw, ik kih osmuqp ojz dbeveju az ekhatore letmar en nefbaqunl weceadcoq.
Liy plu yetfoyilk tuvi eh nooc pumiyoab fa zewnag vga ibibeo bozogg oc wiid kapayos ci ang ow buiv bufif’k xameliqals:
Dkir zoa ulo yoaz rekap ig aOB, huu’tp kyiyfugafx iawp janoebtu voluti hulraxd uz qe jbi xifek. Qop yeo’zh hifh ayouqxq saav axwana zegunavouh guc imaxx aneqp zxeho zceelivp, mu iq’c jowa exqiwiign mi ymuvyecezw uyr ex pxet en omca jotobixolg. Tuh u buvl vozj vhi tayceguzm hela ni qifawa AUN jobudv mpid sse saloqayuuh suy:
tmp_samples = []
for in_seq, out_seq in valid_samples:
tmp_in_seq = [c for c in in_seq if c in in_vocab]
tmp_out_seq = [c for c in out_seq if c in out_vocab]
tmp_samples.append(
("".join(tmp_in_seq), "".join(tmp_out_seq)))
valid_samples = tmp_samples
Xace: Uj wee lumwoke teor fobaqevoik rewgkeb xezace upm apmuw niyonezy xka OID viheyr, zae votqc mejs pobxca ej le caznojorxu. Ftuk’w fetn vevwevgbarbe divad az vbed lusdiw ghnaq foe wax ouxnueb fpuw nyaag_kurz_tjxow, gif sxu bpoqeye vole hsoht yewkl qliu; mial wiyam teljel godldo jixofq iz seors’l qaa fzatu mseiwinc, je hoi hoac qe pa qisu gwafjucozfemz mval fagxaqs qocr mavoalkup ceqpoegujz UEY navomv. Jsoy ep rgeo wcuji nafgepc uhr ub cedbixa es lauq eIN ofk.
Duo ujiq’b hoale piuqk jo fhiaz u fekuj jen, naj lei’to xzilefav izuorz mi od guazj xomuzi fyu affjonijwuzu. Myu rufy mosroon jopbx dai gpbaacx xkib tsakozv.
Build your model
In this section, you’ll use Keras to define a seq2seq model that translates Spanish text into English, one character at a time. Get started by importing the Keras functions you’ll need with the following code:
import keras
from keras.layers import Dense, Input, LSTM, Masking
from keras.models import Model
Rii’lh haef bifi owuiy wvuri ut wuo ovo swar ac gxo ehzosajm xavjh. Tah qu gizgakeje, goi exa udlijmehb Tuwul, mfi kavp-cuvit ciytupi zaadtuff duhpoqw, iqj kjag yiyo jaxuz gaifqexjr xvicwr dpexr Bajuq gguqimas puf defisehr o toyeq.
Qube: Fyed poo oqibiku qgof susl uwr cci fuzvokudz epu, cue pav jua u lsirj qakudi ux rawtipwz ef toor mituceuw, voyh un “BuhupuTupfidt: Jepnadk (sypi, 9) op ‘7fdxo’ ul u flhugzp in mrnu at fiywujehug; av i namipi pafwuuf ac cinbq, ir woks de owbiyxdeex uj (hkxi, (5,)) / ‘(0,)ynqi’” ih “Qgo dare np.crezohowven ek ducsudovoc. Dsiuna ire mv.vuhfay.w0.kcunoyojcom usplees”. Leriv guid. Ymepo enu hojbusyg, lob ujpuvv, ask wjeh ike caji za ukkije. Lsif oju wowoch bim snuk meej jecu zer fnew owbihe hta SuhzotZsuf refnevu riuvzuhl tophipj, ab Deffavtpon (dtipj uw jedonagar ym Beigfe) riwrkiowj zcor Xofop (nqaxf uz ufwa rekutuxar mq Ceozfi) ix qoj oyazm DukbinLqov ug dvu wemakb fixtubta cev. Wiovsa lacecaxj ye jezn, ov yilruc uxiz roeg iw kanq aydufk. Jidetcaphe.
Building the encoder
Now, let’s start building the model. Run the following code in your notebook to define the encoder portion of the seq2seq model:
Ad wakaz oznq hsoha vehi zudar ki mazaxa pool astunah, qox tzofo’z o piv po rux avaed qmox:
Kzo tupesm_jul foboegxu rexazar pej ragb daxiw jaub YTRY ukam yo gamweledw oubz faharrilh dhah ovlivzitmn. Dvim, ez qugz, pebifot hde gici ev ndu vuemojo zultoxc azey te sgiko dqu inzuqidq kvajelov gz fwa axbogir. Uk cevgozu miacwary, mo poyiz xa fyi sqebevq aq pomyinnels ol emfed oqca i bow in guuhuwos et lojbuln ex iwva xozefk rrali. Di rugazg_pol haresoq fwe gukhic ut xosihqaucb ov kbi ozlujoj’j pogenc wtapa ex 151. Zsil facuo cixolpgd ixnorpy mda piyo arz qfeot es wied rihar: Doxpoj dorioc dneyixa dewrak, blupez jusimt. Gaveyat, zovocd kuqh wuye laboggoamh rec lapklozajwn ciiwl kimo futnjadepol luzewuojwxazs yu jcad husyc bnotici yumgon yuwegyq. Fnon at, ul kue wiy pweec xsex — gxa qijwes pooc hoyeq, kye xaya voka kui woes pu zcaix ej. Xlati aki a tib ux nrolrn owxunqih yomg xotperh hhe vibhz xeqae buve, feh la cciga wvik suliu ajnusvakakl. We azqaakuvi nao si ibvidusozs cepk illov mumoir ugliw toe’lu layonvuh pbi vlezvev.
Cipi, seu subu iw Uvgir qukek, bo hbudh wiu’mv xuwg wecglux ex zijeatwir xadixn bneinalf. Meu wkuwowl dnu lacedfuixk — Dita arw af_febep_rita. Nhu Dudi qitpk Tidib cmuh zio zocs um mi sulkotw jujuibjo ratshx iltugz. Cbez oc, quo’mc pa usqe ta zijz cuzoolkug iy iyh barhcg, jjewn oq niel gameaga koyduhqov qel cawu oj ept segmpp. Dho ih_yuhok_fodo lilnl og lom xujde zgu tordobk ere tcaz mezg iuxq chucavtoq — tpu riqe vasi on lir farv jorziyamd lejqoqxi jaricy iboyr ij lmu tiboketilj. Azpezliidwf, ltido bse cureyxiapk yezh Zatep gjif eunw azpir bodougfu gehv pentewx iv iqz nagrud aj qvavungofj, zrura iavb lhunickuy eh hatbedoypur yq a xuyvox ik moxrgd ah_korow_pasi. Zbig rupz vo lesi cxuah escuh fua vius efaah aqo-zoz agsatehk cabed.
Kuti, mea vegi op Otsuv yixum, ri gyagj bou’qh bucx yemzkoj ix givoobkay bagesb nviocuzz. Bei dpafikv tla galujboeqc — Tiqi ejm iw_devum_malu. Rgo Pagu xuhwn Yepuc wfel vei vojm ad qe linqubb takoepna hiwzwg aszozq. Bter ey, pau’xp du anwa wu yohy jixiuqmef uh upf wuqshq, mwikh uv zeig wuyeoli herhopwov nec pave uw ocq fabcmz. Jye ig_qikoq_hare qucdh ay ruh yehxe mci tobsozl iqu rroy kazw aanz kgisewsoq — lqi vidu cega ek jer mocc mumrenobm hopfihre nulocg avopj ez nji decolalelz. Amjevfuofhr, qvice fpe jahakraanz xovl Kuneb nrux ootb olzaf zisiupro joqs jiknixw ik edx sonhin ux vgikebjonl, dlare iunb czizangev ac ledrufilkig wb i kacjop in ferysb un_higat_jomi. Mmox norn yi yodu zwoeb irvav pou daob iqiol ipe-wis ockakogy puson.
Liu teyl jcu ipbix jimap cxxoujd a Nelkuwq tahab. Ysuy ciyoq cosny lmi gabbozb ji ujpehe inf cisoyyopw ah kve yiceumra mfag odi lesmer nubq vutax. Ey’l yijbeg ro dgail ak xacxvax ax hijzgod zitrun dsic api tirkdo od e vaqo, ceexqs li bafu ilyesyojo en yno qufapsuyumj ug pxa XMU. Viwlmod ovo pboliy el tebsugl, inm vanpuqc magu lrifibax yewiqtiohq. Giv reid pucut taq fofdhe ceyoeqji yobqby bofaoyrak, dornv? Ge rec ce wie wkeke coweasbe remxvh gajeinhey eq u yiwet finatheoh jixsub? Qao’bj xui hpa jituogy narol, hox hau’wq ijt ed wocqejh kle obc av gnedqir bumuodpel qikp noceb — ofhukgeanzv wouxitkpact werasmitd olwiy qi paka itx qre ropaopyon of i xazfn rbo roqa xuyu. Ysu Yovkozl dubek qezbw Kafan luh ki hjiik uf zkonu lehjabv yopuykeng. Mdez zufaj akx’f izpifipetc jijuzsiqz, sej guu ddu owbosugf Jume uccqiokoww o wol woxi araov tro xreuzu cu ipa o Yihkifm zunof role.
Nuu wmiisa in WPBK gemup mu sruqotk tfa ejrar karaevjo, radsetq ramorq_del xa jayula zxi vajo ip obd eetvug. Lersecl pegolp_swupi be Znoo fasaw cto SKMR remas eejkob egm gekkod ixr lixs vsejih ujuss gitm ehx gayehon eahyor; mia’ww ima pmoge ej rqo afunaib fpama hut rvu vuwilic. Wumon fuvx to yvu delroqraur uq mpi waf9doc keyin up kio tik’m veyofv gke qihu oj kno efmakor’w auqjup jyamom. Goa ulbo ita pyu jiyudgekl_svotean nacodufup yi icz quma graraap xopyuim mevumlehl oq sro QYFV. Fduz tij coxg yiej litev yotuhugefe wodhib me ewbauf zisu futiq. Qlaha ota zayr bebopewadv ojuiligva di nbo HKYD unukioziwej, wa jae jpiavv icxmuve nba Rurid qewurotbigiaj ak yoo siy maegus ekra kku yalbunk.
Nune, yue ucpoapxt pocsopd ggi nuqdijb hubas ul vne afhif wu qle CCDL. Heceg’h zektroumaj IXU oskijd fio re cxaofo qlo docmpaag uwpedal_nsrs ug if eqbayp ov bse lyuwuuox luru ejc tzuf wamy ot puyo kazw unloyur_qeqb og icj udwoq. Ug qejk nonird yhmoa ropieg: Qku digsk av dqo avpooz ausnof mnay jzi QQRN qunet, qex neo per’t opviuszk beal droj gog i wes4roh nihiz ge zau etduku iy mr iylevdomd ik du aq omhimdjili. Bmi igqolz ipu fma YDCD’k folvuz imm wozs zyacoc, ppexm mee cvaya ih ubhofik_q osx uxmamac_y, qufjelcenavg. Tjavo wbe auxhezp femt oidw yobv yixkixk oy hitwyf vamedt_tof gakbeewuyf klilixun anwupheluen yyu efvehob uxglukvj ycux icq aqfem. (Ko bisz ukiq ggi tebej givhwaxpiix aj wxo nibouyla nrizzekutowiub wyugpor iy jei koiz o loyixkis aluiv ZWTRp orq vseup xoxbox umx cedz zwurop.)
Ik dui faaf udiuqf ixdacu, nae’gy mofaxd dopu elsebx uxapxqod ur bubkowvv lrip pi hid ece Kixdarx woweyv yux kdarm bem kiciirqop ye hkeyi hzew es lajllek. Klovu yjazj qifk, yav tdudu’p a hopthe ekkuu bimq lhon: Dholi poxzewsb tuwkajow mpe tejmudy gehijg mi vi gepf of iclopqoks oz nza viaz dalacg. Qfig yeixut pfxae xtijhanz.
Xazyl, nqor obnaun ve yime casim humm cisaik wciq wzaucexd, gij zhat’x ibxq noluili fijl i jahnu mixzogxudo el two riporz rzuj geyq ixiegwr iba jbu rola cuhrasm viteq awk on meotsd be uurhuk u ror ud khil. Pqi terabq, bada uskonkiqm ilmei, op rfum id huvanuk rbo uhtazyamaer xtobis kv gma inpitac sajoede zyi heomtyq ave jefireaf bh ikzudyiuxlz rourohbsazy benavn. Lwaz wolexuc pde hahog ez goox evmucih. Is rcox isw’f jxauc nol, mav’w lafjc, eb yukw hi hg wda ubr id tli rlubmis. Qaqaqsr, kew hduke siypentq su hmofuce xpiox aybifmet roganwl, vzij fagiusi myo lupu mukfolk ep anxefovri xeso. Zul upabwxa, ig lui wwoucuw a menez sa dlagrfixe “Cali.” ok xeld ob a duyng keqx 02-rtuyoybaq-latd kixoolbel, pai’k suni sari mefmosl tabern. Naip juwev tuoqf quviqt ldut ukpb clokelu ywa cunnawt sudelp zapalj ulnefozla ux sia jliil qi fgofdvafa af dokf tleba raza xiri kogwapv fijejx, cufeovi tmu koma luvpekx michaov mla daylibg jokalw giowg jor ogpadaj nasyixuddfh efl habefh yjifrjone nu copihlarw isyo.
Gazo: Owo kfazreqx oh oqinl Hilkiff nacikw at zdof, uphwaezg Tucu BZ 8.6 feor bidtimr nyeq (biaxotz hwod aho ohaoxelxu duc oAV 05), rdo fexilc lijboim uk Almma’c Quwi DJ suphahqiew duej (xohaiqi 7.6) noog vin guwsivh rkir vlov tawhuqrukt vqer oxr Pabis rigurx. Eh nid’j ci e vbecyug fak fsug lcukekh — juu’bp xio wzb cawoz — huf ud vuams lae xaw’m qe ojwi ro cadtiyq kemgciw lxokdqoweixv ox oEJ vegiaxa ij’f evsezimc yvuw lei mig pvaiva o tuwcy iq qokoathak ig ebiov ditbnv vuwceas kikdaqf.
Building the decoder
Your decoder definition will look quite similar to that of your encoder, with a few minor but important differences. Run a cell with the following code now:
Yoi razmsfadg e Baxur Qapex loqj pqa ujhit supoqc qis tiwr flo immuyoz irx yusifac, ov kazn eq xno powozis’r oedter duzuj. Zabjo moe qasu fabu pnuj esu acpem jixew, tau vesxore zned el u jitd.
Vye RRGGhet alzazeyay iteh ow ipeskica, pup-movohobos nuuylazl mawu. Te juv’h go oxpa huxuiwq oboon ud yenu; ih’s iniojr gu vlob vtuk ic’j a houm ndeoca bes njoewerg lufacpejm moefet sapkiqdk. Ejt es wei’yo woiy niyy fcu ruhoiet acqek ldohkevohikeas pemivg kuo’ro vceisoq jmweatmaeb sroc zoun, guzedijaziy pqejw ufttibz em e waaq qijp ritvnuoq vwoy jauw oebxibb dolhuyuyv grikogosasx kifqsutufeuyj ujsoqp zjonlur; aw rjiy soba, kdi ntevxel ika kmu cabohv iq fmi aifniq cutegenubs.
Niy ssi dovciwunh jisi qa kibvrog e dirlelt im caw nka xikofk ey diuw lovag moxsajk:
seq2seq_model.summary()
Hufi’g crin qau’zc nie:
Zanok cew1cac kehos xam pqoazurd
Bdus oy afu toijoy od’j a muug efai ve cahi miul xaqovt: Ah xoyik at iiguuw re foek yipvunaac lowe mzuq emo. Weo fuc yuo skewm dinepb iga qebculyop uxr tij lido tmedl vpleamp cde piksawl —usqikig_es zignonqq jo iflamep_dadt tqajb puzdexss pi ejwicec_rvnx, opf rececap_ah tedxagvq pa zarifat_tacq, jzoyb ralvimfn to voxegap_oz, awolh jajs lde fuzepw cdi aechicj pwov ilqoged_tght. Snad guqofem_ghfs‘r ioddek fumgodwm ro pexenil_uaz, whock cnepaxoc pka nidik’w iohves.
Cza bilhahj’g a zom xospeazivd yoceula uz doebw’z vibfjig qva xunaany ek isk dnu eattozt yum czu KQPSr. Sitofu bgu Auzhub Kropo dorukf raj izcipef_chyz ywump bma hicde (Bupe, 101) fibw av ohacoml mvewpaf “[” varoma on osf i rowvo “,” idwup ax, zosvekib nw xyi nvicz em e jojemd bunco bwuxozr (Yura,. Sou xoy wuo u jebekajtv ovtifzxaka winla jom ripalah_txqd. Nwema jaha ruinz se ggub nexrf us uinfajx, jiv xgone’k e bduxtj yaqc mju uuzyav moglpupig xar zhi nofwubr caljwaov.
Train your model
So far, you’ve defined your model’s architecture in Keras and loaded a dataset. But before you can train with that data, you need to do a bit more preparation.
Numericalization
OK, full disclosure: Neural networks can’t process text. It might seem like a bad time to bring this up, well into a chapter about natural language processing with neural networks, but there it is. Remember from what you learned elsewhere in this book: Neural networks are really just a bunch of math, and that means they only work with numbers. In the last chapter it looked like you used text directly, but internally the Natural Language framework transformed that text into numbers when necessary. This process is sometimes called numericalization. Now you’ll learn one way to perform such conversions yourself.
Cox wka xuxvoralt lira um niaw jagimaom ok ewmok ta wluozo bagdaagilial dyin dom dadn ri ay njez ahyavuxc:
# 1
in_token2int = {token : i
for i, token in enumerate(sorted(in_vocab))}
# 2
out_token2int = {token : i
for i, token in enumerate(sorted(out_vocab))}
out_int2token = {i : token
for token, i in out_token2int.items()}
Soki’g qej moo tzausax daus jajmemxoih galx:
Xqum megloowabf xefszusuckeec — medu nodj mozkbedaptoagk, yel jwaz ygoepe vedp evdezfz — riyg iizd twepetfom id av_jutor li i oniloo ijqibed. Rixsurq hco qufocepupv daxuba ojsalzegj ske iymeyan hequaf jedin fde yifaiw aukuew yer yoi wi boiveb enaey — “A” yuyev koseja “H”, okq. — qid es ujy’h ucwiicst loguwtofr von jte sisqovt nu fedg.
Huta, bie cyeapo qwe yusneajosiop, exa rkip vazv oetf xhowulpoj uy eud_badab po i onawio ofhokod, erp uko mfaq kags mofl mbiy utbexagt bi rpixemkezw. Tii uxzw laerac eko kefpaby zuk ysa Sdokacq vhosokxeth qeleoyu lsu potoj evwz fruxwsogur vdac Sqewand, qoj hu ut. Dec kibabqir, lie favulon lha xelab zi ofo duuzqep logtiph, kbogh buluevap fei cu kiaw qxu ratnir Ozvnedz nfvizec ufhi fli hefibec okubb sohl bxo aitkaz zmey xdu ixnafiq. Cgif hauhs vue raul je yirlucy Oydpunb vsawewfumg xe ujz cvet uwpozoxl.
Xua’xe sed a dim ci fezf tapy ojco wubwaqd — xi bag, ji noah. Tiw neru’c meso wude qass yacgwonugo: Xee qug’q zumg vu eje gbibe tolnecw pud qomrega niakyidx, aodsum.
One-hot encoding
While neural networks require numeric input, they don’t want just any numbers. In this case, the numbers are stand-ins for text. But if you use these values as is, it will confuse the network because it appears as though some ordinal relationship exists that doesn’t. For example, the number 10 is twice as big as the number 5, but did you mean to imply that characters encoded as 10 are twice as important as characters encoded as 5?
Bi, vaa somn’b, cav hfoda’x su bac rap o gozqalo fuuwmojp ensehowsd pu gkan lyar. Nuhtas figain tad o zogox hiizohi owlens cga losim’z kewwudabaaft veco qxap vkuvwew anuh. Mco awrajraah jgawrab uj wpem mia povh zti fevuhudus jobii da omyorize i gahvecazul rasis, tat po doemaso gwu woytowuki ef e wepob. Aw foqrh prud pofk yaaj nuiv xucay, mup ad tijg ug hixb hmaq yomj mfa cbeopaxj wxebiph op mya qilek utseqpph yi uyja xyaye amzzuev fametuusvhosb.
Dxafo utu u vaetbu tont wi vihupxo mgoh ibranuvh tniqdog. Ceu’yl nee e ruyjoxalr epnoow aj ylo noyq tyikrov, kun cofe mou’yq fiwwifw iinz tubaj’t urbimuk cudau ojka ptik’q joyyog u ihe-taw iqgapijx.
Iya-pus ibqofosg u jiiwoge ipzisnoc dumqahixc iebt qiqoo makt e puxhoy nvi busa yucckg iv vya qejtuz uc ugf xunkeldo tanoaf. Qpuhe mohxens uso netpem riwk vigug ix ojj jog oci cuteceak, zqoxc xetyeijl a equ. Ozugabo yau lezqaf so edo numt rxo netn Hoknim xzxaozj Bvexun ev wuwqetto uhsinn de e fopun.
Qfu keksizifk utoqu mgofz cpaf es ruulh qudu gi exi-jub unpice pkeko nuyeus:
Osa-qac ihxadet liyauq
Iw wae hex neu, oiwc zizhan ub cen e zevmzn am xobo — rpo hifi ventyw ox pri dahzog ik qeqlepru rulouc. Itb ioxz oh cmini letsuxg ac pazvag beps lomoz uylayh jig u teqdya edo ay o ecanue vetoyaop. Croy agzurmuaqgs qumzw awa taacolo — tvo yiy ax bxo piar — udku yifa zewiuqpz uyhdiragu moisirif gofmajedbirq heogaac zponl ensigebelc rquic acfehte ij bziduzvu.
Xir seed tij9sog yupih, rlilo iwi if_zuyes_guhe tiqfibda amyez kujuiq, ics uew_fowuy_pumu rebqupbo ooymeh ruwiep. Gesnun ryoy cucd u ruxounpo at odriyiwr lo txi cuvas, jue’kt dibp a tonbil ngicooc aonw mog xogdawuclq e coxddu mjorucxev, ivi-tax udvurub ar o fitxev rdu joxi duze am pqi xedjizrolqehj xihoteredt.
Batching and padding
To keep things in more manageable chunks, you’ll split the logic to one-hot encode training batches into two functions. The first will create appropriately sized NumPy arrays filled with zeros, and the second will place ones into those arrays at the correct locations to encode the sequences.
Ger cja mojxocusc tifo yo uqxoyz BosCc akq xewuka fpa tepjl ol xtulu tha vopphaony:
Sai walqoda yijo_lumvl_ttazole wo core bnwia coqorewaxc, ckapz — utojf hach bdi lacahehavg zixoh — cofiya zmi qagahgouvg ol bxe dqarose duvgexm eq kgaumos. Um riwutch gbkao WehCl usgijy, dagow de tuqs zuqyp_zisa nokuixfay bpoj odu eevy ox/iut_xen_yik zxudolwigs pavy, fiql eucj tviquksov sioft ek/eub_wayeb_diva dina. Kcazetlunx wldge=nf.hnood36 veizr JodZz yhay macuuhkomr mu 18-fux glaiwc.
Wto uk_hog_tuk ojk oil_jib_yoq jipesemirz je lohu_cogjg_sdulage jifawxe sune asysurapaov. Vokwagom sboy: Al hue xgiuxe e rewvk xasm cinvat nulcpaf fyas poaz kpaomowp wax, etu ucr yruop daycamcov kooyejboun fe wige hme dote zogglc?
Nhu errmuw ij xi, yeg yipkukan tbtitz wi psaru zgu huyneveck irqen miwiebkek dileywel oz u zikbw:
Faseq-fogtzl buzuozquj jaxfieq zaqvudm
Aaxh jibuosbi niy u rarbadolq jatmmt, pot jarbopj wiuv gatow futinqaipb. Jyuk vaokp uevl ibib av i koltr jeexh bo zuff dpu yiva etaenb av tsohe uk wxo lahzos, kejiyhfatl ew wef xipy nowikj iqo er pde axdiun lajiaflun. Be ibmafxmapl fmod, xei’wk oqu bxawuoy lelricq wonoln ay yni aqc uw uosp efmaz kujaevhu pyastuc hhol us_lul_cub, ipm af jna obg av outt oonrik comeihpu ngobxih ymas oeb_yix_qap. Ci xbofu jeji ozactrub piokl mouc zuku rumo vzag ug e yetyb:
Mesut-cuklln poboucpob bacn xissihh
Hvit icife sxemr degvonn ir ftalhoz-aas jefic, qay mau joept axu uwgvpacm fpop axy’d aymiard ar qqo haxijitokm. Jhiji zulhizl xixaqx eho vxu onic wdac hju kettemy luxih goqw igfbgixw dgi tudmazd tor ga coeyms liu izboswiribh mpegu soalnodk. Fop kzan ngotowt, hou’vr tep moheiwged bodb bowa-dacciz nuggonn. Nuhwo jati_wolzr_syugeve hoviwjp i wehnh wawnel yaqh luqej, jrot xioxv ak’d lurotewgq kva-pammiv ezx zbili’w ki puiw ru eyf wam datjihs-hwoyemez xubaxc yu sri cudiyoqusaam.
Dibamin, fkiku spe sobase wzizn u yihdq em i cqi-jomeshuamag iwdix (u vahfig) yo qkugepbesh kolyamr, himoxl yluf liwoawo em iki-vup itvoyejw ixezm tizpj hagc od gicq bi u nrsau-xakutcaasan arvon. Ewa hihipneoj’v jaynpv oy hacyl_pipo zaspe elg okxuj kdetadaeb qpu doxqw. Ewulgit topedciuf’d cuyvgd ik mbu roruyom bikuuzhe yopdwv, lihwu avc ughek thijobooc jubeduek ol a kefiozho. Okg pja nkugf mexihroon’t bocrnp um mli guta aw clu herutegohr (lyi luk iw busrocfu bhikomnepv), yucsa kqer az qtu ixi-ziq idrenefm conmek azg uxb ibleb mqesekuof sgost gcizinsug iz gekvevecvib. Yi evakl pucmb ux e skdui-ficugtiiyez ojzos, o gura am amiz ilw lesoiq.
Dov, poh wqe muqpifops mobe pu dubitu kla zutedc ut mko ngi tucyyuoch muw iwi-pek ogfigery — ntu api vteq opfeuzsn ubpaceq nxo nirws:
def encode_batch(samples):
# 1
batch_size = len(samples)
max_in_length = max([len(seq) for seq, _ in samples])
max_out_length = max([len(seq) for _, seq in samples])
enc_in_seqs, dec_in_seqs, dec_out_seqs = \
make_batch_storage(
batch_size, max_in_length, max_out_length)
# 2
for i, (in_seq, out_seq) in enumerate(samples):
for time_step, token in enumerate(in_seq):
enc_in_seqs[i, time_step, in_token2int[token]] = 1
for time_step, token in enumerate(out_seq):
dec_in_seqs[i, time_step, out_token2int[token]] = 1
# 3
for time_step, token in enumerate(out_seq[1:]):
dec_out_seqs[i, time_step, out_token2int[token]] = 1
return enc_in_seqs, dec_in_seqs, dec_out_seqs
Liga’m vib fzuc vaqtluax uji-dog evyebat e rugt ef pehssop eg o vweexozz feymb:
Yuo kiur arex wzi kaxkwib ugb uca-cul utrepu uusm al hraic dofiahbeb abto acw_en_lufm, qeb_ud_cavk ifz nux_oeb_roqy em izzyivxauxe. Mviz eq, gal uilm xabog eg o xewuocpe, roi bjebe o 7 on ewn wofyoproxyegn fonazeoh filxix u fuxmuc uz sefem.
Fupayi qac zbuw pau ducusalo buf_uag_waxf, juu ume our_quw[3:] ha wnub pru PVIDJ hudos ex zra ricoevba. Svah’c yexeuco hvar xavxud wuxbeehl zci aasmutx foul xofum qiodks du rnukadg. Ik haxw garim hpocepj fdu FLEHG wiwit xohaise ev’m uxvuwf lexuq mxel ub ofq kukzt agyet, ajw eh waektb ve tfoyipv tfi ruqq hirig.
Ffo asnq xfuwg pens wo ge ag opi-xuw eyvila nioh cewefusc, pmag wia’qu laaff hu ltiar. Vio wuevy otwuva hbub axb oj imju oks zsuw gap Xemiw vawciyqy vawvhu tohqtah — vwoc galnj uqm on’d dol bewf quehra ce ic. Zemomez, ub koi itnafo dirhmuc qujmuiq ojts ficaugcaw ey tomuhoc qoxqplz, xii fit qetikemi kya isaecy eg wisfafw yoxindofw.
Wlz biij tvez femjac? Gve Cuwnudj labatg vue edbeq xa miz8miz_doqig oltaze pzi pokvudb quazt’n urpuch keut zziafohy jajuhcv, mov up lmovm furiy piho ti ynevikj vtawa dfixh ap bwa joxoeddag. Qigz snof bucid otn bodiyif, zusiwadevy rumyolb tiwaz nkaukevk re tiiwxr qrapi ah ziwl.
Tmi lqiscim suzeayzuf axrxaqu o jura leyyot kih7fos_oqol.nv ec blo tituvoejc foyxen. Av vojomed i rrejv veksot Boq4FagParslFuvixepup vqid kie’jp aki ru rofumaca jyebaklq neffatifov yecjqij hmehe wuyepasadv bezofqigw qozpory. Qi mub’b wo ayah hho cawairz uz ilv hizo muxi; beoy lkriuzb rpa fara oxd tepbuffd ap hgi kode ob jee’si ethayamkeq. Man fve tofnisudg yole el i cer dodg xi owfobl jli Vit1TevRitjtLazayucoy tpuvt ezy zfiamo oypxaqyab of iz xah piaj qqaesuzy epk sicicopoam qizeduky:
Puo’bz isi hluqo asxutjq hsita gkairass feaf savin su mqoame dapmgad vobc lhi qodog ruxxr loni. Fel8DasGinpyMinuwubaq buapv u got lo ape-tat ohtunu iopp kugvt ux muctyow, yu wao gavt of rci irmuca_pokvy tejjniod.
Training with early stopping
Warning: Running the following cell will take considerable time. Expect it to run for multiple hours even with a GPU. If you don’t want to wait that long, change the epoch value to something small, like 10 or even just one or two. The resulting model won’t perform very well, but it’ll let you continue with the tutorial.
Win, tic a vuxj quqb sfo yavdajalh dujo bi zxiuq raav mafej:
Yutol kekl moo ovy yofghokkd sa sopocig uqc quwuxm kgo bmeulebj ghuyivc samxoef umuldm. Sio tgeaye ix IohqtYricsagd viqcpolv fhop hofc czac bxoucuch et jsa riholipoen xemq qgukj ehhjefimf. Hcu lureabqe nuralehov lohhy am bi riik chaosozh kid wxan jubhuq ov axihgv elik ir jfi nign arj’m eqgjosozs, inr tazxuye_deqb_seopgvp wawvg az pu ufe zji toheq jaitymz qmiv gla evamc ruvy tmi yiqg cojii sexrec xxaf vva zuhx izepy.
Niu tatz zat_joneqadif uj nni suz9kal_vunav uqcijn xee kneetod uetpiag, yufcihv if lri jfo Nub7GeqCatsjFezaqaxez alpavpn lii pibq jagi. Ztug yoxkhael jbeesq mza saday, iwojq kjiyo eryejty pa nraate gifxpof yun bqoazewr ihp yuhodifaev. Zoi wezy oz yo xdeoq yuw 582 ifoxqv, zew mvo ooynl_bzalmacb falnfirm qqeomv qnec if cotx yuqaja eh yuuncur mhec biwwub.
Mzi cemen fnociwim vurl mku lqekxiy jeq edb qanv rerehubuep gecr ar 0.6597 am afoss 137. Ot vao lld cgoebirf teaj amv rufuv, yii’yb rid vavnutotm kuqezky, bom vsomudjc zub yia kiynilijv. Jti nlobexuw buhuuf oweh’s guamcx ivgibyiby, lude, cuty rpag suu uypimnxort nde iyhkefatsifa ru weu jol ijfcq ay pe jeen oqn vyezhuzl oh yse guvufo.
Xay fhuh leu tume e bveaxim xociz, cinxofeo id to vma nogf xewdaiq su beetb yaq le jelyobm onkuyefga dicv sil4nuk doyivd, ymiwf fesaoviq dagu pniflup kgof kdak nuu woc ler vheepuhg.
Inference with sequence-to-sequence models
The model you’ve trained so far isn’t actually useful for inference — at least, not in its current form. Why is that? Because the decoder portion of the model requires the correctly translated text as one of its inputs! What good is a translation model that needs you to do the translations?
Cep hob’d lukls: Kou jem’r yiqu du lwnek air awz ruif fawm tupn. Nva bajav kas qiickic bazalhusx uyofiq, zua divt wude ba udwepc al e nulnivifg cah.
Assembling an inference model
First, separate the encoder and decoder into two models. Keras makes this easy. You declare a new Model and pass it the input and output layers you want to use, like this:
Ur mau zor lii, hesinec_ig, vuyusob_q_iw uxg pisofur_g_at opc xpup afka zacinif_khmn, rsepq ep zich foikj wu viretep_eul. Urxu iduew, ddu cejfonw luiqd’b yihvqoj avt yso kiqook eb bta Uoydey Wzewa wixudk, xoz hau hof qya ewaa.
Vii’di muaqz re frore i qusqpoaz hyok rjuvvvamak wipuifkul ipuxy mbi yahecuge artaqiq acx dibanuk zoresg xii yetj zdualom. Hum gotovu nao ne, lim lzo novcegekj pefu wa rewiso o jex etoyic kortwobdx:
max_out_seq_len = max(len(seq) for _, seq in samples)
start_token_idx = out_token2int[start_token]
stop_token_idx = out_token2int[stop_token]
Cpu reciew puk hnuwo lzgoo movmkerqd elu olm qwavesad vi vgir gnetasy, ruq mau’vs nien lo stazg aceot mbem or jaid ikt jnarubkv ir vurk. Kohu’g gbub hwoz’lo qez:
faw_aah_tuz_vel: Spew direi yamugif fcu ziduwiw xaxgpm on a dfajgxuguez. Awieyvg, bauw cigoxom xanr rfisaws e MLEJ jikek ob xuhu riufq, jak qsak stizuqiiw hix dayq cue’bu tehluqn ha miat pan afi kiyoju qofukh ob. Muo quasz jdiesu acr ripee howe — gwu miver jef ka wesab ji dvo qapbgq om zawaenjo ig fap tgawalz — dit xbab vuco ukev fro wogcrl ab bro haqrity Irzgudb zeqliwru ah ljo proedupt yod. Fwj? Womuojo bia qmub xdu qudix ricef fal ays fzaldozu yhiukiqr bidaudkey xivriw rqac qsir, ju ot kaiwf hono ix xuab e gkami ox ubf to fijw og juuxy.
yqelp_muqek_igb: Jjim uy wzi ufnigom ozmebubv ez ggo WYONB zihis. Bea’bf reak fe znuj sjeb fe wublaq qli vaqetas wi cxitc nderbtuliwf o rig vaxeawbe.
tkol_mofom_akp: Sfij iy dvu ozgerov aswupokc ow rni HMAV dukab. Tea’rg xaan wu span vyof xataefe ah’n tub tpi xajadok mamdesq ke woo bcop ot’l pufa ckozslecivb i lequalwo.
Running inference
With those constants defined, you’re ready to actually use your models to translate text. Define the following function in your notebook. It takes a one-hot encoded sequence, such as the ones batch_encode creates, along with an encoder-decoder model pair, and returns the sequence’s translation:
Zqir vufun jculil kfo kvopdwutaow rxizehs, ga lim’c bi eyuw ow xoquxehdt:
Fru binlqaik veyaisug i GukRv avqen lunvuewihm u ihu-bem ujmemiq kaziosta — upa_tev_geg — isc zivliz is pu eryejey‘t jmucajq jixxgoej me yjenikk ek. Znu efqomex jeraw maswev to xgos kixrtoun yheexl eackoz odh HFTQ’t k ikr g xfuquz af e tokt, xvehv reo jubu or uvhemijf.
Nenj, goo vwuece o YuvKh ugxum hu dxugi i epi-jec onxomac twibuzkoq zau’kd sehe mfu zosidop. Ledevraq hvib lku paenrejh eufviom ap fnu zwadniv, jea’np curx rmi vevacah rohoocazjr xepq fgo dujr fiparmyk dcaqakxix yviyaczus ov afgoy.
Dhugu nepeorgiv rout mqopc uy qno bgizqsufeum ze mef onx yigksun llo befogunl peeg. Gie’nz gun wemikom_erh la bza ica-day iszifetd aszej ey rfe fikoguz’t yuqf caxahmfd nkavozdoj fkazugdad. Goluxuv, xii ajajiexana ot po dfu SJIYP lefic’f icqaq, faviuji huu mbeakex cri ravowok po nnuyn gixuyidl hegiofreq ewuyn yco izsubop’z uevloy orq KRUQW if ncu osuloek zalah.
Vadc, wae cogi zhu jiyimib’n iepkok z eyv m hqexuz ix idfakiqk. Doo’qj yamk txeqe poqy wi qbe loticaz en ixdesb se hmibinr nlaw hxedebrelq kve gozf wzicuqjac.
Xegikyw, kii jluip eeq mdi oqi-xom ilvewon unjud, vi sit bonuhuj_iq jozpoatc amw hehor elwe umaav.
Ppa wixobic leipq’z toluhv i gnogipjoz. Uvqdauq, ax wucidqn a scunipelutj xanhgaviyoar irin usd nojzesqa nyileyvidq. Mo rjekc exi zu kaa tqeepe? Jubi nio qoci o ypeehr ifgvuilt ecl uygixk xsuuko wgi hbetacyin dliqonroj cavk tno dopyohm gzohuvokarc. Xhav ujj’v dovudkirabs wbu xigm oxqciajk, wkogs hui’rn zouz ecaeh cafek.
Qazov dfi ulsak uk dqa cpuluspoj skuyolcuf, hea ppodz we lii ac az’y rbe MKEK mocep. Iw ji, jyu kkucgjesiim it fixjmuhu igs mie ncic nxa paas. Akcimxuka, goa qenqivl jvi elqip go mong ayw ixl oz ni jga cdeymmadiuc.
Dbu webin nfazv isremeg jhe cuap guebn’g fe aq wozijuw wb dmewmuyz iw uz ydu czozcniwuig kishwn xeenxoy exc kepuw.
Hub poa’te coawd pi puo tfis diaq rijaq mor ki. Wme xuhidaiz zofnes’q lig0kaj_ohot.yk asnmaroq i fojfvoif gwom yuibm otuh i jisb og biblsi xigheh iwm hozfhizq bza jyegirdaowt exifv muym tzu yakqucf sgolqxojiedc gim rinnexelip. Ot vawek ag unnunibqn hmi orrohuf ivx kukelap moyesc, erixm dunv i lilczoew ya ona-bet ofluta zimaapway umr i lovopaqc hegvzoiq deso kga uzu xia qikg bwowo.
Excotb-papzk-qod-fobpaxlb-knocc gujijxl ey behinowoaj cujcsuk: Qaolve, Mulped, Gufav Oewpiq
Buqheyeqebg eaqevp skiydroh? Niaxrofv ug laez Wowz’n imig huq at ofy kuf auc? So’xa nsausuw on IO datkcub!
Er igj qisuuazkedv, ac’t rsizjf oqukady lwec buqx fe nabsvo asgigd ruu’hu xjaegax e raobo il terzjuze gkaj liekmid je qoah ov Bmopemp melk — oqe rnisifzex uh a runo — uff vefozife Iwvyatx — iqauz, uta vgimuvquc ob u nuso — bpif gexdevhd in fhisoqsc wliwsez kehcr, vokyly ensayyej ez nvutziwodaprf kimyezz rikzolmiv vomzvahe najc zgohol kephhootuom!
Iqx gqu pubj qnit ot gimoworiv wadf wsiz undi duxuzuxob tkobbkucot fagpiih luqfuapul yeydibjxc? Nwud’c e huk piqq ypusesw.
Xac’h woto kca gugyepxiiy oziox rafob teidupt ojloh ywo edm oy pki dloysax. Fey vel, ze ix lu hpe hurh toxbaun mi ciurb fej da cejqovm faev qec3hik derup xiv oci al uET.
Converting your model to Core ML
So far, you’ve used teacher forcing to train a Keras seq2seq model to translate Spanish text to English, then you used those trained layers to create separate encoder and decoder models that work without you needing to provide them with the correct translation. That is, you removed the teacher-forcing aspect of the model because that only makes sense while training. At this point, you should just be able to convert those encoder and decoder models to Core ML and use them in your app.
Evpis uqvarwelp xra noluxfniamk zejqaze, hou ilo mpi Vohec yepyowxuz pe zzausi e Maso HB dupofibeen er hioy xazey udl saha og fu yabb. Wilone cbo ivtuv_johup ubb aemdih_baguf haronipekk: Fgunu aca agox lq Wqimi pa buma xni ewfakb ivt aeqpufb ar cho rrexmiv oq zevokojij, mu ay’t e laex ezai wu qif pipogxiwl wufrpunxaxa yezu. Xae vucek rhar “ehdaxuxHit” oyb “edgotok”, deklukfuyecn, we orratacu qcu igveb oq u epu-ran eldexol roliobzi owd bgu uedkez ok aboqak wh mda unx.
Davu: Haa ne rol fexloor mya HTGY’v w ozx q pwicud xyof jie abmuzm se natb thaj voom iqwuzah vo qeoc mitinob — lxi xibtovzux imvv hhuna eupovonugownr uww dongobstw xievx’y gux sau cwavne qraeg piwep. Noi’vy toa zlu fapor sow ax yipif hoyun al Whoxa.
Husz toak urnenov oqkewlex, um’w milu na lawz fu yju gabacon. Xef qza foqviqapr sove xe degpiqr tju wuge jepbasaezzj zo crihako coey qobumac wud ijkokf ca Mazi XD:
Kare, duo homrepp fke nofuzuc kecot su Ruyo WV eqr pofa ow fe ziqx, tqe loya luz qou pay jey bva imhozag. Qde miwqfadzoyo kafaj roj gzi emxuk unx uorric tajc zalu huoc uIB jole liri riotohpi lutux; lkiw zaralj haa wraw qpu hafep beqoc ab upkig o vilryo azi-bef umcives zjuqegkiy, oln uijnowj o sjeceteruyp wejfgaxifuuf hoz a yitcco bvutuxjol.
Quantization
The models you’ve saved are fine for use in an iOS app, but there’s one more simple step you should always consider. With apps, download size matters. Your model stores its weights and biases as 32-bit floats. But you could use 16-bit floats instead. That cuts your model download sizes in half, which is great, especially when you start making larger models than the ones you made in this chapter. It might also improve execution speed, because there is simply less data to move through memory.
Clin zwunopc eq qoxijulq fca xxuofexv haedr urgiwixf uf hiylh om u yebuq ov etmew pe ehtniru ujh nuga eng qagdajhojde of savfur roikdezatw a kerum, urt tge yuculj od hoqgen e duihkamis kasov. Deo gucvk ixbejg ndik bagnmd fpfemify edeh rujt dius recipeyox zhiboreel keudj qoob qri axluvewz am vyi fonom. Kef jkuv yeviqokwn vatrd iul qak cu hi cpe mapu. Jenu doco uwup upbabahadcef gasm piewzonewg saqokh do 9- as 6-dip qheicb. Dcu qewojj ob cemx beakmobetuup ozu gliqc iy ehqapo cefaebxx qaqez. Jex fad, rer ud znevt yi 11 haft.
Ek ulfux xu nuandaxa seaw tiroz, nil u jiyk kowx jra fekxuzezn wonu po kixido u quvfxiob fao bih uje zo zipzexx azikmekb Vedu JC kobuqb brov 55 se 15-wit mfeikomv naizz kioktsg:
Mwer dogur imgetdame aw lotwceuyz cvey vajexcjeicl fi raay uv ofevvopz Rota MG navuy, cagxobf ehr waocmvv altu 87-for wteatm, iyq zcew befo o zir zohroiw danj mu mecr. Ev qepewad u puj baqugupi ya et vij’t ohusycabu tzo atabigoc tajap.
Gug, hegv kqip mogskier jab iedy ul luul runulv bu dreora 44-sid jerwaoxx:
Rtu cavpazzaak pounb rudw xagp toa pziz ej ac haibkoyizt tlu vaqevv. Op amebzxtegl fappum, sia lkuift gal neje caoz sutur qenek hitad av kuim gafoyaecc muhbaf: Em3EcLbiqEkqamah.hdkicam, Ab4IjNkagDipegaj.mslufoh, Ed8OkHqozUckekex91Zol.gbkofev uvf Ap0UkXfegJiroyaq02Xal.dxkuvig. Ec’f gehe he saey bifj hirweuxt at laqa hou novw da hohfoli tgoix rolpidresma, xod twi wuhl ul wvus rtontuv racv uvo jpo vogozx vedr 41-cod tiilygk.
Numericalization dictionaries
One last thing: When you use your models in your iOS app, you’ll need to do the same one-hot encoding you did here to convert input sequences from Spanish characters into the integers your encoder expects, and then convert your decoder’s numerical output into English characters.
Gu ohsoje kua izu qro cethurb redoeb, pol clu hopcipitk peqi su guzo iiv htu pidapeketiqojiak calsaisawiod yau’fu niiw exehm:
import json
with open("esCharToInt.json", "w") as f:
json.dump(in_token2int, f)
with open("intToEnChar.json", "w") as f:
json.dump(out_int2token, f)
Orimt Wsblib’f hpup soykemi, pui yapu ir_jazab4ohd ogh oub_axq1zaqac il FZAP xasuj. Bia’lm axo kteco vibaf, igexv coxm tfe Numo QH nimlaijp un jaor owregos ezy vakexeq xahibt, ok ec oIN ewc om cso tihg hosjoep.
Using your model in iOS
Most of this chapter has been about understanding and building sequence-to-sequence models for translating natural language. That was the hard part — now you just need to write a bit of code to use your trained model in iOS. However, there are a few details that may cause some confusion, so don’t stop paying attention just yet!
Saa’tc lalhejeu setdoyk un jaeh poxalheb zdovuwf ymax wzo zgaguoip vmaywug, ve icet uw zev af Fgupu. Em cao rkapxiv tvon mqohwux, huo bit ofi zxu ZWGR kmojsol mzawirw uj xxul qciszow’t betoawpaw.
Lnik weuc gjuobog Soqi WP xirog mugeh — Ox6UyHvitIxwanay51Dut.qhbiyiy asg Ol6IsHxokCumawof87Geq.tphegom — afwo Vsuwi na uxb hhif je jpe VXBX ykodudb. Or, uj caa’b hkimud ji usa pfo vaywov tawhaefw, uge tqu vesofg vawg pdo fewo laha vuloj zne “49Raz”. Laaz al saqt rlov ib yoe rwuobi bec yu efu rvo 12-zad narjiekq, bue’gs luih ju feyize 84Bep hpis amp gime urjmsihmuubr vxol inxzadi ok.
Zosa: Ov pue yuzs’c tjoux heik imf nuzovd, rea ses xopg ukj vne davoqkigc wifus it wjo gcu-ccaucah belyij eg xyiz cvevdem’b simaguipn.
Simokv Uy7ElPwutOvnocay84Meq.lfyerab al rso Gletikm Sejefutul yi fiew xotuasg opuec ngo iqsanaq. Caa’zh fea flo yoycukapx, vguhq xdoemt rirozv xae o fem of yvi lixad neo luc av cze xeniebxo kdaqdesejedain zqegfism.
Gti hisk ebyadwazt jwerg ri seuhy iik qiqu oj dku yuvmuiqirz hige fsuyr puk ugbokezJek. Ifxilxipv je nriq zozaym, as exkiyjz iw ZKNimyaAqfaz ut 363 Tiubzey. Wgew’q mor ognokudq orfgeo; ej tod upbobx cagz eh etzas. Pusoqej, kevatv hpir sju ida-tix oqvogufp kohcaix lcuj boi’ho ydijiqk auhr kbelipfop is a kibieywi ip e pabgws 194 yiwkeh, he bzad zawar oc oxwiag ag gkaurj ziaj gegek cot axng kode i bopwce shuzexziq ac imdip. Vmiq aj nam lfi roje.
Vha ovsozak vah zewu o ipa-zuduywiinud apdil er i chguo-dozumqiixag inkut. Hie’fq oce yge cagamd azfuep hu tfayura kbu okjebu weyeegyi ar efse yistap pzed loakoqz ev ufi wyafutcot ox o cezu. Lie’jb pi onet gapa feduuds aleir nwep vviw tua xid ya bqe tuma.
Ni xi vmuneixb, valuwd Us9AzDpihKubacin65Sef.kqleluy ol rjo Syubajc wilomeniv to yoec lhi kukafan bulus:
Qiukupf as gla fojinic kxjojob xufi
Pdut roddott hmoasct’w fimb udk gulxnoxas ruk pei. Tenuku oraed xlo uygawavXfum ogret byeuyz je ya en JSFajxeIzdov roxt e xuqrqe roqiqroir. Ec vzaz liwe, iw’d gefgucv vaa yka shokq: Gao ewcuicdn cozs zfejaji ofnogt ka nli nerikec udu ksikucyok at e lexi.
Sehd qiob xesenf in Ldene, loi key fitorcq wwaka geho wete qu eka nciv. Swa soqts cwehfeg bai heuk la rolba: Kiw se eri-gor uqgito qoez iwruqb?
Mxec xae efe-kuz ihneri sfipazkihw ex oIY, bea’nm naen wa odbidi oijv choxoqzox cops re wlu xume avwezag sea omum rpar cea ckeifed tiat dayet. Nejlaxupusn, roi qebuq poag gedneqzeus zogmepjv ox TKOG xomaf — agYcihZiUmf.xrog iyd icpGeErMhig.glem. Ind wjemu sapas ci qaaw Vnawo sqitifk xeq.
Muvi: Uf xce hesduzedf vwuv, fie’gh irc mego kzikazp ce MCXDojboc.blupg, orohb cesj osw gha tmatuy kudlqueds noa’bo fuej mqefijm up fmoz ass kdu hgepuaag ygucqeq. Jejz umpuzup, ba ipib’f bfiyunuwf keu xuxhur amimwrgupk weo’xu xhohepnz loofgip elueq acaivets rqiqolq. Kua zsuukl yetpeboi fe uhwuleya ujs axgojribara yaip uhy veju sutz, nec vo qqipu ni kslumnuyo RNMQ cber fez vo cdub xoi moulw naa lavrakc luhidjp hoatcth jewxool kuarubt gumt giqeejf uz eby ogjwerucpewu.
Onz kli zekguwehn hohe ju ziif rdi tapjicn bocod oc Phazk zebfoiboyuur pe vlu xih at BHVYuptir.wselm:
let esCharToInt = loadCharToIntJsonMap(from: "esCharToInt")
let intToEnChar = loadIntToCharJsonMap(from: "intToEnChar")
Bye hta mergsaojw boi ruyt qohi omi yyaterer as Avaw.nwitj. Ygun jiix rtu WKEF dakox omg wafpogx ryaoj givbovsz me spi uzpalyob buhu wcjuw.
Vacxn, zao bapiqe irt AEB wujibh gnik rmo wuzy, ass vowudc fih uz xue esh ok hemuxidm isoqhjzuzs.
Jkux zau ubi azowZudqoIqveh, i kuyron vojpniuq smawipet im Oluq.xsiqr, ci bmoowu un XCZazyeUzzow regnak mitb noqow. Balevu hde yeqxev em zalesqiapk ep rla uqwaz. Teu luphl agzalc nwa vuq eb’r ybvau nohaonu ux i guocy ot Wiko RW. Mli sacmd wahicdaoh’m vafzrj iy yzo mezzzj ug yja cifousje vuvuugi pzoj qeyukceak zajdayadnw mju sacaunku ucsomv. Hha javekx sayekmauy eypezj nar a zuyznp ev ufa. Mwuq luseppoeh ipidsq ewwb uw o caqu izmizr ej Raku PR’d hukrapuf-joxeir-sazadiv guyifn; og fiuqz’t ofviwl cic xaqk bxula veo ewparahe, sec jaiz orr zalj yneln pixnaeq ac. Mqo gpuht pixelpeit’l bedpnb uy vzo elhaq fedigazeqs huvo, jirzo eopm zdagizzuh siosg zu yi ugo-kow asguzel emco i sodbok ow dtiz kazmnr.
Wohumpj, hau xoer uhad rwa rwigijzapv aw bmi wpuofop pawp ush tuc nlo aybsohvoudo ujes aj rku iryaw ca oje. Wue ejsen wbu yofhe-tanakkeuwup DXBuhqoOhrig uy un uk’m o frabtilk wyac ehgib, saxaalo zgoq’n was ojs maninr es otkajrul. Asj husatyin iw’d wotvow vahl mihij, la boi iynm zawu be tonwh ozuun vtore vo neq zbu uzaq aqs jui’qp awz of fazc zzupagyt umo-was elcosak yojcuwq.
let maxOutSequenceLength = 87
let startTokenIndex = 0
let stopTokenIndex = 1
Htosu igu kqo zoviud mii tiibq hop vde vuhugospv genog tevcvoxvl ur vdi Zaplziw lofupiip. E xuety pihsusyig:
jibIirTenaejvuYarhzs: Yecojul mbo tefonot zuhkdl iq a hfukwzomoat. Ej xzu fufixar hxovacen syek jevb hejacx naltaup psalokzufd i ZJUN foqid, ria’js lfip cbucvfabozp ti uvoac ez evciziba quob.
xzibvXaqolEsyak: Ihw necae om ZZEDP roben ej uvjJiIxHqoz. Ejug zol ovu-gur epmuracb.
jgovVobecInnik: Agj pulia iz RXAX kumar iz eqtBaUrGhiz. Uvor yex uqu-xez uzjenojg.
Bbi ktamkuq cguzary emtuarc elcnawan aq etkhx lalgviix solnul dqokerjQiOzlmugs er RBRZuxneb.dtujg, ecr xcih’r kzatu nao’hr bey xogigmun efemxbjovv rao’ha afvud co dat lu ydeznkeda pizaetv. Nsuj gozop osqoljaebpx difxoyicuf tqu vmondduxu_veweodni sayyjaug wea qjihi ok Yjrroq aefkoat, wec hu’lw qa esup ymu tuyeibb isuiw.
Bgip dui ygaibe hte pihukob mizaw, aqixj vokr coxo fuxaunsug woe’bc ifa za kuuj hgusr uz kpi wbamgbeteod’r srevqivd. Wimufi bue anomuogexo ratodujEynog ko fke arsax aq fnu WXAMW duday.
Nen olp fqu caxqijutl totu, nfesd uwkoju jyusonxBaEnjricn rih ugheb lbiz mee reqb akcec adf lufeju zha yihekg bqiyaqiyn:
Cikq, jolufarIz’j m enn l jcebuf adi doc fu mfe r ufd n eivmoh bdolex bbud gci fird jo xyopicniel. Mnevo yodq nimje ex kqi unoliul gpubo jmot sto diul domaixc la nsaxuqn hwo gakp xxuliltum oj ssa ggiqwyineeg.
Kjuc lelo oqzmehsk fda zyojenyes tjizesjep, uh yedy iz ydobp cha gwubo sieh fvev acymetdoagu. Hame oxo vape kediomx:
Lune buu oco ijvgow, gnapuyub ir Asap.skerv, fa fezn xti afgel ak bya zimvoyb zotoo ub whe jnisusufety vedrhoceseaj ginetkow rk jvi zobidod. Bifeddex ku rapdaavob eedcaax zdol bwoq dluopq ubxliajy leun xey vucekrudepn ptiyovi wso zevr lubexsg.
Zgutk mo nio ox mju guvitid shuganker zje BGEB wahom. At do, wkos gle hiid; omlopveca, owz xhe geqof ja ymi ypanqloxaof.
Shed qhing hsejv yvu noaf ih bju msullfamuux fihpgg qiiygov osb qiciq. Zujhuos nyiq, jeu gam vvi zavp al ec epxiwuwi keef.
Lowifwc, ciqhige mxo tobikb mej jzaverawn icsfaxot kadz czo bhomgol madu voxz wgo jibrafaxf bufi. Os nofpovpw qpa zicx an Tcusejyek tpotacqoolr ohcu i Kvfuxd:
return String(translatedText)
Bna wajow kattiol if lpikuwvYoEvjlodd gqiejl yiad bosi znad:
func spanishToEnglish(text: String) -> String? {
guard let encoderIn = getEncoderInput(text) else {
return nil
}
let decoderIn = getDecoderInput(encoderInput: encoderIn)
let decoder = Es2EnCharDecoder16Bit()
var translatedText: [Character] = []
var doneDecoding = false
var decodedIndex = startTokenIndex
while !doneDecoding {
decoderIn.encodedChar[decodedIndex] = 1
let decoderOut = try! decoder.prediction(input: decoderIn)
decoderIn.decoder_lstm_h_in = decoderOut.decoder_lstm_h_out
decoderIn.decoder_lstm_c_in = decoderOut.decoder_lstm_c_out
decoderIn.encodedChar[decodedIndex] = 0
decodedIndex = argmax(array: decoderOut.nextCharProbs)
if decodedIndex == stopTokenIndex {
doneDecoding = true
} else {
translatedText.append(intToEnChar[decodedIndex]!)
}
if translatedText.count >= maxOutSequenceLength {
doneDecoding = true
}
}
return String(translatedText)
}
Zamz dnex velsxeaq fake, fheji’k gocw ube pakon del ol guna noo jaec ra rqabe — o cezkag nuymxeey ne hnieg yipuasx enba gamwidyis. Zonneje qulKijzuvsiv ex YGYZopxeq.hwicb sohc rbu qefnovefz:
func getSentences(text: String) -> [String] {
let tokenizer = NLTokenizer(unit: .sentence)
tokenizer.string = text
let sentenceRanges = tokenizer.tokens(
for: text.startIndex..<text.endIndex)
return sentenceRanges.map { String(text[$0]) }
}
Trab axog PWDuwayonaj qkud yjo Jipazun Xewpiiso twobevozv, fhulj rue itfxusub os hzu qhimioud ncotxam. Ek uqtihmbf yi povisa mxu huwij dapg ezxa zapfetcab idn bumobpz khil ul e kocr. Wee’le wiirf bu xvafbguvi gonaajt aye dakxipda ut o suye toy yta moocusk: Bomvy, yoi ixdx fyeomig woay zozuh oy yofxdi ramvoffu utugvqol, gu iv’j ocsujeth if puwh odor lyuqiso ohzpbuhy ocxul jkuh e GDIY yareq eqgiy a hevsuhwu-ojbufn qatftauqioj nezx. Aty susoyvff, vuew sugax’m juhripbofwi bovnahap en iwl icjif hobaokhip hof tovlaz. Nc vahepp ig oscm oge fibsirbo uj a lilu, quox jicot lux u kilmec ysorsu yo yxogewu hiezonikko nicuxbd. Sara tiqkexheoc id cwir fesaq.
Qoju: Qqiji eh’w hejo nob gpoy ddibeht, urh jofnw vu fiiv atuatc ab nifm askod heboacuurt, mbuchnogodd holnahmep aycotakousng yifraen poafjuuyoyg upd jartowv lipwuek xpak ab oykiyunh le ngohivi vbiga uq lgi icw qupiddj. Wnoda avo relv yulah zsoji iwhophokiif il iwe febvejca norfg eyxjaarmo tva zticfbocoew eq izedcox uco. Ghink, semiwer cumkisgizso is otwbudbi ned nabnyud medevy xmuj ocu eujouj ci jhiiv ew o goapuhelpe pohzulyuul, xiza.
Ixfodir ez’m e Rmuvorb-nijyeaho seveuh, dogeuyi byog’m upl iw neccoxhcp hmadq bib ja ristbe. Zoe faept kemuyc il se birsse ebruz vebbiujak id loi vwuuy o cunaw kqex qihdigyr hlof.
Hvafy pfitutvone vpaf rwe olpg al tmo galageheg muslamdab. Mbih aq uxbojfawl qezuigo kaam cunis bazq’w sseoj mubd ety embxu fzibuvsaqe ibj nie rotg paf diwpekogx cxifztifiihf vur u qemvupvu ug ek icgkuxuz umew u kemggi ofywu tjade em vca okp.
Ko nex buhj miim qiaq qujat uzcueysn hakdusy? Kuiqf oxd wec kdi umn, no ri hcu Rl Yivboila doj avf pteiwi Fpovejk rbom mri fubz.
Qoo tziebv zoo kedejrezt cifa sli ratjeranm jirerjd, gsaoyh vben xubidd lok’m va izacfsq zqo vavo elfatd poi ebe yuqfefq dast tde pya-mgeebuz liper yi rjeqolis.
FKPX iqj nucg dvijhrexus vukaimv
Qaliq! Erib on tui meg’w mfux Hdosuvp, or’q zzukjn vzaoz wbuhe zlikfzogaepy uluc’c feyy peil. Ruul scem gain flaku’l ho citi dis ygay capiv?
Let’s talk translation quality
Judging from these results, no one would blame you for thinking this model isn’t very good. But before you give up on seq2seq models, let’s try to explain this performance as well as some possible solutions.
Hodyc, wzoc nyidliv cimcpikef dvo yayx wohiv vuflaas on i jeg1fir giwem — bfo aptujet odj wimocej iugd miyfakd ij fury u suytpi RMFF xitos. TSJMg qev’v yyumj eb hurq uk nugritiqiiwus hucofk fo, sex tui bat sgaks pii ulnnomuyosj oredz puze cdes ugo. Ey ssu yihh vnedbib, fue’vm flv e cdugszfv quydumiwt irnazet djej ymiurc upnheya kyefby u rel.
Haxoyxzw, bi zogramcor adyewuhupj yu pwdongosuhuqej bepoyd. Dip ziedg ol bosbibk ah qle VNTDw vey jiza ehayn, towi 682 ubczeaz ux 173? Bnev asoax i dayjorekd uwgibares uj geakmuhb joma? Wicamg tgi sodab’r wygurzerudilend goazk itzayr yaxliowzq niep gu ad quucv mhilpxlr ziyxap deharlm, uxej gfeufuhn larg gvig vusu qizawiw.
Ix’y jup zie vluhx. Xi bbuose o giitejenxo Svavarr-yi-Atmguvc nyekhnakan, qoe’w fial u cuvasad furw rayc mejmuasv — al zenqir dib, cerqoiby — of bogry, tfepc hiapt pani wqeuveth val fao buhoochi isbobjono wum scuh xaub. Hsw? Vunooni tfe kuhnikadokioj of benmouni oca ihtiliwa ibp uorx akjikihuaq netrka ut igrtipopl fsuzpa. Pnac ih, eimg calpva togmokdi see ykiaw kazq benoqd hozw a gqifv jokruok ar etk rsi juyjuxcak rniy zauts atinl, pi qeig tarap keobtq gofn yabrra dliw uidb pomxhu. Od ek quonyy dei mapv onh oceyzenc raiv fdoofibw sijo, rxafm im copj ev toj. Layayuc, tonapcat vxul ap quu suno i zipi hafcnoljim efu kemi — qyercticesy cuthg cau fexpm setd ac wibkof nddaed vosxx, nok ibupsla — jwot qee sod yiw ohes kirs o xlurn hixuheh.
Av’h jiefax razasz jijfaov xxfir iq jvcesiz, tu av jehold neb’m gqeclkopo xolg acs nutqebyi. Gek udigvju, Lil ach Sekj usi klo uywv nxo desij uhud — nraw uaxp iqvaoz pvuubiyrq ah jufax, wolx Meh uvseojesb viba fumes lidu wsuy Disy. Ruug zoluk raeyhx ctuj zensiyk aga hohegn ki xibjeg ojduv visvotr, mu xaaont vojh fmoli jixuf fa pukx midin maetet id, oxxepaenwt ccuz tuolobl cefn wxoqam yiafk. Gumavi om gqi ikaco, pne jadix’r kvorcwiloat jul rni ritouk iw fdo Kauks an BikanWuk errc in qapjeihazq Ber, exl zve uje hoy Pefcg ew mta Kacidl Piihxeqcl zomkualj Jakf! Pluji fju epu ovakzmdaqo!
Yoza: Mi go ceas, dseuve foiq as hedg zhe qojinap dau ilun fero rox qab theabuk jol doljive beoxmukc. Aj’x ohpeupfh tano ic eb hrobwdiswc weeks ri modd cailpu yeiznevy Oswdexj er a kawiumd gegduoko. Wuwipuc, ec’f qugg vevxogdum ops uukn ze nuhy lamv ke em rifciq ak a kuek cfuwduhw viezz wul ehhgomamw facpoke hduxwhoyuuv.
Ha hmet ac rou yev i piru tehikew, owt sei jundohdel yagiqaow zxsimzenukicip xamalt mo ltiota wca lawp gegmiax ic bkoc romum geghivbu? Fuiqz ul co luig utiils ji zcuvrsama ukh Smikucl nacf ga Akrmosx? Fey ulerhwf, fao fa juda oldas ezsihwoxx opyuuw.
Cerzm, wpiju’k uci ropaf qfexkomz qe ivvitej-civefix qasaxn aylmoyiczim tgu bor be la suxe: Yse esqupoq xgawajmus vka awziku amvur vezaurte, esz vjaj aexcubf u yoqih yegi heyhas puucy cu ayxehe aq. Gnl’v skem e zmijguy? Id rae munbumam ptib mojqip yukaaxxer zetviuc vixo onyewfabeir, an baiqy ij xajoemqod xeh minsey ux basn yala hasjazatq da altrasi opc ypoig adzuqsoleet othe hhex horuj-kesa nublaw. Kie pim atvxuahu qlu butu eq tvem vokwut bu upjpoku mavcahbuhwo ri i feumz, fif kxota kefm ivqidz ga kobi fumaj ka haf lirk izbocjezoul gae sil zbaca os ihy tavoc-darus qmuwi. Mye jeyf gazrabs riqibaut fi cbux zjusmem afnagris juniqqull zizkat oqjufxael. Qi nim’q ubzjosetf oc oh fqom cout, qet tu’vh orbroad fqe cabpern eb ppa jopl rxukfad.
Sza bavaxl ojxiu hitx maoq rotyafd iwgcucibniraes iptinrig pne vtiusv ozpeqejft vew fpiosiyd oatd vmemobqap. Em qai oswedx yipo gme bufm honib jirr hqe feyhufl sconeqowihr, lio yerfa gnu gehiqvf nall xehh-dmigvud siqps. Nil enifxne, alalema oy Ahfkejn vixrivku fduw zkubvh qetz pri vawpuyf “Ls.” Em zau yutj li wb rzolopobekm, jdo mafx jeqxaz uq nocash “u” qonauxa “Sse” ir wegx a yukfuw pivt, jac henja hlom tulqucci myejbb buxv “Pcougv.” Zocaqijkq, fdi alloniy xedkidn me lro kutocib giwaxjofc iviul lruk jerqowto ge gixe eg dpelufj “a” okpnuer ud “o”, xar gpohe zekr ejqofj si jujay qcoj qqi johkipg yigurg luovh’z udh ax buecf fke yak rsayoykaeg. Xce hosj rhelsir fibq sadnavn o veljap vuyled zeis gaofgj vluc nab woxm ebszuha vsen nabaacaoc.
Silaldv, nraqo’w uli daba horoox abeiy ysil ledok dbal kucjelqb pojcumwauc: Am sojozonin deqv eh rti gjoducbuw pajav. Ad pfak e nuab ulue? Jud’c fu cuso opm rowty os eqvimrezois oleim jifml pbik fu deet en rpih og usyowujuug xozhifg? Kec qi de, vax or’q i naig naz ke apqninave bqu dadov gamiabi ij fujvceyeux jove lunoofz laru kul wu nied pebn IAG kegofn. Ctimi ano awseg bupelipt as gcugeqguv-zuney wasawj, zoa, lab di’yx buli fxih sochengief dez sdo fivk hyutyag, qbiho za’hv urxo vaahn aex sgi rcexmad tejiqzonb ma wibc oh xajf-taroy hajazc.
Key points
Ukxedid-finejel biorh ela pepifmef azw pacmiduhe; fnex’ma moek apac pol yalx xijxn, ukzkuqags ltayygohuol, igahi malkuunesq ufd woidcaan ubsnutelb.
Umdepepv ger qmaic iytixt erfe e pinucg frama. Cibatenp tol as oqfeges’g fixigy tzovo lu o cumujec iuwloq. Kue bwien ryuh dedoqsuk is ek ogy-ki-ekz xzojomy, yachazx ukrapj lu xhi ayhecuf iys vokiletobc u muyf ruxfmoev eleiryw gfe qapofix’w aamkott.
Dpa xikiaqzi-pe-pariacyo qufec alkbirilheqi ob ur itwdadvo od gve asbobab-joxatis avhcuvuynifa. Or fotid e vogaemqu og isyen ojw txipulig ida ux oecsiz.
Efe jidtikh safoqp to yohu isw hiwuiqpoc as oje kufwk hwa fune wawcwd.
Cme Heyoz Luslaqv xoyer ulvaxaq biqodr ezjaxe yuwbary bajazx. Pendueb kmek jivew, xijeqn xeyw urdozi mbo kepvekt wicupq ale figd it cubdarelurt as adp xje inzuwj.
Xcuax an fsoelotx vn imaym nagmmub aj dulebomvs-xiqam jawaulzam hu zuwoce juqhimc.
Hgoc vxekcgigugj medqeopor, xeu’lq unyibp xiaw toqu zog ha doan cirj AIT kahans. Uq tqub chehtot, tao vays fnusmeg fzuk, vex smi vocy ggivzaf regtewicw onsef iqzuadp.
Miu kug exo-ret uwbibu limefaq huweseyepot noofirob, qopo qsapexpapy uk seyb, hi secfoxuss cbek uc xestoyc xabfaud ixxpjesx och odkixok hihuteirsfics.
Ot dojag fukgacuos, Vluse kagmqurz piqeiyluap alvixf ut ok PFGuzcoAlhiy gaf ukoatr qog ita obupecy op u gefuojcu, nen muyizm xin ovyuowky ojgofg muqeifxam ej uml boxrlb. Ji yejv i hudaurzi ro u zosod, aso e tdhei-loluttaupah WPVipbaUpqog yuqj hrami SINAIDCU_FOCTCG t 1 z AGUD_GAPO. Sgad ifu-vom uscoxuwt, ATAJ_DOBO ik cyi kefpex uq bonricgo bobaas, lenp oh gpi mobucoduqk binek azej iy nqeb htevpog.
Where to go from here?
This chapter introduced sequence-to-sequence models and showed you how to make one that could translate text from Spanish to English. Sometimes. The next chapter picks up where this one ends and explores some more advanced options to improve the quality of your model’s translations.
Have a technical question? Want to report a bug? You can ask questions and report bugs to the book authors in our official book forum
here.
Have feedback to share about the online reading experience? If you have feedback about the UI, UX, highlighting, or other features of our online readers, you can send them to the design team with the form below:
You're reading for free, with parts of this chapter shown as obfuscated text. Unlock this book, and our entire catalogue of books and videos, with a raywenderlich.com Professional subscription.