How to Make a Narrated Book Using AVSpeechSynthesizer in iOS 7

Learn how to make Siri read you a bedtime story by using one of iOS 7’s newest features: AVSpeechSynthesizer.


Be a Good Delegate and Listen

Your AVSpeechSynthesizer has a delegate, conforming to AVSpeechSynthesizerDelegate, that is informed of various important events and actions in the speech synthesizer’s lifecycle. You’ll implement some of these delegate methods to make speech sound more natural by using the utterance properties included in WhirlySquirrelly.plist.
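For reference, here's a sketch of the callbacks the protocol offers. Every method in AVSpeechSynthesizerDelegate is optional, so a delegate implements only what it needs; MyListener below is a hypothetical class, not part of this project:

```objectivec
@import AVFoundation;

// Hypothetical class illustrating a conforming delegate.
@interface MyListener : NSObject <AVSpeechSynthesizerDelegate>
@end

@implementation MyListener

// Called when the synthesizer begins speaking an utterance.
- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer
  didStartSpeechUtterance:(AVSpeechUtterance *)utterance
{
  NSLog(@"Started: %@", utterance.speechString);
}

// Called when the synthesizer finishes an utterance -- you'll use
// this one later to queue up the next utterance.
- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer
 didFinishSpeechUtterance:(AVSpeechUtterance *)utterance
{
  NSLog(@"Finished: %@", utterance.speechString);
}

@end
```

The protocol also declares pause, continue, and cancel callbacks, plus speechSynthesizer:willSpeakRangeOfSpeechString:utterance: for word-by-word tracking, should you ever need finer-grained control.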

Open RWTPage.h and add the following code after the declaration of displayText

  @property (nonatomic, strong, readonly) NSArray *utterances;

Open RWTPage.m and add the following code after the declaration of displayText

  @property (nonatomic, strong, readwrite) NSArray *utterances;

Note: You’re following a best practice here by declaring properties as readonly in the header file and redeclaring them as readwrite in the implementation file. This ensures that only the object itself can set its properties.
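As a sketch of the pattern (Widget is a hypothetical class, used only to illustrate):

```objectivec
// Widget.h -- callers see a readonly property:
@interface Widget : NSObject
@property (nonatomic, copy, readonly) NSString *name;
@end

// Widget.m -- a class extension redeclares it readwrite,
// so only code inside the implementation can set it:
@interface Widget ()
@property (nonatomic, copy, readwrite) NSString *name;
@end

@implementation Widget
- (instancetype)init
{
  if (self = [super init]) {
    self.name = @"gizmo";  // legal here, inside the class
  }
  return self;
}
@end

// In any other file:
//   widget.name = @"other";  // compiler error: read-only property
```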

Replace pageWithAttributes: with the following code

+ (instancetype)pageWithAttributes:(NSDictionary*)attributes
{
  RWTPage *page = [[RWTPage alloc] init];

  if ([[attributes objectForKey:RWTPageAttributesKeyUtterances] isKindOfClass:[NSString class]]) {
    page.displayText = [attributes objectForKey:RWTPageAttributesKeyUtterances];
    page.backgroundImage = [UIImage imageNamed:[attributes objectForKey:RWTPageAttributesKeyBackgroundImage]];
    // 1
    page.utterances  = @[[[AVSpeechUtterance alloc] initWithString:page.displayText]];
  } else if ([[attributes objectForKey:RWTPageAttributesKeyUtterances] isKindOfClass:[NSArray class]]) {
    NSMutableArray *utterances = [NSMutableArray arrayWithCapacity:31];
    NSMutableString *displayText = [NSMutableString stringWithCapacity:101];

    for (NSDictionary *utteranceAttributes in [attributes objectForKey:RWTPageAttributesKeyUtterances]) {
      NSString *utteranceString =
                 [utteranceAttributes objectForKey:RWTUtteranceAttributesKeyUtteranceString];
      NSDictionary *utteranceProperties =
                     [utteranceAttributes objectForKey:RWTUtteranceAttributesKeyUtteranceProperties];

      AVSpeechUtterance *utterance = [[AVSpeechUtterance alloc] initWithString:utteranceString];
      [utterance setValuesForKeysWithDictionary:utteranceProperties];

      if (utterance) {
        [utterances addObject:utterance];
        [displayText appendString:utteranceString];
      }
    }

    page.displayText = displayText;
    page.backgroundImage = [UIImage imageNamed:[attributes objectForKey:RWTPageAttributesKeyBackgroundImage]];
    // 2
    page.utterances  = [utterances copy];
  }

  return page;
}

The only new code is in sections 1 and 2, which set the page.utterances property for the NSString and NSArray cases, respectively.

Open RWTPageViewController.h and replace its contents below the header comments with

#import <UIKit/UIKit.h>
@import AVFoundation;

// 1
@interface RWTPageViewController : UIViewController<AVSpeechSynthesizerDelegate>

@property (nonatomic, weak) IBOutlet UILabel *pageTextLabel;
@property (nonatomic, weak) IBOutlet UIImageView *pageImageView;

@end

In Section 1, you declared that RWTPageViewController conforms to the AVSpeechSynthesizerDelegate protocol.

Open RWTPageViewController.m and add the following property declaration just below the declaration of the synthesizer property

  @property (nonatomic, assign) NSUInteger nextSpeechIndex;

You’ll use this new property to track which element of RWTPage.utterances to speak next.

Replace setupForCurrentPage with

- (void)setupForCurrentPage
{
  self.pageTextLabel.text = [self currentPage].displayText;
  self.pageImageView.image = [self currentPage].backgroundImage;
  self.nextSpeechIndex = 0;
}

Replace speakNextUtterance with

- (void)speakNextUtterance
{
  // 1
  if (self.nextSpeechIndex < [[self currentPage].utterances count]) {
    // 2
    AVSpeechUtterance *utterance = [[self currentPage].utterances objectAtIndex:self.nextSpeechIndex];
    self.nextSpeechIndex += 1;

    // 3
    [self.synthesizer speakUtterance:utterance];
  }
}
  1. Section 1 ensures that nextSpeechIndex is in range.
  2. Section 2 gets the current utterance and advances the index.
  3. Section 3 speaks the utterance.

Build and run. What happens now? You should hear only "Whisky," the first word, spoken on each page. That's because you still need to implement some AVSpeechSynthesizerDelegate methods to queue up the next utterance for speech when the synthesizer finishes speaking the current one.

Replace startSpeaking with

- (void)startSpeaking
{
  if (!self.synthesizer) {
    self.synthesizer = [[AVSpeechSynthesizer alloc] init];
    // 1
    self.synthesizer.delegate = self;
  }

  [self speakNextUtterance];
}

In Section 1, you've made your view controller a delegate of your synthesizer.

Add the following code at the end of RWTPageViewController.m, just before the @end


#pragma mark - AVSpeechSynthesizerDelegate Protocol

- (void)speechSynthesizer:(AVSpeechSynthesizer*)synthesizer didFinishSpeechUtterance:(AVSpeechUtterance*)utterance
{
  NSUInteger indexOfUtterance = [[self currentPage].utterances indexOfObject:utterance];
  if (indexOfUtterance == NSNotFound) {
    return;
  }

  [self speakNextUtterance];
}

Your new code queues up the next utterance when the synthesizer finishes speaking the current utterance.

Build and run. You'll now hear a couple of differences:

  • Every word on a page is verbalized, because finishing one utterance queues up the next.
  • When you swipe to the next or previous page, the current page's text is no longer spoken.
  • Speech sounds much more natural, thanks to the utteranceProperties in Supporting Files\WhirlySquirrelly.plist. Your humble tutorial author toiled over these to hand-tune the speech.

Control: You Must Learn Control

Master Yoda was wise: control is important. Now that your book speaks each utterance individually, you're going to add buttons to your UI so you can make real-time adjustments to the pitch and rate of your synthesizer's speech.

Still in RWTPageViewController.m, add the following property declarations right after the declaration of the nextSpeechIndex property

@property (nonatomic, assign) float currentPitchMultiplier;
@property (nonatomic, assign) float currentRate;

To set these new properties, add the following methods right after the body of gotoPreviousPage:

- (void)lowerPitch
{
  if (self.currentPitchMultiplier > 0.5f) {
    self.currentPitchMultiplier = MAX(self.currentPitchMultiplier * 0.8f, 0.5f);
  }
}

- (void)raisePitch
{
  if (self.currentPitchMultiplier < 2.0f) {
    self.currentPitchMultiplier = MIN(self.currentPitchMultiplier * 1.2f, 2.0f);
  }
}

- (void)lowerRate
{
  if (self.currentRate > AVSpeechUtteranceMinimumSpeechRate) {
    self.currentRate = MAX(self.currentRate * 0.8f, AVSpeechUtteranceMinimumSpeechRate);
  }
}

- (void)raiseRate
{
  if (self.currentRate < AVSpeechUtteranceMaximumSpeechRate) {
    self.currentRate = MIN(self.currentRate * 1.2f, AVSpeechUtteranceMaximumSpeechRate);
  }
}

- (void)speakAgain
{
  if (self.nextSpeechIndex == [[self currentPage].utterances count]) {
    self.nextSpeechIndex = 0;
    [self speakNextUtterance];
  }
}

These methods are the actions that connect to your speech control buttons.

  • lowerPitch and raisePitch lower and raise the speech pitch, respectively, by up to 20% per invocation, within the range [0.5f, 2.0f].
  • lowerRate and raiseRate lower and raise the speech rate, respectively, by up to 20% per invocation, within the range [AVSpeechUtteranceMinimumSpeechRate, AVSpeechUtteranceMaximumSpeechRate].
  • speakAgain resets the internal index of the next utterance, then speaks the page's text again from the beginning.

Create the buttons by adding the following methods right after the body of raiseRate

- (void)addSpeechControlWithFrame:(CGRect)frame title:(NSString *)title action:(SEL)selector
{
  UIButton *controlButton = [UIButton buttonWithType:UIButtonTypeRoundedRect];
  controlButton.frame = frame;
  controlButton.backgroundColor = [UIColor colorWithWhite:0.9f alpha:1.0f];
  [controlButton setTitle:title forState:UIControlStateNormal];
  [controlButton addTarget:self
                    action:selector
          forControlEvents:UIControlEventTouchUpInside];
  [self.view addSubview:controlButton];
}

- (void)addSpeechControls
{
  [self addSpeechControlWithFrame:CGRectMake(52, 485, 150, 50) 
                            title:@"Lower Pitch" 
                           action:@selector(lowerPitch)];
  [self addSpeechControlWithFrame:CGRectMake(222, 485, 150, 50) 
                            title:@"Raise Pitch" 
                           action:@selector(raisePitch)];
  [self addSpeechControlWithFrame:CGRectMake(422, 485, 150, 50) 
                            title:@"Lower Rate" 
                           action:@selector(lowerRate)];
  [self addSpeechControlWithFrame:CGRectMake(592, 485, 150, 50) 
                            title:@"Raise Rate" 
                           action:@selector(raiseRate)];
  [self addSpeechControlWithFrame:CGRectMake(506, 555, 150, 50)
                            title:@"Speak Again"
                           action:@selector(speakAgain)];
}

addSpeechControlWithFrame:title:action: is a convenience method that adds a button to the view and wires it to one of the actions that alter the spoken text on demand.

Note: You could also create these buttons in Main.storyboard and wire up their actions in RWTPageViewController. But that would be too easy; here you'll take the programmatic approach instead.

Add the following code in viewDidLoad before [self startSpeaking]:


  // 1
  self.currentPitchMultiplier = 1.0f;
  self.currentRate = AVSpeechUtteranceDefaultSpeechRate;

  // 2
  [self addSpeechControls];

Section 1 sets your new speech properties to default values, and section 2 adds your speech controls.

As the last step, replace speakNextUtterance with the following

- (void)speakNextUtterance
{
  if (self.nextSpeechIndex < [[self currentPage].utterances count]) {
    AVSpeechUtterance *utterance = [[self currentPage].utterances objectAtIndex:self.nextSpeechIndex];
    self.nextSpeechIndex += 1;

    // 1
    utterance.pitchMultiplier = self.currentPitchMultiplier;
    // 2
    utterance.rate = self.currentRate;

    [self.synthesizer speakUtterance:utterance];
  }
}

The new code sets the pitchMultiplier and rate of the next utterance to the values you've chosen by tapping the nifty new lower/raise buttons.

Build and run. You should see something like the screenshot below.

Narrated Book with Speech Control

Try tapping the various buttons while it's speaking, and take note of how they change the sound of the speech. Yoda would be proud; you're not a Jedi yet, but you're becoming a master of AVSpeechSynthesizer.