How to Parse HTML on iOS

This is a blog post by iOS Tutorial Team member Matt Galloway, founder of SwipeStack, a mobile development team based in London, UK. You can also find me on Google+. Let’s say you want to find some information inside a web page and display it in a custom way in your app. This technique is […] By Matt Galloway.

You are currently viewing page 3 of 3 of this article. Click here to view the first page.

The Fellowship of the Tutorial

Next up, you’ll be downloading the list of Ray’s contributors, i.e. the fellowship of the iOS Tutorial Team. If you open http://www.raywenderlich.com/about in your favorite browser and “View Source” again, somewhere in the file you should see something like this:

<ul class="team-members">
    <li id='mgalloway'>
        <h3>Matt Galloway (Editor, Tutorial Team Member)</h3>
        <img src='/wp-content/images/authors/mgalloway.jpg' alt='Matt Galloway' width='100' height='100'>
    </li>
</ul>

In a tree structure, it looks like this:

[Figure: Contributors tree]

This time, your corresponding XPath expression looks like this:

//ul[@class='team-members']/li

This translates to: get me all the <li> tags that are direct children of a <ul> tag whose class attribute is 'team-members'.
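If you want to try the expression without hitting the network, you can run it against an inline HTML string. Here is a minimal sketch, assuming hpple is already part of your project as set up earlier in this tutorial. The names ContributorItemsInHTML and LogContributorIDs, and the sample markup, are made up for illustration.

```objective-c
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: returns the <li> elements matched by the
// contributors XPath expression in the given HTML string.
static NSArray *ContributorItemsInHTML(NSString *html) {
    NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];
    TFHpple *parser = [TFHpple hppleWithHTMLData:data];
    return [parser searchWithXPathQuery:@"//ul[@class='team-members']/li"];
}

// Illustrative usage with invented sample markup containing two lists.
static void LogContributorIDs(void) {
    NSString *html =
        @"<html><body>"
        @"<ul class='menu'><li id='home'>Home</li></ul>"
        @"<ul class='team-members'><li id='mgalloway'><h3>Matt Galloway</h3></li></ul>"
        @"</body></html>";
    for (TFHppleElement *item in ContributorItemsInHTML(html)) {
        // Only <li> tags under the team-members list are expected here.
        NSLog(@"Matched <li> with id: %@", [item objectForKey:@"id"]);
    }
}
```

Keep in mind that @class='team-members' compares the whole attribute value, so a list declared with class='team-members wide' would not match this expression.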

Back in MasterViewController.m, add an instance variable to the class continuation category as follows (the class continuation category is the section beginning with @interface MasterViewController () at the top of the file, right below the imports section):

@interface MasterViewController () {
    NSMutableArray *_objects;
    NSMutableArray *_contributors;
}

You will be adding the contributors to the new _contributors array.

Next add the following method below loadTutorials in MasterViewController.m:

-(void)loadContributors {
    // 1
    NSURL *contributorsUrl = [NSURL URLWithString:@"http://www.raywenderlich.com/about"];
    NSData *contributorsHtmlData = [NSData dataWithContentsOfURL:contributorsUrl];
    
    // 2
    TFHpple *contributorsParser = [TFHpple hppleWithHTMLData:contributorsHtmlData];
    
    // 3
    NSString *contributorsXpathQueryString = @"//ul[@class='team-members']/li";
    NSArray *contributorsNodes = [contributorsParser searchWithXPathQuery:contributorsXpathQueryString];
    
    // 4
    NSMutableArray *newContributors = [[NSMutableArray alloc] initWithCapacity:0];
    for (TFHppleElement *element in contributorsNodes) {
        // 5
        Contributor *contributor = [[Contributor alloc] init];
        [newContributors addObject:contributor];
        
        // 6
        for (TFHppleElement *child in element.children) {
            if ([child.tagName isEqualToString:@"img"]) {
                // 7
                @try {
                    contributor.imageUrl = [@"http://www.raywenderlich.com" stringByAppendingString:[child objectForKey:@"src"]];
                }
                @catch (NSException *e) {}
            } else if ([child.tagName isEqualToString:@"h3"]) {
                // 8
                contributor.name = [[child firstChild] content];
            }
        }
    }
    
    // 9
    _contributors = newContributors;
    [self.tableView reloadData];
}

This should look familiar. That’s because it’s very similar to the loadTutorials method you wrote! This time, though, there’s slightly more work to do to extract the relevant information about the contributors. Here’s what it all means:

  1. Same as before, except this time grabbing a different URL. This time you’re using the About page, so you can get the list of the cool guys and gals on the team.
  2. Again, creating a TFHpple parser.
  3. Execute your desired XPath query.
  4. Create a new array and loop over the found nodes.
  5. Create a new Contributor object and add it to your array.
  6. You need to get at the name and image URL elements from the <h3> and <img> tags, which are children of the <li> tag. So you loop over the children and pull out the relevant details as you find them.
  7. If this child is an <img> tag, then its src attribute tells you the image URL. Note that this is wrapped in @try/@catch because hpple sometimes throws an exception internally. I hope that will be fixed upstream at some point.
  8. If this child is an <h3> tag, then the first child (the text node) will tell you the name of the contributor.
  9. As before, set the view controller’s _contributors array to the new one you created, and reload the table data.
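Step 7 builds the image URL by gluing two strings together, which breaks if src is missing or is already an absolute URL. As an alternative, here is a minimal sketch that lets NSURL resolve the path against the site’s base URL. The helper name AbsoluteImageURLString is hypothetical and not part of the sample project.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: resolves an <img> src value against the site's base
// URL. Returns nil when src is missing or cannot be parsed as a URL.
static NSString *AbsoluteImageURLString(NSString *src) {
    if (src.length == 0) {
        return nil; // Nothing to resolve; also avoids passing nil to NSURL.
    }
    NSURL *baseURL = [NSURL URLWithString:@"http://www.raywenderlich.com"];
    // A path such as /wp-content/... is resolved against baseURL;
    // a src that is already absolute is returned as-is.
    NSURL *resolved = [NSURL URLWithString:src relativeToURL:baseURL];
    return [resolved absoluteString];
}
```

With this in place, step 7 could read contributor.imageUrl = AbsoluteImageURLString([child objectForKey:@"src"]);. This only guards against a missing or odd src value. Whether it removes the need for the @try/@catch depends on where hpple throws, so I’d keep the wrapper until you’ve checked.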

All that’s left is to make the table view display the new contributor data. Change the following table view data source methods to look like this:

-(NSString*)tableView:(UITableView *)tableView titleForHeaderInSection:(NSInteger)section {
    switch (section) {
        case 0:
            return @"Tutorials";
            break;
        case 1:
            return @"Contributors";
            break;
    }
    return nil;
}

-(NSInteger)numberOfSectionsInTableView:(UITableView *)tableView {
    return 2;
}

-(NSInteger)tableView:(UITableView *)tableView numberOfRowsInSection:(NSInteger)section {
    switch (section) {
        case 0:
            return _objects.count;
            break;
        case 1:
            return _contributors.count;
            break;
    }
    return 0;
}

-(UITableViewCell *)tableView:(UITableView *)tableView cellForRowAtIndexPath:(NSIndexPath *)indexPath {
    static NSString *CellIdentifier = @"Cell";
    
    UITableViewCell *cell = [tableView dequeueReusableCellWithIdentifier:CellIdentifier];
    if (cell == nil) {
        cell = [[UITableViewCell alloc] initWithStyle:UITableViewCellStyleSubtitle reuseIdentifier:CellIdentifier];
        cell.accessoryType = UITableViewCellAccessoryDisclosureIndicator;
    }
    
    if (indexPath.section == 0) {
        Tutorial *thisTutorial = [_objects objectAtIndex:indexPath.row];
        cell.textLabel.text = thisTutorial.title;
        cell.detailTextLabel.text = thisTutorial.url;
    } else if (indexPath.section == 1) {
        Contributor *thisContributor = [_contributors objectAtIndex:indexPath.row];
        cell.textLabel.text = thisContributor.name;
    }
    
    return cell;
}

Finally, add the following at the bottom of your viewDidLoad:

[self loadContributors];

Then build and run. You should see a list of not only the tutorials but also the contributors! Great work!

Where to Go From Here?

Here is a sample project with all of the code from this tutorial.

I’ve shown you how to parse some simple HTML into a data model and how to grab various bits of information out of it, but you might want to consider some additions. Can you:

  • Parse each tutorial’s HTML data (i.e. the web page at each Tutorial object’s ‘url’) and extract the contributor who wrote that article?
  • Download the image of each contributor and show it in the table view next to the contributor’s name?
  • Make the phone open Safari to that tutorial’s or contributor’s URL when you tap on each row?
  • Perform the fetching of HTML data and parsing on a background thread so that it doesn’t lock the UI?
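As a starting point for the last challenge, here is a minimal sketch of the usual Grand Central Dispatch pattern: do the slow work on a background queue, then hop back to the main queue before touching the UI. The helper PerformInBackground and the parseContributors method mentioned in the comment are hypothetical names, not part of the sample project.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: runs `work` on a background queue, then hands its
// result to `completion` on the main queue, where UI updates are safe.
static void PerformInBackground(id (^work)(void), void (^completion)(id result)) {
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        id result = work(); // Slow part: download and parse the HTML.
        dispatch_async(dispatch_get_main_queue(), ^{
            completion(result); // Back on the main thread.
        });
    });
}

// Example use inside MasterViewController. parseContributors is assumed to
// be the download-and-parse part of loadContributors, returning the new
// array instead of touching the table view:
//
//   PerformInBackground(^id{ return [self parseContributors]; }, ^(id result) {
//       _contributors = result;
//       [self.tableView reloadData];
//   });
```

The important design point is that only the completion block touches _contributors and the table view, so all UI work stays on the main thread.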

I hope you have enjoyed learning about HTML parsing on iOS. If you have any further questions then I’d love to hear about them in the forums!

