
9
Strings
Written by Matt Galloway

So far, you have briefly seen what the type String has to offer for representing text. Text is a ubiquitous data type: people’s names, their addresses, the words of a book. All of these are examples of text that an app might need to handle. It’s worth having a deeper understanding of how String works and what it can do.

This chapter deepens your knowledge of strings in general, and how strings work in Swift. Swift is one of the few languages that handle Unicode characters correctly while maintaining maximum predictable performance.

Strings as collections

In Chapter 2, “Types & Operations”, you learned what a string is, and what character sets and code points are. To recap, a character set defines the mapping from numbers (code points) to the characters they represent. Now it’s time to look deeper into the String type.

It’s pretty easy to conceptualize a string as a collection of characters. Because strings are collections, you can do things like this:

let string = "Matt"
for char in string {
  print(char)
}

This will print out every character of Matt individually. Simple, eh?

You can also use other collection operations, such as:

let stringLength = string.count

This will give you the length of the string.
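
Since String is a collection of Character values, other familiar collection members work as well. Here is a small sketch using only standard-library members; the constant names are illustrative and not from the chapter:

```swift
let word = "Matt"
let letterA: Character = "a"

// Standard Collection members work on String directly.
let firstLetter = word.first                      // Optional("M")
let hasA = word.contains(letterA)                 // true
let withoutTs = word.filter { $0 != "t" }         // "Ma" (filter on a String returns a String)
let uppercaseFlags = word.map { $0.isUppercase }  // [true, false, false, false]

print(firstLetter as Any, hasA, withoutTs, uppercaseFlags)
```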

Now imagine you want to get the fourth character in the string. You may think to do something like this:

let fourthChar = string[3]

However, if you did this, you would receive the following error message:

'subscript' is unavailable: cannot subscript String with an Int, see the documentation comment for discussion

Why is that? The short answer is that characters do not have a fixed size in memory, so Swift can’t jump straight to the nth one the way it can with an array element. Why not? It’s time to take a detour further into how strings work by introducing what a grapheme cluster is.

Grapheme clusters

As you know, a string is made up of a collection of Unicode characters. Until now, you have considered one code point to precisely equal one character and vice versa. However, the term “character” is relatively loose.

let cafeNormal = "café"
let cafeCombining = "cafe\u{0301}"

cafeNormal.count     // 4
cafeCombining.count  // 4
cafeNormal.unicodeScalars.count     // 4
cafeCombining.unicodeScalars.count  // 5
for codePoint in cafeCombining.unicodeScalars {
  print(codePoint.value)
}
99
97
102
101
769
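
To see the difference between characters and code points in isolation, you could compare the two ways of writing é on their own. This is a minimal sketch; the constant names are made up for illustration:

```swift
let precomposed = "\u{00E9}"   // é as the single code point U+00E9
let decomposed = "e\u{0301}"   // e followed by U+0301 COMBINING ACUTE ACCENT

// Both strings contain exactly one Character (grapheme cluster)...
print(precomposed.count, decomposed.count)                               // 1 1
// ...but a different number of code points.
print(precomposed.unicodeScalars.count, decomposed.unicodeScalars.count) // 1 2

// Hexadecimal code points of the decomposed form.
let scalarsInHex = decomposed.unicodeScalars.map { String($0.value, radix: 16) }
print(scalarsInHex) // ["65", "301"]
```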

Indexing strings

As you saw earlier, indexing into a string to get a certain character (err, I mean grapheme cluster) is not as simple as using an integer subscript. Swift wants you to be aware of what’s going on under the hood, and so it requires syntax that is a bit more verbose.

let firstIndex = cafeCombining.startIndex
let firstChar = cafeCombining[firstIndex]
// endIndex is the position one past the last character, so subscripting with it crashes:
// let lastIndex = cafeCombining.endIndex
// let lastChar = cafeCombining[lastIndex]
// Fatal error: String index is out of bounds
let lastIndex = cafeCombining.index(before: cafeCombining.endIndex)
let lastChar = cafeCombining[lastIndex]
let fourthIndex = cafeCombining.index(cafeCombining.startIndex,
                                      offsetBy: 3)
let fourthChar = cafeCombining[fourthIndex]
fourthChar.unicodeScalars.count // 2
fourthChar.unicodeScalars.forEach { codePoint in
  print(codePoint.value)
}
101
769
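
If you find yourself offsetting from startIndex often, one option is to wrap the pattern in a small helper. The sketch below is only an illustration: character(atOffset:) is a hypothetical method, not part of the standard library, and it assumes you are happy to pay the cost of walking the string from the start on every call.

```swift
extension String {
  /// Hypothetical convenience, not part of the standard library:
  /// returns the Character at an integer offset, or nil if out of range.
  /// Each call walks the string from the start, so it is O(n).
  func character(atOffset offset: Int) -> Character? {
    guard offset >= 0,
          let position = index(startIndex, offsetBy: offset, limitedBy: endIndex),
          position < endIndex else {
      return nil
    }
    return self[position]
  }
}

let cafe = "cafe\u{0301}"
print(cafe.character(atOffset: 3) as Any) // the é grapheme cluster
print(cafe.character(atOffset: 4) as Any) // nil, because offset 4 is endIndex
```

Returning an optional avoids the out-of-bounds crash you saw with endIndex, at the price of hiding the linear walk that Swift’s index-based syntax deliberately keeps visible.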

Equality with combining characters

Combining characters make equality of strings a little trickier. For example, consider the word café written once using the single é character, and once using the combining character, like so:

let equal = cafeNormal == cafeCombining
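
Swift compares strings by canonical equivalence, so the two spellings compare as equal even though their code points differ. A short sketch (the constant names are illustrative) that checks both levels:

```swift
let singleCodePoint = "caf\u{00E9}"
let withCombiningAccent = "cafe\u{0301}"

// String equality uses canonical equivalence, so these compare equal.
print(singleCodePoint == withCombiningAccent) // true

// The underlying code points and UTF-8 bytes still differ.
let sameScalars = singleCodePoint.unicodeScalars
  .elementsEqual(withCombiningAccent.unicodeScalars)
print(sameScalars)                                                // false
print(singleCodePoint.utf8.count, withCombiningAccent.utf8.count) // 5 6
```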

Strings as bi-directional collections

Sometimes you want to reverse a string. Often this is so you can iterate through it backward. Fortunately, Swift has a rather simple way to do this, through a method called reversed() like so:

let name = "Matt"
let backwardsName = name.reversed()
let secondCharIndex = backwardsName.index(backwardsName.startIndex,
                                          offsetBy: 1)
let secondChar = backwardsName[secondCharIndex] // "t"
let backwardsNameString = String(backwardsName)
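
It can help to look at what reversed() actually hands back. In this sketch (the constant names are illustrative), the result is a thin wrapper around the original string rather than a new String, and you only pay for a copy when you build one explicitly:

```swift
let original = "Matt"
let backwards = original.reversed()

// reversed() on a String gives a wrapper that reads the original back to front.
print(type(of: backwards)) // ReversedCollection<String>

// Iterating the wrapper visits the characters in reverse order.
for char in backwards {
  print(char)
}

// Building a String from the wrapper creates new, independent storage.
let backwardsString = String(backwards)
print(backwardsString) // ttaM
```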

Raw strings

A raw string is useful when you want to avoid special characters or string interpolation. Instead, the complete string as you type it is what becomes the string. To illustrate this, consider the following raw string:

let raw1 = #"Raw "No Escaping" \(no interpolation!). Use all the \ you want!"#
print(raw1)
Raw "No Escaping" \(no interpolation!). Use all the \ you want!
let raw2 = ##"Aren’t we "# clever"##
print(raw2)
Aren’t we "# clever
let can = "can do that too"
let raw3 = #"Yes we \#(can)!"#
print(raw3)
Yes we can do that too!
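
A typical place where raw strings pay off is text that is full of backslashes. The following sketch uses a made-up Windows-style path to compare the escaped and raw spellings; the values are purely illustrative:

```swift
// Without a raw string, every backslash must be escaped.
let escapedPath = "C:\\Users\\matt"
// With a raw string, the text is taken literally.
let rawPath = #"C:\Users\matt"#
print(escapedPath == rawPath) // true

// Inside #"..."#, an escape needs a matching #, so \#( interpolates
// while a plain backslash stays a backslash.
let user = "matt"
let message = #"\#(user) keeps notes in C:\Users"#
print(message) // matt keeps notes in C:\Users
```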

Substrings

Another thing you often need to do when manipulating strings is to generate substrings. That is, pull out a part of the string into its own value. This can be done in Swift using a subscript that takes a range of indices.

let fullName = "Matt Galloway"
let spaceIndex = fullName.firstIndex(of: " ")!
let firstName = fullName[fullName.startIndex..<spaceIndex] // "Matt"
// The same slice, written more compactly with a one-sided range:
// let firstName = fullName[..<spaceIndex] // "Matt"
let lastName = fullName[fullName.index(after: spaceIndex)...]
// "Galloway"
let lastNameString = String(lastName)
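
Putting the pieces together, here is a sketch of a function that slices with Substring internally and hands back independent String values. splitName(_:) is a hypothetical helper written for this illustration, and it uses optional binding instead of the force unwrap so that input without a space returns nil rather than crashing:

```swift
/// Hypothetical helper for illustration: splits "First Last" at the first space.
/// Returns nil when the input contains no space.
func splitName(_ fullName: String) -> (first: String, last: String)? {
  guard let spaceIndex = fullName.firstIndex(of: " ") else { return nil }
  let first = fullName[..<spaceIndex]                        // Substring
  let last = fullName[fullName.index(after: spaceIndex)...]  // Substring
  // Copy the slices into independent Strings before returning them.
  return (String(first), String(last))
}

if let parts = splitName("Matt Galloway") {
  print(parts.first, parts.last) // Matt Galloway
}
print(splitName("Matt") as Any)  // nil
```

Converting to String at the boundary means the caller does not keep the whole original string alive just to hold on to a small slice of it.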

Character properties

You encountered the Character type earlier in this chapter. There are some rather interesting properties of this type that allow you to introspect the character in question and learn about its semantics.

let singleCharacter: Character = "x"
singleCharacter.isASCII
let space: Character = " "
space.isWhitespace
let hexDigit: Character = "d"
hexDigit.isHexDigit
let thaiNine: Character = "๙"
thaiNine.wholeNumberValue
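
To see what these properties report, you could print a few of them. The sketch below also uses hexDigitValue and isNumber, two more standard Character properties, and finishes with a small illustrative use: summing every digit in a string regardless of script.

```swift
let thaiNine: Character = "๙"
print(thaiNine.isASCII, thaiNine.isNumber)   // false true
print(thaiNine.wholeNumberValue as Any)      // Optional(9)

let hexDigit: Character = "d"
print(hexDigit.hexDigitValue as Any)         // Optional(13)
print(hexDigit.wholeNumberValue as Any)      // nil

// Sum every character that has a whole-number value, whatever its script.
let mixed = "a1b2๙"
let digitSum = mixed.compactMap { $0.wholeNumberValue }.reduce(0, +)
print(digitSum) // 12
```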

Encoding

So far, you’ve learned what strings are and explored how to work with them but haven’t touched on how strings are stored or encoded.

UTF-8

A much more common scheme is called UTF-8. This uses 8-bit code units instead. One reason for UTF-8’s popularity is that it is fully compatible with the venerable, English-only, 7-bit ASCII encoding. But how do you store code points that need more than eight bits? Herein lies the magic of the encoding.

let char = "\u{00bd}"
for i in char.utf8 {
  print(i)
}
194
189
let characters = "+\u{00bd}\u{21e8}\u{1f643}" // +½⇨🙃
for i in characters.utf8 {
  print("\(i) : \(String(i, radix: 2))")
}
43 : 101011

194 : 11000010
189 : 10111101

226 : 11100010
135 : 10000111
168 : 10101000

240 : 11110000
159 : 10011111
153 : 10011001
131 : 10000011
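
The grouping in the output above follows from the size of each code point: values below 0x80 fit in one byte, values below 0x800 need two, values below 0x10000 need three, and everything else needs four. As a rough sketch, you can check that rule against what Swift reports. expectedUTF8Length(of:) is a hypothetical helper written for this illustration, and the string is redeclared under a new name so the snippet stands alone:

```swift
/// Hypothetical helper: how many UTF-8 code units a code point needs,
/// based only on the numeric ranges of the encoding.
func expectedUTF8Length(of scalar: Unicode.Scalar) -> Int {
  switch scalar.value {
  case 0..<0x80: return 1       // ASCII range
  case 0x80..<0x800: return 2
  case 0x800..<0x10000: return 3
  default: return 4
  }
}

let sample = "+\u{00bd}\u{21e8}\u{1f643}"
for scalar in sample.unicodeScalars {
  // Compare the rule with the number of bytes Swift actually produces.
  let actual = String(Character(scalar)).utf8.count
  print(scalar.value, expectedUTF8Length(of: scalar), actual)
}
// 43 1 1
// 189 2 2
// 8680 3 3
// 128579 4 4
```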

UTF-16

There is another encoding that is useful to introduce, namely UTF-16. Yes, you guessed it. It uses 16-bit code units!

for i in characters.utf16 {
  print("\(i) : \(String(i, radix: 2))")
}
43 : 101011

189 : 10111101

8680 : 10000111101000

55357 : 1101100000111101
56899 : 1101111001000011
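
The last two code units in that output form a surrogate pair, which is how UTF-16 stores code points above 0xFFFF. The arithmetic can be sketched as follows; surrogatePair(for:) is a hypothetical helper for illustration, not a standard-library function:

```swift
/// Hypothetical helper: computes the UTF-16 surrogate pair for a code point
/// above 0xFFFF, or returns nil if the code point fits in one code unit.
func surrogatePair(for scalar: Unicode.Scalar) -> (high: UInt16, low: UInt16)? {
  guard scalar.value > 0xFFFF else { return nil }
  let offset = scalar.value - 0x10000
  let high = UInt16(0xD800 + (offset >> 10))   // top 10 bits
  let low = UInt16(0xDC00 + (offset & 0x3FF))  // bottom 10 bits
  return (high, low)
}

let upsideDownFace: Unicode.Scalar = "\u{1F643}"
if let pair = surrogatePair(for: upsideDownFace) {
  print(pair.high, pair.low) // 55357 56899
}

// The same values come straight out of the utf16 view.
print(Array("\u{1F643}".utf16)) // [55357, 56899]
```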

Converting indexes between encoding views

As you saw earlier, you use indexes to access grapheme clusters in a string. For example, using the same string from above, you can do the following:

let arrowIndex = characters.firstIndex(of: "\u{21e8}")!
characters[arrowIndex] // ⇨
if let unicodeScalarsIndex = arrowIndex.samePosition(in: characters.unicodeScalars) {
  characters.unicodeScalars[unicodeScalarsIndex] // 8680
}

if let utf8Index = arrowIndex.samePosition(in: characters.utf8) {
  characters.utf8[utf8Index] // 226  
}

if let utf16Index = arrowIndex.samePosition(in: characters.utf16) {
  characters.utf16[utf16Index] // 8680
}
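
Two more things are worth trying here, shown in the sketch below under illustrative names. First, the same index corresponds to a different offset in each view. Second, samePosition(in:) returns an optional because not every position converts: an index that points at the trailing half of a UTF-16 surrogate pair has no matching position among the Unicode scalars.

```swift
let text = "+\u{00bd}\u{21e8}\u{1f643}"
let arrow: Character = "\u{21e8}"
let arrowPosition = text.firstIndex(of: arrow)!

// The same position, measured as an offset in each view.
let characterOffset = text.distance(from: text.startIndex, to: arrowPosition)
let utf8Offset = text.utf8.distance(from: text.utf8.startIndex, to: arrowPosition)
let utf16Offset = text.utf16.distance(from: text.utf16.startIndex, to: arrowPosition)
print(characterOffset, utf8Offset, utf16Offset) // 2 3 2

// The last UTF-16 code unit is the trailing surrogate of 🙃.
let trailingSurrogate = text.utf16.index(before: text.utf16.endIndex)
let asScalarIndex = trailingSurrogate.samePosition(in: text.unicodeScalars)
print(asScalarIndex == nil) // true
```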

Challenges

Before moving on, here are some challenges to test your knowledge of strings. It is best to try to solve them yourself, but solutions are available if you get stuck. These came with the download or are available at the printed book’s source code link listed in the introduction.

Challenge 1: Character count

Write a function that takes a string and prints out the count of each character in the string.

Challenge 2: Word count

Write a function that tells you how many words there are in a string. Do it without splitting the string.

Challenge 3: Name formatter

Write a function that takes a string which looks like “Galloway, Matt” and returns one which looks like “Matt Galloway”, i.e., the string goes from "<LAST_NAME>, <FIRST_NAME>" to "<FIRST_NAME> <LAST_NAME>".

Challenge 4: Components

A method exists on a string named components(separatedBy:) that will split the string into chunks, which are delimited by the given string, and return an array containing the results.

Challenge 5: Word reverser

Write a function which takes a string and returns a version of it with each individual word reversed.

Key points

  • Strings are collections of Character types.
  • A Character is a grapheme cluster and is made up of one or more code points.
  • A combining character is a character that alters the previous character in some way.
  • You use special (non-integer) indexes to subscript into the string to a certain grapheme cluster.
  • Swift’s use of canonicalization ensures that the comparison of strings accounts for combining characters.
  • Slicing a string yields a substring with type Substring, which shares storage with its parent String.
  • You can convert from a Substring to a String by initializing a new String and passing the Substring.
  • Swift String has a view called unicodeScalars, which is itself a collection of the individual Unicode code points that make up the string.
  • There are multiple ways to encode a string. UTF-8 and UTF-16 are the most popular.
  • The individual parts of an encoding are called code units. UTF-8 uses 8-bit code units, and UTF-16 uses 16-bit code units.
  • Swift’s String has views called utf8 and utf16 that are collections that allow you to obtain the individual code units in the given encoding.


© 2020 Razeware LLC
