7A: Strings

Lesson 7 has three parts (A, B, and C), which can be completed in any order.

So far, we have been using strings (items of str type) only in simple ways. In this lesson we show how to manipulate strings: how to take them apart, combine them, and how to view the individual characters that make up a string.

What is a string?

All data stored on a computer is ultimately stored as a sequence of 0s and 1s. This includes text, digital books, images, songs, videos, and "executable files" like games and applications. Strings, an example of text data, are stored in the following way:

  • a string is a sequence of characters (e.g., the string "Hello, World!" contains 13 characters, including letters like "H" and "e", a space " ", and punctuation like "," and "!")
  • each character is actually represented by a number (e.g., "H" is represented by the number 72; this is its ASCII/Unicode value)

(Numbers are stored internally in a 0-1 binary format.)
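
As a small illustration of these two facts, the sketch below looks up the number behind "H" and then shows that number in binary. It relies on Python's built-in ord function (covered later in this lesson) and on bin; the variable names are chosen only for illustration.

```python
# Look up the number that represents the character "H".
code = ord("H")
print(code)    # 72

# Show the same number written with 0s and 1s.
# bin() returns a string that starts with the prefix "0b".
bits = bin(code)
print(bits)    # 0b1001000
```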

Manipulating strings as sequences of characters: S[]

In order to manipulate a string, we need to be able to access the individual characters that make up a string. In Python this is done in the following way: for a string S and an integer index, the notation

S[index]
returns the character of S at position index. By convention the string starts at index 0: so S[0] is the first character, S[1] is the second character, etc. In "Hello, World!" the list of characters is:

Index: 0  1  2  3  4  5  6  7  8  9 10 11 12
Char.: H  e  l  l  o  ,     W  o  r  l  d  !
Note that the character at index 6 is a space.

In many other programming languages, there is a separate type for characters. In Python, characters are the same as length-1 strings, so their type is str.

Finding the number of characters in a string: len

To get the number of characters in a string, we use the Python function len. For example, len("Hello, World!") is 13.

Multiple Choice Exercise: Last Character
What expression can be used to determine the last character in a string S?
Answer: S[len(S)-1]. Although len(S) gives you the total number of characters in the string, indexing starts at 0, so the last character is at index len(S)-1.

Here is an example of using len and [], the two tools we just introduced.

Example: String length and characters
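
The interactive example is not reproduced here. A minimal sketch of what such a program might look like, using the string "Hello, World!" from earlier in the lesson (the variable names are illustrative):

```python
S = "Hello, World!"

n = len(S)         # number of characters in S
first = S[0]       # indexing starts at 0
space = S[6]       # the character at index 6 is a space
last = S[n - 1]    # the last character is at index len(S)-1

print(n)             # 13
print(first)         # H
print(last)          # !
print(type(first))   # <class 'str'>, since a character is a length-1 string
```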

Cutting strings: S[:]

Cutting out some part of a string gives you a substring. For example, the strings "eat" and "ted" are substrings of "repeated". To extract a substring in Python, we use the syntax

S[firstIndex:tailIndex]
to get the substring starting at index firstIndex and ending at tailIndex-1. Try to figure out the output of the following code before you run it.
Example: Substrings
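
The interactive example is not reproduced here. A sketch along the same lines, using the string "repeated" from above (the variable names and the chosen indices are illustrative):

```python
S = "repeated"

a = S[3:6]    # characters at indices 3, 4, 5
b = S[5:8]    # characters at indices 5, 6, 7
c = S[0:2]    # characters at indices 0, 1

print(a)      # eat
print(b)      # ted
print(c)      # re
```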

Note that in taking substrings, firstIndex is included, while tailIndex is not included. This is a common source of errors. However, it has some nice effects. For example, because of this choice, the length of the substring S[i:j] is always j-i (as long as 0 ≤ i ≤ j ≤ len(S)). This convention is often depicted like a ruler, with the index marks falling between the characters.
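
One way to picture the ruler, sketched here for the string "repeated": each number marks the gap just before the character with that index, and S[i:j] is the piece between mark i and mark j. The layout is an illustration, not a reproduction of the original figure.

```python
S = "repeated"

# Print the characters with the index marks underneath.
# Mark i sits just before the character at index i.
print("  " + "   ".join(S))
print("   ".join(str(k) for k in range(len(S) + 1)))
#   r   e   p   e   a   t   e   d
# 0   1   2   3   4   5   6   7   8

# The piece between mark 3 and mark 6 has length 6 - 3.
i, j = 3, 6
piece = S[i:j]
print(piece, len(piece), j - i)   # eat 3 3
```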

Coding Exercise: String Shaving
Write a program which reads a string using input(), and outputs the same string but with the first and last character deleted. (You may assume the input string has length at least 2.) For example, on input Fairy a correct program will print air.

Pasting strings: +

We all know that 1+2=3. With strings, instead we get the following result:

Example: String Addition
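
The interactive example is not reproduced here. A sketch of the kind of program it might contain (the strings are chosen only for illustration):

```python
S = "rail"
T = "road"
glued = S + T

print(1 + 2)        # 3   (adding numbers)
print("1" + "2")    # 12  (gluing strings)
print(glued)        # railroad
```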

As you can see, the effect of S+T is to create a new string that starts with S and has T immediately afterwards. This string-gluing operation is also called concatenation.

Coding Exercise: Heads and Tails
Write a program which reads a string using input(), and outputs the same string but with the first and last character exchanged. (You may assume the input string has length at least 2.) For example, on input Fairy a correct program will print yairF. Hint: use your solution to the previous program as part of the answer.

If you want to concatenate numbers, you need to convert them to str first. Otherwise you will get one of two errors, depending on the order you tried. Run this program to see the errors that can occur.
Example
Rearrange the lines to see 2 concatenation errors
  • print("high " + 5)
  • print(110 + " percent")
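
Each of the two lines above raises a TypeError. The sketch below catches both so that the program keeps running and both messages can be seen; the try/except wrapper and the errors list are additions for illustration and are not part of the original example.

```python
errors = []

try:
    print("high " + 5)         # str + int
except TypeError as e:
    errors.append(e)
    print("TypeError:", e)

try:
    print(110 + " percent")    # int + str
except TypeError as e:
    errors.append(e)
    print("TypeError:", e)
```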
Here is a correct example: the str() function converts the number to a string before concatenation.
Example
Converting a number to a string with str()
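
The interactive example is not reproduced here. A sketch of what a corrected version might look like (the variable names are illustrative):

```python
n = 110

# str() turns the number into a string, so + can glue the pieces together.
message = str(n) + " percent"

print("high " + str(5))   # high 5
print(message)            # 110 percent
```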

As we mentioned in Lesson 4, you can multiply strings and integers: S * n is short for S + S + ... + S.

Example
String multiplication
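
The interactive example is not reproduced here. A short sketch of string multiplication (the strings are chosen only for illustration):

```python
S = "ab"
tripled = S * 3

print(tripled)       # ababab
print(S + S + S)     # ababab
print("-" * 10)      # ----------
```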

Character codes: ord, chr

As we mentioned in the introduction of this lesson, your computer actually represents every character as a number. Which number corresponds to which character? Generally, it can depend on which encoding your computer uses, but nearly all modern computers have a standard set of characters for the numbers between 32 and 255. Here is a list of the characters with numbers between 32 and 127:

ord: 32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47
chr:      !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
ord: 48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63
chr:  0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
ord: 64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79
chr:  @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
ord: 80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95
chr:  P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
ord: 96  97  98  99  100 101 102 103 104 105 106 107 108 109 110 111
chr:  `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
ord: 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
chr:  p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~
Later, in Lesson 8, you will write a program to generate this table.
There is no need to memorize the entire table, but a few facts are worth remembering:

  • the lowercase characters a, b, c, ..., z have consecutive character codes
  • the uppercase characters A, B, C, ..., Z have consecutive character codes
  • the digit characters 0, 1, 2, ..., 9 have consecutive character codes

Character 32 is a space, while character 127 is one of several special "control" characters. Some useful control characters are 9, which is tab, and 10 and 13 which are used for newlines.

In Python, you can convert a character into its corresponding numerical code using the ord function. The chr function does the reverse: it takes a number as input, and returns the character with that code.

Example
Examples of ord and chr
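
The interactive example is not reproduced here. A sketch of how ord and chr might be used, including the consecutive-codes fact for digits and two of the control characters mentioned above (the variable names are illustrative):

```python
code = ord("H")
print(code)              # 72

letter = chr(72)
print(letter)            # H

# ord and chr undo each other.
print(chr(ord("q")))     # q

# Digits have consecutive codes, so subtracting ord("0")
# gives the numeric value of a digit character.
value = ord("7") - ord("0")
print(value)             # 7

# Control characters: tab is 9 and newline is 10.
print(ord("\t"), ord("\n"))   # 9 10
```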

Coding Exercise: Next Letter
Write a program that takes a character as input (a string of length 1), which you should assume is an upper-case character; the output should be the next character in the alphabet. If the input is 'Z', your output should be 'A'. (You will need to use an if statement.)

Some systems only support printable characters between 32 and 127; others have printable characters up to 255 or 65535; in Unicode there are more than a hundred thousand characters.

Here are two more exercises to finish the lesson.

Coding Exercise: Pig Latin
Pig Latin is a nonsense language. To transform a word from English to Pig Latin, you move the first letter to the end and add "ay" after that. For example, monkey becomes onkeymay in Pig Latin, and word becomes ordway. Write a program that takes a single word as input and translates it to Pig Latin. (In reality, Pig Latin has rules that are more complex than this, but we ignore them for the purposes of this exercise.)

Coding Exercise: The Name Game
The Name Game lets you make a song out of any person's name.
Your program should take a person's name as input, for example "pearl", and print out the song as follows:

pearl, pearl, bo-bearl
banana-fana fo-fearl
fee-fi-mo-mearl
pearl!
Note that the entire name appears three times; in addition the name appears three more times with the first letter replaced by b, f, or m. (In reality, the song has rules that are more complex than this, but we ignore them for the purposes of this exercise.)
