In this lesson, we give more detail about how Python stores lists. Copying and comparing lists can have unexpected consequences. We will explain how Python works internally, so that you can understand and avoid these mistakes. The general issue is that several different variables can "point to" or "refer to" the same list. Towards the end of the lesson we describe the "is" operator, which tells whether two variables really point to exactly the same list.
Let's say we want a code fragment to convert a list of lengths in inches called oldSize to a list of the same lengths in centimeters called newSize. The most natural way to do this is to use the line newSize = oldSize to make a copy, and then go through and change all the values:

    oldSize = ["letter", 8.5, 11]   # size of paper in inches
    newSize = oldSize               # make a copy
    newSize[1] = newSize[1]*2.54    # convert to cm
    newSize[2] = newSize[2]*2.54    # convert to cm

Now let's see if this actually works. If you print newSize then it gives ["letter", 21.59, 27.94] as expected. But there is a big surprise if you also print oldSize: the values of oldSize have changed!
We will now give a detailed explanation of exactly what happened, using diagrams. The main problem is that newSize = oldSize didn't really copy the whole list: it just copied a reference (arrow) to the same list.
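The shared reference can be observed directly. The following is a small sketch that reruns the example and prints both variables; the final comparison is our own addition for illustration.

```python
oldSize = ["letter", 8.5, 11]   # size of paper in inches
newSize = oldSize               # copies the reference, not the list
newSize[1] = newSize[1] * 2.54  # convert to cm
newSize[2] = newSize[2] * 2.54  # convert to cm

print(newSize)                  # the converted values
print(oldSize)                  # the same converted values: oldSize changed too
print(oldSize == newSize)       # → True
```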
After executing the lines

    city = "Moose Factory"
    population = 2458
    employer = city

this table (with the black border) shows what Python's memory looks like. The first variable's name is city and its value is the string "Moose Factory". The second variable's name is population and its value is the integer 2458. The third variable's name is employer and its value is the string "Moose Factory".
Now consider the line

    myList = ["Moose Factory", 2458]

This will just create one variable, named myList. A list is created, and the value of myList is set to "point" or "refer" to that list. We represent the list using a box, and the values of the list's entries are shown inside the box next to their corresponding indices. The list is shown in blue.
The arrow shows that myList refers to this new list. The element at index 0 of the list is the string "Moose Factory", and the element at index 1 is the integer 2458. For example, if you print(myList[1]) then Python will output 2458.
Next, consider the lines

    myList = ["Moose Factory", 2458]
    myList[1] = myList[1]+1

Python calculates myList[1]+1, which is 2459, and this replaces the value at index 1 of the list. After the update, we have the diagram shown below.
(The 2458 isn't part of Python's memory; it is only shown to emphasize the change.)
The first line of our example is

    oldSize = ["letter", 8.5, 11]

and so Python's memory looks like the diagram below after this line is executed. We created a list of length 3.
The second line is

    newSize = oldSize

and here we actually find the main issue: in the second line, = does not duplicate the list! Instead, it just copies a new reference to the same list. This is illustrated by two different arrows pointing to the exact same box. This is quite different from copying numbers or strings (like in the first slide).
As the diagram shows, we have two variables, which both refer to the same list.
The third line is

    newSize[1] = newSize[1]*2.54

Python looks at newSize, looks up the value at its index 1 (8.5), multiplies it by 2.54, and then replaces the value, as shown. However, since oldSize referred to the same list, a side effect is that we also affected oldSize[1].
(Again, 8.5 isn't part of Python's memory, and is shown to emphasize the change.)
The last line is

    newSize[2] = newSize[2]*2.54

which affects the other value in the list. After this line executes, Python's memory looks like the diagram shown below.
Now when we print either newSize or oldSize, Python outputs ["letter", 21.59, 27.94].
Another way to look at this example is by running it through the Visualization tool. If you do that, instead of arrows, notice that lists are named by their "ID." Each list basically lives at some "address" equal to its ID; an address value is like an arrow pointing at the box at that ID address.
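Outside the Visualization tool, Python's built-in id() function reports an object's ID. A minimal sketch, reusing the names from the example above; the third list, separate, is a hypothetical name added only for contrast.

```python
oldSize = ["letter", 8.5, 11]
newSize = oldSize                    # second reference to the same list
separate = ["letter", 8.5, 11]       # a different list with equal contents

print(id(oldSize) == id(newSize))    # → True: one list, two names
print(id(oldSize) == id(separate))   # → False: two distinct lists
```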
How to Correctly Copy a List in Python
Although it is sometimes useful to have multiple references to the same list, here it is not what we wanted.
We'll give three solutions. They are all pretty much equivalent, but each one will teach you a new fact about Python, so all three are worth reading.
Method 1, using [:]
The line newList = oldList[:] creates a real copy of the old list. Although this syntax looks strange, it is a relative of something we already saw. In the string lesson, we introduced a way of extracting a substring: string[first:tail] returns the substring starting with index first and ending with index tail-1. We mentioned it can also be applied to create sublists of lists. In addition,
- if you omit first, then it takes the default value of 0
- if you omit tail, then it takes the default value of len (the length of the list/string).
So what really happens is that oldList[:] creates a new sublist containing all of the same data as the original list, so it is a new copy.
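As a quick check of Method 1, here is a sketch that redoes the paper-size conversion with a slice copy. It reuses oldSize and newSize from the earlier example; the last line is our own illustration of the two default values.

```python
oldSize = ["letter", 8.5, 11]
newSize = oldSize[:]            # slice with both defaults: a new list

newSize[1] = newSize[1] * 2.54  # only the copy is changed
newSize[2] = newSize[2] * 2.54

print(oldSize)                  # → ['letter', 8.5, 11]
print(oldSize[:] == oldSize[0:len(oldSize)])  # → True
```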
Method 2, using copy.copy()
There is a module called copy, which contains several functions related to copying. The simplest one is copy.copy(L): when you call this on a list L, the function returns a real copy of L. The copy.copy() function also works for other kinds of objects, but we won't discuss them here.
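A minimal sketch of Method 2; the list contents and the name M are arbitrary choices for illustration.

```python
import copy

L = [3, 1, 4]
M = copy.copy(L)   # a new list with the same elements

M[0] = 99          # changing the copy...
print(L)           # → [3, 1, 4]  (...leaves the original alone)
print(M)           # → [99, 1, 4]
```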
Method 3, using list()
The last way of making a real copy is using the list() function. Ordinarily, list() is used to convert data from other types to the list type. (For example, list("hello") converts the string "hello" to a 5-element list, each element being one character.) But if you try to convert something that is already a list into a list, it just makes a copy.
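Both uses of list() can be sketched as follows; the names letters and duplicate are hypothetical, chosen for illustration.

```python
letters = list("hello")     # convert a string into a list of characters
print(letters)              # → ['h', 'e', 'l', 'l', 'o']

duplicate = list(letters)   # "converting" a list to a list makes a copy
duplicate[0] = "j"

print(letters)              # → ['h', 'e', 'l', 'l', 'o']
print(duplicate)            # → ['j', 'e', 'l', 'l', 'o']
```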
Lists as Arguments
Notice that because of the way that lists work, any function which accepts a list as an argument can actually change the contents of the list. (You have already seen an example of this.) For example, consider:

    def func(list):
        list[0] = list[0] + list[1]
        list[1] = list[0] + list[1]
    data = [3, 4]
    func(data)
    print(data[0]*data[1])

Inside func, we change the element at index 0 to 3+4=7, and the element at index 1 to 7+4=11. So we print 7*11, which is 77.
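If the caller wants its list left unchanged, one option is to pass a copy instead of the list itself. A minimal sketch, assuming a hypothetical function double_all and list prices that are not part of the lesson:

```python
def double_all(values):
    """Double every element of the given list in place."""
    for i in range(len(values)):
        values[i] = values[i] * 2

prices = [3, 4]
double_all(prices[:])   # the function changes a copy; prices is untouched
print(prices)           # → [3, 4]

double_all(prices)      # the function changes the caller's list
print(prices)           # → [6, 8]
```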
Comparing lists using is
When are two list variables L1 and L2 equal? There are two different ways that we can interpret this question:
- Same Identity: Are L1 and L2 pointing/referring to the exact same list object?
- Same Values: Are the contents of list L1 equal to the contents of list L2?
It turns out that in Python, the standard equality operator == has the Same Values meaning: two different list objects with equal contents compare as equal.
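The Same Values behaviour of == can be checked with a short sketch; the list contents and the name L3 are arbitrary choices for illustration.

```python
L1 = [1, 2, 3]
L2 = [1, 2, 3]    # a different list that happens to hold the same values
L3 = [1, 2, 4]

print(L1 == L2)   # → True
print(L1 == L3)   # → False
```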
To test for Same Identity, Python has the is operator. We use this operator the same way as ==: the syntax
«list1» is «list2»
returns True if the two variables refer to the same list, and False if they refer to different lists (even if they have the same contents).
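A minimal sketch contrasting is with ==; the list contents are arbitrary and chosen only for illustration.

```python
L1 = [1, 2, 3]
L2 = L1           # same list object
L3 = [1, 2, 3]    # equal contents, but a separate list

print(L1 is L2)   # → True
print(L1 is L3)   # → False
print(L1 == L3)   # → True

L2[0] = 7         # a change made through L2...
print(L1)         # → [7, 2, 3]  (...is visible through L1)
print(L3)         # → [1, 2, 3]
```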
How many times does True show up in the output of this program? (Draw a diagram to help keep track.)

    list1 = [9, 1, 1]
    list3 = list(list1)
    list2 = list1
    list4 = list3[:]
    list1[0] = 4
    list5 = list1
    print(list1 is list2, list2 is list3, list3 is list4, list4 is list5)
    print(list1 == list2, list2 == list3, list3 == list4, list4 == list5)

The output is

    True False False False
    True False True False

so True shows up 3 times.
| You should not use |
We have already given most of the important information, but there is one other common situation worth mentioning. A nested list is a list within another list, for example

    sample = [365.25, ["first", 5]]

shows a nested list. The outer list, which is what sample refers to, has two elements; the element at index 0 is a float and the element at index 1 is the inner list. The inner list is ["first", 5]. (You can have more levels of nesting too.) Once you start using nested lists, keep in mind:
- Applying the three methods above to sample will copy the outer list, but not the inner list. So copy.copy(sample)[1] is sample[1], meaning the copy still has a reference to part of the original list. This probably was not what you wanted. If you want to make a real copy at all levels, use copy.deepcopy().
- Testing nested lists with == is pretty intuitive: Python recursively calls == on each list element. For example, [[1, 2], 3]==[[1, 2], 3] is True, while [[1, 2], 3]==[1, 2, 3] is False since the first elements are different ([1, 2] != 1).
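The two points above can be sketched together, using the sample list from the text; the names shallow and deep are hypothetical and chosen for illustration.

```python
import copy

sample = [365.25, ["first", 5]]
shallow = copy.copy(sample)      # new outer list, shared inner list
deep = copy.deepcopy(sample)     # new outer list and new inner list

print(shallow is sample)         # → False
print(shallow[1] is sample[1])   # → True
print(deep[1] is sample[1])      # → False

sample[1][1] = 6                 # change the inner list through sample
print(shallow)                   # → [365.25, ['first', 6]]
print(deep)                      # → [365.25, ['first', 5]]
print(deep == sample)            # → False
```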
You have now completed this lesson! The following extra material is optional.
We mentioned above that when you call a function on a list, it can alter the list. Sometimes you want to make this impossible! A solution in Python is to create tuples, which are the same as lists except they can never be changed. We say lists are "mutable" and tuples are "immutable". (Strings and numbers are also immutable.) This can be a useful way to prevent any programming errors from altering lists that should stay unchanged. Tuples are almost identical to lists except that they use round parentheses () instead of square brackets []. You can convert between tuples and lists using tuple() and list().
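A minimal sketch of tuple immutability and of converting between the two types; the names point, as_list and back are hypothetical.

```python
point = (3, 4)              # a tuple: round parentheses
print(point[0])             # → 3  (indexing works as with lists)

try:
    point[0] = 7            # tuples cannot be changed
except TypeError:
    print("tuples are immutable")

as_list = list(point)       # tuple to list
as_list[0] = 7              # lists can be changed
back = tuple(as_list)       # list to tuple
print(back)                 # → (7, 4)
```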
∞ and Beyond: Self-Containment
It is possible to have a list that contains itself! Simply make a list, and then redirect one of its elements to point back to the whole list:
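A minimal sketch of such a list; the starting contents [1, 2, 3] are an arbitrary choice for illustration.

```python
L = [1, 2, 3]
L[0] = L            # index 0 now refers back to the whole list

print(L)            # → [[...], 2, 3]
print(L[0] is L)    # → True
```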
Notice that Python's output engine is smart enough to recognize that the list loops back on itself: it prints "[...]" instead of printing all of L again, to avoid an infinite loop.