18: Efficiency

Many programming tasks can be done in more than one way, but one way might be much faster than another. Designing fast programs is part of the art and science of computer programming. We look at a couple of examples in this exercise.

Part 1: Don't Recompute The Same Thing Twice

The Fibonacci numbers are a fascinating and simple sequence of numbers. You start with two numbers, 1 and 1. Then the rule is, to get the next number, add the previous two. Therefore, the next number is 1+1=2. This gives the first three terms,

1, 1, 2

and the fourth term is 1+2=3, then we have 2+3=5, and so forth:

1, 1, 2, 3, 5, 8, 13, ...

The Fibonacci sequence was originally invented to talk about rabbit populations, and it has fantastic connections to the architecture of plants. Here is part of an awesome video series about the Fibonacci numbers:

The definition of the Fibonacci numbers lends itself naturally to a recursive function. The next exercise defines a function Fibonacci(n) to give the nth item in the list above (starting from n=1).

Scramble Exercise: Nibofacci
Unscramble this program so that it gives a recursive definition of the Fibonacci numbers. The grader will print out the first ten of them.
  • return Fibonacci(n-1) + Fibonacci(n-2)
  • return 1
  • if (n==1 or n==2):
  • else:
  • def Fibonacci(n):
Enter testing statements like print(myfunction("test argument")) below.

However, this recursive function becomes too slow to compute entries further down the sequence. Press Enter test statements and type print(Fibonacci(80)). When you test it, you get "Time Limit Exceeded."

Why is it going so slowly? The function has no complicated instructions or loops, just addition. So the slowness turns out to be related to the number of total calls to the function. If we call Fibonacci(3), the recursive function is called three times in total: the initial call, then two recursive calls. If we call Fibonacci(4), the recursive function is called five times: the initial call, the three times just mentioned for n=3, and one more recursive call with n=2. Computing Fibonacci(5) makes a total of nine calls, and Fibonacci(6) makes a total of 9+5+1=15 calls. The number of calls gets very large very quickly as n grows!

As a rough estimate, Fibonacci(n+2) requires at least twice as many total recursive calls as Fibonacci(n), since Fibonacci(n+2) calls Fibonacci(n) once directly, and another time indirectly through the recursive call to Fibonacci(n+1). So the computing time is proportional to an exponential function at least as large as (√2)n. This is too slow! E.g., Fibonacci(80) requires more than 240 = 1099511627776 recursive calls.

This argument even outlines the exact conceptual problem: calling Fibonacci(n) twice and re-computing the answer from scratch the second time is wasteful. We should come up with some approach where we don't waste time re-computing the same thing over and over again.

The solution

Let's try doing something in Python more similar to the introduction. We started by writing down 1, 1. Then we kept extending the sequence by adding the last two elements together. Here is what the code looks like; again, it's up to you to unscramble it.

Scramble Exercise: Fast Fibonacci
Unscramble the fast version to compute larger Fibonacci values.
  • for i in range(3, n+1):
  • return sequence[n]
  • sequence = [0, 1, 1] # Fibonacci(0) is 0, Fibonacci(1) and Fibonacci(2) are 1
  • sequence.append(sequence[i-1] + sequence[i-2])
  • def Fibonacci(n):
Enter testing statements like print(myfunction("test argument")) below.

There is still a little room for improvement, since we don't need to re-compute the whole array in each new call, but this is good enough since it runs quickly even on large values of n!

Part 2: Don't Compute Unnecessary Things, Even Once

Our second example is about checking whether numbers are prime, which is important in cryptography and computer security. A number is prime if it has exactly two factors: 1 and itself. The first few prime numbers are 2, 3, 5, 7, 11, 13, 17, 19, 23. (For example, 21 is not prime because it has factors 3 and 7 as well as 1 and 21.)

How can we test whether a number is prime in Python? We saw earlier how to test divisibility:

N % D == 0  # will be True if D is a divisor of N, False otherwise

So by testing all possible divisors, we arrive at the following program.

Example
Testing whether a few numbers are prime
Enter testing statements like print(myfunction("test argument")) below.

It works! But it is also too slow on large numbers. Pick Enter input and type isItPrime(324635459). It runs out of time. Try some more values... for primes greater than about 10000000 the code always runs out of time, because the grader can only execute the divisibility-checking loop about 10 million times per second. If we want to check larger numbers, we will need a more efficient idea. Let's make the code work even for numbers as big as a trillion (1000000000000)!

Do we really need to check all of the numbers between 2 and N-1, to check if N is prime? Hint

The idea, and an argument

If you read the hint and experimented, you might have noticed that when N is not prime, the program usually found a pretty small factor compared to N. For example, isItPrime(34827948723948723984729834) runs pretty quickly even though its input is gigantic, finding the divisor D=2.

Maybe we don't actually need to check all possible factors. Is there some small limit to the number of factors we need to check, before we can be sure that N is prime? Thankfully, yes! In fact we can argue that if N is not prime, one of its divisors is at most sqrt(N). Why? Well, if N is not prime, then it has a divisor A. Being a divisor means that there is some other number B such that

A*B == N

Let's continue the argument. If A <= sqrt(N) or B <= sqrt(N), then we are happy: we found a factor of N that is small, like we wanted. But actually these are the only possibilities: otherwise we get the impossible contradiction

N = A*B > sqrt(N)*sqrt(N) = N

Great! So now, let's implement this new idea in Python. The easiest way to change the old approach is to add a test within the for loop: once D > sqrt(N) (or equivalently, D*D > N), we can just break out of the loop and stop testing.

Example
Faster primality checking

The program now works on gigantic primes!

Final Exercise

In this exercise, we combine the primes from the second half of the lesson with the list-based approach of the first half. Your code should fill out a table of length 1000001 so that isPrime[N] equals True if N is prime, and False if N is composite, for all N up to one million. (isPrime[0] and isPrime[1] should be False.)

Click here for a hint. It's a big one!

Coding Exercise: Primed for Takeoff
Write a program that defines the table isPrime we described above (notice that isPrime is a list, not a function). Hint
The grader will allow a longer-than-usual amount of time, 7 seconds, for your program to execute. However, simply using the isItPrime function will not be fast enough.
Enter testing statements like print(myfunction("test argument")) below.

This is the last exercise in the CS Circles website. Congratulations to those who have tried all of the lessons! See the resources page for some suggestions on what to learn next.  Have fun, good luck, and code well!