
Python: fixing random numbers for testing


How do you test Python code that uses the random module?

You might know about mocking methods of the random module so that they return fixed values. This is useful because you can state exactly what the (fake) random values should be, but it is only practical when the code generates a very small number of random values.
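A minimal sketch of the mocking approach, using `unittest.mock.patch` from the standard library. The function `pick_winner` is a hypothetical example and is not part of the article's code:

```python
import random
from unittest import mock


def pick_winner(names):
    # Hypothetical function under test: returns one randomly chosen name.
    return names[random.randrange(len(names))]


def test_pick_winner():
    names = ['OLIVIA', 'RUBY', 'EMILY']
    # While the with-block is active, random.randrange always returns 1,
    # so the "random" choice is known in advance.
    with mock.patch('random.randrange', return_value=1):
        assert pick_winner(names) == 'RUBY'
```

This is fine for one or two values. For a long sequence you would have to spell out every fake value (for example with `side_effect=[...]`), which quickly becomes impractical.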

If you need a potentially much bigger set of random numbers, you need something else. You might have read about reproducing the same sequence of random numbers by setting the seed. That's what we are going to use: we will fix the seed for testing.

To show how this works, I've created a script that picks N random names from a file in which each line is a name.

examples/python/pickaname.py

import random
import sys

def select_names(filename, count):
    with open(filename) as fh:
        names = fh.read().splitlines()

    selected_names = []
    for _ in range(count):
        pick = names[ random.randrange(len(names)) ]
        if pick not in selected_names:
            selected_names.append(pick)
    return selected_names


if __name__ == '__main__':
    if len(sys.argv) != 3:
        sys.exit('Usage: {} FILENAME COUNT'.format(sys.argv[0]))
    names = select_names(sys.argv[1], int(sys.argv[2]))
    for n in names:
        print(n)

If you look carefully, you will notice that there is a bug in the code. If you have not found it yet, don't worry: finding cases where our brilliant code fails to meet our expectations is part of the point of testing.

The script takes a filename and a number on the command line and prints that many names, without repetition.

To demonstrate it, I've downloaded a list of 20 names (the 10 most popular boy and girl names in some country in some year).

examples/data/names.txt

OLIVIA
RUBY
EMILY
GRACE
JESSICA
CHLOE
SOPHIE
LILY
AMELIA
EVIE
JACK
OLIVER
THOMAS
HARRY
JOSHUA
ALFIE
CHARLIE
DANIEL
JAMES
WILLIAM

So what happens if we run the code?

$ python examples/python/pickaname.py examples/data/names.txt 3
CHLOE
HARRY
OLIVER

$ python examples/python/pickaname.py examples/data/names.txt 3
RUBY
GRACE
JACK

$ python examples/python/pickaname.py examples/data/names.txt 3
CHLOE
DANIEL

The first two times it worked well and returned 3 random names. The third time it returned only 2 names.

If you had not found the bug earlier, this will probably make it a bit easier.

In any case, the problem is that when the code picks a name that has already been selected, it skips it to avoid repetition but does not try again, so it can return fewer names than requested.
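With the mocking approach mentioned at the beginning, the bug can be triggered on purpose. A sketch, assuming a hypothetical `select_names_from_list` that has the same loop as `select_names` but takes a list instead of a filename:

```python
import random
from unittest import mock


def select_names_from_list(names, count):
    # Hypothetical copy of select_names() from pickaname.py (bug included),
    # taking a list of names instead of a filename.
    selected_names = []
    for _ in range(count):
        pick = names[random.randrange(len(names))]
        if pick not in selected_names:
            selected_names.append(pick)
    return selected_names


names = ['OLIVIA', 'RUBY', 'EMILY', 'GRACE']
# Successive calls to random.randrange return 0, 0, 1: the second pick
# repeats the first one, so it is skipped and never replaced.
with mock.patch('random.randrange', side_effect=[0, 0, 1]):
    selected = select_names_from_list(names, 3)

print(selected)  # ['OLIVIA', 'RUBY'] (only 2 names instead of 3)
```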

Let's write a test for this code:

examples/python/test_pickaname_1.py

import pickaname

def test_pickaname():
    names = pickaname.select_names('../data/names.txt', 3)
    assert len(names) == 3

We run the test a few times and it passes...

$ pytest test_pickaname_1.py
============================= test session starts ==============================
platform linux -- Python 3.6.7, pytest-4.0.2, py-1.7.0, pluggy-0.8.0
rootdir: /home/gabor/work/code-maven.com/examples/python, inifile:
collected 1 item

test_pickaname_1.py .                                                    [100%]

=========================== 1 passed in 0.00 seconds ===========================

... but at one point it fails:

$ pytest test_pickaname_1.py
============================= test session starts ==============================
platform linux -- Python 3.6.7, pytest-4.0.2, py-1.7.0, pluggy-0.8.0
rootdir: /home/gabor/work/code-maven.com/examples/python, inifile:
collected 1 item

test_pickaname_1.py F                                                    [100%]

=================================== FAILURES ===================================
________________________________ test_pickaname ________________________________

    def test_pickaname():
        names = pickaname.select_names('../data/names.txt', 3)
>       assert len(names) == 3
E       AssertionError: assert 2 == 3
E        +  where 2 = len(['AMELIA', 'CHLOE'])

test_pickaname_1.py:5: AssertionError
=========================== 1 failed in 0.02 seconds ===========================

So we can't use this test, because it fails randomly. We cannot even use it to consistently demonstrate that there is an error.

That's the nature of random.
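A quick calculation shows how often the first test should fail. With 20 names and 3 independent picks, all three are distinct with probability 20·19·18 / 20³ = 0.855, so roughly one run in seven (14.5%) returns fewer than 3 names. The sketch below computes this exactly and checks it with a simulation. It relies on the observation that the buggy function returns fewer names exactly when the random indexes contain a duplicate, and it uses a separate `random.Random(0)` generator only to make the simulation itself repeatable:

```python
import random
from fractions import Fraction

NAMES = 20   # number of lines in names.txt
COUNT = 3    # number of names requested

# Exact probability that all COUNT picks are distinct: 20/20 * 19/20 * 18/20
p_distinct = Fraction(1)
for i in range(COUNT):
    p_distinct *= Fraction(NAMES - i, NAMES)
p_short = 1 - p_distinct

print(p_distinct, float(p_distinct))  # 171/200 0.855
print(p_short, float(p_short))        # 29/200 0.145

# Simulation: count how often COUNT random indexes contain a duplicate,
# i.e. how often the buggy function would return fewer than COUNT names.
rng = random.Random(0)
trials = 100_000
short = sum(
    len({rng.randrange(NAMES) for _ in range(COUNT)}) < COUNT
    for _ in range(trials)
)
print(short / trials)  # close to 0.145
```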

So what can we do?

We know that the numbers the random module generates are actually pseudo-random, and that by fixing the seed we get exactly the same random values every time we run the code.
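A small demonstration of that property, using a hypothetical helper `draw` that produces indexes as if picking from a list of 20 names. Re-seeding the global generator reproduces the sequence, and a separate `random.Random` instance with the same seed yields the same values without touching the global state:

```python
import random


def draw(n):
    # Hypothetical helper: n pseudo-random indexes into a list of 20 names.
    return [random.randrange(20) for _ in range(n)]


random.seed(42)
first = draw(5)

random.seed(42)  # re-seeding puts the global generator back into the same state
second = draw(5)

# A separate random.Random instance seeded with the same value produces
# the same sequence without affecting the global generator.
rng = random.Random(42)
from_instance = [rng.randrange(20) for _ in range(5)]

print(first == second)         # True
print(first == from_instance)  # True
```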

So let's try this.

examples/python/test_pickaname_2.py

import pickaname
import random

def test_pickaname():
    random.seed(42)
    names = pickaname.select_names('../data/names.txt', 3)
    assert len(names) == 3
    print(names)

42 is just an arbitrary number I picked.

If we run this test, it will pass. But what happens if we run it several times? How many times do we have to run it to conclude that it will always pass?

We can print the names picked by the code and observe that they are always the same.

$ pytest test_pickaname_2.py  -sq
['GRACE', 'OLIVIA', 'AMELIA']
.
1 passed in 0.00 seconds

Here -s tells pytest to print the output of the test to the console. I also used -q to tell pytest to be as quiet as possible, so the results are easier to see.

Run the same code several times and observe that not only does the number of names remain 3, we also get exactly the same names every time.
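Instead of re-running pytest by hand, the repeatability could also be checked inside a single test. This is a sketch under assumptions, not the article's code: `fixed_seed` is a hypothetical helper that saves the global generator's state with `random.getstate()`, seeds it, and restores the old state afterwards so the seed does not leak into other tests; `select_names_from_list` is a hypothetical list-based copy of `select_names`:

```python
import contextlib
import random


@contextlib.contextmanager
def fixed_seed(seed):
    # Hypothetical helper: seed the global generator, then restore
    # its previous state on exit so other code using random is unaffected.
    saved_state = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        random.setstate(saved_state)


def select_names_from_list(names, count):
    # Hypothetical copy of select_names() from pickaname.py, taking a list.
    selected_names = []
    for _ in range(count):
        pick = names[random.randrange(len(names))]
        if pick not in selected_names:
            selected_names.append(pick)
    return selected_names


NAMES = ['OLIVIA', 'RUBY', 'EMILY', 'GRACE', 'JESSICA', 'CHLOE', 'SOPHIE',
         'LILY', 'AMELIA', 'EVIE', 'JACK', 'OLIVER', 'THOMAS', 'HARRY',
         'JOSHUA', 'ALFIE', 'CHARLIE', 'DANIEL', 'JAMES', 'WILLIAM']


def test_same_names_every_time():
    results = []
    for _ in range(100):
        with fixed_seed(42):
            results.append(select_names_from_list(NAMES, 3))
    # Every run with the same seed picks exactly the same names.
    assert all(r == results[0] for r in results)
```

In a larger pytest suite the same save/seed/restore idea could be wrapped in a fixture.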

OK, so we have a test that reliably checks one random case. How can we reproduce the case where the function returns only 2 names?

For that we'll have to pick another seed.

I had to dig a bit. Actually, I wrote a loop that tried the integers starting from 0 as seeds and found that using 2 as the seed reliably makes the function return only two elements. (What a coincidence.) I could use that number, but I am sure some future reader of my code might think the function returned 2 elements because the seed was 2, so I looked for another number and found that 11 causes the same issue, albeit returning two different names.
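The article doesn't show that loop. Here is a minimal sketch of what such a search could look like, assuming a hypothetical list-based copy of `select_names` and a hypothetical `find_failing_seeds` helper. Because the function reads from the global `random` generator, the loop re-seeds it before every attempt:

```python
import random

NAMES = ['OLIVIA', 'RUBY', 'EMILY', 'GRACE', 'JESSICA', 'CHLOE', 'SOPHIE',
         'LILY', 'AMELIA', 'EVIE', 'JACK', 'OLIVER', 'THOMAS', 'HARRY',
         'JOSHUA', 'ALFIE', 'CHARLIE', 'DANIEL', 'JAMES', 'WILLIAM']


def select_names_from_list(names, count):
    # Hypothetical copy of select_names() from pickaname.py (bug included).
    selected_names = []
    for _ in range(count):
        pick = names[random.randrange(len(names))]
        if pick not in selected_names:
            selected_names.append(pick)
    return selected_names


def find_failing_seeds(names, count, limit=100):
    # Hypothetical helper: try seeds 0, 1, 2, ... and keep those for which
    # the function returns fewer names than requested.
    failing = []
    for seed in range(limit):
        random.seed(seed)
        result = select_names_from_list(names, count)
        if len(result) < count:
            failing.append((seed, result))
    return failing


for seed, result in find_failing_seeds(NAMES, 3):
    print(seed, result)
```

The seeds it reports on another Python version may differ from the author's 2 and 11, since methods such as `randrange()` are not guaranteed to produce the same values across versions.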

So here is a test case that will reliably fail.

examples/python/test_pickaname_3.py

import pickaname
import random

def test_pickaname():
    random.seed(11)
    names = pickaname.select_names('../data/names.txt', 3)
    print(names)  # printed before the assert, so it also runs when the assert fails
    assert len(names) == 3

That's it: this test will keep failing until someone fixes the function.
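One possible fix, offered as a sketch rather than the author's solution: keep drawing until enough distinct names have been collected, or let `random.sample()` pick `count` distinct names. Both functions below are hypothetical list-based variants of `select_names`:

```python
import random


def select_names_retry(names, count):
    # Hypothetical fix: keep drawing until `count` distinct names are selected.
    if count > len(set(names)):
        # Without this check the loop below would never finish.
        raise ValueError('count is larger than the number of distinct names')
    selected_names = []
    while len(selected_names) < count:
        pick = names[random.randrange(len(names))]
        if pick not in selected_names:
            selected_names.append(pick)
    return selected_names


def select_names_sample(names, count):
    # Hypothetical alternative: random.sample() returns `count` elements
    # from distinct positions (ValueError if count > len(names)).
    return random.sample(names, count)
```

With either version the seed-11 test passes, because the function always returns 3 names. Note that changing how random numbers are consumed also changes which names a given seed produces, so any test that expects specific names would need updating.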

Going further

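We could go further and test that the returned names are exactly the same on every run. This might be a good idea if we had a random-based algorithm whose actual output matters. Keep in mind that Python only guarantees that random() produces the same sequence for the same seed across versions; methods such as randrange() may return different values after a Python upgrade.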

Conclusion

It can be really useful to fix the seed when testing some code that uses random numbers.


Written by
Gabor Szabo

Published on 2019-01-25

If you have any comments or questions, feel free to post them on the source of this page in GitHub.
