7. Random numbers#

In this notebook, you will implement random numbers in Python and learn how to make computations with them. First we characterize the statistical properties of random numbers by calculating the mean and standard deviation. Second, we perform numerical integration with random numbers.

Random numbers can be a good use in many parts of physics. You can simulate physical processes with them or even use them to calculate integrals. Despite the fact that random numbers are not deterministic, i.e. they vary between realizations, you can make numerical simulations with them just as with normal numbers.

Learning objectives: After finishing this lecture, you should be able to:

  1. Generate random numbers and know how to analyze them

  2. Implement Monte Carlo integration

# Initialisation code for the notebook
import numpy as np
import matplotlib.pyplot as plt

plt.rcParams['figure.dpi'] = 100

7.1. Introduction to Random numbers#

Let us look at random numbers generated with Python. Let’s say we want an array 5 random numbers drawn from a uniform distribution between 0 and 1. In this case you can use the following code

x=np.random.rand(5)
print(x)

You can see that the numbers fall in the interval between 0 and 1 and they seem fairly random. Note that if you execute the cell multiple times you get different seemingly random numbers. However they are not random!

7.2. How random are random numbers?#

True random numbers, like the ones you use in statistics, are challenging to obtain. A computer can only run deterministic algorithms, hence any number generated by a computer is not truly random. A computer generates pseudorandom numbers that for the majority of problems are sufficiently random.

A common method that the computer uses to generate random numbers is through the use of the equation

\[ x_{n+1}=(a x_n + b) \, \text{mod }m \]

where \(a\), \(b\), and \(m\) are integer parameters and \(x_n\) is an integer random number. The sequence can be generated from an input \(x_0\) and then be iterated. Starting with an initial number \(x_0\), \(a\), \(b\), and \(m\) the equation makes new numbers that are seemingly random (for the proper choice of \(a\), \(b\) and \(m\)) but follow the same sequence. The number \(x_0\) is called the seed, and when calling the random number generator with a fixed seed the random numbers are always the same. When you call the random function in Python the algorithm picks a seed from a random sequence stored in the computer. Therefore you get a different random number every time you call rand().

To set the seed for the computer you can use the function seed() from the numpy random library with in between the () a fixed number variable. The number is of no importance, only that is an integer. Seeding can be handy for debugging and to compare your code to the code of somebody else. So in the exercises in this note we set the seed to 0 to get answers that can be quantitatively checked. Try it out for yourself.

np.random.seed(0)
x=np.random.rand(5)
print(x)

Note that when you repeatedly call a random function in a single cell then every call starts of where the previous one left off. If you want the same sequence you would have to re-seed the random number generator. See the example below:

np.random.seed(0)
x=np.random.rand(5)
print(x)
y=np.random.rand(5)
print(y)

7.3. Random numbers with distributions#

Here we will generate random numbers for a few of the most common distributions. Numpy provides a variety of random numbers with different distributions.

7.3.1. The uniform distribution#

Random numbers generated with the uniform distribution have an equal probability to fall within an interval defined by the bounds \([a, b]\). The probability density function for the uniform distribution is

\[\begin{split} p(x)=\left\{ \begin{array}{ll}\frac{1}{b-a} & \, \, \text{for} \, a \leq x \leq b\\0 & \, \, \text{else} \end{array}\right. \, . \end{split}\]

The uniform distribution has a mean

\[ \mu = \frac{1}{2}(a+b) \]

and a variance

\[ \sigma^2 = \frac{1}{12}(b-a)^2 \]

Exercise 1

Create a uniformly distributed random variable between bounds 5 and 8 and draw \(N\)=100 samples from it, as an array.

  • Determine the mean of the generated samples and store this to variable meanx

  • Determine the variance and store this to variable varx (in this case the biased variance with \(N\) in the denominator)

  • Check that the mean and variance numbers approximately match the theoretical values as shown above, increase \(N\) to see convergence to the theoretical values.

Note that you can generate the uniformly distributed random numbers in two ways: from the uniform() function and from the standard rand() function (with uniform distribution between 0 and 1). The latter has to be shifted and scaled to match the interval \(a, b\). Try them both! Also note that you can use numpy functions for computing the mean and variance

# Use seed to fix random number generation
np.random.seed(0)

# Define parameters
a=5
b=8
N=100

# meanx = ...
# varx = ...

answer_9_01_1 = np.copy(meanx)
answer_9_01_2 = np.copy(varx)
question = "answer_9_01"
num = 2

to_check = [question + "_%d" % (n+1) for n in range(num)]
feedback = ""
passed = True
for var in to_check:
    res, msg = check_answer(eval(var), var)
    passed = passed and res
    print(msg); feedback += msg + "n"

assert passed == True, feedback

Solution:

Hide code cell source

# Use seed to fix random number generation
np.random.seed(0)

# Define parameters
a=5
b=8
N=100

### BEGIN SOLUTION

# Create random variable
#x=a+(b-a)*np.random.rand(N)
x=np.random.uniform(a, b, N)

# Determine statistical properties
meanx=np.mean(x)
meanxth=0.5*(a+b)
print('The measured mean is ' + str(meanx) + ', which is close to the theoretical value of ' + str(meanxth))
varx=np.var(x, ddof=0)
varxth=(1/12)*(b-a)**2
print('The measured variance is ' + str(varx) + ', which is close to the theoretical value of ' + str(varxth))
### END SOLUTION

answer_9_01_1 = np.copy(meanx)
answer_9_01_2 = np.copy(varx)

7.3.2. Gaussian random numbers#

The Gaussian distribution, or normal distribution, is a distribution that occurs a lot in physics. This is due to the fact that many physical observables are the result of a sum of a large number of random steps. In the limit of infinitely many of these steps, the distribution of this sum converges to the Gaussian distribution. The Gaussian probability density function is

\[ p(x)=\frac{1}{\sqrt{2 \pi \sigma^2}} \text{e}^{-(x-\mu)^2/2\sigma^2} \]

with \(\mu\) the mean and \(\sigma^2\) the variance.

Exercise 2

Create a Gaussian distributed random variable with mean \(\mu=10\) and standard deviation of \(\sigma=7\) and take \(N=100\) samples from it, as an array.

  • Determine the mean of the generated samples and store this to variable meanxn

  • Determine the variance and store this to variable varxn (in this case the biased variance with \(N\) in the denominator)

  • Check that the mean and variance numbers match the theoretical values , increase \(N\) to see convergence to the theoretical values.

Note that you can generate the Gaussian distributed random numbers with mean \(\mu\) and variance \(\sigma^2\) in two ways: from the normal() function and from the standard normal distributionrandn() function (with zero mean and variance 1). The latter has to be shifted and scaled to match the arbitrary Gaussian. Try them both!

# Use seed to fix random number generation
np.random.seed(0)

# Define parameters
mu=10
sigma=7
N=100

# meanxn = ...
# varxn = ...

answer_9_02_1 = np.copy(meanxn)
answer_9_02_2 = np.copy(varxn)
question = "answer_9_02"
num = 2

to_check = [question + "_%d" % (n+1) for n in range(num)]
feedback = ""
passed = True
for var in to_check:
    res, msg = check_answer(eval(var), var)
    passed = passed and res
    print(msg); feedback += msg + "n"

assert passed == True, feedback

Solution:

Hide code cell source

# Use seed to fix random number generation
np.random.seed(0)

# Define parameters
mu=10
sigma=7
N=100

### BEGIN SOLUTION

# Create random variable
x=mu+sigma*np.random.randn(N)
#x=np.random.normal(mu,sigma,N)

# Determine statistical properties
meanxn=np.mean(x)
meanxth=mu
print('The measured mean is ' + str(meanxn) + ', which is close to the theoretical value of ' + str(meanxth))
varxn=np.var(x, ddof=0)
varxth=sigma**2
print('The measured variance is ' + str(varxn) + ', which is close to the theoretical value of ' + str(varxth))

### END SOLUTION

answer_9_02_1 = np.copy(meanxn)
answer_9_02_2 = np.copy(varxn)

A simple example of the use of random numbers is in the calculation of \(\pi\). The method is quite simple. You generate a lot of points in a known area, in this case a square, that encloses the object for which you want to know the area, in this case the unit circle. Next, you count the number of points inside the object and divide that by the total number of points you generated. The ratio is an estimate of the fraction of area that the object covers. Finally, you multiply the ratio by the area of the known area around the object and you have an estimate for the to be integrated area. Since the area of a circle is \(\pi r^2\), the estimated area of our object (the unit circle) is \(\pi\).

Exercise 3

Generate \(N=100\) uniformly distributed random numbers between -1 and 1 for the \(x\)-coordinates. Do the same for the \(y\)-coordinates.

  • Plot the random numbers in the \(x\)-\(y\) plane, add a circle according to the given code

  • Make a for loop and iterate over all \(N\) points

  • Check for every point whether it is within the unit circle (\(x^2+y^2<1\)), if so add 1 to a variable that counts the number of points in the unit circle

  • Calculate \(\pi\) by dividing the number of points inside the circle by \(N\) and multiply by the area of the square surrounding the unit circle, store your value of \(\pi\) in the variable pi_est.

# Use seed to fix random number generation
np.random.seed(0)

# Plot circle
angle = np.linspace(0, 2*np.pi, 100)
plt.plot(np.cos(angle), np.sin(angle), '--r')

npoints=100

# pi_est = ...

plt.plot(x,y,'ob')
plt.xlabel('X-coordinate')
plt.ylabel('Y-coordinate')
plt.axis('square')
plt.xlim([-1.0,1.0])
plt.ylim([-1.0, 1.0])

answer_9_03_1 = np.copy(pi_est)
question = "answer_9_03"
num = 1

to_check = [question + "_%d" % (n+1) for n in range(num)]
feedback = ""
passed = True
for var in to_check:
    res, msg = check_answer(eval(var), var)
    passed = passed and res
    print(msg); feedback += msg + "n"

assert passed == True, feedback

Solution:

Hide code cell source

# Use seed to fix random number generation
np.random.seed(0)

# Plot circle
angle = np.linspace(0, 2*np.pi, 100)
plt.plot(np.cos(angle), np.sin(angle), '--r')

npoints=100
### BEGIN SOLUTION
x=-1+2*np.random.rand(npoints)
y=-1+2*np.random.rand(npoints)

cnt=0
for j in range(npoints):
    if x[j]**2 + y[j]**2 < 1:
        cnt +=1

# the vectorized version using Boolean indexing        
# inpoints=x*x+y*y<1  # determine indices for point inside the circle
# cnt=float(sum(inpoints))

pi_est=4*cnt/npoints

print('The value of pi is '+"{:.10f}".format(pi_est))
         
### END SOLUTION

plt.plot(x,y,'ob')
plt.xlabel('X-coordinate')
plt.ylabel('Y-coordinate')
plt.axis('square')
plt.xlim([-1.0,1.0])
plt.ylim([-1.0, 1.0])

answer_9_03_1 = np.copy(pi_est)

Self-check:

  • Why is our estimated value of \(\pi\) not exact?

  • Can you formulate an error bar for your estimate of \(\pi\)?

7.4. Monte Carlo integration#

Monte Carlo integration not only can be used to estimate surface areas or volumes, it can be used to integrate any function in any multi-dimensional space. Note that in principle the dimension of the integrated volume can be greater than or equal to 1.

Assume that we want to integrate over a (hyper-) volume \(S\). The way this works is through the mean value method by

\[ I=\frac{V}{N}\sum_{i=1}^{N} f({\vec{r}_i}) \chi({\vec{r}_i}) \, . \]

Now, have a good look at this formula.

  • \(V\) represents a (hyper-) volume that encloses the to be integrated volume \(S\), \(V\) is usually a rectangle or (hyper-) cube.

  • \(N\) is the number of random points in \(V\)

  • \(f\) the function to be integrated. If you just want to calculate a surface area or (hyper-) volume \(f\)=1

  • The function \(\chi\) is an index function that is 1 if \(\vec{r}_i\) is inside \(S\) and 0 if \(\vec{r}_i\) is outside of \(S\).

So if \(f\)=1, the (hyper-) volume is just the sum of the number of points in \(S\) divided by the total number of points in \(V\). For example, for the case of calculating \(\pi\), the ‘volume’ \(V\) is the square area surrounding the circle \(S\). The value of \(\pi\) is then calculated by summing up all numbers in the circle and dividing them by the area of the square. If we want to integrate the function \(f\) over the volume, we calculate \(f(\vec{r}_i)\) for all random points. Multiplication with \(\chi\) makes sure that we do not integrate the function \(f\) outside of \(S\).

So in general, we have to calculate \(f(\vec{r}_i)\) and \(\chi({\vec{r}_i})\) as fast as possible. Next we multiply them and sum them, and thats it. A quick way to do this is is by using array and Boolean indexing, a nice feature of many programming languages. Try it first for yourself.

7.4.1. Boolean or “mask” index arrays#

Integer-type arrays can be used for advanced indexing to access the contents of an array. Consider the simple example below

y = np.array([ 0, 2, 4, 8, 16, 32])
index = np.array([2, 3, 4])
print(y[index])

Now if we would like to check for an array whether a condition is met for all elements, it would be nice to make an array with the indices where this statement is true. You can do this by using Boolean variables.

Booleans are variables that have two values true or false. You can use Python to check whether a statement is true or false and Python then generates a boolean variable. For example, for a variable a

a=2>3
print(a)

results in a being of type bool and having a value of False. Boolean indexing works in the following way: if x is an array and y is a boolean-value array of the same shape as x. Then x[y] returns an array that is formed from x and contains the values from x where y is True. Try it out for yourself.

Generate an array x of 100 random numbers between 0 and 10 and count using Boolean indexing how many numbers are larger than 5.

Method 1:

  • Make a Boolean value array y for the condition x>5

  • Print the variable y, the values are Boolean, i.e., True or False

  • Extract from array x the elements that match the condition with x[y]

  • Calculate the length of the extracted array

# Use seed to fix random number generation
np.random.seed(0)

N=100
x=10*np.random.rand(N)

Solution:

Hide code cell source

# Use seed to fix random number generation
np.random.seed(0)

N=100
x=10*np.random.rand(N)

### BEGIN SOLUTION

# Method 1
y=(x>5)
z=x[y]
print(len(z))

### END SOLUTION

Just as you can extract a series of numbers from an array with advanced indexing Python recognizes that if you use a Boolean array you to convert it to an array with values equal to the locations that are True.

Method 2

  • Make a Boolean variable y for the condition x>5

  • Take the sum of y

Your code here:

Solution:

Hide code cell source

### BEGIN SOLUTION

# Method 2
y=(x>5)
z=sum(y)
print(z)

### END SOLUTION

Obviously, method 2 is much faster to implement. But wait a minute, a sum over an array of Booleans creates an integer number, something weird must be going on here?!

In fact, Python is sufficiently clever to understand that if you take a sum that True counts as 1 and False as zero. Another way of seeing is this is that 1*y, with y a Boolean array, generates a sequence of ones and zeros that can be summed. So when performing mathematical operations with Booleans these may be converted to numeric values.

7.4.2. Multi-dimensional integration#

Using the method of Boolean indexing, the speed of calculating Monte-Carlo integrals through the mean value method can be significantly improved. When the mean value method is used to calculate areas or volumes \(f(\vec{r}_i)=1\). In the most general case \(f(\vec{r}_i)\) is the function that is integrated over an area or volume.

Exercise 4

Change your code for calculating \(\pi\) (exercise 3) using Boolean indexing, i.e. replace the for loop with two lines of code: one for Boolean indexing the points within the circle, the second for calculating the sum.

The beauty of Monte-Carlo integration is that extension to higher dimensionality integrals is straightforward. Just generate the random numbers in the multi-dimensional space enclosing the to be integrated volume \(S\), apply Boolean indexing, and calculate the mean value.

Exercise 5

In this exercise we are going to calculate the mass of a 3D sphere with linearly increasing density. This problem is quite challenging to do analytically, but with Monte Carlo integration it is relatively easy.

Let’s first start with the simple problem of determining the mass of a sphere with uniform density \(\rho_0=1.5\) kg/m\(^3\). The sphere has a radius of 1 meter, and we initially integrate it with \(N=100\) points.

  • Generate an array of \(N\) random numbers for the \(x\), \(y\), and \(z\) coordinates. Make sure that all points lie in a cube enclosing the sphere

  • Generate an index function. This is an array that is equal to 1 for points inside the sphere and 0 outside the sphere. Use the Boolean indexing technique you learned above to do this without using a for loop.

  • Calculate the mass by applying the mean value method, store your mass as variable msphere.

  • Do you get a mass of 2\(\pi\) kg? And if you increase \(N\)?

# Use seed to fix random number generation
np.random.seed(0)


# use the following code structure to calculate the x,y,z random points
# x=...np.random.rand(...)...
# y=...np.random.rand(...)...
# z=...np.random.rand(...)...

# msphere = ...

answer_9_05_1 = np.copy(msphere)
question = "answer_9_05"
num = 1

to_check = [question + "_%d" % (n+1) for n in range(num)]
feedback = ""
passed = True
for var in to_check:
    res, msg = check_answer(eval(var), var)
    passed = passed and res
    print(msg); feedback += msg + "n"

assert passed == True, feedback

Solution:

Hide code cell source

# Use seed to fix random number generation
np.random.seed(0)

### BEGIN SOLUTION

# Definition of parameters
R=1        # radius of the sphere in m
rho0=1.5   # density of the sphere in kg/m^3
V=(2*R)**3 # reference volume of the volume enclosed by the sphere
N=100

# Generation of random points inside the volume
x=2*R*np.random.rand(N)-R # define random numbers in enclosing box
y=2*R*np.random.rand(N)-R
z=2*R*np.random.rand(N)-R

# Generation of index function
inpoints=x*x+y*y+z*z<R**2  # determine indices for point inside

# Calculation of totalmass
msphere=(V/N)*sum(rho0*inpoints)

print('The mass of the sphere is '+ str(msphere) + ' kg')

# for reference the mass is
# mass=rho0*(4/3)*np.pi*R**3

### END SOLUTION

# use the following code structure to calculate the x,y,z random points
# x=...np.random.rand(...)...
# y=...np.random.rand(...)...
# z=...np.random.rand(...)...

answer_9_05_1 = np.copy(msphere)

Exercise 6

Now we make it a bit more difficult by adjusting the code. Consider a density that is a linear function of the radius according to \(\rho(r)=\rho_0 r\).

Calculate the mass of the sphere for the case of the linear increasing density using the methodology developed in exercise 5 and the same parameters. Store your calculated mass for \(N=100\) as variable mspherelin. With some effort we can calculate the analytical answer as 1.5\(\pi\), however, as you can see the Monte Carlo integration is a lot easier to implement. Increase \(N\) to see whether your answer gets close to the analytical answer.

# Use seed to fix random number generation
np.random.seed(0)

# mspherelin = ...

answer_9_06_1 = np.copy(mspherelin)
question = "answer_9_06"
num = 1

to_check = [question + "_%d" % (n+1) for n in range(num)]
feedback = ""
passed = True
for var in to_check:
    res, msg = check_answer(eval(var), var)
    passed = passed and res
    print(msg); feedback += msg + "n"

assert passed == True, feedback

Solution:

Hide code cell source

# Use seed to fix random number generation
np.random.seed(0)

### BEGIN SOLUTION

# Defnition of parameters
R=1        # radius of the sphere in m
rho0=1.5    # density of the sphere in kg/m^3
V=(2*R)**3 # reference volume of the volume enclosed by the sphere
N=100


# Generation of random points inside the volume
x=2*R*np.random.rand(N)-R # define random numbers in enclosing box
y=2*R*np.random.rand(N)-R
z=2*R*np.random.rand(N)-R

# Generation of index function
inpoints=x*x+y*y+z*z<R**2  # determine indices for point inside

# Calculation of totalmass
mspherelin=(V/N)*sum(rho0*np.sqrt(x*x + y*y +z*z)*inpoints)

print('The mass of the sphere is '+ str(mspherelin) + ' kg')

# The analytical answer
# print(1.5*np.pi)

### END SOLUTION

answer_9_06_1 = np.copy(mspherelin)

7.4.3. Monte-Carlo integration accuracy#

The Monte Carlo integration error scales as \(N^{-1/2}\) no matter what the dimensionality of the problem is. If we would have performed the integration with the simple sum rule the error would be \(\mathcal{O}(h)\propto \mathcal{O}(1/N)\) , which on first look may indicate that this is more accurate than Monte Carlo integration. However, this scaling of the accuracy is only true for a single dimension.

For a cubic volume we have only \(N^{1/3}\) points along every direction. Hence, the simple sum integration error in three dimensions scales as \(N^{-1/3}\), which is worse than with Monte Carlo integration. Especially for problems with high dimensionality (D) the use of Monte Carlo integration is more advantageous than conventional numerical integration as, for simple sum integration, the accuracy scales as \(N^{-1/D}\), whereas Monte-Carlo integration always scales as \(N^{-1/2}\).

These integrations occur regularly in statistical mechanics.

Hide code cell source

## Pre-loading the solutions

import sys
await micropip.install("numpy")
from validate_answers import *

with open(location):
    pass # Initially this notebook does not recognise the file unless someone tries to read it first