# Chapter 5. Random Number Generation

## 5.1. Introduction

This section covers the VSIPL random number generation functionality. Two pseudo random number generators are defined. The first generates uniform random numbers over the open interval zero to one. The second approximates a Gaussian random number with zero mean and unit variance.

### 5.1.1. Random Numbers

The create and destroy random number generator state are used for both scalar, and “by element” random number generation. The “by element” random number generators are required to produce the equivalent result to applying the scalar generators to each element in the order that proceeds from the minimum stride to the maximum stride dimension.

[For a matrix Z, where the stride between elements of a row is less than the stride between elements of a column, where x j denotes the jth output of the generator:

z 0,0 x i , z 0,1 x i + 1 , ... z 1,0 x i + N , ... , z M-1,N-1 x i + M * N - 1 ]

### 5.1.2. VSIPL Random Number Generator Functions

VSIPL specifies a portable random number generator. The details of the generator follow.

The code below implements a combined 32 bit random number generator comprising variants (RAN0 and RAN1) of two popular 32 bit random number generators.

RAN0 is loosely based (the constants chosen by D.E Knuth and H.W. Lewis are used) on a linear congruential random number generator popularized in Numerical Recipes [1], x i ( a x i - 1 + c ) mod m , where a = 1664525 , c = 1013904223 , and m = 2 32 .

RAN1 is based on the popular random number generator, y i ( a y i - 1 + c ) mod m , where a = 69069 , c = 1 , and m = 2 32 . In the VSIPL version of RAN1, the addend c is set equal to three instead of one.

The uniformly distributed 32 bit unsigned random integer created by the combined generator represents the difference between the uniformly distributed 32 bit unsigned integers created by RAN0 and RAN1. The combined random number generator maintains two separate seed sequences. One for RAN0, and the other for RAN1.

Both RAN0 and RAN1 have periods of 2 32 - 1 iterations. Starting with a given pair of seeds for the initial state, RAN0 and RAN1 complete periods and generate a state with the starting seed pair at the same iteration. To increase the period of the combined generator, the seed for RAN1 is incremented by one at the end of each full period. This ensures that the starting state (pair of seeds) will not be generated until RAN1 has passed through 2 32 periods. Thus, the period for the combined generator is 2 64 iterations.

To promote independence in a parallel processing environment, two actions are taken. The period of RAN0 is evenly divided among the parallel threads, and the addend constant used by RAN1 is uniquely assigned to each thread. These actions ensure that no two threads ever use exactly the same sequence of initial state seed pairs.

To divide the period of RAN0 evenly among threads we first note that we can generate the sequence x 0 , x 1 , x 2 , x 4 , x 8 , by applying the following recursion:

x 0 x 1 = ( x 0 a + c ) mod m x 2 = ( x 1 a + c ) mod m = ( ( x 0 a + c ) a + c ) mod m = ( x 0 a 2 + a c + c ) mod m = ( x 0 a a + ( a + 1 ) c ) mod m x 4 = ( x 0 a 4 + a 3 c + a 2 c + a c + c ) mod m = ( x 0 a 2 a 2 + ( a 3 + a 2 + a + 1 ) c ) mod m = ( x 0 a 2 a 2 + ( a 2 ( a + 1 ) + ( a + 1 ) ) c ) mod m = ( x 0 a 2 a 2 + ( a 2 + 1 ) ( a + 1 ) c ) mod m x 8 = ( x 0 a 4 a 4 + ( a 4 + 1 ) ( a 2 + 1 ) ( a + 1 ) c ) mod m x 2 j = ( A x 0 + C ) mod m = ( x 0 a 2 i + ( a 2 i - 1 + 1 ) ( a 2 i - 2 + 1 ) ( a + 1 ) c ) mod m

After i-1 iterations, each term of new generator (using A and C) skips over 2 i terms of the basic sequence generated by RAN0.

The term A may be calculated by repeated squaring of a, (i-1) times modulo m. The term C may be calculated in parallel with A by repeatedly adding one to the previous partial product of A and multiplying the previous partial product of C by the sum, (i-1) times. The algorithm is shown below.

IterationAC
0 a c
1 a 2 ( a + 1 ) c
2 a 4 ( a 2 + 1 ) ( a + 1 ) c
3 a 8 ( a 4 + 1 ) ( a 2 + 1 ) ( a + 1 ) c

The technique is used in the first “for” loop within vsip_randcreate to efficiently skip to the starting state assigned to a given thread for use by RAN0.

In a parallel environment, each thread is assigned an addend for RAN1 which corresponds to the prime number taken from the sequence of prime numbers greater than or equal to three. Thus the first thread uses three as the addend for RAN1, the second uses five as the addend for RAN1, the third uses seven as the addend for RAN1, etc. This technique assigns a different RAN1 random number generator to each thread.

By using the two techniques together, we define a very low overhead random number generator with sub-sequence statistical characteristics better than those of either RAN0 or RAN1. The new generator has a period of 2 64 and can be used safely in many parallel processing environments. In particular, statistically independent subsequences of lengths up to approximately 264/N, where N is the number of parallel sub-sequences required, are ensured.

References: [1] See pp. 275-276 of Numerical Recipes in FORTRAN, The Art of Scientific Computing, Second Edition, William H. Press, et al., Cambridge University Press, 1992.

### 5.1.3. Sample Implementation

The following is a sample C implementation of two uniform random number generators. The comments have been formatted for readability.

```#include <float.h>
#include <limits.h>
#include <stdlib.h>
#include <math.h>
#include "private/vsip_scalar_typedefs.h"
#include "vsip.h"
/* Typedefs for unsigned and signed exactly 32 bit integers uint32_t and int32_t */
#include "inttypes.h"
#define A0 1664525 /* Parameters for RAN0 */
#define C0 1013904223
#define A1 69069 /* Parameters for RAN1 */
#define C1 3
#define RAN(state, A, C ) A*( state )+C
#define RAN0( ptr ) RAN( ptr->seed0,ptr->a0,ptr->c0 )
#define RAN1( ptr ) RAN( ptr->seed1,ptr->a1,ptr->c1 )```

The type of vsip_randstate in “vsip.h” should be an incomplete type definition. The actual type should be opaque and defined in a place like `private/pvsip_scalar_typedefs.h`

```typedef struct vsip_rand_object
{
double double_scale;
float float_scale;
uint32_t seed0;
uint32_t seed1;
uint32_t seed2;
int32_t numseqs;
int32_t id;
uint32_t a0;
uint32_t c0;
uint32_t a1;
uint32_t c1;
} vsip_randstate;```
```vsip_randstate *
vsip_randcreate(vsip_index seed,    /* Initial user seed */
vsip_index numseqs, /* Number of sub-sequences */
vsip_index id,      /* Subsequence id (0 < id 2 numseqs) */
vsip_rng portable)  /* Portable or non-portable sequence */
{
/* portable is ignored */
uint32_t A=A0,C=C0; /* Initialize RAN */
int32_t i;
vsip_randstate* state;
state = (vsip_randstate* )malloc(sizeof(vsip_randstate));
state->double_scale = (double)pow((double)2.0,(double)(-32));
state->float_scale = (float)pow((double)2.0,(double)(-24));
state->seed0 = seed;
state->seed1 = 1;
state->seed2 = 1;
state->numseqs = numseqs;
state->id = id;
state->a0 = A0;
state->c0 = C0;
state->a1 = A1;
state->c1 = C1;```

Find the skip size by dividing 2 31 - 1 ( 2 32 cannot be represented in 32 bits), and multiplying the quotient by the relative thread id.

`  skip = ((UINT_MAX/numseqs)*(id-1));`

Given a starting seed, the code below generates the starting seed for the id'th sub-sequence of a set of numseqs sub-sequences of RAN0.

With each loop iteration, a new random number generator is created – on the fly – from the base random number generator RAN0. The new generator issues a sub-sequence of the original sequence generated by RAN0, such that given the same starting seed, each term of the subsequence corresponds to every ( 2 i )'th term of the original RAN0 sequence, where i corresponds to the loop iteration.

To understand how this is used below, first note that the 32 bit unsigned integer skip may be viewed as a polynomial

skip P 31 2 31 + P 30 2 30 + + P 1 2 + P 0

where

P(i) ∈ {0,1}, for i = 0, 1, 2, …, 31

Thus, each value P(i) is the setting (1 or 0) of the ith bit of skip.

In the loop below, the current generator (starting with RAN0) is applied to the current seed S if and only if the ith (i is the loop iterator) bit of skip is a one, i.e. P(i) = 1. At each loop iteration, a new generator is created that skips the next 2 i + 1 (i is the loop iterator) terms of the original RAN0 sequence. After 32 iterations, the thirty two bits of skip are exhausted and the current value of seed S is returned as the starting seed for the idth thread.

As an example of how the scheme works, assume that we are creating the seed for the fifth of fifteen threads. The value of skip is

skip = ( ( 2 32 - 1 ) / 15 ) ( 5 - 1 ) = ( 4294967295 / 15 ) 4 = 286331153 4 = 1145324612

skip = 44444444 = 2 30 + 2 26 + 2 22 + 2 18 + 2 14 + 2 10 + 2 6 + 2 2

Thus, to create the correct seed we need to skip, in succession, 2 2 terms, 2 6 terms, 2 10 terms, 2 14 terms, 2 18 terms, 2 22 terms, and 2 26 terms of the base RAN0 sequence. This is accomplished in the loop below by applying the current random number generator only on iterations 2, 6, 10, 14, 18, 22, 26, and 30. Although used on only eight of thirty two iterations, a new generator is created on each iteration.

```  mask = 1;
for( i=0; i<32; i++ )
{ /* Update seed if bit is set */
{
state->seed0 = A*(state->seed0) + C;
}
C = (A+1)*C; /* Generate new offset constant */
A = A*A; /* Generate new multiplier constant */
}```

Set C1 to the idth prime, starting at C1 = 3. (The following is a very simple prime number generator. A pre-computed table of primes could be substituted.)

```  for( i=1; i<id; i++ )
{
int32_t loop_max;
state->c1 += 2;
loop_max = (int32_t)sqrt( (double)state->c1 );
C = 3;
while( C <= loop_max )
{
while( ( (state->c1 % C) != 0 ) && ( C <= loop_max ) ) C += 2;
if( (state->c1 % C) == 0 )
{
C = 3;
state->c1 += 2;
loop_max = (int32_t)sqrt( (double)state->c1 );
}
}
}
return state;
}

void vsip_randdestroy(vsip_randstate* state) { free(state);}```

IEEE-754 Dependent Code

The following is implementation dependent. It is defined for IEEE-754 single and double precision floating point where float is single and double is double precision. (For non-IEEE 754 implementations the closest representable floating point number to the IEEE-754 number should be returned.)

Create the mask used to convert an unsigned integer to a single precision IEEE-754 float. The mask forces the sign and excess 127 exponent for a single precision IEEE-754 floating point number (1.0) into the upper nine bits of a 32 bit unsigned integer that has been shifted right nine bit positions. The low mantissa bit is set to one to force the number away from a true IEEE zero when it is normalized by subtracting 1.0.

`static uint32_t vsip_random_fmask = 0x3f800001;`

Function `vsip_randu_f`:

• Takes a vsip_randstate* argument (which was created by vsip_randcreate);

• Updates the state seeds using macro functions RAN0 and RAN1;

• Forms a temporary 32 bit unsigned seed from the difference between the two state seeds;

• Right shifts the temporary seed eight bit positions (the width of a single precision IEEE- 754 floating point number's exponent field);

• Bitwise OR's a one into the low bit; converts the integer contained in the interval 0 2 24 - 1 to a single precision IEEE-754 floating point number;

• Scales the floating point number by 2 - 24 to force the result into the open interval (0.0,1.0);

• And finally returns the floating point result as the function return value.

```float vsip_randu_f( vsip_randstate* state )
/* Returns a float uniform random number within the open interval (0, 1) */
{
float temp;
uint32_t itemp;
state->seed0 = RAN0( state );
state->seed1 = RAN1( state );
itemp = (state->seed0 - state->seed1);
if (state->seed1 == state->seed2)
{
state->seed1 += 1;
state->seed2 += 1;
}
itemp = (itemp>>8)|0x00000001;
temp = (float)(itemp)*state->float_scale;
return( temp );
}```

Function `vsip_randu_d`:

• Accepts a vsip_randstate* argument (which was created by vsip_randcreate);

• Updates the state seeds using macro functions RAN0 and RAN1;

• Forms a temporary 32 bit unsigned seed from the difference between the two state seeds;

• Bitwise OR's a one into the low bit; converts the integer contained in the interval 0 2 32 - 1 to a double precision IEEE-754 floating point number;

• Scales the floating point number by 2 - 32 to force the result into the open interval (0.0,1.0);

• And finally returns the floating point result as the function return value.

```double vsip_randu_d( vsip_randstate *state )
{
double temp;
uint32_t itemp;
state->seed0 = RAN0( state );
state->seed1 = RAN1( state );
itemp = (state->seed0 - state->seed1);
if (state->seed1 == state->seed2)
{
state->seed1 += 1;
state->seed2 += 1;
}
temp = ((double)(itemp) + 0.5)*state->double_scale;
return( temp );
}```

## 5.2. Random Number Functions

 vsip_randcreate Create Random State vsip_randdestroy Destroy Random State vsip_`d``s`randu_`p` Uniform Random Numbers vsip_`d``s`randn_`p` Gaussian Random Numbers

### 5.2.1. vsip_randcreate

Create a random number generator state object.

Functionality

Creates a state object for use by a VSIPL random number generation function. The random number generator is characterized by specifying the number of random number generators the application is expected to create, and the index of this generator. If the portable sequence is specified, than the number of random number generators specifies how many sub-sequences the primary sequence is partitioned into. If the non-portable sequence is specified, the characteristics of the random number generator are implementation dependent.

The function returns a random state object which holds the state information for the random number sequence generator, or null if the create fails.

Prototypes
```vsip_randstate *vsip_randcreate(vsip_index seed, vsip_index numprocs,
vsip_index id, vsip_rng portable);```
Arguments
seed

Seed to initialize generator.

numprocs

Number of processors (number of sub-sequences sequences into which the primary sequence is to be partitioned).

id

Processor ID (index to select a particular sub-sequence from the group of numprocs sub-sequences).

portable

Select between portable and non-portable random number sequences.

```typedef enum
{
VSIP_PRNG = 0, /* Portable random number generator */
VSIP_NPRNG = 1 /* Non-portable trandom number generator */
} vsip_rng;```
Return value

Returns a pointer to a random number state object of type vsip_randstate, or null if the create fails.

Restrictions

Errors

The arguments must conform to the following:

1. 0 < id numprocs 2 32 - 1

Notes/References

You must call vsip_randcreate for each random number sequence/stream the application needs. This might be one per processor, one per thread, etc. For the portable sequence to have the desired pseudo-random properties, each create must specify the same number of processors/sub-sequences.

Note to Implementors: All implementations of vsip_randcreate must support the portable sequence. The vendor defined non-portable sequence may be the same sequence as the defined portable sequence, or an implementation dependent uniform random number generator.

Examples

See `vsip_dsrandu_p` for example.

### 5.2.2. vsip_randdestroy

Destroy a random number generator state object.

Functionality

Destroys a random number state object created by a `vsip_randcreate`.

Prototypes
`int vsip_randdestroy(vsip_randstate *state);`
Arguments
state

Pointer to random number state object.

Return value

Returns zero on success and non-zero on failure.

Restrictions

Errors

The arguments must conform to the following:

1. The random number state object must be valid. An argument of null is not an error.

Notes/References

An argument of null is not an error.

Examples

See `vsip_dsrandu_p` for example.

### 5.2.3. vsip_`d``s`randu_`p`

Generate a uniformly distributed (pseudo-)random number. Floating point values are uniformly distributed over the open interval ( 0 , 1 ) . Integer deviates are uniformly distributed over the open interval ( 0 , 2 31 - 1 ) .

Functionality

Real

r Uniform 0 1 r n Uniform 0 1 for n = 0 , 1 , ... , N - 1 r n,m Uniform 0 1 for n = 0 , 1 , ... , N - 1 ; for m = 0 , 1 , ... , M - 1

Complex

r Uniform 0 1 + j Uniform 0 1 r n Uniform 0 1 + j Uniform 0 1 for n = 0 , 1 , ... , N - 1 r n,m Uniform 0 1 + j Uniform 0 1 for n = 0 , 1 , ... , N - 1 ; for m = 0 , 1 , ... , M - 1

Prototypes

Scalar:

```vsip_scalar_f vsip_randu_f(vsip_randstate *state);
vsip_cscalar_f vsip_crandu_f(vsip_randstate *state);```

By Element:

```void vsip_vrandu_f(vsip_randstate *state, const vsip_vview_f *r);
void vsip_cvrandu_f(vsip_randstate *state, const vsip_cvview_f *r);
void vsip_mrandu_f(vsip_randstate *state, const vsip_mview_f *r);
void vsip_cmrandu_f(vsip_randstate *state, const vsip_cmview_f *r);```
Arguments
state

Pointer to random number state object.

r

Output vector or matrix view object.

Return value

The arguments must conform to the following:

1. The random number state object must be valid.

2. The output view object must be valid.

Restrictions

Errors

The pointer to a random number state object must be valid.

Notes/References

The complex random number has real and imaginary components where each component is Uniform(0,1). The mean of the complex sequence is therefore 2 2 .

The “by element” random number generators are required to produce the equivalent result to applying the scalar generators to each element in the order that proceeds from the minimum stride to the maximum stride dimension. For example for a matrix Z, where the stride between elements of a row is less than the stride between elements of a column, where x j denotes the jth output of the generator:

z 0,0 x i , z 0,1 x i + 1 , ... z 1,0 x i + N , ... , z M-1,N-1 x i + M * N - 1

Examples

Generate 10 Uniform random numbers in the interval -π to π.

```#include <stdio.h>
#include <vsip.h>
main()
{
int i;
int seed =0, num_procs=1, id=1;
vsip_scalar_d x;
vsip_cscalar_d z;
vsip_rand_state *state;
vsip_init((void *)0);
state = vsip_randcreate(seed, num_procs, id, VSIP_PRNG);
printf("Uniform\n");
for(i=0; i<10; i++)
{
x = 2*M_PI*vsip_randu_d(state) - M_PI;
printf("%g\n",x);
}
printf("Complex Uniform\n");
for(i=0; i<10; i++)
{
vsip_rcmul_d(M_PI, vsip_crandu_d(state), &z);
printf("(%f, %f)\n",vsip_real_d(z),vsip_imag_d(z));
}
vsip_randdestroy(state);
vsip_finalize((void *)0);
return 0;
}```

### 5.2.4. vsip_`d``s`randn_`p`

Generate an approximately normally distributed (pseudo-)random deviate having mean zero and unit variance; N(0,1).

Functionality

Real

r 6 - k = 0 11 x k r n 6 - k = 0 11 x k + 12 n for n = 0 , 1 , ... , N - 1 r n,m 6 - k = 0 11 x k + 12 ( n + m N ) for n = 0 , 1 , ... , N - 1 ; for m = 0 , 1 , ... , M - 1 (row major) r n,m 6 - k = 0 11 x k + 12 ( m + n M ) for n = 0 , 1 , ... , N - 1 ; for m = 0 , 1 , ... , M - 1 (column major)

Complex

t 1 k = 0 2 x k , t 2 k = 0 2 x k r 3 - ( t 1 + t 2 ) + j ( t 1 - t 2 ) t 1 k = 0 2 x k + 6 n , t 2 k = 0 2 x k + 6 n + 3 for n = 0 , 1 , ... , N - 1 r n 3 - ( t 1 + t 2 ) + j ( t 1 - t 2 ) t 1 k = 0 2 x k + 6 ( n + m N ) , t 2 k = 0 2 x k + 6 ( n + m N ) + 3 for n = 0 , 1 , ... , N - 1 ; for m = 0 , 1 , ... , M - 1 r n,m 3 - ( t 1 + t 2 ) + j ( t 1 - t 2 ) (row major) t 1 k = 0 2 x k + 6 ( m + n M ) , t 2 k = 0 2 x k + 6 ( m + n M ) + 3 for n = 0 , 1 , ... , N - 1 ; for m = 0 , 1 , ... , M - 1 r n,m 3 - ( t 1 + t 2 ) + j ( t 1 - t 2 ) (column major)

Where:

x k is a uniformly distributed random number over the open interval (0,1). x k is generated in order using the same method as the uniform scalar function `vsip_randu_f`. For matrices the dimension with the smallest stride is filled first.

Prototypes

Scalar:

```vsip_scalar_`f` vsip_randn_`f`(vsip_randstate *state);
vsip_cscalar_`f` vsip_crandn_f(vsip_randstate *state);```

By Element:

```void vsip_vrandn_`f`(vsip_randstate *state, const vsip_vview_`f` *r);
void vsip_cvrandn_`f`(vsip_randstate *state, const vsip_cvview_`f` *r);
void vsip_mrandn_`f`(vsip_randstate *state, const vsip_mview_`f` *r);
void vsip_cmrandn_`f`(vsip_randstate *state, const vsip_cmview_`f` *r);```
Arguments
state

Pointer to random number state object.

r

Output vector or matrix view object.

Return value

Returns a Gaussian random number.

Restrictions

Errors

The pointer to a random number state object must be valid.

Notes/References

Both the real and complex Gaussian random number are N(0,1). The complex random number has real and imaginary components that are uncorrelated.

If a true Gaussian random deviate is needed, the Box-Muller algorithm should be used. See Knuth, Donald E., Seminumerical Algorithms, 2nd ed., vol. 2, pp. 117 of The Art of Computer Programming, Addison-Wesley, 1981.

Examples

Generate 10 Uniform Gaussian numbers with zero mean and a variance of π.

```#include <stdio.h>
#include <vsip.h>
main()
{
int i;
int seed =0, num_procs=1, id=1;
vsip_scalar_d x;
vsip_cscalar_d z;
vsip_rand_state *state;
vsip_init((void *)0);
state = vsip_randcreate(seed, num_procs, id, VSIP_PRNG);
printf("Normal\n");
for(i=0; i<10; i++)
{
x = M_PI*vsip_randn_d(state);
printf("%g\n",x);
}
printf("Complex Normal\n");
for(i=0; i<10; i++)
{
vsip_rcmul_d(M_PI, vsip_crandn_d(state), &z);
printf("(%f, %f)\n",vsip_real_d(z),vsip_imag_d(z));
}
vsip_randdestroy(state);
vsip_finalize((void *)0);
return 0;
}```