# Generating multiple low-quality random numbers… fast.

## Background

a while ago i was doing some profiling to find out ways to improve the load performance of my project. the load process is roughly as follows:

• Load wave file from disk into memory.
• Apply a linear phase filter to all loaded samples (this is to compensate for the response of the resampler which occurs during playback).
• Normalise and quantise the filtered audio using . The normalisation ensures that we use the full 16-bits of precision for each sample. Obviously, the scaling coefficient is stored along with the sample to ensure we play the content back at the intended level.

there were multiple things which i did to speed it up, but i think what i pushed in was the most interesting one – so what follows in this blog is a bit of a write up.

## The idea

uniform random numbers are useful and often we need several of them. in my use-case (performing tpdf dither on a sample pair) i needed four random variables. generating them needs to be fast but the “quality” of the randomness is not that important. the obvious way to do this is to use a which ends up forming a code pattern like this:

```r0 = (state * A + C) % M; r1 = (r0 * A + C) % M; r2 = (r1 * A + C) % M; r3 = (r2 * A + C) % M; state = r3;```

this forms a dependency chain where each random number requires knowledge of the previous one and could end up leading to pipeline issues (this is really dependent on how the compiler re-orders the code and the architecture of the platform). in the code generated on my machine, this turned out to be a bottleneck. we could get around the pipeline issues by creating four different lcgs:

```r0 = (state0 * A0 + C0) % M; r1 = (state1 * A1 + C1) % M; r2 = (state2 * A2 + C2) % M; r3 = (state3 * A3 + C3) % M; state0 = r0; state1 = r1; state2 = r2; state3 = r3;```

LCGs are really cool toys. We can choose our and values in such a way that the pseudo-random sequence has period . Given the algorithm generates the next value from the previous, this must mean that the random sequence contains every integer between 0 and 云南快乐十分前三直选. The first part of the idea, is to modify the code as:

```r0 = (state0 * A0) % M; r1 = (state1 * A1) % M; r2 = (state2 * A2) % M; r3 = (state3 * A3) % M; state0 = (r0 + C0 % M) % M; state1 = (r1 + C1 % M) % M; state2 = (r2 + C2 % M) % M; state3 = (r3 + C3 % M) % M;```

This produces a different sequence in to because the accumulation now happens at the state assignment, but still has a period . Now for the dodgy part, we change the algorithm to this:

```r0 = (state0 * A0) % M; r1 = (state0 * A1) % M; r2 = (state0 * A2) % M; r3 = (state0 * A3) % M; state0 = (r0 + C0 % M) % M;```

Where , and to are all different but satisfy the required conditions for an LCG to work. It should not be difficult to infer that to still all have period – but produce different output sequences. I wrote a test program to look at the correlation of these sequences over their period which can be found on – they are mostly independent over their period. I doubt they make great random numbers, but for performing dither, they are completely fine. The generated code seems pretty good also:

```sampletest[0x10000a99f] <+3503>: imull  \$0xd688014d, %r8d, %edi   ; imm = 0xD688014D
sampletest[0x10000a9a6] <+3510>: imull  \$0xdb71f7bd, %r8d, %r15d  ; imm = 0xDB71F7BD
sampletest[0x10000a9ad] <+3517>: imull  \$0xe05f354d, %r8d, %esi   ; imm = 0xE05F354D
sampletest[0x10000a9b4] <+3524>: imull  \$0xe54c7f35, %r8d, %ecx   ; imm = 0xE54C7F35
sampletest[0x10000a9bb] <+3531>: movss  (%r11,%rdx,2), %xmm2      ; xmm2 = mem,zero,zero,zero
sampletest[0x10000a9c1] <+3537>: mulss  %xmm1, %xmm2
sampletest[0x10000a9c5] <+3541>: movss  (%r10,%rdx,2), %xmm3      ; xmm3 = mem,zero,zero,zero
sampletest[0x10000a9cb] <+3547>: mulss  %xmm1, %xmm3
sampletest[0x10000a9cf] <+3551>: leal   0x1(%rdi), %r8d
sampletest[0x10000a9d3] <+3555>: cvttss2si %xmm2, %r12
sampletest[0x10000a9d8] <+3560>: cvttss2si %xmm3, %r13