In your code, you have:

    mt[i] = mt[i + _MT64_MM] ^ (x >> 1) ^ ((x & 1ULL) ? _MT64_MATRIX_A : 0);

Depending on the compiler, the ? operator here can compile to a conditional branch. The low bit of x is effectively random, so the branch is hard to predict, and each misprediction stalls the instruction pipeline on most modern processors.

Instead, I tried this:

    mt[i] = mt[i + _MT64_MM] ^ (x >> 1) ^ ((x & 1ULL) * _MT64_MATRIX_A);

Since you're ANDing with 1, the result is 1 if the rightmost bit is set and 0 otherwise. Multiplying by that value gives either _MT64_MATRIX_A or 0, which is the same result as the conditional but without a branch instruction. On most modern CPUs, integer multiplication is cheap: a few cycles of latency at most, and fully pipelined.

On a 32-bit Pentium machine, this sped things up by a full 30%! On an AMD64 machine compiling into 64-bit code, this sped things up by about 5%.

-Adam Ierymenko
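
As a quick check of the equivalence described above, here is a minimal sketch that isolates the two forms of the expression and compares them on a handful of inputs. The function names are hypothetical, and MATRIX_A is an assumed stand-in for _MT64_MATRIX_A, set to the standard MT19937-64 constant; the original macro may be defined differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value: the standard MT19937-64 twist constant, standing in for
   the _MT64_MATRIX_A macro in the code under discussion. */
#define MATRIX_A 0xB5026F5AA96619E9ULL

/* Original form: select MATRIX_A or 0 with the conditional operator. */
static uint64_t twist_ternary(uint64_t neighbor, uint64_t x)
{
    return neighbor ^ (x >> 1) ^ ((x & 1ULL) ? MATRIX_A : 0);
}

/* Suggested form: (x & 1) is 0 or 1, so the product is 0 or MATRIX_A. */
static uint64_t twist_multiply(uint64_t neighbor, uint64_t x)
{
    return neighbor ^ (x >> 1) ^ ((x & 1ULL) * MATRIX_A);
}

/* Spot-check that both forms agree on a few even and odd inputs. */
static void check_twist_forms(void)
{
    const uint64_t samples[] = { 0, 1, 2, 3, 0x8000000000000000ULL, UINT64_MAX };
    const size_t n = sizeof samples / sizeof samples[0];
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            assert(twist_ternary(samples[i], samples[j]) ==
                   twist_multiply(samples[i], samples[j]));
}
```

Calling check_twist_forms() from a test program confirms only that the two forms compute the same value. Whether the multiply form is actually faster depends on the compiler and target: some compilers already turn the conditional into a branch-free sequence, so it is worth inspecting the generated assembly and timing both versions before committing to either.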