For most developers, talking about “random” numbers really means talking about two ideas: predictability and bias. The two are closely related but differ in some fundamental ways. In this article I will discuss both ideas and evaluate how random numbers are generated in PHP, so you have a better understanding of them the next time you need random numbers in your program.
To understand how to generate better random numbers, you must first understand how random numbers work. As stated before, random numbers are judged on two characteristics: predictability and bias. Predictability refers to the statistical probability of being able to predict the next number when you know the previous values. Bias refers to the statistical problem of being able to predict the next number when you know the distribution of the previous values. The difference matters because bias speaks to the randomness (or non-randomness) of the generated values, while predictability speaks to the security, or incorruptibility, of the random source.
Let’s start with predictability. The idea that a computer-generated value can be random is false. There is no such thing as pure randomness, which means that all numbers are (to some degree) deterministic. We could get into a physics discussion here about Heisenberg’s Uncertainty Principle and quantum mechanics, but I will spare you. Even quantum mechanics is based on statistical equations, so while you cannot predict exact values with any useful degree of precision, you can make a good statistical guess, which undermines the idea that random numbers could ever be truly random.
Here is an example: if I gave you a number sequence such as “12, 31, 9”, would you be able to tell me the next number in the sequence? Probably not. However, if I told you that the sequence was drawn from the numbers 1-50, your chance of guessing the next number would rise to 1 in 50. So while the distribution may be random, by watching the sequence and knowing the algorithm you have a higher likelihood of predicting the next number.
Following that, we can determine that all random events are only unpredictable if there is a lack of information on how they are generated. Once that information is given, the game is up. However, this does not mean that randomness is essentially security through obscurity (although it can be). All this means is that by having extra knowledge of how the numbers are generated, it is possible to strongly predict the outcome of a future random value.
All random events are subject to this dichotomy of information, where knowing enough about past events and the method of generation can allow you to theoretically predict the next value to a greater degree than with raw information alone.
Next, bias. If a sequence is to be called truly random, then each possible value should occur with exactly the same probability. With enough elements in the sequence, the number of occurrences of each unique number should be about the same. If you were to flip a coin a million times, you would expect to see roughly as many “heads” as “tails”. You would most likely not get exactly 500,000 of each, but with a good random distribution the two counts would be very close. Bias comes into play when there is some other factor we aren’t seeing, such as a weighted coin, which skews the results.
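The coin-flip expectation can be checked with a short tally. This is a minimal sketch, not a rigorous statistical test: the helper name tallyCoinFlips() and the sample size of 100,000 are assumptions made for illustration, and mt_rand() stands in for the coin.

```php
<?php
// Hypothetical helper: simulate $flips coin tosses with mt_rand() and count each side.
function tallyCoinFlips(int $flips): array
{
    $counts = ['heads' => 0, 'tails' => 0];
    for ($i = 0; $i < $flips; $i++) {
        // mt_rand(0, 1) returns either 0 or 1; treat 1 as heads.
        $side = mt_rand(0, 1) === 1 ? 'heads' : 'tails';
        $counts[$side]++;
    }
    return $counts;
}

$flips = 100000;
$counts = tallyCoinFlips($flips);
$headsShare = $counts['heads'] / $flips;

printf(
    "heads: %d, tails: %d, share of heads: %.4f\n",
    $counts['heads'],
    $counts['tails'],
    $headsShare
);
```

The two counts will rarely match exactly, but a share of heads that drifts far from 0.5 over many flips would be a hint that something is skewing the source.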
A common mistake when programming with random numbers is inadvertently introducing bias into their generation. This usually happens when a programmer takes two unbiased sources and combines them, for example by multiplying them together, which biases the results towards lower numbers. Programmers make this mistake because they assume that the product of two unbiased values stays unbiased as long as no outside source is involved. To get better random results, programmers must take care not to introduce bias, because bias makes their random numbers more predictable.
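The multiplication pitfall is easy to see empirically. The sketch below compares a single draw from 1-100 against the product of two draws from 1-10, which also ranges from 1 to 100. The helper shareInLowerHalf() and the sample size are assumptions made for illustration.

```php
<?php
// Hypothetical helper: fraction of $samples draws from $draw that land in 1..50.
function shareInLowerHalf(callable $draw, int $samples): float
{
    $low = 0;
    for ($i = 0; $i < $samples; $i++) {
        if ($draw() <= 50) {
            $low++;
        }
    }
    return $low / $samples;
}

$samples = 100000;

// One unbiased draw over the whole range 1..100.
$uniformShare = shareInLowerHalf(function () {
    return mt_rand(1, 100);
}, $samples);

// Two unbiased draws from 1..10 multiplied together; the result also lies in 1..100.
$productShare = shareInLowerHalf(function () {
    return mt_rand(1, 10) * mt_rand(1, 10);
}, $samples);

printf("single draw, share <= 50: %.3f\n", $uniformShare);
printf("product,     share <= 50: %.3f\n", $productShare);
```

Of the 100 equally likely pairs of factors, 81 give a product of 50 or less, so the product lands in the lower half roughly 81% of the time, while the single draw does so about 50% of the time.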
PHP Random Number Generators
So, what makes a good random number generator? To be truly good, a generator must produce numbers that are both unpredictable and unbiased. To be unpredictable, it must be hard or impossible to predict the next value of a sequence with any degree of certainty, even when the earlier values are known. To be unbiased, every number in the output range must have an equal chance of appearing.
To achieve this, PHP offers several random number generators, and I will discuss each here so you are better informed to choose the right one for your project. To start, we have two basic random number generators, rand() and mt_rand(). Each of these functions generates a random number from a seed using its own internal algorithm. Think of the seed as the starting point for the algorithm. If the seed is unknown, the numbers produced are reasonably hard to predict. However, if the seed is known or can be tampered with, the generated sequence is very predictable. A common attack on programs is “seed poisoning”, where the attacker finds a way to replace the seed with a known value, which undermines your random number generation process.
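The effect of a known seed can be demonstrated directly with mt_srand(), which sets the seed used by mt_rand(). In this sketch the helper sequenceFromSeed() and the seed value 1234 are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: seed the Mersenne Twister, then draw $length values from 1..50.
function sequenceFromSeed(int $seed, int $length): array
{
    mt_srand($seed);
    $values = [];
    for ($i = 0; $i < $length; $i++) {
        $values[] = mt_rand(1, 50);
    }
    return $values;
}

$first  = sequenceFromSeed(1234, 5);
$second = sequenceFromSeed(1234, 5);

echo implode(', ', $first), "\n";
echo implode(', ', $second), "\n";

// The same seed reproduces the same sequence, value for value.
var_dump($first === $second); // bool(true)

// Reseed without an argument so later calls no longer depend on the known seed.
mt_srand();
```

Anyone who learns or controls the seed can run the same algorithm and obtain the same numbers, which is exactly what a seed poisoning attack relies on.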
Next we have lcg_value(), which generates weaker random numbers, but its seed is internal and less susceptible to seed poisoning attacks. However, lcg_value() only receives a seed on the first call, and that seed is made up of the process ID and the current time. An attacker who knows this can use the information to reasonably predict the sequence of numbers generated.
The uniqid() function is used to generate unique strings. Internally it uses the current time and, when its more_entropy argument is enabled, appends a value from lcg_value(). So if lcg_value() is compromised as explained above, the strings generated by uniqid() can be guessed fairly easily.
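To see how much of a uniqid() value is simply the clock, the string can be decoded. This sketch assumes the common layout for a call with no prefix and no extra entropy: 13 hexadecimal characters, of which the first 8 encode the Unix time in seconds and the last 5 the microseconds. That layout is platform-dependent, so treat it as an assumption rather than a guarantee.

```php
<?php
// Generate an ID with no prefix and without the optional extra entropy.
$id = uniqid();

// Assumed layout: 8 hex digits of seconds followed by 5 hex digits of microseconds.
$seconds      = hexdec(substr($id, 0, 8));
$microseconds = hexdec(substr($id, 8, 5));

echo "id:           ", $id, "\n";
echo "seconds:      ", $seconds, " (", date('Y-m-d H:i:s', $seconds), ")\n";
echo "microseconds: ", $microseconds, "\n";
echo "time() now:   ", time(), "\n";
```

An attacker who knows roughly when an ID was created only has to search a small window of timestamps, which is why uniqid() on its own is a poor source of secrets.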
By far the best way to generate random numbers in PHP is to use MCrypt, which is a replacement for the UNIX crypt command. MCrypt provides the mcrypt_create_iv() function, which can be used with MCRYPT_DEV_RANDOM to generate very strong random data that is hard to tamper with.
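A call might look like the sketch below. It assumes the mcrypt extension is loaded, which is not the case on every installation: mcrypt was deprecated in PHP 7.1 and removed from the core distribution in PHP 7.2. The helper name strongRandomBytes() is hypothetical, and the fallback to random_bytes() (built into PHP 7 and later) is not part of the approach described in this article; it is only there so the sketch still runs without mcrypt.

```php
<?php
// Hypothetical helper: return $length bytes from a strong random source.
function strongRandomBytes(int $length): string
{
    if (function_exists('mcrypt_create_iv')) {
        // The approach described above; MCRYPT_DEV_RANDOM reads from /dev/random.
        $bytes = mcrypt_create_iv($length, MCRYPT_DEV_RANDOM);
        if ($bytes === false) {
            throw new RuntimeException('mcrypt_create_iv() failed');
        }
        return $bytes;
    }

    // Assumption: no mcrypt available, so fall back to PHP's built-in CSPRNG.
    return random_bytes($length);
}

$bytes = strongRandomBytes(16);

// The raw bytes are binary, so print them as hexadecimal.
echo bin2hex($bytes), "\n";
```

Unlike rand() and mt_rand(), there is no seed for the caller to set here, so there is nothing in the program for an attacker to poison.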
Ultimately, the choice of random number generator is yours and may vary depending on your needs and the project. While random number generation may seem hard on the surface, understanding the rules of predictability and bias leaves you better prepared to create better random numbers and avoid the common pitfall of introducing bias into your system. Keeping your random numbers unbiased and safe from attack is important, and it can be critical in security-sensitive systems, where random numbers are used frequently.