Computers are very exacting, and they can only ever do what they've been taught to do. Software is that teaching: training a computer to perform a task. So we already know they can't be very good at coming up with truly random numbers, which is why there's a whole class of study known as pseudo-random number generators (PRNGs).
Nietzsche says there's no such thing as random, in his determinism theory. Computers use fancy maths to generate random-like numbers. Who cares? Anyone who uses SSL, for starters. All real encryption relies on random numbers, and predictable ones are one of the leading ways to break a cipher. A chain is only as strong as its weakest link, so let's talk about the concept of random numbers, how to game a string of them, and how to generate something like them in code, since virtually all languages have some type of built-in random function.
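To make "fancy maths" and "gaming a string of them" concrete, here's a minimal sketch of about the simplest PRNG there is, a linear congruential generator. The class name `Lcg` and the helper `predict_next` are my own inventions for illustration; the constants are the well-known Numerical Recipes ones. This is a toy and not what any particular language's built-in random function actually does.

```python
class Lcg:
    """Toy linear congruential generator: state = (a * state + c) mod m."""

    def __init__(self, seed: int) -> None:
        self.modulus = 2 ** 32
        self.multiplier = 1664525       # Numerical Recipes constants
        self.increment = 1013904223
        self.state = seed % self.modulus

    def next_int(self) -> int:
        """Advance the state and return it as the next 'random' number."""
        self.state = (self.multiplier * self.state + self.increment) % self.modulus
        return self.state


def predict_next(observed: int) -> int:
    """Game the generator: this toy outputs its whole state, so one
    observed value is enough to compute the value that follows it."""
    attacker_copy = Lcg(observed)
    return attacker_copy.next_int()


if __name__ == "__main__":
    victim = Lcg(seed=42)
    leaked = victim.next_int()
    # The attacker's guess matches the victim's next output.
    print(predict_next(leaked) == victim.next_int())
```

Two generators started from the same seed produce the same sequence forever, which is the determinism at work. Real generators hide more of their state, but the weakness is the same in spirit: learn the state and you know the future.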
Let's start with SQL. Why would anybody need random in a database? Well, test data for starters, or even disk I/O load distribution. This brings us back to Nietzsche and his theory, because good databases (Oracle, SQL Server) have very smart query optimizers. They need to understand whether a function is deterministic or not, which is to say whether it will always return the same value given the same input. If it is, then when you use the function in a SELECT list with constant inputs, it's evaluated once before execution and the result is cached. If it isn't, the function must be evaluated for every record. Think of the difference between X + Y and GetTimeAndDate(): the second one isn't deterministic. It's not random, but it's not predictable given only the inputs.
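Here's a rough sketch of that evaluate-once versus evaluate-per-record decision, written in Python, not SQL. Everything in it (`select_with`, `make_ticker`, the `deterministic` flag) is a hypothetical stand-in I made up to illustrate the idea; it's not how any real optimizer is implemented.

```python
from typing import Callable, Iterable, List, Tuple


def select_with(rows: Iterable[str], func: Callable[[], int],
                deterministic: bool) -> List[Tuple[str, int]]:
    """Pair every row with func(), the way a SELECT list would."""
    if deterministic:
        cached = func()                      # evaluated once, before the scan
        return [(row, cached) for row in rows]
    return [(row, func()) for row in rows]   # evaluated for every record


def make_ticker() -> Callable[[], int]:
    """Return a function whose result changes on every call
    (a stand-in for a clock or a random source)."""
    count = 0

    def tick() -> int:
        nonlocal count
        count += 1
        return count

    return tick


if __name__ == "__main__":
    rows = ["a", "b", "c"]
    print(select_with(rows, make_ticker(), deterministic=True))
    # [('a', 1), ('b', 1), ('c', 1)]
    print(select_with(rows, make_ticker(), deterministic=False))
    # [('a', 1), ('b', 2), ('c', 3)]
```

If the optimizer wrongly believes a random function is deterministic, every row gets the same "random" value, which is exactly the problem the tricks in this post try to work around.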
1. SELECT CAST(CAST(newid() AS binary(4)) AS int)
2. SELECT rand(cast(cast(newid() as binary(4)) as int)) * cast(cast(newid() as binary(4)) as int)
3. SELECT cast(newid() as binary(4)) ^ cast(substring(cast(newid() as binary(4)), 7,4) as int)
4. SELECT RAND(CAST(NEWID() AS BINARY(6)))
All of these take a globally unique identifier (which is always generated by the operating system) and hack at it. The first works, but the output is hard to bound to min and max values. The second and third are more complex variations that will fool SQL Server 2000 but not 2005. The last one is considered to be deterministic even though it uses the built-in random function, because we've given the PRNG a seed. I have no idea how safe these are, and hope somebody else might comment.
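On the min and max problem: the usual trick is to take the raw integer modulo the size of the range and add the lower bound. Here's a minimal sketch of the same "hack at a GUID" idea in Python, with a hypothetical helper name `random_int_from_guid`. It reads the four bytes as an unsigned integer, which differs from a signed int cast in SQL, and I make no claim that it's safe for anything cryptographic.

```python
import uuid


def random_int_from_guid(low: int, high: int) -> int:
    """Return an integer in [low, high] carved out of a fresh GUID."""
    if low > high:
        raise ValueError("low must not exceed high")
    # The first four bytes of a version 4 UUID are all random bits.
    raw = int.from_bytes(uuid.uuid4().bytes[:4], "big")   # 0 .. 2**32 - 1
    span = high - low + 1
    # Modulo folds the raw value into the range. It is slightly biased
    # whenever span does not divide 2**32 evenly.
    return low + raw % span


if __name__ == "__main__":
    print([random_int_from_guid(1, 6) for _ in range(10)])   # ten dice rolls
```

The bias from the modulo is tiny for small ranges, but it's one more reason not to lean on a shortcut like this where the numbers really matter.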
There's another concept that needs to be looked at: entropy. Things start in an ordered state (and computers are highly ordered) and fall into disarray. People try to pull some entropy out of the real world by measuring things that are hard to predict. You could access a file on the network or behind a URL and see how long that takes. You could sample the host computer's memory usage or its number of active processes, or record some keystrokes and the time between each of them. Taking only the least significant digits of each measurement gets us as close to random as we can reasonably get.
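As a sketch of the "least significant digits" idea, the snippet below times a small chunk of work over and over, keeps only the low byte of each measurement, and hashes the lot together. The function name `timing_jitter_bytes`, the sample count, and the choice of SHA-256 as the mixer are all my own assumptions. I have no measurement of how much real entropy this gathers, so treat it as an illustration only.

```python
import hashlib
import time


def timing_jitter_bytes(samples: int = 256) -> bytes:
    """Collect the low byte of repeated timing measurements and
    mix them into 32 bytes with SHA-256."""
    low_bytes = bytearray()
    previous = time.perf_counter_ns()
    for _ in range(samples):
        sum(range(1000))                          # small, jittery workload
        now = time.perf_counter_ns()
        low_bytes.append((now - previous) & 0xFF) # keep least significant byte
        previous = now
    return hashlib.sha256(bytes(low_bytes)).digest()


if __name__ == "__main__":
    print(timing_jitter_bytes().hex())
```

For anything that matters, the operating system already does this kind of gathering far more carefully, and Python exposes the result through `os.urandom` and the `secrets` module.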
What do you know about the technology and/or philosophy of random numbers?