The quality of the “seed” defines the usefulness of the random numbers. If the “seed” is knowable, then the output is deterministic. The best “seed” would be a truly random number, since that is indeterminate. Of course, if we already had that real random number, why would we need a random number generator? That is another subject. Our goal is to find that real random number source.
The following code is a proposed real random number source. The easiest way to test it is to generate a data set and examine it in Excel (Microsoft 365): run a set of data, copy it to a new sheet, then analyze and chart the results, including a donut chart and a scatter chart. If you ask NIST for the standard for random numbers, there is none; NIST provides a set of tests to help qualify random numbers.
Using this random “seed” in a random number generator then outputs quality random numbers that are usable for our requirements. Test the output. Is it random?
Any thoughts or comments are welcome; I’m thick skinned. I know this goes against what we are taught in school: “computers can’t produce real random numbers”.
More seriously, when I run this code (three times, with 1M samples each) and compare to a uniform distribution over the same range (using numpy’s PRNG), it’s very clear that using the CPU clock (which is all you’re doing) does not generate a uniform sample over the range. Those deviations are far outside the expectation for a uniform distribution.
Turns out those teachers might have known a thing or two after all.
The way this is solved in modern computers is to manage an entropy pool.
The Linux kernel does this and exposes a source of random bits; I assume Windows and macOS do the same.
The source of the entropy is measurements of real-world events as seen by the computer: arrival of network packets, mouse clicks, keys being pressed, and so on.
You can read about the details on lwn.net, which has published a number of articles on the kernel’s RNG implementation as it has changed over time. A web search for “lwn.net rng” brings up several useful articles.
Since the entropy pool takes time to build up quality random bits, it makes sense to use it to seed a pseudorandom number generator (PRNG), which can then produce an endless stream of bits. The PRNG can be reseeded from the entropy pool to mitigate possible weaknesses in the PRNG that would make it predictable.
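In Python, a minimal sketch of that pattern might look like the following, assuming `os.urandom` as the interface to the OS entropy pool and the standard `random` module’s Mersenne Twister as the PRNG:

```python
import os
import random

# Seed the Mersenne Twister from the OS entropy pool instead of the clock.
rng = random.Random(os.urandom(32))

def reseed(rng, nbytes=32):
    # Periodically mix in fresh OS entropy by reseeding outright.
    rng.seed(os.urandom(nbytes))

sample = [rng.randrange(6) + 1 for _ in range(10)]  # ten die rolls
reseed(rng)
```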
Your code is, in effect, measuring jitter in time.sleep() and hence in the kernel’s scheduling code. It will not be very random.
The python secrets module uses the operating system’s source of cryptographic quality random numbers.
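For example (assuming Python 3.6+, where the module was introduced):

```python
import secrets

token = secrets.token_hex(16)    # 32 hex characters from the OS CSPRNG
roll = secrets.randbelow(6) + 1  # a cryptographically strong die roll
```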
Turns out, you were taught right. The code you wrote is still deterministic; at best, what you’ll get is the very mild level of entropy that comes from other running processes introducing a small amount of state into your counter. This is most assuredly NOT random, and if an attacker could ensure consistent server load (which is pretty easy - just saturate the server), there’s a good chance your seeds would be entirely predictable.
Why are you trying to reinvent - badly - what has already been done in industry standard ways, utilizing ACTUAL entropy that computers can collect? And why do you open your post with a hostile remark that has nothing whatsoever to do with your content?
I am glad to see the above comments. This is a subject that, according to NIST, has no formal definition of what random is. No standard, just whatever people agree on (because there is no formal definition).
First, I’m including the program and the Excel spreadsheet. This is repeatable.
So as an example, what is it we would like to have if we throw dice? We want a die that ALWAYS has the same chance for all six numbers over the number of throws. That is what we will show. We threw 16374 dice, ten times, for the data set. The data is in the Excel file. The program is included both as an attachment and in this comment.
How do we analyze the data? Make a column for every side of the die: 16374 rows, by ten sets. Sum each column of die sides. We need to know how many 1’s (and likewise for each other face) should be in every column: 16374/6 = 2729. Subtract each column’s sum from 2729.
Now the fun part: add the six numbers and they equal zero (55 + -3 + 25 + 96 + -37 + -136).
This is great; you know that the dice are “TRULY RANDOM”.
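The tally described above is easy to reproduce in Python itself; here is a sketch using the standard random module as a stand-in for the posted generator:

```python
import random

ROLLS = 16374
EXPECTED = ROLLS // 6  # 2729 of each face if perfectly even

random.seed(2023)  # fixed seed so the run is repeatable
rolls = [random.randrange(1, 7) for _ in range(ROLLS)]
counts = [rolls.count(face) for face in range(1, 7)]
discrepancies = [EXPECTED - c for c in counts]
print(discrepancies)
```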
That’s only one way of recognizing randomness. There are, in fact, MANY attributes of a truly random sequence of numbers, and humans are actually pretty terrible both at creating and at recognizing such sequences. Here’s an “inverse Turing test” - a challenge that machines can perform easily, but humans will struggle at:
(Many thanks to 3blue1brown and the Summer of Math Exposition 2 contest for encouraging this to be created, and for subsequently sharing it with the world - otherwise I’d never have seen it.)
Use your timing-based random number generator to produce a large number of random bits, then enter them into the page (yes, it’s fully interactive). Then try your program again with the system fully loaded, so everything slows to a crawl. This is NOT RANDOM.
Why do you insist on trying to reinvent badly what has already been done far better?
You start by taking 16374 samples. If they were perfectly evenly distributed, you would have 2729 in each column. Then you calculate the error in each column. Well, of COURSE the sum of the errors will equal zero, because any discrepancy will increase one column and decrease another! If you’ve done your arithmetic correctly, that MUST equal zero at the end. The test of randomness is not that the sum is zero, but that all the errors are small; and in your case, you have one error at -136 which is a 5% discrepancy. That’s the number you really need to be looking at.
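To spell the arithmetic out with the posted numbers (column counts reconstructed here as 2729 minus each posted discrepancy):

```python
discreps = [55, -3, 25, 96, -37, -136]
counts = [2729 - d for d in discreps]

# The six counts must total the number of rolls, so the discrepancies
# cancel by construction, for ANY outcome, random or not.
assert sum(counts) == 16374
assert sum(discreps) == 6 * 2729 - sum(counts) == 0
```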
Be extremely careful where you go from here, because this is exactly the sort of thing that outright scams will use to “prove” that their product or service is somehow better than everything else.
The program is only using a time-based integer to seed Python’s built-in PRNG. If you didn’t bother with seeding it at all, it’s most likely the test you propose would pass even if you used the built-in generator to create trillions of bits (the Mersenne Twister does very well on this kind of test).
Throwing in reseedings at various times probably won’t hurt that, provided that seeds don’t repeat.
However, the program uses total_nanosecond_seed as a seed, which is a difference between two points “close” in time. There’s no reason to expect that won’t repeat frequently.
>>> import time
>>> short_time = 0.0000000001
>>> seen = set()
>>> for i in range(1_000_000):
...     start = time.time_ns()
...     time.sleep(short_time)
...     finish = time.time_ns()
...     seen.add(finish - start)
...
>>> len(seen)
381
So there, using Python 3.12.0 on a pretty quiet Win10 box, only 381 unique values were seen out of a million tries. That kind of “reseeding” vastly reduces the quality of the outputs from the Mersenne Twister.
On the right track, but not quite there. You need to look at all the values, and “too small” is just as bad as “too large”! For example, if you flip a coin a million times and it comes up heads exactly half the time, it’s exceedingly unlikely that the flips were “random”. Humans trying to “act random” typically make choices that are far too close to perfectly evenly distributed.
For a putatively random process with 6 possible outcomes, the appropriate test is a “chi-squared test with 5 degrees of freedom”. Most rigorous tests of “randomness” are based on computing chi-squared statistics, comparing observed frequencies of outcomes to “theoretically perfect” frequencies.
Yes, I edited that paragraph several times and ended up with a bit of a confusing thing there. “The discrepancies” is what you need to be looking at, not “the final sum”. Of the discrepancies, one is roughly 5% of the sample size.
That’s true, but I suspect that you could get a long way without worrying about “too perfect” as a criterion. Using the sliding window strategy as in the page I linked to, you’ll almost certainly discover issues from uneven distribution of pairs or triples.
But in any case, I think we have plenty of proof that repeatedly reseeding the PRNG in this way cannot possibly improve it, and will almost certainly worsen it.
Which doesn’t matter on its own. Here’s the calculation:
discreps = [55, -3, 25, 96, -37, -136]
expected = [2729] * 6
chisq = sum(d**2 / e for d, e in zip(discreps, expected))
print(format(chisq, ".2f"))
The statistic is 12.00. Any number of online calculators can be used to verify that a chi-squared statistic with 5 degrees of freedom is as high as 12 about 3.5% of the time, provided that the source is truly random. So a value of 12 is at worst mildly suspicious: it’s expected to be at least 12 in 3.5% of trials on truly random sources (so then you want to aggregate multiple chi-square stats, for which the Kolmogorov-Smirnov test is suited).
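That 3.5% figure is easy to sanity-check by simulation. A sketch, using a smaller experiment (600 rolls of a fair die per trial) and the standard random module:

```python
import random

random.seed(42)  # repeatable run
TRIALS, ROLLS, FACES = 10_000, 600, 6
expected = ROLLS / FACES  # 100 per face

hits = 0
for _ in range(TRIALS):
    counts = [0] * FACES
    for _ in range(ROLLS):
        counts[random.randrange(FACES)] += 1
    chisq = sum((c - expected) ** 2 / expected for c in counts)
    if chisq >= 12:
        hits += 1

fraction = hits / TRIALS
print(fraction)  # should land near 0.035 for a fair die
```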
“Intuition” is worse than useless here - crunch the numbers.
Especially given that the Twister’s state has 19937 bits. Any scheme that reseeds with values having substantially fewer bits than that is incapable of reaching the vast majority of its possible states.
Another problem for the OP is that they can’t show an “improvement” unless they can show a case in which the built-in generator is demonstrably “not random”. That isn’t easy! It once took me a full day to come up with one(*), despite knowing in advance exactly what kinds of statistical tests the Twister was already known to fail.
(*) Generate 400 million random bits, and arrange them in a 20000x20000 bit matrix, viewing the entries as being elements of GF(2). Compute the rank of that matrix. It will be much, much, much too small for a “random” matrix of that size. But, e.g., if you do it with 15000x15000 bit matrices, the distribution of ranks is indistinguishable from what you’d expect from random matrices. It’s not a coincidence that the number of bits in the Twister’s state (19937) is between 15000 and 20000.
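The rank computation itself is short. Here is a toy-scale sketch over GF(2), with each row stored as a Python int used as a bit mask (64x64 here, nowhere near the 20000x20000 of the actual test; a random square bit matrix of this size is full rank or very nearly so):

```python
import random

def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are ints used as bit masks."""
    rank = 0
    rows = list(rows)
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue  # an all-zero row contributes nothing
        rank += 1
        lsb = pivot & -pivot  # lowest set bit picks the pivot column
        # XOR the pivot row into every remaining row with that bit set.
        rows = [r ^ pivot if r & lsb else r for r in rows]
    return rank

random.seed(7)
n = 64
matrix = [random.getrandbits(n) for _ in range(n)]
print(gf2_rank(matrix))  # near-full rank for a random 64x64 GF(2) matrix
```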
There is a formal definition of randomness. Have a look at “algorithmic complexity theory” (see for instance: Kolmogorov complexity).
There is even a formal definition of “the most random real number” (Chaitin’s number Omega; in a way this encodes everything that is computable. It’s not of any direct practical use, since it’s not computable, but it’s interesting in its own right).