|
|||
Posted by John Viega on Fri, Oct 10, 2003 (01:47 PM) GMT [Submitted by Bear Giles]
Problem Most shuffling algorithms people use aren't statistically fair. Fair ones tend to be subtle or complex. You want a shuffling algorithm that is fairly simple, unbiased, easy to demonstrate correct and programmatically verifiable.
Solution Create a temporary array of objects containing a cryptographically random value and a value to be shuffled. Sort by the random value, reshuffle if the same value is found twice, retrieve the shuffled values. Verify by testing sort invariants.
Discussion Shuffling objects is a common problem... one that is not hard to solve. It's even discussed in the sample chapter of the Secure Programming Cookbook. The problem isn't the algorithms... it's the fact that people have to write the code and they are prone to mistakes. Mistakes that can be difficult for the coder to discover, but which leave the system exposed to a committed attacker.
The solution is to use an approach that is intuitively obvious, demonstratively correct, and that gets the job done. The basic idea is to choose a random 32-bit value for each element in the array, and then shuffle the values by sorting the random values, e.g., via qsort(). This is fair as long as two items do not receive the same random index. One can explicitly test for that condition after sorting and reshuffle if it occurs.One way to implement this solution is to create a temporary array to stand in for our shuffle. E.g., we can define a structure like: and we can easily malloc() an array large enough to encompass all of the objects we wish to sort. The rvalue field is initialized with a cryptographically strong PRNG, and the index is either initialized (0, 1, 2,...) or we set up the pointer to point to each object in a linked list, binary tree, etc.Finally, we define a helper function that compares two objects in a consistent manner: If the rvalue fields are equal, we can detect that here and start over, or we can just wait until the array is sorted. It is a bad idea to just ignore this issue, because in many cases, the attacker could use the slight statistical advantage (about .01%) to his favor.We can now shuffle our objects simply by calling qsort(), sorting on the randomly chosen indices.How can we make the shuffle verifiable? Easy - we write two invariant tests. The post-test simply verifies that the rvalues are correctly sorted and that no two are the same: The pre-test is slightly more complex, but is similar to the FIPS-140 monobit tests. We make one pass through the array and look at each pair of values. If the value is increasing, call it a '1', otherwise call it a '0.' If our data is random, about half of the numbers should be '1'. We can reuse the code from the FIPS-140 self-check, or just check for extreme cases.This test isn't perfect, but even in the worst case (say 100, 101, 99, 102, 98, 103, 97,...) we can guarantee a good shuffle by repeating this process at least lg(N) times for an array of N elements. |
![]() |
All trademarks and copyrights on this page are owned by their respective owners. Copyrights on submitted material are copyrighted by the submitter. Everything else is Copyright © 2003 by John Viega and Matt Messier. Click here for licensing information regarding the use of recipes on this site and from The Secure Programming Cookbook for C and C++. |