Most shuffling algorithms people use aren't statistically fair. Fair ones tend to be subtle or complex. You want a shuffling algorithm that is fairly simple, unbiased, easy to demonstrate correct and programmatically verifiable.
Create a temporary array of objects containing a cryptographically random value and a value to be shuffled. Sort by the random value, reshuffle if the same value is found twice, retrieve the shuffled values. Verify by testing sort invariants.
Shuffling objects is a common problem... one that is not hard to solve. It's even discussed in the sample chapter of the Secure Programming Cookbook. The problem isn't the algorithms... it's the fact that people have to write the code and they are prone to mistakes. Mistakes that can be difficult for the coder to discover, but which leave the system exposed to a committed attacker.
The solution is to use an approach that is intuitively obvious, demonstratively correct, and that gets the job done. The basic idea is to choose a random 32-bit value for each element in the array, and then shuffle the values by sorting the random values, e.g., via
qsort(). This is fair as long as two items do not receive the same random index. One can explicitly test for that condition after sorting and reshuffle if it occurs.
One way to implement this solution is to create a temporary array to stand in for our shuffle. E.g., we can define a structure like:
struct ShuffleItem {
unsigned long rvalue; // pseudorandom value
int index; // index into array
void *ptr; // alternately, pointer to obect
}
and we can easily
malloc() an array large enough to encompass all of the objects we wish to sort. The
rvalue field is initialized with a cryptographically strong PRNG, and the index is either initialized (0, 1, 2,...) or we set up the pointer to point to each object in a linked list, binary tree, etc.
Finally, we define a helper function that compares two objects in a consistent manner:
int cmp(const void *a, const void *b) {
const struct ShuffleItem *sa = (const struct ShuttleItem *) a;
const struct ShuffleItem *sb = (const struct ShuffleItem *) b;
r = 0;
if (sa->rvalue < sb->rvalue) r = -1;
if (sa->rvalue > sb->rvalue) r = 1;
return r;
}
If the
rvalue fields are equal, we can detect that here and start over, or we can just wait until the array is sorted. It is a bad idea to just ignore this issue, because in many cases, the attacker could use the slight statistical advantage (about .01%) to his favor.
We can now shuffle our objects simply by calling
qsort(), sorting on the randomly chosen indices.
How can we make the shuffle verifiable? Easy - we write two invariant tests. The post-test simply verifies that the rvalues are correctly sorted and that no two are the same:
for (int i = 0; i < nel-1; i++) {
assert(data[i].rvalue < data[i+1].rvalue);
}
The pre-test is slightly more complex, but is similar to the FIPS-140 monobit tests. We make one pass through the array and look at each pair of values. If the value is increasing, call it a '1', otherwise call it a '0.' If our data is random, about half of the numbers should be '1'. We can reuse the code from the FIPS-140 self-check, or just check for extreme cases.
This test isn't perfect, but even in the worst case (say 100, 101, 99, 102, 98, 103, 97,...) we can guarantee a good shuffle by repeating this process at least lg(N) times for an array of N elements.