# The Many Mathematical Models of the Shuffle

# The Many Mathematical Models of the Shuffle

Join the DZone community and get the full member experience.

Join For FreeHow to Simplify Apache Kafka. Get eBook.

In a previous post, a few weeks ago, I mentioned that I will be in Las Vegas by the end of July. And I took the opportunity to write a post on *roulette(s)*. Since some colleagues told me I should take some time to play poker there, I guess I have to understand how to play poker… so I went back to basics on cards, and shuffling techniques.

Now, I have to confess that I have been surprised, while I was looking for mathematical models for shuffling, to find so many deterministic techniques (and results related to algebra, and cycles).

On http://mathworld.wolfram.com/ for instance, one can find nice articles on so-called in-shuffle or out-shuffle techniques. There is also a great article, Golomb (1961), but mainly on algebraic properties of permutations by cutting and shuffling, as well as Diaconis and Graham’s *The Mathematics of Perfect Shuffle *or* The Solutions to Emsley’s Problem, *And if you look at Monge’s shuffle, you can find a deterministic recursive relationship. As a statistician (or applied probabilist), I should confess that I did not find answer to the question I wanted to ask :

*how long should we shuffle before getting cards randomly sorted in ours hands*?

**Randomness (from a statistician perspective)**

First, I need to define (as properly as possible) a notion of “*cards randomly sorted*“. Consider a game with 32 cards. Why 32 ? Mathematicians will tell you that 32 is a great number, since it is a power of 2, so there might be interesting (algebraic) properties when shuffling. From a computational point of view, 32 is smaller than 52, so my random generations will run faster. This is basically why I used 32. 10 would have been better, but not realistic with cards.

So, our 32 cards can be seen as a vector, or a list, of 32 items, say

In order to assess if my cards are randomly sorted, let us get back to number properties (real valued numbers). If there were 10 cards, the list can be seen as an element of the following set

(or to be more specific, a subset of that set, since numbers have to be different – it has to be a permutation – we cannot have duplicates, we’ll get back to that point in a few seconds). Let us see this list as a decimal number, with 10 digits. More precisely,

Now, it is natural to say that cards are randomly sorted is this number is uniformly distributed on the unit interval, isn’t it ? (if we use the same shuffle many times, with the same starting point)

Well, if we think about it twice, uniform on the unit interval is probably not the proper distribution, since (as mentioned above) all digits have to be different. For instance, the smallest number would be and the largest . But as we will see, it this uniform assumption might not be too strong, actually.

And if we want to get back to our initial problem, with 32 cards, we simply have to use a decomposition in the 32-basis.

So if we have an algorithm to shuffle cards, we just have to run it several times (with the same starting value) and see *when* starts to be uniformly distributed. We start with a Dirac distribution, we have some kind of transition matrix, we expect our limiting distribution to be uniform and we wonder *when* the limiting distribution is reached… And from a statistical point of view, that should not be that difficult to assess, since we do have several goodness of fit tests that can be used.

Actually, it is possible to check if our technique passes the test of a uniform distribution, when digit are randomly generated (without replacement). The code to generate is

> j = 32 > X3 = (0:(j-1))[sample(1:j)] > x3 = sum(j^(-(1:j))*X3)

If we run it a few times, and check if the assumption of a uniform distribution is valid (on samples with, say, 500 observations),

> P3=NULL > for(i in 1:10000){ + U3=NULL + for(s in 1:500){ + X3 =(0:(j-1))[sample(1:j)] + x3 =sum(j^(-(1:j))*X3) + U3 =c(U3,x3)} + P3 =c(P3,ks.test(U3,punif)$p.value) + }

in 95% of the scenarios, the -value exceeds 5%

> mean(P3>.05) [1] 0.9529

(which is something we should have under the null), More precisely, we can check that the -value is uniformly distributed on the unit interval.

> hist(P3,freq=FALSE)

So assuming that our number is uniform on the unit interval might be a good notion for “*cards are randomly sorted*“.

What we need now is some shuffling algorithms. Or to be more specific, some *feasible *shuffling algorithm. I mean here that I just start playing with cards, so it should be some techniques that I should be able to perform, to understand how it works…. So you will have to wait a few weeks before I start talking about the riffle or dovetail shuffle (you know the kind of shuffle in which half of the deck is held in each hand, and then cards are released by the thumbs so that they fall to the table interleaved… like in the movies) !

**Top in at random shuffle, and related (simple) algorithm**

My first algorithm is simple: the **top-in at random shuffle**. We start with the following ordering

N=1:m

There are cards, and **n** denote the place where the card on top will go.

n=sample(2:m,size=1) if(n<m) N=c(N[2:n],N[1],N[(n+1):m]) if(n==m) N=c(N[2:n],N[1])

Then, we repeat that transfer of the card on top several times.

schuffle1=function(m,ns=10){ N=1:m for(i in 1:ns) { n=sample(2:m,size=1) if(n<m) N=c(N[2:n],N[1],N[(n+1):m]) if(n==m) N=c(N[2:n],N[1]) } return(N)}

Now, it is also possible to consider a **bottom-in at random shuffle**. The idea is the same, the only difference it that you start from the card at the bottom of the deck. But that would be the same as the one before (in terms of time before reaching randomness)

n=sample(1:(m-1),size=1) if(n>1) N=c(N[1:(n-1)],N[m],N[n:(m-1)]) if(n==1) N=c(N[m],N[1:(m-1)])

Why not mixing ? Randomly. Call it **randomly mixed top-bottom in at random shuffle**. You start either with the card on top, or at bottom (with identical probability), of the deck and then move the card somewhere,

card=sample(c("top","bottom"),size=1) if(card=="top"){ n=sample(2:m,size=1) if(n<m) N=c(N[2:n],N[1],N[(n+1):m]) if(n==m) N=c(N[2:n],N[1])} if(card=="bottom"){ n=sample(1:(m-1),size=1) if(n>1) N=c(N[1:(n-1)],N[m],N[n:(m-1)]) if(n==1) N=c(N[m],N[1:(m-1)])}

All those codes can be together (within the same function),

schuffle1=function(m,ns=10,which="top"){ N=1:m if(which=="top"){ for(i in 1:ns) { n=sample(2:m,size=1) if(n<m) N=c(N[2:n],N[1],N[(n+1):m]) if(n==m) N=c(N[2:n],N[1]) }} if(which=="bottom"){ for(i in 1:ns) { n=sample(1:(m-1),size=1) if(n>1) N=c(N[1:(n-1)],N[m],N[n:(m-1)]) if(n==1) N=c(N[m],N[1:(m-1)]) }} if(which=="mixed"){ for(i in 1:ns) {card=sample(c("top","bottom"),size=1) if(card=="top"){ n=sample(2:m,size=1) if(n<m) N=c(N[2:n],N[1],N[(n+1):m]) if(n==m) N=c(N[2:n],N[1]) } if(card=="bottom"){ n=sample(1:(m-1),size=1) if(n>1) N=c(N[1:(n-1)],N[m],N[n:(m-1)]) if(n==1) N=c(N[m],N[1:(m-1)]) } }} return(N)}

But why do we take only *one *card ? It won’t be more complex to take 2. Or 3. Or more.

**Tops in at random shuffle, and related (mixed) algorithm**

Yes, I used tops to say that we would take several cards on top of the deck. Say a random number of cards. And then, the strategy is the same, so the previous code is (slightly) adapted, as follows

k=sample(1:(m-1),size=1) n=sample((k+1):m,size=1); if(k==m-1) n=m if(n<m) N=c(N[(k+1):n],N[1:k],N[(n+1):m]) if(n==m) N=c(N[(k+1):n],N[1:k])

The idea is the following, here

As earlier, it is possible to take cards at the bottom of the deck, or, one more time, to use a mixed strategy. The codes would be

card=sample(c("top","bottom"),size=1) if(card=="top"){ k=sample(1:(m-1),size=1) n=sample((k+1):m,size=1); if(k==m-1) n=m if(n<m) N=c(N[(k+1):n],N[1:k],N[(n+1):m]) if(n==m) N=c(N[(k+1):n],N[1:k])} if(card=="bottom"){ k=sample(2:m,size=1) n=sample(1:(k-1),size=1); if(k==1) n=1 if(n>1) N=c(N[1:(n-1)],N[k:m],N[n:(k-1)]) if(n==1) N=c(N[k:m],N[n:(k-1)])}

Again, it is possible to have all those codes in the same function,

schuffle2=function(m,ns=10,which="top"){ N=1:m if(which=="top"){ for(i in 1:ns) { k=sample(1:(m-1),size=1) n=sample((k+1):m,size=1); if(k==m-1) n=m if(n<m) N=c(N[(k+1):n],N[1:k],N[(n+1):m]) if(n==m) N=c(N[(k+1):n],N[1:k]) }} if(which=="bottom"){ for(i in 1:ns) { k=sample(2:m,size=1) n=sample(1:(k-1),size=1); if(k==1) n=1 if(n>1) N=c(N[1:(n-1)],N[k:m],N[n:(k-1)]) if(n==1) N=c(N[k:m],N[n:(k-1)]) }} if(which=="mixed"){ for(i in 1:ns) {card=sample(c("top","bottom"),size=1) if(card=="top"){ k=sample(1:(m-1),size=1) n=sample((k+1):m,size=1); if(k==m-1) n=m if(n<m) N=c(N[(k+1):n],N[1:k],N[(n+1):m]) if(n==m) N=c(N[(k+1):n],N[1:k]) } if(card=="bottom"){ k=sample(2:m,size=1) n=sample(1:(k-1),size=1); if(k==1) n=1 if(n>1) N=c(N[1:(n-1)],N[k:m],N[n:(k-1)]) if(n==1) N=c(N[k:m],N[n:(k-1)]) } }} return(N)}

**How long should we shuffle before having cards randomly sorted ?**

With the codes mentioned above, it is possible to run generations of shuffles,

distu=function(k=100,j=32){ U1B=U1T=U1M=U2B=U2T=U2M=U3=NULL for(s in 1:100){ X1T=(0:(j-1))[schuffle1(j,k,"top")] X1B=(0:(j-1))[schuffle1(j,k,"bottom")] X1M=(0:(j-1))[schuffle1(j,k,"mixed")] X2T=(0:(j-1))[schuffle2(j,k,"top")] X2B=(0:(j-1))[schuffle2(j,k,"bottom")] X2M=(0:(j-1))[schuffle2(j,k,"mixed")] X3 =(0:(j-1))[sample(1:j)] x1T=sum(j^(-(1:j))*X1T) x1B=sum(j^(-(1:j))*X1B) x1M=sum(j^(-(1:j))*X1M) x2T=sum(j^(-(1:j))*X2T) x2B=sum(j^(-(1:j))*X2B) x2M=sum(j^(-(1:j))*X2M) x3 =sum(j^(-(1:j))*X3) U1T=c(U1T,x1T) U1B=c(U1B,x1B) U1M=c(U1M,x1M) U2T=c(U2T,x2T) U2B=c(U2B,x2B) U2M=c(U2M,x2M) U3 =c(U3,x3) } B=list(U1T=U1T,...) }

and then, we run tests to see if the samples can be assumed to be uniformly distributed on the unit interval, e.g. for the very first kind first shuffle describe above, it would be

ks.test(B$U1T,punif)$p.value

More precisely, we use the following function, to estimate to proportion of scenarios where the -value exceeds 5%,

PV=function(k){ P1B=P1T=P1M=P2B=P2T=P2M=P3=NULL for(i in 1:10000){ B=dist(k,j=32) P1T=c(P1T,ks.test(B$U1T,punif)$p.value) P1M=c(P1M,ks.test(B$U1M,punif)$p.value) P2T=c(P2T,ks.test(B$U2T,punif)$p.value) P2M=c(P2M,ks.test(B$U2M,punif)$p.value) P3 =c(P3,ks.test(B$U3,punif)$p.value)} return(list( p1T=mean(P1T>.05), p1M=mean(P1M>.05), p2T=mean(P2T>.05), p2M=mean(P2M>.05), p3=mean(P3>.05)))}

If we plot the results, we have

K=1:100 MP=Vectorize(PV)(K) plot(K,MP[1,],col="red",type="b",ylim=0:1,pch=1) lines(K,MP[2,],type="b",pch=19,col="red") lines(K,MP[3,],col="blue",type="b",pch=1) lines(K,MP[4,],type="b",pch=19,col="blue") lines(K,MP[5,],type="b",pch=3,col="black")

Here, we look at the proportion of -values that exceed 5%. We can pretend that we have a uniform distribution if that proportion is close to 95%. So basically, we just have to see *when* we reached for the first time the 95% region.If we zoom in the upper part of the graph, we get

With 32 cards,

- with a top in at random, we have to shuffle about 70 or 80 cards before having a randomly sorted set of cards. Which is large, but which is quite intuitive. One can imagine that it might take a while before getting the cards at bottom much higher in the pack,
- with a randomly mixed top in at random strategy, it is faster, slightly (we do not have that problem with cards at bottom that stay at bottom), since it takes about 60 or 70 cards.
- with a tops in a random, it goes again faster, with about 35 rounds,
- with a randomly mixed tops-bottoms in at random, it takes about 10 to 15 rounds.

Those results were obtained on tests on samples of size 100. The same code ran on a server during the week-end, with samples of size 500. Note that the output is rather close,

Note that those algorithm were mentioned because they were feasible, not only from a computational point of view, but when playing with real cards, in paper. Like with kids. I can actually ask my kids to perform those shuffle techniques next time we play with cards. The good thing is that randomly mixed tops-bottoms in at random shuffle technique: kids can do it 10 times, and cards should be randomly ordered in the deck…

Now, for those willing to see more algorithms, there are the so-called Fisher-Yates (also Knuth) shuffle. But may I keep that for another post ?

12 Best Practices for Modern Data Ingestion. Download White Paper.

Published at DZone with permission of Arthur Charpentier , DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

## {{ parent.title || parent.header.title}}

## {{ parent.tldr }}

## {{ parent.linkDescription }}

{{ parent.urlSource.name }}