Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Combinatorics Beyond the Basics

DZone's Guide to

Combinatorics Beyond the Basics

Stats, probability, and combinatorics come in handy for devs and engineers more than you might think. Let's take a deeper-than-average (but quick) look at combinatorics.

· Big Data Zone ·
Free Resource

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

Most basic combinatorial problems can be solved in terms of multiplication, permutations, and combinations. The next step beyond the basics, in my experience, is counting selections with replacement. Often when I run into a problem that is not quite transparent, it boils down to this.

Examples of Selection With Replacement

Here are three problems that reduce to counting selections with replacement.

Looking Ahead in an Experiment

For example, suppose you're running an experiment in which you randomize to n different treatments and you want to know how many ways the next k subjects can be assigned to treatments. So if you had treatments A, B, and C, and five subjects, you could assign all five to A, or four to A and one to B, etc. for a total of 21 possibilities. You're choosing 5 things from a set of 3, with replacement because you can (and will) assign the same treatment more than once.

Partial Derivatives

For another example, if you're taking the kth order partial derivatives of a function of n variables, you're choosing k things (variables to differentiate with respect to) from a set of n (the variables). Equality of mixed partials for smooth functions says that all that matters is how many times you differentiate with respect to each variable. The order doesn't matter: differentiating with respect to x and then with respect to y gives the same result as taking the derivatives in the opposite order, as long as the function you're differentiating has enough derivatives. I wrote about this example here.

Sums Over Even Terms

I recently had an expression come up that required summing over n distinct variables, each with a non-negative even value, and summing to 2 k. How many ways can that be done? As many as dividing all the variable values in half and asking that they sum to k. Here the thing being chosen, the variable, and since the indexes sum to k, I have a total of k choices to make, with replacement since I can choose a variable more than once. So again I'm choosing k things with replacement from a set of size n.

Formula

I wrote up a set of notes on sampling with replacement that I won't duplicate here, but in a nutshell:

The symbol on the left is Richard Stanley's suggested notation for the number of ways to select k things with replacement from a set of size n. It turns out that this equals the expression on the right side. The derivation isn't too long, but it's not completely obvious either. See the aforementioned notes for details.

I started by saying basic combinatorial problems can be reduced to multiplication, permutations, and combinations. Sampling with replacement can be reduced to a combination, the right side of the equation above, but with non-obvious arguments. Hence the need to introduce a new symbol, the one on the right, that maps directly to problem statements.

Enumerating Possibilities

Sometimes you just need to count how many ways one can select k things with replacement from a set of size n. But often you need to actually enumerate the possibilities, say to loop over them in software.

In conducting a clinical trial, you may want to enumerate all the ways pending data could turn out. If you're going to act the same way, no matter how the pending data work out, there's no need to wait for the missing data before proceeding. This is something I did many times when I worked at MD Anderson Cancer Center.

When you evaluate a multivariate Taylor series, you need to carry out a sum that has a term for corresponding to each partial derivative. You could naively sum over all possible derivatives, not taking into account equality of mixed partials, but that could be very inefficient, as described here.

The example above summing over even partitions of an even number also comes out of a problem with multivariate Taylor series, one in which odd terms drop out by symmetry.

To find algorithms for enumerating selections with replacement, see Knuth's TAOCP, volume 4, Fascicle 3.

Related Posts

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.

Topics:
combinatorics ,probability ,big data ,statistical analysis

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}