Pills of combinatorics

probability
Although experimentalists are well familiar with the topic, nonetheless I believe it might be helpful to spend a few minutes for a review of basic formulae of combinatorial analysis.
Author

Angelo Maria Sabatini

Published

April 24, 2024

At least in the elementary case of finite sample spaces (i.e., a finite number of outcomes can be generated by our experiment), we often find useful to consider the intuitive notion that the outcomes are equally likely; this allows to introduce what is known as the classical definition of probability. For any set \(A\) made of some collection of outcomes from the sample space, we define the probability of \(A\) as:

\[ \text{Pr}(A)=\frac{\text{number of cases in}\;A}{\text{number of cases in the sample space}} \]

After the sample space has been defined and its size determined, we calculate probabilities of sets by counting the number of possible cases.

The art of counting is a relevant part of a field in mathematics known as combinatorics. In this post, I present the basic principle of counting and apply it to a number of situations that are often encountered in probabilistic models.

The counting principle

Consider an experiment that consists of two consecutive stages. The possible outcomes of the first stage are \(a_1,\cdots,a_n\) and, for each outcome from the first stage, \(b_1,\cdots,b_m\) outcomes are possible for the second stage. The outcomes of the two-stage experiment are thus the ordered pairs \((a_i,b_j),i=1,...,m;j=1,\cdots,m\), whose number is \(nm\). An obvious generalization is for an \(r\)-stage experiment, with \(n_i,i=1,2,\cdots,r\) outcomes each. The total number of outcomes is

\[ N=\prod_{i=1}^rn_i \]

Example 1 (Number of subsets of an \(n\)-element set.) Consider an \(n\)-element set \(\{a_1,a_2,\cdots,a_n\}\). The construction of a subset can be seen as an \(n\)-stage process, where, at the \(i\)th stage, the \(i\)th element is offered the opportunity to be a member of the subset or not (binary choice). Therefore the number of subsets is \(N=2^n\), inclusive of either the empty set (no elements are in one subset) or the sample space (all the elements are in another subset).

Consider now the situation where we have \(n\) distinct objects in a box and we want to form groups by sequentially selecting \(k\) objects from it, without repetitions being allowed (sampling without replacement) or with repetitions being allowed (sampling with replacement). In the former case, of course, we cannot select more objects than they are in the box, namely \(k\leq n\). In the latter case, the restriction \(k\leq n\) does not apply and, although unlikely, a group can even consist of \(k\) replicas of the same object.

Moreover, the selection process can differ in regard to wthether we are interested in the order of selection or not. If the order of selection matters, two groups that are made by the same objects, each one with its own number of occurrences, are considered distinct, whereas, if the order of selection does not matter, they should count as one case only in the probability calculation.

Based on these properties, four different arrangements of \(k\) out of a collection of \(n\) objects have to be considered, namely (without/with) repetition-order of selection (does/does not) matter. Each arrangement has its own name and related formula of counting, as outlined in the following.

Sampling without replacement

Order of selection matters

The number of arrangements, called permutations, can be calculated as the result of a \(k\)-stage process. At the first stage, we have \(n\) possible choices, at the second stage, the choices are reduced by one unit, i.e., \(n-1\); when we reach the \(k\)-stage, we are left with \(n-k+1\) choices. Applying the principle of counting, we have:

\[ D(n,k)=n(n-1)\cdots (n-k+1)=\frac{n!}{(n-k)!} \tag{1}\]

where the factorial of a non-negative integer \(n\), denoted by \(n!\), is the product of all positive integers less than or equal to \(n\), i.e., \(n!=1\cdot2\cdots n\). Recall that, when \(k=n\), \(D(n,n)=n!\) (\(0!=1!=1\)).

Order of selection does not matter

The number of arrangements, called combinations, can be calculated by noting that there exist \(P(k,k)=k!\) sequences of length \(k\) that differ from one another just in terms of the order of the presentation of their elements. They contribute only one arrangement to the total number of them:

\[ C(n,k)=\frac{D(n,k)}{D(k,k)}=\frac{n!}{(n-k)!k!}={n\choose k} \tag{2}\]

where the binomial coefficient \({n\choose k}\), indexed by the pair of integers \(n\geq k\geq0\), is considered.

Example 2 (Newton’s binomial theorem) For any real number \(n\) that is not a non-negative integer, the theorem states that:

\[ \sum_{k=0}^n{n\choose k}a^kb^{n-k}=(a+b)^n \]

when \(a,b\in\mathbb{R}\).

Since \({n\choose k}\) is the number of \(k\)-element sequences of a given \(n\)-element set, the sum over \(k\) of \({n\choose k}\) counts the number of subsets of all possible sizes. Using the result of Example 1:

\[ \sum_{k=0}^{n}{n\choose k}=2^n \]

This result is also a simple application of the Newton’s binomial theorem for \(a=b=1\).

Recall that a combination is a choice of \(k\) elements out of an \(n\)-element set without regard to order. This is the same as partitioning the set in two: one part contains \(k\) elements and the other contains the remaining \(n−k\). We can generalize by considering partitions in more than two subsets. We have \(n\) distinct objects and we are given non-negative integers \(n_1,n_2,\cdots,n_r\), whose sum is equal to \(n\). The \(n\) items are to be divided into \(r\) disjoint groups, with the \(i\)th group containing exactly \(n_i\) items. The total number of groups is given by the multinomial coefficient:

\[ N_r={n\choose n_1n_2\cdots n_r}=\frac{n!}{n_1!n_2!\cdots n_r!} \]

Sampling with replacement

Order of selection matters

The number of arrangements, called dispositions can be calculated as the result of a \(k\)-stage process. At the first stage, we have \(n\) possible choices, at all the other stages, the choices are still \(n\). Applying the principle of counting, we have:

\[ D^*(n,k)=n^k \tag{3}\]

Order of selection does not matter

The number of arrangements, called partitions is given by:

\[ C^*(n,k)=\frac{(n+k-1)!}{k!(n-1)!}={n+k-1\choose k} \tag{4}\]

To understand the formula of Equation 4, think of writing \(C^*(n,k)\) when, for instance, \(n=6,k=4\) as, say, \(a_1a_2a_3a_2\) or equivalently \(a_1*a_2**a_3*a_4\), where any element \(a_i,i=1,2,\cdots,n\) is followed by a number of asterisks equal to the number of its occurrences in the sequence (the total number of asterisks in the equivalent representation needs to be equal to \(k\)). Another example: \(a_2a_3a_3a_3\) and \(a_1a_2*a_3***a_4\). It is observed that a one-to-one correspondence exists between the original arrangements and all possible permutations in the alignment of elements and asterisks in the equivalent representation. Since each alignment starts with \(a_1\), we need to permute \(n+k-1\) elements, among which \(k\) (the asterisks) and \(n-1\) (the elements \(a_i,i=2,\cdots,n\)) are equal.

Formulae

In conclusion, the four different arrangements of \(k\) out of a collection of \(n\) objects can be computed as follows:

\[ \begin{split} &\quad\quad\textbf{without replacement}&\quad\quad\textbf{with replacement}\\ \\ \textbf{order matters}&\quad\quad\text{permutations}&\quad\quad\text{dispositions}\\ &\quad\quad D(n,k)=\dfrac{n!}{(n-k)!}&\quad\quad D^*(n,k)=n^k\\ \textbf{order does not matter}&\quad\quad\text{combinations}&\quad\quad\text{partitions}\\ &\quad\quad C(n,k)={n\choose k}&\quad\quad C^*(n,k)={n+k-1\choose k} \end{split} \]

Exercises

Exercise 1 Given a group of \(n\) individuals, count how many subgroups can be formed having one particular person as the leader, and a number (from 0 to \(n-1\)) of additional members.

Answer The counting principle can be applied as follows. First, we have \(n\) possible choices for the leader, and once the leader is chosen, \(2^{n-1}\) subsets can be considered, which may include from none to all the remaining individuals of the group. We have therefore:

\[ N=n2^{n-1} \]

Exercise 2 How many words that consists of four distinct letters can be formed?

Answer The sample space is composed of \(n=26\) elements. The sequences we are interested are formed by \(k=4\) elements, repetitions are not allowed. Two words are considered distinct based on the order of appearance of the four letters into them. The number of words is then given by Equation 1:

\[ N=D(n,k)=\frac{n!}{(n-k)!}=\frac{26!}{22!}=26\cdot25\cdot24\cdot23=358,800 \]

Exercise 3 How many combinations exist using two letters out of four letters?

Answer The sample is composed of \(n=4\) elements. The sequences we are interested are formed by \(k=2\) distinct elements, repetitions are not allowed. Two sequences are considered the same if they differ just by the order of the elements in them, namely \(a_1a_2\) is the same as \(a_2a_1\). The number of combinations is then given by Equation 2:

\[ N=C(n,k)={n\choose k}=\frac{4!}{2!2!}=6 \]

Exercise 4 How many distinct combinations can be formed from the letters ATALANTA?

Answer There are \(n=8\) letters which may be arranged in \(n!\) ways, but this leads to double counting. If the \(k_1=4\) “A”s are permuted, then nothing is changed, and similarly for the \(k_2=2\) “T”s. The total number of distinct combinations is then given by the multinomial coefficient:

\[ N=\frac{n!}{k_1!k_2!}=\frac{8!}{4!2!}=7\cdot6\cdot5\cdot4=840 \]