Starting with MUDIM

The second part of this text is devoted to a synoptic description of MUDIM, a system for handling multidimensional probability distributions in the form of compositional models. The MUDIM (MUlti-DImensional Models) system is written as an R package. It is based on the R.oo package, which implements methods and classes for object-oriented programming in R. It contains a set of functions to construct and manipulate discrete probability distributions. Two or more probability distributions can be composed to create a so-called compositional model, and the package contains a set of functions for working with such models.

In the following text, we speak about probability distributions and compositional models. Probability distributions are defined over random variables. Objects in R are also commonly called variables, so the two meanings must be kept apart. For example, a probability distribution \(\pi\) over finite discrete variables \(A,B\) can be stored in R as a mudim object of class Distribution. The object is stored in computer memory and can be referenced by an R variable Pi, which is nothing more than a pointer to that object, i.e. to a specific place in computer memory.

Install R

R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and macOS. To download and install R, go to the R project website and download the latest version for your operating system.

To work with our package, we strongly recommend using RStudio, a free and open-source integrated development environment for R. To install RStudio, go to the project website and download the latest version of the product.

Install MUDIM

Once R and RStudio are installed, you can proceed and install mudim. To do so, start RStudio and type the following command in the console.

                 repos = NULL)

Once the package is installed, you can load it to make all its functionality available. To load the package, type:

library(mudim)

Probability distribution

The mudim package works with discrete random variables with finitely many values. By a state of a group of variables, we understand a combination of values of the respective variables. Recall Example 1.2 about three coins, where the first two are tossed at random and the third one is laid on the table so that the number of ‘\(1\)’s is odd. The probability distribution fully describing this experiment can be defined as a table, see Table 1.1. A possible representation of this distribution in the mudim package, as printed in the console, is as follows.

##    X Y Z MUDIM.frequency
## 1: 0 0 1            0.25
## 2: 0 1 0            0.25
## 3: 1 0 0            0.25
## 4: 1 1 1            0.25

You can see that the columns of the table, except for the last one, correspond to random variables, and the rows correspond to states of these variables. In this case, the first three columns correspond to random variables \(\{X,Y,Z\}\). The last column is special: it holds the frequency or probability of each row, i.e. of each state of the variables. It is named MUDIM.frequency; this is a reserved keyword, and no random variable should be given this name. States with zero probability may be omitted.

R object

When creating a probability distribution, it is good to start with an empty distribution - i.e. a probability distribution defined for an empty set of variables. In R, even an empty distribution is an object of class Distribution.

d <- Distribution("test", info = "my first distribution")

By running the above command, you have created an empty distribution referenced by d in R. It has the name “test” and the additional information “my first distribution” for internal purposes. The parameter info of the Distribution() function is auxiliary.

Each object of class Distribution has several slots. In case of a distribution referenced by d the slots look like this:

  • name - "test"
  • info - "my first distribution"
  • data - NULL
  • variables - NULL
  • dim - 0

For the distribution coins, the slots look like this:

  • name - "3coin"
  • info - "3 coins X,Y,Z. X and Y are randomly tossed and the third one is laid on the table in the way that the number of 1 is odd"
  • data - This slot contains a 4x4 matrix (the rows corresponding to states, three columns corresponding to variables and the fourth one containing the probabilities) - accessible using command dTable(coins)
  • variables - "X" "Y" "Z" - accessible using command variables(coins)
  • dim - 3 - accessible using command dim(coins)

To read more about the internal structure of the Distribution class object, type ?Distribution in the console of your RStudio.

Probability table

Generally, the terms probability table and probability distribution are used as synonyms. A probability table is represented by a mudim object of class Distribution. To create a probability distribution over a set of random variables, you should first design its defining table. This can be done either manually or from external data or measurements. To check whether a distribution is empty, you can use the functions is.empty() or dim():

## [1] TRUE
## [1] 0

Create manually

Let us manually create a table describing the three-coin example mentioned above. We have three random variables \(X,Y,Z\). Their possible states can be described by the following table:

##   X Y Z
## 1 0 0 1
## 2 1 0 0
## 3 0 1 0
## 4 1 1 1
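A minimal base-R sketch of how such a table of states can be built, assuming the table is an ordinary data frame. The name `states` is ours and is not part of mudim.

```r
# Build the four possible states of the three-coin example as a data frame.
# Only base R is used; `states` is an illustrative name.
states <- data.frame(
  X = c(0, 1, 0, 1),
  Y = c(0, 0, 1, 1),
  Z = c(1, 0, 0, 1)
)

# In every row the number of ones is odd, as the example requires.
rowSums(states) %% 2

print(states)
```

A data frame of this shape is the kind of table that is subsequently assigned to a distribution.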

Now we have all possible outcomes of the three-coin example, as illustrated by the distribution coins. Because we want a uniform distribution over the possible outcomes, we can either add a new column named MUDIM.frequency with the respective probabilities, or let the system do it automatically. When you assign a probability table without a column named MUDIM.frequency, mudim assumes that each row of the table has the same probability and adds the frequency column with weight \(1\) for each row. Because each row is unique, the resulting distribution is uniform over the possible outcomes.

##    X Y Z MUDIM.frequency
## 1: 0 0 1               1
## 2: 1 0 0               1
## 3: 0 1 0               1
## 4: 1 1 1               1

Note that the MUDIM.frequency column now holds frequencies, not probabilities. To convert them to probabilities, call normalize(d).

We can also add the frequency column to the table ourselves. To do that, add a column called MUDIM.frequency, continue as above, and then, for example, normalize it:

##    X Y Z MUDIM.frequency
## 1: 0 0 1             0.1
## 2: 1 0 0             0.3
## 3: 0 1 0             0.4
## 4: 1 1 1             0.2
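The arithmetic behind this step can be sketched in base R. We assume here that normalization simply divides each weight by the total weight. The weights 1, 3, 4, 2 and the name `tab` are illustrative choices of ours that happen to reproduce the probabilities shown above.

```r
# Table of states with user-supplied weights in the reserved column.
# The weights are illustrative assumptions, not values taken from mudim.
tab <- data.frame(
  X = c(0, 1, 0, 1),
  Y = c(0, 0, 1, 1),
  Z = c(1, 0, 0, 1),
  MUDIM.frequency = c(1, 3, 4, 2)
)

# Normalization: divide each weight by the total so the column sums to 1.
tab$MUDIM.frequency <- tab$MUDIM.frequency / sum(tab$MUDIM.frequency)

print(tab)
```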

Use data

Another possibility is to create a probability distribution from data. The data can come from various sources. The easiest way to load data into the R environment is from a CSV (comma-separated values) file. For illustration, we have prepared a data set X defined over seven variables \(D,N,R,T,W,U,B\). Similarly, you can load a data set from an external CSV file using functions such as read.csv or read.csv2. To create a distribution over a subset of the variables, one can again use the function dTable. As mentioned above, when a new table is assigned to a distribution using dTable and the column MUDIM.frequency is missing, equal weights are assigned to all rows. If the rows are not unique, the weights of repeated rows are summed: only the unique rows are stored, and the MUDIM.frequency column gives the number of occurrences of each row in the source data.

##   D N R T W U B
## 1 1 2 1 2 2 1 1
## 2 1 2 1 1 2 1 1
## 3 2 2 1 1 2 1 1
## 4 2 2 1 2 2 2 2
## 5 2 2 2 1 2 2 2
## 6 2 2 1 1 2 1 1
##    N R T MUDIM.frequency
## 1: 2 1 2             143
## 2: 2 1 1             250
## 3: 2 2 1             175
## 4: 1 2 2             207
## 5: 1 2 1             131
## 6: 1 1 2              34
## 7: 2 2 2              50
## 8: 1 1 1              10
##    N R T MUDIM.frequency
## 1: 2 1 2           0.143
## 2: 2 1 1           0.250
## 3: 2 2 1           0.175
## 4: 1 2 2           0.207
## 5: 1 2 1           0.131
## 6: 1 1 2           0.034
## 7: 2 2 2           0.050
## 8: 1 1 1           0.010
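The counting of repeated rows described above can be imitated in base R. This is only a sketch of the idea, not of the internals of dTable: the data values and the names `obs`, `counts` and `probs` are made up for illustration.

```r
# A small illustrative data set with repeated rows (values are made up).
obs <- data.frame(
  N = c(2, 2, 2, 1, 2),
  R = c(1, 1, 2, 2, 1),
  T = c(2, 2, 1, 2, 1)
)

# Give every observation weight 1, then sum the weights of identical rows.
obs$MUDIM.frequency <- 1
counts <- aggregate(MUDIM.frequency ~ N + R + T, data = obs, FUN = sum)

# Relative frequencies: divide by the total number of observations.
probs <- counts
probs$MUDIM.frequency <- counts$MUDIM.frequency / sum(counts$MUDIM.frequency)

print(counts)
print(probs)
```

The state (N = 2, R = 1, T = 2) appears twice in `obs`, so it receives weight 2 in `counts`, while the other three states receive weight 1.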

Names of random variables

Each distribution is defined over a set of random variables. Each variable is supposed to have a unique name; if two random variables have the same name, we consider them to be the same variable. The names can be set in two ways: through the column names of the probability table passed to the dTable function, or with the function variables.

## [1] "N" "R" "T"
##    A B C MUDIM.frequency
## 1: 2 1 2           0.143
## 2: 2 1 1           0.250
## 3: 2 2 1           0.175
## 4: 1 2 2           0.207
## 5: 1 2 1           0.131
## 6: 1 1 2           0.034
## [1] "x" "y" "z"

Additional information

As mentioned above, each probability distribution in mudim is an object in computer memory. Such an object can have many R variables pointing at it. To simplify the identification of distributions, each object of class Distribution can have a name and an info parameter. To handle these parameters, use the functions name and info:

## [1] "pi"
name(Pi) <- "distribution 123"
## [1] "distribution 123"
## [1] "probability distribution over two binary variables A,B"

Manipulations with probability distributions

Marginal distribution

A probability distribution is defined over a certain set of variables. Sometimes, we are interested in a probability distribution defined over just a subset of them. As defined in Section 1.1, the probability distribution over the subset is known as the marginal probability distribution.

To compute a marginal distribution, specify the distribution and a subset of variables of interest.

PiMarginal <- marginalize(Pi, variables = variables(Pi)[1])
## [1] "A"
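What marginalization computes can be sketched in base R on the three-coin table: the probabilities of all states that agree on the retained variables are summed. The names `coins_tab`, `p`, `marg_XZ` and `marg_X` are illustrative and do not belong to mudim.

```r
# Joint distribution of the three-coin example (zero-probability states omitted).
coins_tab <- data.frame(
  X = c(0, 0, 1, 1),
  Y = c(0, 1, 0, 1),
  Z = c(1, 0, 0, 1),
  p = c(0.25, 0.25, 0.25, 0.25)
)

# Marginal over {X, Z}: sum the probabilities of states that agree on X and Z.
marg_XZ <- aggregate(p ~ X + Z, data = coins_tab, FUN = sum)

# Marginal over {X}.
marg_X <- aggregate(p ~ X, data = coins_tab, FUN = sum)

print(marg_XZ)
print(marg_X)
```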

Sometimes, you want to remove a set of variables from the distribution instead. To do that, use the parameter keep.

## [1] "B"

If you want to change the probability distribution in place, without making a copy, use the parameter new.

## [1] "A" "B"
marginalize(Pi, variables = variables(Pi)[1], new = FALSE)
## Probability distribution 
## * Name:distribution 123
## * Info:probability distribution over two binary variables A,B
## * Variables:A
## * Non-empty items:2
## [1] "A"


Let us have two probability distributions \(\pi(K)\) and \(\kappa(L)\). Then we can define their product as \(\lambda(K \cup L)\) such that \(\lambda(x) = \pi(x^{\downarrow K}) \cdot \kappa(x^{\downarrow L})\) for each \(x \in \mathbb{X}_{K\cup L}\). Note that if \(K \cap L \neq \emptyset\), the resulting object need not be a probability distribution.


The key operator of the package is the operator of composition \(\triangleright\) defined in Definition 2.1. Consider two probability distributions \(\kappa(\mathbf{K})\) and \(\lambda(\mathbf{L})\), for which all the compositions appearing in the following statements are defined. The most important properties of the operator are:

  • (Domain) \(\kappa \triangleright \lambda\) is a probability distribution for variables \(\mathbf{K} \cup \mathbf{L}\).
  • (Conditional independence): \(\mathbf{K} \setminus \mathbf{L} \perp\!\!\!\perp \mathbf{L} \setminus \mathbf{K} \mid \mathbf{K} \cap \mathbf{L}\ [\kappa \triangleright \lambda]\).
  • (Composition preserves first marginal): \((\kappa \triangleright \lambda)^{\downarrow \mathbf{K}} = \kappa\).
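The first and third properties can be checked numerically in base R. The sketch below assumes the usual form of the operator, \((\kappa \triangleright \lambda)(x) = \kappa(x^{\downarrow K}) \cdot \lambda(x^{\downarrow L}) / \lambda^{\downarrow K \cap L}(x^{\downarrow K \cap L})\). Definition 2.1 is not reproduced in this section, so treat this formula as our assumption. The numbers and all names (`kappa`, `lambda`, `composed`, ...) are illustrative and are not mudim objects.

```r
# kappa(A, B) and lambda(B, C) as plain data frames; numbers are illustrative.
kappa <- data.frame(A = c(0, 0, 1, 1), B = c(0, 1, 0, 1),
                    p = c(0.1, 0.2, 0.3, 0.4))
lambda <- data.frame(B = c(0, 0, 1, 1), C = c(0, 1, 0, 1),
                     q = c(0.25, 0.25, 0.10, 0.40))

# Marginal of lambda on the shared variable B.
lambda_B <- aggregate(q ~ B, data = lambda, FUN = sum)
names(lambda_B)[names(lambda_B) == "q"] <- "qB"

# Join on B and apply kappa * lambda / (marginal of lambda on B) row by row.
joint <- merge(merge(kappa, lambda, by = "B"), lambda_B, by = "B")
joint$value <- joint$p * joint$q / joint$qB
composed <- joint[, c("A", "B", "C", "value")]

# "First marginal" property: summing out C gives kappa back.
first_marginal <- aggregate(value ~ A + B, data = composed, FUN = sum)

print(composed)
print(first_marginal)
```

The values in `composed` sum to one over the eight states of \(A, B, C\), and `first_marginal` reproduces the probabilities in `kappa`.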

To compose two distributions, use the function compose.

## [1] "Distribution" "Object"
## [1] "A" "B" "C"
## [1] 0
## [1] 0

Note that the function compose is generic. This means that there can be several functions with the same name, and R chooses which one to call based on the context, more specifically on the class of the function arguments. In this case, if the first argument is of class ‘Distribution’, the function corresponding to the operator of composition is called and the result is of class ‘Distribution’. If the first argument is of class ‘Model’, another function is called and the result is different.

Anticipating operator

The so-called anticipating composition of two probability distributions is a generalized version of the operator of composition, for which the following property holds: If \(\kappa(\mathbf{K})\), \(\lambda(\mathbf{L})\) and \(\mu(\mathbf{M})\) are such that \(\mu \triangleright (\kappa \cirtr_{\mathbf{M}} \lambda)\) is defined, then \[(\mu \triangleright \kappa) \triangleright \lambda = \mu \triangleright (\kappa \cirtr_{\mathbf{M}} \lambda).\] For details see Section 2.2.

d <- anticipate(Pi, Kappa, M = c("A","B","C", "D"))
##    B C A MUDIM.frequency
## 1: 0 0 0            0.15
## 2: 0 0 1            0.35
## 3: 0 1 0            0.15
## 4: 0 1 1            0.35

Information-theoretic notions

The mudim package allows us to compute the most important information-theoretic characteristics of probability distributions described in Section 1.4. Recall that these characteristics are important for the construction of compositional models. For their meaning and application to model construction, see Section 1.4 and, mainly, Chapter 6.

Shannon entropy

Shannon entropy can be used to quantify the amount of uncertainty in an entire probability distribution.

entropy(Pi, base = 2)
## [1] 0.8812909

In other words, the Shannon entropy of a distribution is the expected amount of information in an event drawn from that distribution. It gives a lower bound on the number of bits (if the logarithm is base 2, otherwise the units are different) needed on average to encode symbols drawn from a given distribution.

Recall that the entropy of nearly deterministic distributions (where the outcome is almost certain) is close to zero; distributions that are close to uniform have high entropy.
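The two extreme cases can be illustrated with a few lines of base R. The helper `shannon_entropy` is our own sketch for a plain probability vector and is not the mudim function entropy.

```r
# Shannon entropy of a probability vector; terms with p = 0 contribute 0.
shannon_entropy <- function(p, base = 2) {
  p <- p[p > 0]
  -sum(p * log(p, base = base))
}

uniform4 <- rep(0.25, 4)         # uniform over four states
nearly_certain <- c(0.99, 0.01)  # outcome almost certain
certain <- c(1, 0)               # deterministic outcome

shannon_entropy(uniform4)        # 2 bits
shannon_entropy(nearly_certain)  # close to 0
shannon_entropy(certain)         # 0
```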

Kullback-Leibler divergence

Having two probability distributions \(\pi\) and \(\kappa\) over the same set of random variables, we can measure the difference between these two distributions using the Kullback-Leibler (KL) divergence:

data(Pi); data(Kappa);
variables(Kappa) <- variables(Pi)
KL.divergence(Kappa, Pi)
## [1] 1.821928
KL.divergence(Pi, Kappa)
## Warning in KL.divergence.Distribution(Pi, Kappa): absolute continuity of
## input distributions not satisfied
## [1] Inf
## Error in KL.divergence.Distribution(Pi, Kappa): Unable to compute KL
## divergence for distributions over different sets variables.

Recall that the KL divergence defined in Section 1.4 has many useful properties:

  • It is non-negative.
  • It is 0 if and only if \(\pi\) and \(\kappa\) are the same distribution in the case of discrete variables (or equal almost everywhere in the case of continuous variables).

Recall also that the compared distributions must be defined for the same set of variables, and that if a distribution \(\nu\) does not dominate \(\pi\), then \(Div(\pi \shortparallel \nu) = + \infty\). Therefore, if the user tries to compute the divergence between two distributions defined for different variables, the function stops and returns an error message. Analogously, if the distribution in the second argument does not dominate the distribution in the first argument, the function returns \(+\infty\) and shows a warning message.
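The role of dominance can be seen in a small base-R sketch for two probability vectors over the same states. The helper `kl_divergence` and the vectors `p` and `q` are illustrative; the helper mimics the behaviour described above but is not the mudim function KL.divergence.

```r
# KL divergence Div(p || q) for two probability vectors over the same states.
kl_divergence <- function(p, q, base = 2) {
  stopifnot(length(p) == length(q))
  support <- p > 0
  # If q is zero where p is positive, q does not dominate p.
  if (any(q[support] == 0)) {
    warning("absolute continuity not satisfied")
    return(Inf)
  }
  sum(p[support] * log(p[support] / q[support], base = base))
}

p <- c(0.5, 0.5, 0.0)
q <- c(0.25, 0.25, 0.5)

kl_divergence(p, q)                    # finite: q dominates p
suppressWarnings(kl_divergence(q, p))  # Inf: p does not dominate q
kl_divergence(p, p)                    # 0 for identical distributions
```

The example also shows that the divergence is not symmetric in its two arguments.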

Mutual information

Mutual information (MI) (also known as the information gain) of two disjoint sets of random variables is a measure of the mutual dependence between the two groups of variables. More specifically, it quantifies the “amount of information” obtained about one set of variables through observing the other set of variables; for more properties see Section 1.4.

The higher the value, the stronger the dependence between the two disjoint sets of variables.

MI(coins, K = "X", L = "Y")
## [1] 0
MI(coins, K = c("X","Y"), L = "Z")
## [1] 1

If the user tries to compute the mutual information between non-disjoint groups of variables, the function stops and returns an error message.
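The two values above can be reproduced by hand in base R using the identity \(MI(K;L) = H(K) + H(L) - H(K \cup L)\), where \(H\) denotes the Shannon entropy of the respective marginal. The helpers `marginal_entropy` and `mutual_information` and the table `coins_tab` are our own illustrative names, not part of mudim.

```r
# Joint distribution of the three coins; zero-probability states omitted.
coins_tab <- data.frame(
  X = c(0, 0, 1, 1),
  Y = c(0, 1, 0, 1),
  Z = c(1, 0, 0, 1),
  p = rep(0.25, 4)
)

# Entropy (in bits) of the marginal over the given variables.
marginal_entropy <- function(tab, vars) {
  m <- aggregate(tab["p"], by = tab[vars], FUN = sum)$p
  m <- m[m > 0]
  -sum(m * log2(m))
}

# MI(K; L) = H(K) + H(L) - H(K u L)
mutual_information <- function(tab, K, L) {
  stopifnot(length(intersect(K, L)) == 0)  # the groups must be disjoint
  marginal_entropy(tab, K) + marginal_entropy(tab, L) -
    marginal_entropy(tab, c(K, L))
}

mutual_information(coins_tab, "X", "Y")          # 0: X and Y are independent
mutual_information(coins_tab, c("X", "Y"), "Z")  # 1: Z is determined by X and Y
```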

Conditional mutual information

Analogously to mutual information, one can also compute conditional mutual information. More precisely, for three disjoint groups of variables and a corresponding probability distribution, one can compute the conditional mutual information (see Section 1.4). As an example, we can take Example 1.2 with three coins. In this case, of course, variables \(X\) and \(Y\) are conditionally dependent given \(Z\). Note that if one sets M = c(), the function coincides with MI().

conditionalMI(coins, K = "X", L = "Y", M = "Z")
## [1] 2

Multi-information

Multi-information, sometimes also called dependence tightness, total correlation, or informational content (IC), is the relative entropy of a distribution with respect to the product of its one-dimensional marginals. Simply put, it expresses the loss incurred when a distribution is replaced by the product of its one-dimensional marginals.

## [1] 0.005802149
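For intuition, multi-information can be computed by hand as the sum of the entropies of the one-dimensional marginals minus the entropy of the joint distribution. The base-R sketch below does this for the three-coin distribution, which is not the distribution used for the output above. The names `coins_tab`, `marginal_entropy` and `multi_information` are illustrative.

```r
# Joint distribution of the three coins; zero-probability states omitted.
coins_tab <- data.frame(
  X = c(0, 0, 1, 1),
  Y = c(0, 1, 0, 1),
  Z = c(1, 0, 0, 1),
  p = rep(0.25, 4)
)

# Entropy (in bits) of the marginal over the given variables.
marginal_entropy <- function(tab, vars) {
  m <- aggregate(tab["p"], by = tab[vars], FUN = sum)$p
  m <- m[m > 0]
  -sum(m * log2(m))
}

# IC = sum of one-dimensional marginal entropies - joint entropy.
multi_information <- function(tab, vars) {
  singles <- vapply(vars, function(v) marginal_entropy(tab, v), numeric(1))
  sum(singles) - marginal_entropy(tab, vars)
}

multi_information(coins_tab, c("X", "Y", "Z"))  # 3 * 1 - 2 = 1 bit
```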

Conditional multi-information

Analogously to multi-information, one can also compute conditional multi-information. More precisely, for a probability distribution and a group of conditioning variables, one can compute the conditional multi-information (see Section 1.4).

conditionalIC(Pi, cond = "A", base = 2)
## [1] -0.8812909

Compositional model

The main purpose of mudim is to let users comfortably handle multidimensional compositional models, i.e., multidimensional probability distributions assembled from sequences of low-dimensional distributions using the operator of composition. The result of a composition (if defined) is a new distribution, so we can repeat the process of composition iteratively to obtain a multidimensional distribution. That is why such a multidimensional distribution is called a compositional model.

For the purpose of model processing, we understand a compositional model to be a sequence of low-dimensional distributions. Assume a system of \(n\) probability distributions \(\pi_1, \pi_2, \ldots, \pi_n\) defined over sets of variables \(K_1, K_2, \ldots, K_n\), respectively. In agreement with Chapter 3, the formula \(\pi_1 \triangleright \pi_2 \triangleright \ldots \triangleright \pi_n\) is understood as \[\pi_1 \triangleright \pi_2 \triangleright \pi_3 \triangleright \ldots \triangleright \pi_n = (((\pi_1 \triangleright \pi_2) \triangleright \pi_3) \ldots \triangleright \pi_n)\]
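The left-to-right bracketing corresponds to a left fold over the generating sequence. The base-R sketch below only records the order of evaluation as a string; `compose_label` is a stand-in of ours for the operator of composition and performs no probabilistic computation.

```r
# Stand-in for the operator of composition: it only records the order of
# evaluation as a string ('>' is a label for the operator, nothing more).
compose_label <- function(left, right) paste0("(", left, " > ", right, ")")

generating_sequence <- c("pi1", "pi2", "pi3", "pi4")

# Reduce() folds from the left by default, which matches the convention above.
bracketing <- Reduce(compose_label, generating_sequence)
print(bracketing)
```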

To construct such a model, it is sufficient to determine a sequence of low-dimensional distributions \(\pi_1, \pi_2, \ldots , \pi_n\) (sometimes called a generating sequence). Note that there are situations in which the result of the composition is not defined. To be able to store a compositional model over dozens or hundreds of variables, the model is kept in computer memory in the form of its generating sequence. This, on the other hand, complicates elementary operations such as marginalization and conditioning.

R Object

To start creating your compositional model, it is good to start with creating an empty model - i.e. a compositional model whose generating sequence is empty. Doing this, you create an object of class Model.

m <- Model("test", info = "my first compositional model")
## [1] "Model"  "Object"

By doing this, you have created an empty compositional model referenced by variable m in R. It has the name “test” and the additional information “my first compositional model” for internal purposes. The parameter info of the Model() function is auxiliary.

Each object of class Model has several slots. In case of a compositional model referenced by m the slots look like this:

  • name - "test"
  • info - "my first compositional model"
  • distributions - list()
  • variables - list()
  • length - 0
  • dim - 0
  • perfect - FALSE

To read more about the internal structure of the Model class object, type ?Model in the console of your RStudio.

Insert distribution

To insert a probability distribution into the generating sequence of a compositional model, we can use functions insert or compose.

# create a compositional model whose generating 
# sequence has two distributions
insert(model = m, distribution = Pi)
insert(model = m, distribution = Kappa, position = 2)

Similarly, you can access an arbitrary distribution in a compositional model by calling the function getDistribution(), which has three parameters:

  • model: the respective compositional model
  • k: index of the required distribution in the generating sequence
  • ref: logical. If TRUE, a reference is returned, and by changing the respective probability distribution you change the generating sequence as well. Otherwise, a copy of the distribution is returned. The default value is TRUE.

getDistribution(m, k = 2)
## Probability distribution 
## * Name:Kappa
## * Info:uniform discrete probability distribution over two variables
## * Variables:B, C
## * Non-empty items:2

Model properties

Every compositional model has several properties. Some of them are related to its structure.

Basic overview

To see the basic statistics about the model, it is enough to type the name of the model, or call function as.character().

## Compositional model 
## * Name:test
## * Info:my first compositional model
## * Variables:B, C, A
## * Length:2

Name and information

For easier handling of a compositional model, you can set or change its name and the additional information about it. The usage is the same as for an object of class Distribution.

## [1] "test"
info(m) <- "different information"

Length

By the length of a model, we understand the number of elements of its generating sequence. For example, a model with generating sequence \(\pi_1, \pi_2\) has length \(2\). To find the length of a model, use the function length().

## [1] 2

Dimension

By the dimension of a model \(\pi_1, \ldots, \pi_n\), we understand the dimension of the space of the composed probability distribution \(\pi_1 \triangleright \ldots \triangleright \pi_n\). In other words, the dimension is the number of distinct random variables for which the distributions \(\pi_1, \ldots, \pi_n\) are defined.

## [1] 3

Structure

Let \(\pi_1(K_1), \pi_2(K_2), \ldots, \pi_n(K_n)\) be the generating sequence of a compositional model. Then the sequence of sets of variables \(K_1, K_2, \ldots, K_n\) is its structure.

## [[1]]
## [1] "A" "B"
## [[2]]
## [1] "B" "C"
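Length and dimension can both be read off the structure. A base-R sketch, with the structure written by hand as a plain list (`structure_list`, `model_length` and `model_dim` are illustrative names, not mudim objects):

```r
# Structure of a model with generating sequence pi1(A, B), pi2(B, C).
structure_list <- list(c("A", "B"), c("B", "C"))

# Length: number of distributions in the generating sequence.
model_length <- length(structure_list)

# Dimension: number of distinct variables over all sets of the structure.
model_dim <- length(unique(unlist(structure_list)))

model_length  # 2
model_dim     # 3
```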

Random variables

To get the set of all random variables the given compositional model is defined for, call variables() function.

## [1] "B" "C" "A"



As discussed in Section 3.1, perfectness of a compositional model is a strong property, but it is not easy to verify. Note that whether a model is perfect depends on the “numbers” defining the probability distributions, not on the structure of the compositional model. By the structure, we denote the sequence of sets of variables for which the distributions in the generating sequence are defined; the ordering of the sets coincides with the ordering of the generating sequence.

If the structure satisfies the so-called Running Intersection Property (RIP), the compositional model is called decomposable. To check this, one can use the function is.decomposable().

## [1] TRUE
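The RIP itself is easy to test on a structure. The sketch below uses the usual formulation: for every \(i \geq 2\) there must be some \(j < i\) such that \(K_i \cap (K_1 \cup \ldots \cup K_{i-1}) \subseteq K_j\). The helper `has_rip` is our own base-R illustration and says nothing about how is.decomposable() is implemented.

```r
# Check the Running Intersection Property for a list of variable sets.
has_rip <- function(sets) {
  if (length(sets) < 2) return(TRUE)
  for (i in 2:length(sets)) {
    previous <- sets[seq_len(i - 1)]
    # Variables of the i-th set that already appeared earlier.
    separator <- intersect(sets[[i]], unique(unlist(previous)))
    # The separator must be contained in at least one earlier set.
    covered <- any(vapply(previous,
                          function(s) all(separator %in% s), logical(1)))
    if (!covered) return(FALSE)
  }
  TRUE
}

has_rip(list(c("A", "B"), c("B", "C")))               # TRUE
has_rip(list(c("A", "B"), c("B", "C"), c("A", "C")))  # FALSE: {A, C} fits in no earlier set
```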

Manipulations with model

Even though a compositional model is internally represented by its generating sequence, it is a probability distribution. Therefore, one can manipulate it as a probability distribution.


The task studied in this section is the following: for a compositional model \(\pi_1 \triangleright \pi_2 \triangleright \ldots \triangleright \pi_n\), and a subset of variables \(M \subset K_1 \cup K_2 \cup \ldots \cup K_n\) find a compositional model \(\kappa_1 \triangleright \kappa_2 \triangleright \ldots \triangleright \kappa_m\) such that \[(\pi_1 \triangleright \pi_2 \triangleright \ldots \triangleright \pi_n)^{\downarrow M} = \kappa_1 \triangleright \kappa_2 \triangleright \ldots \triangleright \kappa_m\]

To do that with mudim package, use function marginalize with the respective compositional model as its first parameter. The function has five parameters:

  • x: compositional model
  • variables: vector of variables to be either removed or kept in the compositional model
  • keep: logical variable. If TRUE the resulting compositional model is defined over variables. If FALSE, variables are removed from the compositional model. The default value is TRUE.
  • perfect: logical variable. If TRUE, the marginalization algorithm expects a perfect compositional model on the input and some special techniques speeding up the marginalization process can be used. The default value is FALSE.
  • new: logical variable. If TRUE, a compositional model referenced by x is left unchanged and a new compositional model is created and returned by the function. If FALSE, the compositional model referenced by x is changed. The function marginalize does not return anything in that case. The default value is TRUE.

## [1] "D" "N" "R" "T" "W" "U" "B"
## [1] "D" "N" "R" "T" "B"

Note that the original compositional model loaded from the package using the command data(m) remains unchanged. To see that, let us print the vector of random variables the compositional model is defined for.

## [1] "D" "N" "R" "T" "W" "U" "B"

To change the original model referenced by m, set the parameter new to FALSE.

marginalize(m, variables = c("W","U"), keep = FALSE, new = FALSE)
## [1] "D" "N" "R" "T" "B"


Not all compositional models are equally efficient when used for the representation of multidimensional distributions. Among them, so-called perfect models hold an important position. Recall from Section 3.1 that their importance arises from the fact that in a perfect compositional model, each probability distribution from the generating sequence is a marginal of the model. In other words, a perfect compositional model perfectly reflects all the local information stored in the probability distributions of its generating sequence.

If a compositional model is not perfect, one can easily convert it into a perfect one by replacing each member of its generating sequence with the respective marginal, as shown in Theorem 3.6. To do this in mudim, use the function perfect.

mPerfect <- perfect(m, new = TRUE)


We can calculate a conditional compositional model only in the case of decomposable models. The reason is that the conditioning variable has to appear among the variables of the first distribution in the model (in its generating sequence). For a decomposable model, it is guaranteed that the generating sequence can always be reordered so that a given variable appears among the arguments of the first distribution in the sequence.

mDecomposable <- toDecomposable(m)
conditioning(mDecomposable, variable = "T", value = 2)

Note that the original model has been changed. If you want to keep the original model, you have to use the function copy first.

m2 <- copy(mDecomposable)


The importance of decomposable models lies in the fact that most computational procedures can be carried out efficiently using so-called local computations. This term usually denotes a computational process realized as a sequence of steps, each of which performs computations with only one of the distributions from which the multidimensional model is composed.

To convert a compositional model into its decomposable version, use the function toDecomposable:

## [1] TRUE

Note that for some special operations (such as conditioning), it is necessary to reorder a decomposable model (its generating sequence) so that a specific variable appears among the variables of the first distribution in the generating sequence. To do that, use the function reorderRIP:

## [[1]]
## [1] "T" "R" "N"
## [[2]]
## [1] "B" "R"
## [[3]]
## [1] "N" "D"
reorderRIP(mDecomposable, root = "D")
# after reordering
## [[1]]
## [1] "N" "D"
## [[2]]
## [1] "T" "R" "N"
## [[3]]
## [1] "B" "R"

Convert to distribution

A compositional model is kept in the form of a generating sequence, e.g. \(\pi_1, \ldots, \pi_n\). If you want to apply all the operators of composition and create the multidimensional probability distribution \(\pi_1 \triangleright \ldots \triangleright \pi_n\), call the function toDistribution().

## Probability distribution 
## * Name:composition
## * Info:
## * Variables:N, T, R, B, D
## * Non-empty items:48


For other functionality of the package, it is useful to know the following:

Save and load

To save and load probability distributions as defined in the MUDIM package, use the R functions save() and load() with the parameter file to specify the location of the stored object.

## [1] FALSE
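The general save()/load() round trip looks as follows in base R. The sketch uses an ordinary data frame and a temporary file instead of a mudim object; `tab` and `path` are illustrative names.

```r
# An ordinary data frame stands in for the object to be stored.
tab <- data.frame(X = c(0, 1), MUDIM.frequency = c(0.5, 0.5))

# Save to a temporary file; `file` specifies the location of the stored object.
path <- tempfile(fileext = ".RData")
save(tab, file = path)

# Remove the object and restore it from the file under its original name.
rm(tab)
load(file = path)
exists("tab")

unlink(path)  # clean up the temporary file
```

Note that load() restores objects under the names they had when they were saved, so it may overwrite existing variables of the same name.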


The mudim package is based on the R.oo package, which implements methods and classes for object-oriented programming in R. When the constructor function is called, as in d <- Distribution("name"), an object of class "Distribution" is created and a pointer to that object is stored in variable d. Consequently, if one tries to make a copy of distribution d using the command d.copy <- d, only the pointer is copied, and d.copy still points to the same location in memory as d. Modifying d.copy, say by changing the variable names, therefore also changes d. To avoid this, one has to copy explicitly: d.copy <- copy(d).

## [1] "A" "B"
## [1] "C" "D"
## [1] "C" "D"
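The same reference behaviour can be observed with plain R environments, which we use here as a stand-in for reference objects. That R.oo objects behave analogously is our assumption about their internals; the names `d`, `alias` and `d_copy` are illustrative.

```r
# An environment stands in for a reference object.
d <- new.env()
d$variables <- c("A", "B")

# Plain assignment copies only the reference, not the object.
alias <- d
alias$variables <- c("C", "D")
d$variables  # the original changed as well

# An explicit copy creates an independent object.
d_copy <- list2env(as.list(d), envir = new.env())
d_copy$variables <- c("E", "F")
d$variables  # unchanged by the modification of the copy
```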