Bayesian networks: definition, examples and how they work


A Bayesian network (also known as a belief network, a decision network, a Bayes(ian) model, or a probabilistic directed acyclic graphical model) is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG).

For example, a Bayesian network can represent the probabilistic relationships between diseases and symptoms. Given the symptoms, the network can be used to compute the probabilities of the presence of various diseases.


Efficiency

Efficient algorithms can perform inference and learning in Bayesian networks. Bayesian networks that model sequences of variables (such as speech signals or protein sequences) are called dynamic Bayesian networks. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams.

Essence

Formally, Bayesian networks are DAGs whose nodes represent variables in the Bayesian sense: they may be observable quantities, latent variables, unknown parameters, or hypotheses.

Bayesian network example

Two events can cause grass to get wet: an active sprinkler or rain. Rain has a direct effect on the use of the sprinkler (namely, that when it rains, the sprinkler is usually inactive). This situation can be modeled using a Bayesian network.

The typical formula: the joint distribution of a Bayesian network factorizes as a product of local conditional distributions,

P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{parents}(X_i)).

For the sprinkler example, with G = grass wet, S = sprinkler, R = rain: P(G, S, R) = P(G \mid S, R) \, P(S \mid R) \, P(R).
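
To make this concrete, here is a minimal sketch of the sprinkler network in plain Python. All probability values below are illustrative assumptions, not values taken from the article.

```python
# P(Rain)
P_rain = {True: 0.2, False: 0.8}

# P(Sprinkler | Rain): when it rains, the sprinkler is usually off.
P_sprinkler = {True: {True: 0.01, False: 0.99},
               False: {True: 0.4, False: 0.6}}

# P(GrassWet = true | Sprinkler, Rain)
P_wet = {(True, True): 0.99, (True, False): 0.9,
         (False, True): 0.8, (False, False): 0.0}

def joint(wet, sprinkler, rain):
    """P(G, S, R) = P(G | S, R) * P(S | R) * P(R)."""
    p_g = P_wet[(sprinkler, rain)] if wet else 1 - P_wet[(sprinkler, rain)]
    return p_g * P_sprinkler[rain][sprinkler] * P_rain[rain]

print(joint(True, False, True))  # P(wet, sprinkler off, rain) = 0.1584
```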

Probabilistic inference

Because a Bayesian network is a complete model of its variables and their relationships, it can be used to answer probabilistic queries about them. For example, it can be used to update knowledge about the state of a subset of variables when other variables (the evidence variables) are observed. This process is called probabilistic inference.

The posterior gives a universal sufficient statistic for detection applications when choosing values for a subset of variables. Inference can thus be viewed as a mechanism for automatically applying Bayes' theorem to complex problems.
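
As a sketch of that mechanism, the posterior P(Rain = true | GrassWet = true) for the sprinkler network can be computed by brute-force enumeration, reusing the joint() function and CPT dictionaries defined above:

```python
from itertools import product

# Sum the joint over the unobserved Sprinkler, then normalize:
# Bayes' theorem applied mechanically.
def posterior_rain_given_wet():
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(True, s, r) for s, r in product((True, False), repeat=2))
    return num / den

print(posterior_rain_given_wet())  # ~0.358 with the illustrative CPTs above
```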


Inference methods

The most common exact inference methods are: variable elimination, which eliminates (by integration or summation) the non-observed, non-query variables one by one by distributing the sum over the product; clique tree propagation, which caches the computation so that many variables can be queried at once and new evidence can be propagated quickly; and recursive conditioning and AND/OR search, which allow a space-time tradeoff and match the efficiency of variable elimination when enough space is used.
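
As an illustration, here is a compact sketch of the two factor operations at the heart of variable elimination, for boolean variables. The factor representation (a tuple of variable names plus a table keyed by value assignments) is an illustrative choice, and the example reuses the CPT dictionaries defined earlier.

```python
from itertools import product

def multiply(f1, f2):
    """Pointwise product of two factors (vars, table)."""
    v1, t1 = f1
    v2, t2 = f2
    vs = v1 + tuple(v for v in v2 if v not in v1)
    table = {}
    for assign in product((True, False), repeat=len(vs)):
        env = dict(zip(vs, assign))
        table[assign] = (t1[tuple(env[v] for v in v1)]
                         * t2[tuple(env[v] for v in v2)])
    return vs, table

def sum_out(var, f):
    """Eliminate `var` from a factor by summation."""
    vs, t = f
    keep = tuple(v for v in vs if v != var)
    table = {}
    for assign, p in t.items():
        env = dict(zip(vs, assign))
        key = tuple(env[v] for v in keep)
        table[key] = table.get(key, 0.0) + p
    return keep, table

# Eliminate Sprinkler from P(S | R) * P(G | S, R), leaving P(G | R).
f_s = (('S', 'R'), {(s, r): P_sprinkler[r][s]
                    for s, r in product((True, False), repeat=2)})
f_g = (('G', 'S', 'R'),
       {(g, s, r): P_wet[(s, r)] if g else 1 - P_wet[(s, r)]
        for g, s, r in product((True, False), repeat=3)})
print(sum_out('S', multiply(f_s, f_g)))
```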

All of these methods have complexity that is exponential in the network's treewidth. The most common approximate inference algorithms are mini-bucket elimination, loopy belief propagation, generalized belief propagation, and variational methods.
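
Sampling methods are yet another family of approximate algorithms, not listed above but simple to sketch. Here is rejection sampling for the sprinkler network, again reusing the CPT dictionaries defined earlier: sample each node in topological order and keep only the samples consistent with the evidence.

```python
import random

def sample_posterior_rain(n=100_000):
    kept = rain_true = 0
    for _ in range(n):
        r = random.random() < P_rain[True]           # sample Rain
        s = random.random() < P_sprinkler[r][True]   # sample Sprinkler | Rain
        g = random.random() < P_wet[(s, r)]          # sample GrassWet | S, R
        if g:                                        # evidence: grass is wet
            kept += 1
            rain_true += r
    return rain_true / kept

print(sample_posterior_rain())  # approaches the exact value of ~0.358
```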


Specifying the network

To fully specify the Bayesian network, and thus fully represent the joint probability distribution, it is necessary to specify for each node X the probability distribution of X conditional on the parents of X.

The distribution of X conditional on its parents can have any form. It is common to work with discrete or Gaussian distributions, since these simplify the calculations. Sometimes only constraints on the distribution are known; one can then use the principle of maximum entropy to determine the single distribution that has the highest entropy given the constraints.
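
As a small illustration of the maximum-entropy principle, the sketch below finds the highest-entropy distribution on the made-up support {0, 1, 2} subject to a fixed mean. It relies on the standard result that the maximizer has the Gibbs form p_i ∝ exp(λ·i), which reduces the problem to a one-dimensional bisection for λ.

```python
import math

def max_entropy_dist(m, lo=-50.0, hi=50.0):
    """Highest-entropy distribution on {0, 1, 2} with mean m (0 < m < 2)."""
    def mean(lam):
        w = [math.exp(lam * i) for i in (0, 1, 2)]
        return sum(i * wi for i, wi in zip((0, 1, 2), w)) / sum(w)
    for _ in range(100):  # bisection: mean(lam) is increasing in lam
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if mean(mid) < m else (lo, mid)
    w = [math.exp((lo + hi) / 2 * i) for i in (0, 1, 2)]
    return [wi / sum(w) for wi in w]

print(max_entropy_dist(0.5))  # skewed toward 0, but as flat as the mean allows
```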

Similarly, in the specific context of a dynamic Bayesian network, the conditional distribution for the temporal evolution of the latent state is usually chosen to maximize the entropy rate of the implied random process.


Direct maximization of the likelihood (or of the posterior probability) is often complicated by the presence of unobserved variables. This is especially true for Bayesian decision networks.

Classic approach

The classical approach to this problem is the expectation-maximization (EM) algorithm, which alternates between computing the expected values of the unobserved variables conditional on the observed data, and maximizing the complete likelihood (or posterior) under the assumption that the previously computed expected values are correct. Under mild regularity conditions, this process converges to maximum likelihood (or maximum a posteriori) values of the parameters.
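
Here is a minimal EM sketch for the smallest interesting case: a two-node network Z → X with a hidden binary Z and an observed binary X, parameterized by pi = P(Z=1) and q[z] = P(X=1 | Z=z). The data and starting values are made up for illustration.

```python
data = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]   # observations of X
pi, q = 0.6, [0.3, 0.8]                  # initial parameter guesses

for step in range(50):
    # E-step: responsibility P(Z=1 | x) for each observation,
    # treating the current parameters as correct.
    resp = []
    for x in data:
        p1 = pi * (q[1] if x else 1 - q[1])
        p0 = (1 - pi) * (q[0] if x else 1 - q[0])
        resp.append(p1 / (p1 + p0))
    # M-step: re-estimate the parameters from the expected counts.
    pi = sum(resp) / len(data)
    q[1] = sum(r * x for r, x in zip(resp, data)) / sum(resp)
    q[0] = sum((1 - r) * x for r, x in zip(resp, data)) / sum(1 - r for r in resp)

print(pi, q)  # converged estimates (a local optimum; EM is not global)
```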

A more fully Bayesian approach to the parameters is to treat them as additional unobserved variables, compute the full posterior distribution over all nodes given the observed data, and then integrate the parameters out. This approach can be expensive and lead to large models, making classical parameter-setting approaches more tractable.

In the simplest case, a Bayesian network is specified by an expert and then used to perform inference. In other applications, the task of defining the network is too complex for a human; in this case, the network structure and the parameters of the local distributions must be learned from data.


Alternative method

An alternative method of structural learning uses optimization-based search. This requires a scoring function and a search strategy. A common scoring function is the posterior probability of the structure given the training data, such as BIC or BDeu.

The time required for an exhaustive search that returns a structure maximizing the score is superexponential in the number of variables. A local search strategy makes incremental changes aimed at improving the score of the structure, as sketched below. Friedman and colleagues considered using mutual information between variables to find the desired structure; they restrict the set of candidate parents to k nodes and search it exhaustively.
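
The sketch below illustrates score-based local search for binary variables: greedy single-edge additions scored by BIC, with a cycle check to keep the graph a DAG. The data encoding (a list of dicts of 0/1 values) and all names are illustrative assumptions.

```python
import math
from itertools import product

def bic(var, parents, data):
    """BIC contribution of one node given a candidate parent set."""
    counts = {}
    for row in data:
        key = tuple(row[p] for p in parents)
        c = counts.setdefault(key, [0, 0])
        c[row[var]] += 1
    ll = 0.0
    for c0, c1 in counts.values():
        n = c0 + c1
        for c in (c0, c1):
            if c:
                ll += c * math.log(c / n)
    n_params = 2 ** len(parents)  # one Bernoulli per parent configuration
    return ll - 0.5 * n_params * math.log(len(data))

def creates_cycle(parents, u, v):
    """Would adding the edge u -> v close a directed cycle?"""
    stack, seen = [u], set()
    while stack:
        node = stack.pop()
        if node == v:
            return True   # v is already an ancestor of u
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def hill_climb(variables, data):
    parents = {v: [] for v in variables}
    score = {v: bic(v, parents[v], data) for v in variables}
    while True:
        best = None
        for u, v in product(variables, repeat=2):
            if u == v or u in parents[v] or creates_cycle(parents, u, v):
                continue
            gain = bic(v, parents[v] + [u], data) - score[v]
            if gain > 1e-9 and (best is None or gain > best[0]):
                best = (gain, u, v)
        if best is None:
            return parents
        gain, u, v = best
        parents[v].append(u)
        score[v] += gain

# Hypothetical usage with made-up data:
rows = [{'R': 1, 'S': 0, 'G': 1}, {'R': 0, 'S': 1, 'G': 1},
        {'R': 0, 'S': 0, 'G': 0}, {'R': 1, 'S': 0, 'G': 1}]
print(hill_climb(['R', 'S', 'G'], rows))
```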

A particularly fast method for exact BN learning is to cast the problem as an optimization problem and solve it using integer programming. Acyclicity constraints are added to the integer program (IP) during solving in the form of cutting planes. Such methods can handle problems with up to 100 variables.


Problem Solving

To solve problems with thousands of variables, a different approach is needed. One is to first sample an ordering and then find the optimal BN structure with respect to that ordering. This means working in the search space of possible orderings, which is convenient because it is smaller than the space of network structures. Several orderings are then sampled and evaluated, as sketched below. This method has proved to be among the best available in the literature when the number of variables is huge.
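
A sketch of the ordering-based idea (in the spirit of the K2 algorithm): fix an ordering, then greedily pick each node's parents from its predecessors, so the result is acyclic by construction. It reuses the bic() scorer and the made-up rows from the previous sketch; in the full method, several sampled orderings are scored and the best network across orderings is kept.

```python
def learn_given_order(order, data, max_parents=3):
    parents = {v: [] for v in order}
    for i, v in enumerate(order):
        candidates = order[:i]           # only predecessors may be parents
        current = bic(v, [], data)
        while len(parents[v]) < max_parents:
            gains = [(bic(v, parents[v] + [c], data), c)
                     for c in candidates if c not in parents[v]]
            if not gains:
                break
            best_score, best_c = max(gains)
            if best_score <= current:
                break                    # no single addition improves the score
            parents[v].append(best_c)
            current = best_score
    return parents

print(learn_given_order(['R', 'S', 'G'], rows))
```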

Another method is to focus on a subclass of decomposable models, for which the maximum likelihood estimates have a closed form. It is then possible to discover a consistent structure for hundreds of variables.

Learning Bayesian networks with bounded treewidth is necessary for exact, tractable inference, since the worst-case complexity of inference is exponential in the treewidth k (under the exponential time hypothesis). Yet, as a global property of the graph, bounded treewidth considerably increases the difficulty of the learning process. In this context, K-trees can be used for effective learning.


Development

Developing a Bayesian belief network often begins with creating a DAG G such that X satisfies the local Markov property with respect to G. Sometimes this is a causal DAG. The conditional probability distributions of each variable given its parents in G are then estimated. In many cases, in particular when the variables are discrete, if the joint distribution of X is the product of these conditional distributions, then X is a Bayesian network with respect to G.

The Markov blanket of a node is the set of nodes consisting of its parents, its children, and its children's other parents. The Markov blanket renders the node independent of the rest of the network, and knowing it is sufficient to calculate the node's distribution. X is a Bayesian network with respect to G if every node is conditionally independent of all other nodes given its Markov blanket.
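
Reading the blanket off the graph is straightforward. The sketch below computes it from the parent lists of a DAG; the dictionary encoding is an illustrative assumption.

```python
def markov_blanket(node, parents):
    """Parents, children, and the children's other parents of `node`."""
    children = [c for c, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for c in children:
        blanket |= set(parents[c])
    blanket.discard(node)
    return blanket

# The sprinkler DAG: Rain -> Sprinkler, Rain -> GrassWet, Sprinkler -> GrassWet.
dag = {'Rain': [], 'Sprinkler': ['Rain'], 'GrassWet': ['Sprinkler', 'Rain']}
print(markov_blanket('Rain', dag))  # {'Sprinkler', 'GrassWet'}
```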
