Statistical model: the essence of the method, construction and analysis

Anonim

A statistical model is a mathematical model that embodies a set of assumptions about how some sample data are generated. It represents the data-generating process, often in a considerably idealized form.

The assumptions embodied in a statistical model describe a set of probability distributions, some of which are assumed to adequately approximate the distribution from which a particular data set is sampled. The probability distributions inherent in statistical models are what distinguish them from other, non-statistical mathematical models.

Mathematical models in general

A mathematical model is a description of a system using mathematical concepts and language. Such models are used in the natural sciences (such as physics, biology, earth science, chemistry) and engineering disciplines (such as computer science, electrical engineering), as well as in the social sciences (such as economics, psychology, sociology, political science).

A model can help to explain a system, to study the influence of its various components, and to make predictions about its behavior.

Mathematical models can take many forms, including dynamical systems, statistical models, differential equations, and game-theoretic models. These and other types can overlap, and a given model may involve a variety of abstract structures. In general, mathematical models may also include logical models. In many cases, the quality of a scientific field depends on how well the mathematical models developed on the theoretical side agree with the results of repeatable experiments. Lack of agreement between theoretical models and experimental measurements often leads to important advances as better theories are developed.

In the physical sciences, a traditional mathematical model contains most of the following elements:

  • Governing equations.
  • Supplementary sub-models.
  • Defining equations.
  • Constitutive equations.
  • Assumptions and constraints.
  • Initial and boundary conditions.
  • Classical constraints and kinematic equations.

Formula

A statistical model is usually specified by mathematical equations that relate one or more random variables and, possibly, other non-random variables. As such, a statistical model is "a formal representation of a theory."

All statistical hypothesis tests and all statistical estimators are derived from statistical models.

Introduction

Informally, a statistical model can be viewed as an assumption (or set of assumptions) with a specific property: it allows one to calculate the probability of any event. As an example, consider a pair of ordinary six-sided dice. We will examine two different statistical assumptions about the dice.

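The first assumption is:

For each of the dice, the probability of each face (1, 2, 3, 4, 5, and 6) coming up is 1/6.

From this assumption, we can calculate the probability that both dice come up 5: 1/6 × 1/6 = 1/36.

More generally, we can calculate the probability of any event, for example (1 and 2) or (3 and 3) or (5 and 6).

The alternative assumption is:

For each of the dice, the probability of the face 5 coming up is 1/8 (because the dice are weighted).

From this assumption, we can again calculate the probability that both dice come up 5: 1/8 × 1/8 = 1/64. However, we cannot calculate the probability of any other non-trivial event, because the probabilities of the other faces are unknown.

Only the first assumption constitutes a statistical model: with that assumption alone, the probability of every event can be determined. With the alternative assumption alone, it cannot.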
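The first assumption can be turned into a small calculation. The sketch below is only an illustration: it assumes the two dice are independent (which the product 1/6 × 1/6 already implies), and the names `FACE_PROB` and `event_probability` are hypothetical helpers, not part of any library.

```python
from fractions import Fraction
from itertools import product

# First assumption: every face of each die has probability 1/6.
FACE_PROB = {face: Fraction(1, 6) for face in range(1, 7)}


def event_probability(event):
    """Probability that a roll of two independent dice satisfies `event`.

    `event` is any predicate on a pair (a, b) of faces.
    """
    return sum(
        FACE_PROB[a] * FACE_PROB[b]
        for a, b in product(FACE_PROB, repeat=2)
        if event(a, b)
    )


# Both dice show 5.
both_five = event_probability(lambda a, b: a == 5 and b == 5)
# The two faces add up to 7.
total_seven = event_probability(lambda a, b: a + b == 7)

print(both_five)    # 1/36
print(total_seven)  # 1/6
```

Because every face has a known probability, any predicate on the pair of faces can be passed in. An assumption that fixes the probability of only one face would leave `FACE_PROB` incomplete, and the same function could no longer be evaluated for most events.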

In the example above, with the first assumption, calculating the probability of an event is easy. In other examples, the calculation may be difficult or even impractical (for example, it may require many years of computation). For an assumption to constitute a statistical model, such difficulty is acceptable: the calculation does not need to be practicable, only theoretically possible.

Formal definition

In mathematical terms, a statistical model is usually considered to be a pair (S, P), where S is the set of possible observations, i.e. the sample space, and P is a set of probability distributions on S.

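The intuition behind this definition is as follows. It is assumed that there is a "true" probability distribution induced by the process that generates the observed data. P is chosen as a set of distributions that contains a distribution adequately approximating the true one.

The set P is almost always parameterized:

$P = \{F_\theta : \theta \in \Theta\}$

The set $\Theta$ defines the parameters of the model. A parameterization is generally required to be such that distinct parameter values give rise to distinct distributions, i.e.

$F_{\theta_1} = F_{\theta_2} \Rightarrow \theta_1 = \theta_2$

must hold (in other words, the mapping from parameters to distributions must be injective). A parameterization that meets this requirement is said to be identifiable.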
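To make identifiability concrete, the sketch below contrasts two hypothetical parameterizations of Gaussian densities. In the first, the parameter is (mean, standard deviation). In the second, chosen purely for illustration, the mean is written as a sum a + b, so different parameter values can produce the same distribution. Comparing densities on a finite grid is a numerical check, not a proof.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a Gaussian distribution with the given mean and sd."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))


def density_identifiable(x, theta):
    """Parameter theta = (mean, sd): distinct values give distinct densities."""
    mean, sd = theta
    return normal_pdf(x, mean, sd)


def density_redundant(x, theta):
    """Parameter theta = (a, b) with mean a + b and sd fixed at 1 (illustrative)."""
    a, b = theta
    return normal_pdf(x, a + b, 1.0)


# Evaluation points used for the numerical comparison.
GRID = [i / 10 for i in range(-50, 51)]


def same_distribution(density, theta1, theta2):
    """True if the two parameter values give matching densities on GRID."""
    return all(math.isclose(density(x, theta1), density(x, theta2)) for x in GRID)


# (0, 1) and (1, 0) differ as parameters but give the same mean a + b = 1.
redundant_clash = same_distribution(density_redundant, (0.0, 1.0), (1.0, 0.0))
# Different means give different densities in the (mean, sd) parameterization.
identifiable_clash = same_distribution(density_identifiable, (0.0, 1.0), (1.0, 1.0))

print(redundant_clash, identifiable_clash)  # True False
```

The second parameterization is not injective, so it is not identifiable: the data can never distinguish (0, 1) from (1, 0).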

Example

Suppose we have a population of children of different ages. A child's height is stochastically related to age: for example, knowing that a child is 7 years old changes the probability that the child has a given height.

This approach can be formalized as a linear regression model, for example:

$\text{height}_i = b_0 + b_1 \text{age}_i + \varepsilon_i$

where $b_0$ is the intercept, $b_1$ is the parameter by which age is multiplied to obtain a prediction of height, $\varepsilon_i$ is the error term, and i identifies the child. That is, the model assumes that height is predicted by age, with some error.

An admissible model must be consistent with all the data points. Thus a straight line ($\text{height}_i = b_0 + b_1 \text{age}_i$) cannot be the equation for a model of the data unless it fits every data point exactly, i.e. all the data points lie perfectly on the line. The error term $\varepsilon_i$ must be included in the equation so that the model is consistent with all the data points.

To make a statistical inference, we first need to assume some probability distribution for the $\varepsilon_i$. For example, we can assume that the $\varepsilon_i$ are independent and identically distributed Gaussian variables with zero mean. In this case, the model has 3 parameters: $b_0$, $b_1$ and the variance of the Gaussian distribution.
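As a rough sketch of this three-parameter model, the code below simulates heights from hypothetical parameter values (intercept 75, slope 6, standard deviation 5, with ages between 5 and 12) and then recovers the three parameters by ordinary least squares. All numbers and function names are assumptions made for illustration, not values taken from real data.

```python
import random
import statistics


def simulate_heights(ages, b0, b1, sigma, rng):
    """Draw height_i = b0 + b1 * age_i + eps_i with eps_i ~ N(0, sigma^2)."""
    return [b0 + b1 * age + rng.gauss(0.0, sigma) for age in ages]


def fit_line(ages, heights):
    """Least-squares estimates of b0 and b1, plus the residual variance."""
    mean_age = statistics.fmean(ages)
    mean_height = statistics.fmean(heights)
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (h - mean_height) for a, h in zip(ages, heights))
    b1 = sxy / sxx
    b0 = mean_height - b1 * mean_age
    residuals = [h - (b0 + b1 * a) for a, h in zip(ages, heights)]
    # Divide by n - 2 because two coefficients were estimated from the data.
    variance = sum(r ** 2 for r in residuals) / (len(ages) - 2)
    return b0, b1, variance


rng = random.Random(0)  # fixed seed so the sketch is reproducible
ages = [rng.uniform(5, 12) for _ in range(2000)]
heights = simulate_heights(ages, b0=75.0, b1=6.0, sigma=5.0, rng=rng)

b0_hat, b1_hat, var_hat = fit_line(ages, heights)
print(b0_hat, b1_hat, var_hat)
```

With a sample of this size the estimates should land close to the values used in the simulation (75, 6 and 25 for the variance), which is what it means for the three parameters to determine the distribution of the data.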

The model can be specified formally in the form (S, P). Here S is the set of all possible (age, height) pairs, and each value of $\theta = (b_0, b_1, \sigma^2)$ determines one distribution in P.

In this example, the model is determined by specifying S and by making some assumptions about P. There are two assumptions:

  • Height can be approximated by a linear function of age.
  • The errors in the approximation follow a Gaussian distribution.

General remarks

A statistical model is a special class of mathematical model. What distinguishes it from other mathematical models is that a statistical model is non-deterministic. Thus, in a statistical model specified by mathematical equations, some of the variables do not have specific values but instead have probability distributions. That is, some of the variables are stochastic. In the example above, ε is a stochastic variable. Without it, the model would be deterministic.

Statistical models are often used even when the physical process being modeled is considered deterministic. For example, tossing a coin is, in principle, a deterministic process. However, in most cases it is still modeled as stochastic (through a Bernoulli process).

According to Konishi and Kitagawa, there are three goals for a statistical model:

  • Predictions.
  • Extraction of information.
  • Description of stochastic structures.

Dimension of a model

Suppose we have a statistical model (S, P) whose set of distributions is indexed by a parameter set $\Theta$, so that $P = \{F_\theta : \theta \in \Theta\}$. The model is called parametric if $\Theta$ has finite dimension. In notation, we write

$\Theta \subseteq \mathbb{R}^k$

where k is a positive integer ($\mathbb{R}$ denotes the real numbers). Here k is called the dimension of the model.

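As an example, we can assume that the data arise from a univariate Gaussian distribution:

$P = \left\{ F_{\mu,\sigma}(x) \equiv \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) : \mu \in \mathbb{R},\ \sigma > 0 \right\}$

In this example, the dimension k is 2.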
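The two dimensions of the Gaussian model correspond to two numbers that have to be estimated from data. The following sketch, under the assumption of a hypothetical sample drawn with mean 10 and standard deviation 2, computes the maximum-likelihood estimates of both and the log-likelihood they achieve. The function names are illustrative.

```python
import math
import random


def gaussian_log_likelihood(data, mu, sigma):
    """Log-likelihood of the data under the Gaussian with parameters (mu, sigma)."""
    n = len(data)
    squared = sum((x - mu) ** 2 for x in data)
    return -n * math.log(sigma * math.sqrt(2 * math.pi)) - squared / (2 * sigma ** 2)


def gaussian_mle(data):
    """Maximum-likelihood estimates (mu, sigma) for the Gaussian family."""
    n = len(data)
    mu = sum(data) / n
    # The MLE of the variance divides by n, not n - 1.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return mu, sigma


rng = random.Random(1)  # fixed seed so the sketch is reproducible
data = [rng.gauss(10.0, 2.0) for _ in range(500)]

mu_hat, sigma_hat = gaussian_mle(data)
best = gaussian_log_likelihood(data, mu_hat, sigma_hat)
print(mu_hat, sigma_hat, best)
```

Any other pair (mu, sigma), including the values used to generate the sample, gives a log-likelihood no larger than `best`.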

As another example, the data can be assumed to consist of (x, y) points that are distributed according to a straight line with Gaussian residuals (with zero mean). The dimension of this statistical model is 3: the intercept of the line, its slope and the variance of the distribution of the residuals. Note that in geometry a straight line has dimension 1.

Although formally $\theta \in \Theta$ is a single parameter that has dimension k, it is sometimes regarded as comprising k separate parameters. For example, with the univariate Gaussian distribution, $\theta$ is formally a single parameter of dimension 2, but it is often regarded as comprising two separate parameters: the mean and the standard deviation.

A statistical model is non-parametric if the parameter set $\Theta$ is infinite-dimensional. It is semi-parametric if it has both finite-dimensional and infinite-dimensional parameters. Formally, if k is the dimension of $\Theta$ and n is the number of samples, both semi-parametric and non-parametric models have

$k \to \infty$ as $n \to \infty$.

If, in addition, $k/n \to 0$ as $n \to \infty$, then the model is semi-parametric. Otherwise, the model is non-parametric.

Parametric models are by far the most commonly used statistical models. Regarding semi-parametric and non-parametric models, Sir David Cox stated:

"These typically involve fewer assumptions of structure and distributional form but usually contain strong assumptions about independencies."

Nested models

Nested models should not be confused with multilevel models.

Two statistical models are nested if the first can be transformed into the second by imposing constraints on the parameters of the first. For example, the set of all Gaussian distributions has, nested within it, the set of zero-mean Gaussian distributions: constraining the mean in the set of all Gaussian distributions gives the zero-mean distributions. As a second example, the quadratic model

$y = b_0 + b_1 x + b_2 x^2 + \varepsilon,\quad \varepsilon \sim N(0, \sigma^2)$

has, nested within it, the linear model

$y = b_0 + b_1 x + \varepsilon,\quad \varepsilon \sim N(0, \sigma^2)$

obtained by constraining the parameter $b_2$ to equal 0.

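In both of these examples, the first model has a higher dimension than the second model: in the first example, the set of all Gaussian distributions has dimension 2, while the zero-mean set has dimension 1. This is often, but not always, the case. As an example where both models have the same dimension, the set of Gaussian distributions with positive mean is nested within the set of all Gaussian distributions, and both have dimension 2.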
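One practical consequence of nesting is that the constrained model can never fit a given data set better than the model it is nested in. The sketch below illustrates this for the Gaussian example, using a hypothetical sample with mean 0.4. The function name and the sample are assumptions made for illustration.

```python
import math
import random


def max_log_likelihood(data, mean_is_zero):
    """Maximized Gaussian log-likelihood, optionally constraining the mean to 0."""
    n = len(data)
    mu = 0.0 if mean_is_zero else sum(data) / n
    # Maximum-likelihood estimate of the variance for the chosen mean.
    var = sum((x - mu) ** 2 for x in data) / n
    # Plugging the estimates into the log-likelihood gives this closed form.
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


rng = random.Random(3)  # fixed seed so the sketch is reproducible
data = [rng.gauss(0.4, 1.0) for _ in range(300)]

ll_full = max_log_likelihood(data, mean_is_zero=False)      # dimension 2
ll_zero_mean = max_log_likelihood(data, mean_is_zero=True)  # dimension 1
print(ll_full, ll_zero_mean)
```

The value for the full model is always at least as large as the value for the zero-mean model, because every zero-mean Gaussian is also a member of the full family.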

Comparison of models

It is assumed that there is a "true" probability distribution underlying the observed data, induced by the process that generated them.

Models can be compared with one another in either an exploratory or a confirmatory analysis. In an exploratory analysis, several models are formulated and each is assessed for how well it describes the data. In a confirmatory analysis, a model formulated in advance is tested against the data. Common criteria for comparing models include R², the Bayes factor and relative likelihood.
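As an illustration of such a comparison, the sketch below fits two nested models to hypothetical (x, y) data: a constant-mean model and a straight-line model, both with Gaussian errors. It then computes R² for the line and a likelihood-ratio statistic, which is one likelihood-based way of comparing the two and is used here as an assumption of the example. The data, parameter values and function names are all made up for illustration.

```python
import math
import random


def fit_rss(xs, ys, with_slope):
    """Residual sum of squares of the least-squares fit, with or without a slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    if with_slope:
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        b1 = sxy / sxx
    else:
        b1 = 0.0  # constant-mean model: the slope is constrained to 0
    b0 = mean_y - b1 * mean_x
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def max_log_likelihood(rss, n):
    """Maximized Gaussian log-likelihood for a fit with the given RSS."""
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1.0)


rng = random.Random(2)  # fixed seed so the sketch is reproducible
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [1.0 + 0.5 * x + rng.gauss(0.0, 1.0) for x in xs]
n = len(xs)

rss_const = fit_rss(xs, ys, with_slope=False)
rss_line = fit_rss(xs, ys, with_slope=True)

# Share of the variation in y explained by the straight line.
r_squared = 1.0 - rss_line / rss_const
# Twice the gain in maximized log-likelihood from adding the slope.
lr_statistic = 2.0 * (max_log_likelihood(rss_line, n) - max_log_likelihood(rss_const, n))
print(r_squared, lr_statistic)
```

For these two models the criteria are tied together: the likelihood-ratio statistic equals -n log(1 - R²), so a larger R² always means stronger likelihood-based evidence for the line.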

Konishi and Kitagawa state:

“The majority of the problems in statistical inference can be considered to be problems related to statistical modeling. They are typically formulated as comparisons of several statistical models.”

Furthermore, Sir David Cox said: "How the translation from subject-matter problem to statistical model is done is often the most critical part of an analysis."
