Bayesian multimodeling (lectures, O.Yu. Bakhteev, V.V. Strizhov) / Fall 2021

Material from MachineLearning.


Bayesian model selection and multimodeling

The lecture course addresses the main problem of machine learning: the problem of model selection. One can set a heuristic model and optimise its parameters, select a model from a class, make a teacher model transfer its knowledge to a student model, or even build an ensemble of models. Behind all these strategies there is a fundamental technique: Bayesian inference. It assumes hypotheses about the measured data set, about the model parameters, and even about the model structure, and it deduces the error function to optimise. This is called the Minimum Description Length principle. It selects simple, stable and precise models. This course joins the theory and the practical lab works of model selection and multimodeling.


Grading: active participation 2 points, two lab works 3+3 points, questions during lectures 2 points, final exam 1 point.


  1. 8.09 Intro
  2. 15.09 Distributions, expectation, likelihood
  3. 22.09 Bayesian inference
  4. 29.09 MDL, Minimum description length principle
  5. 6.10 Probabilistic metric spaces
  6. 13.10 Generative and discriminative models
  7. 20.10 Data generation, VAE, GAN
  8. 27.10 Probabilistic graphical models
  9. 3.11 Variational inference
  10. 10.11 Variational inference 2
  11. 17.11 Hyperparameter optimization
  12. 24.11 Meta-optimization
  13. 1.12 Bayesian PCA, GLM and NN
  14. 8.12 Gaussian processes

Lab works

The parameter space $\mathbb{R}^2\ni\mathbf{w}=[w_1, w_2]^{\mathsf{T}}$ is shown by the $x,y$-axes. A function of the parameters, for example, $p(\mathbf{w})$ or $\mathcal{L}(\mathbf{w})$, is shown by the $z$-axis. The variance of some functions is shown by an opaque surface over the $z$-axis.

Lab work 0

Plot the stochastic gradient descent vectors and the resulting average. Here is the link to the code.
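
A minimal numpy sketch of the experiment, assuming a linear regression with a quadratic error; the dataset, learning rate and all names below are illustrative, not prescribed by the assignment:

```python
import numpy as np

def sgd_paths(X, y, w0, lr=0.1, n_steps=50, batch=5, n_runs=20, seed=0):
    """Run several SGD trajectories on the quadratic error |Xw - y|^2
    and return all paths together with their pointwise average."""
    rng = np.random.default_rng(seed)
    paths = []
    for _ in range(n_runs):
        w = w0.copy()
        path = [w.copy()]
        for _ in range(n_steps):
            idx = rng.choice(len(X), size=batch, replace=False)
            grad = 2.0 * X[idx].T @ (X[idx] @ w - y[idx]) / batch
            w = w - lr * grad
            path.append(w.copy())
        paths.append(path)
    paths = np.array(paths)              # shape (n_runs, n_steps + 1, 2)
    return paths, paths.mean(axis=0)     # average trajectory

# illustrative synthetic regression with true parameters w* = [1, -2]
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=100)
paths, avg = sgd_paths(X, y, w0=np.zeros(2))
```

Drawing the rows of `paths` as arrow sequences in the $(w_1, w_2)$ plane, with `avg` plotted on top, gives the required picture (e.g. with `matplotlib.pyplot.quiver`).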

Lab work 1

Laplace approximation. Sample the parameter space in the neighbourhood of the optimal value $\mathbf{w}_0$ and draw the error function $S(\mathbf{w}|\mathfrak{D})$, the sampled distribution $p(\mathbf{w}|\mathfrak{D})$, and the Laplace approximation $p(\mathbf{w}|\mathbf{A})$ for the covariance matrices $\mathbf{A}=\alpha \mathbf{I}$, $\mathbf{A}=\mathrm{diag}(\boldsymbol{\alpha})$, and a general positive semidefinite $\mathbf{A}$, $\mathbf{w}^{\mathsf{T}}\mathbf{A}\mathbf{w} \geq 0$.
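
As a sanity check, the covariance of the Laplace approximation can be obtained numerically as the inverse Hessian of the error function at $\mathbf{w}_0$. A sketch on an assumed toy quadratic error, where the approximation is exact (all names and constants are illustrative):

```python
import numpy as np

def laplace_covariance(S, w0, eps=1e-3):
    """Estimate the Hessian A of the error function S at its minimum w0
    by finite differences; the Laplace approximation is N(w0, A^{-1})."""
    d = len(w0)
    A = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i = np.zeros(d); e_i[i] = eps
            e_j = np.zeros(d); e_j[j] = eps
            A[i, j] = (S(w0 + e_i + e_j) - S(w0 + e_i)
                       - S(w0 + e_j) + S(w0)) / eps**2
    return A

# illustrative quadratic error with known Hessian, minimum at w0 = [1, 2]
H = np.array([[2.0, 0.5], [0.5, 1.0]])
w0 = np.array([1.0, 2.0])
S = lambda w: 0.5 * (w - w0) @ H @ (w - w0)
A = laplace_covariance(S, w0)
cov = np.linalg.inv(A)    # covariance of the Laplace approximation
```

For a non-quadratic $S$ the same helper gives the local Gaussian approximation to be drawn against the sampled distribution.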

Lab work 2

Multistart and Laplace approximation. Find a problem and a synthetic dataset where the error function $S$ has multiple extremums (various data generation hypotheses are appreciated). Make the Laplace approximation at each extremum point. Check whether the covariance matrices are the same.
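
One possible setup, sketched with an assumed double-well error $S(\mathbf{w}) = (w_1^2-1)^2 + w_2^2$, which has two extremums at $(\pm 1, 0)$; the function and all constants are illustrative:

```python
import numpy as np

def multistart(grad, n_starts=30, lr=0.01, n_steps=2000, seed=0):
    """Gradient descent from random starts; keep the distinct minima found."""
    rng = np.random.default_rng(seed)
    minima = []
    for _ in range(n_starts):
        w = rng.uniform(-3.0, 3.0, size=2)
        for _ in range(n_steps):
            w = w - lr * grad(w)
        # cluster: record w only if it is far from every minimum seen so far
        if not any(np.linalg.norm(w - m) < 0.1 for m in minima):
            minima.append(w)
    return minima

# gradient of the assumed double-well error S(w) = (w1^2 - 1)^2 + w2^2
grad = lambda w: np.array([4.0 * w[0] * (w[0]**2 - 1.0), 2.0 * w[1]])
minima = multistart(grad)
```

In this symmetric example the Hessian of $S$ equals $\mathrm{diag}(8, 2)$ at both minima, so the two Laplace covariance matrices coincide; an asymmetric $S$ would make them differ.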

Lab work 3

The regularisation surface. Plot the error function for various types of regularisation: $\ell_2$, $\ell_1$, $\ell_2+\ell_1$, $\ell_\infty$, $\ell_{\frac{1}{2}}$. Decompose it and plot the regularisers separately.
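
The five surfaces can be tabulated on a grid before plotting; a sketch with an assumed toy quadratic error term (the grid and the error are illustrative):

```python
import numpy as np

# hypothetical 2D grid over the parameter space
w1, w2 = np.meshgrid(np.linspace(-2, 2, 101), np.linspace(-2, 2, 101))

# the five regularisers, evaluated on the grid
regularisers = {
    "l2":    w1**2 + w2**2,
    "l1":    np.abs(w1) + np.abs(w2),
    "l2+l1": w1**2 + w2**2 + np.abs(w1) + np.abs(w2),
    "linf":  np.maximum(np.abs(w1), np.abs(w2)),
    "l1/2":  np.sqrt(np.abs(w1)) + np.sqrt(np.abs(w2)),
}

# illustrative quadratic error around w* = [1, -1]; total = error + regulariser
error = (w1 - 1.0)**2 + (w2 + 1.0)**2
surfaces = {name: error + r for name, r in regularisers.items()}
```

Each entry of `surfaces` (and of `regularisers` separately, as the assignment asks) can then be drawn with a 3D surface plot over the $w_1, w_2$ grid.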

Lab work 4

Plot the hyperparameter estimation sequence over the Metropolis-Hastings sampling procedure steps. The hyperparameters are $\alpha,\beta$ or $\mathbf{A},\mathbf{B}$.
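
A generic random-walk Metropolis-Hastings sketch; the target log-posterior over $\alpha$ below is an assumed stand-in for the one derived in the lectures:

```python
import numpy as np

def metropolis_hastings(log_p, x0, n_steps=5000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings for a scalar hyperparameter."""
    rng = np.random.default_rng(seed)
    x, lp = x0, log_p(x0)
    chain = [x]
    for _ in range(n_steps):
        x_prop = x + step * rng.normal()          # symmetric proposal
        lp_prop = log_p(x_prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept with prob min(1, ratio)
            x, lp = x_prop, lp_prop
        chain.append(x)
    return np.array(chain)

# assumed stand-in for the unnormalised log-posterior of alpha: N(1.0, 0.2^2)
log_posterior = lambda a: -0.5 * ((a - 1.0) / 0.2) ** 2
chain = metropolis_hastings(log_posterior, x0=0.0)
running_mean = np.cumsum(chain) / np.arange(1, len(chain) + 1)
```

The required plot is `chain` (or `running_mean`, the estimation sequence) against the step index.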

Lab work 5

Compare the hyperparameters estimated by various procedures.

Lab work 6

The feature selection procedure with a change of the parameters' variance. Set up the feature selection algorithms Lasso and LARS. Plot the regularisation coefficient, or the number of selected parameters, against the variance of the parameters and the covariance of the selected parameter pairs.
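
scikit-learn provides `Lasso` and `lars_path` for this lab; to keep the sketch dependency-free, Lasso is solved here by ISTA (proximal gradient descent with soft thresholding), and all data and constants are illustrative:

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Lasso by ISTA: minimise |Xw - y|^2 / (2n) + lam * |w|_1."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n       # Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(n_iter):
        z = w - X.T @ (X @ w - y) / (n * L)                      # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)    # soft threshold
    return w

# illustrative data: only the first two of five features are relevant
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([2.0, -1.5, 0.0, 0.0, 0.0]) + 0.1 * rng.normal(size=200)

# number of selected (non-zero) parameters along the regularisation path
lambdas = [0.01, 0.1, 0.5, 1.0, 2.0]
n_selected = [int(np.sum(np.abs(lasso_ista(X, y, lam)) > 1e-6)) for lam in lambdas]
```

Repeating the loop while also recording the empirical variance and covariance of the selected parameters gives the plot the assignment asks for.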

Lab work 7

Set an error function with additive regularisers. Sample the $\lambda$ metaparameters around the optimum value of this function.

Lab work 8

Plot the Pareto front of complexity, stability and accuracy over the sampled structure parameters.
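
A small helper for extracting the Pareto front, assuming all criteria are written so that smaller is better (complexity, instability, error); the example points are illustrative:

```python
import numpy as np

def pareto_front(points):
    """Rows of `points` are criterion vectors (smaller is better in every
    coordinate); return the non-dominated subset, preserving order."""
    pts = np.asarray(points)
    keep = []
    for i, p in enumerate(pts):
        # p is dominated if some row is <= p everywhere and < p somewhere
        dominated = np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1))
        if not dominated:
            keep.append(i)
    return pts[keep]

# illustrative 2D criteria (e.g. complexity vs. error) for five sampled models
front = pareto_front([[1, 5], [2, 2], [5, 1], [4, 4], [3, 3]])
```

The helper works for any number of criteria, so the three-objective front of the assignment is obtained by passing rows of (complexity, instability, error) triples.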

Lab work 9

Plot the expectation and the variance of the parameters over the sampled structure parameters. Plot the error function and its variance. Compare two types of models, a simple linear model and a two-layer neural network, to show the problem of neuron interchangeability.

Lab work 10

Sample the empirical joint distribution of the parameters and structure parameters. Compare with the prior distribution of parameters.

Lab work 11

Plot the joint distribution of data and parameters for models of various structural complexity: undertrained, optimal, overtrained. (To discuss: how to plot a parameter space of higher dimension.)

Lab work 12

Plot the distance between the prior and posterior over the steps of the variational inference procedure. Plot these two distributions in the parameter space.
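
A sketch on a conjugate toy model where everything is Gaussian, so the KL divergence and the gradients of the (negative) ELBO are available in closed form; the data and step sizes are assumed for illustration:

```python
import numpy as np

def kl_gauss(m1, s1, m2, s2):
    """Closed-form KL( N(m1, s1^2) || N(m2, s2^2) )."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

# conjugate toy model: prior w ~ N(0, 1), observations y_i ~ N(w, 1);
# the exact posterior is Gaussian, so convergence can be verified
y = np.array([1.2, 0.8, 1.0, 1.1])
n = len(y)
post_m = y.sum() / (n + 1)             # posterior mean
post_s = 1.0 / np.sqrt(n + 1)          # posterior std

# variational family q = N(mu, exp(rho)^2); minimise KL(q || posterior),
# which equals the negative ELBO up to an additive constant
mu, rho, lr = -2.0, 1.0, 0.05
kl_to_prior = []
for _ in range(300):
    s = np.exp(rho)
    d_mu = (mu - post_m) / post_s**2     # dKL/dmu
    d_rho = s**2 / post_s**2 - 1.0       # dKL/drho, chain rule via s = exp(rho)
    mu, rho = mu - lr * d_mu, rho - lr * d_rho
    kl_to_prior.append(kl_gauss(mu, np.exp(rho), 0.0, 1.0))
```

Plotting `kl_to_prior` against the step index gives the required distance curve; the prior $N(0,1)$ and the final $q$ can be drawn together in the parameter space.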

Lab work 13

Plot the regularisation path of the parameters for various hypermodels.

Lab work 14

Plot the error function expectation and its variance over the sample size and over the model complexity.
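
One way to estimate the curves, assuming a linear regression with least-squares fitting; the true parameters, noise level and sample sizes are illustrative:

```python
import numpy as np

def error_stats(n_samples, n_trials=200, seed=0):
    """Expectation and variance of the test error of a least-squares
    linear model as functions of the training sample size."""
    rng = np.random.default_rng(seed)
    w_true = np.array([1.0, -2.0])
    errs = []
    for _ in range(n_trials):
        X = rng.normal(size=(n_samples, 2))
        y = X @ w_true + 0.1 * rng.normal(size=n_samples)
        w = np.linalg.lstsq(X, y, rcond=None)[0]
        X_test = rng.normal(size=(500, 2))
        y_test = X_test @ w_true + 0.1 * rng.normal(size=500)
        errs.append(np.mean((X_test @ w - y_test) ** 2))
    return np.mean(errs), np.var(errs)

sizes = [5, 10, 20, 50, 100]
means, variances = zip(*(error_stats(n) for n in sizes))
```

The same loop with the number of features varied instead of `n_samples` gives the dependence on model complexity.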

Lab work 15

Show the parameters' variance propagation over the layers of a deep network. The hypothesis: the variance should increase.
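
A sketch that propagates a batch through random ReLU layers and records the activation variance per layer; the widths and weight scales are assumed. Under He initialisation the variance stays roughly constant, while larger weights make it grow, which is the hypothesis to test:

```python
import numpy as np

def layer_variances(weight_scale, n_layers=10, width=100, n_inputs=1000, seed=0):
    """Push a batch through random ReLU layers; record activation variance."""
    rng = np.random.default_rng(seed)
    h = rng.normal(size=(n_inputs, width))
    variances = []
    for _ in range(n_layers):
        W = rng.normal(size=(width, width)) * weight_scale
        h = np.maximum(h @ W, 0.0)   # linear layer followed by ReLU
        variances.append(h.var())
    return np.array(variances)

he = layer_variances(np.sqrt(2.0 / 100))    # He scale: variance roughly constant
big = layer_variances(np.sqrt(4.0 / 100))   # larger weights: variance grows
```

Plotting both curves against the layer index shows when the hypothesis holds and when initialisation keeps the variance flat.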

Lab work 16

Compare the convergence to the MDL over various prior distributions of the structure parameters.

Lab work 17

Plot the error function and its variance for models of insufficient and excessive complexity in the sequential add-del procedure.

Lab work 18

Penalise each structure element of the model with a regulariser and its metaparameter $\lambda$. Sample the structure parameters and metaparameters. Plot the error function.

Lab work 19

Investigate the data space: plot the data distribution, the source variables, and the target variable.

Lab work 20

Plot the empirical distribution of the model parameters for various data generation hypotheses and various regions of the data space.



References

  1. Bishop
  2. Barber
  3. Murphy
  4. Rasmussen and Williams, of course!
  5. Taboga (to catch up)

Our theses

  1. Грабовой А.В. Dissertation.
  2. Бахтеев О.Ю. Selection of deep learning models of suboptimal complexity: git, abstract, presentation (PDF), video. 2020. MIPT.
  3. Адуенко А.А. Selection of multimodels in classification problems: presentation (PDF), video. 2017. MIPT.
  4. Кузьмин А.А. Construction of hierarchical topic models for collections of short texts: presentation (PDF), video. 2017. MIPT.

