Bayesian multimodeling (lectures, O.Yu. Bakhteev, V.V. Strijov)/Fall 2021

From MachineLearning.

Revision as of 07:18, 8 September 2021

Bayesian model selection and multimodeling

The lecture course addresses the central problem of machine learning: model selection. One can set a heuristic model and optimise its parameters, select a model from a class, build a teacher model and transfer its knowledge to a student model, or make an ensemble of models. Behind all these strategies there is a fundamental technique: Bayesian inference. It states hypotheses about the measured data set, about the model parameters and even about the model structure, and it derives the error function to optimise. This is called the Minimum Description Length principle. It selects simple, stable and precise models. The course combines theory with practical lab work on model selection and multimodeling.

Grading

Labs: 6 in total
Forms: 1 in total
Reports: 2 in total

The maximum score is 11, so the final score is MIN(10, score)

Syllabus

8.09 Intro
15.09 Distributions, expectation, likelihood
22.09 Bayesian inference
29.09 MDL, Minimum description length principle
6.10 Probabilistic metric spaces
13.10 Generative and discriminative models
20.10 Data generation, VAE, GAN
27.10 Probabilistic graphical models
3.11 Variational inference
10.11 Variational inference 2
17.11 Hyperparameter optimization
24.11 Meta-optimization
1.12 Bayesian PCA, GLM and NN
8.12 Gaussian processes

Lab works

The parameter space $\mathbb{R}^2\ni\mathbf{w}=[w_1, w_2]^\mathsf{T}$ is shown on the $x$ and $y$ axes. A function of the parameters, for example $p(\mathbf{w})$ or~$\mathcal{L}(\mathbf{w})$, is shown on the $z$ axis. The variance of some functions is shown by an opaque surface over the $z$ axis.

Lab work 0

Plot the stochastic gradient descent vectors and their resulting average. Here is the link to the code.
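
A minimal sketch of the quantities this lab asks to plot, assuming a synthetic two-parameter linear regression with a squared error (the data, `true_w`, the learning rate `lr` and all function names are illustrative assumptions, not part of the course code). It collects one gradient vector per sample, their average, and the sequence of SGD steps as arrows in the parameter plane.

```python
import random

random.seed(0)

# Hypothetical synthetic data: y = 2*x1 - 1*x2 + noise.
true_w = (2.0, -1.0)
data = []
for _ in range(200):
    x = (random.gauss(0, 1), random.gauss(0, 1))
    y = true_w[0] * x[0] + true_w[1] * x[1] + random.gauss(0, 0.1)
    data.append((x, y))

def sample_gradient(w, x, y):
    """Gradient of the squared error 0.5 * (w.x - y)^2 for a single sample."""
    r = w[0] * x[0] + w[1] * x[1] - y
    return (r * x[0], r * x[1])

w = (0.0, 0.0)
# One stochastic gradient vector per sample, all evaluated at the same point w.
grads = [sample_gradient(w, x, y) for x, y in data]
# Their average is the full-batch gradient of the mean error at w.
mean_grad = (sum(g[0] for g in grads) / len(grads),
             sum(g[1] for g in grads) / len(grads))

# Plain SGD run; every step is stored as (arrow origin, arrow direction).
lr = 0.05
steps = []
for x, y in data:
    g = sample_gradient(w, x, y)
    step = (-lr * g[0], -lr * g[1])
    steps.append((w, step))
    w = (w[0] + step[0], w[1] + step[1])
```

The pairs in `steps` can be passed to any arrow-plotting routine; `mean_grad` gives the direction that the individual noisy vectors scatter around.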

Lab work 1

Laplace approximation. Sample the parameter space in the neighbourhood of the optimal value~$\mathbf{w}_0$ and draw the error function $S(\mathbf{w}|\mathfrak{D})$, the sampled distribution $p(\mathbf{w}|\mathfrak{D})$, and the Laplace approximation $p(\mathbf{w}|\mathbf{A})$ for three forms of the covariance: $\mathbf{A}=\alpha \mathbf{I}$, $\mathbf{A}=\text{diag}(\boldsymbol{\alpha})$, and a positive semidefinite $\mathbf{A}$, that is, $\mathbf{w}^\mathsf{T}\mathbf{A}\mathbf{w} \geq 0$.
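
One possible way to obtain the three covariance forms is sketched below. It assumes a hypothetical error function `S` with a known minimum and takes the covariance as the inverse of a finite-difference Hessian at that minimum; the diagonal and scalar variants (inverse diagonal curvature, inverse mean curvature) are one simple choice among several, not a prescribed method.

```python
import math

def S(w):
    """Hypothetical error function with its minimum at w0 = (1, -0.5)."""
    a, b = w[0] - 1.0, w[1] + 0.5
    return a * a + 2.0 * b * b + 0.5 * a * b + 0.1 * a ** 4

def hessian(f, w, h=1e-4):
    """Central finite-difference Hessian of f at a 2-D point w."""
    H = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            def shifted(di, dj):
                v = list(w)
                v[i] += di
                v[j] += dj
                return f(v)
            H[i][j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return H

w0 = (1.0, -0.5)
H = hessian(S, w0)
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]

# Full covariance: the inverse Hessian of S at w0.
A_full = [[H[1][1] / det, -H[0][1] / det],
          [-H[1][0] / det, H[0][0] / det]]
# Diagonal variant (assumption): keep only the diagonal curvature.
A_diag = [[1.0 / H[0][0], 0.0], [0.0, 1.0 / H[1][1]]]
# Scalar variant alpha * I (assumption): one variance from the mean curvature.
alpha = 2.0 / (H[0][0] + H[1][1])
A_scalar = [[alpha, 0.0], [0.0, alpha]]

def log_gauss(w, mean, cov):
    """Log-density of a 2-D Gaussian N(mean, cov)."""
    d0, d1 = w[0] - mean[0], w[1] - mean[1]
    det_c = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    quad = (cov[1][1] * d0 * d0 - (cov[0][1] + cov[1][0]) * d0 * d1
            + cov[0][0] * d1 * d1) / det_c
    return -0.5 * quad - math.log(2 * math.pi) - 0.5 * math.log(det_c)
```

Evaluating `-S(w)` and `log_gauss(w, w0, A)` on the same grid around `w0` gives the surfaces to compare for each choice of `A`.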

Lab work 2

Multistart and Laplace approximation. Find a problem and a synthetic dataset for which the error function $S$ has multiple extrema (various data generation hypotheses are appreciated). Make the Laplace approximation at each extremum point. Check whether the covariance matrices are the same.

Lab work 3

The regularisation surface. Plot the error function for various types of regularisation: $\ell_2, \ell_1, \ell_2+\ell_1, \ell_\infty, \ell_{\frac{1}{2}}$. Decompose it and plot the regularisers separately.
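
The listed regularisers could be written down as follows. This is a sketch under assumptions: the $\ell_2$ term is taken squared, the $\ell_{1/2}$ term is the quasi-norm $(\sum_i |w_i|^{1/2})^2$, the mixing weight `rho`, the coefficient `lam` and the quadratic `data_term` are all hypothetical placeholders.

```python
def l2(w):
    return sum(v * v for v in w)

def l1(w):
    return sum(abs(v) for v in w)

def l2_plus_l1(w, rho=0.5):
    """Mixture of the two penalties; rho is a hypothetical mixing weight."""
    return rho * l1(w) + (1.0 - rho) * l2(w)

def linf(w):
    return max(abs(v) for v in w)

def l_half(w):
    """The l_{1/2} quasi-norm (sum of square roots, squared)."""
    return sum(abs(v) ** 0.5 for v in w) ** 2

def data_term(w):
    """Hypothetical data-fit part of the error function."""
    return (w[0] - 1.0) ** 2 + (w[1] - 0.5) ** 2

regularisers = {"l2": l2, "l1": l1, "l2+l1": l2_plus_l1,
                "linf": linf, "l1/2": l_half}
lam = 0.5
grid = [i / 10 for i in range(-20, 21)]

# Total error surface per regulariser: rows index w_2, columns index w_1.
surfaces = {
    name: [[data_term((x, y)) + lam * reg((x, y)) for x in grid] for y in grid]
    for name, reg in regularisers.items()
}
# The decomposition: the same grid evaluated for the penalty alone.
penalties = {
    name: [[lam * reg((x, y)) for x in grid] for y in grid]
    for name, reg in regularisers.items()
}
```

Each entry of `surfaces` and `penalties` is a 41 by 41 nested list that can be handed to a surface or contour plot.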

Lab work 4

Plot the hyperparameter estimation sequence over the steps of the Metropolis-Hastings sampling procedure. The hyperparameters are $\alpha,\beta$ or $\mathbf{A},\mathbf{B}$.
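
A minimal random-walk Metropolis-Hastings sketch that records the running estimate at every step. The target density here is a stand-in assumption (independent Gamma densities over $\alpha$ and $\beta$); in the lab it would be replaced by the actual posterior or evidence of the hyperparameters. The proposal width and the number of steps are also illustrative.

```python
import math
import random

random.seed(1)

def log_target(alpha, beta):
    """Hypothetical unnormalised log-density over the two hyperparameters:
    Gamma(shape=3, rate=2) for alpha times Gamma(shape=2, rate=1) for beta."""
    if alpha <= 0 or beta <= 0:
        return -math.inf
    return (2.0 * math.log(alpha) - 2.0 * alpha) + (math.log(beta) - beta)

state = (1.0, 1.0)
log_p = log_target(*state)
trace, running_mean = [], []
sum_a = sum_b = 0.0
accepted = 0
n_steps = 20000

for t in range(1, n_steps + 1):
    # Symmetric Gaussian random-walk proposal.
    proposal = (state[0] + random.gauss(0, 0.5),
                state[1] + random.gauss(0, 0.5))
    log_q = log_target(*proposal)
    # Accept with probability min(1, p(proposal) / p(state)).
    if random.random() < math.exp(min(0.0, log_q - log_p)):
        state, log_p = proposal, log_q
        accepted += 1
    trace.append(state)
    sum_a += state[0]
    sum_b += state[1]
    # Running estimate of the hyperparameters after t steps.
    running_mean.append((sum_a / t, sum_b / t))
```

Plotting `running_mean` against the step index gives the estimation sequence; `accepted / n_steps` is a quick check that the proposal width is reasonable.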

Lab work 5

Compare the hyperparameters estimated by various procedures.

Lab work 6

The feature selection procedure with a change of the parameters' variance. Set up the feature selection algorithms Lasso and LARS. Plot the regularisation coefficient or the number of parameters versus the variance of the parameters and the covariance of selected pairs of parameters.

Lab work 7

Set an error function with additive regularisers. Sample the metaparameters $\lambda$ around the optimum value of this function.

Lab work 8

Plot the Pareto front of the complexity, stability and accuracy over the sampled structure parameters.
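
Extracting the Pareto front from a set of sampled models can be sketched as below, assuming each model is summarised by three criteria that are all minimised (complexity, instability, error). The tuples in `models` are made-up illustrative values, not measured results.

```python
def dominates(a, b):
    """True if a is no worse than b in every criterion and strictly
    better in at least one (all criteria are minimised)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (complexity, instability, error) triples for sampled structures.
models = [
    (2, 0.30, 0.20),
    (5, 0.10, 0.15),
    (5, 0.20, 0.25),
    (9, 0.05, 0.14),
    (9, 0.40, 0.30),
]
front = pareto_front(models)
```

The quadratic scan is adequate for a few thousand sampled structures; accuracy or stability scores that are maximised can be negated before the call.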

Lab work 9

Plot the expectation and the variance of the parameters over the sampled structure parameters. Plot the error function and its variance. Compare two types of models, a simple linear model and a two-layer neural network (2NN), to show the problem of neuron interchangeability.

Lab work 10

Sample the empirical joint distribution of the parameters and structure parameters. Compare it with the prior distribution of the parameters.

Lab work 11

Plot the joint distribution of data and parameters for models of various structural complexity: undertrained, optimal, overtrained. (To discuss: how to plot a parameter space of higher dimension.)

Lab work 12

Plot the distance between the prior and posterior over the steps of the variational inference procedure. Plot these two distributions in the parameter space.
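
As a sketch of the first plot, the example below runs gradient ascent on the evidence lower bound for a deliberately simple model and records the Kullback-Leibler divergence from the variational posterior to the prior at every step. The model (one parameter $w$ with prior $\mathcal{N}(0,1)$, observations $y_i\sim\mathcal{N}(w,1)$), the data `ys`, the step size and the use of KL as the "distance" are all assumptions made for illustration.

```python
import math

ys = [1.2, 0.8, 1.1, 0.9]   # hypothetical observations
n = len(ys)

def kl_to_prior(m, s):
    """KL( N(m, s^2) || N(0, 1) ) in closed form."""
    return 0.5 * (m * m + s * s - 1.0 - math.log(s * s))

# Variational posterior q(w) = N(m, s^2), initialised at the prior.
m, s = 0.0, 1.0
lr = 0.05
kl_trace = [kl_to_prior(m, s)]

for _ in range(200):
    # Analytic gradients of the ELBO:
    # sum_i E_q[log N(y_i | w, 1)] - KL(q || prior).
    grad_m = sum(y - m for y in ys) - m
    grad_s = -n * s - s + 1.0 / s
    m += lr * grad_m
    s += lr * grad_s
    kl_trace.append(kl_to_prior(m, s))

# Exact posterior of this conjugate model, for comparison.
post_mean = sum(ys) / (n + 1)
post_var = 1.0 / (n + 1)
```

Because the model is conjugate, the variational parameters converge to the exact posterior, which makes the sketch easy to check before moving to a model without a closed form.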

Lab work 13

Plot the regularisation path of the parameters for various hypermodels.

Lab work 14

Plot the error function expectation and its variance over sample size, over complexity.
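
For the sample-size axis, one way to estimate both quantities is repeated resampling, sketched here for a one-parameter least-squares model on synthetic data. `TRUE_W`, `NOISE_STD`, the sample sizes and the number of repetitions are hypothetical, and the closed-form expected error is available only because the data-generating model is known.

```python
import random
import statistics

random.seed(2)

# Hypothetical data-generating model: y = TRUE_W * x + Gaussian noise.
TRUE_W, NOISE_STD = 1.5, 0.5

def fit_and_score(n):
    """Fit 1-D least squares on n samples and return the expected
    squared error of the fitted model on new data."""
    xs = [random.uniform(-1, 1) for _ in range(n)]
    ys = [TRUE_W * x + random.gauss(0, NOISE_STD) for x in xs]
    w_hat = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    # For x ~ U(-1, 1) we have E[x^2] = 1/3, so the error is closed-form.
    return NOISE_STD ** 2 + (w_hat - TRUE_W) ** 2 / 3.0

sample_sizes = [5, 10, 20, 50, 100, 200]
mean_error, var_error = {}, {}
for n in sample_sizes:
    errors = [fit_and_score(n) for _ in range(200)]
    mean_error[n] = statistics.mean(errors)
    var_error[n] = statistics.variance(errors)
```

Plotting `mean_error` with a band derived from `var_error` against `sample_sizes` gives the requested curve; the complexity axis can be handled the same way by looping over model structures instead of `n`.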

Lab work 15

Show the parameters’ variance propagation over the layers of a deep network. The hypothesis is that the variance should increase.

Lab work 16

Compare the convergence to the MDL over various prior distributions of the structure parameters.

Lab work 17

Plot the error function and its variance for models of insufficient and excessive complexity in the sequential add-del procedure.

Lab work 18

Penalise each structure element of the model with a regulariser and its metaparameter $\lambda$. Sample the structure parameters and metaparameters. Plot the error function.

Lab work 19

Investigate the data space, plot the data distribution, the source and the target variable.

Lab work 20

Plot the empirical distribution of the model parameters for various data generation hypotheses and various regions of the data space.

References

Books

Bishop
Barber
Murphy
Rasmussen and Williams, of course!
Taboga (to catch up)

Theses

Grabovoy A.V. Dissertation.
Bakhteev O.Yu. Deep learning model selection of suboptimal complexity: git, extended abstract, presentation (PDF), video. 2020. MIPT.
Aduenko A.A. Selection of multimodels in classification problems: presentation (PDF), video. 2017. MIPT.
Kuzmin A.A. Construction of hierarchical topic models of short text collections: presentation (PDF), video. 2017. MIPT.

Papers

Kuznetsov M.P., Tokmakova A.A., Strijov V.V. Analytic and stochastic methods of structure parameter estimation // Informatica, 2016, 27(3) : 607-624, PDF.
Bakhteev O.Y., Strijov V.V. Deep learning model selection of suboptimal complexity // Automation and Remote Control, 2018, 79(8) : 1474–1488, PDF.
Bakhteev O.Y., Strijov V.V. Comprehensive analysis of gradient-based hyperparameter optimization algorithms // Annals of Operations Research, 2020 : 1-15, PDF.

Source: «http://www.machinelearning.ru/wiki/index.php?title=%D0%91%D0%B0%D0%B9%D0%B5%D1%81%D0%BE%D0%B2%D1%81%D0%BA%D0%BE%D0%B5_%D0%BC%D1%83%D0%BB%D1%8C%D1%82%D0%B8%D0%BC%D0%BE%D0%B4%D0%B5%D0%BB%D0%B8%D1%80%D0%BE%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5_%28%D0%BB%D0%B5%D0%BA%D1%86%D0%B8%D0%B8%2C_%D0%9E.%D0%AE._%D0%91%D0%B0%D1%85%D1%82%D0%B5%D0%B5%D0%B2%2C_%D0%92.%D0%92._%D0%A1%D1%82%D1%80%D0%B8%D0%B6%D0%BE%D0%B2%29/%D0%9E%D1%81%D0%B5%D0%BD%D1%8C_2021»
