# Bayesian multimodeling (lectures, O.Y. Bakhteev, V.V. Strijov)/Fall 2021

Course page: https://github.com/Intelligent-Systems-Phystech/BMM-21

## Bayesian model selection and multimodeling

This lecture course addresses a central problem of machine learning: model selection. One can set a heuristic model and optimise its parameters, select a model from a class, build a teacher model and transfer its knowledge to a student model, or even build an ensemble of models. Behind all these strategies there is one fundamental technique: Bayesian inference. It states hypotheses about the measured data set, the model parameters and even the model structure, and from them deduces the error function to optimise. This is called the Minimum Description Length principle. It selects simple, stable and precise models. The course combines theory with practical lab work on model selection and multimodeling.
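
As a rough illustration of how Bayesian inference can rank candidate models, the sketch below compares polynomial regressors of different degree by their log evidence (marginal likelihood). Everything in it is an assumption made for illustration and is not taken from the course materials: the synthetic data, the Gaussian prior precision `alpha`, the noise precision `beta`, and the helper names `poly_features` and `log_evidence`.

```python
import numpy as np


def poly_features(x, degree):
    """Design matrix with columns 1, x, ..., x**degree (hypothetical helper)."""
    return np.vander(x, degree + 1, increasing=True)


def log_evidence(Phi, y, alpha, beta):
    """Log marginal likelihood of a linear-Gaussian model.

    Assumes the prior w ~ N(0, I / alpha) and the noise eps ~ N(0, 1 / beta),
    so that marginally y ~ N(0, C) with C = I / beta + Phi Phi^T / alpha.
    """
    n = len(y)
    C = np.eye(n) / beta + Phi @ Phi.T / alpha
    _, logdet = np.linalg.slogdet(C)   # log-determinant of the covariance
    quad = y @ np.linalg.solve(C, y)   # data-fit term y^T C^{-1} y
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)


# Synthetic data from a quadratic model with small Gaussian noise (assumed).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=x.size)

# Score each candidate structure (polynomial degree) by its log evidence.
scores = {
    d: log_evidence(poly_features(x, d), y, alpha=1.0, beta=100.0)
    for d in range(7)
}
best_degree = max(scores, key=scores.get)

for d, s in scores.items():
    print(f"degree {d}: log evidence = {s:.2f}")
print("selected degree:", best_degree)
```

Under these assumptions the evidence penalises both underfitting (a poor data-fit term) and unnecessary complexity (a larger log-determinant), so one would expect it to peak near the degree that generated the data.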

## Grading

• Labs: 6 in total
• Forms: 1 in total
• Reports: 2 in total

The maximum attainable score is 11, so the final score is capped at 10: final score = MIN(10, score).
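
The capping rule can be written as a one-line function. This is a minimal sketch: the function name `final_score` is hypothetical, and the number of points awarded per lab, form or report is not stated on this page, so the function takes the accumulated raw score as input.

```python
def final_score(raw_score: float, cap: float = 10.0) -> float:
    """Return the course grade: the accumulated score, capped at `cap`."""
    return min(cap, raw_score)


print(final_score(11))   # a raw score above the cap is reduced to 10.0
print(final_score(7.5))  # a raw score below the cap is returned unchanged
```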

## Syllabus

1. 8.09 Intro
2. 15.09 Distributions, expectation, likelihood
3. 22.09 Bayesian inference
4. 29.09 MDL, Minimum description length principle
5. 6.10 Probabilistic metric spaces
6. 13.10 Generative and discriminative models
7. 20.10 Data generation, VAE, GAN
8. 27.10 Probabilistic graphical models
9. 3.11 Variational inference
10. 10.11 Variational inference 2
11. 17.11 Hyperparameter optimization
12. 24.11 Meta-optimization
13. 1.12 Bayesian PCA, GLM and NN
14. 8.12 Gaussian processes

## References

### Books

1. Bishop
2. Barber
3. Murphy
4. Rasmussen and Williams, of course!
5. Taboga (to catch up)

### Theses

1. Grabovoy A.V. Dissertation.
2. Bakhteev O.Y. Deep learning model selection of suboptimal complexity, git, abstract, presentation (PDF), video. 2020. MIPT.
3. Aduenko A.A. Multimodel selection in classification problems, presentation (PDF), video. 2017. MIPT.
4. Kuzmin A.A. Construction of hierarchical topic models for collections of short texts, presentation (PDF), video. 2017. MIPT.

### Papers

1. Kuznetsov M.P., Tokmakova A.A., Strijov V.V. Analytic and stochastic methods of structure parameter estimation // Informatica, 2016, 27(3) : 607-624, PDF.
2. Bakhteev O.Y., Strijov V.V. Deep learning model selection of suboptimal complexity // Automation and Remote Control, 2018, 79(8) : 1474–1488, PDF.
3. Bakhteev O.Y., Strijov V.V. Comprehensive analysis of gradient-based hyperparameter optimization algorithms // Annals of Operations Research, 2020 : 1-15, PDF.