Machine Learning and Data Analysis (Strijov practice)/Group 074, Fall 2013

From MachineLearning.

Line 2: Line 2:
__NOTOC__
__NOTOC__
-
Projects with a completed Web part are located at http://mvr.jmlda.org
+
The completed projects are located at http://mvr.jmlda.org
-
== Problems ==
+
== Problems ==
{|class="wikitable"
{|class="wikitable"
|-
|-
-
! Author
+
! Author
-
! Problem name
+
! Problem name
-
! Link
+
! Link
![BMF]LSICUDTPRWS
![BMF]LSICUDTPRWS
-
! Total
+
! Total
-
! Grade
+
! Grade
|-
|-
|Bunakov Vasiliy
|Bunakov Vasiliy
-
|Signature Recognition
+
|Fraud Signature Recognition Using SVM Method
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Bunakov2013SignatureRecognition/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Bunakov2013SignatureRecognition/]
| [BM+F]L+SI+CU-DTPRWS
| [BM+F]L+SI+CU-DTPRWS
|14.5
|14.5
-
|
+
|10
|-
|-
-
|Vdovina Yevgeniya
+
|Vdovina Evgeniya
|Visualization of Results of Keyword Groups Mapping
|Visualization of Results of Keyword Groups Mapping
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Vdovina2013MappingResultsVisualization/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Vdovina2013MappingResultsVisualization/]
| [BF]L-S+I+C0DT-0R-0S
| [BF]L-S+I+C0DT-0R-0S
|9.75
|9.75
-
|
+
|5
|-
|-
|Voronov Sergey
|Voronov Sergey
-
|Google Street View text detection and recognition
+
|Google Street View Text Detection and Recognition
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Voronov2013TextRecognition/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Voronov2013TextRecognition/]
| [BM+F]LS-I+CU+DTP+R-W+S--
| [BM+F]LS-I+CU+DTP+R-W+S--
|14.25
|14.25
-
|
+
|10
|-
|-
|Grinchuk Oleg
|Grinchuk Oleg
-
|Macroeconomic conditions forecasting
+
|Macroeconomic Conditions Forecasting
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Grinchuk2013InverseVAR/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Grinchuk2013InverseVAR/]
| [BMF]L-SI-C-0DTPRWS
| [BMF]L-SI-C-0DTPRWS
|12.25
|12.25
-
|
+
|8
|-
|-
|Dubovik Anna
|Dubovik Anna
|Classification and Exploring of Source Code of Python Projects.
|Classification and Exploring of Source Code of Python Projects.
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Dubovik2013ProjectCodeClassifying/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Dubovik2013ProjectCodeClassifying/]
-
| [M]L0I-->>>000
+
| [M]L0I-->>>000C--
-
|
+
| 2.5
|
|
|-
|-
Line 52: Line 52:
|Automatic Filters Generator for Gmail
|Automatic Filters Generator for Gmail
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Zhelavskaya2013FiltersGenerator/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Zhelavskaya2013FiltersGenerator/]
-
| [BM+]LS->>>>>00I
+
| [BM+F]LS->>>>>00IC-U-D-TP--R-W--S-
-
|
+
|11.75
-
|
+
|7
|-
|-
|Zhuykov Vladimir
|Zhuykov Vladimir
-
|Signature Recognition
+
|Fraud Signature Recognition
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Zhuykov2013SignatureRecognition/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Zhuykov2013SignatureRecognition/]
-
| [B]L--0I-->>>>>
+
| [B]L--0I--C--0D--00000
-
|
+
|3
|
|
|-
|-
Line 73: Line 73:
|Detecting Unsolicited SMS Messages
|Detecting Unsolicited SMS Messages
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/IvanovA2013DetectingSMSSpam/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/IvanovA2013DetectingSMSSpam/]
-
| [BM+]LSIC->>U>DTPR
+
| [BM+F]LSIC->>U>DTPR0S-
-
|
+
| 12.75
-
|
+
|8
|-
|-
|Kasatkin Sergey
|Kasatkin Sergey
-
|Determination of the type of human activity based on the data from the accelerometer
+
|Determination of the Type of Human Activity Based on the Data from the Accelerometer
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Kasatkin2013Accelerometer/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Kasatkin2013Accelerometer/]
-
| [B]L-S-I-->>>000
+
| [BF]L-S-I-->>>000C-U-DT-P--R--W-S-
-
|
+
| 9.75
-
|
+
|5
|-
|-
|Katrutsa Aleksandr
|Katrutsa Aleksandr
-
|Search engine results ranking
+
|Search Engine Results Ranking
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Katrutsa2013PageRank/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Katrutsa2013PageRank/]
| [BM+F]L+SI+CUDTPR+W+S
| [BM+F]L+SI+CUDTPR+W+S
|15.25
|15.25
-
|
+
|10
|-
|-
|Kolchanov Andrey
|Kolchanov Andrey
-
|The financial bubbles definition in the stock data
+
|The Financial Bubbles Detection in The Stock Data
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Kolchanov2013FinancialBubbles/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Kolchanov2013FinancialBubbles/]
| [B]0S-I->>>
| [B]0S-I->>>
Line 101: Line 101:
|Classify Handwritten Digits
|Classify Handwritten Digits
| [https://svn.code.sf.net/p/mlalgorithms/code/Group074/Kostin2013ClassifyHandwrittenDigits/]
| [https://svn.code.sf.net/p/mlalgorithms/code/Group074/Kostin2013ClassifyHandwrittenDigits/]
-
| [B]L+S-IS-
+
| [BF]L+S-IC-
-
|
+
|5.75
-
|
+
|1
|-
|-
-
|Kotenko Lengold Yekaterina
+
|Kotenko Lengold Ekaterina
-
|Satellite imagery processing for NDVI estimation
+
|Satellite Imagery Processing for NDVI Estimation
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Kudryashova.Kotenko.NDVI/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Kudryashova.Kotenko.NDVI/]
| [BMF-]L-S-IC-UD--000W--S--
| [BMF-]L-S-IC-UD--000W--S--
|8.5
|8.5
-
|
+
|4
|-
|-
|Kudryashova Aleksandra
|Kudryashova Aleksandra
-
|Satellite imagery processing for NDVI estimation
+
|Satellite Imagery Processing for NDVI Estimation
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Kudryashova.Kotenko.NDVI/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Kudryashova.Kotenko.NDVI/]
| [BMF-]L-S-IC-UD--000W--S--
| [BMF-]L-S-IC-UD--000W--S--
|8.5
|8.5
-
|
+
|4
|-
|-
|Levdik Pavel
|Levdik Pavel
-
|Electricity prices forecasting
+
|Electricity Prices Forecasting
| [https://svn.code.sf.net/p/mlalgorithms/code/Group074/Levdik2013Forecasting/]
| [https://svn.code.sf.net/p/mlalgorithms/code/Group074/Levdik2013Forecasting/]
| [BM+]L-SIC--U-D->PR-W>
| [BM+]L-SIC--U-D->PR-W>
| 9.75
| 9.75
-
|
+
|5
|-
|-
|Matrosov Mikhail
|Matrosov Mikhail
-
|Short-term forecasting of musical compositions
+
|Short-term Forecasting of Musical Compositions
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Matrosov2013MusicForecasting/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Matrosov2013MusicForecasting/]
-
| [BF]L-SIC--UDT>>W+S
+
| [BF]L-SIC-UDTPRW+S
-
|9.5
+
|12.75
-
|
+
|8
|-
|-
|Mityashov Andrey
|Mityashov Andrey
-
|Unstructured social data processing in classification problem
+
|Unstructured Social Data Processing in Classification Problem
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Mityashov2013ClassificationSocialData/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Mityashov2013ClassificationSocialData/]
| [M+F]L+SI--C-UDT--P00S-
| [M+F]L+SI--C-UDT--P00S-
|10
|10
-
|
+
|5
|-
|-
|Neklyudov Kirill
|Neklyudov Kirill
-
|Face recognition
+
|Face Recognition
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Neklyudov2013FacialKeypointsDetection/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Neklyudov2013FacialKeypointsDetection/]
| [BM+F]LS-I+CU-DTPR-WS-
| [BM+F]LS-I+CU-DTPR-WS-
|13.5
|13.5
-
|
+
|9
|-
|-
|Perekrestenko Dmitriy
|Perekrestenko Dmitriy
-
|Human activity recognition
+
|Human Activity Recognition Using Deep Learning
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Perekrestenko2013Accelerometer/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Perekrestenko2013Accelerometer/]
| [BM+F]L-SI-CU-DTPRW+S
| [BM+F]L-SI-CU-DTPRW+S
|13.75
|13.75
-
|
+
|9
|-
|-
|Prilepskiy Roman
|Prilepskiy Roman
-
|Text Location and recognition on Google Street View Images.
+
|Text Detection on Google Street View Images.
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Prilepskiy2013GoogleStreetView/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Prilepskiy2013GoogleStreetView/]
-
| [B]L+00>>>000
+
| [BF]L+0I>>>C--0D--00R-W-S--
-
|
+
|7.25
-
|
+
|3
|-
|-
|Pushnyakov Aleksey
|Pushnyakov Aleksey
-
|Color image segmentation
+
|Color Image Segmentation
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Pushnyakov2013ImageSegmentation/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Pushnyakov2013ImageSegmentation/]
| [BM+F]L+S+I+C+UDT+P+R+W+S
| [BM+F]L+S+I+C+UDT+P+R+W+S
|16.25
|16.25
-
|
+
|10
|-
|-
|Ryskina Mariya
|Ryskina Mariya
-
|Topic modeling using PLSA
+
|Topic Modeling Using PLSA algorithm
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Ryskina2013TopicModelPLSA/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Ryskina2013TopicModelPLSA/]
| [BM+F]L-S+I+CUDT+PR+W+S
| [BM+F]L-S+I+CUDT+PR+W+S
|15.25
|15.25
-
|
+
|10
|-
|-
|Stenin Sergey
|Stenin Sergey
-
|Detection of topically similar abstracts of scientific conference
+
|Detection of Topically Similar Abstracts of Scientific Conference
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Stenin2013Clustering/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Stenin2013Clustering/]
-
| [B]L+S+I+CUD
+
| [BF]L+S+I+CUDT-0R-WS
-
|
+
|12.25
-
|
+
|8
|-
|-
|Urzhumtsev Oleg
|Urzhumtsev Oleg
-
|Similar conferences abstract search
+
|Similar Conferences Abstract Search
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Urzhumtsev2013Dictionary/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Urzhumtsev2013Dictionary/]
| [BM+F]L-S-IC>D>>R--WS
| [BM+F]L-S-IC>D>>R--WS
|10.25
|10.25
-
|
+
|6
|-
|-
|Feyzkhanov Rustem
|Feyzkhanov Rustem
-
|Email filter generation
+
|Email Filter Generation
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Feyzkhanov2013FilterEmail/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Feyzkhanov2013FilterEmail/]
-
| [BM+F-]LS-IC--U->(D-T)>>PR
+
| [BM+F-]LS-IC--U->(D-T)>>PRWS-
-
|
+
| 12.5
-
|
+
|8
|-
|-
|Shuyskiy Nikolay
|Shuyskiy Nikolay
-
|Melody recognition
+
|Melody Recognition using Spectral Analysis
| [https://svn.code.sf.net/p/mlalgorithms/code/Group074/Shuyskiy2013MelodyRecognition/]
| [https://svn.code.sf.net/p/mlalgorithms/code/Group074/Shuyskiy2013MelodyRecognition/]
-
| [B]0S-0>>>>>
+
| [B]L-S-IC--0D-T--0R-W--S-
-
|
+
|7.25
-
|
+
|3
|-
|-
|Yashkov Daniil
|Yashkov Daniil
-
|Face detection
+
|Face Detection Using Viola-Jones
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Yashkov2013FaceDetection/]
|[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Yashkov2013FaceDetection/]
-
| [M+F]L-S-IC->>>UDTP
+
| [M+F]L-S-IC->>>UDTPRW--S-
-
|
+
| 12.75
-
|
+
|8
|-
|-
|}
|}
Line 216: Line 216:
! Date
! Date
!
!
 +
! Task
! Result
! Result
-
! To discuss
 
! Code
! Code
|-
|-
Line 303: Line 303:
* Undone work stage - 0.
* Undone work stage - 0.
-
== Home tasks ==
+
== Homework ==
 +
 
 +
=== Literature ===
 +
# Complete section 1.1.2 "Motivation" of SysDocs;
 +
# Complete section 1.1.3 "Literature";
 +
# Prepare a 40-second oral report on the problem.
 +
=== Statement ===
 +
Compose problem statement (using LaTeX). Here[https://svn.code.sf.net/p/mlalgorithms/code/Group074/Kuznetsov2013SSAForecasting/doc/] is a "template" of problem statement:
-
* Literature
 
-
1. Complete section 1.1.2 "Motivation" of SysDocs;
 
-
2. Complete section 1.1.3 "Literature";
 
-
3. Prepare 40-second oral report on a problem.
 
-
* Statement
 
-
Compose problem statement (using LaTeX)
 
-
Here is a "template" of problem statement:
 
-
https://svn.code.sf.net/p/mlalgorithms/code/Group074/Kuznetsov2013SSAForecasting/doc/
 
And here are some examples from the class presentation; it is strongly recommended to review all of them before starting:
And here are some examples from the class presentation; it is strongly recommended to review all of them before starting:
-
http://strijov.com/papers/KuzminAduenkoStrijov2012Clustering.pdf
 
-
http://strijov.com/papers/Kuznetsov2012Curvilinear.pdf
 
-
http://strijov.com/papers/Kuznetsov-Strijov2013Concordance.pdf
 
-
http://strijov.com/papers/Medvednikova2012CoIndicator.pdf
 
-
http://strijov.com/papers/Medvednikova2012RankScales.pdf
 
-
http://strijov.com/papers/MotrenkoStrijov2012HAPrediction.pdf
 
-
http://strijov.com/papers/MotrenkoStrijovWeber2012SampleSize.pdf
 
-
http://strijov.com/papers/SanduleanuStrijov2011FeatureSelection_Preprint.pdf
 
-
http://strijov.com/papers/Strijov2012ErrorFn.pdf
 
-
http://strijov.com/papers/Tsyganova2013TopicHierarchy.pdf
 
-
Also you can review several articles from JMLDA journal archive: http://jmlda.org/?page_id=35
 
-
This particular task is one of the most important in the course. Please, be more active before deadline (Tuesday, 6.00 am). Iterative scheme of interaction with your advisor will help in achieving better results
 
-
* Idef
 
-
Correct problem statement in case of necessary.
+
[http://strijov.com/papers/KuzminAduenkoStrijov2012Clustering.pdf]
-
Write down the abstract according to plans and (section 1.1.1 Systemdocs)
+
[http://strijov.com/papers/Kuznetsov2012Curvilinear.pdf]
-
Design two layer IDEF0 diagram (sections 1.2.2, 1.2.3 Systemdocs), preferably separating learning stage from final utilization stage.
+
[http://strijov.com/papers/Kuznetsov-Strijov2013Concordance.pdf]
-
Describe general data formats and structures(section 1.4 Systemdocs)
+
[http://strijov.com/papers/Medvednikova2012CoIndicator.pdf]
-
Describe modules interfaces (section 2 Systemdocs)
+
[http://strijov.com/papers/Medvednikova2012RankScales.pdf]
-
As usual I ask everybody to prefer iterative form of interaction. Moreover, you are welcome send and discuss all tasks separately.
+
[http://strijov.com/papers/MotrenkoStrijov2012HAPrediction.pdf]
-
Some useful links that can help:
+
[http://strijov.com/papers/MotrenkoStrijovWeber2012SampleSize.pdf]
-
MATLAB Programming Style Guidelines
+
[http://strijov.com/papers/SanduleanuStrijov2011FeatureSelection_Preprint.pdf]
-
http://www.machinelearning.ru/wiki/images/1/18/MatlabStyle1p5.pdf
+
[http://strijov.com/papers/Strijov2012ErrorFn.pdf]
-
IDEF0
+
[http://strijov.com/papers/Tsyganova2013TopicHierarchy.pdf]
-
http://www.machinelearning.ru/wiki/images/9/99/P_50-IDEF0.pdf
+
-
Function heading style example
+
-
http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group074/Kuznetsov2013SSAForecasting/code/
+
-
System of notations
+
-
http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group074/Kuznetsov2013SSAForecasting/doc/
+
-
(file Strijov2013Notation.pdf)
+
-
* Code
+
Also you can review several articles from JMLDA journal archive [http://jmlda.org/?page_id=35].
-
Let me recall that H/W 5 consist of just one task:
+
=== Idef ===
-
Create launchable source code
+
#Correct the problem statement if necessary.
-
But to complete this task you also need to rewrite in more detailed view all modules interfaces (section 2 Systemdocs) and function headings.
+
#Write down the abstract according to plans and (section 1.1.1 Systemdocs)
-
Please start doing this task much early than usual, it's much more complicated, particularly knowing that many of you have no experience in Matlab.
+
#Design two layer IDEF0 diagram (sections 1.2.2, 1.2.3 Systemdocs), preferably separating learning stage from final utilization stage.
-
Don't hesitate to disturb your assistants at any time.
+
#Describe general data formats and structures (section 1.4 Systemdocs)
 +
#Describe modules interfaces (section 2 Systemdocs)
-
* Unit-test
+
Some useful links that can help:
 +
#MATLAB Programming Style Guidelines[http://www.machinelearning.ru/wiki/images/1/18/MatlabStyle1p5.pdf]
 +
#IDEF0[http://www.machinelearning.ru/wiki/images/9/99/P_50-IDEF0.pdf]
 +
#Function heading style example[http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group074/Kuznetsov2013SSAForecasting/code/]
 +
#System of notations[http://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group074/Kuznetsov2013SSAForecasting/doc/] (file Strijov2013Notation.pdf)
-
Let me recall that H/W 6 consist of just two tasks:
+
=== Code ===
-
Create final version of code for project basement: launchable code should evaluate project results in "one click".
+
-
Write unit tests for each module, according to the manual that will be announced here soon.
+
-
Don't hesitate to disturb your assistants at any time.
+
-
* Data
+
# Create launchable source code
-
Homework D
+
# To complete this task you also need to describe all module interfaces (section 2 Systemdocs) and function headings in more detail.
-
1. Finish IDEF0: detail the user data processing block and add a second level of detail. The second level is devoted to checking the adequacy of user data with respect to:
+
=== Unit-test ===
-
1) the presence of viruses in the body of the uploaded data (refrain from executing commands contained in the body of files, e.g. mpeg),
+
-
2) the type of the uploaded file,
+
-
3) the size of the uploaded file,
+
-
4) the admissibility of the computation time and of the complexity of the recognition algorithm (no more than 15 sec; otherwise, running the algorithm in the background or sending the results by e-mail is to be discussed),
+
-
5) the admissibility of memory usage (preferably no more than 200 MB),
+
-
6) the adequacy of the input data structure (the algorithm must not return inadequate results when given inadequate data; it is desirable to report such a case).
+
 +
#Create final version of code for project basement: launchable code should evaluate project results in "one click".
 +
#Write unit tests for each module, according to the manual.
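As an illustration of a unit test for a single module, here is a minimal sketch written as a plain script with assert. The manual mentioned above is not reproduced here, so the layout is an assumption; normalizeColumns is a hypothetical module invented for this example.

```matlab
% Minimal unit test for one module, written as a plain script with assert.
% normalizeColumns is a hypothetical module used only for illustration:
% it centers every column and scales it to unit standard deviation.
normalizeColumns = @(X) bsxfun(@rdivide, ...
    bsxfun(@minus, X, mean(X, 1)), std(X, 0, 1));

X = [1 2; 3 4; 5 9];          % small hand-made input
Z = normalizeColumns(X);

% Each assert checks one property the module is expected to guarantee
assert(isequal(size(Z), size(X)), 'Output size must match input size');
assert(all(abs(mean(Z, 1)) < 1e-12), 'Columns must have zero mean');
assert(all(abs(std(Z, 0, 1) - 1) < 1e-12), 'Columns must have unit std');
disp('All tests passed');
```

A launch module can then simply run all such test scripts one after another, so that the first failed assert stops the run with its message.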
-
2. Collect real data in the data folder, intended to demonstrate the algorithm (and possibly for testing, if the data volume is small). If the data volume is large, put into this folder files with Internet links from which the large sample can be downloaded. Alternatively, the link can be placed in the data loader. Prepare a description of the data in systemdocs.
+
=== Data ===
-
3. Prepare a module for loading and checking user data. The module must load one user file.
+
# Finish IDEF0: detail the user data processing block and add a second level to the schema. The second level is devoted to checking the adequacy of user data, in particular:
 +
## the presence of viruses in the uploaded data (do not execute commands from the data, e.g. mpeg),
 +
## the uploaded data type,
 +
## the uploaded data size,
 +
## the admissibility of the expected running time of the algorithm (not more than 15 sec),
 +
## the admissibility of the memory consumption (not more than 200 MB),
 +
## the adequacy of the input data structure (the algorithm should signal when the data are inadequate).
 +
# Gather real data in the folder 'data' to demonstrate how the algorithm performs (and possibly for testing, if the data are not too big). If the data are big, put files with Internet links to the real data into the folder 'data'. Alternatively, the link can be placed in the data loading module. Describe the data in systemdocs.
 +
# Prepare a module for loading and checking the user data. The module must load one user file.
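The checks listed above (file type, file size, structure of the input) could be combined in one loading module roughly as follows. This is only a sketch: the function name loadUserFile, the 5 MB limit and the list of allowed extensions are assumptions made for the example, not requirements of the course.

```matlab
function [data, msg] = loadUserFile(filename)
% LOADUSERFILE  Load one user file after basic adequacy checks (sketch).
% The name loadUserFile, the size limit and the allowed extensions are
% illustrative assumptions, not part of the course template.
data = [];
msg = '';
maxBytes = 5 * 2^20;             % assumed size limit, 5 MB
allowedExt = {'.csv', '.txt'};   % assumed admissible file types

info = dir(filename);
if isempty(info) || numel(info) ~= 1 || info.isdir
    msg = 'File not found.';
    return
end
[~, ~, ext] = fileparts(filename);
if ~any(strcmpi(ext, allowedExt))
    msg = ['Unsupported file type: ' ext];
    return
end
if info.bytes > maxBytes
    msg = 'File is too large.';
    return
end
try
    % csvread parses numbers only; nothing from the file is executed
    raw = csvread(filename);
catch
    msg = 'File is not a numeric table.';
    return
end
if isempty(raw) || any(~isfinite(raw(:)))
    msg = 'Input data are empty or contain NaN/Inf values.';
    return
end
data = raw;
end
```

Returning a message instead of throwing an error lets the calling code show the user what was wrong with the file. Running time and memory limits are not covered by this sketch.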
 +
For your attention:
 +
# The main stages of system testing and error analysis.
 +
## Check data adequacy,
 +
## Check models adequacy (overfitting, complexity, stability, accuracy, etc).
 +
## Check adequacy of the obtained results. Error analysis (e.g. residual analysis).
 +
## Check adequacy of the system (time complexity, convergence of the optimization algorithms, stability of the algorithm on similar data).
 +
# Methods of algorithm complexity calculation.
 +
## Theoretical method.
 +
### Estimate time complexity, e.g. O(n ln n).
 +
### Estimate a constant in O().
 +
### Estimate time required for the user file processing.
 +
## Technical method.
 +
### Measure algorithm time on the samples of a different size.
 +
### Plot a figure sample size / elapsed time.
 +
### Estimate a regression function of the elapsed time on the sample size.
 +
### Estimate time required for the user file processing.
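The technical method above can be sketched in a few lines. The algorithm (sort), the sample sizes, the user file size nUser and the power-law form of the regression are all placeholders chosen for illustration; a project would substitute its own algorithm and a regression function that matches its theoretical estimate.

```matlab
% Sketch of the "technical method": time an algorithm on samples of
% different size, fit a regression, and extrapolate to the user file.
% algo, sizes and nUser are placeholders chosen for illustration.
algo = @(x) sort(x);                   % stand-in for the project algorithm
sizes = round(logspace(4, 6, 8));      % sample sizes to try
elapsed = zeros(size(sizes));          % preallocate the timing vector
for k = 1:numel(sizes)
    x = rand(sizes(k), 1);
    tic;
    algo(x);
    elapsed(k) = toc;
end

% Fit log(t) = a*log(n) + b, i.e. the power law t = exp(b) * n^a
p = polyfit(log(sizes), log(elapsed), 1);

nUser = 5e6;                           % assumed size of the user file
tUser = exp(polyval(p, log(nUser)));   % predicted processing time, sec
fprintf('Predicted time for n = %d: %.2f sec\n', nUser, tUser);

% Plot sample size against elapsed time together with the fitted curve
loglog(sizes, elapsed, 'o', sizes, exp(polyval(p, log(sizes))), '-');
xlabel('sample size'); ylabel('elapsed time, sec');
```

The predicted value can then be compared with the 15 sec limit from the data adequacy checks. Single tic/toc measurements are noisy, so averaging several runs per size would give a steadier fit.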
-
Note
+
=== Tests ===
-
The main tasks in system testing and error analysis
+
#Write a review using a plan provided below and place it into a file named like YourSurname2013ReviewSurname
-
1. Check how adequate the data are.
+
#Prepare 1-minute speech
-
2. How adequate the models are (overfitting, complexity, stability, accuracy, and other ways of comparing models).
+
#Create system tests: test data sets, module (script) for launching. Put the reference to this module in section 5.2 of SystemDocs file.
-
3. How adequate the obtained results are. Error analysis proper (e.g. analysis of regression residuals).
+
-
4. How adequately the technical system works (running time, convergence of optimization algorithms, stability in obtaining similar results on similar samples).
+
-
Ways of computing algorithm complexity.
 
-
Option 1, theoretical.
 
-
1. Estimate the complexity of the algorithm, e.g. O(n ln n)
 
-
2. Estimate the constant in O().
 
-
3. Calculate how much time is needed to process the user file.
 
-
 
-
Option 2, technical.
 
-
1. Measure the running time of the algorithm on samples of different sizes.
 
-
2. Plot sample size versus time.
 
-
3. Choose a regression function and fit the regression.
 
-
4. Calculate how much time is needed to process the user file.
 
-
* Tests
 
-
Let me recall that H/W 8 consists of three tasks:
 
-
Write a review using a plan provided below and place it into a file named like YourSurname2013ReviewSurname
 
-
Prepare 1-minute speech
 
-
Create system tests: test data sets, module (script) for launching. Put the reference to this module in section 5.2 of SystemDocs file.
 
Review plan:
Review plan:
-
1. Briefly: what is the main topic, what do you think is most important in this project, the aim of the project compared with similar projects, how the results of the project can be applied (is it relevant? important?)
+
# Briefly: what is the main topic, what do you think is most important in this project, the aim of the project compared with similar projects, how the results of the project can be applied (is it relevant? important?)
-
2. Project strengths (what positively surprised you?) and weaknesses (what should be considered in more detail)
+
# Project strengths (what positively surprised you?) and weaknesses (what should be considered in more detail)
-
3. Project details: clarity of project description in SystemDocs, ProblemStatement; code readability, interfaces usability, tests coverage.
+
# Project details: clarity of project description in SystemDocs, ProblemStatement; code readability, interfaces usability, tests coverage.
-
4. Conclusion
+
# Conclusion
-
* Profiler
+
=== Profiler ===
-
Using the profiler, optimize the bottlenecks in the code. Describe the work done in section 5.3 of systemdocs, using profiler reports and adding comments on the work done.
+
Using the built-in Matlab profiler, optimize the bottlenecks in your code. Report the results in section 5.3 of the systemdocs file (using profiler reports and comments on the results).
-
Bottlenecks are the code fragments that take considerable time during the computational experiment. You need to show that the code was improved by replacing loops with matrix operations, or show that the code is already optimized well enough. The most significant lines of the profiler report must be inserted into your report; these are usually the first 10-15 lines. You can copy them from the profiler's html report or use the function profile, whose documentation has an example of how to save the profiler report in a convenient format. When optimizing the code, you may include in the report those measurements of the code that you consider successful.
+
Bottlenecks are code fragments that unexpectedly turn out to be time-expensive during the experiment.
 +
You should show that the source code was improved by replacing loops with matrix operations and that the code is efficient enough.
 +
If necessary, include the most significant lines from the profiler reports (usually the first 10-15 lines), either by copy-pasting lines from the html report generated by the profiler or by using the profiler's exporting utilities (several examples are provided in the Matlab manual).
-
When optimizing, it is also recommended to use the function parfor, a parallel for. See the documentation ("doc parfor") and the example showing how to enable parallel mode. Tip: constructs such as x = x+1 or x(end+1) = y and the like are not parallelized. To avoid such constructs, create structures/matrices of the required size in advance. Parallel computing works in Matlab starting from version 2012.
+
It is recommended to parallelize the execution of your algorithm (where possible). One of the easiest ways to parallelize your program is to use the parfor construct, which is just a "parallel for". See the documentation ("doc parfor") for examples.
-
Example
+
Example:
-
>> matlabpool(3)
+
-
>> tic; parfor i=1:3, c(:,i) = eig(rand(1000)); end; toc
+
>> matlabpool(3)
-
Elapsed time is 3.712837 seconds.
+
-
>> tic; for i=1:3, c(:,i) = eig(rand(1000)); end; toc
+
-
Elapsed time is 5.807167 seconds.
+
-
* Report
+
 >> tic; parfor i=1:3, c(:,i) = eig(rand(1000)); end; toc
 Elapsed time is 3.712837 seconds.
 +
 
 +
 >> tic; for i=1:3, c(:,i) = eig(rand(1000)); end; toc
 Elapsed time is 5.807167 seconds.
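The profiling and parallelization steps described in this section can be tried together in a short script. This is a sketch under assumptions: the matrix size, the loop body and the folder name 'profile_report' are arbitrary choices for the example. In later Matlab releases matlabpool was replaced by parpool; without the Parallel Computing Toolbox the parfor loop simply runs serially.

```matlab
% Sketch: profile a computation and save the report, then run the same
% loop with parfor. The folder name 'profile_report' is an arbitrary choice.
n = 3;
c = zeros(200, n);          % preallocate the result instead of growing it

profile on
for i = 1:n
    c(:, i) = eig(rand(200));
end
profile off
% Save the html report; its top lines can be quoted in section 5.3
profsave(profile('info'), 'profile_report');

% Parallel variant: every iteration writes only its own column c(:, i),
% so the iterations are independent of each other
parfor i = 1:n
    c(:, i) = eig(rand(200));
end
```

Preallocating c with zeros is what allows each iteration to write to its own column instead of appending to a growing array.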
 +
 
 +
=== Report ===
Using the results of system tests and the computational experiment, aimed to provide error rate analysis, create plots and tables with some clarifications, and put it into section 5.2 of system docs.
Using the results of system tests and the computational experiment, aimed to provide error rate analysis, create plots and tables with some clarifications, and put it into section 5.2 of system docs.
Please identify different parts of this report with help of paragraphs named adequately.
Please identify different parts of this report with help of paragraphs named adequately.
-
Required parts of the mentioned computational experiment:
+
#Required parts of the mentioned computational experiment:
-
Visualization of the procedure of model selection and structural parameters optimization
+
#Visualization of the procedure of model selection and structural parameters optimization
-
Visualization of the resulting model or algorithm, visualization of the applied method of optimization, dependence of the loss function or quality criterion on the level of inserted noise or on other factors.
+
#Visualization of the resulting model or algorithm, visualization of the applied method of optimization, dependence of the loss function or quality criterion on the level of inserted noise or on other factors.
-
Visualization of obtained error rate in "web" section. (also plot or table)
+
#Visualization of obtained error rate in "web" section. (also plot or table)
-
* Web
+
=== Web ===
The folder "web" should contain next mandatory files:
The folder "web" should contain next mandatory files:
-
File "config.json" (name and extension should be the same). Fill this file using example placed in folder "Group074/Kuznetsov2013SSAForecasting/web/"
+
#File "config.json" (name and extension should be the same). Fill this file using example placed in folder "Group074/Kuznetsov2013SSAForecasting/web/"
-
File "main.m" with one argument variable and one resulting variable:
+
#File "main.m" with one argument variable and one resulting variable: html = main(filename), where filename is a text string containing the file name, and html is a text string containing the visual "web" report in html format.
-
html = main(filename), where filename is a text string containing the file name, and html is a text string containing the visual "web" report in html format.
+
#File "test.csv" (you can use another extension), This file should contain test object (text, time series, image, sound, video, etc.) for forecasting.
-
File "test.csv" (you can use another extension), This file should contain test object (text, time series, image, sound, video, etc.) for forecasting.
+
#Other files, that are required for function "main" (in particular file with parameters and structural parameters of forecasting model/algorithm)
-
Other files, that are required for function "main" (in particular file with parameters and structural parameters of forecasting model/algorithm)
+
 
For testing purposes it is strongly recommended to run the function writeHTML. It calls "main('test.csv')" and saves the result into "out.html". This file should contain either the "web" report on the forecasting results or an error message about a problem with forecasting (the types of errors were considered in the data loading section).
For testing purposes it is strongly recommended to run the function writeHTML. It calls "main('test.csv')" and saves the result into "out.html". This file should contain either the "web" report on the forecasting results or an error message about a problem with forecasting (the types of errors were considered in the data loading section).
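To make the required interface concrete, here is a minimal sketch of a "main.m" that takes a file name and returns an html string, including the error branch. The forecasting step is a placeholder (the mean of the input values) and the html layout is an assumption; the actual template in the Kuznetsov2013SSAForecasting folder may look different.

```matlab
function html = main(filename)
% MAIN  Sketch of the web entry point: file name in, html string out.
% The forecasting step is a placeholder (mean of the input); a real
% project would call its own model here instead.
try
    x = csvread(filename);            % load the test object
    forecast = mean(x(:));            % placeholder for the actual algorithm
    html = sprintf(['<html><body><h1>Forecast</h1>' ...
        '<p>Result: %g</p></body></html>'], forecast);
catch err
    % Report the problem in html instead of stopping with an error
    html = sprintf(['<html><body><h1>Error</h1>' ...
        '<p>%s</p></body></html>'], err.message);
end
end
```

Calling html = main('test.csv') and writing the returned string to "out.html" with fopen, fprintf and fclose reproduces by hand what writeHTML is described as doing.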
 +
 +
[[Категория:Учебные курсы]]

Current version

Main article: Machine Learning and Data Analysis (Strijov practice, in Russian)


The completed projects are located at http://mvr.jmlda.org

Problems

{|class="wikitable"
|-
! Author !! Problem name !! Link !! [BMF]LSICUDTPRWS !! Total !! Grade
|-
| Bunakov Vasiliy || Fraud Signature Recognition Using SVM Method || [1] || [BM+F]L+SI+CU-DTPRWS || 14.5 || 10
|-
| Vdovina Evgeniya || Visualization of Results of Keyword Groups Mapping || [2] || [BF]L-S+I+C0DT-0R-0S || 9.75 || 5
|-
| Voronov Sergey || Google Street View Text Detection and Recognition || [3] || [BM+F]LS-I+CU+DTP+R-W+S-- || 14.25 || 10
|-
| Grinchuk Oleg || Macroeconomic Conditions Forecasting || [4] || [BMF]L-SI-C-0DTPRWS || 12.25 || 8
|-
| Dubovik Anna || Classification and Exploring of Source Code of Python Projects. || [5] || [M]L0I-->>>000C-- || 2.5 ||
|-
| Zhelavskaya Irina || Automatic Filters Generator for Gmail || [6] || [BM+F]LS->>>>>00IC-U-D-TP--R-W--S- || 11.75 || 7
|-
| Zhuykov Vladimir || Fraud Signature Recognition || [7] || [B]L--0I--C--0D--00000 || 3 ||
|-
| Ivanov Sergey || Personalize Expedia Hotel Searches || [8] || [B]+L-SI+>> || ||
|-
| Ivanov Aleksandr || Detecting Unsolicited SMS Messages || [9] || [BM+F]LSIC->>U>DTPR0S- || 12.75 || 8
|-
| Kasatkin Sergey || Determination of the Type of Human Activity Based on the Data from the Accelerometer || [10] || [BF]L-S-I-->>>000C-U-DT-P--R--W-S- || 9.75 || 5
|-
| Katrutsa Aleksandr || Search Engine Results Ranking || [11] || [BM+F]L+SI+CUDTPR+W+S || 15.25 || 10
|-
| Kolchanov Andrey || The Financial Bubbles Detection in The Stock Data || [12] || [B]0S-I->>> || ||
|-
| Kostin Aleksandr || Classify Handwritten Digits || [13] || [BF]L+S-IC- || 5.75 || 1
|-
| Kotenko Lengold Ekaterina || Satellite Imagery Processing for NDVI Estimation || [14] || [BMF-]L-S-IC-UD--000W--S-- || 8.5 || 4
|-
| Kudryashova Aleksandra || Satellite Imagery Processing for NDVI Estimation || [15] || [BMF-]L-S-IC-UD--000W--S-- || 8.5 || 4
|-
| Levdik Pavel || Electricity Prices Forecasting || [16] || [BM+]L-SIC--U-D->PR-W> || 9.75 || 5
|-
| Matrosov Mikhail || Short-term Forecasting of Musical Compositions || [17] || [BF]L-SIC-UDTPRW+S || 12.75 || 8
|-
| Mityashov Andrey || Unstructured Social Data Processing in Classification Problem || [18] || [M+F]L+SI--C-UDT--P00S- || 10 || 5
|-
| Neklyudov Kirill || Face Recognition || [19] || [BM+F]LS-I+CU-DTPR-WS- || 13.5 || 9
|-
| Perekrestenko Dmitriy || Human Activity Recognition Using Deep Learning || [20] || [BM+F]L-SI-CU-DTPRW+S || 13.75 || 9
|-
| Prilepskiy Roman || Text Detection on Google Street View Images. || [21] || [BF]L+0I>>>C--0D--00R-W-S-- || 7.25 || 3
|-
| Pushnyakov Aleksey || Color Image Segmentation || [22] || [BM+F]L+S+I+C+UDT+P+R+W+S || 16.25 || 10
|-
| Ryskina Mariya || Topic Modeling Using PLSA algorithm || [23] || [BM+F]L-S+I+CUDT+PR+W+S || 15.25 || 10
|-
| Stenin Sergey || Detection of Topically Similar Abstracts of Scientific Conference || [24] || [BF]L+S+I+CUDT-0R-WS || 12.25 || 8
|-
| Urzhumtsev Oleg || Similar Conferences Abstract Search || [25] || [BM+F]L-S-IC>D>>R--WS || 10.25 || 6
|-
| Feyzkhanov Rustem || Email Filter Generation || [26] || [BM+F-]LS-IC--U->(D-T)>>PRWS- || 12.5 || 8
|-
| Shuyskiy Nikolay || Melody Recognition using Spectral Analysis || [27] || [B]L-S-IC--0D-T--0R-W--S- || 7.25 || 3
|-
| Yashkov Daniil || Face Detection Using Viola-Jones || [28] || [M+F]L-S-IC->>>UDTPRW--S- || 12.75 || 8
|}

Schedule

{|class="wikitable"
|-
! Date !! !! Task !! Result !! Code
|-
| September || 18 || Select a problem, an advisor. || machinelearning.ru record. || -
|-
| || 25 || Collect literature, write comments. || Bibliography list, mini-report. || Literature
|-
| October || 2 || Problem statement (synthetic data). Write mathematical statement in TeX-format. || ~1 page of text (problem statement) || Statement
|-
| || 9 || Create report file. Make project description. Describe architecture and main system interfaces (synthetic data). || Description, IDEF0. || Idef
|-
| || 16 || Detail interfaces, write a code (first version). || Code (synthetic data). || Code
|-
| || 23 || Write Unit tests with a launch module. || Unit tests. || Unit-test
|-
| || 30 || Collect real data. Finish IDEF0-schema. Write loading data modules. || Data, second IDEF0-schema, modules. || Data
|-
| November || 6 || Write and launch system tests. Write a review on a project. || Tests, review. || Tests
|-
| || 13 || Optimize the code. || Profiler report before and after. || Profiler
|-
| || 20 || Make visualization report. || Finished technical report. || Report
|-
| || 27 || Develop web interface. || Code on a site. || Web
|-
| December || 4 || Make user interface and examples. || Report. || Show
|}

Work and consultations

  1. Finish each work within a week.
  2. It is desirable to submit each work several times before the deadline.
  3. Deadline for the last version: Tuesday, 6:00am.
  4. Overdue time, in weeks, will be added to the report.
  • Each completed work stage: +1 point (A--, A-, A, A+, A++),
  • An undone work stage: 0 points.

Homework

Literature

  1. Complete section 1.1.2 "Motivation" of SystemDocs;
  2. Complete section 1.1.3 "Literature";
  3. Prepare a 40-second oral report on the problem.

Statement

Compose the problem statement (using LaTeX). Here[29] is a "template" of a problem statement.

Here are some examples from the class presentation; it is strongly recommended to review all of them before starting:

[30] [31] [32] [33] [34] [35] [36] [37] [38] [39]

You can also review several articles from the JMLDA journal archive [40].

Idef

  1. Correct the problem statement if necessary.
  2. Write the abstract according to the plans (section 1.1.1 of SystemDocs).
  3. Design a two-level IDEF0 diagram (sections 1.2.2, 1.2.3 of SystemDocs), preferably separating the learning stage from the final utilization stage.
  4. Describe general data formats and structures (section 1.4 of SystemDocs).
  5. Describe module interfaces (section 2 of SystemDocs).

Some useful links that can help:

  1. MATLAB Programming Style Guidelines[41]
  2. IDEF0[42]
  3. Function heading style example[43]
  4. System of notations[44] (file Strijov2013Notation.pdf)

Code

  1. Create launchable source code.
  2. To complete this task, also rewrite all module interfaces (section 2 of SystemDocs) and function headings in more detail.

Unit-test

  1. Create the final version of the code that forms the base of the project: the launchable code should evaluate the project results in "one click".
  2. Write unit tests for each module, according to the manual.

Data

  1. Finish IDEF0: detail the user data processing block and make the second level of the schema. The second level is devoted to checking the adequacy of user data, in particular:
    1. the presence of viruses in the uploaded data (do not execute commands from the data, e.g. mpeg),
    2. uploaded data type,
    3. uploaded data size,
    4. admissibility of the expected time complexity of the algorithm (not more than 15 sec),
    5. admissibility of the memory complexity (not more than 200 MB),
    6. the adequacy of the input data structure (the algorithm should signal inadequate data).
  2. Gather real data in the folder 'data' to demonstrate how the algorithm performs (and possibly for testing, if the data are not too big). If the data are big, put files with internet links to the real data into 'data' instead. Alternatively, the link can be placed in the data loading module. Describe the data in SystemDocs.
  3. Prepare modules for loading and checking the user data. The module must load one user file.
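
The checks on the uploaded file listed above might be sketched in MATLAB as follows. This is only an illustration under assumptions: the function name checkUserFile, the list of allowed extensions and the 10 MB size limit are hypothetical and are not prescribed by the course.

```matlab
function [ok, msg] = checkUserFile(filename)
% CHECKUSERFILE  Hypothetical sketch: validate one uploaded file before processing.
ALLOWED_EXT = {'.csv', '.txt'};   % assumed list of accepted file types
MAX_BYTES   = 10 * 2^20;          % assumed size limit (10 MB)

ok = false;
info = dir(filename);
if isempty(info) || info(1).isdir
    msg = 'File not found.';
    return;
end
% Check the data type by file extension (case-insensitive).
[~, ~, ext] = fileparts(filename);
if ~any(strcmpi(ext, ALLOWED_EXT))
    msg = ['Unsupported file type: ', ext];
    return;
end
% Check the data size.
if info(1).bytes > MAX_BYTES
    msg = 'File is too large.';
    return;
end
% Check the data structure: the file must parse as a numeric table.
% The content is only read as numbers and is never executed.
try
    data = csvread(filename);
catch
    msg = 'File is not a numeric comma-separated table.';
    return;
end
if isempty(data)
    msg = 'File is empty.';
    return;
end
ok = true;
msg = '';
end
```

A call such as [ok, msg] = checkUserFile('test.csv') returns ok = true for a well-formed numeric file; otherwise msg holds a text that can be shown to the user.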

For your attention:

  1. The main stages of system testing and error analysis.
    1. Check data adequacy,
    2. Check models adequacy (overfitting, complexity, stability, accuracy, etc).
    3. Check adequacy of the obtained results. Error analysis (e.g. residual analysis).
    4. Check adequacy of the system (time complexity, convergence of the optimization algorithms, stability of the algorithm on similar data).
  2. Methods of algorithm complexity calculation.
    1. Theoretical method.
      1. Estimate time complexity, e.g. O(n ln n).
      2. Estimate a constant in O().
      3. Estimate time required for the user file processing.
    2. Technical method.
      1. Measure algorithm time on the samples of a different size.
      2. Plot elapsed time against sample size.
      3. Estimate a regression function of the elapsed time on the sample size.
      4. Estimate time required for the user file processing.
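
The technical method above might look like the following MATLAB sketch. The algorithm under test (sorting, which is O(n ln n)), the sample sizes and the size of the user file nUser are all assumptions chosen for illustration; substitute your own algorithm and a regression model that matches its theoretical complexity.

```matlab
% Hypothetical stand-in for the algorithm under test: sorting is O(n ln n).
algorithm = @(x) sort(x);

sizes   = [1e4 2e4 5e4 1e5 2e5 5e5];   % sample sizes to try (assumed)
elapsed = zeros(size(sizes));
for k = 1:numel(sizes)
    x = rand(sizes(k), 1);
    tic;
    algorithm(x);
    elapsed(k) = toc;                  % measured time for this sample size
end

% Least-squares fit of the model  t = a * n*ln(n) + b.
features = sizes .* log(sizes);
coeffs   = polyfit(features, elapsed, 1);   % coeffs = [a, b]

% Predicted time for a user file of (assumed) size nUser.
nUser     = 1e6;
predicted = polyval(coeffs, nUser * log(nUser));
fprintf('Predicted time for n = %d: %.4f s\n', nUser, predicted);

% Plot measured and fitted times against sample size.
plot(sizes, elapsed, 'o', sizes, polyval(coeffs, features), '-');
xlabel('Sample size'); ylabel('Elapsed time, s');
legend('measured', 'fitted', 'Location', 'NorthWest');
```

Timings of a single run are noisy; repeating each measurement several times and averaging should give a more stable fit.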

Tests

  1. Write a review using the plan provided below and place it into a file named like YourSurname2013ReviewSurname.
  2. Prepare a 1-minute speech.
  3. Create system tests: test data sets and a module (script) for launching them. Put the reference to this module in section 5.2 of the SystemDocs file.

Review plan:

  1. Briefly: what is the main topic, what you consider most important in this project, how its aim compares with similar projects, and how the results can be applied (is the project relevant and important?).
  2. Project strengths (what positively surprised you?) and weaknesses (what should be considered in more detail?).
  3. Project details: clarity of the project description in SystemDocs and ProblemStatement; code readability, interface usability, test coverage.
  4. Conclusion.

Profiler

Using the built-in MATLAB profiler, optimize the bottlenecks in your code. Report the achievements in section 5.3 of the SystemDocs file (using profiler reports and comments on the achievements).

Bottlenecks are code fragments that unexpectedly turn out to be time-expensive during the experiment. You should show that the source code was improved by replacing loops with matrix operations and that the code is efficient enough. If necessary, include the most significant lines from the profiler reports (usually the first 10-15 lines), either by copy-pasting them from the HTML report generated by the profiler or by using the profiler's export utilities (several examples are provided in the MATLAB manual).
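
As an illustration of replacing loops with matrix operations under the profiler, consider the sketch below. The computation (squared norms of matrix rows), the matrix size and the report folder name profile_results are assumptions made for the example, not part of any project.

```matlab
n = 2000;
X = rand(n, 50);

profile on;

% Loop version: squared Euclidean norm of every row.
normsLoop = zeros(n, 1);
for i = 1:n
    for j = 1:size(X, 2)
        normsLoop(i) = normsLoop(i) + X(i, j)^2;
    end
end

% Vectorized version of the same computation.
normsVec = sum(X.^2, 2);

profile off;

% Both versions must agree up to rounding error.
assert(max(abs(normsLoop - normsVec)) < 1e-9);

% Save the profiler report as HTML (the folder name is an assumption).
profsave(profile('info'), 'profile_results');
```

In the saved report the lines of the double loop should account for most of the time, while the vectorized line should take only a small share.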

It is recommended to parallelize the execution of your algorithm where possible. One of the easiest ways to parallelize a program is the parfor construct, which is just a "parallel for". See the documentation ("doc parfor") for examples.

Example:

>> matlabpool(3)

>> tic; parfor i=1:3, c(:,i) = eig(rand(1000)); end; toc
Elapsed time is 3.712837 seconds.

>> tic; for i=1:3, c(:,i) = eig(rand(1000)); end; toc
Elapsed time is 5.807167 seconds.

Report

Using the results of the system tests and of the computational experiment aimed at error rate analysis, create plots and tables with explanations, and put them into section 5.2 of SystemDocs. Please separate the parts of this report into appropriately named paragraphs.

Required parts of the computational experiment:

  1. Visualization of the procedure of model selection and structural parameter optimization.
  2. Visualization of the resulting model or algorithm, visualization of the applied optimization method, dependence of the loss function or quality criterion on the level of inserted noise or on other factors.
  3. Visualization of the obtained error rate in the "web" section (also as a plot or table).

Web

The folder "web" should contain the following mandatory files:

  1. File "config.json" (the name and extension must be exactly these). Fill in this file using the example in the folder "Group074/Kuznetsov2013SSAForecasting/web/".
  2. File "main.m" with one input argument and one output variable: html = main(filename), where filename is a text string containing the file name and html is a text string containing the visual "web" report in HTML format.
  3. File "test.csv" (you can use another extension). This file should contain a test object (text, time series, image, sound, video, etc.) for forecasting.
  4. Other files that are required by the function "main" (in particular, a file with the parameters and structural parameters of the forecasting model/algorithm).

For testing purposes it is strongly recommended to launch the function writeHTML. It calls "main('test.csv')" and saves the result into "out.html". This file should contain either the "web" report on the forecasting results or an error message about a problem with forecasting (the types of errors were considered in the data loading section).
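
A minimal sketch of a "main.m" with the required signature is given below. Only the signature html = main(filename) comes from the requirements above; the input format (a numeric CSV time series), the placeholder forecasting rule and the HTML layout are assumptions made for illustration.

```matlab
function html = main(filename)
% MAIN  Hypothetical sketch of the web entry point: html = main(filename).
try
    series = csvread(filename);          % assumes a numeric CSV time series
    series = series(:);
    if numel(series) < 2
        error('main:shortInput', 'At least two observations are required.');
    end
    % Placeholder model: the last value plus the mean increment.
    forecast = series(end) + mean(diff(series));
    html = sprintf(['<html><body><h1>Forecast</h1>', ...
        '<p>Observations: %d</p><p>Next value: %.4f</p></body></html>'], ...
        numel(series), forecast);
catch err
    % Return an error report to the user instead of failing.
    html = sprintf('<html><body><p>Error: %s</p></body></html>', err.message);
end
end
```

Wrapping the whole body in try/catch means that the function returns an HTML string in both cases described above: a report on success and an error message on failure.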
