Machine Learning and Data Analysis (Strijov practice)/Group 074, Fall 2013

Main article: Machine Learning and Data Analysis (Strijov practice, in Russian)


The completed projects are located at http://mvr.jmlda.org

Problems

| Author | Problem name | Link | [BMF]LSICUDTPRWS | Total | Grade |
| Bunakov Vasiliy | Fraud Signature Recognition Using SVM Method | [1] | [BM+F]L+SI+CU-DTPRWS | 14.5 | 10 |
| Vdovina Evgeniya | Visualization of Results of Keyword Groups Mapping | [2] | [BF]L-S+I+C0DT-0R-0S | 9.75 | 5 |
| Voronov Sergey | Google Street View Text Detection and Recognition | [3] | [BM+F]LS-I+CU+DTP+R-W+S-- | 14.25 | 10 |
| Grinchuk Oleg | Macroeconomic Conditions Forecasting | [4] | [BMF]L-SI-C-0DTPRWS | 12.25 | 8 |
| Dubovik Anna | Classification and Exploring of Source Code of Python Projects. | [5] | [M]L0I-->>>000C-- | 2.5 | |
| Zhelavskaya Irina | Automatic Filters Generator for Gmail | [6] | [BM+F]LS->>>>>00IC-U-D-TP--R-W--S- | 11.75 | 7 |
| Zhuykov Vladimir | Fraud Signature Recognition | [7] | [B]L--0I--C--0D--00000 | 3 | |
| Ivanov Sergey | Personalize Expedia Hotel Searches | [8] | [B]+L-SI+>> | | |
| Ivanov Aleksandr | Detecting Unsolicited SMS Messages | [9] | [BM+F]LSIC->>U>DTPR0S- | 12.75 | 8 |
| Kasatkin Sergey | Determination of the Type of Human Activity Based on the Data from the Accelerometer | [10] | [BF]L-S-I-->>>000C-U-DT-P--R--W-S- | 9.75 | 5 |
| Katrutsa Aleksandr | Search Engine Results Ranking | [11] | [BM+F]L+SI+CUDTPR+W+S | 15.25 | 10 |
| Kolchanov Andrey | The Financial Bubbles Detection in The Stock Data | [12] | [B]0S-I->>> | | |
| Kostin Aleksandr | Classify Handwritten Digits | [13] | [BF]L+S-IC- | 5.75 | 1 |
| Kotenko Lengold Ekaterina | Satellite Imagery Processing for NDVI Estimation | [14] | [BMF-]L-S-IC-UD--000W--S-- | 8.5 | 4 |
| Kudryashova Aleksandra | Satellite Imagery Processing for NDVI Estimation | [15] | [BMF-]L-S-IC-UD--000W--S-- | 8.5 | 4 |
| Levdik Pavel | Electricity Prices Forecasting | [16] | [BM+]L-SIC--U-D->PR-W> | 9.75 | 5 |
| Matrosov Mikhail | Short-term Forecasting of Musical Compositions | [17] | [BF]L-SIC-UDTPRW+S | 12.75 | 8 |
| Mityashov Andrey | Unstructured Social Data Processing in Classification Problem | [18] | [M+F]L+SI--C-UDT--P00S- | 10 | 5 |
| Neklyudov Kirill | Face Recognition | [19] | [BM+F]LS-I+CU-DTPR-WS- | 13.5 | 9 |
| Perekrestenko Dmitriy | Human Activity Recognition Using Deep Learning | [20] | [BM+F]L-SI-CU-DTPRW+S | 13.75 | 9 |
| Prilepskiy Roman | Text Detection on Google Street View Images. | [21] | [BF]L+0I>>>C--0D--00R-W-S-- | 7.25 | 3 |
| Pushnyakov Aleksey | Color Image Segmentation | [22] | [BM+F]L+S+I+C+UDT+P+R+W+S | 16.25 | 10 |
| Ryskina Mariya | Topic Modeling Using PLSA algorithm | [23] | [BM+F]L-S+I+CUDT+PR+W+S | 15.25 | 10 |
| Stenin Sergey | Detection of Topically Similar Abstracts of Scientific Conference | [24] | [BF]L+S+I+CUDT-0R-WS | 12.25 | 8 |
| Urzhumtsev Oleg | Similar Conferences Abstract Search | [25] | [BM+F]L-S-IC>D>>R--WS | 10.25 | 6 |
| Feyzkhanov Rustem | Email Filter Generation | [26] | [BM+F-]LS-IC--U->(D-T)>>PRWS- | 12.5 | 8 |
| Shuyskiy Nikolay | Melody Recognition using Spectral Analysis | [27] | [B]L-S-IC--0D-T--0R-W--S- | 7.25 | 3 |
| Yashkov Daniil | Face Detection Using Viola-Jones | [28] | [M+F]L-S-IC->>>UDTPRW--S- | 12.75 | 8 |

Schedule

| Date | Task | Result | Code |
| September 18 | Select a problem, an advisor. | machinelearning.ru record. | - |
| September 25 | Collect literature, write comments. | Bibliography list, mini-report. | Literature |
| October 2 | Problem statement (synthetic data). Write mathematical statement in TeX-format. | ~1 page of text (problem statement) | Statement |
| October 9 | Create report file. Make project description. Describe architecture and main system interfaces (synthetic data). | Description, IDEF0. | Idef |
| October 16 | Detail interfaces, write a code (first version). | Code (synthetic data). | Code |
| October 23 | Write Unit tests with a launch module. | Unit tests. | Unit-test |
| October 30 | Collect real data. Finish IDEF0-schema. Write loading data modules. | Data, second IDEF0-schema, modules. | Data |
| November 6 | Write and launch system tests. Write a review on a project. | Tests, review. | Tests |
| November 13 | Optimize the code. | Profiler report before and after. | Profiler |
| November 20 | Make visualization report. | Finished technical report. | Report |
| November 27 | Develop web interface. | Code on a site. | Web |
| December 4 | Make user interface and examples. | Report. | Show |

Work and consultations

  1. Finish each task within a week.
  2. It is advisable to submit each task several times before the deadline.
  3. Deadline for the final version: Tuesday, 6:00 am.
  4. Elapsed week time will be added to the report.
  • Each completed work stage: +1 point (A--, A-, A, A+, A++),
  • Each undone work stage: 0.

Homework

Literature

  1. Complete section 1.1.2 "Motivation" of SysDocs;
  2. Complete section 1.1.3 "Literature";
  3. Prepare a 40-second oral report on the problem.

Statement

Compose the problem statement (using LaTeX). A template of the problem statement is available here [29].

Here are some examples from the class presentation; reviewing all of them before starting is strongly recommended:

[30] [31] [32] [33] [34] [35] [36] [37] [38] [39]

You can also review several articles from the JMLDA journal archive [40].

Idef

  1. Correct the problem statement if necessary.
  2. Write down the abstract according to plans and (section 1.1.1 Systemdocs)
  3. Design a two-layer IDEF0 diagram (sections 1.2.2, 1.2.3 Systemdocs), preferably separating the learning stage from the final utilization stage.
  4. Describe the general data formats and structures (section 1.4 Systemdocs).
  5. Describe the module interfaces (section 2 Systemdocs).

Some useful links that can help:

  1. MATLAB Programming Style Guidelines[41]
  2. IDEF0[42]
  3. Function heading style example[43]
  4. System of notations[44] (file Strijov2013Notation.pdf)

Code

  1. Create launchable source code.
  2. To complete this task you also need to rewrite all module interfaces (section 2 Systemdocs) and function headings in more detail.

Unit-test

  1. Create the final version of the project's base code: the launchable code should produce the project results in "one click".
  2. Write unit tests for each module, according to the manual.
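
A unit test for a single module can be sketched as follows. This is only an illustration: the module normalizeColumns is hypothetical (it is not part of the course code), and the layout prescribed by the course manual may differ.

```matlab
% Hypothetical module under test: centers each column and scales it
% to unit standard deviation. Defined inline so the sketch runs on its own.
normalizeColumns = @(X) bsxfun(@rdivide, ...
    bsxfun(@minus, X, mean(X, 1)), std(X, 0, 1));

% Small hand-made input with non-constant columns.
X = [1 2; 3 4; 5 9];
Z = normalizeColumns(X);

% Test 1: the output has the same size as the input.
assert(isequal(size(Z), size(X)), 'Output size must match input size.');

% Test 2: every column has zero mean (up to rounding error).
assert(all(abs(mean(Z, 1)) < 1e-12), 'Column means must be zero.');

% Test 3: every column has unit standard deviation.
assert(all(abs(std(Z, 0, 1) - 1) < 1e-12), 'Column std must be one.');

disp('All unit tests passed.');
```

Each assert checks one property of the module, so a failing test points directly at the violated property.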

Data

  1. Finish IDEF0: detail the user data processing block and add a second level to the schema. The second level is devoted to checking the adequacy of user data, in particular:
    1. the presence of viruses in the uploaded data (do not execute commands contained in the data, e.g. mpeg),
    2. the uploaded data type,
    3. the uploaded data size,
    4. the admissibility of the expected running time of the algorithm (not more than 15 sec),
    5. the admissibility of the memory consumption (not more than 200 MB),
    6. the adequacy of the input data structure (the algorithm should signal when the data are inadequate).
  2. Gather real data in the folder 'data' to demonstrate how the algorithm performs (and possibly for testing, if the data are not too big). If the data are big, put files with internet links to the real data in the folder 'data' instead. Alternatively, the link can be located in the data loading module. Describe the data in systemdocs.
  3. Prepare modules for loading and checking the user data. The module must load one user file.
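
The type, size and structure checks listed above might look roughly like the sketch below. The size limit, the admissible extension and the numeric CSV format are assumptions made for illustration; each project has to substitute its own. In a real project the checks would live inside the data loading module rather than in a script.

```matlab
% Assumed limits (not taken from the course page).
maxBytes   = 1e6;        % assumed maximum file size, in bytes
allowedExt = {'.csv'};   % assumed admissible file type

% Create a small example file so that the sketch runs on its own.
filename = [tempname, '.csv'];
dlmwrite(filename, [1 2 3; 4 5 6]);

errorMessage = '';
[~, ~, ext] = fileparts(filename);
info = dir(filename);

if ~any(strcmpi(ext, allowedExt))
    errorMessage = 'Unsupported file type.';
elseif isempty(info)
    errorMessage = 'File not found.';
elseif info.bytes > maxBytes
    errorMessage = 'File is too large.';
else
    try
        % Structure check: the file must parse as a numeric table.
        data = dlmread(filename, ',');
        if isempty(data) || any(isnan(data(:)))
            errorMessage = 'File contains empty or non-numeric values.';
        end
    catch
        errorMessage = 'File cannot be parsed as a numeric table.';
    end
end
delete(filename);

% Report inadequate data instead of passing it to the algorithm.
if isempty(errorMessage)
    disp('User data accepted.');
else
    disp(['User data rejected: ', errorMessage]);
end
```

The file is read as data only and nothing from it is executed, which is the simplest way to respect the first item of the checklist.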

For your attention:

  1. The main stages of system testing and error analysis.
    1. Check the adequacy of the data.
    2. Check the adequacy of the models (overfitting, complexity, stability, accuracy, etc.).
    3. Check the adequacy of the obtained results. Error analysis (e.g. residual analysis).
    4. Check the adequacy of the system (running time, convergence of the optimization algorithms, stability of the algorithm on similar data).
  2. Methods of estimating algorithm complexity.
    1. Theoretical method.
      1. Estimate the time complexity, e.g. O(n ln n).
      2. Estimate the constant in O().
      3. Estimate the time required to process the user file.
    2. Technical method.
      1. Measure the running time of the algorithm on samples of different sizes.
      2. Plot elapsed time against sample size.
      3. Fit a regression of the elapsed time on the sample size.
      4. Estimate the time required to process the user file.
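
The technical method can be sketched as below. Sorting stands in for the project's algorithm, and both the n ln n model and the user sample size are assumptions chosen for illustration; the fitted slope also gives a rough estimate of the constant in O().

```matlab
% Step 1: measure running time on samples of different sizes.
sampleSizes = [1e4 2e4 4e4 8e4 16e4];
elapsed = zeros(size(sampleSizes));
for k = 1:numel(sampleSizes)
    x = rand(sampleSizes(k), 1);
    tic;
    y = sort(x);              % stand-in for the project's algorithm
    elapsed(k) = toc;
end

% Step 3: fit the assumed model t = a * n * ln(n) + b by least squares.
n = sampleSizes(:);
A = [n .* log(n), ones(numel(n), 1)];
coef = A \ elapsed(:);        % coef(1) estimates the constant in O(n ln n)

% Step 2: plot measurements together with the fitted curve.
plot(n, elapsed(:), 'o', n, A * coef, '-');
xlabel('sample size');
ylabel('elapsed time, s');

% Step 4: extrapolate to the (assumed) size of a user file.
nUser = 1e6;
tUser = [nUser * log(nUser), 1] * coef;
fprintf('Predicted time for n = %d: %.4f s\n', nUser, tUser);
```

The prediction can then be compared with the 15-second limit from the Data section. Timings of very short runs are noisy, so repeating each measurement and averaging is advisable.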

Tests

  1. Write a review following the plan below and place it in a file named like YourSurname2013ReviewSurname.
  2. Prepare a 1-minute speech.
  3. Create system tests: test data sets and a module (script) for launching them. Put a reference to this module in section 5.2 of the SystemDocs file.

Review plan:

  1. Briefly: what the main topic is, what you consider most important in this project, how the aim of the project compares with similar projects, and how the results can be applied (is the project relevant? important?).
  2. Project strengths (what positively surprised you?) and weaknesses (what should be considered in more detail?).
  3. Project details: clarity of the project description in SystemDocs and ProblemStatement; code readability, interface usability, test coverage.
  4. Conclusion.

Profiler

Using the built-in Matlab profiler, optimize the bottlenecks in your code. Report the achievements in section 5.3 of the systemdocs file (using profiler reports and comments on what was achieved).

Bottlenecks are code fragments that unexpectedly turn out to be time-expensive during the experiment. You should show that the source code was improved by replacing loops with matrix operations and show that the code is efficient enough. If necessary, include the most significant lines from the profiler reports (usually the first 10-15 lines), either by copy-pasting them from the HTML report generated by the profiler or by using the profiler's export utilities (several examples are provided in the Matlab manual).

It is recommended to parallelize the execution of your algorithm where possible. One of the easiest ways to do so is the parfor construct, which is simply a "parallel for". See the documentation ("doc parfor") for examples.
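
A profiling session that compares a loop with its matrix counterpart might look like the sketch below. The computation (row-wise squared norms) and the output folder name profile_results are illustrative assumptions. The profiler reports are most informative when the code is saved and run as a script or function file.

```matlab
n = 2000;
X = rand(n, 50);

profile on

% Loop version: squared Euclidean norm of every row.
normsLoop = zeros(n, 1);      % preallocate instead of growing in the loop
for i = 1:n
    normsLoop(i) = X(i, :) * X(i, :)';
end

% Matrix version of the same computation.
normsVec = sum(X .^ 2, 2);

profile off

% Both versions must agree before timings are compared.
assert(max(abs(normsLoop - normsVec)) < 1e-10);

% Save the profiler report as HTML files for the systemdocs report.
stats = profile('info');
profsave(stats, 'profile_results');
```

The saved HTML pages contain the per-function timing table from which the most significant lines can be copied.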

Example:

>> matlabpool(3)

>> tic; parfor i=1:3, c(:,i) = eig(rand(1000)); end; toc
Elapsed time is 3.712837 seconds.

>> tic; for i=1:3, c(:,i) = eig(rand(1000)); end; toc
Elapsed time is 5.807167 seconds.

Report

Using the results of the system tests and of the computational experiment aimed at error rate analysis, create plots and tables with explanations and put them into section 5.2 of systemdocs. Please separate the parts of this report into appropriately named paragraphs.

Required parts of the computational experiment:

  1. Visualization of the model selection procedure and of the structural parameter optimization.
  2. Visualization of the resulting model or algorithm, visualization of the applied optimization method, and dependence of the loss function or quality criterion on the level of inserted noise or on other factors.
  3. Visualization of the obtained error rate in the "web" section (also a plot or a table).

Web

The folder "web" should contain the following mandatory files:

  1. File "config.json" (the name and extension must be exactly these). Fill in this file using the example in the folder "Group074/Kuznetsov2013SSAForecasting/web/".
  2. File "main.m" with one input argument and one output variable: html = main(filename), where filename is a text string containing the file name and html is a text string containing the visual "web" report in HTML format.
  3. File "test.csv" (you can use another extension). This file should contain a test object (text, time series, image, sound, video, etc.) for forecasting.
  4. Other files required by the function "main" (in particular, the file with the parameters and structural parameters of the forecasting model/algorithm).

For testing purposes it is strongly recommended to run the function writeHTML. It calls "main('test.csv')" and saves the results into "out.html". This file should contain either the "web" report on the forecasting results or an error message describing the problem with forecasting (the types of errors were considered in the data loading section).
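
A minimal "main.m" with the required signature could be sketched as follows. The forecast used here (the mean of a numeric series) is a placeholder assumption, as is the numeric text format of the input file; the real function would call the project's own model.

```matlab
function html = main(filename)
% MAIN  Sketch of the web entry point: html = main(filename).
% Returns an HTML string with either a report or an error message.
try
    % Assumed input format: a numeric series stored as delimited text.
    series = dlmread(filename);
    if isempty(series) || any(isnan(series(:)))
        error('main:badData', 'Input file has no valid numeric data.');
    end
    % Placeholder "model": forecast the mean of the series.
    forecast = mean(series(:));
    html = sprintf('<html><body><p>Forecast: %.4f</p></body></html>', ...
        forecast);
catch err
    % Any failure is reported to the user inside the HTML page.
    html = sprintf('<html><body><p>Error: %s</p></body></html>', ...
        err.message);
end
end
```

Wrapping the whole body in try/catch means that the caller always receives a displayable page, which matches the requirement that "out.html" contain either a report or an error message.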
