
Data Exploration Process Based on the Self-Organizing Map

Juha Vesanto

Dissertation for the degree of Doctor of Technology to be presented with due permission of Computer Science and Engineering for public examination and debate at Helsinki University of Technology, Espoo, Finland on the 16th of May, 2002, at 12 o'clock noon.

Overview in PDF format (ISBN 951-22-5897-8)   [11400 KB]
Dissertation is also available in print (ISBN 951-666-596-9)

Abstract

With advances in computer technology, the amount of data obtained from various sources and stored in electronic media is growing exponentially. Data mining is a research area that addresses the challenge of analysing this data in order to find useful information contained therein. The Self-Organizing Map (SOM) is one of the methods used in data mining. It quantizes the training data into a representative set of prototype vectors and maps them onto a low-dimensional grid. The SOM is a prominent tool in the initial exploratory phase of data mining.
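To make the quantize-and-map idea concrete, the following is a minimal sketch of sequential SOM training in plain Python. It is not the SOM Toolbox implementation: the function names (`train_som`, `best_matching_unit`, `quantization_error`), the rectangular grid, the Gaussian neighbourhood, the linear decay schedules, and the toy two-cluster data are all assumptions chosen for illustration.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_matching_unit(protos, x):
    """Index of the prototype vector closest to sample x."""
    return min(range(len(protos)), key=lambda i: sq_dist(protos[i], x))


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols SOM with the sequential (online) update rule."""
    rng = random.Random(seed)
    # One prototype per grid unit, initialised from random training samples.
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    protos = [list(rng.choice(data)) for _ in grid]
    radius0 = max(rows, cols) / 2.0  # assumed initial neighbourhood radius
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # linear learning-rate decay
            radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking radius, floored at 0.5
            br, bc = grid[best_matching_unit(protos, x)]
            for i, (r, c) in enumerate(grid):
                # Units near the best-matching unit ON THE GRID move most.
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))
                protos[i] = [w + lr * h * (xi - w) for w, xi in zip(protos[i], x)]
            step += 1
    return grid, protos


def quantization_error(protos, data):
    """Mean distance from each sample to its best-matching prototype."""
    total = sum(math.sqrt(sq_dist(protos[best_matching_unit(protos, x)], x))
                for x in data)
    return total / len(data)


# Toy data (an assumption for the demo): two well-separated 2-D clusters.
rng = random.Random(1)
data = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(100)]
data += [(rng.gauss(5.0, 0.3), rng.gauss(5.0, 0.3)) for _ in range(100)]

grid, protos = train_som(data, rows=3, cols=3)
print("mean quantization error:", quantization_error(protos, data))
```

The prototypes perform the quantization, while the grid coordinates give each prototype a position on the low-dimensional map. Because neighbouring units are updated together, units that are close on the grid tend to end up with similar prototype vectors.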

The thesis consists of an introduction and ten publications. In the publications, the validity of SOM-based data exploration methods has been investigated and various enhancements to them have been proposed. In the introduction, these methods are presented as parts of the data mining process, and they are compared with other data exploration methods with similar aims.

The work makes two primary contributions. Firstly, it has been shown that the SOM provides a versatile platform on top of which various data exploration methods can be efficiently constructed. New methods and measures for visualization of data, clustering, cluster characterization, and quantization have been proposed. The SOM algorithm and the proposed methods and measures have been implemented as a set of Matlab routines in the SOM Toolbox software library.

Secondly, a framework for SOM-based data exploration of table-format data - both single tables and hierarchically organized tables - has been constructed. The framework divides exploratory data analysis into several sub-tasks, most notably the analysis of samples and the analysis of variables. The analysis methods are applied autonomously and their results are provided in a report describing the most important properties of the data manifold. In such a framework, the attention of the data miner can be directed towards the actual data exploration task rather than towards the application of the analysis methods. Because data exploration is highly iterative, automating routine analysis tasks can considerably reduce the time needed by the data exploration process.

This thesis consists of an overview and the following 10 publications:

  1. Juha Vesanto (1997). Using the SOM and Local Models in Time-Series Prediction. In Proceedings of Workshop on Self-Organizing Maps (WSOM'97), Espoo, Finland, pp. 209-214. © 1997 HUT. By permission.
  2. Esa Alhoniemi, Jaakko Hollmén, Olli Simula and Juha Vesanto (1999). Process Monitoring and Modeling Using the Self-Organizing Map. In Integrated Computer Aided Engineering Volume 6, Number 1, IOS Press, pp. 3-14. © 1999 IOS Press. By permission.
  3. Juha Vesanto (1999). SOM-Based Data Visualization Methods. In Intelligent Data Analysis, Volume 3, Number 2, Elsevier Science, pp. 111-126. © 1999 IOS Press. By permission.
  4. Esa Alhoniemi, Johan Himberg and Juha Vesanto (1999). Probabilistic Measures for Responses of Self-Organizing Map Units. In Proceedings of the International ICSC Congress on Computational Intelligence Methods and Applications (CIMA'99), ICSC Academic Press, pp. 286-290. © 1999 ICSC. By permission.
  5. Juha Vesanto and Jussi Ahola (1999). Hunting for Correlations in Data Using the Self-Organizing Map. In Proceedings of the International ICSC Congress on Computational Intelligence Methods and Applications (CIMA'99), ICSC Academic Press, pp. 279-285. © 1999 ICSC. By permission.
  6. Juha Vesanto, Johan Himberg, Esa Alhoniemi and Juha Parhankangas (1999). Self-Organizing Map in Matlab: the SOM Toolbox. In Proceedings of the Matlab DSP Conference 1999, Espoo, Finland, pp. 35-40. © 1999 Comsol Oy. By permission.
  7. Juha Vesanto and Esa Alhoniemi (2000). Clustering of the Self-Organizing Map. In IEEE Transactions on Neural Networks, Volume 11, Number 3, pp. 586-600. © 2000 IEEE. By permission.
  8. Juha Vesanto (2001). Importance of Individual Variables in the k-Means Algorithm. In Proceedings of the Pacific-Asia Conference Advances in Knowledge Discovery and Data Mining (PAKDD2001), Springer-Verlag, pp. 513-518. © 2001 Springer-Verlag. By permission.
  9. Markus Siponen, Juha Vesanto, Olli Simula and Petri Vasara (2001). An Approach to Automated Interpretation of SOM. In Proceedings of Workshop on Self-Organizing Map 2001 (WSOM2001), Springer, pp. 89-94. © 2001 Springer-Verlag. By permission.
  10. Juha Vesanto and Jaakko Hollmén (2002). An Automated Report Generation Tool for the Data Understanding Phase. In Hybrid Information Systems, edited by A. Abraham and M. Köppen, Physica Verlag, Heidelberg, pp. 611-626. © 2002 Springer-Verlag. By permission.

Keywords: self-organizing map, exploratory data analysis, data mining, visualization, clustering, vector quantization

This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.

© 2002 Helsinki University of Technology

