Separation index and classification techniques based on Shannon entropy

Buono Francesco, RWTH Aachen University

We use Shannon entropy measures to develop classification techniques and an index that estimates the separation of the groups in a finite mixture model; see [2]. If the number of groups is known and training samples are available from each group (supervised learning), the index measures how well the groups are separated. In this setting, entropy measures are also used to assign new individuals to one of the groups. If the number of groups is uncertain (unsupervised learning), the index can be used, following the idea of the normalized entropy criterion defined in [1], to determine the optimal number of groups from an entropy (information/uncertainty) criterion. It can also be used to select the variables that best separate the groups. Theoretical, parametric and non-parametric techniques are proposed to approximate these entropy measures in practice. An application to gene selection in a colon cancer discrimination study with a large number of variables is also provided.

References

[1] G. Celeux, G. Soromenho, An entropy criterion for assessing the number of clusters in a mixture model, Journal of Classification, 13 (1996), 195-212.

[2] J. Navarro, F. Buono, J.M. Arevalillo, A new separation index and classification techniques based on Shannon entropy, Methodology and Computing in Applied Probability, 25 (2023), 78.
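As a rough illustration of how an entropy measure can reflect group separation, the following minimal sketch computes the average Shannon entropy of the posterior group-membership probabilities in a two-component univariate Gaussian mixture with known parameters. This is not the separation index of [2]; it is only the kind of posterior-entropy quantity on which entropy criteria such as the one in [1] are built. All function names, the Gaussian assumption, and the parameter values are hypothetical choices made for this example.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution (assumed group model)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_probs(x, weights, means, sds):
    """Posterior probability of each group given the observation x."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


def shannon_entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sample_mixture(n, weights, means, sds, seed=0):
    """Draw n observations from the mixture with a fixed seed."""
    rng = random.Random(seed)
    groups = range(len(weights))
    sample = []
    for _ in range(n):
        k = rng.choices(groups, weights=weights)[0]
        sample.append(rng.gauss(means[k], sds[k]))
    return sample


def mean_posterior_entropy(xs, weights, means, sds):
    """Average posterior entropy over a sample: low when groups are well separated."""
    entropies = [shannon_entropy(posterior_probs(x, weights, means, sds)) for x in xs]
    return sum(entropies) / len(entropies)


def classify(x, weights, means, sds):
    """Assign x to the group with the highest posterior probability."""
    post = posterior_probs(x, weights, means, sds)
    return max(range(len(post)), key=post.__getitem__)


if __name__ == "__main__":
    w = [0.5, 0.5]
    for means in ([0.0, 0.0], [0.0, 1.0], [0.0, 6.0]):
        xs = sample_mixture(2000, w, means, [1.0, 1.0], seed=0)
        print(means, mean_posterior_entropy(xs, w, means, [1.0, 1.0]))
```

With two equally weighted groups, the average posterior entropy equals ln 2 when the groups coincide and approaches 0 as the group means move apart, which is why it can serve as a crude separation measure in this toy setting.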

Area: IS15 - Stochastic processes in the natural sciences (Giuseppe D'Onofrio / Serena Spina)

Keywords: Shannon entropy, Discriminant analysis, Cluster analysis.
