"Scalable Clustering of Categorical Data and Applications"

Abstract: Clustering is a problem of great practical importance in numerous applications. It becomes more challenging when the data is categorical, that is, when there is no inherent distance measure between data values. In this talk, we introduce LIMBO, a scalable hierarchical clustering algorithm for categorical data that uses an intuitive information-theoretic distance measure for categorical tuples and values. When clustering values, LIMBO can give useful hints about potential duplication and errors in a data set. As a hierarchical algorithm, LIMBO has the advantage that it can produce clusterings of different sizes in a single execution, working within a memory-bounded summary model of the data. We present results from our experimental evaluation of LIMBO, which show gains in efficiency without significant loss in the quality of the produced clusterings. We then show how the algorithm can be used to produce valid and useful clusterings of large software systems. In this case, LIMBO is applied to both structural and non-structural information about the software systems, which allows us to evaluate how useful each kind of information is for understanding those systems. Finally, we conclude the talk with a set of research challenges for the future.

Biographical note: Periklis Andritsos received his B.Sc. degree in Electrical and Computer Engineering in 1998 from the National Technical University of Athens, Greece. He received his M.Sc. degree in Computer Science in 2000 and his Ph.D. degree in Computer Science in 2004, both from the Department of Computer Science at the University of Toronto. In 2004/2005 he was a postdoctoral fellow at the University of Toronto. He is currently a faculty member at the University of Trento. His research interests include database systems, data mining, clustering and reverse engineering.
He is a member of the IEEE Computer Society and the ACM.