Relating clustering stability to properties of cluster boundaries
In this paper, we investigate stability-based methods for cluster model selection, in particular to select the number K of clusters. The scenario under consideration is that clustering is performed by minimizing a certain clustering quality function, and that a unique global minimizer exists. On the one hand we show that stability can be upper bounded by certain properties of the optimal clustering, namely by the mass in a small tube around the cluster boundaries. On the other hand, we provide counterexamples which show that a reverse statement is not true in general. Finally, we give some examples and arguments why, from a theoretic point of view, using clustering stability in a high sample setting can be problematic. It can be seen that distribution-free guarantees bounding the difference between the finite sample stability and the true stability cannot exist, unless one makes strong assumptions on the underlying distribution.
Author(s): | Ben-David, S. and von Luxburg, U. |
Book Title: | COLT 2008 |
Journal: | Proceedings of the 21st Annual Conference on Learning Theory (COLT 2008) |
Pages: | 379-390 |
Year: | 2008 |
Month: | July |
Day: | 0 |
Editors: | Servedio, R. A., T. Zhang |
Publisher: | Omnipress |
Bibtex Type: | Conference Paper (inproceedings) |
Address: | Madison, WI, USA |
Event Name: | 21st Annual Conference on Learning Theory |
Event Place: | Helsinki, Finland |
Digital: | 0 |
Electronic Archiving: | grant_archive |
Language: | en |
Organization: | Max-Planck-Gesellschaft |
School: | Biologische Kybernetik |
Links: |
BibTex
@inproceedings{5105, title = {Relating clustering stability to properties of cluster boundaries}, journal = {Proceedings of the 21st Annual Conference on Learning Theory (COLT 2008)}, booktitle = {COLT 2008}, abstract = {In this paper, we investigate stability-based methods for cluster model selection, in particular to select the number K of clusters. The scenario under consideration is that clustering is performed by minimizing a certain clustering quality function, and that a unique global minimizer exists. On the one hand we show that stability can be upper bounded by certain properties of the optimal clustering, namely by the mass in a small tube around the cluster boundaries. On the other hand, we provide counterexamples which show that a reverse statement is not true in general. Finally, we give some examples and arguments why, from a theoretic point of view, using clustering stability in a high sample setting can be problematic. It can be seen that distribution-free guarantees bounding the difference between the finite sample stability and the true stability cannot exist, unless one makes strong assumptions on the underlying distribution.}, pages = {379-390}, editors = {Servedio, R. A., T. Zhang}, publisher = {Omnipress}, organization = {Max-Planck-Gesellschaft}, school = {Biologische Kybernetik}, address = {Madison, WI, USA}, month = jul, year = {2008}, slug = {5105}, author = {Ben-David, S. and von Luxburg, U.}, month_numeric = {7} }