NCSU Libraries

Title page for ETD etd-11052004-022839


Type of Document Master's Thesis
Author Cañas, Daniel Alberto
Author's Email Address dacanas@unity.ncsu.edu
URN etd-11052004-022839
Title Generalizations and unification of centroid-based clustering methods
Degree Master of Science
Graduate Program Computer Science
Advisory Committee
Advisor Name Title
Dr. Robert Funderlic Committee Chair
Dr. Jon Doyle Committee Member
Dr. Steffen Heber Committee Member
Keywords
  • k-means
  • data mining
  • cluster analysis
Date of Defense 2004-11-02
Availability unrestricted
Abstract
There are many clustering methods that are referred to as k-means-like. We give the minimal necessary and sufficient components for the mechanism of the k-means (iterative and partitional) clustering method of a finite set of objects, X. Thus k-means is generalized and the methods that mimic k-means are unified. We name these k-center clustering methods. The fundamental mechanism of k-center methods exposes the usual misconceptions about k-means, such as (a) that "distance" satisfies some of the properties of a mathematical metric, (b) that there is a need to measure "distance" between objects in X, and (c) that the centers of clusters have the same nature as the objects of X. Moreover, k-center methods have a common formula to choose or calculate the centers of clusters. We characterize the convergent common objective function by expressing it in terms of (a) a distance measure for closeness between center objects and the objects in X and (b) the coherence of clusters. We give a three-object example to demonstrate the components of the formal mechanism of a k-center method. We then give examples of various known methods that belong to the class of k-center methods. We present an extensive and thorough comparison of the qualitative k-modes and the numerical spherical k-means methods. Included are paradigm applications, a matrix environment, an understanding of the duality of a dissimilarity and similarity measure, and an understanding of normalized X and the normalized centers of subsets of X.
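The iterative mechanism the abstract describes can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names (`k_center`, `mismatch`, `mode_center`, `objective`), the tie-breaking rule, and the three sample objects are all assumptions made for illustration. The sketch only relies on a closeness measure between a center and an object and on a rule for choosing the center of a cluster. It never compares two objects of X directly. It is instantiated here with a k-modes-style mismatch count.

```python
from collections import Counter


def k_center(objects, centers, dist, choose_center, max_iter=100):
    """Generic assign/update loop (illustrative, not the thesis's formal definition).

    dist(center, x) measures closeness between a center and an object;
    choose_center(members) returns the center of a non-empty cluster.
    """
    centers = list(centers)  # do not mutate the caller's list
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each object goes to its closest center
        # (ties go to the lowest center index, an arbitrary choice).
        new_assignment = [
            min(range(len(centers)), key=lambda j: dist(centers[j], x))
            for x in objects
        ]
        if new_assignment == assignment:
            break  # partition unchanged, so the iteration has converged
        assignment = new_assignment
        # Update step: recompute the center of every non-empty cluster.
        for j in range(len(centers)):
            members = [x for x, a in zip(objects, assignment) if a == j]
            if members:
                centers[j] = choose_center(members)
    return assignment, centers


def objective(objects, assignment, centers, dist):
    """Sum of closeness values between each object and its assigned center."""
    return sum(dist(centers[a], x) for x, a in zip(objects, assignment))


# Hypothetical k-modes-style instance: objects are tuples of categories.
def mismatch(center, x):
    """Number of attributes on which the center and the object differ."""
    return sum(c != v for c, v in zip(center, x))


def mode_center(members):
    """Per-attribute most frequent value among the cluster members."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))


# Made-up sample data, not the three-object example from the thesis.
objects = [("red", "small"), ("red", "large"), ("blue", "large")]
initial_centers = [("red", "small"), ("blue", "large")]

assignment, centers = k_center(objects, initial_centers, mismatch, mode_center)
total = objective(objects, assignment, centers, mismatch)
print(assignment, centers, total)
```

Swapping `mismatch` and `mode_center` for a squared Euclidean distance and an arithmetic mean would turn the same loop into ordinary k-means, which is the sense in which one mechanism can cover several methods.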

Files
  Filename   Size        Approximate Download Time (Hours:Minutes:Seconds)
                         28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
  etd.pdf    255.24 Kb   00:01:10     00:00:36    00:00:31       00:00:15        00:00:01