3006. Cluster Analysis with Binary Variables in Large Databases
John E Cornell, PhD, GRECC/Verdict STVHCS
Workshop Objectives: Cluster analysis is a statistical technique used to categorize patients into coherent, cohesive subgroups based upon patterns of clinical signs and symptoms that empirically go together. In practice, cluster analysis is the end product of a series of analytical decisions, each of which can significantly affect the number and quality of clusters identified in the analysis. These decisions involve choices about what objects to cluster, what unit of measurement to use for the variables, what proximity measure to use as an index of similarity or dissimilarity among the objects, what type of clustering algorithm to use, and what criteria to use for determining the number and quality of clusters in the data. The objectives of this workshop are to: 1. Provide an overview of proximity measures appropriate for binary data, 2. Review clustering algorithms (hierarchical, optimal, latent trait, and fuzzy logic) appropriate for binary data, 3. Illustrate various graphical techniques for visualizing clusters, and 4. Discuss the problems and issues that emerge in the application of cluster analysis to binary variables in large databases. Our approach is primarily didactic and emphasizes the decision-making and problem-solving aspects of applying cluster analysis in medical research.
Target Audience: The workshop is intended for medical researchers who want to learn how cluster analysis is used to discover homogeneous subgroups of patients relevant to health care research.
Audience Familiarity: Participants need an understanding of basic medical statistics and statistical methods for 2 x 2 tables in order to benefit from the workshop.
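
Illustration: The analytic decisions listed under Workshop Objectives can be made concrete with a small sketch. The Python code below is not taken from the workshop materials; it is a minimal illustration under stated assumptions. The patient data are invented, the function names are hypothetical, and the Jaccard and simple matching coefficients and average-linkage hierarchical clustering are shown only as common choices for binary data, not as the methods the workshop recommends. Each proximity measure is computed from the 2 x 2 table of agreements and disagreements between two patients.

```python
from itertools import combinations


def two_by_two(x, y):
    """Cross-tabulate two binary vectors.

    a = both present, b = present in x only,
    c = present in y only, d = both absent.
    """
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d


def jaccard_distance(x, y):
    """1 - a / (a + b + c): joint absences (d) are ignored."""
    a, b, c, _ = two_by_two(x, y)
    if a + b + c == 0:
        return 0.0  # assumed convention: two all-zero records are identical
    return 1.0 - a / (a + b + c)


def simple_matching_distance(x, y):
    """(b + c) / n: joint absences count as agreement."""
    a, b, c, d = two_by_two(x, y)
    return (b + c) / (a + b + c + d)


def average_linkage(data, n_clusters, distance):
    """Naive agglomerative clustering; merges the pair of clusters with
    the smallest mean pairwise distance until n_clusters remain.
    Suitable only for small illustrative data sets."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > n_clusters:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[p] for j in clusters[q]]
            avg = sum(distance(data[i], data[j]) for i, j in pairs) / len(pairs)
            if best is None or avg < best[0]:
                best = (avg, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return [sorted(c) for c in clusters]


# Invented data: rows are patients, columns are binary signs/symptoms.
patients = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]

clusters = average_linkage(patients, 2, jaccard_distance)
print(clusters)
```

The two distance functions differ only in how they treat joint absences, which is one reason the choice of proximity measure can change the clusters recovered from sparse binary data. The pairwise loop is far too slow for large databases and is meant only to show the sequence of decisions.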
