
Cluster Analysis

The goal of cluster analysis is to find patterns in the variable values that reveal relationships within the data. Clustering is often used during the data preparation stage to help identify variables that can be aggregated or removed from consideration before analysis. Grouping the data organizes it into subgroups, which makes similarities and differences easier to identify. Data points that are more closely related are clustered together, so the data within a cluster generally has less variance than the data across different clusters. Different clustering methods can be used depending on the size and type of the data being analyzed.

 

Two common types of unsupervised clustering methods are hierarchical and k-means clustering. Hierarchical clustering is usually used for smaller data sets, whereas k-means clustering is used when there is a larger amount of data. K-means clustering is the most popular non-hierarchical method of analysis.
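As a rough illustration of how k-means works, the sketch below assigns each point to its nearest cluster center, moves each center to the mean of its points, and repeats until the assignments stop changing. It is a minimal plain-Python example rather than a production implementation. The function names (`k_means`, `squared_distance`, `mean_point`) and the sample data are made up for illustration and do not come from SAS Visual Analytics.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Basic k-means (hypothetical helper): returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point(members)
    return labels, centroids


# Made-up sample data: two visibly separate groups of points.
data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.1), (7.9, 8.3), (8.2, 7.9)]
labels, centroids = k_means(data, k=2)
print(labels)
print(centroids)
```

The number of clusters `k` has to be chosen before the algorithm runs. Each pass only compares every point against `k` centers, which is one reason k-means tends to scale to larger data sets better than hierarchical methods that compare points pairwise.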

​

The two cluster analysis visualizations pictured are the parallel coordinates plot and the cluster matrix. Both were made in SAS Visual Analytics. The parallel coordinates plot draws one axis per variable, so it is best for viewing variables individually and seeing the effect of a single variable on the overall analysis. The cluster matrix plots the clusters against pairs of variables, so it is best for looking at how individual factors separate the clusters.
