Cluster Results Visualizer & Profiler
Input Pre-Clustered Data
Cluster Visualization & Profiles
Data Table with Cluster Assignments
Cluster Profiles (Feature Summary Statistics)
Cluster Visualization (2D Scatter Plot - Select Features)
Understanding Cluster Analysis & Export
Cluster Analysis (Segmentation)
Cluster Analysis is an unsupervised machine learning technique used to group a set of objects (data points) in such a way that objects in the same group (called a cluster or segment) are more similar to each other than to those in other groups.
It's widely used for:
- Customer Segmentation: Grouping customers based on purchasing behavior, demographics, etc., for targeted marketing.
- Market Segmentation: Identifying distinct groups of consumers in a market.
- Anomaly Detection: Finding data points that don't fit well into any cluster.
- Image Segmentation, Document Grouping, and many other applications.
Common Clustering Algorithms (Not implemented in this tool):
- K-Means Clustering: Partitions data into a predefined number of clusters, K, by minimizing the within-cluster sum of squares (the sum of squared distances from each point to its cluster's centroid).
- Hierarchical Clustering: Builds a hierarchy of clusters, either agglomerative (bottom-up) or divisive (top-down). Results are often visualized as a dendrogram.
- DBSCAN: A density-based algorithm that groups points that are closely packed and marks points in low-density regions as outliers (noise).
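To make the K-Means objective concrete, here is a minimal sketch that computes the within-cluster sum of squares for data that already carries cluster labels. It only scores an existing assignment and is not a clustering implementation. The function name `within_cluster_sum_of_squares` and the sample points are assumptions made for illustration; they are not part of this tool.

```python
from collections import defaultdict


def within_cluster_sum_of_squares(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid.

    Hypothetical helper for illustration: `points` is a list of equal-length
    numeric tuples and `labels` holds one cluster label per point.
    """
    # Collect the points that belong to each cluster label.
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)

    total = 0.0
    for members in groups.values():
        dims = len(members[0])
        # The centroid is the per-dimension mean of the cluster's points.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        # Add every point's squared distance to that centroid.
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dims))
    return total


# Made-up sample data: two tight groups of two points each.
points = [(1.0, 1.0), (1.0, 3.0), (10.0, 10.0), (12.0, 10.0)]
labels = ["A", "A", "B", "B"]
wcss = within_cluster_sum_of_squares(points, labels)
```

A lower value means points sit closer to their own centroid. Putting all four sample points into a single cluster gives a much larger value than the two-cluster assignment above.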
Profiling Clusters (What this tool does):
Once clusters are formed (by an external tool), the next crucial step is to understand what defines each cluster. This involves examining the characteristics of the data points within each segment. Common ways to profile include:
- Calculating summary statistics (mean, median, min, max, standard deviation) for each feature/variable within each cluster.
- Visualizing the distribution of features for each cluster (e.g., box plots, histograms).
- Plotting 2 or 3 features as a scatter plot color-coded by cluster, which can reveal how well the clusters separate.
This tool helps with the profiling step by taking your data and pre-assigned cluster labels to show summary statistics and a basic scatter plot.
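The summary-statistics step can be sketched in a few lines. This is an illustration under stated assumptions, not the tool's actual code: the function name `profile_clusters`, the column names (`cluster`, `spend`, `visits`), and the sample rows are all hypothetical. Standard deviation here is the sample standard deviation, reported as 0.0 for a cluster with a single row.

```python
import statistics
from collections import defaultdict


def profile_clusters(rows, cluster_key, features):
    """Return {cluster: {feature: {stat: value}}} for pre-clustered rows.

    Hypothetical helper: `rows` is a list of dicts, `cluster_key` names the
    column holding the pre-assigned cluster label, and `features` lists the
    numeric columns to summarize.
    """
    # Gather each feature's values per cluster.
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for feature in features:
            grouped[row[cluster_key]][feature].append(float(row[feature]))

    profiles = {}
    for cluster, by_feature in grouped.items():
        profiles[cluster] = {}
        for feature, values in by_feature.items():
            profiles[cluster][feature] = {
                "count": len(values),
                "mean": statistics.mean(values),
                "median": statistics.median(values),
                "min": min(values),
                "max": max(values),
                # Sample standard deviation needs at least two values.
                "std": statistics.stdev(values) if len(values) > 1 else 0.0,
            }
    return profiles


# Made-up rows with cluster labels assigned by some external tool.
rows = [
    {"cluster": "A", "spend": 10, "visits": 1},
    {"cluster": "A", "spend": 20, "visits": 3},
    {"cluster": "B", "spend": 100, "visits": 8},
    {"cluster": "B", "spend": 120, "visits": 10},
    {"cluster": "B", "spend": 140, "visits": 12},
]
profiles = profile_clusters(rows, "cluster", ["spend", "visits"])
```

Comparing `profiles["A"]["spend"]["mean"]` with `profiles["B"]["spend"]["mean"]` shows which features set the segments apart, which is the point of the profiling step.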