How Do I Interpret and Explain My Clustering Chart?
We have just built an awesome clustering chart, but there is some visual fine-tuning to do before we can hit a home run with our boss. Before we jump in, let’s take a look at some of the numbers under the hood.
How do I understand each of my clusters beyond just eyeballing it?
- Click the down arrow on the Clusters pill, which should be on your Color shelf.
- Choose Describe Clusters.
- A window will appear with a lot of information about how the clusters were created. Pay attention to the following:
- Variables – these are the measures you are crunching to find look-alikes (e.g. group similar customers by sales and profit).
- Level of Detail – these are the dimensions you’re incorporating into the cluster (e.g. show me look-alike customers by sales and profit by analyzing customer segment, marketing channel, product category, etc., and finding commonalities across all of them).
- Number of clusters – these are the distinct groups or segments that the algorithm found
- Clusters – you need to scroll down to find these.
- Number of Items – shows how many data points are in each cluster (these could be your bars or the circles on a scatter plot)
- Centers – this is the average value of each variable within the cluster. Comparing the centers shows how the clusters differ from one another.
- It’s OK to have clusters of different sizes, as data may group more strongly at one end than another, but you want each cluster to have enough data points to be meaningful.
- If a cluster has only one or two data points, consider excluding them from the view, as they might be outliers skewing your results, or consider changing the number of clusters.
- Note: Most of the cluster centers will appear in scientific notation, which is frustrating. If you click the Copy to Clipboard button and paste the results into Excel, you can format the numbers so you can read them correctly.
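To make the Describe Clusters numbers less of a black box, here is a minimal Python sketch of the two summaries discussed above: the number of items and the center (the mean of each variable) for every cluster. It assumes your data is a list of (sales, profit) tuples plus one cluster label per point. The function name `describe_clusters`, the `min_items` threshold of 3, and the sample values are hypothetical, chosen only for illustration; this is not how Tableau builds its report internally. The demo also prints the centers in fixed-point format, which sidesteps the scientific-notation problem.

```python
from collections import defaultdict


def describe_clusters(points, labels, min_items=3):
    """Summarize clusters as {label: (count, center)} and list the small ones.

    points: list of (sales, profit) tuples; labels: one cluster name per point.
    min_items is a hypothetical threshold for flagging clusters worth reviewing.
    """
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)

    summary = {}
    for label, members in groups.items():
        count = len(members)
        # The center is the per-variable mean of the cluster's members.
        center = tuple(sum(values) / count for values in zip(*members))
        summary[label] = (count, center)

    # Clusters with fewer than min_items points may be outliers.
    small = [label for label, (count, _) in summary.items() if count < min_items]
    return summary, small


if __name__ == "__main__":
    # Hypothetical (sales, profit) values for six customers.
    points = [(1200.0, 50.0), (1500.0, 80.0), (1300.0, 65.0),
              (98000.0, 21000.0), (102000.0, 19000.0), (250000.0, -4000.0)]
    labels = ["Cluster 1", "Cluster 1", "Cluster 1",
              "Cluster 2", "Cluster 2", "Cluster 3"]
    summary, small = describe_clusters(points, labels)
    for label, (count, center) in summary.items():
        # Fixed-point formatting avoids scientific notation such as 1e+05.
        print(f"{label}: {count} item(s), center = ({center[0]:,.2f}, {center[1]:,.2f})")
    print("Consider reviewing:", small)
```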
Now, let’s clean the clusters up with a trick to rename them, with the added bonus of being able to use them in other charts and analyses. (Note that once you complete this step you cannot view the previous underlying numbers, so make sure you have copied the numbers or taken a screenshot.) This is the final product:
- Hold down the Ctrl key, click the Clusters pill on the Color shelf, and drag it into Dimensions.
- Now, double-click the Clusters field you just dragged into Dimensions and rename it to “Sales & Profit Clusters.” This is now a field we can reuse later, which will be very helpful in analyzing certain segments of customers.
- Click the down arrow on the renamed pill and choose Edit Group.
- Right click on Cluster 1 and choose Rename. Type “Low Sales, Low Profit.”
- Follow the same procedure for Cluster 2 (note that the clusters may not be in numeric order!). Rename it to “High Sales, Low Profit.”
- Rename Cluster 3 to “Top Performers.”
- Rename Cluster 4 to “Mid-tier Sales, Low Profit.”
- Rename Cluster 5 to “Medium Sales, Medium Profit.”
- Now drag the renamed “Sales & Profit Clusters” pill onto the Color shelf to replace the existing Clusters field. You can do this by dropping it directly on top of the existing pill, or by dragging the current field off Color and then adding the new one. The GIF below shows the steps completed up to this point:
Now, let’s change the color scheme so that our colors convey a little more meaning.
- On the legend, click the drop-down arrow at the top right and choose Edit Colors.
- Set the color palette to Superfishel Stone in the drop-down menu.
- Now choose the “Top Performers” segment and then click the dark green color in the palette.
- Repeat this procedure and change “Low Sales, Low Profit” to the orange color. Change “High Sales, Low Profit” to red. Change “Mid-tier Sales…” to the light olive color. Change “Medium Sales” to the aqua color.
- Choose OK.
We now have some statistically valid segments that we can reuse, labeled with meaningful names that point to the next step. For example, “High Sales, Low Profit” leads straight to the obvious “why” question. We can then drill down to see what else these data points reveal about actions we need to take.
How do I explain this to other people…?
…and get the “thumbs up” from your boss?
Use the following tips:
In English: Find members of a potential group (customers, cities, or anything else you’re trying to group) that are as similar to each other as possible and as dissimilar as possible from the members of other groups. Each group should be distinct from the others, while the members within a group should be alike.
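The “similar within, different between” idea can be measured. One common way, sketched below in Python under the assumption that each data point is a tuple of numeric measures, is to add up the squared distances from each point to its own cluster center (within-cluster spread, which should be small) and from each cluster center to the overall mean, weighted by cluster size (between-cluster separation, which should be large). The function `sum_of_squares` and the four sample points are made up for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Per-variable average of a list of points."""
    return tuple(sum(values) / len(points) for values in zip(*points))


def sum_of_squares(points, labels):
    """Return (within, between) sums of squares for one clustering."""
    overall = mean(points)
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    within = 0.0
    between = 0.0
    for members in clusters.values():
        center = mean(members)
        # Spread of members around their own center: smaller means tighter groups.
        within += sum(sq_dist(p, center) for p in members)
        # Center's distance from the overall mean, weighted by size: larger means more distinct.
        between += len(members) * sq_dist(center, overall)
    return within, between


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(sum_of_squares(points, ["a", "a", "b", "b"]))  # left vs right → (4.0, 100.0)
print(sum_of_squares(points, ["a", "b", "a", "b"]))  # bottom vs top → (100.0, 4.0)
```

With the same four points, the left/right grouping keeps the within-cluster spread small and the separation large, while the bottom/top grouping reverses that. The total (within plus between) is 104 in both cases, so a better grouping simply shifts more of that total into the between part.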
Quantitatively: For a given number of clusters or look-alike groups (denoted by the letter “K”), the algorithm partitions the data into that many groups. By default, the algorithm determines what it estimates is the best number of clusters for your data, but you can easily change that number to see if new patterns emerge. Each cluster has a center (centroid) that is the average value of all the points in that cluster. Each cluster is a valid statistical grouping that updates dynamically as data values change or as new data is added.
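The partition-and-average process described above is the classic k-means procedure (Lloyd's algorithm). Below is a minimal pure-Python sketch of it, offered as an illustration of the idea rather than Tableau's actual implementation, which has its own initialization and its own method for choosing K. The function name `kmeans`, the random initialization from existing data points, and the `iterations` and `seed` parameters are assumptions made for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Partition points (tuples of numbers) into k clusters with Lloyd's algorithm.

    Returns (labels, centers): a cluster index per point and one center per cluster.
    """
    rng = random.Random(seed)
    # Assumption: start from k randomly chosen data points as the initial centers.
    centers = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster whose center is nearest.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean (centroid) of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's old center
        if new_centers == centers:
            break  # no center moved, so the assignments are stable
        centers = new_centers
    return labels, centers


# Hypothetical (sales, profit) pairs in thousands: three small accounts, three large ones.
data = [(1, 0), (2, 1), (1, 1), (50, 12), (52, 10), (49, 11)]
labels, centers = kmeans(data, k=2)
print(labels)
print(centers)
```

Because the starting centers are random, messier data can settle on different groupings for different seeds; rerunning with other seeds or other values of `k` is the code-level equivalent of changing the number of clusters in Tableau to see whether new patterns emerge. Note that `k` cannot exceed the number of points, since `random.sample` raises `ValueError` in that case.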
An example: Let’s say you have information about four Domino’s pizza locations and a list of customer addresses, but those addresses aren’t tied to any particular location. You’d have to manually sort through the addresses and compare them on Google Maps to determine which location each customer should order from. Clustering does this automatically: it crunches through the data and determines which neighborhoods surround each Domino’s location, giving you four clusters. A similar nearest-location idea is at work when you search for “pizza near me,” by the way.
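In the Domino’s example the store locations are already known, so assigning customers reduces to the assignment step of clustering: send each address to its nearest center. The Python sketch below shows that step using straight-line distance on a flat x/y grid. The store names, coordinates, and the functions `nearest_store` and `group_by_store` are hypothetical, and a real delivery-zone analysis would more likely use road distance or travel time. If you only had the addresses and wanted the algorithm to find four centers on its own, you would run a full clustering algorithm such as k-means instead.

```python
import math

# Hypothetical store locations on a flat x/y grid (kilometres); not real data.
STORES = {
    "Store A": (0.0, 0.0),
    "Store B": (8.0, 0.0),
    "Store C": (0.0, 8.0),
    "Store D": (8.0, 8.0),
}


def nearest_store(address, stores=STORES):
    """Return the name of the store with the smallest straight-line distance to address."""
    return min(stores, key=lambda name: math.dist(address, stores[name]))


def group_by_store(addresses, stores=STORES):
    """Group (x, y) addresses by their nearest store: one cluster per store."""
    groups = {name: [] for name in stores}
    for address in addresses:
        groups[nearest_store(address, stores)].append(address)
    return groups


if __name__ == "__main__":
    customers = [(1, 1), (7, 2), (2, 6), (9, 9), (6, 7)]
    for store, members in group_by_store(customers).items():
        print(store, members)
```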
What have you used clustering for and how did it allow you to find more insight? Let us know in the comments!