Up to this point, we’ve mostly looked at how data can be segmented by some dimension or over time. However, looking at correlation in Tableau by looking between numbers, and how one metric affects another, is an extremely valuable skill in analytics. Correlation analysis in Tableau compares two or more quantitative variables to see if values in one vary systematically with values in another. For example, as height in men increases, so typically does weight. And with enough data, you could probably start to have a pretty good idea that if a man is 6’0 tall he will weigh within a certain range. This is Tableau correlation analysis at work. Tableau offers several analytical tools to do this.
Scatter Plots to Find Correlation in Tableau
First let’s build a scatter plot.
1. Drag Sales to Columns and Profits to Rows. This will build a quadrant with two axes, with Sales along your x-axis as your independent variable, and Profit on your y-axis as your dependent variable. You can easily swap these axes using the swap icon at the top.
2. Drag Customer Name out into the quadrant. We’ll now have a dot for every customer that plots both their sales and their profit. The goal would be to have everyone with both high sales and high profits, which would cluster the dots at the upper right corner of the graph.
3. Now drag Segment onto the Color shelf. We now have each of the customers encoded by their segment.
4. We see, for example, one dot up at the top. Mousing over that, we see that it’s a particular Consumer customer that has bought over $117k of products from us and has a profit of $34k.
5. However, with so many colors on the view at different points, it is difficult to look at any one particular segment. What if we wanted to just focus on that for a moment, but don’t want to remove it from the view. We can focus on just one segment by clicking its name in the legend. All other points will gray out.
6. This now enables us to see the correlations of sales to profit in Tableau for a particular segment. Note that you can do legend highlighting on any chart, not just scatter plots.
Trend Lines and Statistical Features Add Rigor to Tableau Correlation Analysis
If you want to add more analytical and statistical rigor to your analysis, you can add trend lines and various statistics to the view. While these can sometimes be confusing to an end user who doesn’t have much experience with stats, it’s very helpful to you as an analyst in really knowing what’s going on. Once you have a sense of what’s affecting your numbers, you can then talk your conclusions to your colleagues and management. So let’s look at a few basic statistical features.
1. Right click on your scatter plot and click Trend Lines>Show Trend Lines.
2. Several lines will now appear on your graph. At the moment, we just want the Tableau correlations, not the confidence bands (which is why you have so many lines).
3. Hover over a line and click edit trend lines. Uncheck “Show Confidence Bands.” But leave “Allow a Trend Line per Color” since we only have 4 segments.
4. When you mouse over the line, you will be given an equation and a p-value. You want a p value that is less than 0.05. If it’s higher than that, the Tableau correlation between the variables isn’t statistically significant. In this situation, a very low P-value means that you can have greater trust in the Tableau correlation between sales and profit for a customer in any of our particular segments, and that the results we are seeing did not occur randomly
5. The equation enables you to predict how changes in your x variable (sales) will change your y (profit). Use the R-Squared value as a sniff test to determine how well this model predicts y from x. You can think of this as a scale of 0 to 100%, the percentage of variation (or changes) in y that can be explained by x. The closer to 100% the more variation in y is attributed to x, and not some outside variable. Prediction models only consider the variables you’ve used to build it so outside variables will always confound the results. For example, an R-Squared value of 0.127 means that 12.7% of the changes in profits can be explained by sales – therefore 87.3% of changes in profits cannot be explained by sales and are related to OTHER outside variables. This would not be a good model for prediction purposes.
6. You can also find correlation in Tableau between the two variables – also known as “Pearson’s R” or the “Pearson Product Moment” – by taking the square root of R-Squared and applying a negative or positive sign to the result, depending on the direction of the slope of the line. Correlation in Tableau measures the strength and direction of a linear relationship.
7. There is a lot more detail on how to use trend lines and models here. http://onlinehelp.tableau.com/current/pro/online/en-us/help.htm#trendlines.html%3FTocPath%3DAdvanced%2520Analysis%7CTrend%2520Lines%7C_____0
8. The other trick you can use to get some basic stats about your chart (scatterplot or otherwise), click Worksheet and then Show Summary. This will display a box that shows some basic stats, like sum, count, average, min/max, but you can click the down arrow and get much more statistical insight. The unfortunate thing is this can only be displayed on worksheets, not dashboards, so it’s mostly for just your reference.
Note: By leaving the “Show recalculated line for highlighted or selected data points” box checked, you can recalculate a trend line for just a subset of points.
For example, if we just highlight the points above the orange line in the preceding scatterplot image, the trend line would recalculate and be much more steep. This gives us a sense of how certain data is behaving in comparison to others. In this example, data that behaves like those upper points will rise (i.e. profits will go up at a faster rate as sales increase) than do the data that behaves like those along the bottom of the chart.
Reference Lines Show Correlation in Tableau
Reference lines come in a variety of formats and are extremely useful for showing relationships between numbers. You can show a reference line (i.e. sales per segment compared to the average sales across all segments), a distribution (i.e. show me sales divided up into percentiles), or a band (show me customers whose sales are above $10k).
1. These can be found above the data pane under the tab Analytics.
2. Click Analytics and then drag “Median with Quartiles” onto the scatterplot. You’ll want to make sure both Sales and Profit are highlighted on the table that appears.
You’ll now see some bands on top of your view that shows where your middle sales and profit values lie. Anything above or below that lie outside of that range.
3. Now let’s see how the average line compares to the median value. Drag average onto the scatterplot.
4. A box will appear that will provide options with examples. Ensure only the Sales box under the table section turns red. You’ll now have a median and average sales line.
5. You can format a line by right clicking on the line and choosing Format. You can change both the label formatting as well as the line formatting.
6. Let’s change the average line to a dotted line that is dark green.
7. Let’s edit the label by right clicking on the label and choosing Edit.
8. Change the label from Computation (which was Average) to Custom.
9. Type in “Avg:” then > and select Value. Click ok and notice how the reference label changed.
10. You can get much more detailed with these dynamic values by adding dimensions and measures to your Detail shelf.
Practice How to Find Correlation in Tableau Today
Marketing has decided they are running things by the numbers. They want to know whether Discounts have an impact on Order Quantity, and by how much. They’d also like to see Profit over time by Marketing Channel broken into quartiles. And they’d like to see a quarterly forecast of Sales.
- Build a scatterplot plotting those 2 variables – Discount on Columns and Order Quantity on Rows.
- For now, leave both of their aggregations at Sum. In reality, we would set Discounts to Average, but leaving it as a sum makes for a more dramatic example.
- Drag Customer Name onto the view.
- Add a trendline.
- Look at the p-value and determine if it’s statistically significant. If it’s less than .05, you’re good.
- Rename the tab “Impact of Discounts on Order Qty.”
- Create a second tab and bring year and month of Order Date to Columns.
- Bring in Sales and add a reference distribution showing the Median with Quartiles.
- Hint: This can be done easily using the Analytics tab at the top of the Dimensions pane.
- Are monthly sales figures becoming more predictable (i.e. is the spread between the bands increasing or decreasing)?
- Add a filter for Marketing Channel. Check All to begin with.
- Rename the tab “Sales Quartiles by Year.”
- Add these charts into a dashboard with Quartiles on left, and the scatterplot at right.
- Add formatting. Title the whole dashboard “Marketing’s Revenue KPIs.”
- Name the dashboard tab “Marketing KPIs.”
While you can easily learn how to use the tools, showing Correlation in Tableau is one of the skills that you ultimately need to be successful with your analysis. We hope you learned a lot about Tableau in this mini blog tutorial. Keep in mind that if you want to practice more analytical skills, check out our online Tableau training!