Distribution in Tableau
In the business world, you’ll hear averages referenced all the time. “What’s the average number of admissions per day?” or “What’s the average profit per month?” or “How are we trending compared to our average?” Those are all fine, even vital, things to know. But averages can obscure insights hidden in the data, because they assume that everyone behaves just like the norm. Often the exceptions are the most interesting part.
A distribution lets you look at how data is dispersed across a range of values rather than consolidating it all into an average. Imagine a classroom of middle school kids. The average height is 5’2. The average height for the boys is 5’4 and the average height for the girls is 5’0. But there’s one 13-year-old boy who is 6’6, all arms and legs right now, and he’s a starter on the basketball team because all we need to do is pass him the ball and no one else can touch it. That’s interesting to know! Our star ballplayer is an outlier and is maybe the most interesting data point in the lineup. A distribution lets us see how many students fall into each height range and clearly see our basketball star way off the chart.
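To make the classroom example concrete, here is a minimal Python sketch. The heights are made up for illustration (they are not real data) and were chosen so that the average works out to 62 inches (5’2), with one 78-inch (6’6) outlier. The helper name `bin_counts` and the 4-inch range width are our own assumptions.

```python
from collections import Counter
from statistics import mean

# Hypothetical heights in inches for one classroom (made-up values).
heights_in = [57, 58, 59, 60, 60, 61, 61, 62, 62, 64, 78]


def bin_counts(values, bin_size):
    """Count how many values fall into each range, keyed by the range's lower edge."""
    return Counter((v // bin_size) * bin_size for v in values)


# The average collapses the whole class into a single number.
average = mean(heights_in)

# The distribution keeps every range visible, including the outlier.
counts = bin_counts(heights_in, 4)

print(f"Average height: {average} in")
for lower in sorted(counts):
    print(f"{lower}-{lower + 3} in: {'#' * counts[lower]}")
```

The average alone gives no hint that anyone in the class is unusually tall, while the printed ranges show a lone student sitting far away from everyone else.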
Looking at distributions in Tableau makes you a smarter analyst. It helps you find better insights in your data and get to the underlying root causes faster. The next sections walk through a series of charts you can create to show distributions in Tableau in various ways.
Heatmaps and Highlight Tables
Looking at how data is distributed is really valuable, yet it is an often overlooked best practice in business analytics. Let’s first look at heatmaps and highlight tables. Heatmaps show which areas have higher values than others (for example, higher sales or profit). This can be encoded via color, size, shape, or all of the above. Highlight tables do the same, but also add text labels in the squares.
- On a new sheet, bring Region to Rows and Segment to Columns.
- Bring Sales to both Size and Color (you could just as easily put a different measure on each shelf).
- Size your chart wider by hovering your mouse between the headers at the top until you get the resize arrows, and dragging out to the right.
- Change the color palette from Automatic to Red-Blue diverging by clicking on the color legend. We can now clearly see that in the West, the Corporate segment outperforms the others in Sales, and that in general, the Corporate segment has the best sales across the country.
- To convert this from a heatmap to a highlight table, go to Show Me and click the third icon at the top right. This provides a different way of visualizing things. If, for example, you have a coworker who really wants to see the numbers, not just the comparisons between values, a highlight table still adds a visual element while keeping the accuracy and detail of the numbers.
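If it helps to see what the heatmap and highlight table are computing, the steps above can be sketched outside Tableau in Python. This is only an illustration: the order rows and sales numbers are made up, the helper names `pivot_sum` and `shade` are our own, and a character ramp stands in for the color scale.

```python
from collections import defaultdict

# Hypothetical order rows: (region, segment, sales). Made-up numbers.
orders = [
    ("West", "Consumer", 120.0),
    ("West", "Corporate", 300.0),
    ("East", "Consumer", 200.0),
    ("East", "Corporate", 150.0),
    ("West", "Corporate", 80.0),
    ("East", "Home Office", 50.0),
]


def pivot_sum(rows):
    """Sum sales for every (region, segment) cell, like Region on Rows and Segment on Columns."""
    table = defaultdict(float)
    for region, segment, sales in rows:
        table[(region, segment)] += sales
    return dict(table)


def shade(value, lo, hi, levels=" .:*#"):
    """Map a value onto a character ramp that stands in for a color scale."""
    if hi == lo:
        return levels[-1]
    idx = int((value - lo) / (hi - lo) * (len(levels) - 1))
    return levels[idx]


totals = pivot_sum(orders)
lo, hi = min(totals.values()), max(totals.values())
regions = sorted({region for region, _ in totals})
segments = sorted({segment for _, segment in totals})

# Highlight table: each cell shows the number plus its shade.
for region in regions:
    cells = []
    for segment in segments:
        value = totals.get((region, segment))
        if value is None:
            cells.append(f"{segment}: -")  # no orders in this cell
        else:
            cells.append(f"{segment}: {value:.0f} [{shade(value, lo, hi)}]")
    print(region, "|", " | ".join(cells))
```

Dropping the number and keeping only the shade gives the heatmap version; keeping both gives the highlight table.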
Breaking Data into Numerical Ranges
Why look at numerical ranges rather than, say, just an average? Because averages obscure information you need to know.
- What if you were tracking how quickly you ship a product after it is ordered? The average might look good, but a sizable number of orders could still be delayed, causing customer frustration and negative reviews, and you wouldn’t know it until it was too late.
- To build this view, we’re going to create a calculated field.
- Open a new tab. Drag Ship Date and Order Date to Rows.
- Click the drop down arrow on both and select the continuous Day option (Day May 8, 2015).
- Double click the Order Date pill and copy everything in the pill.
- Next, double click the Ship Date pill and add a minus sign after the text that is there.
- Now paste in the Order Date information that you copied.
- We no longer need Order Date, since for this analysis we care more about the number of delayed orders than the date they were ordered on. Drag that pill off the view.
- You now have a calculation showing the number of days between the Order and Ship dates. We need to right-click and change this from a Measure to a Dimension, and also change it to Discrete. This will build you a text table that is ready for us to drop more information into.
- Now bring Number of Records to the Text shelf. (This is the same as the Label shelf; the name changes depending on whether you are working with a visual or just a text table.)
- Let’s also bring Number of Records to Columns to build a bar chart.
- If our policy were that all orders ship within 4 days of being received, we can see that there are a number of orders that are delayed.
- To see the exact percentage of delayed orders, drag Number of Records onto the Columns shelf next to the first green pill.
- Click the drop down arrow on that second pill and add a Quick Table Calculation to show Percent of Total.
- The mark labels may still be showing the totals on both charts, rather than the percentages on the second chart. If that is the case, on the Marks card for your second chart, drag Number of Records off the Label shelf.
- Click the down arrow on that second pill and choose Format. Ensure the tab is set to Pane, not Axis. Then change the Numbers box to Percentage with 0 decimals.
- Now right click and add an annotation. Adding up the percentages for the orders that took more than 4 days, we see that 9% of orders are delayed. Type that into the annotation box. Change the font color to red. Click OK.
- Now drag that annotation to encompass the bars starting below the 4 days mark. Drag the annotation box out to fill that side of the view.
- Right click and format it so that the borders are red, there is no shading, and the corners are rounded.
- We’ve now very clearly highlighted where our problem areas lie.
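The logic behind this view (subtract Order Date from Ship Date, count the orders for each number of days, then take the percent of total) can also be sketched in Python. The dates below are made up for illustration, so the delayed share they produce is not the 9% from the tutorial data. The names `days_to_ship` and `POLICY_DAYS` are our own.

```python
from collections import Counter
from datetime import date

# Hypothetical (order_date, ship_date) pairs. Made-up values.
orders = [
    (date(2015, 5, 4), date(2015, 5, 8)),
    (date(2015, 5, 4), date(2015, 5, 6)),
    (date(2015, 5, 5), date(2015, 5, 5)),
    (date(2015, 5, 5), date(2015, 5, 11)),
    (date(2015, 5, 6), date(2015, 5, 9)),
    (date(2015, 5, 7), date(2015, 5, 9)),
    (date(2015, 5, 7), date(2015, 5, 10)),
    (date(2015, 5, 8), date(2015, 5, 12)),
    (date(2015, 5, 8), date(2015, 5, 13)),
    (date(2015, 5, 9), date(2015, 5, 11)),
]

# Assumed policy from the walkthrough: orders ship within 4 days.
POLICY_DAYS = 4


def days_to_ship(order_date, ship_date):
    """Whole days between order and shipment (Ship Date minus Order Date)."""
    return (ship_date - order_date).days


# Number of records for each days-to-ship value (the bar chart).
counts = Counter(days_to_ship(o, s) for o, s in orders)
total = sum(counts.values())

# Percent of total for each bar (the second chart).
percent_of_total = {days: 100 * n / total for days, n in counts.items()}

# Share of orders that took longer than the policy allows (the annotation).
delayed = sum(n for days, n in counts.items() if days > POLICY_DAYS)
delayed_pct = 100 * delayed / total

for days in sorted(counts):
    print(f"{days} days: {counts[days]} orders ({percent_of_total[days]:.0f}%)")
print(f"Delayed (> {POLICY_DAYS} days): {delayed_pct:.0f}%")
```

Treating the day count as a category rather than a number to be summed mirrors the step where the calculation is changed to a discrete dimension.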
Want to encode multiple values in ranges? Take Tableau reference distributions even further with our Tableau Reference Distribution tutorial.
Histograms
Histograms group data into bins and then use a bar chart to show how many instances fall into each bin. This is useful for seeing the composition and distribution of your data.
- For example, let’s look at prices binned in $10 increments. Open a new tab. Right click Unit Price > Create > Bins. Change Size of bins to 10.
- This will create a new dimension in your Dimensions pane. Drag the Unit Price (bin) variable to Rows.
- Drag Number of Records to Columns. We now have a histogram showing number of items ordered by price point.
- Turn on labels by clicking Label > Show mark labels.
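For readers who want to see the binning itself, here is a small Python sketch of the same idea: assign each price to a $10-wide bin and count the records in each bin. The prices are made up, the helper name `bin_floor` is our own, and labeling each bin by its lower edge is an assumption about how the bins are named.

```python
import math
from collections import Counter

# Hypothetical unit prices in dollars. Made-up values.
unit_prices = [4.99, 12.50, 18.00, 23.75, 9.99, 31.20, 10.00, 27.40, 15.10, 8.25]

BIN_SIZE = 10  # same bin size as in the steps above


def bin_floor(value, size):
    """Return the lower edge of the bin that contains value."""
    return math.floor(value / size) * size


# Number of records per bin, like Unit Price (bin) on Rows and Number of Records on Columns.
histogram = Counter(bin_floor(price, BIN_SIZE) for price in unit_prices)

for lower in sorted(histogram):
    label = f"${lower}-${lower + BIN_SIZE}"
    print(f"{label:>9}: {'#' * histogram[lower]} {histogram[lower]}")
```

A price that sits exactly on a boundary, such as 10.00, lands in the bin that starts at that boundary.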