To Jitter or Not to Jitter
In week 31 of the #MakeoverMonday challenge, hosted by Zen masters Andy Kriebel and Eva Murray and based on the Big Mac index, I built a visualization that used a jitter plot to show purchasing power parity among nations.
I chose a circle view for this data so that it would be easy to split the countries in the Big Mac index into those whose currencies were undervalued and those whose currencies were not.
The problem with using the circle view for this dashboard was that I had just one measure (Selected currency) to work with, so it was difficult to see all the countries in the distribution. Several data points overlap, as shown below.
While it may be easy to spot an outlier, it is not easy to see the details of most countries in the view. The way to solve this is to spread the data points using a jitter technique.
When to jitter
As stated, you may want to spread the dots in your distribution when they are packed so tightly that the overlapping ones are hard to read. Jittering separates the overlapping marks into different columns so that each one can be seen.
When not to jitter
When you’re working with geographic data, it is not advisable to jitter if the exact location of a mark is important to the analysis. If, for example, your data set measures when and where gas emissions were detected by a sensor, jittering would move overlapping marks to a new point (i.e. a different latitude and longitude) and misrepresent where the emissions were detected.
How to jitter
There are two approaches you can use to jitter overlapping marks.
The simplest approach to spreading overlapping marks is to use the hidden Random function within Tableau. To do this, open a calculation window and simply type in the function RANDOM().
After creating this, bring the calculated field to the view; in this case to the column shelf. Now notice how this spreads out the marks (countries) in the view.
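Outside Tableau, the same idea can be illustrated with a minimal Python sketch. It is not the Tableau calculation itself; the values are hypothetical, and the helper name `random_jitter` and the fixed seed are assumptions made for illustration. Each value keeps its position on the measure axis and gets a pseudo-random position on the other axis, so identical values no longer sit on top of each other.

```python
import random

def random_jitter(values, seed=0):
    """Pair each value with a pseudo-random horizontal position in [0, 1)."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the layout is repeatable
    return [(rng.random(), v) for v in values]

# Hypothetical index values; several are identical and would overlap in a plain circle view
values = [-0.4, -0.4, -0.4, 0.1, 0.1]
points = random_jitter(values)

for x, y in points:
    print(f"x={x:.3f}  y={y}")
```

The measure values are untouched; only the added horizontal position differs from mark to mark.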
By default, Tableau sets the spread of the jitter when you use the Random function. If, however, you want greater control over the jittering, you can use a parameter to control the spread and include the modulo function in the calculation.
First, create a parameter to control the spread as shown in the example below.
Now add this to the jitter calculation as shown in the screenshot.
Now you have a dynamic way to control the spread of the data by entering values in your parameter. This changes the range of the jitter axis, and the results now look like this:
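To see why a parameter combined with modulo controls the range of the jitter axis, here is a rough Python sketch. It is not the calculation from the screenshot: it assumes the jitter value is a random integer wrapped by the parameter, and the names `jitter_with_spread` and `spread` are hypothetical.

```python
import random

def jitter_with_spread(n_marks, spread, seed=0):
    """Return one jitter column per mark, limited to the range 0..spread-1."""
    if spread < 1:
        raise ValueError("spread must be a positive integer")
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable output
    # Draw a large random integer, then wrap it into the chosen range with modulo
    return [rng.randrange(1_000_000) % spread for _ in range(n_marks)]

narrow = jitter_with_spread(20, spread=3)   # marks fall into at most 3 columns
wide = jitter_with_spread(20, spread=10)    # marks fall into at most 10 columns
print(sorted(set(narrow)), sorted(set(wide)))
```

A larger `spread` allows more distinct columns, which is the effect of raising the parameter value in the view.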
The downside of using the RANDOM() function is that it is not a very reliable way to generate pseudo-random numbers, because the results may be inconsistent.
For example, when you leave the aggregation of the RANDOM() function as a sum, it sometimes gives you an “up and to the right” phenomenon that looks more like a scatterplot and implies that some type of relationship exists in the data. You may need to fix this by setting the random function as a dimension.
Tableau’s website also states that the Random function is “not tested, supported or recommended for general use and may be deprecated in future”. The website further states that:
- The RANDOM() function does not produce reliable results for analytic tasks, due to its interaction with other queries and the data cache.
- The RANDOM() function in its current form is useful to developers for testing a limited number of other functions.
Since RANDOM() may not give you reliable results, another approach to consider is the INDEX() function. You can simply type INDEX() in the calculated field, or combine it with the modulo function and a parameter just as in the RANDOM() approach.
Using the Index function creates a table calculation, and you need to set it to compute using the dimension field, in this case ‘Name’.
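The index approach can be sketched in Python as follows. This is an illustration under assumptions rather than the Tableau calculation: the country names are hypothetical, `index_jitter` and `spread` are made-up names, and sorting the names stands in for computing the index along ‘Name’.

```python
def index_jitter(names, spread):
    """Assign each mark a column from its 1-based position, wrapped by spread."""
    if spread < 1:
        raise ValueError("spread must be a positive integer")
    ordered = sorted(names)  # stands in for computing the index along 'Name' (assumption)
    # Position 1, 2, 3, ... wrapped by modulo so columns cycle through 0..spread-1
    return {name: i % spread for i, name in enumerate(ordered, start=1)}

# Hypothetical country names
countries = ["Norway", "Brazil", "Egypt", "Japan", "Chile"]
columns = index_jitter(countries, spread=3)
print(columns)
```

Unlike a random draw, the same input always yields the same columns, so the layout does not change between refreshes.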
Below is what the result of the Index approach looks like.
In my final chart view, I used the Index approach but I could easily have used the random approach as well.
One important thing to remember when using jitter is to remove the axis header for the jitter axis. Leaving it in place might make your audience curious about what the numbers represent and lead them to think some type of relationship exists in the view.
To learn more about the jitter techniques mentioned and their use cases, check out Tableau Zen master Ryan Sleeper’s blog here.
If you prefer a more advanced way to randomly generate numbers, you can also check out this blog by Tableau Zen master Joshua Milligan.
You can download a copy of my dashboard in the link here. Can you think of any other scenarios where jitter techniques can be useful? Please leave a comment below.