A recent stats-themed XKCD comic made the rounds:
and Tableau Zen Master Mike Cisneros nailed it when he said:
If you don’t feel this way when you use the Analytics panel in Tableau, then I don’t believe you. https://t.co/NxlKtMY90V
— Mike Cisneros (@mikevizneros) September 19, 2018
What’s most hilarious about the comic is that the data points are exactly the same in each frame.
And yet, it’s true. Statistics or predictive analytics in Tableau can seem daunting. Some people feel they need an advanced degree to use statistics effectively. And since they don’t, they just avoid them altogether.
Others feel that you can just drop in a line that looks cool and call it good. That works until someone starts asking hard, probing questions about your analysis during a meeting, in front of your boss and colleagues, and you suddenly regret sleeping through that 8am stats class in college.
So, we’ve written this definitive guide to linear regression in Tableau.
The goal of this series of blog posts is to be a plain-English resource on linear regression models in Tableau, one of the most common forms of predictive analytics out there.
Linear Regression (aka the Trend Line feature in the Analytics pane in Tableau):
At a high level, a “linear regression model” draws a straight line through a set of data points, positioned to minimize the overall distance between the points and the line (technically, the sum of the squared vertical distances). The better the line fits the points, the better it can be used to predict new values. In other words, the smaller the distance from the points to the line, the more accurate your projections are likely to be.
See the following:
The model on the left would be considered a strong model, usable for predicting other values along that line, while the model on the right would produce very unreliable predictions.
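To make the idea of "fit" concrete outside of Tableau, here is a minimal sketch in plain Python. It fits a least-squares line and computes R-squared, one common measure of how closely the points follow the line (1.0 is a perfect fit, values near 0 mean the line explains almost nothing). The two data sets and the names `fit_line` and `r_squared` are made up for illustration; they are not taken from the charts in this post.

```python
# A minimal least-squares sketch. All numbers below are invented for illustration.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys that the fitted line accounts for."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [1, 2, 3, 4, 5, 6]
tight_ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]  # points that hug a line
loose_ys = [5.0, 1.0, 9.0, 3.0, 11.0, 4.0]   # points scattered widely

results = {}
for label, ys in (("tight", tight_ys), ("loose", loose_ys)):
    slope, intercept = fit_line(xs, ys)
    results[label] = (slope, intercept, r_squared(xs, ys, slope, intercept))
    print(label, [round(value, 3) for value in results[label]])
```

The "tight" series plays the role of the strong model and the "loose" series the weak one: both get a line, but only the first line is worth predicting from.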
In a linear regression, you examine the relationship between a “dependent variable” (the metric on Rows in Tableau) and an “independent variable” (the metric on Columns in Tableau). The independent variable is the value that changes, while the dependent variable is the value that responds to that change.
For example, say you are tracking the average Body Mass Index (BMI) by age. The age buckets would be our independent variable, the change we are looking at. The dependent variable would be the average BMI. We’d want to know how much the average BMI increases or decreases as age increases, and how strong that correlation is.
Use Cases for Linear Regression Models
There are three key uses for linear regression models:
Determining the strength of the relationship between variables (how much change one drives in the other)
In other words, how much does the change in one thing affect another? Is the relationship between the two metrics strong or weak? Is the relationship positive (as one thing goes up so does another) or negative (as one goes up, the other goes down)?
See the following example where we compare Discounts vs. Profit Ratio using the Superstore data set that ships with Tableau. Here we compare the strength and direction of the relationship between the two variables.
We can even parameterize the variables to see how much impact any one variable or metric will have on another.
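If you want to sanity-check strength and direction outside of Tableau, the Pearson correlation coefficient is one common way to do it. The sketch below is plain Python under a few assumptions: the discount and profit ratio values are invented (they are not the Superstore numbers), and the 0.7 and 0.3 cut-offs in the hypothetical `describe` helper are only a rule of thumb, not a standard.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: -1 is a perfect negative, +1 a perfect positive relationship."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe(r):
    """Label direction and strength. The thresholds are an illustrative rule of thumb."""
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    if abs(r) >= 0.7:
        strength = "strong"
    elif abs(r) >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    return direction, strength


# Invented values: average discount rate vs. profit ratio for six product groups.
discounts = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
profit_ratios = [0.30, 0.22, 0.15, 0.02, -0.10, -0.25]

r = pearson_r(discounts, profit_ratios)
print(round(r, 3), describe(r))
```

The sign of `r` gives the direction and its size gives the strength, which are the two questions this use case asks.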
Trend and time forecasting
In other words, based on historical patterns over time what do we project will happen in the future?
See the following example looking at Sales by Month using the same Superstore data. We can reasonably expect Sales values to fall between the outside bands and close to the trend line in the middle in the next 12 months.
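For readers who like to see the mechanics, here is a rough sketch of projecting a trend line forward. It assumes made-up monthly sales figures, uses the month's position (0, 1, 2, ...) as the independent variable, and draws a crude band of plus or minus two residual standard errors around the line. That band is an assumption of this sketch to convey the idea; it is not how Tableau computes the bands in the chart.

```python
import math


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented monthly sales (in thousands), oldest month first.
monthly_sales = [20.0, 22.5, 21.0, 25.0, 26.5, 25.5, 29.0, 31.0, 30.0, 33.5, 35.0, 34.0]
months = list(range(len(monthly_sales)))  # month index is the independent variable

slope, intercept = fit_line(months, monthly_sales)

# Residual standard error: the typical distance of a point from the line,
# using n - 2 degrees of freedom because two parameters were estimated.
residuals = [y - (slope * x + intercept) for x, y in zip(months, monthly_sales)]
rse = math.sqrt(sum(res ** 2 for res in residuals) / (len(months) - 2))

# Project the next 12 months as (month index, low, center, high).
forecast = []
for step in range(1, 13):
    x = months[-1] + step
    center = slope * x + intercept
    forecast.append((x, center - 2 * rse, center, center + 2 * rse))

for x, low, center, high in forecast:
    print(x, round(low, 1), round(center, 1), round(high, 1))
```

A straight-line projection like this ignores seasonality, so treat it as a baseline rather than a full forecast.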
We can even adjust the type of regression being calculated to see whether a different model is a better fit to the data. In the image below, we’ve adjusted the regression line from Linear to Polynomial of Degree 2, which gives us a slightly better fit than the linear model. But we want to avoid the issue that XKCD points out of just slapping on a line that looks cool. We need to know whether it’s actually more accurate. In our next post, we’ll give you the tools to assess statistical significance and accuracy, all using Tableau’s built-in predictive analytics tools.
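To show what comparing a linear and a degree-2 polynomial fit can look like, here is a small sketch in plain Python. It assumes invented, gently curved data and a hand-rolled `polyfit` helper (a hypothetical name, written here so the example needs no extra libraries) that solves the least-squares normal equations.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    m = degree + 1
    # Normal equations (X^T X) c = X^T y, where X[i][j] = xs[i] ** j.
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in range(m - 1, -1, -1):
        known = sum(a[i][k] * coeffs[k] for k in range(i + 1, m))
        coeffs[i] = (b[i] - known) / a[i][i]
    return coeffs


def predict(coeffs, x):
    return sum(c * x ** power for power, c in enumerate(coeffs))


def r_squared(xs, ys, coeffs):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Invented data with a visible upward curve.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 2.9, 5.1, 9.2, 13.8, 19.1, 26.0, 33.2]

r2_linear = r_squared(xs, ys, polyfit(xs, ys, 1))
r2_quadratic = r_squared(xs, ys, polyfit(xs, ys, 2))
print(round(r2_linear, 4), round(r2_quadratic, 4))
```

One caution: on the data used to fit it, a higher-degree polynomial never scores a lower R-squared than a lower-degree one, so a better R-squared alone does not prove the fancier line is the right choice.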
Estimating an unknown value or performing what-if analysis
In other words, if we change Metric 1 by x amount, how much will Metric 2 change? This enables us to use existing data to estimate another value. This is useful in what-if analysis. For example, if we change prices, what will happen to sales? Is this a significant impact or not?
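Here is a minimal what-if sketch in plain Python, assuming made-up price and unit sales figures. The slope of the fitted line answers "how much does Metric 2 change per unit of Metric 1," and the full line gives an estimate at a value we have not observed. The helper names `expected_change` and `estimate` are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented history: unit price and units sold at that price.
prices = [8.0, 9.0, 10.0, 11.0, 12.0, 13.0]
units_sold = [520, 480, 455, 410, 380, 330]

slope, intercept = fit_line(prices, units_sold)


def expected_change(price_change):
    """Estimated change in units sold for a given change in price."""
    return slope * price_change


def estimate(price):
    """Estimated units sold at a given price."""
    return slope * price + intercept


print(round(slope, 2))                  # units gained or lost per 1.00 of price
print(round(expected_change(1.5), 1))   # what if we raise the price by 1.50?
print(round(estimate(14.0), 1))         # estimate at a price outside the observed range
```

Estimates inside the observed range (8 to 13 here) are generally more trustworthy than ones outside it, such as the 14.0 example, because the line has no data there to keep it honest.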
In the image below, we’ve used the same chart as above but changed it from linear to polynomial to get a better estimate of the trend. You’ll notice that we have lost our helpful calculations indicating the strength and direction of the relationship, as calculating those becomes more of a challenge. Fear not, however: we will tackle these in a subsequent blog post!
Altogether, we have several possibilities for predictive analytics within Tableau using regression analysis. We can project forward in time, we can compare the impact of one metric on another, and we can make inferences about data we don’t currently have in order to run more accurate what-if scenarios.
There are all sorts of applications to these possibilities. Here are just a few:
- What’s the correlation between social shares or traffic from social media and form submissions on our site?
- Is there a statistically significant impact from our new email marketing campaign on purchases?
- If we could rank an organic search term in the #1 slot on Google, how much traffic could we expect our site to receive?
- If we increase marketing budget in this channel, how many additional leads could we expect?
- Do conversations in social media about certain product features or services offered drive sales? If we could increase social conversations by X amount, how much would sales increase?
- Does usage of our software, or particular features of our software, correlate with higher customer retention, or conversely, more customer churn?
- For other examples, check out this article on using predictive analytics for product research and development
- If we increase price, what would be the impact to sales, or to profits, or to inventory levels?
- What do we project sales will be in the next six months?
- How are discounts correlated with profit margins?
- What do we forecast our pipeline to be for total leads vs marketing qualified leads vs actual sales ready prospects vs actual sales?
- Are certain actions from us, say sending a whitepaper in an email, or from a prospect, say visiting our website multiple times in a given timeframe, strong predictors of intent to purchase?
- Is there a correlation between how often or how much students drink alcohol and their GPA, attendance, and graduation rates?
- Is there a correlation between amount of grant money and volume of publications or awards?
- Which variables correlate best with commercializing university research into viable market products?
- What is most predictive of attracting new students to our university?
- What is the impact of this new molecule on drug test results?
- Can we correlate demand for drugs or products with Google searches?
- Can we project when our competitors are likely to bring new products to market based on historical data of patent applications and when products hit retail shelves?
- What is the impact of soil conditions on crop yields per hectare?
- Can we project the amount of harvest we can achieve for differing growing seasons and regions?
- Which variables are most correlated with beating our control experiments?
There are many others. If your field isn’t mentioned, drop us a note in the comments or send us an email and we can provide you some ideas and examples. We’d love to hear from you. We also offer Tableau Training courses. Feel free to explore and download the charts from Tableau Public. Stay tuned for more blog posts in our linear regression in Tableau series!