
Making Data-Backed Predictions Through Logistic Regression

Yogi Berra famously said that “it’s tough to make predictions, especially about the future.” I’d add that making predictions about binary events is even tougher, mainly because you can’t be partially right. I’d also add that not only is making such predictions tough, but getting them wrong can be considerably costly, especially in today’s business environment.

Business leaders need to make predictions all the time, from forecasting what their cash flow will look like over the next 13 weeks to managing inventory based on predicted sales to hiring employees to meet expected staffing demands. While predictions in business are ubiquitous, some of the most consequential ones are binary; that is, a yes-or-no, zero-or-one prediction. Will this salesperson close the deal? Will this customer churn? Will this employee leave for another opportunity? These are critically important situations that companies face, and being able to accurately predict the likelihood of these events would be nearly invaluable.

But beyond relying exclusively on the experience of leadership, how can a company use data to make empirically sound predictions in these contexts? If you recall from our previous post, linear regression can be a simple but powerful tool in predictive analytics. You can predict the likelihood of a customer churning this quarter, for example, by regressing past customer churn events against customer sales trends, satisfaction scores, pricing, competitive effects, etc. The issue with using linear regression, however, is that the predicted response is not bounded between 0 and 1. In other words, using linear regression to make this kind of prediction can produce results indicating a greater than 100% or less than 0% chance of something happening. Those results are meaningless as probabilities, which makes this approach suboptimal. Fortunately, logistic regression doesn’t come with these problems but still provides the simple interpretability and computational efficiency of linear regression.
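To see the problem concretely, here is a minimal sketch using made-up churn data (the tenure figures and outcomes are invented purely for illustration) of what happens when an ordinary least-squares line is fit to a 0/1 outcome:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: months of tenure vs. whether the customer churned (1 = churned, 0 = stayed)
tenure_months = np.array([[1], [2], [3], [5], [8], [12], [18], [24], [36], [48]])
churned = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])

# Ordinary least squares treats the 0/1 outcome as if it were a continuous response
ols = LinearRegression().fit(tenure_months, churned)

# The fitted line keeps falling as tenure grows, so a long-tenured customer
# gets a predicted "probability" below 0 -- an impossible value
print(ols.predict(np.array([[6], [60]])))
```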


Mathematically, logistic (or “logit”) regression is very similar to linear regression. Take the linear regression formula:

y = a0 + a1x1 + a2x2 + … + ajxj

Logistic regression passes that same linear equation through the logistic (sigmoid) function, where P is the probability of the event happening:

P = 1 / (1 + e^-(a0 + a1x1 + a2x2 + … + ajxj))

Equivalently, the model fits the log odds of the event as a linear function of the predictors: ln(P / (1 − P)) = a0 + a1x1 + a2x2 + … + ajxj.

These coefficients can be used to calculate probabilities: 

  • As a0 + a1x1 + a2x2 + … + ajxj approaches ∞, P approaches 1, or 100%.
  • As a0 + a1x1 + a2x2 + … + ajxj approaches −∞, P approaches 0, or 0%.
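Putting the formula to work, the sketch below fits a logistic model with scikit-learn on a small, invented churn dataset (the features and values are hypothetical, not drawn from any real engagement) and reads off probabilities that always fall between 0 and 1:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: [satisfaction score (1-10), % change in quarterly spend]
X = np.array([[2, -30], [3, -10], [4, -25], [5, -5], [3, -40],
              [8, 5], [9, 15], [7, 0], [6, 10], [9, 20]])
y = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])  # 1 = churned, 0 = retained

model = LogisticRegression().fit(X, y)

# predict_proba returns [P(retain), P(churn)] for each row;
# column 1 is the estimated churn probability, always strictly between 0 and 1
new_customers = np.array([[3, -20], [8, 10]])
print(model.predict_proba(new_customers)[:, 1])
```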

Calculating logarithmic odds is slightly more computationally intensive than determining linear coefficients, so logistic regression can take a little longer than linear regression on large datasets. Additionally, the coefficients represent log odds, and translating them into odds ratios or probabilities can be confusing for those without a background in statistics, so interpretation is a little less direct than with linear regression as well. That said, logistic regression is still a fast and relatively straightforward tool that can be powerfully effective at estimating probabilities.
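To make that last point a bit more concrete, the following sketch (using invented coefficient values, not output from a real model) shows how log-odds coefficients translate into odds ratios and probabilities:

```python
import numpy as np

# Invented fitted values, in log-odds units, purely for illustration
intercept = 1.2
coefs = np.array([-0.6, -0.05])  # satisfaction score, % change in quarterly spend

# Exponentiating a log-odds coefficient gives an odds ratio: the multiplicative
# change in the odds of churn for a one-unit increase in that feature
print(np.exp(coefs))  # roughly [0.55, 0.95]

# The logistic function maps the linear predictor back onto a 0-1 probability
customer = np.array([3, -20])                    # satisfaction of 3, spend down 20%
linear_predictor = intercept + customer @ coefs  # 1.2 - 1.8 + 1.0 = 0.4
churn_probability = 1 / (1 + np.exp(-linear_predictor))
print(churn_probability)  # about 0.60
```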

Companies have effectively used logistic regression to estimate the likelihood of:

  • A customer defecting to a competitor
  • An employee leaving for another opportunity
  • A salesperson closing on a sale
  • A borrower defaulting on their loan
  • A particular transaction being fraudulent

The output of a logistic regression model won’t be a “yes” or “no”, but rather a likelihood ranging from 0 to 100%. As a business leader, understanding the relative costs of being wrong is imperative for appropriately leveraging logit models. For example, a financial institution that erroneously denies credit to a safe borrower misses out on revenue, but that cost is undoubtedly lower than the cost it would incur by offering credit to a borrower who doesn’t pay off the loan. In this example, it may be appropriate for the lender to deny credit to potential borrowers whose model-determined probability of default is relatively high but still far below 50%. Understanding the relative costs of false positives and false negatives (Type I and Type II errors, respectively) is necessary to take full advantage of logistic regression. That said, an experienced leader’s expert opinion on a given business context, coupled with a data-oriented approach like logistic regression, can make predictions less tough, even if they are about the future.
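One way to put that cost reasoning into practice is to search for the decision threshold that minimizes expected cost on historical data rather than defaulting to 50%. The sketch below assumes hypothetical cost figures and a validation set of model-predicted default probabilities paired with observed outcomes:

```python
import numpy as np

# Hypothetical costs: funding a loan that defaults hurts far more
# than turning away a borrower who would have repaid
COST_OF_DEFAULT = 10_000
COST_OF_LOST_BORROWER = 1_500

def expected_cost(threshold, default_probs, defaulted):
    """Total cost of approving everyone whose predicted default risk is below the threshold."""
    approved = default_probs < threshold
    missed_defaults = np.sum(approved & (defaulted == 1))   # approved, then defaulted
    lost_borrowers = np.sum(~approved & (defaulted == 0))   # denied, but would have repaid
    return missed_defaults * COST_OF_DEFAULT + lost_borrowers * COST_OF_LOST_BORROWER

def best_threshold(default_probs, defaulted):
    """Pick the approval cutoff that minimizes cost on a historical validation set."""
    thresholds = np.linspace(0.05, 0.95, 19)
    costs = [expected_cost(t, default_probs, defaulted) for t in thresholds]
    return thresholds[int(np.argmin(costs))]

# default_probs would come from a fitted logistic model's predict_proba;
# with costs this lopsided, the chosen cutoff typically lands well below 0.5
```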

Logistic regression can be used to predict liquidity issues and defaults in a turnaround and restructuring engagement, to perform accurate A/R aging in a finance transformation engagement, and to predict marketplace activity in a strategy engagement. If you’re interested in how Larx can help your company with predictive analytics or a number of other challenges, contact Allan Mathis.

