When preparing for oncoming hurricanes, Walmart manages its inventory in accordance with items that typically sell quickly in emergencies. These products are what you would generally expect to need in a natural disaster situation, such as generators, batteries, first aid supplies, bottled water, and…Strawberry Pop-Tarts.
Walmart’s cluster analysis of its customers’ shopping carts in times of natural disasters has shown that consumers do tend to buy strawberry-flavored frosted pastry treats with their essential survival goods. In retrospect this makes some degree of sense, as Pop-Tarts last forever, can be eaten at any meal, and don’t require cooking. And let’s be honest, during natural disasters any concerns one might have over their abject lack of nutritional value tend to go away. (After all, if I’m hunkering down in a basement during a storm I’m not exactly going to be watching my sugar intake.)
What’s interesting is that Walmart didn’t know this beforehand. Analysts in Bentonville didn’t do statistical tests to see if Strawberry Pop-Tart sales correlate with batteries and first aid supplies. They simply analyzed the data as is, without categorizing it or hypothesizing about it beforehand. This approach is the basis of unsupervised learning algorithms, which can be a very powerful tool for revealing hidden insights in your marketplace.
While the mathematical foundation of many unsupervised algorithms has been around for many years, improvements in computing power in recent decades have allowed them to be deployed in myriad ways. A canonical story is how video streaming giant Netflix found that many of its users who liked political thrillers also liked movies starring Kevin Spacey and movies directed by David Fincher. Realizing this, Netflix decided to simply make a political thriller produced by David Fincher and starring Kevin Spacey. And thus, House of Cards was born.
House of Cards was unsurprisingly a hit for Netflix, and their proprietary unsupervised algorithms guide the recommendations you see on your home screen if you’re a user of the platform. These particular algorithms are examples of clustering analyses, which essentially group data points together based on their proximity to one another across a number of dimensions. A relatively simple example is the K-Means Clustering approach, and this algorithm has aided many companies in a number of ways, especially with customer segmentation.
Many companies simply don’t know the customer segments they’re serving. (Before performing their own analysis, Netflix didn’t know that their David Fincher fans were the same as their political thriller fans, for example.) Because it’s an unsupervised approach, K-Means Clustering simply shows which customers behave similarly, allowing for natural customer segments to be identified and analyzed.
The algorithm is quite simple and has only a few steps:
1. Start with “K” number of clusters and choose arbitrary starting cluster centers
2. Assign each data point to its nearest cluster center (or “centroid”)
3. Recalculate the cluster centroids
4. Repeat steps 2 & 3 until no data points change clusters
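To make the four steps above concrete, here is a minimal sketch in plain Python. It is an illustration rather than a production implementation: the `kmeans` function, the `customers` data, and the choice of picking random data points as starting centers are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as arbitrary starting centroids.
    centroids = rng.sample(points, k)
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 4: stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical customers: (orders per month, average basket in dollars).
customers = [(1, 20), (2, 22), (1, 25), (10, 200), (11, 210), (12, 190)]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

On this toy data the occasional small-basket shoppers and the frequent large-basket shoppers end up in separate clusters. In practice you would usually rescale the columns first, since a dimension measured in dollars can otherwise dominate one measured in order counts.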
While the algorithm is very straightforward, a somewhat more difficult question is determining the appropriate number for “K”. This can be done by referencing what’s called an “elbow chart,” where the number of clusters is on the horizontal axis and the total distance between each data point and its cluster center (usually measured as the sum of squared distances) is on the vertical axis. That total generally shrinks as clusters are added, so the point where the chart “bends,” much like an elbow, and additional clusters stop helping much is considered the optimal value for K. In the example to the right, K=3 is the appropriate choice. But as always with data analytics, understanding the business context is key as well. If your company has the budget for just two sales offices, K=2 may be the more practical choice.
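The numbers behind an elbow chart can be computed with a short loop. The sketch below is one possible approach under stated assumptions: the `points` data is made up, the compact `kmeans` helper is a hypothetical stand-in for whatever clustering routine you use, and keeping the best of 50 random starts is a precaution added here because a single run can settle on a poor grouping.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Compact k-means helper; returns (labels, centroids)."""
    centroids = random.Random(seed).sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels, centroids


def total_squared_distance(points, labels, centroids):
    """Vertical axis of the elbow chart: summed squared point-to-centroid distance."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Hypothetical data with three visible groups of three points each.
points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10), (20, 1), (20, 2), (21, 1)]

elbow = {}
for k in range(1, 7):
    # Keep the best result over several random starts for each K.
    elbow[k] = min(
        total_squared_distance(points, *kmeans(points, k, seed=s))
        for s in range(50)
    )

for k, total in elbow.items():
    print(k, round(total, 2))
```

Plotting `elbow` with K on the horizontal axis should show a steep fall up to K=3 and only small gains afterward, which is the bend the chart is named for.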
Phrases like “unsupervised k-means algorithmic clustering” can sound intimidating for organizations that haven’t significantly invested in data analytics in the past, and frankly, that’s understandable. But the concept is rather simple: your data likely exhibits patterns and phenomena that have gone unnoticed, and data analytics can be a powerful tool for uncovering them.
Segmenting your customers, determining the optimal allocation of your sales resources, and cross-selling multiple products that have similar sales patterns are just some examples of how clustering algorithms can be leveraged as part of a strategy consulting engagement. If you’re interested in how Larx can help your company use analytics to optimize its strategy, contact Allan Mathis.