In 1884, the Victorian mathematician and schoolmaster Edwin Abbott published his classic “Flatland: A Romance of Many Dimensions.” It’s the only novel I’ve ever read that takes place in an entirely two-dimensional reality.
Humans are of course three-dimensional beings, but the characters of “Flatland” are instead two-dimensional beings defined by their geometrical shapes: lines, squares, polygons etc.
These talking geometrical figures live their whole lives on a plane, and thus have no understanding of what three-dimensional reality is like. The main character, a square, eventually travels to an even more limited reality, the one-dimensional world of Lineland:
Source: Edwin Abbott, “Flatland” (London: Seeley and Co, 1884)
Why am I bringing up a Victorian novel about flat characters in an article about customer segmentation? The answer is that basic methods for customer segmentation reduce your customers to something more like the two-dimensional characters in Flatland than multidimensional people.
For example, if you’re using pivot tables in Excel to derive segments from customer data, you may literally view your customers as 2D people. This is because pivot tables are primarily useful for visualizing combined trends across two dimensions of your data, one in the row label and one in the column label:
(Used with permission from Microsoft)
Yes, you can add many more dimensions to the column and row label fields in the pivot table builder. However, when you do this, your pivot table quickly becomes illegible, and you lose sight of patterns across multiple dimensions.
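To make this concrete, here is a minimal sketch in Python of what a pivot table does when you count records by a combination of dimensions. The records, field names and the pivot_counts helper are all hypothetical, invented purely for illustration.

```python
from collections import Counter

# Hypothetical CRM records; field names and values are illustrative only.
customers = [
    {"industry": "manufacturing", "size": "small", "region": "east", "plan": "basic"},
    {"industry": "manufacturing", "size": "large", "region": "west", "plan": "pro"},
    {"industry": "retail", "size": "small", "region": "east", "plan": "pro"},
    {"industry": "retail", "size": "small", "region": "west", "plan": "basic"},
]

def pivot_counts(records, keys):
    """Count records for each combination of the chosen dimensions."""
    return Counter(tuple(r[k] for k in keys) for r in records)

# Two dimensions: a small, readable table.
two_d = pivot_counts(customers, ["industry", "size"])

# Four dimensions: one cell per combination of values.
four_d = pivot_counts(customers, ["industry", "size", "region", "plan"])

print(len(two_d), "cells with two dimensions")
print(len(four_d), "cells with four dimensions")
```

With two dimensions, the small retailers stand out as a group. With four, every record in this toy data set lands in its own cell, so there is no longer a pattern to read.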
Escape From Flatland With Clustering
Advanced marketers are now solving this limitation of basic customer segmentation methods with clustering algorithms powered by machine learning. These algorithms are vastly expanding the number of dimensions that marketers can analyze in customer data.
One consequence of this evolution is that the customer personae derived through segmentation analysis are finally evolving beyond their two-dimensional existence in the Flatland of Excel pivot tables.
In this article, we’ll take a look at the ways clustering goes beyond more basic segmentation approaches. We’ll also detail advanced approaches to implement customer segmentation clustering in your marketing technology stack.
The Limits of Basic Segmentation Approaches With Excel
This is the most important difference between basic segmentation and segmentation using clustering:
- In basic segmentation, you already know which dimensions you want to segment on, whereas
- In segmentation with clustering, you want the algorithm to tell you which dimensions in your data are more important than others.
In my personal experience, I’ve found deriving segments from our highly structured customer relationship management (CRM) data to be fairly easy using pivot tables. This is because dimensions in our CRM data are pre-standardized to ease segmentation analysis.
But what about less structured data sets?
While we have good methodologies in place for understanding customers, understanding the behavior of visitors to our website is much more complex.
In the realm of web analytics, we lack the usual standardized dimensions for business size, number of employees etc. Instead, we have to develop segments that factor in dozens of KPIs related to bounce rate, click through and conversion rates, time on page etc. We haven’t been able to do this with pivot tables.
Site visitors thus remain much more of an unknown “X” for us than actual customers, because we don’t always know in advance which KPIs are the most important.
Clustering comes into play in situations like this, when:
- You don’t want to assume at the outset which dimensions are most important
- You have too many dimensions to build simple, 2D segments, e.g., “manufacturers with < 250 employees” or “single mothers with > $100,000 in annual income.”
How Clustering for Customer Segmentation Works
So what is clustering? Robert Moreira, senior product manager for machine learning, deep learning and cognitive computing at SAS (a leading developer of solutions for business analytics), explains:
“Clustering algorithms use mathematical techniques to separate data points using a distance-based approach (looking at the distance within the data space between the two points) or a density-based approach (looking at how densely packed the data points are within the data space).”
In other words, clustering algorithms can segment, or cluster, your data for you. As Moreira explains, the algorithms output “groups that are homogeneous within themselves but very different from each other.”
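One way to picture “homogeneous within themselves but very different from each other” is to compare distances inside a group with distances between groups. The sketch below assumes plain Euclidean distance and uses made-up visitor data with three KPIs already scaled to a 0-1 range; real tools may measure distance differently.

```python
import math
from itertools import combinations, product

def euclidean(a, b):
    """Straight-line distance between two points in any number of dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Hypothetical visitors: (bounce rate, click-through rate, time on page), scaled to 0-1.
cluster_a = [(0.10, 0.80, 0.70), (0.15, 0.75, 0.65), (0.12, 0.85, 0.72)]
cluster_b = [(0.90, 0.10, 0.05), (0.85, 0.15, 0.10), (0.88, 0.12, 0.08)]

# Average distance between pairs of visitors inside each cluster.
within_a = mean(euclidean(p, q) for p, q in combinations(cluster_a, 2))
within_b = mean(euclidean(p, q) for p, q in combinations(cluster_b, 2))

# Average distance between visitors in different clusters.
between = mean(euclidean(p, q) for p, q in product(cluster_a, cluster_b))

print(f"within A: {within_a:.3f}, within B: {within_b:.3f}, between: {between:.3f}")
```

Good clusters show small within-cluster distances and a much larger between-cluster distance, which is what this toy data produces.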
This may be difficult to understand in the abstract, so let’s take a look at the following screenshot of SAS Visual Analytics and you’ll quickly see how it differs from pivot tables:
Labels such as “inferred approx age,” “inferred education” and “inferred employment” are some of the dimensions being used in these clusters—but as you can see, the elliptical clusters actually cut across a number of other dimensions in the background.
In actuality, these ellipses are multidimensional: They cut across a number of different dimensions in order to group together the most similar data records according to the algorithm’s statistical assessment of the data.
You simply input the number of clusters you want, and the algorithms automatically group your data into that number of clusters using the two rules Moreira reviewed:
- Data records within the same cluster must be as similar as possible.
- Data records within different clusters must be as different from each other as possible.
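For readers who want to see the mechanics, here is a minimal sketch of k-means, one common distance-based clustering algorithm. It is not the implementation used in SAS Visual Analytics, which this article doesn’t describe. The data points are invented, and the starting centers are chosen with a simple “farthest-first” rule (an assumption made here so the example is repeatable; many tools start from random points instead).

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Average position of a group of points, dimension by dimension."""
    return tuple(sum(dim) / len(points) for dim in zip(*points))

def farthest_first(points, k):
    """Pick k starting centers: the first point, then repeatedly the point
    farthest from every center chosen so far (illustrative choice)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centers)))
    return centers

def k_means(points, k, iterations=100):
    centers = farthest_first(points, k)
    labels = None
    for _ in range(iterations):
        # Step 1: assign each record to its nearest center.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Step 2: move each center to the middle of its assigned records.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels, centers

# Hypothetical customers described by two scaled dimensions.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = k_means(points, k=2)
print(labels)
```

The only input besides the data is k, the number of clusters. Everything else follows from repeating the two steps until the groups stop changing.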
With clustering, the problem no longer lies in making sure you’re segmenting on the best dimensions—the algorithms figure this part out. Instead, the problem becomes determining the right number of clusters for your dimensions.
Solving for K Is the Magic of Clustering
The k-means algorithm, a widely used distance-based clustering method in which “K” stands for the number of clusters, essentially transforms your business question, i.e., “which dimensions are most valuable in understanding problems?” into a statistical question, i.e., “what is the optimal number of clusters for segmenting my data?”
Example Algorithms for Determining K
This statistical question is tough to solve, but thankfully, BI vendors offer tools to help.
Ilknur Kabul, senior manager of the machine learning algorithms group at SAS, explains that “At SAS, we have two methods for solving this problem: cubic clustering criterion (CCC), a method that estimates the number of clusters for you, and aligned box criterion (ABC), which leverages parallel computing to evaluate the reference distribution based on the input data to estimate the number of clusters.”
(Essentially, the reference distribution refers to how the data could hypothetically be distributed along the dimensional axes with the fewest possible assumptions.)
As far as the ABC method goes, Kabul explains that essentially, “we can estimate how many clusters you have in a data set and how confident we are in these clusters by looking at the reference distribution—it’s really based on your input data set.”
Thankfully, you don’t need to fully understand these methods, which will vary to some extent based on which tool you’re using for analysis.
By utilizing statistical methods such as CCC and ABC, you can now determine the optimal number for “K” in your k-means algorithm.
Moreira adds the important caveat that “the end goal is to have interpretable groups.”
If you make K too large, you’ll end up with too many clusters to use. Additionally, if you add too many dimensions when clustering, you’ll end up with segments that are too complex to understand.
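CCC and ABC are not reproduced here. As a much cruder stand-in, the sketch below uses a simple “elbow” heuristic: run k-means for K = 1, 2, 3 and so on, and stop adding clusters once an extra cluster no longer reduces the within-cluster sum of squares by much. The data, the 10 percent threshold (min_gain) and the farthest-first starting rule are all assumptions made for illustration.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    return tuple(sum(dim) / len(points) for dim in zip(*points))

def farthest_first(points, k):
    # Deterministic starting centers (illustrative choice, not a SAS method).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centers)))
    return centers

def k_means(points, k, iterations=100):
    centers = farthest_first(points, k)
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centers[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels, centers

def within_cluster_ss(points, k):
    """Total squared distance from each record to its cluster center."""
    labels, centers = k_means(points, k)
    return sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))

def choose_k(points, max_k, min_gain=0.10):
    """Return the first K where adding one more cluster improves the fit
    by less than min_gain of the K=1 total (hypothetical threshold)."""
    wcss = [within_cluster_ss(points, k) for k in range(1, max_k + 1)]
    for k in range(1, max_k):
        gain = (wcss[k - 1] - wcss[k]) / wcss[0]
        if gain < min_gain:
            return k, wcss
    return max_k, wcss

# Hypothetical data with three visibly separate groups.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
    (5.0, 9.0), (5.1, 9.2), (4.9, 8.8),
    (9.0, 1.0), (9.2, 1.1), (8.8, 0.9),
]
best_k, wcss = choose_k(points, max_k=5)
print("suggested K:", best_k)
```

A heuristic like this only tells you where the statistical fit levels off. Whether the resulting groups are interpretable is still a judgment you have to make.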
Putting Clusters to Work
Traditionally, customer segmentation has been a retrospective task in marketing—you collect a few months’ worth of customer data, segment it and then decide what to do down the road.
This approach to customer segmentation is thus primarily strategic: You’re not making your segments work as part of the actual technologies you use to interact with customers.
Clustering, however, is powered by machine learning: a technology in which algorithms are iteratively fitted to data sets over time. This makes clusters much more flexible than traditional segments for responding to customers in real time.
Moreira explains that in this approach, clustering is combined with a technology known as a “business rules engine,” which recommends the appropriate action to take based on the data. (E.g., recommending a product to capitalize on a cross-sell opportunity, discounting a product to keep the customer from abandoning an order etc.)
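In its simplest form, a business rules engine is an ordered list of conditions paired with actions. The sketch below is a toy illustration of that idea, not how any particular vendor’s engine works; the rule conditions, field names and action names are all hypothetical.

```python
# Each rule pairs a condition on the visitor's data with a recommended action.
# Rules are checked in order, and the first match wins (an assumption here).
RULES = [
    (lambda v: v["cart_abandon_risk"] > 0.7, "offer_discount"),
    (lambda v: v["owns_product_a"] and not v["owns_product_b"], "recommend_product_b"),
]

def recommend_action(visitor, rules=RULES, default="no_action"):
    """Return the action of the first rule whose condition the visitor meets."""
    for condition, action in rules:
        if condition(visitor):
            return action
    return default

# Hypothetical visitors.
hesitant = {"cart_abandon_risk": 0.9, "owns_product_a": True, "owns_product_b": False}
loyal = {"cart_abandon_risk": 0.1, "owns_product_a": True, "owns_product_b": False}
new = {"cart_abandon_risk": 0.2, "owns_product_a": False, "owns_product_b": False}

for visitor in (hesitant, loyal, new):
    print(recommend_action(visitor))
```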
Using machine learning-driven clustering, Moreira explains that “We can create a model that gives you a ‘propensity to buy’ score.” In other words, you group customers into clusters based on how likely it is that they’ll buy a given product.
Moreira continues that “from that model, we create what’s called ‘score code.’ In essence, this score code is just SQL code that serves to filter the data down in order to determine who a site visitor/mobile app user/etc. is. For instance, if a visitor’s age, income and zip code fall into predefined ranges, the model will output a ‘propensity to buy’ score of 50 percent.”
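Moreira describes score code as SQL. To stay consistent with the other examples, here is the same filtering idea sketched in Python. The age and income ranges, the zip codes and the 10 percent fallback score are invented for illustration; only the 50 percent figure comes from Moreira’s example.

```python
def propensity_to_buy(visitor):
    """Return a propensity-to-buy score between 0 and 1.

    All ranges below are hypothetical placeholders; in practice they
    would come from a model fitted to your own customer data.
    """
    in_age_range = 30 <= visitor["age"] <= 45
    in_income_range = 60_000 <= visitor["income"] <= 120_000
    in_zip_codes = visitor["zip_code"] in {"02139", "02140"}

    if in_age_range and in_income_range and in_zip_codes:
        return 0.50  # the score from Moreira's example
    return 0.10      # invented fallback score for everyone else

# Hypothetical visitors.
match = {"age": 38, "income": 85_000, "zip_code": "02139"}
no_match = {"age": 22, "income": 85_000, "zip_code": "02139"}

print(propensity_to_buy(match))
print(propensity_to_buy(no_match))
```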
What this means is that, if you can combine demographic data from your CRM system or your website with customers’ behaviors on your website, you can now make the best offer to each customer in real time based on her or his likelihood to buy.
What’s more, the machine learning algorithms continue to respond to the customer’s unique behaviors over time, rather than just assigning the customer to a cluster and calling it a day: “Once we’ve preprocessed these scores, we can apply real-time decision changes as the customers navigate the website if they don’t behave in accordance with the buckets they’ve been put into.”
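A simple way to picture this kind of real-time adjustment is to re-check which cluster a visitor sits closest to each time new behavior comes in. The sketch below assumes two precomputed cluster centers and two made-up KPIs; the names and numbers are hypothetical, and a production system would be considerably more involved.

```python
def nearest_cluster(features, centroids):
    """Return the name of the cluster whose center is closest to the visitor."""
    return min(
        centroids,
        key=lambda name: sum((f - c) ** 2 for f, c in zip(features, centroids[name])),
    )

# Hypothetical cluster centers: (bounce rate, conversion rate), scaled to 0-1.
centroids = {
    "browsers": (0.8, 0.1),
    "buyers": (0.2, 0.7),
}

# The same visitor at two points in a session (made-up numbers).
at_arrival = (0.75, 0.15)
after_browsing = (0.30, 0.60)

first_bucket = nearest_cluster(at_arrival, centroids)
updated_bucket = nearest_cluster(after_browsing, centroids)

if updated_bucket != first_bucket:
    print(f"visitor moved from {first_bucket} to {updated_bucket}")
```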
Real-time optimization of customer experience through clustering will be a game changer across digital businesses. Early adopters stand to win big, so moving from pivot tables to a BI product that supports clustering is worth the hassle of dealing with the learning curve.
Beyond this advanced use case, we can also see the immense payoff in applying clustering to data from your web analytics platform to solve the problem I mentioned earlier, i.e., developing a segmentation strategy for optimizing your website for the needs of different visitors.
And finally, clustering is one of the best techniques currently available to marketers for understanding your customers as multidimensional beings rather than Flatlanders.
It’s important to note that Excel add-ins such as the Cluster Wizard enable clustering. As we’ve seen, however, additional safeguards are needed in order to make sure your clusters are optimized and interpretable, and it’s here where dedicated BI tools for statistical analysis shine.
If you’d like to discuss the challenges and benefits of adopting algorithmic segmentation in your marketing department, you can email me at firstname.lastname@example.org.