Seven questions for better algorithms
Leaders need to understand how recommendation systems are shaping customer interactions.
Where leaders might once have found themselves responsible for customer net promoter scores or retention rates, in the digital era they are likely to be working with data scientists and technical specialists on the algorithmic recommendation systems that shape customer experiences – and drive company performance. Whether you’re personalizing your website’s front page, tailoring marketing emails, or adding “You might also like” suggestions to product pages, the ability to understand customer behaviour data is invaluable. Data-driven algorithmic recommendations are powerful tools to help you increase customer engagement, sales, and other important metrics.
Basic recommendation systems are relatively straightforward to build: Google’s free recommendation systems course takes a mere four hours to complete (given prerequisite technical skills). Any team with enough customer data and a technically oriented person can spin up a basic recommendation system. Yet ensuring that recommendation systems are used effectively and align with the business’s aims is less straightforward. For managers, the lure of machine learning and big data is becoming irresistible, but dangers accompany any siren’s song. Managers and data scientists should explore seven crucial questions together to deliver the algorithmic recommendations that best meet business goals.
1 How are we quantifying users’ interactions with items?
Classically, recommendation systems were trained on explicit rating data: for example, Maria gave her favourite movie five stars. Now, firms are increasingly using implicit data, such as click counts and watch or visit time. The choice of how to quantify interactions should reflect what you value: if you don’t want to promote clickbait, for instance, don’t use click counts. Once you decide how to measure positive interactions, ask the follow-up question: what recommendation system model is best-suited to your data?
Good models of implicit data will try to capture the confidence you have in your observations: when a user interacts with an item, you can be pretty confident they like it. But most models also need examples of items that users don’t like, and there are lots of reasons a user might not interact with an item.
If we don’t have data on items a user doesn’t like (see Question 2, below), we can randomly pick items that they haven’t interacted with (yet) and say the user doesn’t like them, with a large amount of uncertainty.
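As a rough illustration of that idea, the sketch below turns implicit interactions into weighted training rows. The function name, the weights (1.0 for observed interactions, 0.1 for sampled negatives) and the sample data are all assumptions made for this example, not values from any particular system.

```python
import random

def build_training_examples(interactions, all_items, negatives_per_positive=1,
                            positive_weight=1.0, negative_weight=0.1, seed=0):
    """Return (user, item, label, weight) rows from implicit feedback.

    interactions: dict mapping each user to the set of items they engaged with.
    The weights are illustrative assumptions: high confidence for observed
    interactions, low confidence for randomly sampled 'dislikes'.
    """
    rng = random.Random(seed)
    rows = []
    for user, seen in interactions.items():
        # Observed interactions: we are fairly confident the user likes these.
        for item in sorted(seen):
            rows.append((user, item, 1, positive_weight))
        # Unobserved items: sample some and label them as dislikes,
        # but with a low weight to reflect our uncertainty.
        unseen = sorted(set(all_items) - seen)
        k = min(len(unseen), negatives_per_positive * len(seen))
        for item in rng.sample(unseen, k):
            rows.append((user, item, 0, negative_weight))
    return rows

# Hypothetical data: two users and a six-item catalogue.
interactions = {"maria": {"a", "b"}, "li": {"c"}}
for row in build_training_examples(interactions, list("abcdef")):
    print(row)
```

A model trained on these rows would then weight each example by its confidence, so a sampled negative counts for far less than a real interaction.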
2 When a user is recommended something but does not engage with it, are we saving that information?
If you can afford the storage (hint: you probably can), you should save these ‘negative impressions’. It is shocking how often this data is discarded. At the very least, you can use it to add a simple check to prevent irrelevant material from being recommended over and over again. At best, you can understand where your recommendations haven’t worked, and develop more nuanced representations of customers’ preferences.
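The ‘simple check’ might look something like the sketch below. The log format (user, item, engaged) and the threshold of three ignored impressions are assumptions chosen for illustration.

```python
from collections import Counter

def filter_ignored(candidates, impression_log, user, max_ignored=3):
    """Drop candidates that this user has already been shown and ignored.

    impression_log: iterable of (user, item, engaged) tuples, where engaged
    is False for a negative impression. The format is a hypothetical one.
    """
    ignored = Counter(
        item for u, item, engaged in impression_log
        if u == user and not engaged
    )
    # Keep an item only if it has been ignored fewer than max_ignored times.
    return [item for item in candidates if ignored[item] < max_ignored]

# Hypothetical log: Maria ignored item "x" three times and item "y" once.
log = [
    ("maria", "x", False), ("maria", "x", False), ("maria", "x", False),
    ("maria", "y", False), ("maria", "z", True),
]
print(filter_ignored(["x", "y", "z"], log, "maria"))
```

None of this is possible if the negative impressions were never stored in the first place.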
3 What evaluation metrics are we using?
Before any machine learning solution is deployed at scale, it needs to be tested. Quality evaluation requires a lot of thought. The key is to focus on what matters to your customers.
For example, companies often focus on ‘value prediction’, such as predicting whether the customer will give a product 5 stars, typically scored with a metric such as root-mean-square error (RMSE). But customers probably don’t care whether you predict that they’ll like something at a 4.5 level or a 4.6. They typically care more about whether they’ll like something relative to your other offerings. Ranking-based metrics (such as normalized discounted cumulative gain, or NDCG) are designed to capture this. For more on evaluation metrics, I highly recommend How to Measure Anything by Douglas Hubbard and Trustworthy Online Controlled Experiments by Ron Kohavi, Diane Tang and Ya Xu.
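To see how the two kinds of metric can disagree, consider the minimal sketch below. The ratings and the two sets of predictions are invented for illustration, and the NDCG here uses raw ratings as gains, which is one of several common variants.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def dcg(relevances):
    """Discounted cumulative gain: later positions count for less."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(predicted_scores, true_relevances):
    """DCG of the predicted ordering divided by DCG of the ideal ordering."""
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_relevances[i] for i in order]
    ideal = dcg(sorted(true_relevances, reverse=True))
    return dcg(ranked) / ideal if ideal > 0 else 0.0

# Hypothetical true ratings for three items, and two models' predictions.
true_ratings = [5, 4, 1]
model_a = [4.0, 3.0, 0.0]   # every rating off by one star, order correct
model_b = [4.4, 4.6, 1.0]   # ratings closer, but the top two are swapped

for name, preds in [("A", model_a), ("B", model_b)]:
    print(name, round(rmse(preds, true_ratings), 3),
          round(ndcg(preds, true_ratings), 3))
```

In this example, model B wins on RMSE while model A wins on NDCG, so the choice of metric decides which model looks better.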
4 How are we treating new (or ‘cold-start’) users and items?
Basic recommendation systems cannot recommend items to users who were not in the training data (that is, the data used to train and calibrate the system). The same goes for new items, so by default, they won’t be recommended. In practice, firms typically have backup methods for recommending content until there have been enough user engagements for it to be integrated into the full personalized system. New users may see recommendations for popular items, for instance, or personalized features may be hidden temporarily. One alternative is to ask new users questions to get them started: YouTube Music asks new users to select artists they like, for instance.
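A popularity fallback of this kind can be sketched in a few lines. The `personalized` lookup stands in for the output of a trained model, and the minimum of five interactions is an arbitrary threshold picked for the example.

```python
from collections import Counter

def recommend(user, personalized, interaction_log, k=3, min_interactions=5):
    """Return personalized picks when possible, popular items otherwise.

    personalized: dict mapping user -> ranked item list, a stand-in for the
    output of a trained model (hypothetical).
    interaction_log: list of (user, item) pairs.
    """
    history = [item for u, item in interaction_log if u == user]
    if user in personalized and len(history) >= min_interactions:
        return personalized[user][:k]
    # Cold start: recommend the most popular items the user has not seen.
    popularity = Counter(item for _, item in interaction_log)
    seen = set(history)
    ranked = [item for item, _ in popularity.most_common() if item not in seen]
    return ranked[:k]

# Hypothetical log: item "x" is the most popular, then "y", then "z".
log = [("ana", "x"), ("ben", "x"), ("cy", "x"),
       ("ana", "y"), ("ben", "y"), ("cy", "z")]
print(recommend("new_user", {}, log, k=2))
```

The same pattern extends to new items, which need a separate route into recommendations (an editorial slot, say) until they have gathered enough engagement.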
5 How frequently are we updating the model?
The current popular wisdom is to update your recommendation model with new data as often as possible, while ensuring some degree of stability in the recommendations. ‘Warm-starting’ models – initializing a new model with a previous model – is a standard solution. However, research on feedback loops in recommendation systems has shown the importance of updating your models and their settings frequently.
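For readers curious what warm-starting means in practice, here is a minimal sketch of the initialization step only. It assumes the model stores a vector of numbers (‘factors’) per user or item; the names, the dimension and the random scale are all illustrative.

```python
import random

def warm_start(previous_factors, current_ids, dim, seed=0, scale=0.01):
    """Initialize a new model's factors from a previous model where possible.

    previous_factors: dict mapping id -> list of floats from the old model.
    current_ids: every user (or item) id that the new model must cover.
    """
    rng = random.Random(seed)
    factors = {}
    for key in current_ids:
        if key in previous_factors:
            # Known id: start from where the previous model left off.
            factors[key] = list(previous_factors[key])
        else:
            # New id: start from small random values, as in a fresh model.
            factors[key] = [rng.gauss(0, scale) for _ in range(dim)]
    return factors

# Hypothetical old model with one known user; "li" is new this time.
old = {"maria": [0.3, -0.1]}
print(warm_start(old, ["maria", "li"], dim=2))
```

Training then continues from these starting values on the new data, which tends to keep recommendations for existing users from shifting abruptly between updates.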
6 How are we treating old data?
What you should do depends on how much data you have and the nature of the content you’re recommending. If you are recommending physical or ‘durable digital’ goods, such as music or movies, then you probably want to use all available data. If you expect your customers’ preferences to evolve, or you want to recommend ‘digital non-durables’, such as social media posts, prioritize recent data.
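One common way to prioritize recent data without discarding old data outright is to weight each interaction by its age. The sketch below uses exponential decay with a half-life; the 30-day half-life is an assumption for illustration, not a recommended setting.

```python
def recency_weight(age_days, half_life_days=None):
    """Weight an interaction by age; the weight halves every half_life_days.

    With half_life_days=None, all data counts equally, which may suit
    durable goods such as music or movies.
    """
    if half_life_days is None:
        return 1.0
    return 0.5 ** (age_days / half_life_days)

# Hypothetical setting: a 30-day half-life for fast-moving content.
for age in [0, 30, 60]:
    print(age, recency_weight(age, half_life_days=30))
```

The half-life then becomes a business decision: the faster your content or your customers’ tastes change, the shorter it should be.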
7 What other constraints do we consider in generating the final recommendations?
Nobody wants their top movie recommendations to be exclusively Star Wars films (and I say that as a fan). Instead of just sending out the top-ranked content to customers, engineers can re-rank recommended content by a variety of concepts, including novelty, diversity and fairness.
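A simple form of re-ranking for diversity is sketched below: each time a category is chosen, remaining items from that category have their scores discounted. The scores, categories and the penalty of 0.5 are invented for the example, and this greedy approach is only one of many possible re-ranking strategies.

```python
def rerank_for_diversity(scored_items, k=3, penalty=0.5):
    """Greedily pick k items, discounting categories already chosen.

    scored_items: list of (item, category, score) tuples from the model.
    penalty: multiplier applied once per earlier pick from the same category.
    """
    remaining = list(scored_items)
    chosen, used = [], {}
    while remaining and len(chosen) < k:
        # Adjusted score shrinks each time its category has been used.
        best = max(remaining,
                   key=lambda r: r[2] * penalty ** used.get(r[1], 0))
        chosen.append(best[0])
        used[best[1]] = used.get(best[1], 0) + 1
        remaining.remove(best)
    return chosen

# Hypothetical model output dominated by one franchise.
candidates = [
    ("Star Wars IV", "star_wars", 0.95),
    ("Star Wars V", "star_wars", 0.94),
    ("Star Wars VI", "star_wars", 0.93),
    ("Blade Runner", "other_scifi", 0.80),
    ("Alien", "horror", 0.70),
]
print(rerank_for_diversity(candidates))
```

Without re-ranking, the top three would all be Star Wars films; with it, the list keeps the highest-scoring one and makes room for other categories.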
Leaders may not be accountants, but they are expected to understand how their companies make money, and to help their teams create value. In the same way, data science is a new and important language for leaders to understand. In determining how a company’s recommendation algorithms play out, a manager’s role is to ensure that the discretionary choices being made reflect the company’s goals.
Leaders have a responsibility to ensure that they are having the right conversations to maximize the alignment of system design, technical insight and business outcomes. When done well, your platform can provide a cutting-edge user experience.
Allison J B Chaney is an assistant professor at Duke University’s Fuqua School of Business.