Background
Outer, a direct-to-consumer (D2C) outdoor furniture retailer, contracted Venice Technologies to build predictive analytics around its marketing performance, with a particular focus on tooling that estimates return on ad spend (ROAS) for the marketing team. Our model directly estimates the expected value of each user based on their actions on the site. Because each user is attributed to the advertising channel through which they arrived, we can aggregate the total expected value from each channel and compare it to the spend on that channel to arrive at ROAS.
Objective
Existing work had focused on “propensity” modeling, which could identify users’ relative likelihood of completing a purchase but did not estimate a revenue value for each user. As a result, it could not close the feedback loop between the marketing cost of acquiring users and the revenue recognized from their acquisition, which made it difficult to use for marketing decision making. Our objective was to overhaul and extend this work to enable dollar-for-dollar comparisons of advertising spend against the revenue from customers acquired through that spend.
Methodology
To complete this work, we developed a regression model using gradient-boosted decision trees, implemented with the XGBoost library (eXtreme Gradient Boosting). This approach is widely favored in industry because it provides a reliable framework for problems like this one: it is robust to multicollinearity among features, and its built-in regularization helps guard against overfitting. Here, our work focused on three primary areas (a brief sketch of the resulting setup appears after the list below):
Designing a model that accounts for a seasonal business with a long sales cycle – Our client sells outdoor furniture, which sells best in seasons with good weather, and the sales cycle is long: users may visit the site many times over an extended period before completing a purchase.
- To account for seasonality, the model uses data from the entire previous year to establish baseline averages for how users behave when they are likely to make a purchase, and it weights data from the same calendar month of the previous year more heavily when scoring users active in that month.
- To account for the long sales cycle, we collect a full week of user site activity before making any predictions, then use the model to predict the revenue attributable to that user over the following 90 days. Our research shows that this window accounts for 75% of the total expected revenue from these users, so we scale our predictions accordingly to estimate total expected revenue.
Model fidelity in the context of extreme class imbalance – Only a small percentage of site visitors ever complete a purchase, and the model must still distinguish reliably between purchasers and non-purchasers.
- XGBoost works well here because the imbalance between purchasers and browsers can be built directly into training (for example, by weighting the rare purchasers more heavily), letting the model pick up the subtle cues that indicate whether a user is likely to result in a purchase.
Quantifying uncertainty in our estimates – Because our estimates are statistical rather than exact, we need to understand how likely they are to materialize and how much variation to expect around them.
- We use a statistical technique called “bootstrapping” to estimate the uncertainty in our measurements. We first use the model to estimate revenue for past users whose actual purchases are known, then repeatedly resample those users with replacement and compare the predicted revenue to the revenue actually realized in each resample. The spread of errors across resamples tells us how well the model predicts true revenue (a sketch of this procedure also follows the list).
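To make these pieces concrete, the sketch below trains an XGBoost regressor on a year of historical user data, up-weighting rows from the same calendar month and up-weighting the rare purchasers, then predicts 90-day revenue for newly observed users and scales it by the 75% coverage factor described above. The file paths, feature names, weight values, and helper columns are illustrative assumptions, not the production pipeline.

```python
import pandas as pd
import xgboost as xgb

# Hypothetical training frame: one row per user, aggregated over their first
# week of site activity, with the revenue they generated over the next 90 days.
# Paths and column names are illustrative placeholders.
train = pd.read_parquet("sessions_last_year.parquet")

feature_cols = ["visits", "pages_viewed", "cart_adds", "email_signup"]  # illustrative
X = train[feature_cols]
y = train["revenue_90d"]

# Seasonality: up-weight rows from the same calendar month we are scoring for,
# so last year's June behavior counts more when predicting this June's traffic.
target_month = 6  # assumed to be set per scoring run
season_weight = train["month"].eq(target_month).map({True: 3.0, False: 1.0})

# Class imbalance: purchasers are rare, so give them extra weight so the
# regressor does not collapse toward predicting near-zero revenue for everyone.
# (Assumes a boolean "did_purchase" column; the 10x weight is illustrative.)
purchase_weight = train["did_purchase"].map({True: 10.0, False: 1.0})

model = xgb.XGBRegressor(
    n_estimators=500,
    max_depth=6,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.8,
)
model.fit(X, y, sample_weight=season_weight * purchase_weight)

# Predict 90-day revenue for new users, then scale up because the 90-day
# window captures roughly 75% of total expected revenue.
new_users = pd.read_parquet("sessions_this_week.parquet")  # hypothetical path
expected_revenue = model.predict(new_users[feature_cols]) / 0.75
```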
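The bootstrapping step can be sketched as follows: score a held-out set of past users whose true 90-day revenue is known, repeatedly resample those users with replacement, and compare total predicted revenue to total actual revenue in each resample. The spread of those aggregate errors is the kind of uncertainty range reported in the Findings section. Function and variable names here are illustrative.

```python
import numpy as np

def bootstrap_aggregate_error(y_true, y_pred, n_boot=5000, seed=0):
    """Estimate the distribution of aggregate (summed) revenue error by
    resampling users with replacement; returns a percent error per resample."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_true)
    errors = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)           # resample users with replacement
        actual = y_true[idx].sum()
        predicted = y_pred[idx].sum()
        errors[b] = (predicted - actual) / actual  # aggregate percent error
    return errors

# Example usage (holdout frame and columns are hypothetical):
# err = bootstrap_aggregate_error(holdout["revenue_90d"],
#                                 model.predict(holdout[feature_cols]))
# low, high = np.percentile(err, [2.5, 97.5])  # spread of aggregate error
```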
Findings
Our XGBoost regression model is designed to provide accurate aggregate estimates of expected revenue, built from the cumulative predictions across many site visitors. Individual predictions generally carry high error, because even users who appear very likely to buy still have a substantial probability of abandoning at some point in the process. In aggregate, however, the model performs quite well on out-of-sample data (users it has never seen before). Some example data, fictionalized to protect confidentiality, is shown below:
These initial results were very promising; however, to get a more complete view of the expected model uncertainty, we used the bootstrapping technique described above to estimate the model error across thousands of resampled scenarios. Our results show that over a 90-day period we expect the model to estimate true revenue within a range of about 16% (shown against our second-best model candidate here):
In practice, we used the model to estimate future purchases across many different advertising channels. Because the model observes a week of user behavior before making predictions, and some users will already have completed a purchase during that time, we track completed and estimated purchases separately:
More importantly, for each ad-spend channel we compare the total estimated and realized purchases against the ad spend through that channel. This results in an estimate of return on ad spend (ROAS), which the marketing team can use to lean into their most effective advertising channels:
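The per-channel ROAS roll-up can be sketched as a simple aggregation: sum each channel's realized and predicted revenue, join the channel's ad spend, and divide. The input paths and column names below are placeholders rather than the production schema.

```python
import pandas as pd

# Hypothetical inputs: a per-user frame with the channel each user was attributed
# to, the revenue they completed during the observation week, and the model's
# scaled 90-day estimate; plus a per-channel spend table.
users = pd.read_parquet("scored_users.parquet")   # channel, realized_revenue, predicted_revenue
spend = pd.read_parquet("channel_spend.parquet")  # channel, spend

by_channel = (
    users.groupby("channel", as_index=False)[["realized_revenue", "predicted_revenue"]]
    .sum()
    .merge(spend, on="channel")
)

# ROAS = (realized + expected revenue) / ad spend, with the two revenue
# components kept separate so the team can see what is already in hand
# versus still projected.
by_channel["roas"] = (
    by_channel["realized_revenue"] + by_channel["predicted_revenue"]
) / by_channel["spend"]
```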
Conclusions and Future Work
ROAS is crucial to marketing science and decision making. Although this model is quite useful in its current form, we believe that there are many opportunities for improvement going forward:
- The model combines estimates from all users acquired through a marketing channel each week; however, we know some users return week after week before completing a purchase. If they arrive through an ad channel each time, they may be double counted.
- Our model has identified that some users complete purchases quickly and others more slowly. This bimodal distribution of users presents an opportunity to develop more sophisticated modeling that identifies which users fall into each category, so that each cohort can have its own marketing strategy to optimize purchase likelihood.
- The feature set considered for this model is quite limited. Additional features such as more nuanced user site actions (scroll depth, time spent on pages, etc.), geolocation data, and device information could help improve model accuracy.