Credit Karma: Money Modeling

Better understanding users and predicting their future behavior

Background

Credit Karma launched a savings account in October 2019, followed by a checking account in October 2020. This opened up a new business line to complement the core credit score and credit referral business.

Because the money accounts scaled so quickly, there was a lot of opportunity to better understand the users, and in particular to predict their future behavior. With insight into user behavior, marketing dollars could be optimized, finance projections could be made more accurate, and targeted offers could be created for the most loyal customers.

Predicting User Churn

One of the key metrics for the checking account is active users. Here we focus on 7-day and 28-day active users (those who have made a purchase on the debit card within the last 7 or 28 days). It is simple to look back in time and report what the active user count was historically. However, once thousands of new users are being added every month while existing users come and go, understanding the expected future user count becomes very important.
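
The active-user definition above can be sketched in a few lines of Python. This is a minimal illustration, not the production logic: the function names, the data layout (a mapping from user ID to purchase dates), and the sample values are all assumptions made for the example.

```python
from datetime import date, timedelta

def is_active(purchase_dates, as_of, window_days):
    """True if any debit card purchase falls in the `window_days` days ending on `as_of`."""
    start = as_of - timedelta(days=window_days)
    return any(start < d <= as_of for d in purchase_dates)

def active_user_count(purchases_by_user, as_of, window_days):
    """Count users with at least one purchase inside the window."""
    return sum(is_active(dates, as_of, window_days)
               for dates in purchases_by_user.values())

# Hypothetical sample data: user ID -> debit card purchase dates.
purchases_by_user = {
    "u1": [date(2021, 3, 30)],  # purchase one day before as_of
    "u2": [date(2021, 3, 10)],  # purchase three weeks before as_of
    "u3": [date(2021, 2, 1)],   # purchase about two months before as_of
}
as_of = date(2021, 3, 31)
active_7 = active_user_count(purchases_by_user, as_of, 7)
active_28 = active_user_count(purchases_by_user, as_of, 28)
```

With this sample, only u1 counts as 7-day active, while u1 and u2 both count as 28-day active.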

We aimed to predict the probability that each currently active user would remain active over the following 7 and 28 days.

Data considered in the model included:

  • User tenure with Credit Karma
  • Age
  • Income
  • Credit score
  • Historical transactions
  • Historical deposits
  • Whether the user had direct deposit enabled
  • User acquisition channel
  • Whether the user had a savings account
  • Checking/Savings account balances

This data was fed into a proprietary machine learning algorithm to predict each user's probability of churning in the future. The model reached a high degree of accuracy, assigning some users churn probabilities more than 10x those of others.
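
One common way to check that churn scores really separate users is to sort users by predicted probability, split them into buckets, and compare the observed churn rate of the top bucket with that of the bottom bucket. The sketch below illustrates that check only. It is not necessarily how the result above was measured, and the function name, the two-bucket split, and the sample scores are hypothetical.

```python
def churn_rate_by_bucket(scored_users, n_buckets=2):
    """scored_users: list of (predicted_churn_probability, churned_flag) pairs.

    Returns the observed churn rate per bucket, ordered from the lowest
    predicted risk to the highest.
    """
    if len(scored_users) < n_buckets:
        raise ValueError("need at least one user per bucket")
    ranked = sorted(scored_users, key=lambda pair: pair[0])
    size = len(ranked) // n_buckets
    rates = []
    for b in range(n_buckets):
        start = b * size
        # The last bucket absorbs any remainder.
        end = (b + 1) * size if b < n_buckets - 1 else len(ranked)
        bucket = ranked[start:end]
        rates.append(sum(churned for _, churned in bucket) / len(bucket))
    return rates

# Hypothetical scores and outcomes (1 = churned), for illustration only.
sample = [(0.05, 0), (0.10, 0), (0.15, 1), (0.20, 0),
          (0.60, 1), (0.70, 0), (0.80, 1), (0.90, 1)]
rates = churn_rate_by_bucket(sample)
# Ratio of high-risk to low-risk churn; undefined if the low bucket has no churn.
lift = rates[-1] / rates[0]
```

In this made-up sample the high-risk half churns at three times the rate of the low-risk half.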

Predicting User Transactions

Predicting user churn and total user counts was very useful to the business, but not fully aligned with its ultimate goals. A customer who has direct deposit set up, adds money to the savings account, and uses the checking account's debit card for purchases every day is much more valuable than a customer who makes only one purchase a week. Yet both count equally as 7-day and 28-day active customers.

To align our predictions more tightly with the business objectives, we also began to predict the future dollar value of customers' transactions. The methodology was very similar to the churn model, but the outcome variable was customer spend rather than a binary indicator of whether the customer was active at all. This model also achieved a high degree of accuracy, classifying some customers as likely to spend up to 100x as much as others.

With the churn and transaction models combined, we were able to put together some very interesting user segments. In particular, we could identify the users who had a relatively high probability of churning but very high predicted spend if they kept using the account. These customers were the most important to understand as a group and to persuade to stay with the account.

  • High churn, low spend
  • High churn, high spend
  • Low churn, low spend
  • Low churn, high spend
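
The four segments above amount to crossing two thresholds, one on predicted churn and one on predicted spend. A minimal sketch follows, assuming illustrative cutoffs of 0.5 for churn probability and $500 for predicted spend. The actual cutoffs are not stated here, so both values and the function name are hypothetical.

```python
def segment(churn_probability, predicted_spend,
            churn_cutoff=0.5, spend_cutoff=500.0):
    """Assign a user to one of the four churn/spend segments.

    Both cutoffs are assumed values chosen for illustration.
    """
    churn_label = "High churn" if churn_probability >= churn_cutoff else "Low churn"
    spend_label = "high spend" if predicted_spend >= spend_cutoff else "low spend"
    return f"{churn_label}, {spend_label}"

# Hypothetical users: (predicted churn probability, predicted spend in dollars).
at_risk_big_spender = segment(0.8, 1200.0)
loyal_light_spender = segment(0.1, 50.0)
```

The first example lands in the "High churn, high spend" segment that the retention work focused on.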

Targeted Incentives / Scenario Forecasting

Once we had identified our high-value, high-risk segment, the next goal was to retain those customers as efficiently as possible. Various levers could be pulled to accomplish this: financial (such as offering the customer a bonus for setting up direct deposit or making additional transactions) and non-financial (such as additional marketing messages). However, resources were limited, and we wanted to target these incentives even more precisely.

Because the churn and transaction models were already taking past customer incentives and interactions into consideration, we were able to develop a methodology for scenario planning and forecasting the future under different conditions. If a user got a $20 bonus for making a purchase, how would that change the churn and transaction predictions compared with not getting the bonus? What if they were given $50 to set up direct deposit?

The difference between a customer's predicted churn and transactions under different scenarios is the predicted impact of the intervention. We can then compute the ROI of each intervention as:

  ROI = ((churn reduction * predicted transactions) + (predicted transactions with intervention - predicted transactions without intervention)) / cost of the intervention

We can then rank all possible interventions for all customers and spend our marketing dollars as efficiently as possible by simply going down the list until the marketing budget has been exhausted.
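
The ROI ranking described above can be sketched as a greedy allocation. This is an illustration under stated assumptions rather than the actual implementation: the `Scenario` class, its field names, and the sample numbers are hypothetical. The sketch also assumes that the "predicted transactions" in the first term means spend without the intervention, that each customer receives at most one intervention, that every intervention has a cost above zero, and that an intervention too expensive for the remaining budget is skipped rather than ending the pass.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    customer_id: str
    intervention: str
    cost: float           # assumed > 0
    churn_without: float  # predicted churn probability, no intervention
    churn_with: float     # predicted churn probability, with intervention
    spend_without: float  # predicted transactions ($), no intervention
    spend_with: float     # predicted transactions ($), with intervention

    @property
    def roi(self):
        churn_reduction = self.churn_without - self.churn_with
        retained_value = churn_reduction * self.spend_without
        spend_uplift = self.spend_with - self.spend_without
        return (retained_value + spend_uplift) / self.cost

def allocate(scenarios, budget):
    """Go down the ROI-ranked list, funding interventions while budget remains."""
    chosen, remaining, seen = [], budget, set()
    for s in sorted(scenarios, key=lambda s: s.roi, reverse=True):
        if s.roi <= 0:
            break  # nothing further down the list is predicted to help
        if s.customer_id in seen or s.cost > remaining:
            continue
        chosen.append(s)
        seen.add(s.customer_id)
        remaining -= s.cost
    return chosen

# Hypothetical scenario predictions, for illustration only.
scenarios = [
    Scenario("c1", "$20 purchase bonus", 20, 0.6, 0.4, 1000, 1100),
    Scenario("c1", "$50 direct deposit bonus", 50, 0.6, 0.2, 1000, 1500),
    Scenario("c2", "$20 purchase bonus", 20, 0.1, 0.1, 200, 210),
    Scenario("c3", "$50 direct deposit bonus", 50, 0.5, 0.3, 800, 900),
]
chosen = allocate(scenarios, budget=100)
funded = [(s.customer_id, s.intervention) for s in chosen]
```

With a $100 budget, this sample funds the $50 direct deposit bonus for c1 and for c3. The $20 bonus for c1 is skipped because c1 already has an intervention, and the $20 bonus for c2 no longer fits the budget.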

Addendum: Direct Deposit Identification

An unrelated project, completed in parallel with the primary churn / transactions / incentives work, used data to classify transactions as direct deposits more accurately. Knowing whether a customer has actually signed up for payroll direct deposit is very valuable to the business, as these customers are likely to stick around and be active users of the account.

From a data perspective, there is no obvious way to determine whether a deposit was made manually or came from payroll. All we have is the name of the company that made the deposit. Historically, two lists were maintained by hand: confirmed payroll sources, and excluded sources known not to be payroll. These lists are expensive to maintain as new sources make deposits and old ones change names. With tens of thousands of deposit sources, that is a lot for anyone to go through and check regularly.

We developed a methodology that uses data across all customers to identify likely direct deposits based on deposit cadence. The theory is that payroll is regular: it runs weekly, bi-weekly, or monthly. So if a customer gets deposits on a regular cadence from a single source, that source is likely a company direct deposit program. We can then say that the source is likely sending direct deposits to other users too, even if it is their first deposit and we have not yet seen the pattern for them.

There is some complexity to this approach, as the cadence is not exactly every 7, 14, or 30 days because of weekends, holidays, and the varying number of days in each month. Using fuzzy matching, we were able to overcome this with good accuracy.
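
One simple form of fuzzy cadence matching is to compare the gap between consecutive deposits against each expected period, allowing a tolerance of a few days. The sketch below is an assumption about how this could be done, not the method actually used. The tolerances, the 80% agreement threshold, and all names are hypothetical.

```python
from datetime import date

# Assumed cadences as (period in days, tolerance in days). Illustrative values.
CADENCES = {"weekly": (7, 2), "bi-weekly": (14, 3), "monthly": (30, 4)}

def match_cadence(gap_days):
    """Return the cadence whose period is within tolerance of the gap, if any."""
    for name, (period, tolerance) in CADENCES.items():
        if abs(gap_days - period) <= tolerance:
            return name
    return None

def detect_cadence(deposit_dates, min_share=0.8):
    """Return a cadence name if at least `min_share` of gaps match it, else None."""
    dates = sorted(deposit_dates)
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    if len(gaps) < 2:
        return None  # too few deposits to call it a pattern
    labels = [match_cadence(g) for g in gaps]
    for name in CADENCES:
        if labels.count(name) / len(labels) >= min_share:
            return name
    return None

# Hypothetical deposit histories for one customer and one source each.
month_end = [date(2021, 1, 29), date(2021, 2, 26), date(2021, 3, 31), date(2021, 4, 30)]
every_other_friday = [date(2021, 1, 8), date(2021, 1, 22), date(2021, 2, 5), date(2021, 2, 19)]
irregular = [date(2021, 1, 3), date(2021, 1, 5), date(2021, 2, 20), date(2021, 2, 21)]

monthly_result = detect_cadence(month_end)
biweekly_result = detect_cadence(every_other_friday)
irregular_result = detect_cadence(irregular)
```

The month-end history has gaps of 28, 33, and 30 days, which an exact 30-day rule would reject but the tolerance accepts as monthly.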

The end result was a list of high-likelihood direct deposit sources (where deposits nearly always reached customers on a regular cadence) and low-likelihood direct deposit sources (where there was no consistency at all in when deposits occurred). This list could then be used to screen the manual review lists for possibly misidentified sources, i.e., a source that is very consistent but listed as not a direct deposit source, or a source with no consistency that is listed as direct deposit. Additionally, as new sources begin to make deposits, the model can automatically classify them as direct deposit or not rather than waiting for manual review.
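
The source-level scoring and the screening of the manual lists can be sketched as follows. This is a hedged illustration: it assumes each source has, per customer, a boolean saying whether that customer's deposits from the source looked regular, and it uses made-up thresholds of 90% and 10%. The function names and source names are hypothetical.

```python
def score_sources(regular_flags_by_source, high=0.9, low=0.1):
    """Label each source by the share of its customers with a regular cadence.

    regular_flags_by_source: source name -> list of booleans, one per customer.
    The `high` and `low` thresholds are assumed values for illustration.
    """
    labels = {}
    for source, flags in regular_flags_by_source.items():
        if not flags:
            continue  # no customers observed for this source yet
        share = sum(flags) / len(flags)
        if share >= high:
            labels[source] = "high"
        elif share <= low:
            labels[source] = "low"
        else:
            labels[source] = "uncertain"
    return labels

def find_conflicts(model_labels, manual_labels):
    """Return sources where the manual list disagrees with a confident model label.

    manual_labels: source name -> True if manually listed as payroll.
    """
    conflicts = []
    for source, is_payroll in manual_labels.items():
        label = model_labels.get(source)
        if (label == "high" and not is_payroll) or (label == "low" and is_payroll):
            conflicts.append(source)
    return sorted(conflicts)

# Hypothetical sources and flags, for illustration only.
regular_flags_by_source = {
    "ACME PAYROLL": [True] * 10,
    "P2P TRANSFER": [False] * 10,
    "GIG APP": [True, False, True, False],
}
manual_labels = {"ACME PAYROLL": False, "P2P TRANSFER": False, "GIG APP": True}

model_labels = score_sources(regular_flags_by_source)
conflicts = find_conflicts(model_labels, manual_labels)
```

Here the very regular "ACME PAYROLL" source is flagged for review because the manual list marks it as not payroll, while the "uncertain" source is left for a human to decide.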