Financial services companies have long struggled to make inclusive decisions for small businesses and for low- and moderate-income consumers. One key reason: to make accurate predictions of the financial risks associated with those customers’ accounts requires lenders to rely on a wider variety of data than a credit score alone. To accurately assess risk, expanded Fair Credit Reporting Act regulated data is helpful – including rental data, trended data, enhanced public records, alternative financial services data and more. This expanded FCRA data is one key to financial inclusion. Without that data, lenders risk rejecting potentially profitable customers, including so-called credit invisibles and thin file consumers. In fact, The Federal Reserve, along with four important financial services regulators, highlighted the consumer benefits of alternative data in their December 2019 interagency statement. That statement also highlighted the increased importance of managing compliance when firms use alternative data in credit underwriting. With hundreds of data sources available to help with important tasks such as verifying identity, checking credit, and assessing the value of automotive and real-estate collateral, why have some lenders been slow to use the most appropriate data attributes when making credit decisions? One reason is a matter of IT Architecture; another is priorities. Changing a business process to take advantage of new data requirements can be prohibitively lengthy and costly – in terms of both analytical and IT resources. This is especially true for older systems—which were seldom adapted to use Application Programming Interfaces (APIs) supporting modern data structures such as JSON. Furthermore, data access to older systems can require specific types of system connectivity such as VPNs or leased lines. Latency is important in this type of application: some of these tasks have to be done instantly in a digital-first or digital-only lending environment. 
So is time to market: lenders deploying analytics processes cannot wait for overtaxed IT teams to complete lengthy projects. Lenders’ analytics and IT teams have long known they need to be more agile and efficient, faster to market, and increasingly secure. Their answer, largely, has been a slow but steady migration of their systems to the cloud. A 2019 McKinsey survey revealed that CIOs were modernizing their infrastructures primarily to achieve four goals: agility and time to market, quality and reliability, cost, and security. There are other benefits as well. But if the business case for a cloud strategy was somewhat clear to IT and analytics leaders, it became crystal clear to the rest of the business in 2020. As companies shifted to at-home work using cloud-based collaboration tools, especially videoconferencing services, most of them conquered what was perhaps the final barrier to entry: the fear that data privacy and security issues were somehow harder to resolve with virtual machines, containers, and microservices than with on-premises infrastructure. Last quarter, the leading cloud providers Amazon Web Services, Google Cloud Platform, and Microsoft Azure reported incredible annual revenue growth: 29%, 45%, and 48% respectively. COVID-19 has proven to be the catalyst that greatly sped up the transition to cloud technologies. The jump to the cloud means that lenders are suddenly more capable than ever of making analytically sound – and therefore more financially inclusive – decisions. The key to analytical decision-making is to use the right data and to make the most appropriate calculations (called attributes) as part of a business strategy or a mathematical model. With Experian programs such as Attribute Toolbox now available in the cloud, calculating those all-important attributes is as simple for the IT department as coding an API call. 
Lenders will soon be able to retrieve and process raw data from over 100 data sources just as easily, recognizing their native formats and extracting the desired information quickly enough for real-time and batch decisioning. The pandemic has brought economic distress to millions of Americans; it is unlike anything in our lifetimes. The growth of cloud computing promises to enable these consumers to obtain additional products as well as more favorable pricing and terms. It’s ironic that COVID has accelerated the adoption of the very technologies that will expand access to credit for many people who cannot currently access it from mainstream financial firms. To learn more about our Attribute Toolbox, click here.
This is the fourth in a series of blog posts highlighting optimization, artificial intelligence, predictive analytics, and decisioning for lending operations in times of extreme uncertainty. The first post dealt with optimization under uncertainty, the second with predicting consumer payment behavior, and the third with validating consumer credit scores. This post describes some specific Experian solutions that are especially timely for lenders strategizing their response to the COVID Recession. Will the US economy recover from the pandemic recession? Certainly yes. When will the economy recover? There is a lot more uncertainty around that question. Many people are encouraged by positive indicators, such as the initial rebound of the stock market, a return of many of the jobs lost at the beginning of the pandemic, and a significant increase in housing starts. August’s retail spending and homebuilder confidence are very encouraging economic indicators. Other experts doubt that the “V-shaped” recovery can survive flare-ups of the virus in various parts of the US and the world, and are calling for a “W-shaped” recovery. Employment indicators are alarming: many people remain out of work, some job losses are permanent, and there are more initial jobless claims each week now than at the height of the Great Recession. Serious hurdles to economic recovery may remain until a vaccine is widely available: childcare, urban transportation, and global trade, for example. I’m encouraged by the resilience of many of our country’s consumer lenders. They are generally responding well to these challenges. If past recessions are a guide, some lenders will not survive these turbulent times. This time, many lenders—whether or not they have already adopted the CECL accounting standards—have been increasing allowances for their anticipated credit losses. At least one rating agency believes major banks are prepared to absorb those losses from earnings. 
The lenders who are most prepared for the eventual recovery will be those that make good decisions during these volatile times and take action to put themselves in the best position in anticipation of the recovery that will certainly follow. The best lenders are making smart investments now to be prepared to capitalize on future opportunities. Experian’s analytics and consulting experts are continuously improving our suite of solutions that help consumer lenders and others assess consumer behavior and respond quickly to the rapidly fluctuating market conditions as well as changing regulations and credit reporting practices. Our newly announced Economic Response and Recovery Suite includes the ABCD’s that lenders need to be resilient and competitive now and to prepare to thrive during the eventual recovery: A – Analytics. As I’ve written about in prior blog posts, data is a prerequisite to making good business decisions, but data alone is not enough. To make wise, insightful decisions, lenders need to use the most appropriate analytical techniques, whether that means more meaningful attributes, more predictive and compliant credit scores, more accurate and defensible loss forecasting solutions, or optimization systems that help develop strategies in a world where budgets, regulations, and other constraints are changing. For example, Experian has released a set of Spotlight 2020 Attributes that help consumer lenders create a positive experience for customers who have received an accommodation during the pandemic. In many cases motivated by the new race to improve customer experience online, and in other cases as a reaction to new and creative fraud schemes, some clients are using this period as an opportunity to explore or deploy ethical and explainable Artificial Intelligence. B – Business Intelligence. Credit bureaus like Experian are uniquely situated to understand the impact of the COVID recession on America’s consumers. 
With impact reports, dashboards, and custom business intelligence solutions, lenders are working during the recession to gain an even better understanding of their current and prospective customers. We’re helping many of them to proactively help consumers when they need it most. For example, lenders have turned to us to understand their customers’ payment hierarchy: which bills they pay first when times are tough. Our free COVID-19 US Business Risk Index helps make lending options available to the businesses that need them most. And we’ve armed lenders with recommendations for which of our pre-existing attributes and scores are most helpful during trying times. Additional reporting tools such as the Auto Market Tracker, Ascend Market Insights Dashboard, and the weekly economic update video provide businesses with information on new market trends – information that helps them respond during the recession and promises to help them grow during the eventual recovery. C – Consulting. It’s good to turn data into information and information into insight, but how do these lenders incorporate these insights in their business strategies? Lenders and other businesses have been turning to Experian’s analytics and advisory services consultants to unlock the information hidden in credit and other data sources, finding ways to make their business processes more efficient and more effective while developing quick response plans and more long-term recovery strategies. D – Delivery. Decision science is the practice of using advanced analytics, artificial intelligence, and other techniques to determine the best decision based on available data and resources. But putting those decisions into action can be a challenge. (Organizations like IBM and Gartner estimate that a great majority of data science projects are never put into production.) 
Experian technologies – from our analytics platform to our attribute integration and decision management solutions – ensure that data-driven decisions can be quickly implemented to make a real difference. Treating each customer optimally has a number of benefits, whether you are trying to responsibly grow your portfolio, reduce credit losses and allowances, control servicing costs, or simply stay in compliance during dynamic times. In the age of COVID, IT departments have placed increased priority on agility, security, customer experience, and cost control, and they appreciate a cloud-first approach to deploying analytics. It’s too early to know how long this period of extreme uncertainty will last. But one thing is certain: it will come to an end, and the economy will recover someday. I predict that many of the companies that make the best use of data now will be the ones that do the best during the recovery. To hear more ways your organization can navigate this downturn and the recovery to follow, please watch our on-demand webinar and check out our Economic Response and Recovery Suite.
This is the third in a series of blog posts highlighting optimization, artificial intelligence, predictive analytics, and decisioning for lending operations in times of extreme uncertainty. The first post dealt with optimization under uncertainty and the second with predicting consumer payment behavior. In this post I will discuss how well credit scores will work for consumer lenders during and after the COVID-19 crisis and offer some recommendations for what lenders can be doing to measure and manage that model risk in a time like this. Perhaps no analytics innovation has created opportunity for more individuals than the credit score has. The first commercially available credit score was developed by MDS (now part of Experian) in 1987. Soon afterwards FICO® popularized the use of scores that evaluate the risk that a consumer would default on a loan. Prior to that, lending decisions were made by loan officers largely on the basis of their personal familiarity with credit applicants. Using data and analytics to assess risk not only created economic opportunity for millions of borrowers, but it also greatly improved the financial soundness of lending institutions worldwide. Predictive models such as credit scores have become the most critical tools for consumer lending businesses. They determine, among other things, who gets a loan and at what price, and how an account such as a credit line is managed through its life cycle. Predictive models are in many cases critical for calculating loan and loss reserves, for stress testing, and for complying with accounting standards. Nearly all lenders rely on generic scores such as the FICO score and VantageScore®. Most larger companies also have a portfolio of custom scorecards that better predict particular aspects of payment behavior for the customers of interest. So how well are these scorecards likely to perform during and after the current pandemic? 
The models need to predict consumer credit risk even as nearly all consumers change their behaviors in response to the health crisis; millions of people, in America and internationally, find their income suddenly reduced; and consumers receive large numbers of accommodations from creditors, who have in turn temporarily changed some of their credit reporting practices in response to guidelines in the federal CARES Act. In an earlier post, I pointed out that there is good reason to believe that credit scores will tend to continue to rank order consumers from most likely to least likely to repay their debts even as we move from the longest economic expansion in history to a period of unforeseen and unexpected challenges. But the interpretation of the score (for example, the log odds or the bad rate) may need to be adjusted. Furthermore, that assumes that the model was working well on a lender’s population before this crisis started. If it has been a long time since a scorecard was validated, that assumption needs to be questioned. Because experts are considering several different scenarios regarding both the immediate and long-term economic impacts of COVID-19, it’s important to have a plan for ongoing monitoring as long as necessary. Some lenders have strong Model Risk Management (MRM) teams complying with requirements from the Federal Reserve, the Federal Deposit Insurance Corporation (FDIC), and the Office of the Comptroller of the Currency (OCC). Those resources are now stretched thin. Other institutions, with fewer resources for MRM, are now discovering gaps in their model inventories as they implement operational changes. In either case, now’s the time to reassess how well scorecards are working. Good model validation practices are especially critical now if lenders are to continue to make the sound data-driven decisions that promote fairness for consumers and financial soundness for the institution. 
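The idea that a score can keep its rank order while its interpretation shifts can be illustrated with a minimal Python sketch. It assumes the common log-odds scaling, in which a score is a linear function of the natural log of the good-to-bad odds. The 20 points-to-double-the-odds value and the 30:1 and 15:1 odds at a score of 600 are made-up numbers chosen for illustration; they are not parameters of any Experian, FICO or VantageScore model.

```python
import math

def bad_rate(score, offset, factor):
    """Bad rate implied by a score, assuming score = offset + factor * ln(odds)."""
    odds = math.exp((score - offset) / factor)  # good:bad odds at this score
    return 1.0 / (1.0 + odds)

# Hypothetical scaling: every 20 points doubles the good:bad odds.
PDO = 20
FACTOR = PDO / math.log(2)

# Hypothetical calibrations: a score of 600 means 30:1 odds in a benign
# economy but only 15:1 odds in a stressed one.
OFFSET_BENIGN = 600 - FACTOR * math.log(30)
OFFSET_STRESSED = 600 - FACTOR * math.log(15)

scores = [560, 600, 640, 680, 720]
benign = [bad_rate(s, OFFSET_BENIGN, FACTOR) for s in scores]
stressed = [bad_rate(s, OFFSET_STRESSED, FACTOR) for s in scores]

for s, b, t in zip(scores, benign, stressed):
    print(f"score {s}: benign bad rate {b:.2%}, stressed bad rate {t:.2%}")
```

In both calibrations a higher score still means a lower bad rate, so the rank ordering survives. What changes is the bad rate attached to each score, which is why cutoffs and pricing tied to a specific score may need to be revisited.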
If you’re a credit risk manager responsible for the generic or custom models driving your lending, servicing, or capital allocation policies, there are several things you can do--starting now--to be sure that your organization can continue to make fair and sound lending decisions throughout this volatile period: Assess your model inventory. Do you have good documentation showing when each of the models in your organization was built? When was it last validated? Assign a level of criticality to each model in use. Starting with your most critical models, perform a baseline validation to determine how the model was performing prior to the global health crisis. It may be prudent to conduct not only your routine validation (verifying that the model was continuing to perform at the beginning of the period) but also a baseline validation with a shortened performance window (such as 6-12 months). That baseline validation will be useful if the downturn becomes a protracted one—in which case your scorecard models should be validated more frequently than usual. A shorter outcome window will allow a timelier assessment of the relationship between the score and the bad rate—which will help you update your lending and servicing policies to prevent losses. Determine if any of your scorecards had deteriorated even before the global pandemic. Consider recalibrating or rebuilding those scorecards. (Use metrics such as the Population Stability Index, the K-S statistic and the Gini Coefficient to help with that decision.) Many lenders chose not to prioritize rebuilding their behavioral scorecards for account management or collections during the longest period of economic growth in memory. Those models may soon be among the most critical models in your organization as you work to maintain the trust of your accountholders while also maintaining your institution’s financial soundness. 
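For readers who want to see what the three metrics mentioned above measure, here is a minimal pure-Python sketch. The function names, the score bins and the tiny sample are all hypothetical, and the code assumes that a higher score means lower risk. A production validation would use far larger samples and your institution's own performance definition.

```python
import math
from itertools import groupby

def psi(expected_counts, actual_counts):
    """Population Stability Index between two binned score distributions."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # A small floor avoids log(0) when a bin is empty.
        e_pct = max(e / e_total, 1e-6)
        a_pct = max(a / a_total, 1e-6)
        value += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return value

def ks_and_gini(scores, is_bad):
    """K-S statistic and Gini coefficient (higher score = lower risk)."""
    pairs = sorted(zip(scores, is_bad))  # riskiest (lowest) scores first
    n_bad = sum(is_bad)
    n_good = len(is_bad) - n_bad
    cum_bad = cum_good = 0
    concordant = 0.0
    ks = 0.0
    for _, group in groupby(pairs, key=lambda p: p[0]):
        flags = [flag for _, flag in group]
        b = sum(flags)
        g = len(flags) - b
        # Goods at this score outrank every bad seen so far; ties count half.
        concordant += g * cum_bad + 0.5 * g * b
        cum_bad += b
        cum_good += g
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    auc = concordant / (n_bad * n_good)
    return ks, 2 * auc - 1

# Hypothetical data: account counts per score bin at development vs. today.
baseline_bins = [100, 100]
recent_bins = [150, 50]
print("PSI:", round(psi(baseline_bins, recent_bins), 4))

# Hypothetical scored accounts with a 1 marking a bad outcome.
sample_scores = [500, 520, 540, 560, 580, 600, 620, 640]
sample_bad = [1, 1, 0, 1, 0, 0, 0, 0]
ks, gini = ks_and_gini(sample_scores, sample_bad)
print("K-S:", round(ks, 4), "Gini:", round(gini, 4))
```

PSI looks only at whether the scored population has shifted, while K-S and Gini look at how well the score still separates good accounts from bad ones. A scorecard can drift on one dimension and not the other, which is one reason to track them together.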
Once the CARES accommodation period has expired, it will be important to revalidate your models more frequently than in the past, for as long as it takes until consumer behavior normalizes and the economy finds its footing. When you find it appropriate to rebuild a scorecard model, consider whether now is the time to implement ethical and explainable AI. Some of our clients are finding that machine-learned models are more predictive than traditional scorecards. Early Experian research using data from the last recession indicates this will continue to be true for the foreseeable future. Furthermore, Experian has invested in research and development to help these clients deliver FCRA-compliant Adverse Action reasons to their consumers and to make the models explainable and transparent for model risk governance and compliance purposes. The sudden economic volatility that has resulted from this global health crisis has been a shock to all organizations. It is important for lenders to take the pulse of their predictive models now and throughout the downturn. These models are especially critical tools for making sound data-driven business decisions until the economy is less volatile. Experian is committed to helping your organization during times of uncertainty. For more resources, visit our Look Ahead 2020 Hub.
This is the second in a series of blog posts highlighting optimization, artificial intelligence, predictive analytics, and decisioning for lending operations in times of extreme uncertainty. The first post dealt with optimization under uncertainty. The word "unprecedented" gets thrown around pretty carelessly these days. When I hear that word, I think fondly of my high school history teacher. Mr. Fuller had a sign on his wall quoting the philosopher-poet George Santayana: "Those who cannot remember the past are condemned to repeat it." Some of us thought it meant we had to memorize as many facts as possible so we wouldn't have to go to summer school. The COVID-19 crisis – with not only health consequences but also accompanying economic and financial impacts – certainly breaks with all precedents. The bankers and other businesspeople I've been listening to are rightly worried that This Time is Different. While I'm sure there are history teachers who can name the last time a global disaster led to a widescale humanitarian crisis and an economic and financial downturn, I'm even more sure times have changed a lot since then. But there are plenty of recent precedents to guide business leaders and other policymakers through this crisis. Hurricanes Katrina and Sandy impacted large regions of the United States, with terrible human consequences followed by financial ones. Dozens of local disasters (floods, landslides, earthquakes) devastated smaller numbers of people in equally profound ways. The Great Recession, starting in 2008, put millions of Americans and others around the world out of work. Each of those disasters, like this one, broke with all precedents in various ways. Each of those events was in many ways a dress rehearsal, as bankers and other lenders learned to provide assistance to distressed businesses and consumers, while simultaneously planning for the inevitable changes to their balance sheets and income statements. 
Of course, the way we remember the past has changed. Just as most of us no longer memorize dates (we search for them on the web), businesspeople turn to their databases and use analytics to understand history. I've been following closely as the data engineers and data scientists here at Experian have worked on perhaps their most important problem ever. Using Experian's Ascend Analytical Sandbox (named last year as the Best Overall Analytics Platform), they combed through over eighteen years of anonymized historical data covering every credit report in the United States. They asked, using historical experience, wisdom, time-consuming analytics, a little artificial intelligence, and a lot of hard work, whether predicting credit performance during and after a crisis is possible. They even considered scenarios regarding what happens as creditors change the way they report consumer delinquencies to the credit bureaus. After weeks of sleepless nights, they wrote down their conclusions. I've read their analysis carefully and I'm pleased to report that it says… Drumroll, please… Yes, but. Yes, it's possible to predict consumer behavior after a disaster. But not in precisely the same way those predictions are made during a period of economic growth. For a credit risk manager to review a lending portfolio and to predict its credit losses after a crisis requires looking at more data, and looking at it a little differently, than during other periods. Yes, after each disaster, credit scores like FICO® and VantageScore® continued to rank consumers from most likely to least likely to repay debts. But the interpretation of the score changes. Technically speaking, there is a substantial shift in the odds ratio that is particularly pronounced when a score is applied to subprime consumers. To predict borrower behavior more accurately, our scientists found that it helps to look at ten additional categories of data attributes and a few additional types of mathematical models. 
Yes, there are attributes on the credit report that help lenders identify consumer distress, willingness, and ability to pay. But the data engineers found that during times like these it is especially helpful to look beyond a single point in time; trends in a consumer's payment history help reveal whether that customer is changing their typical behavior. Yes, the data reported to the credit bureaus is predictive, especially over time. But when expanded FCRA data is available beyond what is traditionally reported to a bureau, that data further improves predictions. All told, the data engineers found over 140 data attributes that can help lenders and others better manage their portfolio risk, understand consumer behavior, appreciate how the market is changing, and choose their next best action. The list of attributes might be indispensable to a credit data specialist whose institution needs to weather the coming storm. Because Experian knows how important it is to learn from historical precedents, we're sharing the list at no charge with qualified risk managers. To get the latest Experian data and insights or to request the Crisis Response Attributes recommendation, visit our Look Ahead 2020 page.
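As a rough illustration of why looking beyond a single point in time helps, the Python sketch below computes one very simple trended attribute: the least-squares slope of a consumer's monthly balances. The function name and the two balance histories are invented for this example and are not drawn from Experian's attribute list.

```python
def trend_slope(values):
    """Least-squares slope of monthly values against month index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two months of history")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx

# Two hypothetical consumers with the same balance today (3000)
# but opposite six-month histories, oldest month first.
paying_down = [5000, 4600, 4200, 3800, 3400, 3000]
building_up = [1000, 1400, 1800, 2200, 2600, 3000]

print("paying down:", trend_slope(paying_down), "per month")
print("building up:", trend_slope(building_up), "per month")
```

A snapshot attribute would treat these two consumers identically, while the slope separates the one steadily paying down debt from the one steadily adding to it.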
This is the first in a series of blog posts highlighting optimization, artificial intelligence, predictive analytics, and decisioning for lending operations in times of extreme uncertainty. Like all businesses, lenders are facing tremendous change and uncertainty in the face of the COVID-19 crisis. While focusing first on how to keep their employees and customers safe during the new normal, they are asking how to make data-driven decisions in this new environment. It’s only natural that businesspeople are skeptical about whether analytics will work in a situation like today's – in which the data deviate from all historical precedents. Certainly, nobody predicted, for example, that the number of loans with forbearance requests would increase by over 1000% during each two-week period in March. Can anyone possibly make an optimized decision when things are changing so quickly and when so many things are unknown? Prescriptive analytics – also known as mathematical optimization – is the practice of developing a business strategy to achieve a business objective subject to capacity and other constraints, often using a demand forecast. For example, banks use optimization software to develop marketing and debt management strategies to run their lending operations. But what happens when the demand forecast might be wrong, when the constraints change quickly, and when decision-makers cannot agree on a single objective? The reality is that decision-makers have to balance multiple competing objectives related to many different stakeholders. And, especially during the COVID-19 crisis and the period of change that will certainly follow, they have to do so in the face of uncertainty. Let's discuss some of the methods that analysts use to control risk while optimizing lending practices during times like these. 
These techniques, collectively known as robust optimization and robust statistics, help lenders and other businesspeople deal with the uncomfortable reality that we do not know what the future holds. Consider a hypothetical bank or other lender servicing a portfolio of consumer loans and forecasting its loss performance in this environment. Management probably has several competing objectives: they want to improve service levels on their digital channel, they want to minimize credit and fraud losses, they're facing a reduced operating budget, and they're not certain how many employees they will have and which vendors will be able to provide adequate service levels. Furthermore, they anticipate new and unpredicted changes, and they need to be able to update their strategies quickly. The mathematics can be quite technical, but Experian’s Marketswitch Optimization is user-friendly software to help businesspeople (not engineers) design and deploy optimal strategies for practices such as Account Management and Loan Originations while facing such a dynamic and uncertain environment. The bank's business analysts (not computer specialists or mathematicians) will use techniques such as these: With Sensitivity Analysis, the analysts will explore the performance of their optimized Account Management, Collections, and Loan Originations strategies while considering possible changes in input variables. Optimization Scenarios with Uncertainty (technically known as Stochastic Optimization) allow the managers and analysts to design operational strategies that control risk, particularly the bank’s exposure to probabilistic and worst-case scenarios. Using Scenario Performance Analysis, the lender's team will validate and test their optimization scenarios against a variety of different data sets to understand how their strategies would perform in each case. 
Model Quality Evaluation techniques help the credit risk managers compare model predictions against actual performance during a quickly changing economy. Model Impact Analysis (related to Model Risk Management) helps senior leadership assess when it is time to invest in improving its statistical models. Robust Model Calibration Analysis removes unjustifiable variations in the lender's predictive models to make their predictions more valid as things change over time. These six advanced analytics techniques are especially helpful when developing business strategies for a time in which some values are unknown, including future unemployment levels, staffing budgets, data reporting practices, interest rates, and customer demands. Business decisions can, and arguably must, be optimized during times of uncertainty. But during times like these, it is especially important that the analysts understand how and why to account for the uncertainty in both the data and the models. Lenders, are you optimizing your servicing and debt management strategies? It has never been more important than now to do so, using the advanced techniques available to manage uncertainty mathematically. Learn more about how Marketswitch can help you solve complex business problems and meet organizational objectives.
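The flavor of optimizing under uncertainty can be shown with a toy Python sketch. It is not how Marketswitch works internally; the three strategies, the three economic scenarios, their probabilities and the profit figures are all invented for illustration. The sketch compares three ways of choosing: highest expected profit, best worst case, and highest expected profit subject to a floor on the worst case.

```python
# Hypothetical scenario probabilities (they sum to 1).
scenario_prob = {"mild": 0.5, "moderate": 0.3, "severe": 0.2}

# Hypothetical projected profit of each strategy under each scenario.
profit = {
    "aggressive": {"mild": 150, "moderate": 40, "severe": -90},
    "balanced": {"mild": 90, "moderate": 60, "severe": 10},
    "conservative": {"mild": 50, "moderate": 45, "severe": 30},
}

def expected_profit(strategy):
    """Probability-weighted profit across all scenarios."""
    return sum(p * profit[strategy][s] for s, p in scenario_prob.items())

def worst_case(strategy):
    """Profit in the least favorable scenario."""
    return min(profit[strategy].values())

# 1. Ignore risk: pick the highest expected profit.
best_expected = max(profit, key=expected_profit)

# 2. Pure robustness: pick the best worst case (maximin).
maximin = max(profit, key=worst_case)

# 3. A compromise: best expected profit among strategies that never lose money.
FLOOR = 0  # hypothetical risk limit set by management
eligible = [s for s in profit if worst_case(s) >= FLOOR]
constrained = max(eligible, key=expected_profit)

print("best expected:", best_expected)
print("maximin:", maximin)
print("best expected with worst case >= 0:", constrained)
```

With these made-up numbers the three rules pick three different strategies, which is the point: once the future is uncertain, the "optimal" strategy depends on how much downside exposure management is willing to accept.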
If you’re a credit risk manager or a data scientist responsible for modeling consumer credit risk at a lender, a fintech, a telecommunications company or even a utility company you’re certainly exploring how machine learning (ML) will make you even more successful with predictive analytics. You know your competition is looking beyond the algorithms that have long been used to predict consumer payment behavior: algorithms with names like regression, decision trees and cluster analysis. Perhaps you’re experimenting with or even building a few models with artificial intelligence (AI) algorithms that may be less familiar to your business: neural networks, support vector machines, gradient boosting machines or random forests. One recent survey found that 25 percent of financial services companies are ahead of the industry; they’re already implementing or scaling up adoption of advanced analytics and ML. My alma mater, the Virginia Cavaliers, recently won the 2019 NCAA national championship in nail-biting overtime. With the utmost respect to Coach Tony Bennett, this victory got me thinking more about John Wooden, perhaps the greatest college coach ever. In his book Coach Wooden and Me, Kareem Abdul-Jabbar recalled starting at UCLA in 1965 with what was probably the greatest freshman team in the history of basketball. What was their new coach’s secret as he transformed UCLA into the best college basketball program in the country? I can only imagine their surprise at the first practice when the coach told them, “Today we are going to learn how to put on our sneakers and socks correctly. … Wrinkles cause blisters. Blisters force players to sit on the sideline. And players sitting on the sideline lose games.” What’s that got to do with machine learning? Simply put, the financial services companies ready to move beyond the exploration stage with AI are those that have mastered the tasks that come before and after modeling with the new algorithms. 
Any ML library — whether it’s TensorFlow, PyTorch, extreme gradient boosting or your company’s in-house library — simply enables a computer to spot patterns in training data that can be generalized for new customers. To win in the ML game, the team and the process are more important than the algorithm. If you’ve assembled the wrong stakeholders, if your project is poorly defined or if you’ve got the wrong training data, you may as well be sitting on the sideline. Consider these important best practices before modeling: Careful project planning is a prerequisite — Assemble all the key project stakeholders, and insist they reach a consensus on specific and measurable project objectives. When during the project life cycle will the model be used? A wealth of new data sources are available. Which data sources and attributes are appropriate candidates for use in the modeling project? Does the final model need to be explainable, or is a black box good enough? If the model will be used to make real-time decisions, what data will be available at runtime? Good ML consultants (like those at Experian) use their experience to help their clients carefully define the model development parameters. Data collection and data preparation are incredibly important — Explore the data to determine not only how important and appropriate each candidate attribute is for your project, but also how you’ll handle missing or corrupt data during training and implementation. Carefully select the training and validation data samples and the performance definition. Any biases in the training data will be reflected in the patterns the algorithm learns and therefore in your future business decisions. When ML is used to build a credit scoring model for loan originations, a common source of bias is the difference between the application population and the population of booked accounts. 
ML experts from outside the credit risk industry may need to work with specialists to appreciate the variety of reject inference techniques available.

Segmentation analysis: In most cases, more than one ML model needs to be built, because different segments of your population perform differently. The segmentation needs to be done in a way that makes sense, both statistically and from a business perspective. Intriguingly, some credit modeling experts have had success using an AI library to inform segmentation and then a more tried-and-true method, such as regression, to develop the actual models.

During modeling:

With a good plan and well-designed data sets, the modeling project has a very good chance of succeeding. But no automated tool can make the tough decisions that can make or break whether the model is suitable for use in your business, such as trade-offs between the ML model’s accuracy and its simplicity and transparency. Engaged leadership is important.

After modeling:

Model validation: Your project team should be sure the analysts and consultants appreciate and mitigate the risk of overfitting the model parameters to the training data set. Validate that any ML model is stable. Test it with samples from a different group of customers, preferably from a different time period than the one from which the training sample was taken.

Documentation: AI models can have important impacts on people’s lives. In our industry, they determine whether someone gets a loan, a credit line increase or an unpleasant loss mitigation experience. Good model governance practice insists that a lender won’t make decisions based on an unexplained black box. In a globally transparent model, good documentation thoroughly explains the data sources and attributes and how the model considers those inputs. With a locally transparent model, you can further explain how a decision is reached for any specific individual, for example, by providing FCRA-compliant adverse action reasons.
Model implementation: Plan ahead. How will your ML model be put into production? Will it be recoded into a new computer language, or can it be imported into one of your systems using a format such as the Predictive Model Markup Language (PMML)? How will you test that it works as designed?

Post-implementation: Just as with an old-fashioned regression model, it’s important to monitor both the usage and the performance of the ML model. Your governance team should check periodically that the model is being used as it was intended. Audit the model periodically to know whether changing internal and external factors (which might range from a change in data definition to a new customer population to a shift in the economic environment) might impact the model’s strength and predictive power.

Coach Wooden used to say, “It isn’t what you do. It’s how you do it.” Just like his players, the most successful ML practitioners understand that a process based on best practices is as important as the “game” itself.
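
To make the post-implementation audit a little more concrete, here is a minimal sketch of one way a governance team might check whether the customer population has shifted since the model was built. It uses the population stability index (PSI), a measure widely used in credit scoring but not one the practices above name specifically. The function names, the score ranges and the 60-point shift are all hypothetical, chosen only for illustration.

```python
import math
from typing import List, Sequence


def score_shares(scores: Sequence[float], edges: Sequence[float], floor: float) -> List[float]:
    """Share of scores in each bin; a score lands in bin i when i edges are at or below it."""
    counts = [0] * (len(edges) + 1)
    for s in scores:
        counts[sum(1 for e in edges if s >= e)] += 1
    # Floor empty bins so the logarithm below stays defined (an assumption of this sketch).
    return [max(c / len(scores), floor) for c in counts]


def population_stability_index(
    baseline: Sequence[float],
    recent: Sequence[float],
    n_bins: int = 10,
    floor: float = 1e-6,
) -> float:
    """PSI between the development (baseline) sample and a recent sample of scores."""
    ordered = sorted(baseline)
    # Bin edges are quantiles of the baseline, so each baseline bin holds roughly 1/n_bins.
    edges = [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]
    base = score_shares(baseline, edges, floor)
    new = score_shares(recent, edges, floor)
    return sum((n - b) * math.log(n / b) for b, n in zip(base, new))


# Hypothetical data: development scores from 300 to 849, and a recent
# population whose scores have all drifted 60 points lower.
development_scores = list(range(300, 850))
recent_scores = [s - 60 for s in development_scores]

print(population_stability_index(development_scores, development_scores))  # identical samples
print(population_stability_index(development_scores, recent_scores))       # shifted sample
```

A commonly cited rule of thumb treats a PSI below 0.1 as stable and a PSI above 0.25 as a shift worth investigating. Those thresholds are industry convention rather than anything prescribed here, and each lender would set its own. The same comparison can be run on individual input attributes to help trace a shift back to a change in data definition.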
In 2011, data scientists and credit risk managers finally found an appropriate analogy to explain what we do for a living. “You know Moneyball? What Paul DePodesta and Billy Beane did for the Oakland A’s, I do for XYZ Bank.” You probably remember the story: Oakland had to squeeze the most value out of its limited budget for hiring free agents, so it used analytics (the new baseball “sabermetrics” created by Bill James) to make data-driven decisions that were counterintuitive to the experienced scouts. Michael Lewis told the story in a book that was an incredible bestseller and led to a hit movie. The year after the movie was made, Harvard Business Review declared that data science was “the sexiest job of the 21st century.” Coincidence?

The importance of data

Moneyball emphasized the recognition, through sabermetrics, that certain players’ abilities had been undervalued. In Travis Sawchik’s bestseller Big Data Baseball: Math, Miracles, and the End of a 20-Year Losing Streak, he notes that the analysis would not have been possible without the data. Early visionaries, including John Dewan, began collecting baseball data at games all over the country in a volunteer program called Project Scoresheet. Eventually they were collecting a million data points per season. In a similar fashion, credit data pioneers, such as TRW’s Simon Ramo, began systematically compiling basic credit information into credit files in the 1960s.

Recognizing that data quality is the key to insights and decision-making and responding to the demand for objective data, Dewan formed two companies: Sports Team Analysis and Tracking Systems (STATS) and Baseball Info Solutions (BIS). It seems quaint now, but those companies collected and cleaned data using a small army of video scouts with stopwatches. Now data is collected in real time using systems from Pitch F/X and the radar tracking system Statcast to provide insights that were never possible before.
It’s hard to find a news article about Game 1 of this year’s World Series that doesn’t discuss the launch angle or exit velocity of Eduardo Núñez’s home run, but just a couple of years ago, neither statistic was even measured. Teams use proprietary biometric data to keep players healthy for games. Even neurological monitoring promises to provide new insights and may lead to changes in the game.

Similarly, lenders are finding that so-called “nontraditional data” can open up credit to consumers who might have been unable to borrow money in the past. This includes nontraditional Fair Credit Reporting Act (FCRA)–compliant data on recurring payments such as rent and utilities, checking and savings transactions, and payments to alternative lenders like payday and short-term loans. Newer fintech lenders are innovating constantly, using permissioned, behavioral and social data to make it easier for their customers to open accounts and borrow money. Similarly, some modern banks use techniques that go far beyond passwords and even multifactor authentication to verify their customers’ identities online. For example, identifying consumers through their mobile device can improve the user experience greatly. Some lenders are even using behavioral biometrics to improve their online and mobile customer service practices.

Continuously improving analytics

Bill James and his colleagues developed a statistic called wins above replacement (WAR) that summarized the value of a player as a single number. WAR was never intended to be a perfect summary of a player’s value, but it’s very convenient to have a single number to rank players. Using the same mindset, early credit risk managers developed credit scores that summarized applicants’ risk based on their credit history at a single point in time. Just as WAR is only one measure of a player’s abilities, good credit managers understand that a traditional credit score is an imperfect summary of a borrower’s credit history.
Newer scores, such as VantageScore® 4.0, are based on a broader view of applicants’ credit history, such as credit attributes that reflect how their financial situation has changed over time. More sophisticated financial institutions, though, don’t rely on a single score. They use a variety of data attributes and scores in their lending strategies.

Just a few years ago, simply using data to choose players was a novel idea. Now new measures such as defense-independent pitching statistics drive changes on the field. Sabermetrics, once defined as the application of statistical analysis to evaluate and compare the performance of individual players, has evolved to be much more comprehensive. It now encompasses the statistical study of nearly all in-game baseball activities.

A wide variety of data-driven decisions

Sabermetrics was first used for recruiting players in the 1980s. Today it’s used on the field as well as in the back office. Big Data Baseball gives the example of the “Ted Williams shift,” a defensive technique that was seldom used between 1950 and 2010. In the world after Moneyball, it has become ubiquitous. Likewise, pitchers alter their arm positions and velocity based on data, not only to throw more strikes, but also to prevent injuries.

Similarly, when credit scores were first introduced, they were used only in originations. Lenders established a credit score cutoff that was appropriate for their risk appetite and used it for approving and declining applications.
Now lenders are using Experian’s advanced analytics in a variety of ways that the credit scoring pioneers might never have imagined:

Improving the account opening experience (for example, by reducing friction online)
Detecting identity theft and synthetic identities
Anticipating bust-out activity and other first-party fraud
Issuing the right offer to each prescreened customer
Optimizing interest rates
Reviewing and adjusting credit lines
Optimizing collections

Analytics is no substitute for wisdom

Data scientists like those at Experian remind me that in banking, as in baseball, predictive analytics is never perfect. What keeps finance so interesting is the inherent unpredictability of the economy and human behavior. Likewise, the play on the field determines who wins each ball game: anything can happen. Rob Neyer’s book Power Ball: Anatomy of a Modern Baseball Game quotes the Houston Astros director of decision sciences: “Sometimes it’s just about reminding yourself that you’re not so smart.”