Machine learning and Extreme Gradient Boosting

by Guest Contributor 3 min read October 24, 2018

This is an exciting time to work in big data analytics. Here at Experian, we have more than 2 petabytes of data in the United States alone. In the past few years, because of high data volume, more computing power and the availability of open-source code algorithms, my colleagues and I have watched excitedly as more and more companies are getting into machine learning. We’ve observed the growth of competition sites like Kaggle, open-source code sharing sites like GitHub and various machine learning (ML) data repositories.

We’ve noticed that on Kaggle, two algorithms win over and over at supervised learning competitions:

  • If the data is well-structured, teams that use Gradient Boosting Machines (GBM) seem to win.
  • For unstructured data, teams that use neural networks win pretty often.

Modeling is both an art and a science. Those winning teams tend to be good at what the machine learning people call feature generation and what we credit scoring people called attribute generation. We have nearly 1,000 expert data scientists in more than 12 countries, many of whom are experts in traditional consumer risk models — techniques such as linear regression, logistic regression, survival analysis, CART (classification and regression trees) and CHAID analysis. So naturally I’ve thought about how GBM could apply in our world.

Credit scoring is not quite like a machine learning contest. We have to be sure our decisions are fair and explainable and that any scoring algorithm will generalize to new customer populations and stay stable over time. Increasingly, clients are sending us their data to see what we could do with newer machine learning techniques. We combine their data with our bureau data and even third-party data, we use our world-class attributes and develop custom attributes, and we see what comes out. It’s fun — like getting paid to enter a Kaggle competition! For one financial institution, GBM armed with our patented attributes found a nearly 5 percent lift in KS when compared with traditional statistics.

At Experian, we use Extreme Gradient Boosting (XGBoost) implementation of GBM that, out of the box, has regularization features we use to prevent overfitting. But it’s missing some features that we and our clients count on in risk scoring. Our Experian DataLabs team worked with our Decision Analytics team to figure out how to make it work in the real world. We found answers for a couple of important issues:

  • Monotonicity — Risk managers count on the ability to impose what we call monotonicity. In application scoring, applications with better attribute values should score as lower risk than applications with worse values. For example, if consumer Adrienne has fewer delinquent accounts on her credit report than consumer Bill, all other things being equal, Adrienne’s machine learning score should indicate lower risk than Bill’s score.
  • Explainability — We were able to adapt a fairly standard “Adverse Action” methodology from logistic regression to work with GBM.

There has been enough enthusiasm around our results that we’ve just turned it into a standard benchmarking service. We help clients appreciate the potential for these new machine learning algorithms by evaluating them on their own data. Over time, the acceptance and use of machine learning techniques will become commonplace among model developers as well as internal validation groups and regulators.

Whether you’re a data scientist looking for a cool place to work or a risk manager who wants help evaluating the latest techniques, check out our weekly data science video chats and podcasts.

Related Posts

Updated November 17th Related Posts Link to automotive form, business form

Published: April 24, 2025 by Rathnathilaga.MelapavoorSankaran@experian.com

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus at nisl nunc. Sed et nunc a erat vestibulum faucibus. Sed fermentum placerat mi aliquet vulputate. In hac habitasse platea dictumst. Maecenas ante dolor, venenatis vitae neque pulvinar, gravida gravida quam. Phasellus tempor rhoncus ante, ac viverra justo scelerisque at. Sed sollicitudin elit vitae est lobortis luctus. Mauris vel ex at metus cursus vestibulum lobortis cursus quam. Donec egestas cursus ex quis molestie. Mauris vel porttitor sapien. Curabitur tempor velit nulla, in tempor enim lacinia vitae. Sed cursus nunc nec auctor aliquam. Morbi fermentum, nisl nec pulvinar dapibus, lectus justo commodo lectus, eu interdum dolor metus et risus. Vivamus bibendum dolor tellus, ut efficitur nibh porttitor nec. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Maecenas facilisis pellentesque urna, et porta risus ornare id. Morbi augue sem, finibus quis turpis vitae, lobortis malesuada erat. Nullam vehicula rutrum urna et rutrum. Mauris convallis ac quam eget ornare. Nunc pellentesque risus dapibus nibh auctor tempor. Nulla neque tortor, feugiat in aliquet eget, tempus eget justo. Praesent vehicula aliquet tellus, ac bibendum tortor ullamcorper sit amet. Pellentesque tempus lacus eget aliquet euismod. Nam quis sapien metus. Nam eu interdum orci. Sed consequat, lectus quis interdum placerat, purus leo venenatis mi, ut ullamcorper dui lorem sit amet nunc. Donec semper suscipit quam eu blandit. Sed quis maximus metus. Nullam efficitur efficitur viverra. Curabitur egestas eu arcu in cursus. H1 asdf asdf H2 H3 H4 Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum dapibus ullamcorper ex, sed congue massa. Duis at fringilla nisi. Aenean eu nibh vitae quam auctor ultrices. Donec consequat mattis viverra. Morbi sed egestas ante. Vivamus ornare nulla sapien. Integer mollis semper egestas. Cras vehicula erat eu ligula commodo vestibulum. Fusce at pulvinar urna, ut iaculis eros. Pellentesque volutpat leo non dui aliquet, sagittis auctor tellus accumsan. Curabitur nibh mauris, placerat sed pulvinar in, ullamcorper non nunc. Praesent id imperdiet lorem. H5 Curabitur id purus est. Fusce porttitor tortor ut ante volutpat egestas. Quisque imperdiet lobortis justo, ac vulputate eros imperdiet ut. Phasellus erat urna, pulvinar id turpis sit amet, aliquet dictum metus. Fusce et dapibus ipsum, at lacinia purus. Vestibulum euismod lectus quis ex porta, eget elementum elit fermentum. Sed semper convallis urna, at ultrices nibh euismod eu. Cras ultrices sem quis arcu fermentum viverra. Nullam hendrerit venenatis orci, id dictum leo elementum et. Sed mattis facilisis lectus ac laoreet. Nam a turpis mattis, egestas augue eu, faucibus ex. Integer pulvinar ut risus id auctor. Sed in mauris convallis, interdum mi non, sodales lorem. Praesent dignissim libero ligula, eu mattis nibh convallis a. Nunc pulvinar venenatis leo, ac rhoncus eros euismod sed. Quisque vulputate faucibus elit, vitae varius arcu congue et. Ut maximus felis quis diam accumsan suscipit. Etiam tellus erat, ultrices vitae molestie ut, bibendum id ipsum. Aenean eu dolor posuere, tincidunt libero vel, mattis mauris. Aliquam erat volutpat. Sed sit amet placerat nulla. Mauris diam leo, iaculis eget turpis a, condimentum laoreet ligula. Nunc in odio imperdiet, tincidunt velit in, lacinia urna. Aenean ultricies urna tempor, condimentum sem eget, aliquet sapien. Ut convallis cursus dictum. In hac habitasse platea dictumst. Ut eleifend eget erat vitae tempor. Nam tempus pulvinar dui, ac auctor augue pharetra nec. Sed magna augue, interdum a gravida ac, lacinia quis erat. Pellentesque fermentum in enim at tempor. Proin suscipit, odio ut lobortis semper, est dolor maximus elit, ac fringilla lorem ex eu mauris. Phasellus vitae elit et dui fermentum ornare. Vestibulum non odio nec nulla accumsan feugiat nec eu nibh. Cras tincidunt sem sed lacinia mollis. Vivamus augue justo, placerat vel euismod vitae, feugiat at sapien. Maecenas sed blandit dolor. Maecenas vel mauris arcu. Morbi id ligula congue, feugiat nisl nec, vulputate purus. Nunc nec aliquet tortor. Maecenas interdum lectus a hendrerit tristique. Ut sit amet feugiat velit. Test Yes asedtsdfd asdf asdf adsf Related Posts

Published: March 1, 2025 by , Sirisha Koduri

Discover how token-based authentication works, its types, and why businesses trust it to secure sensitive data.

Published: February 11, 2025 by Theresa Nguyen