Reuth has more than 14 years of credit risk management experience in the financial industry, supporting a wide variety of businesses including bankcards, auto, mortgage and home lending, and retail and commercial leasing. Her experience includes scorecard and strategy development along with system implementation, scorecard validation and reporting. She also has experience leading complex, high-impact projects such as designing data procurement and data management processes and optimizing portfolio management performance using a variety of statistical techniques. Prior to embarking on a career in credit risk management, Reuth spent 10 years in the direct marketing industry providing modeling and statistical analysis support and managing direct mail campaigns, including prospect scoring and selection and the evaluation of customer expansion opportunities. Today, Reuth shares her wealth of experience and knowledge with Experian clients, leading complex and innovative projects tailored to clients’ needs and objectives in effectively managing risk and making strategic marketing decisions.

-- Reuth Kienow

All posts by Reuth Kienow


Your model is only as good as your data, right? Actually, there are many considerations in developing a sound model, one of which is data. Yet if your data is bad or dirty or doesn’t represent the full population, can it be used? This is where sampling can help. When done right, sampling can lower your cost to obtain the data needed for model development. When done well, sampling can turn a tainted and underrepresented data set into a sound and viable model development sample.

First, define the population to which the model will be applied once it’s finalized and implemented. Determine what data is available and what population segments must be represented within the sampled data. The more variability in internal factors — such as changes in marketing campaigns, risk strategies and product launches — and external factors — such as economic conditions or competitor presence in the marketplace — the larger the sample size needed. A model developer often will need to sample over time to incorporate seasonal fluctuations in the development sample.

The most robust samples are pulled from data that best represents the full population to which the model will be applied. It’s important to ensure your data sample includes customers or prospects declined by the prior model and strategy, as well as approved but nonactivated accounts. This ensures full representation of the population to which your model will be applied. Also, consider the number of predictors or independent variables that will be evaluated during model development, and increase your sample size accordingly.

When it comes to spotting dirty or unacceptable data, the golden rule is know your data and know your target population. Spend time evaluating your intended population and group profiles across several important business metrics. Don’t underestimate the time needed to complete a thorough evaluation.

Next, select data from the population that aptly represents it within the sample, and determine the sampling methodology that best supports the model development and business objectives. Sampling generates a smaller data set for use in model development, allowing the developer to build models more quickly. Reducing the data set’s size decreases the time needed for model computation and saves storage space without losing predictive performance. Once the data is selected, weights are applied so that each record appropriately represents the full population to which the model will be applied.

Several traditional techniques can be used to sample data:

- Simple random sampling — Each record is chosen by chance, and each record in the population has an equal chance of being selected.
- Random sampling with replacement — Each record chosen by chance remains eligible for subsequent selections.
- Random sampling without replacement — Each record chosen by chance is removed from subsequent selections.
- Cluster sampling — Records from the population are sampled in groups, such as region, over different time periods.
- Stratified random sampling — This technique allows you to sample different segments of the population at different proportions. In some situations, stratified random sampling is helpful in selecting segments of the population that aren’t as prevalent as other segments but are equally vital within the model development sample. A sketch of this technique appears below.

Learn more about how Experian Decision Analytics can help you with your custom model development needs.
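To make the stratified approach concrete, here is a minimal sketch in Python using pandas. The population, sampling rates and column names are illustrative assumptions, not part of the original post: it oversamples a rare segment, then weights each record by the inverse of its sampling rate so the sample reflects the full population.

```python
import pandas as pd

# Illustrative population: a 5% "bad" segment that would be
# underrepresented under simple random sampling.
population = pd.DataFrame({"bad_flag": [0] * 9500 + [1] * 500})

# Hypothetical sampling rates: keep 10% of goods, 100% of bads.
rates = {0: 0.10, 1: 1.00}

sample = (
    population.groupby("bad_flag", group_keys=False)
    .apply(lambda g: g.sample(frac=rates[g.name], random_state=42))
)

# Weight each record by the inverse of its sampling rate so the
# sample represents the full population during model development.
sample["weight"] = sample["bad_flag"].map(lambda b: 1.0 / rates[b])
print(sample.groupby("bad_flag")["weight"].agg(["count", "first"]))
```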

Published: November 7, 2018 by Reuth Kienow

As I mentioned in my previous blog, model validation is an essential step in evaluating a recently developed predictive model’s performance before finalizing and proceeding with implementation. An in-time validation sample is created by setting aside a portion of the total model development sample so the predictive accuracy can be measured on data not used to develop the model. However, if few records in the target performance group are available, splitting the total model development sample into development and in-time validation samples will leave too few records in the target group for use during model development. An alternative approach to generating a validation sample is to use a resampling technique. There are many types and variations of resampling methods. This blog will address a few common techniques.

Jackknife technique — An iterative process whereby a different observation is left out of each generated sample. If there are N observations in the data, jackknifing calculates the model estimates on N different samples, each containing N - 1 observations. The model then is applied to each sample, and an average of the model predictions across all samples is derived to generate an overall measure of model performance and prediction accuracy. The jackknife technique can be broadened to remove a group of observations from each generated sample while giving each observation in the data set an equal opportunity for inclusion and exclusion.

K-fold cross-validation — Generates multiple validation data sets from the holdout sample created for the model validation exercise, i.e., the holdout data is split into K subsets. The model then is applied to the K validation subsets, with each subset held out during the iterative process as the validation set while the model scores the remaining K - 1 subsets. Again, an average of the predictions across the multiple validation samples is used to create an overall measure of model performance and prediction accuracy. A minimal sketch of this technique appears below.

Bootstrap technique — Generates subsets from the full model development data sample, with replacement, producing multiple samples generally of equal size. With a total sample size of N, each generated sample also contains N records drawn with replacement, so a single observation can appear multiple times within a subset while another observation may not appear in any of the generated subsets. The generated samples are combined into a simulated larger data sample that can then be split into a development and an in-time, or holdout, validation sample.

Before selecting a resampling technique, it’s important to check and verify the data assumptions for each technique against the data sample selected for your model development, as some resampling techniques are more sensitive than others to violations of data assumptions.

Learn more about how Experian Decision Analytics can help you with your custom model development.
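As an illustration, here is a minimal sketch of K-fold cross-validation using scikit-learn on synthetic data. The features, target and model choice are assumptions for demonstration only, and the sketch shows the standard form applied to a single data set rather than the post’s holdout-sample variant.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))                           # illustrative features
y = (X[:, 0] + rng.normal(size=1000) > 1).astype(int)    # illustrative target

# Each fold serves once as the validation set while the model
# is fit on the remaining K - 1 folds.
aucs = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=42).split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    aucs.append(roc_auc_score(y[val_idx], model.predict_proba(X[val_idx])[:, 1]))

# Averaging across folds gives an overall measure of prediction accuracy.
print(f"Mean AUC across folds: {np.mean(aucs):.3f}")
```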

Published: July 5, 2018 by Reuth Kienow

An introduction to the different types of validation samples

Model validation is an essential step in evaluating and verifying a model’s performance during development before finalizing the design and proceeding with implementation. More specifically, during a predictive model’s development, the objective of a model validation is to measure the model’s accuracy in predicting the expected outcome. For a credit risk model, this may be predicting the likelihood of good or bad payment behavior, depending on the predefined outcome.

Two general types of data samples can be used to complete a model validation. The first is known as the in-time, or holdout, validation sample, and the second is known as the out-of-time validation sample. So, what’s the difference between an in-time and an out-of-time validation sample?

An in-time validation sample sets aside part of the total sample made available for the model development. Random partitioning of the total sample is completed upfront, generally separating the data into a portion used for development and the remaining portion used for validation. For instance, the data may be randomly split, with 70 percent used for development and the other 30 percent used for validation. Other common data subset schemes include an 80/20, a 60/40 or even a 50/50 partitioning of the data, depending on the quantity of records available within each segment of your performance definition. Before selecting a data subset scheme for model development, evaluate the number of records available in your target performance group, such as the number of bad accounts. If you have too few records in your target performance group, a 50/50 split can leave you with insufficient performance data for use during model development. A separate blog post will present a few common options for creating alternative validation samples through a technique known as resampling. A minimal sketch of the in-time partitioning appears below.

Once the data has been partitioned, the model is created using the development sample. The model is then applied to the holdout validation sample to determine the model’s predictive accuracy on data that wasn’t used to develop the model. The model’s predictive strength and accuracy can be measured in various ways by comparing the known and predefined performance outcome to the model’s predicted performance outcome.

The out-of-time validation sample contains data from an entirely different time period or customer campaign than what was used for model development. Validating model performance on a different time period is beneficial for further evaluating the model’s robustness. Selecting a data sample from a more recent time period with a fully mature set of performance data allows the modeler to evaluate model performance on a data set that may more closely align with the current environment in which the model will be used. In this case, a more recent time period can be used to establish expectations and set baseline parameters for model performance, such as population stability indices and performance monitoring.

Learn more about how Experian Decision Analytics can help you with your custom model development needs.
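Here is a minimal sketch of a 70/30 in-time partition using scikit-learn. The accounts data and bad_flag column are hypothetical; stratifying on the performance flag keeps scarce bads proportionally represented in both partitions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 10,000 accounts with a 5% bad rate.
accounts = pd.DataFrame({"bad_flag": [0] * 9500 + [1] * 500})

# 70/30 in-time split, stratified on the performance definition so
# both partitions retain the same bad rate.
development, holdout = train_test_split(
    accounts, test_size=0.30, stratify=accounts["bad_flag"], random_state=42
)
print(development["bad_flag"].mean(), holdout["bad_flag"].mean())
```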

Published: June 18, 2018 by Reuth Kienow

In my first blog post on the topic of customer segmentation, I shared with readers that segmentation is the process of dividing customers or prospects into groupings based on similar behaviors. The more homogeneous the customer groupings, the less variation each segment’s custom model must account for.

A thoughtful segmentation analysis contains two phases: the generation of potential segments and the evaluation of those segments. Although several potential segments may be identified, not all segments will necessarily require a separate scorecard. Separate scorecards should be built only if there is real benefit to be gained through the use of multiple scorecards applied to partitioned portions of the population. The meaningful evaluation of the potential segments is therefore an essential step.

There are many ways to evaluate the performance of a multiple-scorecard scheme compared with a single-scorecard scheme. Regardless of the method used, separate scorecards are justified only if a segment-based scorecard significantly outperforms a scorecard based on a broader population. To do this, Experian® builds a scorecard for each potential segment and evaluates the performance improvement compared with the broader population scorecard, as sketched below. This step is then repeated for each potential segmentation scheme.

Once potential customer segments have been evaluated and the segmentation scheme finalized, the next step is to begin the model development.

Learn more about how Experian Decision Analytics can help you with your segmentation or custom model development needs.
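The comparison step can be outlined in code. This is an illustrative sketch on synthetic data, not Experian’s actual methodology: it fits one model per candidate segment and one on the pooled population, then compares holdout Gini (2 × AUC − 1) within each segment.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

# Synthetic example: two segments whose risk driver works in opposite
# directions, so a single pooled scorecard blurs the relationship.
n = 4000
seg = rng.integers(0, 2, n)                     # candidate segment labels
X = rng.normal(size=(n, 3))
logit = np.where(seg == 0, X[:, 0], -X[:, 0])   # opposite effects by segment
y = (logit + rng.normal(scale=0.5, size=n) > 0).astype(int)

train = np.arange(n) < 3000
hold = ~train

def gini(model, X, y):
    """Gini coefficient (2 * AUC - 1) on a holdout set."""
    return 2 * roc_auc_score(y, model.predict_proba(X)[:, 1]) - 1

broad = LogisticRegression().fit(X[train], y[train])
for s in (0, 1):
    seg_model = LogisticRegression().fit(X[train & (seg == s)], y[train & (seg == s)])
    h = hold & (seg == s)
    print(f"Segment {s}: segment-scorecard Gini {gini(seg_model, X[h], y[h]):.3f} "
          f"vs broad-scorecard Gini {gini(broad, X[h], y[h]):.3f}")
```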

Published: April 27, 2018 by Reuth Kienow

Marketers are keenly aware of how important it is to “Know thy customer.” Yet customer knowledge isn’t restricted to the marketing-savvy. It’s also essential to credit risk managers and model developers. Identifying and separating customers into distinct groups based on various types of behavior is foundational to building effective custom models. This integral part of custom model development is known as segmentation analysis.

Segmentation is the process of dividing customers or prospects into groupings based on similar behaviors such as length of time as a customer or payment patterns like credit card revolvers versus transactors. The more homogeneous the customer groupings, the less variation each segment’s custom model must account for.

So how many scorecards are needed to aptly score and mitigate credit risk? There are several general principles we’ve learned over the course of developing hundreds of models that help determine whether multiple scorecards are warranted and, if so, how many. A robust segmentation analysis contains two components: the first is the generation of potential segments, and the second is the evaluation of those segments. Here I’ll discuss the generation of potential segments within a segmentation scheme. A second blog post will continue with a discussion on the evaluation of those segments.

When generating a customer segmentation scheme, several approaches are worth considering: heuristic, empirical and combined.

A heuristic approach considers business learnings obtained through trial and error or experimental design. Portfolio managers will have insight into how segments of their portfolio behave differently, and that insight can and often should be included within a segmentation analysis.

An empirical approach is data-driven and involves the use of quantitative techniques to evaluate potential customer segmentation splits. During this approach, statistical analysis is performed to identify distinct forms of behavior across the customer population. Different interactive behavior for different segments of the overall population will correspond to different predictive patterns for the candidate predictor variables, signifying that separate segment scorecards will be beneficial. A minimal sketch of one such technique appears below.

Finally, a combination of heuristic and empirical approaches considers both the business needs and the data-driven results.

Once the set of potential customer segments has been identified, the next step in a segmentation analysis is the evaluation of those segments. Stay tuned as we look further into this topic.

Learn more about how Experian Decision Analytics can help you with your segmentation or custom model development needs.
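One common empirical technique for generating candidate segments is clustering on behavioral attributes. Here is a minimal sketch using scikit-learn’s KMeans; the attributes, data and cluster count are illustrative assumptions rather than a method prescribed by the post.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical behavioral attributes: months on book, utilization,
# and share of months the customer revolved a balance.
behavior = np.column_stack([
    rng.integers(1, 120, 5000),
    rng.uniform(0, 1, 5000),
    rng.uniform(0, 1, 5000),
])

# Standardize, then cluster into candidate segments for later evaluation.
scaled = StandardScaler().fit_transform(behavior)
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(scaled)
print(np.bincount(labels))  # candidate segment sizes
```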

Published: April 26, 2018 by Reuth Kienow

You’ve been tasked with developing a new model or enhancing an existing one, but the available data doesn’t include performance across the entire population of prospective customers. Sound familiar? A standard practice is to infer customer performance by using reject inference, but how can you improve your reject inference design?

Reject inference is a technique used to classify the performance outcome of prospective customers within the declined or nonbooked population so this population’s performance reflects what it would have been had it been booked. A common method is to develop a parceling model using credit bureau attributes pulled at the time of application; a minimal sketch of parceling appears below. This type of data, known as pre-diction data, can be used to predict the outcome of the customer prospect based on a data sample containing observations with known performance.

Since the objective of a reject inference model is to classify, not necessarily predict, the outcome of the nonbooked population, data pulled at the end of the performance window can be used to develop the model, provided the accounts being classified are excluded from the attributes used to build the model. This type of data is known as post-diction data.

Reject inference parceling models built using post-diction data generally have much higher model performance metrics, such as the KS statistic, also known as the Kolmogorov-Smirnov test, or the Gini coefficient, compared with reject inference parceling models built using pre-diction data. Use of post-diction data within a reject inference model design can boost the reliability of the nonbooked population performance classification, and the additional lift in performance of the reject inference model can translate into improvements within the final model design.

Post-diction credit bureau data can be easily obtained from Experian along with the pre-diction data typically used for predictive model development. The Experian Decision Analytics team can help get you started.
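To make parceling concrete, here is a minimal sketch in Python with scikit-learn. The synthetic booked/nonbooked data and the logistic model are illustrative assumptions; the idea is to score the nonbooked population with a model fit on known performance, then assign each applicant an inferred outcome by a random draw against its predicted bad probability.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Hypothetical data: bureau attributes for booked accounts with known
# performance, and for nonbooked (declined) applicants.
X_booked = rng.normal(size=(2000, 4))
y_booked = (X_booked[:, 0] + rng.normal(scale=0.7, size=2000) > 0.5).astype(int)
X_nonbooked = rng.normal(loc=0.3, size=(800, 4))

# Fit the parceling model on known performance, then score nonbooked.
parcel_model = LogisticRegression().fit(X_booked, y_booked)
p_bad = parcel_model.predict_proba(X_nonbooked)[:, 1]

# Parcel: each nonbooked applicant receives an inferred bad/good flag
# in proportion to its predicted bad probability.
inferred_bad = (rng.random(len(p_bad)) < p_bad).astype(int)
print(f"Inferred bad rate among nonbooked: {inferred_bad.mean():.1%}")
```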

Published: January 17, 2018 by Reuth Kienow

You just finished redeveloping an existing scorecard, and now it’s time to replace the old with the new. If not properly planned, switching from one scorecard to another within a decisioning or scoring system can be disruptive. Once a scorecard has been redeveloped, it’s important to measure the impact of changes within the strategy that result from replacing the old model with the new one. Evaluating such changes and modifying the strategy where needed will not only optimize strategy performance, but also maximize the full value of the newly redeveloped model. Such an impact assessment can be completed with a swap set analysis.

The phrase swap set refers to “swapping out” a set of customer accounts — generally bad accounts — and replacing them with, or “swapping in,” a set of good customer accounts. Swap-ins are the customer population segment you didn’t previously approve under the old model but would approve with the new model. Swap-outs are the customer population segment you previously approved with the old model but wouldn’t approve with the new model.

A worthy objective is to replace bad accounts with good accounts, thereby reducing the overall bad rate. However, different approaches can be used when evaluating swap sets to optimize your strategy and keep:

- The same overall bad rate while increasing the approval rate.
- The same approval rate while lowering the bad rate.
- The same approval and bad rates while increasing the customer activation or customer response rates.

It’s also important to assess the population that doesn’t change — the population that would be approved or declined using either the old or the new model.

As an example, consider the three customer segments within a swap set analysis. With the incumbent model, the bad rate is 8.3%. With the new model, however, the bad rate is 4.9%. This is a reduction in the bad rate of 3.4 percentage points, or a 41% improvement in the bad rate, as verified in the sketch below.

This type of planning also is beneficial when replacing one generic model with another or with a custom-developed model. Click here to learn more about how the Experian Decision Analytics team can help you manage the impacts of migrating from a legacy model to a newly developed model.
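The arithmetic behind the example can be checked directly. The two bad rates come from the post; the calculation is simply the absolute and relative change.

```python
old_bad_rate = 0.083  # incumbent model
new_bad_rate = 0.049  # redeveloped model

# Absolute reduction: 3.4 percentage points.
reduction_pp = (old_bad_rate - new_bad_rate) * 100

# Relative improvement: roughly 41% of the incumbent bad rate.
improvement = (old_bad_rate - new_bad_rate) / old_bad_rate

print(f"{reduction_pp:.1f} percentage points, {improvement:.0%} improvement")
```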

Published: January 7, 2018 by Reuth Kienow
