This post is one of five entries related to the report, Accelerating Financial Inclusion with New Data, a collaboration between CFI and the Institute of International Finance (IIF).
In a recent post on her report, Accelerating Financial Inclusion with New Data, Tess Johnson highlighted the huge opportunity that alternative data represents for the future of financial services. The simple fact that mobile and internet penetration have surpassed financial services penetration in most emerging markets shows why: many people who have had no meaningful access to formal financial services are creating digital footprints that financial service providers can capture and analyze. Providers can then reach these people with commercially viable services that help them improve their lives. Machine learning and big data methods that were not available a few years ago make this possible.
For those of us in the world of financial inclusion, these are very exciting times. Internet access and new data analysis methods are growing at the same time, and together they create an opportunity that our predecessors in this field couldn't have imagined.
The bad news is that harnessing digital footprint data using machine learning is not easy; it requires time, commitment and skills that are in short supply. However, the good news is that those with the vision and endurance to leverage this opportunity will build a competitive advantage that will be sustainable for years to come.
When developing a credit score based on traditional information (e.g., demographics, repayment data), analysts usually have historical data with which to design and train models. In back testing, analysts apply the credit scoring model to historical data to see how accurately it would have predicted the actual outcomes (i.e., loan repayment). This gives a fairly good sense of how the model will perform in the future, and the credit policy can be set accordingly. When such traditional data sources are not available, however, we enter uncharted territory.
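To make the back-testing idea concrete, here is a minimal Python sketch. It assumes we already have a score for each past loan and know whether that loan defaulted. It replays a simple "approve if score ≥ cutoff" rule on that history and picks the cutoff with the highest approval rate whose default rate stays within a target, which is one basic way to set a credit policy from back-test results. The `HistoricalLoan` record, the function names and the toy numbers are all hypothetical illustrations, not any lender's actual method.

```python
from dataclasses import dataclass


@dataclass
class HistoricalLoan:
    # Hypothetical record: the score the model would have assigned at application
    # time, and the repayment outcome that was actually observed later.
    score: float
    defaulted: bool


def backtest_cutoff(loans, cutoff):
    """Replay a simple rule (approve if score >= cutoff) on past loans.

    Returns (approval_rate, default_rate_among_approved).
    """
    approved = [loan for loan in loans if loan.score >= cutoff]
    approval_rate = len(approved) / len(loans)
    if not approved:
        return approval_rate, 0.0
    default_rate = sum(loan.defaulted for loan in approved) / len(approved)
    return approval_rate, default_rate


def choose_cutoff(loans, candidate_cutoffs, max_default_rate):
    """Pick the cutoff with the highest approval rate whose back-tested
    default rate stays within the target. Returns None if no cutoff qualifies."""
    best = None
    for cutoff in candidate_cutoffs:
        approval_rate, default_rate = backtest_cutoff(loans, cutoff)
        if default_rate <= max_default_rate and (best is None or approval_rate > best[1]):
            best = (cutoff, approval_rate, default_rate)
    return best


# Toy history with made-up numbers, for illustration only.
history = [
    HistoricalLoan(0.91, False),
    HistoricalLoan(0.85, False),
    HistoricalLoan(0.78, True),
    HistoricalLoan(0.70, False),
    HistoricalLoan(0.55, True),
    HistoricalLoan(0.40, True),
]

for cutoff in (0.50, 0.75, 0.80):
    approval_rate, default_rate = backtest_cutoff(history, cutoff)
    print(f"cutoff {cutoff:.2f}: approve {approval_rate:.0%}, default {default_rate:.0%}")

print(choose_cutoff(history, [0.50, 0.75, 0.80], max_default_rate=0.35))
```

On this toy history, a 0.50 cutoff approves more applicants but exceeds the 35% default target. A 0.75 cutoff therefore wins. With real portfolios, the same replay logic needs far more loans before the rates it reports are meaningful.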
Lacking prior information about our current customers' psychometric profiles or digital footprints, we must build those data sets from scratch. Depending on the data source, we may need very large data sets to compensate for the lack of structure. Unstructured data is data that is not organized in a format, such as a spreadsheet, that makes it easy to generate insights. As with most machine learning applications, the more data you collect, the more predictive and stable your models tend to become. LenddoEFL is one organization that gathers data for these profiles and footprints. It is an alternative credit scoring and verification provider that uses psychometric and other data about a loan applicant to determine a credit score and verify identity.
Furthermore, even state-of-the-art alternative data sources do not necessarily allow you to build models that are stable and reliable across multiple segments of the market. Therefore, you need to build algorithms that are specific to your target population.
One of the most challenging issues when implementing alternative data scoring initiatives is demonstrating results within given time and budget constraints. In the long run, once the portfolio has matured, you can show whether using alternative data allowed you to approve more applicants within your target default levels, controlling for the business cycle. But if you are working with 24- to 36-month loans, it may take three or four years to fully assess the impact of alternative data. By then, internal attention may have moved elsewhere.
To address this, LenddoEFL uses early indicators of model performance. We set a target maturity and a days-in-arrears threshold based on the profile of a financial institution's portfolio, for example, 60 days in arrears within the first 9 months. We then calculate the Gini coefficient of the model as applied to that portfolio. In credit scoring, the Gini coefficient is a measure of predictive power. It helps lenders understand how well their credit score separates borrowers who will repay from those who will default (not to be confused with the Gini coefficient that measures income inequality). For more details on how to use the Gini, check out our blog series from our risk and analytics team: Part 1, Part 2, Part 3.
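The early-indicator approach can be sketched in Python as follows. This is an illustration under stated assumptions, not LenddoEFL's actual implementation. Each loan is assumed to have a score, where higher means lower risk, and a list of days in arrears at each month-end. A loan counts as "bad" if it reaches 60 or more days in arrears within its first 9 months. Loans observed for fewer than 9 months are left out because their outcome is not yet known. The Gini is computed with the common credit-scoring identity Gini = 2 * AUC - 1. The function names and toy numbers are hypothetical.

```python
def is_bad(arrears_history, window_months=9, dpd_threshold=60):
    """Early 'bad' flag: the loan reached `dpd_threshold` or more days in arrears
    at any month-end within the first `window_months` months."""
    return max(arrears_history[:window_months], default=0) >= dpd_threshold


def gini(scores, bad_flags):
    """Gini = 2 * AUC - 1, where AUC is the probability that a randomly chosen
    good loan scored higher than a randomly chosen bad one (ties count half).
    Assumes a higher score means lower risk."""
    goods = [s for s, bad in zip(scores, bad_flags) if not bad]
    bads = [s for s, bad in zip(scores, bad_flags) if bad]
    if not goods or not bads:
        raise ValueError("need at least one good and one bad loan")
    wins = 0.0
    for g in goods:
        for b in bads:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    auc = wins / (len(goods) * len(bads))
    return 2 * auc - 1


def early_gini(loans, window_months=9, dpd_threshold=60):
    """Compute the Gini only on loans observed for at least the full window,
    so that 'good' means 'got through the window without hitting the threshold'."""
    matured = [(score, hist) for score, hist in loans if len(hist) >= window_months]
    scores = [score for score, _ in matured]
    flags = [is_bad(hist, window_months, dpd_threshold) for _, hist in matured]
    return gini(scores, flags)


# Toy portfolio (made-up numbers): (score, days in arrears at each month-end).
loans = [
    (0.92, [0] * 9),
    (0.81, [0, 5, 0, 0, 0, 0, 0, 0, 0]),
    (0.74, [0, 0, 30, 60, 90, 90, 90, 90, 90]),  # bad: 60+ days by month 4
    (0.66, [0] * 10 + [75]),                     # good: arrears only after month 9
    (0.48, [0, 30, 61, 0, 0, 0, 0, 0, 0]),       # bad: 61 days in month 3
    (0.35, [0, 0, 0]),                           # excluded: only 3 months observed
]

print(round(early_gini(loans), 3))  # 0.667
```

The pairwise loop is easy to read but grows quadratically with the number of loans. For a real portfolio, a library routine such as scikit-learn's `roc_auc_score` would be the usual way to get the AUC, with the score direction handled consistently.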
Is it too late to pursue an alternative credit scoring initiative? Plenty of companies are already doing this, such as Te Creemos in Mexico, Mynt in the Philippines and Business Partners in South Africa, but only a few lenders in each market are using alternative data. You could still be the first institution in your segment and country to implement such an initiative, and you can benefit from others' experience and lessons learned.
The sooner you start collecting data and building models, the sooner you will be able to underwrite the unbanked segment better than your competition, and the longer the window of advantage will be. For those who start late, catching up with the early adopters will be a great challenge.
Stay tuned for my next blog post, where I will outline recommendations for getting started with alternative data!