Gender Bias in AI: An Experiment with ChatGPT in Financial Inclusion

Following the release of CFI’s Equitable AI Guide and Brief, Lucciana Alvarez Ruiz identifies issues with popular AI tools, like ChatGPT, and how AI can result in harmful gender bias if key challenges with building accountability and transparency are not addressed.

AUTHOR

Lucciana Alvarez Ruiz

Topics

Why is meal planning a recommended financial strategy for women but not for men with children, according to ChatGPT? And why is this important for financial services?

Natural language processing (NLP) tools can create human-like conversations. Perhaps the most well-known application of an NLP tool is ChatGPT, which reached over 100 million users two months after its launch in November 2022. Many people have increasingly used ChatGPT as a tool for work or personal use. I wanted to test if ChatGPT showed gender bias when looking at finance-related applications, so I inputted a few prompts about financial services. The results are a reminder that while AI is a powerful tool, it is not free from problematic social norms and potential bias against women. This week CFI released two new publications to help strengthen industry conversations to mitigate against, monitor for, and correct harmful gender bias in inclusive finance. These publications are the result of CFI’s winning proposal to USAID’s Equitable AI Challenge, implemented by DAI’s Digital Frontiers.

One simple prompt revealed three problematic differences. However, they reflect a complex problem requiring an urgent response.

When I asked ChatGPT, or GPT-3.5, to write financial advice to help men and women with children, the answers showed important differences by gender, as seen in the following screenshots. For instance, the tool recommended that men, not women, create or update their wills and designate beneficiaries for their assets. This result demonstrates an existing assumption that men with children are more likely to have assets or income than women with children. Moreover, GPT-3.5 advised only men, not women, to have a work-life balance and spend quality time with their families. Again, this shows a bias about gender roles in work and within families. Another blatant example was the recommendation GPT-3.5 gave to women, but not to men, about planning meals. The tool advised women to cook larger meals at home and plan their grocery shopping to save money.

One simple prompt revealed these three problematic differences. However, they reflect a complex problem requiring an urgent response — especially as many companies, including fintechs, incorporate AI tools, such as NLP, into their business models.

Differences in ChatGPT giving financial advice to women and men

Growing Use Cases of AI in Inclusive Finance

AI has numerous applications in the financial sector, such as revenue generation, process automation, risk management, client acquisition, and customer service. Companies are increasingly using AI to accelerate lending decisions and use alternative data — including social media, mobile phone activity, and satellite data — to measure creditworthiness. AI presents an opportunity to quickly expand access to financial services for populations without long credit histories, including women. Companies use facial patterns, voice recognition, and biometric data for know-your-customer (KYC) requirements and to verify customers’ identities. Insurance companies use AI algorithms to speed up claims processing. Companies are also using AI tools for client acquisition and customer service. Chatbots and virtual assistants help prospective and existing customers find information about financial products that meet their needs.

Instead of offering helpful customization, the financial advice given perpetuates outdated and unhelpful stereotypes. The resulting harm from biased AI systems used in financial services contradicts the goals of financial inclusion and has potential business and regulatory costs for fintechs and their investors.

However, while AI tools are making high-stakes decisions for the economic prospects and future of individuals, their businesses, and their communities, these systems are also introducing new risks, including data privacy vulnerabilities and unfair outcomes for certain consumers. Women have experienced negative effects, such as being unfairly rejected as “false negative” in credit decisions, or receiving higher pricing, lower credit limits, and more limited choices. As the above experiment with GPT-3.5 showed, algorithms can provide different financial advice depending on a customer’s gender. When designed well, tailored solutions for different customer segments can increase usage by satisfying the specific needs of the group. However, in this case, instead of offering helpful customization, the financial advice given perpetuates outdated and unhelpful stereotypes. The resulting harm from these issues contradicts the goals of financial inclusion and has potential business and regulatory costs for fintechs and their investors.

Sources of Harmful Gender Bias

Framework to understand algorithmic bias

CFI has written extensively on the opportunities and risks of AI for inclusive finance, and in 2021, we released a framework to help understand algorithmic bias. The framework included three buckets to categorize sources of harmful AI bias: 1) inputs (such as datasets and AI teams), 2) code, and 3) context.

Applying this framework to my experiment with ChatGPT illustrates a few examples of where differences in financial advice by gender may originate. However, it’s important to highlight that these possible explanations are purely theoretical, as we have not tested them empirically.

Inputs: NLP and AI algorithms are trained on data to make predictions and inform decisions. In this case, GTP-3.5 processed input data from the web, books, and WebText2, which is culled from Reddit submissions. As you may imagine, in aggregate these input datasets compile a series of beliefs and social norms, such as the one that suggested that women save money through meal prepping.

Code and Context: As a model is being built, developers can use techniques to reduce biased responses by, for instance, penalizing the model for generating harmful or stereotypical content. AI developers, or other actors, can periodically review model outputs after launch and provide feedback and adjustments over time. However, if the teams testing the system are primarily men, problematic outputs — like from my experiment — may be less likely to be flagged.

A Path Forward

As with any technology that experiences exponential growth, stakeholders — like investors, donors, and regulators — must play catch-up, and many questions remain. How, amid all these changes, can concerned stakeholders understand, let alone contribute to, shaping the future of AI in a way that is responsible and equitable, particularly for achieving our target financial inclusion outcomes?

In the inclusive finance sector, many actors are increasingly interested in incorporating principles and tools that guide the responsible and equitable deployment of AI. However, although there is a proliferation of techniques and academic resources to identify, mitigate, and correct bias, our extensive review of over 120 guides and checklists showed a lack of tools appropriately designed for the financial sector.

Recognizing this gap, CFI developed and this week released a practical guide to help impact investors identify harmful AI gender bias among their investee companies, and an accompanying brief that presents some of the key challenges with building accountability and transparency for AI in inclusive finance and offers opportunities to strengthen the industry conversation. Recommended practices in the guide touch on data, inputs, and model building as well as crucial business practices including governance. For instance, in the case of OpenAI, the creator of GPT-3.5, the company publicly committed that AI should benefit all of humanity and wrote principles to guide their mission. This stance may explain OpenAI’s six-month effort to make the new version of ChatGPT, GPT-4, safer and with hopefully fewer problematic outputs. I’ll be sure to check back with some prompts for GPT-4!

Read more about our work around equitable AI here.

How CFI is filling the gap to provide an AI guide

*Note: In this blog, gender gaps refer to the gap between women and men. However, CFI acknowledges that gender includes non-binary populations such as LGTBQ communities. It is essential to highlight that this blog does not intend to dismiss the existence of non-binary people.

Authors

Lucciana Alvarez Ruiz

Former Senior Research Manager

With over ten years of experience in research, Lucciana Alvarez Ruiz played a leading role in projects across CFI’s portfolio by providing project management, research design, and quantitative analysis. She was with CFI as a Research Manager from February 2021 to January 2025, and focused on micro and small enterprises, climate finance, AI, and women’s financial inclusion.

Lucciana graduated from Harvard Kennedy School of Government with a master’s in public administration in international development. Additionally, Lucciana holds a bachelor’s degree in economics from the Pontificia Universidad Católica del Perú and master’s studies in economics from the same university.

Before joining the CFI, Lucciana worked as a consultant at the Inter-American Development Bank in Washington DC and Peru for over four years, researching and managing loans and operations in Latin American countries. She also worked for the United Nations Development Programme, the Bloomberg Harvard City Leadership Initiative, the Social Policy Data Visualization Lab of Harvard Kennedy School, and other think tanks.

She has experience designing and managing projects, supervising the implementation of surveys, analyzing data using STATA and RStudio language programs, writing academic papers, and visualizing data. She is fluent in English and Spanish and intermediate in Portuguese.