Research on micro and small enterprises (MSEs) is often built on shaky ground. MSEs frequently operate informally, go unrecorded in official databases, and are distributed unevenly across urban landscapes. Despite the central role these firms play in jobs, income, and resilience, much of what we know about them comes from samples that are convenient rather than representative: surveys of the enterprises that are easiest to reach, lists that exclude the informal sector, or studies designed in ways that cannot be replicated. The result is data that may be voluminous but unreliable, leading to insights that misrepresent reality. This is not a niche concern confined to statisticians; it is a fundamental issue for policy and practice. If we want sound strategies for financing and supporting MSEs, we need to build them on sound data.
The story of George Gallup reminds us that methodology is what separates misleading millions from getting it right. In 1936, Gallup transformed public opinion research by proving that how you sample matters more than how many people you sample. At a time when the prestigious Literary Digest surveyed millions of voters, Gallup used a much smaller but carefully designed sample, and correctly predicted Franklin D. Roosevelt’s re-election. The Digest, relying on outdated mailing lists and self-selected respondents, got it disastrously wrong. The lesson was clear: large numbers do not guarantee reliable insight if the underlying method is flawed.
This lesson resonates powerfully in today’s development research, particularly when the focus is MSEs. Here, the challenge is not a lack of data but the quality and comparability of what is collected. Many studies generate detailed findings, but because they rely on inconsistent methods, such as different definitions of what counts as an enterprise, varying sampling frames, or incomplete coverage of the informal economy, the results cannot be combined or compared across contexts. The outcome is a fragmented evidence base that leaves policymakers and practitioners with partial, sometimes misleading, pictures of the sector.
Yet MSEs are surveyed constantly. From donor evaluations to academic studies to financial-sector diagnostics, research teams collect data again and again, often at great cost. What is missing is a common, rigorous, and implementable methodology that can be applied across markets to generate evidence that is both representative and replicable.
Mapping the Unseen Economy
To meet this challenge, CFI developed a methodology that combines rigor with practicality. Our aim was to strengthen the credibility of MSE research with an approach that meets high technical standards and can be replicated by other organizations conducting enterprise surveys. In our 2025 flagship study Small Firms, Big Impact, we built on recent advances in adaptive cluster sampling (ACS) and geospatial enumeration to move beyond traditional business directories or street listings, which systematically miss large portions of the MSE universe, particularly informal firms.
Our approach was designed not only to capture a more accurate picture of MSEs across diverse urban contexts, but also to create a replicable method that other organizations can implement. By sampling discrete areas across the entire extent of each city and focusing fieldwork in areas with greater concentrations of small firms, we sought to ensure that every enterprise that fell within our eligibility criteria had a chance of being represented while also allocating fieldwork resources efficiently. The result is a research design that is both technically sound and accessible, capable of producing insights that can be compared across cities and over time.
What the Methodology Involves
Designing a sampling methodology to study MSEs requires making careful choices about how to represent a city’s economic life. Many existing approaches begin from what is easiest to observe or list: registered firms, busy markets, or main streets. These approaches often miss small, informal, or home-based businesses that operate in less visible areas. Because the goal of our study was to capture a representative sample of MSEs and make sound comparisons across cities in different countries, we had to take a different, more systematic approach.
To minimize the risk of sampling bias in the study, the methodology used block enumeration and adaptive cluster sampling (ACS). Block enumeration refers to the use of a geospatial grid in which each cell of the grid represents a primary sampling unit. ACS is a methodology in which the selection of primary sampling units is allowed to be guided by observation. In this case, the discovery of small businesses in a given area triggers more intensive sampling in neighboring areas.
The main steps are outlined below:
- Step 1: Defining the sampling boundary
The first step is to determine the geographic area where the city’s economy actually operates. Instead of relying only on administrative limits, we use satellite imagery and other geospatial data such as night lights or population density to identify the built-up area. This helps capture peri-urban zones and commercial corridors that formal boundaries may exclude.
- Step 2: Constructing the grid
The area contained within the sampling boundary is divided into a grid of small, uniform “block areas”. These serve as primary sampling units (PSUs). Since all businesses in selected block areas will need to be identified and listed, the block areas need to be small enough to be covered by an enumerator. Using geospatial data offers an opportunity to refine the grid, for example using a land use/land cover layer to exclude blocks covered completely by water bodies, fields, or empty lots.
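As a rough illustration of the grid step, the sketch below tiles a rectangular extent with square block areas and drops any block that a caller-supplied test marks as fully excluded. It is a minimal sketch, not the study's implementation: the names `Block`, `build_grid`, and `is_excluded` are hypothetical, coordinates are assumed to be in a projected system measured in metres, and the land-cover check is stubbed out as a plain Python function.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Block:
    """One square block area (primary sampling unit) in the grid."""
    row: int
    col: int
    x_min: float
    y_min: float
    size: float


def build_grid(
    x_min: float,
    y_min: float,
    x_max: float,
    y_max: float,
    size: float,
    is_excluded: Callable[[Block], bool] = lambda block: False,
) -> List[Block]:
    """Tile the extent with square blocks, skipping excluded ones.

    `is_excluded` stands in for a land use/land cover lookup (assumption):
    it should return True when a block is covered completely by water,
    fields, or empty lots.
    """
    n_cols = math.ceil((x_max - x_min) / size)
    n_rows = math.ceil((y_max - y_min) / size)
    blocks = []
    for row in range(n_rows):
        for col in range(n_cols):
            block = Block(row, col, x_min + col * size, y_min + row * size, size)
            if not is_excluded(block):
                blocks.append(block)
    return blocks


# Toy extent: 1,000 m x 500 m split into 250 m blocks (illustrative numbers).
full_grid = build_grid(0, 0, 1000, 500, 250)

# Pretend the southern row is entirely water and should be dropped.
land_grid = build_grid(0, 0, 1000, 500, 250, is_excluded=lambda b: b.row == 0)
```

In practice the exclusion test would query a real land-cover layer with a GIS library; the block size is a design choice driven by how much ground an enumerator can cover.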
- Step 3: Enumerating the blocks
Enumerators conduct rapid scans in each selected block, counting and categorizing all visible businesses, including informal and home-based ones. This step provides a direct, current snapshot of the enterprise landscape rather than relying on older or incomplete business directories. Basic data collected during this enumeration serves to create an ‘on-the-fly’ sampling frame for the target population of small businesses.
- Step 4: Applying adaptive cluster sampling
An initial simple random sample of block areas is drawn from the sampling grid, so that every block area in the city has an equal chance of being selected. If the number of small businesses found in a block area exceeds a pre-defined ‘expansion threshold’, all eight adjacent block areas are subsequently enumerated, and the process repeats until no further block areas meet the threshold. As target businesses are encountered, they are selected at random with a pre-defined probability (e.g. 1 in 4) for the full survey. This adaptive process allows the survey to capture dense clusters of activity while avoiding unnecessary enumeration in areas with few enterprises.
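The expansion rule in this step can be sketched in a few lines of code. The example below is an illustration under stated assumptions rather than the study's actual tooling: block areas are represented as `(row, col)` tuples, `count_businesses` is a hypothetical stand-in for the field enumeration of one block, and the function names `neighbours`, `draw_initial_sample`, and `expand` are invented for this sketch.

```python
import random
from collections import deque


def neighbours(cell, grid):
    """Return the (up to eight) adjacent cells that exist in the grid."""
    row, col = cell
    return [
        (row + dr, col + dc)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0) and (row + dr, col + dc) in grid
    ]


def draw_initial_sample(grid, n_initial, seed=None):
    """Simple random sample of block areas, without replacement."""
    rng = random.Random(seed)
    return rng.sample(sorted(grid), n_initial)


def expand(initial, grid, count_businesses, threshold):
    """Enumerate blocks, adding all neighbours of any block whose count
    exceeds the expansion threshold, until no new block qualifies.

    Returns a dict mapping each enumerated block to its business count.
    """
    enumerated = {}
    queue = deque(initial)
    while queue:
        cell = queue.popleft()
        if cell in enumerated:
            continue  # already visited via another block
        enumerated[cell] = count_businesses(cell)
        if enumerated[cell] > threshold:
            queue.extend(neighbours(cell, grid))
    return enumerated


# Toy 5 x 5 city with one dense two-block cluster (made-up counts).
grid = {(r, c) for r in range(5) for c in range(5)}
toy_counts = {(2, 2): 10, (2, 3): 10}


def count(cell):
    return toy_counts.get(cell, 0)


cluster = expand([(2, 2)], grid, count, threshold=5)   # hits the cluster
quiet = expand([(0, 0)], grid, count, threshold=5)     # no expansion
```

Starting from the dense block, the enumeration spreads to both cluster blocks and their surrounding ring and then stops; starting from an empty corner, only the initial block is visited. This is how the design concentrates fieldwork where enterprises are.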
- Step 5: Weighting and documentation
Survey weights are applied to account for differences in selection probability and block expansion. Careful documentation ensures that the process can be replicated elsewhere, and that results remain comparable across cities and over time.
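The text above does not spell out a weighting formula, so the following is only one textbook possibility, not necessarily what the study used. It sketches a Horvitz-Thompson-style weight from the adaptive cluster sampling literature, assuming the initial sample is a simple random sample without replacement of `n` blocks out of `N`, and that a surveyed business sits in a "network" of `m` connected blocks that all exceed the expansion threshold (with `m = 1` for a block that does not trigger expansion). All function and parameter names are hypothetical.

```python
from math import comb


def network_inclusion_probability(N: int, n: int, m: int) -> float:
    """Probability that at least one of the m blocks in a network falls
    in an initial simple random sample of n blocks out of N."""
    return 1 - comb(N - m, n) / comb(N, n)


def business_weight(N: int, n: int, m: int, p_select: float) -> float:
    """Weight for a surveyed business: inverse of the network's inclusion
    probability times the within-block selection probability
    (e.g. 0.25 for a 1-in-4 draw)."""
    return 1 / (network_inclusion_probability(N, n, m) * p_select)


# Illustrative numbers only: 100 blocks in the grid, 10 in the initial sample.
isolated = business_weight(N=100, n=10, m=1, p_select=0.25)
clustered = business_weight(N=100, n=10, m=3, p_select=0.25)
```

With these made-up inputs, a business in an isolated block carries a weight of about 40, while one in a three-block network carries a smaller weight because larger networks are more likely to be reached by the initial sample. Under this particular estimator, blocks enumerated only as neighbours, without exceeding the threshold themselves, would need separate handling, which is part of why documentation of block expansion matters.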
Advantages and Trade-offs
Advantages
The approach we took offers a number of improvements over methods that are typically used to study MSEs. By grounding the sample in geography and adapting to real patterns of enterprise density, the method delivers data that are more representative and comparable while remaining practical to implement. It also creates opportunities to connect enterprise data with other spatial or environmental information, providing a richer understanding of how urban economies function.
Key advantages include:
- Improved representativeness: Both visible and less visible businesses have a measurable chance of inclusion.
- Comparability across contexts: The same design can be applied in multiple cities, allowing results to be compared over time and across markets.
- Reduced bias: Enumerator discretion is limited, and informal enterprises are systematically included.
- Integration with spatial data: The method allows results to be linked with satellite, infrastructure, or climate data for deeper analysis.
- Operational feasibility: The process achieves a balance between rigor and practicality and can be implemented by modestly resourced teams.
Trade-offs and challenges
The method also presents several challenges. It requires technical preparation and reliable data inputs, and adaptive sampling must be managed carefully to prevent the workload from expanding too quickly. As with any complex sampling design, maintaining data quality depends on training, supervision, and detailed documentation.
Main trade-offs include:
- Dependence on data: Up-to-date remote-sensing data and basic GIS tools are essential for defining boundaries and grids.
- Training and supervision needs: Field teams must identify enterprises consistently and apply adaptive rules correctly.
- Managing expansion: Adaptive sampling can increase workload rapidly in high-density areas, requiring careful oversight and budget monitoring.
- Complex weighting: Because sampling probabilities vary across blocks, weighting and data cleaning require close attention to ensure representativeness.
Dive in
Explore Technical Resources
- Navigate through dozens of data visualizations and analyses from our MSE study Small Firms, Big Impact
- Read through a detailed Technical Guide
- Access data repositories with the option to replicate and use for your own research
Photo credit: Carlos Martinez Subirats