Harnessing the predictive power of community workshops, geospatial data, and Bayesian statistics to address census omissions in remote areas of Colombia

Lina Sanchez-Cespedes, Departamento Administrativo Nacional de Estadística (DANE)
Glenn Amaya-Cruz, Departamento Administrativo Nacional de Estadística
Mariana Ospina-Bohórquez, Departamento Administrativo Nacional de Estadística
Douglas Leasure, WorldPop, Department of Geography and Environmental Sciences, University of Southampton
Natalia Tejedor-Garavito, WorldPop, Department of Geography and Environmental Sciences, University of Southampton

Government planning and decision making depend on reliable counts of people and dwellings, which require either a full-coverage national population and housing census or estimates of census completeness by municipality. In Colombia, achieving full coverage is challenging in remote regions with low population densities, large territorial extents, and, in some areas, insecurity. Given the importance of estimating census completeness at the municipality level and the flexibility of hierarchical Bayesian models for estimating census omissions, this study compares three competing models. The models combined population data from the 2018 census, including data from pre-census community workshops, with independent variables derived from GIS and remote sensing data to estimate census omissions in remote regions with limited information. As training data, we used census results from nearby fully enumerated areas. We assessed covariate effects, out-of-sample prediction accuracy, and uncertainty intervals using 10-fold cross-validation. We found that simple changes in model design produced important differences in model fit, and that different approaches can yield diverging results. The model with the best prediction accuracy had a hierarchical structure of intermediate complexity and produced fairly robust prediction intervals. These population estimates and uncertainty intervals can support government planning in municipalities not fully accessible to census enumerators.

Keywords: Bayesian methods / estimation, Census data, Geographic Information Systems (GIS), Culture, ethnicity, race, religion and language

Presented in Session 199. Augmenting Census and Other Data to Better Understand Spatial Population Distributions