Abstract
Background. Childhood lead exposure causes irreversible neurodevelopmental harm, and no safe blood lead level in children has been identified 1. Measured childhood blood lead is collected unevenly across U.S. states, and many areas publish no neighborhood-level surveillance. Zartarian et al. (2024), at the U.S. Environmental Protection Agency, showed that housing age and poverty predict where childhood lead exposure concentrates. They validated that approach against roughly 4.2 million children's blood-lead tests 4, and we build on their method.
Methods. We computed a lead-exposure risk score for all 83,388 U.S. census tracts and 3,222 counties and county-equivalents from American Community Survey 2018-2022 five-year estimates. The score combines a housing-age index (table B25034, three construction-era shares weighted toward older stock) with poverty (table S1701) 2, 3. The housing-age score and poverty are z-scored, weighted 0.58 to 0.42, summed, and percentile-ranked. We compared the predicted score against measured childhood blood lead using Spearman's rank correlation (rho), a nonparametric measure of monotonic association from -1 to 1. Tract-scale confidence intervals come from a spatial block bootstrap, because neighboring tracts are spatially autocorrelated and an ordinary bootstrap understates the interval.
Results. At the census-tract scale, predicted risk correlated with measured elevated blood lead at rho = 0.54 (95% CI 0.42 to 0.64) in Michigan and 0.62 (0.54 to 0.68) in Ohio. Because those are the two states whose surveillance the source study itself used, we treat them as a consistency check rather than an independent test. Against surveillance the source study never used, predicted risk correlated at 0.70 (0.52 to 0.80) in metropolitan Milwaukee, Wisconsin, the one independent test at the census-tract scale. At the county scale in five further states, the correlation ranged from 0.77 (0.53 to 0.89) in New Jersey to 0.48 (0.20 to 0.69) in New York. Tract-scale intervals are from a spatial block bootstrap; the county-scale intervals are not corrected for spatial autocorrelation and, in two of the five states, are optimistic. We report the rank correlation, not the source study's Cohen's kappa of 0.49 to 0.63 (a different quantity, for binary hotspot agreement), and keep the two separate. Even a flexible model fit to the same public inputs explains only about half the tract-level variance in the two best-measured states, so housing age and poverty alone leave roughly half of the variation unexplained.
Conclusions. A free, current, neighborhood-level lead-risk map, reproducible from public data, ranks U.S. neighborhoods in moderate agreement with measured childhood blood lead, including in states that publish no surveillance of their own. It is a screening layer that shows where confirmatory testing should go first. It does not diagnose exposure in any individual child or home.
Keywords: childhood lead exposure; ecological risk model; American Community Survey; blood-lead surveillance; geospatial screening; environmental justice.
Introduction
The problem has no safe threshold
Childhood lead exposure is a settled and quantified public-health failure, not an open question. The U.S. Centers for Disease Control and Prevention reports that no safe level of lead in a child's blood has been identified and that even low levels can affect IQ, attention, and academic achievement 5. In 2021 the CDC lowered its blood lead reference value from 5.0 to 3.5 micrograms per deciliter, set at the 97.5th percentile of the blood-lead distribution among U.S. children ages 1 to 5, so that more children with comparatively higher levels would be identified 1. The agency notes that the reference value is not health-based and not a regulatory standard. A result below it does not mean a child is unharmed 1.
The aggregate damage is large and already incurred. McFarland, Hauer, and Reuben (2022), in the Proceedings of the National Academy of Sciences, estimate that childhood exposure to leaded gasoline cost the living U.S. population roughly 824 million cumulative IQ points, an average of about 2.6 points per person across the population. They further estimate that more than 170 million Americans alive in 2015, over half the population, had childhood blood-lead levels above the clinical concern threshold of their era, and that the most exposed birth cohort (1966 to 1970) lost roughly six points each 6. Lead added to gasoline beginning in 1923 was not banned for on-road use until 1996, so most adults born before that date carry a measurable childhood-exposure burden 6. The burden may also persist into old age. Brown and colleagues (2025), in Alzheimer's and Dementia, link historical atmospheric lead, mapped at its 1960 to 1974 peak, to higher odds of memory problems half a century later in two large representative samples 7. The exposure pathway has shifted, but it has not closed. Leaded paint in housing built before the 1978 residential ban remains a dominant present-day source of childhood exposure, and the CDC has estimated that about 500,000 U.S. children ages 1 to 5 had blood-lead levels at or above the older 5.0 microgram-per-deciliter reference value, a count that is higher at the current 3.5 reference value 5.
The surveillance gap
Knowing that lead harms children at any dose is not the same as knowing which children to protect. The United States has no complete, population-representative measurement of childhood blood lead. The CDC receives about 3 million blood-lead test results per year, a fraction of the roughly 22 million children under six 8. But the selection matters more than the count. Testing is deliberately concentrated on children judged to be at higher risk, so the reported surveillance data are, in the CDC's own words, "not a population-based estimate" and "not representative of a whole county or a whole state" 8. The agency points anyone seeking nationally representative prevalence to NHANES, a survey designed for estimation rather than for locating individual neighborhoods at risk 8. The result is a systematic blind spot. A child who is never tested, in a place where few children are tested, never appears in the data, and the places with the least testing are not randomly distributed.
The measured data that do exist are held by states, not the federal government, and access to them is uneven by jurisdiction. Some states publish blood-lead surveillance through open data interfaces, and the CDC Environmental Public Health Tracking Network carries county-level elevated-blood-lead measures for participating states 9. Others lock the same information in static reports or dashboards that require formal records requests to obtain at usable spatial resolution. In this patchwork, the resolution of the available evidence depends less on where the hazard is than on each state's data-publishing posture. A family, clinician, or local health department in a low-publishing state has no neighborhood-level signal at all, even though the underlying housing and poverty drivers of risk are present and measurable there.
Why a prediction-first national map is needed
When direct measurement is incomplete and unevenly accessible, the established response is to predict risk from variables that are measured everywhere. The U.S. Environmental Protection Agency took exactly this approach. Zartarian et al. (2024), in Environmental Science and Technology, screened 73,086 census tracts containing at least one child under six in the 50 states and modeled lead-exposure risk from publicly available indicators, mainly the age of housing and poverty, because surveillance and environmental-data gaps make disproportionately exposed communities hard to find by measurement alone 10. They evaluated the predicted hotspots against approximately 1.9 million Michigan blood-lead results (2006 to 2016) and approximately 2.3 million Ohio results (2005 to 2018), and found moderate-to-substantial agreement, Cohen's kappa 0.49 to 0.63 10. They also found that a reduced model built on three variables (the share of homes built before 1940, the share built before 1950, and poverty) predicted hotspots about as well as the full model, and they restate the premise that "there is no known level of lead exposure to be without risk" 10.
That EPA result establishes the method but does not deliver a usable public instrument, even though its inputs are open and national. The American Community Survey publishes year-structure-built down to the census-tract level in table B25034 and poverty status in subject table S1701 11, 12, so the same housing-plus-poverty risk signal can be computed for every tract in the country, including the states where no measured blood-lead data are publicly accessible. A prediction-first map turns an indicator that depends on each state's reporting choices into a uniform, neighborhood-level warning that exists everywhere the Census reaches. The map does not diagnose any individual child and does not replace a blood test. It is a screening layer that shows where measured testing and on-the-ground hazard confirmation should go first.
To be explicit about what is and is not new here, the prediction method is not ours. Zartarian et al. established that housing age and poverty predict childhood lead exposure and validated it in Michigan and Ohio. This paper adds three contributions their study did not provide. First, a free, current, tract-level risk surface computed for the entire country, the usable public instrument their result implied but did not deliver. Second, an independent test of the method against measured childhood blood lead in six states their study never used, which is the central evidence here, because agreement on data the method was not derived from is not guaranteed. Third, a reproducible pipeline with the code and derived data released, so any reader can rebuild and re-check the result. We treat the prior finding as a method to be re-tested, not as settled ground. The Michigan and Ohio comparisons reuse the source study's own surveillance and are therefore a consistency check, while Wisconsin and the five county states are genuine out-of-sample tests.
Prior Work and Intellectual Lineage
The approach we describe here, predicting childhood lead-exposure risk from publicly available housing and socioeconomic data, did not originate with us, nor with any single recent effort. It rests on three decades of epidemiology, survey science, geographic analysis, and applied machine learning carried out by public health departments, federal agencies, university research groups, and investigative journalists. We summarize that lineage below.
Early geographic and area-based risk models
Sargent and colleagues first put this on a quantitative footing. In a logistic analysis of 238,275 Massachusetts children, Sargent et al. (1995) found that the percentage of housing built before 1950, per-capita income, the percentage of residents who were Black, and a poverty index were each independently associated with community lead-poisoning rates. This is the direct methodological ancestor of nearly every housing-age-plus-poverty index that followed.
Sargent JD, Brown MJ, Freeman JL, Bailey A, Goodman D, Freeman DH Jr. "Childhood lead poisoning in Massachusetts communities: its association with sociodemographic and housing characteristics." American Journal of Public Health. 1995;85(4):528–534. DOI: 10.2105/ajph.85.4.528. PMID: 7702117.
A closely related census-tract analysis of 17,956 children across Providence County, Rhode Island found that the share of houses built before 1950 carried the largest adjusted association with the proportion of children with elevated blood lead, with vacant housing also a strong predictor.
Sargent JD, et al. "Census tract analysis of lead exposure in Rhode Island children." Environmental Research. 1997;74(2):159–168. PMID: 9339229.
As geographic information systems matured, several groups translated these statistical associations into operational screening and targeting tools. Reissman et al. (2001) demonstrated the use of GIS to link blood-lead data and housing age in support of health-department decisions about prevention activities. Roberts et al. (2003), working in Charleston County, South Carolina, geocoded tax-assessor housing records and found that children in pre-1950 housing were roughly 3.9 times as likely to have an elevated blood lead level as children in post-1977 housing.
Reissman DB, Staley F, Curtis GB, Kaufmann RB. "Use of geographic information system technology to aid Health Department decision making about childhood lead poisoning prevention activities." Environmental Health Perspectives. 2001;109(1):89–94. DOI: 10.1289/ehp.0110989. PMID: 11171530.
Roberts JR, Hulsey TC, Curtis GB, Reigart JR. "Using geographic information systems to assess risk for elevated blood lead levels in children." Public Health Reports. 2003;118(3):221–229. DOI: 10.1093/phr/118.3.221. PMID: 12766217.
The most spatially ambitious strand of this work came from the Children's Environmental Health Initiative led by Marie Lynn Miranda. Miranda, Dolinoy, and Overstreet (2002) combined blood-lead screening, county tax-assessor housing-age data, and census data into GIS models intended to direct prevention programs. Kim, Galeano, Hull, and Miranda (2008) then resolved risk to the individual tax parcel across eighteen North Carolina counties and offered a framework for replicating such models elsewhere, the finest-grained housing-age risk modeling in the literature before Flint.
Miranda ML, Dolinoy DC, Overstreet MA. "Mapping for prevention: GIS models for directing childhood lead poisoning prevention programs." Environmental Health Perspectives. 2002;110(9):947–953. DOI: 10.1289/ehp.02110947. PMID: 12204831.
Kim D, Galeano MA, Hull A, Miranda ML. "A framework for widespread replication of a highly spatially resolved childhood lead exposure risk model." Environmental Health Perspectives. 2008;116(12):1735–1739. DOI: 10.1289/ehp.11540. PMID: 19079729.
Akkus and Ozdenerol (2014) later reviewed this body of GIS-based work and treated the area-based risk-index tradition as a coherent subfield.
Akkus C, Ozdenerol E. "Exploring Childhood Lead Exposure through GIS: A Review of the Recent Literature." International Journal of Environmental Research and Public Health. 2014;11(6):6314–6334. DOI: 10.3390/ijerph110606314. PMID: 24945189.
The Lead Exposure Risk Index developed by the Washington State Department of Health (first version, 2016) is where this research literature meets present-day public tools. It scores each census tract by combining the age of the housing stock with the share of households at or below 125 percent of the federal poverty level. This index has become the most widely replicated operational formulation of the housing-age-plus-poverty approach.
Washington State Department of Health, Childhood Lead Poisoning Prevention Program. Lead Exposure Risk Index. 2016. Available at: https://doh.wa.gov/data-and-statistical-reports/washington-tracking-network-wtn/lead-risk-and-exposure
National lead-paint surveys and the housing-age evidence base
These models weight housing age heavily because national surveys establish how the probability of lead-based paint varies by construction era. The original U.S. Environmental Protection Agency National Survey (1995) is the source of the canonical by-era probabilities, on the order of 87 percent of units built before 1940, roughly 69 percent for 1940–1959, and roughly 24 percent for 1960–1977.
U.S. Environmental Protection Agency. Report on the National Survey of Lead-Based Paint in Housing (Base Report). EPA 747-R-95-003. 1995. Available at: https://www.epa.gov/sites/default/files/documents/r95-003.pdf
Jacobs et al. (2002), reporting on the HUD National Survey of Lead and Allergens in Housing, estimated from a nationally representative sample that 38 million U.S. homes contained lead-based paint and 24 million had significant lead-based-paint hazards, with markedly higher prevalence in the Northeast and Midwest.
Jacobs DE, Clickner RP, Zhou JY, Viet SM, Marker DA, Rogers JW, Zeldin DC, Broene P, Friedman W. "The prevalence of lead-based paint hazards in U.S. housing." Environmental Health Perspectives. 2002;110(10):A599–A606. DOI: 10.1289/ehp.021100599. PMID: 12361941.
The Department of Housing and Urban Development's American Healthy Homes Surveys updated these national estimates. The second survey (AHHS II, 2021), based on fieldwork in 2018–2019, estimated that roughly 34.6 million homes (about 29.4 percent of housing units) contain lead-based paint and that about 21.9 million homes have dust-lead hazards under the 2019 standard.
U.S. Department of Housing and Urban Development, Office of Lead Hazard Control and Healthy Homes. American Healthy Homes Survey II: Lead Findings. 2021. Summary available at: https://www.huduser.gov/portal/pdredge/pdr-edge-trending-030822.html
Epidemiology of low-level lead exposure
The urgency behind this prediction work comes from a separate body of epidemiology showing that lead harms children at progressively lower exposures. Needleman et al. (1979) established subclinical harm by relating dentine lead levels in deciduous teeth to deficits in cognitive and classroom performance.
Needleman HL, Gunnoe C, Leviton A, Reed R, Peresie H, Maher C, Barrett P. "Deficits in psychologic and classroom performance of children with elevated dentine lead levels." New England Journal of Medicine. 1979;300(13):689–695. DOI: 10.1056/NEJM197903293001301. PMID: 763299.
Subsequent work pressed the harm below the long-standing 10 µg/dL action level. Lanphear et al. (2000), using NHANES III data on 4,853 children, observed cognitive deficits at blood lead concentrations under 10 µg/dL, and Canfield et al. (2003) found, in a cohort of 172 children, an even steeper inverse relationship between blood lead and IQ within that low range.
Lanphear BP, Dietrich K, Auinger P, Cox C. "Cognitive deficits associated with blood lead concentrations <10 µg/dL in US children and adolescents." Public Health Reports. 2000;115(6):521–529. DOI: 10.1093/phr/115.6.521. PMID: 11354334.
Canfield RL, Henderson CR Jr, Cory-Slechta DA, Cox C, Jusko TA, Lanphear BP. "Intellectual impairment in children with blood lead concentrations below 10 µg per deciliter." New England Journal of Medicine. 2003;348(16):1517–1526. DOI: 10.1056/NEJMoa022848. PMID: 12700371.
The international pooled analysis by Lanphear et al. (2005), combining seven prospective cohorts totaling 1,333 children, found measurable intellectual deficits even among children whose blood lead never exceeded 7.5 µg/dL and reported no evidence of a threshold below which lead is safe. If harm has no floor, the case for finding at-risk children before exposure rather than after is straightforward.
Lanphear BP, Hornung R, Khoury J, Yolton K, Baghurst P, Bellinger DC, Canfield RL, Dietrich KN, Bornschein R, Greene T, Rothenberg SJ, Needleman HL, Schnaas L, Wasserman G, Graziano J, Roberts R. "Low-level environmental lead exposure and children's intellectual function: an international pooled analysis." Environmental Health Perspectives. 2005;113(7):894–899. DOI: 10.1289/ehp.7688. PMID: 16002379.
Public mapping efforts
Journalists and civic-data teams brought area-based risk estimation to a broad public audience, often by directly adopting the Washington State methodology. In 2016, Frostenson and Kliff at Vox published a national neighborhood map assigning each census tract a risk decile from housing age and poverty, with open-source code, explicitly replicating the Washington State Department of Health approach.
Frostenson S, Kliff S. "The risk of lead poisoning isn't just in Flint. So we mapped the risk in every neighborhood in America." Vox. April 2016. Available at: https://www.vox.com/a/lead-exposure-risk-map. Code: https://github.com/voxmedia/data-projects/tree/master/vox-lead-exposure-risk
In the same year, Pell and Schneyer at Reuters compiled state blood-lead surveillance into an interactive map and identified thousands of census tracts and ZIP areas with lead-poisoning prevalence at least double that observed in Flint.
Pell MB, Schneyer J. "Off the Charts: The thousands of U.S. locales where lead poisoning is worse than in Flint." Reuters Investigates. December 2016. Available at: https://www.reuters.com/investigates/special-report/usa-lead-testing/
The same methodological lineage was subsequently institutionalized in two widely used data platforms. The City Health Dashboard, produced by NYU Langone Health, publishes a Lead Exposure Risk Index that bins housing units by construction era, weights them by lead likelihood, and combines them with the share of households at or below 125 percent of the federal poverty level, stating that its method was based on the Washington State Department of Health and Vox Media work. PolicyMap hosts a closely related "Risk of lead exposure" layer under an attribution to the same sources.
City Health Dashboard, NYU Langone Health, Department of Population Health. Lead Exposure Risk Index. Available at: https://www.cityhealthdashboard.com/metric/lead-exposure-risk-index
PolicyMap. Risk of Lead Exposure (attributed to Washington State Department of Health, Vox Media, and PolicyMap). Available at: https://www.policymap.com/data/sources/washington-state-department-of-health-vox-media-policymap
House-level and address-level machine learning
A final strand moved from area-based indices to predictive models resolved to the individual child, address, or parcel. Potash et al. (2015), a collaboration between the University of Chicago's Data Science for Social Good group and the Chicago Department of Public Health, built per-child and per-address risk models from historical blood-lead results and building characteristics, the address-level precedent later work built on.
Potash E, Brew J, Loewi A, Majumdar S, Reece A, Walsh J, Rozier E, Jorgenson E, Mansour R, Ghani R. "Predictive Modeling for Public Health: Preventing Childhood Lead Poisoning." In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '15). 2015:2039–2047. DOI: 10.1145/2783258.2788629.
Abernethy, Chojnacki, Farahi, Schwartz, and Webb (2018) extended parcel-level prediction to lead service lines in Flint, Michigan, using property age, value, location, and city records, work that became the basis of the BlueConduit effort.
Abernethy J, Chojnacki A, Farahi A, Schwartz EM, Webb J. "ActiveRemediation: The Search for Lead Pipes in Flint, Michigan." In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '18). 2018:5–14. DOI: 10.1145/3219819.3219896.
Relation to the present work and to recent federal synthesis
The federal work of Zartarian and colleagues draws on all of these traditions and is the immediate basis for our paper. Their 2022 analysis catalogs and compares the prior housing-age and sociodemographic indices, and their 2024 hotspots analysis applies a random-forest model validated against several such indices.
Zartarian V, Poulakos A, Garrison VH, Spalt N, Tornero-Velez R, Xue J, Egan K, Courtney J. "Lead Data Mapping to Prioritize US Locations for Whole-of-Government Exposure Prevention Efforts." American Journal of Public Health. 2022;112(S7):S658–S669. DOI: 10.2105/AJPH.2022.307051.
Zartarian V, et al. "A U.S. Lead Exposure Hotspots Analysis." Environmental Science & Technology. 2024;58(7):3311–3321. DOI: 10.1021/acs.est.3c07881.
This body of prior scholarship spans the founding area-based regressions, the national survey infrastructure, the low-level-harm epidemiology, the public maps, and the house-level machine-learning models. We claim no new paradigm; this work is one further incremental step along a path many others laid down.
Methods
Overview and design rationale
We constructed a national, tract-level index of predicted childhood lead-exposure risk from two publicly documented determinants: the age of the housing stock and the prevalence of poverty. The choice of inputs is not novel. It follows the housing-age-plus-poverty approach the Washington State Department of Health uses for its Lead Exposure Risk Index, which combines ACS 5-year housing-age and poverty measures into a single community-level score 13. The same two determinants anchor the indices the U.S. Environmental Protection Agency screened in its national hotspots analysis, where a reduced three-variable random-forest model (percent of homes built before 1940, percent built before 1950, and a family income-to-poverty measure) reproduced the hotspot pattern of the full five-variable model 14, 15. Lead-based residential paint was banned for consumer use in 1978, so housing age is a direct proxy for leaded paint and the dust it generates; poverty proxies both deferred paint maintenance and reduced remediation capacity 21. We chose these inputs because both are published for every census tract on the same recent vintage. That uniform national coverage lets a single method reach states that publish no measured blood-lead data.
Our contribution is not the index form but its national reconstruction on the most recent 5-year American Community Survey (ACS) vintage, and its tract-by-tract validation against measured childhood blood lead, reported separately. The EPA hotspots analysis was built on 2010 Census geography and ACS 2013-2017 5-year inputs 15. We rebuilt the index on ACS 2018-2022 5-year estimates 16, which moves the housing and poverty measures forward by one full, non-overlapping five-year vintage and re-bases the geography from 2010-vintage to 2020-vintage census tracts.
Data sources
All inputs are American Community Survey 2018-2022 5-year estimates, the vintage released December 7, 2023 and current at the time of analysis 16. The 5-year file is the only ACS product published down to the census-tract level for the full universe of tracts. It is the appropriate product for small-area estimates because the 1-year file is published only for areas with populations of 65,000 or more and so does not tabulate census tracts 16. We pulled four tables.
| Purpose | ACS table | Type | Key variables used |
|---|---|---|---|
| Housing age | B25034, Year Structure Built | Detail | B25034_001E (total units); B25034_011E (pre-1940); B25034_010E, B25034_009E (1940 to 1959); B25034_008E, B25034_007E (1960 to 1979) 17 |
| Poverty | S1701, Poverty Status in the Past 12 Months | Subject | S1701_C01_001E (population for whom poverty status is determined), S1701_C03_001E (percent below poverty level) 18 |
| Total population | B01003, Total Population | Detail | B01003_001E (total population) 19 |
| Children under 18 | B09001, Population Under 18 Years by Age | Detail | B09001_001E (population under 18 years) 20 |
B25034 reports occupied and vacant housing units by the decade built. The method uses its pre-1940 category (B25034_011E) together with the 1940-to-1959 and 1960-to-1979 decade categories (B25034_007E through B25034_010E), which give the three construction-era shares the housing-age score requires 17. S1701 is the ACS subject table for poverty status. It publishes the percent-below-poverty estimate directly as S1701_C03_001E over the universe "population for whom poverty status is determined" (S1701_C01_001E), so no separate denominator is needed 18. B01003 supplies total population and B09001 supplies the count of residents under 18, used for population-weighting and for the child-burden overlay rather than for the risk score itself 19, 20.
Census API pipeline
Data were retrieved programmatically from the Census Data API. Detailed tables (B25034, B01003, B09001) were requested from the ACS 2022 5-year detailed-tables endpoint and the subject table (S1701) from the parallel subject endpoint 16:
https://api.census.gov/data/2022/acs/acs5?get=NAME,group(B25034)&for=tract:*&in=state:{FIPS}&key={KEY}
https://api.census.gov/data/2022/acs/acs5/subject?get=NAME,group(S1701)&for=tract:*&in=state:{FIPS}&key={KEY}
The API caps tract-level wildcard queries at one state per call. The pipeline therefore iterated in=state:{FIPS} across the 50 states and the District of Columbia for tracts (plus Puerto Rico for counties), then concatenated the responses. County-level inputs were retrieved with for=county:*. Estimates were joined to 11-digit tract GEOIDs (2-digit state + 3-digit county + 6-digit tract) and to 5-digit county GEOIDs. We dropped records before scoring if they had a missing or zero housing-unit denominator (B25034_001E) or carried the Census sentinel values for suppressed or unestimable cells. This is what reduces the raw tract universe to the scored set described below. Margins of error are published for every estimate (the _M-suffixed variables) and were retained but not propagated into the point score 17.
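The retrieval and filtering steps above can be sketched in a few functions. This is a minimal illustration, not the released pipeline: the function names (`build_url`, `parse_rows`, `is_scorable`) are hypothetical, and the set of sentinel codes is an assumption that should be checked against the Census Bureau's documentation of estimate annotation values.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.census.gov/data/2022/acs/acs5"

# Assumption: large negative codes returned in place of suppressed or
# unestimable estimates. Confirm the exact list against Census documentation.
SENTINELS = {-666666666.0, -999999999.0, -888888888.0,
             -222222222.0, -333333333.0, -555555555.0}


def build_url(group, state_fips, api_key, subject=False, geography="tract"):
    """Build one per-state request for an ACS table group."""
    endpoint = BASE_URL + ("/subject" if subject else "")
    params = {
        "get": f"NAME,group({group})",
        "for": f"{geography}:*",
        "in": f"state:{state_fips}",
        "key": api_key,
    }
    # Leave the API's own punctuation unescaped so the URL matches the
    # documented request form.
    return endpoint + "?" + urlencode(params, safe=":*(),")


def parse_rows(payload):
    """Convert the JSON array-of-arrays (header row first) into dicts with a GEOID."""
    header, *rows = json.loads(payload)
    records = []
    for row in rows:
        rec = dict(zip(header, row))
        # 2-digit state + 3-digit county, plus the 6-digit tract when present.
        rec["GEOID"] = rec["state"] + rec["county"] + rec.get("tract", "")
        records.append(rec)
    return records


def is_scorable(rec, fields, denominator="B25034_001E"):
    """True when every required field is present, non-sentinel, and the denominator is positive."""
    for name in set(fields) | {denominator}:
        raw = rec.get(name)
        if raw is None or float(raw) in SENTINELS:
            return False
    return float(rec[denominator]) > 0
```

In use, one would loop `build_url` over each state FIPS code, fetch each URL (for example with `urllib.request.urlopen`), pass the response text to `parse_rows`, and keep only records for which `is_scorable` returns true before concatenating.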
Index construction
The score combines housing age and poverty, both from the American Community Survey 2018-2022 five-year estimates. Housing age enters as three construction-era shares, weighted so that older stock counts for more, because the probability that a home contains lead-based paint rises sharply with age. We define the inputs, give the exact Census fields, then state the model. Each quantity is computed once per census tract and once per county.
- s40: the share of housing units built in 1939 or earlier, the oldest and highest-lead stock.
- s4059: the share built between 1940 and 1959.
- s6079: the share built between 1960 and 1979, the last era before the 1978 residential lead-paint ban.
- poverty: the share of people living below the federal poverty line.
| Input | ACS source table | Computation from published Census fields |
|---|---|---|
| s40 | B25034 (Year Structure Built) | B25034_011E / B25034_001E |
| s4059 | B25034 (Year Structure Built) | (B25034_009E + B25034_010E) / B25034_001E |
| s6079 | B25034 (Year Structure Built) | (B25034_007E + B25034_008E) / B25034_001E |
| poverty | S1701 (Poverty Status) | S1701_C03_001E / 100 |
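
The computations in the table translate directly into code. The sketch below assumes each unit is a dictionary keyed by the published Census field names with numeric (or numeric-string) values; the function name `input_shares` is hypothetical.

```python
def input_shares(rec):
    """Compute the four model inputs for one tract or county record."""
    total = float(rec["B25034_001E"])
    if total <= 0:
        # Records with a missing or zero housing denominator are dropped
        # before scoring, so reaching this point indicates a filtering gap.
        raise ValueError("housing-unit denominator must be positive")
    return {
        # Built 1939 or earlier.
        "s40": float(rec["B25034_011E"]) / total,
        # Built 1940 to 1959 (two decade categories).
        "s4059": (float(rec["B25034_009E"]) + float(rec["B25034_010E"])) / total,
        # Built 1960 to 1979 (two decade categories).
        "s6079": (float(rec["B25034_007E"]) + float(rec["B25034_008E"])) / total,
        # S1701_C03_001E is already a percent, so only rescale to a share.
        "poverty": float(rec["S1701_C03_001E"]) / 100.0,
    }
```

All four outputs are shares between 0 and 1, which keeps the inputs on a comparable footing before the housing-age score is formed.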
The three era shares are combined into a single housing-age score on a 0-to-100 scale that weights older construction more heavily:

$$A = 100\,\left(0.619\,s_{40} + 0.309\,s_{4059} + 0.075\,s_{6079}\right)$$

The band weights (0.619, 0.309, 0.075) decline with construction age, a monotone weighting toward older stock set a priori from the established rise in lead-paint prevalence in older housing 21, not fitted to blood lead. The weight-sensitivity analysis below shows the validation is insensitive to the exact split as long as older stock is weighted more heavily. The housing-age score and poverty are each standardized to a z-score across the scored national universe, so the two share a common scale before they are weighted:

$$z(x) = \frac{x - \mu_x}{\sigma_x}$$

where $\mu_x$ and $\sigma_x$ are the mean and standard deviation of $x$ across the scored units. The standardized housing-age score and standardized poverty are combined into a composite risk score, which is converted to a within-nation percentile rank:

$$R = 0.58\,z(A) + 0.42\,z(\text{poverty}), \qquad P = 100 \cdot \frac{\operatorname{rank}(R)}{N}$$
Here A is the housing-age score, z(x) is the standardization above, R is the composite risk score, rank(R) is the ascending rank of R across the N scored units, and P is the published 0-to-100 score. A unit at the 90th percentile carries higher predicted risk than 90 percent of scored units nationally.
The 0.58/0.42 composite split weights housing age above poverty, reflecting that housing age is the proximate source of the lead while poverty modifies exposure and remediation 21. Neither the band weights nor the composite split is fitted to the blood-lead data. The Washington State Department of Health index weights housing age and poverty and bins the result to deciles; we keep the full 0-to-100 resolution 13. To confirm the result does not depend on these choices, we re-derived the index across a grid of weightings and recomputed every validation correlation, reported under Weight sensitivity below. Because the score is ordinal by construction, the validation uses rank correlation.
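The scoring steps described in this section can be sketched end to end. This is an illustration under stated assumptions, not the released code: it assumes the housing-age score is the weighted sum of the three shares multiplied by 100, that standardization uses the population standard deviation, that ties receive their average rank, and that the percentile is 100 times the ascending rank divided by N. All function names are hypothetical.

```python
from statistics import mean, pstdev

BAND_WEIGHTS = (0.619, 0.309, 0.075)   # pre-1940, 1940-1959, 1960-1979
W_HOUSING, W_POVERTY = 0.58, 0.42      # composite split


def housing_age_score(s40, s4059, s6079):
    """Weighted housing-age score; the factor of 100 is an assumed scaling."""
    w40, w4059, w6079 = BAND_WEIGHTS
    return 100.0 * (w40 * s40 + w4059 * s4059 + w6079 * s6079)


def zscores(values):
    """Standardize across all scored units (population SD assumed)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant input")
    return [(v - mu) / sd for v in values]


def percentile_ranks(scores):
    """Ascending rank scaled to 0-100; tied scores share their average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return [100.0 * r / n for r in ranks]


def risk_percentiles(units):
    """units: list of (s40, s4059, s6079, poverty) tuples, one per scored unit."""
    housing = [housing_age_score(u[0], u[1], u[2]) for u in units]
    poverty = [u[3] for u in units]
    composite = [W_HOUSING * a + W_POVERTY * b
                 for a, b in zip(zscores(housing), zscores(poverty))]
    return percentile_ranks(composite)
```

Because the composite is only used through its rank, any positive rescaling of the housing-age score leaves the published percentile unchanged. A weight-sensitivity check of the kind described above would amount to re-running `risk_percentiles` with different values of `BAND_WEIGHTS` and the composite split and recomputing each validation correlation.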
After dropping records with a missing or zero housing denominator and Census-suppressed cells (described above), the method scored 3,222 counties and 83,388 census tracts. This is the subset of the full 2018-2022 ACS tract universe for which both a valid housing-age distribution and a valid poverty estimate exist. For comparison, the EPA hotspots analysis screened 73,086 tracts on the older ACS 2013-2017 geography, restricted to tracts in the 50 states containing at least one child under six years old 15. Our larger count reflects the newer 2020-vintage tract geography and the inclusion of tracts regardless of child presence. The 83,388 scored tracts cover the 50 states and the District of Columbia. The 3,222 counties and county-equivalents also include the District of Columbia and Puerto Rico's municipios, which is why Puerto Rico appears in the county outputs but not the tract outputs. The scored national surface is shown in Figure 1.
We did not impose a child-presence filter on the score itself; instead, the B09001 under-18 count and B01003 total population are carried alongside each scored tract as an exposed-population overlay 19, 20.
Scope and interpretation
The output is a prediction of relative risk from housing age and poverty. It does not measure lead in any specific home, and it does not diagnose any child. It is a screening surface meant to direct confirmatory testing, consistent with EPA's own statement that "there is no known level of lead exposure to be without risk" and with the use of these indices as targeting tools rather than exposure measurements 14, 15, 21. The entire pipeline is reproducible from the four ACS tables and the public Census API with no restricted or licensed inputs.
Validation
A risk map is only worth deploying if it agrees with where children actually carry elevated blood lead. We tested ours in two stages. First we summarize the federal anchor: the EPA hotspots analysis, which validated housing-and-poverty indices against roughly 4.2 million measured childhood blood-lead tests in two states. Then we test our published map against measured childhood blood lead in eight states, in a design with three distinct tiers. Michigan and Ohio, at the census-tract scale, reuse the source study's own surveillance, so they are a consistency check on its ground truth, not an independent test. Wisconsin (metropolitan Milwaukee), also at the census-tract scale, is the one independent test at tract resolution, on surveillance the source study never used. Five more states (New Jersey, Illinois, Missouri, Iowa, and New York) are independent tests at the coarser county scale. We report the tract and county scales separately and never pool them into a single figure. The two stages also use different statistics: the federal work reports Cohen's kappa for binary hotspot agreement, and we report the Spearman rank correlation between continuous predicted risk and continuous measured exposure. We keep the two separate and do not convert one into the other.
The federal anchor: Zartarian et al. (2024)
The scientific foundation is Zartarian et al., "A U.S. Lead Exposure Hotspots Analysis," published in Environmental Science & Technology in 2024 22. EPA's Office of Research and Development screened 73,086 census tracts containing at least one child under six across all 50 states, scoring each tract on lead-exposure indices built from housing age and sociodemographic data drawn from the American Community Survey 23.
The validation against measured childhood blood lead, not the prediction by itself, is what gives that paper its weight. EPA held the predicted hotspots against measured childhood blood-lead surveillance in two states with unusually complete records:
- Michigan: approximately 1.9 million blood-lead results from children under six, covering 2006 to 2016 23.
- Ohio: approximately 2.3 million blood-lead results from children under six, covering 2005 to 2018 23.
Across those roughly 4.2 million measured tests, the predicted hotspots agreed moderately to substantially with where children actually carried elevated blood lead, at Cohen's kappa of 0.49 to 0.63 22. EPA read kappa on a fixed scale: below 0.4 low, 0.4 to 0.6 moderate, 0.6 to 0.8 substantial, above 0.8 near-perfect 23. A companion tract-scale study of Ohio by the same EPA group reports a comparable band, kappa 0.54 to 0.64 comparing observed blood-lead hotspots against the predictive indices across the 3.5, 5, and 10 µg/dL reference values 24.
Two further results matter for anyone building on it. First, a reduced three-variable model, using only percent of homes built before 1940, percent built before 1950, and the percent of families with an income-to-poverty ratio above 2, performed comparably to the full five-variable model, holding kappa at 0.51 to 0.63 across the Michigan and Ohio datasets 23. Housing age plus poverty account for most of the predictive power, which is what makes a transparent, reproducible national map possible from public data alone. Second, the authors anchor the entire effort in the established toxicology, stating plainly that "there is no known level of lead exposure to be without risk" 23. A screening map does not need to find a safe threshold, because none exists; it only has to rank where exposure concentrates.
Our method, in brief
Our national map combines housing age and poverty, the two determinants the source study found carry most of the predictive signal. Housing age comes from ACS table B25034 (Year Structure Built) as three construction-era shares weighted toward older stock, because older homes carry the highest lead-paint burden 25. Poverty comes from ACS table S1701 (Poverty Status in the Past 12 Months) 18. We standardize the housing-age score and poverty, weight them 0.58 to 0.42, and sum. Tracts are then percentile-ranked within the national pool of 83,388 tracts; for the county-scale validation the same index is computed on county totals and ranked within the national pool of 3,222 counties. This follows the Washington State Department of Health housing-age-plus-poverty lineage rather than introducing a new index. The full equations are in Methods.
Validation at the census-tract scale
For each state we joined predicted tract-level risk to measured childhood blood lead at the tract level, then computed the Spearman rank correlation 29. Neighboring tracts are spatially autocorrelated, so an ordinary bootstrap that resamples tracts one at a time treats correlated observations as independent and returns an interval that is too narrow. We instead use a spatial block bootstrap: tracts are partitioned into compact geographic blocks by a quantile grid on their centroids, and whole blocks are resampled with replacement, 3,000 times. We also report Moran's I of the regression residual (observed on predicted) under queen contiguity, with a permutation p-value, as a direct measure of the leftover spatial structure the block bootstrap is correcting for.
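A minimal sketch of a spatial block bootstrap for a Spearman correlation follows. It assumes blocks are formed by cutting longitude and latitude independently at their quantiles, and that the interval is the simple percentile interval of the resampled correlations. Neither detail is specified above, so both are illustrative choices, and all function names are hypothetical.

```python
import math
import random


def average_ranks(values):
    """Ascending 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the ranks; NaN if either side is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def quantile_grid_blocks(lon, lat, k):
    """Group unit indices into cells of a k-by-k quantile grid on centroids."""
    n = len(lon)

    def bins(v):
        return [min(int((r - 1) * k / n), k - 1) for r in average_ranks(v)]

    blocks = {}
    for i, cell in enumerate(zip(bins(lon), bins(lat))):
        blocks.setdefault(cell, []).append(i)
    return list(blocks.values())


def block_bootstrap_ci(pred, meas, blocks, n_boot=3000, seed=0):
    """Resample whole blocks with replacement; return a 95% percentile interval."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        idx = [i for block in rng.choices(blocks, k=len(blocks)) for i in block]
        rho = spearman([pred[i] for i in idx], [meas[i] for i in idx])
        if not math.isnan(rho):
            draws.append(rho)
    draws.sort()
    lo = draws[int(0.025 * len(draws))]
    hi = draws[min(int(0.975 * len(draws)), len(draws) - 1)]
    return lo, hi
```

Resampling whole blocks keeps neighboring tracts together in each replicate, so the spread of the resampled correlations reflects the clustering instead of treating every tract as an independent draw.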
What counts as independent differs by state. For Michigan and Ohio the measured values come from the same surveillance the source study used, so these two are a reproduction of the source result on its own ground truth, a consistency check rather than an independent test. Wisconsin is an independent test: its measured tract data come from the state's open ArcGIS service, which we pulled live with no records request.
| State | Measured source | Tracts (n) | Spearman ρ (95% CI) |
|---|---|---|---|
| Michigan | source-study surveillance, 2006–2016 (consistency check) | 2,156 | 0.54 (0.42 to 0.64) |
| Ohio | source-study surveillance, 2005–2018 (consistency check) | 2,534 | 0.62 (0.54 to 0.68) |
| Wisconsin (metro Milwaukee) | WI DHS open ArcGIS, children under 6 | 208 | 0.70 (0.52 to 0.80) |
All three correlations are positive, with spatial block bootstrap intervals well clear of zero. The residual carries strong positive spatial autocorrelation in every case (Moran's I 0.49 in Michigan, 0.53 in Ohio, 0.57 in Wisconsin, all permutation p ≈ 0.001), which is exactly why the ordinary tract-level bootstrap intervals (0.51 to 0.57, 0.59 to 0.64, and 0.62 to 0.76) are too narrow and the wider block intervals above are the honest ones. A county-block bootstrap, resampling whole counties rather than grid cells, gives intervals consistent with these for Michigan and Ohio (0.39 to 0.64 and 0.51 to 0.70); metropolitan Milwaukee falls within a single county, so the grid blocks are used there. The Wisconsin data came from the state DHS open ArcGIS service, which publishes children under six tested, children testing positive, and percent poisoned by tract; a tract is suppressed only when fewer than five children are poisoned, and stays visible if 100 or more were tested 26.
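The residual Moran's I and its permutation p-value can be sketched as below. The sketch assumes binary, non-row-standardized contiguity weights supplied as an adjacency dictionary, a one-sided test for positive autocorrelation, and 999 permutations. None of these settings is stated above, so they are assumptions, and building the queen-contiguity adjacency from tract polygons is left out.

```python
import random


def morans_i(values, neighbors):
    """Moran's I with binary weights.

    neighbors maps each unit index to the list of indices it touches.
    """
    n = len(values)
    m = sum(values) / n
    dev = [v - m for v in values]
    w_total = sum(len(nb) for nb in neighbors.values())
    cross = sum(dev[i] * dev[j] for i, nb in neighbors.items() for j in nb)
    return (n / w_total) * cross / sum(d * d for d in dev)


def permutation_test(values, neighbors, n_perm=999, seed=0):
    """Shuffle values over the fixed map; count shuffles at least as clustered."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbors) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement as one of the permutations
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: ten units in a row with steadily rising residuals
values = [float(i) for i in range(10)]
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
i_obs, p_value = permutation_test(values, neighbors)
```

Under this convention the smallest attainable p-value with 999 permutations is 1/1000 = 0.001, reached when no shuffled map is as clustered as the observed one.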
A rank correlation of 0.54 to 0.70 means the map's ordering and the measured ordering move together moderately to strongly, but much of the variation is unexplained: those correlations share only about a third to a half of the rank variance with measured exposure, leaving the rest to factors outside housing age and poverty. A gradient-boosted model fit to the same public inputs, cross-validated within Michigan and Ohio, reaches an R-squared of about 0.48 to 0.55 (see Residual analysis); that is an in-distribution upper bound for this feature set in those two states, not a national bound and not the fixed-weight index's own fit. This is a screening signal, not a diagnosis, and it says nothing about any individual child's blood lead.
Validation at the county scale: five more states
We also ran a coarser but broader test at the county level. Here the predictor is a county-resolution version of the same index, computed from county housing-age and poverty (the identical three-band housing-age score and 0.58/0.42 composite) and percentile-ranked within the national pool of 3,222 counties, not an average of tract scores. We joined it to measured childhood blood lead from the CDC Environmental Public Health Tracking Network 2022 series (children under six, at or above the 3.5 µg/dL reference value). All five use surveillance the source study did not, so they are independent tests, and all five share the same 3.5 µg/dL threshold and 2022 vintage.
| State | Counties (n) | Spearman ρ (95% CI) |
|---|---|---|
| New Jersey | 21 | 0.77 (0.53 to 0.89) |
| Illinois | 56 | 0.68 (0.49 to 0.81) |
| Missouri | 40 | 0.56 (0.30 to 0.74) |
| Iowa | 56 | 0.55 (0.33 to 0.72) |
| New York | 61 | 0.48 (0.20 to 0.69) |
These intervals come from resampling counties, which are themselves the spatial unit, so unlike the tract intervals they are not separately corrected for spatial autocorrelation. The residual carries little county-to-county autocorrelation in Illinois, Iowa, and New Jersey (Moran's I near zero, not significant), but it is substantial in Missouri and New York (Moran's I 0.59 and 0.48, permutation p ≈ 0.001), so those two intervals should be read as optimistic. County correlations should also not be read on the same scale as the tract ones: a coarser areal unit averages over within-county variation and tends to raise the correlation, an instance of the modifiable areal unit problem. The small county counts give wide intervals: New Jersey has the fewest counties (n = 21), and New York's interval is the widest (0.20 to 0.69). New York also has the weakest correlation, and its county series excludes New York City, omitting much of the state's oldest housing.
We apply one inclusion rule uniformly: a series enters the headline only if it uses the current 3.5 µg/dL reference value and its correlation is distinguishable from zero, meaning a 95 percent confidence interval that excludes zero. Every tract and county series above passes. Massachusetts fails on both counts. It uses the older 5 µg/dL threshold on the 2020 vintage, and on only 13 counties gave ρ = 0.42 with a 95 percent confidence interval from -0.22 to 0.84 that includes zero. We report it for completeness but exclude it from the headline.
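The inclusion rule can be written as a two-condition check. The sketch below is a literal reading of the rule as stated, with a hypothetical function name. It treats "excludes zero" symmetrically, although every series reported here has a positive point estimate.

```python
CURRENT_REFERENCE_UG_DL = 3.5  # blood-lead reference value named in the rule


def enters_headline(threshold_ug_dl, ci_low, ci_high):
    """True only if the series uses the 3.5 ug/dL threshold and its 95% CI excludes zero."""
    uses_current_reference = threshold_ug_dl == CURRENT_REFERENCE_UG_DL
    excludes_zero = ci_low > 0 or ci_high < 0
    return uses_current_reference and excludes_zero


# Values taken from the county table and the Massachusetts description above
new_jersey = enters_headline(3.5, 0.53, 0.89)      # passes both conditions
massachusetts = enters_headline(5.0, -0.22, 0.84)  # fails both conditions
```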
Across the independent tests, Wisconsin at the tract scale and the five county states, predicted risk tracks measured childhood blood lead at a moderate level (Figure 2), on surveillance systems the source study never used. We do not compare these rank correlations against the source study's kappa, and we do not average the tract and county correlations into one figure, because in neither pair do the two quantities measure the same thing.
Sensitivity to the weighting
The composite split (0.58 housing, 0.42 poverty) and the housing-band weights (0.619, 0.309, 0.075) are set a priori, not fitted, so a fair question is whether the result depends on them. We re-derived the index from the raw national ACS distribution across nine weighting schemes and recomputed each tract-scale correlation. The published weights reproduce the canonical values exactly (Michigan 0.54, Ohio 0.62, Wisconsin 0.70). Moving the composite split anywhere from 0.42/0.58 to 0.70/0.30 changes each correlation by less than 0.03. The housing-band weighting matters more, and in the expected direction: weighting older stock more heavily, as the published weights and a pre-1940-only weighting both do, outperforms weighting the three construction eras equally, which drops the correlation to 0.45 in Michigan, 0.53 in Ohio, and 0.59 in Wisconsin. Dropping either housing age or poverty entirely is worse than keeping both. The ranking is driven by the prior decision to weight older housing and to include poverty, both fixed in advance from the lead-paint literature, not by the precise split.
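A sketch of the sensitivity loop follows, assuming the housing-age score is a weighted sum of the three construction-era shares. The band weights and the published split are taken from the text. The weighted-sum form, the particular splits tried, and all function names are assumptions for illustration.

```python
import math
from statistics import mean, pstdev

BAND_WEIGHTS = (0.619, 0.309, 0.075)  # pre-1940, 1940-1959, 1960-1979


def zscore(values):
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def average_ranks(values):
    """Ascending 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def composite(band_shares, poverty, w_housing, band_weights=BAND_WEIGHTS):
    """Weighted housing-age score, standardized and combined with poverty."""
    age = [sum(w * s for w, s in zip(band_weights, shares)) for shares in band_shares]
    za, zp = zscore(age), zscore(poverty)
    return [w_housing * a + (1 - w_housing) * p for a, p in zip(za, zp)]


def split_sensitivity(band_shares, poverty, measured, splits=(0.42, 0.50, 0.58, 0.70)):
    """Spearman correlation with the measured rate under each composite split."""
    return {w: spearman(composite(band_shares, poverty, w), measured) for w in splits}
```

Passing a different `band_weights` tuple, such as equal weights, to `composite` gives the corresponding housing-band variant under the same assumptions.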
Robustness to tract-boundary vintage
The index is built on 2020-vintage census tracts (ACS 2018-2022), while the Michigan and Ohio measured data predate that geography, collected from 2006 to 2016 and 2005 to 2018 on 2010 tracts. A direct 11-digit GEOID join can therefore mismatch tracts that split or merged between the two vintages. Using the Census 2010-to-2020 tract relationship file, 96.9 percent of the Michigan validation tracts and 97.5 percent of the Ohio tracts are unchanged one-to-one between vintages. Restricting each correlation to those unchanged tracts leaves it essentially identical (Michigan 0.54 on both the full and the unchanged set; Ohio 0.62 versus 0.61), so the cross-vintage join does not drive the result. The predictors are also more recent than these outcomes by roughly a decade, a mismatch that would tend to attenuate a correlation rather than inflate it, so the Michigan and Ohio figures are, if anything, conservative.
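One way to identify unchanged tracts from a relationship table is sketched below. It assumes the table has already been reduced to (2010 GEOID, 2020 GEOID) pairs, and it calls a tract unchanged when its GEOID maps only to itself in both directions. The function name and this definition are assumptions. Real relationship files may also list very small boundary overlaps, which this strict test does not forgive.

```python
from collections import Counter


def unchanged_tracts(relationships):
    """Return GEOIDs that map one-to-one onto themselves across vintages.

    relationships: iterable of (geoid_2010, geoid_2020) string pairs.
    """
    pairs = set(relationships)
    from_2010 = Counter(g10 for g10, _ in pairs)   # pieces each 2010 tract split into
    into_2020 = Counter(g20 for _, g20 in pairs)   # 2010 tracts feeding each 2020 tract
    return {
        g20
        for g10, g20 in pairs
        if g10 == g20 and from_2010[g10] == 1 and into_2020[g20] == 1
    }


# Toy identifiers, not real GEOIDs
pairs = [
    ("A", "A"),                # unchanged
    ("B", "B1"), ("B", "B2"),  # split into two tracts
    ("C", "D"), ("E", "D"),    # merged into one tract
    ("F", "F"), ("F", "G"),    # kept its ID but gave up territory
]
stable = unchanged_tracts(pairs)
```

Restricting a validation join to the returned set reproduces the kind of unchanged-tract check described above.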
Sensitivity to testing coverage
Measured childhood blood lead comes only from children who are tested, and testing is targeted rather than universal, so a fair concern is that these correlations track where testing happens rather than where lead is. Two things bound that concern. First, every measured value we use is a rate, the percent of tested children who are elevated, not a count of cases, so it is not mechanically inflated by testing more children. Second, where the data also publish the number of children tested, we condition on it.
At the census-tract scale, the Wisconsin service reports children tested per tract. Across metropolitan Milwaukee, predicted risk is only weakly related to testing volume (Spearman 0.17), and the partial correlation between predicted risk and the elevated rate, controlling for testing volume, is essentially unchanged from the raw value (0.69 versus 0.70). Restricting to well-tested tracts, where the rate is measured most precisely, the correlation is if anything stronger (0.75 at 100 or more children tested, 0.75 at 200 or more).
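A partial rank correlation of this kind can be sketched with the standard first-order formula applied to ranks. The text does not say how the partial correlation was computed, so this is one common construction and not necessarily the one used here. Function names are hypothetical.

```python
import math


def average_ranks(values):
    """Ascending 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_from_pairwise(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_spearman(risk, elevated_rate, tested):
    """Partial Spearman: the same formula applied to the rank-transformed inputs."""
    rx, ry, rz = (average_ranks(v) for v in (risk, elevated_rate, tested))
    return partial_from_pairwise(pearson(rx, ry), pearson(rx, rz), pearson(ry, rz))
```

When the control variable is only weakly related to the predictor, the numerator and denominator both change little, which is the pattern behind a partial value that stays close to the raw one.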
At the county scale, we reconstructed the number of children tested per county from the CDC series (the elevated count divided by the elevated rate). In four of the five states, predicted risk is negatively correlated with testing volume (Spearman -0.16 to -0.57; New Jersey is the exception at 0.34), which means higher-risk counties tend to test fewer children. That is the surveillance gap the map is built for: measurement tends to be thinnest where predicted risk is highest. Controlling for testing volume, the predicted-risk-to-elevated-rate correlation stays positive in every state (partial Spearman 0.36 in Iowa to 0.79 in New Jersey). The attenuation in some states reflects that lower-testing counties are also higher-risk, a shared rural and disinvestment gradient, rather than an artifact in the rate itself.
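The reconstruction of testing volume is a single division, sketched below under the assumption that the elevated rate is published as a percentage of children tested. The function name is hypothetical. A county reporting a zero rate gives no way to recover its denominator, so the sketch returns None in that case.

```python
def children_tested(elevated_count, elevated_pct):
    """Back out the number tested from a count and a percent-elevated rate.

    Returns None when the rate is zero, because the denominator cannot be recovered.
    """
    if elevated_pct <= 0:
        return None
    return elevated_count * 100.0 / elevated_pct


# Illustrative numbers only: 12 elevated children at a 4.0 percent rate
tested = children_tested(12, 4.0)
```

Published rates are usually rounded, so a denominator recovered this way is approximate, more so in small counties.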
A validation fully independent of who gets tested would require population-representative testing, such as NHANES, or a complete state registry rather than the public aggregate extracts used here. We invite that confirmation (see Discussion).
The data-access pipeline, and why prediction is necessary
Measured childhood blood-lead is held by states, and access varies sharply from one to the next. That is the practical reason a prediction map is useful: it brings the same neighborhood-level warning to every state, including the many that publish no usable blood data at all.
Where measured data exists in machine-readable form, we ingest it automatically:
- New York (ZIP-level, Socrata). New York publishes childhood blood-lead testing and elevated-incidence counts by ZIP code (excluding New York City) on health.data.ny.gov, served through the Socrata Open Data API (SODA) 27. We query it programmatically.
- County-level, roughly 45 states (CDC Tracking Network API). The CDC Environmental Public Health Tracking Network exposes childhood blood-lead surveillance through a machine-readable API, with the ≥3.5 µg/dL classification adopted for 2022-forward data after CDC lowered the blood-lead reference value from 5 to 3.5 µg/dL in October 2021 28. Children are counted once per year at their highest result 5. This is the broad county-level backbone.
- Wisconsin (tract-level, ArcGIS). Pulled live, as described above 26.
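For the Socrata source, paging through a dataset comes down to building resource URLs with the standard `$limit` and `$offset` query parameters. The sketch below only constructs those URLs. The dataset identifier `xxxx-xxxx` is a placeholder, not the real New York dataset ID, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def soda_page_url(domain, dataset_id, limit=1000, offset=0):
    """Build one page URL for a Socrata (SODA) JSON resource endpoint."""
    # safe="$" keeps the SODA parameter prefix readable instead of percent-encoding it
    query = urlencode({"$limit": limit, "$offset": offset}, safe="$")
    return f"https://{domain}/resource/{dataset_id}.json?{query}"


# Placeholder dataset ID; substitute the identifier shown on the portal page
first_page = soda_page_url("health.data.ny.gov", "xxxx-xxxx")
third_page = soda_page_url("health.data.ny.gov", "xxxx-xxxx", limit=1000, offset=2000)
```

A client would request successive offsets until a page comes back with fewer rows than the limit.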
Then there are the holdouts. States including New Hampshire, Colorado, and Connecticut lock their childhood blood-lead behind Tableau dashboards or PDF reports with no API, so obtaining tract- or ZIP-level measured values requires a public-records (FOIA) request and manual extraction. These are exactly the places where a family has no public way to learn that their neighborhood's housing stock and poverty profile put their child at elevated risk. The prediction map closes that gap. It runs without waiting for a state to publish blood tests, because it needs none: only the public Census housing-and-poverty data, which exists for every tract in the country. The validation above is what justifies trusting that prediction where no measured data is available to check it.
Residual analysis: how much further public data can go
A screening map should be honest about its own error. Two questions follow from the validation: why does predicted risk track measured blood lead at a rank correlation near 0.6 rather than 1.0, and would more public variables push it higher. We tested both in the two states behind the federal anchor.
What we did
We assembled every census tract in Michigan and Ohio carrying both a published measured childhood blood-lead value and complete Census inputs: 4,690 tracts, population-weighted. We compared two models that predict measured blood lead from public variables, each scored by five-fold cross-validation within this two-state set, so every reported figure is from held-out tracts the model did not train on. The figures below therefore describe held-out tracts within Michigan and Ohio, not generalization to new states 30. The published risk index itself has fixed weights and is never fit to blood lead; the cross-validated models here are auxiliary, used only to ask how much additional public data could improve the ranking. The baseline uses only the EPA three-variable set: the share of homes built before 1940, the share built before 1950, and poverty. The expanded model adds seven more public variables: median home value, median household income, renter share, vacancy rate, the 1950-to-1979 housing share, percent Black, and percent Hispanic.
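The held-out scoring can be sketched as a five-fold loop in which every tract is predicted by a model that never saw it. To stay self-contained, the sketch uses a one-variable least-squares line as a stand-in for the models compared here and omits population weighting. Both are simplifications, and the function names are hypothetical.

```python
import random


def kfold_indices(n, k=5, seed=0):
    """Shuffle indices once and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (stand-in model)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def out_of_fold_predictions(xs, ys, k=5, seed=0):
    """Predict each unit from a model trained on the other k - 1 folds."""
    preds = [None] * len(xs)
    for fold in kfold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = slope * xs[i] + intercept
    return preds


def r_squared(ys, preds):
    """Share of variance in ys explained by held-out predictions."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot
```

Scoring only the out-of-fold predictions is what makes the reported figures held-out estimates and not in-sample fit.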
Housing age is most of the signal, and more variables barely move the ranking
| Model | Inputs | Cross-validated Spearman ρ | Variance explained (R²) |
|---|---|---|---|
| Baseline | EPA 3 variables (housing age + poverty) | 0.609 | 0.482 |
| Expanded | 10 variables (adds race, value, income, tenure, vacancy) | 0.619 | 0.550 |
Adding seven variables raised the rank correlation by 0.01 and the variance explained by 0.07. The ranking is already close to the ceiling that ACS variables alone can reach. A single input, the share of homes built before 1940, carries roughly 68 percent of the model's predictive weight. This reproduces the EPA's own finding that a reduced three-variable model performed comparably to its full model 4: housing age and poverty carry most of the signal, and further socioeconomic variables are largely redundant with them.
Where the map misses, it misses by race and disinvestment
The structure of the error is more informative than its size. We took what the housing-and-poverty baseline leaves unexplained, its out-of-sample residual, and asked which variables predict it. Percent Black population dominates (Figure 3), well ahead of housing vacancy and of income. The map systematically under-predicts risk in tracts with larger Black populations and more vacant housing, relative to what housing age and poverty alone imply. This is consistent with the documented racial ecology of lead exposure, in which the legacy of redlining and disinvestment concentrated deteriorating lead hazards in Black neighborhoods beyond what income or housing vintage captures 31.
We state the ethical constraint plainly. That race improves the statistical fit does not mean race should be an input to a deployed targeting tool. Allocating screening by race would risk encoding the disparity it measures. We report the result for the opposite reason: it is a diagnostic of where a housing-and-poverty map runs conservative, and a signal that the highest-risk Black neighborhoods warrant at least as much confirmatory testing as the map indicates, not less.
Why the ceiling exists, and what is actually beyond it
Three limits hold any ACS-based map near a rank correlation of 0.6. Only one is fixable with more data.
First, the model is ecological. Every value is a property of a neighborhood, not a home. A high-risk tract contains remediated, lead-free houses; a low-risk tract contains pre-1940 houses with failing paint. No neighborhood variable resolves a hazard that varies house to house. The constraint lies in the unit of analysis, not in the variable list.
Second, the measured truth is itself noisy. A tract's blood-lead rate rests on the children who happened to be tested, which in a small tract is few. Some of the unexplained variance is measurement error in the target, not error in the model.
Third, the remaining exposure pathways are absent from the census entirely. Lead service lines, soil lead, imported consumer goods such as glazed ceramics and certain spices, occupational take-home lead, and the actual condition of paint are real drivers that no ACS table records.
The third limit points to two ways forward. The first is to combine external public data the census does not carry: the lead service-line inventories that water systems were required to publish under the EPA Lead and Copper Rule Revisions in October 2024 32; parcel-level assessor and sale records that proxy for maintenance and renovation; proximity to airports still burning leaded aviation fuel, now the largest source of airborne lead, which the EPA has formally found endangers public health 33; and soil-lead surveys. Each is public, but fragmented and laborious to assemble, which is exactly why existing maps omit it. The second is physical confirmation in the home. A neighborhood model that has reached its data ceiling cannot tell a family whether their home holds a hazard; a direct test of the home, by a certified risk assessment, an XRF reading, dust-wipe or paint-chip lab analysis, or the child's blood-lead test, can. Confirmation in the home is the step a map cannot replace.
Discussion: potential applications
A validated neighborhood risk map has a clear public-health use. Childhood blood-lead testing in the United States is targeted rather than universal, and the targeting is uneven. A current map of where exposure is most likely helps health departments and clinicians decide where to direct testing, outreach, and home-hazard assessment first. Because the map covers every neighborhood from public data, it extends that guidance to the many areas that publish no measured surveillance of their own.
The map is most useful to the people and programs that already reach families with young children. Pediatric and prenatal care providers, Women, Infants, and Children (WIC) nutrition clinics, Medicaid managed-care plans and their Early and Periodic Screening, Diagnostic, and Treatment programs, home-visiting programs, Head Start, and childhood lead poisoning prevention programs all serve the at-risk population and all decide whom to test, counsel, and refer. A free, neighborhood-resolution risk layer lets any of them flag which of the families they already serve live in higher-risk areas, and prioritize blood-lead screening, anticipatory guidance, and home-hazard education accordingly. Because the score is a property of place rather than of person, a provider can apply it to its own patient roster locally, without transmitting any patient information, which keeps the use within existing privacy practice. The same property makes the map a direct input to the geographic targeting plans the CDC asks state and local programs to maintain.
The map ranks risk; it does not confirm a hazard in any specific home. It is the first pass. A physical test of paint, dust, or water in the flagged home is what establishes whether a hazard is actually present, and pairing the two directs limited inspection and abatement capacity to where a hazard is most likely. The cost-effectiveness of any particular screening program is beyond the scope of this paper. It depends on local testing costs, follow-through rates, and remediation practices we do not model here.
A more rigorous validation is available to the agencies that hold complete blood-lead records. Our test used only publicly released, aggregate surveillance, which bounds how finely an outside party can check the map. State childhood lead poisoning prevention programs, Medicaid blood-lead testing files, and the CDC's full surveillance hold individual, address-level results for far more children than any public extract. Any of these custodians, or a research partner under an appropriate data-use agreement, could validate and recalibrate the map against their complete registry at the address level, including in states we could not test here, while keeping the identifiable data inside their own systems. The map and its open pipeline are built to enable that validation, and we invite it.
Limitations and Ethics
This map is a screening tool. It predicts where childhood lead-exposure risk concentrates, using public housing and poverty data, and we validate that prediction against measured childhood blood lead in eight states: one independent test at the census-tract scale, five independent tests at the county scale, and two consistency checks against the source study's own surveillance. It does not measure lead in any home, and it does not diagnose any child. Each limitation below constrains a specific claim a reader might otherwise draw from a colored map.
The estimates are ecological, not individual
Every value on this map is a property of a census tract, not of a person or a house. The model is built from tract-level aggregates (three construction-era housing-age shares, pre-1940, 1940-1959, and 1960-1979, from Census table B25034, weighted toward older stock, and tract poverty from table S1701), so its output describes the average risk environment of a neighborhood. Reading a tract-level association as if it applied to an individual is the ecological fallacy 34. The EPA hotspots analysis we extend is explicit on this point. Its authors state that the analysis operates at the population level (census tract, county, and state) and "cannot identify sources at particular addresses or risk at an individual level" 4. A high-risk tract still contains remediated and lead-free homes. A low-risk tract still contains pre-1940 houses with deteriorating lead paint. The map narrows where to look, but it cannot tell any single family whether their home has a hazard. That is what a physical test is for.
It predicts risk, not poisoning
The indices are correlates of exposure risk (old paint, concentrated poverty), not a measurement of lead in a child's blood. Our validation shows that the predicted surface tracks measured childhood blood lead at a moderate level (Spearman rho 0.54 in Michigan, 0.62 in Ohio, and 0.70 in metropolitan Milwaukee, Wisconsin). A correlation of that order is meaningful for an ecological model and weak as a basis for any individual prediction. Much variance remains unexplained, because exposure also depends on factors absent from the model: actual paint condition, renovation and disturbance history, water service-line material, soil, imported consumer goods, and occupational take-home lead. The map should be read as a way to set priorities, never as a count of poisoned children.
The input data carry a vintage lag
The map is only as current as the American Community Survey behind it. The 2018-2022 ACS 5-year estimates pool responses collected across the full five-year window of January 1, 2018 through December 31, 2022, so the housing and poverty picture is a multi-year average, not a snapshot of today 35. Housing stock changes slowly, which makes the pre-1940 share relatively stable, but poverty, occupancy, and demolition or renovation can shift faster than the data refresh. Any tract that has seen significant teardown, rehabilitation, or demographic turnover since the survey window will be characterized with a lag of several years. The EPA analysis faced the same constraint. That work relied on ACS 2013-2017 inputs on 2010-vintage tract geography. We improve currency by moving to the 2018-2022 vintage, but we do not eliminate the lag 4.
Small and rural tracts are measured least precisely
ACS reliability degrades as geography shrinks, and the 5-year tract estimates that this map depends on are the Census Bureau's least precise published level. The Bureau ships a margin of error with every estimate for exactly this reason and urges caution where it is large 36. The problem is built into the sampling. In the 2007-2011 ACS the average tract had only about 135 completed interviews over five years, against an average of about 280 housing units in the 2000 long form, and tract-level margins of error run on average about 75 percent larger than the corresponding 2000 long-form figures 37. The 5-year estimates that absorbed the COVID-disrupted 2020 collection year carry wider margins still 38. The consequence falls hardest on low-population and rural tracts, where small samples widen the uncertainty band and a tract's percentile rank can be noisy. Sparse rural areas can also be physically large, so a single risk value may average over heterogeneous housing across many miles. Read rural risk on this map with more caution than urban risk, not less.
Validation units are spatially autocorrelated
The tracts and counties used to validate the map are not independent observations. Neighboring areas resemble one another in both predicted risk and measured blood lead, so the effective number of independent units is smaller than the raw count, and an ordinary bootstrap that ignores this reports intervals that are too narrow. We address it at the census-tract scale with a spatial block bootstrap that resamples whole geographic blocks, which widens the intervals to reflect the clustering, and we report Moran's I of the residual to show how much spatial structure remains (it is substantial, 0.49 to 0.57 at the tract scale, all permutation p near 0.001). At the county scale the resampling unit is already the county, but Missouri and New York still carry residual spatial autocorrelation, so their intervals are optimistic. None of this moves the point estimates; it widens the honest uncertainty around them, and it is why we do not lean on any single state.
Measured blood-lead data are scarce and uneven, which is the point
The reason a prediction map is needed is that ground-truth blood-lead data are incomplete and inconsistently available. Surveillance undercounts exposure because not all children are tested. CDC receives about 3 million blood-lead test results a year, and the agency states plainly that these data are "not a population-based estimate" and "are not representative of the United States or even of an entire state or county" 39. We test directly whether this targeting drives the result and find no evidence that it does: the measured values are rates, not counts, and controlling for the number of children tested leaves the correlations positive (Sensitivity to testing coverage, above), though a validation fully independent of who gets tested would need population-representative testing or a complete registry. Coverage and reporting vary by state, by insurer, and by year. That scarcity has two consequences for this work. First, our tract-scale validation is limited to the few states that publish tract-resolvable measured data (Michigan and Ohio via the EPA paper's Supplement B; Wisconsin via the state's open ArcGIS service), and the county-scale tests add only five more, so external validity to states with different housing eras and testing regimes is assumed, not proven. Second, the same scarcity is the public-health case for the map. It extends a neighborhood-level warning to the many jurisdictions (for example New Hampshire, Colorado, and Connecticut) where measured blood-lead data are locked in PDFs or Tableau dashboards and reachable only by FOIA. Where there is no public blood data, a validated prediction is the only neighborhood-level signal available.
Screening, not diagnosis
The honest framing for both the map and the test it points to is screening. CDC is explicit that even its blood-lead reference value of 3.5 micrograms per deciliter is "a screening tool," is "not health-based," and is "not a regulatory standard," and that no safe level of lead in children has been identified 40. A neighborhood risk map sits one step further from diagnosis than a blood test does. It flags places to investigate, and nothing more. It does not establish that a hazard exists at any address or that any child has been exposed. The right response to a high-risk tract is confirmation, not alarm. That means a home risk assessment, a dust-wipe or paint-chip analysis, or an XRF reading, followed by a blood-lead test for the child if a hazard is found. Used that way, the map does what a screen should do. It points limited inspection and testing resources toward the places most likely to need them, and it makes no claim it cannot support about any individual home or child.
Declarations
Competing interests. The author is the founder of Fluoro-Spec Inc. (DetectLead.com), which manufactures and sells a consumer lead-screening product, and also owns Spirochaete Research Labs, LLC, a research company with interests in lead-detection technology. Both companies have a commercial interest in lead detection and screening. The national risk map and the method described in this paper are built entirely from public data and are provided free of charge, with no login and no paywall. These commercial interests did not influence the choice of data sources, the validation design, or the reported results, each of which is reproducible from public inputs by any independent party.
Funding. This work received no external or grant funding. It was conducted and supported internally by Fluoro-Spec Inc.
Data and code availability. All inputs are public. The risk model uses U.S. Census American Community Survey 2018-2022 five-year tables (B25034 and S1701, and for the residual analysis B25077, B19013, B25003, B25002, B02001, B03002, and B17001) retrieved through the public Census API. Validation uses publicly released state and federal childhood blood-lead surveillance: the CDC National Environmental Public Health Tracking Network county series, the Michigan and Ohio tract-level surveillance, the Wisconsin DHS open ArcGIS service, and New York State open health data. The analysis code, the derived tract- and county-level risk scores, and the state validation joins are archived at Zenodo under a CC-BY-4.0 license (DOI: 10.5281/zenodo.20531599). The interactive national map is at detectlead.com/lead-risk-map. No individual-level, identifiable, or access-restricted data were used.
Ethics and human subjects. This study used only aggregate, publicly available, de-identified data at the census-tract and county level. It involved no individual human subjects, no identifiable private information, and no intervention, and so did not require institutional review board approval. The analysis operates at the population (neighborhood) level and cannot determine exposure for any individual child or address.
Author contributions. E.C.R. conceived the study, assembled the public data, implemented the model and the validation, produced the figures and maps, and wrote the manuscript.
Use of artificial intelligence. An AI assistant (Anthropic's Claude) was used as a tool in this work: for retrieving and processing public data, for statistical and mapping code, for a first-pass literature search, and for drafting and copy-editing text. The author directed the work, wrote and audited the analysis pipeline, and re-ran it to confirm every reported figure. Each cited reference was independently verified against its PubMed or CrossRef record before inclusion. The author takes full responsibility for the entire content. No AI system is an author, in keeping with ICMJE and publisher policy, because an AI cannot be accountable for the work.
References
1. CDC, "CDC Updates Blood Lead Reference Value." https://www.cdc.gov/lead-prevention/php/news-features/updates-blood-lead-reference-value.html ; "Update of the Blood Lead Reference Value, United States, 2021," MMWR 70(43). https://www.cdc.gov/mmwr/volumes/70/wr/mm7043a4.htm
2. U.S. Census Bureau, American Community Survey 2018-2022 5-Year Estimates, Table B25034: Year Structure Built. https://data.census.gov/table/ACSDT5Y2022.B25034
3. U.S. Census Bureau, American Community Survey 2018-2022 5-Year Estimates, Table S1701: Poverty Status in the Past 12 Months. https://data.census.gov/table/ACSST5Y2022.S1701
4. Zartarian, V. G., Xue, J., Poulakos, A. G., et al. (2024). "A U.S. Lead Exposure Hotspots Analysis." Environmental Science & Technology 58(7), 3311-3321. DOI: 10.1021/acs.est.3c07881. The authors report agreement of Cohen's kappa 0.49-0.63 against children's blood-lead hotspots from approximately 1.9 million Michigan tests (2006-2016) and 2.3 million Ohio tests (2005-2018), screened 73,086 census tracts containing at least one child under six, state the analysis operates at the population level and "cannot identify sources at particular addresses or risk at an individual level," and rely on 2010 census inputs. https://pmc.ncbi.nlm.nih.gov/articles/PMC10882963/
5. CDC, "About the Data: Blood Lead Surveillance" and "Childhood Blood Lead Surveillance: State Data" (machine-readable Tracking Network API; child counted once per year at highest result; ≥3.5 µg/dL classification for 2022-forward data). https://www.cdc.gov/lead-prevention/php/data/blood-lead-surveillance.html
6. McFarland MJ, Hauer ME, Reuben A. "Half of US population exposed to adverse lead levels in early childhood." PNAS 2022;119(11):e2118631119. https://www.pnas.org/doi/10.1073/pnas.2118631119
7. Brown EE, Lombard M, Chan A, Ayotte J, Rakowska S, Fuller-Thomson E. "Historical Atmospheric Lead Concentrations (1960-1974) and Memory Problems Half a Century Later." Alzheimer's and Dementia 2025. https://pmc.ncbi.nlm.nih.gov/articles/PMC12726548/
8. CDC, "Data and Statistics, Childhood Lead Poisoning Prevention." https://www.cdc.gov/lead-prevention/php/data/index.html ; "About the Data: Blood Lead Surveillance." https://www.cdc.gov/lead-prevention/php/data/blood-lead-surveillance.html
9. CDC Environmental Public Health Tracking, "Childhood Lead Poisoning." https://www.cdc.gov/environmental-health-tracking/php/data-research/childhood-lead-poisoning.html
10. Zartarian V, Xue J, Poulakos A, Tornero-Velez R, Stanek L, Snyder E, Helms Garrison V, Egan K, Courtney J. "A U.S. Lead Exposure Hotspots Analysis." Environmental Science and Technology 2024;58(7):3311-3321. DOI 10.1021/acs.est.3c07881. https://pubs.acs.org/doi/10.1021/acs.est.3c07881
11. U.S. Census Bureau, American Community Survey table B25034, Year Structure Built. https://data.census.gov/table?q=B25034
12. U.S. Census Bureau, American Community Survey subject table S1701, Poverty Status in the Past 12 Months. https://data.census.gov/table/ACSST1Y2022.S1701
13. Washington State Department of Health. Lead Exposure Risk Index, data notes (housing-age and poverty measures from the ACS 2018-2022 5-year file, weighted equally and ranked into population deciles on a 1-to-10 scale). https://doh.wa.gov/data-and-statistical-reports/washington-tracking-network-wtn/lead-risk-and-exposure/lead-exposure-risk-ibl-data-notes
14. Zartarian VG, Xue J, Poulakos AG, Tornero-Velez R, Stanek LW, Snyder E, Helms Garrison V, Egan K, Courtney JG. A U.S. Lead Exposure Hotspots Analysis. Environmental Science & Technology. 2024;58(7):3311-3321. doi:10.1021/acs.est.3c07881. https://pubs.acs.org/doi/10.1021/acs.est.3c07881
15. Zartarian VG, et al. A U.S. Lead Exposure Hotspots Analysis (full text). PMC10882963 (73,086 tracts in the 50 states with at least one child under six; RF v1 five-variable and RF v2 three-variable pre-1940 + pre-1950 + income-to-poverty models; 2010 Census geography and ACS 2013-2017 inputs; Michigan ~1.9M blood-lead points 2006-2016 and Ohio ~2.3M 2005-2018; Cohen's kappa 0.49-0.63; "no known safe level"). https://pmc.ncbi.nlm.nih.gov/articles/PMC10882963/
16. U.S. Census Bureau. American Community Survey 5-Year Data (2009-2023); developer API documentation; 2018-2022 release dated December 7, 2023. https://www.census.gov/data/developers/data-sets/acs-5year.html
17. U.S. Census Bureau. Census Data API, ACS 2022 5-year, table B25034 (Year Structure Built) variable group: B25034_001E total, B25034_010E "Built 1940 to 1949", B25034_011E "Built 1939 or earlier". https://api.census.gov/data/2022/acs/acs5/groups/B25034.html
18. U.S. Census Bureau, American Community Survey 5-Year Estimates, Table S1701 "Poverty Status in the Past 12 Months." https://data.census.gov/table/ACSST5Y2022.S1701
19. U.S. Census Bureau. Table B01003, Total Population, ACS 2022 5-year. https://data.census.gov/table/ACSDT5Y2022.B01003
20. U.S. Census Bureau. Table B09001, Population Under 18 Years by Age (universe: population under 18 years), ACS 2022 5-year. https://censusreporter.org/tables/B09001/
21. U.S. Centers for Disease Control and Prevention. Sources of lead exposure; lead in paint, dust, and soil; older housing (pre-1978) as the primary residential source. https://www.cdc.gov/lead-prevention/prevention/index.html
22. Zartarian, V., Xue, J., Poulakos, A., Tornero-Velez, R., Stanek, L., Snyder, E., Helms Garrison, V., Egan, K., Courtney, J. (2024). "A U.S. Lead Exposure Hotspots Analysis." Environmental Science & Technology, 58(7), 3311–3321. DOI: 10.1021/acs.est.3c07881. https://pubs.acs.org/doi/10.1021/acs.est.3c07881
23. Zartarian et al. (2024), full text, PubMed Central PMC10882963 (73,086 tracts; Michigan ~1.9M tests 2006–2016; Ohio ~2.3M tests 2005–2018; kappa interpretation scale; reduced three-variable model kappa 0.51–0.63; "no known level of lead exposure to be without risk"). https://pmc.ncbi.nlm.nih.gov/articles/PMC10882963/
24. Stanek, L. W., Xue, J., Zartarian, V. G., et al. (2024). "Identification of high lead exposure locations in Ohio at the census tract scale using a generalizable geospatial hotspot approach." Journal of Exposure Science & Environmental Epidemiology, 34(4), 718–726 (Ohio tract-scale validation, 2005–2018 blood-lead, Cohen's kappa 0.54–0.64 observed hotspots vs predictive indices). PubMed Central PMC11303242. https://pmc.ncbi.nlm.nih.gov/articles/PMC11303242/
25. U.S. Census Bureau, American Community Survey 5-Year Estimates, Table B25034 "Year Structure Built." https://data.census.gov/table/ACSDT5Y2022.B25034
26. Wisconsin Department of Health Services, Environmental Public Health Tracking, childhood lead-poisoning data by census tract (children under 6 tested, positive, percent poisoned; suppressed when fewer than five children poisoned, unless 100 or more tested; published via ArcGIS). https://www.dhs.wisconsin.gov/epht/lead.htm
27. New York State Department of Health, "Childhood Blood Lead Testing and Elevated Incidence by Zip Code: Beginning 2000," health.data.ny.gov, served via the Socrata Open Data API (SODA). https://health.data.ny.gov/Health/Childhood-Blood-Lead-Testing-and-Elevated-Incidenc/d54z-enu8
28. CDC, "Update of the Blood Lead Reference Value, United States, 2021," MMWR 70(43):1509–1512 (BLRV lowered 5 → 3.5 µg/dL, October 2021). https://www.cdc.gov/mmwr/volumes/70/wr/mm7043a4.htm
29. Each validation uses the within-state Spearman correlation between the published national percentile P and measured blood lead. Because P is a monotone, rank-preserving transform of the composite score R, and Spearman correlation depends only on within-state ranks, the within-state rank correlation of P equals that of R. Ranking tracts nationally to publish the map therefore introduces no circularity and does not affect any within-state validation result.
30. Analysis of 4,690 Michigan and Ohio census tracts carrying both measured childhood blood lead (state surveillance aggregated to tract) and complete ACS 2018-2022 inputs. Models: gradient-boosted regression, five-fold cross-validation, population-weighted. Baseline inputs: pre-1940 share, pre-1950 share, poverty. Expanded adds median home value, median household income, renter share, vacancy rate, 1950-1979 housing share, percent Black, percent Hispanic. Residual = out-of-sample baseline prediction error; driver weights from standardized ridge regression on the residual. Reproducible from public Census API pulls and the published state blood-lead joins.
31. Sampson, R. J., and Winter, A. S. (2016). "The Racial Ecology of Lead Poisoning: Toxic Inequality in Chicago Neighborhoods, 1995-2013." Du Bois Review: Social Science Research on Race, 13(2), 261-283. DOI 10.1017/S1742058X16000151.
32. U.S. EPA, Lead and Copper Rule Revisions: community and non-transient non-community water systems were required to prepare and make publicly available an initial lead service line inventory by October 16, 2024. https://www.epa.gov/ground-water-and-drinking-water/lead-and-copper-rule-revisions
33. U.S. EPA (2023). Final determination that lead emissions from certain aircraft engines that operate on leaded fuel cause or contribute to air pollution that may reasonably be anticipated to endanger public health and welfare; piston-engine aircraft are the largest remaining source of lead emissions to air in the United States. https://www.epa.gov/regulations-emissions-vehicles-and-engines/regulations-onboard-diagnostics-and-lead-emissions-aircraft
34. Subramanian, S. V., Jones, K., Kaddour, A., and Krieger, N. (2009). "Revisiting Robinson: The perils of individualistic and ecologic fallacy." International Journal of Epidemiology 38(2), 342-360. DOI: 10.1093/ije/dyn359. On the hazard of reading area-level associations as individual-level associations. https://pmc.ncbi.nlm.nih.gov/articles/PMC2663721/
35. U.S. Census Bureau. "2018-2022 ACS 5-Year Estimates" technical documentation and "Period Estimates in the American Community Survey." The 2018-2022 5-year estimates pool data collected from January 1, 2018 through December 31, 2022 and do not represent a single point in time. https://www.census.gov/programs-surveys/acs/technical-documentation/table-and-geography-changes/2022/5-year.html
36. U.S. Census Bureau. "Using American Community Survey Estimates and Margins of Error." Margins of error are published with each estimate so users can judge reliability, and users are urged to use caution where margins of error are high. https://www.census.gov/content/dam/Census/programs-surveys/acs/guidance/training-presentations/20180418_MOE_Webinar_Transcript.pdf
37. Spielman, S. E., Folch, D., and Nagle, N. (2014). "Patterns and causes of uncertainty in the American Community Survey." Applied Geography 46, 147-157. Tract-level ACS margins of error average about 75 percent larger than the corresponding 2000 long-form estimates; in the 2007-2011 ACS the average tract had about 135 completed surveys over five years, against an average of about 280 housing units in the 2000 long form. https://pmc.ncbi.nlm.nih.gov/articles/PMC4232960/
38. U.S. Census Bureau (2022). "Increased Margins of Error in the 5-Year Estimates Containing Data Collected in 2020." The reduced 2020 response count raised relative margins of error and caused several key estimates to exceed the Bureau's quality threshold. https://www.census.gov/programs-surveys/acs/technical-documentation/user-notes/2022-04.html
39. CDC, Childhood Lead Poisoning Prevention. "Childhood Blood Lead Surveillance: National Data." About 3 million blood-lead test results are received by CDC each year; the data are "not a population-based estimate" and "are not representative of the United States or even of an entire state or county." https://www.cdc.gov/lead-prevention/php/data/national-surveillance-data.html
40. CDC, Childhood Lead Poisoning Prevention. "CDC Updates Blood Lead Reference Value." The 3.5 µg/dL reference value is a screening tool, is not health-based, and is not a regulatory standard; no safe level of lead in children has been identified. https://www.cdc.gov/lead-prevention/php/news-features/updates-blood-lead-reference-value.html