White paper

Predicting childhood lead-exposure risk from public data

Built on the EPA's work

This paper builds on the housing-age-plus-poverty approach that the U.S. EPA lead-exposure hotspots analysis (Zartarian et al., 2024, Environmental Science & Technology) established and validated against roughly 4.2 million children's blood-lead tests. Our contribution is to turn that approach into a free, current national map covering every neighborhood, and to test it against measured childhood blood lead in eight states. The credit for establishing and validating the underlying approach is theirs; the open map and the multi-state validation are ours.

In preparation for preprint submission · medRxiv / EarthArXiv
83,388: census tracts scored, every U.S. neighborhood
8: states checked against real blood lead
0.48-0.77: rank correlation, predicted vs measured
$0: free, no login, no paywall

Abstract

Background. Childhood lead exposure causes irreversible neurodevelopmental harm, and no safe blood lead level in children has been identified 1. Measured childhood blood lead is collected unevenly across U.S. states, and many areas publish no neighborhood-level surveillance. Zartarian et al. (2024), at the U.S. Environmental Protection Agency, showed that housing age and poverty predict where childhood lead exposure concentrates. They validated that approach against roughly 4.2 million children's blood-lead tests 4, and we build on their method.

Methods. We computed a lead-exposure risk score for all 83,388 U.S. census tracts and 3,222 counties and county-equivalents from American Community Survey 2018-2022 five-year estimates. The score combines a housing-age index (table B25034, three construction-era shares weighted toward older stock) with poverty (table S1701) 23. The housing-age score and poverty are z-scored, weighted 0.58 to 0.42, summed, and percentile-ranked. We compared the predicted score against measured childhood blood lead using Spearman's rank correlation (rho), a nonparametric measure of monotonic association from -1 to 1. Tract-scale confidence intervals come from a spatial block bootstrap, because neighboring tracts are spatially autocorrelated and an ordinary bootstrap understates the interval.

Results. At the census-tract scale, predicted risk correlated with measured elevated blood lead at rho = 0.54 (95% CI 0.42 to 0.64) in Michigan and 0.62 (0.54 to 0.68) in Ohio. Because those are the two states whose surveillance the source study itself used, we treat them as a consistency check rather than an independent test. Against surveillance the source study never used, predicted risk correlated at 0.70 (0.52 to 0.80) in metropolitan Milwaukee, Wisconsin, the one independent test at the census-tract scale, and at the county scale in five further states, ranging from 0.77 (0.53 to 0.89) in New Jersey to 0.48 (0.20 to 0.69) in New York. Tract-scale intervals are from a spatial block bootstrap; the county-scale intervals are not corrected for spatial autocorrelation and, in two of the five states, are optimistic. We report the rank correlation, not the source study's Cohen's kappa of 0.49 to 0.63 (a different quantity, for binary hotspot agreement), and keep the two separate. Even a flexible model fit to the same public inputs explains only about half the tract-level variance in the two best-measured states, so housing age and poverty alone leave roughly half of the variation unexplained.

Conclusions. A free, current, neighborhood-level lead-risk map, reproducible from public data, ranks U.S. neighborhoods in moderate agreement with measured childhood blood lead, including in states that publish no surveillance of their own. It is a screening layer that shows where confirmatory testing should go first. It does not diagnose exposure in any individual child or home.

Keywords: childhood lead exposure; ecological risk model; American Community Survey; blood-lead surveillance; geospatial screening; environmental justice.

Introduction

The problem has no safe threshold

Childhood lead exposure is a settled and quantified public-health failure, not an open question. The U.S. Centers for Disease Control and Prevention reports that no safe level of lead in a child's blood has been identified, and that even low levels can affect IQ, attention, and academic achievement 5. In 2021 the CDC lowered its blood lead reference value from 5.0 to 3.5 micrograms per deciliter, set at the 97.5th percentile of the blood-lead distribution among U.S. children ages 1 to 5, so that more children with comparatively higher levels would be identified 1. The agency notes that the reference value is not health-based and not a regulatory standard. A result below it does not mean a child is unharmed 1.

The aggregate damage is large and already incurred. McFarland, Hauer, and Reuben (2022), in the Proceedings of the National Academy of Sciences, estimate that childhood exposure to leaded gasoline cost the living U.S. population roughly 824 million cumulative IQ points, an average of about 2.6 points per person across the population. They further estimate that more than 170 million Americans alive in 2015, over half the population, had childhood blood-lead levels above the clinical concern threshold of their era, and that the most exposed birth cohort (1966 to 1970) lost roughly six points each 6. Lead added to gasoline beginning in 1923 was not banned for on-road use until 1996, so most adults born before that date carry a measurable childhood-exposure burden 6. The burden may also persist into old age. Brown and colleagues (2025), in Alzheimer's and Dementia, link historical atmospheric lead, mapped at its 1960 to 1974 peak, to higher odds of memory problems half a century later in two large representative samples 7. The exposure pathway has shifted, but it has not closed. Leaded paint in housing built before the 1978 residential ban remains a dominant present-day source of childhood exposure, and the CDC has estimated that about 500,000 U.S. children ages 1 to 5 had blood-lead levels at or above the older 5.0 microgram-per-deciliter reference value, a count that is higher at the current 3.5 reference value 5.

The surveillance gap

Knowing that lead harms children at any dose is not the same as knowing which children to protect. The United States has no complete, population-representative measurement of childhood blood lead. The CDC receives about 3 million blood-lead test results per year, a fraction of the roughly 22 million children under six 8. But the selection matters more than the count. Testing is deliberately concentrated on children judged to be at higher risk, so the reported surveillance data are, in the CDC's own words, "not a population-based estimate" and "not representative of a whole county or a whole state" 8. The agency points anyone seeking nationally representative prevalence to NHANES, a survey designed for estimation rather than for locating individual neighborhoods at risk 8. The result is a systematic blind spot. A child who is never tested, in a place where few children are tested, never appears in the data, and the places with the least testing are not randomly distributed.

The measured data that do exist are held by states, not the federal government, and access to them is uneven by jurisdiction. Some states publish blood-lead surveillance through open data interfaces, and the CDC Environmental Public Health Tracking Network carries county-level elevated-blood-lead measures for participating states 9. Others lock the same information in static reports or dashboards that require formal records requests to obtain at usable spatial resolution. In this patchwork, the resolution of the available evidence depends less on where the hazard is than on each state's data-publishing posture. A family, clinician, or local health department in a low-publishing state has no neighborhood-level signal at all, even though the underlying housing and poverty drivers of risk are present and measurable there.

Why a prediction-first national map is needed

When direct measurement is incomplete and unevenly accessible, the established response is to predict risk from variables that are measured everywhere. The U.S. Environmental Protection Agency took exactly this approach. Zartarian et al. (2024), in Environmental Science & Technology, screened 73,086 census tracts containing at least one child under six in the 50 states and modeled lead-exposure risk from publicly available indicators, mainly the age of housing and poverty, because surveillance and environmental-data gaps make disproportionately exposed communities hard to find by measurement alone 10. They evaluated the predicted hotspots against approximately 1.9 million Michigan blood-lead results (2006 to 2016) and approximately 2.3 million Ohio results (2005 to 2018), and found moderate-to-substantial agreement, Cohen's kappa 0.49 to 0.63 10. They also found that a reduced model built on three variables (the share of homes built before 1940, the share built before 1950, and poverty) predicted hotspots about as well as the full model, and they restate the premise that "there is no known level of lead exposure to be without risk" 10.

That EPA result establishes the method but does not deliver a usable public instrument, even though its inputs are open and national. The American Community Survey publishes year-structure-built down to the census-tract level in table B25034 and poverty status in subject table S1701 11, 12, so the same housing-plus-poverty risk signal can be computed for every tract in the country, including the states where no measured blood-lead data are publicly accessible. A prediction-first map turns an indicator that depends on each state's reporting choices into a uniform, neighborhood-level warning that exists everywhere the Census reaches. The map does not diagnose any individual child and does not replace a blood test. It is a screening layer that shows where measured testing and on-the-ground hazard confirmation should go first.

To be explicit about what is and is not new here, the prediction method is not ours. Zartarian et al. established that housing age and poverty predict childhood lead exposure and validated it in Michigan and Ohio. This paper adds three contributions their study did not provide. First, a free, current, tract-level risk surface computed for the entire country, the usable public instrument their result implied but did not deliver. Second, an independent test of the method against measured childhood blood lead in six states their study never used, which is the central evidence here, because agreement on data the method was not derived from is not guaranteed. Third, a reproducible pipeline with the code and derived data released, so any reader can rebuild and re-check the result. We treat the prior finding as a method to be re-tested, not as settled ground. The Michigan and Ohio comparisons reuse the source study's own surveillance and are therefore a consistency check, while Wisconsin and the five county states are genuine out-of-sample tests.

Prior Work and Intellectual Lineage

The approach we describe here, predicting childhood lead-exposure risk from publicly available housing and socioeconomic data, did not originate with us, nor with any single recent effort. It rests on three decades of epidemiology, survey science, geographic analysis, and applied machine learning carried out by public health departments, federal agencies, university research groups, and investigative journalists. We summarize that lineage below.

Early geographic and area-based risk models

Sargent and colleagues first put this on a quantitative footing. In a logistic analysis of 238,275 Massachusetts children, Sargent et al. (1995) found that the percentage of housing built before 1950, per-capita income, the percentage of residents who were Black, and a poverty index were each independently associated with community lead-poisoning rates. This is the direct methodological ancestor of nearly every housing-age-plus-poverty index that followed.

Sargent JD, Brown MJ, Freeman JL, Bailey A, Goodman D, Freeman DH Jr. "Childhood lead poisoning in Massachusetts communities: its association with sociodemographic and housing characteristics." American Journal of Public Health. 1995;85(4):528–534. DOI: 10.2105/ajph.85.4.528. PMID: 7702117.

A closely related census-tract analysis of 17,956 children across Providence County, Rhode Island found that the share of houses built before 1950 carried the largest adjusted association with the proportion of children with elevated blood lead, with vacant housing also a strong predictor.

Sargent JD, et al. "Census tract analysis of lead exposure in Rhode Island children." Environmental Research. 1997;74(2):159–168. PMID: 9339229.

As geographic information systems matured, several groups translated these statistical associations into operational screening and targeting tools. Reissman et al. (2001) demonstrated the use of GIS to link blood-lead data and housing age in support of health-department decisions about prevention activities. Roberts et al. (2003), working in Charleston County, South Carolina, geocoded tax-assessor housing records and found that children in pre-1950 housing were roughly 3.9 times as likely to have an elevated blood lead level as children in post-1977 housing.

Reissman DB, Staley F, Curtis GB, Kaufmann RB. "Use of geographic information system technology to aid Health Department decision making about childhood lead poisoning prevention activities." Environmental Health Perspectives. 2001;109(1):89–94. DOI: 10.1289/ehp.0110989. PMID: 11171530.

Roberts JR, Hulsey TC, Curtis GB, Reigart JR. "Using geographic information systems to assess risk for elevated blood lead levels in children." Public Health Reports. 2003;118(3):221–229. DOI: 10.1093/phr/118.3.221. PMID: 12766217.

The most spatially ambitious strand of this work came from the Children's Environmental Health Initiative led by Marie Lynn Miranda. Miranda, Dolinoy, and Overstreet (2002) combined blood-lead screening, county tax-assessor housing-age data, and census data into GIS models intended to direct prevention programs. Kim, Galeano, Hull, and Miranda (2008) then resolved risk to the individual tax parcel across eighteen North Carolina counties and offered a framework for replicating such models elsewhere, the finest-grained housing-age risk modeling in the literature before Flint.

Miranda ML, Dolinoy DC, Overstreet MA. "Mapping for prevention: GIS models for directing childhood lead poisoning prevention programs." Environmental Health Perspectives. 2002;110(9):947–953. DOI: 10.1289/ehp.02110947. PMID: 12204831.

Kim D, Galeano MA, Hull A, Miranda ML. "A framework for widespread replication of a highly spatially resolved childhood lead exposure risk model." Environmental Health Perspectives. 2008;116(12):1735–1739. DOI: 10.1289/ehp.11540. PMID: 19079729.

Akkus and Ozdenerol (2014) later reviewed this body of GIS-based work and treated the area-based risk-index tradition as a coherent subfield.

Akkus C, Ozdenerol E. "Exploring Childhood Lead Exposure through GIS: A Review of the Recent Literature." International Journal of Environmental Research and Public Health. 2014;11(6):6314–6334. DOI: 10.3390/ijerph110606314. PMID: 24945189.

The Lead Exposure Risk Index developed by the Washington State Department of Health (first version, 2016) is where this research literature meets present-day public tools. It scores each census tract by combining the age of the housing stock with the share of households at or below 125 percent of the federal poverty level. This index has become the most widely replicated operational formulation of the housing-age-plus-poverty approach.

Washington State Department of Health, Childhood Lead Poisoning Prevention Program. Lead Exposure Risk Index. 2016. Available at: https://doh.wa.gov/data-and-statistical-reports/washington-tracking-network-wtn/lead-risk-and-exposure

National lead-paint surveys and the housing-age evidence base

These models weight housing age heavily because national surveys establish how the probability of lead-based paint varies by construction era. The original U.S. Environmental Protection Agency National Survey (1995) is the source of the canonical by-era probabilities, on the order of 87 percent of units built before 1940, roughly 69 percent for 1940–1959, and roughly 24 percent for 1960–1977.

U.S. Environmental Protection Agency. Report on the National Survey of Lead-Based Paint in Housing (Base Report). EPA 747-R-95-003. 1995. Available at: https://www.epa.gov/sites/default/files/documents/r95-003.pdf

Jacobs et al. (2002), reporting on the HUD National Survey of Lead and Allergens in Housing, estimated from a nationally representative sample that 38 million U.S. homes contained lead-based paint and 24 million had significant lead-based-paint hazards, with markedly higher prevalence in the Northeast and Midwest.

Jacobs DE, Clickner RP, Zhou JY, Viet SM, Marker DA, Rogers JW, Zeldin DC, Broene P, Friedman W. "The prevalence of lead-based paint hazards in U.S. housing." Environmental Health Perspectives. 2002;110(10):A599–A606. DOI: 10.1289/ehp.021100599. PMID: 12361941.

The Department of Housing and Urban Development's American Healthy Homes Surveys updated these national estimates. The second survey (AHHS II, 2021), based on fieldwork in 2018–2019, estimated that roughly 34.6 million homes (about 29.4 percent of housing units) contain lead-based paint and that about 21.9 million homes have dust-lead hazards under the 2019 standard.

U.S. Department of Housing and Urban Development, Office of Lead Hazard Control and Healthy Homes. American Healthy Homes Survey II: Lead Findings. 2021. Summary available at: https://www.huduser.gov/portal/pdredge/pdr-edge-trending-030822.html

Epidemiology of low-level lead exposure

The urgency behind this prediction work comes from a separate body of epidemiology showing that lead harms children at progressively lower exposures. Needleman et al. (1979) established subclinical harm by relating dentine lead levels in deciduous teeth to deficits in cognitive and classroom performance.

Needleman HL, Gunnoe C, Leviton A, Reed R, Peresie H, Maher C, Barrett P. "Deficits in psychologic and classroom performance of children with elevated dentine lead levels." New England Journal of Medicine. 1979;300(13):689–695. DOI: 10.1056/NEJM197903293001301. PMID: 763299.

Subsequent work pressed the harm below the long-standing 10 µg/dL action level. Lanphear et al. (2000), using NHANES III data on 4,853 children, observed cognitive deficits at blood lead concentrations under 10 µg/dL, and Canfield et al. (2003) found, in a cohort of 172 children, an even steeper inverse relationship between blood lead and IQ within that low range.

Lanphear BP, Dietrich K, Auinger P, Cox C. "Cognitive deficits associated with blood lead concentrations <10 µg/dL in US children and adolescents." Public Health Reports. 2000;115(6):521–529. DOI: 10.1093/phr/115.6.521. PMID: 11354334.

Canfield RL, Henderson CR Jr, Cory-Slechta DA, Cox C, Jusko TA, Lanphear BP. "Intellectual impairment in children with blood lead concentrations below 10 µg per deciliter." New England Journal of Medicine. 2003;348(16):1517–1526. DOI: 10.1056/NEJMoa022848. PMID: 12700371.

The international pooled analysis by Lanphear et al. (2005), combining seven prospective cohorts totaling 1,333 children, found measurable intellectual deficits even among children whose blood lead never exceeded 7.5 µg/dL and reported no evidence of a threshold below which lead is safe. If harm has no floor, the case for finding at-risk children before exposure rather than after is straightforward.

Lanphear BP, Hornung R, Khoury J, Yolton K, Baghurst P, Bellinger DC, Canfield RL, Dietrich KN, Bornschein R, Greene T, Rothenberg SJ, Needleman HL, Schnaas L, Wasserman G, Graziano J, Roberts R. "Low-level environmental lead exposure and children's intellectual function: an international pooled analysis." Environmental Health Perspectives. 2005;113(7):894–899. DOI: 10.1289/ehp.7688. PMID: 16002379.

Public mapping efforts

Journalists and civic-data teams brought area-based risk estimation to a broad public audience, often by directly adopting the Washington State methodology. In 2016, Frostenson and Kliff at Vox published a national neighborhood map assigning each census tract a risk decile from housing age and poverty, with open-source code, explicitly replicating the Washington State Department of Health approach.

Frostenson S, Kliff S. "The risk of lead poisoning isn't just in Flint. So we mapped the risk in every neighborhood in America." Vox. April 2016. Available at: https://www.vox.com/a/lead-exposure-risk-map. Code: https://github.com/voxmedia/data-projects/tree/master/vox-lead-exposure-risk

In the same year, Pell and Schneyer at Reuters compiled state blood-lead surveillance into an interactive map and identified thousands of census tracts and ZIP areas with lead-poisoning prevalence at least double that observed in Flint.

Pell MB, Schneyer J. "Off the Charts: The thousands of U.S. locales where lead poisoning is worse than in Flint." Reuters Investigates. December 2016. Available at: https://www.reuters.com/investigates/special-report/usa-lead-testing/

The same methodological lineage was subsequently institutionalized in two widely used data platforms. The City Health Dashboard, produced by NYU Langone Health, publishes a Lead Exposure Risk Index that bins housing units by construction era, weights them by lead likelihood, and combines them with the share of households at or below 125 percent of the federal poverty level, stating that its method was based on the Washington State Department of Health and Vox Media work. PolicyMap hosts a closely related "Risk of lead exposure" layer under an attribution to the same sources.

City Health Dashboard, NYU Langone Health, Department of Population Health. Lead Exposure Risk Index. Available at: https://www.cityhealthdashboard.com/metric/lead-exposure-risk-index

PolicyMap. Risk of Lead Exposure (attributed to Washington State Department of Health, Vox Media, and PolicyMap). Available at: https://www.policymap.com/data/sources/washington-state-department-of-health-vox-media-policymap

House-level and address-level machine learning

A final strand moved from area-based indices to predictive models resolved to the individual child, address, or parcel. Potash et al. (2015), a collaboration between the University of Chicago's Data Science for Social Good group and the Chicago Department of Public Health, built per-child and per-address risk models from historical blood-lead results and building characteristics, the address-level precedent later work built on.

Potash E, Brew J, Loewi A, Majumdar S, Reece A, Walsh J, Rozier E, Jorgenson E, Mansour R, Ghani R. "Predictive Modeling for Public Health: Preventing Childhood Lead Poisoning." In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '15). 2015:2039–2047. DOI: 10.1145/2783258.2788629.

Abernethy, Chojnacki, Farahi, Schwartz, and Webb (2018) extended parcel-level prediction to lead service lines in Flint, Michigan, using property age, value, location, and city records, work that became the basis of the BlueConduit effort.

Abernethy J, Chojnacki A, Farahi A, Schwartz EM, Webb J. "ActiveRemediation: The Search for Lead Pipes in Flint, Michigan." In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '18). 2018:5–14. DOI: 10.1145/3219819.3219896.

Relation to the present work and to recent federal synthesis

The federal work of Zartarian and colleagues draws on all of these traditions and is the immediate basis for our paper. Their 2022 analysis catalogs and compares the prior housing-age and sociodemographic indices, and their 2024 hotspots analysis applies a random-forest model validated against several such indices.

Zartarian V, Poulakos A, Garrison VH, Spalt N, Tornero-Velez R, Xue J, Egan K, Courtney J. "Lead Data Mapping to Prioritize US Locations for Whole-of-Government Exposure Prevention Efforts." American Journal of Public Health. 2022;112(S7):S658–S669. DOI: 10.2105/AJPH.2022.307051.

Zartarian V, et al. "A U.S. Lead Exposure Hotspots Analysis." Environmental Science & Technology. 2024;58(7):3311–3321. DOI: 10.1021/acs.est.3c07881.

This body of prior scholarship spans the founding area-based regressions, the national survey infrastructure, the low-level-harm epidemiology, the public maps, and the house-level machine-learning models. We claim no new paradigm; this work is one further incremental step along a path many others laid down.

Methods

Overview and design rationale

We constructed a national, tract-level index of predicted childhood lead-exposure risk from two publicly documented determinants: the age of the housing stock and the prevalence of poverty. The choice of inputs is not novel. It follows the housing-age-plus-poverty approach the Washington State Department of Health uses for its Lead Exposure Risk Index, which combines ACS 5-year housing-age and poverty measures into a single community-level score 13. The same two determinants anchor the indices the U.S. Environmental Protection Agency screened in its national hotspots analysis, where a reduced three-variable random-forest model (percent of homes built before 1940, percent built before 1950, and a family income-to-poverty measure) reproduced the hotspot pattern of the full five-variable model 14, 15. Lead-based residential paint was banned for consumer use in 1978, so housing age is a direct proxy for leaded paint and the dust it generates; poverty proxies both deferred paint maintenance and reduced remediation capacity 21. We chose these inputs because both are published for every census tract on the same recent vintage. That uniform national coverage lets a single method reach states that publish no measured blood-lead data.

Our contribution is not the index form but its national reconstruction on the most recent 5-year American Community Survey (ACS) vintage, and its tract-by-tract validation against measured childhood blood lead, reported separately. The EPA hotspots analysis was built on 2010 Census geography and ACS 2013-2017 5-year inputs 15. We rebuilt the index on ACS 2018-2022 5-year estimates 16, which moves the housing and poverty measures forward roughly a decade and re-bases the geography on 2020-vintage census tracts.

Data sources

All inputs are American Community Survey 2018-2022 5-year estimates, the vintage released December 7, 2023 and current at the time of analysis 16. The 5-year file is the only ACS product published down to the census-tract level for the full universe of tracts, which makes it the correct product for small-area estimates: the 1-year file does not tabulate most tracts 16. We pulled four tables.

| Purpose | ACS table | Type | Key variables used |
|---|---|---|---|
| Housing age | B25034, Year Structure Built | Detail | B25034_001E (total units); B25034_011E (pre-1940); B25034_010E, B25034_009E (1940 to 1959); B25034_008E, B25034_007E (1960 to 1979) 17 |
| Poverty | S1701, Poverty Status in the Past 12 Months | Subject | S1701_C01_001E (population for whom poverty status is determined), S1701_C03_001E (percent below poverty level) 18 |
| Total population | B01003, Total Population | Detail | B01003_001E (total population) 19 |
| Children under 18 | B09001, Population Under 18 Years by Age | Detail | B09001_001E (population under 18 years) 20 |

B25034 reports occupied and vacant housing units by the decade built. The method uses its pre-1940 category (B25034_011E) together with the 1940-to-1959 and 1960-to-1979 decade categories (B25034_007E through B25034_010E), which give the three construction-era shares the housing-age score requires 17. S1701 is the ACS subject table for poverty status. It publishes the percent-below-poverty estimate directly as S1701_C03_001E over the universe "population for whom poverty status is determined" (S1701_C01_001E), so no separate denominator is needed 18. B01003 supplies total population and B09001 supplies the count of residents under 18, used for population-weighting and for the child-burden overlay rather than for the risk score itself 19, 20.

Census API pipeline

Data were retrieved programmatically from the Census Data API. Detailed tables (B25034, B01003, B09001) were requested from the ACS 2022 5-year detailed-tables endpoint and the subject table (S1701) from the parallel subject endpoint 16:

https://api.census.gov/data/2022/acs/acs5?get=NAME,group(B25034)&for=tract:*&in=state:{FIPS}&key={KEY}
https://api.census.gov/data/2022/acs/acs5/subject?get=NAME,group(S1701)&for=tract:*&in=state:{FIPS}&key={KEY}

The API caps tract-level wildcard queries at one state per call. The pipeline therefore iterated in=state:{FIPS} across the 50 states and the District of Columbia for tracts (plus Puerto Rico for counties), then concatenated the responses. County-level inputs were retrieved with for=county:*. Estimates were joined to 11-digit tract GEOIDs (2-digit state + 3-digit county + 6-digit tract) and to 5-digit county GEOIDs. We dropped records before scoring if they had a missing or zero housing-unit denominator (B25034_001E) or carried the Census sentinel values for suppressed or unestimable cells. This is what reduces the raw tract universe to the scored set described below. Margins of error are published for every estimate (the _M-suffixed variables) and were retained but not propagated into the point score 17.
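The retrieval and filtering steps described above can be sketched roughly as follows. This is a minimal illustration, not the released pipeline: the function names (`tract_url`, `parse_rows`, `is_scorable`) are hypothetical, the sketch makes no network call, and it assumes that any null, non-numeric, or negative estimate should be treated as a Census sentinel for a suppressed or unestimable cell.

```python
from typing import Dict, List, Optional, Sequence

BASE = "https://api.census.gov/data/2022/acs/acs5"


def tract_url(group: str, state_fips: str, key: str, subject: bool = False) -> str:
    """Build the one-state tract query for a table group (URL pattern from the text)."""
    endpoint = BASE + "/subject" if subject else BASE
    return (f"{endpoint}?get=NAME,group({group})"
            f"&for=tract:*&in=state:{state_fips}&key={key}")


def parse_rows(payload: List[List[Optional[str]]]) -> List[Dict[str, Optional[str]]]:
    """Turn a decoded API response (header row, then data rows) into records.

    Adds an 11-digit tract GEOID: 2-digit state + 3-digit county + 6-digit tract.
    """
    header, *rows = payload
    records = []
    for row in rows:
        rec = dict(zip(header, row))
        rec["GEOID"] = rec["state"] + rec["county"] + rec["tract"]
        records.append(rec)
    return records


def is_scorable(rec: Dict[str, Optional[str]],
                denominator: str = "B25034_001E",
                others: Sequence[str] = ("B25034_011E",)) -> bool:
    """Return False for a missing or zero denominator or a sentinel-valued cell.

    Assumption: negative, null, or non-numeric values are treated as sentinels.
    """
    for field in (denominator, *others):
        raw = rec.get(field)
        if raw is None:
            return False
        try:
            value = float(raw)
        except ValueError:
            return False
        if value < 0:
            return False
    return float(rec[denominator]) > 0


# Usage with a small hand-made payload in the shape the API returns.
sample = [
    ["NAME", "B25034_001E", "B25034_011E", "state", "county", "tract"],
    ["Tract A", "100", "40", "26", "163", "500100"],
    ["Tract B", "0", "0", "26", "163", "500200"],
    ["Tract C", "-666666666", None, "26", "163", "500300"],
]
kept = [r["GEOID"] for r in parse_rows(sample) if is_scorable(r)]
print(kept)
```

In a real run the same loop would be repeated over each state FIPS code and the per-state records concatenated, as the text describes.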

Index construction

The score combines housing age and poverty, both from the American Community Survey 2018-2022 five-year estimates. Housing age enters as three construction-era shares, weighted so that older stock counts for more, because the probability that a home contains lead-based paint rises sharply with age. We define the inputs, give the exact Census fields, then state the model. Each quantity is computed once per census tract and once per county.

  • s40: the share of housing units built in 1939 or earlier, the oldest and highest-lead stock.
  • s4059: the share built between 1940 and 1959.
  • s6079: the share built between 1960 and 1979, the last era before the 1978 residential lead-paint ban.
  • poverty: the share of people living below the federal poverty line.

| Input | ACS source table | Computation from published Census fields |
|---|---|---|
| s40 | B25034 (Year Structure Built) | B25034_011E / B25034_001E |
| s4059 | B25034 (Year Structure Built) | (B25034_009E + B25034_010E) / B25034_001E |
| s6079 | B25034 (Year Structure Built) | (B25034_007E + B25034_008E) / B25034_001E |
| poverty | S1701 (Poverty Status) | S1701_C03_001E / 100 |

The three era shares are combined into a single housing-age score on a 0-to-100 scale that weights older construction more heavily:

\[ A = 100\,\bigl(0.619\,s_{40} + 0.309\,s_{4059} + 0.075\,s_{6079}\bigr) \]
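As an illustration, the era shares from the table and the housing-age score A can be computed for a single tract as below. This is a sketch under stated assumptions: the function names are hypothetical, the input is assumed to be a dictionary of already-numeric Census estimates, and the example tract is invented for the arithmetic only.

```python
from typing import Dict

# Band weights taken from the equation for A in the text.
ERA_WEIGHTS = {"s40": 0.619, "s4059": 0.309, "s6079": 0.075}


def era_shares(rec: Dict[str, float]) -> Dict[str, float]:
    """Compute the three construction-era shares from B25034 estimates."""
    total = rec["B25034_001E"]
    if total <= 0:
        raise ValueError("housing-unit denominator must be positive")
    return {
        "s40": rec["B25034_011E"] / total,
        "s4059": (rec["B25034_009E"] + rec["B25034_010E"]) / total,
        "s6079": (rec["B25034_007E"] + rec["B25034_008E"]) / total,
    }


def housing_age_score(shares: Dict[str, float]) -> float:
    """A = 100 * (0.619*s40 + 0.309*s4059 + 0.075*s6079)."""
    return 100 * sum(ERA_WEIGHTS[name] * shares[name] for name in ERA_WEIGHTS)


# Invented example tract: 1,000 units, 40% pre-1940, 20% 1940-1959, 20% 1960-1979.
example = {
    "B25034_001E": 1000,
    "B25034_011E": 400,
    "B25034_010E": 100, "B25034_009E": 100,
    "B25034_008E": 100, "B25034_007E": 100,
}
print(round(housing_age_score(era_shares(example)), 2))  # 32.44
```

The example works out to 100 x (0.2476 + 0.0618 + 0.0150) = 32.44, which shows how strongly the pre-1940 share dominates the score.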

The band weights (0.619, 0.309, 0.075) decline from the oldest construction era to the newest, a monotone weighting toward older stock set a priori from the established rise in lead-paint prevalence in older housing 21, not fitted to blood lead. The weight-sensitivity analysis below shows the validation is insensitive to the exact split as long as older stock is weighted more heavily. The housing-age score and poverty are each standardized to a z-score across the scored national universe, so the two share a common scale before they are weighted:

\[ z(x) = \dfrac{x - \bar{x}}{s_x} \]

The standardized housing-age score and standardized poverty are combined into a composite risk score, which is converted to a within-nation percentile rank:

\[ R = 0.58\,z(A) + 0.42\,z(\mathrm{poverty}) \]
\[ P = 100 \times \dfrac{\operatorname{rank}(R)}{N} \]

Here A is the housing-age score, z(x) is the standardization above, R is the composite risk score, rank(R) is the ascending rank of R across the N scored units, and P is the published 0-to-100 score. A unit at the 90th percentile carries higher predicted risk than 90 percent of scored units nationally.
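The standardization, weighting, and percentile-ranking steps can be sketched as follows. This is illustrative only. Two choices here are assumptions the text does not specify: the z-score uses the sample standard deviation (which does not affect the ranking, since both inputs are scaled by the same factor), and tied values of R are ranked in order of appearance. The function names are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def zscores(values: Sequence[float]) -> List[float]:
    """z(x) = (x - mean) / s_x, using the sample standard deviation (assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def percentile_scores(housing_age: Sequence[float],
                      poverty: Sequence[float],
                      w_age: float = 0.58,
                      w_pov: float = 0.42) -> List[float]:
    """R = w_age*z(A) + w_pov*z(poverty); P = 100 * rank(R) / N."""
    z_age, z_pov = zscores(housing_age), zscores(poverty)
    composite = [w_age * a + w_pov * p for a, p in zip(z_age, z_pov)]
    n = len(composite)
    # Ascending 1-based rank; ties keep their input order (assumption).
    order = sorted(range(n), key=composite.__getitem__)
    ranks = [0] * n
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return [100 * r / n for r in ranks]


# Four invented units, ordered from lowest to highest on both inputs.
scores = percentile_scores([10.0, 30.0, 50.0, 70.0], [0.05, 0.10, 0.20, 0.40])
print(scores)  # [25.0, 50.0, 75.0, 100.0]
```

With this rank definition the highest-scoring unit receives P = 100 and the lowest receives 100/N.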

The 0.58/0.42 composite split weights housing age above poverty, reflecting that housing age is the proximate source of the lead while poverty modifies exposure and remediation 21. Neither the band weights nor the composite split is fitted to the blood-lead data. The Washington State Department of Health index weights housing age and poverty and bins the result to deciles; we keep the full 0-to-100 resolution 13. To confirm the result does not depend on these choices, we re-derived the index across a grid of weightings and recomputed every validation correlation, reported under Sensitivity to the weighting below. Because the score is ordinal by construction, the validation uses rank correlation.

After dropping records with a missing or zero housing denominator and Census-suppressed cells (described above), the method scored 3,222 counties and 83,388 census tracts. This is the subset of the full 2018-2022 ACS tract universe for which both a valid housing-age distribution and a valid poverty estimate exist. For comparison, the EPA hotspots analysis screened 73,086 tracts on the older ACS 2013-2017 geography, restricted to tracts in the 50 states containing at least one child under six years old 15. Our larger count reflects the newer 2020-vintage tract geography and the inclusion of tracts regardless of child presence. The 83,388 scored tracts cover the 50 states and the District of Columbia. The 3,222 counties and county-equivalents also include the District of Columbia and Puerto Rico's municipios, which is why Puerto Rico appears in the county outputs but not the tract outputs. The scored national surface is shown in Figure 1.

National lead-risk map
Figure 1. The national lead-exposure risk map, shown at the county scale. Every U.S. county and census tract is scored from public Census housing-age and poverty data and percentile-ranked; counties are shaded from light (lower predicted risk) to dark green (higher predicted risk). Older-housing regions of the Northeast, Midwest, Great Plains, and Appalachia rank highest, and the newer-built West and Sun Belt rank lower. Projection is Albers USA, with Alaska and Hawaii repositioned. The full interactive version, including the census-tract layer, is freely available online.

We did not impose a child-presence filter on the score itself; instead, the B09001 under-18 count and B01003 total population are carried alongside each scored tract as an exposed-population overlay 19, 20.

Scope and interpretation

The output is a prediction of relative risk from housing age and poverty. It does not measure lead in any specific home, and it does not diagnose any child. It is a screening surface meant to direct confirmatory testing, consistent with EPA's own statement that "there is no known level of lead exposure to be without risk" and with the use of these indices as targeting tools rather than exposure measurements 14, 15, 21. The entire pipeline is reproducible from the four ACS tables and the public Census API with no restricted or licensed inputs.


Validation

A risk map is only worth deploying if it agrees with where children actually carry elevated blood lead. We tested ours in two stages. First we summarize the federal anchor: the EPA hotspots analysis, which validated housing-and-poverty indices against roughly 4.2 million measured childhood blood-lead tests in two states. Then we test our published map against measured childhood blood lead in eight states, in a design with three distinct tiers. Michigan and Ohio, at the census-tract scale, reuse the source study's own surveillance, so they are a consistency check on its ground truth, not an independent test. Wisconsin (metropolitan Milwaukee), also at the census-tract scale, is the one independent test at tract resolution, on surveillance the source study never used. Five more states (New Jersey, Illinois, Missouri, Iowa, and New York) are independent tests at the coarser county scale. We report the tract and county scales separately and never pool them into a single figure. The two stages also use different statistics: the federal work reports Cohen's kappa for binary hotspot agreement, and we report the Spearman rank correlation between continuous predicted risk and continuous measured exposure. We keep the two separate and do not convert one into the other.

The federal anchor: Zartarian et al. (2024)

The scientific foundation is Zartarian et al., "A U.S. Lead Exposure Hotspots Analysis," published in Environmental Science & Technology in 2024 22. EPA's Office of Research and Development screened 73,086 census tracts containing at least one child under six across all 50 states, scoring each tract on lead-exposure indices built from housing age and sociodemographic data drawn from the American Community Survey 23.

The validation against measured childhood blood lead, not the prediction by itself, is what gives that paper its weight. EPA held the predicted hotspots against measured childhood blood-lead surveillance in two states with unusually complete records:

  • Michigan: approximately 1.9 million blood-lead results from children under six, covering 2006 to 2016 23.
  • Ohio: approximately 2.3 million blood-lead results from children under six, covering 2005 to 2018 23.

Across those roughly 4.2 million measured tests, the predicted hotspots agreed moderately to substantially with where children actually carried elevated blood lead, at Cohen's kappa of 0.49 to 0.63 22. EPA read kappa on a fixed scale: below 0.4 low, 0.4 to 0.6 moderate, 0.6 to 0.8 substantial, above 0.8 near-perfect 23. A companion tract-scale study of Ohio by the same EPA group reports a comparable band, kappa 0.54 to 0.64 comparing observed blood-lead hotspots against the predictive indices across the 3.5, 5, and 10 µg/dL reference values 24.

Two further results matter for anyone building on that work. First, a reduced three-variable model, using only percent of homes built before 1940, percent built before 1950, and the percent of families with an income-to-poverty ratio above 2, performed comparably to the full five-variable model, holding kappa at 0.51 to 0.63 across the Michigan and Ohio datasets 23. Housing age plus poverty account for most of the predictive power, which is what makes a transparent, reproducible national map possible from public data alone. Second, the authors anchor the entire effort in the established toxicology, stating plainly that "there is no known level of lead exposure to be without risk" 23. A screening map does not need to find a safe threshold, because none exists; it only has to rank where exposure concentrates.

Our method, in brief

Our national map combines housing age and poverty, the two determinants the source study found carry most of the predictive signal. Housing age comes from ACS table B25034 (Year Structure Built) as three construction-era shares weighted toward older stock, because older homes carry the highest lead-paint burden 25. Poverty comes from ACS table S1701 (Poverty Status in the Past 12 Months) 18. We standardize the housing-age score and poverty, weight them 0.58 to 0.42, and sum. Tracts are then percentile-ranked within the national pool of 83,388 tracts; for the county-scale validation the same index is computed on county totals and ranked within the national pool of 3,222 counties. This follows the Washington State Department of Health housing-age-plus-poverty lineage rather than introducing a new index. The full equations are in Methods.

Validation at the census-tract scale

For each state we joined predicted tract-level risk to measured childhood blood lead at the tract level, then computed the Spearman rank correlation 29. Neighboring tracts are spatially autocorrelated, so an ordinary bootstrap that resamples tracts one at a time treats correlated observations as independent and returns an interval that is too narrow. We instead use a spatial block bootstrap: tracts are partitioned into compact geographic blocks by a quantile grid on their centroids, and whole blocks are resampled with replacement, 3,000 times. We also report Moran's I of the regression residual (observed on predicted) under queen contiguity, with a permutation p-value, as a direct measure of the leftover spatial structure the block bootstrap is correcting for.
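The resampling scheme described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the grid is built from independent quantile bins on centroid longitude and latitude, the grid size `k` is arbitrary, the interval is a plain percentile interval, and all function names are hypothetical.

```python
import math
import random
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            out[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Pearson correlation of the ranks; NaN if either side is constant."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def quantile_bin(coords, k):
    """Assign each coordinate to one of k roughly equal-count bins."""
    n = len(coords)
    return [min(int((r - 1) * k / n), k - 1) for r in ranks(coords)]


def block_bootstrap_ci(pred, obs, lon, lat, k=4, reps=3000, seed=0):
    """Return (rho, lower, upper) from resampling whole grid blocks."""
    blocks = {}
    for i, cell in enumerate(zip(quantile_bin(lon, k), quantile_bin(lat, k))):
        blocks.setdefault(cell, []).append(i)
    groups = list(blocks.values())
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        # Draw as many blocks as exist, with replacement, keeping each
        # block's tracts together so local dependence is preserved.
        idx = [i for g in rng.choices(groups, k=len(groups)) for i in g]
        rho = spearman([pred[i] for i in idx], [obs[i] for i in idx])
        if not math.isnan(rho):
            draws.append(rho)
    draws.sort()
    lower = draws[int(0.025 * len(draws))]
    upper = draws[min(len(draws) - 1, int(0.975 * len(draws)))]
    return spearman(pred, obs), lower, upper
```

Resampling blocks rather than single tracts means each draw carries fewer effectively independent observations, which is what widens the interval relative to an ordinary bootstrap.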

What counts as independent differs by state. For Michigan and Ohio the measured values come from the same surveillance the source study used, so these two are a reproduction of the source result on its own ground truth, a consistency check rather than an independent test. Wisconsin is an independent test: its measured tract data come from the state's open ArcGIS service, which we pulled live with no records request.

| State | Measured source | Tracts (n) | Spearman ρ (95% CI) |
| --- | --- | --- | --- |
| Michigan | source-study surveillance, 2006–2016 (consistency check) | 2,156 | 0.54 (0.42 to 0.64) |
| Ohio | source-study surveillance, 2005–2018 (consistency check) | 2,534 | 0.62 (0.54 to 0.68) |
| Wisconsin (metro Milwaukee) | WI DHS open ArcGIS, children under 6 | 208 | 0.70 (0.52 to 0.80) |

All three correlations are positive, with spatial block bootstrap intervals well clear of zero. The residual carries strong positive spatial autocorrelation in every case (Moran's I 0.49 in Michigan, 0.53 in Ohio, 0.57 in Wisconsin, all permutation p ≈ 0.001), which is exactly why the ordinary tract-level bootstrap intervals (0.51 to 0.57, 0.59 to 0.64, and 0.62 to 0.76) are too narrow and the wider block intervals above are the honest ones. A county-block bootstrap, resampling whole counties rather than grid cells, gives intervals consistent with these for Michigan and Ohio (0.39 to 0.64 and 0.51 to 0.70); metropolitan Milwaukee falls within a single county, so the grid blocks are used there. The Wisconsin data came from the state DHS open ArcGIS service, which publishes children under six tested, children testing positive, and percent poisoned by tract; a tract is suppressed only when fewer than five children are poisoned, and stays visible if 100 or more were tested 26.
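For readers unfamiliar with the residual diagnostic, the sketch below computes Moran's I and a one-sided permutation p-value. It assumes binary, symmetric contiguity weights supplied as a neighbor dictionary (building queen contiguity from tract polygons is outside the sketch); the function names and the unstandardized weights are illustrative assumptions, not a description of the authors' implementation.

```python
import random


def morans_i(values, neighbors):
    """Moran's I with binary weights.

    neighbors maps each unit index to a list of adjacent unit indices,
    listed in both directions (i in neighbors[j] and j in neighbors[i]).
    """
    n = len(values)
    m = sum(values) / n
    z = [v - m for v in values]
    s0 = sum(len(nbrs) for nbrs in neighbors.values())  # sum of all weights
    cross = sum(z[i] * z[j] for i, nbrs in neighbors.items() for j in nbrs)
    return (n / s0) * cross / sum(v * v for v in z)


def permutation_p(values, neighbors, perms=999, seed=0):
    """Share of random relabelings with I at least as large as observed."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    shuffled = list(values)
    hits = 0
    for _ in range(perms):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbors) >= observed:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return (hits + 1) / (perms + 1)
```

With 999 permutations the smallest attainable pseudo p-value under this formula is 0.001, reached when no shuffled arrangement matches the observed clustering.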

A rank correlation of 0.54 to 0.70 means the map's ordering and the measured ordering move together moderately to strongly, but much of the variation is unexplained: those correlations share only about a third to a half of the rank variance with measured exposure, leaving the rest to factors outside housing age and poverty. A gradient-boosted model fit to the same public inputs, cross-validated within Michigan and Ohio, reaches an R-squared of about 0.48 to 0.55 (see Residual analysis); that is an in-distribution upper bound for this feature set in those two states, not a national bound and not the fixed-weight index's own fit. This is a screening signal, not a diagnosis, and it says nothing about any individual child's blood lead.
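As a worked check of the "third to a half" reading, and treating the squared rank correlation as a rough gauge of shared rank variance (an interpretive convention, not an additional result), the three tract-scale values give:

\[ 0.54^2 \approx 0.29, \qquad 0.62^2 \approx 0.38, \qquad 0.70^2 = 0.49 \]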

Validation at the county scale: five more states

We also ran a coarser but broader test at the county level. Here the predictor is a county-resolution version of the same index, computed from county housing-age and poverty (the identical three-band housing-age score and 0.58/0.42 composite) and percentile-ranked within the national pool of 3,222 counties, not an average of tract scores. We joined it to measured childhood blood lead from the CDC Environmental Public Health Tracking Network 2022 series (children under six, at or above the 3.5 µg/dL reference value). All five use surveillance the source study did not, so they are independent tests, and all five share the same 3.5 µg/dL threshold and 2022 vintage.

| State | Counties (n) | Spearman ρ (95% CI) |
| --- | --- | --- |
| New Jersey | 21 | 0.77 (0.53 to 0.89) |
| Illinois | 56 | 0.68 (0.49 to 0.81) |
| Missouri | 40 | 0.56 (0.30 to 0.74) |
| Iowa | 56 | 0.55 (0.33 to 0.72) |
| New York | 61 | 0.48 (0.20 to 0.69) |

These intervals come from resampling counties, which are themselves the spatial unit, so unlike the tract intervals they are not separately corrected for spatial autocorrelation. The residual carries little county-to-county autocorrelation in Illinois, Iowa, and New Jersey (Moran's I near zero, not significant), but it is substantial in Missouri and New York (Moran's I 0.59 and 0.48, permutation p ≈ 0.001), so those two intervals should be read as optimistic. County correlations should also not be read on the same scale as the tract ones: a coarser areal unit averages over within-county variation and tends to raise the correlation, an instance of the modifiable areal unit problem. The small county counts give wide intervals; New Jersey rests on the fewest counties (n = 21), and New York's interval is the widest in the table (0.20 to 0.69). New York is also the weakest correlation, and its county series excludes New York City, omitting much of the state's oldest housing.

We apply one inclusion rule uniformly: a series enters the headline only if it uses the current 3.5 µg/dL reference value and its correlation is distinguishable from zero, meaning a 95 percent confidence interval that excludes zero. Every tract and county series above passes. Massachusetts fails on both counts. It uses the older 5 µg/dL threshold on the 2020 vintage, and on only 13 counties gave ρ = 0.42 with a 95 percent confidence interval from -0.22 to 0.84 that includes zero. We report it for completeness but exclude it from the headline.
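The inclusion rule can be written as a two-condition check. The function name and tuple layout below are hypothetical; the thresholds and intervals are the ones reported in this section.

```python
def enters_headline(threshold_ug_dl, ci_low, ci_high):
    """Apply the two-part rule: current reference value, and a CI excluding zero."""
    uses_current_reference = threshold_ug_dl == 3.5
    excludes_zero = ci_low > 0 or ci_high < 0
    return uses_current_reference and excludes_zero


# (reference value in µg/dL, lower 95% bound, upper 95% bound)
series = {
    "New Jersey": (3.5, 0.53, 0.89),
    "New York": (3.5, 0.20, 0.69),
    "Massachusetts": (5.0, -0.22, 0.84),
}
headline = [name for name, args in series.items() if enters_headline(*args)]
```

Here `headline` keeps New Jersey and New York and drops Massachusetts, which fails both conditions.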

Across the independent tests, Wisconsin at the tract scale and the five county states, predicted risk tracks measured childhood blood lead at a moderate level (Figure 2), on surveillance systems the source study never used. We do not compare these rank correlations against the source study's kappa, and we do not average the tract and county correlations into one figure, because in neither case do the two quantities measure the same thing.

Validation by state with confidence intervals
Figure 2. Spearman rank correlation between predicted neighborhood risk and measured childhood blood lead, by state. Green marks the census-tract scale, with spatial block bootstrap 95 percent intervals; Michigan and Ohio are a consistency check against the source study's own surveillance, and Wisconsin is the one independent tract-scale test. Amber marks the county scale, with intervals from resampling counties: five independent states, plus Massachusetts (open marker), which we report but exclude from the headline. A dagger (†) marks the two county states, Missouri and New York, whose residuals carry spatial autocorrelation, so their intervals are optimistic. The shaded band is the conventional moderate range of 0.4 to 0.6. The Massachusetts interval (n = 13) includes zero. County correlations are not directly comparable to tract correlations, because a coarser areal unit tends to raise them.

Sensitivity to the weighting

The composite split (0.58 housing, 0.42 poverty) and the housing-band weights (0.619, 0.309, 0.075) are set a priori, not fitted, so a fair question is whether the result depends on them. We re-derived the index from the raw national ACS distribution across nine weighting schemes and recomputed each tract-scale correlation. The published weights reproduce the canonical values exactly (Michigan 0.54, Ohio 0.62, Wisconsin 0.70). Moving the composite split anywhere from 0.42/0.58 to 0.70/0.30 changes each correlation by less than 0.03. The housing-band weighting matters more, and in the expected direction: weighting older stock more heavily, as the published weights and a pre-1940-only weighting both do, outperforms weighting the three construction eras equally, which drops the correlation to 0.45 in Michigan, 0.53 in Ohio, and 0.59 in Wisconsin. Dropping either housing age or poverty entirely is worse than keeping both. The ranking is driven by the prior decision to weight older housing and to include poverty, both fixed in advance from the lead-paint literature, not by the precise split.
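A minimal sketch of the composite-split part of this check follows. It assumes the housing-age score and poverty have already been standardized nationally and then subset to the validation tracts; the grid of housing weights shown is illustrative and is not the paper's nine schemes, and the function names are hypothetical.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            out[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Pearson correlation of the ranks (inputs must not be constant)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def split_sensitivity(z_age, z_pov, measured,
                      housing_weights=(0.42, 0.50, 0.58, 0.70)):
    """Recompute the validation correlation for each housing/poverty split."""
    results = {}
    for w in housing_weights:
        # The poverty weight is the complement, so each split sums to 1.
        composite = [w * a + (1 - w) * p for a, p in zip(z_age, z_pov)]
        results[w] = spearman(composite, measured)
    return results
```

Comparing the returned correlations across weights shows directly how far the validation moves when the split changes.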

Robustness to tract-boundary vintage

The index is built on 2020-vintage census tracts (ACS 2018-2022), while the Michigan and Ohio measured data predate that geography, collected from 2006 to 2016 and 2005 to 2018 on 2010 tracts. A direct 11-digit GEOID join can therefore mismatch tracts that split or merged between the two vintages. Using the Census 2010-to-2020 tract relationship file, 96.9 percent of the Michigan validation tracts and 97.5 percent of the Ohio tracts are unchanged one-to-one between vintages. Restricting each correlation to those unchanged tracts leaves it essentially identical (Michigan 0.54 on both the full and the unchanged set; Ohio 0.62 versus 0.61), so the cross-vintage join does not drive the result. The predictors are also more recent than these outcomes by roughly a decade, which attenuates a correlation rather than inflating it, so the tract-scale figures are if anything conservative.

Sensitivity to testing coverage

Measured childhood blood lead comes only from children who are tested, and testing is targeted rather than universal, so a fair concern is that these correlations track where testing happens rather than where lead is. Two things bound that concern. First, every measured value we use is a rate, the percent of tested children who are elevated, not a count of cases, so it is not mechanically inflated by testing more children. Second, where the data also publish the number of children tested, we condition on it.

At the census-tract scale, the Wisconsin service reports children tested per tract. Across metropolitan Milwaukee, predicted risk is only weakly related to testing volume (Spearman 0.17), and the partial correlation between predicted risk and the elevated rate, controlling for testing volume, is essentially unchanged from the raw value (0.69 versus 0.70). Restricting to well-tested tracts, where the rate is measured most precisely, the correlation is if anything stronger (0.75 at 100 or more children tested, 0.75 at 200 or more).

At the county scale, we reconstructed the number of children tested per county from the CDC series (the elevated count divided by the elevated rate). In four of the five states, predicted risk is negatively correlated with testing volume (Spearman -0.16 to -0.57; New Jersey is the exception at 0.34), which means the highest-risk counties test the fewest children. That is the surveillance gap the map is built for: measurement is thinnest exactly where predicted risk is highest. Controlling for testing volume, the predicted-risk-to-elevated-rate correlation stays positive in every state (partial Spearman 0.36 in Iowa to 0.79 in New Jersey). The attenuation in some states reflects that lower-testing counties are also higher-risk, a shared rural and disinvestment gradient, rather than an artifact in the rate itself.

A validation fully independent of who gets tested would require population-representative testing, such as NHANES, or a complete state registry rather than the public aggregate extracts used here. We invite that confirmation (see Discussion).

The data-access pipeline, and why prediction is necessary

Measured childhood blood-lead data are held by states, and access varies sharply from one to the next. That is the practical reason a prediction map is useful: it brings the same neighborhood-level warning to every state, including the many that publish no usable blood-lead data at all.

Where measured data exists in machine-readable form, we ingest it automatically:

  • New York (ZIP-level, Socrata). New York publishes childhood blood-lead testing and elevated-incidence counts by ZIP code (excluding New York City) on health.data.ny.gov, served through the Socrata Open Data API (SODA) 27. We query it programmatically.
  • County-level, roughly 45 states (CDC Tracking Network API). The CDC Environmental Public Health Tracking Network exposes childhood blood-lead surveillance through a machine-readable API, with the ≥3.5 µg/dL classification adopted for 2022-forward data after CDC lowered the blood-lead reference value from 5 to 3.5 µg/dL in October 2021 28. Children are counted once per year at their highest result 5. This is the broad county-level backbone.
  • Wisconsin (tract-level, ArcGIS). Pulled live, as described above 26.

Then there are the holdouts. States including New Hampshire, Colorado, and Connecticut lock their childhood blood-lead behind Tableau dashboards or PDF reports with no API, so obtaining tract- or ZIP-level measured values requires a public-records (FOIA) request and manual extraction. These are exactly the places where a family has no public way to learn that their neighborhood's housing stock and poverty profile put their child at elevated risk. The prediction map closes that gap. It runs without waiting for a state to publish blood tests, because it needs none: only the public Census housing-and-poverty data, which exists for every tract in the country. The validation above is what justifies trusting that prediction where no measured data is available to check it.

Residual analysis: how much further public data can go

A screening map should be honest about its own error. Two questions follow from the validation: why does predicted risk track measured blood lead at a rank correlation near 0.6 rather than 1.0, and would more public variables push it higher. We tested both in the two states behind the federal anchor.

What we did

We assembled every census tract in Michigan and Ohio carrying both a published measured childhood blood-lead value and complete Census inputs: 4,690 tracts, population-weighted. We compared two models that predict measured blood lead from public variables, each scored by five-fold cross-validation within this two-state set, so every reported figure is from held-out tracts the model did not train on. The figures below therefore describe held-out tracts within Michigan and Ohio, not generalization to new states 30. The published risk index itself has fixed weights and is never fit to blood lead; the cross-validated models here are auxiliary, used only to ask how much additional public data could improve the ranking. The baseline uses only the EPA three-variable set: the share of homes built before 1940, the share built before 1950, and poverty. The expanded model adds seven more public variables: median home value, median household income, renter share, vacancy rate, the 1950-to-1979 housing share, percent Black, and percent Hispanic.

Housing age is most of the signal, and more variables barely move the ranking

| Model | Inputs | Cross-validated Spearman ρ | Variance explained (R²) |
| --- | --- | --- | --- |
| Baseline | EPA 3 variables (housing age + poverty) | 0.609 | 0.482 |
| Expanded | 10 variables (adds race, value, income, tenure, vacancy) | 0.619 | 0.550 |

Adding seven variables raised the rank correlation by 0.01 and the variance explained by 0.07. The ranking is already close to the ceiling that ACS variables can reach in these two states. A single input, the share of homes built before 1940, carries roughly 68 percent of the model's predictive weight. This reproduces the EPA's own finding that a reduced three-variable model performed comparably to its full model 4: housing age and poverty carry most of the signal, and further socioeconomic variables are largely redundant with them.

Where the map misses, it misses by race and disinvestment

The structure of the error is more informative than its size. We took what the housing-and-poverty baseline leaves unexplained, its out-of-sample residual, and asked which variables predict it. Percent Black population dominates (Figure 3), well ahead of housing vacancy and of income. The map systematically under-predicts risk in tracts with larger Black populations and more vacant housing, relative to what housing age and poverty alone imply. This is consistent with the documented racial ecology of lead exposure, in which the legacy of redlining and disinvestment concentrated deteriorating lead hazards in Black neighborhoods beyond what income or housing vintage captures 31.

Residual analysis: feature importance and residual drivers
Figure 3. Residual analysis on 4,690 Michigan and Ohio tracts. (A) Relative importance of each public variable in a cross-validated model of measured blood lead: the share of homes built before 1940 carries most of the signal. (B) Standardized association of each added variable with the residual the housing-and-poverty baseline leaves behind: percent Black population is the strongest driver of what the baseline misses, consistent with the documented racial ecology of lead exposure.

We state the ethical constraint plainly. That race improves the statistical fit does not mean race should be an input to a deployed targeting tool. Allocating screening by race would risk encoding the disparity it measures. We report the result for the opposite reason: it is a diagnostic of where a housing-and-poverty map runs conservative, and a signal that the highest-risk Black neighborhoods warrant at least as much confirmatory testing as the map indicates, not less.

Why the ceiling exists, and what is actually beyond it

Three limits hold any ACS-based map near a rank correlation of 0.6. Only one is fixable with more data.

First, the model is ecological. Every value is a property of a neighborhood, not a home. A high-risk tract contains remediated, lead-free houses; a low-risk tract contains pre-1940 houses with failing paint. No neighborhood variable resolves a hazard that varies house to house. The constraint lies in the unit of analysis, not in the variable list.

Second, the measured truth is itself noisy. A tract's blood-lead rate rests on the children who happened to be tested, which in a small tract is few. Some of the unexplained variance is measurement error in the target, not error in the model.

Third, the remaining exposure pathways are absent from the census entirely. Lead service lines, soil lead, imported consumer goods such as glazed ceramics and certain spices, occupational take-home lead, and the actual condition of paint are real drivers that no ACS table records.

The third limit points to two ways forward. The first is to combine external public data the census does not carry: the lead service-line inventories that water systems were required to publish under the EPA Lead and Copper Rule Revisions in October 2024 32; parcel-level assessor and sale records that proxy for maintenance and renovation; proximity to airports still burning leaded aviation fuel, now the largest source of airborne lead, which the EPA has formally found endangers public health 33; and soil-lead surveys. Each is public, but fragmented and laborious to assemble, which is exactly why existing maps omit it. The second is physical confirmation in the home. A neighborhood model that has reached its data ceiling cannot tell a family whether their home holds a hazard; a direct test of the home, by a certified risk assessment, an XRF reading, dust-wipe or paint-chip lab analysis, or the child's blood-lead test, can. Confirmation in the home is the step a map cannot replace.

Discussion: potential applications

A validated neighborhood risk map has a clear public-health use. Childhood blood-lead testing in the United States is targeted rather than universal, and the targeting is uneven. A current map of where exposure is most likely helps health departments and clinicians decide where to direct testing, outreach, and home-hazard assessment first. Because the map covers every neighborhood from public data, it extends that guidance to the many areas that publish no measured surveillance of their own.

The map is most useful to the people and programs that already reach families with young children. Pediatric and prenatal care providers, Women, Infants, and Children (WIC) nutrition clinics, Medicaid managed-care plans and their Early and Periodic Screening, Diagnostic, and Treatment programs, home-visiting programs, Head Start, and childhood lead poisoning prevention programs all serve the at-risk population and all decide whom to test, counsel, and refer. A free, neighborhood-resolution risk layer lets any of them flag which of the families they already serve live in higher-risk areas, and prioritize blood-lead screening, anticipatory guidance, and home-hazard education accordingly. Because the score is a property of place rather than of person, a provider can apply it to its own patient roster locally, without transmitting any patient information, which keeps the use within existing privacy practice. The same property makes the map a direct input to the geographic targeting plans the CDC asks state and local programs to maintain.

The map ranks risk; it does not confirm a hazard in any specific home. It is the first pass. A physical test of paint, dust, or water in the flagged home is what establishes whether a hazard is actually present, and pairing the two directs limited inspection and abatement capacity to where a hazard is most likely. The cost-effectiveness of any particular screening program is beyond the scope of this paper. It depends on local testing costs, follow-through rates, and remediation practices we do not model here.

A more rigorous validation is available to the agencies that hold complete blood-lead records. Our test used only publicly released, aggregate surveillance, which bounds how finely an outside party can check the map. State childhood lead poisoning prevention programs, Medicaid blood-lead testing files, and the CDC's full surveillance hold individual, address-level results for far more children than any public extract. Any of these custodians, or a research partner under an appropriate data-use agreement, could validate and recalibrate the map against their complete registry at the address level, including in states we could not test here, while keeping the identifiable data inside their own systems. The map and its open pipeline are built to enable that validation, and we invite it.

Limitations and Ethics

This map is a screening tool. It predicts where childhood lead-exposure risk concentrates, using public housing and poverty data, and we validate that prediction against measured childhood blood lead in eight states: one independent test at the census-tract scale, five independent tests at the county scale, and two consistency checks against the source study's own surveillance. It does not measure lead in any home, and it does not diagnose any child. Each limitation below constrains a specific claim a reader might otherwise draw from a colored map.

The estimates are ecological, not individual

Every value on this map is a property of a census tract, not of a person or a house. The model is built from tract-level aggregates (three construction-era housing-age shares, pre-1940, 1940-1959, and 1960-1979, from Census table B25034, weighted toward older stock, and tract poverty from table S1701), so its output describes the average risk environment of a neighborhood. Reading a tract-level association as if it applied to an individual is the ecological fallacy 34. The EPA hotspots analysis we extend is explicit on this point. Its authors state that the analysis operates at the population level (census tract, county, and state) and "cannot identify sources at particular addresses or risk at an individual level" 4. A high-risk tract still contains remediated and lead-free homes. A low-risk tract still contains pre-1940 houses with deteriorating lead paint. The map narrows where to look, but it cannot tell any single family whether their home has a hazard. That is what a physical test is for.

It predicts risk, not poisoning

The indices are correlates of exposure risk (old paint, concentrated poverty), not a measurement of lead in a child's blood. Our validation shows that the predicted surface tracks measured childhood blood lead at a moderate level (Spearman rho 0.54 in Michigan, 0.62 in Ohio, and 0.70 in metropolitan Milwaukee, Wisconsin). A correlation of that order is meaningful for an ecological model and weak as a basis for any individual prediction. Much variance remains unexplained, because exposure also depends on factors absent from the model: actual paint condition, renovation and disturbance history, water service-line material, soil, imported consumer goods, and occupational take-home lead. The map should be read as a way to set priorities, never as a count of poisoned children.

The input data carry a vintage lag

The map is only as current as the American Community Survey behind it. The 2018-2022 ACS 5-year estimates pool responses collected across the full five-year window of January 1, 2018 through December 31, 2022, so the housing and poverty picture is a multi-year average, not a snapshot of today 35. Housing stock changes slowly, which makes the pre-1940 share relatively stable, but poverty, occupancy, and demolition or renovation can shift faster than the data refresh. Any tract that has seen significant teardown, rehabilitation, or demographic turnover since the survey window will be characterized with a lag of several years. The EPA analysis faced the same constraint. That work relied on 2010 census inputs. We improve currency by moving to the 2018-2022 vintage, but we do not eliminate the lag 4.

Small and rural tracts are measured least precisely

ACS reliability degrades as geography shrinks, and the 5-year tract estimates that this map depends on are the Census Bureau's least precise published level. The Bureau ships a margin of error with every estimate for exactly this reason and urges caution where it is large 36. The problem is built into the sampling. In the 2007-2011 ACS the average tract had only about 135 completed interviews over five years, against an average of about 280 housing units in the 2000 long form, and tract-level margins of error run on average about 75 percent larger than the corresponding 2000 long-form figures 37. The 5-year estimates that absorbed the COVID-disrupted 2020 collection year carry wider margins still 38. The consequence falls hardest on low-population and rural tracts, where small samples widen the uncertainty band and a tract's percentile rank can be noisy. Sparse rural areas can also be physically large, so a single risk value may average over heterogeneous housing across many miles. Read rural risk on this map with more caution than urban risk, not less.
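To make the precision problem concrete, here is a minimal sketch of how a published margin of error turns into uncertainty on a housing-age share. It assumes the ACS convention that margins of error are published at the 90 percent confidence level (z = 1.645) and uses the Census Bureau's approximation for the margin of error of a derived proportion. The tract counts are hypothetical, the function names are illustrative, and this is not the paper's pipeline.

```python
import math

Z_90 = 1.645  # ACS margins of error are published at the 90 percent confidence level


def coefficient_of_variation(estimate: float, moe: float) -> float:
    """Return the CV (standard error / estimate) implied by an estimate and its MOE."""
    if estimate <= 0:
        raise ValueError("estimate must be positive to compute a CV")
    return (moe / Z_90) / estimate


def share_with_moe(num: float, num_moe: float, den: float, den_moe: float):
    """Return (share, MOE) for num/den, where num counts a subset of den.

    Uses the Census Bureau's approximation for a derived proportion and falls
    back to the ratio form when the term under the square root is negative.
    """
    if den <= 0:
        raise ValueError("denominator must be positive")
    p = num / den
    radicand = num_moe ** 2 - (p ** 2) * den_moe ** 2
    if radicand < 0:
        radicand = num_moe ** 2 + (p ** 2) * den_moe ** 2
    return p, math.sqrt(radicand) / den


# Hypothetical small tract: 120 pre-1940 units (MOE 60) out of 800 units (MOE 90).
share, moe = share_with_moe(120, 60, 800, 90)
cv = coefficient_of_variation(share, moe)
print(f"pre-1940 share = {share:.3f} +/- {moe:.3f} (CV {cv:.0%})")
```

A share whose margin of error is about half its own size, as in this made-up tract, can move a tract's percentile rank noticeably on sampling noise alone, which is the reason for the extra caution in small and rural tracts.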

Validation correlations are spatially autocorrelated

The tracts and counties used to validate the map are not independent observations. Neighboring areas resemble one another in both predicted risk and measured blood lead, so the effective number of independent units is smaller than the raw count, and an ordinary bootstrap that ignores this reports intervals that are too narrow. We address it at the census-tract scale with a spatial block bootstrap that resamples whole geographic blocks, which widens the intervals to reflect the clustering, and we report Moran's I of the residual to show how much spatial structure remains (it is substantial, 0.49 to 0.57 at the tract scale, all permutation p near 0.001). At the county scale the resampling unit is already the county, but Missouri and New York still carry residual spatial autocorrelation, so their intervals are optimistic. None of this moves the point estimates; it widens the honest uncertainty around them, and it is why we do not lean on any single state.
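The block-resampling idea can be sketched in a few lines. The example below is a simplified illustration, not the archived analysis code: it assumes square grid blocks defined by projected coordinates, a percentile interval, and synthetic data with a shared within-block effect. The block size, replicate count, and all names are assumptions chosen for the sketch.

```python
import math
import random
from collections import defaultdict


def average_ranks(values):
    """Rank values 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def block_bootstrap_ci(points, block_size, n_boot=1000, alpha=0.05, seed=0):
    """Percentile interval for rho, resampling whole grid blocks with replacement.

    points: list of (x_coord, y_coord, predicted, measured) tuples.
    """
    blocks = defaultdict(list)
    for x, y, pred, meas in points:
        key = (math.floor(x / block_size), math.floor(y / block_size))
        blocks[key].append((pred, meas))
    block_list = list(blocks.values())
    rng = random.Random(seed)
    rhos = []
    for _ in range(n_boot):
        sample = []
        for _ in range(len(block_list)):
            sample.extend(rng.choice(block_list))  # draw one whole block
        rho = spearman_rho([p for p, _ in sample], [m for _, m in sample])
        if not math.isnan(rho):
            rhos.append(rho)
    rhos.sort()
    lower = rhos[int((alpha / 2) * (len(rhos) - 1))]
    upper = rhos[int((1 - alpha / 2) * (len(rhos) - 1))]
    return lower, upper


# Synthetic tracts: every tract in a block shares one block-level effect.
data_rng = random.Random(42)
points = []
for bx in range(6):
    for by in range(6):
        block_effect = data_rng.gauss(0, 1)
        for _ in range(10):
            x = bx * 10 + data_rng.uniform(0, 10)
            y = by * 10 + data_rng.uniform(0, 10)
            pred = block_effect + data_rng.gauss(0, 0.5)
            meas = 0.7 * pred + data_rng.gauss(0, 0.7)
            points.append((x, y, pred, meas))

rho = spearman_rho([p[2] for p in points], [p[3] for p in points])
lower, upper = block_bootstrap_ci(points, block_size=10, n_boot=500)
print(f"rho = {rho:.2f}, block-bootstrap 95% interval = ({lower:.2f}, {upper:.2f})")
```

Because whole blocks are drawn together, the replicates preserve within-block similarity, so the resulting interval reflects the smaller effective sample size rather than the raw tract count.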

Measured blood-lead data are scarce and uneven, which is the point

The reason a prediction map is needed is that ground-truth blood-lead data are incomplete and inconsistently available. Surveillance undercounts exposure because not all children are tested. CDC receives about 3 million blood-lead test results a year, and the agency states plainly that these data are "not a population-based estimate" and "are not representative of the United States or even of an entire state or county" 39. We test directly whether this non-representative testing drives the result and find it does not: the measured values are rates, not counts, and controlling for the number of children tested leaves the correlations positive (Sensitivity to testing coverage, above), though a validation fully independent of who gets tested would need population-representative testing or a complete registry. Coverage and reporting vary by state, by insurer, and by year. That scarcity has two consequences for this work. First, our validation is limited to the few states that publish tract-resolvable measured data (Michigan and Ohio via the EPA paper's Supplement B; Wisconsin via the state's open ArcGIS service), so external validity to states with different housing eras and testing regimes is assumed, not proven. Second, the same scarcity is the public-health case for the map. It extends a neighborhood-level warning to the many jurisdictions (for example New Hampshire, Colorado, and Connecticut) where measured blood-lead data are locked in PDFs or Tableau dashboards and reachable only by FOIA. Where there is no public blood data, a validated prediction is the only neighborhood-level signal available.

Screening, not diagnosis

The honest framing for both the map and the test it points to is screening. CDC is explicit that even its blood-lead reference value of 3.5 micrograms per deciliter is "a screening tool," is "not health-based," and is "not a regulatory standard," and that no safe level of lead in children has been identified 40. A neighborhood risk map sits one step further from diagnosis than a blood test does. It flags places to investigate, and nothing more. It does not establish that a hazard exists at any address or that any child has been exposed. The right response to a high-risk tract is confirmation, not alarm. That means a home risk assessment, a dust-wipe or paint-chip analysis, or an XRF reading, followed by a blood-lead test for the child if a hazard is found. Used that way, the map does what a screen should do. It points limited inspection and testing resources toward the places most likely to need them, and it makes no claim it cannot support about any individual home or child.

Declarations

Competing interests. The author is the founder of Fluoro-Spec Inc. (DetectLead.com), which manufactures and sells a consumer lead-screening product, and also owns Spirochaete Research Labs, LLC, a research company with interests in lead-detection technology. Both companies have a commercial interest in lead detection and screening. The national risk map and the method described in this paper are built entirely from public data and are provided free of charge, with no login and no paywall. These commercial interests did not influence the choice of data sources, the validation design, or the reported results, each of which is reproducible from public inputs by any independent party.

Funding. This work received no external or grant funding. It was conducted and supported internally by Fluoro-Spec Inc.

Data and code availability. All inputs are public. The risk model uses U.S. Census American Community Survey 2018-2022 five-year tables (B25034 and S1701, and for the residual analysis B25077, B19013, B25003, B25002, B02001, B03002, and B17001) retrieved through the public Census API. Validation uses publicly released state and federal childhood blood-lead surveillance: the CDC National Environmental Public Health Tracking Network county series, the Michigan and Ohio tract-level surveillance, the Wisconsin DHS open ArcGIS service, and New York State open health data. The analysis code, the derived tract- and county-level risk scores, and the state validation joins are archived at Zenodo under a CC-BY-4.0 license (DOI: 10.5281/zenodo.20531599). The interactive national map is at detectlead.com/lead-risk-map. No individual-level, identifiable, or access-restricted data were used.

Ethics and human subjects. This study used only aggregate, publicly available, de-identified data at the census-tract and county level. It involved no individual human subjects, no identifiable private information, and no intervention, and so did not require institutional review board approval. The analysis operates at the population (neighborhood) level and cannot determine exposure for any individual child or address.

Author contributions. E.C.R. conceived the study, assembled the public data, implemented the model and the validation, produced the figures and maps, and wrote the manuscript.

Use of artificial intelligence. An AI assistant (Anthropic's Claude) was used as a tool in this work: for retrieving and processing public data, for statistical and mapping code, for a first-pass literature search, and for drafting and copy-editing text. The author directed the work, wrote and audited the analysis pipeline, and re-ran it to confirm every reported figure. Each cited reference was independently verified against its PubMed or CrossRef record before inclusion. The author takes full responsibility for the entire content. No AI system is an author, in keeping with ICMJE and publisher policy, because an AI cannot be accountable for the work.

References


  1. CDC, "CDC Updates Blood Lead Reference Value." https://www.cdc.gov/lead-prevention/php/news-features/updates-blood-lead-reference-value.html ; "Update of the Blood Lead Reference Value, United States, 2021," MMWR 70(43). https://www.cdc.gov/mmwr/volumes/70/wr/mm7043a4.htm

  2. U.S. Census Bureau, American Community Survey 2018-2022 5-Year Estimates, Table B25034: Year Structure Built. https://data.census.gov/table/ACSDT5Y2022.B25034

  3. U.S. Census Bureau, American Community Survey 2018-2022 5-Year Estimates, Table S1701: Poverty Status in the Past 12 Months. https://data.census.gov/table/ACSST5Y2022.S1701

  4. Zartarian, V. G., Xue, J., Poulakos, A. G., et al. (2024). "A U.S. Lead Exposure Hotspots Analysis." Environmental Science & Technology 58(7), 3311-3321. DOI: 10.1021/acs.est.3c07881. The authors report agreement of Cohen's kappa 0.49-0.63 against children's blood-lead hotspots from approximately 1.9 million Michigan tests (2006-2016) and 2.3 million Ohio tests (2005-2018), screened 73,086 census tracts containing at least one child under six, state the analysis operates at the population level and "cannot identify sources at particular addresses or risk at an individual level," and rely on 2010 census inputs. https://pmc.ncbi.nlm.nih.gov/articles/PMC10882963/

  5. CDC, "About the Data: Blood Lead Surveillance" and "Childhood Blood Lead Surveillance: State Data" (machine-readable Tracking Network API; child counted once per year at highest result; ≥3.5 µg/dL classification for 2022-forward data). https://www.cdc.gov/lead-prevention/php/data/blood-lead-surveillance.html

  6. McFarland MJ, Hauer ME, Reuben A. "Half of US population exposed to adverse lead levels in early childhood." PNAS 2022;119(11):e2118631119. https://www.pnas.org/doi/10.1073/pnas.2118631119

  7. Brown EE, Lombard M, Chan A, Ayotte J, Rakowska S, Fuller-Thomson E. "Historical Atmospheric Lead Concentrations (1960-1974) and Memory Problems Half a Century Later." Alzheimer's and Dementia 2025. https://pmc.ncbi.nlm.nih.gov/articles/PMC12726548/

  8. CDC, "Data and Statistics, Childhood Lead Poisoning Prevention." https://www.cdc.gov/lead-prevention/php/data/index.html ; "About the Data: Blood Lead Surveillance." https://www.cdc.gov/lead-prevention/php/data/blood-lead-surveillance.html

  9. CDC Environmental Public Health Tracking, "Childhood Lead Poisoning." https://www.cdc.gov/environmental-health-tracking/php/data-research/childhood-lead-poisoning.html

  10. Zartarian V, Xue J, Poulakos A, Tornero-Velez R, Stanek L, Snyder E, Helms Garrison V, Egan K, Courtney J. "A U.S. Lead Exposure Hotspots Analysis." Environmental Science and Technology 2024;58(7):3311-3321. DOI 10.1021/acs.est.3c07881. https://pubs.acs.org/doi/10.1021/acs.est.3c07881

  11. U.S. Census Bureau, American Community Survey table B25034, Year Structure Built. https://data.census.gov/table?q=B25034

  12. U.S. Census Bureau, American Community Survey subject table S1701, Poverty Status in the Past 12 Months. https://data.census.gov/table/ACSST1Y2022.S1701

  13. Washington State Department of Health. Lead Exposure Risk Index, data notes (housing-age and poverty measures from the ACS 2018-2022 5-year file, weighted equally and ranked into population deciles on a 1-to-10 scale). https://doh.wa.gov/data-and-statistical-reports/washington-tracking-network-wtn/lead-risk-and-exposure/lead-exposure-risk-ibl-data-notes

  14. Zartarian VG, Xue J, Poulakos AG, Tornero-Velez R, Stanek LW, Snyder E, Helms Garrison V, Egan K, Courtney JG. A U.S. Lead Exposure Hotspots Analysis. Environmental Science & Technology. 2024;58(7):3311-3321. doi:10.1021/acs.est.3c07881. https://pubs.acs.org/doi/10.1021/acs.est.3c07881

  15. Zartarian VG, et al. A U.S. Lead Exposure Hotspots Analysis (full text). PMC10882963 (73,086 tracts in the 50 states with at least one child under six; RF v1 five-variable and RF v2 three-variable pre-1940 + pre-1950 + income-to-poverty models; 2010 Census geography and ACS 2013-2017 inputs; Michigan ~1.9M blood-lead points 2006-2016 and Ohio ~2.3M 2005-2018; Cohen's kappa 0.49-0.63; "no known safe level"). https://pmc.ncbi.nlm.nih.gov/articles/PMC10882963/

  16. U.S. Census Bureau. American Community Survey 5-Year Data (2009-2023); developer API documentation; 2018-2022 release dated December 7, 2023. https://www.census.gov/data/developers/data-sets/acs-5year.html

  17. U.S. Census Bureau. Census Data API, ACS 2022 5-year, table B25034 (Year Structure Built) variable group: B25034_001E total, B25034_010E "Built 1940 to 1949", B25034_011E "Built 1939 or earlier". https://api.census.gov/data/2022/acs/acs5/groups/B25034.html

  18. U.S. Census Bureau, American Community Survey 5-Year Estimates, Table S1701 "Poverty Status in the Past 12 Months." https://data.census.gov/table/ACSST5Y2022.S1701

  19. U.S. Census Bureau. Table B01003, Total Population, ACS 2022 5-year. https://data.census.gov/table/ACSDT5Y2022.B01003

  20. U.S. Census Bureau. Table B09001, Population Under 18 Years by Age (universe: population under 18 years), ACS 2022 5-year. https://censusreporter.org/tables/B09001/

  21. U.S. Centers for Disease Control and Prevention. Sources of lead exposure; lead in paint, dust, and soil; older housing (pre-1978) as the primary residential source. https://www.cdc.gov/lead-prevention/prevention/index.html

  22. Zartarian, V., Xue, J., Poulakos, A., Tornero-Velez, R., Stanek, L., Snyder, E., Helms Garrison, V., Egan, K., Courtney, J. (2024). "A U.S. Lead Exposure Hotspots Analysis." Environmental Science & Technology, 58(7), 3311–3321. DOI: 10.1021/acs.est.3c07881. https://pubs.acs.org/doi/10.1021/acs.est.3c07881

  23. Zartarian et al. (2024), full text, PubMed Central PMC10882963 (73,086 tracts; Michigan ~1.9M tests 2006–2016; Ohio ~2.3M tests 2005–2018; kappa interpretation scale; reduced three-variable model kappa 0.51–0.63; "no known level of lead exposure to be without risk"). https://pmc.ncbi.nlm.nih.gov/articles/PMC10882963/

  24. Stanek, L. W., Xue, J., Zartarian, V. G., et al. (2024). "Identification of high lead exposure locations in Ohio at the census tract scale using a generalizable geospatial hotspot approach." Journal of Exposure Science & Environmental Epidemiology, 34(4), 718–726 (Ohio tract-scale validation, 2005–2018 blood-lead, Cohen's kappa 0.54–0.64 observed hotspots vs predictive indices). PubMed Central PMC11303242. https://pmc.ncbi.nlm.nih.gov/articles/PMC11303242/

  25. U.S. Census Bureau, American Community Survey 5-Year Estimates, Table B25034 "Year Structure Built." https://data.census.gov/table/ACSDT5Y2022.B25034

  26. Wisconsin Department of Health Services, Environmental Public Health Tracking, childhood lead-poisoning data by census tract (children under 6 tested, positive, percent poisoned; suppressed when fewer than five children poisoned, unless 100 or more tested; published via ArcGIS). https://www.dhs.wisconsin.gov/epht/lead.htm

  27. New York State Department of Health, "Childhood Blood Lead Testing and Elevated Incidence by Zip Code: Beginning 2000," health.data.ny.gov, served via the Socrata Open Data API (SODA). https://health.data.ny.gov/Health/Childhood-Blood-Lead-Testing-and-Elevated-Incidenc/d54z-enu8

  28. CDC, "Update of the Blood Lead Reference Value, United States, 2021," MMWR 70(43):1509–1512 (BLRV lowered 5 → 3.5 µg/dL, October 2021). https://www.cdc.gov/mmwr/volumes/70/wr/mm7043a4.htm

  29. Each validation uses the within-state Spearman correlation between the published national percentile P and measured blood lead. Because P is a monotone, rank-preserving transform of the composite score R, and Spearman correlation depends only on within-state ranks, the within-state rank correlation of P equals that of R. Ranking tracts nationally to publish the map therefore introduces no circularity and does not affect any within-state validation result.

  30. Analysis of 4,690 Michigan and Ohio census tracts carrying both measured childhood blood lead (state surveillance aggregated to tract) and complete ACS 2018-2022 inputs. Models: gradient-boosted regression, five-fold cross-validation, population-weighted. Baseline inputs: pre-1940 share, pre-1950 share, poverty. Expanded adds median home value, median household income, renter share, vacancy rate, 1950-1979 housing share, percent Black, percent Hispanic. Residual = out-of-sample baseline prediction error; driver weights from standardized ridge regression on the residual. Reproducible from public Census API pulls and the published state blood-lead joins.

  31. Sampson, R. J., and Winter, A. S. (2016). "The Racial Ecology of Lead Poisoning: Toxic Inequality in Chicago Neighborhoods, 1995-2013." Du Bois Review: Social Science Research on Race, 13(2), 261-283. DOI 10.1017/S1742058X16000151.

  32. U.S. EPA, Lead and Copper Rule Revisions: community and non-transient non-community water systems were required to prepare and make publicly available an initial lead service line inventory by October 16, 2024. https://www.epa.gov/ground-water-and-drinking-water/lead-and-copper-rule-revisions

  33. U.S. EPA (2023). Final determination that lead emissions from certain aircraft engines that operate on leaded fuel cause or contribute to air pollution that may reasonably be anticipated to endanger public health and welfare; piston-engine aircraft are the largest remaining source of lead emissions to air in the United States. https://www.epa.gov/regulations-emissions-vehicles-and-engines/regulations-onboard-diagnostics-and-lead-emissions-aircraft

  34. Subramanian, S. V., Jones, K., Kaddour, A., and Krieger, N. (2009). "Revisiting Robinson: The perils of individualistic and ecologic fallacy." International Journal of Epidemiology 38(2), 342-360. DOI: 10.1093/ije/dyn359. On the hazard of reading area-level associations as individual-level associations. https://pmc.ncbi.nlm.nih.gov/articles/PMC2663721/

  35. U.S. Census Bureau. "2018-2022 ACS 5-Year Estimates" technical documentation and "Period Estimates in the American Community Survey." The 2018-2022 5-year estimates pool data collected from January 1, 2018 through December 31, 2022 and do not represent a single point in time. https://www.census.gov/programs-surveys/acs/technical-documentation/table-and-geography-changes/2022/5-year.html

  36. U.S. Census Bureau. "Using American Community Survey Estimates and Margins of Error." Margins of error are published with each estimate so users can judge reliability, and users are urged to use caution where margins of error are high. https://www.census.gov/content/dam/Census/programs-surveys/acs/guidance/training-presentations/20180418_MOE_Webinar_Transcript.pdf

  37. Spielman, S. E., Folch, D., and Nagle, N. (2014). "Patterns and causes of uncertainty in the American Community Survey." Applied Geography 46, 147-157. Tract-level ACS margins of error average about 75 percent larger than the corresponding 2000 long-form estimates; in the 2007-2011 ACS the average tract had about 135 completed surveys over five years, against an average of about 280 housing units in the 2000 long form. https://pmc.ncbi.nlm.nih.gov/articles/PMC4232960/

  38. U.S. Census Bureau (2022). "Increased Margins of Error in the 5-Year Estimates Containing Data Collected in 2020." The reduced 2020 response count raised relative margins of error and caused several key estimates to exceed the Bureau's quality threshold. https://www.census.gov/programs-surveys/acs/technical-documentation/user-notes/2022-04.html

  39. CDC, Childhood Lead Poisoning Prevention. "Childhood Blood Lead Surveillance: National Data." About 3 million blood-lead test results are received by CDC each year; the data are "not a population-based estimate" and "are not representative of the United States or even of an entire state or county." https://www.cdc.gov/lead-prevention/php/data/national-surveillance-data.html

  40. CDC, Childhood Lead Poisoning Prevention. "CDC Updates Blood Lead Reference Value." The 3.5 µg/dL reference value is a screening tool, is not health-based, and is not a regulatory standard; no safe level of lead in children has been identified. https://www.cdc.gov/lead-prevention/php/news-features/updates-blood-lead-reference-value.html

My Vision & Contribution

Speculative. This is my own direction and argument, not part of the submitted scientific paper above.

Prevention economics: what a first-pass screen is worth

A risk map answers where, but not whether spending money to look there pays. This section builds the cost-benefit case for using a cheap field screen as the first pass in the locations the map flags, states the model as explicit equations, works a 10,000-kit example, and grounds every dollar figure in the published lead-economics literature. The screen estimates risk and confirms a present hazard on the spot. It is not a blood test and does not diagnose a child.

The core fact that makes the math work is old and well established: lead damage is permanent and expensive, and avoided damage is worth far more than the cost of avoiding it. The U.S. Centers for Disease Control and Prevention sets a blood lead reference value of 3.5 micrograms per deciliter, drawn from the 97.5th percentile of blood lead in U.S. children aged 1 to 5, to flag the children in the top 2.5 percent of exposure, and is explicit that no safe blood lead level has been identified 1. Below we monetize what staying under that line is worth.

The unit value of a child: IQ to lifetime earnings

The economic value of preventing exposure rests on a dose-response chain that has been stable in the literature for two decades. Lanphear and colleagues, pooling seven cohort studies, found that an increase in concurrent blood lead from 2.4 to 10 micrograms per deciliter was associated with a decline of 3.9 IQ points (95 percent CI, 2.4 to 5.3), with the steepest loss per microgram at the lowest exposures 2. Grosse and colleagues then converted IQ loss to money, applying roughly a 2.0 percent decline in lifetime earnings per IQ point against a present value of lifetime earnings of about $723,300 in 2000 dollars for a two-year-old, which yields a base-case value near $14,500 per IQ point 3. Later work in the same lineage carried that figure to about $17,815 in present-value lifetime earnings lost per IQ point in 2006 dollars 4. Adjusted for inflation alone, $17,815 in 2006 dollars is roughly $28,000 today, before any allowance for real earnings growth.
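The chain from IQ points to dollars is simple enough to check by hand. The short sketch below uses only the figures quoted in the paragraph above. Multiplying the Lanphear IQ decline by the Grosse per-point value is an illustrative combination made here, not a figure reported by either study, and the inflation multiplier is back-calculated from the rounded $28,000 rather than taken from a price index.

```python
# Figures as quoted in the text (Grosse et al., 2000 dollars).
PV_LIFETIME_EARNINGS_2000 = 723_300   # present value of lifetime earnings, two-year-old
EARNINGS_LOSS_PER_IQ_POINT = 0.020    # roughly a 2.0 percent decline per IQ point

value_per_iq_point = PV_LIFETIME_EARNINGS_2000 * EARNINGS_LOSS_PER_IQ_POINT

# Lanphear et al.: 3.9 IQ points lost as concurrent blood lead rises from 2.4 to 10.
IQ_POINTS_LOST = 3.9
illustrative_loss = IQ_POINTS_LOST * value_per_iq_point  # illustrative combination only

# Multiplier implied by the text's rounding of $17,815 (2006 dollars) to about $28,000.
implied_inflation_multiplier = 28_000 / 17_815

print(f"value per IQ point (2000 dollars): ${value_per_iq_point:,.0f}")
print(f"illustrative loss for 3.9 points:  ${illustrative_loss:,.0f}")
print(f"implied 2006-to-today multiplier:  {implied_inflation_multiplier:.2f}")
```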

The model in this paper uses a deliberately rounded, conservative per-child value of $22,000, representing the present-discounted lifetime-earnings loss avoided when a single child is kept off a meaningful exposure path. At $22,000 the model sits below the inflation-adjusted earnings figure on purpose. It is earnings only. It excludes the costs of special education, medical management, lost parental productivity, and criminal-justice involvement that the same literature attributes to lead, so it understates true societal benefit.

The cost-benefit model

A deployment equips N kits at a per-kit cost. The variables:

Symbol | Meaning | Default
N | kits deployed | 10,000
cost_per_kit | manufactured plus distributed cost per kit | $50
pct_eligible | share of target homes that are pre-1978 and carry lead-paint risk | 0.36
hazard_rate | share of those homes with a detectable lead-paint hazard | 0.30
detection | probability the screen flags a present, accessible hazard (idealized ceiling) | 1.00
action_rate | share of found hazards that lead to remediation or avoidance | 0.55
kids_per_home | young children per affected home | 1.00
value_per_child | lifetime-earnings loss avoided per child | $22,000

The equations:

  • hazards_found = N × pct_eligible × hazard_rate × detection
  • kids_spared = hazards_found × action_rate × kids_per_home
  • benefit = kids_spared × value_per_child
  • program_cost = N × cost_per_kit
  • return_ratio = benefit ÷ program_cost
  • breakeven = program_cost ÷ kids_spared (benefit needed per child to break even)

Three of these inputs deserve a flag. The detection term is set to 1.00 as an idealized upper bound: the reagent has nanogram sensitivity and flags lead-paint dust below the HUD 10-micrograms-per-square-foot floor-dust standard, but no field screen catches every hazard, so a real deployment runs below 1.00 and the worked example below is therefore a best case, not a promise. The hazard_rate is held conservative against HUD's American Healthy Homes Survey, which finds lead-based paint in a far larger share of the oldest stock (about 87 percent of pre-1940 homes); pinning the rate at 0.30 understates hazards in exactly the old housing the map prioritizes. The pct_eligible default of 0.36 is held to roughly the HUD national estimate that on the order of a third of all U.S. homes contain some lead-based paint 5; note that the share of homes simply built before the 1978 residential lead-paint ban is higher still, near half, so the eligibility and hazard inputs are both conservative.

Worked example: 10,000 kits at $50

Running the defaults:

  • hazards_found = 10,000 × 0.36 × 0.30 × 1.00 = 1,080
  • kids_spared = 1,080 × 0.55 × 1.00 = 594
  • benefit = 594 × $22,000 = $13,068,000
  • program_cost = 10,000 × $50 = $500,000
  • return_ratio = $13,068,000 ÷ $500,000 = 26.1 : 1
  • breakeven = $500,000 ÷ 594 = $842 per child

A $500,000 program returns about $13.1 million in avoided lifetime-earnings loss, a 26-to-1 return, and breaks even if each child kept off the exposure path is worth at least $842, against the roughly $22,000 the literature supports. The program clears its break-even threshold by a factor of about 26. Because detection is set to its 1.00 ceiling, treat these as the upper edge of the range rather than the expected outcome.
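The model is small enough to express directly in code. The sketch below is a minimal implementation of the six equations with the defaults from the variable table. The class and method names are illustrative, and the lower detection values in the loop (0.75 and 0.50) are hypothetical inputs chosen only to show how the return scales below the ceiling, not measured performance.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScreenProgram:
    """Cost-benefit model for a first-pass screening deployment."""
    n_kits: int = 10_000
    cost_per_kit: float = 50.0
    pct_eligible: float = 0.36
    hazard_rate: float = 0.30
    detection: float = 1.00          # idealized ceiling, per the text
    action_rate: float = 0.55
    kids_per_home: float = 1.00
    value_per_child: float = 22_000.0

    def hazards_found(self) -> float:
        return self.n_kits * self.pct_eligible * self.hazard_rate * self.detection

    def kids_spared(self) -> float:
        return self.hazards_found() * self.action_rate * self.kids_per_home

    def benefit(self) -> float:
        return self.kids_spared() * self.value_per_child

    def program_cost(self) -> float:
        return self.n_kits * self.cost_per_kit

    def return_ratio(self) -> float:
        return self.benefit() / self.program_cost()

    def breakeven(self) -> float:
        """Benefit needed per child spared for the program to break even."""
        return self.program_cost() / self.kids_spared()


base = ScreenProgram()
print(f"hazards found: {base.hazards_found():,.0f}")
print(f"kids spared:   {base.kids_spared():,.0f}")
print(f"benefit:       ${base.benefit():,.0f}")
print(f"return ratio:  {base.return_ratio():.1f} : 1")
print(f"breakeven:     ${base.breakeven():,.0f} per child")

# Hypothetical detection rates below the 1.00 ceiling (illustrative only).
for detection in (1.00, 0.75, 0.50):
    scenario = replace(base, detection=detection)
    print(f"detection {detection:.2f} -> return {scenario.return_ratio():.1f} : 1")
```

Because detection enters the benefit multiplicatively, the return ratio falls in direct proportion to it, while the breakeven value per child rises by the same factor.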

Refill economics push the return higher

The $50 per-kit default is the full first-unit cost: reagent bottle, rechargeable 365-nanometer flashlight, fluorescent reference card, and printed bag. The flashlight and card are durable. On refills, only the consumable reagent recurs, dropping marginal cost toward roughly $5 per screen. Holding benefit constant and substituting cost_per_kit = $5:

  • program_cost = 10,000 × $5 = $50,000
  • return_ratio = $13,068,000 ÷ $50,000 = 261 : 1

Refilling rather than re-kitting raises the return into the low hundreds to one. At 261 to 1 this sits just above the top of the range in the lead-hazard-control literature, where Gould put the return at $17 to $221 per dollar spent 4, and like the full-kit figure it assumes the 1.00 detection ceiling. The first kit buys the hardware; every screen after that is nearly pure prevention.

Why screening beats waiting: the reactive cost

The alternative to screening is finding hazards after a child's blood lead is already elevated, through the case-management and environmental-investigation pathway that state and local health departments run. For a substantially elevated child, in the 20 to 45 micrograms-per-deciliter range, CDC's recommended response runs to eight visits for diagnostic testing, nurse follow-up, and a home environmental investigation, documented at about $1,027 per child for that visit sequence 6. Lower but still elevated levels trigger a shorter version of the same pathway. Once the loaded cost of case-management labor, repeat testing, and follow-up that continues until the child's level falls or the child ages out is counted, the realistic reactive cost lands on the order of $1,000 to $2,000 per confirmed case.

That figure is the point. The reactive system spends roughly $1,000 to $2,000 per child it has already failed to protect, after exposure has occurred and after the permanent IQ cost has been incurred. The proactive screen costs $5 to $50 and runs before exposure. By construction, the reactive pathway is both more expensive per child and too late.

The societal cost the map is screening against

The per-child value above is conservative, and the aggregate burden it is set against is enormous. McFarland, Hauer, and Reuben estimate that childhood lead exposure has cost the living U.S. population about 824 million cumulative IQ points as of 2015 (824,097,690 points), an average of 2.6 points per person, with more than 170 million Americans, roughly half the population, exposed to harmful levels in early childhood and the 1966 to 1970 birth cohort averaging a 5.9-point deficit 7. At the population level, the highest quintile of childhood blood lead carries a 4.1-fold increase in the odds of ADHD relative to the lowest (95 percent CI, 1.2 to 14.0) in NHANES 8. And the bill is not closed. Reuben and colleagues, following the Dunedin birth cohort, found that higher childhood blood lead tracked with lower cognitive function and an older estimated brain age in midlife 9. Separately, a 2026 NHANES-Medicare analysis found that higher cumulative lead burden, measured in bone, was associated with incident Alzheimer's disease and all-cause dementia, attributing roughly 18 percent of new dementia cases to lead 10. The leaded-gasoline cohorts, born roughly 1955 to 1975, are aging into the years of heightened dementia risk now.

Against a societal loss measured in hundreds of millions of IQ points and a cohort effect still unfolding, a screening tool that costs single-digit dollars per use and surfaces a hazard before a child is exposed is a large intervention for the money. The map identifies where risk concentrates; the economics show that screening there returns on the order of 26 to 1 at full kit cost, and higher still on refills, before counting a single dollar of avoided medical, educational, or criminal-justice cost. The map predicts where risk concentrates but cannot speak to any individual child's exposure, and the screen confirms a hazard without diagnosing a child. Together they narrow where the scarce dollars should go first.

Deployment and Public-Health Use: A Screen-Then-Confirm Workflow

The gap this workflow fills

Childhood lead-exposure surveillance in the United States is incomplete by design. The Centers for Disease Control and Prevention recommends targeted blood-lead testing focused on children in pre-1978 housing and with sociodemographic risk factors, and instructs state and local officials to build local screening plans that reflect local risk, rather than defaulting to universal testing 11. The Centers for Medicare and Medicaid Services require a blood-lead test for Medicaid-enrolled children at 12 and 24 months, and for any child aged 36 to 72 months not previously tested, but coverage in practice is uneven and most states have not reconciled their screening targets with local prevalence data 11. The result is a country where the location of the hazard is largely predictable from public data but the measured outcome, a child's blood-lead level, is observed only after exposure has already occurred, and only in the subset of children who are actually tested.

The risk map described in this paper closes the front half of that gap. Built from Census ACS 2018-2022 five-year housing-age and poverty data and validated against measured childhood blood lead in eight states (Spearman rho 0.48 to 0.77), it predicts, for all 3,222 counties and 83,388 tracts, where lead exposure concentrates, including in the many states that publish no neighborhood-level blood-lead data at all. What it cannot do is confirm a hazard inside a specific home. A high-risk tract describes housing stock and poverty. It does not diagnose any one address. About 29 percent of U.S. homes, an estimated 34.6 million units, still contain some lead-based paint, and prevalence rises sharply with age, from 48 to 76 percent of units built 1960 to 1977 up to 71 to 100 percent of units built before 1940 12, 13. Within any high-risk tract, some homes carry an active lead-dust hazard and some do not. Distinguishing them requires a measurement at the property.

Why a cheap confirmatory test belongs in the loop

The conventional confirmatory tools are blood-lead testing of the child and environmental investigation of the home. Both are essential, but each arrives late, costs too much, or both. A finger-stick (capillary) blood-lead screen confirms that exposure has already happened, and because residual lead on the skin produces frequent false positives, any capillary result at or above CDC's blood-lead reference value of 3.5 ug/dL should be confirmed with a venous draw 14. CDC set that reference value at the 97.5th percentile of the U.S. distribution for children aged 1 to 5 (NHANES 2015-2016 and 2017-2018); it is a screening threshold, not a safety threshold, and CDC and the National Toxicology Program hold that no blood-lead level is known to be without risk 1. Environmental investigation by a certified risk assessor, using laboratory-analyzed dust-wipe sampling against EPA's hazard standards, is the definitive home measurement, but it is a scheduled, paid inspection that does not scale to a screening pass over a high-risk neighborhood.

A low-cost field test occupies the missing middle. FluoroSpec uses a methylammonium bromide reagent in isopropanol that fluoresces bright green under 365 nm ultraviolet light when it contacts lead in surface paint and dust, giving an immediate visible read at the surface without a laboratory turnaround. Positioned correctly, it is the lowest-cost first-pass screen in the prevention sequence. It does not replace the venous blood draw that diagnoses a child, nor the laboratory dust-wipe clearance that a risk assessor signs. What it does is let a non-laboratory user decide, on the spot and for a few dollars, whether a given surface warrants the more expensive confirmation. The workflow therefore has three tiers, ordered cheapest first. The map predicts where to look, the field test flags which surfaces are likely positive, and laboratory blood and dust analysis confirms and acts. Each tier removes volume from the one above it.

Use cases

State and local childhood lead poisoning prevention (CLPP) programs and health-department inspectors. EPA's strengthened dust-lead rule, with full compliance required by January 12, 2026, lowered the dust-lead action levels to 5 ug/ft2 on floors and 40 ug/ft2 on interior windowsills, and redefined the hazard standard so that any laboratory-reportable level of dust-lead on a floor or windowsill, as analyzed by a lab in EPA's National Lead Laboratory Accreditation Program, is a hazard 15. Tighter standards mean more surfaces require formal evaluation, which increases the load on a finite pool of certified risk assessors and X-ray fluorescence (XRF) inspection time. A field test used as a pre-screen lets an inspector or a CLPP program triage: surfaces and homes that read clearly negative on the field test can be deprioritized, and laboratory dust-wipe sampling, which is what the standard is enforced against, is concentrated where a positive field read indicates it is likely to be warranted. The risk map directs that limited inspection capacity toward the tracts where measured blood-lead is highest, which is where the validation shows the map is most accurate.

HUD lead-hazard-control grantees. HUD's combined Lead-Based Paint Hazard Control (LBPHC) and Lead Hazard Reduction (LHRD) programs award up to $4 million to a jurisdiction to identify and control lead hazards in pre-1978 owner-occupied and rental housing, and applicants must operate or partner with an EPA-authorized lead abatement certification program 16. Grantees must find eligible high-risk units and document the work, and HUD deliverables routinely include outreach and education to high-risk families. A free field-test component supports both halves. It helps grantees prioritize unit intake within their target geography, and it is a tangible item to distribute to families in high-risk housing as part of an education deliverable. The risk map gives grantees a defensible, public-data basis for where to concentrate enrollment, consistent with the targeting logic HUD already uses to identify jurisdictions with deteriorated paint.

Renovation contractors under the RRP rule. EPA's Renovation, Repair, and Painting (RRP) rule requires certified firms disturbing paint in pre-1978 housing to follow lead-safe work practices and to perform post-work cleaning verification; HUD-funded jobs additionally require laboratory dust-wipe clearance below the action levels in 40 CFR 745.227 17. A field test is a fast in-process check a certified renovator can run during cleanup, before committing to formal clearance sampling, to catch a surface that is still releasing lead and re-clean it rather than fail a paid clearance test. It does not substitute for the required cleaning-verification step or the HUD-job clearance dust wipe. What it does is cut the number of clearance failures and re-mobilizations.

Families in high-risk housing. For a household in a high-risk tract, the same three-tier logic applies at the kitchen-table scale. The map tells a family their neighborhood's housing stock is in the high-risk band; a field test lets them check the specific surfaces a small child contacts (windowsills, painted trim, porch components) without waiting on an inspection; a positive read is the prompt to seek a child's blood-lead test and a professional risk assessment. Because the test is non-destructive and does not require laboratory turnaround, it can serve as the part of a home cleaning-and-prevention kit that tells a family where to clean before a child is exposed, not after a blood test confirms exposure.

Why first-pass cost is the right frame

The economic case for lead prevention is settled and large. A peer-reviewed cost-benefit analysis estimates that every dollar invested in controlling lead-paint hazards returns $17 to $221 in avoided health-care, special-education, crime, and lost-lifetime-earnings costs 4. The constraint on capturing that return is not the value of prevention. It is the per-unit cost and throughput of finding the hazard. Confirmatory measurement (venous blood, laboratory dust wipes, XRF inspection) is accurate and necessary, and it is the cost that does not scale to a national high-risk housing stock numbering in the tens of millions of pre-1978 units. A screen-then-confirm workflow drives down the cost of the first pass, so the expensive, definitive measurements are spent where they are most likely to matter. The map makes that first pass free at the neighborhood scale using only public Census data; a cheap field test extends the first pass to the individual surface. Neither tier diagnoses a child or certifies a home. Each one raises the yield, and lowers the unit cost, of the confirmatory tier that does.

This is the deployment claim, stated conservatively: the map predicts risk, not poisoning, and the field test flags likely hazards, not certified ones. Used together and in order, they let a fixed budget of blood draws and laboratory dust wipes cover more of the children and homes where the validated data say the risk actually concentrates.

Where this is going: house-level targeting

This is the speculative part, my own direction for the work, and it is not validated here. The map ranks neighborhoods. The next step is to rank individual homes.

The strongest single predictor of lead paint in a specific house is the year it was built. National surveys put the chance that a home contains lead-based paint near 87 percent for houses built before 1940, about 69 percent for 1940 to 1959, about 24 percent for 1960 to 1977, and near zero after the 1978 ban. That per-house number, multiplied by the validated neighborhood risk score from this paper, gives a simple triage score: screen this home first. Year built is public, held in county assessor records. The neighborhood score is already built.
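A minimal sketch of that triage arithmetic, assuming the era probabilities quoted above and a tract percentile expressed on a 0-to-1 scale. The function names, the 0-to-1 scaling, and the sample homes are illustrative assumptions; the product is a triage prior, not a validated prediction.

```python
def paint_prior(year_built: int) -> float:
    """Approximate chance a home contains lead-based paint, by construction era.

    Values are the national survey figures quoted in the text; 0.0 after 1977
    stands in for "near zero".
    """
    if year_built < 1940:
        return 0.87
    if year_built < 1960:
        return 0.69
    if year_built < 1978:
        return 0.24
    return 0.0


def triage_score(year_built: int, tract_percentile: float) -> float:
    """Hypothetical triage prior: era probability times tract percentile (0 to 1)."""
    if not 0.0 <= tract_percentile <= 1.0:
        raise ValueError("tract_percentile must be between 0 and 1")
    return paint_prior(year_built) * tract_percentile


# Made-up homes: (label, year built, tract percentile on a 0-to-1 scale).
homes = [("A", 1925, 0.95), ("B", 1965, 0.95), ("C", 1925, 0.20), ("D", 1990, 0.99)]
ranked = sorted(homes, key=lambda h: triage_score(h[1], h[2]), reverse=True)
print([label for label, _, _ in ranked])
```

The ordering, not the number itself, is what a door-knock list would use: an old home in a high-risk tract sorts ahead of an old home in a low-risk tract or a newer home anywhere.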

The honest boundary matters. The neighborhood map is validated science. A house-level number is a triage prior, not a validated prediction, and the language has to stay there: "this home is older and sits in a higher-risk area, so look here first," never "this home will poison a child." The first real test in any home overwrites the prior.

The deployment that fits a health department is a local join. The department runs its own enrollee list, its WIC or Medicaid or blood-lead registry addresses, against the parcel-risk layer inside its own building, and gets back a ranked door-knock list. No family's name ever leaves the department. That privacy architecture, not a promise but a design, is what lets a government partner say yes. House-level prediction has a real research lineage, from Miranda's parcel-level North Carolina models to the Flint lead-service-line work, and the contribution here is not the idea but the combination: a free validated national map, a cheap on-the-spot confirmatory test, and a join that never touches personal data.

How this was built

What it took, plainly

This map exists because the public record kept disappearing and someone decided to put it back.

The method behind it is not new. Housing age plus poverty, z-scored and percentile-ranked per tract, is the Washington State Department of Health approach, the one Vox published code for in 2016 and the one EPA leaned on in the Zartarian hotspots study. What was missing was a version that was national, current, free, covered every tract including rural ones, and stayed up. NYU's dashboard skipped rural America. Vox's build was frozen on 2014 data and Python 2.7. PolicyMap charged for it. The federal all-tract tool was designed and never shipped. The opening was specific and it had been sitting open for years.

A small organization filled it in a single concentrated session. Not a quarter, not a grant cycle. One night of focused work, run by a person directing layered teams of AI agents and sub-agents, each team handed a bounded problem and reporting back into a shared synthesis before the next layer started. That structure is the only reason the scope was reachable in the time. No single thread could have held the Census pulls, the geometry, three state validations, a fifty-state access survey, and a public map at once. Split across coordinated deployments, each narrow enough to finish and verify, it closed.

Recovering what had been pulled down

The first problem was that some of the source data had gone dark. EPA's EJSCREEN, HUD housing-condition layers, and the Supplement B file from the EPA hotspots paper had been pulled from their public homes. A research deployment was sent after each one. Where a primary host was gone, the agents went to mirrors, archive snapshots, and the dataset DOIs that outlive the web pages pointing at them. The EPA hotspots dataset still resolves through its DOI even when the landing page does not. The paper's Supplement B, the per-tract measured blood-lead that later became the spine of the validation, was recovered and parsed rather than re-collected, because EPA had already done the hard part of assembling roughly 4.2 million children's blood tests from Michigan and Ohio and the only task left was to not lose it.

A note from the field, because it is the kind of thing that makes the work real. While fetching one source page, a tool hit text claiming a "125-character quote limit," formatted to look like a constraint coming from the page itself. It was not a real limit. It read like injected instruction sitting in the fetched content. The agent ignored it and pulled the genuine source files directly with curl, so the formula and the data came from the actual primary documents and not from anything a page was trying to tell the tooling to do. The discipline there matters more than the incident. Treat fetched content as data, never as orders.

Pulling and scoring 83,388 tracts

With the method settled and the sources back in hand, a build deployment went to the U.S. Census API for the inputs. You cannot pull every tract in one request, so the work loops over roughly fifty-one state FIPS codes, two calls per state. Housing age comes from table B25034, the detailed-tables base path, total housing units down through the pre-1940 bucket. Poverty comes from table S1701, which lives on a separate subject-table base path and cannot be mixed into the same call, a small Census quirk that trips people who assume one endpoint. Each tract gets an eleven-digit GEOID built from state plus county plus tract, and everything is cached to disk so the pulls run once.

Then the scoring, faithful to the published method. Housing age is weighted toward the oldest stock the way Washington DOH weights it, the pre-1940 and pre-1950 homes carrying the most signal because that is where lead paint and lead service lines concentrate. Housing risk and poverty risk are each z-scored across all tracts, combined into a single standardized score, and percentile-ranked. Water-only tracts and zero-population tracts are dropped. The result is a score for all 3,222 counties and 83,388 census tracts, every one of them, including the rural places the city-only dashboards never see.
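The scoring steps can be sketched in a few lines. This is an illustration under stated assumptions, not the production code: the 0.58 and 0.42 weights are the ones given in the abstract, but the era weights inside the housing-age index, the field names, and the three sample tracts are hypothetical.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# Hypothetical era weights: older stock counts more, exact values are assumed.
ERA_WEIGHTS = {"pre1940": 1.0, "1940_1959": 0.6, "1960_1979": 0.3}
W_HOUSING, W_POVERTY = 0.58, 0.42  # weights stated in the abstract


def zscores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot z-score a constant column")
    return [(v - mu) / sigma for v in values]


def percentile_ranks(values):
    """Percent of tracts with a score at or below each value (0 to 100)."""
    ordered = sorted(values)
    return [100.0 * bisect_right(ordered, v) / len(values) for v in values]


def score_tracts(tracts):
    """tracts: dicts holding era shares (0 to 1) and a poverty rate."""
    housing = [sum(w * t[k] for k, w in ERA_WEIGHTS.items()) for t in tracts]
    poverty = [t["poverty"] for t in tracts]
    combined = [W_HOUSING * h + W_POVERTY * p
                for h, p in zip(zscores(housing), zscores(poverty))]
    return percentile_ranks(combined)


# Made-up tracts for illustration only.
tracts = [
    {"pre1940": 0.60, "1940_1959": 0.20, "1960_1979": 0.10, "poverty": 30.0},
    {"pre1940": 0.20, "1940_1959": 0.30, "1960_1979": 0.30, "poverty": 15.0},
    {"pre1940": 0.00, "1940_1959": 0.05, "1960_1979": 0.20, "poverty": 5.0},
]
pct = score_tracts(tracts)
print(pct)
```

Dropping water-only and zero-population tracts would happen before this step, so that they do not distort the mean and standard deviation used for the z-scores.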

The same Census plumbing already existed in DetectLead's zip-screener, which pulls Census risk at the zip level. This was the tract-level version of inputs the organization had handled before, which is part of why one session was enough.

Holding the prediction against real blood

A prediction is a claim until you check it against measurement. The point of this build was to make that check, openly, tract by tract.

A validation deployment took the predicted risk score and held it against real measured childhood blood-lead in three states. In Michigan, predicted risk against measured elevated blood-lead across about 2,156 tracts gave a Spearman rank correlation of 0.54. In Ohio, about 2,534 tracts gave 0.62. The Michigan and Ohio measured values came out of the recovered Supplement B, the same blood tests EPA had validated against, now used a second time to validate this independent build. Wisconsin was the one that ran end to end with no human in the loop on the data. A sub-agent hit Wisconsin's open ArcGIS REST endpoint for childhood lead surveillance by census tract, pulled 208 metro-Milwaukee tracts live, and scored the rank correlation at 0.70. No FOIA, no email, no waiting. An open API answered and the validation wrote itself.

Rank correlation was the right tool because the question is whether the map orders neighborhoods correctly, worst to best, not whether it nails an exact microgram value it was never built to predict. Spearman compares the two rankings directly. The three results, 0.54, 0.62, and 0.70, sit in the moderate range EPA reported for its own validated indices, where Cohen's kappa ran 0.49 to 0.63. The map tracks measured exposure at a moderate level, in the same broad range as the federal study's own checks, though kappa and rank correlation are different measures and are not directly comparable. Each result is published side by side, predicted next to measured, on its own page, so anyone can see the agreement and the scatter for themselves.

Mapping who has the data and who hides it

One more deployment was sent to answer a question that turned out to matter as much as the map itself. If measured blood-lead is the gold standard, why predict at all? Because measured blood-lead is state-held and wildly uneven, and a survey deployment went state by state across all fifty plus DC to prove exactly how uneven.

The finding is stark. A handful of states publish clean machine-readable blood-lead at neighborhood grain. New York serves it at zip level on Socrata. Connecticut serves it by town. Wisconsin serves it by tract on ArcGIS. Roughly forty-five states report county-level numbers through the CDC Tracking Network API on the 3.5 microgram measure. And then a long tail of states locks it inside Tableau dashboards and PDF reports, New Hampshire and Colorado and Connecticut's deeper tables among them, reachable only by FOIA if at all. Five states ship no public blood-lead product at all.

That is the whole argument for the prediction map in one sentence. The places with the least public data are not the places with the least lead. A predicted map built from Census data, which exists uniformly for every tract in the country, brings the same neighborhood-level warning to the states that publish nothing as to the states that publish everything. The survey did not just catalogue access. It justified the project.

TECHNICAL BREAKOUT

The programmatic strategies, concretely, for anyone who wants to reproduce or audit the build.

Census ACS API pulls. Source is the ACS 5-year, vintage 2022, with a free Census API key. Housing age is table B25034 on the /data/2022/acs/acs5 base path, variable B25034_001E for total housing units through B25034_011E for units built 1939 or earlier, with the oldest buckets weighted hardest. Poverty is table S1701 on the separate /data/2022/acs/acs5/subject base path, S1701_C03_001E for percent below the poverty line. The two tables live on different base paths and cannot be combined in one request, so the puller makes two calls per state and joins on GEOID. Tracts cannot be pulled nationally in a single call, so the loop iterates the state FIPS list, roughly fifty-one iterations, and caches each response to disk so the network work happens once. GEOID is assembled as state plus county plus tract, eleven digits, the key everything else joins on.
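A sketch of the two-call-per-state pattern, assuming the public api.census.gov host and the base paths and variable names given above. The helper names, the placeholder key, and the sample rows are hypothetical; the sketch builds the request URLs and parses a response-shaped list without touching the network.

```python
from urllib.parse import urlencode

BASE = "https://api.census.gov/data/2022/acs/acs5"
HOUSING_VARS = [f"B25034_{i:03d}E" for i in range(1, 12)]  # 001E through 011E
POVERTY_VARS = ["S1701_C03_001E"]


def tract_url(state_fips, variables, subject=False, key="YOUR_KEY"):
    """Build one tract-level request; subject tables use the /subject path."""
    base = f"{BASE}/subject" if subject else BASE
    params = [("get", ",".join(variables)), ("for", "tract:*"),
              ("in", f"state:{state_fips}"), ("in", "county:*"), ("key", key)]
    return f"{base}?{urlencode(params, safe=':*,')}"


def rows_to_records(rows):
    """The API returns a header row, then data rows; key each row by GEOID."""
    header, *data = rows
    records = {}
    for row in data:
        rec = dict(zip(header, row))
        # Eleven digits: 2 (state) + 3 (county) + 6 (tract).
        geoid = rec["state"].zfill(2) + rec["county"].zfill(3) + rec["tract"].zfill(6)
        records[geoid] = rec
    return records


def join_on_geoid(housing, poverty):
    """Keep only tracts present in both pulls."""
    return {g: {**housing[g], **poverty[g]} for g in housing.keys() & poverty.keys()}


# Made-up rows shaped like an API response, for illustration only.
housing_rows = [["B25034_001E", "B25034_011E", "state", "county", "tract"],
                ["1500", "600", "26", "163", "500100"]]
poverty_rows = [["S1701_C03_001E", "state", "county", "tract"],
                ["31.2", "26", "163", "500100"]]
joined = join_on_geoid(rows_to_records(housing_rows), rows_to_records(poverty_rows))
print(tract_url("26", POVERTY_VARS, subject=True))
print(sorted(joined))
```

A real puller would wrap each URL fetch in a disk cache keyed by state and table, so a re-run reads local files instead of calling the API again.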

TIGERweb GeoJSON geometry. Attribute data is meaningless without shapes to paint. The build joins scored tracts to Census tract polygons by GEOID, using the lightweight cartographic-boundary geometry rather than full-resolution TIGER/Line, because at roughly 85,000 polygons the full geometry is far heavier than a browser map needs and the simplified boundaries render cleanly at choropleth scale. Output is GeoJSON keyed by GEOID, one feature per tract, attributes carrying the score and percentile.

The geoIdentity planar-projection fix for d3 winding. This one cost real time and is worth flagging for anyone who maps Census GeoJSON in d3. Census polygons do not always follow the winding order d3's spherical geometry expects. When d3 treats the coordinates as points on a sphere and a polygon's ring winds the "wrong" way, d3 reads the inside as the outside and fills the entire rest of the globe instead of the small tract, so a single bad ring paints the whole map a solid block. The fix is to stop asking d3 to reason about the sphere at all for already-projected or planar coordinates. Render through d3.geoIdentity, which treats coordinates as flat plane values and skips the spherical winding rules entirely, optionally with reflectY to put the origin where the data expects it. Planar identity projection, no winding ambiguity, tracts fill as tracts.

Socrata and CDC Tracking Network APIs. For the measured-blood-lead side and the access survey, two API families did the work. Socrata serves New York's zip-level and Connecticut's town-level blood-lead as plain JSON resource endpoints, queryable directly. The CDC Tracking Network API exposes annual blood-lead at county level for roughly forty-five states on the 3.5 microgram per deciliter measure through its core-holder gateway. Wisconsin's tract-level surveillance came off an ArcGIS REST endpoint returning features by census tract, which is what made the Wisconsin validation fully automatic: query the endpoint, get tracts back, score the correlation, done.

Rank-correlation validation. Validation is a Spearman rank correlation between predicted risk score and measured elevated blood-lead, computed per tract within each state. Spearman ranks both variables and correlates the ranks, which is the honest test for a screening tool whose job is to order neighborhoods correctly rather than to predict an exact blood value. Michigan returned 0.54 over about 2,156 tracts, Ohio 0.62 over about 2,534, Wisconsin 0.70 over 208 metro-Milwaukee tracts, in the same broad moderate range as EPA's own kappa-based checks (0.49 to 0.63), though the two statistics are not directly comparable.
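For readers who want to recompute the statistic without a statistics package, here is a dependency-free sketch of Spearman's rho: rank both variables (ties get the average rank), then take the Pearson correlation of the ranks. The function names and the five sample values are illustrative, not the validation data.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up predicted percentiles and measured rates for five tracts.
predicted = [10, 20, 30, 40, 50]
measured = [1.0, 1.5, 1.2, 3.0, 2.5]
rho = spearman_rho(predicted, measured)
print(round(rho, 3))
```

Because only the ordering enters the calculation, rescaling either column (percent versus proportion, for example) leaves rho unchanged.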

The crash-resilient frame renderer. Rendering tens of thousands of tract features, plus any per-frame or per-state image output the map and its tooling needed, is long-running work that should never lose everything to one bad input. The renderer is built to survive its own failures. It checkpoints completed frames to disk as it goes, wraps each unit of rendering so a single malformed geometry or a one-off failure is caught, logged, and skipped rather than allowed to kill the run, and resumes from the last good checkpoint instead of restarting from zero. A crash costs one frame, not the night.
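The checkpoint-and-skip pattern described above can be sketched as follows. This is an assumed shape, not the renderer itself: the file layout, the done.json checkpoint name, and the fake render function are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def render_all(items, render_one, out_dir):
    """Render each (key, payload); skip finished keys, log and skip failures."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    done_file = out / "done.json"
    done = set(json.loads(done_file.read_text())) if done_file.exists() else set()
    failed = []
    for key, payload in items:
        if key in done:
            continue  # finished in an earlier run
        try:
            (out / f"{key}.txt").write_text(render_one(payload))
        except Exception as exc:  # one bad input must not kill the run
            failed.append((key, str(exc)))
            continue
        done.add(key)
        done_file.write_text(json.dumps(sorted(done)))  # checkpoint after each unit
    return done, failed


calls = []


def fake_render(payload):
    """Stand-in renderer that fails on a malformed (None) payload."""
    calls.append(payload)
    if payload is None:
        raise ValueError("malformed geometry")
    return f"frame {payload}"


with tempfile.TemporaryDirectory() as tmp:
    items = [("a", 1), ("b", None), ("c", 3)]
    done_first, failed_first = render_all(items, fake_render, tmp)
    done_second, failed_second = render_all(items, fake_render, tmp)

print(sorted(done_first), [k for k, _ in failed_first], calls)
```

The second call re-attempts only the unit that failed; the two completed units are read from the checkpoint and never rendered again.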

Why it reads as a small effort doing a large thing

Nothing here required inventing a new science. The risk index is settled and credited to the people who built it. What was new was the assembly: recovering public data that had been taken down, scoring every tract in the country from the primary Census source, validating that score against real children's blood in three states with one of them running with no human touching the data, mapping exactly which states hide their numbers, and shipping the whole thing free and public with the test that confirms a hazard on the spot sitting at the end of it.

It worked because the work was cut into pieces small enough to finish and checked at every seam. Agents recovered the sources. Sub-agents pulled and scored and validated and surveyed. Group syntheses pulled the pieces back together between layers so nothing drifted. A person held the direction and made the calls. That is the account. A small organization, one intense session, layered teams, and a public record put back where it belonged, with the receipts to show it is right.

No level of lead exposure is known to be without risk. The map predicts where that risk concentrates. It does not diagnose a child and it does not replace a blood test. It is the lowest-cost first-pass screen in lead prevention, built in the open, and now anyone can check our work.

Technical breakout: the programmatic strategy

This supplement documents the programmatic methods behind the map, the validation, and the publishing system. It is written for a reviewer who wants to re-run the work or audit it. Nothing here requires proprietary data. Every input is a public government API or a published file, and every output is a page you can open.

B.1 How the work was organized

The build ran as a small organization of cooperating agents rather than one long script. A coordinating layer held the plan and the source-of-truth facts. Below it, specialized teams ran in parallel: a data team that pulled and joined state blood-lead measurements, a cartography team that solved the map projection and rendering, a writing team that drafted and fact-checked the paper sections, an offer team that structured the government pricing, and a render team that produced and recolored the video. Teams that did not depend on each other ran at the same time, so the wall-clock time of the program was set by the slowest single chain, not by the sum of the work. Where a task could be checked, a separate reviewer agent checked it, so that drafting and verification were never done by the same pass.

B.2 Data acquisition: getting real blood-lead numbers without a FOIA

The prediction is built from the U.S. Census American Community Survey: housing age (table B25034, weighting the pre-1940 and pre-1950 shares) and poverty (table S1701). Those are clean, keyed, and national. The harder problem was the validation, because measured childhood blood-lead is held by state health departments and the federal tracking network, not handed out as a tidy file.

The CDC National Environmental Public Health Tracking network has the county-level numbers, but its public API gateway rejects automated server-side requests at the firewall. The path that works is the same one the agency's own Data Explorer uses in the browser. A request to the core data holder endpoint, issued from inside a real browser session so it carries the session context the firewall expects, returns the county table as JSON. The data team scripted a headless browser to make that call for each state, parsed the returned tableResult records (each carries a county FIPS, a county name, and a measured value), and wrote them to a single joined file. This pulled ten states cleanly and validated eight, with no records request and no waiting. For states that do not publish the metric at the county level on that network, the script records a clean "no data on this path" rather than guessing.

For New York the team used the state's own open-data portal, which exposes blood-lead by ZIP through a standard Socrata query interface, and joined on ZIP instead of county.

B.3 The cartography problem: why the first maps filled the globe

The first interactive maps rendered as solid colored squares that covered the whole frame. The cause is a quiet convention clash. The polygon rings that came back from the Census TIGERweb service were wound in the opposite direction from the one the mapping library expects. The standard web mapping library, d3-geo, treats geometry as spherical and uses ring direction to decide which side of a ring is "inside." A ring wound the other way tells it the inside is everything except the county, so it dutifully fills the rest of the planet. Rewinding the rings by hand did not hold up across mixed sources.
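For diagnosing which source is wound which way, a planar signed-area check is enough. The sketch below uses the shoelace formula on longitude and latitude treated as flat x and y with y pointing up. Which direction a given renderer treats as the exterior is that renderer's own convention, so the sketch only detects and normalizes direction; the function names are illustrative.

```python
def signed_area(ring):
    """Shoelace formula; positive means counterclockwise when y points up."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def is_clockwise(ring):
    """True when the ring's vertices run clockwise in a y-up plane."""
    return signed_area(ring) < 0


def rewind(ring, clockwise):
    """Return the ring in the requested direction, reversing only if needed."""
    return ring if is_clockwise(ring) == clockwise else ring[::-1]


# A closed unit square listed counterclockwise.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(signed_area(square), is_clockwise(square), is_clockwise(rewind(square, True)))
```

A check like this reports the direction of each ring, but it has to be applied consistently to every source and to holes as well as exteriors, which is where hand-rewinding tends to break down.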

The fix was to stop treating these planar, pre-projected shapes as spherical. The cartography team switched the projection to d3.geoIdentity().reflectY(true), which draws the coordinates as flat geometry and flips the vertical axis to match screen space, then fit it to the viewport. For sources whose rings already follow the winding order d3-geo expects, such as the published us-atlas topology, a normal Mercator projection fit to the same extent works directly. With that distinction settled, every map, national and per-state, renders correctly from the same template.

B.4 The validation pipeline: one repeatable path from API to published page

The eight state validations are not eight hand-built analyses. They are one pipeline run eight times. For a given state the pipeline hits the open data source, joins the measured values to our predicted risk on the shared geography (county FIPS, tract, or ZIP), computes a Spearman rank correlation between predicted and measured, and renders a side-by-side choropleth page where a reader can see the two maps next to each other and read the correlation. The rank correlation is the right test here because the claim is ordering, that the map puts the higher-risk places above the lower-risk ones, not that it reproduces an exact microgram value. Across the eight states the agreement ranges from 0.48 to 0.77, centered near 0.6, which sits in the same broad moderate range as the agreement the federal EPA study reported for its own indices (Cohen's kappa of 0.49 to 0.63), though kappa and rank correlation are different measures and are not directly comparable. Because the path is scripted, adding a ninth state is a data pull, not a rebuild.
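The join step is where most silent errors hide, usually because a FIPS code or ZIP lost its leading zero when a source stored it as a number. A sketch of a defensive join, assuming both sides arrive as simple key-to-value maps; the function names and sample values are hypothetical.

```python
def normalize_key(raw, width):
    """Zero-pad a FIPS, GEOID, or ZIP that may have lost leading zeros."""
    return str(raw).strip().zfill(width)


def join_measured(predicted, measured, width):
    """Inner-join two {key: value} maps on the shared geography key.

    Returns the matched (key, predicted, measured) rows plus the measured
    keys that found no prediction, so coverage gaps stay visible.
    """
    pred = {normalize_key(k, width): v for k, v in predicted.items()}
    meas = {normalize_key(k, width): v for k, v in measured.items()}
    pairs = [(k, pred[k], meas[k]) for k in sorted(pred.keys() & meas.keys())]
    unmatched = sorted(meas.keys() - pred.keys())
    return pairs, unmatched


# Made-up county rows: one measured key arrived as an integer, one as a short string.
predicted = {"01001": 80.0, "01003": 40.0}
measured = {1001: 2.1, "1003": 1.0, "99999": 5.0}
pairs, unmatched = join_measured(predicted, measured, width=5)
print(pairs, unmatched)
```

The width would be 5 for county FIPS or ZIP and 11 for tract GEOID. Reporting the unmatched keys alongside the correlation lets a reader see how much of a state's measured data actually entered the comparison.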

B.5 Publishing: idempotent builders that can be re-run safely

Every page on the site is produced by a builder script, not edited by hand. Each builder is idempotent: it looks for the page by handle, updates it in place if it exists, and creates it if it does not. This matters because the storefront API intermittently reports a creation as failed when it actually succeeded. The builders catch that specific case, re-fetch by handle, and update, so a re-run converges to the right state instead of producing duplicates or stopping. Each builder also ensures the clean root URL works by registering the redirect from the short path to the underlying page. The result is that the whole site can be regenerated from the source files and data at any time, which is the same property that makes the analysis auditable.
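The update-or-create logic, including the case where a create reports failure after actually succeeding, can be sketched against an in-memory stand-in. FakeStore and its methods are hypothetical and do not represent the real storefront API; only the control flow is the point.

```python
class FakeStore:
    """In-memory stand-in for a page API whose create call can save the
    page and still report failure (hypothetical, for illustration)."""

    def __init__(self, flaky=False):
        self.pages = {}
        self.flaky = flaky

    def get(self, handle):
        return self.pages.get(handle)

    def create(self, handle, body):
        self.pages[handle] = body
        if self.flaky:
            raise RuntimeError("create reported as failed")

    def update(self, handle, body):
        self.pages[handle] = body


def upsert_page(store, handle, body):
    """Update the page if it exists, create it if not, and recover from a
    create that failed on paper but succeeded in fact."""
    if store.get(handle) is not None:
        store.update(handle, body)
        return "updated"
    try:
        store.create(handle, body)
        return "created"
    except RuntimeError:
        if store.get(handle) is None:
            raise  # a genuine failure: nothing was created
        store.update(handle, body)
        return "recovered"


store = FakeStore(flaky=True)
first = upsert_page(store, "lead-map", "v1")
second = upsert_page(store, "lead-map", "v2")
print(first, second, len(store.pages))
```

Running the builder twice leaves exactly one page holding the latest body, which is the convergence property the text describes.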

B.6 The video: deterministic frames and a recolor that cannot miss a spot

The explainer video is rendered, not animated by hand. The scene is an HTML document that exposes a single function which, given a time in seconds, draws exactly the frame for that moment. A headless browser steps through the timeline calling that function and capturing each frame, so the render is fully deterministic: the same timeline always produces the same frames. The renderer is crash-resilient, relaunching the browser if it dies mid-run, and because a relaunch can leave a one-frame gap that would otherwise make the encoder stop early, a final check fills any missing frame before encoding so the full duration always survives to the finished file.
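The gap check before encoding can be sketched as a plan that maps every missing frame index to a frame that does exist. The text does not say how a gap is filled, so copying the nearest earlier frame (or the first available frame when the gap is at the start) is an assumption here, as are the function names.

```python
def missing_frames(present, total):
    """Frame indices in 0..total-1 that were never written."""
    have = set(present)
    return [i for i in range(total) if i not in have]


def fill_plan(present, total):
    """Map each missing index to the source frame that should stand in for it."""
    have = set(present)
    if not have:
        raise ValueError("no frames were rendered")
    first_available = min(have)
    plan, last = {}, None
    for i in range(total):
        if i in have:
            last = i
        else:
            # Assumed policy: duplicate the previous frame, or the first
            # available one if the gap sits at the very start.
            plan[i] = last if last is not None else first_available
    return plan


plan = fill_plan([0, 1, 3, 4], total=6)
print(missing_frames([0, 1, 3, 4], 6), plan)
```

Applying the plan is a file copy per entry; once it is done, the frame sequence is contiguous and the encoder sees the full duration.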

Recoloring the whole video to the jitter palette was done as a deterministic transform, not a re-creation. A single map of source colors to jitter colors is applied across the scene document, including the warm choropleth ramp remapped to the mint ramp and the alert red preserved, with the glow effects and the logo swapped to their dark-background versions. Because the transform is a fixed table applied programmatically, there are no missed elements and the recolor is reproducible from the original.
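A fixed-table recolor of this kind can be sketched as a single regular-expression pass over the scene text. The hex values below are placeholders, not the actual palette, and the function name is illustrative. Returning the set of colors that had no mapping is one way to make a missed spot visible rather than silent.

```python
import re

# Hypothetical source-to-target table; the real palette values are not shown here.
PALETTE = {"#d73027": "#2ec4a6", "#fc8d59": "#7fe0c8"}
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}\b")


def recolor(text, palette):
    """Replace every mapped six-digit hex color in one pass.

    Returns the new text and the set of colors left untouched, so any
    color missing from the table can be reviewed explicitly.
    """
    table = {k.lower(): v for k, v in palette.items()}
    unmapped = set()

    def swap(match):
        color = match.group(0).lower()
        if color in table:
            return table[color]
        unmapped.add(color)
        return match.group(0)

    return HEX_COLOR.sub(swap, text), unmapped


scene = "fill:#D73027; stroke:#fc8d59; alert:#e63946"
recolored, untouched = recolor(scene, PALETTE)
print(recolored, untouched)
```

Doing the substitution in one pass matters: a color produced by the mapping is never fed back through the table, so the transform gives the same result however the table is ordered.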

B.7 Reproducibility and provenance

The standard a skeptic should hold this to is simple: can an independent party rebuild it? The prediction comes from two named Census tables. The validation comes from named state and federal data sources reached through documented public endpoints. The correlations are a standard rank statistic anyone can recompute from the joined files. The maps, the paper, the pricing pages, and the video are each generated by a script from those inputs. There is no private dataset in the chain that a reviewer would have to take on faith. That is the point of publishing the method alongside the map: the map is only worth deploying behind a test if the map itself can be checked, and it can.


  1. Centers for Disease Control and Prevention, "CDC Updates Blood Lead Reference Value" (3.5 ug/dL, 97.5th percentile, NHANES 2015-2016 and 2017-2018). https://www.cdc.gov/lead-prevention/php/news-features/updates-blood-lead-reference-value.html ^^

  2. Lanphear BP, Hornung R, Khoury J, et al. "Low-Level Environmental Lead Exposure and Children's Intellectual Function: An International Pooled Analysis." Environmental Health Perspectives 113(7):894-899, 2005. A blood lead rise from 2.4 to 10 micrograms per deciliter is associated with a 3.9-point IQ decline (95 percent CI, 2.4 to 5.3). https://pmc.ncbi.nlm.nih.gov/articles/PMC1257652/ ^

  3. Grosse SD, Matte TD, Schwartz J, Jackson RJ. "Economic Gains Resulting from the Reduction in Children's Exposure to Lead in the United States." Environmental Health Perspectives 110(6):563-569, 2002. About a 2.0 percent earnings change per IQ point on $723,300 present-value lifetime earnings for a two-year-old (2000 USD, 3 percent discount); base-case value near $14,500 per IQ point. https://pmc.ncbi.nlm.nih.gov/articles/PMC1240871/ ^

  4. Gould E., "Childhood Lead Poisoning: Conservative Estimates of the Social and Economic Benefits of Lead Hazard Control," Environmental Health Perspectives 117(7), 2009 (return of $17 to $221 per dollar invested). https://pmc.ncbi.nlm.nih.gov/articles/PMC2717145/ ^^^

  5. U.S. Department of Housing and Urban Development, American Healthy Homes Survey II, and EPA Renovation, Repair and Painting and Lead Disclosure rules implementing the 1978 ban on residential lead-based paint. HUD finds lead-based paint in roughly a third of U.S. homes overall and about 87 percent of pre-1940 homes. https://www.epa.gov/lead ^

  6. Cost of the recommended case-management and environmental-investigation visit sequence for a child with blood lead 20 to 45 micrograms per deciliter (eight visits, about $1,027 per child), citing CDC 2004 protocols. Reported in Gould 2009 (see above). https://pmc.ncbi.nlm.nih.gov/articles/PMC2717145/ ^

  7. McFarland MJ, Hauer ME, Reuben A. "Half of US Population Exposed to Adverse Lead Levels in Early Childhood." PNAS 119(11):e2118631119, 2022. 824,097,690 cumulative IQ points lost; 2.6 points per person on average; more than 170 million exposed; 1966 to 1970 cohort averaging a 5.9-point deficit. https://www.pnas.org/doi/10.1073/pnas.2118631119 ^

  8. Braun JM, Kahn RS, Froehlich T, Auinger P, Lanphear BP. "Exposures to Environmental Toxicants and Attention Deficit Hyperactivity Disorder in U.S. Children." Environmental Health Perspectives 114(12):1904-1909, 2006. Highest versus lowest blood-lead quintile OR = 4.1 (95 percent CI, 1.2 to 14.0) for ADHD, NHANES 1999-2002. https://pmc.ncbi.nlm.nih.gov/articles/PMC1764142/

  9. Reuben A, Elliott ML, Abraham WC, et al. "Association of Childhood Lead Exposure With MRI Measurements of Structural Brain Integrity in Midlife" (JAMA 2020) and "Childhood lead exposure is associated with lower cognitive functioning at older ages" (Science Advances 8(45):eabn5164, 2022). Higher childhood blood lead tracks with lower midlife cognition and older estimated brain age in the Dunedin cohort. https://www.science.org/doi/10.1126/sciadv.abn5164

  10. Wang X, Bakulski KM, Walker E, et al. "Exposure to lead and incidence of Alzheimer's disease and all-cause dementia in the United States." Alzheimer's & Dementia, 2026. Higher bone-lead burden associated with incident Alzheimer's disease and all-cause dementia in NHANES linked to Medicare; about 18 percent of new dementia cases attributable to lead. https://pmc.ncbi.nlm.nih.gov/articles/PMC12895363/

  11. Centers for Disease Control and Prevention, "Recommendations for Blood Lead Screening of Medicaid-Eligible Children Aged 1-5 Years: an Updated Approach to Targeting a Group at High Risk," MMWR Recommendations and Reports 58(RR-9), 2009. https://www.cdc.gov/mmwr/preview/mmwrhtml/rr5809a1.htm

  12. U.S. Department of Housing and Urban Development, American Healthy Homes Survey II (2018-2019); EPA summary, "How many homes still contain lead-based paint?" (34.6 million homes, 29.4 percent of all housing units). https://www.epa.gov/lead/i-thought-lead-based-paint-had-been-phased-out-how-many-homes-still-contain-lead-based-paint

  13. U.S. EPA / HUD, "Report on the National Survey of Lead-Based Paint in Housing," EPA 747-R-95-003 (lead-based-paint prevalence ranges by housing age: 71 to 100 percent pre-1940, 48 to 76 percent 1960 to 1977). https://www.epa.gov/sites/default/files/documents/r95-003.pdf

  14. Centers for Disease Control and Prevention, "Testing for Lead Poisoning in Children" (capillary vs. venous confirmation, false positives from skin contamination). https://www.cdc.gov/lead-prevention/testing/index.html

  15. U.S. EPA, "Hazard Standards and Clearance Levels for Lead in Paint, Dust and Soil (TSCA Sections 402 and 403)," and "Reconsideration of the Dust-Lead Hazard Standards and Dust-Lead Post-Abatement Clearance Levels," final rule published November 12, 2024, full compliance required January 12, 2026. https://www.epa.gov/lead/hazard-standards-and-clearance-levels-lead-paint-dust-and-soil-tsca-sections-402-and-403

  16. U.S. Department of Housing and Urban Development, Lead-Based Paint Hazard Control (LBPHC) and Lead Hazard Reduction (LHRD) Grant Programs, FY2024 funding terms (up to $4 million per eligible jurisdiction). https://www.hud.gov/program_offices/cfo/gmomgmt/grantsinfo/fundingopps/LHR

  17. U.S. EPA, "Renovation, Repair and Painting Program: Work Practices," and "Clearance and Clearance Testing Requirements for the RRP Program" (action levels in 40 CFR 745.227(e)(8)). https://www.epa.gov/lead/renovation-repair-and-painting-program-work-practices

The map finds the risk. A test confirms it.
