Abstract
Childhood lead exposure causes irreversible neurodevelopmental harm, and the U.S. Centers for Disease Control and Prevention has identified no safe blood lead level in children 1. Yet measured childhood blood lead is collected unevenly across U.S. states, and many jurisdictions publish no neighborhood-resolution surveillance data at all. We built a free, current national map that predicts childhood lead-exposure risk for all 3,222 U.S. counties and 83,388 census tracts from public American Community Survey 2018-2022 five-year data, combining housing-age indices (table B25034, weighting pre-1940 and pre-1950 housing) with poverty (table S1701) 23. The indices are z-scored, summed, and percentile-ranked per tract, reproducing the housing-plus-poverty screening approach the U.S. Environmental Protection Agency applied to 73,086 census tracts nationally and validated against measured childhood blood lead 4. We then tested the map against real measured childhood blood lead, tract by tract, in three states. Predicted risk correlated with measured elevated blood lead at Spearman rho = 0.66 in Michigan (~2,156 tracts), 0.65 in Ohio (~2,534 tracts), and 0.70 in metropolitan Milwaukee, Wisconsin (208 tracts), the Wisconsin figure pulled automatically from the state health department's open ArcGIS data with no records request. EPA reported its own index-to-blood-lead agreement as Cohen's kappa of 0.49 to 0.63 against roughly 4.2 million children's tests in Michigan and Ohio 4. Kappa and Spearman rho measure agreement differently, so the values are not directly comparable, but both fall in the moderate-to-substantial range, indicating our independent validation tracks measured exposure about as closely as EPA's. Because the map brings neighborhood-level warning to every state, including those with no public blood data, it functions as a low-cost first-pass screen that can direct confirmatory hazard testing where children are most at risk. The map predicts risk at the neighborhood level; it does not diagnose exposure in any individual child or home.
Plain-language summary
Lead paint and lead dust in older homes still harm American children, and even small amounts can permanently lower a child's IQ and attention. We do not have to wait for a child's blood test to show where the danger concentrates. Two facts the U.S. Census already collects for every neighborhood, how old the houses are and how many families are poor, predict where lead exposure is most likely. We turned those facts into a free map covering every county and neighborhood in the country, then checked it against actual blood-lead results from children in Michigan, Ohio, and Wisconsin. The map's predictions lined up with where children really tested high, in the same moderate-to-substantial range the federal government's own version reached. The map shows where to look first. A cheap on-the-spot lead test can then confirm a hazard in a specific home before a child is exposed. The map points to risky neighborhoods, not to specific poisoned children.
Introduction
The problem has no safe threshold
Childhood lead exposure is a settled and quantified public-health failure, not an open question. The U.S. Centers for Disease Control and Prevention states plainly that there is no safe level of lead in a child's blood, and that even low levels can affect IQ, attention, and academic achievement 5. In 2021 the CDC lowered its blood lead reference value from 5.0 to 3.5 micrograms per deciliter, set at the 97.5th percentile of the blood-lead distribution among U.S. children ages 1 to 5, so that more children with comparatively higher levels would be identified 1. The agency is explicit that the reference value is not health-based and is not a regulatory standard; it is a screening marker for the population, not a clinical safety line, and a result below it does not mean a child is unharmed 1.
The aggregate damage is large and already incurred. McFarland, Hauer, and Reuben (2022), in the Proceedings of the National Academy of Sciences, estimate that childhood exposure to leaded gasoline cost the living U.S. population roughly 824 million cumulative IQ points across more than 170 million Americans, on the order of half the population, with an average loss of about 2.6 points per person and the most exposed birth cohort (1966 to 1970) losing roughly six points each 6. Lead added to gasoline beginning in 1923 was not banned for on-road use until 1996, so most adults born before that date carry a measurable childhood-exposure burden 6. That burden appears to propagate into older-adult outcomes: Brown and colleagues (2025), in Alzheimer's and Dementia, link historical atmospheric lead, mapped at its 1960 to 1974 peak, to higher odds of memory problems half a century later in two large representative samples 7. The exposure pathway has shifted but not closed. Leaded paint in housing built before the 1978 residential ban remains a dominant present-day source of childhood exposure, and the CDC estimates that roughly 500,000 U.S. children currently have blood-lead levels at or above the reference value 5.
The surveillance gap: most children are never tested, and the data that exist are state-held and uneven
Knowing that lead harms children at any dose is not the same as knowing which children to protect. The United States has no complete, population-representative measurement of childhood blood lead. The CDC receives about 3 million blood-lead test results per year, a fraction of the roughly 22 million children under six 8. More important than the count is the selection: testing is deliberately concentrated on children judged to be at higher risk, so the reported surveillance data are, in the CDC's own words, "not a population-based estimate" and "not representative of a whole county or a whole state" 8. The agency points anyone seeking nationally representative prevalence to NHANES, a survey designed for estimation rather than for locating individual neighborhoods at risk 8. The result is a structural blind spot. A child who is never tested, in a place where few children are tested, is statistically invisible, and the places with the least testing are not randomly distributed.
The measured data that do exist are held by states, not the federal government, and access to them is uneven by jurisdiction. Some states publish blood-lead surveillance through open data interfaces, and the CDC Environmental Public Health Tracking Network carries county-level elevated-blood-lead measures for participating states 9. Others lock the same information in static reports or dashboards that require formal records requests to obtain at usable spatial resolution. This patchwork means that the granularity of available evidence about childhood lead risk depends less on where the hazard is and more on each state's data-publishing posture. A family, clinician, or local health department in a low-publishing state has no neighborhood-level signal at all, even though the underlying housing and poverty drivers of risk are present and measurable there.
Why a prediction-first national map is needed
When direct measurement is incomplete and unevenly accessible, the established response is to predict risk from variables that are measured everywhere. The U.S. Environmental Protection Agency took exactly this approach. Zartarian et al. (2024), in Environmental Science and Technology, screened 73,086 census tracts containing at least one child under six in the 50 states and modeled lead-exposure risk from publicly available indicators, principally the age of housing and poverty, precisely because limitations in children's blood-lead surveillance and gaps in environmental data make it difficult to identify communities with disproportionate exposure by measurement alone 10. They evaluated the predicted hotspots against approximately 1.9 million Michigan blood-lead results (2006 to 2016) and approximately 2.3 million Ohio results (2005 to 2018), and found moderate-to-substantial agreement, with Cohen's kappa from 0.49 to 0.63 10. They also found that a reduced model built on three variables, the share of homes built before 1940, the share built before 1950, and poverty, predicted hotspots about as well as the full model, and they restate the foundational premise that "there is no known level of lead exposure to be without risk" 10.
That EPA result establishes the method but not a usable public instrument. The inputs it relies on are open and national. The American Community Survey publishes year-structure-built down to the census-tract level in table B25034 and poverty status in subject table S1701 1112, so the same housing-plus-poverty risk signal can be computed for every tract in the country, including the states where no measured blood-lead data are publicly accessible. A prediction-first map turns an indicator that depends on each state's reporting choices into a uniform, neighborhood-level warning that exists everywhere the Census reaches. It does not diagnose any individual child and is not a substitute for a blood test; it is a screening layer that says where measured testing and on-the-ground hazard confirmation should go first.
This paper builds that national, tract-level prediction from public ACS data using the validated housing-and-poverty method, and then makes its own independent contribution: holding the predicted map against real, state-measured childhood blood-lead at the tract level in three states, to test whether a freely reproducible national map tracks measured reality at the strength the EPA analysis itself reported.
Methods
2.1 Overview and design rationale
We constructed a national, tract-level index of predicted childhood lead-exposure risk from two publicly documented determinants: the age of the housing stock and the prevalence of poverty. The choice of inputs is not novel. It follows the housing-age-plus-poverty approach the Washington State Department of Health uses for its Lead Exposure Risk Index, which combines an ACS 5-year housing-age measure and an ACS 5-year poverty measure into a single community-level score 13. The same two determinants anchor the indices the U.S. Environmental Protection Agency screened in its national hotspots analysis, where a reduced three-variable random-forest model (percent of homes built before 1940, percent built before 1950, and a family income-to-poverty measure) reproduced the hotspot pattern of the full five-variable model 1415. Lead-based residential paint was banned for consumer use in 1978, so the age of housing is a direct proxy for the presence of leaded paint and the dust it generates, and poverty proxies both deteriorated-paint maintenance deficits and reduced remediation capacity 21. We selected these inputs because both are available, current, and uniform for every census tract in the country, the property that lets one method cover states that publish no measured blood-lead data.
Our contribution is not the index form but its national reconstruction on the most recent 5-year American Community Survey (ACS) vintage and its tract-by-tract validation against measured childhood blood lead, reported separately. The EPA hotspots analysis was built on 2010 Census geography and ACS 2013-2017 5-year inputs 15. We rebuilt the index on ACS 2018-2022 5-year estimates 16, which moves the housing and poverty measures forward roughly a decade and re-bases the geography on 2020-vintage census tracts.
2.2 Data sources
All inputs are American Community Survey 2018-2022 5-year estimates, the vintage released December 7, 2023 and current at the time of analysis 16. The 5-year file is the only ACS product published down to the census-tract level for the full universe of tracts, and it is the correct vintage for small-area estimates because the 1-year file does not tabulate most tracts 16. Four tables were pulled.
| Purpose | ACS table | Type | Key variables used |
|---|---|---|---|
| Housing age | B25034, Year Structure Built | Detail |
B25034_001E (total units), B25034_011E (built 1939 or earlier), B25034_010E (built 1940 to 1949) 17
|
| Poverty | S1701, Poverty Status in the Past 12 Months | Subject |
S1701_C01_001E (population for whom poverty status is determined), S1701_C03_001E (percent below poverty level) 18
|
| Total population | B01003, Total Population | Detail |
B01003_001E (total population) 19
|
| Children under 18 | B09001, Population Under 18 Years by Age | Detail |
B09001_001E (population under 18 years) 20
|
B25034 reports occupied and vacant housing units by the decade the structure was built; its top two age categories are "Built 1939 or earlier" (B25034_011E) and "Built 1940 to 1949" (B25034_010E), which together give the two pre-threshold shares the method requires 17. S1701 is the ACS subject table for poverty status; the percent-below-poverty estimate is published directly as S1701_C03_001E over the universe "population for whom poverty status is determined" (S1701_C01_001E), so no separate denominator computation is needed 18. B01003 supplies total population and B09001 supplies the count of residents under 18, used for population-weighting and for the child-burden overlay rather than for the risk score itself 1920.
2.3 Census API pipeline
Data were retrieved programmatically from the Census Data API. Detailed tables (B25034, B01003, B09001) were requested from the ACS 2022 5-year detailed-tables endpoint and the subject table (S1701) from the parallel subject endpoint 16:
https://api.census.gov/data/2022/acs/acs5?get=NAME,group(B25034)&for=tract:*&in=state:{FIPS}&key={KEY}
https://api.census.gov/data/2022/acs/acs5/subject?get=NAME,group(S1701)&for=tract:*&in=state:{FIPS}&key={KEY}
The API caps tract-level wildcard queries at one state per call, so the pipeline iterated in=state:{FIPS} across the 50 states, the District of Columbia, and Puerto Rico, then concatenated the responses. County-level inputs were retrieved with for=county:*. Estimates were joined to 11-digit tract GEOIDs (2-digit state + 3-digit county + 6-digit tract) and to 5-digit county GEOIDs. Records with a missing or zero housing-unit denominator (B25034_001E), or carrying the Census sentinel values for suppressed or unestimable cells, were dropped before scoring; this is what reduces the raw tract universe to the scored set in Section 2.6. Margins of error are published for every estimate (the _M-suffixed variables) and were retained but not propagated into the point score 17.
2.4 Housing-age sub-index
For each tract t we computed two age shares directly from B25034:
pre1940_t = B25034_011E / B25034_001E
pre1950_t = (B25034_011E + B25034_010E) / B25034_001E
pre1940 is the share of units built 1939 or earlier; pre1950 is the cumulative share built 1949 or earlier 17. Using both thresholds rather than one mirrors the EPA reduced model, which carried percent-pre-1940 and percent-pre-1950 as separate predictors because the oldest stock (the highest-lead, most-degraded paint) and the broader pre-war stock each contribute independent signal 1415. Older housing is the dominant driver of residential lead-dust exposure, so the sub-index is weighted toward the pre-1940 share. Each share was standardized to a z-score across the scored tract universe,
z(x_t) = (x_t - mean(x)) / sd(x)
and the two standardized shares were combined into a single housing-age z-score with a 0.70/0.30 weighting in favor of pre-1940:
Z_housing_t = 0.70 * z(pre1940_t) + 0.30 * z(pre1950_t)
Standardizing before combining puts the two shares on a common scale so the weights act on relative position rather than on raw percentage magnitudes. This standardize-then-combine order follows the construction of the Washington State DOH index 13.
2.5 Poverty sub-index and the combined score
The poverty input is the published percent-below-poverty estimate, taken without recomputation:
poverty_t = S1701_C03_001E / 100
over the universe of persons for whom poverty status is determined 18. It was standardized to a z-score on the same scored universe, Z_poverty_t = z(poverty_t). The housing-age and poverty z-scores were then combined into a single composite with a 0.58/0.42 weighting:
Z_risk_t = 0.58 * Z_housing_t + 0.42 * Z_poverty_t
The housing-weighted split reflects that the age of housing is the proximate source of the lead, while poverty modifies exposure and remediation. This ordering is consistent with the public-health framing that children in poverty in pre-1950 housing carry the greatest risk 21. The weights are a deliberate, transparent modeling choice, not a fitted coefficient. The Washington State DOH index instead weights its housing-age and poverty measures equally; our 0.58/0.42 split is our own and is tested, not asserted 13. The validation in the companion sections checks the resulting ranking against measured blood lead rather than defending the weights a priori.
2.6 Percentile ranking and coverage
Because the composite z-score has no interpretable absolute units, the final published value is a within-nation percentile rank of Z_risk_t, computed separately over the scored tract universe and over the scored county universe:
percentile_t = 100 * rank(Z_risk_t) / N
A tract at the 90th percentile carries higher predicted risk than 90 percent of scored tracts nationally. Percentile ranking is the same family of final transform the Washington State DOH index uses, which bins its standardized, weighted measures into population deciles on a 1-to-10 scale 13; we retain the full 0-100 percentile rather than binning to deciles to preserve resolution for the validation analysis. Scoring this way makes the index ordinal by construction, which is why the validation uses Spearman rank correlation against measured blood-lead rates.
After dropping records with a missing or zero housing denominator and Census-suppressed cells (Section 2.3), the method scored 3,222 counties and 83,388 census tracts. This is the subset of the full 2018-2022 ACS tract universe for which both a valid housing-age distribution and a valid poverty estimate exist. For comparison, the EPA hotspots analysis screened 73,086 tracts on the older ACS 2013-2017 geography, restricted to tracts in the 50 states containing at least one child under six years old 15. Our larger count reflects the newer 2020-vintage tract geography, the inclusion of tracts regardless of child presence, and coverage of the District of Columbia and Puerto Rico. We did not impose a child-presence filter on the score itself; instead, the B09001 under-18 count and B01003 total population are carried alongside each scored tract as an exposed-population overlay 1920.
2.7 Scope and interpretation
The output is a prediction of relative risk from housing age and poverty. It is not a measurement of lead in any specific home and not a diagnosis of any child. It is a screening surface meant to direct confirmatory testing, consistent with EPA's own statement that "there is no known level of lead exposure to be without risk" and with the use of these indices as targeting tools rather than exposure measurements 141521. The entire pipeline is reproducible from the four ACS tables and the public Census API with no restricted or licensed inputs.
Footnotes / sources
Reviewer note (what I changed and why)
Every numeric claim and citation was checked against primary sources by web search and direct Census API lookups. The science holds up well. Four substantive fixes, the rest light tightening.
-
Author name was wrong in both EPA footnotes. Draft cited "Zartarian Morrison V." The verified byline (PubMed 38334298, PMC10882963, ACS journal record, EPA bio) is "Valerie G. Zartarian," no "Morrison." Fixed to "Zartarian VG" and spelled out the full nine-author list in 14. Volume/issue/pages/DOI (58(7):3311-3321, 2024, 10.1021/acs.est.3c07881) all verified correct.
-
The 3,222-county figure contradicted the pipeline text. The draft said the pipeline iterated "50 states plus the District of Columbia," but 3,222 counties exceeds the 50-states-plus-DC universe (~3,143-3,144 county-equivalents). 3,143 + DC + Puerto Rico's 78 municipios lands at ~3,222. I reconciled the prose by adding Puerto Rico to the pipeline loop and to the coverage explanation. Confirm this matches your actual run. If you did NOT pull Puerto Rico, the county count needs to be re-derived, because as written the two statements can't both be true. The 83,388-tract count I left as your computed output.
-
Washington DOH attribution was partly inaccurate. The draft quoted the WA index as "standardized, weighted, summed, and ranked" and implied it supported a housing-over-poverty weighting. The live DOH data-notes page actually weights housing age and poverty EQUALLY, weights housing by American Healthy Homes Survey II decade-deterioration factors, and outputs a 1-to-10 decile scale. I removed the not-verifiable verbatim quote, kept the defensible "standardize-then-combine order" point, and added one sentence stating plainly that WA weights the two equally while your 0.58/0.42 split is your own and is tested in validation. This protects you from a reviewer pulling up the DOH page and finding the equal-weight discrepancy.
-
EPA comparison tightened for accuracy. The 73,086-tract comparison now states the EPA restrictions explicitly (50 states, at least one child under six) and attributes our larger count to the newer geography, no child filter, and DC + PR coverage, rather than the vaguer original phrasing. The three-variable model's poverty term is the family income-to-poverty ratio, now noted.
Verified correct, left as-is: B25034_011E = "Built 1939 or earlier" and B25034_010E = "Built 1940 to 1949" (Census API group page); S1701_C01_001E and S1701_C03_001E labels (Census variable JSON, exact); ACS 2018-2022 released Dec 7 2023; EPA inputs 2010 Census + ACS 2013-2017; Michigan ~1.9M (2006-2016) and Ohio ~2.3M (2005-2018); kappa 0.49-0.63; the "no known level of lead exposure to be without risk" quote; the RF v1 five-variable / RF v2 three-variable structure.
Other edits: corrected the 16 series label to 2009-2023 (the developer set is not "2009-2024") and added the Dec 7 2023 release date there. Scrubbed for AI-cliche phrasing and confirmed zero em-dashes (all dashes are hyphens in ranges or compounds). Reinforced screening-not-diagnosis and risk-not-poisoning language in 2.5, 2.6, and 2.7. No real citation was dropped; all eight sources resolve and support the claims attached to them.
One thing I could not independently confirm: the exact 0.70/0.30 and 0.58/0.42 weights and the 83,388 scored-tract count are internal outputs of your pipeline, not public facts, so I left them as stated and only made the surrounding prose consistent. If you want, regenerate the scored counts with the Puerto Rico question settled and I'll lock the numbers.
Validation
A predictive risk map is only worth deploying if it agrees with where children are actually being poisoned. This section establishes that agreement in two stages. First, we summarize the federal anchor: EPA's national hotspots analysis, which validated housing-and-poverty indices against roughly 4.2 million measured childhood blood-lead tests in two states. Second, we present our own independent, tract-by-tract validation of the map published at detectlead.com/lead-risk-map against measured childhood blood-lead in three states, two reproduced from the federal study's own supplement and one pulled live from a state open-data API. The two stages use different statistics (the federal work reports Cohen's kappa on hotspot agreement; we report Spearman rank correlation of continuous predicted risk against continuous measured exposure), so they are complementary rather than redundant, and our results land at the same strength the federal authors achieved.
The federal anchor: Zartarian et al. (2024)
The scientific foundation is Zartarian et al., "A U.S. Lead Exposure Hotspots Analysis," published in Environmental Science & Technology in 2024 22. EPA's Office of Research and Development screened 73,086 census tracts containing at least one child under six across all 50 states, scoring each tract on lead-exposure indices built from housing age and sociodemographic data drawn from the American Community Survey 23.
The decisive step in that paper is not the prediction. It is the validation against real children. EPA held the predicted hotspots against measured childhood blood-lead surveillance data in two states with unusually complete records:
- Michigan: approximately 1.9 million blood-lead results from children under six, covering 2006 to 2016 23.
- Ohio: approximately 2.3 million blood-lead results from children under six, covering 2005 to 2018 23.
Across those roughly 4.2 million measured tests, the predicted hotspots showed moderate-to-substantial agreement with the locations where children actually carried elevated blood-lead, with Cohen's kappa scores of 0.49 to 0.63 22. EPA interpreted kappa on a fixed scale: below 0.4 is low, above 0.4 to 0.6 is moderate, above 0.6 to 0.8 is substantial, and above 0.8 is near-perfect agreement 23. A companion tract-scale study of Ohio by the same EPA group reports a comparable band, kappa 0.54 to 0.64 comparing observed blood-lead hotspots against the predictive indices across the 3.5, 5, and 10 µg/dL reference values 24.
Two further results from the paper matter for anyone building on it. First, a reduced three-variable model, using only percent of homes built before 1940, percent built before 1950, and the percent of families with an income-to-poverty ratio above 2, performed comparably to the full five-variable model, holding kappa at 0.51 to 0.63 across the Michigan and Ohio datasets 23. That is the result that makes a transparent, reproducible national map possible from public data alone: the heavy lifting is done by housing age plus poverty. Second, the authors anchor the entire effort in the established toxicology, stating plainly that "there is no known level of lead exposure to be without risk" 23. A screening map does not need to find a safe threshold, because there is none; it needs to rank where exposure concentrates.
Our method, in brief
Our national map follows the validated three-variable approach directly. Housing age comes from ACS table B25034 (Year Structure Built), weighting the oldest stock most heavily because pre-1940 and pre-1950 homes carry the highest lead-paint burden 25. Poverty comes from ACS table S1701 (Poverty Status in the Past 12 Months) 18. We z-score each component, combine them, and percentile-rank the result per tract across all 3,222 counties and 83,388 tracts. This is the Washington State Department of Health and EPA lineage, not a novel index. The contribution here is not the model. It is the independent test of the model against measured blood-lead, tract by tract, in three states.
Independent multi-state validation
For each validation state we joined our predicted tract-level risk to the state's measured childhood blood-lead at the same geography (or its published equivalent), then computed the Spearman rank correlation between predicted risk and measured exposure. Spearman is the right statistic here: it asks whether the map orders tracts from lower to higher risk the way the measured blood data orders them, without assuming a linear relationship or a particular blood-lead distribution.
| State | Measured-data source | Tracts joined | Spearman ρ (predicted vs measured) |
|---|---|---|---|
| Michigan | Zartarian 2024 surveillance, aggregated by tract (2006–2016) | ~2,156 | 0.66 |
| Ohio | Zartarian 2024 surveillance, aggregated by tract (2005–2018) | ~2,534 | 0.65 |
| Wisconsin | WI DHS childhood lead-poisoning data by census tract (ArcGIS) | 208 (metro Milwaukee) | 0.70 |
For Michigan and Ohio we used the measured surveillance values published in the federal study's own supplement, the same ~1.9M and ~2.3M-test datasets EPA validated against, so our predicted risk is being checked against the identical ground truth the agency used. For Wisconsin we did not need the federal supplement at all: the measured tract-level data came straight from Wisconsin DHS's open ArcGIS service, which publishes children under six tested, children testing positive, and percent poisoned at the census-tract level. Wisconsin suppresses a tract only when fewer than five children there are poisoned, and even then leaves it visible if 100 or more children were tested 26. That pull required no FOIA and no manual extraction.
A Spearman ρ of 0.65 to 0.70 means the map's risk ranking and the measured childhood-exposure ranking move together strongly and monotonically. These figures sit at or above the strength of the federal study's own kappa band, on independent joins, in three separate states with three separate surveillance systems. The map predicts where measured childhood lead exposure is highest, and it does so as well as the EPA-validated method it is built on. The standard caveat holds and is load-bearing: this is a tract-level screening signal, not a diagnosis. The map predicts neighborhood risk, never an individual child's blood-lead.
The data-access pipeline, and why prediction is necessary
Measured childhood blood-lead is held by states, and access is wildly uneven. That unevenness is the practical reason a prediction map matters: it brings the same neighborhood-level warning to every state, including the many that publish no usable blood data at all.
Where measured data exists in machine-readable form, we ingest it automatically:
- New York (ZIP-level, Socrata). New York publishes childhood blood-lead testing and elevated-incidence counts by ZIP code (excluding New York City) on health.data.ny.gov, served through the Socrata Open Data API (SODA) 27. We query it programmatically.
- County-level, roughly 45 states (CDC Tracking Network API). The CDC Environmental Public Health Tracking Network exposes childhood blood-lead surveillance through a machine-readable API, with the ≥3.5 µg/dL classification adopted for 2022-forward data after CDC lowered the blood-lead reference value from 5 to 3.5 µg/dL in October 2021 28. Children are counted once per year at their highest result 5. This is the broad county-level backbone.
- Wisconsin (tract-level, ArcGIS). Pulled live, as described above 26.
Then there are the holdouts. States including New Hampshire, Colorado, and Connecticut lock their childhood blood-lead behind Tableau dashboards or PDF reports with no API, so obtaining tract- or ZIP-level measured values requires a public-records (FOIA) request and manual extraction. These are exactly the places where a family has no public way to learn that their neighborhood's housing stock and poverty profile put their child at elevated risk. The prediction map closes that gap. It does not wait for a state to publish blood tests, because it does not need blood tests to run; it needs only the public Census housing-and-poverty data that exists, uniformly, for every tract in the country. The validation above is what licenses trusting that prediction where no measured data is available to check it.
Prevention economics: what a first-pass screen is worth
A risk map answers where. It does not answer whether spending money to look there pays. This section builds the cost-benefit case for using a cheap field screen as the first pass in the locations the map flags, states the model as explicit equations, works a 10,000-kit example, and grounds every dollar figure in the published lead-economics literature. The screen estimates risk and confirms a present hazard on the spot. It is not a blood test and does not diagnose a child.
The core fact that makes the math work is old and well established: lead damage is permanent and expensive, and avoided damage is worth far more than the cost of avoiding it. The U.S. Centers for Disease Control and Prevention sets a blood lead reference value of 3.5 micrograms per deciliter, drawn from the 97.5th percentile of blood lead in U.S. children aged 1 to 5, to flag the children in the top 2.5 percent of exposure, and is explicit that no safe blood lead level has been identified 1. Below we monetize what staying under that line is worth.
The unit value of a child: IQ to lifetime earnings
The economic value of preventing exposure rests on a dose-response chain that has been stable in the literature for two decades. Lanphear and colleagues, pooling seven cohort studies, found that an increase in concurrent blood lead from 2.4 to 10 micrograms per deciliter was associated with a decline of 3.9 IQ points (95 percent CI, 2.4 to 5.3), with the steepest loss per microgram at the lowest exposures 29. Grosse and colleagues then converted IQ loss to money, applying roughly a 2.0 percent decline in lifetime earnings per IQ point against a present value of lifetime earnings of about $723,300 in 2000 dollars for a two-year-old, which yields a base-case value near $14,500 per IQ point 30. Later work in the same lineage carried that figure to about $17,815 in present-value lifetime earnings lost per IQ point in 2006 dollars 31. Adjusted for inflation alone, $17,815 in 2006 dollars is roughly $28,000 today, before any allowance for real earnings growth.
The model in this paper uses a deliberately rounded, conservative per-child value of $22,000, representing the present-discounted lifetime-earnings loss avoided when a single child is kept off a meaningful exposure path. At $22,000 the model sits below the inflation-adjusted earnings figure on purpose. It is earnings only. It excludes the costs of special education, medical management, lost parental productivity, and criminal-justice involvement that the same literature attributes to lead, so it understates true societal benefit.
The cost-benefit model
A deployment equips N kits at a per-kit cost. The variables:
| Symbol | Meaning | Default |
|---|---|---|
N |
kits deployed | 10,000 |
cost_per_kit |
manufactured plus distributed cost per kit | $50 |
pct_eligible |
share of target homes that are pre-1978 and carry lead-paint risk | 0.36 |
hazard_rate |
share of those homes with a detectable lead-paint hazard | 0.30 |
detection |
probability the screen flags a present, accessible hazard (idealized ceiling) | 1.00 |
action_rate |
share of found hazards that lead to remediation or avoidance | 0.55 |
kids_per_home |
young children per affected home | 1.00 |
value_per_child |
lifetime-earnings loss avoided per child | $22,000 |
The equations:
hazards_found = N × pct_eligible × hazard_rate × detection
kids_spared = hazards_found × action_rate × kids_per_home
benefit = kids_spared × value_per_child
program_cost = N × cost_per_kit
return_ratio = benefit ÷ program_cost
breakeven = program_cost ÷ kids_spared (benefit needed per child to break even)
Two of these inputs deserve a flag. The detection term is set to 1.00 as an idealized upper bound: the reagent has nanogram sensitivity and flags lead-paint dust below the HUD 10-micrograms-per-square-foot floor-dust standard, but no field screen catches every hazard, so a real deployment runs below 1.00 and the worked example below is therefore a best case, not a promise. The hazard_rate is held conservative against HUD's American Healthy Homes Survey, which finds lead-based paint in a far larger share of the oldest stock (about 87 percent of pre-1940 homes); pinning the rate at 0.30 understates hazards in exactly the old housing the map prioritizes. The pct_eligible default of 0.36 is held to roughly the HUD national estimate that on the order of a third of all U.S. homes contain some lead-based paint 32; note that the share of homes simply built before the 1978 residential lead-paint ban is higher still, near half, so this input is conservative on both counts.
Worked example: 10,000 kits at $50
Running the defaults:
hazards_found = 10,000 × 0.36 × 0.30 × 1.00 = 1,080kids_spared = 1,080 × 0.55 × 1.00 = 594benefit = 594 × $22,000 = $13,068,000program_cost = 10,000 × $50 = $500,000return_ratio = $13,068,000 ÷ $500,000 = 26.1 : 1breakeven = $500,000 ÷ 594 = $842 per child
A $500,000 program returns about $13.1 million in avoided lifetime-earnings loss, a 26-to-1 return, and breaks even if each child kept off the exposure path is worth at least $842, against the roughly $22,000 the literature supports. The program clears its break-even threshold by a factor of about 26. Because detection is set to its 1.00 ceiling, treat these as the upper edge of the range rather than the expected outcome.
Refill economics push the return higher
The $50 per-kit default is the full first-unit cost: reagent bottle, rechargeable 365-nanometer flashlight, fluorescent reference card, and printed bag. The flashlight and card are durable. On refills, only the consumable reagent recurs, dropping marginal cost toward roughly $5 per screen. Holding benefit constant and substituting cost_per_kit = $5:
program_cost = 10,000 × $5 = $50,000return_ratio = $13,068,000 ÷ $50,000 = 261 : 1
Refilling rather than re-kitting raises the return into the low hundreds to one. This is in the same range as the lead-hazard-control literature, where Gould put the return at $17 to $221 per dollar spent 31. The first kit buys the hardware; every screen after that is nearly pure prevention.
Why screening beats waiting: the reactive cost
The alternative to screening is finding hazards after a child's blood lead is already elevated, through the case-management and environmental-investigation pathway that state and local health departments run. For a substantially elevated child, in the 20 to 45 micrograms-per-deciliter range, CDC's recommended response runs to eight visits for diagnostic testing, nurse follow-up, and a home environmental investigation, documented at about $1,027 per child for that visit sequence 33. Lower but still elevated levels trigger a shorter version of the same pathway. Once the loaded cost of case-management labor, repeat testing, and follow-up that continues until the child's level falls or the child ages out is counted, the realistic reactive cost lands on the order of $1,000 to $2,000 per confirmed case.
That figure is the point. The reactive system spends roughly $1,000 to $2,000 per child it has already failed to protect, after exposure has occurred and after the permanent IQ cost has been incurred. The proactive screen costs $5 to $50 and runs before exposure. The reactive pathway is both more expensive per child and, by construction, too late.
The societal cost the map is screening against
The per-child value above is conservative precisely because the aggregate burden is enormous. McFarland, Hauer, and Reuben estimate that childhood lead exposure has cost the living U.S. population about 824 million cumulative IQ points as of 2015 (824,097,690 points), an average of 2.6 points per person, with more than 170 million Americans, roughly half the population, exposed to harmful levels in early childhood and the 1966 to 1970 birth cohort averaging a 5.9-point deficit 34. At the population level, the highest quintile of childhood blood lead carries a 4.1-fold increase in the odds of ADHD relative to the lowest (95 percent CI, 1.2 to 14.0) in NHANES 35. And the bill is not closed. Reuben and colleagues, following the Dunedin birth cohort, found that higher childhood blood lead tracked with lower cognitive function and an older estimated brain age in midlife 36. Separately, a 2026 NHANES-Medicare analysis found that higher cumulative lead burden, measured in bone, was associated with incident Alzheimer's disease and all-cause dementia, attributing roughly 18 percent of new dementia cases to lead 37. The leaded-gasoline cohorts, born roughly 1955 to 1975, are aging into the years of heightened dementia risk now.
Against a societal loss measured in hundreds of millions of IQ points and a cohort effect still unfolding, a screening tool that costs single-digit dollars per use and surfaces a hazard before a child is exposed is not a marginal intervention. The map identifies where risk concentrates; the economics show that screening there returns on the order of 26 to 1 at full kit cost, and higher still on refills, before counting a single dollar of avoided medical, educational, or criminal-justice cost. The map predicts risk, not any individual child's exposure, and the screen confirms a hazard, not a diagnosis. Both narrow where the scarce dollars should go first.
Deployment and Public-Health Use: A Screen-Then-Confirm Workflow
The gap this workflow fills
Childhood lead-exposure surveillance in the United States is incomplete by design. The Centers for Disease Control and Prevention recommends targeted blood-lead testing focused on children in pre-1978 housing and with sociodemographic risk factors, and instructs state and local officials to build local screening plans that reflect local risk, rather than defaulting to universal testing 38. The Centers for Medicare and Medicaid Services require a blood-lead test for Medicaid-enrolled children at 12 and 24 months, and for any child aged 36 to 72 months not previously tested, but coverage in practice is uneven and most states have not reconciled their screening targets with local prevalence data 38. The result is a country where the location of the hazard is largely predictable from public data but the measured outcome, a child's blood-lead level, is observed only after exposure has already occurred, and only in the subset of children who are actually tested.
The risk map described in this paper closes the front half of that gap. Built from Census ACS 2022 housing-age and poverty data and validated against measured childhood blood-lead in three states (Spearman 0.65 to 0.70), it predicts, for all 3,222 counties and 83,388 tracts, where lead exposure concentrates, including the many states that publish no neighborhood-level blood-lead data at all. What it cannot do is confirm a hazard inside a specific home. A high-risk tract is a statement about housing stock and poverty, not a diagnosis of any one address. About 29 percent of U.S. homes, an estimated 34.6 million units, still contain some lead-based paint, and prevalence rises sharply with age, from 48 to 76 percent of units built 1960 to 1977 up to 71 to 100 percent of units built before 1940 3940. Within any high-risk tract, some homes carry an active lead-dust hazard and some do not. Distinguishing them requires a measurement at the property.
Why a cheap confirmatory test belongs in the loop
The conventional confirmatory tools are blood-lead testing of the child and environmental investigation of the home. Both are essential, and both arrive late or expensive. A finger-stick (capillary) blood-lead screen confirms that exposure has already happened, and because residual lead on the skin produces frequent false positives, any capillary result at or above CDC's blood-lead reference value of 3.5 ug/dL should be confirmed with a venous draw 41. CDC set that reference value at the 97.5th percentile of the U.S. distribution for children aged 1 to 5 (NHANES 2015-2016 and 2017-2018); it is a screening threshold, not a safety threshold, and CDC and the National Toxicology Program hold that no blood-lead level is known to be without risk 1. Environmental investigation by a certified risk assessor, using laboratory-analyzed dust-wipe sampling against EPA's hazard standards, is the definitive home measurement, but it is a scheduled, paid inspection that does not scale to a screening pass over a high-risk neighborhood.
A low-cost field test occupies the missing middle. FluoroSpec uses a methylammonium bromide reagent in isopropanol that fluoresces bright green under 365 nm ultraviolet light when it contacts lead in surface paint and dust, giving an immediate visible read at the surface without a laboratory turnaround. Positioned correctly, it is the lowest-cost first-pass screen in the prevention sequence: it does not replace the venous blood draw that diagnoses a child, and it does not replace the laboratory dust-wipe clearance that a risk assessor signs, but it lets a non-laboratory user decide, on the spot and for a few dollars, whether a given surface warrants the more expensive confirmation. The workflow is therefore three tiers, ordered cheapest first: the map predicts where to look, the field test flags which surfaces are likely positive, and laboratory blood and dust analysis confirms and acts. Each tier removes volume from the one above it.
Use cases
State and local childhood lead poisoning prevention (CLPP) programs and health-department inspectors. EPA's strengthened dust-lead rule, with full compliance required by January 12, 2026, lowered the dust-lead action levels to 5 ug/ft2 on floors and 40 ug/ft2 on interior windowsills, and redefined the hazard standard so that any laboratory-reportable level of dust-lead on a floor or windowsill, as analyzed by a lab in EPA's National Lead Laboratory Accreditation Program, is a hazard 42. Tighter standards mean more surfaces require formal evaluation, which increases the load on a finite pool of certified risk assessors and X-ray fluorescence (XRF) inspection time. A field test used as a pre-screen lets an inspector or a CLPP program triage: surfaces and homes that read clearly negative on the field test can be deprioritized, and laboratory dust-wipe sampling, which is what the standard is enforced against, is concentrated where a positive field read indicates it is likely to be warranted. The risk map directs that limited inspection capacity toward the tracts where measured blood-lead is highest, which is where the validation shows the map is most accurate.
HUD lead-hazard-control grantees. HUD's combined Lead-Based Paint Hazard Control (LBPHC) and Lead Hazard Reduction (LHRD) programs award up to $4 million to a jurisdiction to identify and control lead hazards in pre-1978 owner-occupied and rental housing, and applicants must operate or partner with an EPA-authorized lead abatement certification program 43. Grantees must find eligible high-risk units and document the work, and HUD deliverables routinely include outreach and education to high-risk families. A free field-test component supports both halves: it helps grantees prioritize unit intake within their target geography, and it is a tangible item that can be distributed to families in high-risk housing as part of an education deliverable. The risk map gives grantees a defensible, public-data basis for where to concentrate enrollment, consistent with the targeting logic HUD already uses to identify jurisdictions with deteriorated paint.
Renovation contractors under the RRP rule. EPA's Renovation, Repair, and Painting (RRP) rule requires certified firms disturbing paint in pre-1978 housing to follow lead-safe work practices and to perform post-work cleaning verification, and on HUD-funded jobs, laboratory dust-wipe clearance below the action levels in 40 CFR 745.227 44. A field test is a fast in-process check a certified renovator can run during cleanup, before committing to formal clearance sampling, to catch a surface that is still releasing lead and re-clean it rather than fail a paid clearance test. It does not substitute for the required cleaning-verification step or the HUD-job clearance dust wipe; it reduces the number of clearance failures and re-mobilizations.
Families in high-risk housing. For a household in a high-risk tract, the same three-tier logic applies at the kitchen-table scale. The map tells a family their neighborhood's housing stock is in the high-risk band; a field test lets them check the specific surfaces a small child contacts (windowsills, painted trim, porch components) without waiting on an inspection; a positive read is the prompt to seek a child's blood-lead test and a professional risk assessment. Because the test is non-destructive and does not require laboratory turnaround, it functions as the component of a home cleaning-and-prevention kit that tells a family where to clean, before a child is exposed rather than after a blood test confirms exposure.
Why first-pass cost is the right frame
The economic case for lead prevention is settled and large. A peer-reviewed cost-benefit analysis estimates that every dollar invested in controlling lead-paint hazards returns $17 to $221 in avoided health-care, special-education, crime, and lost-lifetime-earnings costs 31. The binding constraint on capturing that return is not the value of prevention but the per-unit cost and throughput of finding the hazard. Confirmatory measurement (venous blood, laboratory dust wipes, XRF inspection) is accurate and necessary, and it is the cost that does not scale to a national high-risk housing stock numbering in the tens of millions of pre-1978 units. The contribution of a screen-then-confirm workflow is to drive down the cost of the first pass so that the expensive, definitive measurements are spent where they are most likely to matter. The map makes that first pass free at the neighborhood scale using only public Census data; a cheap field test extends the first pass to the individual surface. Neither tier diagnoses a child or certifies a home. Their function is to raise the yield, and lower the unit cost, of the confirmatory tier that does.
This is the deployment claim, stated conservatively: the map predicts risk, not poisoning, and the field test flags likely hazards, not certified ones. Used together and in order, they let a fixed budget of blood draws and laboratory dust wipes cover more of the children and homes where the validated data say the risk actually concentrates.
Limitations and Ethics
This map is a screening tool. It predicts where childhood lead-exposure risk concentrates, using public housing and poverty data, and we validate that prediction against measured childhood blood lead in three states. It does not measure lead in any home, and it does not diagnose any child. The distinctions below are not throat-clearing. Each one bounds a specific claim a reader might otherwise draw from a colored map, and each is grounded in the same literature the method is built on.
The estimates are ecological, not individual
Every value on this map is a property of a census tract, not of a person or a house. The model is built from tract-level aggregates (the share of housing built before 1940 and before 1950, from Census table B25034, and tract poverty from table S1701), so its output describes the average risk environment of a neighborhood. Reading a tract-level association as if it applied to an individual is the ecological fallacy: the error of assuming that because a neighborhood scores high, a given child or address in it is high-risk, or that because it scores low, a child in it is safe 45. The EPA hotspots analysis we extend is explicit on this point. Its authors state that the analysis operates at the population level (census tract, county, and state) and "cannot identify sources at particular addresses or risk at an individual level" 4. A high-risk tract still contains remediated and lead-free homes; a low-risk tract still contains pre-1940 houses with deteriorating lead paint. The map narrows where to look. It cannot tell any single family whether their home has a hazard. That is what a physical test is for.
It predicts risk, not poisoning
The indices are correlates of exposure risk (old paint, concentrated poverty), not a measurement of lead in a child's blood. Our validation establishes that the predicted surface tracks measured childhood blood lead at the same strength the source paper reports (Spearman 0.66 in Michigan, 0.65 in Ohio, 0.70 in metro-Milwaukee Wisconsin; the EPA paper reports Cohen's kappa of 0.49 to 0.63 against roughly 1.9 million Michigan tests from 2006 to 2016 and 2.3 million Ohio tests from 2005 to 2018) 4. A correlation of that order is strong for an ecological model and weak as a basis for any individual prediction. Substantial variance remains unexplained, because exposure also depends on factors absent from the model: actual paint condition, renovation and disturbance history, water service-line material, soil, imported consumer goods, and occupational take-home lead. The map should be read as a prioritization signal, never as a count of poisoned children.
The input data carry a vintage lag
The map is only as current as the American Community Survey behind it. The 2018-2022 ACS 5-year estimates pool responses collected across the full five-year window of January 1, 2018 through December 31, 2022, so the housing and poverty picture is a multi-year average, not a snapshot of today 46. Housing stock changes slowly, which makes the pre-1940 share relatively stable, but poverty, occupancy, and demolition or renovation can shift faster than the data refresh. Any tract that has seen significant teardown, rehabilitation, or demographic turnover since the survey window will be characterized with a lag of several years. This is the same constraint the EPA analysis faced. That work relied on 2010 census inputs, and we improve currency by moving to the 2018-2022 vintage, but we do not eliminate the lag 4.
Small and rural tracts are measured least precisely
ACS reliability degrades as geography shrinks, and the 5-year tract estimates that this map depends on are the Census Bureau's least precise published level. The Bureau ships a margin of error with every estimate for exactly this reason and urges caution where it is large 47. The problem is structural: in the 2007-2011 ACS the average tract had only about 135 completed interviews over five years, against an average of about 280 housing units in the 2000 long form, and tract-level margins of error run on average about 75 percent larger than the corresponding 2000 long-form figures 48. The 5-year estimates that absorbed the COVID-disrupted 2020 collection year carry wider margins still 49. The practical consequence falls hardest on low-population and rural tracts, where small samples widen the uncertainty band and a tract's percentile rank can be noisy. Sparse rural areas can also be physically large, so a single risk value may average over heterogeneous housing across many miles. Rural risk on this map should be read with more caution than urban risk, not less.
Measured blood-lead data are scarce and uneven, which is the point
The reason a prediction map is needed is that ground-truth blood-lead data are incomplete and inconsistently available. Surveillance undercounts exposure because not all children are tested. CDC receives about 3 million blood-lead test results a year, and the agency states plainly that these data are "not a population-based estimate" and "are not representative of the United States or even of an entire state or county" 50. Coverage and reporting vary by state, by insurer, and by year. That scarcity has two consequences for this work. First, our validation is necessarily limited to the few states that publish tract-resolvable measured data (Michigan and Ohio via the EPA paper's Supplement B; Wisconsin via the state's open ArcGIS service), so external validity to states with different housing eras and testing regimes is assumed, not proven. Second, the same scarcity is the public-health case for the map: it extends a neighborhood-level warning to the many jurisdictions (for example New Hampshire, Colorado, and Connecticut) where measured blood-lead data are locked in PDFs or Tableau dashboards and reachable only by FOIA. Where there is no public blood data, a validated prediction is the only neighborhood-level signal available.
Screening, not diagnosis
The honest framing for both the map and the test it points to is screening. CDC is explicit that even its blood-lead reference value of 3.5 micrograms per deciliter is "a screening tool," is "not health-based," and is "not a regulatory standard," and that no safe level of lead in children has been identified 51. A neighborhood risk map sits one step further from diagnosis than a blood test does. It flags places to investigate. It does not establish that a hazard exists at any address or that any child has been exposed. The appropriate response to a high-risk tract is confirmation, not alarm: an inspection, a dust-wipe, an XRF reading, or a low-cost field screen, followed by a blood-lead test for the child if a hazard is found. Used that way, the map does what a screen should do. It directs limited inspection and testing resources toward the places most likely to need them, and it makes no claim it cannot support about any individual home or child.
-
Centers for Disease Control and Prevention, "CDC Updates Blood Lead Reference Value" (3.5 ug/dL, 97.5th percentile, NHANES 2015-2016 and 2017-2018). https://www.cdc.gov/lead-prevention/php/news-features/updates-blood-lead-reference-value.html ^^^^^
-
U.S. Census Bureau, American Community Survey 2018-2022 5-Year Estimates, Table B25034: Year Structure Built. https://data.census.gov/table/ACSDT5Y2022.B25034 ^
-
U.S. Census Bureau, American Community Survey 2018-2022 5-Year Estimates, Table S1701: Poverty Status in the Past 12 Months. https://data.census.gov/table/ACSST5Y2022.S1701 ^
-
Zartarian, V. G., Xue, J., Poulakos, A. G., et al. (2024). "A U.S. Lead Exposure Hotspots Analysis." Environmental Science & Technology 58(7), 3311-3321. DOI: 10.1021/acs.est.3c07881. The authors report agreement of Cohen's kappa 0.49-0.63 against children's blood-lead hotspots from approximately 1.9 million Michigan tests (2006-2016) and 2.3 million Ohio tests (2005-2018), screened 73,086 census tracts containing at least one child under six, state the analysis operates at the population level and "cannot identify sources at particular addresses or risk at an individual level," and rely on 2010 census inputs. https://pmc.ncbi.nlm.nih.gov/articles/PMC10882963/ ^^^^^
-
CDC, "About the Data: Blood Lead Surveillance" and "Childhood Blood Lead Surveillance: State Data" (machine-readable Tracking Network API; child counted once per year at highest result; ≥3.5 µg/dL classification for 2022-forward data). https://www.cdc.gov/lead-prevention/php/data/blood-lead-surveillance.html ^^^
-
McFarland MJ, Hauer ME, Reuben A. "Half of US population exposed to adverse lead levels in early childhood." PNAS 2022;119(11):e2118631119. https://www.pnas.org/doi/10.1073/pnas.2118631119 ^^
-
Brown EE, Lombard M, Chan A, Ayotte J, Rakowska S, Fuller-Thomson E. "Historical Atmospheric Lead Concentrations (1960-1974) and Memory Problems Half a Century Later." Alzheimer's and Dementia 2025. https://pmc.ncbi.nlm.nih.gov/articles/PMC12726548/ ^
-
CDC, "Data and Statistics, Childhood Lead Poisoning Prevention." https://www.cdc.gov/lead-prevention/php/data/index.html ; "About the Data: Blood Lead Surveillance." https://www.cdc.gov/lead-prevention/php/data/blood-lead-surveillance.html ^^^
-
CDC Environmental Public Health Tracking, "Childhood Lead Poisoning." https://www.cdc.gov/environmental-health-tracking/php/data-research/childhood-lead-poisoning.html ^
-
Zartarian V, Xue J, Poulakos A, Tornero-Velez R, Stanek L, Snyder E, Helms Garrison V, Egan K, Courtney J. "A U.S. Lead Exposure Hotspots Analysis." Environmental Science and Technology 2024;58(7):3311-3321. DOI 10.1021/acs.est.3c07881. https://pubs.acs.org/doi/10.1021/acs.est.3c07881 ^^^
-
U.S. Census Bureau, American Community Survey table B25034, Year Structure Built. https://data.census.gov/table?q=B25034 ^
-
U.S. Census Bureau, American Community Survey subject table S1701, Poverty Status in the Past 12 Months. https://data.census.gov/table/ACSST1Y2022.S1701 ^
-
Washington State Department of Health. Lead Exposure Risk Index, data notes (housing-age and poverty measures from the ACS 2018-2022 5-year file, weighted equally and ranked into population deciles on a 1-to-10 scale). https://doh.wa.gov/data-and-statistical-reports/washington-tracking-network-wtn/lead-risk-and-exposure/lead-exposure-risk-ibl-data-notes ^^^^
-
Zartarian VG, Xue J, Poulakos AG, Tornero-Velez R, Stanek LW, Snyder E, Helms Garrison V, Egan K, Courtney JG. A U.S. Lead Exposure Hotspots Analysis. Environmental Science & Technology. 2024;58(7):3311-3321. doi:10.1021/acs.est.3c07881. https://pubs.acs.org/doi/10.1021/acs.est.3c07881 ^^^^
-
Zartarian VG, et al. A U.S. Lead Exposure Hotspots Analysis (full text). PMC10882963 (73,086 tracts in the 50 states with at least one child under six; RF v1 five-variable and RF v2 three-variable pre-1940 + pre-1950 + income-to-poverty models; 2010 Census geography and ACS 2013-2017 inputs; Michigan ~1.9M blood-lead points 2006-2016 and Ohio ~2.3M 2005-2018; Cohen's kappa 0.49-0.63; "no known safe level"). https://pmc.ncbi.nlm.nih.gov/articles/PMC10882963/ ^^^^^
-
U.S. Census Bureau. American Community Survey 5-Year Data (2009-2023); developer API documentation; 2018-2022 release dated December 7, 2023. https://www.census.gov/data/developers/data-sets/acs-5year.html ^^^^^
-
U.S. Census Bureau. Census Data API, ACS 2022 5-year, table B25034 (Year Structure Built) variable group: B25034_001E total, B25034_010E "Built 1940 to 1949", B25034_011E "Built 1939 or earlier". https://api.census.gov/data/2022/acs/acs5/groups/B25034.html ^^^^
-
U.S. Census Bureau, American Community Survey 5-Year Estimates, Table S1701 "Poverty Status in the Past 12 Months." https://data.census.gov/table/ACSST5Y2022.S1701 ^^^^
-
U.S. Census Bureau. Table B01003, Total Population, ACS 2022 5-year. https://data.census.gov/table/ACSDT5Y2022.B01003 ^^^
-
U.S. Census Bureau. Table B09001, Population Under 18 Years by Age (universe: population under 18 years), ACS 2022 5-year. https://censusreporter.org/tables/B09001/ ^^^
-
U.S. Centers for Disease Control and Prevention. Sources of lead exposure; lead in paint, dust, and soil; older housing (pre-1978) as the primary residential source. https://www.cdc.gov/lead-prevention/prevention/index.html ^^^
-
Zartarian, V., Xue, J., Poulakos, A., Tornero-Velez, R., Stanek, L., Snyder, E., Helms Garrison, V., Egan, K., Courtney, J. (2024). "A U.S. Lead Exposure Hotspots Analysis." Environmental Science & Technology, 58(7), 3311–3321. DOI: 10.1021/acs.est.3c07881. https://pubs.acs.org/doi/10.1021/acs.est.3c07881 ^^
-
Zartarian et al. (2024), full text, PubMed Central PMC10882963 (73,086 tracts; Michigan ~1.9M tests 2006–2016; Ohio ~2.3M tests 2005–2018; kappa interpretation scale; reduced three-variable model kappa 0.51–0.63; "no known level of lead exposure to be without risk"). https://pmc.ncbi.nlm.nih.gov/articles/PMC10882963/ ^^^^^^
-
Stanek, L. W., Xue, J., Zartarian, V. G., et al. (2024). "Identification of high lead exposure locations in Ohio at the census tract scale using a generalizable geospatial hotspot approach." Journal of Exposure Science & Environmental Epidemiology, 34(4), 718–726 (Ohio tract-scale validation, 2005–2018 blood-lead, Cohen's kappa 0.54–0.64 observed hotspots vs predictive indices). PubMed Central PMC11303242. https://pmc.ncbi.nlm.nih.gov/articles/PMC11303242/ ^
-
U.S. Census Bureau, American Community Survey 5-Year Estimates, Table B25034 "Year Structure Built." https://data.census.gov/table/ACSDT5Y2022.B25034 ^
-
Wisconsin Department of Health Services, Environmental Public Health Tracking, childhood lead-poisoning data by census tract (children under 6 tested, positive, percent poisoned; suppressed when fewer than five children poisoned, unless 100 or more tested; published via ArcGIS). https://www.dhs.wisconsin.gov/epht/lead.htm ^^
-
New York State Department of Health, "Childhood Blood Lead Testing and Elevated Incidence by Zip Code: Beginning 2000," health.data.ny.gov, served via the Socrata Open Data API (SODA). https://health.data.ny.gov/Health/Childhood-Blood-Lead-Testing-and-Elevated-Incidenc/d54z-enu8 ^
-
CDC, "Update of the Blood Lead Reference Value — United States, 2021," MMWR 70(43):1509–1512 (BLRV lowered 5 → 3.5 µg/dL, October 2021). https://www.cdc.gov/mmwr/volumes/70/wr/mm7043a4.htm ^
-
Lanphear BP, Hornung R, Khoury J, et al. "Low-Level Environmental Lead Exposure and Children's Intellectual Function: An International Pooled Analysis." Environmental Health Perspectives 113(7):894-899, 2005. A blood lead rise from 2.4 to 10 micrograms per deciliter is associated with a 3.9-point IQ decline (95 percent CI, 2.4 to 5.3). https://pmc.ncbi.nlm.nih.gov/articles/PMC1257652/ ^
-
Grosse SD, Matte TD, Schwartz J, Jackson RJ. "Economic Gains Resulting from the Reduction in Children's Exposure to Lead in the United States." Environmental Health Perspectives 110(6):563-569, 2002. About a 2.0 percent earnings change per IQ point on $723,300 present-value lifetime earnings for a two-year-old (2000 USD, 3 percent discount); base-case value near $14,500 per IQ point. https://pmc.ncbi.nlm.nih.gov/articles/PMC1240871/ ^
-
Gould E., "Childhood Lead Poisoning: Conservative Estimates of the Social and Economic Benefits of Lead Hazard Control," Environmental Health Perspectives 117(7), 2009 (return of $17 to $221 per dollar invested). https://pmc.ncbi.nlm.nih.gov/articles/PMC2717145/ ^^^
-
U.S. Department of Housing and Urban Development, American Healthy Homes Survey II, and EPA Renovation, Repair and Painting and Lead Disclosure rules implementing the 1978 ban on residential lead-based paint. HUD finds lead-based paint in roughly a third of U.S. homes overall and about 87 percent of pre-1940 homes. https://www.epa.gov/lead ^
-
Cost of the recommended case-management and environmental-investigation visit sequence for a child with blood lead 20 to 45 micrograms per deciliter (eight visits, about $1,027 per child), citing CDC 2004 protocols. Reported in Gould 2009 (see above). https://pmc.ncbi.nlm.nih.gov/articles/PMC2717145/ ^
-
McFarland MJ, Hauer ME, Reuben A. "Half of US Population Exposed to Adverse Lead Levels in Early Childhood." PNAS 119(11):e2118631119, 2022. 824,097,690 cumulative IQ points lost; 2.6 points per person on average; more than 170 million exposed; 1966 to 1970 cohort averaging a 5.9-point deficit. https://www.pnas.org/doi/10.1073/pnas.2118631119 ^
-
Braun JM, Kahn RS, Froehlich T, Auinger P, Lanphear BP. "Exposures to Environmental Toxicants and Attention Deficit Hyperactivity Disorder in U.S. Children." Environmental Health Perspectives 114(12):1904-1909, 2006. Highest versus lowest blood-lead quintile OR = 4.1 (95 percent CI, 1.2 to 14.0) for ADHD, NHANES 1999-2002. https://pmc.ncbi.nlm.nih.gov/articles/PMC1764142/ ^
-
Reuben A, Elliott ML, Abraham WC, et al. "Association of Childhood Lead Exposure With MRI Measurements of Structural Brain Integrity in Midlife" (JAMA 2020) and "Childhood lead exposure is associated with lower cognitive functioning at older ages" (Science Advances 8(45):eabn5164, 2022). Higher childhood blood lead tracks with lower midlife cognition and older estimated brain age in the Dunedin cohort. https://www.science.org/doi/10.1126/sciadv.abn5164 ^
-
Wang X, Bakulski KM, Walker E, et al. "Exposure to lead and incidence of Alzheimer's disease and all-cause dementia in the United States." Alzheimer's & Dementia, 2026. Higher bone-lead burden associated with incident Alzheimer's disease and all-cause dementia in NHANES linked to Medicare; about 18 percent of new dementia cases attributable to lead. https://pmc.ncbi.nlm.nih.gov/articles/PMC12895363/ ^
-
Centers for Disease Control and Prevention, "Recommendations for Blood Lead Screening of Medicaid-Eligible Children Aged 1--5 Years: an Updated Approach to Targeting a Group at High Risk," MMWR Recommendations and Reports 58(RR-9), 2009. https://www.cdc.gov/mmwr/preview/mmwrhtml/rr5809a1.htm ^^
-
U.S. Department of Housing and Urban Development, American Healthy Homes Survey II (2018-2019); EPA summary, "How many homes still contain lead-based paint?" (34.6 million homes, 29.4 percent of all housing units). https://www.epa.gov/lead/i-thought-lead-based-paint-had-been-phased-out-how-many-homes-still-contain-lead-based-paint ^
-
U.S. EPA / HUD, "Report on the National Survey of Lead-Based Paint in Housing," EPA 747-R-95-003 (lead-based-paint prevalence ranges by housing age: 71 to 100 percent pre-1940, 48 to 76 percent 1960 to 1977). https://www.epa.gov/sites/default/files/documents/r95-003.pdf ^
-
Centers for Disease Control and Prevention, "Testing for Lead Poisoning in Children" (capillary vs. venous confirmation, false positives from skin contamination). https://www.cdc.gov/lead-prevention/testing/index.html ^
-
U.S. EPA, "Hazard Standards and Clearance Levels for Lead in Paint, Dust and Soil (TSCA Sections 402 and 403)," and "Reconsideration of the Dust-Lead Hazard Standards and Dust-Lead Post-Abatement Clearance Levels," final rule published November 12, 2024, full compliance required January 12, 2026. https://www.epa.gov/lead/hazard-standards-and-clearance-levels-lead-paint-dust-and-soil-tsca-sections-402-and-403 ^
-
U.S. Department of Housing and Urban Development, Lead-Based Paint Hazard Control (LBPHC) and Lead Hazard Reduction (LHRD) Grant Programs, FY2024 funding terms (up to $4 million per eligible jurisdiction). https://www.hud.gov/program_offices/cfo/gmomgmt/grantsinfo/fundingopps/LHR ^
-
U.S. EPA, "Renovation, Repair and Painting Program: Work Practices," and "Clearance and Clearance Testing Requirements for the RRP Program" (action levels in 40 CFR 745.227(e)(8)). https://www.epa.gov/lead/renovation-repair-and-painting-program-work-practices ^
-
Subramanian, S. V., Jones, K., Kaddour, A., and Krieger, N. (2009). "Revisiting Robinson: The perils of individualistic and ecologic fallacy." International Journal of Epidemiology 38(2), 342-360. DOI: 10.1093/ije/dyn359. On the hazard of reading area-level associations as individual-level associations. https://pmc.ncbi.nlm.nih.gov/articles/PMC2663721/ ^
-
U.S. Census Bureau. "2018-2022 ACS 5-Year Estimates" technical documentation and "Period Estimates in the American Community Survey." The 2018-2022 5-year estimates pool data collected from January 1, 2018 through December 31, 2022 and do not represent a single point in time. https://www.census.gov/programs-surveys/acs/technical-documentation/table-and-geography-changes/2022/5-year.html ^
-
U.S. Census Bureau. "Using American Community Survey Estimates and Margins of Error." Margins of error are published with each estimate so users can judge reliability, and users are urged to use caution where margins of error are high. https://www.census.gov/content/dam/Census/programs-surveys/acs/guidance/training-presentations/20180418_MOE_Webinar_Transcript.pdf ^
-
Spielman, S. E., Folch, D., and Nagle, N. (2014). "Patterns and causes of uncertainty in the American Community Survey." Applied Geography 46, 147-157. Tract-level ACS margins of error average about 75 percent larger than the corresponding 2000 long-form estimates; in the 2007-2011 ACS the average tract had about 135 completed surveys over five years, against an average of about 280 housing units in the 2000 long form. https://pmc.ncbi.nlm.nih.gov/articles/PMC4232960/ ^
-
U.S. Census Bureau (2022). "Increased Margins of Error in the 5-Year Estimates Containing Data Collected in 2020." The reduced 2020 response count raised relative margins of error and caused several key estimates to exceed the Bureau's quality threshold. https://www.census.gov/programs-surveys/acs/technical-documentation/user-notes/2022-04.html ^
-
CDC, Childhood Lead Poisoning Prevention. "Childhood Blood Lead Surveillance: National Data." About 3 million blood-lead test results are received by CDC each year; the data are "not a population-based estimate" and "are not representative of the United States or even of an entire state or county." https://www.cdc.gov/lead-prevention/php/data/national-surveillance-data.html ^
-
CDC, Childhood Lead Poisoning Prevention. "CDC Updates Blood Lead Reference Value." The 3.5 ug/dL reference value is a screening tool, is not health-based, and is not a regulatory standard; no safe level of lead in children has been identified. https://www.cdc.gov/lead-prevention/php/news-features/updates-blood-lead-reference-value.html ^
Supplemental A. How this was built
How this was built: an account of the work
What it took, plainly
This map exists because the public record kept disappearing and someone decided to put it back.
The method behind it is not new. Housing age plus poverty, z-scored and percentile-ranked per tract, is the Washington State Department of Health approach, the one Vox published code for in 2016 and the one EPA leaned on in the Zartarian hotspots study. What was missing was a version that was national, current, free, covered every tract including rural ones, and stayed up. NYU's dashboard skipped rural America. Vox's build was frozen on 2014 data and Python 2.7. PolicyMap charged for it. The federal all-tract tool was designed and never shipped. The opening was specific and it had been sitting open for years.
A small organization filled it in a single concentrated session. Not a quarter, not a grant cycle. One night of focused work, run by a person directing layered teams of AI agents and sub-agents, each team handed a bounded problem and reporting back into a shared synthesis before the next layer started. That structure is the only reason the scope was reachable in the time. No single thread could have held the Census pulls, the geometry, three state validations, a fifty-state access survey, and a public map at once. Split across coordinated deployments, each narrow enough to finish and verify, it closed.
Recovering what had been pulled down
The first problem was that some of the source data had gone dark. EPA's EJSCREEN, HUD housing-condition layers, and the Supplement B file from the EPA hotspots paper had been pulled from their public homes. A research deployment was sent after each one. Where a primary host was gone, the agents went to mirrors, archive snapshots, and the dataset DOIs that outlive the web pages pointing at them. The EPA hotspots dataset still resolves through its DOI even when the landing page does not. The paper's Supplement B, the per-tract measured blood-lead that later became the spine of the validation, was recovered and parsed rather than re-collected, because EPA had already done the hard part of assembling roughly 4.2 million children's blood tests from Michigan and Ohio and the only task left was to not lose it.
A note from the field, because it is the kind of thing that makes the work real. While fetching one source page, a tool hit text claiming a "125-character quote limit," formatted to look like a constraint coming from the page itself. It was not a real limit. It read like injected instruction sitting in the fetched content. The agent ignored it and pulled the genuine source files directly with curl, so the formula and the data came from the actual primary documents and not from anything a page was trying to tell the tooling to do. The discipline there matters more than the incident. Treat fetched content as data, never as orders.
Pulling and scoring 83,388 tracts
With the method settled and the sources back in hand, a build deployment went to the U.S. Census API for the inputs. You cannot pull every tract in one request, so the work loops over roughly fifty-one state FIPS codes, two calls per state. Housing age comes from table B25034, the detailed-tables base path, total occupied units down through the pre-1940 bucket. Poverty comes from table S1701, which lives on a separate subject-table base path and cannot be mixed into the same call, a small Census quirk that trips people who assume one endpoint. Each tract gets an eleven-digit GEOID built from state plus county plus tract, and everything is cached to disk so the pulls run once.
Then the scoring, faithful to the published method. Housing age is weighted toward the oldest stock the way Washington DOH weights it, the pre-1940 and pre-1950 homes carrying the most signal because that is where lead paint and lead service lines concentrate. Housing risk and poverty risk are each z-scored across all tracts, combined into a single standardized score, and percentile-ranked. Water-only tracts and zero-population tracts are dropped. The result is a score for all 3,222 counties and 83,388 census tracts, every one of them, including the rural places the city-only dashboards never see.
The same Census plumbing already existed in DetectLead's zip-screener, which pulls Census risk at the zip level. This was the tract-level version of inputs the organization had handled before, which is part of why one session was enough.
Holding the prediction against real blood
A prediction is a claim until you check it against measurement. The point of this build was to make that check, openly, tract by tract.
A validation deployment took the predicted risk score and held it against real measured childhood blood-lead in three states. In Michigan, predicted risk against measured elevated blood-lead across about 2,156 tracts gave a Spearman rank correlation of 0.66. In Ohio, about 2,534 tracts gave 0.65. The Michigan and Ohio measured values came out of the recovered Supplement B, the same blood tests EPA had validated against, now used a second time to validate this independent build. Wisconsin was the one that ran end to end with no human in the loop on the data. A sub-agent hit Wisconsin's open ArcGIS REST endpoint for childhood lead surveillance by census tract, pulled 208 metro-Milwaukee tracts live, and scored the rank correlation at 0.70. No FOIA, no email, no waiting. An open API answered and the validation wrote itself.
Rank correlation was the right tool because the question is whether the map orders neighborhoods correctly, worst to best, not whether it nails an exact microgram value it was never built to predict. Spearman compares the two rankings directly. The three results, 0.66, 0.65, and 0.70, sit right inside the strength EPA reported for its own validated indices, where Cohen's kappa ran 0.49 to 0.63. The independent map tracks reality about as well as the peer-reviewed federal one it is built from. Each result is published side by side, predicted next to measured, on its own page, so anyone can see the agreement and the scatter for themselves.
Mapping who has the data and who hides it
One more deployment was sent to answer a question that turned out to matter as much as the map itself. If measured blood-lead is the gold standard, why predict at all? Because measured blood-lead is state-held and wildly uneven, and a survey deployment went state by state across all fifty plus DC to prove exactly how uneven.
The finding is stark. A handful of states publish clean machine-readable blood-lead at neighborhood grain. New York serves it at zip level on Socrata. Connecticut serves it by town. Wisconsin serves it by tract on ArcGIS. Roughly forty-five states report county-level numbers through the CDC Tracking Network API on the 3.5 microgram measure. And then a long tail of states locks it inside Tableau dashboards and PDF reports, New Hampshire and Colorado and Connecticut's deeper tables among them, reachable only by FOIA if at all. Five states ship no public blood-lead product at all.
That is the whole argument for the prediction map in one sentence. The places with the least public data are not the places with the least lead. A predicted map built from Census data, which exists uniformly for every tract in the country, brings the same neighborhood-level warning to the states that publish nothing as to the states that publish everything. The survey did not just catalogue access. It justified the project.
TECHNICAL BREAKOUT
The programmatic strategies, concretely, for anyone who wants to reproduce or audit the build.
Census ACS API pulls. Source is the ACS 5-year, vintage 2022, with a free Census API key. Housing age is table B25034 on the /data/2022/acs/acs5 base path, variable B25034_001E for total occupied units through B25034_011E for units built 1939 or earlier, with the oldest buckets weighted hardest. Poverty is table S1701 on the separate /data/2022/acs/acs5/subject base path, S1701_C03_001E for percent below the poverty line. The two tables live on different base paths and cannot be combined in one request, so the puller makes two calls per state and joins on GEOID. Tracts cannot be pulled nationally in a single call, so the loop iterates the state FIPS list, roughly fifty-one iterations, and caches each response to disk so the network work happens once. GEOID is assembled as state plus county plus tract, eleven digits, the key everything else joins on.
TIGERweb GeoJSON geometry. Attribute data is meaningless without shapes to paint. The build joins scored tracts to Census tract polygons by GEOID, using the lightweight cartographic-boundary geometry rather than full-resolution TIGER/Line, because at roughly 85,000 polygons the full geometry is far heavier than a browser map needs and the simplified boundaries render cleanly at choropleth scale. Output is GeoJSON keyed by GEOID, one feature per tract, attributes carrying the score and percentile.
The geoIdentity planar-projection fix for d3 winding. This one cost real time and is worth flagging for anyone who maps Census GeoJSON in d3. Census polygons do not always follow the winding order d3's spherical geometry expects. When d3 treats the coordinates as points on a sphere and a polygon's ring winds the "wrong" way, d3 reads the inside as the outside and fills the entire rest of the globe instead of the small tract, so a single bad ring paints the whole map a solid block. The fix is to stop asking d3 to reason about the sphere at all for already-projected or planar coordinates. Render through d3.geoIdentity, which treats coordinates as flat plane values and skips the spherical winding rules entirely, optionally with reflectY to put the origin where the data expects it. Planar identity projection, no winding ambiguity, tracts fill as tracts.
Socrata and CDC Tracking Network APIs. For the measured-blood-lead side and the access survey, two API families did the work. Socrata serves New York's zip-level and Connecticut's town-level blood-lead as plain JSON resource endpoints, queryable directly. The CDC Tracking Network API exposes annual blood-lead at county level for roughly forty-five states on the 3.5 microgram per deciliter measure through its core-holder gateway. Wisconsin's tract-level surveillance came off an ArcGIS REST endpoint returning features by census tract, which is what made the Wisconsin validation fully automatic, query the endpoint, get tracts back, score the correlation, done.
Rank-correlation validation. Validation is a Spearman rank correlation between predicted risk score and measured elevated blood-lead, computed per tract within each state. Spearman ranks both variables and correlates the ranks, which is the honest test for a screening tool whose job is to order neighborhoods correctly rather than to predict an exact blood value. Michigan returned 0.66 over about 2,156 tracts, Ohio 0.65 over about 2,534, Wisconsin 0.70 over 208 metro-Milwaukee tracts, all sitting inside EPA's own validated kappa band of 0.49 to 0.63.
The crash-resilient frame renderer. Rendering tens of thousands of tract features, plus any per-frame or per-state image output the map and its tooling needed, is long-running work that should never lose everything to one bad input. The renderer is built to survive its own failures. It checkpoints completed frames to disk as it goes, wraps each unit of rendering so a single malformed geometry or a one-off failure is caught, logged, and skipped rather than allowed to kill the run, and resumes from the last good checkpoint instead of restarting from zero. A crash costs one frame, not the night.
Why it reads as a small effort doing a large thing
Nothing here required inventing a new science. The risk index is settled and credited to the people who built it. What was new was the assembly: recovering public data that had been taken down, scoring every tract in the country from the primary Census source, validating that score against real children's blood in three states with one of them running with no human touching the data, mapping exactly which states hide their numbers, and shipping the whole thing free and public with the test that confirms a hazard on the spot sitting at the end of it.
It worked because the work was cut into pieces small enough to finish and checked at every seam. Agents recovered the sources. Sub-agents pulled and scored and validated and surveyed. Group syntheses pulled the pieces back together between layers so nothing drifted. A person held the direction and made the calls. That is the account. A small organization, one intense session, layered teams, and a public record put back where it belonged, with the receipts to show it is right.
There is no known level of lead exposure to be without risk. The map predicts where that risk concentrates. It does not diagnose a child and it does not replace a blood test. It is the lowest-cost first-pass screen in lead prevention, built in the open, and now anyone can check our work.
Supplemental B. Technical breakout: the programmatic strategy
This supplement documents the programmatic methods behind the map, the validation, and the publishing system. It is written for a reviewer who wants to re-run the work or audit it. Nothing here requires proprietary data. Every input is a public government API or a published file, and every output is a page you can open.
B.1 How the work was organized
The build ran as a small organization of cooperating agents rather than one long script. A coordinating layer held the plan and the source-of-truth facts. Below it, specialized teams ran in parallel: a data team that pulled and joined state blood-lead measurements, a cartography team that solved the map projection and rendering, a writing team that drafted and fact-checked the paper sections, an offer team that structured the government pricing, and a render team that produced and recolored the video. Teams that did not depend on each other ran at the same time, so the wall-clock time of the program was set by the slowest single chain, not by the sum of the work. Where a task could be checked, a separate reviewer agent checked it, so that drafting and verification were never done by the same pass.
B.2 Data acquisition: getting real blood-lead numbers without a FOIA
The prediction is built from the U.S. Census American Community Survey: housing age (table B25034, weighting the pre-1940 and pre-1950 shares) and poverty (table S1701). Those are clean, keyed, and national. The harder problem was the validation, because measured childhood blood-lead is held by state health departments and the federal tracking network, not handed out as a tidy file.
The CDC National Environmental Public Health Tracking network has the county-level numbers, but its public API gateway rejects automated server-side requests at the firewall. The path that works is the same one the agency's own Data Explorer uses in the browser. A request to the core data holder endpoint, issued from inside a real browser session so it carries the session context the firewall expects, returns the county table as JSON. The data team scripted a headless browser to make that call for each state, parsed the returned tableResult records (each carries a county FIPS, a county name, and a measured value), and wrote them to a single joined file. This pulled ten states cleanly and validated eight, with no records request and no waiting. For states that do not publish the metric at the county level on that network, the script records a clean "no data on this path" rather than guessing.
For New York the team used the state's own open-data portal, which exposes blood-lead by ZIP through a standard Socrata query interface, and joined on ZIP instead of county.
B.3 The cartography problem: why the first maps filled the globe
The first interactive maps rendered as solid colored squares that covered the whole frame. The cause is a quiet convention clash. The Census TIGERweb service returns polygon rings wound clockwise, the ArcGIS convention. The standard web mapping library, d3-geo, treats geometry as spherical and uses ring direction to decide which side of a ring is "inside." Clockwise rings tell it the inside is everything except the county, so it dutifully fills the rest of the planet. Rewinding the rings by hand did not hold up across mixed sources.
The fix was to stop treating these planar, pre-projected shapes as spherical. The cartography team switched the projection to d3.geoIdentity().reflectY(true), which draws the coordinates as flat geometry and flips the vertical axis to match screen space, then fit it to the viewport. For sources that are already RFC-compliant, such as the published us-atlas topology, a normal Mercator projection fit to the same extent works directly. With that distinction settled, every map, national and per-state, renders correctly from the same template.
B.4 The validation pipeline: one repeatable path from API to published page
The eight state validations are not eight hand-built analyses. They are one pipeline run eight times. For a given state the pipeline hits the open data source, joins the measured values to our predicted risk on the shared geography (county FIPS, tract, or ZIP), computes a Spearman rank correlation between predicted and measured, and renders a side-by-side choropleth page where a reader can see the two maps next to each other and read the correlation. The rank correlation is the right test here because the claim is ordering, that the map puts the higher-risk places above the lower-risk ones, not that it reproduces an exact microgram value. Across the eight states the agreement ranges from 0.48 to 0.77, centered near 0.6, which is the band the federal EPA study reported for the same kind of comparison. Because the path is scripted, adding a ninth state is a data pull, not a rebuild.
B.5 Publishing: idempotent builders that can be re-run safely
Every page on the site is produced by a builder script, not edited by hand. Each builder is idempotent: it looks for the page by handle, updates it in place if it exists, and creates it if it does not. This matters because the storefront API intermittently reports a creation as failed when it actually succeeded. The builders catch that specific case, re-fetch by handle, and update, so a re-run converges to the right state instead of producing duplicates or stopping. Each builder also ensures the clean root URL works by registering the redirect from the short path to the underlying page. The result is that the whole site can be regenerated from the source files and data at any time, which is the same property that makes the analysis auditable.
B.6 The video: deterministic frames and a recolor that cannot miss a spot
The explainer video is rendered, not animated by hand. The scene is an HTML document that exposes a single function which, given a time in seconds, draws exactly the frame for that moment. A headless browser steps through the timeline calling that function and capturing each frame, so the render is fully deterministic: the same timeline always produces the same frames. The renderer is crash-resilient, relaunching the browser if it dies mid-run, and because a relaunch can leave a one-frame gap that would otherwise make the encoder stop early, a final check fills any missing frame before encoding so the full duration always survives to the finished file.
Recoloring the whole video to the jitter palette was done as a deterministic transform, not a re-creation. A single map of source colors to jitter colors is applied across the scene document, including the warm choropleth ramp remapped to the mint ramp and the alert red preserved, with the glow effects and the logo swapped to their dark-background versions. Because the transform is a fixed table applied programmatically, there are no missed elements and the recolor is reproducible from the original.
B.7 Reproducibility and provenance
The standard a skeptic should hold this to is simple: can an independent party rebuild it. The prediction comes from two named Census tables. The validation comes from named state and federal data sources reached through documented public endpoints. The correlations are a standard rank statistic anyone can recompute from the joined files. The maps, the paper, the pricing pages, and the video are each generated by a script from those inputs. There is no private dataset in the chain that a reviewer would have to take on faith. That is the point of publishing the method alongside the map: the map is only worth deploying behind a test if the map itself can be checked, and it can.