AirData 5-Year Emission Trend — Methodology
Version 1.0 — Last updated: May 25, 2026
The AirData 5-Year Emission Trend is a county-level change classification published on /report/{zip}/ pages and as the open dataset us-county-airdata-trends (npm, PyPI, Zenodo DOI 10.5281/zenodo.20382474).
This page documents the data source, the gates that decide whether a county's trend is displayed, and the categories of claim the trend can and cannot support.
Source
EPA AirData annual AQI summaries (annual_aqi_by_county_{YEAR}.zip), county rollup, years 2020-2024. Retrieved from the EPA Air Quality System Data Mart: aqs.epa.gov/aqsweb/airdata/download_files.html.
AirData is a facility-reported inventory. Counties roll up reported emissions from EPA-regulated facilities in their boundaries. AirData is not an ambient air quality measurement — it does not represent what residents breathe at any specific address. Any user-facing render or downstream use of this dataset must repeat that distinction.
Output schema
Each county FIPS code maps to a single record:
| Field | Description |
|---|---|
| county_fips | 5-digit US county FIPS code (state + county subcode) |
| county_name, state | County name without "County" suffix, 2-letter state abbreviation |
| airdata_change_class | One of decrease, increase, stable, insufficient_data |
| airdata_pct_change | Signed percent change between earliest and latest cycle in window |
| cycles_used | Number of reporting cycles included (target 5; minimum gate 3) |
| facility_count | Distinct reporting facilities in latest cycle (display gate ≥5) |
| sensitivity_robust | True if direction stable when top-1 facility excluded |
| skip_reason | facility_count_below_threshold, petrochemical_corridor, cycles_below_threshold, or null |
| source_attribution | Required attribution string for any public render |
| petrochemical_corridor | True for 4 corridor counties on the methodology-review list |
| aqi_latest_year | Latest reporting year included in window |
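As a rough illustration of the schema, the sketch below checks one record against the enumerated values in the table. It assumes records are plain dictionaries and that county_fips is stored as a string so leading zeros survive; neither detail is specified on this page, and the function name validate_record is hypothetical.

```python
import re

# Allowed values copied from the schema table.
CHANGE_CLASSES = {"decrease", "increase", "stable", "insufficient_data"}
SKIP_REASONS = {
    "facility_count_below_threshold",
    "petrochemical_corridor",
    "cycles_below_threshold",
    None,  # null: no hold reason recorded
}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one county record (empty if none)."""
    problems = []
    fips = record.get("county_fips")
    # Assumption: FIPS is a string, so "01001" keeps its leading zero.
    if not isinstance(fips, str) or not re.fullmatch(r"\d{5}", fips):
        problems.append("county_fips must be a 5-digit string")
    if record.get("airdata_change_class") not in CHANGE_CLASSES:
        problems.append("unknown airdata_change_class")
    if record.get("skip_reason") not in SKIP_REASONS:
        problems.append("unknown skip_reason")
    if not record.get("source_attribution"):
        problems.append("source_attribution is required for public renders")
    return problems
```

A consumer could run a check like this over every record after loading the package and before rendering anything.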
Sensitivity gates
A county's airdata_change_class is displayed only when all three of the following are true:
1. Cycle coverage ≥3. A slope between two annual snapshots is not a trend. Three or more reporting cycles in the 2020-2024 window are required before any classification other than insufficient_data is assigned.
2. Facility floor ≥5. Rural counties with one to three reporting facilities exhibit single-facility artifacts (a refinery closure can produce a -100% county "trend"). Counties with fewer than 5 reporting facilities in the latest cycle surface as skip_reason="facility_count_below_threshold" with the trend held.
3. Top-1-exclude robustness. A direction is "sensitivity-robust" when it does not flip if the largest facility is excluded from the county rollup. A direction that flips under top-1 exclusion means one facility dominates the county aggregate; in that case display is held rather than presenting the result as a county-level pattern.
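The top-1-exclude check in gate 3 can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: it assumes per-facility values are available for the earliest and latest cycle, and it assumes "largest" means the facility with the biggest value in either cycle. The function names are hypothetical.

```python
def direction(earliest: float, latest: float) -> int:
    """Sign of the change: -1 for a decrease, 1 for an increase, 0 for none."""
    return (latest > earliest) - (latest < earliest)


def is_sensitivity_robust(facilities: dict[str, tuple[float, float]]) -> bool:
    """facilities maps a facility id to (earliest_value, latest_value).

    Requires at least one facility.
    """
    total_first = sum(first for first, _ in facilities.values())
    total_last = sum(last for _, last in facilities.values())

    # Assumption: the "largest" facility is the one with the biggest value
    # in either cycle, so a facility that closed still counts as dominant.
    top = max(facilities, key=lambda fid: max(facilities[fid]))
    rest_first = total_first - facilities[top][0]
    rest_last = total_last - facilities[top][1]

    # Robust when the direction is the same with and without the top facility.
    return direction(total_first, total_last) == direction(rest_first, rest_last)
```

In a county where one facility falls sharply while the rest rise slightly, the aggregate shows a decrease but the remainder shows an increase, so the check returns False and display would be held.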
A county is classified stable when |airdata_pct_change| < 10%, a threshold chosen for noise tolerance. The signed airdata_pct_change value is preserved in the record even when the class is insufficient_data, so downstream users can apply their own threshold, but only the class is rendered.
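Taken together, the gates and the stable threshold amount to a small decision procedure. The sketch below is one way to express it under stated assumptions: the order in which gates are checked, the handling of a zero baseline, and the use of a null skip_reason for counties that fail the robustness gate are all guesses, since the schema lists no reason code for that case. The corridor flag refers to the display hold described in the next section.

```python
from typing import Optional, Tuple

MIN_CYCLES = 3           # gate 1: cycle coverage
MIN_FACILITIES = 5       # gate 2: facility floor
STABLE_THRESHOLD = 10.0  # |pct_change| below this is "stable"


def pct_change(earliest: float, latest: float) -> Optional[float]:
    """Signed percent change between the earliest and latest cycle."""
    if earliest == 0:
        return None  # assumption: an undefined change is treated as missing
    return (latest - earliest) * 100.0 / earliest


def classify(
    pct: Optional[float],
    cycles_used: int,
    facility_count: int,
    sensitivity_robust: bool,
    is_corridor: bool = False,
) -> Tuple[str, Optional[str]]:
    """Return (airdata_change_class, skip_reason)."""
    # Assumption: the first failing gate supplies the reason code.
    if is_corridor:
        return "insufficient_data", "petrochemical_corridor"
    if cycles_used < MIN_CYCLES:
        return "insufficient_data", "cycles_below_threshold"
    if facility_count < MIN_FACILITIES:
        return "insufficient_data", "facility_count_below_threshold"
    if not sensitivity_robust or pct is None:
        return "insufficient_data", None  # assumption: no reason code defined
    if abs(pct) < STABLE_THRESHOLD:
        return "stable", None
    return ("decrease" if pct < 0 else "increase"), None
```

Because airdata_pct_change is kept in the record, a downstream user who prefers a different noise band can re-run the last two branches with their own threshold.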
Petrochemical-corridor display hold
Four counties surface as skip_reason="petrochemical_corridor" and render a methodology-review notice instead of a class value:
| FIPS | County | State |
|---|---|---|
| 48201 | Harris | TX |
| 22019 | Calcasieu | LA |
| 22047 | Iberville | LA |
| 54039 | Kanawha | WV |
These counties have the highest concentration of EPA-regulated petrochemical and chemical-corridor facilities in the country and a history of facility-named litigation. The default county-rollup trend reading risks both misrepresenting facility-specific dynamics and creating unwarranted defamation exposure for named operators. The trend is held pending per-county methodology guidance. The data is preserved in the dataset with the reason code so consumers studying these counties have full source access.
The corridor list is recorded in data/canonical/nei-petrochemical-skip-counties.json and reviewed annually or upon new facility-named litigation precedent. The list is intentionally a configuration file, not a hardcoded set in code — additions or removals are auditable through git history.
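To illustrate why a configuration file keeps the list auditable, the sketch below loads a corridor list from JSON text and exposes a membership check. The layout of nei-petrochemical-skip-counties.json is not documented here, so the "skip_counties" key and the entry fields are hypothetical; only the four FIPS codes come from the table above.

```python
import json

# Hypothetical file layout; the real config's structure may differ.
CONFIG_TEXT = """
{
  "skip_counties": [
    {"fips": "48201", "county": "Harris", "state": "TX"},
    {"fips": "22019", "county": "Calcasieu", "state": "LA"},
    {"fips": "22047", "county": "Iberville", "state": "LA"},
    {"fips": "54039", "county": "Kanawha", "state": "WV"}
  ]
}
"""


def load_corridor_fips(text: str) -> frozenset:
    """Parse the config text and return the set of held county FIPS codes."""
    config = json.loads(text)
    return frozenset(entry["fips"] for entry in config["skip_counties"])


CORRIDOR_FIPS = load_corridor_fips(CONFIG_TEXT)


def is_corridor(county_fips: str) -> bool:
    """True when the county is on the display-hold list."""
    return county_fips in CORRIDOR_FIPS
```

With this shape, adding or removing a county is a one-line change to the data file, and the diff records exactly what changed.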
Coverage (snapshot)
The 2026-05-25 release covers 994 US counties with EPA AirData rollups:
| Class | Counties |
|---|---|
| decrease | 67 |
| increase | 176 |
| stable | 376 |
| insufficient_data (display held) | 375 |
| Total | 994 |
Among the display-held counties, 368 fall below the facility floor and 4 are petrochemical-corridor holds. Counties not in the file have no AirData rollup for the window in EPA's published summaries.
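The snapshot figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in this section; the variable names are illustrative.

```python
# Counts as published for the 2026-05-25 release.
class_counts = {
    "decrease": 67,
    "increase": 176,
    "stable": 376,
    "insufficient_data": 375,
}
held_breakdown = {
    "facility_count_below_threshold": 368,
    "petrochemical_corridor": 4,
}

total = sum(class_counts.values())           # should match the 994 total
itemized_held = sum(held_breakdown.values())
not_itemized = class_counts["insufficient_data"] - itemized_held

print(total, itemized_held, not_itemized)
```

The class counts sum to the stated total. The two itemized hold reasons account for 372 of the 375 held counties; this section does not say which gate covers the remaining 3.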
What the trend supports and does not support
Supported claim categories (federal source first; verbatim attribution; no causal language):
- A direct restatement of the change classification with full attribution, including facility count and cycle window: "EPA AirData reports a 26% decrease in county-rolled facility emissions for Orange County, NY between 2020 and 2024 (based on 7 reporting facilities)."
- An explicit display-hold statement when a county does not meet the sensitivity gates: "EPA AirData has fewer than 5 reporting facilities in Pittsylvania County, VA — trend display held."
- Side-by-side display of a county's AirData trend with other independently-attributed federal data (CDC PLACES modeled health estimate, FHWA NBI bridge condition, etc.), provided the blocks are physically separated and no arithmetic combination is performed between them.
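The direct-restatement category lends itself to a fixed template. The following sketch builds the example sentence above from record fields. It is an assumption-laden illustration: the function name is hypothetical, the wording for stable counties is not given on this page and is therefore not handled, and appending "County" is a simplification that would not suit parishes or boroughs.

```python
def render_trend(
    county_name: str,
    state: str,
    change_class: str,
    pct_change: float,
    facility_count: int,
    first_year: int = 2020,
    last_year: int = 2024,
) -> str:
    """Build a supported-claim sentence for a displayed decrease or increase."""
    if change_class not in ("decrease", "increase"):
        # Wording for other classes is not specified in the methodology.
        raise ValueError(f"no template for class {change_class!r}")
    return (
        f"EPA AirData reports a {abs(pct_change):.0f}% {change_class} in "
        f"county-rolled facility emissions for {county_name} County, {state} "
        f"between {first_year} and {last_year} "
        f"(based on {facility_count} reporting facilities)."
    )


print(render_trend("Orange", "NY", "decrease", -26.0, 7))
```

Keeping the sentence in one template makes it harder for causal or ambient-air language to slip into individual renders.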
Unsupported claim categories (filtered by scripts/lib/compliance-rules.js BANNED_PHRASES and BANNED_PATTERNS):
- Qualitative ambient-air claims. AirData is facility-reported emissions, not ambient air quality. Any rendering that asserts ambient improvement or worsening on the basis of this dataset is filtered.
- Numeric ambient-air overclaim. "Emissions dropped N%" framed as an ambient-air change is filtered for the same reason.
- Synthetic indices pairing AirData with health outcomes. Any derived metric or named index joining this dataset to BRFSS/CDC small-area health estimates is filtered. The 4-agent pre-build review concluded such derived joins fall under the FTC v. NCBI (2024) precedent and are not protected by Section 230 when authored by ZipCheckup.
- Per-facility predictions. Predictive language about specific facilities (closure timeline, lifespan, etc.) is filtered by the BANNED_PATTERNS regex per the project's PE-licensure and per-asset defamation gates.
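A phrase-and-pattern filter of the kind described above might look like the sketch below. The project's filter lives in scripts/lib/compliance-rules.js and its actual entries are not reproduced on this page, so the phrases and regular expressions here are hypothetical placeholders, and the Python form is only an illustration of the approach.

```python
import re

# Hypothetical entries for illustration; not the project's real lists.
BANNED_PHRASES = ["air quality improved", "cleaner air"]
BANNED_PATTERNS = [
    # Per-facility predictive language, e.g. a closure timeline.
    re.compile(r"\bwill (close|shut down) (by|in) \d{4}\b", re.IGNORECASE),
    # Ambient-air framing of an emissions change.
    re.compile(r"\bair (got|is getting) (cleaner|dirtier)\b", re.IGNORECASE),
]


def find_violations(text: str) -> list[str]:
    """Return the banned phrases and pattern sources that match the text."""
    lowered = text.lower()
    hits = [phrase for phrase in BANNED_PHRASES if phrase in lowered]
    hits += [pat.pattern for pat in BANNED_PATTERNS if pat.search(text)]
    return hits
```

A render step could refuse to publish any string for which find_violations returns a non-empty list.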
Update cadence
EPA AirData publishes annual AQI summaries on a delayed cycle — typically the prior calendar year's file is released in spring of the following year. The dataset is rebuilt and republished (npm + PyPI + Zenodo new version DOI) when a new annual file lands. The concept DOI 10.5281/zenodo.20382474 is stable across versions; the version DOI changes per release.
Citation
Akulov, A. (2026). U.S. County AirData 5-Year Emission Trends [Data set].
ZipCheckup. https://doi.org/10.5281/zenodo.20382474
Pre-build review
This dataset's display methodology underwent a 4-agent pre-build review before publication: FTC compliance (FTC v. NCBI net-impression doctrine), defamation/stigmatization (per-asset defamation exposure for facility-named counties), statistical methodology (sensitivity gates, FDR considerations on multi-county comparisons), and feeds feasibility (cycle availability across the 2020-2024 window). The review pre-empted publication of two related features that were withdrawn rather than deferred: a derived emission-and-health joint index (FTC and Section 230 exposure for ZipCheckup-authored synthetic indices) and a per-bridge years-to-failure countdown (PE-licensure exposure and NBI ordinal-scale extrapolation invalidity). Both are documented as "SKIP-PERMANENT" in the project methodology to signal that the abstention is principled, not pending future work.
Source code
- Build script: scripts/build-county-airdata-trends-package.js
- Sensitivity gates: scripts/build-nei-emissions.js
- Validator (report-only): scripts/validate-nei-trend-stability.js
- Petrochemical corridor config: data/canonical/nei-petrochemical-skip-counties.json
- Voice guidelines for federal-data trend rendering: published in the project documentation.
License
CC BY 4.0 — free to use with attribution to ZipCheckup.