AirData 5-Year Emission Trend — Methodology

Version 1.0 — Last updated: May 25, 2026

The AirData 5-Year Emission Trend is a county-level change classification published on /report/{zip}/ pages and as the open dataset us-county-airdata-trends (npm, PyPI, Zenodo DOI 10.5281/zenodo.20382474).

This page documents the data source, the gates that decide whether a county's trend is displayed, and the categories of claim the trend can and cannot support.

Source

EPA AirData annual AQI summaries (annual_aqi_by_county_{YEAR}.zip), county rollup, years 2020-2024. Retrieved from the EPA Air Quality System Data Mart: aqs.epa.gov/aqsweb/airdata/download_files.html.

AirData is a facility-reported inventory. Counties roll up reported emissions from EPA-regulated facilities in their boundaries. AirData is not an ambient air quality measurement — it does not represent what residents breathe at any specific address. Any user-facing render or downstream use of this dataset must repeat that distinction.

Output schema

Each county FIPS code maps to a single record:

Field                   Description
county_fips             5-digit US county FIPS code (state + county subcode)
county_name, state      County name without "County" suffix; 2-letter state abbreviation
airdata_change_class    One of decrease, increase, stable, insufficient_data
airdata_pct_change      Signed percent change between earliest and latest cycle in window
cycles_used             Number of reporting cycles included (target 5; minimum gate 3)
facility_count          Distinct reporting facilities in latest cycle (display gate ≥5)
sensitivity_robust      True if direction stable when top-1 facility excluded
skip_reason             facility_count_below_threshold, petrochemical_corridor, cycles_below_threshold, or null
source_attribution      Required attribution string for any public render
petrochemical_corridor  True for 4 corridor counties on the methodology-review list
aqi_latest_year         Latest reporting year included in window
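
For readers consuming the dataset programmatically, the schema above can be sketched as a typed record with a few sanity checks. This is an illustration only: the Python types, the nullability of airdata_pct_change, and the names CountyTrendRecord and schema_problems are assumptions, not part of the published package.

```python
from dataclasses import dataclass
from typing import List, Optional

# Allowed values, taken from the field descriptions above.
CHANGE_CLASSES = {"decrease", "increase", "stable", "insufficient_data"}
SKIP_REASONS = {
    "facility_count_below_threshold",
    "petrochemical_corridor",
    "cycles_below_threshold",
    None,
}


@dataclass(frozen=True)
class CountyTrendRecord:
    """Hypothetical typed view of one county record (types are assumed)."""
    county_fips: str
    county_name: str
    state: str
    airdata_change_class: str
    airdata_pct_change: Optional[float]  # assumed nullable
    cycles_used: int
    facility_count: int
    sensitivity_robust: bool
    skip_reason: Optional[str]
    source_attribution: str
    petrochemical_corridor: bool
    aqi_latest_year: int


def schema_problems(rec: CountyTrendRecord) -> List[str]:
    """Return a list of human-readable schema violations (empty if none)."""
    problems = []
    if not (len(rec.county_fips) == 5 and rec.county_fips.isdigit()):
        problems.append("county_fips must be 5 digits")
    if not (len(rec.state) == 2 and rec.state.isalpha() and rec.state.isupper()):
        problems.append("state must be a 2-letter abbreviation")
    if rec.airdata_change_class not in CHANGE_CLASSES:
        problems.append("unknown airdata_change_class")
    if rec.skip_reason not in SKIP_REASONS:
        problems.append("unknown skip_reason")
    if not rec.source_attribution:
        problems.append("source_attribution is required")
    return problems
```

Keeping county_fips as a string preserves leading zeros, which an integer type would drop.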

Sensitivity gates

A county's airdata_change_class is displayed only when all three of the following are true:

1. Cycle coverage ≥3. A slope between two annual snapshots is not a trend. At least three reporting cycles in the 2020-2024 window are required before any classification other than insufficient_data is assigned.

2. Facility floor ≥5. Rural counties with one to three reporting facilities exhibit single-facility artifacts (a refinery closure can produce a -100% county "trend"). Counties with fewer than 5 reporting facilities in the latest cycle surface as skip_reason="facility_count_below_threshold" with the trend held.

3. Top-1-exclude robustness. A direction is "sensitivity-robust" when it does not flip if the largest facility is excluded from the county rollup. A direction that flips under top-1 exclusion means one facility dominates the county aggregate; in that case we hold display rather than present the result as a county-level pattern.
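
The top-1-exclude check in gate 3 can be sketched as follows. This is a minimal illustration, not the project's build code. It assumes per-facility values are available for the earliest and latest cycle, and that "largest facility" means the largest combined value across those two cycles; the methodology does not define the ranking, so that choice is an assumption. The function names are hypothetical.

```python
from typing import Dict, Tuple

# facility id -> (earliest-cycle value, latest-cycle value)
Facilities = Dict[str, Tuple[float, float]]


def pct_change(earliest: float, latest: float) -> float:
    """Signed percent change between the earliest and latest cycle."""
    if earliest == 0:
        raise ValueError("percent change is undefined for a zero baseline")
    return (latest - earliest) / earliest * 100.0


def direction(pct: float) -> int:
    """-1 for a decrease, +1 for an increase, 0 for no change."""
    return (pct > 0) - (pct < 0)


def rollup(facilities: Facilities) -> Tuple[float, float]:
    """Sum earliest and latest values across all facilities."""
    earliest = sum(e for e, _ in facilities.values())
    latest = sum(l for _, l in facilities.values())
    return earliest, latest


def is_sensitivity_robust(facilities: Facilities) -> bool:
    """True if the county direction survives removing the largest facility."""
    full_dir = direction(pct_change(*rollup(facilities)))
    # Assumption: rank facilities by combined value across both cycles.
    top = max(facilities, key=lambda fid: sum(facilities[fid]))
    rest = {fid: v for fid, v in facilities.items() if fid != top}
    rest_earliest, rest_latest = rollup(rest)
    if not rest or rest_earliest == 0:
        return False  # nothing left to compare against; treat as not robust
    return direction(pct_change(rest_earliest, rest_latest)) == full_dir


# One dominant facility drives the county decrease; the others rose slightly.
dominated = {"A": (100, 20), "B": (10, 12), "C": (10, 12), "D": (10, 12), "E": (10, 12)}
# Every facility declined, so the direction holds without the largest one.
broad = {"A": (100, 50), "B": (10, 8), "C": (10, 8), "D": (10, 8), "E": (10, 8)}
print(is_sensitivity_robust(dominated), is_sensitivity_robust(broad))  # False True
```

In the first example the full rollup falls from 140 to 68, but the remaining four facilities rise from 40 to 48 once facility A is removed, so the direction flips and display would be held.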

A county is classified stable when |airdata_pct_change| < 10%, a tolerance band for year-to-year noise. The signed airdata_pct_change value is preserved in the record even when the class is insufficient_data, so downstream users can apply their own threshold, but only the class is rendered.
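
Taken together, the three gates and the 10% band can be sketched as a single classification function. This is an illustration under assumptions: the order in which gates are checked and the outcome for a county that fails only the robustness gate (here insufficient_data with a null skip_reason) are not specified in this document, and the function name classify is hypothetical.

```python
from typing import Optional, Tuple

MIN_CYCLES = 3               # gate 1
MIN_FACILITIES = 5           # gate 2
STABLE_THRESHOLD_PCT = 10.0  # |pct_change| below this is "stable"


def classify(
    pct_change: float,
    cycles_used: int,
    facility_count: int,
    sensitivity_robust: bool,
) -> Tuple[str, Optional[str]]:
    """Return (airdata_change_class, skip_reason) for one county."""
    # Assumed gate order: cycles, then facility floor, then robustness.
    if cycles_used < MIN_CYCLES:
        return "insufficient_data", "cycles_below_threshold"
    if facility_count < MIN_FACILITIES:
        return "insufficient_data", "facility_count_below_threshold"
    if not sensitivity_robust:
        # Assumption: no dedicated reason code exists for this gate.
        return "insufficient_data", None
    if abs(pct_change) < STABLE_THRESHOLD_PCT:
        return "stable", None
    return ("decrease" if pct_change < 0 else "increase"), None


print(classify(-26.0, 5, 7, True))   # ('decrease', None)
print(classify(-100.0, 5, 2, True))  # ('insufficient_data', 'facility_count_below_threshold')
print(classify(4.0, 5, 7, True))     # ('stable', None)
```

Because the signed percent change stays in the record, a downstream user could call the same function with a different STABLE_THRESHOLD_PCT to apply a stricter or looser band.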

Petrochemical-corridor display hold

Four counties surface as skip_reason="petrochemical_corridor" and render a methodology-review notice instead of a class value:

FIPS   County     State
48201  Harris     TX
22019  Calcasieu  LA
22047  Iberville  LA
54039  Kanawha    WV

These counties have the highest concentration of EPA-regulated petrochemical and chemical-corridor facilities in the country and a history of facility-named litigation. The default county-rollup trend reading risks both misrepresenting facility-specific dynamics and creating unwarranted defamation exposure for named operators. The trend is held pending per-county methodology guidance. The data is preserved in the dataset with the reason code so consumers studying these counties have full source access.

The corridor list is recorded in data/canonical/nei-petrochemical-skip-counties.json and reviewed annually or upon new facility-named litigation precedent. The list is intentionally a configuration file, not a hardcoded set in code — additions or removals are auditable through git history.
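
A configuration-driven hold of this kind could look like the sketch below. The JSON layout shown (a "counties" array of objects with a "fips" key) is an assumption made for illustration; the actual structure of data/canonical/nei-petrochemical-skip-counties.json is not documented here. The function names are hypothetical, and the sample is inlined so the block runs on its own.

```python
import json
from typing import FrozenSet

# Assumed layout only; the real config file may be structured differently.
SAMPLE_CONFIG = """
{
  "counties": [
    {"fips": "48201", "county": "Harris", "state": "TX"},
    {"fips": "22019", "county": "Calcasieu", "state": "LA"},
    {"fips": "22047", "county": "Iberville", "state": "LA"},
    {"fips": "54039", "county": "Kanawha", "state": "WV"}
  ]
}
"""


def load_corridor_fips(config_text: str) -> FrozenSet[str]:
    """Parse the corridor config into an immutable set of FIPS codes."""
    data = json.loads(config_text)
    return frozenset(entry["fips"] for entry in data["counties"])


def apply_corridor_hold(record: dict, corridor: FrozenSet[str]) -> dict:
    """Return a copy of the record with the corridor hold applied if listed."""
    if record["county_fips"] in corridor:
        return {
            **record,
            "petrochemical_corridor": True,
            "skip_reason": "petrochemical_corridor",
        }
    return {**record, "petrochemical_corridor": False}


corridor = load_corridor_fips(SAMPLE_CONFIG)
held = apply_corridor_hold({"county_fips": "48201", "skip_reason": None}, corridor)
print(held["skip_reason"])  # petrochemical_corridor
```

Reading the list from data rather than embedding it in code means a change to the list shows up as a one-line diff in the config file.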

Coverage (snapshot)

The 2026-05-25 release covers 994 US counties with EPA AirData rollups:

Class                             Counties
decrease                          67
increase                          176
stable                            376
insufficient_data (display held)  375
Total                             994

Of the 375 display-held counties, 368 fall below the facility floor and 4 are petrochemical-corridor holds; the remaining 3 are not itemized in this snapshot. Counties absent from the file have no AirData rollup for the window in EPA's published summaries.
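
The snapshot figures can be cross-checked with a few lines of arithmetic. The counts below are copied from this section; the variable names are illustrative.

```python
# Class counts from the 2026-05-25 release table.
CLASS_COUNTS = {
    "decrease": 67,
    "increase": 176,
    "stable": 376,
    "insufficient_data": 375,
}
# Display holds that the snapshot itemizes by reason.
ITEMIZED_HOLDS = {
    "facility_count_below_threshold": 368,
    "petrochemical_corridor": 4,
}

total = sum(CLASS_COUNTS.values())
displayed = total - CLASS_COUNTS["insufficient_data"]
unitemized_holds = CLASS_COUNTS["insufficient_data"] - sum(ITEMIZED_HOLDS.values())

print(total, displayed, unitemized_holds)  # 994 619 3
```

The class counts sum to the stated total of 994, and 619 counties carry a displayed class.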

What the trend supports and does not support

Supported claim categories (federal source first; verbatim attribution; no causal language):

  • A direct restatement of the change classification with full attribution, including facility count and cycle window: "EPA AirData reports a 26% decrease in county-rolled facility emissions for Orange County, NY between 2020 and 2024 (based on 7 reporting facilities)."
  • An explicit display-hold statement when a county does not meet the sensitivity gates: "EPA AirData has fewer than 5 reporting facilities in Pittsylvania County, VA — trend display held."
  • Side-by-side display of a county's AirData trend with other independently-attributed federal data (CDC PLACES modeled health estimate, FHWA NBI bridge condition, etc.), provided the blocks are physically separated and no arithmetic combination is performed between them.
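
The two quoted statement templates above can be sketched as a small renderer. This is an illustration, not the site's rendering code: the function name render_statement and the window_start parameter are assumptions, and the sketch deliberately refuses any case for which this page gives no example wording.

```python
EM_DASH = "\u2014"  # the dash used in the page's display-hold example


def render_statement(rec: dict, window_start: int = 2020) -> str:
    """Build an attributed statement from one county record.

    Note: appending "County" is a simplification; it would be wrong for
    county equivalents such as Louisiana parishes.
    """
    place = f"{rec['county_name']} County, {rec['state']}"
    if rec["skip_reason"] == "facility_count_below_threshold":
        return (
            "EPA AirData has fewer than 5 reporting facilities in "
            f"{place} {EM_DASH} trend display held."
        )
    if rec["skip_reason"] is None and rec["airdata_change_class"] in ("decrease", "increase"):
        pct = abs(rec["airdata_pct_change"])
        return (
            f"EPA AirData reports a {pct:.0f}% {rec['airdata_change_class']} "
            f"in county-rolled facility emissions for {place} between "
            f"{window_start} and {rec['aqi_latest_year']} "
            f"(based on {rec['facility_count']} reporting facilities)."
        )
    # No example wording is given for other cases, so none is invented here.
    raise ValueError("no supported statement template for this record")


orange = {
    "county_name": "Orange",
    "state": "NY",
    "airdata_change_class": "decrease",
    "airdata_pct_change": -26.0,
    "facility_count": 7,
    "aqi_latest_year": 2024,
    "skip_reason": None,
}
print(render_statement(orange))
```

The statement always includes the facility count and cycle window and contains no causal verb, matching the constraints listed for supported claims.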

Unsupported claim categories (filtered by scripts/lib/compliance-rules.js BANNED_PHRASES and BANNED_PATTERNS):

  • Qualitative ambient-air claims. AirData is facility-reported emissions, not ambient air quality. Any rendering that asserts ambient improvement or worsening on the basis of this dataset is filtered.
  • Numeric ambient-air overclaim. "Emissions dropped N%" framed as an ambient-air change is filtered for the same reason.
  • Synthetic indices pairing AirData with health outcomes. Any derived metric or named index joining this dataset to BRFSS/CDC small-area health estimates is filtered. The 4-agent pre-build review concluded such derived joins fall under the FTC v. NCBI (2024) precedent and are not protected by Section 230 when authored by ZipCheckup.
  • Per-facility predictions. Predictive language about specific facilities (closure timeline, lifespan, etc.) is filtered by BANNED_PATTERNS regex per the project's PE-licensure and per-asset defamation gates.
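
A phrase-and-pattern filter of the kind described above might be structured as in the sketch below. It is written in Python for illustration, and every phrase and regular expression in it is a hypothetical stand-in: the project's actual BANNED_PHRASES and BANNED_PATTERNS live in scripts/lib/compliance-rules.js and are not reproduced here.

```python
import re
from typing import List

# Hypothetical examples only; not the project's real lists.
BANNED_PHRASES = (
    "cleaner air",
    "air quality improved",
    "air quality worsened",
)
BANNED_PATTERNS = (
    # Per-facility predictive language, e.g. "expected to close".
    re.compile(r"\b(?:will|expected to|likely to)\s+(?:close|shut\s+down)\b", re.IGNORECASE),
    # Numeric ambient-air framing, e.g. "air is 26% cleaner".
    re.compile(r"\bair\s+(?:is|got|became)\s+\d+(?:\.\d+)?%\s+(?:cleaner|dirtier)\b", re.IGNORECASE),
)


def violations(text: str) -> List[str]:
    """Return the banned phrases and pattern sources that match the text."""
    lowered = text.lower()
    hits = [phrase for phrase in BANNED_PHRASES if phrase in lowered]
    hits += [pat.pattern for pat in BANNED_PATTERNS if pat.search(text)]
    return hits


print(violations("Air quality improved in Orange County."))  # ['air quality improved']
print(len(violations("The plant is expected to close by 2030.")))  # 1
```

A render would pass only when violations returns an empty list; returning the matched rules rather than a boolean makes a rejected string easier to audit.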

Update cadence

EPA AirData publishes annual AQI summaries on a delayed cycle — typically the prior calendar year's file is released in spring of the following year. The dataset is rebuilt and republished (npm + PyPI + Zenodo new version DOI) when a new annual file lands. The concept DOI 10.5281/zenodo.20382474 is stable across versions; the version DOI changes per release.

Citation

Akulov, A. (2026). U.S. County AirData 5-Year Emission Trends [Data set].
ZipCheckup. https://doi.org/10.5281/zenodo.20382474

Pre-build review

This dataset's display methodology underwent a 4-agent pre-build review before publication: FTC compliance (FTC v. NCBI net-impression doctrine), defamation/stigmatization (per-asset defamation exposure for facility-named counties), statistical methodology (sensitivity gates, FDR considerations on multi-county comparisons), and feeds feasibility (cycle availability across the 2020-2024 window). The review pre-empted publication of two related features that were withdrawn rather than deferred: a derived emission-and-health joint index (FTC and Section 230 exposure for ZipCheckup-authored synthetic indices) and a per-bridge years-to-failure countdown (PE-licensure exposure and NBI ordinal-scale extrapolation invalidity). Both are documented as "SKIP-PERMANENT" in the project methodology to signal that the abstention is principled, not pending future work.

Source code

License

CC BY 4.0 — free to use with attribution to ZipCheckup.