If your application depends on HTS tariff data, you need to know when it changes. Rates change. Codes are added, split, and removed. Chapter 99 — where Section 232 and Section 301 additional duties live — is updated whenever trade policy moves. In the current environment, that is frequently.
This guide covers the practical approaches to building HTS change detection: what data is available, how to diff it reliably, what the failure modes are, and what a production-ready implementation looks like.
Why There Is No Simple Answer
The first thing engineers usually look for is a changelog or event feed from USITC. Something like: "here are the codes that changed since last Tuesday." That does not exist. USITC publishes full schedule updates — the entire dataset — and it is your responsibility to determine what changed between the previous version and the new one.
This means change detection requires you to maintain state: a stored copy of the last known good dataset that you diff against each new fetch. Without that stored baseline, you have no reference point.
Step 1: Fetch the Current Dataset
USITC exposes the HTS schedule through their API. A basic fetch of a single chapter looks like this:
```python
import urllib.request
import json

def fetch_chapter(chapter: int) -> list:
    url = f"https://hts.usitc.gov/reststop/api/details/sectionChapter/{chapter:02d}"
    req = urllib.request.Request(url)
    req.add_header("User-Agent", "curl/7.88.1")  # required — Python UA is blocked
    with urllib.request.urlopen(req, timeout=30) as r:
        return json.loads(r.read())["HTSData"]
```
Two things to note. First, the User-Agent header is not optional — Cloudflare blocks Python's default user agent with error 1010. Second, the response structure is {"HTSData": [...]}, not a bare array. Failing to account for this is a common first-pass mistake.
To fetch the full schedule you loop over all 99 chapters. Some chapters are reserved and return empty or minimal data — that is expected, not an error.
```python
def fetch_full_schedule() -> list:
    all_records = []
    for chapter in range(1, 100):
        try:
            records = fetch_chapter(chapter)
            all_records.extend(records)
        except Exception as e:
            print(f"Chapter {chapter:02d} failed: {e}")
    return all_records
```
Safety check — do not skip this: A partial USITC response can look like a valid response. If the fetch aborts mid-schedule, you may get 15,000 records instead of 32,000 and have no obvious indication something went wrong. Always validate record count before doing anything with the data. If the result is below 30,000 records, treat it as a failed fetch and abort.
```python
MIN_RECORDS = 30000

def fetch_validated_schedule() -> list:
    records = fetch_full_schedule()
    if len(records) < MIN_RECORDS:
        raise ValueError(f"Fetch returned only {len(records)} records — aborting")
    return records
```
Step 2: Build a Comparable Representation
Raw USITC records contain fields that change on every fetch regardless of whether the tariff data changed — things like internal timestamps or formatting artifacts. Before diffing, normalize each record down to only the fields that are meaningful for change detection.
```python
def normalize_record(r: dict) -> dict:
    return {
        "htsno": r.get("htsno", "").strip(),
        "description": r.get("description", "").strip(),
        "general": r.get("general", "").strip(),   # MFN rate
        "special": r.get("special", "").strip(),   # special program rates
        "other": r.get("other", "").strip(),       # Column 2 rate
        "units": r.get("units", "").strip(),
        "indent": str(r.get("indent", "")),
    }

def build_index(records: list) -> dict:
    # keyed by htsno, skipping records with no code;
    # strip the key too, so "0101 " and "0101" index identically
    return {
        r["htsno"].strip(): normalize_record(r)
        for r in records
        if r.get("htsno", "").strip()
    }
```
Step 3: Diff the Two Datasets
With two indexed snapshots — the stored baseline and the new fetch — the diff is straightforward:
```python
from datetime import datetime, timezone

def diff_schedules(old: dict, new: dict) -> dict:
    added = []
    removed = []
    modified = []
    all_codes = set(old) | set(new)
    for code in all_codes:
        if code not in old:
            added.append({"htsno": code, "new": new[code]})
        elif code not in new:
            removed.append({"htsno": code, "old": old[code]})
        elif old[code] != new[code]:
            modified.append({
                "htsno": code,
                "old": old[code],
                "new": new[code],
                "fields_changed": [
                    k for k in old[code]
                    if old[code][k] != new[code].get(k)
                ],
            })
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "added": added,
        "removed": removed,
        "modified": modified,
        "total_changes": len(added) + len(removed) + len(modified),
    }
```
The fields_changed list is worth including: it tells you immediately whether a rate field moved or only a description changed, and the two have different downstream implications for most applications.
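Downstream consumers usually want that distinction made explicit. A sketch of a classifier over the diff's modified entries — classify_changes and the bucket names are illustrative, not part of the pipeline above — which also separates Chapter 99 entries, since those require different interpretation:

```python
# Fields whose change means a duty rate actually moved.
RATE_FIELDS = {"general", "special", "other"}

def classify_changes(diff: dict) -> dict:
    """Split a diff's "modified" entries into three buckets by impact."""
    rate_changes, text_only, chapter_99 = [], [], []
    for m in diff["modified"]:
        if m["htsno"].startswith("99"):
            chapter_99.append(m)      # Chapter 99: references other chapters
        elif RATE_FIELDS & set(m["fields_changed"]):
            rate_changes.append(m)    # a duty rate moved
        else:
            text_only.append(m)       # description/units/indent only
    return {"rate": rate_changes, "text": text_only, "ch99": chapter_99}
```

Alerting can then treat rate changes as high priority and batch the text-only changes into a daily summary.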
Step 4: Persist the Baseline
The change log and the live dataset need to be updated atomically. Write to a temp file first, then rename — a rename on Linux is atomic at the filesystem level, which means a crash mid-write never leaves you with a corrupt live file.
```python
import os
import json

LIVE_FILE = "/data/hts_full.json"
CHANGE_LOG = "/data/hts_changes.json"

def _atomic_write(path: str, obj, **dump_kwargs):
    # write to a temp file, then rename: the rename is atomic on Linux
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(obj, f, **dump_kwargs)
    os.rename(tmp, path)

def apply_update(new_records: list, changes: dict):
    # Write new dataset atomically
    _atomic_write(LIVE_FILE, new_records)
    # Append to change log only if something changed — same temp-and-rename
    # pattern, so a crash mid-write never corrupts the log either
    if changes["total_changes"] > 0:
        log = []
        if os.path.exists(CHANGE_LOG):
            with open(CHANGE_LOG) as f:
                log = json.load(f)
        log.append(changes)
        _atomic_write(CHANGE_LOG, log, indent=2)
```
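Tying Steps 1 through 4 together, the nightly run is a short orchestration function. A sketch using dependency injection — run_nightly and its parameter names are illustrative; in production you would pass fetch_validated_schedule, build_index, diff_schedules, and apply_update from the steps above:

```python
def run_nightly(fetch, load_baseline, build_index, diff, apply):
    """One nightly run: fetch, index, diff against the baseline, persist.

    Each argument is one of the functions from Steps 1-4 (or a stub in tests).
    """
    new_records = fetch()                     # Step 1: validated fetch
    old_index = build_index(load_baseline())  # stored last-known-good copy
    new_index = build_index(new_records)      # Step 2: normalize + index
    changes = diff(old_index, new_index)      # Step 3: structural diff
    apply(new_records, changes)               # Step 4: atomic persist
    return changes
```

Keeping the steps injectable also makes the pipeline testable without hitting the USITC API.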
Step 5: Schedule It
The pipeline should run nightly. USITC typically publishes updates during business hours US Eastern, so a 02:00 UTC run catches same-day changes by the following morning. On Linux with systemd:
```ini
# /etc/systemd/system/hts-update.timer
[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true
```
Set Persistent=true so that if the server was down at 02:00, the job runs immediately on next boot rather than skipping until the following night.
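The timer needs a matching service unit to run; by default, a systemd timer activates the .service of the same name. A minimal sketch — the script path and description are assumptions, adjust to your deployment:

```ini
# /etc/systemd/system/hts-update.service
[Unit]
Description=Nightly HTS schedule fetch and diff

[Service]
Type=oneshot
ExecStart=/usr/local/bin/hts-update.py
```

Enable the pair with systemctl enable --now hts-update.timer.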
The Failure Modes to Plan For
A pipeline that runs in production long enough will encounter all of these:
- USITC downtime. The API goes down occasionally. Your pipeline needs a retry with backoff and should not overwrite the live dataset on a failed fetch.
- Schema changes. USITC has changed field names across schedule versions. The normalization step above helps, but you need alerting when a fetch returns records where expected fields are missing across more than a handful of codes.
- Chapter 99 structural changes. Chapter 99 entries reference other chapters rather than describing goods directly. Changes here require different interpretation than changes in product chapters — your diff output should flag Chapter 99 changes separately.
- False positives from whitespace or encoding. USITC descriptions occasionally have trailing spaces, encoding artifacts, or punctuation inconsistencies that vary between fetches without representing real tariff changes. The .strip() calls in the normalization step catch most of these, but watch your change log for spurious high-volume description changes on the first few runs.
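For the downtime case, a retry wrapper with exponential backoff is enough for transient failures, and re-raising after the last attempt keeps the old baseline untouched. A sketch — fetch_with_retry and the delay values are illustrative; `fetch` would be fetch_validated_schedule from Step 1:

```python
import time
import urllib.error

def fetch_with_retry(fetch, attempts: int = 4, base_delay: float = 2.0):
    """Call `fetch` with exponential backoff (2s, 4s, 8s between tries).

    Re-raises after the final attempt so the caller fails loudly and
    the live dataset is never overwritten with a bad fetch.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except (urllib.error.URLError, ValueError):
            if attempt == attempts - 1:
                raise  # out of retries: fail loudly, keep the old baseline
            time.sleep(base_delay * (2 ** attempt))
```

ValueError is caught here because Step 1's record-count validation raises it for partial fetches; those are worth retrying too.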
What This Looks Like in Production
A mature implementation of this pipeline adds a few more layers on top of the core diff: a dead man's switch that alerts if the nightly job has not run in 25 hours, a separate validation pass on Chapter 99 specifically, logging of each run with record counts and timing, and — for applications that need to notify downstream systems — a webhook delivery mechanism that fires when total_changes is greater than zero.
The webhook layer is where the architecture gets more involved. You need a delivery queue, retry logic for failed deliveries, and per-customer endpoint management. It is buildable, but it is not a small addition to the pipeline described above.
How long does this take to build? The core pipeline described here — fetch, validate, normalize, diff, persist — is a focused weekend project for an experienced engineer. Getting it to production reliability, with proper error handling, alerting, schema resilience, and Chapter 99 coverage, is closer to two weeks. Adding webhook delivery adds another week. None of this is the hard part of building trade compliance software, but it is time you are spending on infrastructure rather than your product.
TradeFacts.io runs this pipeline for you. The /changes endpoint returns the full diff history — added codes, removed codes, modified codes with field-level detail, all timestamped. Tier 2 adds webhook delivery to your endpoint when changes are detected, so you don't poll at all. 30-day free trial, no credit card required.
Skip the pipeline. Use the API.
30-day free trial, no credit card required. Change detection, full diff history, and webhook delivery included.
Request API Access