Introducing Data Dumps: Bulk Download of urlscan Scan Data
We are excited to announce the launch of Data Dumps, a new feature that allows customers to bulk-download scan data from urlscan.io without making individual API calls for each result.
Data Dumps provide pre-built, gzip-compressed JSONL files and tar archives containing the results of all public and unlisted scans (private scans are not included). Files are organised by time window and data type, and are available at per-minute, per-hour, and per-day granularity — so you can pick exactly the slice of data you need.
Why Data Dumps?
Until now, customers who needed large volumes of scan result data had to call the Result API individually for every scan UUID — often resulting in millions of API calls per day. Data Dumps change this fundamentally:
- Drastically reduced quota usage — download an entire day’s worth of scan results in a single request instead of hundreds of thousands of individual API calls.
- Higher throughput — retrieve large datasets as pre-built, compressed files over high-bandwidth connections rather than through rate-limited API endpoints.
- Simpler pipelines — integrate scan data into your own data warehouse or processing pipeline by periodically fetching a single file per time window.
- Multiple data types — dumps are available for API results, search results, screenshots, and DOM snapshots.
What’s included?
The available data types are:
| Type | Format | Description |
|---|---|---|
| `api` | `.gz` (JSONL) | Full scan result (equivalent to the Result API) |
| `search` | `.gz` (JSONL) | Search API result metadata |
| `screenshot` | `.tar.gz` | Screenshot images |
| `dom` | `.tar.gz` | DOM snapshots |
Only public and unlisted scans are included in data dumps. Private scans are never exported. Dumps are available for a rolling 7-day window, so you can backfill up to 7 days of data at any time.
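Because dumps are retained for a rolling 7-day window, a backfill job only ever needs to enumerate the daily path for each of the last seven days. A minimal sketch, assuming the daily file naming shown in the CLI examples later in this post (`days/<file-type>/YYYYMMDD.gz`):

```python
from datetime import date, timedelta

def daily_dump_paths(file_type: str, today: date, days: int = 7) -> list[str]:
    """Build the daily dump paths covering the rolling 7-day retention window."""
    return [
        f"days/{file_type}/{(today - timedelta(days=i)):%Y%m%d}.gz"
        for i in range(1, days + 1)  # yesterday back to 7 days ago
    ]

# Example: the API dump paths to backfill as of 2026-03-08
paths = daily_dump_paths("api", date(2026, 3, 8))
```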
Availability
Data Dumps are available today for customers on the Ultimate and Enterprise plans. You can browse and download dump files directly from the urlscan Pro Data Dumps page, which also shows the available time windows, file sizes, and timestamps. The page also provides the corresponding API URL for each file so you can integrate downloads into your own tooling.
Using Data Dumps with urlscan-cli
Data Dump support is included in urlscan-cli v0.0.5 and later. The `urlscan pro datadump` command provides two sub-commands: `list` and `download`.

The path format is `<time-window>/<file-type>/<date>`, where:

- `time-window` is `days`, `hours`, or `minutes`
- `file-type` is `api`, `search`, `screenshot`, or `dom`
- `date` is an optional date in `YYYYMMDD` format
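This path grammar is small enough to check client-side before issuing a request. A sketch of a validator, with the component names taken directly from the list above:

```python
import re

TIME_WINDOWS = {"days", "hours", "minutes"}
FILE_TYPES = {"api", "search", "screenshot", "dom"}

def is_valid_datadump_path(path: str) -> bool:
    """Check a <time-window>/<file-type>/<date> listing path against the documented grammar."""
    parts = path.strip("/").split("/")
    if len(parts) < 2 or len(parts) > 3:
        return False
    window, file_type, *rest = parts
    if window not in TIME_WINDOWS or file_type not in FILE_TYPES:
        return False
    # The trailing date component is optional and uses YYYYMMDD
    return not rest or bool(re.fullmatch(r"\d{8}", rest[0]))
```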
List available dump files:

```shell
# List all available daily API dumps
urlscan pro datadump list days/api

# List hourly API dumps for a specific day
urlscan pro datadump list hours/api/20260301
```
Download a specific file:

```shell
# Download a daily API dump
urlscan pro datadump download days/api/20260301.gz

# Download a specific hourly dump and extract it
urlscan pro datadump download hours/api/20260301/20260301-14.gz --extract
```
Use `--follow` to continuously sync all available files (up to the last 7 days). The `--follow` flag keeps track of which files have already been downloaded, so it is safe to run periodically as a cron job; only new files will be fetched:

```shell
# Download all available hourly API dumps (last 7 days)
urlscan pro datadump download hours/api/ --follow

# Download all hourly DOM dumps for a specific day
urlscan pro datadump download hours/dom/20260301/ --follow
```
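Since `--follow` only fetches files it has not seen before, a periodic cron entry is enough to keep a local mirror current. A hypothetical crontab line (the schedule and target directory are illustrative, not a recommendation):

```shell
# Sync new hourly API dumps into /data/urlscan every 15 minutes
*/15 * * * * urlscan pro datadump download hours/api/ --follow -P /data/urlscan
```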
Files can be saved to a specific directory with `--directory-prefix` / `-P` and automatically extracted with `--extract` / `-x`.

Full CLI reference: `urlscan pro datadump`
Using Data Dumps with urlscan-python
Data Dump support is included in urlscan-python v0.0.2 and later via the Pro client’s `datadump` attribute.
```python
import os

from urlscan import Pro
from urlscan.utils import extract

with Pro("<your_api_key>") as pro:
    # List hourly API dump files for a specific day
    res = pro.datadump.get_list("hours/api/20260301/")

    # Download and extract each file
    for f in res["files"]:
        path = f["path"]
        basename = os.path.basename(path)
        with open(basename, "wb") as file:
            pro.datadump.download_file(path, file=file)
        extract(basename, "/tmp")
```

The `get_list` method accepts the same `<time-window>/<file-type>/<date>` path format as the CLI. The `extract` utility from `urlscan.utils` handles decompression of the downloaded `.gz` files.
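Once an `api` or `search` dump has been downloaded, each line of the file is one JSON result. A minimal sketch for iterating over it with only the standard library; the example file path and field names in the trailing comment are assumptions, not a documented schema:

```python
import gzip
import json
from typing import Iterator

def iter_results(dump_path: str) -> Iterator[dict]:
    """Yield one parsed result per JSONL line in a .gz dump file."""
    with gzip.open(dump_path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines defensively
                yield json.loads(line)

# Example usage (field names are illustrative):
# for result in iter_results("20260301.gz"):
#     print(result.get("task", {}).get("url"))
```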
Getting Started
Log in to urlscan Pro to explore the available dumps and get your API key. If you have any questions or feedback, please reach out to support@urlscan.io.
More on urlscan Pro
If you want to learn more about the urlscan Pro platform and how it might be valuable for your organization, feel free to reach out to us! We offer free trials with no strings attached, and we would be happy to give you a passionate demo of what our platform can do for you. Reach out to us at sales@urlscan.io.