urlscan.io Blog


Introducing Data Dumps: Bulk Download of urlscan Scan Data

We are excited to announce the launch of Data Dumps, a new feature that allows customers to bulk-download scan data from urlscan.io without making individual API calls for each result.

Data Dumps provide pre-built, gzip-compressed JSONL files and tar archives containing the results of all public and unlisted scans (private scans are not included). Files are organised by time window and data type, and are available at per-minute, per-hour, and per-day granularity — so you can pick exactly the slice of data you need.

Why Data Dumps?

Until now, customers who needed large volumes of scan result data had to call the Result API individually for every scan UUID — often resulting in millions of API calls per day. Data Dumps change this fundamentally:

  • Drastically reduced quota usage — download an entire day’s worth of scan results in a single request instead of hundreds of thousands of individual API calls.
  • Higher throughput — retrieve large datasets as pre-built, compressed files over high-bandwidth connections rather than through rate-limited API endpoints.
  • Simpler pipelines — integrate scan data into your own data warehouse or processing pipeline by periodically fetching a single file per time window.
  • Multiple data types — dumps are available for API results, search results, screenshots, and DOM snapshots.

What’s included?

The available data types are:

Type         Format         Description
api          .gz (JSONL)    Full scan result (equivalent to the Result API)
search       .gz (JSONL)    Search API result metadata
screenshot   .tar.gz        Screenshot images
dom          .tar.gz        DOM snapshots

Only public and unlisted scans are included in data dumps. Private scans are never exported. Dumps are available for a rolling 7-day window, so you can backfill up to 7 days of data at any time.
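As a quick sketch of what the rolling window means in practice, the oldest day that can still be backfilled is simply seven days before today, formatted in the YYYYMMDD style used by dump paths:

```python
from datetime import date, timedelta

# Dumps cover a rolling 7-day window, so the oldest day that can
# still be backfilled is seven days before today.
oldest = date.today() - timedelta(days=7)
oldest_path_date = oldest.strftime("%Y%m%d")  # YYYYMMDD, as used in dump paths
print(oldest_path_date)
```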

Availability

Data Dumps are available today for customers on the Ultimate and Enterprise plans. You can browse and download dump files directly from the urlscan Pro Data Dumps page, which also shows the available time windows, file sizes, and timestamps. The page also provides the corresponding API URL for each file so you can integrate downloads into your own tooling.

Using Data Dumps with urlscan-cli

Data Dump support is included in urlscan-cli v0.0.5 and later. The urlscan pro datadump command provides two sub-commands: list and download.

The path format is <time-window>/<file-type>/<date>, where:

  • time-window is days, hours, or minutes
  • file-type is api, search, screenshot, or dom
  • date is an optional date in YYYYMMDD format
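The path scheme is simple enough to build programmatically. A minimal sketch (the dump_path helper is ours for illustration, not part of urlscan-python):

```python
# Hypothetical helper that builds a dump path of the form
# <time-window>/<file-type>[/<date>].
def dump_path(window, file_type, date=None):
    if window not in ("days", "hours", "minutes"):
        raise ValueError(f"unknown time window: {window}")
    if file_type not in ("api", "search", "screenshot", "dom"):
        raise ValueError(f"unknown file type: {file_type}")
    parts = [window, file_type]
    if date is not None:
        parts.append(date)  # YYYYMMDD
    return "/".join(parts)

print(dump_path("hours", "api", "20260301"))  # hours/api/20260301
```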

List available dump files:

# List all available daily API dumps
urlscan pro datadump list days/api

# List hourly API dumps for a specific day
urlscan pro datadump list hours/api/20260301

Download a specific file:

# Download a daily API dump
urlscan pro datadump download days/api/20260301.gz

# Download a specific hourly dump
urlscan pro datadump download hours/api/20260301/20260301-14.gz --extract

Use --follow to continuously sync all available files (up to the last 7 days). The --follow flag keeps track of which files have already been downloaded, so it is safe to run periodically as a cron job — only new files will be fetched:

# Download all available hourly API dumps (last 7 days)
urlscan pro datadump download hours/api/ --follow

# Download all hourly DOM dumps for a specific day
urlscan pro datadump download hours/dom/20260301/ --follow

Files can be saved to a specific directory with --directory-prefix / -P and automatically extracted with --extract / -x.
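Because --follow skips files that have already been downloaded, a simple cron entry is enough to keep a local mirror current. For example (schedule and target directory are illustrative):

```
# Illustrative crontab entry: fetch new hourly API dumps once an hour,
# saving them under /data/urlscan and extracting them on download.
5 * * * * urlscan pro datadump download hours/api/ --follow -P /data/urlscan -x
```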

Full CLI reference: urlscan pro datadump

Using Data Dumps with urlscan-python

Data Dump support is included in urlscan-python v0.0.2 and later via the Pro client’s datadump attribute.

import os
from urlscan import Pro
from urlscan.utils import extract

with Pro("<your_api_key>") as pro:
    # List hourly API dump files for a specific day
    res = pro.datadump.get_list("hours/api/20260301/")

    # Download and extract each file
    for f in res["files"]:
        path = f["path"]
        basename = os.path.basename(path)

        with open(basename, "wb") as file:
            pro.datadump.download_file(path, file=file)

        extract(basename, "/tmp")

The get_list method accepts the same <time-window>/<file-type>/<date> path format as the CLI. The extract utility from urlscan.utils handles decompression of the downloaded .gz files.
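Each line of a downloaded api or search dump is a single JSON document, so the files can be processed with only the standard library. A minimal sketch, assuming a previously downloaded daily API dump (the field access shown in the comment is illustrative):

```python
import gzip
import json

# Iterate over the scan results in a gzip-compressed JSONL dump file.
# Each non-empty line of the decompressed stream is one JSON document.
def iter_results(path):
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)

# Example usage (path from a previous download):
# for result in iter_results("20260301.gz"):
#     print(result.get("task", {}).get("url"))
```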

Getting Started

Log in to urlscan Pro to explore the available dumps and get your API key. If you have any questions or feedback, please reach out to support@urlscan.io.

More on urlscan Pro

If you want to learn more about the urlscan Pro platform and how it might be valuable for your organization, feel free to reach out to us! We offer free trials with no strings attached, and we would be happy to give you a passionate demo of what our platform can do for you. Reach out to us at sales@urlscan.io.