Our API allows you to submit URLs for scanning and retrieve the results once the scan has finished. Furthermore, you can use the API to search existing scans by attributes such as domains, IPs, Autonomous System (AS) numbers, hashes, etc. To use the API, you should create a user account, attach an API key and supply it when calling the API. Unauthenticated users only receive minor quotas for API calls.

Scans on our platform have one of three visibility levels; make sure to use the appropriate level for your application:

Public: Scan is visible on the frontpage and in the public search results and info pages.
Unlisted: Scan is not visible on the public page or in search results, but is visible to vetted security researchers and security companies in our urlscan Pro platform. Use this if you want to submit malicious websites but are concerned that they might contain PII or non-public information.
Private: Scan is only visible to you in your personalised search or if you share the scan ID with third parties. Scans will be deleted from our system after a certain retention period. Use this if you don't want anyone else to see the URLs you submitted.

To get started with our API, check out one of the existing tools and integrations for urlscan.io


API Use Best Practices

These are some general pieces of advice we have collected over the years. Please stick to them; it will make our life a lot easier!

  • DO NOT attempt to mirror or scrape our data wholesale. Please work with us if you have specific requirements.
  • TAKE CARE to remove PII from URLs or submit these scans as Unlisted, e.g. when there is an email address in the URL.
  • ATTENTION Certain JSON properties in API responses might occasionally be missing; make sure you handle this gracefully.
  • Use your API-Key for all API requests (submit, search, retrieve), otherwise you're subject to quotas for unauthenticated users.
  • Any API endpoint not documented on this page is not guaranteed to be stable or even be available in the future.
  • Make sure to follow HTTP redirects (HTTP 301 and HTTP 302) sent by urlscan.io.
  • Use exponential backoffs and limit concurrency for all types of requests. Respect HTTP 429 response codes!
  • Existing scans can be deleted at any time, even right after they were found in the search API. Make sure to handle this case.
  • Use a work queue with backoffs and retries for API actions such as scans, results, and DOM / response downloads.
  • Build a way to deduplicate searches and URL submissions on your end.
  • Consider using out-of-band mechanisms to determine that the URL you want to submit will actually deliver content.
  • Consider searching for a domain / URL before submitting it again.
  • Search: Combine search-terms into one query and limit it by date if possible, e.g. if you query on an interval.
  • Integrations: Use a custom HTTP user-agent string for your library/integration. Include a software version if applicable.
  • Integrations: Expose HTTP status codes and error messages to your users.
  • Integrations: Expect keys to be added to any JSON response object at any point in time, handle gracefully.
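Several of the points above (exponential backoff, respecting HTTP 429, retries) can be combined into a small helper. This is only a sketch: the retry count and delay values are arbitrary choices for illustration, not values prescribed by urlscan.io.

```python
import time
import requests

def backoff_delays(retries=5, base=2.0, cap=60.0):
    """Exponential backoff schedule in seconds: 2, 4, 8, ... capped at `cap`."""
    return [min(base * (2 ** i), cap) for i in range(retries)]

def request_with_backoff(method, url, **kwargs):
    """Issue a request, retrying with growing pauses on HTTP 429."""
    resp = None
    for delay in backoff_delays():
        resp = requests.request(method, url, **kwargs)
        if resp.status_code != 429:
            return resp
        time.sleep(delay)  # respect the rate limit before retrying
    return resp  # still rate-limited after all retries; caller decides what to do
```

The same wrapper can be reused for submissions, result retrieval and searches, which keeps the backoff policy in one place.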

Quotas & Rate Limiting

  • Some actions on urlscan.io are subject to quotas and rate-limits, regardless of whether they are performed in the UI or via the API.
  • There are separate limits per minute, per hour and per day for each action. Check your personal quotas for details.
  • Only successful requests count against your quota, i.e. requests which return an HTTP 200 status code.
  • We use a fixed-window approach to rate-limit requests, with resets at the full minute, hour and midnight UTC.
  • If you exceed a rate-limit for an action, the API will respond with an HTTP 429 error code for additional requests against that action.
  • You can query your current limits and used quota like this:
    curl -H "Content-Type: application/json" -H "API-Key: $apikey" "https://urlscan.io/user/quotas/" 

The API returns X-Rate-Limit HTTP headers on each request to a rate-limited resource. The values only apply to the action of that API request, i.e. if you exceeded your quota for private scans you might still have available quota to submit unlisted scans or perform a search request. The limit returned is always the next one to be exceeded in absolute numbers, so if your per-hour quota still has 1000 requests remaining but your per-day quota only has 500 requests left, you will receive the per-day quota. Make sure to respect the rate-limit headers as returned by every request.

X-Rate-Limit-Scope: ip-address
X-Rate-Limit-Action: search
X-Rate-Limit-Window: minute
X-Rate-Limit-Limit: 30
X-Rate-Limit-Remaining: 24
X-Rate-Limit-Reset: 2020-05-18T20:19:00.000Z
X-Rate-Limit-Reset-After: 17

X-Rate-Limit-Scope: Either user (with cookie or API-Key header) or ip-address for unauthenticated requests.
X-Rate-Limit-Action: Which API action the rate-limit refers to, e.g. search or public.
X-Rate-Limit-Window: Rate window with the fewest remaining calls, either minute, hour, or day.
X-Rate-Limit-Limit: Your rate-limit for this action and window.
X-Rate-Limit-Remaining: Remaining calls for this action and window (not counting the current request).
X-Rate-Limit-Reset: ISO-8601 timestamp of when the rate-limit resets.
X-Rate-Limit-Reset-After: Seconds remaining until the rate-limit resets.
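As a sketch, a client could use these headers (names as documented above; `requests` exposes them via `response.headers`) to decide how long to pause before the next call to the same action:

```python
def seconds_to_wait(headers):
    """Return how long to sleep before the next call to this action:
    0 while quota remains, otherwise the seconds until the window resets."""
    remaining = int(headers.get("X-Rate-Limit-Remaining", "1"))
    reset_after = int(headers.get("X-Rate-Limit-Reset-After", "0"))
    return reset_after if remaining <= 0 else 0
```

With the example headers above (Remaining: 24), this returns 0; once Remaining reaches 0, it returns the Reset-After value (17 in the example).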

Submission API

The submission API allows you to submit a URL to be scanned and set some options for the scan.

curl -X POST "https://urlscan.io/api/v1/scan/" \
      -H "Content-Type: application/json" \
      -H "API-Key: $apikey" \
      -d "{ \
        \"url\": \"$url\", \"visibility\": \"public\", \
        \"tags\": [\"demotag1\", \"demotag2\"] \
      }"
import requests
import json

headers = {'API-Key': '$apikey', 'Content-Type': 'application/json'}
data = {"url": "https://urlyouwanttoscan.com/path/", "visibility": "public"}
response = requests.post('https://urlscan.io/api/v1/scan/', headers=headers, data=json.dumps(data))
print(response)
print(response.json())
{
  "url": "https://urlscan.io/api/v1/scan/",
  "content_type": "json",
  "method": "post",
  "payload": {
    "url": "https://tines.io/",
    "visibility": "public",
    "tags":[
      "demotag1", "demotag2"
    ]
  },
  "headers": {
    "API-Key": "{% credential urlscan_io %}"
  },
  "expected_update_period_in_days": "1"
}
If the visibility parameter is omitted, scans will default to your configured default visibility.
The response to the API call will give you the scan ID and API endpoint for the scan; you can use it to retrieve the result after waiting for a short while. Until the scan is finished, the URL will respond with an HTTP 404 status code.

Other options that can be set in the POST data JSON object:

  • customagent: Override User-Agent for this scan
  • referer: Override HTTP referer for this scan
  • visibility: One of public, unlisted, private. Defaults to your configured default visibility.
  • tags: User-defined tags to annotate this scan, e.g. "phishing" or "malicious". Limited to 10 tags.
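Putting those options together, a submission with overrides might look like this in Python. The URL, user-agent and referer values are made-up placeholders:

```python
import json
import requests

payload = {
    "url": "https://example.com/some/path",   # placeholder URL
    "visibility": "unlisted",                 # public, unlisted or private
    "customagent": "MyScanner/1.0",           # override User-Agent for this scan
    "referer": "https://example.org/",        # override HTTP referer for this scan
    "tags": ["demotag1", "demotag2"],         # up to 10 user-defined tags
}

def submit(payload, api_key):
    headers = {"API-Key": api_key, "Content-Type": "application/json"}
    resp = requests.post("https://urlscan.io/api/v1/scan/",
                         headers=headers, data=json.dumps(payload))
    return resp.json()
```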

If you have a file list containing one URL per line, you can use the following code-snippet to submit all of them:

cat list|tr -d "\r"|while read url; do
  curl -X POST "https://urlscan.io/api/v1/scan/" \
      -H "Content-Type: application/json" \
      -H "API-Key: $apikey" \
      -d "{\"url\": \"$url\", \"visibility\": \"public\"}"
  sleep 2;
done


Result API

Using the Scan ID received from the Submission API, you can use the Result API to poll for the scan. The most efficient approach is to wait at least 10 seconds before starting to poll, and then poll at 2-second intervals with an eventual upper timeout in case the scan does not return.
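That strategy (initial wait, short polling interval, upper timeout) might be sketched like this; the 60-second timeout is an arbitrary choice:

```python
import time
import requests

def result_url(uuid):
    return f"https://urlscan.io/api/v1/result/{uuid}/"

def wait_for_result(uuid, api_key, timeout=60):
    """Poll the Result API: wait 10 seconds first, then retry every
    2 seconds until the scan is finished (404 means "not ready yet")."""
    headers = {"API-Key": api_key}
    time.sleep(10)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = requests.get(result_url(uuid), headers=headers)
        if resp.status_code == 200:
            return resp.json()
        time.sleep(2)
    raise TimeoutError(f"scan {uuid} not finished after {timeout}s")
```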

curl https://urlscan.io/api/v1/result/$uuid/
{
  "url": "https://urlscan.io/api/v1/result/{{.uuid}}/",
  "content_type": "json",
  "method": "get",
  "expected_update_period_in_days": "1"
}
Until a scan is finished, this URL will respond with an HTTP 404 status code.

Once the scan is in our database, the URL will return a JSON object with these top-level properties:

task
Information about the submission: Time, method, options, links to screenshot/DOM
page
High-level information about the page: Geolocation, IP, PTR
lists
Lists of domains, IPs, URLs, ASNs, servers, hashes
data
All of the requests/responses, links, cookies, messages
meta
Processor output: ASN, GeoIP, AdBlock, Google Safe Browsing
stats
Computed stats (by type, protocol, IP, etc.)
verdicts
Verdicts about malicious content, with subkeys urlscan, engines, community.

Some of the information is contained in duplicate form for convenience.
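As an illustration, a client might pull a few values out of those top-level objects like this. The nested key names inside `page` and `verdicts` are assumptions based on the structure described above, not a guaranteed schema, so missing properties are tolerated as the best practices recommend:

```python
def summarize(result):
    """Extract a few commonly used values from a Result API response,
    handling absent properties gracefully."""
    page = result.get("page", {})               # high-level page information
    verdicts = result.get("verdicts", {})       # subkeys: urlscan, engines, community
    return {
        "url": page.get("url"),                 # assumed field name
        "ip": page.get("ip"),                   # assumed field name
        "urlscan_verdict": verdicts.get("urlscan"),
    }
```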

In a similar fashion, you can get the DOM and screenshot for a scan using these URLs:

curl https://urlscan.io/screenshots/$uuid.png
curl https://urlscan.io/dom/$uuid/


Search API

You can use the same ElasticSearch syntax to search for scans as on the Search page. Each result has high-level metadata about the scan result and a link to the API for the full scan result.

{
  "url": "https://urlscan.io/api/v1/search/",
  "content_type": "json",
  "method": "get",
  "payload": {
    "q": "domain:tines.io OR domain:urlscan.io"
  },
  "expected_update_period_in_days": "1"
}
Available query parameters for the search endpoint:

q
The query term (ElasticSearch Query String Query). Default: "*"
size
Number of results returned. Default: 100, Max: 10000 (depending on your subscription)
search_after
For iterating, value of the sort attribute of the last result you received (comma-separated).
offset
Deprecated, not supported anymore, use search_after.

The API search will only indicate an exact count of results up to 10,000 in the total property. Beyond that, the has_more flag will be true. Use the search_after query parameter for iterating over further results.
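Iterating with search_after could be sketched as a generator; the results, sort, and has_more property names follow the description above:

```python
import requests

def next_cursor(results):
    """search_after value: the sort attribute of the last result, comma-separated."""
    return ",".join(str(v) for v in results[-1]["sort"])

def search_all(query, api_key, page_size=100):
    """Yield every search result for `query`, page by page."""
    headers = {"API-Key": api_key}
    params = {"q": query, "size": page_size}
    while True:
        resp = requests.get("https://urlscan.io/api/v1/search/",
                            headers=headers, params=params)
        resp.raise_for_status()
        data = resp.json()
        results = data.get("results", [])
        yield from results
        if not results or not data.get("has_more"):
            return
        params["search_after"] = next_cursor(results)
```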

API search will find public scans performed by anyone, as well as unlisted and private scans performed by you or your teams.

Query String Help

  • All API actions (including Search) are subject to your individual API Quotas.
  • The query field uses the ElasticSearch Query String to search for results. All queries are run in filter mode, sorted by date descending.
  • Refer to the documentation for advanced queries such as wildcard, regex, boolean operators, fuzzy searches, etc.
  • You can group and concatenate search-terms with brackets ( ), AND, OR, and NOT. The default operator is AND.
  • Always use the field names of the fields you want to search. Wildcards for the field-name are not supported!
  • Always escape reserved characters with backslash:
    + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /
  • Always limit the time-range if possible using date, e.g. date:>now-7d or date:>now-1y.
  • You can use wildcard (though no leading wildcard) and regex search on almost all fields. Regexes are always anchored to beginning/end of the tokens.
  • The date field allows relative queries like date:>now-7d or range-queries like date:[2020-01-01 TO 2020-02-01] or both combined.
  • Domain fields contain the whole domain as well as each smaller domain component, i.e. searching domain:google.com will also include www.google.com.
  • The page.url field is analysed as text; if you want to find multiple path components you should use phrase search with page.url:"foo/bar/batz".
  • The user and team fields are special: you can search for user:me or team:me to get your own scans.
  • Searchable fields: ip, domain, page.url, hash, asn, asnname, country, server, filename, task.visibility, task.method
  • The fields ip, domain, url, asn, asnname, country and server contain all requests of the scan.
  • To just search for the primary IP/Domain/ASN, prefix the field with page., e.g. page.domain:paypal.com.
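The escaping rule above can be automated. This helper is a sketch that backslash-escapes exactly the reserved characters listed (the doubled operators && and || are covered by escaping each character individually):

```python
import re

def escape_term(value):
    """Backslash-escape ElasticSearch reserved characters in a raw value
    before embedding it in a query string."""
    return re.sub(r'([+\-=&|><!(){}\[\]^"~*?:\\/])', r'\\\1', value)
```

For example, an IP range or URL path fragment can be escaped before being combined with a field name such as page.url.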


Error Codes

The API returns different error codes using the HTTP status and will also include some high-level information in the JSON response, including the status code, a message and sometimes a more elaborate description.

For scan submissions, there are various reasons why a scan request won't be accepted and will return an error code. This includes, among others:

  • Blacklisted domains and URLs, requested to be blacklisted by their respective owners.
  • Spammy submissions of URLs known to be used only for spamming this service.
  • Invalid hostnames or invalid protocol schemes (FTP etc).
  • Missing URL property ... yes, it does happen.
  • Contains HTTP basic auth information ... yes, that happens as well.
  • Non-resolvable hostnames (A, AAAA, CNAME) which we will not even try to scan.

An error will typically be indicated by the HTTP 400 status code. It might look like this:

{
  "message": "DNS Error - Could not resolve domain",
  "description": "The domain .google.com could not be resolved to a valid IPv4/IPv6 address. We won't try to load it in the browser.",
  "status": 400
}

If you think an error is incorrect, let us know via mail!


Integrations & Tools

A few companies and individuals have integrated urlscan.io into their tools and workflows.
If you'd like to see your product listed here, send us an email!

Commercial
  • Tines - Advanced security orchestration & automation platform
  • Demisto Enterprise - Incident Lifecycle Platform
  • Phantom - Security Automation & Orchestration Platform
  • Anomali - A Threat Intelligence Platform that enables businesses to integrate security products and leverage threat data
  • Exabeam - Smarter SIEM, Better Security
  • Siemplify - Security Orchestration, Automation and Incident Response
  • Swimlane - Security Orchestration, Automation and Response
  • IBM Resilient - IBM Resilient Incident Response Platform
  • Rapid7 Komand - An orchestration layer for security tools
  • Rapid7 InsightConnect - Orchestration and automation to accelerate your teams and tools
  • LogicHub - Intelligent Security Automation
  • ThreatConnect - Threat Intelligence, Analytics, and Orchestration Platform
  • FireEye Security Orchestrator - Simplify threat response through orchestration and automation
  • RSA NetWitness - Threat detection & response
  • Cybersponse - Security Orchestration, Automation and Incident Response Solution
  • Polarity - Augmented Reality for Your Desktop - Integration
  • Nevelex Labs - Security Flow is a new automation and orchestration tool for corporate security.
  • Sanguine eComscan - eComscan is smart CCTV for online stores
  • D3 SOAR - Security Orchestration and Automated Incident Response with MITRE ATT&CK
  • DTonomy AIR - SOAR with Adaptive Intelligence
  • Joe Sandbox Cloud - Automated Deep Malware Analysis in the Cloud for Malware
  • Hybrid Analysis - Free malware analysis service for the community that detects and analyzes unknown threats
Open Source

Manual & Automatic Submissions

Manual submissions are submissions made through our website. No registration is required. Manual submissions have the same features as API and automatic submissions.

Automatic submissions are URLs we collect from a variety of sources and submit to urlscan.io internally. The reason behind this is to provide good coverage of well-known URLs, especially with a focus on potentially malicious sites. This helps when scanning a new site and searching for one of the many features (domain, IP, ASN) that can be extracted.

Automatic Sources

  • OpenPhish AI-powered list of potential phishing sites: OpenPhish
  • PhishTank List of potential phishing sites: PhishTank
  • CertStream Suspicious domains observed on CertStream
  • Twitter URLs being tweeted / pasted by various Twitter users.
  • URLhaus Malware URL exchange by Abuse.ch: URLhaus