Search API Reference v1
Last updated: 2022-04-20


Our Search API and UI allow you to find archived scans of URLs on urlscan.io. This page is a reference for the fields that can be used to query the API. See the explanations of field types and visibility below.

Query String Syntax and General Instructions

  • Search requests (through the UI or API) are subject to your individual Search API Quotas. Make sure to use your API key.
  • The query field uses the Elasticsearch Query String syntax to search for results.
  • All queries are run in filter mode, sorted by date with the most recent scans first. There is no scoring of search results.
  • You can group and combine search terms with brackets ( ), AND, OR, and NOT. The default operator is AND.
  • You can combine terms within a group, e.g. page.domain:(foo.com OR bar.com).
  • Always use the field names of the fields you want to search. Wildcards for the field-name are not supported! Field names are case sensitive!
  • Always escape reserved characters with backslash:
    + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /
  • Limit the time range where possible using date, e.g. date:>now-7d or date:>now-1y.
  • The date field allows relative queries like date:>now-7d, range queries like date:[2020-01-01 TO 2020-02-01], or both combined.
  • You can only use leading wildcard searches and regular expression searches on supported fields, and only as a signed-in user.
  • Everything is indexed as lowercase, even if the Search API returns values in a case-preserving manner.
  • Regular expressions are always anchored to the beginning and end of the token (implicit ^ and $). Prefix and suffix the pattern with .* to match infix strings.
  • Domain fields contain the whole domain and each smaller domain component, e.g. searching the domain field for google.com will also find hits for www.google.com.
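
The syntax rules above can be sketched as a small query builder. This is a minimal illustration and not an official client: the helper names (escape_term, any_of, search_url) are hypothetical, and the endpoint https://urlscan.io/api/v1/search/ with a q parameter is an assumption that is not stated on this page. The sketch only builds strings and sends no request.

```python
import re
from urllib.parse import urlencode

# Reserved characters from the list above. "&&" and "||" are handled by
# escaping each "&" and "|" individually.
_RESERVED = re.compile(r'([+\-=&|><!(){}\[\]^"~*?:\\/])')

# Assumed endpoint, not taken from this page.
ASSUMED_SEARCH_ENDPOINT = "https://urlscan.io/api/v1/search/"


def escape_term(value: str) -> str:
    """Backslash-escape every reserved character in a literal search value."""
    return _RESERVED.sub(r"\\\1", value)


def any_of(field: str, values: list[str]) -> str:
    """Build field:(a OR b OR ...) with each value escaped."""
    inner = " OR ".join(escape_term(v) for v in values)
    return f"{field}:({inner})"


def search_url(query: str, base: str = ASSUMED_SEARCH_ENDPOINT) -> str:
    """URL-encode the finished query string into a request URL."""
    return f"{base}?{urlencode({'q': query})}"


# Field names are case sensitive, so they are passed through unchanged.
query = any_of("page.domain", ["foo.com", "bar.com"]) + " AND date:>now-7d"
print(query)  # page.domain:(foo.com OR bar.com) AND date:>now-7d
print(search_url(query))
```

Operators such as :> and the brackets are written by the builder itself, so only user-supplied values go through escape_term.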

Recent Changes

April 20, 2022 - Search Index Version v1.2

  • Changed Behaviour:
    • date: Contains the timestamp for when the scan finished rather than when the scan was submitted.
    • The indexed_at field was removed. Please rely on the task.time field.
    • The submitter.country field is now lowercase, so you have to search it with lowercase queries.
    • Searching fields that your user does not have access to will result in an HTTP 400 error from the Search API. Previously these searches simply returned no results.
  • keyword RE: Fields marked with this symbol allow fast and efficient search using infix queries and regular expressions. If you try to perform an infix or regex query with a leading wildcard on a field which does not support it, we will try to rewrite the query to use the equivalent keyword RE field if it exists.
  • Additional Fields: We have added additional fields for searching. Some are new top-level fields and others are available under existing parent fields. New top-level fields include content, dom, frames, links, text, and verdicts.
  • Implementation guidance: Make sure your code can handle missing fields in the search results. While fields are strongly typed, they can be absent in certain records.
  • Deprecated behavior: The screenshot and result fields in the result entry for each scan are deprecated and should not be used any more. The corresponding URLs can be constructed using the _id. These fields will be removed in future versions of this API!
  • Improved Documentation: New fields are highlighted in the list below.
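
The guidance on missing fields and on the deprecated screenshot and result entries could be handled along these lines. This is a sketch under assumptions: the helper names are hypothetical, the shape of the sample hit is invented for illustration, and the two URL templates are assumed patterns that this page does not spell out, so verify them before relying on them.

```python
from typing import Any

# Assumed URL patterns (not given on this page); adjust if they differ.
ASSUMED_RESULT_URL = "https://urlscan.io/result/{id}/"
ASSUMED_SCREENSHOT_URL = "https://urlscan.io/screenshots/{id}.png"


def get_path(record: dict, dotted: str, default: Any = None) -> Any:
    """Walk a dotted path such as 'page.domain', returning default if any part is absent."""
    node: Any = record
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def summarize(hit: dict) -> dict:
    """Extract a few fields defensively and derive the URLs from _id."""
    scan_id = hit["_id"]
    return {
        "domain": get_path(hit, "page.domain"),
        "title": get_path(hit, "page.title"),  # may be absent in some records
        "result_url": ASSUMED_RESULT_URL.format(id=scan_id),
        "screenshot_url": ASSUMED_SCREENSHOT_URL.format(id=scan_id),
    }


# Hypothetical search hit with no page.title present.
sample_hit = {"_id": "abc", "page": {"domain": "example.com"}}
print(summarize(sample_hit))
```

Deriving the URLs from _id rather than reading the screenshot and result entries keeps the code working once those deprecated entries are removed.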

Field Type Legend

The available search fields can have different types, which dictate how they can be searched. It is important to understand the limits of each type to fully utilise our Search API.

  • — Field can be searched and is returned in the search results.
  • — Field can be searched, but the original value is not returned in the search results.
  • keyword: Field is analysed as one keyword; search with an exact value or a trailing wildcard. Indexed as lowercase.
  • keyword RE: Field can be searched by regular expression and leading wildcard. Indexed as lowercase.
  • text: Field is analysed as text, broken into multiple tokens (e.g. split on slash in the URL). Phrase search with quotes is possible. Indexed as lowercase.
  • date: Field is analysed as a date, allowing range queries and date math, e.g. date:>now-24h.
  • ip: Search by an IPv4 or IPv6 address, using either an exact IP or a subnet definition like ip:8.8.8.8\/24.
  • domain: Search by a domain or parent domain. Searching for either www.foobar.com or just foobar.com will find scans for www.foobar.com.
  • integer: Allows searching by exact value, range, or threshold, e.g. stats.uniqIPs:>5.
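
The type-specific query forms in this legend can be sketched as small builders. The function names are hypothetical, the /.../ regular-expression delimiters follow the Elasticsearch Query String syntax, and the set of characters escaped inside the pattern is an assumption about that regular-expression dialect rather than something this page specifies.

```python
import ipaddress
import re

# Characters assumed to be special inside an Elasticsearch regular expression,
# plus "/" because it delimits the pattern in a query string.
_REGEX_RESERVED = re.compile(r'([.?+*|{}\[\]()"\\#@&<>~/])')


def regex_literal(text: str) -> str:
    """Lowercase the text (values are indexed as lowercase) and escape regex specials."""
    return _REGEX_RESERVED.sub(r"\\\1", text.lower())


def infix_query(field: str, fragment: str) -> str:
    """Match a fragment anywhere in a keyword RE field; .* offsets the implicit anchors."""
    return f"{field}:/.*{regex_literal(fragment)}.*/"


def date_range(start: str, end: str, field: str = "date") -> str:
    """Inclusive range query on a date field."""
    return f"{field}:[{start} TO {end}]"


def subnet_query(cidr: str, field: str = "ip") -> str:
    """Validate a subnet, then escape the reserved ':' and '/' characters."""
    ipaddress.ip_network(cidr, strict=False)  # raises ValueError if malformed
    escaped = cidr.replace(":", "\\:").replace("/", "\\/")
    return f"{field}:{escaped}"


def greater_than(field: str, threshold: int) -> str:
    """Threshold query on an integer field."""
    return f"{field}:>{threshold}"


print(infix_query("page.domain.keyword", "PayPal"))  # page.domain.keyword:/.*paypal.*/
print(subnet_query("8.8.8.8/24"))                    # ip:8.8.8.8\/24
print(greater_than("stats.uniqIPs", 5))              # stats.uniqIPs:>5
```

infix_query is only meaningful on fields marked keyword RE; on other fields the service may rewrite or reject the query, as described under Recent Changes.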

Searchable Fields

Field Name Type Field semantics, features, & notes
date date Datetime of when the scan was performed
asn keyword Any of the AS numbers that were contacted (e.g. AS123)
asnname text Any of the AS Names that were contacted
asnname.keyword keyword Any of the AS names that were contacted (analysed as keyword)
country keyword ISO 3166-1 2-letter country code of any country that was contacted
domain domain Any domain and subdomain that was contacted
domain.keyword keywordRE Any domain and subdomain that was contacted
filename text Any URL that was requested
filename.keyword keywordRE Any URL that was requested
hash keyword Any SHA256 hash of any HTTP response
ip ip Any IP that was contacted
server keyword Any HTTP "Server" header of subrequests
page.apexDomain keyword Apex domain of the page — (New since v1.2)
page.asn keyword AS Number of the website
page.asnname text Name of the main AS of the website
page.country keyword Primary IP GeoIP Country (ISO 3166-1 2-letter country code)
page.domain domain Primary Domain (Analysed as all levels of parent domains)
page.domain.keyword keywordRE Primary Domain (Analysed as keyword)
page.ip ip Primary IP
page.mimeType keyword MIME type of the primary HTTP response — (New since v1.2)
page.ptr domain DNS PTR record of primary IP
page.redirected keyword Whether the page was redirected from task.url, can be one of same-domain, sub-domain, off-domain, https-only — (New since v1.2)
page.server text HTTP "Server" header of primary request
page.status keyword HTTP status code of primary request response
page.title text Title of the page — (New since v1.2)
page.title.keyword keywordRE Title of the page — (New since v1.2)
page.tlsAgeDays integer Age of TLS certificate when the page was scanned (in days) — (New since v1.2)
page.tlsIssuer keyword Issuer of the page TLS certificate — (New since v1.2)
page.tlsValidDays integer TLS certificate validity period in days — (New since v1.2)
page.tlsValidFrom date TLS certificate Valid-From date — (New since v1.2)
page.umbrellaRank integer Cisco Umbrella Top 1 Million rank of page domain — (New since v1.2)
page.url text URL of the primary page (after redirection)
page.url.keyword keywordRE URL of the primary page (after redirection, analysed as keyword)
stats.dataLength integer Data size of all subresources
stats.encodedDataLength integer Transfer size of all subresources
stats.requests integer Number of subrequests
stats.uniqCountries integer Number of unique countries contacted
stats.uniqIPs integer Number of unique IPs contacted
task.domain domain Domain of the tasked URL
task.domain.keyword keywordRE Domain of the tasked URL (analysed as keyword)
task.method keyword Can be manual, api, or automatic
task.source keyword Examples: phishtank or certstream-suspicious
task.tags keyword User-defined tags supplied during scan submission
task.url text The original URL that was tasked
task.url.keyword keywordRE The original URL that was tasked (analysed as keyword)
task.uuid keyword The unique UUID of the scan
task.visibility keyword Can be one of public, unlisted, or private
user virtual Scans submitted by yourself (Can only be me)
team virtual Scans submitted by any of your teams (Can only be me)
apikey virtual Scans submitted using one of your API keys (Can only be me)
files.sha256 keyword SHA256 of file downloaded by the website
files.filename keywordRE Filename of file downloaded by the website
files.filesize integer Filesize of file downloaded by the website
files.mimeType keyword MIME type description of file downloaded by the website

urlscan Professional and Enterprise — The following fields can only be searched on the Professional and Enterprise plans

brand.country keyword ISO 3166-1 2-letter country code of the brand
brand.key keyword Unique key of the brand
brand.name text Name of the brand
brand.vertical text Industry vertical of the brand, e.g. "banking"
content.cookieNames keyword Names of cookies set by page — (New since v1.2)
content.globalNames keyword Names of non-standard JavaScript global variables — (New since v1.2)
content.inputNames keyword Name attributes of input fields on page — (New since v1.2)
content.inputTypes keyword Type attributes of input fields on page — (New since v1.2)
content.technologies keyword Names of technologies detected according to Wappalyzer — (New since v1.2)
dom.hash keyword SHA256 hash of the DOM before truncation — (New since v1.2)
dom.size integer Size of the DOM before truncation — (New since v1.2)
frames.domains domain Domains of frames — (New since v1.2)
frames.length integer Number of frames — (New since v1.2)
frames.urls keywordRE URLs of frames / iFrames — (New since v1.2)
links.domains domain Domains of outgoing links — (New since v1.2)
links.length integer Number of outgoing links — (New since v1.2)
links.urls keywordRE URLs of outgoing links (to different domains than page.domain) — (New since v1.2)
scanner.country keyword Scanner IP exit location (ISO 3166-1 2-letter country code) — (New since v1.2)
submitter.country keyword GeoIP country of the submission IP (ISO 3166-1 2-letter country code)
text.content text Visible text on the website, truncated to the first 20kB — (New since v1.2)
text.hash keyword SHA256 hash of the text before truncation — (New since v1.2)
text.size integer Size of the text content before truncation — (New since v1.2)
verdicts.malicious boolean Whether the page is considered malicious — (New since v1.2)
verdicts.score integer Maliciousness score of page from -100 (benign) to 100 (malicious) — (New since v1.2)
verdicts.lastVerdict date Date the latest verdict for this scan was added, only for verdicts created after the scan has finished — (New since v1.2)
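
Because searching a field your plan cannot access now returns an HTTP 400 error, a client may want to check a query before sending it. The following is a rough heuristic sketch: the function name is hypothetical, the prefix set is derived from the Professional and Enterprise table above, and the regular expression that picks out field names is a simplification that does not fully parse the query syntax.

```python
import re

# Top-level prefixes of the Professional and Enterprise fields listed above.
PRO_PREFIXES = {
    "brand", "content", "dom", "frames", "links",
    "scanner", "submitter", "text", "verdicts",
}

# A field name is a dotted identifier directly followed by an unescaped colon.
_FIELD = re.compile(r"(?<![\w.\\])([A-Za-z_][\w.]*):")


def pro_fields_in(query: str) -> list[str]:
    """Return the field names in the query that belong to a restricted prefix."""
    fields = _FIELD.findall(query)
    return [f for f in fields if f.split(".")[0] in PRO_PREFIXES]


print(pro_fields_in("page.domain:foo.com AND verdicts.malicious:true"))
# ['verdicts.malicious']
```

A non-empty result suggests the query will only succeed for accounts on the plans named above.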