urlscan.io Search API Reference v1
Last updated: 2022-04-20
Our Search API & UI allow you to find archived scans of URLs on urlscan.io. This page is a reference for the fields that can be used to query the API. See the explanations of the field types and visibility below.
Query String Syntax and General Instructions
- Search requests (through the UI or the API) count against your individual Search API quotas. Make sure to send your API key with every API request.
- The query field uses the Elasticsearch query string syntax to search for results.
- All queries run in filter mode and are sorted by date, most recent scans first. Search results are not scored.
- You can group and combine search terms with parentheses ( ), AND, OR, and NOT. The default operator is AND.
- You can concatenate terms within a group, e.g. page.domain:(foo.com OR bar.com).
- Always specify the name of the field you want to search. Wildcards in field names are not supported, and field names are case-sensitive!
- Always escape reserved characters with backslash: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /
- Limit the time range where possible using date, e.g. date:>now-7d or date:>now-1y.
- The date field supports relative queries like date:>now-7d, range queries like date:[2020-01-01 TO 2020-02-01], or both combined.
- You can only use leading wildcard searches and regular expression searches on supported fields (type keyword RE), and only as a signed-in user.
- Everything is indexed as lowercase, even if the Search API returns values in a case-preserving manner.
- Regular expressions are always anchored to the beginning and end of the token (implicit ^ and $). Prefix and suffix your pattern with .* to match infix strings.
- Domain fields contain the whole domain and each of its parent domains, i.e. domain can be searched with google.com, which will find hits for www.google.com.
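The rules above (escaping, grouping, date limits, anchored regular expressions, lowercase indexing) can be combined into a small query builder. The following is a minimal Python sketch, not an official client: the helper names escape_term, any_of, recent, contains_regex, and search_url are hypothetical, and the endpoint https://urlscan.io/api/v1/search/ with its q parameter is assumed from urlscan.io's public API documentation rather than stated on this page.

```python
from urllib.parse import urlencode

# Reserved characters listed above. "&&" and "||" are covered by
# escaping "&" and "|" individually.
RESERVED = set('+-=&|><!(){}[]^"~*?:\\/')


def escape_term(value: str) -> str:
    """Backslash-escape every reserved character in a literal search value."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in value)


def any_of(field: str, values: list[str]) -> str:
    """Group escaped values on one field with OR, e.g. page.domain:(a OR b)."""
    return f"{field}:(" + " OR ".join(escape_term(v) for v in values) + ")"


def recent(query: str, window: str = "now-7d") -> str:
    """Restrict a query to scans newer than a relative date."""
    # The ">" and "-" here are range and date-math syntax, so they stay unescaped.
    return f"({query}) AND date:>{window}"


def contains_regex(field: str, fragment: str) -> str:
    """Regex queries are anchored, so wrap the fragment in .* to match infix text."""
    # Lowercased because everything is indexed as lowercase.
    # Assumes fragment has no regex metacharacters or "/"; nothing is escaped here.
    return f"{field}:/.*{fragment.lower()}.*/"


def search_url(query: str) -> str:
    """Build a Search API URL (endpoint assumed); the query goes in q."""
    return "https://urlscan.io/api/v1/search/?" + urlencode({"q": query})
```

For example, recent(any_of("page.domain", ["foo.com", "bar.com"])) yields (page.domain:(foo.com OR bar.com)) AND date:>now-7d, which search_url then percent-encodes. Escaping is applied only to literal values, never to operators or date math, which is why the helpers keep those two concerns apart.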
Recent Changes
January 8, 2024 - System Labels, User Tags, Meta Hits
- Additional Fields: We have added the following fields for searching:
- meta: Contains meta information about the scan and matched searches, e.g. the IDs of Saved Searches that this item has matched.
- labels: Contains system labels controlled by urlscan.
- usertags: Contains user-defined tags applied by Saved Searches.
Field Type Legend
The available search fields can have different types, which dictate how they can be searched. It is important to understand the limits of each type to fully utilise our Search API.
- Searchable, returned — Field can be searched and is returned in the search results.
- Searchable, not returned — Field can be searched, but the original value is not returned in the search results.
- keyword — Field is analysed as a single keyword; search it with an exact value or a trailing wildcard. Indexed as lowercase.
- keyword RE (written keywordRE in the table below) — Field can be searched by regular expression and leading wildcard. Indexed as lowercase.
- text — Field is analysed as text and broken into multiple tokens (e.g. split on slashes in a URL). Phrase search with quotes is possible. Indexed as lowercase.
- date — Field is analysed as date, allowing range-queries and date math, e.g. date:>now-24h.
- ip — Search by an IPv4 or IPv6 address, either as an exact IP or as a subnet like ip:8.8.8.8\/24 (note the escaped slash).
- domain — Search by a domain or parent domain. Searching for either www.foobar.com or just foobar.com will find scans for www.foobar.com.
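- integer — Allows searching by exact value, range, or threshold, e.g. stats.uniqIPs:>5.
- boolean — Field is either true or false, e.g. verdicts.malicious:true.
- virtual — Field restricts results to scans associated with your account and only accepts the value me, e.g. user:me.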
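Each field type above implies a different query shape. The sketch below is a set of hypothetical Python helpers (not part of any urlscan library) that render one clause per type. The >= threshold operator is standard Elasticsearch query string syntax assumed to be accepted here, and subnet() relies on Python's ipaddress module to validate the input and zero the host bits, which is a choice of this sketch rather than something the API requires.

```python
import ipaddress


def prefix(field: str, value: str) -> str:
    """keyword: trailing-wildcard search, lowercased to match the index."""
    # Assumes value has no reserved characters; escape them first otherwise.
    return f"{field}:{value.lower()}*"


def phrase(field: str, text: str) -> str:
    """text: phrase search in double quotes, escaping backslashes and quotes."""
    inner = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{inner}"'


def subnet(field: str, cidr: str) -> str:
    """ip: subnet search. ':' (IPv6) and '/' are reserved, so both are escaped."""
    net = ipaddress.ip_network(cidr, strict=False)  # validates, zeroes host bits
    escaped = str(net).replace(":", "\\:").replace("/", "\\/")
    return f"{field}:{escaped}"


def at_least(field: str, n: int) -> str:
    """integer: threshold search, e.g. stats.uniqIPs:>=5."""
    return f"{field}:>={n}"


def between(field: str, low, high) -> str:
    """integer or date: inclusive range using square brackets."""
    return f"{field}:[{low} TO {high}]"
```

For instance, subnet("ip", "8.8.8.8/24") produces ip:8.8.8.0\/24, and between("date", "2020-01-01", "2020-02-01") reproduces the range query shown earlier on this page.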
Searchable Fields
Field Name | Type | Field semantics, features, & notes | ||
---|---|---|---|---|
date | date | Datetime of when the scan was performed | ||
asn | keyword | Any of the AS numbers that were contacted (e.g. AS123) | ||
asnname | text | Any of the AS names that were contacted | ||
asnname.keyword | keyword | Any of the AS names that were contacted (analysed as keyword) | ||
country | keyword | ISO 3166-1 2-letter country code of any country that was contacted | ||
domain | domain | Any domain and subdomain that was contacted | ||
domain.keyword | keywordRE | Any domain and subdomain that was contacted (analysed as keyword) | ||
filename | text | Any URL that was requested | ||
filename.keyword | keywordRE | Any URL that was requested (analysed as keyword) | ||
hash | keyword | Any SHA256 hash of any HTTP response | ||
ip | ip | Any IP that was contacted | ||
server | keyword | Any HTTP "Server" header of subrequests | ||
page.apexDomain | keyword | Apex domain of the page — (New since v1.2) | ||
page.asn | keyword | AS Number of the website | ||
page.asnname | text | Name of the main AS of the website | ||
page.country | keyword | Primary IP GeoIP Country (ISO 3166-1 2-letter country code) | ||
page.domain | domain | Primary Domain (Analysed as all levels of parent domains) | ||
page.domain.keyword | keywordRE | Primary Domain (Analysed as keyword) | ||
page.ip | ip | Primary IP | ||
page.mimeType | keyword | MIME type of the primary HTTP response — (New since v1.2) | ||
page.ptr | domain | DNS PTR record of primary IP | ||
page.redirected | keyword | Whether the page was redirected from task.url, can be one of same-domain, sub-domain, off-domain, https-only — (New since v1.2) | ||
page.server | text | HTTP "Server" header of primary request | ||
page.status | keyword | HTTP status code of primary request response | ||
page.title | text | Title of the page — (New since v1.2) | ||
page.title.keyword | keywordRE | Title of the page (analysed as keyword) — (New since v1.2) | ||
page.tlsAgeDays | integer | Age of TLS certificate when the page was scanned (in days) — (New since v1.2) | ||
page.tlsIssuer | keyword | Issuer of the page TLS certificate — (New since v1.2) | ||
page.tlsValidDays | integer | TLS certificate validity period in days — (New since v1.2) | ||
page.tlsValidFrom | date | TLS certificate Valid-From date — (New since v1.2) | ||
page.umbrellaRank | integer | Cisco Umbrella Top 1 Million rank of page domain — (New since v1.2) | ||
page.url | text | URL of the primary page (after redirection) | ||
page.url.keyword | keywordRE | URL of the primary page (after redirection, analysed as keyword) | ||
stats.dataLength | integer | Data size of all subresources | ||
stats.encodedDataLength | integer | Transfer size of all subresources | ||
stats.requests | integer | Number of subrequests | ||
stats.uniqCountries | integer | Number of unique countries contacted | ||
stats.uniqIPs | integer | Number of unique IPs contacted | ||
task.domain | domain | Domain of the tasked URL | ||
task.domain.keyword | keywordRE | Domain of the tasked URL (analysed as keyword) | ||
task.method | keyword | Can be manual, api, or automatic | ||
task.source | keyword | Examples: phishtank or certstream-suspicious | ||
task.tags | keyword | User-defined tags supplied during scan submission | ||
task.url | text | The original URL that was tasked | ||
task.url.keyword | keywordRE | The original URL that was tasked (analysed as keyword) | ||
task.uuid | keyword | The unique UUID of the scan | ||
task.visibility | keyword | Can be one of public, unlisted, or private | ||
user | virtual | Scans submitted by yourself (only accepts the value me, e.g. user:me) | ||
team | virtual | Scans submitted by any of your teams (only accepts the value me, e.g. team:me) | ||
apikey | virtual | Scans submitted using one of your API keys (only accepts the value me, e.g. apikey:me) | ||
files.sha256 | keyword | SHA256 of file downloaded by the website | ||
files.filename | keywordRE | Filename of file downloaded by the website | ||
files.filesize | integer | Filesize of file downloaded by the website | ||
files.mimeType | keyword | MIME type description of file downloaded by the website | ||
urlscan Professional and Enterprise | | The following fields can only be searched on the Professional, Enterprise, and Ultimate plans | ||
brand.country | keyword | ISO 3166-1 2-letter country code of the brand | ||
brand.key | keyword | Unique key of the brand | ||
brand.name | text | Name of the brand | ||
brand.vertical | text | Industry vertical of the brand, e.g. "banking" | ||
content.cookieNames | keyword | Names of cookies set by page — (New since v1.2) | ||
content.globalNames | keyword | Names of non-standard JavaScript global variables — (New since v1.2) | ||
content.inputNames | keyword | Name attributes of input fields on page — (New since v1.2) | ||
content.inputTypes | keyword | Type attributes of input fields on page — (New since v1.2) | ||
content.technologies | keyword | Names of technologies detected according to Wappalyzer — (New since v1.2) | ||
dom.hash | keyword | SHA256 hash of the DOM before truncation — (New since v1.2) | ||
dom.size | integer | Size of the DOM before truncation — (New since v1.2) | ||
frames.domains | domain | Domains of frames — (New since v1.2) | ||
frames.length | integer | Number of frames — (New since v1.2) | ||
frames.urls | keywordRE | URLs of frames / iFrames — (New since v1.2) | ||
links.domains | domain | Domains of outgoing links — (New since v1.2) | ||
links.length | integer | Number of outgoing links — (New since v1.2) | ||
links.urls | keywordRE | URLs of outgoing links (to different domains than page.domain) — (New since v1.2) | ||
scanner.country | keyword | Scanner IP exit location (ISO 3166-1 2-letter country code) — (New since v1.2) | ||
submitter.country | keyword | GeoIP country of the submission IP (ISO 3166-1 2-letter country code) | ||
text.content | text | Visible text on the website, truncated to the first 20kB — (New since v1.2) | ||
text.hash | keyword | SHA256 hash of the text before truncation — (New since v1.2) | ||
text.size | integer | Size of the text content before truncation — (New since v1.2) | ||
verdicts.malicious | boolean | Whether the page is considered malicious — (New since v1.2) | ||
verdicts.score | integer | Maliciousness score of page from -100 (benign) to 100 (malicious) — (New since v1.2) | ||
verdicts.lastVerdict | date | Date the latest verdict for this scan was added, only for verdicts created after the scan has finished — (New since v1.2) | ||
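Fields from the table can be combined freely in a single query. As a hedged sketch, the Python below prepares an authenticated search request using only the standard library. The endpoint https://urlscan.io/api/v1/search/, the q parameter, and the API-Key header name are assumed from urlscan.io's public API documentation, and build_search_request and search are hypothetical helper names. The example only builds the request, so running it does not consume Search API quota.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint; see the urlscan.io API documentation.
SEARCH_ENDPOINT = "https://urlscan.io/api/v1/search/"


def build_search_request(query: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the given query."""
    url = SEARCH_ENDPOINT + "?" + urlencode({"q": query})
    # Assumed header name for the API key.
    return urllib.request.Request(url, headers={"API-Key": api_key})


def search(query: str, api_key: str) -> dict:
    """Send the request and decode the JSON body (counts against your quota)."""
    with urllib.request.urlopen(build_search_request(query, api_key), timeout=30) as resp:
        return json.load(resp)


# Pages on example.com answering with HTTP 200 that contacted more than
# five unique IPs, limited to the last 30 days as recommended above.
query = "page.domain:example.com AND page.status:200 AND stats.uniqIPs:>5 AND date:>now-30d"
req = build_search_request(query, "YOUR-API-KEY")
```

Keeping request construction separate from sending makes it easy to inspect the final URL (for example, to check escaping) before spending a search request on it.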