Today we are launching a major overhaul of the search index that powers our urlscan.io and urlscan Pro platforms. This release offers new functionality to both community and paid users. We gathered customer feedback and internal use-cases and came up with a list of additional attributes that would be helpful to search on. This post outlines the highlights of the newly available search attributes. All of the new searchable fields have been integrated in a backward-compatible fashion, which means that any search which previously worked on urlscan.io will continue to work.
The full list of searchable fields is available on the Search API Reference page.
New searchable attributes
Over the past two years we identified various additional fields that we wanted to be able to search. The title of the page was an obvious addition, and we have also added fields like the age of the TLS certificate at the time the page was scanned and the Cisco Umbrella rank for the primary hostname of the page. We hope that these new fields will allow hunting for more interesting scans on urlscan.io.
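To illustrate how a page title and a certificate age could be combined in one search, here is a minimal sketch that composes a query string. It assumes the search box accepts Elasticsearch-style query-string syntax, and the field names `page.title` and `page.tlsAgeDays` are assumptions made for illustration; the Search API Reference has the authoritative names.

```python
def range_clause(field: str, op: str, value: int) -> str:
    """Build a numeric comparison clause such as 'page.tlsAgeDays:<=7'."""
    if op not in ("<", "<=", ">", ">="):
        raise ValueError(f"unsupported operator: {op!r}")
    return f"{field}:{op}{value}"


def phrase_clause(field: str, phrase: str) -> str:
    """Build an exact-phrase clause, escaping backslashes and double quotes."""
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def all_of(*clauses: str) -> str:
    """Join clauses with AND, wrapping each in parentheses."""
    return " AND ".join(f"({clause})" for clause in clauses)


# Field names below are assumed, not taken from this post.
query = all_of(
    phrase_clause("page.title", "Sign in"),
    range_clause("page.tlsAgeDays", "<=", 7),
)
print(query)
```

A query like this would look for pages titled "Sign in" whose TLS certificate was at most a week old when the page was scanned.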
We have changed the way that certain fields are indexed so that they can be searched more effectively using regular expressions and wildcard expressions. Especially for fields containing arbitrary information, like URLs or hostnames, it is often crucial to search using complex expressions, something which was hard or outright impossible to do before. With the wildcard fields, users can search for single characters in a field like the page URL very quickly.
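As a rough sketch of what wildcard and regular-expression searches might look like when sent to the search API, the helpers below build the query and the request URL without performing any network call. The endpoint `https://urlscan.io/api/v1/search/`, the `q` and `size` parameters, the field names `page.url` and `page.domain`, and the Elasticsearch-style escaping rules are all assumptions here; check them against the Search API Reference before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint and parameter names; verify against the Search API Reference.
SEARCH_ENDPOINT = "https://urlscan.io/api/v1/search/"

# Characters assumed to be reserved in query-string syntax.
# '*' and '?' are deliberately left out so they keep their wildcard meaning.
_RESERVED = set('+-=&|!(){}[]^"~:\\/')


def escape_term(term: str) -> str:
    """Backslash-escape reserved characters, leaving wildcards intact."""
    return "".join("\\" + ch if ch in _RESERVED else ch for ch in term)


def wildcard_query(field: str, pattern: str) -> str:
    """'*' matches any run of characters, '?' matches exactly one."""
    return f"{field}:{escape_term(pattern)}"


def regex_query(field: str, regex: str) -> str:
    """Wrap a regular expression in slashes; the caller escapes any '/' in it."""
    return f"{field}:/{regex}/"


def build_search_url(query: str, size: int = 100) -> str:
    """Return the GET URL for a query, with parameters percent-encoded."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": query, "size": size})


wild = wildcard_query("page.url", "*paypa?.com/*")
regex = regex_query("page.domain", r"[a-z]{10}\.example\.com")
print(build_search_url(wild))
print(build_search_url(regex, size=10))
```

Keeping the escaping in one helper means a literal slash or colon in a URL pattern does not get misread as query syntax, while `*` and `?` still act as wildcards.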
Customers on our Professional and Enterprise plans can now find historical scans by searching for strings in the text of the website. We currently index the first 20kB of visible text content per site.
Outgoing links from the website are now indexed by domain and full URL. These fields can be searched to find incoming links from other websites.
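One way to picture the incoming-links use case: search the outgoing-link field for a target domain while excluding scans of that domain itself. The sketch below only builds the query string; the field names `links.domains` and `page.domain` and the `AND NOT` operator are assumptions for illustration, not confirmed by this post.

```python
def incoming_links_query(target_domain: str) -> str:
    """Query for scans of other sites that link to target_domain.

    Field names are assumed; consult the Search API Reference for the real ones.
    """
    quoted = '"' + target_domain.replace('"', '\\"') + '"'
    return f"links.domains:{quoted} AND NOT page.domain:{quoted}"


print(incoming_links_query("example.com"))
```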
Verdicts & Brand Search
We have always allowed customers on our Professional and Enterprise plans to search for detections of malicious websites on our platform by means of our brand detection system. Now we also incorporate community verdicts into our search index and combine them with our verdicts to form a global verdict and score. These attributes are grouped under the verdicts key.
As a next step, we will integrate pivoting via the additional attributes into urlscan.io and our urlscan Pro threat hunting platform. Changes to these platforms will be announced on this blog and on the urlscan Pro platform.
You can reach out to us with any questions via email@example.com.
Editor’s Note: An earlier version of this blog post was erroneously published early. We apologise for any confusion this might have caused!