www.convertcsv.com
52.173.245.249  Public Scan

URL: https://www.convertcsv.com/url-extractor.htm
Submission: On June 09 via manual from IT — Scanned from IT

Form analysis: 1 form found in the DOM

Name: frm1

<form id="frm1" name="frm1" class="form-inline" role="form" onsubmit="return false">
  <h3 class="headerBlue">Step 1: Select your input</h3>
  <br>
  <div class="form-group w100">
    <ul class="nav nav-tabs">
      <li class="nav-item active"><a id="defaultTabLink" data-toggle="tab" href="#inputtext">Enter Data</a></li>
      <li class="nav-item"><a id="fileTabLink" data-toggle="tab" href="#inputfile" class="nav-link">Choose File</a></li>
      <li class="nav-item"><a id="urlTabLink" data-toggle="tab" href="#inputurl" class="nav-link">Enter URL</a></li>
    </ul> &nbsp; &nbsp; <label><input id="chkList" type="checkbox" value="Y" title="Enter text to scan or list of web addresses"> Scan list of web pages</label><br>
    <label>Use this Regular Expression instead <input name="txtRegExp" id="txtRegExp" type="text" size="30" value="" title="Ex \S*www\.\S+  Do not use front and end slashes"></label><br>
    <div class="tab-content">
      <div id="inputtext" class="tab-pane active">
        <textarea class="form-control" style="width: 90%;" rows="10" cols="80" id="txt1" wrap="off" placeholder="Enter or paste here" onpaste="setTimeout(function(){ document.getElementById('btnRun').click() }, 10)"></textarea>
      </div>
      <div id="inputfile" class="tab-pane">
        <label xclass="form-control">Choose File<input type="file" id="f1" class="form-control" onchange="loadTextFile(this,assignText,event)" title="Choose a local HTML file"></label>
        <label for="txtEncoding"></label><span id="spanEncoding">Encoding</span>
        <select id="txtEncoding" class="form-control" title="Enter encoding for input file" onchange="loadTextFile(document.getElementById('f1'),assignText)">
          <option value="" selected="selected">-Default-</option>
          <option value="ISO-8859-1">ISO-8859-1 (Latin No. 1)</option>
          <option value="ISO-8859-2">ISO-8859-2 (Latin No. 2)</option>
          <option value="ISO-8859-3">ISO-8859-3 (Latin No. 3)</option>
          <option value="ISO-8859-4">ISO-8859-4 (Latin No. 4)</option>
          <option value="ISO-8859-5">ISO-8859-5 (Latin/Cyrillic)</option>
          <option value="ISO-8859-6">ISO-8859-6 (Latin/Arabic)</option>
          <option value="ISO-8859-7">ISO-8859-7 (Latin/Greek)</option>
          <option value="ISO-8859-8">ISO-8859-8 (Latin/Hebrew)</option>
          <option value="ISO-8859-9">ISO-8859-9 (Latin No. 5)</option>
          <option value="ISO-8859-13">ISO-8859-13 (Latin No. 7)</option>
          <option value="ISO-8859-15">ISO-8859-15 (Latin No. 9)</option>
          <option value="macintosh">Mac OS Roman</option>
          <option value="UTF-8">UTF-8</option>
          <option value="UTF-16">UTF-16</option>
          <option value="UTF-16BE">UTF-16 (Big-Endian)</option>
          <option value="UTF-16LE">UTF-16 (Little-Endian)</option>
          <option value="UTF-32">UTF-32</option>
          <option value="UTF-32BE">UTF-32 (Big-Endian)</option>
          <option value="UTF-32LE">UTF-32 (Little-Endian)</option>
          <option value="windows-1250">windows-1250 (Win East European)</option>
          <option value="windows-1251">windows-1251 (WinCyrillic)</option>
          <option value="windows-1252">windows-1252 (WinLatin-1)</option>
          <option value="windows-1253">windows-1253 (WinGreek)</option>
          <option value="windows-1254">windows-1254 (Win Turkish)</option>
          <option value="windows-1255">windows-1255 (Win Hebrew)</option>
          <option value="windows-1256">windows-1256 (Win Arabic)</option>
          <option value="windows-1257">windows-1257 (Win Baltic)</option>
          <option value="windows-1258">windows-1258 (Win Vietnamese)</option>
        </select>
      </div>
      <div id="inputurl" class="tab-pane">
        <label> Enter URL as data source <input type="text" size="40" value="" name="url" id="url" class="form-control" title="Enter the URL of a web page returning HTML with a table">
        </label>
        <input type="button" id="btnUrl" class="btn btn-primary" value="Load URL" title="Load HTML via URL" onclick="loadURL(document.getElementById('url').value)">
      </div>
    </div>
    <div class="">
      <input type="button" class="btn btn-primary" value="Clear Input" onclick="window.location.reload(true)"> &nbsp; <input type="button" value="Example" class="btn btn-primary" title="Load and run example"
        onclick="document.getElementById('url').value='https://www.ddginc-usa.com/';document.getElementById('btnUrl').click()">
    </div>
  </div>
  <br>
  <h3 class="headerBlue">Step 2: Choose output options <small>(optional)</small></h3><a href="#" onclick="return false" data-toggle="collapse" data-target="#p4"> <span class="glyphicon glyphicon-chevron-down"></span></a>
  <hr class="noverticalspace">
  <fieldset class="scheduler-border collapse" id="p4">
    <legend class="scheduler-border">Output Options</legend> Output Field Separator: <label><input type="radio" name="outsep" id="outSepComma" value="," checked="checked"> ,</label> &nbsp; <label><input type="radio" name="outsep" id="outSepSemicolon"
        value=";"> ;</label> &nbsp; <label><input type="radio" name="outsep" id="outSepColon" value=":"> :</label> &nbsp; <label><input type="radio" name="outsep" id="outSepPipe" value="|"> Bar-|</label> &nbsp; <label><input type="radio"
        name="outsep" id="outSepTab" value=" " onclick="this.value='\t'"> Tab</label> &nbsp; <label><input type="radio" name="outsep" id="outSepOther" value="o"> Other-Choose</label>
    <label><input type="text" size="2" id="outSepOtherVal" value="*"></label>
    <br>
    <label><input id="chkCsvHeader" type="checkbox"> Include header in first row</label>
    <br>
    <label> # of Columns Per Line: <input type="text" id="txtNumCols" value="1" class="form-control"></label>
    <br>
    <label><input id="chkSort" type="checkbox" value="Y" checked="checked"> Sort URL addresses</label>
    <br>
    <label><input id="chkDup" type="checkbox" value="Y" checked="checked" onclick="if(this.checked)document.getElementById('chkKeepDup').checked=false"> Remove duplicate URL addresses</label>
    <br>
    <label><input id="chkKeepDup" type="checkbox" value="Y" onclick="if(this.checked)document.getElementById('chkDup').checked=false"> Only Display duplicate URL addresses</label>
    <br>
    <label><input id="chkLimit" type="checkbox" value="Y"></label>
    <label>URL contains this string <input name="txtLimit" id="txtLimit" type="text" size="30" value="" onblur="if(this.value.length>0)this.form.chkLimit.checked=true;else this.form.chkLimit.checked=false"></label>
    <label> <input type="checkbox" value="Y" id="chkIsRegex" name="chkIsRegex">Is regular expression</label>
    <br>
    <label><input id="chkSocial" type="checkbox" value="Y"></label>
    <label><input type="radio" name="radShow" id="radShow" checked="checked">Only show</label>
    <label><input type="radio" name="radShow" id="radNotShow">Do not show</label> - URLs of these sites: <small>(i.e. social media)</small>
    <br> <input type="text" id="txtSocial" size="50" value="facebook.com,twitter.com,instagram.com,youtube.com,youtu.be,plus.google.com">
    <br>
    <label><input id="chkForceCsv" type="checkbox"> Force CSV style output</label>
    <br>
    <label><input id="chkAnchor" type="checkbox"> Output as Anchor tag</label>
    <br>
    <label><input id="chkAppend" type="checkbox"> Append results</label>
    <br>
    <label><input id="chkIncludeFromUrl" type="checkbox" checked="checked"> If scanning a list of web pages, output the From URL also</label>
  </fieldset>
  <h3 class="headerBlue">Step 3: Extract URLs</h3><br>
  <input type="button" id="btnRun" class="btn btn-primary" value="Extract" title="Find URLs in Text" onclick="runitonce(document.getElementById('txt1').value,false);return false">
  <input type="button" class="btn btn-primary" onclick="runitonce(document.getElementById('txt1').value,true);return false" value="Extract To Excel" title="Extract URLs to an Excel file">
</form>
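
The `txtRegExp` field in the form above accepts a pattern written without leading and trailing slashes (its title gives `\S*www\.\S+` as an example). The page's own handler (`runitonce`) is not included in this scan, so the following is only a minimal sketch of how such a slash-less pattern could be applied; `extractWithPattern` is a hypothetical name, not a function from the site.

```javascript
// Hypothetical helper: applies a user-supplied pattern (no surrounding slashes)
// to the input text and returns every match. Not the site's actual code.
function extractWithPattern(text, patternSource) {
  // An empty pattern would match the empty string everywhere, so return nothing.
  if (!patternSource) return [];
  // The "g" flag makes String.prototype.match return all matches.
  const pattern = new RegExp(patternSource, "g");
  return text.match(pattern) || [];
}

// The example pattern from the field's title attribute.
const found = extractWithPattern("go to www.example.com now", String.raw`\S*www\.\S+`);
console.log(found);
```

Passing the pattern as a string to `new RegExp` is the reason the field asks for no slashes: the slashes are literal-syntax delimiters, not part of the pattern itself.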

Text Content

URL EXTRACTOR FOR WEB PAGES AND TEXT


USE THIS TOOL TO EXTRACT URLS IN WEB PAGES, DATA FILES, TEXT AND MORE. NEW:
SUPPLY A LIST OF WEB PAGES TO SCAN.

FROM CSV/EXCEL

 * CSV To Delimited
 * CSV To Flat File
 * CSV To GeoJSON
 * CSV To HTML Table
 * CSV To JSON
 * CSV To KML
 * CSV To Markdown
 * CSV To Multi-line Data
 * CSV To PDF
 * CSV To SQL
 * CSV To Word
 * CSV To XML
 * CSV To YAML
 * Pivot CSV
 * Transpose CSV
 * Query CSV with SQL

TO CSV/EXCEL

 * Flat File to CSV
 * GeoJSON To CSV
 * HTML Links To CSV
 * HTML Table To CSV
 * JSON To CSV
 * KML To CSV
 * SQL To CSV
 * XML To CSV
 * YAML To CSV

DATA TOOLS

 * CSV Escape Tool
 * CSV Template Engine
 * EDA Tool
 * CSV Editor
 * Generate Test Data
 * Email Extractor
 * Phone Extractor
 * Split Text or CSV Files
 * URL Extractor
 * Extract via RegEx
 * CSV Home




WHAT CAN THIS TOOL DO?



 * Use this tool to extract fully qualified URL addresses from web pages and
   data files.
 * Search a list of web pages for URLs.
 * The output is one or more columns of URL addresses. You can view the output
   below or download it as an Excel file.
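
As an illustration of what extracting fully qualified URLs from text can look like, here is a minimal sketch. The regular expression and the name `extractUrls` are assumptions made for this example; the scan does not show the pattern the tool actually uses.

```javascript
// Hypothetical helper, not the site's code: finds http, https, and ftp URLs.
function extractUrls(text) {
  // Match a scheme followed by everything up to whitespace, a quote, or an angle bracket.
  const pattern = /\b(?:https?|ftp):\/\/[^\s"'<>]+/gi;
  const matches = text.match(pattern) || [];
  // Trim trailing punctuation that usually belongs to the sentence, not the URL.
  return matches.map((url) => url.replace(/[.,;:!?)]+$/, ""));
}

const sample =
  'See <a href="https://www.example.com/a?b=1">here</a> and http://example.org/page.';
console.log(extractUrls(sample));
```

Because the pattern stops at quotes and angle brackets, it works on raw HTML as well as on plain text, at the cost of missing URLs that legitimately contain those characters.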

--------------------------------------------------------------------------------


WHAT ARE MY OPTIONS?

--------------------------------------------------------------------------------

 * Optionally supply a list of web pages to scan.
 * You can choose the number of URLs per line (default 1).
 * For multi-column output you may choose the output delimiter; the default is
   the comma.
 * Remove duplicate URLs.
 * Only display duplicate URLs.
 * Sort the URLs found.
 * Extract only URLs that contain (or do not contain) a given string.
 * You can include or omit a header row.
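
The options listed above can be sketched as a single post-processing step over an array of already extracted URLs. This is an assumed implementation for illustration only; `applyOptions` and its option names are hypothetical and do not come from the site.

```javascript
// Hypothetical post-processing step mirroring the listed options.
function applyOptions(urls, opts = {}) {
  const {
    sort = true,             // "Sort the URLs found"
    removeDuplicates = true, // "Remove duplicate URLs"
    onlyDuplicates = false,  // "Only display duplicate URLs"
    contains = "",           // keep only URLs containing this string
    columns = 1,             // number of URLs per output line
    separator = ",",         // delimiter for multi-column output
  } = opts;

  let out = urls.slice();
  if (contains) out = out.filter((u) => u.includes(contains));

  if (onlyDuplicates) {
    // Count occurrences, then keep each URL that appears more than once.
    const counts = new Map();
    for (const u of out) counts.set(u, (counts.get(u) || 0) + 1);
    out = [...counts.keys()].filter((u) => counts.get(u) > 1);
  } else if (removeDuplicates) {
    out = [...new Set(out)];
  }

  if (sort) out.sort();

  // Group the URLs into lines of `columns` entries each.
  const perLine = Math.max(1, columns);
  const lines = [];
  for (let i = 0; i < out.length; i += perLine) {
    lines.push(out.slice(i, i + perLine).join(separator));
  }
  return lines;
}

const urls = ["http://b.example/", "http://a.example/", "http://b.example/"];
console.log(applyOptions(urls, { columns: 2, separator: ";" }));
```

In this sketch the two duplicate options are mutually exclusive, which matches the form's behavior of unchecking one box when the other is checked.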


See also: HTML Links to CSV (only extracts anchor tag information),
HTML Table to CSV, and Regex Text Extractor.




STEP 1: SELECT YOUR INPUT


 * Enter Data
 * Choose File
 * Enter URL

    Scan list of web pages
Use this Regular Expression instead

Choose File
Encoding:

 * -Default-
 * ISO-8859-1 (Latin No. 1)
 * ISO-8859-2 (Latin No. 2)
 * ISO-8859-3 (Latin No. 3)
 * ISO-8859-4 (Latin No. 4)
 * ISO-8859-5 (Latin/Cyrillic)
 * ISO-8859-6 (Latin/Arabic)
 * ISO-8859-7 (Latin/Greek)
 * ISO-8859-8 (Latin/Hebrew)
 * ISO-8859-9 (Latin No. 5)
 * ISO-8859-13 (Latin No. 7)
 * ISO-8859-15 (Latin No. 9)
 * Mac OS Roman
 * UTF-8
 * UTF-16
 * UTF-16 (Big-Endian)
 * UTF-16 (Little-Endian)
 * UTF-32
 * UTF-32 (Big-Endian)
 * UTF-32 (Little-Endian)
 * windows-1250 (Win East European)
 * windows-1251 (WinCyrillic)
 * windows-1252 (WinLatin-1)
 * windows-1253 (WinGreek)
 * windows-1254 (Win Turkish)
 * windows-1255 (Win Hebrew)
 * windows-1256 (Win Arabic)
 * windows-1257 (Win Baltic)
 * windows-1258 (Win Vietnamese)
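
To show why an encoding choice matters when loading a local file, the sketch below decodes raw bytes with the standard `TextDecoder` API. It is not the page's `loadTextFile` function, which this scan does not include; `decodeBytes` is a hypothetical name, and treating the empty "-Default-" value as UTF-8 is an assumption.

```javascript
// Hypothetical helper, not the site's code: decodes bytes using a chosen encoding label.
function decodeBytes(bytes, encoding) {
  // Assumption: an empty value (the "-Default-" option) falls back to UTF-8.
  const decoder = new TextDecoder(encoding || "utf-8");
  return decoder.decode(bytes);
}

// The same two characters, "hi", stored as UTF-16 little-endian bytes.
const utf16Bytes = new Uint8Array([0x68, 0x00, 0x69, 0x00]);
console.log(decodeBytes(utf16Bytes, "UTF-16LE"));
```

Which labels are accepted depends on the runtime. `TextDecoder` follows the WHATWG Encoding Standard, which does not define UTF-32, so a label such as `UTF-32` would throw a `RangeError` in this sketch and would need separate handling.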
Enter URL as data source
 



STEP 2: CHOOSE OUTPUT OPTIONS (OPTIONAL)

--------------------------------------------------------------------------------

Output Options
Output Field Separator: ,   ;   :   Bar-|   Tab   Other-Choose
Include header in first row
# of Columns Per Line:
Sort URL addresses
Remove duplicate URL addresses
Only Display duplicate URL addresses
URL contains this string Is regular expression
Only show Do not show - URLs of these sites: (i.e. social media)

Force CSV style output
Output as Anchor tag
Append results
If scanning a list of web pages, output the From URL also
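
The "Only show / Do not show - URLs of these sites" option takes a comma-separated list of site names. A minimal sketch of one way to apply such a list follows; matching on the hostname and its subdomains is an assumption, and `filterBySites` is a hypothetical name rather than the tool's own function.

```javascript
// Hypothetical helper, not the site's code: keeps or drops URLs by site.
function filterBySites(urls, siteList, mode = "show") {
  const sites = siteList
    .split(",")
    .map((s) => s.trim().toLowerCase())
    .filter(Boolean);

  const matchesSite = (url) => {
    let host;
    try {
      host = new URL(url).hostname.toLowerCase();
    } catch {
      return false; // not a parseable absolute URL
    }
    // Match the site itself or any of its subdomains (e.g. www.facebook.com).
    return sites.some((s) => host === s || host.endsWith("." + s));
  };

  // mode "show" keeps only listed sites; any other mode removes them.
  return urls.filter((u) => (mode === "show" ? matchesSite(u) : !matchesSite(u)));
}

const found = [
  "https://www.facebook.com/somepage",
  "https://example.com/",
  "https://youtu.be/abc",
];
console.log(filterBySites(found, "facebook.com,youtu.be", "hide"));
```

Comparing hostnames rather than searching the whole URL string avoids false matches such as a query parameter that merely mentions a listed site.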


STEP 3: EXTRACT URLS


Result Data:

Save your result: .csv   Download Result   EOL: CRLF / LF


Copyright © 2013-2022 Data Design Group, Inc. All Rights Reserved