URL: https://blogs.loc.gov/thesignal/2011/11/the-average-lifespan-of-a-webpage/

The Signal: Digital Happenings at the Library of Congress
ISSN 2691-672X



THE AVERAGE LIFESPAN OF A WEBPAGE

November 8, 2011

Posted by: Mike Ashenfelder



The following is a guest post by Nicholas Taylor, Information Technology
Specialist for the Repository Development Group.

What is the average lifespan of a webpage? Predictably, estimates vary, and they
have changed over time. A 1997 special report in Scientific American claimed 44
days. A subsequent 2001 academic study in IEEE Computer suggested 75 days. More
recently, in 2003, a Washington Post article put the number at 100 days.

"Broken chain" by Flickr user kruemi

While there seem to be fewer estimates of webpage longevity in circulation than
there are of, say, the amount of data stored in the Library of Congress, we can
at least take some comfort in knowing that they’ve all come from someone who
should know: Brewster Kahle, founder of the Internet Archive.

Determining the average lifespan of a webpage is complicated not only by the
infrastructure required to analyze a plausibly representative sample of links
across the web, but also by how easy it is to conflate “the average lifespan of a
webpage” with closely related concepts that are, in fact, much more difficult to
measure. That is to say, we take for granted that we know what it means for a
webpage to have “died.”

For instance, is a “webpage” defined by its URI or by its contents? A
non-resolving link doesn’t necessarily mean that the content once hosted there
no longer exists (1); it may have been archived, or it may simply exist at a new
location (albeit one behind a paywall) to which the web server was not
configured to redirect requests. Conversely, a resolving link doesn’t
necessarily mean that the content hosted there is still the same as it once was.
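
To make the URI-versus-content distinction concrete, here is a minimal sketch, assuming we recorded the final HTTP status and a SHA-256 digest of the response body for a URI on two separate visits. The `Snapshot` type and the `snapshot` and `classify` functions are hypothetical names for illustration, not part of any existing tool. A byte-level hash is only a crude proxy for “sameness”: a changed timestamp or ad counts as different content, and content that moved to a new URI is invisible to it.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    """One visit to a URI: the final HTTP status and a digest of the body."""
    status: Optional[int]  # None if the request failed outright (DNS error, timeout)
    digest: Optional[str]  # hex SHA-256 of the response body, None if there was no body


def snapshot(status: Optional[int], body: Optional[bytes]) -> Snapshot:
    """Build a Snapshot from a raw status code and response body."""
    digest = hashlib.sha256(body).hexdigest() if body is not None else None
    return Snapshot(status, digest)


def classify(before: Snapshot, after: Snapshot) -> str:
    """Compare two visits under URI-based and content-based notions of identity."""
    resolves = after.status is not None and 200 <= after.status < 300
    if not resolves:
        # URI view: the page "died". Content view: unknown; it may live at a new URI.
        return "link broken (content may still exist elsewhere)"
    if after.digest == before.digest:
        return "link resolves, content unchanged"
    # URI view: the page is "alive". Content view: it may no longer be the same page.
    return "link resolves, content changed"


old = snapshot(200, b"<p>Annual report 2010</p>")
print(classify(old, snapshot(200, b"<p>Annual report 2010</p>")))  # link resolves, content unchanged
print(classify(old, snapshot(200, b"<p>Page not found</p>")))      # link resolves, content changed
print(classify(old, snapshot(404, None)))  # link broken (content may still exist elsewhere)
```

The middle case is a so-called “soft 404”: the server reports success but serves an error page, so a check based on the URI alone would count the page as alive.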

"404 Page Book Shelf Flickr" by Flickr user herzogbr

An automated link checker visiting a list of URIs and logging which requests
ultimately succeeded or failed would miss these subtleties. A human being with
unlimited time who set out to check the same list by hand might still get stuck
trying to establish conclusively that a vanished webpage had not, in fact, moved
to a new URI, or trying to decide, subjectively, whether an extant webpage could
be said to be the “same” webpage as before.
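
An automated link checker of the kind just described can be sketched with Python’s standard `urllib.request`, which follows HTTP redirects by default. This is a hypothetical minimal version, not any particular tool: it records only whether each request ultimately succeeded, which is exactly why it cannot tell a moved page from a deleted one, or a stable page from one whose content was replaced. The `fetch` parameter is an assumption added so the checker can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def http_status(uri: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for a URI (urlopen follows redirects itself).

    Uses HEAD to avoid downloading bodies; some servers reject HEAD, and a
    real checker might fall back to GET in that case.
    """
    request = urllib.request.Request(uri, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 or 410: the server answered, but with an error
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, ...


def check_links(uris: Iterable[str],
                fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Log each URI as ultimately successful (2xx) or failed, and nothing more."""
    results = {}
    for uri in uris:
        status = fetch(uri)
        results[uri] = 200 <= status < 300
    return results


# Example run with a stand-in fetcher (no network needed).
statuses = {"https://example.org/a": 200, "https://example.org/b": 404}
print(check_links(statuses, fetch=lambda uri: statuses.get(uri, 0)))
# {'https://example.org/a': True, 'https://example.org/b': False}
```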

There are additional complications in these sorts of analyses. While even the
longest of the lifespans cited above suggests that webpages are ephemeral, some
are so fleeting that their lifespans are better measured in hours than in days.
Analyzing its web index, Google found that the median lifespan of
malware-distributing domains fell from one month in 2007 to a mere two hours by
2010. Since most commercial web search engines penalize listings from such
domains, malware distributors are incentivized to churn quickly through massive
numbers of new domains. The sheer number of these domains and their transience
may skew automated calculations of average lifespan downward.
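
A toy calculation shows how a flood of very short-lived domains drags an average down. The lifespans below are invented for illustration only and are not measurements: a handful of ordinary pages living roughly 40 to 200 days, then the same sample with twenty hypothetical malware domains that each last two hours (2/24 of a day).

```python
from statistics import mean, median

# Hypothetical lifespans in days for ordinary pages (illustrative, not real data).
ordinary = [40, 75, 100, 120, 200]

# Hypothetical malware domains that each live about two hours.
malware = [2 / 24] * 20

combined = ordinary + malware

print(f"ordinary only: mean={mean(ordinary):.1f} days, median={median(ordinary):.1f} days")
print(f"with malware:  mean={mean(combined):.1f} days, median={median(combined):.1f} days")
# ordinary only: mean=107.0 days, median=100.0 days
# with malware:  mean=21.5 days, median=0.1 days
```

Both statistics fall, the median even more sharply, because the sample is now dominated by short-lived domains. Whether such domains should count as “webpages” at all is itself one of the definitional choices discussed above.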

Finally, there’s no practical way of knowing precisely when a webpage
disappears; we can only know the interval between a previous visit when the
webpage existed and a later visit when it didn’t. Depending on the breadth of
the crawl and the infrastructure available, it may be days, weeks, or even
months before the crawler visits the same webpage again. Since webpage
lifespans themselves appear to be measured on that same scale of days, weeks,
or months, this margin of uncertainty may seriously undermine the precision of
any estimate.
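
In statistical terms, each observed “death” is interval-censored: it happened somewhere between the last visit that found the page and the first visit that did not. A minimal sketch, assuming a crawler keeps a per-URI list of (visit date, found) observations (the data layout and the `lifespan_bounds` name are hypothetical), returns the shortest and longest lifespan consistent with that record.

```python
from datetime import date
from typing import List, Optional, Tuple

Observation = Tuple[date, bool]  # (visit date, page was found on that visit)


def lifespan_bounds(visits: List[Observation]) -> Optional[Tuple[int, int]]:
    """Return (min_days, max_days) the page could have lived, or None if no death was seen.

    Lifespans are measured from the first visit that found the page; the page
    may have existed before then, so both bounds can understate the true value.
    """
    visits = sorted(visits)
    first_seen = next((day for day, found in visits if found), None)
    if first_seen is None:
        return None  # never observed alive
    last_alive = first_seen
    for day, found in visits:
        if day < first_seen:
            continue
        if found:
            last_alive = day
        else:
            # First failed visit after the page was seen: death lies in (last_alive, day].
            return (last_alive - first_seen).days, (day - first_seen).days
    return None  # still alive at the last visit (right-censored)


crawl = [(date(2011, 1, 1), True), (date(2011, 2, 15), True), (date(2011, 5, 1), False)]
print(lifespan_bounds(crawl))  # (45, 120)
```

Here the 75-day gap between the two bounds is as wide as some of the published lifespan estimates themselves, which illustrates how sparse revisits blur the measurement.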

What this all means for calculations of the average longevity of a webpage is
that, while the Internet Archive’s estimates may be the best available, there are
key limitations and caveats behind every number proffered to date.
Unfortunately, it’s unlikely that we’ll have objective measurements better than
the crude methods permitted by automated link checking any time soon.


CATEGORIES

 * Digital Content




COMMENTS (8)

 1. Leslie Johnston says:
    November 8, 2011 at 3:42 pm
    
    The Chesapeake Digital Preservation Group has an interesting 2011 study on
    “link rot” in online legal resources:
    http://cdm16064.contentdm.oclc.org/cdm/compoundobject/collection/p266901coll4/id/3505

    
 2. Ido Peled says:
    November 8, 2011 at 4:41 pm
    
    Also interesting would be to calculate the median lifespan of a webpage and
    see how long the tail is… Are there any stats about that?

    
 3. Randy H says:
    November 9, 2011 at 2:07 pm
    
    For these very reasons, for ten years, I manually created and painstakingly
    kept up-to-date a collection page of links and documents in a specific field
    because information was so fluid and people in the field needed to see the
    yearly change in documents, regulations and such. Fully 1/2 of my links were
    eventually to the Internet Archive and its copies as things disappeared.
    Then some of the authoring organizations realized the old copies were
    sticking around, but they did not want that history available (presumably to
    hide the fact they were changing things without notice or public
    disclosure). So they started forcing the removal of PDF and similar
    documents from the Archive and then eventually from my site. It is sad, as
    even history kept for educational understanding has now been lost and
    reshaped. I liken it to a modern equivalent of the book burnings of old, as
    it truly can become a new form of deliberate censorship and rewriting of
    history as well.

    
 4. Nicholas Taylor says:
    November 15, 2011 at 8:50 am
    
    @Ido Peled: The academic studies that I’ve seen on the topic of link rot
    have been concerned with the persistence of cited URLs in published
    research. It’s possible that some of these may offer median webpage
    lifespans, albeit for a narrower link corpus. If you wanted to explore this,
    I’d recommend searching for publications that cite the foundational study by
    Steve Lawrence et al. (2001), “Persistence of Web References in Scientific
    Research.” I’m not aware of any statistics on the median lifespan of
    webpages generally, though.

    
 5. Rod says:
    March 7, 2013 at 10:16 pm
    
    Thanks to Mike Ashenfelder for posting this Nov 8, 2011 analysis by Nicholas
    Taylor, and to others who commented on the short lifespan of websites,
    including ephemeral links to sources (most recently estimated, in 2003, at
    about 100 days).
    
    http://blogs.loc.gov/digitalpreservation/2011/11/the-average-lifespan-of-a-webpage/
    
    The Internet enables saving lives, time and money with traceability to
    original sources essential for performing daily work quickly and accurately.
    Low durability significantly undermines this benefit. One approach is to
    incorporate sources into a locally maintained web database and cite them
    there, with reference to the original source, so that the content can
    consistently be reviewed quickly when needed.

    
 6. Someone says:
    May 5, 2021 at 10:40 am
    
    Is this still true nowadays?

    
 7. meow says:
    May 20, 2021 at 3:31 am
    
    did this webpage already die?

    
 8. Elizabeth Pamela Walker says:
    May 15, 2022 at 10:25 am
    
    Thanks x

    


DISCLAIMER & POLICIES

These blogs are governed by the general rules of respectful civil discourse. By
commenting on our blogs, you are fully responsible for everything that you post.
The content of all comments is released into the public domain unless clearly
stated otherwise. The Library of Congress does not control the content posted.
Nevertheless, the Library of Congress may monitor any user-generated content as
it chooses and reserves the right to remove content for any reason whatever,
without consent. Gratuitous links to sites are viewed as spam and may result in
removed comments. We further reserve the right, in our sole discretion, to
remove a user's privilege to post content on the Library site. Read our Comment
and Posting Policy.

Links to external Internet sites on Library of Congress Web pages do not
constitute the Library's endorsement of the content of their Web sites or of
their policies or products. Please read our Standard Disclaimer.
