Hard-to-Find Documents

The World Wide Web is a fantastic resource for fast and easy access to government documents, company information, news, and a plethora of other resources. As an electronic medium, however, it tends to be somewhat ephemeral. Occasionally documents are removed for suspect reasons, e.g., government documents removed for ideological reasons, or corporate documents removed because of public relations issues. Most missing or deleted documents, though, are removed simply because they are no longer new or current, or because whoever put them on the web no longer considers them important. The entity, whether governmental or corporate, that posted the documents does not necessarily share librarians' interest in retaining archival material.

Following are selected links to sources of deleted, removed, or outdated web content. None of these is a comprehensive source, and some have specific agendas that shape the content they make available. Taken together, however, they are useful for retrieving some of this “lost content”.

ALA Government Documents Roundtable
http://sunsite.berkeley.edu/GODORT/

Although this website does not archive documents itself, it is a good resource for tracking U.S. government material that has been removed from the web. The current news column covers recent removals and other issues related to the dissemination of government information. There is also a chronology that lists, by date, removals of web-based government data, along with summaries of the reasons for each removal and citations to stories discussing it (http://www2.library.unr.edu/dept/bgic/Duncan/PPBmillerchronchart.doc). At present, the chronology has not been updated past October 2002.

Fugitive and Electronic-Only Federal Documents
http://fugitive.lawlib.asu.edu/

“The American Association of Law Libraries’ Government Documents Special Interest Section’s Fugitive & Electronic Only Documents Committee identifies fugitive electronic U.S. federal documents on law and policy; reports these documents to the U.S. Government Printing Office for cataloging and preservation; notifies the documents community about these documents; and facilitates hard copy publication of some of these documents.” This page links to the documents that the Committee has identified as available only in electronic form, or as not available through the Depository Library Program. Most of the documents listed are government reports of various types, so this is a good place to look for such reports.

Google
http://www.google.com/

“Google takes a snapshot of each page examined as it crawls the web and caches these as a back-up in case the original page is unavailable.” If a link from a Google search result is dead, click on the link labeled “cache”, and you will get a snapshot of the page as it existed when Google indexed it. To research a dead link directly, you can also enter a Google search of the form “cache:www.xyz.com” to pull up the cached copy, if one is available. As with the Wayback Machine, image content is frequently missing. In addition, some sites have specifically asked Google not to cache their content.

The CyberCemetery
http://govinfo.library.unt.edu/

“The University of North Texas Libraries and the U.S. Government Printing Office, as part of the Federal Depository Library Program, created a partnership to provide permanent public access to the electronic Web sites and publications of defunct U.S. government agencies and commissions.” The CyberCemetery makes available archived copies of the complete websites of defunct Federal government entities. The collection is not searchable, but it is indexed both by agency name and by subject.

The Internet Archive Wayback Machine
http://www.archive.org/

“The Internet Archive is a 501(c)(3) public nonprofit that was founded to build an ‘Internet library,’ with the purpose of offering permanent access for researchers, historians, and scholars to historical collections that exist in digital format.” The Internet Archive is a massive collection of archived web pages, complete with the internal links within each website. Although the size of the collection currently precludes free-text searching, the “Wayback Machine” lets you enter a URL and retrieve archived content from that website, potentially as far back as 1996. Searching on a URL produces a list of dated links to archived copies. One limitation is that archived pages frequently lack images, because often only the text was archived. Another is that the Archive contains only material at least six months old.

The Memory Hole
http://www.thememoryhole.org/

“The Memory Hole exists to preserve and spread material that is in danger of being lost, is hard to find, or is not widely known….The emphasis is on material that exposes things that we’re not supposed to know (or that we’re supposed to forget).” The Memory Hole has, from time to time, archived documents that have been removed from Federal government websites. Note, however, that this is a one-man operation, and its operator has a definite bias against the current Federal administration, which shapes the selection of material.

The Smoking Gun
http://www.thesmokinggun.com/

“The Smoking Gun brings you exclusive documents–cool, confidential, quirky–that can’t be found elsewhere on the Web.” The Smoking Gun tends to focus on popular scandals, particularly those involving celebrities. It does, however, publish court documents and other pertinent, often embarrassing documents (for example, Enron’s internal ethics manual). The Smoking Gun is owned by Court TV.

Compiled by Allen Rines of Foley Hoag LLP in February 2003.
