The Web Preservation Society
Published: March 20, 2012
Author: Todd Mintz
“She walks to work but she’s still in a daze,
She’s Rita Hayworth or Doris Day,
And Errol Flynn’s gonna take her away,
To Oklahoma, U.S.A.”…The Kinks
Anyone who has seen Martin Scorsese’s “Hugo” has witnessed one of the most compelling arguments in favor of preserving not only the history of film but the history of all forms of cultural expression, most certainly including (in my opinion) unique and relevant content residing on the World Wide Web.
One might think that the advent of Google means that we’re creating a permanent, accessible record of mankind’s digital history. There might be some truth to this, since we don’t have any idea how much of the past web Google has stored on its servers. However much that is, it’s locked away from public access.
If you’re looking for history from a particular site, you might be able to get something from The Wayback Machine. However, the coverage is sporadic and infrequent, and the depth of crawls on particular sites is a very big question mark. Keyword search in The Wayback Machine? Fuhgeddaboudit.
Google does a wonderful job crawling and categorizing the active web, and over the last couple of years, they’ve made great strides in being able to index and make searchable a diverse array of content. However, the only content in Google’s index (apart from some scanned newspapers and some other miscellaneous products) is contained on pages actively hosted and crawlable by Googlebot. If the site goes offline and/or the crawl path gets broken, Google delists the content no matter its intrinsic value.
Now, if the website owner explicitly wants the content kept out of Google forever, he/she should have that right. Also, obvious spam pages, out-of-date product pages, and pages that lack uniqueness shouldn’t be preserved for public access.
However, why should unique, valuable content drop out of the SERPs just because it quit being hosted live? It should not only be accessible via Google cache but, depending on Google’s SERP ranking factors, be accessible and even rank in search (though perhaps not for competitive terms).
After all, what is Google’s Mission?
Organize the world’s information and make it universally accessible and useful.
Would opening the vaults of delisted web pages further Google’s Mission? Yes.
Would opening the vaults help serve the needs of the user? Yes.
Can Google afford to do this? Yes.
Can Google monetize the process to offset its additional cost? Yes…via AdWords / Google Display Network.
Would this “be evil” in any way, shape or form? I certainly don’t think so. I’m sure some of the “buried content” should stay buried for one reason or another, but there can be procedures in place to account for that.
Having all this “lost information” accessible would serve the public and private good in so many different ways. Yes, even the most rudimentary, mundane original Geocities pages have intrinsic value to their creators and those who knew them…and there are probably works of genius and cultural significance lost just because someone didn’t renew their domain name or pay their web hosting bill.
A couple of years ago, I wrote a blog post after reading in my college alumni magazine about the death of someone with whom I lived in a college dorm for two years. I bemoaned the fact that, since he wasn’t the sort of “digital native” that I am, very little existed on the web to show the public that he ever existed. When I search for his name now, I see even less…just one reference to his obituary and his old LinkedIn page. It won’t be long before his life is consigned to digital oblivion in the eyes of “Public Google.” I don’t know what additional traces of his existence reside on Google’s servers, but I think that, for the sake of those who knew him and of those who knew others no longer with us, Google should give the dearly departed the most complete digital eternity available.
– Todd Mintz, Senior SEM Manager