
If you’re old enough, like I am, you might remember the cartoon WABAC Machine from “The Rocky and Bullwinkle Show.” That was way before the Internet. Today, there’s an amazing, real Wayback Machine online. I recently spoke to the project’s director, Mark Graham.
The original WABAC Machine was a cartoon device that let Mr. Peabody and Sherman travel back in time. The online Wayback Machine doesn’t exactly do *that*, but in some ways, it’s almost as cool.
Sharyl: In simple terms, if you can, explain what the Wayback Machine is.
Mark Graham: The Wayback Machine is the history of the Web. So it's a service that for the last 21 years has been backing up large chunks of the public Web.
Sharyl: It enables people to do what?
Mark Graham: Oh, to go back in time. To be able to look at the content associated with web pages as it's changed, as maybe those web pages have gone away. Maybe they're not available anymore on the live Web.
Sharyl: Who thought of this idea and why?
Mark Graham: So, a guy named Brewster Kahle thought of it. Brewster was a pioneer of the early part of the Internet.
Sharyl: This is a tall order, but can you explain in simple terms, technically, how this works?
Mark Graham: Sure. First of all, there are lots of computers involved. It's no longer just one computer. Literally thousands of computers, and lots of software that has been written and evolved over the decades now. What one does is, you start with a list of websites or URLs. The software will go to each of those URLs, will look at what's on those pages, and will capture all those resources: all those pictures, all those images, videos, CSS, JavaScript, HTML files, et cetera. It'll identify links on those pages, and then it'll go to those pages. It'll crawl, like a spider, to those pages, evaluate the resources on those pages, and continue on based upon whatever rules are set up.
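To make Graham's description concrete, here is a toy, single-machine sketch of the loop he describes: start from a list of seed URLs, capture each page's embedded resources, extract its links, and follow only the links that pass a crawl rule. This is not the Internet Archive's actual software; the names `crawl`, `fetch`, `allowed`, `LinkExtractor`, and the example.org pages are hypothetical, and the network is replaced by an in-memory dictionary so the sketch runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects links to follow and embedded resources to capture (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []      # <a href> targets: pages to crawl next
        self.resources = []  # images, scripts, stylesheets: files to capture

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag in ("img", "script") and attrs.get("src"):
            self.resources.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self.resources.append(attrs["href"])


def crawl(seeds, fetch, allowed, max_pages=100):
    """Breadth-first crawl from seed URLs.

    fetch(url) returns HTML text or None; allowed(url) is the crawl rule.
    Returns a dict mapping each captured page URL to its resource URLs.
    """
    archive = {}
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(archive) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page missing or unreachable: skip it
            continue
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        # "Capture" the page by recording its absolute resource URLs.
        archive[url] = [urljoin(url, r) for r in parser.resources]
        # Follow each new link that the crawl rule permits.
        for href in parser.links:
            target = urljoin(url, href)
            if target not in seen and allowed(target):
                seen.add(target)
                queue.append(target)
    return archive


# Usage: a tiny in-memory "web" standing in for real HTTP fetches.
PAGES = {
    "https://example.org/": '<a href="/about">About</a><img src="logo.png">',
    "https://example.org/about": (
        '<a href="/">Home</a><a href="https://other.net/">Out</a>'
        '<link rel="stylesheet" href="/style.css">'
    ),
}


def fake_fetch(url):
    return PAGES.get(url)


def same_site(url):
    # Example rule: stay on example.org and ignore off-site links.
    return urlparse(url).netloc == "example.org"


result = crawl(["https://example.org/"], fake_fetch, same_site)
for page, resources in result.items():
    print(page, resources)
```

The breadth-first queue plus a `seen` set keeps the crawler from revisiting pages, and the `allowed` function plays the role of the "rules" Graham mentions; a real archival crawler would also write the fetched bytes to storage and handle politeness, retries, and scale.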
Sharyl: I'd like to thank you for it, because I've used it quite a bit.
Mark Graham: Excellent.
Sharyl: I've found it very, very valuable in news-gathering. In one example, in 2015, I found the government had wiped some vaccine injury statistics off its website, and I was able to use the Wayback Machine to find them and retrieve them for a story that I was working on.
Mark Graham: That's an ongoing story, too. It's not a new story, although it's happening a lot now. It's happened a lot over time. So archiving government websites is something we've paid a lot of attention to. In fact, at the end of every administration, when there's a change in administration, you expect things are going to change. So we go in and we take extra care to archive as many government websites as we can, about 7,000 federal government websites. The last time the administration changed, we archived about 200 terabytes of content.
Sharyl: Where is information most in danger of disappearing?
Mark Graham: There are issues with information that's published in the first place and then not available to people, maybe within a region. For example, right now, in Turkey, you can't get access to Wikipedia. In Egypt, you can't get access to hundreds of websites. In the United States, information is removed from social media platforms. YouTube removed 8.3 million videos last year. Facebook alone removed 1.3 billion accounts. There may be very good reasons for removing those accounts or certain videos, such as violations of terms of service, in one country. In another country, the issues may be political in nature: a ruling party may not want an opposition party to have access to a platform for the people.
Sharyl: If I use the Wayback Machine, it's free to me.
Mark Graham: Sure.
Sharyl: How does it make money? How is it funded?
Mark Graham: It doesn't make money. The Wayback Machine is a service of the Internet Archive. The Internet Archive is a nonprofit. We do many things. The mission statement of the Internet Archive is universal access to all knowledge, so the Web and the Wayback Machine are one piece of the work.
The Wayback Machine was launched in 2001. It’s saved more than 333 billion web pages so far. You can use it yourself anytime! To find it, just search for “Wayback Machine” or go straight to archive.org.