Tag Archives: Wayback Machine

The 20th Century Time Machine

by Nancy Watzman & Katie Dahl

Jason Scott

With the turn of a dial, some flashing lights, and the requisite puff of fog, emcees Tracey Jaquith, TV Architect, and Jason Scott, Free Range Archivist, cranked up the Internet Archive 20th Century Time Machine on stage before a packed house at the Internet Archive’s annual party on October 11.

Eureka! The cardboard contraption worked! The year was 1912, and out stepped Alexis Rossi, director of Media and Access, her hat adorned with a 78rpm record.

1912

D’Anna Alexander (center) with her mother (right) and grandmother (left).

“Close your eyes and listen,” Rossi asked the audience. And then, out of the speakers floated the scratchy sounds of Billy Murray singing “Low Bridge, Everybody Down” written by Thomas S. Allen. From 1898 to the 1950s, some three million recordings of about three minutes each were made on 78rpm discs. But these discs are now brittle, the music stored on them precious. The Internet Archive is working with partners on the Great 78 Project to store these recordings digitally, so that we and future generations can enjoy them and reflect on our music history. New collections include the Tina Argumedo and Lucrecia Hug 78rpm Collection of dance music collected in Argentina in the mid-1930s.

1927

Next to emerge from the Time Machine was David Leonard, president of the Boston Public Library, which was the first free, municipal library founded in the United States. The mission was and remains bold: make knowledge available to everyone. Knowledge shouldn’t be hidden behind paywalls or restricted to the wealthy, but rather should operate under the principle of open access as a public good, he explained. Leonard announced that the Boston Public Library would join the Internet Archive’s Great 78 Project by authorizing the transfer of 200,000 individual 78s and LPs to preserve and make accessible to the public, “a collection that otherwise would remain in storage unavailable to anyone.”

David Leonard and Brewster Kahle

Brewster Kahle, founder and Digital Librarian of the Internet Archive, then came through the time machine to present the Internet Archive Hero Award to Leonard. “I am inspired every time I go through the doors,” said Kahle of the library, noting that the Boston Public Library was the first to digitize not just a presidential library, that of John Quincy Adams, but also modern books. Leonard was presented with a tablet imprinted with the Boston Public Library homepage by Internet Archive 2017 Artist in Residence, Jeremiah Jenkins.

1942

Kahle then set the Time Machine to 1942 to explain another new Internet Archive initiative: liberating books published between 1923 and 1941. Working with Elizabeth Townsend Gard, a copyright scholar at Tulane University, the Internet Archive is liberating these books under a little-known, and perhaps never used, provision of US copyright law, Section 108h, which allows libraries to scan and make available materials published from 1923 to 1941 if they are not being actively sold. The name of the new collection: the Sonny Bono Memorial Collection, named for the late congressman who led the passage of the Copyright Term Extension Act of 1998, which included the 108h provision as a “gift” to libraries.

One of these books is “Your Life,” a tome written by Kahle’s grandfather, Douglas E. Lurton, a “guide to a desirable living.” “I have one copy of this book and two sons. According to the law, I can’t make one copy and give it to the other son. But now it’s available,” Kahle explained.

1944

Sab Masada

With the Time Machine cranked to 1944, out came Rick Prelinger, Internet Archive Board member, archivist, and filmmaker. Prelinger introduced a new addition to the Internet Archive’s film collection: long-forgotten footage of an Arkansas Japanese internment camp from 1944. As the film played on the screen, Prelinger welcomed Sab Masada, 87, who lived at this very camp as a 12-year-old.

Masada talked about his experience at the camp and why it is important for people today to remember it. “Since the election I’ve heard echoes of what I heard in 1942,” Masada said. “Using fear of terrorism to target the Muslims and people south of the border.”

1972

Next to speak was Wendy Hanamura, the director of partnerships. Hanamura explained how as a sixth grader she discovered a book at the library, Executive Order 9066, published in 1972, which chronicled photos of Japanese internment camps during World War II.

“Before I was an internet archivist, I was a daughter and granddaughter of American citizens who were locked up behind barbed wire in the same kind of camps that incarcerated Sab,” said Hanamura. That one book – now out of print – helped her understand what had happened to her family.

Inspired by making it to the semi-final round of the MacArthur 100&Change initiative with a proposal that provides libraries and learners with free digital access to four million books, the Internet Archive is forging ahead with plans, despite not winning the $100 million grant. Among the books the Internet Archive is making available: Executive Order 9066.

1985

When the year display turned to 1985, Jason Scott reappeared on stage, explaining his role as a software curator. New this year to the Internet Archive are collections of early Apple software, he explained, with browser emulation allowing the user to experience just what it was like to fire up a Macintosh computer back in its heyday. This includes a collection of the then wildly popular “HyperCards,” a programmatic tool that enabled users to create programs that linked materials in creative ways, before the rise of the World Wide Web.

1997

Vinay Goel

After this tour through the 20th century, the Time Machine was set to 1997. Mark Graham, Director of the Wayback Machine, and Vinay Goel, Senior Data Engineer, stepped on stage. Back in 1997, when the Wayback Machine began archiving websites on the still new World Wide Web, the entire thing amounted to 2.2 terabytes of data. Now the Wayback Machine contains 20 petabytes. Graham explained how the Wayback Machine is preserving tweets, government websites, and other materials that could otherwise vanish. One example: this report from The Rachel Maddow Show, which aired on December 16, 2016, about Michael Flynn, then slated to become National Security Advisor. Flynn deleted a tweet he had made linking to a falsified story about Hillary Clinton, but the Internet Archive saved it through the Wayback Machine.

Goel took the microphone to announce new improvements to Wayback Machine Search 2.0. Now it’s possible to search for keywords, such as “climate change,” and find not just web pages from a particular time period mentioning these words, but also different format types, such as images, PDFs, or yes, even an old Internet Archive favorite: animated GIFs from the now-defunct GeoCities, including snow globes!

Thanks to all who came out to celebrate with the Internet Archive staff and volunteers, or watched online. Please join our efforts to provide Universal Access to All Knowledge, whatever century it is from.

Editor’s Note, 10/16/17: Watch the full event https://archive.org/details/youtube-j1eYfT1r0Tc  

 

TV News Record: Wayback Machine saves deleted prez tweets

A weekly roundup of what’s happening and what we’re seeing at the TV News Archive by Katie Dahl and Nancy Watzman. Additional research by Robin Chin.

In this week’s TV News Archive roundup, we explain how presidential tweets are forever, show how different cable TV news networks summarized NFL protests via Third Eye chyron data, and present FiveThirtyEight’s analysis of hurricane coverage (hint: Puerto Rico got less attention).

Wayback Machine preserved deleted prez tweets; PolitiFact fact-checks legality of prez tweet deletions (murky)

The Internet Archive’s Wayback Machine has preserved President Donald Trump’s deleted tweets praising failed GOP Alabama U.S. Senate candidate Luther Strange following his defeat by Roy Moore on September 26. So has the Pulitzer Prize-winning investigative journalism site ProPublica, through its Politwoops project.

The story of Trump’s deleted tweets about Strange was reported far and wide, including in this segment on MSNBC’s “Deadline: White House” that aired on September 27.

In a fact-check on the legality of a president deleting tweets, linked in the TV News Archive clip above, John Kruzel reports for PolitiFact that the law is murky but still being fleshed out:

Experts were split over how much enforcement power courts have in the arena of presidential record-keeping, though most seemed to agree the president has the upper hand.

“One of the problems with the Presidential Records Act is that it does not have a lot of teeth,” said Douglas Cox, a professor at the City University of New York School of Law. “The courts have held that the president has wide and almost unreviewable discretion to interpret the Presidential Records Act.”

That said, many of the experts we spoke to are closely monitoring how the court responds to the litigation around Trump administration record-keeping.

He also provides background on that litigation, a lawsuit brought by Citizens for Responsibility and Ethics in Washington. The case is broadly about requirements for preserving presidential records, and a previous set of deleted presidential tweets is a part of it.

Fact Check: NFL attendance and ratings are way down because people love their country (Mostly false)

Speaking of Trump’s tweets, the president ignited an explosion of coverage with an early morning tweet on Sunday, Sept. 24, ahead of a long day of football games: “NFL attendance and ratings are WAY DOWN. Boring games yes, but many stay away because they love our country.”

Manuela Tobias of PolitiFact rated this claim as “mostly false,” reporting, “Ratings were down 8 percent in 2016, but experts said the drop was modest and in line with general ratings for the sports industry. The NFL remains the most watched televised sports event in the United States.” “As for political motivation, there’s little evidence to suggest people are boycotting the NFL. Most of the professional sports franchises are dealing with declines in popularity.”

How did different cable TV news networks cover the NFL protests?

We first used the Television Explorer tool to see where there was a spike in the use of the word “NFL” near the word “Trump.” It looked like Sunday showed the most use of these words. After a closer look, we saw that MSNBC, Fox News, and CNN all showed the highest mentions of these terms around 2 pm Pacific.

Spike at 2 pm (PST) for CNN, MSNBC, and Fox News

Then we downloaded data from the new Third Eye project, which turns TV News chyrons into data, filtering for that date and hour. We were able to see how the three cable news networks were summarizing the news at that particular point in time.
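The filtering step described above can be sketched in a few lines of Python. This is a minimal illustration only: the three-column layout (timestamp, channel, text), the sample rows, their timestamps, and the function name `chyrons_in_hour` are all assumptions made for the example, not the actual Third Eye file format, and time zone handling is left out.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in an assumed layout (timestamp, channel, text).
# The real Third Eye downloads may use different columns and time zones.
SAMPLE_TSV = (
    "2017-09-24 13:45\tCNN\tPlaceholder headline before the hour\n"
    "2017-09-24 14:02\tCNN\tNFL teams kneel, link arms in defiance of Trump\n"
    "2017-09-24 14:02\tMSNBC\tTaking a knee: NFL teams send a message\n"
    "2017-09-24 15:10\tFOXNEWS\tPlaceholder headline after the hour\n"
)


def chyrons_in_hour(tsv_text, day, hour):
    """Return (channel, text) pairs whose timestamp falls on `day` during `hour`."""
    matches = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for stamp, channel, text in reader:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        if when.date().isoformat() == day and when.hour == hour:
            matches.append((channel, text))
    return matches


for channel, text in chyrons_in_hour(SAMPLE_TSV, "2017-09-24", 14):
    print(f"{channel}: {text}")
```

With real data, the same idea applies once the actual column names and the time zone of the timestamps are known.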

At about 2:02, CNN broadcast this chyron: “NFL teams kneel, link arms in defiance of Trump.”

Screen grab of chyron caught by Third Eye from 2:02 pm 9/24/17 on CNN

Fox News chose the following, also seen below tweeted from one of the Third Eye twitter bots: “Some NFL owners criticize Trump’s statements on player protests, link arms with players”

Meanwhile, MSNBC chose a different message: “Taking a knee: NFL teams send a message.”

Screen grab of chyron caught by Third Eye from 2:02 pm 9/24/17 on MSNBC

About eight minutes later, all three cable channels were still reporting on the NFL protests:

Puerto Rico’s hurricane Maria got less media attention than hurricanes Harvey & Irma

Writing for FiveThirtyEight.com, Dhrumil Mehta demonstrated that both online news sites and TV news broadcasters paid less attention to Puerto Rico’s hurricane Maria than to hurricanes Harvey and Irma, which hit the mainland U.S., primarily in Texas and Florida. Mehta used TV News Archive data via Television Explorer, as well as data from Media Cloud on online news coverage, to help make his case:

While Puerto Rico suffers after Hurricane Maria, much of the U.S. media (FiveThirtyEight not excepted) has been occupied with other things: a health care bill that failed to pass, a primary election in Alabama, and a spat between the president and sports players, just to name a few. Last Sunday alone, after President Trump’s tweets about the NFL, the phrase “national anthem” was said in more sentences on TV news than “Puerto Rico” and “Hurricane Maria” combined.

To receive the TV News Archive’s email newsletter, subscribe here.

 

 

Internet Archive to help First Draft News debunk fake news

We are delighted to announce a new partnership with First Draft News, a nonpartisan organization dedicated to ferreting out misinformation online.

In its short existence (it was founded in June 2015), First Draft News has already spearheaded innovative projects that bring together news organizations, social technology companies, and human rights organizations to verify the information that flows to online audiences. First Draft also helps define the problem: in February, Claire Wardle, the group’s research director, published a helpful taxonomy of the different types of fake news and misinformation that proliferate online.

Example: with French elections fast approaching on April 23, 2017, First Draft News launched CrossCheck, a project combining the efforts of more than 37 newsroom partners, as well as journalism students across France and beyond. They’ve been working together to debunk false rumors and news reports in a much-watched contest pitting the far-right National Front leader Marine Le Pen against centrist Emmanuel Macron, defender of the European Union, as well as other candidates.

This partnership has quashed reports that 30 percent of Macron’s campaign funding comes from Saudi Arabia, that France is spending 100 million euros to buy hotels to house immigrants, and that the country is planning to replace Christian public holidays with Muslim and Jewish holidays, plus many more. These false stories had been shared thousands of times on social media.

When the elections are over, First Draft News will research whether CrossCheck’s efforts were effective, or how they may be modified to become more so. “CrossCheck is a living laboratory,” says Aimee Rinehart, manager of First Draft’s Partner Network. Wardle will lead the efforts to determine whether the CrossCheck model, where several news organizations sign off on a fact-check or verification, builds public trust in the media, the lack of which is an increasing problem worldwide.

Already, First Draft News partners rely heavily on the Internet Archive’s Wayback Machine to verify information online. With our new collaboration, we hope to increase use of other Internet Archive resources, including our searchable collection of TV news and curated archives such as the Trump Archive, with its linked fact-checks by national fact checking organizations. We also hope the collaboration provides valuable input for our plans to apply more tools of machine learning to the TV News Archive that could help inform reliable news reporting in the future.

Preserving U.S. Government Websites and Data as the Obama Term Ends

Long before the 2016 presidential election cycle, librarians have understood this often-overlooked fact: vast amounts of government data and digital information are at risk of vanishing when a presidential term ends and administrations change. For example, 83% of .gov PDFs disappeared between 2008 and 2012.

That is why the Internet Archive, along with partners from the Library of Congress, University of North Texas, George Washington University, Stanford University, California Digital Library, and other public and private libraries, is hard at work on the End of Term Web Archive, a wide-ranging effort to preserve the entirety of the federal government web presence, especially the .gov and .mil domains, along with federal websites on other domains and official government social media accounts.

While it is not the only project the Internet Archive is undertaking to preserve government websites, FTP sites, and databases at this time, the End of Term Web Archive is a far-reaching one.

The Internet Archive is collecting webpages from over 6,000 government domains, over 200,000 hosts, and feeds from around 10,000 official federal social media accounts. The effort is likely to preserve hundreds of millions of individual government webpages and data and could end up totaling well over 100 terabytes of archived materials. Over its full history of web archiving, the Internet Archive has preserved over 3.5 billion URLs from the .gov domain, including over 45 million PDFs.

This end-of-term collection builds on similar initiatives in 2008 and 2012 by original partners Internet Archive, Library of Congress, University of North Texas, and California Digital Library to document the “gov web,” which has no mandated, domain-wide single custodian. For instance, here is the National Institute for Literacy (NIFL) website in 2008. The domain went offline in 2011. Similarly, the Sustainable Development Indicators (SDI) site was later taken down. Other websites, such as invasivespecies.gov, were later folded into larger agency domains. Every web page archived is accessible through the Wayback Machine, and past and current End of Term specific collections are full-text searchable through the main End of Term portal. We have also worked with additional partners to provide access to the full data for use in data-mining research and projects.

The project has received considerable press attention this year, with related stories in The New York Times, Politico, The Washington Post, Library Journal, Motherboard, and others.

“No single government entity is responsible for archiving the entire federal government’s web presence,” explained Jefferson Bailey, the Internet Archive’s Director of Web Archiving. “Web data is already highly ephemeral and websites without a mandated custodian are even more imperiled. These sites include significant amounts of publicly-funded federal research, data, projects, and reporting that may only exist or be published on the web. This is tremendously important historical information. It also creates an amazing opportunity for libraries and archives to join forces and resources and collaborate to archive and provide permanent access to this material.”

This year has also seen a significant increase in citizen- and librarian-driven “hackathons” and “nomination-a-thons” where subject experts and concerned information professionals crowdsource lists of high-value or endangered websites for the End of Term archiving partners to crawl. Librarian groups in New York City are holding nomination events to make sure important sites are preserved. And universities such as the University of Toronto are holding events for “guerrilla archiving” focused specifically on preserving climate-related data.

We need your help too! You can use the End of Term Nomination Tool to nominate any .gov or government website or social media site and it will be archived by the project team. If you have other ideas, please comment here or send ideas to info@archive.org. And you can also help by donating to the Internet Archive to help our continued mission to provide “Universal Access to All Knowledge.”

Blacked Out Government Websites Available Through Wayback Machine

 

(from the Internet Archive’s Archive-It group: Announcing the first ever Archive-It US Government Shutdown Notice Awards!)

Congress has caused the U.S. federal government to shut down, and important websites have gone dark. Fortunately, we have the Wayback Machine to help.

Many government sites are displaying messages saying that they are not being updated or maintained during the government shutdown, but the following sites are among those that have completely shut their doors today. Clicking the logos will take you to a Wayback Machine archived capture of the site. Please donate to help us keep the government websites available. You can also suggest pages for us to archive so that we can document the shutdown.

National Oceanic and Atmospheric Administration: noaa.gov
National Park Service: nps.gov
Library of Congress: loc.gov
National Science Foundation: nsf.gov
Federal Communications Commission: fcc.gov
Bureau of the Census: census.gov
U.S. Department of Agriculture: usda.gov
United States Geological Survey: usgs.gov
U.S. International Trade Commission: usitc.gov
Federal Trade Commission: ftc.gov
National Aeronautics and Space Administration: nasa.gov
International Trade Administration: trade.gov
Corporation for National and Community Service: nationalservice.gov
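For readers who want to look up archived captures of these sites directly, here is a minimal Python sketch that builds Wayback Machine URLs of the form `https://web.archive.org/web/<timestamp>/<site>`, which the Wayback Machine resolves to the capture closest to the given timestamp. The helper name `wayback_url` and the example timestamp are illustrative assumptions; the timestamp is an arbitrary placeholder, not a date taken from this post, and the sketch assumes that partial timestamps (from a four-digit year up to the full 14 digits) are accepted.

```python
# A few of the domains from the list above.
DOMAINS = ["noaa.gov", "nps.gov", "loc.gov"]


def wayback_url(site, timestamp):
    """Build a Wayback Machine URL for `site` near `timestamp`.

    `timestamp` is a digit string in YYYYMMDDhhmmss order; this sketch
    assumes it may be truncated (for example to YYYY or YYYYMMDD).
    """
    if not timestamp.isdigit() or not 4 <= len(timestamp) <= 14:
        raise ValueError("timestamp must be 4 to 14 digits, YYYYMMDDhhmmss order")
    return f"https://web.archive.org/web/{timestamp}/{site}"


# Arbitrary example timestamp; substitute the date you are interested in.
EXAMPLE_TIMESTAMP = "20130101"

for domain in DOMAINS:
    print(wayback_url(domain, EXAMPLE_TIMESTAMP))
```

The sketch only constructs the URL; it does not fetch anything or check that a capture exists.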

 

A New Kind of Datacenter

A note from Internet Archive’s founder, Brewster Kahle:

Today (March 25, 2009) the Internet Archive and Sun Microsystems are launching a new datacenter that stores the whole web archive and serves the Wayback Machine.

And it is a modular datacenter that sits outside in a shipping container. This 3-petabyte (3 million gigabyte) datacenter will handle the 500 requests per second as it takes over the full Wayback load.
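To put those two figures in perspective, here is a small back-of-the-envelope calculation in Python. It is only a sketch: it assumes decimal units (1 petabyte = 1,000,000 gigabytes, matching the conversion given above) and that the 500 requests per second is sustained around the clock, which the note does not actually state.

```python
GB_PER_PB = 1_000_000             # decimal units, as in "3 million gigabyte"
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 seconds

capacity_pb = 3
requests_per_second = 500         # assumed here to be a sustained rate

capacity_gb = capacity_pb * GB_PER_PB
requests_per_day = requests_per_second * SECONDS_PER_DAY

print(f"Capacity: {capacity_gb:,} GB")
print(f"Requests per day at a sustained rate: {requests_per_day:,}")
```

Under those assumptions, the container would serve roughly 43 million requests a day.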

Thank you to the Sun and Internet Archive staff who helped conceive and build this new perspective on long-term active archiving.

In the press:
Sun Microsystems
Slashdot
Metafilter
San Francisco Chronicle
Computerworld
Good Morning Silicon Valley
