Tag Archives: fact checks

End of Term Web Archive – Preserving the Transition of a Nation

It’s that time again. The 2024 End of Term crawl has officially begun! The End of Term Web Archive (#EOTArchive) runs the End of Term crawl, an initiative to archive U.S. government websites in the .gov and .mil web domains, as well as harder-to-find government websites hosted on .org, .edu, and other top-level domains (TLDs), as one administrative term ends and a new term begins.

End of Term crawls have been completed for term transitions in 2004, 2008, 2012, 2016, and 2020. The results of these efforts are preserved in the End of Term Web Archive. In total, over 500 terabytes of government websites and data have been archived through the End of Term Web Archive efforts. These archives can be searched full-text via the Internet Archive’s collections search and also downloaded as bulk data for machine-assisted analysis.

The purpose of the End of Term Web Archive is to preserve a record of government websites for historical and research purposes. It is important to capture these websites because they provide a snapshot of government messaging before and after the transition of terms. The End of Term Web Archive preserves, for open access, information that may no longer be available on the live web.

The End of Term Archive is a collaborative effort by the Internet Archive along with the University of North Texas (UNT), Stanford University, the Library of Congress (LC), the U.S. Government Publishing Office (GPO), and the National Archives and Records Administration (NARA). Past partners include the University of California’s California Digital Library (CDL), George Washington University, and the Environmental Data and Governance Initiative (EDGI).

Four captures of Whitehouse.gov: Sept. 15, 2008; Mar. 21, 2013; Feb. 3, 2017; and Feb. 25, 2021.

We are committed to preserving a record of U.S. government websites. But we need your help to complete the 2024 End of Term crawl.

How can you help?! 

We have a list of government domains from the General Services Administration (GSA) and from previous End of Term crawls, but we need volunteers to help us out. We are currently accepting nominations for websites to be included in the 2024 End of Term Web Archive.

Submit a URL nomination by going to digital2.library.unt.edu/nomination/eth2024/.
We encourage you to nominate any and all U.S. federal government websites that you want to make sure get captured. Nominating URLs deep within .gov/.mil websites helps make our web crawls as thorough and complete as possible.

Individuals and institutions nominating seed URLs are recognized on the individual contributors leaderboard and the institutions leaderboard!

Explore the End of Term Web Archive with full text search and download the data!

TV News Record: State of the Union, past and future

A roundup of what’s happening at the TV News Archive by Katie Dahl and Nancy Watzman. Additional research by Robin Chin.

When President Donald Trump takes the podium to deliver his first official State of the Union address to a joint session of Congress on Tuesday, January 30, he’ll be following in the footsteps of the nation’s very first president, George Washington, who delivered the first such address long before there was radio or cable TV.

In Washington’s time, the speech was known not as the State of the Union but as the annual message. According to Donald Ritchie, former U.S. Senate historian, seen here in a clip from C-SPAN, the practice was “to [physically] cut the State of the Union message up into paragraphs and create committees to address each one of the issues the president suggested.” There were no standing committees in Congress at that time. Now it’s fact-checkers who examine the speech line by line, and since 2017 we’ve been annotating our TV news programs with fact-checks of Trump, top administration officials, and the four top congressional leaders, Democrat and Republican.

Make history by being a beta tester of FactStream, a new free app for iPhone or iPad that will deliver live fact-checks of Trump’s State of the Union address from national fact-checking organizations. The app is a product of the Duke Reporters’ Lab Tech & Check collective, of which the Internet Archive’s TV News Archive is a member. We’ll be adding the fact-checks to the TV News Archive, too.

At the TV News Archive, we’ve got historical footage of some past State of the Union addresses, listed below. Last year we annotated Trump’s address to Congress – not officially a State of the Union, since he was newly inaugurated – with fact-checks from our fact-checking partners, FactCheck.org, PolitiFact, and The Washington Post’s Fact Checker. Fact-checks are noted with a red check mark on the TV News Archive filmstrip screen.

For example, the above segment of Trump’s 2017 speech, marked with a red check mark, was fact-checked by both PolitiFact and The Washington Post’s Fact Checker. Trump said, “According to data provided by the Department of Justice, the vast majority of individuals convicted of terrorism and terrorism related offenses since 9/11 came here from outside of our country.”

PolitiFact’s Miriam Valverde rated this claim “mostly false”: “Trump’s statement contains an element of truth but ignores critical facts that would give a different impression. We rate it Mostly False.” Michelle Ye Hee Lee, writing for The Washington Post’s Fact Checker, gave the claim “four Pinocchios,” stating it relied on “a grossly exaggerated misuse of federal data.”

Past State of the Union addresses

2016: Barack Obama

2015: Barack Obama

2014: Barack Obama

2013: Barack Obama

2012: Barack Obama

2011: Barack Obama

2010: Barack Obama

1995: Bill Clinton

1988: Ronald Reagan (no closed captioning)

1980: Jimmy Carter (no closed captioning)

1975: Gerald Ford

1969: Lyndon Johnson

1965: Lyndon Johnson

1963: John F. Kennedy (no closed captioning)

1961: John F. Kennedy (no closed captioning)

1942: Franklin D. Roosevelt (no closed captioning)

Internet Archive to help First Draft News debunk fake news

We are delighted to announce a new partnership with First Draft News, a nonpartisan organization dedicated to ferreting out misinformation online.

In its short existence (it was founded in June 2015), First Draft News has already spearheaded innovative projects that bring together news organizations, social technology companies, and human rights organizations to verify the information that flows to online audiences. First Draft also helps define the problem: in February, Claire Wardle, the group’s research director, published a helpful taxonomy of the different types of fake news and misinformation that proliferate online.

For example, with French elections fast approaching on April 23, 2017, First Draft News launched CrossCheck, a project combining the efforts of more than 37 newsroom partners, as well as journalism students across France and beyond. They’ve been working together to debunk false rumors and news reports in a much-watched contest pitting far-right National Front leader Marine Le Pen against centrist Emmanuel Macron, a defender of the European Union, as well as other candidates.

This partnership has quashed reports that 30 percent of Macron’s campaign funding comes from Saudi Arabia, that France is spending 100 million euros to buy hotels to house immigrants, and that the country is planning to replace Christian public holidays with Muslim and Jewish holidays, plus many more. These false stories had been shared thousands of times on social media.

When the elections are over, First Draft News will research whether CrossCheck’s efforts were effective, or how they might be modified to become more so. “CrossCheck is a living laboratory,” says Aimee Rinehart, manager of First Draft’s Partner Network. Wardle will lead the effort to determine whether the CrossCheck model, in which several news organizations sign off on a fact-check or verification, builds public trust in the media, whose erosion is an increasing problem worldwide.

Already, First Draft News partners rely heavily on the Internet Archive’s Wayback Machine to verify information online. With our new collaboration, we hope to increase use of other Internet Archive resources, including our searchable collection of TV news and curated archives such as the Trump Archive, with its linked fact-checks by national fact-checking organizations. We also hope the collaboration provides valuable input for our plans to apply more machine learning tools to the TV News Archive that could help inform reliable news reporting in the future.

In the news: Trump Archive, end-of-term preservation, & link rot

News outlets have been spreading the word about Internet Archive efforts to preserve President-elect Donald Trump’s statements and the outgoing Obama Administration’s web pages and government data, and to prevent that nasty experience of encountering a “404” when you click on a link online, aka “link rot.”

Trump Archive

A number of journalists have been exploring the riches contained within the newly launched Trump Archive, a collection of TV news clips of the president-elect speaking, peppered with links to more than 500 fact checks by national fact-checking groups.

Annie Wiener, writing for The New Yorker, immerses herself in Trump statements and discovers 56 mentions of the escalator in Trump Tower, and that Trump:

“is a fan of the word ‘sleaze,’ and of the phrase ‘tough cookie,’ which he has used to describe policemen, his opponents’ political donors, Paul LePage, ‘real-estate guys in New York and elsewhere,’ an unnamed friend who is a ‘great financial guy,’ ISIS, three professional football players, Reince Priebus, Lyndon Johnson, and Trump’s father, Fred.”

After watching long stretches of video, she writes, “It occurred to me that spending time online in the Trump Archive could be a form of immersion therapy: a means of overcoming shock through prolonged exposure.”

Geoffrey Fowler, tech columnist for The Wall Street Journal, bemoans the lack of easy-to-use tech tools to help people be responsible citizens overall, but also notes the promise, and the challenge, of a curated collection like the Trump Archive:

“The Trump Archive shows what’s hard about using tech to hold officials accountable. It’s assembled and hand-curated by humans. Yet even using the transcripts, it can be hard to tell the difference between a spoken name and a person who’s actually speaking. Archive officials say making their database applicable to hundreds or thousands more politicians would require help from tech firms with capabilities in machine learning and voice and facial recognition.”

Fowler also published this video, featuring plenty of Trump; an interview with Roger Macdonald, director of the TV News Archive; and ample footage of the Internet Archive’s San Francisco headquarters.

The Trump Archive was also featured in Marketplace Tech, The Hill, Forbes, Newsweek, BuzzFeed News TechPlz, VentureBeat, Engadget, and more.

Preserving Obama Administration websites, social media

The Internet Archive’s efforts to help preserve government websites via the Wayback Machine during and after the transition have continued to garner attention. Wired reports on a group of climate scientists working against the clock to archive government websites related to global warming:

One half was setting web crawlers upon NOAA web pages that could be easily copied and sent to the Internet Archive. The other was working their way through the harder-to-crack data sets—the ones that fuel pages like the EPA’s incredibly detailed interactive map of greenhouse gas emissions, zoomable down to each high-emitting factory and power plant.

New Scientist also reports on efforts to archive climate data:

Fears that data could be misused or altered have prompted crowd-sourcing to back up federal climate and environmental data, including Climate Mirror, a distributed volunteer effort supported by the Internet Archive and the Universities of Pennsylvania and Toronto.

The Los Angeles Times and Quartz offer reports on archiving climate data.

Internet Archive works against link rot

Tech publications were quick to inform their readers about the Internet Archive’s new Chrome extension, which fights link rot by directing users to archived web pages. Here is Mashable:

Now Internet Archive has built a Wayback Machine Chrome extension. It works like this: If you click on a link that would normally lead to an error page (think 404), the extension will instead give users the option to load an archived version of the page. The link is no longer simply gone.

Also writing on the fight against link rot: Network World, VentureBeat, The Tech Portal, Bleeping Computer, and ZDNet.

Internet Archive’s Trump Archive launches today

The Trump Archive launches today with 700+ televised speeches, interviews, debates, and other news broadcasts related to President-elect Donald Trump, created using the Internet Archive’s TV News Archive.

A work in progress, the growing collection now includes more than 520 hours of Trump video. The earliest excerpt dates from December 2009, and the collection continues through the present. It includes more than 500 video statements fact-checked by FactCheck.org, PolitiFact, and The Washington Post’s Fact Checker, covering such controversial topics as immigration, Trump’s tax returns, Hillary Clinton’s emails, and health care.

Full list of fact checks with links to video statements in the TV News Archive.

Visit the Trump Archive.

Reporters, researchers, Wikipedians, and the general public are invited to quote, compare and contrast televised statements made by Trump.

  • Use clips in your articles and videos.
  • Create supercuts on topics like Trump’s perspectives on the U.S. press, made with our online “Popcorn” video editor.
  • Let us know what content we are missing.
  • If you have the technical resources, help us enhance search and discovery by collaborating in experiments to apply artificial intelligence-driven facial recognition, voice identification, and other video content analysis approaches.
  • How would you like to use such an archive? Comment below, or write to us at info@archive.org.

Why a Trump Archive?

We draw on material from the TV News Archive, and on our experience building the successful Political TV Ad Archive, to create a curated collection of material related to Trump, with an emphasis on fact-checked statements. The video is searchable, quotable, and shareable on social media.

In response to requests by our fact-checking partners on the Political TV Ad Archive project and other media, we hope to provide assistance for those tracking Trump’s evolving statements on public policy issues.

For example, in July 2016, Trump told ABC’s George Stephanopoulos, “I have no relationship with Putin…I don’t think I’ve ever met him.” Stephanopoulos pressed him on this point during the interview, noting that Trump had previously claimed a relationship with Putin. PolitiFact rated this statement by Trump a “full flip flop”: “Trump’s denial of a relationship with Putin contradicted what he had said on multiple previous occasions.”

By providing a free and enduring source for TV news broadcasts of Trump’s statements, the Internet Archive hopes to make it more efficient for the media, researchers, and the public to track Trump’s statements while fact-checking and reporting on the new administration. The Trump Archive can also serve as a rich treasure trove of video material for any creative use: comedy, art, documentaries, wherever people’s inspiration takes them.

We consider the Trump Archive to be an experimental model for creating similar archives for other public officials. For example, we’ll explore the idea of creating curated collections for Trump’s nominees to head federal agencies; members of Congress of both parties (for example, perhaps the Senate and House majority and minority leadership); Supreme Court nominees, and so on.

While we’ve largely hand-curated this collection, we hope to collaborate with researchers to apply machine intelligence to expand this collection, building others and making search of our entire TV library vastly more efficient.

Such experimentation builds on our experience with first prototyping and then developing the Political TV Ad Archive. Our first collection of political TV ads, covering ads aired in Philadelphia during the 2014 midterm elections, was built largely by hand. However, in preparation for the Political TV Ad Archive, we created a new open source tool, the Duplitron, which identifies ad airings using audio fingerprinting. During the course of the project, we collected nearly 3,000 ads and documented more than 364,000 ad airings.

Why now?

Just because something is broadcast or posted on the internet doesn’t mean it’s forever. Reporters and the public may take it for granted that a news story or a piece of broadcast video is only a Google search away, but as newspapers, companies, and organizations fail and change, vital information is often lost. The web is far more fragile than is generally understood.

The Internet Archive’s core mission is to preserve and make accessible our cultural heritage. For example, the Wayback Machine preserves websites over time, so if pages or sites are deleted, they can still be found. In one case, Rachel Maddow of MSNBC reported on how the president-elect had deleted a web page from the official transition website that had touted Trump properties.

We also preserve political and news content through the TV News Archive, which contains news broadcasts by major networks back to 2009, searchable via closed captioning. The Political TV Ad Archive archives 2016 election ads along with relevant fact checks and follow-the-money reporting by our journalism partners. Our Political Campaign web archive is preserving election-related online media, such as select candidate and political groups’ websites and Twitter and Instagram feeds.

What’s next

The Trump Archive is a work in progress; we will continue to refine the content. We hope to work with others to broaden the materials available, to make search more efficient, and otherwise to make it more useful for the public. We’d like your feedback and suggestions.

The great American author William Faulkner wrote, “The past is never dead. It’s not even past.” We believe that the Trump Archive, in preserving the past, can help the public engage more knowledgeably with our future.

Many thanks for the thoughtful contributions of Robin Chin, Jessica Clark, Katie Dahl, Katie Donnelly, John Gonzalez, Wendy Hanamura, Tracey Jaquith, Jeff Kaplan, Roger Macdonald, Ralf Muehlen, Craig Newmark, Sylvia Paull, Alexis Rossi, Dan Schultz, Nancy Watzman, our Partners & Funders, and the Vanderbilt Television News Archive, on whose shoulders we stand.