TV news fact-checked: Ivanka, McCaskill, Mnuchin, Perez, and Mulvaney

By Katie Dahl

This week’s roundup includes five fact-checks of statements by public officials, preserved on TV News Archive. Our fact-checking partners examined the financial disclosures of the president, who outspent whom in the Georgia election, and whether high-ranking Democrats voted for a border wall a decade ago.

Claim: Child care is the largest expense in more than half of American households (mostly false)

Ivanka Trump, first daughter and adviser to President Donald Trump, participated in a panel in Berlin with German Chancellor Angela Merkel. While there, she said the “single largest expense in over half of American households is childcare, even exceeding the cost of housing.”

“Child Care Aware, a trade and advocacy group, found that it cost on average over $17,000 a year for infant day care in Massachusetts,” reported Jon Greenberg for PolitiFact. “The question is, does paying for child care top all the other expenses that half of the households have to cover, such as housing and food?…Government data suggests it does not. For most families, the No. 1 cost is housing.” The Washington Post’s Fact Checker, Glenn Kessler, also reported, “[S]he would have been on more solid ground if she had focused on low-income households or families with small children, not all households.”

Claim: Nobody applies to the U.S. for refugee status. They apply to the U.N. (false)

“Nobody applies to the United States for refugee status. They apply to the United Nations,” said Sen. Claire McCaskill, D., Mo., during a January Senate hearing on Rex Tillerson’s nomination as Secretary of State.

PolitiFact Missouri reporter Aleissa Bleyl reported this week that “about 20 percent to 30 percent of resettlement cases are handled by the United States and not the U.N… Overall, most refugees seeking resettlement to the United States must first go through the United Nations High Commissioner for Refugees. However, refugees with nuclear family members already living in the United States are given a different priority that isn’t processed through the United Nations.”

Claim: Trump has given more financial disclosure than anybody else (false)

After receiving a question about whether the president would release his tax returns, Treasury Secretary Steve Mnuchin said, “The president has no intention… The president has released plenty of information and, I think, has given more financial disclosure than anybody else. I think the American population has plenty of information.”

Allison Graves and Louis Jacobson rated Mnuchin’s statement as “False,” reporting for PolitiFact, “Trump released a financial disclosure report that all presidential candidates are required to fill out, but the fact that Trump has not released any tax filings undermines Mnuchin’s claim… the lack of transparency around his tax returns remains a significant omission compared with recent presidents.”

Claim: Ossoff was outspent two to one in Georgia race (unsupported)

Neither candidate received enough votes to win outright in the race for Georgia’s 6th Congressional District, in a special election to replace Tom Price, who now heads the U.S. Department of Health and Human Services. A runoff is now scheduled for June. Explaining the outcome, Democratic National Committee Chairman Tom Perez said, “By the way, Chris, he was outspent two to one. I mean Paul Ryan’s super PAC was in. They hit the panic button big-time on the Republican side.”

But, “the Federal Election Commission campaign finance records don’t support his claim that Ossoff was ‘outspent two to one.’” According to Eugene Kiely and Robert Farley at FactCheck.org, “Ossoff and the outside groups who supported him spent more than the Republican groups that opposed him.”



Claim: Obama, Schumer, and Clinton voted for a border wall in 2006 (half true)

White House budget director Mick Mulvaney recently defended proposed funding for a border wall between the United States and Mexico. “We still don’t understand why the Democrats are so wholeheartedly against it. They voted for it in 2006. Then-Sen. Obama voted for it. Sen. Schumer voted for it. Sen. Clinton voted for it,” he said.

“They did vote for the Secure Fence Act of 2006, which authorized building a fence along about 700 miles of the border between the United States and Mexico,” reported Allison Graves for PolitiFact. “Still, the fence they voted for is not as substantial as the wall Trump is proposing. Trump himself called the 2006 fence a ‘nothing wall.’”

Follow us on Twitter! We’ve changed our name from @PolitAdArchive to @TVNewsArchive.


To receive the TV News Archive’s email newsletter, subscribe here.

Posted in Announcements, Television Archive

CANCELED: Hitting the Wall: How the Media Shapes the Immigration Debate

We are incredibly disappointed to have to tell you all that, due to last-minute unforeseen scheduling conflicts, the “Hitting The Wall” event has been canceled. We know that many of you (us included!) were looking forward to the event and feel very passionately about this topic, but circumstances beyond our control have made it necessary to cancel at this time. We appreciate your kind understanding and hope to see you at future events.

______________________________________________

How can we tell fact from fiction when it comes to a controversial topic like immigration? Join us at the Internet Archive for an evening with experienced journalists from the Center for Investigative Reporting (CIR) and Retro Report, who will work with the audience to develop strategies to fight back against propaganda and fake news.

Admission is $10 and includes tacos, beer, wine, and soda:

When: Wednesday, May 17th. Doors open at 5:30 p.m. for food and drinks, and discussion starts at 7 p.m.
Where: Internet Archive
300 Funston Ave. SF, CA 94118

The program will take place in three acts.

Act 1: The Story

In Act 1, we’ll go deep on the facts and stories about immigration in the U.S.

What does the data tell us about immigration in the U.S.? Who is coming and who is going and what are the trends for both? What is the mission of the U.S. Border Patrol? What would it actually mean to build a wall along the entire U.S.-Mexico Border? What does the term “sanctuary city” mean?

Act 2: The Challenge

In Act 2, we’ll work with the audience to find practical strategies to make the public debate over immigration fact-based and productive.

The CIR and Retro Report teams will work with the audience to home in on key questions in the immigration debate, with special attention to its points of tension. What are common misunderstandings about immigration? How and why do they emerge?

Act 3: Solutions

In Act 3, we’ll do a group brainstorm on how to burst filter bubbles and work for constructive debate and change on immigration and other issues.

With the audience, the journalists will identify practical strategies they can take back to the newsroom and share with other media when reporting on controversial issues. How can the media work directly with communities, provide trustworthy reporting on a complex issue, and help the public recognize fake news?

Get Tickets Here


Retro Report is an award-winning, digital-first documentary news organization dedicated to bringing context to today’s headlines by telling the story behind the news; it is non-partisan, independent and non-profit.  Retro Report is founded on the conviction that without an engaging and forward-looking review of high-profile events and the news coverage surrounding them, we lose a critical opportunity to understand the lessons of history.  In a culture increasingly disposed towards trending news and Twitter-sized sound bites, the importance of that mission is amplified. Retro Report has produced more than 100 short documentaries and video series and partnered with The New York Times, PBS, NBC, Politico, the Guardian, Univision and others. 


The mission of The Center for Investigative Reporting is to engage and empower the public through investigative journalism and groundbreaking storytelling in order to spark action, improve lives and protect our democracy. Founded in 1977 as the nation’s first nonprofit investigative journalism organization, we are celebrating our 40th anniversary this year. Over those four decades, we have developed a reputation for being among the most innovative, credible and relevant media organizations in the country. Reveal – our website, public radio program, podcast and social media platform – is where we publish our multiplatform work.

Posted in Event

Celebrate a major advance in access to knowledge in India and America — Wednesday, June 14 6PM in SF

By Carl Malamud

Please join us on June 14 at the Internet Archive for a special event celebrating our collections from India including the collected works of Mahatma Gandhi and much, much more. Our doors open at 6 p.m. with a reception and our program starts promptly at 7 p.m.

Get Free Tickets Here

Our special guest for this event will be Hon. Dr. Sam Pitroda, a former senior advisor and Cabinet Minister under three Prime Ministers and widely acknowledged as the father of the telecommunications revolution in India, the man who brought a telephone to every village in India. Dr. Pitroda will be joined by Hon. Ambassador Venkatesan Ashok, Consul General of India in San Francisco. Rounding out our program will be Carl Malamud of Public Resource and the Internet Archive’s own Brewster Kahle.

Our event will be celebrating three collections hosted at the Internet Archive:

First, the Internet Archive is delighted to be hosting a mirror of the Digital Library of India, a collection of 463,000 books in 50 languages. The collection was created in India under government auspices and features 45,000 books in Hindi, 33,000 in Sanskrit, 30,160 in Bengali, and much more. In addition to hosting a mirror of the collection, the Internet Archive is adding value to the collection by creating e-books, using optical character recognition, and improving the metadata and cataloging information.

Second, we will feature the Hind Swaraj collection, materials that are integral to the story of Indian independence. Here you can read all 100 volumes of the Collected Works of Mahatma Gandhi, as well as the complete writings of Dr. B.R. Ambedkar and Jawaharlal Nehru. You can also listen to 129 audio recordings from All India Radio of Gandhiji speaking at prayer meetings and view all 53 episodes of the remarkable television series Bharat Ek Khoj.

Third, we will discuss additional collections of Indian materials, such as thousands of photographs from the Ministry of Information and Broadcasting and other sources which Public Resource hosts on Flickr and a collection of all technical public safety India Standards hosted on the Internet Archive and the Public Resource site.

Carl Malamud and Sam Pitroda have spent several years building out these collections. We hope you will join us at this event to hear more about how this came to pass and what the plans are for making this material ever more useful. We also hope to have some exciting announcements about new resources that will be available.

Universal access to knowledge is the goal of the Internet Archive. We are delighted to celebrate the immense contributions of India and the vital role both India and the United States, the world’s largest democracies, play in making knowledge available to all. Please join us on June 14!

Get Tickets Here

When: Wednesday, June 14th. Doors open at 6 p.m. for food and drinks, and program starts at 7 p.m.
Where: Internet Archive
300 Funston Ave. SF, CA 94118

Posted in Announcements, Event

TV News Lab: Hyperaudio improving TV news video captioning and sharing

In a new blog series, TV News Lab, we’ll demonstrate how the Internet Archive is partnering with technology, journalism, and academic organizations to experiment with and improve the TV News Archive, our free, public, online library of TV news shows. Here we interview Mark Boas, founder of the Hyperaudio project, an organization that works to make audio and video more accessible and shareable on the web, by providing an easy-to-use interface for copying and pasting bits of transcripts to create mash ups of shareable video. You can find the open source code powering Hyperaudio on GitHub.

Mark Boas talks to the Internet Archive about Hyperaudio.

NW: What is the problem you’re trying to solve by applying Hyperaudio technology to the TV News Archive?

MB: People find TV news credible. It’s very hard to fake TV news. I’d love to see people using TV news to back up any sort of political or other expression a public official is trying to make, by showing the source material and also the arguments about those statements. I think this also has implications for improving media literacy.

(An example mix made at Chattanooga Public Library.)

NW: What stands in the way of people sharing TV news video right now?

MB: One of the problems is that audio and video on the web have been a black box in a way. They have not been very well integrated into the web because it’s difficult to do that. If you see a big block of text, it’s easy to highlight, copy, paste and send it off. But if you have an interesting piece of audio to share, how do you do that? There are ways to do it, but it’s not intuitive.

Coupled with that, it’s also hard to find audio on the internet. If you search for particular terms, you will find what you want only if someone has added sufficient metadata to make it discoverable. Transcripts allow you to search, but also provide a way to share. And the key to that is that you need not just the transcript, but also a match between the words in the transcript and the proper times in the audio.
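To make the idea of matching words to times concrete, here is a minimal sketch of a word-timed transcript. It is an illustration only, not Hyperaudio's actual data format: the `TimedWord` class, the `clip_for_phrase` helper, and the timings are all hypothetical. Once every word carries a start and end time, selecting a phrase in the text is enough to recover the matching span of audio.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    text: str
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording


# Hypothetical, hand-made timings used purely for illustration.
TRANSCRIPT = [
    TimedWord("people", 0.0, 0.4),
    TimedWord("find", 0.4, 0.7),
    TimedWord("tv", 0.7, 1.0),
    TimedWord("news", 1.0, 1.4),
    TimedWord("credible", 1.4, 2.1),
]


def clip_for_phrase(words, phrase):
    """Return (start, end) in seconds for the first occurrence of phrase, or None."""
    target = phrase.lower().split()
    if not target:
        return None
    texts = [w.text.lower() for w in words]
    # Slide a window of len(target) words across the transcript.
    for i in range(len(texts) - len(target) + 1):
        if texts[i:i + len(target)] == target:
            return (words[i].start, words[i + len(target) - 1].end)
    return None


print(clip_for_phrase(TRANSCRIPT, "TV news"))
```

With this structure, highlighting text and sharing a clip become the same operation: the selected words determine the start and end times to play.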

NW: Why is it hard to match the transcript to the audio in a video?

MB: The first step is getting a good quality transcript. It’s great that the TV News Archive uses closed captions, but they’re not perfect. (Note: the TV News Archive is searchable via closed captioning, but there’s often a several-second lag between the captions and the video, as well as other quality issues.) The transcript usually needs to be cleaned up. The better the transcript, the better the match. Closed captions are done in real time by humans who make mistakes.

The next challenge is to try and minimize the time it takes to match the words in the transcript to the audio. If we want to automate the process, we need to figure out how to do that more quickly. It’s very intensive on the computing side. I’m experimenting with chunking up the video to speed up the process. I think we’ll see that the matching is an exponential task: a one-hour transcript might take 30 minutes and a three-hour transcript might take more than three times that. But if we split it up into smaller chunks, the processing might become more efficient.
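The intuition behind chunking can be sketched with a toy cost model. The model below is an assumption made purely for illustration, not a measurement of Hyperaudio's aligner: it supposes the work grows as the recording length raised to some exponent greater than one. Under that assumption, aligning many short chunks costs less in total than aligning one long recording. The function names and the default exponent of 2.0 are hypothetical.

```python
def alignment_cost(minutes, exponent=2.0):
    """Hypothetical cost model: work grows as minutes ** exponent."""
    return minutes ** exponent


def chunked_cost(minutes, chunk_minutes, exponent=2.0):
    """Total cost when the recording is split into chunks aligned independently."""
    full_chunks, remainder = divmod(minutes, chunk_minutes)
    cost = full_chunks * alignment_cost(chunk_minutes, exponent)
    if remainder:
        # The last, shorter chunk is aligned on its own.
        cost += alignment_cost(remainder, exponent)
    return cost


whole = alignment_cost(180)    # one three-hour recording
split = chunked_cost(180, 10)  # eighteen ten-minute chunks
print(whole, split, whole / split)
```

In this model the saving grows with the number of chunks. A real system would also have to handle words that straddle chunk boundaries, which the sketch ignores.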

NW: How do people try out Hyperaudio?

Hyperaudio is not a commercial software as a service. It’s more of a demo of the underlying technology. We work with groups like the Studs Terkel Radio Archive (WFMT Chicago), to help them make the most of their content and data; whatever we make flows back into our open source code on GitHub. What we do is very experimental, but it will give you an idea what’s possible. If you want to experiment with TV News Archive, you can do that at http://newsarchive.hyperaud.io/. More info on our experiments and collaborations can be found on our blog.

Posted in Television Archive

Internet Archive wins Webby Lifetime Achievement award

We are honored to announce that the Internet Archive has been named as one of the 21st annual Webby Awards winners!

The Webby Awards are hailed as one of the Internet’s highest honors, and we’re excited to receive a Webby Lifetime Achievement award and join the ranks of our friends like Sir Tim Berners-Lee, Lawrence Lessig, and Vint Cerf.

Webby Lifetime Achievement: Archive.org for its commitment to making the world’s knowledge available online and preserving the history of the Internet itself. With a vast collection of digitized materials and tools like the Wayback Machine, Archive.org has become a vital resource not only to catalogue an ever-changing medium, but to safeguard a free and open Internet for everyone.

We thank the International Academy of Digital Arts and Sciences for the award and are looking forward to attending the Webby Awards in New York City on May 15.

The complete list of Webby Award winners is available here.

Posted in Announcements, News

Macintosh Collection Hand-Screenshotted… Plus: HyperCard!

The Internet Archive’s emulated Early Mac collection, which was announced last week, has had all its content screenshotted by hand for maximum visual beauty and accuracy.

Normally, we use a set of automated scripts that do the screenshotting, allowing a large number of uploads to be visually described, but the many different permutations of where to click and which folders to open meant we weren’t getting the best shots for each item. Now, the screenshots do justice to the unique and interesting early Mac experience.

Like many other cases in computer history, the seeming limitations of black-and-white-only screens on early Macintoshes gave rise to truly beautiful and complicated art, which expressed itself crisply on the 9-inch monitors.

Response to the early Macintosh collection has been resoundingly positive; thanks again to all the volunteers who helped the system work as well as it does. With 60+ titles added and more to come, this is likely to be one of our most memorable and stellar playable software collections on the Archive.

But one more thing….

Throughout the testing process and discussions about emulating Macintosh, a steady drumbeat of requests could be summarized as: “What about HyperCard?”

HyperCard, a hypertext authoring system for the Macintosh, is a legendary environment for creating “Stacks”, which were clickable cards with a wide range of options and features. It is often cited as an inspiration for what ultimately became the World Wide Web.

It was possible to write truly complicated and complete applications in HyperCard, and stacks allowed everything from reference books to games to music, whatever the authors of stacks could come up with. It was particularly popular with academics and writers. A great retrospective of HyperCard at its 25th anniversary was written by Ars Technica.

So.. what about HyperCard? Yes, we have HyperCard.

The Emularity Loader used by the Internet Archive allows the content of two items in the Archive’s collections to be combined, meaning there can be a “general boot disk” with HyperCard, which then pulls in an uploaded HyperCard Stack.

As of this writing, we’ve added a small number of Stacks to prove the technology, including the “BeerStack” beer-reference, the Adventures of Sean (an interactive cartoon), and a re-created Stack designed by none other than Douglas Adams for calculating the volume of a Megapode nest.

Adding new stacks is relatively complicated, and we’re working on adding more from such sites as HYPERCARD.ORG, which has been gathering amazing Stacks for years. If you’re someone who worked on a HyperCard stack in the past, or you oversee a collection of Stacks created by others, please feel free to contact hypercard@textfiles.com to receive assistance in adding your stacks, emulated, to the Archive.

We hope this is the start of a large, quality collection of emulated programs at the Archive around the Macintosh, and thank you for spreading the word about it, and the importance of providing instant worldwide access to historical software.

A shout-out to volunteer Stephen Cole who has taken on the mantle of adding new titles to the Macintosh collection over time, including the ingestion of HyperCard stacks. 

Posted in News, Software Archive

TV news highlights: visitor logs, voter turnout, China, North Korea, United Airlines

By Katie Dahl

In this week’s look back at TV news highlights that have been fact-checked by our partners, we get a history of the White House policy on releasing visitor logs; a look at how 2016 voter turnout compared to other elections; an examination of whether United Airlines was contractually barred from removing a ticketed passenger; a comparison of the pollution records of the U.S. and China; and an analysis of how positively China regards Trump’s recent actions on North Korea.

Claim: In not releasing visitor logs to the public, the White House is following the same policy as every U.S. administration “from the beginning of time” except Obama (true)

Press Secretary Sean Spicer defended the White House’s announcement that it would not be releasing names of White House visitors to the public, saying, “I think as was noted on Friday, we’re following the same policy that every administration from the beginning of time has used with respect to visitor logs.” Later in that daily press briefing, he refined his statement: “[I]t’s the same policy that every administration had up until the Obama administration.”

Former President Barack Obama’s decision to make visitor logs public was done in the name of transparency–although only after pressure from outside groups, explained Louis Jacobson for PolitiFact. But Spicer is correct that “[h]istorically speaking, the policy under Obama was the exception, rather than the rule.”

Claim: Voter turnout for the 2016 presidential race was the lowest in 20 years (false)

On CNN’s “State of the Union,” Sen. Bernie Sanders, I., Vt., talked about his national tour to increase civic participation, citing a statistic on voter turnout: “So many of our people are giving up on the political process. It is very frightening. In the last presidential election, when Trump won, we had the lowest voter turnout over – in 20 years. And in the previous two years before that, in the midterm election, we had the lowest voter turnout in 70 years.”

“In fact, turnout was higher than it was in 2012,” according to Eugene Kiely at FactCheck.org. PolitiFact’s Jacobson confirmed that Sanders was wrong on the claim that 2016 had the “lowest turnout” in the last 20 years, but correct that 2014 saw the lowest turnout in 70 years. Both reporters cited the work of Michael McDonald, a political scientist at the University of Florida.

Claim: United Airlines passenger had a right to stay on the plane (false)

After United Airlines violently removed paying passenger David Dao from a plane to make room for employees, syndicated columnist Andrew Napolitano said on Fox News: “By dislodging this passenger against his will, United violated its contractual obligation. He paid for the ticket, he bought the ticket, he passed the TSA, he was in his seat, he has every right to stay there.”

“False,” wrote Joshua Gillin for PunditFact, a project of PolitiFact. “Napolitano’s blanket assertion is incorrect. Experts told us that airlines, including United, outline dozens of reasons why they might remove a passenger after he has already boarded.”

Claim: China and India pollute more than the United States (four Pinocchios)

During a recent TV interview, Environmental Protection Agency Administrator Scott Pruitt said, “China and India had no obligations under the [Paris Accord] agreement until 2030.” He also claimed that “[Europe, China, India] are polluting way more than we are.”

“[B]oth countries pledge to reach these goals by 2030, meaning they are taking steps now to meet their commitments,” reported Glenn Kessler for The Washington Post’s Fact Checker. “China (but not India) does produce more carbon dioxide than the United States, but it has nearly 1.4 billion people compared to 325 million for the United States. So, on a per capita basis, the United States in 2015 produced more than double the carbon dioxide emissions of China — and eight times more than India.”

Claim: Trump on North Korea: “We’ve never seen such a positive response on our behalf from China” (half true)

Defending his decision to step back from labeling China a currency manipulator, President Donald Trump said he is working with China on a “bigger problem,” North Korea, and that the result is good, claiming, “[N]obody has ever seen such a positive response on our behalf from China.”

PolitiFact’s Jon Greenberg shared results of interviews with several experts as support for his rating of “half true” for the claim: “It is difficult to quantify a ‘positive response.’ Whether the latest moves represent a sea shift that ‘no one has ever seen,’ or the logical conclusion of a longer pattern, probably lies in the eye of the beholder.”

To receive the TV News Archive’s email newsletter, subscribe here.

Posted in Television Archive

Find O’Reilly Factor clips on TV News Archive

With yesterday’s announcement that Fox News had ousted Bill O’Reilly from the helm of “The O’Reilly Factor,” following mounting complaints of sexual harassment, the pugilistic host’s reign as the “king of cable news” passes into history.

However, a good portion of that American political history is preserved for posterity as part of the TV News Archive, the Internet Archive’s searchable collection of television news. We’ve got some 3,000 hours of “The O’Reilly Factor” dating back to 2009,  including at least 20 segments that have been fact-checked by PolitiFact.

Perhaps O’Reilly described his mission best with his response to a viewer, who urged him in October 2016, “Stick to the facts, not your personal opinion.” Said O’Reilly: “The O’Reilly Factor is built around my personal opinions, sir. Twenty years…thus the name: ‘The O’Reilly Factor.’”

Here are several fact-checked O’Reilly highlights from recent years:

Guns. O’Reilly claimed that Supreme Court nominee Merrick Garland “voted, so the folks know, in Washington, D.C., to keep guns away from private citizens.” PunditFact: “False….Garland didn’t vote on this case at all.” (March 2016.)

Crime. From 2014 to 2015, said O’Reilly in October 2015, Austin’s “murder rate is up a whopping 83 percent.” PolitiFact Texas: Mostly False. “[I]f O’Reilly had pulled back the camera, so to speak, he could have determined that Austin appears on pace to have a lower murder rate in 2015 than in 2014.”

Iran, China, and Russia. O’Reilly: Russia and China “absolutely said pretty clearly” they would not keep economic sanctions on Iran if the United States “walked away from the deal.” This time O’Reilly earned a “Mostly True,” from PolitiFact: “O’Reilly is pushing the envelope when he said “absolutely” clear, as they haven’t issued formal statements. But all of their actions indicate that what O’Reilly said is substantially accurate.”

Muslims cheering 9-11. “Thousands of Muslims, regular folks, celebrated in the streets… . these people are a minority but they were not called out in any official way by Muslim nations around the world.” PolitiFact: “Half True.” “So far as we can tell, there was no official condemnation of people celebrating the 9/11 attacks. However, Muslim governments, and religious leaders, condemned the attacks themselves, as did many average Muslims.”

There’s more! Popcorn fact-check annotation experiment

For a reel of fact-checks of O’Reilly statements over the years, check out this compilation created with a recent version of Mozilla’s Popcorn editor by TV News Archive Director Roger Macdonald.

Popcorn allows viewers to feed TV News Archive video into an editor and mix it up with other videos, add text annotations, hyperlinks, and more. We believe this is a glimpse of the future: giving people the tools to put the messages that bombard them in context, rather than being passive viewers.

Mozilla launched the innovative tool in 2012; while they no longer support it, the source code is open for others to improve. Please be patient with occasional buffering glitches.  Try clicking on some of the text for links and the orange quote icon link to citations.  And, if you want to go wild, click the arrows triangle icon and try your hand at remixing.

If you are impatient with problems playing the Popcorn version, here is a plain-old mp4, with no embedded links or remix options.

Posted in Television Archive

DRM for the Web is a Bad Idea

I asked our crawler folks what the impact of the EME proposal could be on us, and what they came back with seems well reasoned but strongly negative for our mission.

I have posted the analysis below for the public to consider.

-brewster

At your request we have assessed what the possible effects of the Encrypted Media Extensions (EME) as a W3C recommendation would be.

We believe it will be dangerous to the open web unless protections are put in place for those who engage in activities, such as archiving, that are threatened by the legal regime governing the standard.

One major issue is that people who bypass EME, even for legitimate reasons, have reason to fear retaliation under section 1201 of the US Digital Millennium Copyright Act, and laws like it around the world, such as Article 6 of the European Union Copyright Directive, which indiscriminately bar circumvention even for lawful purposes. Locking up standards-defined video streams with digital rights management (DRM) could put our archiving activities at serious risk. DRM, which imposes technological restrictions that control what users can do with digital media, is antithetical to the open web. Moreover, EME opens the possibility that DRM could spread to non-video content such as typography or images, which poses an even more existential threat. Web archiving and the Wayback Machine would suffer.

Archiving is not the only activity endangered by anti-circumvention laws and EME: from accessibility adaptation to security research to the kinds of legitimate innovative activities that you began your career with — inventing the first search engines — the normal course of the open, standards-defined internet is incompatible with the anti-circumvention regime that comes into play if the W3C publishes EME as a recommendation.

The Electronic Frontier Foundation has proposed a sensible and simple compromise: binding W3C members not to invoke anti-circumvention laws unless there is some other cause of action. This preserves the legitimate interests of rightsholders against those who trespass on their copyrights, trade secrets and contractual obligations, without turning the W3C standards process into a backdoor to creating new legal rights to prevent legitimate, vital activities.

Every organization involved in creating and preserving the open web is facing unprecedented challenges and pressures today. It is up to the guardians of the open web to meet those challenges with an unwavering commitment to our core principles: that the web must be free for anyone to write, to read, to connect to, to adapt, to archive and to preserve. As such, I recommend that we object to the publication of EME as a W3C specification without safeguarding these foundational principles of the open web.

Posted in Announcements, News

Robots.txt meant for search engines don’t work well for web archives

Robots.txt files were invented 20+ years ago to advise “robots,” mostly search engine web crawlers, about which sections of a web site should be crawled and indexed for search.

Many sites use their robots.txt files to improve their SEO (search engine optimization) by excluding duplicate content like print versions of recipes, excluding search result pages, excluding large files from crawling to save on hosting costs, or “hiding” sensitive areas of the site like administrative pages. (Of course, over the years malicious actors have also used robots.txt files to identify those same sensitive areas!) Some crawlers, like Google’s, pay attention to robots.txt directives, while others do not.
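As a small illustration of the exclusions described above, the sketch below feeds a made-up robots.txt to Python's standard `urllib.robotparser` module and asks which URLs a compliant crawler would fetch. The file contents, the `example.com` URLs, the `ExampleBot` agent name, and the `allowed` helper are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of the SEO-oriented kind described above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /search
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="ExampleBot"):
    """Return True if a crawler that honors robots.txt may fetch url."""
    return parser.can_fetch(agent, url)


print(allowed("https://example.com/recipes/pie"))        # ordinary page
print(allowed("https://example.com/print/recipes/pie"))  # duplicate print version
print(allowed("https://example.com/search?q=pie"))       # search result page
print(allowed("https://example.com/admin/login"))        # "hidden" admin area
```

Only the first URL is permitted. Note that the file itself is public, which is why listing sensitive paths in it can advertise them rather than hide them.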

Over time we have observed that the robots.txt files that are geared toward search engine crawlers do not necessarily serve our archival purposes.  Internet Archive’s goal is to create complete “snapshots” of web pages, including the duplicate content and the large versions of files.  We have also seen an upsurge of the use of robots.txt files to remove entire domains from search engines when they transition from a live web site into a parked domain, which has historically also removed the entire domain from view in the Wayback Machine.  In other words, a site goes out of business and then the parked domain is “blocked” from search engines and no one can look at the history of that site in the Wayback Machine anymore.  We receive inquiries and complaints on these “disappeared” sites almost daily.

A few months ago we stopped referring to robots.txt files on U.S. government and military web sites for both crawling and displaying web pages (though we respond to removal requests sent to info@archive.org). As we have moved towards broader access, it has not caused problems, which we take as a good sign. We are now looking to do this more broadly.
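The two behaviors discussed here can be sketched together: a blanket exclusion of the kind a parked domain might publish, and a policy that skips robots.txt entirely for certain hosts. This is a hypothetical illustration, not the Internet Archive's actual crawler logic. The `.gov` and `.mil` suffix test, the `ExampleArchiveBot` agent name, the function names, and the example URLs are all assumptions.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Assumption: hosts ending in these suffixes stand in for U.S. government and military sites.
EXEMPT_SUFFIXES = (".gov", ".mil")

# A blanket exclusion of the kind that can appear on a parked domain.
PARKED_ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def robots_allows(robots_txt, url, agent="ExampleArchiveBot"):
    """Return True if the given robots.txt text permits agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def may_archive(url, robots_txt):
    """Hypothetical policy: ignore robots.txt for exempt hosts, honor it elsewhere."""
    host = urlsplit(url).hostname or ""
    if host.endswith(EXEMPT_SUFFIXES):
        return True  # robots.txt is not consulted for exempt hosts
    return robots_allows(robots_txt, url)


print(may_archive("https://example.com/history", PARKED_ROBOTS_TXT))
print(may_archive("https://agency.example.gov/report", PARKED_ROBOTS_TXT))
```

Under the blanket rule every path on the ordinary host is refused, which is how a whole site's history can drop out of view, while the exempt host is archived regardless.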

We see the future of web archiving relying less on robots.txt file declarations geared toward search engines, and more on representing the web as it really was, and is, from a user’s perspective.

Posted in Announcements, News