Wayback Machine Chrome extension now available

The Wayback Machine Chrome browser extension helps make the web more reliable by detecting dead web pages and offering to replay archived versions of them.  You can get it here.

For the past 20 years, the Internet Archive has recorded and preserved web pages, and hundreds of billions of them are available via the Wayback Machine.  This matters because we are learning that the web is fragile and ephemeral.  For example, a 2013 Harvard study found that 49% of the URLs referenced in U.S. Supreme Court decisions are now dead.  Those decisions affect everyone in the U.S., and the evidence the opinions are based on is disappearing.

When previously valid URLs don’t respond, but instead return a result code of 404, we call that link rot.  The Wayback Machine Chrome extension is designed to help mitigate link rot and other common web breakdowns.

By using the “Wayback Machine” extension for Chrome, users are automatically offered the opportunity to view archived pages whenever any one of several error conditions, including code 404, or “page not found,” is encountered.  If one of those codes is detected, the Wayback Machine extension silently queries the Wayback Machine, in real time, to see if an archived version is available.  If one is available, a notice is displayed via Chrome, offering the user the option to see the archived page.
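
The lookup described above can be sketched in a few lines of Python. This is only an illustration, not the extension's actual code. It assumes the public Wayback Machine availability endpoint at archive.org/wayback/available and its JSON reply with an "archived_snapshots" object. The list of error codes (the post names only 404) and all function names are hypothetical choices made for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the post names only 404; the other codes are illustrative.
ERROR_CODES = {404, 408, 410, 500, 502, 503, 504}

# Assumption: the public availability endpoint is used for the lookup.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def should_check_archive(status_code):
    """Return True when the status code is one we treat as a dead page."""
    return status_code in ERROR_CODES


def availability_url(page_url):
    """Build the query URL that asks whether page_url has been archived."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": page_url})


def closest_snapshot(payload):
    """Extract the closest archived snapshot URL from a decoded JSON reply."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(page_url, status_code):
    """Return an archived URL for a failed page, or None if there is none."""
    if not should_check_archive(status_code):
        return None
    with urlopen(availability_url(page_url), timeout=10) as response:
        payload = json.load(response)
    return closest_snapshot(payload)
```

Calling find_archived_copy("http://www.pfaw.org:80/attacks.htm", 404) would perform one network request and return the address of the closest snapshot if the Wayback Machine holds one. A browser extension would then show that address in its notice rather than redirecting automatically.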

The Internet Archive considers the privacy of our users to be of critical importance. We try not to record IP addresses, and we have fought National Security Letters.  You can rest assured that the use of the Wayback Machine Chrome extension will not expose your browsing history.  In addition, we are in conversation with Google about adding a proxy server as an additional layer of protection.

Thank you for giving the Wayback Machine for Chrome extension a try.  You can test it with this URL: http://www.pfaw.org:80/attacks.htm  We are committed to supporting better web browsing experiences and welcome your feedback and suggestions about how we can improve.  Please send us your bug reports, feature requests and other feedback directly to info@archive.org.

Posted in Announcements, News | 29 Comments

Internet Archive’s Trump Archive launches today

The Trump Archive launches today with 700+ televised speeches, interviews, debates, and other news broadcasts related to President-elect Donald Trump, created using the Internet Archive’s TV News Archive.

A work in progress, the growing collection now includes more than 520 hours of Trump video. The earliest excerpt dates from December 2009, and the collection continues through the present. It includes more than 500 video statements fact checked by FactCheck.org, PolitiFact, and The Washington Post’s Fact Checker covering such controversial topics as immigration, Trump’s tax returns, Hillary Clinton’s emails, and health care.

Full list of fact checks with links to video statements in TV News Archive.

Visit the Trump Archive.

Reporters, researchers, Wikipedians, and the general public are invited to quote, compare and contrast televised statements made by Trump.

  • Use clips in your articles and videos.
  • Create supercuts on topics like Trump’s perspectives on the US press, made with our online “Popcorn” video editor.
  • Let us know what content we are missing.  
  • If you have the technical resources, help us enhance search and discovery by collaborating in experiments to apply artificial intelligence-driven facial recognition, voice identification, and other video content analysis approaches.
  • How would you like to use such an archive?  Comment below, or write to us at info@archive.org.

Why a Trump Archive?

We draw on the TV News Archive’s material, and our experience with building the successful Political TV Ad Archive, to create a curated collection of material related to Trump, with an emphasis on fact-checked statements. The video is searchable, quotable, and shareable on social media.

In response to requests by our fact checking partners on the Political TV Ad Archive project and other media, we hope to provide assistance for those tracking Trump’s evolving statements on public policy issues.

For example: in July 2016, Trump told ABC’s George Stephanopoulos, “I have no relationship with Putin…I don’t think I’ve ever met him.” Stephanopoulos pressed him on this point during the interview, saying that Trump had previously claimed a relationship with him. PolitiFact ruled this statement by Trump as a “full flip flop”: “Trump’s denial of a relationship with Putin contradicted what he had said on multiple previous occasions.”

By providing a free and enduring source for TV news broadcasts of Trump’s statements, the Internet Archive hopes to make it more efficient for the media, researchers, and the public to track Trump’s statements while fact-checking and reporting on the new administration. The Trump Archive can also serve as a rich treasure trove of video material for any creative use: comedy, art, documentaries, wherever people’s inspiration takes them.

We consider the Trump Archive to be an experimental model for creating similar archives for other public officials. For example, we’ll explore the idea of creating curated collections for Trump’s nominees to head federal agencies; members of Congress of both parties (for example, perhaps the Senate and House majority and minority leadership); Supreme Court nominees, and so on.

While we’ve largely hand-curated this collection, we hope to collaborate with researchers to apply machine intelligence to expand this collection, build others, and make search of our entire TV library vastly more efficient.

Such experimentation builds on our experience with first prototyping and then developing the Political TV Ad Archive. Our first collection of political TV ads, covering ads aired in Philadelphia during the 2014 mid-term elections, was built largely by hand. However, in preparation for the Political TV Ad Archive, we created a new open source tool, the Duplitron, that was able to identify ad airings by deploying audio fingerprinting. During the course of the project, we collected nearly 3,000 ads and documented more than 364,000 ad airings.

Why now?

Just because something is broadcast or posted on the internet doesn’t mean it’s forever. Reporters and the public may take it for granted that a news story or a piece of broadcast video is only a Google search away, but as newspapers, companies, and organizations fail and change, vital information is often lost. The web is far more fragile than is generally understood.

The Internet Archive’s core mission is to preserve and make accessible our cultural heritage. For example, the Wayback Machine preserves websites over time, so if pages or sites are deleted, they can still be found. Rachel Maddow of MSNBC recently reported on how the president-elect had deleted a web page from the official transition website that had touted Trump properties.

We also preserve political and news content through the TV News Archive, which contains news broadcasts by major networks back to 2009, searchable via closed captioning. The Political TV Ad Archive archives 2016 election ads along with relevant fact checks and follow-the-money reporting by our journalism partners. Our Political Campaign web archive is preserving election-related online media, such as select candidate and political groups’ websites and Twitter and Instagram feeds.

What’s next

The Trump Archive is a work in progress; we will continue to refine the content. We hope to work with others to broaden the materials available, to make search more efficient, and otherwise make it more useful for the public. We’d like your feedback and suggestions.

The great American author William Faulkner wrote, “The past is never dead. It’s not even past.” We believe that the Trump Archive, in preserving the past, can help the public engage more knowledgeably with our future.

Many thanks to the thoughtful contributions of Robin Chin, Jessica Clark, Katie Dahl, Katie Donnelly, John Gonzalez, Wendy Hanamura, Tracey Jaquith, Jeff Kaplan, Roger Macdonald, Ralf Muehlen, Craig Newmark, Sylvia Paull, Alexis Rossi, Dan Schultz, Nancy Watzman, our Partners & Funders and the Vanderbilt Television News Archive – on whose shoulders we stand.

Posted in Announcements, News | 82 Comments

Join us for a White House Social Media and Gov Data Hackathon!

Join us at the Internet Archive this Saturday January 7 for a government data hackathon! We are hosting an informal hackathon working with White House social media data, government web data, and data from election-related collections. We will provide more gov data than you can shake a script at! If you are interested in attending, please register using this form. The event will take place at our 300 Funston Avenue headquarters from 10am-5pm.

We have been working with the White House on their admirable project to provide public access to eight years of White House social media data for research and creative reuse. Read more on their efforts at this blog post. Copies of this data will be publicly accessible at archive.org. We have also been furiously archiving the federal government web as part of our collaborative End of Term Web Archive and have also collected a voluminous amount of media and web data as part of the 2016 election cycle. Data from these projects — and others — will be made publicly accessible for folks to analyze, study, and do fun, interesting things with.

At Saturday’s hackathon, we will give an overview of the datasets available, have short talks from affiliated projects and services, and point to tools and methods for analyzing the hackathon’s data. We plan for a loose, informal event. Some datasets that will be available for the event and publicly accessible online:

  • Obama Administration White House social media from 2009-current, including Twitter, Tumblr, Vine, Facebook, and (possibly) YouTube
  • Comprehensive web archive data of current White House websites: whitehouse.gov, petitions.whitehouse.gov, letsmove.gov and other .gov websites
  • The End of Term Web Archives, a large-scale collaborative effort to preserve the federal government web (.gov/.mil) at presidential transitions, including web data from 2008, 2012, and our current 2016 project
  • Special sub-collections of government data, such as every PowerPoint in the Internet Archive’s web archive from the .mil web domain
  • Extensive archives of social media data related to the 2016 election including data from candidates, pundits, and media
  • Full text transcripts of Trump candidate speeches
  • Python notebooks, cluster computing tools, and pointers to methods for playing with data at scale.

Much of this data was collected in partnership with other libraries and with the support of external funders. We thank, foremost, the current White House Office of Digital Strategy staff for their advocacy for open access and for working with us and others to make their social media open to the public. We also thank our End of Term Web Archive partners and related community efforts helping preserve the .gov web, as well as the funders that have supported many of the collecting and engineering efforts that make all this data publicly accessible, including the Institute of Museum and Library Services, Altiscale, the Knight Foundation, the Democracy Fund, the Kahle-Austin Foundation, and others.

Posted in Announcements, News | 19 Comments

Would Like to Archive Government Web Services, not just Web Sites– Please help

Archiving .gov and .mil websites is going on now, with lots of help—but what if we could archive full government web services? This would mean keeping interactive sites that include databases and forms, available for future use even if the original website changes or is removed.

We like this idea because we would preserve how websites worked, not just what they looked like. As websites become more database driven and interactive, this would be a bigger help than the already helpful Wayback Machine.

We believe this is possible now given the increased use of virtual machines and cloud services. Webmasters are adjusting to having their systems work in an isolated environment and one that can be snapshot’d.

What we need are some webmasters who would like to try this. We think that government websites would be perfect because they tend to change as administrations change and the datasets are often public data.

If you run a website and would like to participate in this experiment or would like to help on the receiving end, please send a note to info@archive.org or reply to this post.

Archiving web services could usher in a completely new age in archiving of Internet resources.


Posted in Announcements, News | 4 Comments

A Year-end Message from the TV News Archive

by Katie Donnelly

Over the past extremely unpredictable election year, the Internet Archive invented new methods and tools to give journalists, researchers, and the public the power to access, scrutinize, share, and thoroughly fact-check political ads, presidential debates, and TV news broadcasts.

Our efforts were designed to help citizens better understand the patterns of political messages designed to persuade them and find factual, reliable information in what is disturbingly being seen as a “post-truth” world.

The Political TV Ad Archive project proved to be highly useful to our high-profile fact-checking partners, as well as reporters at an array of outlets including The New York Times, The Washington Post, FOX News, The Economist, The Atlantic, and more. By providing data about when, where, and how many times political ads aired on TV in key markets, the project unlocked new creative potential for data reporters to analyze how campaigns and outside groups were targeting messages to voters in different locations.

Breaking events, like political debates and speeches, also offered a chance for archived TV content to shine, allowing reporters to isolate and share clips in near-real time, and fact-checkers to harvest dubious statements for further exploration. In addition, the project’s experience with developing audio fingerprinting (through a new invention we call the Duplitron) for identifying instances of ads inspired a new use: tracking candidate debate sound bites in subsequent TV news shows.

In this way, reporters and researchers were able to analyze and report on which political statements were trending across different TV programs and networks, revealing the ideological, agenda-setting, and other editorial choices made by news producers about which issues to highlight and which to overlook.

As Roger Macdonald, director of the TV News Archive, wrote to project partners: “Citizens will increasingly hunger for sound information to inform wise electoral decisions. With our Republic being riven by increasing socio-political chaos and infectious divisions, whose magnitude has not been seen since before our Civil War, we think there are uncommon opportunities to serve citizens with the information for which they will increasingly yearn. We have an historic opportunity to thoughtfully place some grains of sand on the balance pan of reason.”

The project was supported by a generous grant from the Knight News Challenge, funded in partnership with the Knight Foundation, the Democracy Fund, the Hewlett Foundation and the Rita Allen Foundation, and received additional support from the Rita Allen Foundation, the Democracy Fund, PLCB Foundation, Craig Newmark, Christopher Buck, and others.

Here is a quick look at project accomplishments:

Political TV Ad Archive

  • Total number of archived ad views, most embedded in partner sites: 2,036,063
  • Number of ads collected: 2,991
  • Political ads broadcast 364,822 times over 26 markets
  • Number of fact and source checks: 131
  • Press coverage: 156 articles

Katie Donnelly is associate director at Dot Connectors Studio, a Philadelphia-based strategy firm that has worked with the Political TV Ad Archive.

Posted in News | Comments Off

New Research Tool for Visualizing Two Million Hours of Television News

Guest post by Kalev Leetaru

Today the Internet Archive announces a new interactive timeline visualization–the Television Explorer–that lets you trace how any keyword–think “emails”, “tax returns”, “alt-right”–has been covered on U.S. television news over the past half-decade.

See the Television Explorer, a new tool for exploring TV News.

Over the past year and a half, the GDELT Project and the Internet Archive’s Television News Archive have worked closely together to visualize how U.S. television news has covered the contentious 2016 political campaign.

One of the tools we created was the 2016 Candidate Television Tracker, which used closed captioning to count how many times each of the presidential candidates was mentioned on television and offered a day-by-day timeline showing the ebbs and flows of who was “winning” the free media wars. (Answer: President-elect Donald Trump.) This tool was used by such media outlets as The Atlantic, The Washington Post, FiveThirtyEight, Politico and The Guardian, among many others.

Now we are adapting this tool to allow more sophisticated searches: rather than just the presidential candidates, now you can trace television news coverage of any keyword of your choosing. You can even run advanced searches that find words in conjunction with other words or phrases, such as finding mentions of Hillary Clinton that also discuss her email server. All search results are available for download via CSV and JSON export, making it possible for data journalists, researchers, and advocates to fine-tune their analysis of the data.

When searching, you get back a visual timeline showing how often that word or phrase has appeared on American television news over the past half-decade. Nearly two million hours of television news totaling more than 5.7 billion words from over 150 distinct stations spanning July 2009 to present (though not all stations were monitored for the entire period) are searchable in this interface.

Unlike the Internet Archive’s Television News Archive interface, which returns results at the level of an hour or half-hour “show,” the interface here reaches inside of those six and a half years of programming and breaks the more than one million shows into individual sentences and counts how many of those sentences contain your keyword of interest. Instead of reporting that CNN had 24 hour-long shows yesterday that mentioned Donald Trump one or more times, the interface here will count how many sentences uttered on CNN yesterday mentioned his name–a vastly more accurate metric for assessing media attention.
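
The difference between the two counting methods can be illustrated with a small Python sketch. This is not the Television Explorer's code: the sample captions are invented, the function names are hypothetical, and the sentence splitter is a simple punctuation rule standing in for whatever segmentation the real system uses.

```python
import re

# Hypothetical closed-caption text for three "shows".
SAMPLE_SHOWS = {
    "show_a": "Trump spoke today. Trump then left. Weather is next.",
    "show_b": "Trump tweeted. Sports are next.",
    "show_c": "Markets fell.",
}


def split_sentences(text):
    """Split caption text into sentences at ., ! or ? followed by a space."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def contains(text, keyword):
    """Case-insensitive substring match."""
    return keyword.lower() in text.lower()


def show_level_count(shows, keyword):
    """Count shows that mention the keyword at least once."""
    return sum(1 for text in shows.values() if contains(text, keyword))


def sentence_level_count(shows, keyword):
    """Count individual sentences that mention the keyword."""
    return sum(
        1
        for text in shows.values()
        for sentence in split_sentences(text)
        if contains(sentence, keyword)
    )
```

On the sample data, the show-level count for "Trump" is 2 while the sentence-level count is 3. The gap between the two numbers grows with how heavily a show dwells on the keyword, and the sentence-level metric is designed to capture that.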

Explore how CNN covered the presidential campaign of 2012 versus 2016 and understand just how big of a media event this year’s election really was. See precisely when Edward Snowden burst onto the scene and how Wikileaks got more coverage during the 2016 presidential election than at its debut in 2010. Watch the seasonal spikes of Thanksgiving, or see how Ebola received little attention, even as thousands died in Africa, becoming a topic only after the first Americans became infected.

Using the “near” search feature, plot coverage of Wikileaks that also mentioned either “Podesta,” “email,” or “emails” nearby and discover that FOX paid far more attention to the DNC and Podesta email hacks than CNN, MSNBC, CNBC or Bloomberg. In contrast, CNN focused more intensely on the Trayvon Martin shooting (Aljazeera America and Bloomberg were not yet being monitored by the Archive), while Aljazeera led coverage of the Michael Brown and Eric Garner deaths.
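
A “near” search of this kind can be approximated with a token window, as in the Python sketch below. The post does not say how the real feature defines “nearby”, so the ten-token window, the tokenizer, and the function names here are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def mentions_near(text, term, nearby_terms, window=10):
    """Return True if term occurs within `window` tokens of any nearby term.

    Assumption: distance is measured in word tokens on either side.
    """
    tokens = tokenize(text)
    target = term.lower()
    nearby = {word.lower() for word in nearby_terms}
    for index, token in enumerate(tokens):
        if token != target:
            continue
        start = max(0, index - window)
        neighbourhood = tokens[start:index] + tokens[index + 1:index + 1 + window]
        if nearby.intersection(neighbourhood):
            return True
    return False
```

For instance, mentions_near("Wikileaks released more Podesta emails today", "Wikileaks", ["Podesta", "email", "emails"]) matches, while a sentence that mentions Wikileaks alone does not.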

Search of term “Wikileaks” near Podesta, emails, Clinton

Search for “ivory” to see that Aljazeera America (which ceased operation in April 2016) devoted vastly more of its coverage to elephant poaching in Africa than any other monitored national network. It also paid the most attention to “Africa” and to the “refugee” crisis. On the other hand, Bloomberg has devoted much more of its time to “China” and to the economic crisis in “Greece” last year.

We look forward to seeing what people do with this new tool. Please share your favorite searches on Twitter with the hashtag “#internetarchivetvsearch”. If you have any questions, please email kalev.leetaru5@gmail.com or nancyw@archive.org.

Kalev Leetaru is an independent data journalist. 

Posted in Announcements, News | 3 Comments

Robots.txt Files and Archiving .gov and .mil Websites


The Internet Archive is collecting webpages from over 6,000 government domains, over 200,000 hosts, and feeds from around 10,000 official federal social media accounts. Some have asked if we ignore URL exclusions expressed in robots.txt files.

The answer is a bit complicated.  Historically, sometimes yes and sometimes no; but going forward the answer is “even less so.”

Robots.txt files live on the top level of a website at a URL like this: https://example.com/robots.txt. This standard was developed in 1994 to guide search engine crawlers in a variety of ways, including some areas to avoid crawling.  This standard is used by Google, for instance.
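
To show what such a file looks like to a crawler, here is a short Python sketch using the standard library’s urllib.robotparser module. The sample robots.txt contents, the crawler name, and the helper function names are invented for this example and do not describe any particular site or the Internet Archive’s own crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents asking all crawlers to skip /private/.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, user_agent, url):
    """Return True if the rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)
```

With the sample rules, a crawler that honors robots.txt would fetch https://example.com/index.html but skip anything under https://example.com/private/. A crawler that ignores the file would simply not perform this check.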

These files were useful 20 years ago for the Internet Archive’s crawlers, but have become less and less so over the years because many sites have not actively maintained the files from the point of view of archiving. Also, large websites or hosted websites often do not make it easy for their users to edit these files, and large websites increasingly guide or block crawlers with technological measures. Another problem is knowing when a domain name has changed hands, in which case the current robots.txt file is not relevant to an earlier era of the site. Over time, we have come to encourage webmasters who want to exclude their sites to send exclusion requests to info@archive.org and to specify the time period the request applies to.

Our end-of-term crawls of .gov and .mil websites in 2008, 2012, and 2016 have ignored exclusion directives in robots.txt in order to get more complete snapshots. Other crawls done by the Internet Archive and other entities have had different policies.  We have had little or no negative feedback on this, and little or no positive feedback — in fact little feedback at all. The Wayback Machine has also been replaying the captured .gov and .mil webpages for some time in the beta wayback, regardless of robots.txt.   

Overall, we hope to capture government and military websites well, and hope to keep this valuable information available to users in the future.

Posted in News, Wayback Machine - Web Archive | 3 Comments

Preserving U.S. Government Websites and Data as the Obama Term Ends

Long before the 2016 presidential election cycle, librarians understood this often-overlooked fact: vast amounts of government data and digital information are at risk of vanishing when a presidential term ends and administrations change.  For example, 83% of .gov PDFs disappeared between 2008 and 2012.

That is why the Internet Archive, along with partners from the Library of Congress, University of North Texas, George Washington University, Stanford University, California Digital Library, and other public and private libraries, is hard at work on the End of Term Web Archive, a wide-ranging effort to preserve the entirety of the federal government web presence, especially the .gov and .mil domains, along with federal websites on other domains and official government social media accounts.

While not the only project the Internet Archive is doing to preserve government websites, ftp sites, and databases at this time, the End of Term Web Archive is a far reaching one.

The Internet Archive is collecting webpages from over 6,000 government domains, over 200,000 hosts, and feeds from around 10,000 official federal social media accounts. The effort is likely to preserve hundreds of millions of individual government webpages and data and could end up totaling well over 100 terabytes of archived materials. Over its full history of web archiving, the Internet Archive has preserved over 3.5 billion URLs from the .gov domain including over 45 million PDFs.

This end-of-term collection builds on similar initiatives in 2008 and 2012 by original partners Internet Archive, Library of Congress, University of North Texas, and California Digital Library to document the “gov web,” which has no mandated, domain-wide single custodian. For instance, here is the National Institute of Literacy (NIFL) website in 2008. The domain went offline in 2011. Similarly, the Sustainable Development Indicators (SDI) site was later taken down. Other websites, such as invasivespecies.gov were later folded into larger agency domains. Every web page archived is accessible through the Wayback Machine and past and current End of Term specific collections are full-text searchable through the main End of Term portal. We have also worked with additional partners to provide access to the full data for use in data-mining research and projects.

The project has received considerable press attention this year, with related stories in The New York Times, Politico, The Washington Post, Library Journal, Motherboard, and others.

“No single government entity is responsible for archiving the entire federal government’s web presence,” explained Jefferson Bailey, the Internet Archive’s Director of Web Archiving.  “Web data is already highly ephemeral and websites without a mandated custodian are even more imperiled. These sites include significant amounts of publicly-funded federal research, data, projects, and reporting that may only exist or be published on the web. This is tremendously important historical information. It also creates an amazing opportunity for libraries and archives to join forces and resources and collaborate to archive and provide permanent access to this material.”

This year has also seen a significant increase in citizen- and librarian-driven “hackathons” and “nomination-a-thons” where subject experts and concerned information professionals crowdsource lists of high-value or endangered websites for the End of Term archiving partners to crawl. Librarian groups in New York City are holding nomination events to make sure important sites are preserved. And universities such as The University of Toronto are holding events for “guerrilla archiving” focused specifically on preserving climate-related data.

We need your help too! You can use the End of Term Nomination Tool to nominate any .gov or government website or social media site and it will be archived by the project team.   If you have other ideas, please comment here or send ideas to info@archive.org.   And you can also help by donating to the Internet Archive to help our continued mission to provide “Universal Access to All Knowledge.”

Posted in Announcements, News | 14 Comments

Internet Archive Canada and National Security Letter in the news: roundup

The Internet Archive garnered major media attention over the past week, first, on our plan to create a Canadian copy, and second, on the news we received a National Security Letter (NSL) requesting personal information about a user, the second in our history.

Canadian copy

Brewster Kahle’s post explaining why, in light of the new administration, the Internet Archive is raising money to build a copy of its collections in Canada hit a nerve.  More details were in a FAQ.

On November 29, Rachel Maddow led her MSNBC show with a segment about how the Internet Archive’s Wayback Machine helps reporters by preserving a record of what politicians say online, even when they later delete it.

One of her main examples: how soon after winning the election, President-elect Donald Trump’s official federal transition web page included a “rundown ….of all of the ‘world’s top properties that Donald Trump’s owns.”

The website has since been deleted, Maddow noted.

Maddow also called the Internet Archive a “national treasure…an international treasure.” (We’re blushing.)

Meanwhile, Paul Sawers noted in Venture Beat:

 Given that lies and fake news played a crucial part in the 2016 U.S. presidential election narrative, it is somewhat notable that the Internet Archive had launched the Political TV Ad Archive back in January to help journalists fact-check claims made during political campaigning.

In The Washington Times, Andrew Blake wrote about the Internet Archive’s plans to create a Canadian copy and also reported:

Mr. Trump’s office did not immediately respond to a request for comment Wednesday. Prior to being elected president, however, the Republican businessman suggested taking action to prevent Americans from becoming radicalized online by the Islamic State terror group’s social media recruitment efforts.

Here’s a link to Trump’s speech referenced by The Washington Times.

Sam Thielman reported in The Guardian on challenges facing libraries generally, including the Internet Archive’s decision to create a Canadian copy of data. The piece also discusses how the New York Public Library has changed its privacy policies to assure readers that it will not keep user data longer than expected.

Other media outlets reporting on the Internet Archive’s news include NBC News, the BBC, the New Republic, Recode Daily, and Newsweek.

Increasing transparency on National Security Letters

Last week the Internet Archive also revealed we received a National Security Letter (NSL), requesting we turn over personal information about a particular user, the second in our history. We worked with the Electronic Frontier Foundation (EFF) to challenge the letter and gain the right to release it in redacted form; in the process, we also highlighted an error in the NSL about the right to appeal, which may have affected thousands of other letters.

Kim Zetter, a reporter for The Intercept, reported at length about how the Internet Archive took the unusual step of challenging the NSL–and won:

Now, Kahle and the archive are notching another victory, one that underlines the progress their original fight helped set in motion. The archive, a nonprofit online library, has disclosed that it received another NSL in August, its first since the one it received and fought in 2007. Once again it pushed back, but this time events unfolded differently: The archive was able to challenge the NSL and gag order directly in a letter to the FBI, rather than through a secretive lawsuit. In November, the bureau again backed down and, without a protracted battle, has now allowed the archive to publish the NSL in redacted form.

Dhrumil Mehta of FiveThirtyEight.com reported on the error exposed by the Internet Archive and the EFF–namely, the NSL incorrectly described the means for possible appeals of the gag order preventing an organization that has received such a letter from publicizing it. Mehta has filed a Freedom of Information Act request (FOIA) to find out how many letters sent out by the Federal Bureau of Investigation (FBI) contain this error:

This letter was particularly troublesome to privacy advocates because it contained misinformation about the rights of a letter recipient to challenge the nondisclosure requirement. The letter stated that the Internet Archive could “make an annual challenge to the nondisclosure requirement.” The Electronic Frontier Foundation, an advocacy organization that is legally representing the Internet Archive, pointed out in a press release that the passage of the USA Freedom Act in June of 2015 changed the law to allow letter recipients to challenge the National Security Letter at any time, not just once annually. In response to the EFF’s claim, the FBI withdrew its National Security Letter, allowed the Internet Archive to publish a redacted version of the letter containing the error and promised to correct the mistake by informing everyone else who got the same erroneous language.

It’s not just us

Tim Johnson of McClatchyDC drew all the themes together, linking the Internet Archive’s Canada announcement, the news on the NSL, and actions other library organizations are taking, all in one piece.

It turns out the nonprofit Internet Archive isn’t alone in taking action.

The New York Public Library announced a change this week to its privacy policy, informing users that it would retain less information about their activities.

The American Library Association, headquartered in Chicago, embraced that move and encourages others, including telling public libraries to encrypt all communications and lock up stored data to protect it from a prying government.

 

Posted in Announcements, News | 17 Comments

FAQs about the Internet Archive Canada

Responses from Brewster Kahle, Founder & Digital Librarian of the Internet Archive

Our letter mentioning that we are raising money to make a copy of the Internet Archive’s digital collections in Canada generated a lot of interest, and the press and others have asked a bunch of good questions. Here is a compendium of our answers:

Q. Were you working on a back-up before the election of Trump?
Yes, we have a partial copy of the Internet Archive in Alexandria, Egypt, and in Amsterdam, the Netherlands.

And also before the election we had been planning with the University of Toronto and University of Alberta to host the materials digitized from Canadian libraries at the Internet Archive Canada, which is a completely separate nonprofit from ours.

The statements by Trump on the campaign trail (see below) have kicked us into higher gear, moving us further and faster than we otherwise would have gone. The election led us to think bigger.

Q. Was there anything specific about Trump’s win that made you want to step up your game in terms of a backup archive? What in particular concerns you about what he has said/done? What potential risks do you see?
Upon his election, we looked through our archive to find out where he might stand on Internet policy, and we found statements he had made.

At this point, I think it would be prudent to take President-elect Trump at his word. Here are some of his statements, preserved in our Television News Archive. https://archive.org/tv

CNN Republican Presidential Debate
CNN December 15, 2015
Wolf Blitzer: Mr. Trump, are you open to closing parts of the internet?
Donald Trump: I would certainly be open to closing areas where we are at war with somebody. I sure as hell don’t want to let people that want to kill us and kill our nation use our internet. Yes, sir, I am.

https://archive.org/details/CSPAN_20151208_063000_Key_Capitol_Hill_Hearings
Donald Trump at a campaign rally at the USS Yorktown in South Carolina, broadcast on CSPAN on December 8, 2015
Donald Trump: So the press has to be responsible. They’re not being responsible, because we are losing a lot of people because of the internet. We have to do something. We have to go see Bill Gates and a lot of different people that really understand what is happening. We have to talk to them, maybe in certain areas, closing that internet up in some way. Some of you will say, “Oh, freedom of speech, freedom of speech.” These are foolish people. We have a lot of foolish people. We have a lot of foolish people. We have got to maybe do something with the internet because they are recruiting by the thousands.

Donald Trump on freedom of the press:
https://archive.org/details/R_macdonald-trumpOnPressV6

Q. How does this work? What goes into creating a backup of this magnitude (in whatever brief lay terms you can condense it to)?
We can reach our overall goal in stages. The first stage, done with the University of Toronto and the University of Alberta, would be to make a copy of what has been digitized from these Canadian collections (books and microfilm) and move it onto their university servers.

The next stage is to create a partial mirror at the Internet Archive Canada, which we have been planning to do.

Then the next stage is to create a “backup copy” in Canada for researchers. The best case scenario would be to have an active organization running a live copy of as much of the Internet Archive’s collections as makes sense. This is what we would like to do.

Q. Is there a specific dollar amount that you are aiming for?
To build a running archive in Canada will cost approximately $5 million, which is our goal. But we can take steps in this direction with less. After that, the archive will need ongoing support.

Q. How will you raise the money?
Great question. We are asking for donations from our users and supporters. Donations to the Internet Archive are tax-deductible in the US and can be made at https://archive.org/donate/

Q. What is the Internet Archive Canada? Can I make a donation to it?
The Internet Archive Canada is a Not-For-Profit Corporation, registered under number 435509-1. It has been running for years and employs 11 book scanners in Toronto and Alberta. It is not a registered public charity, and donations are tax-deductible on donors’ US income only. To donate, please send cheques to:

Internet Archive Canada
130 St. George St.
Suite 7001
Toronto, ON M5V 3T5
CANADA

Q. What does it mean when you say you archive the “Internet”? Is this national? Or is it a global endeavor?
The Internet Archive archives many things (books, music, video, web pages, and television) and makes these materials available for free on the archive.org, openlibrary.org, and archive-it.org sites. Take, for instance, the scope of our Web archiving in the Wayback Machine: https://archive.org/web. It houses a massive archive of over 250 billion web pages, made up of many collections. The Wayback Machine is freely accessible to anyone, and it is used by hundreds of thousands of people every day. It is a global project to archive these pages.

Q. What else does the Internet Archive preserve, beyond the Wayback Machine?
The Internet Archive is a non-profit digital library founded by Brewster Kahle in 1996 with the mission to provide “Universal access to all Knowledge.” The organization seeks to preserve the world’s cultural heritage and to provide open access to our shared knowledge in the digital era, supporting the work of historians, scholars, journalists, students, the blind and reading disabled, as well as the general public. The Internet Archive’s digital collections include more than 26 petabytes of data: 279 billion web pages, moving images (2.2 million films and videos), audio (2.5 million recordings, 140,000 live concerts), texts (8 million texts including 3 million digital books), software (100,000 items) and television (3 million hours). Each day, 2-3 million visitors use or contribute to the Internet Archive, making it one of the world’s top 250 sites. It has created new models for digital conservation by forging alliances with more than 450 libraries, universities and national archives around the world.

Posted in News | 30 Comments