TV news highlights with fact checks

By Nancy Watzman and Katie Dahl

Last week, our national fact-checking partners concentrated on two events featuring President Donald Trump: a press conference on February 16 and his rally in Melbourne, Florida, on February 18. The Conservative Political Action Conference is taking place this week, so look out for fact checks of President Trump’s speech there soon. Here are some highlights, along with TV news segments from the Trump Archive and TV News Archive.

Steve Bannon and Reince Priebus addressed the conference yesterday. Bannon again called the press the “opposition party.”

Claim: Obama released the Gitmo detainee who recently became a suicide bomber (it wasn’t him)

Deputy assistant to the president, Sebastian Gorka, on Fox & Friends: “So President Obama released lots and lots of people that were there for a very good reason, and what happened? Almost half the time, they returned to the battlefield. This individual… goes and executes a suicide attack in Iraq.” At FactCheck.org, Robert Farley wrote, “Gorka wrongly suggested the man was released by President Barack Obama. He was transferred… President George W. Bush… then wrongly claimed that among detainees released by Obama, ‘almost half the time, they returned to the battlefield.’ According to the Office of the Director of National Intelligence, about 12.4 percent of those transferred from Gitmo under Obama are either confirmed or suspected of reengaging.”

Claim: there are 13, 14, 15 million undocumented people in the country (too high)

At a press briefing this week, White House Press Secretary Sean Spicer said “12, 14, 15 million people [are] in the country illegally,” but Michelle Ye Hee Lee gave him Three Pinocchios for The Washington Post’s Fact Checker: “Spicer’s statement that there are about 12 million people in the country illegally is safely within the margin of error in credible demographics research. But once he enters the realm of ‘13, 14, 15 million’ or ‘potentially more,’ his claim becomes problematic.”

Claim: Thomas Jefferson said “nothing can be believed which is seen in a newspaper.” (out of context)

At his rally in Florida, Trump said President Thomas Jefferson had said that “nothing can be believed which is seen in a newspaper. Truth itself….becomes suspicious by being put into that polluted vehicle.”

However, “Trump selectively quotes from Jefferson here, who, for most of his life, was a fierce defender of the need for a free press,” Kessler wrote for The Washington Post’s Fact Checker. PolitiFact staff made a similar point, using this quote as evidence: “And were it left to me to decide whether we should have a government without newspapers, or newspapers without a government, I should not hesitate a moment to prefer the latter.”

Claim: something happened in Sweden (not exactly)

By far the most attention from the president’s rally went to his comments about Sweden: “We’ve got to keep our country safe … You look at what’s happening last night in Sweden… Sweden? Who would believe this? Sweden. They took in large numbers. They’re having problems like they never thought possible.”

“This was a very strange comment. Nothing had happened the night before in Sweden,” wrote Kessler for The Washington Post’s Fact Checker. A White House spokesperson later said that Trump “was talking about rising crime and recent incidents in general and not referring to a specific incident.”

PolitiFact reporter Miriam Valverde reported on the Fox News interview on Swedish crime rates, which aired the night before the rally and purportedly inspired Trump’s comments. Valverde quoted several Swedish experts countering the argument that crime rates are rising in Sweden, including political scientist Henrik Selin, who said that “[i]n general, crime statistics have gone down the last (few) years, and no there is no evidence to suggest that new waves of immigration has lead to increased crime.”

Robert Farley reported for FactCheck.org, “Swedish authorities and criminologists say President Donald Trump is exaggerating crime in Sweden as a result of its liberal policy of accepting refugees from Syria and other Middle Eastern countries.”

Claim: The stock market has hit record numbers (mostly true)

The President mentioned the economy at a press conference, saying “The stock market has hit record numbers, as you know. And there has been a tremendous surge of optimism in the business world.” At PolitiFact, Miriam Valverde rated this as “Mostly True,” reporting “All three major stock indexes closed at record highs for five days in row on Feb. 15.”

Claim: the media is less trustworthy than Congress (mostly false, but…)

Also at the press conference, President Trump excoriated the media, saying journalists “will not tell you the truth and treat the wonderful people of our country with the respect that they deserve,” that the “press is out of control,” and that the media has a “lower approval rate than Congress, I think that’s right, I don’t know.”

PolitiFact reporter Jon Greenberg rated the trust claim as “mostly false”: “Congress actually ranks below the news media, according to surveys from three different research groups spanning several years. In two polls, mistrust in the media broke 40 percent, which is hardly anything to brag about. But in those studies, mistrust in Congress was over 50 percent.”

Glenn Kessler and Michelle Ye Hee Lee at The Washington Post’s Fact Checker agreed that Congress ranks lower than the media, but noted that this isn’t saying much: “[B]esides Congress, only ‘big business’ ranks lower than the media — but it’s enough to make Trump’s claim incorrect.”

FactCheck.org chimed in, noting that the “public’s approval of Congress is lower than its trust in the media,” but pointed out there’s more public trust in Trump than in the media: “Trump would have been correct to say that trust in the media is even lower than approval of himself. According to Gallup, Trump’s approval rating stood at 41 percent, as of the week ending Feb. 12, while the public’s trust in the media was down to 32 percent.”  

Claim: Trump had the biggest electoral college win since Ronald Reagan (false)

President Trump claimed his victory marked “the biggest electoral college win since Ronald Reagan.” NBC reporter Peter Alexander challenged him on the spot, saying, “Why should Americans trust you when you have accused information they have received as being fake when you have been providing information that is fake?” Trump didn’t answer the question, but rather pivoted by asking whether the reporter agreed that his victory was substantial.  

According to our fact-checking partners, there have been three presidents since Reagan who received more electoral college votes than Trump. FactCheck.org noted “Trump’s Electoral College victory margin ranks 46th out of 58 presidential elections.” Kessler and Lee wrote: “Of the nine presidential elections since 1984, Trump’s electoral college win ranks seventh.”

Claim: Hillary Clinton gave away 20 percent of the uranium in the United States (false)

President Trump asserted a claim that The Washington Post’s Fact Checker has given Four Pinocchios: that Hillary Clinton “gave away 20 percent of the uranium in the United States.” He went on to say, “you know what uranium is, right? This thing called nuclear weapons and other things like lots of things are done with uranium, including some bad things,” insinuating that the uranium could be used in a Russian nuclear weapon. FactCheck.org wrote: “The deal Clinton had a role in approving gave Russia ownership of 20 percent of U.S. production capacity — not existing stocks of uranium. Furthermore, Clinton alone could not have stopped the deal; only the president could have done that with a finding that national security would be endangered. Lastly, none of the uranium goes to Russia. That would require export licenses.”

Posted in News, Television Archive, tv archive

The Internet Archive Pushes Back on “Notice and Staydown” in Recent Comments to the Copyright Office

The US Copyright Office sought comments in its ongoing study of the safe harbor provisions in Section 512 of the Digital Millennium Copyright Act (DMCA). The Office wants to find out how well the notice and takedown system is working for everyone: Internet platforms and users, as well as creators and copyright holders. We think the 1998 statute struck the right balance and is generally working well, a view shared by nearly all Internet platforms and users. However, some incumbent rightsholders and their advocacy organizations disagree and think the system needs to be completely redone because it is too hard to police copyright infringement online. These complaints fail to account for the exceedingly high statutory damages rightsholders can claim and for other mechanisms in copyright law that favor certain categories of rightsholders over new media creators and consumers.

One dangerous idea that rightsholders continue to push for is a “notice and staydown” system. This sounds like a minor edit to notice and takedown, but in reality it would amount to mandatory filtering of the Internet for the purpose of policing copyright. Last summer we noted many of the general reasons why this idea is both dangerous and impractical. In our most recent comments, we focus more specifically on the direct threat such a system would pose to the Internet Archive and our various projects such as the Wayback Machine and the TV News Archive:

For one thing, the Internet Archive preserves the state of any given web page as it existed on a particular date via the Wayback Machine. Being forced to automatically remove material from the Wayback Machine would irreparably harm the historical record. This would be harmful for journalists who use the Wayback Machine to report on important stories of which there would be no evidence without the Archive. It would be harmful for attorneys and litigants who regularly use the Wayback Machine as evidence in legal proceedings. The very knowledge that a filter was running on the Wayback Machine would undermine its credibility as an accurate snapshot of the Internet at a given point in time. Therefore, filtering is a direct threat to our mission.

The Internet Archive also hosts the Political TV Ad Archive and the TV News Archive. As with the Wayback Machine, the very point of these archives is to preserve the historical record and ensure that politicians can be held accountable for their statements in ads or in TV appearances. A mandatory filter run on the TV News Archive might catch a famous song used in a political ad or at a campaign rally and determine that such material must be removed. However, this would distort the historical record. A filter would put the Internet Archive in the untenable position of having to choose between protecting the historical record for future generations and protecting its own legal interests.

A notice and staydown system would do far more harm than good, making Swiss cheese of the historical record and censoring legitimate speech with overly aggressive algorithms. We will continue to monitor and push back on this proposal.

Read our full comments here.

Posted in News

Internet Archive files amicus brief in support of fair use and innovation in libraries

Today marks the beginning of Fair Use Week, which celebrates the importance of fair use for libraries, students, teachers, journalists, creators, and the public. Last week, the Internet Archive joined the American Library Association, the Association of Research Libraries, and the Association of College and Research Libraries on a friend of the court brief in the Capitol Records v. ReDigi case. This case raises the important question of whether it is legal to resell lawful copies of digital music files: that is, whether the first sale right exists in digital form, and how that right interacts with fair use. The first sale right, codified at Section 109(a) of the Copyright Act, is the same law that allows libraries to lend books and other copyrighted works to the public. As library collections become increasingly digital, libraries are relying on fair use and first sale rights in order to perform their everyday duties, including preservation and lending.

The brief argues first that the court’s fair use analysis should favor secondary uses that have the same underlying purpose as the first sale right.
“In Authors Guild v. HathiTrust… [the Second Circuit Court] used the rationale for a specific exception—17 U.S.C. § 121, which permits the making of accessible format copies for the print disabled—to support a finding of a valid purpose under the first factor. Likewise, the Copyright Office has repeatedly based fair use conclusions on specific exceptions in the context of a rulemaking under section 1201 of the Digital Millennium Copyright Act, 17 U.S.C. § 1201. As this Court did in HathiTrust or the Copyright Office did in the section 1201 rulemaking, the district court should have recognized that the purpose behind the first sale doctrine tilted the first fair use factor in favor of ReDigi.”

Second, the brief argues that a positive fair use determination in the ReDigi case would enable libraries to provide new and innovative digital services to their users. The brief states:
“Fair use findings in technology cases have encouraged libraries to provide new, digitally-based services such as the HathiTrust Digital Library. In addition to enabling researchers to find relevant texts and perform critical data-mining, HathiTrust provides full-text access to over fourteen million volumes to people who have print disabilities. A fair use finding in this case would provide libraries with additional legal certainty to roll out innovative services such as the Internet Archive’s Open Library. Such a result would increase users’ access to important content without diminishing authors’ incentive to create new works.”

You can read the full text of the brief here.

Posted in Announcements, News

Internet Archive Reaches Semifinals in MacArthur Foundation’s Competition for $100 Million Grant

by Wendy Hanamura

The Internet Archive headquarters: a temple to universal access to knowledge.

At the Internet Archive, we believe that libraries can be instruments of change.

So we are proud to announce that the Internet Archive is one of eight groups named semi-finalists today in 100&Change, a global competition for a single $100 million grant from the John D. and Catherine T. MacArthur Foundation. The competition seeks bold solutions to critical problems of our time. Here’s how we propose creating transformative, lasting change:

Our vision empowers libraries to unlock their rich analog collections for a new generation of learners, enabling free, long-term, public access to knowledge.

In today’s digital world, a new generation explores knowledge largely through their computers and phones. So as digital librarians, we worry when millions of books, representing a century of knowledge, are still not accessible online to scholars, journalists, students, and the public. Libraries have been stymied by huge costs, restrictions on eBooks, and missing technology. The legal path forward has not been clear. All of this means libraries haven’t been able to meet the digital demands of a new generation. And access to libraries is still not universal or equitable.

Our plan provides libraries and learners with free digital access to four million books. With our partners, we will curate, digitize, and enable lending of these digital volumes to any library in the country that owns the physical book. We plan to start with the books most widely held and used in libraries and classrooms. The scale of the project will help reduce digitization costs by 50 percent or more. How do we know this can work? We’ve been prototyping this model for six years at Open Library, digitizing 540,000 modern books originating from 100 partners. Through Open Library, we lend books to the public in a manner that respects the rights of authors and publishers, in a process that mirrors the traditional way libraries circulate physical books.

What makes this a game-changer? Today, the Internet Archive already offers public access to 2.5 million books in the public domain, and 540,000 modern works. We need to be bigger and bolder. At the Internet Archive, we only lend one copy at a time, so in order to serve more learners, we seek thousands of libraries to join us. That can happen if we build the technical infrastructure that allows libraries everywhere to leverage those digital books. Plus, this is an issue of dollars and cents. Libraries should never pay to digitize a book more than once. Right now libraries pay an average of $17.50 for each interlibrary loan of a physical book. As books become electronic, those funds can be directed to more urgent needs. And above all, this grant will help all libraries become digital libraries, releasing the tremendous value in the collections they have curated over centuries.

With so many brilliant, effective thinkers applying to 100&Change, it always felt as if our chances were one in a hundred, and indeed they were! There was robust participation: 7,069 competition registrants submitted 1,904 proposals. Of those, 801 passed an initial administrative review and were evaluated by a panel of expert judges who each provided ratings on four criteria: meaningfulness, verifiability, durability, and feasibility. MacArthur’s Board of Directors made the final selection. To be one of eight semifinalists from more than 800 qualified applicants is a tremendous honor.

Eileen Alfaro, San Francisco fifth-grader. One day she could be carrying 4 million eBooks under her arm.

And as we work hard to hone our plans in the months ahead, here’s what propels us forward: Eileen Alfaro, the Internet Archive’s brightest rising star. Every day after school, this San Francisco fifth-grader does her homework at the Internet Archive, while her mother Roxana works. A straight-A student, Eileen loves nothing more than reading. We can put four million of the best books into her hands. Forever. For free.

Our proposal? Making libraries instruments of change for a new generation of learners like Eileen.

A summary of the Internet Archive’s solution, an overview video of the project, and a MacArthur video describing our proposal are available at www.macfound.org/InternetArchive.

Posted in Announcements, News

Internet Archive Offers to Host PACER Data

The Internet Archive has long supported the efforts of the Free Law Movement to make the laws and edicts of the government of the United States more broadly available. With our colleague Aaron Swartz and the efforts of numerous groups across the country, including the Free Law Foundation and Princeton’s Center for Information Technology Policy, we host the RECAP repository of documents from the federal district courts. Many of these public domain documents were downloaded by users of the government’s PACER system for $0.10 per page and uploaded to the Internet Archive. The RECAP repository is available for free and in bulk, which is useful for researchers.

On Tuesday, February 14, the U.S. Congress will hold the first hearings in over a decade examining the operation of the PACER system. The hearing will be before the Subcommittee on Courts, Intellectual Property and the Internet of the Judiciary Committee in the House of Representatives. The Internet Archive was pleased to accept the committee’s invitation to submit a statement for the record and we have submitted the following, which includes an offer to host the PACER data now and forever to make the works of our federal courts more readily available to inform the citizenry and to further the effective and fair administration of justice.

Our courts must function in the light of day, and in this day and age that means on the Internet. The Internet Archive is happy to try to help.

February 10, 2017

The Honorable Darrell Issa, Chairman
The Honorable Jerry Nadler, Ranking Member
Subcommittee on Courts, Intellectual Property and the Internet
Committee on the Judiciary
House of Representatives
Washington, DC 20515

Dear Chairman Issa and Ranking Member Nadler,

Thank you for the opportunity to submit comments on the Judiciary Committee’s hearing entitled “Judicial Transparency and Ethics.” I write on behalf of the Internet Archive, a non-profit digital library that is based in San Francisco with facilities throughout the world.

For more than 20 years, the Internet Archive has been archiving digital collections and making them available at no cost and with no restriction on the Internet. The Internet Archive works with the Library of Congress, the National Archives, and numerous national libraries around the world to collect, store, and provide permanent access to millions of books, videos, audio and hundreds of millions of pages of U.S. government documents, including over 14,000 hours of video of Congressional hearings.

By this submission, the Internet Archive would like to clearly state to the Judiciary Committee, as well as to the Administrative Office of the U.S. Courts and the Judicial Conference of the United States, that we would be delighted to archive and host—for free, forever, and without restriction on access to the public—all records contained in PACER.

People download more than 20 million books from the Internet Archive each month. We preserve 1 billion web pages each week for public access through the “Wayback Machine.” Indeed, the Wayback Machine is the only publicly accessible archive of all the websites of Congress. At any given moment, we are delivering about 30 gigabits of data per second. We host more than 20 petabytes of data in total.

By comparison, the PACER corpus is a fraction of a petabyte and does not use a significant amount of bandwidth. We have the capacity to host this information, and I know there are many other organizations on the Internet who would be able to make dramatic increases in the usability and utility of our Federal Judiciary’s database if it were made available in a more modern fashion and without artificial restrictions on use.

The stated purpose of PACER is to make public court records “freely available to the greatest extent possible.”[1] Sixteen years ago, the United States Courts predicted that PACER would allow the public to “surf to the courthouse door on the Internet.”[2] Today, anyone visiting a federal courthouse can view the public record for free. PACER, on the other hand, charges users per-page fees that are prohibitive for many members of the public. The Judiciary could resolve this unfortunate discrepancy immediately, at no cost. This is our offer.

The Internet Archive has deep experience with collections of this kind. In fact, we already host the records from over a million federal court cases that have been donated by the public as part of the RECAP Project. However, a million cases is a small portion of the hundreds of millions of cases that PACER contains, and we are frustrated that it is so difficult to obtain and serve the workings of our federal courts to the public. This is a fairly trivial technical task, and we would welcome the opportunity to make much more data available.

I must also note that the Internet Archive is not alone in being well-equipped to offer this service. There are other large digital repositories that similarly serve the public for free. I cannot speak for them, but I believe that once the corpus is available for no fee and without restriction, they too will replicate it and offer similar service. Indeed, others may build useful tools for reading, searching, and studying the corpus of public court records that makes up our federal case law.

In order to realize the vision of universal free access to public court records, the Federal Judiciary would essentially have to do nothing. We are experts at “crawling” online databases in an efficient and careful fashion that does not burden those systems. We are already able to comprehensively crawl PACER from a technical perspective, but the resulting fees would be astronomical. The Federal Judiciary has a Memorandum of Understanding with both the Executive Office for US Trustees and the Government Printing Office that gives each entity no-fee access for the public benefit. The collection we would provide to the public would be far more comprehensive than the GPO’s current court opinion program, although I must laud that program for providing a digitally authenticated collection of many opinions.

By making federal judicial dockets available in this manner, the Federal Judiciary would enable free and unlimited public access to all records that exist in PACER, finally living up to the name of the program. In today’s world, public access means access on the Internet. Public access also means that people can work with big data without having to pass a cash register for each document.

This PACER collection we would maintain and improve would have far more detailed metadata and contextual information than the GPO service or the PACER Case Locator service. And, that’s just for starters, because we know that there are thousands of eager researchers, journalists, and government workers (including Congressional staff) who would immediately jump in and work with us.

If the Judiciary provides the Internet Archive with no-cost access to PACER and accepts our commitment to make this information available for use without restriction in perpetuity, we believe we can work with our government to make the workings of our courts more usable to government attorneys, to members of the bar, and to the public at large.

Sincerely yours,

Brewster Kahle
Digital Librarian and Founder, Internet Archive

Notes:

  1. S. Rep. 107–174, 107th Cong., 2d Sess., at 23 (2002), https://www.govinfo.gov/content/pkg/CRPT-107srpt174/pdf/CRPT-107srpt174.pdf.
  2. Electronic Public Access at 10, THE THIRD BRANCH: NEWSLETTER OF THE FEDERAL COURTS, Sep. 2000, at 3, https://archive.org/details/thirdbranch32332200001fede/.

Posted in News

Apple Pie Potluck and Constitutional Law Teach-In — Friday Feb 17th 5:30-9PM


Initial information — more details to come:
In honor of the General Strike:

Constitutional Law Teach-in at the Internet Archive with EFF and Others

EFF and other lawyers will lead a conversation about current issues and threats in constitutional law. Focusing on specific sections and amendments, we will talk about current cases on censorship, surveillance, search and seizure, and more.

Workshops on using encryption tools, and possibly musical performances, will accompany the program.
If you want to present, perform, or have other ideas, please email us.

When: Friday, February 17th 5:30pm-9pm (program 6-8)
Where: Internet Archive
300 Funston Ave. SF, CA 94118
Potluck-style: Please bring apple pie or other food
Reserve your free ticket here
Streamed via Facebook Live
Donations welcome

Lawyers Attending:

  • Cindy Cohn – Executive Director of EFF
  • Corynne McSherry – Legal Director of EFF
  • Victoria Baranetsky – First Look Media Technology Legal Fellow for the Reporters Committee for Freedom of the Press
  • Geoff King – Lecturer at UC Berkeley, and Non-Residential Fellow at Stanford Center for Internet and Society
  • Bill Fernholz – Lecturer In Residence at Berkeley Law

For those who cannot attend in person, we will stream the event on Facebook Live, so make sure you’re following us on Facebook.

Posted in Announcements, News

This week’s TV news highlights with fact checks

by Katie Dahl

As part of a new regular feature, the Internet Archive presents highlights from our national fact-checking partners on TV news segments aired over the past week. These include President Donald Trump’s assertion that the number of police officers killed on the beat has increased; his latest attack on the press; his claim that sanctuary cities breed crime; the proposition that Nordstrom’s decision to drop Ivanka Trump’s apparel line was political; several Trump statements from his Super Bowl interview with Bill O’Reilly; and background on the silencing of Sen. Elizabeth Warren, D., Mass., on the floor of the Senate.

Claim: Number of officers shot and killed in line of duty increased (true)

Trump earned a rare “Geppetto Checkmark” for truthfulness from The Washington Post’s Fact Checker when he told a gathering of law enforcement that “the number of officers shot and killed in the line of duty last year increased by 56 percent from the year before.” Reporter Michelle Ye Hee Lee wrote, “Trump’s grim statistic seemed too remarkable to be correct:…But the figure is solid. Last year was a notable year in police deaths, largely because of the number of police officers who were fatally shot in ambush attacks across the country.”

Claim: press doesn’t want to report on terrorism (wrong)

From our Trump Archive: in describing “radical Islamic terrorist” attacks around the world, President Trump claimed the “very very dishonest press doesn’t want to report” them. The fact-checkers at PolitiFact found no evidence for this assertion, rating the claim as “Pants on Fire”: “The media may sometimes be cautious about assigning religious motivation to a terrorist attack when the facts are unclear or still being investigated. But that’s not the same as covering them up through lack of coverage.” Reporters at FactCheck.org called Trump’s claim “nonsense.”

Claim: Sanctuary cities breed crime (no evidence)

Also from the Trump Archive: in an interview on Fox News, host Bill O’Reilly asked for Trump’s reaction to news that officials in California are discussing whether to become a sanctuary state. Trump responded that he is opposed to sanctuary cities, saying they “breed crime.” PolitiFact reporter Allison Graves wrote that there isn’t much research on the impact of sanctuary cities on crime, but that at least one recent study shows no effect on crime rates. Michelle Ye Hee Lee gave the claim Three Pinocchios for The Washington Post’s Fact Checker: “Trump goes too far declaring that the cities ‘breed crime.’ He not only makes a correlation, but also ascribes a causation, without facts to support either.”

Claim: Putin’s a killer (experts say yes)

In the Super Bowl interview, O’Reilly pressed President Trump about his respect for Putin, saying “Putin’s a killer.” Trump’s response was “We got a lot of killers. You think our country is so innocent?” PolitiFact’s Graves reported on O’Reilly’s assertion that Putin is a killer, writing that “the political climate in Russia is responsible for a sizable amount of journalists murders in the country…. Many of the perpetrators are thought to be government and military officials and political groups.”

Claim: Three million undocumented immigrants voted illegally in November elections (no evidence)

Trump continued his unsubstantiated claim that three million undocumented immigrants voted illegally in the November election. When pushed on the need for evidence, Trump was undeterred, saying “[m]any people have come out and said I’m right. You know that.” PolitiFact repeated its finding that there is no evidence for this kind of voter fraud: “Trump’s claim is undermined by years of publically available information such as a report that found just 56 cases of noncitizens voting between 2000 and 2011.”

Claim: Nordstrom’s decision to drop Ivanka Trump’s apparel line was political (No evidence)

After Nordstrom dropped his daughter Ivanka Trump’s apparel line, President Trump attacked the decision as political. His press secretary, Sean Spicer, followed at a news conference saying, “[T]his is a direct attack on his policies and her name.” Reporting for The Washington Post Fact Checker, Lee cited an internal company email from November 2016, which states the company would continue to sell the brand as long as it was profitable. Then on February 2, Nordstrom announced it was dropping the line, because of “poor sales.” Lee gave the claim “four Pinocchios.”

Explainer: what is “Senate rule XIX” (rarely invoked)

During a Senate floor debate about the nomination of then-Sen. Jeff Sessions, R., Ala., to be attorney general, Senate Majority Leader Mitch McConnell, R., Ky., silenced Sen. Elizabeth Warren, D., Mass., as she read from a letter by Coretta Scott King. In doing so, he cited an obscure rule, known as Senate rule XIX, which reads: “[N]o Senator in debate shall, directly or indirectly, by any form of words impute to another Senator or to other Senators any conduct or motive unworthy or unbecoming a Senator.” PolitiFact reporter Louis Jacobson provided a useful primer on the rule, including statistics on how often it has been invoked in Senate history: most likely only twice, once in 1915 and again in 1952.

Katie Dahl is a research associate with the TV News Archive.

Posted in Announcements, News | 1 Comment

Upgraded Secure Communications Applications I am Now Using

I am upgrading the security of my communications while keeping them easy to use. I thought I would share what I currently use in case it is helpful to copy, and I would appreciate comments.

I want end-to-end encryption so nobody can intercept what I am saying (unless they have infected my phone or computer, but that is another issue), and bonus points for hiding who I am communicating with and when (private metadata and traffic). Thanks to Edward Snowden, we now know that Skype, phone calls, SMS/texts, Slack, and email are not private (at least by default). This is too bad, since I still use these. (Slack is not end-to-end encrypted even for direct messages, which it could and should be.) So far I have only partially achieved the first step: end-to-end encryption. I am migrating to:

  • Text and SMS replacement, and partly phone calls: Signal for point-to-point instant messaging, replacing SMS and Skype. Free and open source software, free of cost, works on smartphones. I have donated.
  • Skype texting replacement: Signal on laptops, using the Chrome-based desktop Signal app on my Mac (which is what I mostly use). It uses phone numbers as identifiers, which is kind of a pain. A friend at EFF called it “best of breed” for security. Small development staff. A tip for showing names rather than phone numbers: go to the … menu, then settings; at the bottom is “update contacts.”
  • Skype video and Slack audio/video replacement: appear.in for 1-on-1 and small-group video chat that is end-to-end encrypted, replacing Skype for me. It does not require a download or an account. Go to the homepage, type a bunch of characters to make a meeting room, then send the resulting URL to someone and they can use that throw-away meeting room. Super easy. It uses WebRTC (now standard in browsers) over HTTPS, and they say it is end-to-end encrypted. They have an iPhone app as well, but I don’t know about its security. This does not seem designed for very high security, but it seems pretty good.
  • Webex replacement: zoom.us for larger group video chats, replacing Webex for me. Free of cost for most of my uses and easy to use (it requires a download, but that is super easy). It says it is end-to-end encrypted, and shows a little lock icon when a call is encrypted.
  • FaceTime, occasionally, on my iPhone, replacing cellphone calls to friends who have iPhones. Apple says it is end-to-end encrypted.
  • Thunderbird + Enigmail to sign all email, receive encrypted email, and sometimes send encrypted email, with an organizational email server (archive.org, not Gmail). Enigmail is moderately hard to set up; I had help at a meetup. Free of cost, and I believe free and open source software. I am donating.
  • Encrypted notes (the Mac Notes app) on my Mac for high-priority secure notes. It syncs the encrypted notes with my iPhone via iCloud.
  • Breadwallet, a bitcoin wallet on my iPhone, for small amounts of bitcoin for casual purchases. Super easy, and a full wallet (it does not hang off a server). Love this wallet. Free of cost. I invested a tiny amount of money in the company; great guys.
  • Tor Browser for private web browsing beyond Firefox’s private browsing feature. Free and open source software, free of cost. I have donated.
  • On Mac OS X it’s easy to turn on full-disk encryption (FileVault). Go to the “Security and Privacy” settings and turn on FileVault. If you do, be sure *not* to accept its offer to store the key in iCloud. Write down the “recovery key” and hide it somewhere away from the computer. The security of this approach rests on your normal login password, so if it’s weak, change it to something that can’t easily be guessed or brute-forced. (from a commenter, Eric Blossom)
  • Web search: DuckDuckGo or StartPage.com. (from a commenter, Reinout)

Any comments or ideas are welcome. I realize I have traded off security for ease of use. I hope stronger tools get easier to use, and I suggest we all invest in tools based on donations and development help. I wish I knew for certain that my Mac and iPhone were not compromised, but I am not sure how to check that.

I have tried Ricochet, an instant messaging client that uses Tor to hide who I am talking to. It is easy to use, but few people I know use it, so I don’t use it often. I have tried encrypting my email using PGP via Enigmail, but have run into trouble with others being able to read it, so I do not encrypt email by default. As an aside, encryption is related in a funny way to content-addressable systems, which is a different subject, but this is magic and the future.
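For readers unfamiliar with the term, here is a minimal sketch of content addressing, assuming SHA-256 as the hash: each item is stored under the hash of its own bytes, so anyone who fetches it can recompute the hash and detect tampering. The `ContentStore` class is hypothetical and in-memory only; real content-addressed systems add networking, chunking, and persistence.

```python
import hashlib
from typing import Dict


class ContentStore:
    """Toy in-memory store where each blob's address is its SHA-256 digest."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data and return its content address (hex SHA-256)."""
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        """Fetch a blob and verify it still matches its address."""
        data = self._blobs[address]
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data
```

Storing the same bytes twice yields the same address, which is why content-addressed systems deduplicate naturally.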

(earlier version of this post is on http://brewster.kahle.org )

Posted in Announcements, News | 4 Comments

Micropayments to Archive.org by using the Brave Browser (and bitcoin)

I hope Ted Nelson is proud. The Internet Archive just signed up to receive micropayments from participating Brave Browser users. Brave is an alternative browser mostly for controlling ads, but it has added a micropayments feature (in beta).

You put in some bitcoin, which is then distributed to the sites you visit over the month. Cool! (They help you get bitcoin.)

We don’t expect it will raise the money we need to make a copy of archive.org in Canada, but we are glad to participate in this program.  Thank you, Brave, and our intrepid users.

Posted in Announcements, News | 2 Comments

If You See Something, Save Something – 6 Ways to Save Pages In the Wayback Machine

In recent days many people have shown interest in making sure the Wayback Machine has copies of the web pages they care about most. These saved pages can be cited, shared, linked to – and they will continue to exist even after the original page changes or is removed from the web.

There are several ways to save pages and whole sites so that they appear in the Wayback Machine.  Here are 6 of them.

1. Save Page Now

Put a URL into the form, press the button, and we save the page.  You will instantly have a permanent URL for your page.

At the moment, there are a few exceptions for this method – some sites prohibit crawling, a few have SSL (security) settings that make it break – but this method will work for most pages.  The feature saves the page you enter including the images and CSS.  It does not save any of the outlinks, and can’t be used to initiate a crawl of an entire web site. We do not keep your IP address, so your submission is anonymous.
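For readers who prefer to script this, the sketch below shows the shape of the URLs involved. It assumes Save Page Now can be triggered by requesting `https://web.archive.org/save/` followed by the page URL, and that archived copies are replayed at `https://web.archive.org/web/<timestamp>/<page URL>` with a 14-digit timestamp. The helper names are hypothetical, and only `request_save` touches the network.

```python
from urllib.request import Request, urlopen

# Assumed endpoint forms; verify against current Wayback Machine documentation.
SAVE_ENDPOINT = "https://web.archive.org/save/"
PLAYBACK_PREFIX = "https://web.archive.org/web/"


def save_page_now_url(page_url: str) -> str:
    """Build the (assumed) Save Page Now request URL for a single page."""
    return SAVE_ENDPOINT + page_url


def snapshot_url(timestamp: str, page_url: str) -> str:
    """Build a playback URL from a YYYYMMDDhhmmss timestamp and the original URL."""
    if len(timestamp) != 14 or not timestamp.isdigit():
        raise ValueError("timestamp must be 14 digits: YYYYMMDDhhmmss")
    return f"{PLAYBACK_PREFIX}{timestamp}/{page_url}"


def request_save(page_url: str, timeout: float = 60.0) -> int:
    """Ask the Wayback Machine to capture one page; return the HTTP status.

    Needs network access; urlopen raises HTTPError on 4xx/5xx responses.
    Like the web form, this asks for the single page only, not its outlinks.
    """
    req = Request(save_page_now_url(page_url),
                  headers={"User-Agent": "save-page-now-sketch/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.status
```

For example, `save_page_now_url("http://example.com/")` yields `https://web.archive.org/save/http://example.com/`.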

2. Chrome extension

Install the Wayback Machine Chrome extension in your browser.  Go to a page you want to archive, click the icon in your toolbar, and select Save Page Now. We will save the page and give you a permanent URL.

The same provisos from “Save Page Now” apply: there are some pages where it won’t work, and it only saves one page at a time. One plus to installing the extension, though, is that as you surf around and run into a missing page, we will alert you if we have a saved copy.

We also have a Firefox add-on; it will have Save Page Now functionality soon.  We are working on a Safari extension as well.

3. Wikipedia JavaScript Bookmarklet

Nobody loves a primary source more than a Wikipedia editor.  To that end, they offer a Wayback Machine JavaScript Bookmarklet that allows you to quickly save a web page from any browser.

4. Volunteer for Archive Team

Archive Team is an entirely volunteer-driven group interested in saving Internet history.  Many of the sites and pages they save end up in the Wayback Machine.  Visit the Archive Team site to learn more about how to volunteer with them.

5. Sign up for an Archive-It Account

Archive-It is a subscription service provided by Internet Archive that allows you to run your own crawling projects without any technical expertise.  Tell us what to crawl and how often to crawl it, and we execute the crawl and put the results in the Wayback Machine.

Archive-It is a paid subscription service with technical and web archivist support. This option is most appropriate for organizations that have a mandate to save certain types or categories of web content on a regular basis. If your institution is a current Archive-It partner, contact them for how you can contribute.

6. End of Term Archive

Every time the US government administration changes, Internet Archive works with partners to make a copy of government-related sites and web presences.  We call it the End of Term Archive.  You can help us discover new government sites by using the Nomination Tool to suggest pages or sites.  These nominations are added to the crawl and end up in the Wayback Machine.

The Internet Archive has been saving web pages for 20 years.  This archive has been built by thousands of people, and we would like you to help.  Use one of the methods above to make sure we have the pages you care about.

Posted in Announcements, News, Wayback Machine, Web Archive | 13 Comments

In the news: Trump Archive, end-of-term preservation, & link rot

News outlets have been getting the word out about Internet Archive efforts to preserve President-elect Donald Trump’s statements and the outgoing Obama Administration’s web pages and government data, as well as to prevent that nasty experience of encountering a “404” when you click on a link online, aka “link rot.”

Trump Archive 

A number of journalists have been exploring the riches contained within the newly launched Trump Archive, a collection of TV news clips of the president-elect speaking, peppered with links to more than 500 fact checks by national fact-checking groups.

Annie Wiener, writing for The New Yorker, immerses herself in Trump statements and discovers 56 mentions of the escalator in Trump Tower, and that Trump:

“is a fan of the word ‘sleaze,’ and of the phrase ‘tough cookie,’ which he has used to describe policemen, his opponents’ political donors, Paul LePage, ‘real-estate guys in New York and elsewhere,’ an unnamed friend who is a ‘great financial guy,’ ISIS, three professional football players, Reince Priebus, Lyndon Johnson, and Trump’s father, Fred.”

After watching long stretches of video, she writes, “It occurred to me that spending time online in the Trump Archive could be a form of immersion therapy: a means of overcoming shock through prolonged exposure.”

Geoffrey Fowler, tech columnist for The Wall Street Journal, bemoans the lack of easy-to-use tech tools to help people be responsible citizens overall, but also notes the promise–and challenge–of a curated collection like the Trump Archive:

“The Trump Archive shows what’s hard about using tech to hold officials accountable. It’s assembled and hand-curated by humans. Yet even using the transcripts, it can be hard to tell the difference between a spoken name and a person who’s actually speaking. Archive officials say making their database applicable to hundreds or thousands more politicians would require help from tech firms with capabilities in machine learning and voice and facial recognition.”

Fowler also published this video, featuring plenty of Trump; an interview with Roger Macdonald, director of the TV News Archive; and ample footage of the Internet Archive’s San Francisco headquarters.

The Trump Archive also was featured in Marketplace Tech, The Hill, Forbes, Newsweek, BuzzFeed News, TechPlz, VentureBeat, Engadget, and more.

Preserving Obama Administration websites, social media

The Internet Archive’s efforts to help preserve government websites via the Wayback Machine during and after the transition have continued to garner attention. Wired reports on a group of climate scientists working against the clock to archive government websites related to global warming:

One half was setting web crawlers upon NOAA web pages that could be easily copied and sent to the Internet Archive. The other was working their way through the harder-to-crack data sets—the ones that fuel pages like the EPA’s incredibly detailed interactive map of greenhouse gas emissions, zoomable down to each high-emitting factory and power plant.

The New Scientist also writes on efforts to archive climate data:

Fears that data could be misused or altered have prompted crowd-sourcing to back up federal climate and environmental data, including Climate Mirror, a distributed volunteer effort supported by the Internet Archive and the Universities of Pennsylvania and Toronto.

The Los Angeles Times and Quartz offer reports on archiving climate data.

Internet Archive works against link rot

Tech publications were quick to inform their readers about the Internet Archive’s new Chrome extension, which fights link rot by directing users to archived web pages. Here is Mashable:

Now Internet Archive has built a Wayback Machine Chrome extension. It works like this: If you click on a link that would normally lead to an error page (think 404), the extension will instead give users the option to load an archived version of the page. The link is no longer simply gone.

Also writing on the fight against link rot: Network World, VentureBeat, The Tech Portal, Bleeping Computer, and ZDNet.

Posted in Announcements, News | 8 Comments

Lost Landscapes of San Francisco: Fundraiser Benefitting Internet Archive — Monday, January 30th, 2017

By Rick Prelinger, Prelinger Archives

Internet Archive presents the 11th annual Lost Landscapes of San Francisco show on Monday, January 30 at 7:30 pm. The show will be preceded by a small reception at 6:30 pm, when doors will open.

Get tickets here!

While this is the seventh year we’ve been presenting this participatory archival film show at the Archive, the story goes back much further. I’ve been collecting historical footage of San Francisco and the Bay Area in earnest since 1993, when we acquired the collection assembled by noted local historian and film preservationist Bert Gould. Since that time I’ve worked to collect film material showing the history of this dynamic and complex region. Much of it is online for free viewing, downloading and reuse as part of the Prelinger Collection.

In 1996 Chris Carlsson and LisaRuth Elliott of Shaping San Francisco encouraged me to put together a little show of historical footage for a talk at CounterPULSE. Shaping SF, by the way, is a highly active local history organization, a longtime partner of the Archive and presently working with IA to digitize a large collection of San Francisco community newspapers. I made a program and planned a narration. The little CounterPULSE dance studio theater filled quickly on show night and we had to turn many away, but the people who were able to get in talked their way through the show, asking questions, identifying places and people and arguing over precise identifications with their neighbors. It was a wonderful event — nothing like the kind of film showing that takes place in church-like silence, but an active, participatory event where people freely shared their knowledge and experience of San Francisco’s history. A new show the year afterward was also jammed. Long Now Foundation stepped up and offered to make this event part of their Seminars on Long-Term Thinking talk series, and in year 3 we moved to the 400-seat Cowell Theater at Fort Mason. This was at once a wonderful experience and an occasion for great chagrin, because at least 250 people who showed up were unable to get in. And so we moved to the beautiful Herbst Theater and in 2011 to the 1410-seat Castro Theatre, where we’ve been every year since then. And for the last eight years we’ve also been putting on Lost Landscapes at Internet Archive. Many great things have happened at the Archive showings: people have recognized their relatives in the films, and many have seen their own streets and neighborhoods as they’ve never before seen them.

Combining favorites from past years with this year’s footage discoveries, the 11th annual feature-length program shows San Francisco’s neighborhoods, infrastructures, celebrations and people from 1906 through the 1970s. This year’s program features new scenes of San Franciscans working, playing, marching and partying during the Great Depression; unseen footage of Seals Stadium and the Cow Palace in the late 1930s; newly-discovered footage of the San Francisco Produce Market in operation; glimpses of neighborhoods now gone; Cathedral Hill on the cusp of redevelopment; 1960s antiwar activism; newly found footage of Tom Mooney’s victory parade after his release from San Quentin in 1939; Bay ferries in operation; rare images of southeastern San Francisco and the Hunters Point drydock; the 1975 Gay Freedom Day parade; a 1940s-era ode to our fog; and many more newly discovered gems.

As always, the audience makes the soundtrack! This is a great room for the show, as the shape of the Great Room makes it easy for participants to hear one another’s comments. Come prepared to identify places, people and events, to ask questions and to engage in spirited real-time repartee with fellow audience members, and look for hints of San Francisco’s future in the shape of its lost past.

Monday, January 30th
6:30 pm Reception
7:30 pm Interactive Film Program

Internet Archive
300 Funston Ave.
San Francisco, CA 94118

Get tickets here!

Posted in Announcements, News | 3 Comments

See Trump Archive fact checks in one place

Robin Chin, Katie Dahl, Tracey Jaquith, Roger Macdonald, Nancy Watzman, and Dan Schultz are contributing research and engineering for the Trump Archive. 

Now it’s easier to find fact checks of specific statements by President-elect Donald Trump in our new Trump Archive, an experimental collection of TV news clips featuring Trump–including fact checks of his press conference on January 11, his first since July 2016.

We’ve got 500+ fact checks by FactCheck.org, the Pulitzer Prize-winning PolitiFact, and The Washington Post’s Fact Checker embedded within the Trump Archive; these are now viewable on this dedicated page, with the option of downloading a CSV containing links to fact checks, links to TV news clips, date of airing, and topics covered.
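As a sketch of what the download makes possible, the snippet below tallies fact checks per topic. The column name `topics` and the comma separator within that column are hypothetical; check the header row of the actual CSV and adjust `TOPIC_COLUMN` and `TOPIC_SEPARATOR` accordingly.

```python
import csv
import io
from collections import Counter

# Hypothetical layout: adjust to match the real file's header row.
TOPIC_COLUMN = "topics"
TOPIC_SEPARATOR = ","


def count_topics(csv_text: str) -> Counter:
    """Count how many fact-check rows mention each topic."""
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row.get(TOPIC_COLUMN) or ""  # missing column or short row gives None
        for topic in raw.split(TOPIC_SEPARATOR):
            topic = topic.strip()
            if topic:
                counts[topic] += 1
    return counts
```

To use it on a downloaded file, read the file as text (opened with `newline=""`) and pass the contents to `count_topics`; `Counter.most_common()` then lists the most frequently fact-checked topics first.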

The Internet Archive’s Trump Archive launched on January 5 with 700+ televised speeches, interviews, debates, and other news broadcasts related to President-elect Donald Trump, and it continues to grow.

We created the Trump Archive in response to journalists and scholars who had trouble finding clips of Trump speaking through the caption search function in our TV News Archive library. We are hand-curating this collection as an experimental prototype for learning how to engineer solutions so similar archives can be created–whether by the Internet Archive or members of the public–about other elected officials and topics of interest. We are looking for collaborative partners to explore artificial intelligence approaches to creating such collections, with an ease and scale far beyond what can be accomplished now by hand.

The list of fact checks in the Trump Archive includes claims made by Trump during his press conference on January 11 covering issues from health care to ISIS to Trump’s connections to Russia. Here’s a sampling.

Health care

Trump said: “Obamacare is a complete and total disaster. It’s imploding as we said. Some states have over 100 percent increase.”

FactCheck.org: “Only Arizona has an average increase that high, and 84 percent with marketplace coverage in 2016 received tax credits to purchase insurance.”

PolitiFact: “While the average premium increase in Arizona rose by 145 percent in 2017, it is the only state with a triple-digit increase. Alabama saw the second highest increase, 71 percent. On the other end, a few states saw decreases. The average premium increase across all states was 25 percent.”

The Washington Post’s Fact Checker: “Trump exaggerates here, and appears to misunderstand a fundamental part of the Affordable Care Act. State-by-state weighted average increases range from just 1.3 percent in Rhode Island to as high as 71 percent in Oklahoma. But the most common plans in the marketplace in 2017 experienced an average increase of 22 percent. These plans have been used as the benchmark to calculate government subsidies.”

ISIS

Trump: “I mean if you look, this administration created ISIS by leaving at the wrong time. The void was created, ISIS was formed.”

FactCheck.org: “Trump continues to oversimplify the situation by placing the entirety of the blame for the creation of ISIS on Obama’s decision to withdraw troops from Iraq.”

PolitiFact: “This is a more tempered version of Trump’s previous Pants on Fire claim that Obama and Clinton ‘founded ISIS.’ Experts told PolitiFact that you can reasonably criticize the Obama administration’s withdrawal from Iraq, lack of support to anti-Assad rebels in Syria, and intervention in Libya for contributing to the power of ISIS. But the timeline was set in motion by the Bush administration.”

The Washington Post’s Fact Checker: “Trump greatly simplifies a complex situation.”

Russia

Trump: “I have no deals that could happen in Russia, because we’ve stayed away. And I have no loans with Russia.”  

PolitiFact:  “It’s true that Trump has yet to build a hotel or tower in Russia, but he has eyed the Moscow skyline for decades.

We don’t know for sure about the extent of Trump’s business dealings in Russia, because he hasn’t released his tax returns. But his son, Donald Trump Jr., said in a 2008 real estate conference that “Russians make up a pretty disproportionate cross-section of a lot of our assets.”

We do know that Trump agreed to host the Miss Universe pageant in Moscow in 2013, a $20 million deal facilitated by a Russian real estate mogul and billionaire Aras Agalarov. (Trump also cameoed in Agalarov’s son’s dance-pop music video). He also made millions selling a 17-bedroom Florida mansion to a Russian billionaire.

The Washington Post’s Fact Checker: “Trump is being misleading when he says he has stayed away from Russia. Trump repeatedly sought deals in Russia. In 1987, he went to Moscow to find a site for luxury hotel; no deal emerged. In 1996, he sought to build a condominium complex in Russia; that also did not succeed. In 2005, Trump signed a one-year deal with a New York development company to explore a Trump Tower in Moscow, but the effort fizzled.

In a 2008 speech, Donald Trump Jr. made it clear that the Trumps want to do business in Russia, but were finding it difficult. “Russians make up a pretty disproportionate cross-section of a lot of our assets,” Trump’s son said at a real estate conference in 2008, according to an account posted on the website of eTurboNews, a trade publication. “We see a lot of money pouring in from Russia.”

Posted in News | 7 Comments

Wayback Machine Chrome extension now available

The Wayback Machine Chrome browser extension helps make the web more reliable by detecting dead web pages and offering to replay archived versions of them.  You can get it here.

For the past 20 years, the Internet Archive has recorded and preserved web pages, and hundreds of billions of them are available via the Wayback Machine.  This is good because we are learning the web is fragile and ephemeral.  For example, a 2013 Harvard study found that 49% of the URLs referenced in U.S. Supreme Court decisions are now dead.  Those decisions affect everyone in the U.S., and the evidence the opinions are based on is disappearing.

When previously valid URLs no longer return their pages but instead return a result code of 404, we call that link rot.  The Wayback Machine Chrome extension is designed to help mitigate link rot and other common web breakdowns.

With the “Wayback Machine” extension for Chrome, users are automatically offered the opportunity to view archived pages whenever one of several error conditions, including code 404 (“page not found”), is encountered.  When such a code is detected, the extension silently queries the Wayback Machine in real time to see if an archived version is available.  If one is, Chrome displays a notice offering the user the option to see the archived page.
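The check described above can be sketched in a few lines. This is not the extension’s code, which runs in the browser as JavaScript; it is a Python illustration that assumes the public Wayback Machine availability API at `https://archive.org/wayback/available?url=...`, which returns JSON with an `archived_snapshots.closest` object. The set of status codes that trigger a lookup is an assumption, since the post names only 404, and the network request itself is left out so the logic can be tested on a sample response.

```python
import json
from typing import Optional
from urllib.parse import quote

# Assumed set of "error conditions"; the post names only 404 explicitly.
ERROR_CODES = frozenset({404, 408, 410, 451, 500, 502, 503, 504})
AVAILABILITY_API = "https://archive.org/wayback/available?url="


def should_check_archive(status_code: int) -> bool:
    """Return True if a failed page load should trigger a Wayback lookup."""
    return status_code in ERROR_CODES


def availability_query(page_url: str) -> str:
    """Build the availability API request, percent-encoding the page URL."""
    return AVAILABILITY_API + quote(page_url, safe="")


def closest_snapshot(api_response: str) -> Optional[str]:
    """Extract the closest archived copy's URL from an API response, if any."""
    data = json.loads(api_response)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def archived_fallback(status_code: int, api_response: str) -> Optional[str]:
    """Offer an archived copy only when the live page failed and one exists."""
    if not should_check_archive(status_code):
        return None
    return closest_snapshot(api_response)
```

Here `archived_fallback(200, ...)` returns `None`: a page that loads normally is left alone, matching the description of the extension acting only on errors.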

The Internet Archive considers the privacy of our users to be of critical importance. We try not to record IP addresses, and we have fought National Security Letters.  You can rest assured that the use of the Wayback Machine Chrome extension will not expose your browsing history.  In addition, we are in conversation with Google about adding a proxy server as an additional layer of protection.

Thank you for giving the Wayback Machine for Chrome extension a try.  You can test it with this URL: http://www.pfaw.org:80/attacks.htm  We are committed to supporting better web browsing experiences and welcome your feedback and suggestions about how we can improve.  Please send us your bug reports, feature requests and other feedback directly to info@archive.org.

Posted in Announcements, News | 29 Comments

Internet Archive’s Trump Archive launches today

The Trump Archive launches today with 700+ televised speeches, interviews, debates, and other news broadcasts related to President-elect Donald Trump, created using the Internet Archive’s TV News Archive.

A work in progress, the growing collection now includes more than 520 hours of Trump video. The earliest excerpt dates from December 2009, and the collection continues through the present. It includes more than 500 video statements fact checked by FactCheck.org, PolitiFact, and The Washington Post’s Fact Checker covering such controversial topics as immigration, Trump’s tax returns, Hillary Clinton’s emails, and health care.

Full list of fact checks with links to video statements in TV News Archive.
Note: We are working to update this spreadsheet with improved links. Stay tuned.

Visit the Trump Archive.

Reporters, researchers, Wikipedians, and the general public are invited to quote, compare and contrast televised statements made by Trump.

  • Use clips in your articles and videos.
  • Create supercuts on topics like Trump’s perspectives on the US press, made with our online “Popcorn” video editor.
  • Let us know what content we are missing.  
  • If you have the technical resources, help us enhance search and discovery by collaborating in experiments to apply artificial intelligence-driven facial recognition, voice identification, and other video content analysis approaches.
  • How would you like to use such an archive?  Comment below, or write us info@archive.org

Why a Trump Archive?

We draw on this material, and our experience with building the successful Political TV Ad Archive, to create a curated collection of material related to Trump, with an emphasis on fact-checked statements. The video is searchable, quotable, and shareable on social media.

In response to requests by our fact checking partners on the Political TV Ad Archive project and other media, we hope to provide assistance for those tracking Trump’s evolving statements on public policy issues.

For example: in July 2016, Trump told ABC’s George Stephanopoulos, “I have no relationship with Putin…I don’t think I’ve ever met him.” Stephanopoulos pressed him on this point during the interview, saying that Trump had previously claimed a relationship with him. PolitiFact ruled this statement by Trump as a “full flip flop”: “Trump’s denial of a relationship with Putin contradicted what he had said on multiple previous occasions.”

By providing a free and enduring source for TV news broadcasts of Trump’s statements, the Internet Archive hopes to make it more efficient for the media, researchers, and the public to track Trump’s statements while fact-checking and reporting on the new administration. The Trump Archive can also serve as a rich treasure trove of video material for any creative use: comedy, art, documentaries, wherever people’s inspiration takes them.

We consider the Trump Archive to be an experimental model for creating similar archives for other public officials. For example, we’ll explore the idea of creating curated collections for Trump’s nominees to head federal agencies; members of Congress of both parties (for example, perhaps the Senate and House majority and minority leadership); Supreme Court nominees, and so on.

While we’ve largely hand-curated this collection, we hope to collaborate with researchers to apply machine intelligence to expand this collection, building others and making search of our entire TV library vastly more efficient.

Such experimentation builds on our experience with first prototyping and then developing the Political TV Ad Archive. Our first collection of political TV ads, covering ads aired in Philadelphia during the 2014 mid-term elections, was built largely by hand. However, in preparation for the Political TV Ad Archive, we created a new open source tool, the Duplitron, that was able to identify ad airings by deploying audio fingerprinting. During the course of the project, we collected nearly 3,000 ads and documented more than 364,000 ad airings.

Why now?

Just because something is broadcast or posted on the internet doesn’t mean it’s forever. Reporters and the public may take it for granted that a news story or a piece of broadcast video is only a Google search away, but as newspapers, companies, and organizations fail and change, vital information is often lost. The web is far more fragile than is generally understood.

The Internet Archive’s core mission is to preserve and make accessible our cultural heritage. The Wayback Machine, for instance, preserves websites over time, so if pages or sites are deleted, they can still be found. Rachel Maddow of MSNBC, for example, reported on how the president-elect had deleted a web page from the official transition website that had touted Trump properties.

We also preserve political and news content through the TV News Archive, which contains news broadcasts by major networks back to 2009, searchable via closed captioning. The Political TV Ad Archive archives 2016 election ads along with relevant fact checks and follow-the-money reporting by our journalism partners. Our Political Campaign web archive is preserving election-related online media, such as select candidate and political groups’ websites and Twitter and Instagram feeds.

What’s next

The Trump Archive is a work in progress; we will continue to refine the content. We hope to work with others to broaden the materials available, to make search more efficient, and otherwise make it more useful for the public. We’d like your feedback and suggestions.

The great American author William Faulkner wrote, “The past is never dead. It’s not even past.” We believe that the Trump Archive, in preserving the past, can help the public engage more knowledgeably with our future.

Many thanks to the thoughtful contributions of Robin Chin, Jessica Clark, Katie Dahl, Katie Donnelly, John Gonzalez, Wendy Hanamura, Tracey Jaquith, Jeff Kaplan, Roger Macdonald, Ralf Muehlen, Craig Newmark, Sylvia Paull, Alexis Rossi, Dan Schultz, Nancy Watzman, our Partners & Funders and the Vanderbilt Television News Archive – on whose shoulders we stand.


Join us for a White House Social Media and Gov Data Hackathon!

Join us at the Internet Archive this Saturday, January 7, for a government data hackathon! We are hosting an informal hackathon working with White House social media data, government web data, and data from election-related collections. We will provide more gov data than you can shake a script at! If you are interested in attending, please register using this form. The event will take place at our headquarters at 300 Funston Avenue from 10 a.m. to 5 p.m.

We have been working with the White House on their admirable project to provide public access to eight years of White House social media data for research and creative reuse. Read more about their efforts in this blog post. Copies of this data will be publicly accessible at archive.org. We have also been furiously archiving the federal government web as part of our collaborative End of Term Web Archive, and we have collected a voluminous amount of media and web data from the 2016 election cycle. Data from these projects, and others, will be made publicly accessible for folks to analyze, study, and do fun, interesting things with.

At Saturday’s hackathon, we will give an overview of the datasets available, have short talks from affiliated projects and services, and point to tools and methods for analyzing the hackathon’s data. We plan for a loose, informal event. Some datasets that will be available for the event and publicly accessible online:

  • Obama Administration White House social media from 2009-current, including Twitter, Tumblr, Vine, Facebook, and (possibly) YouTube
  • Comprehensive web archive data of current White House websites: whitehouse.gov, petitions.whitehouse.gov, letsmove.gov and other .gov websites
  • The End of Term Web Archives, a large-scale collaborative effort to preserve the federal government web (.gov/.mil) at presidential transitions, including web data from 2008, 2012, and our current 2016 project
  • Special sub-collections of government data, such as every PowerPoint file in the Internet Archive’s web archive from the .mil web domain
  • Extensive archives of social media data related to the 2016 election, including data from candidates, pundits, and media
  • Full-text transcripts of Trump’s speeches as a candidate
  • Python notebooks, cluster computing tools, and pointers to methods for playing with data at scale.

Much of this data was collected in partnership with other libraries and with the support of external funders. We thank, foremost, the current White House Office of Digital Strategy staff for their advocacy for open access and for working with us and others to make their social media open to the public. We also thank our End of Term Web Archive partners and related community efforts helping preserve the .gov web, as well as the funders that have supported many of the collecting and engineering efforts that make all this data publicly accessible, including the Institute of Museum and Library Services, Altiscale, the Knight Foundation, the Democracy Fund, the Kahle-Austin Foundation, and others.


We Would Like to Archive Government Web Services, Not Just Web Sites: Please Help

Archiving of .gov and .mil websites is under way now, with lots of help. But what if we could archive full government web services? This would mean keeping interactive sites, including their databases and forms, available for future use even if the original website changes or is removed.

We like this idea because it would preserve how websites worked, not just what they looked like. As websites become more database-driven and interactive, this would be an even bigger help than the already helpful Wayback Machine.

We believe this is possible now, given the increased use of virtual machines and cloud services. Webmasters are adapting to having their systems run in isolated environments that can be snapshotted.

What we need are some webmasters who would like to try this. We think that government websites would be perfect because they tend to change as administrations change and the datasets are often public data.

If you run a website and would like to participate in this experiment or would like to help on the receiving end, please send a note to info@archive.org or reply to this post.

Archiving web services could usher in a completely new age in archiving of Internet resources.


A Year-end Message from the TV News Archive

by Katie Donnelly

During this extremely unpredictable election year, the Internet Archive invented new methods and tools to give journalists, researchers, and the public the power to access, scrutinize, share, and thoroughly fact-check political ads, presidential debates, and TV news broadcasts.

Our efforts aimed to help citizens better understand the patterns of political messages designed to persuade them, and to find factual, reliable information in what is, disturbingly, coming to be seen as a “post-truth” world.

The Political TV Ad Archive project proved to be highly useful to our high-profile fact-checking partners, as well as reporters at an array of outlets including The New York Times, The Washington Post, FOX News, The Economist, The Atlantic, and more. By providing data about when, where, and how many times political ads aired on TV in key markets, the project unlocked new creative potential for data reporters to analyze how campaigns and outside groups were targeting messages to voters in different locations.

Breaking events, like political debates and speeches, also offered a chance for archived TV content to shine, allowing reporters to isolate and share clips in near-real time, and fact-checkers to harvest dubious statements for further exploration. In addition, the project’s experience with developing audio fingerprinting (through a new invention we call the Duplitron) for identifying instances of ads inspired a new use: tracking candidate debate sound bites in subsequent TV news shows.

In this way, reporters and researchers could analyze and report on which political statements were trending across different TV programs and networks, revealing the ideological, agenda-setting, and other editorial choices news producers made about which issues to highlight and which to overlook.
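The general idea behind audio fingerprinting (turn audio into a compact signature, then slide a known clip's signature along a broadcast to find where it recurs) can be illustrated with a toy example. This is not the Duplitron's algorithm, which this post does not describe; it is a simplified, hypothetical scheme loosely modeled on energy-difference fingerprints, where each bit records whether loudness rises from one frame to the next. The frame size, signal lengths, and synthetic noise "audio" are all assumptions for illustration.

```python
import random

# Toy audio fingerprinting: an illustration only, NOT the Duplitron's method.


def frame_energies(samples, frame_len):
    """Energy (sum of squared samples) of each complete, non-overlapping frame."""
    n_frames = len(samples) // frame_len
    return [
        sum(s * s for s in samples[i * frame_len:(i + 1) * frame_len])
        for i in range(n_frames)
    ]


def fingerprint(samples, frame_len=256):
    """One bit per pair of adjacent frames: 1 if the energy rises, else 0."""
    e = frame_energies(samples, frame_len)
    return [1 if e[i + 1] > e[i] else 0 for i in range(len(e) - 1)]


def best_match(clip_bits, stream_bits):
    """Slide the clip's bits along the stream's bits and return the
    (frame_offset, bit_errors) pair with the fewest mismatches."""
    best_offset, best_errors = None, len(clip_bits) + 1
    for off in range(len(stream_bits) - len(clip_bits) + 1):
        window = stream_bits[off:off + len(clip_bits)]
        errors = sum(a != b for a, b in zip(clip_bits, window))
        if errors < best_errors:
            best_offset, best_errors = off, errors
    return best_offset, best_errors


FRAME = 256


def noise(n_frames):
    """Synthetic stand-in for audio: uniform random samples."""
    return [random.uniform(-1, 1) for _ in range(n_frames * FRAME)]


# Demo: hide a 40-frame synthetic "ad" 100 frames into a synthetic "broadcast".
random.seed(0)
ad = noise(40)
broadcast = noise(100) + ad + noise(60)

off, errs = best_match(fingerprint(ad, FRAME), fingerprint(broadcast, FRAME))
print(off, errs)  # 100 0: the ad starts at frame 100, with no bit errors
```

Here the ad is inserted exactly on a frame boundary and without added noise, so the match is perfect. Real broadcasts are neither aligned nor clean, which is why practical systems use overlapping frames, several frequency bands, and a mismatch threshold rather than demanding zero errors.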


As Roger Macdonald, director of the TV News Archive, wrote to project partners: “Citizens will increasingly hunger for sound information to inform wise electoral decisions. With our Republic being riven by increasing socio-political chaos and infectious divisions, whose magnitude has not been seen since before our Civil War, we think there are uncommon opportunities to serve citizens with the information for which they will increasingly yearn. We have an historic opportunity to thoughtfully place some grains of sand on the balance pan of reason.”

The project was supported by a generous grant from the Knight News Challenge, funded in partnership with the Knight Foundation, the Democracy Fund, the Hewlett Foundation and the Rita Allen Foundation, and received additional support from the Rita Allen Foundation, the Democracy Fund, PLCB Foundation, Craig Newmark, Christopher Buck, and others.

Here is a quick look at project accomplishments:

Political TV Ad Archive

  • Total number of archived ad views, most embedded in partner sites: 2,036,063
  • Number of ads collected: 2,991
  • Political ads broadcast 364,822 times over 26 markets
  • Number of fact and source checks: 131
  • Press coverage: 156 articles

Katie Donnelly is associate director at Dot Connectors Studio, a Philadelphia-based strategy firm that has worked with the Political TV Ad Archive.


New Research Tool for Visualizing Two Million Hours of Television News

Guest post by Kalev Leetaru

Today the Internet Archive announces a new interactive timeline visualization, the Television Explorer, that lets you trace how any keyword (think “emails,” “tax returns,” or “alt-right”) has been covered on U.S. television news over the past half-decade.

See the Television Explorer, a new tool for exploring TV News.


Over the past year and a half, the GDELT Project and the Internet Archive’s Television News Archive have worked closely together to visualize how U.S. television news has covered the contentious 2016 political campaign.

One of the tools we created was the 2016 Candidate Television Tracker, which used closed captioning to count how many times each of the presidential candidates was mentioned on television and offered a day-by-day timeline showing the ebbs and flows of who was “winning” the free media wars. (Answer: President-elect Donald Trump.) This tool was used by such media outlets as The Atlantic, The Washington Post, FiveThirtyEight, Politico and The Guardian, among many others.

Now we are adapting this tool to allow more sophisticated searches: instead of only the presidential candidates, you can now trace television news coverage of any keyword of your choosing. You can even run advanced searches that find words in conjunction with other words or phrases, such as finding mentions of Hillary Clinton that also discuss her email server. All search results can be downloaded as CSV or JSON, making it possible for data journalists, researchers, and advocates to fine-tune their analysis of the data.

When you search, you get back a visual timeline showing how often that word or phrase has appeared on American television news over the past half-decade. Nearly two million hours of television news, totaling more than 5.7 billion words from over 150 distinct stations from July 2009 to the present, are searchable in this interface (though not all stations were monitored for the entire period).

Unlike the Internet Archive’s Television News Archive interface, which returns results at the level of an hour or half-hour “show,” the interface here reaches inside those six and a half years of programming, breaks the more than one million shows into individual sentences, and counts how many of those sentences contain your keyword of interest. Instead of reporting that CNN had 24 hour-long shows yesterday that mentioned Donald Trump one or more times, the interface here counts how many sentences uttered on CNN yesterday mentioned his name, a vastly more accurate metric for assessing media attention.
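The difference between show-level and sentence-level counting can be sketched in a few lines. This is a minimal illustration, not the Television Explorer's actual pipeline: it assumes each show's closed captions are a single plain-text string, splits sentences naively on terminal punctuation, and matches the keyword as a case-insensitive substring. The caption text is made up.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def count_shows(shows, keyword):
    """Number of shows whose captions mention the keyword at least once."""
    kw = keyword.lower()
    return sum(1 for captions in shows if kw in captions.lower())


def count_sentences(shows, keyword):
    """Number of caption sentences, across all shows, that mention the keyword."""
    kw = keyword.lower()
    return sum(
        1
        for captions in shows
        for sentence in split_sentences(captions)
        if kw in sentence.lower()
    )


# Made-up captions: one show dwells on the topic, the other barely touches it.
shows = [
    "Trump spoke today. Trump also tweeted. Weather is next.",
    "Markets closed higher. A brief mention of Trump at the end.",
]
print(count_shows(shows, "Trump"), count_sentences(shows, "Trump"))  # 2 3
```

Both shows count equally at the show level, while the sentence count reflects that the first show spent more of its airtime on the topic.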

Explore how CNN covered the presidential campaign of 2012 versus 2016 and understand just how big a media event this year’s election really was. See precisely when Edward Snowden burst onto the scene, and how Wikileaks got more coverage during the 2016 presidential election than during its debut in 2010. Watch the seasonal spikes of Thanksgiving, or see how Ebola received little attention, even as thousands died in Africa, becoming a topic only after the first Americans became infected.

Using the “near” search feature, plot coverage of Wikileaks that also mentioned “Podesta,” “email,” or “emails” nearby, and discover that FOX paid far more attention to the DNC and Podesta email hacks than CNN, MSNBC, CNBC or Bloomberg did. In contrast, CNN focused more intensely on the Trayvon Martin shooting (Aljazeera America and Bloomberg were not yet being monitored by the Archive), while Aljazeera led coverage of the Michael Brown and Eric Garner deaths.
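A proximity search of this kind can be sketched as follows. The post does not say how the Explorer defines “near,” so the word-distance window (10 words here), the single-word target, and the tokenizer are all assumptions made for illustration; the caption sentences are made up.

```python
import re


def tokens(text):
    """Lowercased word tokens made of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def near(sentence, target, context_terms, window=10):
    """True if `target` appears within `window` words of any context term.
    The default window of 10 words is a hypothetical choice."""
    words = tokens(sentence)
    terms = {t.lower() for t in context_terms}
    target_pos = [i for i, w in enumerate(words) if w == target.lower()]
    context_pos = [i for i, w in enumerate(words) if w in terms]
    return any(abs(i - j) <= window for i in target_pos for j in context_pos)


captions = [
    "WikiLeaks released another batch of Podesta emails on Friday.",
    "WikiLeaks founder Julian Assange spoke from the embassy.",
]
hits = [s for s in captions if near(s, "wikileaks", ["podesta", "email", "emails"])]
print(len(hits))  # 1
```

Only the first sentence counts: it mentions “Podesta” five words after “WikiLeaks,” while the second never mentions any of the context terms.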


Search of term “Wikileaks” near Podesta, emails, Clinton

Search for “ivory” to see that Aljazeera America (which ceased operation in April 2016) devoted vastly more of its coverage to elephant poaching in Africa than any other monitored national network. It also paid the most attention to “Africa” and to the “refugee” crisis. On the other hand, Bloomberg has devoted much more of its time to “China” and to the economic crisis in “Greece” last year.

We look forward to seeing what people do with this new tool. Please share your favorite searches on Twitter with the hashtag “#internetarchivetvsearch”. If you have any questions, please email kalev.leetaru5@gmail.com or nancyw@archive.org.

Kalev Leetaru is an independent data journalist. 


Robots.txt Files and Archiving .gov and .mil Websites


The Internet Archive is collecting webpages from over 6,000 government domains, over 200,000 hosts, and feeds from around 10,000 official federal social media accounts. Some have asked if we ignore URL exclusions expressed in robots.txt files.

The answer is a bit complicated. Historically, sometimes yes and sometimes no; but going forward, the answer is “even less so.”

Robots.txt files live at the top level of a website, at a URL like this: https://example.com/robots.txt. This standard was developed in 1994 to guide search engine crawlers in a variety of ways, including marking areas they should avoid crawling. Google, for instance, uses this standard.
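To see how a crawler that honors robots.txt interprets these rules, Python’s standard-library parser `urllib.robotparser` is handy. The rules and user-agent names below are made-up examples, and the sketch parses an in-memory file rather than fetching one over the network; it does not represent how the Internet Archive’s own crawlers are configured.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: every crawler must skip /private/,
# and a hypothetical crawler named "ExampleArchiver" is barred entirely.
robots_txt = """\
User-agent: *
Disallow: /private/

User-agent: ExampleArchiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())  # parse in-memory lines; no network fetch

print(parser.can_fetch("SomeBot", "https://example.com/public/page.html"))          # True
print(parser.can_fetch("SomeBot", "https://example.com/private/page.html"))         # False
print(parser.can_fetch("ExampleArchiver", "https://example.com/public/page.html"))  # False
```

Note that the file says nothing about when its rules were written or which owner of the domain wrote them, which is the limitation for archiving discussed below.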

These files were useful 20 years ago for the Internet Archive’s crawlers, but have become less and less so over the years because many sites have not actively maintained the files from the point of view of archiving. Also, large websites or hosted websites often do not make it easy for their users to edit these files, and large websites increasingly guide or block crawlers with technological measures. Another problem is knowing when a domain name has changed hands, since a current robots.txt file may not be relevant to an earlier era of the site. Increasingly, we encourage webmasters who want to exclude their sites to send exclusion requests to info@archive.org and to specify the time period to which the requests apply.
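Why a time period matters can be shown with a small sketch. The record type and matching rule below are hypothetical and do not describe how the Internet Archive actually processes exclusion requests; they only illustrate that an exclusion scoped to dates can hide a new owner’s pages without erasing the captures made under a previous owner of the same domain.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ExclusionRequest:
    """Hypothetical record of an owner's exclusion request, scoped in time."""
    domain: str
    start: date  # earliest capture date covered by the request (inclusive)
    end: date    # latest capture date covered by the request (inclusive)


def is_excluded(domain, capture_date, requests):
    """True if any request for this domain covers the capture's date."""
    return any(
        r.domain == domain and r.start <= capture_date <= r.end
        for r in requests
    )


# A domain that changed hands in mid-2015: the new owner asks to exclude only
# captures from that point on, leaving the previous owner's era visible.
requests = [ExclusionRequest("example.com", date(2015, 6, 1), date.max)]
print(is_excluded("example.com", date(2012, 3, 4), requests))  # False
print(is_excluded("example.com", date(2016, 1, 1), requests))  # True
```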

Our end-of-term crawls of .gov and .mil websites in 2008, 2012, and 2016 have ignored exclusion directives in robots.txt in order to get more complete snapshots. Other crawls done by the Internet Archive and other entities have had different policies. We have had little or no negative feedback on this, and little or no positive feedback; in fact, little feedback at all. The Wayback Machine has also been replaying the captured .gov and .mil webpages for some time in the beta version of the Wayback Machine, regardless of robots.txt.

Overall, we hope to capture government and military websites well, and hope to keep this valuable information available to users in the future.
