Tag Archives: artificial intelligence

Internet Archive Submits Comments on Copyright and Artificial Intelligence

On Monday the Internet Archive joined thousands of others in submitting comments to the US Copyright Office as part of its study on Copyright and Artificial Intelligence.

Our high-level view is that copyright law has been adapting to disruptive technologies since its earliest days and our existing copyright law is adequate to meet the disruptions of today. In particular, copyright’s flexible fair use provision deals well with the fact-specific nature of new technologies, and has already addressed earlier innovations in machine learning and text-and-data mining. So while Generative AI presents a host of policy challenges that may prompt different kinds of legislative reform, we do not see that new copyright laws are needed to respond to Generative AI today.

Our comments are guided by three core principles.


First, regulation of Artificial Intelligence should be considered holistically–not solely through the isolated lens of copyright law. As explained in the Library Copyright Alliance Principles for Artificial Intelligence and Copyright, “AI has the potential to disrupt many professions, not just individual creators. The response to this disruption (e.g., support for worker retraining through institutions such as community colleges and public libraries) should be developed on an economy-wide basis, and copyright law should not be treated as a means for addressing these broader societal challenges.” Going down a typical copyright path of creating new rights and licensing markets could, for AI, serve to worsen social problems like inequality, surveillance and monopolistic behavior of Big Tech and Big Media.

Second, any new copyright regulation of AI should not negatively impact the public’s right and ability to access information, knowledge, and culture. A primary purpose of copyright is to expand access to knowledge. See Authors Guild v. Google, 804 F.3d 202, 212 (2d Cir. 2015) (“Thus, while authors are undoubtedly important intended beneficiaries of copyright, the ultimate, primary intended beneficiary is the public, whose access to knowledge copyright seeks to advance  . . . .”). Proposals to amend the Copyright Act to address AI should be evaluated by the impact such new regulations would have on the public’s access to information, knowledge, and culture. In cases where proposals would have the effect of reducing public access, they should be rejected or balanced out with appropriate exceptions and limitations.

Third, universities, libraries, and other publicly-oriented institutions must be able to continue to ensure the public’s access to high quality, verifiable sources of news, scientific research and other information essential to their participation in our democratic society. Strong libraries and educational institutions can help mitigate some of the challenges to our information ecosystem, including those posed by AI. Libraries should be empowered to provide access to educational resources of all sorts, including the powerful Generative AI tools now being developed.

Read our full comments here.

Build, Access, Analyze: Introducing ARCH (Archives Research Compute Hub)

We are excited to announce the public availability of ARCH (Archives Research Compute Hub), a new research and education service that helps users easily build, access, and analyze digital collections computationally at scale. ARCH combines the Internet Archive’s more than a decade of experience supporting computational research, through large-scale data provision to researchers and dataset-oriented service integrations like ARS (Archive-It Research Services), with a collaboration with the Archives Unleashed project of the University of Waterloo and York University. Development of ARCH was generously supported by the Mellon Foundation.

ARCH Dashboard

What does ARCH do?

ARCH helps users easily conduct and support computational research with digital collections at scale – e.g., text and data mining, data science, digital scholarship, machine learning, and more. Users can build custom research collections relevant to a wide range of subjects, generate and access research-ready datasets from collections, and analyze those datasets. In line with best practices in reproducibility, ARCH supports open publication and preservation of user-generated datasets. ARCH is currently optimized for working with tens of thousands of web archive collections, covering a broad range of subjects, events, and timeframes, and the platform is actively expanding to include digitized text and image collections. ARCH also works with various portions of the overall Wayback Machine global web archive totaling 50+ PB going back to 1996, representing an extensive archive of contemporary history and communication.

ARCH, In-Browser Visualization

Who is ARCH for? 

ARCH is for any user who seeks an accessible approach to working with digital collections computationally at scale. Possible users include but are not limited to researchers exploring disciplinary questions, educators seeking to foster computational methods in the classroom, journalists tracking changes in web-based communication over time, and librarians and archivists seeking to support the development of computational literacies across disciplines. Recent research efforts making use of ARCH include but are not limited to analysis of COVID-19 crisis communications, health misinformation, Latin American women’s rights movements, and post-conflict societies during reconciliation.

ARCH, Generate Datasets

What are core ARCH features?

Build: Leverage ARCH capabilities to build custom research collections that are well scoped for specific research and education purposes.

Access: Generate more than a dozen different research-ready datasets (e.g., full text, images, pdfs, graph data, and more) from digital collections with the click of a button. Download generated datasets directly in-browser or via API. 

Analyze: Easily work with research-ready datasets in interactive computational environments and applications like Jupyter Notebooks, Google CoLab, Gephi, and Voyant and produce in-browser visualizations.

Publish and Preserve: Openly publish datasets in line with best practices in reproducible research. All published datasets will be preserved in perpetuity. 

Support: Make use of synchronous and asynchronous technical support, online trainings, and extensive help center documentation.

How can I learn more about ARCH?

To learn more about ARCH, please reach out via the following form.

TV News Record: Whoops, they said it again (on taxes)

A biweekly roundup on what’s happening at the TV News Archive by Katie Dahl and Nancy Watzman

This week, we demonstrate GOP and Democratic talking points on taxes; display a case of mistaken facial identity; and present fact checks on the GOP tax proposal.

Whoops, they said it again

Was that House Minority Leader Nancy Pelosi, D., Calif., who said “tax cuts for the rich”? Or was that Senate Minority Leader Chuck Schumer, D., N.Y.? Wait: they both said it. Often.

Meanwhile, House Speaker Paul Ryan, R., Wis., keeps talking about tax reform being a “once in a generation opportunity,” and, coincidence!, so does Senate Majority Leader Mitch McConnell, R., Ky. It’s a recurring theme.

These types of repeated phrases, often vetted via communication staff, are known as “talking points,” and it’s the way politicians, lobbyists, and other denizens of the nation’s capital sell policy. The TV News Archive is working toward the goal of applying artificial intelligence (AI) to our free, online library of TV news to help ferret out talking points so we can better understand how political messages are crafted and disseminated.

For now, we don’t have an automated way to identify such repeated phrases from the thousands of hours of television news coverage. However, searching within our curated archives of top political leaders can provide a quick way to check for a phrase you think you’re hearing often. Visit archive.org/tv to find our Trump archive, executive branch archive, and congressional archives, click into an archive, then search for the phrase within that archive.
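As a rough illustration of what an automated approach might look like, the sketch below counts word sequences (n-grams) that show up in statements from more than one speaker. It is only a minimal sketch and not a description of any TV News Archive tool: the function names, the choice of five-word phrases, and the sample sentences are all invented for illustration, built around the two phrases quoted above.

```python
import re
from collections import defaultdict


def ngrams(text, n):
    """Return the set of n-word sequences in a piece of text (lowercased)."""
    words = re.findall(r"[a-z']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_phrases(statements, n=5, min_speakers=2):
    """Map each n-word phrase to the speakers who used it.

    `statements` is an iterable of (speaker, text) pairs. Only phrases
    used by at least `min_speakers` different speakers are returned.
    """
    speakers_by_phrase = defaultdict(set)
    for speaker, text in statements:
        for phrase in ngrams(text, n):
            speakers_by_phrase[phrase].add(speaker)
    return {
        phrase: sorted(speakers)
        for phrase, speakers in speakers_by_phrase.items()
        if len(speakers) >= min_speakers
    }


# Invented caption snippets, for illustration only (not real transcripts).
SAMPLE_STATEMENTS = [
    ("Pelosi", "This plan is tax cuts for the rich, plain and simple."),
    ("Schumer", "They are pushing tax cuts for the rich again."),
    ("Ryan", "This is a once in a generation opportunity to fix the code."),
    ("McConnell", "We have a once in a generation opportunity here."),
]

repeated = shared_phrases(SAMPLE_STATEMENTS)
for phrase, speakers in sorted(repeated.items()):
    print(f"{phrase!r}: {', '.join(speakers)}")
```

A real system would need more than exact matching (paraphrases, caption errors, and common filler phrases all get in the way), which is part of why this remains a goal rather than a finished tool.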

Sample search results in the congressional archive

Funny, you look familiar

Wait, is this former President George W. Bush trying out a new look?

No, it’s not. This is Bob Massi, a legal analyst for Fox Business News and host of “Bob Massi is the Property Man.”  In a test run of new faces for our Face-o-Matic facial detection tool, Massi’s uncanny resemblance (minus the hair) to the former president earned him a “false positive” – the algorithm identified this appearance as Bush incorrectly.

This doesn’t get us too worried, as we still include human testers and editors in our secret sauce: we’ll retrain our algorithm to disregard photos of Massi in the TV news stream. It does point toward why we want to be very careful, particularly with facial recognition, where a private individual may be tracked inadvertently or a public official misrepresented. Our concern about developing ethical practices with facial recognition is why, for the present, we are restricting our face-finding to elected officials. We invite discussion with the greater community about ethical practices in applying AI to the TV News Archive at tvnews@archive.org.

In our current Face-o-Matic set we track the faces of President Donald Trump and the four congressional leaders in their TV news appearances. After receiving feedback from journalists and researchers, our next set will include living ex-presidents and recent major presidential party nominees: Jimmy Carter, Bill Clinton, George H.W. Bush, George W. Bush, Barack Obama, Hillary Clinton, John McCain, and Mitt Romney. Stay tuned while we fine-tune our model.

Fact-check: everyone will get a tax cut (false)

In an interview on November 7, on Fox News’s new “The Ingraham Angle,” House Speaker Paul Ryan, R., Wis., says: “Everyone enjoys a tax cut all across the board.”

Pulling in information from the Tax Policy Center and a tax model created by the American Enterprise Institute, The Washington Post’s Fact Checker Glenn Kessler counters Ryan’s claim: “In the case of married families with children — whom Republicans are assiduously wooing as beneficiaries of their plan — about 40 percent are estimated to receive tax hikes by 2027, even if the provisions are retained.”

Ryan changed his language, according to Kessler, following an inquiry on November 8 from the Fact Checker. Now he is saying, “the average taxpayer in all income levels gets a tax cut.”

Fact-check: tax bill not being scored by CBO as is tradition (false)

In an interview on November 12 on CNN’s “State of the Union,” Senate Minority Whip Dick Durbin, D., Ill., claimed that the GOP tax plan is “not being scored by the Congressional Budget Office, as it is traditionally. It’s because it doesn’t add up.”

“Under the most obvious interpretation of that statement, Durbin is incorrect. The nonpartisan analysis for tax bills is actually a task handled by the Joint Committee on Taxation, and the committee has been actively analyzing the Republican tax bills,” reported Louis Jacobson of PolitiFact.

Follow us @tvnewsarchive, and subscribe to our biweekly newsletter here.

Face-o-Matic data show Trump dominates – Fox focuses on Pelosi; MSNBC features McConnell

For every ten minutes that TV cable news shows featured President Donald Trump’s face on the screen this past summer, the four congressional leaders’ visages were presented for one minute, according to an analysis of Face-o-Matic downloadable, free data fueled by the Internet Archive’s TV News Archive and made available to the public today.

Face-o-Matic is an experimental service, developed in collaboration with the start-up Matroid, that tracks the faces of selected high-level elected officials on major TV cable news channels: CNN, Fox News, MSNBC, and the BBC. Face-o-Matic first launched as a Slack app in July; after receiving feedback from journalists, the TV News Archive is now making the underlying data available to the media, researchers, and the public. It will be updated daily here.

Unlike caption-based searches, Face-o-Matic uses facial recognition algorithms to recognize individuals on TV news screens. Face-o-Matic finds images of people when TV news shows use clips of the lawmakers speaking; frequently, however, the lawmakers’ faces also register if their photos or clips are being used to illustrate a story, or they appear as part of a montage as the news anchor talks.  Alongside closed caption research, these data provide an additional metric to analyze how TV news cable networks present public officials to their millions of viewers.

Our concentration on public officials and our bipartisan tracking is purposeful; in experimenting with this technology, we strive to respect individual privacy and extract only information for which there is a compelling public interest, such as the role the public sees our elected officials playing through the filter of TV news. The TV News Archive is committed to doing this right by adhering to these Artificial Intelligence principles for ethical research developed by leading artificial intelligence researchers, ethicists, and others at a January 2017 conference organized by the Future of Life Institute. As we go forward with our experiments, we will continue to explore these questions in conversations with experts and the public.

Download Face-o-Matic data here.
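For readers who want to tally face-time themselves, here is a minimal sketch of how detection records could be summed per person and network. The column names (`network`, `person`, `duration_seconds`) and the sample rows are hypothetical, chosen only for illustration; check the actual layout of the downloaded file before adapting this.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows with a hypothetical column layout (not real data).
SAMPLE_CSV = """network,person,duration_seconds
CNN,Donald Trump,120
MSNBC,Donald Trump,300
Fox News,Nancy Pelosi,45
MSNBC,Mitch McConnell,60
CNN,Donald Trump,30
"""


def face_time_minutes(csv_file):
    """Sum on-screen minutes for each (person, network) pair."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        key = (row["person"], row["network"])
        totals[key] += float(row["duration_seconds"]) / 60
    return dict(totals)


totals = face_time_minutes(io.StringIO(SAMPLE_CSV))
for (person, network), minutes in sorted(totals.items()):
    print(f"{person} on {network}: {minutes:.2f} minutes")
```

To run this on a downloaded file, pass an opened file object (for example `open(path, newline="")`) in place of the `io.StringIO` sample.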

We want to hear from you:

What other faces would you like us to track? For example, should we start by adding the faces of foreign leaders, such as Russia’s Vladimir Putin and North Korea’s Kim Jong-un? Should we add former President Barack Obama and contender Hillary Clinton? Members of the White House staff? Other members of Congress?

Do you have any technical feedback? If so, please let us know by contacting tvnews@archive.org or participating in the GitHub Face-o-Matic page.

Trump dominates, Pelosi gets little face-time

Overall, from July 13 through September 5, analysis of Face-o-Matic data shows:

  • Altogether, we found 7,930 minutes, or some 132 hours, of face-time for President Donald Trump and the four congressional leaders. Of that amount, Trump dominated with 90 percent of the face-time. Collectively, the four congressional leaders garnered 15 hours of face-time.
  • House Minority Leader Nancy Pelosi, D., Calif., got the least amount of time on the screen: just 1.4 hours over the whole period.
  • Of the congressional leaders, Senate Majority Leader Mitch McConnell’s face was found most often: 7.6 hours, compared to 3.8 hours for House Speaker Paul Ryan, R., Wis.; 1.7 hours for Senate Minority Leader Chuck Schumer, D., N.Y., and 1.4 hours for Pelosi.
  • The congressional leaders got bumps in coverage when they were at the center of legislative fights, such as in this clip of McConnell aired by CNN, in which the senator is shown speaking on July 25 about the upcoming health care reform vote. Schumer got coverage on the same date from the network in this clip of him talking about the Russia investigation. Ryan got a huge boost on CNN when the cable network aired his town hall on August 21.
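The figures above can be cross-checked with a few lines of arithmetic. This is only a sketch using the rounded numbers reported in the list; Trump's hours are not stated directly, so they are derived here as the total minus the four leaders' hours, which is an assumption. Because the inputs are rounded, the results land close to, but not exactly on, the reported 15 hours and 90 percent.

```python
# Figures as reported in the list above (already rounded in the source).
TOTAL_MINUTES = 7930
LEADER_HOURS = {
    "McConnell": 7.6,
    "Ryan": 3.8,
    "Schumer": 1.7,
    "Pelosi": 1.4,
}

total_hours = TOTAL_MINUTES / 60
leaders_total = sum(LEADER_HOURS.values())

# Assumption: whatever is not attributed to the four leaders is Trump's.
trump_hours = total_hours - leaders_total
trump_share = trump_hours / total_hours
trump_to_leaders = trump_hours / leaders_total

print(f"Total face-time: {total_hours:.1f} hours")
print(f"Four leaders combined: {leaders_total:.1f} hours")
print(f"Trump (derived): {trump_hours:.1f} hours, {trump_share:.0%} of total")
print(f"Trump minutes per leader minute: {trump_to_leaders:.1f}")
```

The derived ratio of roughly eight Trump minutes per leader minute is in the same ballpark as the "ten minutes to one" summary; the gap is about what one would expect from working with rounded inputs.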

Fox shows most face-time for Pelosi; MSNBC, most Trump and McConnell

The liberal cable network MSNBC gave Trump more face-time than any other network. Ditto for McConnell. A number of these stories highlight tensions between the senate majority leader and the president. For example, here, on August 25, the network uses a photo of McConnell, and then a clip of both McConnell and Ryan, to illustrate a report on Trump “trying to distance himself” from GOP leaders. In this excerpt, from an August 21 broadcast, a clip of McConnell speaking is shown in the background to illustrate his comments that “most news is not fake,” which is interpreted as “seem[ing] to take a shot at the president.”

MSNBC uses photos of both Trump and McConnell in an August 12 story on “feud” between the two.

While Pelosi does not get much face-time on any of the cable news networks examined, Fox News shows her face more than any other. In this commentary report on August 20, Jesse Watters criticizes Pelosi for favoring the removal of Confederate statues placed in the Capitol building. “Miss Pelosi has been in Congress for 30 years. Now she speaks up?” On August 8, “Special Report With Bret Baier” uses a clip of Pelosi talking in favor of a woman having the right to choose the size and timing of her family as an “acid test for party base.”

Example of Fox News using a photo of House Minority Leader Nancy Pelosi to illustrate a story, in this case about a canceled San Francisco rally.

While the BBC gives some Trump face-time, it gives scant attention to the congressional leaders. Proportionately, however, the BBC gives Trump less face-time than any of the U.S. networks.

On July 13 the BBC’s “Outside Source” ran a clip of Trump talking about his son, Donald Trump, Jr.’s, meeting with a Russian lobbyist.

For details about the data available, please visit the Face-o-Matic page. The TV News Archive is an online, searchable, public archive of 1.4 million TV news programs aired from 2009 to the present. This service allows researchers and the public to use television as a citable and sharable reference. Face-o-Matic is part of ongoing experiments in generating metadata for reporters and researchers, enabling analysis of the messages that bombard us daily in public discourse.

 

McConnell, Schumer, Ryan, Pelosi fact-checked clips featured in new TV News Archive collections

Today the Internet Archive’s TV News Archive unveils growing TV news collections focused on congressional leadership and top Trump administration officials, expanding our experimental Trump Archive to other newsworthy government officials. Together, all of the collections include links to more than 1,200 fact-checked clips–and counting–by our national fact-checking partners, FactCheck.org, PolitiFact, and The Washington Post’s Fact Checker.

These experimental video clip collections, which contain more than 3,500 hours of video, include archives focused on Senate Majority Leader Mitch McConnell, R., Ky.; Senate Minority Leader Charles (“Chuck”) Schumer, D., N.Y.; House Speaker Paul Ryan, R., Wis.; and House Minority Leader Nancy Pelosi, D., Calif., as well as top Trump officials past and present such as Secretary of State Rex Tillerson and former White House Press Secretary Sean Spicer.

Download a csv of fact-checked video statements or see all the fact-checked clips.

Visit the U.S. Congress archive.

Visit the Executive Branch archive.

Visit the Trump Archive.

We created these largely hand-curated collections as part of our experimentation in demonstrating how Artificial Intelligence (AI) algorithms could be harnessed to create useful, ethical, public resources for journalists and researchers in the months and years ahead. Other experiments include:

  • the Political TV Ad Archive, which tracked airings of political ads in the 2016 elections by using the Duplitron, an open source audio fingerprinting tool;
  • the Trump Archive, launched in January;
  • Face-O-Matic, an experimental Slack app created in partnership with Matroid that uses facial detection to find congressional leaders’ faces on TV news. Face-O-Matic has quickly proved its mettle by helping our researchers find clips suitable for inclusion in the U.S. Congress Archive; future plans include making data available in CSV and JSON formats.
  • in the works: TV Architect Tracey Jaquith is experimenting with detection of text in the chyrons that run on the bottom third of cable TV news channels. Stay tuned.

Red check mark shows there’s a fact-check in this footage featuring House Minority Leader Nancy Pelosi, D., Calif. Follow the link below the clip to see the fact-check, in this case by The Washington Post’s Fact Checker.

At present, our vast collection of TV news (1.4 million shows collected since 2009) is searchable via closed-captioning. But closed captions, while helpful, can’t help a user find clips of a particular person speaking; instead, a search for a name such as “Charles Schumer” returns a mix of news stories about the senator, as well as clips where he speaks at news conferences, on the Senate floor, or in other venues.

We are working towards a future in which AI enrichment of video metadata will more precisely identify for fact-checkers and researchers when a public official is actually speaking, or when there is some other televised record of that official making an assertion of fact. This could include, for example, camera footage of tweets.

Such clips become a part of the historical record, with online links that don’t rot, a central part of the Internet Archive’s mission to preserve knowledge. And they can help fact-checkers decide where to concentrate their efforts, by finding on-the-record assertions of fact by public officials. Finally, these collections could prove useful for teachers, documentary makers, or anybody interested in exploring on-the-record statements by public officials.

For example, here are two dueling views of the minimum wage, brought to the public by McConnell and Schumer.

In this interview on Fox News in January 2014, McConnell says, “The minimum wage is mostly an entry-level wage for young people.” PolitiFact’s Steve Contorno rated this claim as “mostly true.” While government statistics do show that half of the people making the minimum wage are young, 20 percent are in their late 20s or early 30s and another 30 percent are 35 or older. Contorno also points out that it’s a stretch to call these jobs “entry-level”; rather, they are “in the food or retail businesses or similar industries with little hope for career advancement.”

Schumer presents a different assertion on the minimum wage, saying on “Morning Joe” in May 2014 that with a rate of $10.10/hour “you get out of poverty.” PolitiFact’s Louis Jacobson rated this claim as “half true”: “Since the households helped by the $10.10 wage account for 46 percent of all impoverished households, Schumer is right slightly less than half the time.”

These new collections reflect the hard work of many at the Internet Archive, including Robin Chin, Katie Dahl, Tracey Jaquith, Roger MacDonald, Dan Schultz, and Nancy Watzman.

As we move forward, we would love to hear from you. Contact us with questions, ideas, and concerns at tvnews@archive.org. And to keep up-to-date with our experiments, sign up for our weekly TV News Archive newsletter.