Building a Better Internet: Internet Archive Convenes DC Workshop

Photo of workshop participants, by Caralee Adams

Thought leaders from libraries, academia, and civil society gathered at Georgetown Law Center in Washington, D.C., on June 23 to discuss how best to advance policies that improve the ease, affordability, and equity with which people access knowledge in the digital age.

Convened by the Internet Archive, this workshop was designed as a continuation of a conversation that the public interest community, including Internet Archive, Creative Commons, Public Knowledge, Library Futures, and the Wikimedia Foundation, started last summer around building a better internet centered on public interest values.

While U.S. lawmakers’ focus on internet policy has largely been directed at reining in the “Big Tech” commercial platforms, this workshop took a different approach. Rather than centering the challenges with today’s for-profit, commercial platforms, the workshop centered the barriers libraries face and potential opportunities for them to help solve challenges with our digital information ecosystem.

Our hope as organizers was that we could map the terrain, find common ground, and identify areas for further discussion. And even in a short amount of time, we were able to do that in spades. Here are a few of our key learnings, and what’s next.

Key Opportunities

Participants recognized that libraries could fill an important gap in the current online environment, as they have done for hundreds of years offline. Indeed, providing free access to high-quality, trusted information is libraries’ primary mission. As our information ecosystem becomes increasingly digital, the world often looks to libraries to do even more. For instance, scholar Joan Donovan has suggested that platform companies hire 10,000 librarians to help curate their services and support access to quality information. Others have suggested libraries could be doing fact checking, building and hosting social media networks, and more. One important way to combat misinformation is with better information provided by libraries; however, this is not without its challenges.

Key Barriers

Participants identified two significant barriers to full library participation in the digital information ecosystem: media consolidation and copyright overreach by powerful publishers. The group discussed a wide range of possible solutions to these challenges, including antitrust scrutiny, contract preemption, supporting a robust public domain, controlled digital lending, and digital ownership.


The group was motivated by a desire to serve the public over commercial interests, and expressed their commitment to making sure equity was woven through all proposals in a thoughtful and authentic manner. Libraries support access to information and creative empowerment for all. We understand that a better internet must work for everyone, including underserved and vulnerable populations.

As the organizers of this event, we are very grateful to all the participants for contributing their time and expertise to this effort. Up next, we will hold virtual workshops to include additional members of the community in these discussions followed by the publication of a longer report with our findings and policy recommendations. Stay tuned for future updates as this effort moves forward.


July Book Talk: The Library: A Fragile History

“A comprehensive and fascinating deep dive into the evolution of libraries… Bibliophiles should consider this a must-read.”—Publishers Weekly

Perfect for book lovers, this is a fascinating exploration of the history of libraries and the people who built them, from the ancient world to the digital age.

Join historian Abby Smith Rumsey for a book talk & conversation with Andrew Pettegree and Arthur der Weduwen, authors of The Library: A Fragile History.

REGISTER NOW

Many have decried the perilous state of the library in the 21st century, a situation that was made only worse when public libraries across the world were forced to shut their doors in the face of a global pandemic. But across centuries of existence, libraries have faced ruin from war, fire, neglect, and dispersal—only to be reborn again.

In The Library, historians Andrew Pettegree and Arthur der Weduwen trace the extraordinary history of the institution, from the famed collections of the ancient world to the modern public resource of today. Along the way, they encounter the librarians, historians, readers, supporters and antagonists that have shaped the library and its offerings over centuries. Do libraries last? Register for our book talk to find out from the authors.

Purchase a copy from our local bookstore, The Booksmith.

July Book Talk: The Library: A Fragile History
Historian Abby Smith Rumsey in conversation with authors Andrew Pettegree and Arthur der Weduwen.
July 20 @ 9am PT
Register now for this virtual event

ABOUT THE SPEAKERS:

Abby Smith Rumsey is a writer and historian focusing on the creation, preservation, and use of the cultural record in all media. She writes and lectures widely on analog and digital preservation, online scholarship, the nature of evidence, the changing roles of libraries and archives, and the impact of new information technologies on perceptions of history, time, and identity. She is the author of When We Are No More: How Digital Memory is Shaping our Future (2016).

Andrew Pettegree is Professor of Modern History at St Andrews University, where he directs the Universal Short Title Catalogue, a database of information about all books published before 1650. A leading expert on the history of book and media transformations, Pettegree is the award-winning author of several books on the subject. He lives in Scotland. 

Arthur der Weduwen is a historian and postdoctoral fellow at St Andrews, where he serves as an associate editor of the Universal Short Title Catalogue. This is his fifth book. He lives in Scotland.

Save our Safe Harbor, continued: Internet Archive Supports Libraries and Nonprofits in Submission to the Copyright Office

As many of our readers will know, Section 512 of the Digital Millennium Copyright Act is the 1998 law that established the notice-and-takedown system that protects online platforms of all kinds, including libraries, archives, and other nonprofits, from liability for the copyright infringement of others. While the law is not perfect, the safe harbor provided by the DMCA has been important in allowing libraries, nonprofits, and other smaller participants to harness the power of the internet and play a meaningful role in the online information ecosystem. More broadly, as our friends at the Wikimedia Foundation have noted, “Section 512 is crucial to the functioning of many of the most popular and important segments of the Internet, and the creative expression that happens there.”

Unfortunately, Section 512 has been under attack for some time. In addition to various legislative proposals, the United States Copyright Office has repeatedly been asked to conduct work on Section 512 that could threaten the safe harbor status of libraries and nonprofits and the communities of their patrons and users. In 2016, for instance, Internet Archive submitted comments to the Copyright Office’s first large Section 512 study, as outlined in a blog post entitled “Save our Safe Harbor.” There, we noted the special importance of the DMCA to “libraries and other nonprofit organizations” which rely in substantial part on volunteer communities and which “are unlikely to be able to bring to bear the sorts of resources [available to] larger commercial entities.” Then again in 2020, as the Copyright Office kept working towards Section 512 reform, the Internet Archive (in collaboration with the New York University Technology Law & Policy Clinic) urged the Copyright Office to consider how changes to the DMCA could have “disproportionately negative impacts on public service non-profits such as the Internet Archive and our patrons.”

This year, the Copyright Office is continuing with ever more work streams on DMCA reform. And while the conversation remains dominated by the commercial interests of some of the world’s largest corporations, Internet Archive has again submitted comments seeking to correct this imbalance. Most recently, in a May 27, 2022 comment on the Copyright Office’s study of Section 512(i) Standard Technical Measures, we emphasized that—notwithstanding industry attempts to use Section 512(i) to impose burdensome technical mandates which could threaten all but the largest commercial intermediaries—nothing in the law “admits of a standard technical measure which would impose substantial burdens and costs on libraries [and] non-profits.”

The DMCA Safe Harbors, while imperfect, have been essential to the ability of libraries, nonprofits, and others to develop public-interest-minded spaces online. And while much has changed since the DMCA’s enactment, it is as important as ever that our legal and regulatory systems allow library and other public interest spaces to flourish online.

June Book Talk: The Catalogue of Shipwrecked Books

“Wilson-Lee’s pioneering study makes Hernando’s life every bit as compelling as his father’s. But that is not all: as we accompany Hernando on his various European journeys of compulsive acquisition, we are not only led through a richly evoked early modern world, but also prompted to reflect on our own data-saturated age.” —The Times Literary Supplement

The Internet Archive invites you to watch a book talk with Edward Wilson-Lee, author of The Catalogue of Shipwrecked Books: Christopher Columbus, His Son, and the Quest to Build the World’s Greatest Library, followed by a conversation with Brewster Kahle, founder of the Internet Archive.

Purchase your copy from The Booksmith, our local bookstore.

In The Catalogue of Shipwrecked Books, Edward Wilson-Lee tells the compelling story of Hernando Colón, who sailed with his father Christopher Columbus on his final voyage to the New World, a journey that ended in disaster, bloody mutiny, and shipwreck. After Columbus’s death in 1506, eighteen-year-old Hernando sought to continue—and surpass—his father’s campaign to explore the boundaries of the known world by building a library that would collect everything ever printed: a vast holding organized by summaries and catalogues, the first database for the exploding diversity of written matter as the printing press proliferated across Europe.

Hernando held the groundbreaking conviction that a library of universal knowledge should include “all books, in all languages and on all subjects,” even material often dismissed: ballads, erotica, news pamphlets, almanacs, popular images, romances, fables. The loss of part of his collection to another maritime disaster in 1522 set off the final scramble to complete this sublime project, a race against time to realize a vision of near-impossible perfection.

Book Talk: The Catalogue of Shipwrecked Books
Author Edward Wilson-Lee in conversation with Internet Archive’s Brewster Kahle.
June 28 @ 10am PT
Watch the recording from the virtual event

Edward Wilson-Lee is a Fellow in English at Sidney Sussex College, University of Cambridge, and a specialist in the literature and the history of the book in the early modern period. He is the author of The Catalogue of Shipwrecked Books, Shakespeare in Swahililand and Translation and the Book Trade in Early Modern Europe.

GITCOIN Grants: Donate a Few Tokens, Defend a Public Treasure

CALLING ALL COMMUNITY MEMBERS:

In just a few months, the lawsuit Hachette v. Internet Archive will be heard in court. In 2020, four of the world’s largest publishers sued our non-profit library to stop us from digitizing books and lending them for free to the public. The publishers and the corporations who own them, including News Corp and Bertelsmann, are demanding $20 million in damages and that we destroy 1.4 million digitized books. What’s really at stake? The right of all libraries to own, digitize and lend books of any kind. (Here’s what Harvard’s copyright advisor has to say about the consequences of our case.) Starting today, make a small donation through Gitcoin and have an enormous impact for the defense of Internet Archive, through Gitcoin’s quadratic funding.

Today, Gitcoin Grant Round 14 opens, supporting advocacy groups around the world. When you donate even $1 worth of crypto to the Internet Archive, it can result in $3-400+ from the matching pool. Quadratic funding rewards the number of community members who give, along with the amount. So many small donations can really have an enormous impact.

This is an example of the matching funds allotted in a previous Gitcoin Grant round.
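
To see why many small donations matter under quadratic funding, here is a minimal sketch of the textbook formula, in which a project's share of the matching pool grows with the square of the sum of the square roots of its individual contributions. This is an illustration only: the project names, donation amounts, and pool size are invented, and Gitcoin's actual matching calculation may apply additional rules (such as caps or identity weighting) that are not modeled here.

```python
from math import sqrt


def quadratic_match_weight(contributions):
    """Textbook quadratic funding weight for one project.

    (sum of square roots of each contribution) squared, minus the
    amount the donors already gave directly.
    """
    direct_total = sum(contributions)
    return sum(sqrt(c) for c in contributions) ** 2 - direct_total


def split_matching_pool(projects, pool):
    """Divide a matching pool among projects in proportion to their weights."""
    weights = {name: quadratic_match_weight(c) for name, c in projects.items()}
    total_weight = sum(weights.values())
    if total_weight == 0:
        return {name: 0.0 for name in projects}
    return {name: pool * w / total_weight for name, w in weights.items()}


# Hypothetical example: every project receives $100 in direct donations,
# but from different numbers of donors.
projects = {
    "many_small_donors": [1.0] * 100,  # 100 people give $1 each
    "a_few_donors": [25.0] * 4,        # 4 people give $25 each
    "one_large_donor": [100.0],        # 1 person gives $100
}

matches = split_matching_pool(projects, pool=1000.0)
for name, amount in matches.items():
    print(f"{name}: ${amount:.2f} from the matching pool")
```

In this made-up round, the project backed by 100 one-dollar donors receives almost all of the pool, while the single $100 donation attracts no match at all, even though each project raised the same amount directly.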

HOW TO DONATE:

  1. First you’ll need to create or log in to your GitHub account.
  2. Use that account to log in to Gitcoin.
  3. Choose one or both of our Gitcoin grants.
  4. You’ll need a crypto wallet like MetaMask or Rainbow Wallet with some Ethereum or other tokens.
  5. Select how much you want to donate. (For example: 0.003 ETH = about $5.00 US)
  6. Do you want to also add some money to the matching pool? Be sure to set an amount in that field as well.
  7. Hit the “I’m Ready to Checkout” button.
  8. In the drop-down menu, pick Standard Checkout, Polygon, or zkSync.
  9. Connect and log in to your crypto wallet to pay.
  10. BONUS: You can verify your identity by creating a Gitcoin Passport via Ceramic to maximize the matching funds (up to 150%).
  11. The more people who give, the greater the percentage of the matching pool we receive.
Checkout module for the Gitcoin Grant 14 Advocacy Round.

Thank you for taking these steps to unleash huge support for the Internet Archive, helping us pay the millions of dollars in legal fees we have already incurred. Your support helps ensure the Wayback Machine, Open Library, and all our games, concerts, books and films will be available to you for free for a very long time.

We Can Rebuild It: Using the Internet Archive to Discover Original Order

Guest post by: Amanda Hill, Archivist of the Community Archives of Belleville and Hastings County, a member of the Community Webs program and a contributor to the Internet Archive.

One of the things archivists get excited about is the importance of ‘original order’. This is the idea that the arrangement of records by their creator has significance to our understanding of the records themselves. Wherever possible, archivists will try to determine the original order of materials in their care.

An item received at the Community Archives of Belleville and Hastings County in 2015 presented something of a puzzle in this respect. It was a scrapbook from the First World War, of newspaper clippings and other memorabilia which had been pasted into a printed book. The binding of the book had partially come apart and the early pages of the scrapbook had been jumbled into no particular order, with clippings dated 1917 mixed in with those from 1916.

Examination of the scrapbook revealed that its owner was Alice Deacon, born in Belleville, Ontario, on September 27th, 1899. She was the child of Daniel Deacon and his wife, Catherine Dugan. During the First World War, the Deacons were living at 107 Station Street, Belleville. They were Roman Catholics and Alice was probably a student at St. Michael’s Academy on Church Street.

Alice had three older brothers: James, Frederick and Francis (Frank). Frank joined the Canadian Expeditionary Force on March 23rd, 1916 in Belleville and it may have been this event which triggered Alice’s interest in the war. Frank’s service record is available from Library and Archives Canada.

The scrapbook mainly comprises wartime cuttings from The Daily Intelligencer newspaper, through which Alice carefully recorded references to Belleville boys overseas, sometimes annotating the clippings with her own observations about whether a man had returned from the front, or which school he had attended.

Alongside the newspaper extracts are other more personal items, such as postcards, theatre programs, calling cards, invitations and ticket stubs. This page illustrates some of the variety:

Here we find an invitation, two pressed flowers “from ruins of a French village, May 1917” and a picture “off a box of chocolates Jim gave me for my birthday, 1916.”

Alice did not begin with blank pages: she used a copy of Richardson’s New Method for the Piano-Forte, originally published in 1859 by Nathan Richardson. In between Alice’s pastings, we can see parts of the text of the underlying book. Some of the pages still had visible page numbers, although most did not, but the majority had at least some legible words and phrases. This was the key to re-creating Alice’s original order.

We discovered that the Richardson book had been digitized by the University of North Carolina at Chapel Hill and was available online through the Internet Archive.

This digital copy proved essential in discovering the original order of the scrapbook. Using the Internet Archive’s searching facility, we were able to locate the identifiable words and match them to the page numbers of the original book. Once all the pages were identified, it was a simple matter to put them back into the order they would have been in when the book was intact.
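
The matching step described above was done by hand with the Internet Archive's search box, but the same idea can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: `book_pages` stands in for the digitized book's text keyed by page number, `leaves` holds the legible words noted from each loose scrapbook leaf, and all of the sample text and leaf names are invented placeholders rather than actual content from Richardson's book.

```python
def find_page(phrases, book_pages):
    """Return the page number containing the most legible phrases, or None."""
    best_page, best_hits = None, 0
    for page_number, text in book_pages.items():
        lowered = text.lower()
        hits = sum(1 for phrase in phrases if phrase.lower() in lowered)
        if hits > best_hits:
            best_page, best_hits = page_number, hits
    return best_page


def restore_order(leaves, book_pages):
    """Sort loose leaves by the page of the underlying book they match.

    Returns (ordered_leaf_ids, unplaced_leaf_ids). Leaves with no
    matching phrase are reported separately for manual inspection.
    """
    located, unplaced = [], []
    for leaf_id, phrases in leaves.items():
        page = find_page(phrases, book_pages)
        if page is None:
            unplaced.append(leaf_id)
        else:
            located.append((page, leaf_id))
    located.sort()
    return [leaf_id for _, leaf_id in located], unplaced


# Invented placeholder text standing in for the digitized book.
book_pages = {
    12: "Exercise for the right hand with the thumb passing under",
    13: "Scales in the major keys ascending and descending",
    47: "Amusement: a simple waltz for practice",
}

# Legible words visible between the pasted clippings on each loose leaf.
leaves = {
    "leaf_a": ["simple waltz"],
    "leaf_b": ["thumb passing", "right hand"],
    "leaf_c": ["major keys"],
    "leaf_d": ["illegible fragment"],
}

ordered, unplaced = restore_order(leaves, book_pages)
print("Restored order:", ordered)
print("Needs manual review:", unplaced)
```

A leaf whose visible words match nothing is set aside rather than guessed at, which mirrors the archival preference for leaving uncertain items clearly flagged.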

Alice’s brother Frank came home safely from the war and was demobilized on May 23rd, 1919. Alice worked as a stenographer and bookkeeper in Belleville until 1929, when she married Leo Houlihan in St. Michael’s Church. She then left Belleville to live with Leo in Lindsay, Ontario. She died in 1955 and was buried in the Our Lady of Mercy Roman Catholic cemetery in Sarnia, Ontario.

Her scrapbook arrived back in Belleville by mail, sixty years after Alice’s death. Thanks are due to the anonymous donor for sharing this glimpse into a young woman’s wartime life, and also to our colleagues at the University of North Carolina and the Internet Archive for making it possible to reconstruct the scrapbook as it was when Alice first created it.

We have mainly been sharing our newspaper collection at the Internet Archive, but once Alice’s scrapbook was digitized, we felt compelled to share it there, too!

A New Approach To Understanding War Through Television News: Introducing The TV News Visual Explorer & The Belarusian, Russian & Ukrainian TV News Archive

For more than 20 years, the Internet Archive’s Television News Archive has monitored television news, preserving more than 9.5 million broadcasts totaling more than 6.6 million hours from across the world, with a continuous archive spanning the past decade. Today just a small sliver of that archive is practically usable by journalists and scholars because video at this scale is so hard to navigate: no human can make sense of that much television news by fast forwarding through it. The small fraction of programs that contain closed captioning, speech recognition transcripts or OCR’d onscreen text can be keyword searched through the TV Explorer and TV AI Explorer, but for the majority of this global multi-decade archive, there has until now been no way for researchers to assess and understand the narratives of television news at scale, especially the visual landscape that distinguishes television from other forms of media and which is so central to understanding many of the world’s biggest stories from war to pandemics to the economy.

As the TV News Archive enters its third decade, it is increasingly exploring the ways in which it can preserve the domestic and international response to global events as it did with 9/11 two decades ago. As a first step towards this vision, over the last few months the Archive has preserved more than 46,000 broadcasts from domestic Belarusian, Russian and Ukrainian television news channels, including (in the order they were added to the Archive) Russia Today (part of the Archive since July 2010 but included in this collection starting January 1), Russian channels 1TV, NTV and Russia 1 (from March 26) and Russia 24 (from April 25), Ukrainian channel Espreso (from April 25) and Belarusian channel Belarus 24 (from May 16).

Why preserve television news coverage in a time of war? For journalists today it makes it possible to digest and report on how the war is being framed and narrated, with an eye towards how these narratives influence and shape popular support for the conflict and its potential future trajectory. For future generations of scholars, it makes it possible to look back at the contemporary information environment and prevailing public information, perspectives, and narratives.

While there are myriad options for the general public to watch these channels today in realtime, there is no research-oriented archival interface designed for journalists and scholars to understand their coverage at the scale of days to months, to scan for key visuals and events and to comment, discuss and illustrate how nations are portraying major stories.

To address this critical need, today we are tremendously excited to unveil the Television News Visual Explorer, a collaboration of the GDELT Project, the Internet Archive’s Television News Archive and the Media-Data Research Consortium to explore new approaches to enabling rapid exploration and understanding of the visual landscape of television news.

The Visual Explorer converts each broadcast into a series of thumbnails, one every 4 seconds, displayed in a grid six frames wide that scrolls vertically through the entire program, making it possible to skim an hour-long broadcast in a matter of seconds. Clicking on any thumbnail plays a brief 30-second clip of the broadcast at that point, making it trivial to rapidly triage a broadcast for key moments. The underlying thumbnails can even be downloaded as a ZIP file to enable non-consumptive computational analysis, from OCR to augmented search.
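
As a rough illustration of the arithmetic behind that layout, the sketch below works out how many thumbnails and grid rows a broadcast produces and which stretch of video a given thumbnail points to. The 4-second interval, six-column grid, and 30-second clip length come from the description above; the function names are hypothetical, and the assumption that a clip starts exactly at the thumbnail's timestamp is ours, not a documented detail of the Visual Explorer.

```python
import math

SECONDS_PER_THUMBNAIL = 4  # one thumbnail every 4 seconds
GRID_COLUMNS = 6           # grid is six frames wide
CLIP_SECONDS = 30          # clicking a thumbnail plays a 30-second clip


def thumbnail_count(duration_seconds):
    """Number of thumbnails needed to cover a broadcast."""
    return math.ceil(duration_seconds / SECONDS_PER_THUMBNAIL)


def grid_rows(duration_seconds):
    """Number of six-wide rows needed to show every thumbnail."""
    return math.ceil(thumbnail_count(duration_seconds) / GRID_COLUMNS)


def thumbnail_position(index):
    """Zero-based (row, column) of a thumbnail in the grid."""
    return divmod(index, GRID_COLUMNS)


def clip_window(index, duration_seconds):
    """Assumed (start, end) in seconds of the clip behind a thumbnail."""
    start = index * SECONDS_PER_THUMBNAIL
    end = min(start + CLIP_SECONDS, duration_seconds)
    return start, end


one_hour = 60 * 60
print("Thumbnails:", thumbnail_count(one_hour))
print("Grid rows:", grid_rows(one_hour))
print("Thumbnail 13 sits at (row, column):", thumbnail_position(13))
print("Thumbnail 13 clip window (s):", clip_window(13, one_hour))
```

Under these assumptions an hour-long program becomes 900 thumbnails in 150 rows, which is why a whole broadcast can be scanned by scrolling rather than by playback.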

Machines today can catalog the basic objects and activities they see in video and generate transcripts of their spoken and written words, but the ability to contextualize and understand the meaning of all that coverage remains a uniquely human capability. No person could watch the entirety of the Archive’s 6.6 million hours of broadcasts, and even the 46,000 broadcasts in this new collection alone would be difficult for a single researcher to watch or even fast forward through in their entirety. Television’s linear format means coverage has historically been consumed a single moment at a time, like a flashlight in a darkened warehouse. In contrast, this new interface makes it possible to see an entire broadcast all at once in a single display, making television news “skimmable” for the first time.

The Visual Explorer and this new research collection of Belarusian, Russian and Ukrainian television news coverage represent early glimpses into a new initiative reimagining how memory institutions like the Archive can make their vast television news archives more accessible to scholars, journalists and informed citizens. Beneath the simple and intuitive interface lies an immensely complex and highly experimental set of workflows prototyping both an entirely new scholarly and journalistic interface to television news and entirely new approaches to rapidly archiving international television coverage of global events.

Over the coming weeks, additional channels from the TV News Archive will become available through the new Visual Explorer, as well as a variety of experiments with the new lenses that tools like automatic transcription and translation can offer in helping journalists and scholars make sense of such vast realtime archives.

Get Started With The Television News Visual Explorer!

About Kalev Leetaru

For more than 25 years, GDELT’s creator, Dr. Kalev H. Leetaru, has been studying the web and building systems to interact with and understand the way it is reshaping our global society. One of Foreign Policy Magazine’s Top 100 Global Thinkers of 2013, his work has been featured in the presses of over 100 nations and fundamentally changed how we think about information at scale and how the “big data” revolution is changing our ability to understand our global collective consciousness.

New additions to the Internet Archive for May 2022

Many items are added to the Internet Archive’s collections every month, by us and by our patrons. Here’s a roundup of some of the new media you might want to check out. Logging in might be required to borrow certain items.

Notable new collections from our patrons: 

Books – 52,300 New Items in May

This month we’ve added books on varied subjects in more than 20 languages. Click through to explore, but here are a few interesting items to start with:

Audio Archive – 89,325 New Items in May

The audio archive contains recordings ranging from alternative news programming, to Grateful Dead concerts, to Old Time Radio shows, to book and poetry readings, to original music uploaded by our users. Explore.

LibriVox Audiobooks – 92 New Items in May

Founded in 2005, LibriVox is a community of volunteers from all over the world who record audiobooks of public domain texts in many different languages. Explore.

78 RPMs and Cylinder Recordings – 112 New Items in May

Listen to this collection of 78rpm records, cylinder recordings, and other recordings from the early 20th century. Explore.

Live Music Archive – 807 New Items in May

The Live Music Archive is a community committed to providing the highest quality live concerts in a lossless, downloadable format, along with the convenience of on-demand streaming (all with artist permission). Explore.

Netlabels – 223 New Items in May

This collection hosts complete, freely downloadable/streamable, often Creative Commons-licensed catalogs of ‘virtual record labels’. These ‘netlabels’ are non-profit, community-built entities dedicated to providing high quality, non-commercial, freely distributable MP3/OGG-format music for online download in a multitude of genres. Explore.

Movies – 110 New Items in May

Watch feature films, classic shorts, documentaries, propaganda, movie trailers, and more! Explore.

Special Event: Universal Access to All Knowledge @ The New York Society Library

Saturday, June 4, 2:00 PM ET
The New York Society Library, 53 East 79th Street, Manhattan
or by livestream


Register now for the in-person session or the livestream.

Join Brewster Kahle of the Internet Archive for a special two-part presentation and discussion on using this massive resource and on the societal and policy issues affecting access to knowledge.

This is a great time to be an archivist and librarian—digital memory is ever more important and more difficult to manage.

Advances in computing and communications mean that we can cost-effectively store every book, sound recording, movie, software package, and public webpage ever created and provide access to these collections via the Internet to students and adults all over the world. By using mostly existing institutions and funding sources, we can build this, as well as compensate authors, within the current worldwide library budget. Technological advances, for the first time since the loss of the Library of Alexandria, may allow us to collect all published knowledge in a similar way. But now we can take the original goal another step further to make all the published works of humankind accessible to everyone, no matter where they are in the world.
 
Will we allow ourselves to re-invent our concept of libraries and archives to expand and to use the new technologies? This is fundamentally a societal and policy issue. These issues are reflected in our governments’ spending priorities, and in law.

This event takes place in two interrelated hours:
2:00-3:00 PM – The Internet Archive: What It Is and How to Use It
3:00-4:00 PM – Universal Access to All Knowledge: Technologies, Societies, Legalities

Register now for the in-person session or the livestream.

Music Library Association Opens Publications at Internet Archive

For librarians who specialize in caring for music collections, it can be challenging to keep up with the latest technology and resources in the profession. The Music Library Association recently helped address this problem by making many of its publications openly available online.

The MLA donated 21 of its monographs to the Internet Archive for digitization and worked with authors to make the material free to the public under Creative Commons licenses. 

The new collection of backlist titles includes information on careers in music librarianship and history of the field. It also covers planning and building music library collections, which can be complicated and involve individual creators and small publishers, said Kathleen DeLaurenti, who helped lead the partnership with the Internet Archive in her role as MLA’s first open access editor. There are also valuable materials on music library approaches to technical services—everything from how to preserve music materials to how to bind and catalog them.

“Increasingly in librarianship, we have people who are being tasked to do this work who don’t have a specialized background, especially in smaller organizations, rural places, and public libraries,” DeLaurenti said. “We’re really excited to be able to make this content available to folks who may not have access to professional development in those spaces, and who may be looking for some materials to bolster their training and their own work.”

The MLA has been publishing new research of interest to music librarians since the 1970s and wanted to find a platform to make the information easier to discover, said DeLaurenti, director of the Arthur Friedheim Library at the Peabody Institute of Johns Hopkins University in Baltimore. The Internet Archive provided the open infrastructure to share and leverage the work of the MLA, which is a small organization with about 1,000 members.

While the MLA began with 21 of the monographs, it is working to obtain rights clearance for an additional 20 titles and DeLaurenti hopes the online collection will grow. So far, authors have been excited that the association is making their work available as it increases access for scholars with the potential for more citations of their research.

The audience for the online collection will likely be “accidental music librarians”—people tasked with music library responsibilities who aren’t musicians but are looking for professional development resources in the area, DeLaurenti said, as well as individuals considering music librarianship as a career.

“As libraries are looking at what kinds of open infrastructure is out there and available, I think the work that the Internet Archive has done through COVID has really changed our perception and how they can work as a potential collaborator in that space,” DeLaurenti said. “We hope to continue different kinds of collaborations with [the Archive] in the future.”

LEARN MORE