Internet Archive Expresses Concerns Over Sweeping Copyright Reform Proposal

You may have heard that, in the waning days of 2020, controversial new copyright provisions were slipped into the end-of-year, must-pass COVID relief bill. Many commenters were troubled by this departure from the ordinary legislative process. Unfortunately, there are more controversial copyright revisions waiting in the wings.

Recently, Senator Thom Tillis released draft legislation which would substantially change the copyright landscape for the worse. It’s called the “Digital Copyright Act,” and our friends at the Electronic Frontier Foundation have described it as disastrous. The proposed Digital Copyright Act would change the rules that govern the Internet in a lot of ways, including requiring automated content filtering that would reduce access to knowledge. While the proposal nods towards making the rules better for Internet users, the draft legislation is still far better for Big Content and Big Tech than it is for libraries, non-profits and regular people.

Even small changes to copyright rules can have substantial consequences for the internet information ecosystem. That is why it is so important that sweeping proposals like this one not be passed in the dead of night, but instead be subject to rigorous study and open comment by everyone. We have drafted a short comment on this proposal which you can review here.

Search Scholarly Materials Preserved in the Internet Archive

Looking for a research paper but can’t find a copy in your library’s catalog or popular search engines? Give Internet Archive Scholar a try! We might have a PDF from a “vanished” Open Access publisher in our web archive, an author’s pre-publication manuscript from their archived faculty webpage, or a digitized microfilm version of an older publication.

We hope Internet Archive Scholar will aid researchers and librarians looking for specific open access papers that may not be otherwise available to them. Judith van Stegeren (@jd7g on Twitter), a PhD candidate in the Netherlands, encountered just such a situation recently when sharing a workshop paper on procedural generation in computer games: “Towards Qualitative Procedural Generation” by Mark R. Johnson, originally presented at the Computational Creativity & Games Workshop in 2016. The papers for this particular year of the workshop are not indexed in the usual bibliographic catalogs, and the original workshop website hosting the Open Access papers is no longer accessible. Fortunately, copies of all the 2016 workshop papers were captured in the Wayback Machine, and can be found today by searching IA Scholar by title or conference name.

As another example, dozens of papers from the Open Journal of Hematology are no longer resolvable via DOI. As mentioned in a previous blog post, the publisher’s website vanished and has been replaced with unrelated advertisements. But before that happened, the papers were captured in the Wayback Machine, indexed in our catalog, and can now be searched in full:

IA Scholar Search Results

IA Scholar is a simple, access-oriented interface to content identified across several Internet Archive collections, including web archives, archive.org files, and digitized print materials. The full text of articles is searchable for users who are hunting for particular phrases or keywords. This complements our existing full-text search index of millions of digitized books and other documents on archive.org.

The service builds on Fatcat, an open catalog we have developed to identify at-risk and web-published open scholarly outputs that can benefit from long-term preservation, additional metadata, and perpetual access. Fatcat includes resources that may be useful to librarians and archivists, such as bulk metadata dumps, a read/write API, command-line tool, and file-level archival metadata. If you are interested in collaborating with us, or are a researcher interested in text analysis applications, we have a public chat channel or can be contacted by email at info@archive.org.

IA Scholar marks a milestone in our work initiated in 2018 to leverage the automation and scale of web and API harvesting in providing open infrastructure for the preservation of and perpetual access to scholarly materials from the public web. We particularly want to thank the Mellon Foundation for their original and ongoing support of this work, our many current partners, and the other collaborators, contributors, and volunteers.

All of this is possible because of the incredible open research ecosystem built and collectively maintained by Open Access advocates. Thank you to the DOAJ and other groups for helping catalog open access journals, which has aided preservation. Thank you to the Biodiversity Heritage Library and its supporters for digitizing print journal literature. And thank you to the many other organizations we have worked with, integrated, or whose services we have utilized, including open web indices (Unpaywall, CORE, CiteseerX, Microsoft Academic, Semantic Scholar), directories of open journals (DOAJ, ROAD, SHERPA/ROMEO, JURN, Wikidata), and open bibliographic catalogs (Crossref, Datacite, J-STAGE, Pubmed, dblp).

IA Scholar is built from open source software components, and is itself released as Free Software. The website has been translated into eight languages (so far!) by generous volunteers.

Leveling the Playing Field for Students with Print Disabilities

The Internet Archive is bringing more periodicals and scholarly resources to students directly and by working with disability offices in the United States, Canada and elsewhere.

As more students with disabilities pursue higher education, demand is growing for books, journal articles and other learning materials to be available in accessible formats. This includes digitizing print materials for people who are blind or have low vision, those with dyslexia or attention deficit/hyperactivity disorder (ADHD), and people with limited mobility who might have difficulty holding print documents.

The Internet Archive is part of an expanding effort to make it easier for people with print disabilities to access information by digitizing books, periodicals, and microfilm needed to succeed in school and beyond. Once print materials are converted to machine-readable formats, users can listen with a screenreader, text-to-speech software or other forms of audio delivery—starting, stopping, and slowing down the information flow, as well as changing the colors of text and background of pages.

With 10 percent or more of students at colleges in the United States requesting accessibility accommodations (Government Accountability Office, 2009, p.37), providing digitized learning materials is critical. Each semester Disability Service Offices (DSOs) on campuses respond to student requests to convert materials into accessible formats—often doing so in silos with limited budgets.

Libraries are being called into action to coordinate the delivery of accessible instructional materials. Doing its part to improve access to knowledge for all, Internet Archive is collaborating with others to share its collection and streamline the search process.

A level playing field

“There is a need for a fast turnaround with materials. Students [with print disabilities] need a level playing field,” said John Unsworth, dean of libraries at the University of Virginia. “The library is not just here for the able-bodied.”

John Unsworth, University of Virginia

UVA is working with the Internet Archive, Bookshare, and the HathiTrust to reduce duplication of efforts across the country to convert text materials to accessible formats. Together, they are participating in the Federating Repositories of Accessible Materials for Higher Education (FRAME) project funded with a $1 million grant from The Andrew W. Mellon Foundation.

Since 2019, the partners have established Educational Materials Made Accessible (EMMA), a hub and repository for digitized materials. The pilot includes six other universities: George Mason University, Texas A&M University, University of Illinois at Urbana-Champaign, Northern Arizona University, University of Wisconsin-Madison and Vanderbilt University.


“When looking at the intersection between copyright and civil rights…civil rights win every time”

John Unsworth, university librarian, University of Virginia

EMMA provides DSO staff (on behalf of students) with a central place to retrieve—and library staff re-deposit—machine-readable texts from the Internet Archive, HathiTrust, and Bookshare. It provides a searchable database to locate materials requested by students more efficiently. Users can filter by repository, format and accessibility features—which will become more valuable as texts are remediated. The project relies on the Internet Archive as a large digital repository to provide a federated network of storage and delivery, as well as technical expertise.

Unsworth said the goal of EMMA is to speed up access to materials and help DSOs avoid duplication. If faculty tinker with a syllabus and add a book at the last minute, students with print disabilities need to be able to have a copy they can use at the same time their peers do. “It’s the nature of education that what you need to read changes during the semester,” Unsworth said. “[Students with print disabilities] can’t get materials at the last minute when everyone else has had it for two weeks.”

Often, libraries are not involved in collecting, cataloguing, or preserving educational materials for people with disabilities on their own campus, or making them discoverable to others. EMMA is designed to connect DSOs and libraries on the same campus — and with other institutions. Once materials are remediated, DSOs put them in a drop box; the library validates them with the new metadata and uploads them. “Libraries shoulder the burden of sharing—and by doing that, they help fulfill their mission,” Unsworth said.

Despite publisher warnings about what DSOs can do with their remediated content, Unsworth said concerns are not supported by law. “When looking at the intersection between copyright and civil rights…civil rights win every time,” Unsworth said. “Libraries are used to pushing back on publisher claims. Libraries bring a willingness to stand up to appropriate use rights.”

A coordinating hub for materials was desperately needed and, Unsworth said, something DSOs have been waiting to have for years.


“Everyone should have the same shot at succeeding”

– Angella Anderson, disability specialist, University of Illinois at Urbana-Champaign

Angella Anderson, UIUC

Based on a student’s syllabus, Angella Anderson, a disability specialist at the University of Illinois at Urbana-Champaign, arranges for needed accessible materials for students at all levels—from undergraduates to law students to doctoral students. “We have several students who—without this service—would have had significant challenges being successful in their programs.”

Now, with EMMA, if a book or journal article a student needs is already shared on the hub, the DSO can download it and save time. Anderson estimates it has cut her time searching for learning materials by half. “The problem we’ve all had over the years is that we are converting the same book at the same time. That’s a huge resource drain,” Anderson said, noting the potential benefit of EMMA. “Everyone should have the same shot at succeeding at whatever it is they want to do, so I feel this will be extremely useful to a lot of schools and a lot of students.”

Canadian efforts advance

In Canada, the Internet Archive supports the work of the Accessible Content E-Portal (ACE), a service of the Ontario Council of University Libraries. At the Internet Archive digitization center at the University of Toronto, staff digitize on demand and prioritize requests received by ACE from students who need materials for accessibility. The turnaround used to take weeks, but Andrea Mills, digitization program manager, said the system has been improved and students with print disabilities can now get materials digitized, often in less than two days.

Andrea Mills, Internet Archive

Mills said requested materials most often include non-fiction research books and novels, often printed between 1990 and 2010—before e-books were widely available. Elsewhere in Canada at the University of Alberta, another Internet Archive scanning center provides the same service, through their Accessibility Resources office, to students who have qualifying perceptual challenges.

“Sometimes people not part of the mainstream are forgotten,” Mills said. “It may only be a handful of users who have this need, and not represent a high number of downloads or uses, but these are people who truly need assistance.”

Learn more

Librarians: Join our free program to qualify your patrons to access the Internet Archive’s resources for users with print disabilities. Individuals can gain access by having a qualifying authority like the Vermont Mutual Aid Society enroll them in its program.

Register Now: A (re)Introduction to Book Scanning at the Internet Archive

The Internet Archive has been partnering with libraries to digitize their collections for more than 15 years. Following a recent viral video featuring our book digitization efforts, and increased demands for e-resources, we’ve had renewed interest in our book scanning partnerships, with libraries wondering how we might be able to help them reach their patrons through digitization. Join scanning center managers Andrea Mills and Elizabeth MacLeod for a virtual event to learn about the ways in which the Internet Archive can help turn your print collections digital, and the impacts that these digital collections are having on remote learners.

Registration for the virtual event is free and open to the public. The live session is being offered twice to accommodate different schedules; if you are interested in joining, you only need to register for one session:
March 24 @ 10am ET / 2pm GMT
March 25 @ 1pm ET / 5pm GMT

Howard University Joins Open Libraries, Embraces Digital Access for Students

Howard University’s Founders Library. Image courtesy Tyrone Turner / WAMU

Like campuses across the country, Howard University in Washington, D.C., shut down last March when COVID-19 hit. Most of its nearly 6,000 undergraduate students have been remote learning ever since.

Without access to the physical library, demand for e-books has increased.  The university recently joined the Open Libraries program to expand the digital materials that students can borrow. Through the program, users can check out a digital version of a book the library owns using controlled digital lending (CDL).

Amy Phillips, head of technical services for Howard University Libraries, learned about the opportunity last fall through the Washington Research Library Consortium. Howard is one of nine D.C.-area libraries in the nonprofit consortium, which recently collaborated with the Internet Archive to do an overlap analysis of its shared collection. When the digital materials became available to use for free through the consortium, Howard decided to join, too.

After Alisha Strother, metadata librarian, ran an analysis of books in the Howard collection by International Standard Book Number (ISBN), it was discovered that more than 14,000 books matched a copy that the Internet Archive had acquired and digitized. Howard decided to join the Open Libraries program in January. This means that students can now check out these Howard books from across the country as they engage in online instruction.
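An overlap analysis of this kind can be illustrated with a short sketch. The post does not describe the tooling Howard or the Internet Archive actually used, so everything below is an assumption for illustration only: the function names are hypothetical, the ISBN lists are made up, and the approach (normalize every ISBN to its 13-digit form, then intersect the two sets) is just one plausible way to match by ISBN.

```python
from typing import Iterable, Optional, Set


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a 10-character ISBN to its 13-digit form (978 prefix)."""
    core = "978" + isbn10[:9]  # drop the old ISBN-10 check character
    # ISBN-13 check digit: weights alternate 1, 3 across the first 12 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def normalize_isbn(raw: str) -> Optional[str]:
    """Return a 13-digit ISBN string, or None if the input is not ISBN-shaped.

    Note: this only checks length and characters; it does not validate
    the check digit of the incoming value.
    """
    cleaned = raw.replace("-", "").replace(" ", "").upper()
    if len(cleaned) == 10 and cleaned[:9].isdigit() and cleaned[9] in "0123456789X":
        return isbn10_to_isbn13(cleaned)
    if len(cleaned) == 13 and cleaned.isdigit():
        return cleaned
    return None


def find_overlap(library_isbns: Iterable[str], archive_isbns: Iterable[str]) -> Set[str]:
    """Return the normalized ISBNs present in both collections."""
    library = {n for n in map(normalize_isbn, library_isbns) if n is not None}
    archive = {n for n in map(normalize_isbn, archive_isbns) if n is not None}
    return library & archive


# Hypothetical sample data, not real holdings.
library = ["0-306-40615-2", "978-1-4028-9462-6", "not an isbn"]
archive = ["9780306406157", "9780000000002"]
print(sorted(find_overlap(library, archive)))  # ['9780306406157']
```

Normalizing first matters because the same edition may be recorded as an ISBN-10 in one catalog and an ISBN-13 in another; comparing raw strings would miss those matches.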

“I see this as being an important resource for students to be able to access materials from anywhere,” Phillips said. “And I think it will have value and be heavily utilized even when we are back on campus.”

Historic view of Howard University’s Founders Library.

Howard is one of nearly 100 historically black colleges and universities (HBCU) in the United States. One of Howard’s most important entities is the Moorland-Spingarn Research Center, which is recognized as one of the world’s largest and most comprehensive repositories for the documentation of the history and culture of people of African descent in Africa, the Americas and other parts of the world. Portions of its materials are also now available for digital borrowing through the Open Libraries program. The collection will now have greater exposure since it had previously only been accessible onsite for researchers who scheduled appointments.

“This opens up a premier collection to public usage. From a scholarly and cultural point of view, this material is very much in demand,” Phillips said. “Looking forward, we think it will get a lot of traffic.”

COVID-19 has disproportionately affected people of color, prompting Howard to be cautious and extend online learning into the spring semester for almost all students, Phillips said. The university is doing all it can to connect students with resources, and its libraries have been investing more in digital items. But budgets are limited and licensing agreements curb the library’s ability to broadly lend e-books.

“The Internet Archive has been an important way to open up more library materials to students,” said Phillips, adding that it’s new and just beginning to be promoted to students and faculty. “We’re excited and we know this will have a positive impact on student success and scholarship.”

Register Now: New Developments in Controlled Digital Lending

Controlled Digital Lending (CDL) is growing in popularity, as is the community of practice around the library lending model. Next week, join Chris Freeland, director of Open Libraries at the Internet Archive, for a one-hour session covering new developments in CDL. Attendees will learn how libraries are using CDL, the emerging community around CDL, and the impacts of the library practice.

Register now
Registration for the virtual event is free and open to the public. The live session is being offered twice for your scheduling flexibility; if you’d like to join, you only need to register for one session:

Watch ahead
If you’re new to Controlled Digital Lending and would like to brush up before the session, check out the short video, Controlled Digital Lending Explained.

Behind the Scenes of the Decentralized Web Principles

Since 2016, a global community of developers, organizers, entrepreneurs, and academics have gathered to share ideas and approaches to building a Decentralized Web. The DWeb they dreamt of would stand in stark contrast to today’s Web, where a handful of powerful, centralized corporations rule over our data, social networks, and network infrastructure. The DWeb would enable people to have control over their own digital lives. In order to “lock the Web open,” DWeb infrastructure would be distributed itself, in ways that could be foolproof against concentrated control. And as Internet Archive founder Brewster Kahle also said, it’s a Web that needs to be more “private, secure, and fun”.

While several thousand people have participated in DWeb-related events and discussions organized by the Internet Archive, we still lacked a general consensus about the principles we collectively stand for. What values do we share beyond giving people more control, and not being “centralized”? What specific features did DWeb projects need to have to be considered, well, DWeb? These were the underlying questions that motivated us to create these DWeb Principles.

Evening at the Wayback Wheel at the Mushroom Farm, DWeb Camp 2019. Up against a backdrop of a foggy skyline, a colorful shade structure stands ahead with a lit path leading up to it.

What are the shared values of the DWeb community?

The Internet Archive has been one of the lead organizers of DWeb events since 2016. As one of the world’s largest repositories of online knowledge and culture, the Internet Archive has a stake in ensuring that the Web remains free and open. It has brought together those who are transparent about their approaches and are interested in engaging across projects to learn and collaborate.

Notes from a brainstorming session at DWeb Camp 2019. Red and yellow post-its are scattered across poster papers that say "As a contributor I want...", "As a visitor I want..." and "As a company I want..."

It would have been impossible to capture what the DWeb means for everyone in these principles. That’s why from the outset, this project was meant to exemplify the values of a specific group of people—those who continue to show up and engage in these conversations about the DWeb. That includes large organizations, community networks, individual developers, policy people, artists, and journalists. Each in their own way, they’re creating building blocks of a Decentralized Web that actively invites participation.

Why create yet another set of principles?

As stewards, we felt that we needed to crystallize the shared vision of this community, to demonstrate how and why we are building a Decentralized Web. Our aim is to identify our guiding principles through discussion and distill them into a living document that we can point to. It is to create a set of practical guiding values as we design and build the Web of the future.

But beyond the document itself, the objective is to help set some ethical norms for the DWeb. If we all see ourselves as contributors to the Web, we hope these values will help people examine what they are building and for what purpose. It is to inspire projects that are driven by these values, and to hold each other accountable to ensure we continue to uphold them.

How We Developed DWeb Principles Version 1.0

John Conor Ryan and I began work on these principles in May 2020. Wendy Hanamura, Director of Partnerships at the Internet Archive, asked us to lead this project jointly. Though we agreed on many things, we also brought starkly different perspectives and experiences around what it meant to build tech for good. Those differences created a healthy environment for open exchange.

So how do you develop something as centralized as a unified set of principles for a diverse, decentralized group? In order to have something to work with, the two of us began by creating a draft ourselves. It was meant to be a starting point, a mound of clay that could be reformed and moulded by active participants in the DWeb space.

Group discussion at the Tree of Life, Mushroom Farm, DWeb Camp 2019. A group sits at the base of a large cypress tree.

Development Timeline

From there, we went through several rounds of reshaping, with the editing process involving over 30 individuals. These were the phases of its development:

Phase 1: Initial Draft — The stewards of the project, Mai and John, drafted a rough document for the DWeb community to discuss and consider. It was commented on and edited by other contributors. (May – Jun 2020)

Phase 2: First Feedback — Introduced the project to individuals in the DWeb community and solicited their comments and ideas. Presented the working draft at the DWeb Meetup on July 29, 2020. (Jun – Sep 2020)

Phase 3: Focus Groups — Held a series of focus group conversations with DWeb community members about the Principles to discuss intent, purpose, and future application. (Sep – Dec 2020)

Phase 4: Revise Principles — Incorporate feedback from focus group discussions into the draft Principles. (Dec 2020)

Phase 5: Second Feedback & Gather Support — Solicit final round of feedback on the Principles. (Jan 2021)

Phase 6: Publish the Principles — Launch the first version of the Principles on the DWeb website. Hold DWeb Meet-up to launch the new website and present Principles. (Feb 2021)

Mesh network wiring configuration at DWeb Camp 2019. A server sits on a wooden foldable table with several ethernet cables plugged in and criss-crossed all over the table.

The Result

Every single word in this document was thoroughly and repeatedly dissected and examined. What were the implications of certain terminologies? For example, what is presumed when we use the word “empower” versus “enable”? Why did we decide not to use the term “user”, and instead opt for “people” or “individual”? We worked with the contributors to be as deliberate as possible with our language. 

It was through this process that we illuminated something crucial about the aims of the Decentralized Web community: it is about more than the technical infrastructure; it is about social and organizational norms and aspirations. Technical specifications can enable or prevent certain outcomes, of course. But what is fundamental and subversive about this Decentralized Web movement is that it is about elevating both individual and collective human agency. It is about creating more just and equitable relations between people, and creating networks that help us address the urgent challenges, not exacerbate them.

A group discussion inside the Dome of Decentralization at DWeb Camp 2019. A group of people sit inside a white geodesic dome.

A large part of this project was reflecting on the inherited dynamics that we take for granted with the internet we have today. By putting into words our shared ideals for a better web of the future, we had to shed certain assumptions about what constitutes success. 

Our contributors continued to point to other sets of principles that articulate values raised in this one, but often with more depth and clarity. We decided that it was important to acknowledge those other principles. The DWeb Principles are not designed to supplant these other frameworks, nor does pointing to them mean that all of those in this DWeb community agree with all that is said in them. It is meant to signify that we are not alone in our pursuit of more fun, equitable, and secure networked systems, and stand alongside these other communities’ efforts.

This process resulted in five overarching principles, with sub points that expand upon them. The principles are ordered from specific to general, beginning with more explicit technical features of a DWeb:

1) Technology for Human Agency
2) Distributed Benefits
3) Mutual Respect
4) Humanity
5) Ecological Awareness

Code of Conduct (in yellow) prominently displayed at the center of the Mushroom Farm, DWeb Camp 2019.

What Comes Next

We hope this is an accurate snapshot of the types of concerns that this DWeb community engages with and upholds as we strive to build better networks. We hope people will read it, share it, and even take what they agree with and remix it if they’d like. If someone were to be inspired by these principles, adapt them for their own needs and put forth their own version, we would see that as a success on its own.

Being explicit about what a project stands for is a big first step in establishing trust, not just among its contributors, but also with the people who use their tools and services. A strong value statement allows others to hold organizations accountable, to ensure that they continue striving for their highest aspirations while doing all they can to avoid making harmful tradeoffs.

Knowing what projects stand for, or at least what they care about, is a big first step in our ability to know which projects are worth investing in with our time, energy and attention. These principles define what values the DWeb community stands for, not just what it stands against. We hope this document will help guide those who are already creating the building blocks of the DWeb, and appeal to those who want to join the movement to build better, more resilient decentralized webs of connection and knowledge.

The Dome of Decentralization at night, DWeb Camp 2019. A colorful geodesic dome lit from the inside, silhouettes of people scattered in groups around it.

Mai Ishikawa Sutton at DWeb Camp 2019

Mai Ishikawa Sutton is a co-founder and editor of COMPOST, an online decentralized magazine about the digital commons, Associate Producer of DWeb Projects and DWeb Camp 2019, and Digital Commons Fellow with the Commons Network. Their previous projects and employers include People’s Open Network, Oakland Public Library, Shareable, and Electronic Frontier Foundation.

John Conor Ryan, center, at DWeb Camp 2019

John Conor Ryan has focused on corporate strategy, while thinking as a mathematician and physicist, looking at ways to succeed where the technology is new and difficult, and the path to success not evident. He previously was part of the People Centered Internet project, with Vint Cerf and MeiLin Fung, and of the One Laptop Per Child project, which his wife co-led. John has more recently cofounded two startups based on decentralized technologies.

Giving “Last Chance Books” New Life Through Digitization

The Dedication of Books by H.B. Wheatley (1887), as presented for scanning. View the digitized book online.

Sometimes they arrive tied up in string because their binding is broken. Others are in envelopes to protect the brittle pages from further damage.

Aging books are sent from libraries to the Internet Archive for preservation. Thanks to the careful work of the nearly 70 people who scan at digitization centers in the United States, United Kingdom and Canada, the books get a second life with a new audience.

Scanners sometimes call these “Last Chance Books” and they take pride in restoring them. As they turn the pages one at a time to be photographed and digitized, they develop a daily cadence—but it must be adjusted with fragile materials.

“We do our best with the flaking or cracking pages,” said Andrea Mills, digitization program manager for the Internet Archive stationed in Toronto, Canada. “You have to be really cautious that the flake doesn’t fall off and cover a word. It’s almost like a puzzle.”

Elizabeth MacLeod, demoing a Scribe in the foyer of the Internet Archive in San Francisco, pre-COVID.

Some books that land at the Internet Archive digitization centers date back to the 1700s. They are fiction and nonfiction, journals and pamphlets covering a range of topics. And, it can be surprising to learn what reviving the material means to patrons.

“We chuckled when we digitized a book on sea captains. We thought – who will care? And then a year later, it had hundreds of views,” said Elizabeth MacLeod, senior manager of satellite digitization services who manages remote operations out of Wilmington, North Carolina.

Digitization helps preserve materials that are no longer in circulation at their holding library because they are falling apart. It also gives new exposure to books that are out of print that may otherwise be forgotten.

Both Mills and MacLeod began working for the Internet Archive more than 10 years ago as book scanners – also known as Scribe operators. Mills has an arts degree in jewelry design and teaching; MacLeod studied biology. They were both drawn to the mission of the Internet Archive and share a passion for connecting people with resources.

A cart of “last chance books” awaiting digitization at the University of Toronto.

Over the years, Mills and MacLeod have worked closely with librarians and archivists around the world to digitize their collections, learning more with each project. They now manage digitization and support sites, many embedded in libraries, with training and best practices across 10 countries and upwards of 30 locations. Digitizing is a somewhat solitary task and some people “get in the zone” while scanning; others are very chatty or listen to music, Mills said.

Andrea Mills, showing off the Scribe to a tour celebrating the 2020 ALCTS Outstanding Collaboration Citation for digitizing a collection of Tamil materials at University of Toronto.

Many employees have worked together for nearly a decade and there is a friendly, collaborative vibe at the centers. “We have all sorts of people—artists, printers and photographers. They are people who are meticulous and love books,” Mills said. A recent viral video shared on the Internet Archive’s Twitter account features Scribe operator Eliza Zhang, who has worked at the Archive for more than ten years. Book conservators from larger institutional partners also offer additional training for Internet Archive operators on best practices for handling their unique collections.

MacLeod says the scanners are all committed to providing a service to readers, and it’s satisfying to help people with disabilities connect with books. “It’s energizing to be part of an organization that is thinking outside the box,” she said. “I want people to be able to have more access to whatever they are trying to find.”

Added Mills: “I’m an information junky. I love the search and the hunt and the finding the answer. The power of the internet and digitization is that you can find that answer faster. It just sort of opens up the possibilities of what you can do.”

Library Holdings from the University of Tokyo Now Available Through the Internet Archive

Written by Professor Tom Gally, University of Tokyo, and Katie Barrett, Internet Archive & JET Program Alum
Translations by Tomoki Sakakibara, University of Tokyo

(日本語はページ下部にあります。 Scroll down for Japanese version.)

As our global society grows ever more connected, it can be easy to assume that all of human history is just one click away. Yet language barriers and physical access still present major obstacles to deeper knowledge and understanding of other cultures, even on the world wide web. That is why the Internet Archive is thrilled to announce a new partnership with the University of Tokyo General Library. Spearheaded by Masaya Nakatake as a member of the UTokyo Academic Archives Project Office, the Internet Archive partnership provides expanded access and a digital backup for some of the library’s most precious artifacts. 

Since June 2020, our Collections team has worked in tandem with library staff to ingest thousands of digital files from the General Library’s servers, mapping the metadata for over 4,000 priceless scrolls, texts, and papers. The collection, representing meticulous digitization efforts by Japanese historians and scholars, showcases hundreds of years of rich Japanese history expressed through prose, poetry, and artwork. 

Among the highlights of the holdings are manuscripts and annotated books from the personal collection of the novelist Mori Ōgai (1862–1922), an early manuscript of the Tale of Genji, and a unique collection of Chinese legal records from the Ming Dynasty.

しんよし原大なまづゆらひ

Most of the works are written in Japanese, but some of them include illustrations that can be appreciated by anyone now. A search through the collection for 地震 (jishin, “earthquake”), for example, yields a fascinating set of depictions of earthquakes and their impact in past centuries.

In one satirical illustration, thought to date from shortly after the 1855 Edo earthquake, courtesans and others from the demimonde, who suffered greatly in the disaster, are shown beating the giant catfish that was believed to cause earthquakes. The men in the upper left-hand corner represent the construction trades; they are trying to stop the attack on the fish, as rebuilding from earthquakes was a profitable business for them.

Oreste ed Elettra

Seismic destruction is expressed more horrifically in ukiyo-e prints of the burning of Edo (Tokyo) after that same 1855 earthquake and of buildings collapsing during the 1891 Mino-Owari earthquake. They are a sobering reminder of the role that natural disasters have played in Japanese life.

Other highlights are high-resolution images from the Kamei Collection of original etchings from Opere di Giovanni Battista Piranesi, Francesco Piranesi e d’altri, originally published by Firmin Didot Freres in Paris between 1835 and 1839. 

We hope this partnership and collection will expand access to history and culture from Japan and spur a new generation of usage and scholarship.

About the University of Tokyo General Library

The University of Tokyo was established in 1877 as the first national university in Japan. As a leading research university, UTokyo offers courses in essentially all academic disciplines at both undergraduate and graduate levels and conducts research across the full spectrum of academic activity. The University of Tokyo Library System is composed of 30 libraries, with the General Library being the largest among them. While providing services to the researchers and students of UTokyo, the General Library also plays a central role in the operation and management of the Library System. The General Library’s history can be traced back nearly 130 years to the university’s founding and it now houses approximately 1.3 million books, including rare collections inherited from academies in the Edo period. 

About the Internet Archive

The Internet Archive is one of the largest libraries in the world and home of the Wayback Machine, a repository of 475 billion webpages. Founded in 1996 by Internet Hall of Fame member Brewster Kahle, the Internet Archive now serves more than 1.5 million patrons each day, providing access to 70 petabytes of data—books, web pages, music, television, and software—and working with more than 800 library and university partners to create a digital library, accessible to all. To make a donation to the Internet Archive, please visit https://archive.org/donate/

東京大学総合図書館の所蔵資料がインターネットアーカイブから利用可能に

世界のネットワーク化がかつてなく進んだ現在、人間の歴史のどんなことでもワンクリックで調べられるようになったと思いがちです。しかし、外国文化についての知識や理解を深めるうえで、たとえインターネット上であっても言語の障壁や物理的制約が大きな妨げとなる現状は変わりません。それだからこそ、インターネットアーカイブでは、東京大学総合図書館との新たな提携を発表できることをたいへん嬉しく思います。インターネットアーカイブと東大総合図書館が共同で進める本事業は、東京大学学術資産アーカイブ化推進室の中竹聖也氏を中心として進められ、東大総合図書館が所蔵する極めて貴重な資料群のデジタル・バックアップを提供し、アクセスを拡大するものです。

インターネットアーカイブのコレクションチームでは2020年6月より、同館スタッフと協力して東大総合図書館のサーバーから数千単位のデジタルファイルを取り込み、4000点以上の極めて貴重な巻物、写本、資料などのメタデータ整備を進めてまいりました。このコレクションは日本の歴史家や研究者による緻密なデジタル化作業から生まれたものであり、何世紀にも及ぶ日本の豊かな歴史が散文、韻文、図像によって表現されています。

特筆すべき資料としては、森鴎外 (1862-1922) の個人文庫に収められていた鴎外自筆の写本や鴎外本人による書き込みがある書物を集めた鴎外文庫、源氏物語の初期の写本、さらに、中国明代中期の条例(皇帝の判断に基づく法令や先例)をまとめた皇明條法事類纂(同館以外での所蔵は確認されていません)などがあります。

日本語で書かれた作品ではありますが、どなたでも理解できる図像入りの作品も少なくありません。たとえば、「地震」を検索語としてこのコレクションを検索すれば、過去数世紀に起きた地震とその影響が描かれた興味深い資料群を閲覧できます。

1855年(安政2年)の安政江戸地震直後の作と思われるある風刺画には、地震で大被害を被った吉原の遊女や町の人々が、地震の元凶と信じられていた大鯰(おおなまず)を懲らしめている様子が描かれています。左上の男たちが止めに入ろうとしていますが、それは震災後の復興で商売が潤った建築職の人たちだからです。

地震による被害がもっと恐ろしく表現されているのが、同じく安政江戸地震後に発生した江戸の火災や、1891年(明治24年)の濃尾地震による家屋倒壊が描かれた浮世絵です。いずれも自然災害が日本人の暮らしにおいて果たしてきた役割を改めて思い起こさせてくれる作品です。

そのほかの特筆すべき資料としては、亀井文庫『ピラネージ版画集 Opere di Giovanni Battista Piranesi, Francesco Piranesi e d’altri』 (1835-1839、パリ、フィルマン・ ディド兄弟出版社刊) の高精細画像が挙げられます。

インターネットアーカイブでは、東大総合図書館との提携と本コレクションの公開によって、日本の歴史と文化が世界からいっそうアクセスしやすいものとなり、新たな世代の資料活用と学術研究が促進されることを願っています。

東京大学総合図書館について

東京大学は1877年(明治10年)に創設された日本最初の国立大学です。世界トップクラスの総合研究大学として、広範な専門分野における学部・大学院レベルの教育活動と、あらゆる学術領域にわたる研究活動をおこなっています。東京大学総合図書館は30の部局図書館・室で構成される東京大学附属図書館の中で最大の図書館であり、東大の研究者・学生向けのサービスをおこなうとともに、東大附属図書館の事務・業務を支える上で中心的な役割を果たしています。130年近くに及ぶその歴史は東大創設当初にまでさかのぼります。所蔵図書数は約130万冊にのぼり、江戸時代の学問所から継承された貴重な資料群も所蔵しています。

インターネットアーカイブについて

インターネットアーカイブ (Internet Archive) は世界有数のライブラリ(図書館)であり、4750億ページもの保存済みウェブサイトをアーカイブ検索できる「ウェイバックマシン (Wayback Machine) 」を運営する非営利法人です。インターネットの発展への功労者を表彰する「インターネットの殿堂 (Internet Hall of Fame)」に名を連ねるブリュースター・ケール (Brewster Kahle) によって1996年に設立されました。現在、一日150万人以上のアクティブユーザーに、書籍やウェブページ、音楽、TV、ソフトウェアなど70ペタバイト(約7万テラバイト)規模のデータへのアクセスを提供するとともに、世界800以上の図書館や大学と連携して、万人に開かれたデジタル図書館の構築を進めています。インターネットアーカイブへの寄付は https://archive.org/donate/ で受け付けています。

(訳:榊原知樹)

Mythbusting Controlled Digital Lending: Community Rallies to Fight Misinformation About the Library Practice

Academics, legal experts, and authors explained the thoughtful reasoning and compelling need for libraries to engage in Controlled Digital Lending (CDL) at a webinar hosted by the Internet Archive and Library Futures on February 11. A recording of the session is now available.

The panel dispelled myths about CDL, the digital lending model in which a library lends a digital version of a print book it owns. Emphasizing the limited and controlled aspect of the practice, the speakers said CDL allows libraries to fulfill their mission of serving the public in the digital age. The global pandemic only underscores the importance of providing flexibility in how people can access information.

Isn’t CDL digital piracy? No, CDL is not like Napster, said Kyle K. Courtney, copyright advisor at Harvard University, referring to the music file-sharing service. Twenty years ago, the actions of Napster were ruled illegal because it made unlimited reproductions of MP3 music available to anyone, anywhere.

“CDL uses technology to replicate a library’s right to loan works in a digital format, one user at a time,” Courtney said. Libraries are using rights they already have, leveraging the same technology as publishers to make sure that the books are controlled when they’re loaned, not duplicated, copied or redistributed.

“Libraries are not pirates. There is a vast difference between the Napster mission and the library mission,” Courtney said. “We can loan books to patrons. Only now we’re harnessing that right in the digital space.”

In laying out the rationale behind CDL, Courtney described the “superpower” granted to libraries by Congress through copyright law to serve the public. The “fair use” section of the law allows libraries to responsibly lend materials, and experts say it logically covers both print and digital works.

The webinar featured the premiere of “Controlled Digital Lending Explained,” a short video that describes how CDL works.

The idea of “fair use” has been around as long as there has been copyright, and it applies to new technologies, said Michelle Wu, attorney and law librarian at the webinar. The Internet Archive did not invent CDL. Wu is the visionary behind CDL, developing the concept in 2002 as a way to protect a library’s print collection from natural disaster—an imperative she faced in rebuilding a library destroyed by flooding.

Just as libraries lend out entire books, fair use allows the scanning of whole books, said panelist Sandra Aya Enimil, copyright librarian and contract specialist at Yale University. The law makes no mention of the amount of material that can be made available under “fair use,” so for libraries to fulfill their purpose they can make complete books—whether in print or digital—available to patrons, she said.

It’s a myth that librarians need author and publisher permission for CDL, explained Jill Hurst-Wahl, copyright scholar and professor emerita in Syracuse University’s School of Information Studies. “Authors and publisher control ends at the time a book is published, then fair use begins,” she said. “Once a work is legally acquired by you, by a library, the copyright owners’ rights are exhausted.”

Library lending is viewed as fair use, in part, because it is focused on socially beneficial, non-commercial outcomes, like literacy, said Hurst-Wahl. Also, libraries loan physical books without concern about the market effect—so the same rules apply if a digital version of the book is substituted.

CDL does not harm author or publisher sales, the panelists emphasized. Indeed, it can provide welcome exposure.

“The reality is that CDL can help authors by enhancing discoverability, availability and accessibility of their works,” said Brianna Schofield, executive director of Authors Alliance, speaking at the event. “It helps authors to spread their ideas, and it helps authors to build their audiences.”

Many of the books that are circulated by CDL are rare, out-of-print books that would otherwise be unavailable. This source material can be useful for writers as they develop their creative works.

“Digital and physical libraries contribute to a healthy publishing ecosystem and increase sales and engagement for creative works,” said Jennie Rose Halperin, executive director of Library Futures, a newly formed nonprofit coalition advocating for libraries to operate in the digital space. Research shows that leveraged digitization increases sales of physical editions by about 34% and increases the likelihood of any sale by 92%, particularly for less popular and out-of-print works.

Because digitized versions can be made more readily available, CDL can extend access to library collections to people with print disabilities or mobility issues, the panelists noted. CDL also allows libraries to preserve material in safe, digital formats with the best interest of the public, not profits, at the center of their work.

“People love books and will buy if they’re able. But we have to remember that paper books and even some ebooks do not serve the needs of all readers,” said Andrea Mills, digitization program manager at the Internet Archive and lead on the Archive’s accessibility efforts. “Accessibility is a human right that must be vigilantly protected.”

For anyone interested in learning more about how to get involved with CDL, the Internet Archive now has 2 million books available to borrow for free, and an active program for libraries that want to make their collections available through CDL. 

“The CDL community of practice is thriving,” said Chris Freeland, director of Open Libraries at the Internet Archive. “We are in a pandemic. Libraries are closed. Schools are closed. CDL just makes sense and solves problems of access.”

To learn more about CDL, and to show your support for the library practice, join the #EmpoweringLibraries movement.