Author Archives: Caralee Adams

Library Leaders Forum Recap

This year’s Library Leaders Forum kicked off on October 12 with news of promising research, digitization projects and advocacy efforts designed to best shape the library of the future.

The virtual gathering also called on participants to take action in sharing resources and promoting a variety of public interest initiatives underway in the library community.

Chris Freeland, director of Open Libraries, moderated the first event of the 2022 forum with librarians, policy experts, publishers and authors. (A complete recording of the virtual session is available here.) The second session will take place Oct. 19, live in San Francisco and via Zoom, starting at 7 p.m. PT. (Registration is still open.)

Libraries have a vital role to play in educating citizens, combating misinformation and preserving materials that the public can use to hold officials accountable. To help meet those challenges, Internet Archive Founder Brewster Kahle gave a preview of a new project: Democracy’s Library. The vision is to establish a free, open, online compendium of government research and publications from around the world.

“We have the big opportunity to help inform users of the internet and bring as good information to them as possible to help them understand their world,” said Kahle, who will launch the initiative next week and invited others to join in the effort. “We need your input and partnership.”

The virtual forum covered the latest on Controlled Digital Lending (CDL), a library practice that has grown in popularity since pandemic closures made physical collections unavailable to the public. Freeland announced that the 90th library recently joined the Open Libraries program, which embraces CDL as the digital equivalent of traditional library lending, allowing patrons to borrow one copy at a time of a title the library owns.
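
The one-copy-at-a-time idea behind CDL can be illustrated with a minimal sketch. The class and method names here are hypothetical and chosen only for illustration; this is not the Open Libraries software, just one way to enforce that copies on loan never exceed copies owned.

```python
class ControlledLendingShelf:
    """Hypothetical model: loans of a title never exceed the copies owned."""

    def __init__(self):
        self.owned = {}    # title -> number of copies the library owns
        self.on_loan = {}  # title -> set of patron ids currently borrowing

    def add_copies(self, title, count=1):
        # Record that the library owns `count` more copies of the title.
        self.owned[title] = self.owned.get(title, 0) + count

    def borrow(self, title, patron):
        # Return True if the loan is granted, False otherwise.
        borrowers = self.on_loan.setdefault(title, set())
        if patron in borrowers:
            return False  # this patron already holds a copy
        if len(borrowers) >= self.owned.get(title, 0):
            return False  # every owned copy is already lent out
        borrowers.add(patron)
        return True

    def give_back(self, title, patron):
        # Returning a copy frees it for the next patron.
        self.on_loan.get(title, set()).discard(patron)
```

With one owned copy, a second patron's request is refused until the first patron returns the title.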

As librarians look for ways of safeguarding digital books, Readium LCP was highlighted as a promising open source technology that is gaining popularity. Participants were also encouraged to spread the word about the advocacy work of the nonprofit Library Futures, and to recognize the many authors who have recently offered public support for libraries, CDL and digital ownership of books.

Lila Bailey reported on an emerging coalition of nonprofits working on a policy agenda to build a better internet centered on public interest values. A forthcoming paper will outline four digital library rights without which it would be impossible for libraries to function in the 21st century: the rights to collect, preserve, lend and access material. This encouraging collaboration is the result of two convenings earlier this year, including one in Washington, D.C., in July.

CDL Community of Practice

A panel at the forum discussed projects within the CDL community of practice.

Nettie Lagace of the National Information Standards Organization gave an update on an initiative, funded by the Mellon Foundation, to create a consensus framework and recommendations on CDL. Working groups are focused now on considering digital objects, circulation and reserves, interlibrary loans and asset sharing. Public comments on the draft will be welcome in the coming months, with a final document likely released next summer.

Amanda Wakaruk, a copyright and scholarly communications librarian at the University of Alberta, announced a new paper exploring the legal considerations of CDL for Canadian libraries. She is one of the co-authors of the research, along with others in the Canadian Federation of Library Associations. The preprint is available now and the final paper will be published soon in the journal Partnership: The Canadian Journal of Library and Information Practice and Research.

Working with Project ReShare, the Boston Library Consortium is leveraging CDL as a mechanism for interlibrary loan. “BLC really believes that CDL is an extension of existing resource sharing practices, both in the legal sense–the same protections and opportunities afforded to interlibrary loan also apply to CDL,” said Charlie Bartow, executive director, “but, also in a services sense–that existing resource sharing systems and practices can be readily adapted to include CDL.”

Also speaking in the session was Caltech’s Mike Hucka. He described efforts on his campus to provide students with learning materials when the pandemic hit by creating a simple model they named the Digital Borrowing System (DIBS).

In Canada, a large digitization project is underway at the University of Toronto, where 40,000 titles in the library’s government collection are being scanned and made available online for easier public access.

Take action

In the final segment, Freeland announced that Carl Malamud is the recipient of the 2022 Internet Archive Hero Award for his dedication in making government information accessible to all. Malamud will receive the Hero Award onstage at next week’s evening celebration, “Building Democracy’s Library.”

Freeland concluded the event with a final call to action: join the #OwnBooks campaign. People are encouraged to take a photo of themselves holding a book they own that has special meaning, perhaps something that has influenced their career path or has sentimental value. As the Internet Archive fights for the right of libraries to own books, this is a chance to bring attention to the issue and build public support.

Internet Archive to Honor Carl Malamud with 2022 Hero Award

Carl Malamud, founder of Public.Resource.Org and a champion for making government information accessible to all, will receive the 2022 Internet Archive Hero Award. He will be presented the award at next week’s evening celebration, “Building Democracy’s Library.”

The Internet Archive Hero Award is an annual award that recognizes those who have exhibited leadership in making information available for digital learners all over the world. Previous recipients have included librarians Kanta Kapoor and Lisa Radha Vohra, copyright expert Michelle Wu, the Biodiversity Heritage Library, and the Grateful Dead.

This year, the Internet Archive is honoring Carl as a tireless advocate for free access to government information. Some highlights of his work include: 

  • In the early days of the internet, Carl was a pioneer in pushing for public materials to be available online. Over three decades, he has digitized and uploaded thousands of documents from Congressional hearings, along with government films, and has worked with the executive branch to shape public policy on information sharing.
  • He is to thank for EDGAR (Electronic Data Gathering, Analysis, and Retrieval system) Online, the free Securities and Exchange Commission database of corporate information, and for putting the database of U.S. patents on the internet.
  • Carl is relentless in his ongoing quest to have detailed codes for buildings, product safety, and infrastructure available to the public on the internet.
  • He founded Public.Resource.Org, a nonprofit based in California, in 2007. Several contractors and pro-bono attorneys work with him to unleash public information from behind paywalls, which has sometimes landed him in court to defend his actions, all done in the name of the public good.
  • Carl is known as a dedicated, passionate, principled individual whose creative strategies—and, at times, dose of humor and flair—have fueled his success in opening up access to public knowledge.

Carl has been a supporter of the Internet Archive since its inception. Much of his work appears in the Internet Archive collection, including his book, “Exploring the Internet”; a movie, “Open Access Ninja,” about his philosophy with Public.Resource.Org; and a video, “Show Me the Manual,” about making building and electrical codes available.

Join us in celebrating Carl at Building Democracy’s Library on October 19. Register now.

Stay tuned for a full profile on Carl’s work and impact next week here on the Internet Archive blog.

New eBook Protection Software Gaining Popularity Among Publishers and Libraries

A new digital rights management (DRM) technology that is open source—and embraced by publishers—is gaining traction in the library eBook world. 

Readium LCP was developed five years ago to protect digital files from unauthorized distribution. Unlike proprietary platforms, the technology is open to anyone who wants to look inside the codebase and make improvements. It is a promising alternative for libraries and users wanting to avoid the limitations of traditional DRM. 

“It’s important to have a decentralized, open source system for lending and vending eBooks,” said Brewster Kahle, Internet Archive founder. “LCP is a new generation of software protection that is proving popular with both libraries and publishers.” 

LCP is a flexible, vendor-neutral, low-cost solution against over-sharing of content for eBooks, as well as audiobooks. The codebase is open source with the exception of an algorithm that protects the files.

“LCP was developed in conjunction with publishers to make sure it would meet their criteria to safeguard the content of their books,” said Brenton Cheng, senior engineer at the Internet Archive. “Yet, it’s an open format, and not tied to one particular company or commercial entity. In that spirit of openness, it’s available to anyone who wants to protect their content.” 


A number of leading publishers, libraries and book distributors have adopted LCP, including:

  • HarperCollins integrated LCP into its Harlequin Plus subscription service. 
  • Academic publisher John Libbey Eurotext has adopted LCP for its 2022 publications.
  • Stockholm Public Library has incorporated LCP into its Bibblix mobile app for young readers.
  • Numilog has deployed LCP for more than 500,000 eBooks in French & English.
  • BiblioVault adopted LCP in 2021, serving more than 90 scholarly presses & 40,000 books.
  • The Palace Project has integrated LCP into its mobile apps.

Source: LCP adopters


It’s a simple system that allows readers to access eBooks and audiobooks, and it does not limit the selection of titles to a single source (as with Amazon or Apple).

LCP offers great freedom in the choice of a reading solution, keeps the accessibility of digital publications intact and does not leak personal data, said Laurent Le Meur, chief technology officer of EDRLab, the open source software development laboratory that develops LCP and receives funding from publishers, eBook distributors, libraries and public bodies.

With LCP’s structure, there is no need to go through a third-party source to be authorized to download a protected book. Therefore, there is no threat of personal information being compromised. LCP is interoperable by design and socially engineered to be a sustainable, nonprofit DRM solution. 

“Open source technologies like LCP protect authors and their works,” said Maria Bustillos, editor at The Brick House Cooperative, a publishing platform designed, owned and operated by journalists. “As a publisher committed to preserving traditional library rights, The Brick House looks forward to exploring the integration of LCP into our forthcoming projects.”

As a new technology, LCP is being used around the world, with Europe and Canada leading the way. For organizations working on accessibility, LCP is the natural solution they have been waiting for, said Le Meur. In 2025, the EU Accessibility Act will require all distributors of digital publications to offer accessible services, and LCP is a DRM format that complies with the mandate.

“LCP is appealing because it’s not locked,” Cheng said. “There’s a greater sense that it might last. It has more transparency and accountability because the source code is out there and available for anyone to see.”



Colgate University Libraries Donates to Expanding Government Document Microfiche Collection

Case Library and Geyer Center for Information Technology, Colgate University. Photo credit: Colgate University Office of University Communications.

From 1970 to 2004, Colgate University amassed as many as 1.5 million microfiche cards with documents from the U.S. federal government. 

The small, private liberal arts institution housed the collection in a central location accessible to the former reference service point and the circulation desk in Hamilton, New York. 

“Every single campus tour that goes through the library walks past this collection. Our well meaning student ambassadors would announce ‘Here’s our microfiche that no one uses,’” said Debbie Krahmer, accessible technology & government documents librarian at Colgate. 

Since the popularity of the miniaturized thumbnails of pages waned several years ago, many libraries have struggled with what to do with their microfiche collections, as they contain important information but are difficult to use. 

Krahmer was looking for ways to offload the materials and discovered the Internet Archive would accept microfiche donations for digitization. It was a way to preserve the content, make it easier for the public to access, and avoid putting the microfiche in a landfill.

“These government documents are meant to be available and accessible to the general public. For many there’s still a lot of good information in this collection,” said Courtney L. Young, the university librarian. “While the microfiche has been stored in large metal cabinets on the main level, many of our users do not see them. This project will improve that visibility and accessibility.”

About the donation

In July, the Internet Archive arranged for the twelve cabinets of microfiche, each in excess of 600 pounds, to be loaded onto pallets and shipped to the Internet Archive for preservation and digitization. Materials include Census data, documents from the Department of Education, Congressional testimony, CIA documents, and foreign news translated into English. 

Microfiche cabinets ready for shipping to the Internet Archive for preservation and digitization.

Colgate also gave indexes of the microfiche that will be “game changers” for other government libraries once they are digitized because the volumes are expensive and hard to acquire, Krahmer added. 

Krahmer said the moving process with the Internet Archive was easy and recommended the option to other librarians.

“This is a lot easier than trying to figure out how to get these materials recycled,” Krahmer said. “In addition to improving discovery and access, this supports the university’s sustainability plan. It’s going to get digitized, be made available online, and preserved. This is win-win no matter how you look at it.”

Public access to government publications

Government documents from microfiche are coming to archive.org based on the combined efforts of the Internet Archive and its Federal Depository Library Program library partners. The Federal Depository Library Program (FDLP), founded in 1813, provides designated libraries with copies of bills, laws, congressional hearings, regulations, and executive and judicial branch documents and reports to share with the public.

Colgate joins Claremont Colleges, Evergreen State College, the University of Alberta, the University of California San Francisco, and the University of South Carolina, which have contributed over 70 million pages on over one million microfiche cards. Other libraries are welcome to join this project.

Music Library Association Opens Publications at Internet Archive

For librarians who specialize in caring for music collections, it can be challenging to keep up with the latest technology and resources in the profession. The Music Library Association recently helped address this problem by making many of its publications openly available online.

The MLA donated 21 of its monographs to the Internet Archive for digitization and worked with authors to make the material free to the public under Creative Commons licenses. 

The new collection of backlist titles includes information on careers in music librarianship and history of the field. It also covers planning and building music library collections, which can be complicated and involve individual creators and small publishers, said Kathleen DeLaurenti, who helped lead the partnership with the Internet Archive in her role as MLA’s first open access editor. There are also valuable materials on music library approaches to technical services—everything from how to preserve music materials to how to bind and catalog them.

“Increasingly in librarianship, we have people who are being tasked to do this work who don’t have a specialized background, especially in smaller organizations, rural places, and public libraries,” DeLaurenti said. “We’re really excited to be able to make this content available to folks who may not have access to professional development in those spaces, and who may be looking for some materials to bolster their training and their own work.”

The MLA has been publishing new research of interest to music librarians since the 1970s and wanted to find a platform to make the information easier to discover, said DeLaurenti, director of the Arthur Friedheim Library at the Peabody Institute of Johns Hopkins University in Baltimore. The Internet Archive provided the open infrastructure to share and leverage the work of the MLA, which is a small organization with about 1,000 members.

While the MLA began with 21 of the monographs, it is working to obtain rights clearance for an additional 20 titles and DeLaurenti hopes the online collection will grow. So far, authors have been excited that the association is making their work available as it increases access for scholars with the potential for more citations of their research.

The audience for the online collection will likely be “accidental music librarians”—people tasked with music library responsibilities who aren’t musicians but are looking for professional development resources in the area, DeLaurenti said, as well as individuals considering music librarianship as a career.

“As libraries are looking at what kinds of open infrastructure is out there and available, I think the work that the Internet Archive has done through COVID has really changed our perception and how they can work as a potential collaborator in that space,” DeLaurenti said. “We hope to continue different kinds of collaborations with [the Archive] in the future.”

Preserving Pro-Democracy Books From Shuttered Hong Kong Bookstore

Albert Wan ran Bleak House Books, an independent bookstore in Hong Kong, for nearly five years, before closing it in late 2021. The changing political climate and crackdown on dissent within Hong Kong made life too uncertain for Wan, his wife and two children. 

As they were preparing to move, Wan packed a box of books at risk of being purged by the government. He brought them on a plane back to the United States in January and donated them to the Internet Archive for preservation. 

The collection includes books about the pro-democracy protests of 2019: some are photography books, and another is a limited-edition book of essays by young journalists who covered the events. There is also a book about the Tiananmen Square massacre, along with volumes about Hong Kong politics, culture and history, most written in Chinese.

“In Hong Kong, because the government is restricting and policing speech in a way that is even causing libraries to remove books from shelves, I thought that it would be good to digitize books about Hong Kong that might be in danger of disappearing entirely,” Wan said.

Hearing that Bleak House Books would be shutting its doors, the Internet Archive reached out and offered to digitize its remaining books. As it happened, Wan said, his inventory was dwindling quickly. So he gathered contributions from others and, along with some books from his own collection, donated about thirty books and some periodicals to the Internet Archive for preservation and digitization. Wan said he was amazed at how flexible and open the Archive was in the process, assisting with shipping and scanning the materials at no cost to him. (See Hong Kong Community Collection.)

Now, Wan wants others to do the same.

“There are still titles out there that have never been digitized and might be on the radar for being purged or sort of hidden from public view,” Wan said. “The hope is that more people would contribute and donate those kinds of books to the Archive and have them digitized so that people still have access to them.”

Do you have books you’d like to donate to the Internet Archive? Learn more.

Wan said he likes how the Internet Archive operates using controlled digital lending (CDL), in which items can be borrowed one at a time, providing broad public access without infringing on the rights of authors.

Before his family moved to Hong Kong for his wife’s university teaching job, Wan was a civil rights and criminal defense attorney in private practice. Now, they are all getting settled in Rochester, New York, where Wan plans to open another bookshop.

Goodbye Facebook. Hello Decentralized Social Media?

The pending sale of Twitter to Elon Musk has generated a buzz about the future of social media and just who should control our data.

Wendy Hanamura, director of partnerships at the Internet Archive, moderated an online discussion on April 28, “Goodbye Facebook, Hello Decentralized Social Media?”, about the opportunities and dangers ahead. The webinar is part of a series of six workshops, “Imagining a Better Online World: Exploring the Decentralized Web.”

The session featured founders of some of the top decentralized social media networks, including Jay Graber, chief executive officer of R&D project Bluesky; Matthew Hodgson, technical co-founder of Matrix; and Andre Staltz, creator of Manyverse. Unlike Twitter, Facebook or Slack, Matrix and Manyverse have no central controlling entity. Instead, the peer-to-peer networks shift power to the users and protect privacy.

If Twitter is indeed bought and people are disappointed with the changes, the speakers expressed hope that the public will consider other social networks. “A crisis of this type means that people start installing Manyverse and other alternatives,” Staltz said. “The opportunity side is clear.” Still, if other platforms are not ready during the transition period, there is some risk that users will feel stuck and not switch, he added.

Hodgson said there are reasons to be both optimistic and pessimistic about Musk purchasing Twitter. The hope is that he will use his powers for good, making the platform available to everybody and empowering people to block the content they don’t want to see. The risk, Hodgson said, is that with no moderation, people will be obnoxious to one another without sufficient controls to filter content, and the system will melt down. “It’s certainly got potential to be an experiment. I’m cautiously optimistic on it,” he said.

People who work in decentralized tech recognize the risk that comes when one person can control a network and act for good or bad, Graber said. “This turn of events demonstrates that social networks that are centralized can change very quickly,” she said. “Those changes can potentially disrupt or drastically alter people’s identity, relationships, and the content that they put on there over the years. This highlights the necessity for transition to a protocol-based ecosystem.” 

When a platform is user-controlled, it is resilient to disruptive change, Graber said. Decentralization enables immutability, so change is hard: it is a slow process that requires a lot of people to agree, added Staltz.

The three leaders spoke about how decentralized networks provide a sustainable alternative and are gaining traction. Unlike major players that own user data and monetize personal information, decentralized networks are controlled by users and information lives in many different places.

“Society as a whole is facing a lot of crises,” Graber said. “We have the ability to, as a collective intelligence, to investigate a lot of directions at once. But we don’t actually have the free ability to fully do this in our current social architecture…if you decentralize, you get the ability to innovate and explore many more directions at once. And all the parts get more freedom and autonomy.”

Decentralized social media is structured to change the balance of power, added Hanamura: “In this moment, we want you to know that you have the power. You can take back the power, but you have to understand it and understand your responsibility.”

The webinar was co-sponsored by DWeb and Library Futures, and presented by the Metropolitan New York Library Council (METRO).

The next event in the series, Decentralized Apps, the Metaverse and the “Next Big Thing,” will be held Thursday, May 26, at 4-5 p.m. EST. Register here.

Library as Laboratory Recap: Analyzing Biodiversity Literature at Scale

At a recent webinar hosted by the Internet Archive, leaders from the Biodiversity Heritage Library (BHL) shared how its massive open access digital collection documenting life on the planet is an invaluable resource for scientists and ordinary citizens.

“The BHL is a global consortium of the leading natural history museums, botanical gardens, and research institutions, big and small, from all over the world. Working together and in partnership with the Internet Archive, these libraries have digitized more than 60 million pages of scientific literature available to the public,” said Chris Freeland, director of Open Libraries and moderator of the event.

Established in 2006 with a commitment to inspiring discovery through free access to biodiversity knowledge, BHL has 19 members and 22 affiliates, plus 100 worldwide partners contributing data. The BHL has content dating back nearly 600 years alongside current literature that, when liberated from the print page, holds immense promise for advancing science and solving today’s pressing problems of climate change and the loss of biodiversity.

Martin Kalfatovic, BHL program director and associate director of the Smithsonian Libraries and Archives, noted in his presentation that Charles Darwin and colleagues famously said “the cultivation of natural science cannot be efficiently carried on without reference to an extensive library.”

“Today, the Biodiversity Heritage Library is creating this global, accessible open library of literature that will help scientists, taxonomists, environmentalists (a host of people working with our planet) to actually have ready access to these collections,” Kalfatovic said. BHL’s mission is to improve research methodology by working with its partner libraries and the broader biodiversity and bioinformatics community. BHL draws about 142,000 visitors each month and has served 12 million users overall.

Most of the BHL’s materials are from collections in the global north, primarily in large, well-funded institutions. Digitizing these collections helps level the playing field, providing researchers in all parts of the world equal access to vital content.

The vast collection includes species descriptions, distribution records, climate records, the history of scientific discovery, information on extinct species, and records of where species live. To date, BHL has made over 176,000 titles and 281,000 volumes available. Through a partnership with the Global Names Architecture project, more than 243 million instances of taxonomic (Latin) names have been found in BHL content.

Kalfatovic underscored the value of BHL content in understanding the environment in the wake of recent troubling news from the Sixth Assessment Report (AR6), published by the Intergovernmental Panel on Climate Change, about the impact of the earth’s warming.

Biodiversity Heritage Library by the numbers.

“The outlook for the planet is challenging,” he said. “By unlocking this historic data, we can find out where we’ve been over time to find out more about where we need to be in the future.”

JJ Dearborn, BHL data manager, discussed how digitization transforms physical books into digital objects that can be shared with “anyone, at any time, anywhere.” She described the Wikimedia ecosystem as “fertile ground for open access experimentation,” crediting the organization with giving BHL the ability to reach new audiences and transform its data into 5-star linked open data. “Dark data” locked up in legacy formats, JP2s, and OCR text is a source of valuable checklist, species occurrence, and event sampling data that the larger biodiversity community can use to improve humanity’s collective ability to monitor biodiversity loss and the destructive impacts of climate change at scale.

The majority of the world’s data today is siloed, unstructured, and unused, Dearborn explained. This “dark data” “represents an untapped resource that could really transform human understanding if it could be truly utilized,” she said. “It might represent a gestalt leap for humanity.” 

The event was the fifth in a series of six sessions highlighting how researchers in the humanities use the Internet Archive. The final session of the Library as Laboratory series will be a series of lightning talks on May 11 at 11am PT / 2pm ET—register now!

Library as Laboratory Recap: Opening Television News for Deep Analysis and New Forms of Interactive Search

Watching a single episode of the evening news can be informative. Tracking trends in broadcasts over time can be fascinating. 

The Internet Archive has preserved nearly 3 million hours of U.S. local and national TV news shows and made the material open to researchers for exploration and non-consumptive computational analysis. At a webinar April 13, TV News Archive experts shared how they’ve curated the massive collection and leveraged technology so scholars, journalists and the general public can make use of the vast repository.

Roger Macdonald, founder of the TV News Archive, and Kalev Leetaru, collaborating data scientist and GDELT Project founder, spoke at the session. Chris Freeland, director of Open Libraries, served as moderator and Internet Archive founder Brewster Kahle offered opening remarks.

“Growing up in the television age, [television] is such an influential, important medium—persuasive, yet not something you can really quote,” Kahle said. “We wanted to make it so that you could quote, compare and contrast.” 

The Internet Archive built on the work of the Vanderbilt Television Archive, and the UCLA Library Broadcast NewsScape to give the public a broader “macro view,” said Kahle. The trends seen in at-scale computational analyses of news broadcasts can be used to understand the bigger picture of what is happening in the world and the lenses through which we see the world around us.

In 2012, with donations from individuals and philanthropies such as the Knight Foundation, the Archive started repurposing the closed captioning data stream required of all U.S. broadcasters into a search index. “This simple approach transformed the antiquated experience of searching for specific topics within video,” said Macdonald, who helped lead the effort. “The TV caption search enabled discovery at internet speed with the ability to simultaneously search millions of programs and have your results plotted over time, down to individual broadcasters and programs.”
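
The idea of turning caption text into a search index can be sketched in a few lines. This is a simplified illustration, not the Archive's actual system: the program ids, caption strings and function names are invented for the example, and the approach shown is a basic inverted index that maps each word to the programs whose captions contain it.

```python
import re
from collections import defaultdict

WORD_PATTERN = r"[a-z0-9']+"  # assumption: a simple tokenizer is good enough here

def build_caption_index(captions):
    """Map each lowercase word to the set of program ids whose captions contain it."""
    index = defaultdict(set)
    for program_id, text in captions.items():
        for word in re.findall(WORD_PATTERN, text.lower()):
            index[word].add(program_id)
    return index

def search(index, query):
    """Return the program ids whose captions contain every word in the query."""
    words = re.findall(WORD_PATTERN, query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical caption snippets keyed by made-up program ids.
captions = {
    "program-a": "The election results are in tonight",
    "program-b": "Weather tonight: rain expected",
    "program-c": "Election officials certify results",
}
index = build_caption_index(captions)
print(search(index, "election results"))
```

A production system would also store timestamps for each match so that results could be plotted over time, which this sketch leaves out.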

Scholars and journalists were quick to embrace this opportunity, but the team kept experimenting with deeper indexing. Techniques like audio fingerprinting, optical character recognition (OCR) and computer vision made it possible to capture visual elements of the news and improve access, Macdonald said.

Sub-collections of political leaders’ speeches and interviews have been created, including an extensive Donald Trump Archive. Some of the Archive’s most productive advances have come from collaborating with outsiders who have requested more access to the collection than is available through the public interface, Macdonald said. With appropriate restrictions to maintain respect for broadcasters and distribution platforms, the Archive has worked with select scientists and journalists as partners to use data in the collection for more complex analyses.

Treating television as data

Treating television news as data creates vast opportunities for computational analysis, said Leetaru. Researchers can track how often words are used in the news and how that has changed over time. For instance, it’s possible to look at mentions of COVID-related words across selected news programs and see when they surged and leveled off with each wave before plummeting, as shown in the graph below.

The newly computed metadata can provide context and assist fact-checking efforts to combat misinformation. It also allows researchers to map the geography of television news, showing how certain parts of the world are covered more than others, Leetaru said. Through the collections, researchers have explored which presidential tweets challenging election integrity got the most exposure on the news. OCR of every frame has been used to build models that identify the name of every “Dr.” depicted on cable TV after the outbreak of COVID-19 and calculate the air time devoted to the medical doctors commenting on one of the virus variants. Reverse image lookup of images in TV news has been used to determine the source of photos and videos. Visual entity search tools can even reveal the increasing prevalence of bookshelves as backdrops during home interviews in the pandemic, as well as appearances of books by specific authors or titles.

Open datasets of computed TV news metadata are available that include all visual entity and OCR detections, 10-minute interval captioning ngrams and second-by-second inventories of each broadcast cataloging whether it was “News” programming, “Advertising” programming or “Uncaptioned” (in the case of television news this is almost exclusively advertising).
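
For readers curious what tracking word frequency over time looks like in practice, here is a minimal Python sketch. It is an illustration only, not the TV News Archive's own code or data format: the sample captions, the term list and the function names are all hypothetical, invented to show the general idea of counting mentions of chosen words per month.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical sample data: (air date, caption text) pairs invented for
# illustration. Real caption data would come from a much larger corpus.
CAPTIONS = [
    (date(2020, 3, 14), "The coronavirus outbreak is spreading. Covid testing expands."),
    (date(2020, 3, 20), "Officials discuss the pandemic and covid vaccine research."),
    (date(2020, 4, 2), "Markets react as the pandemic continues."),
    (date(2021, 7, 9), "Weather update for the weekend."),
]

# Hypothetical list of terms to track.
TERMS = {"covid", "coronavirus", "pandemic"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def monthly_mentions(captions, terms):
    """Count how many tracked terms appear in captions for each month."""
    counts = Counter()
    for aired, text in captions:
        month = aired.strftime("%Y-%m")
        counts[month] += sum(1 for token in tokenize(text) if token in terms)
    # Sort by month so the result reads as a timeline.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    for month, mentions in monthly_mentions(CAPTIONS, TERMS).items():
        print(month, mentions)
```

Plotting these monthly totals would give a rough timeline of when the tracked words surged and faded.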

From television news to digitized books and periodicals, dozens of projects rely on the collections available at archive.org for computational and bibliographic research across a large digital corpus. Data scientists or anyone with questions about the TV News Archive can contact info@archive.org.

Up Next

This webinar was the fourth in a series of six sessions highlighting how researchers in the humanities use the Internet Archive. The next will be about Analyzing Biodiversity Literature at Scale on April 27. Register here.

Meet the Librarians: Alexis Rossi, Media & Access

To celebrate National Library Week 2022, we are taking readers behind the scenes to Meet the Librarians who work at the Internet Archive and in associated programs.


Alexis Rossi has always loved books and connecting others with information. After receiving her undergraduate degree in English and creative writing, she became a book editor and then worked in online news. 

In 2006, Rossi joined the staff of the Internet Archive. She was working on the launch of the Open Library project when she recognized the need to learn more about how to best organize materials. She enrolled at San Jose State University and earned her Master of Library and Information Science in 2010.

“It gave me a better grasp of how to hierarchically organize information in a way that is sensible and useful to other libraries,” Rossi said. “It also gave me better familiarity with how other more traditional libraries actually work—the types of data and systems they use.”

Rossi concentrated on web interfaces for library information, understanding digital metadata, and how to operate as a digital librarian. At the Internet Archive, in addition to overseeing the Open Library project, Rossi managed a revamp of the organization’s website, ran the Wayback Machine for four years and founded the webwide crawling program. She is currently a librarian and director of media & access.

“One of the themes of my life is trying to empower people to do whatever they want to do,” said Rossi, who grew up in Monterey, California, and now lives in San Francisco. “Giving people the resources to teach themselves—whatever they want to learn—is my driving force.”

Rossi acknowledges she is privileged to have the means to avail herself of an abundance of information, while many in other parts of the world do not. There are so many societal problems she cannot solve, Rossi said, but she believes her work is making a contribution.

“We can build a library that allows people to access information for free, wherever they are, and however they can get to it, in whatever way. That, to me, is incredibly important,” Rossi said. It’s also rewarding to help patrons discover new information and recover materials they may have thought were lost, she added.

When she’s not working, Rossi enjoys making funky jewelry and elaborate cakes (a skill she learned on YouTube).

Among the millions of items and collections in the Internet Archive, what is Rossi’s favorite? Video and audio recordings of her dad, now 73, playing the piano, organ and accordion: “It’s just so good. It’s such a perfect little piece of history.”