Tag Archives: democracys library

End of Term Web Archive – Preserving the Transition of a Nation

It’s that time again. The 2024 End of Term crawl has officially begun! The End of Term Web Archive #EOTArchive hosts an initiative named the End of Term crawl to archive U.S. government websites in the .gov and .mil web domains, as well as those harder-to-find government websites hosted on .org, .edu, and other top-level domains (TLDs), as one administrative term ends and a new term begins.

End of Term crawls have been completed for term transitions in 2004, 2008, 2012, 2016, and 2020. The results of these efforts are preserved in the End of Term Web Archive. In total, over 500 terabytes of government websites and data have been archived through the End of Term Web Archive efforts. These archives can be searched full-text via the Internet Archive’s collections search and also downloaded as bulk data for machine-assisted analysis.

The purpose of the End of Term Web Archive is to preserve a record of government websites for historical and research purposes. It is important to capture these websites because they can provide a snapshot of government messaging before and after the transition of terms. The End of Term Web Archive preserves information that may no longer be available on the live web for open access.

The End of Term Archive is a collaborative effort by the Internet Archive along with the University of North Texas (UNT), Stanford University, Library of Congress (LC), U.S. Government Publishing Office (GPO), and National Archives and Records Administration (NARA). Past partners include the University of California’s California Digital Library (CDL), George Washington University, and the Environmental Data and Governance Initiative (EDGI).

Image: four Whitehouse.gov captures, from 2008 Sept. 15; 2013 Mar. 21; 2017 Feb. 3; and 2021 Feb. 25

We are committed to preserving a record of U.S. government websites. But we need your help to complete the 2024 End of Term crawl. 

How can you help?! 

We have a list of top-level domains from the General Services Administration (GSA) and from previous End of Term crawls. But we need volunteers to help us out. We are currently accepting nominations for websites to be included in the 2024 End of Term Web Archive.

Submit a URL nomination by going to digital2.library.unt.edu/nomination/eth2024/.
We encourage you to nominate any and all U.S. federal government websites that you want to make sure get captured. Nominating URLs deep within .gov/.mil websites helps to make our web crawls as thorough and complete as possible.

Individuals and institutions nominating seed URLs are recognized on the individual contributors leaderboard and the institutions leaderboard!

Explore the End of Term Web Archive with full text search and download the data!

The International Democracy’s Library Team Came Together for Presentations, Discussion, and a Workshop About Gov Docs (3.16.23)

Let’s Build It Together!

Video: https://archive.org/details/full-democracys-library-3.16.23-presentation

On March 16, 2023, the Internet Archive hosted the “Democracy’s Library Workshop: Community Collaboration.” This event marked the first public presentation and discussion of the Democracy’s Library project since its inauguration at the 2022 Annual Event, and it followed several months of research, supported by the Filecoin Foundation, from November 2022 to February 2023. The project, a collaboration between Internet Archive Canada and the Internet Archive in the U.S., aims to preserve government information and make it much more meaningfully accessible to the public. The event was live-streamed and can be viewed at the video link above.

Presentation includes:

  • Brewster Kahle, founder of the Internet Archive, providing an introduction and discussing why we need to “Build Our Collections Together.”
  • Andrea Mills, Executive Director of Internet Archive Canada, discussing the incredible progress made in Canada working with their foundational partner, the University of Toronto, in digitizing government information.
  • Jamie Joyce, leading the Democracy’s Library initiative at Internet Archive in the U.S., reporting on the U.S. landscape analysis and stakeholder interviews.

To librarians and archivists: please know we are still collecting feedback from government information professionals. So if you are a librarian or archivist, we would love to hear about your experience. If you’re interested in sharing, please fill out this survey.

See existing Democracy’s Library here: https://archive.org/details/democracys-library 

Also, In Case You Missed Them…Recommendations and Strategic Plans from the GPO: 

Declaring Democracy’s Library (U.S.)

A video presentation of findings, an executive summary, and more to come from the United States team.

Video: https://archive.org/details/jamie-joyces-democracys-library-presentation

After the declaration of Democracy’s Library at the 2022 Internet Archive Annual Event [video], the U.S. team conducted a four-month landscape analysis to discover the state of the United States’ collective knowledge management.

Over the course of this blog series we’ll discuss our findings, including the various ways in which our federated national infrastructure contributes to the immense complexity which inhibits easy and meaningful access to the public’s information. 

But for now, we would like to share our executive summary. This piece is informed by interviews with librarians, archivists, and information professionals; by a review of various pieces of legislation and government agency reports; and by consultation with government representatives at various departments, technologists working on civic-tech and gov-tech applications, and users of government information.

A huge thanks again to all who were interviewed, involved, and are excited about this program.

EXECUTIVE SUMMARY OF THE DEMOCRACY’S LIBRARY (U.S.) REPORT

    Every year, the United States government spends billions of dollars generating data, including reports, research, records, and statistics. Both governments and corporations know that this data is a highly valuable strategic asset. Yet meaningful access to this critical data is effectively kept out of the public’s hands. Though much of it is intended to be publicly accessible, we do not have a publicly accessible central repository where we can search for all government artifacts. We do not have a public library of all government data, documents, research, records, and publications. These artifacts are not easy for everyone to get a hold of.

    Instead, this data is organized only to be kept behind paywalls, vended to multinational corporations, guarded by “data cartels,” or sits inaccessibly among thousands of disjointed agency websites, with non-standardized archival systems that are stewarded by under-resourced librarians and archivists. This data is siloed within agencies, never before linked together. Although we are entitled to this data by law, journalists, activists, democracy technologists, academics, and the public are by default deprived of meaningful access. Instead, it’s a pay-to-play system in which many are priced out.

    However, if we could reduce the public burden in accessing this knowledge – as the federal government has stated is a priority – then it might be the lynchpin to transforming democratic systems and making them more efficient, actionable, and auditable in the future. This work could potentiate a big data renaissance in political science and public administration. It could equip every local journalist with comprehensive, ‘investigative access’ to policy-making across the country. It could even provide key insights which ensure that democracy survives, thrives, adapts, and evolves in the 21st century, as so many desperately want it to and yet so many fear it never will. To make our democracy more resilient and prepared for the digital age, we need Democracy’s Library.

    Democracy’s Library is a 10-year, multi-pronged partnership effort to collect, preserve, and link our democracy’s data in a centralized, queryable repository. This repository of data will be sourced from all levels of the U.S. government, for the purpose of informing innovation, enabling transparency, advancing new fields like mass political informatics, and overall, digitizing our democracy. Access to this data is a necessary substrate for that innovation, and to propel our antiquated system into a lightning-fast future, we need to overcome challenges from the artifact level to the systems level.

    Fortunately, the Internet Archive is perfectly primed to comprehensively take on these challenges alongside our partners (like the Filecoin Foundation) through this new initiative, supported by a groundswell of legislative and political support. The time is right, the network is primed, and most of the tools are already built and being deployed. So, the only thing that remains is for funding partners to step up to scale the effort to revolutionize the U.S. government once again.


Community Turns Out to Celebrate Promise of Democracy’s Library

Friends and supporters of the Internet Archive gathered October 19 at the organization’s headquarters in San Francisco to celebrate the launch of Democracy’s Library.

Plans to collect government documents from around the world and make them easily accessible online were met with enthusiasm and endorsements. Speakers at the event expressed an urgency to preserve the public record, make valuable research discoverable, and keep the citizenry informed—all potential benefits of Democracy’s Library. 

“If we really succeed — and we have to succeed — then Democracy’s Library might become an inspiration for openness in areas that are becoming more and more closed,” said Internet Archive founder Brewster Kahle. 

The 10-year project aims to make freely available the massive volume of government publications (from the U.S. and other democracies), including books, guides, reports, surveys, laws and academic research results, which are all funded with taxpayer money, but often difficult to find. 

To kick off the project, Kahle announced the Internet Archive’s initial contributions to Democracy’s Library:

  • United States .gov websites collected since 2008; 
  • Crawls of the U.S. state government websites;
  • Digitized microfilm and microfiche from the U.S. Government Publishing Office, NASA and other government entities;
  • Crawls of government domains from 200 other countries;
  • 50 million government PDF documents made into text-searchable information.

It will be a collaborative effort, said Kahle, calling upon others to join in the ambitious undertaking to contribute to the online collection.

The need for Democracy’s Library

“We need Democracy’s Library. The Internet Archive’s work leading this project represents a critical step in the evolution of democracy,” said Jamie Joyce, executive director of The Society Library and emcee of the program. “Archives and libraries, as they’ve always done in the past, will continue to change in their scope, scale, and capabilities to be of critical use to society, especially democratic societies. Tonight is about witnessing another transformation.”

Although there is more data available than ever before, Joyce said, society’s knowledge management system is badly broken. Misinformation is rampant, while high quality government data is buried and scattered across different federal, state and local agencies. 

Having public material consolidated, digitized, and machine-readable will allow journalists, activists, and others to be better informed. It will also make democracy more transparent and accountable, as well as protect the historical documents. “We will not be able to compute in the future what we do not save today,” Joyce said.

At a time when polarized politics can put information at risk, the event highlighted the need to safeguard public data.

Gretchen Gehrke, co-founder of the Environmental Data and Governance Initiative, has been working in partnership with the Internet Archive to track changes in federal environmental websites. 

“People should be able to know about environmental issues and have a say in environmental decisions,” she said. “For the last 20 years, the majority of this information has been delivered through the web, but the right to access that information through the web is not protected.”

Gehrke described how public resources and tools related to the federal Clean Power Plan, a hallmark environmental regulation of the Obama administration, were taken down from the Environmental Protection Agency’s website during President Trump’s tenure.

“There are no policies protecting federal website information from suppression or outright censorship,” Gehrke said. “This case serves as an example of why we need Democracy’s Library to preserve and provide continued access to these critical government documents.”

When statistics are cited in policy debates, citizens need access to the sources of those claims. For example, Sharon Hammond, chief operating officer of The Society Library, said documents related to the environmental impact of California’s Diablo Canyon power plant should be easily available. There are nearly 50 different government bodies that have some role in monitoring the plant’s ecological impact, but the agencies house the reports on their own websites.

“Finding governmental records about public policy matters should not be a barrier to becoming an informed participant in these collective decisions,” Hammond said. “When we connect evidence directly to the claims and make that information publicly accessible as a resource, we can improve the public discourse.”

Hammond said a searchable, machine-readable repository of government documents, with active links and a register of relevant government agencies, will dramatically increase meaningful access to the public’s information.

An international vision

The effort is an international one, and Canada has stepped forward as an early partner.

Canada has contributed crawls by Library and Archives Canada of all the country’s government websites, as well as digitized microfilm and books from the Canadian Research Knowledge Network, Canadiana, and the University of Toronto.

Leslie Weir, librarian and archivist of Canada, spoke in support of the initiative. 

“We know by making our collection and work of government openly accessible, we will create a more engaged community, a community that participates in elections, school board meetings, in public consultations, and yes, even and especially in protests,” Weir said. “Access is the key to understanding. And understanding is the underpinning of democracy.”

Celebrating heroes

The festivities concluded with a tribute to Carl Malamud, recipient of the 2022 Internet Archive Hero Award. Corynne McSherry, legal director of the Electronic Frontier Foundation, presented the award. “Carl has always seen what the internet could be. He has dedicated his life to building that internet,” she said. “He is a true hero.”

Malamud said government information is more than just a good idea. “It is about the law. It is about our rulebook. It is the manual on how we, as citizens, choose to run our society. We own this manual,” he said. “We cannot honor our obligations to future generations if we cannot freely read and speak and even change that rulebook.”

Malamud urged the audience to get involved to realize the vision of Democracy’s Library and guarantee universal access to human knowledge. 

“This is our moment. We must build a distributed and interoperable internet for our global village. We must make the increase and diffusion of knowledge our mutual and everlasting mission,” Malamud said. “We must seize the means of computation and share their fruits with all the people. Let us all swim together in the ocean of knowledge.”

For more on Malamud’s career and contributions, read his profile here.

Introducing Democracy’s Library

Democracies need an educated citizenry to thrive. In the 21st century, that means easy access to reliable information online for all. 

To meet that need, the Internet Archive is building Democracy’s Library—a free, open, online compendium of government research and publications from around the world.

“Governments have created an abundance of information and put it in the public domain, but it turns out the public can’t easily access it,” said Internet Archive founder Brewster Kahle, who is spearheading the effort to collect materials for the digital library. 

By having a wealth of public documents curated and searchable through a single interface, citizens will be able to leverage useful research, learn about the workings of their government, hold officials accountable, and be more informed voters. 

Too often, the best information on the internet is locked behind paywalls, said Kahle, who has helped create the world’s largest digital library.

“It’s time to turn that scarcity model upside down and build an internet based on abundance,” Kahle said. There is a need for equitable access to objective, historical information to balance the onslaught of misinformation online.  

Libraries have long played a vital role in collecting and preserving materials that can educate the public. This mission continues, but the collections need to include digital items to meet the needs of patrons of the internet generation today.

Over the next decade, the Internet Archive is committing to work with libraries, universities, and agencies everywhere to bring the government’s historical information online. It is inviting citizens, libraries, colleges, companies, and the Wikipedians of the world to unlock good information and weave it back into the Internet.

Democracy’s Library will be celebrated at the October 19 event, Building Democracy’s Library, in San Francisco and online. 


The project is part of Kahle’s vision to build a better Internet—one that keeps the public interest above private profit. It is based on an abundance model, in which data can be uncovered, unlocked and reused in new and different ways. 

“We know there’s an information flood, but it’s not necessarily all that good,” Kahle said. “It turns out the information on the Internet is not very deep. If you know a subject well, you find that the best information is buried or not even online.”

Democracy’s Library is a move to make governments’ massive investment in research and publications open to all. 

Kahle added: “Democracy’s Library is a stepping stone toward citizens who are more empowered and more engaged.”

The first steps of Democracy’s Library are available online at https://archive.org/details/democracys-library.

An Update from Hugh Halpern, Director of the U.S. Government Publishing Office

What are some of the new initiatives from the U.S. Government Publishing Office? Director Hugh Halpern offers an update, which has been incorporated into our program for tonight’s Building Democracy’s Library event.

Many thanks to Director Halpern and the U.S. Government Publishing Office for sharing this update!