Tag Archives: archive-it

Community Webs Seeks Applicants from the US, Canada and Around the World

The Internet Archive is seeking applicants for its next cohort of Community Webs! We are thrilled to announce that the program is now open to additional cultural heritage organizations in the US, as well as any public library or local memory organization in Canada and internationally.

Community Webs provides infrastructure and services, training and education, and professional community cultivation for public libraries and cultural heritage organizations to document local history and the lives of their communities. Launched in the US in 2017 with kickoff funding from the Institute of Museum and Library Services (IMLS), Community Webs began expanding nationally in 2020 with generous support from The Andrew W. Mellon Foundation. Building on the program’s success and continued growth, Internet Archive is now supporting expansion of the program into Canada and to the international community, and is accepting applications for our next cohort, which kicks off in late summer 2021. The deadline for applications is August 2, 2021.

The program offers a unique opportunity for participating organizations to build capacity in digital collecting. Community Webs participants work alongside peer organizations and with their local communities to document the lives of their citizens, marginalized voices, and groups often absent from the historical record. All Community Webs participants receive: 

  • A guaranteed multi-year free subscription to the Archive-It web archiving service, which includes perpetual storage and access provided by the Internet Archive.
  • Access to additional Internet Archive non-profit services, such as digitization and digital preservation, either for free (as funding allows) or at or below actual cost.
  • Training and educational resources related to digital collections, web archiving, digital preservation, and other topics, as well as access to a cohort community pursuing similar work and to networking spaces, events, and knowledge sharing platforms.
  • The option to leverage program partnerships and integrations to include community web archives in other aggregators or access platforms beyond Internet Archive.

The program currently includes over 100 public libraries from across the United States. These organizations have collectively archived over 70 terabytes of web-based community heritage materials. Some highlights include:

Archived web page: Reporte Hispano, April 6, 2021. New Brunswick Free Public Library, Spanish Newspapers collection.
Archived web page: KC Friends of Alvin Ailey, January 10, 2021. Kansas City Public Library, Arts & Culture collection.

The benefits of the program are wide-ranging and impactful for both participants and their communities. As Community Webs member Makiba J. Foster of the African American Research Library and Cultural Center in Broward County, Florida stated during a recent Community Webs event, Archiving the Black Diaspora, “Community Webs provided me with the training, they provided me with the cohort support, […] provided me with services, and particularly it helped to develop an expertise for me in terms of creating collections of historically significant web materials documenting our local communities.” The program “allowed me to start a project of recovery and documentation of digitally born content related to the Black experience.” More information about what Foster and other Community Webs members are up to can be found by viewing our recent program announcements.

Find out more about the program and keep up to date by visiting the Community Webs website. Apply online today and spread the word! 

Archive-It and Archives Unleashed Join Forces to Scale Research Use of Web Archives

Archived web data and collections are increasingly important to scholarly practice, especially to those scholars interested in data mining and computational approaches to analyzing large sets of data, text, and records from the web. For over a decade Internet Archive has worked to support computational use of its web collections through a variety of services, from making raw crawl data available to researchers and performing customized extraction and analytic services supporting network or language analysis, to hosting web data hackathons and offering dataset download features in Archive-It, our popular suite of web archiving services. Since 2016, we have also collaborated with the Archives Unleashed project to support their efforts to build tools, platforms, and learning materials for social science and humanities scholars to study web collections, including those curated by the 700+ institutions using Archive-It.

We are excited to announce a significant expansion of our partnership. With a generous award of $800,000 (USD) to the University of Waterloo from The Andrew W. Mellon Foundation, Archives Unleashed and Archive-It will broaden our collaboration and further integrate our services to provide easy-to-use, scalable tools to scholars, researchers, librarians, and archivists studying and stewarding web archives. Further integration of Archives Unleashed and Archive-It’s Research Services (and IA’s Web & Data Services more broadly) will make it simpler for scholars to analyze archived web data and give digital archivists and librarians expanded tools for making their collections available as data, as pre-packaged datasets, and as archives that can be analyzed computationally. It will also offer researchers a best-in-class, end-to-end service for collecting, preserving, and analyzing web-published materials.

The Archives Unleashed team brings together a group of co-investigators. Professor Ian Milligan, from the University of Waterloo’s Department of History; Jimmy Lin, Professor and Cheriton Chair at Waterloo’s Cheriton School of Computer Science; and Nick Ruest, Digital Assets Librarian in the Digital Scholarship Infrastructure department of York University Libraries, along with Jefferson Bailey, Director of Web Archiving & Data Services at the Internet Archive, will all serve as co-Principal Investigators on the “Integrating Archives Unleashed Cloud with Archive-It” project. This project represents a follow-on to the Archives Unleashed project that began in 2017, also funded by The Andrew W. Mellon Foundation.

“Our first stage of the Archives Unleashed Project,” explains Professor Milligan, “built a stand-alone service that turns web archive data into a format that scholars could easily use. We developed several tools, methods and cloud-based platforms that allow researchers to download a large web archive from which they can analyze all sorts of information, from text and network data to statistical information. The next logical step is to integrate our service with the Internet Archive, which will allow a scholar to run the full cycle of collecting and analyzing web archival content through one portal.”

“Researchers, from both the sciences and the humanities, are finally starting to realize the massive trove of archived web materials that can support a wide variety of computational research,” said Bailey. “We are excited to scale up our collaboration with Archives Unleashed to make the petabytes of web and data archives collected by Archive-It partners and other web archiving institutions around the world more useful for scholarly analysis.” 

The project begins in July 2020 and will start releasing public datasets as part of the integration later in the year. Upcoming and future work includes technical integration of Archives Unleashed and Archive-It, creation and release of new open-source tools, datasets, and code notebooks, and a series of in-person “datathons” supporting a cohort of scholars using archived web data and collections in their data-driven research and analysis. We are grateful to The Andrew W. Mellon Foundation for funding this integration and collaboration in support of critical infrastructure for computational scholarship and its use of the archived web.

Primary contacts:
IA – Jefferson Bailey, Director of Web Archiving & Data Services, jefferson [at] archive.org
AU – Ian Milligan, Professor of History, University of Waterloo, i2milligan [at] uwaterloo.ca

Archiving Information on the Novel Coronavirus (Covid-19)

The Internet Archive’s Archive-It service is collaborating with the International Internet Preservation Consortium’s (IIPC) Content Development Group (CDG) to archive web-published resources related to the ongoing Novel Coronavirus (Covid-19) outbreak. The IIPC Content Development Group consists of curators and professionals from dozens of libraries and archives from around the world that are preserving and providing access to the archived web. The Internet Archive is a co-founder and longtime member of the IIPC. The project will include both subject-expert curation by IIPC members and websites nominated by the public (see the nomination form link below).

Due to the urgency of the outbreak, archiving of nominated web content will commence immediately and continue as needed depending on the course of the outbreak and its containment. Web content from all countries and in any language is in scope. Possible topics to guide nominations and collections: 

  • Coronavirus origins 
  • Information about the spread of infection 
  • Regional or local containment efforts
  • Medical/Scientific aspects
  • Social aspects
  • Economic aspects
  • Political aspects

Members of the general public are welcome to nominate websites and web-published materials using the following web form: https://forms.gle/iAdvSyh6hyvv1wvx9. Archived information will also be available soon via the IIPC’s public collections in Archive-It. [March 23, 2020 edit: the public collection can now be found here, https://archive-it.org/collections/13529.]

Members of the general public can also take advantage of the ability to upload non-web digital resources directly to specific Internet Archive collections such as Community Video or Community Texts. For instance, see this collection of “Files pertaining to the 2019–20 Wuhan, China Coronavirus outbreak.” We recommend using a common subject tag, like coronavirus, to facilitate search and discovery. For more information on uploading materials to archive.org, see the Internet Archive Help Center.

A special thanks to Alex Thurman of Columbia University and Nicola Bingham of the British Library, the co-chairs of the IIPC CDG, and to other IIPC members participating in the project. Thanks as well to any and all public nominators assisting with identifying and archiving records about this significant global event.