Internet Archive Launches New Pilot Program for Interlibrary Loan

Photo by Alfons Morales on Unsplash

The pandemic has resulted in a renewed focus on resource sharing among libraries. In addition to joining resource sharing organizations like the Boston Library Consortium, the Internet Archive has started to participate in the longstanding library practice of interlibrary loan (ILL). 

Internet Archive is now making two million monographs and three thousand periodicals in its physical collections available for non-returnable fulfillment through a pilot program with RapidILL, a prominent ILL coordination service. To date, more than seventy libraries have added the Internet Archive to their reciprocal lending list, and Internet Archive staff are responding to, on average, twenty ILL requests a day. If your library would like to join our pilot in Rapid, please reach out to Mike Richins at Mike.Richins@exlibrisgroup.com and request that Internet Archive be added to your library’s reciprocal lending list.

If there are other resource sharing efforts that we should investigate as we pilot our ILL service, please reach out to Brewster Kahle at brewster@archive.org.

Introducing 50+ New Public Library Members of the Internet Archive’s Community Webs Program

The Internet Archive’s Community Webs Program provides training and education, infrastructure and services, and professional community cultivation for public librarians across the country to document their local history and the lives of their patrons. Following our recent announcement of the program’s national expansion, with support from the Andrew W. Mellon Foundation, we are excited to welcome the first class of 50+ new public libraries to the program. This brings the current number of new and returning Community Webs participants to 90+ libraries from 33 states and 3 US territories. This diverse group of organizations includes multiple state libraries representing their regions, as well as a mix of large metropolitan library systems, small libraries in rural areas, and libraries like the Feleti Barstow Public Library in American Samoa. All will be working to document their communities, with a particular focus on archiving materials from traditionally underrepresented groups.

The new cohort class kicked off with virtual introductory events in mid-March, where participants met one another and shared stories about their communities and their goals for preserving and providing access to local history materials. Member libraries are currently receiving training in topics such as collection development and starting to build digital collections that reflect local diversity, events, and culture.

Program participant Kathleen Pickering, Director of the Belen Public Library and Harvey House Museum in Belen, New Mexico, notes that their library “is committed to free and open-source electronic resources for our patrons, especially given the low-income status of many of our residents” and Community Webs will help further that goal. Similarly, new cohort member Aaron Ramirez of Pueblo City-County Library District (PCCLD) found Community Webs to be a great fit for existing institutional goals and initiatives. “PCCLD’s five-year strategic plan directs us to embrace local cultures, to include individuals of all skill levels and physical abilities, and to enrich established partnerships and collaborations. The groups that have not seen themselves in our archives will find through this project PCCLD’s intention and means to listen and go forward as allies and as a resource of support, rather than an institution serving only the affluent.”

Makiba J. Foster

Makiba J. Foster, Manager of The African American Research Library and Cultural Center of Broward County, Florida, pointed out that “as content becomes increasingly digital, we need this opportunity to document the digital life and content of our community which includes a diverse representation of the Black Diaspora.” Makiba was a member of the original Community Webs cohort in a previous position at the Schomburg Center for Research in Black Culture at New York Public Library, and recently presented on her work archiving the Black Diaspora to a group of more than 200 attendees.

The Community Webs Program is continuing to grow towards the milestone of over 150 participating libraries across the United States and will soon announce another call for applicants for a U.S. cohort starting in late summer. The program also is beginning to expand internationally, starting in Canada, exploring the addition of other types of libraries and cultural heritage organizations, and expanding its suite of training and services available to participants. Expect more news on these initiatives soon. 

Welcome to our new cohort of Community Webs libraries! The full list of new members: 

  • Alamogordo Public Library (New Mexico)
  • Amelia Island Museum of History (Florida)
  • ART | library deco (Texas)
  • Asbury Park Public Library (New Jersey)
  • Atlanta History Center (Georgia)
  • Bartholomew County Public Library (Indiana)
  • Bedford Public Library System (Virginia)
  • Belen Public Library and Harvey House Museum (New Mexico)
  • Bensenville Community Public Library (Illinois)
  • Biblioteca Municipal Aurea M. Pérez (Puerto Rico)
  • Carbondale Public Library (Illinois)
  • Cedar Mill & Bethany Community Libraries (Oregon)
  • Charlotte Mecklenburg Library (North Carolina)
  • Chicago Public Library (Illinois)
  • City Archives & Special Collections, New Orleans Public Library (Louisiana)
  • Dayton Metro Library (Ohio)
  • Elba Public Library (Alabama)
  • Essex Library Association (Connecticut)
  • Everett Public Library (Washington)
  • Feleti Barstow Public Library (American Samoa)
  • Forsyth County Public Library (North Carolina)
  • Hartford History Center, Hartford Public Library (Connecticut)
  • Heritage Public Library (Virginia)
  • Huntsville-Madison County Public Library (Alabama)
  • James Blackstone Memorial Library (Connecticut)
  • Jefferson Parish Library (Louisiana)
  • Jefferson-Madison Regional Library (Virginia)
  • Laramie County Library System (Wyoming)
  • Lawrence Public Library (Massachusetts)
  • Los Angeles Public Library (California)
  • Mill Valley Public Library, Lucretia Little History Room (California)
  • Missoula Public Library (Montana)
  • Niagara Falls Public Library (New York)
  • Pueblo City-County Library District (Colorado)
  • Rochester Public Library (New York)
  • Santa Cruz Public Libraries (California)
  • South Pasadena Public Library (California)
  • State Library of Pennsylvania (Pennsylvania)
  • Tangipahoa Parish Library (Louisiana)
  • The African American Research Library and Cultural Center (Florida)
  • The Ferguson Library (Connecticut)
  • Three Rivers Public Library District (Illinois)
  • Virginia Beach Public Library (Virginia)
  • Waltham Public Library (Massachusetts)
  • Watsonville Public Library (California)
  • West Virginia Library Commission (West Virginia)
  • William B Harlan Memorial Library (Kentucky)
  • Worcester Public Library (Massachusetts)
  • Your Heritage Matters (North Carolina)

A New Short Film Gives Us a Poetic Look at the Internet Archive

What remains of the initial hope that digitization and Internet technology can contribute to human emancipation and a more just future? Today, surveillance scandals, dominance by a few mega-corporations, and hollow egocentricity increasingly dominate our perception of the digital world. But these negative trends are challenged by independent actors who vehemently defend the early dream of a free Internet. I believe the Internet Archive is one of the important institutions in this fight.

Recently, we had the pleasure of hosting two amazing emerging artists who created a work of art with these ideals in mind. Thomas Georg Blank from Germany, and Işık Kaya from Turkey are an artistic duo who spent several days at our San Francisco headquarters creating their own archive of visual and sound recordings. Blank and Kaya bring together text, video, and audio fragments to form a composition showing that, in the right hands, the Internet does not have to become an instrument of surveillance and control, but, on the contrary, can be graceful and divine.

Their short film, When looking at stones i get sucked into deep time, when looking at my harddrive i’m afraid that it will break, poetically interprets the Internet Archive’s headquarters in San Francisco.

The film is available to watch on archive.org.

More About the Artists

Thomas Georg Blank, born 1990 in Germany, was first trained in cultural and media education focusing on photography before studying Visual Arts in Karlsruhe and Mexico City. He currently lives in Darmstadt and San Diego and has participated in exhibitions in galleries and museums, including Hek Basel, Historisches Museum Frankfurt, Kunsthalle Darmstadt, Blue Star Contemporary and C/O Berlin. His works have won many awards and he has been a scholar of DAAD at University of California San Diego’s Center for Human Imagination.

Moving between research and speculative interpretations, Blank explores how spatial and habitual representations of individual and collective imagination affect the world we are living in, and vice versa. By creating multidirectional, spatial narratives he offers spectators a space to reconfigure and change their perspectives.

www.thomasgeorgblank.de

@thomas_g_blank

Işık Kaya was born in Turkey and currently lives in the USA, where she is pursuing an MFA in Visual Arts at the University of California, San Diego. In recent years, her work has been featured internationally in art institutions and has been shortlisted for and won awards in many competitions and festivals. She holds a BA degree in Photography and Videography from Bilgi University and worked for major art galleries, museums, and publications in Istanbul before moving to California.

Space plays a crucial role in both the practice and thinking of Işık Kaya. Her lens-based practice explores the ways in which humans shape contemporary landscape. In her work, she focuses on traces of economic infrastructures to examine power dynamics in built environments. By framing her subjects exclusively at night, she accentuates the artificial and uncanny qualities of urban landscapes.

www.isikkaya.com

@ayakkisi

The Librarian’s Copyright Companion Goes Open Access

As a law librarian and author, Ben Keele wants to share his expertise on copyright with as many people as possible.

His book, The Librarian’s Copyright Companion, 2nd edition (William S. Hein, 2012), coauthored with James Heller and Paul Hellyer, covers restrictions on use of copyrighted materials, library exemptions, fair use, and licensing issues for digital media.  (Heller wrote the first edition in 2004.) The authors recently regained rights to the book in order to make it open access. So after years of being available through controlled digital lending (CDL) at the Internet Archive, the book is now available under the Creative Commons Attribution license (CC BY 4.0), which means that anyone is free to share and adapt the work, as long as they provide attribution, link to the license, and indicate if changes were made.

“Nearly 10 years had passed. It’s probably been commercially exploited to the point that it will be,” Keele said. “This is what I would suggest to any faculty member. It’s sold what it will, and the publisher got the money it deserved, so we asked for the copyright back.”

To arrange the transfer of rights, Keele followed the Authors Alliance’s advice. The California-based nonprofit provided a guide to rights reversions that he said made the process smooth, requiring simple signatures from all parties. His publisher, William S. Hein & Co., was in agreement, as long as the authors were willing to give it the right of first refusal for a 3rd edition.

The Librarian’s Copyright Companion, 2nd Edition, now available via CC BY license.

Keele said he believes copyright is overly protective and he would advise others to do the same and make their works openly available.

“In academia, the currency is attention,” Keele said. “For me, it’s a very small statement. Copyright did for me what it needed to do: it provided an incentive for the publisher to be willing to market and produce the book. I think we achieved the monetary value we were looking for. At that point, I feel like the bargain that I’m getting from copyright has been fulfilled. We don’t need to wait until 70 years after I die for people to be able to read it freely.”

To balance the pervasive messaging from publishers about authors’ rights, this book emphasizes the aspect of copyright law that favors users’ interests, said coauthor Paul Hellyer, reference librarian at William & Mary Law Library.

“There aren’t many people who are advocating for users’ rights and a more robust interpretation of fair use,” Hellyer said. “Librarians are one of the few groups of people who can do that in an organized way. That was our main motivation for writing this book. With that in mind, we are very excited to now have an open source book that anyone can just download. That’s very much in line with our view of how we should think about copyright protection—it should be for a limited period.”

The authors have also uploaded the book into the institutional repositories at their home institutions, where it is also being offered for free.

Keele has long been a fan of the Internet Archive. In his work as a librarian at the Indiana University Robert H. McKinney School of Law, he often uses the Wayback Machine to verify citations and check to see how websites have changed over time—frequently saving him research time. He says he was pleased to be able to contribute his work to the Internet Archive to be accessible more broadly.

Added Keele: “There’s so much bad information out there that’s free. Having some good information that is also free, I think is important.”

Internet Archive Joins Boston Library Consortium

Cross-posted from the Boston Library Consortium web site.

The Boston Library Consortium (BLC) has welcomed the Internet Archive as its newest affiliate member – joining 19 other libraries in the BLC’s network working on innovative solutions that enrich the creation, dissemination and preservation of knowledge.  

The Internet Archive, the non-profit library which celebrates its 25th anniversary this year, has large physical, born-digital and digitized collections serving a global user base. The Internet Archive’s history with the BLC goes back to the formation of the Open Content Alliance, through which the member libraries committed $845,000 to begin digitizing out-of-copyright books from their collections in 2007.

As part of the affiliate membership, the Internet Archive will participate in many of the BLC’s programs, including the consortium’s membership communities and professional development initiatives. The BLC will also pilot an expansion of its resource sharing program, allowing faculty, students, and scholars across the membership to tap into the Internet Archive’s vast digital collection through inter-library lending of non-returnables.

“Resource sharing is core to the mission and purpose of the Boston Library Consortium,” said Anne Langley, president of the BLC and dean of the UConn Library. “We are enthusiastic about leveraging our shared expertise to mobilize the digital collections that the Internet Archive stewards.”

For Brewster Kahle, founder and digital librarian of the Internet Archive, this membership builds on a longstanding partnership with the BLC. “We love the BLC and its libraries,” said Kahle. “We’ve been working with the BLC and its member libraries as we have digitized our collections for more than ten years. Being welcomed into the consortium will enable further and closer collaboration between this forward-looking collective of libraries.”

Charlie Barlow, executive director of the BLC, who worked to bring the Internet Archive into the consortium, said the BLC recognizes the value of extending its reach. “The BLC is thinking about new mechanisms upon which we can share knowledge,” said Barlow. “The events of the past year only reinforced our belief that the more we can draw on digital resources, the more effectively we can serve our membership and the scholarly community.”

About the Boston Library Consortium

Founded in 1970, the BLC is an academic library consortium serving public and private universities, liberal arts colleges, state and special research libraries in New England. The BLC members collaborate to deliver innovative and cost-effective sharing of print and digital content, professional development initiatives, and projects across a wide range of library practice areas.

About the Internet Archive

The Internet Archive is one of the largest libraries in the world and home of the Wayback Machine, a repository of 475 billion web pages. Founded in 1996 by Internet Hall of Fame member Brewster Kahle, the Internet Archive now serves more than 1.5 million patrons each day, providing access to 70+ petabytes of data—books, web pages, music, television and software—and working with more than 800 library and university partners to create a digital library, accessible to all.

Welcome to the Webspace Jam

It stood as either a memorial, embarrassment or in-joke: the promotional website for the 1996 film Space Jam, a comedy-action-sports film starring Michael Jordan and the Warner Brothers Looney Tunes characters.

Created at a time when the exact relevance of websites in the spectrum of mass media promotion was still being worked out, www.spacejam.com held many of the fashionable attributes of a site in 1996: an image map that you could click on, a repeating star background, and a screen resolution that years of advancement have long left in the dust. The limits of HTML coding and computer power were pushed as far as they could go. The intended audience was a group of people primarily using dial-up modems and single-threaded browsers to connect to what was still called The Information Superhighway.

By all rights, the Space Jam site should have died back in the 1990s, lost in the shifting sands of pop culture attention and flashier sites arriving with each passing day.

But it didn’t die, go offline or get replaced with a domain hosting advertisement or a 404.

Unlike a lot of websites from the 1990s, the Space Jam movie site simply didn’t change.

It persisted.

Just as every city seems to have that one bar or restaurant that can trace itself back for over a century, this one website became known, to people who looked for it, as a strange exception – unchanging, unshifting, with someone paying for the hosting and advertising a movie that, while a lot of fun, was not necessarily an Oscar-winning cinematic experience. You could go to the site and be instantly transported back to a World Wide Web that in many ways felt like ancient history, absolutely gone.

Years turned into decades.

For those in the know and who paid close attention to this odd online relic, the real mystery was that the site was not actually static: someone was making modifications to the code of the website, the settings and web hosting, to jump past several notable shifts in how websites work, to ensure that deprecated features and unaccounted browser issues were handled. That costs money; that’s the work of people. Somehow, this silly movie site represented the held-out flame that with a small bit of care and dedication, a website could live forever, like we were once promised.

It wasn’t just a clickable brochure – it became a beacon in the dark, a touchstone for some who were just children when the World Wide Web was started, and who grew up with this online world, which has shifted and consolidated and closed and tracked us.

Then the unthinkable happened.

In 2021, the sequel arrived.

It is abundantly clear that the abnormally long life of the original 1996 site helped see the sequel through the endless mazes and corridors of Hollywood development turnaround.

Because websites and online presence are the way that movies are now promoted, the very place that spawned this consistent brand through decades had to go. A new Space Jam site was created, using the www.spacejam.com domain.

In a nod to its beginnings, the 1996 website still exists, shoved into a back room; adding /1996 to the URL will give you the old site as it used to appear before this year, and a small note in the corner lets you know you could optionally visit this once-dependable hangout.

But now the site is broken.

Links from around the net to the Space Jam site, to specific sub-pages and specific images, now break. A browser arriving at the spacejam.com page from a link elsewhere will see Just Another Movie Promotion Site, utilizing all the current fads: layered windows to YouTube videos (which will break), JavaScript calls (which will break) and a dedication to being as flashy, generically designed and film-promoting as literally any other movie site currently up. Links that worked for decades have been cast aside for the spotlight of the moment.

The word is disposable.

There’s still one place you can see the old site, as it was once arranged, though.

The same year the Space Jam movie and website arrived, another website started: The Internet Archive.

Unlike Space Jam, the Internet Archive’s site did change constantly. You can use the Wayback Machine to see all the changes as they came and went; more than half a million captures of archive.org have been made.

We have changed across the last 25 years, but we also have not.

The ideas that the Web should keep URLs running, that the interdependent linking and reference cooked into it from day one should be a last-resort change, and that the experience of online should be one of flow and not of constant interruptions, still live here.

Hundreds of webpages that have also survived since the time of Space Jam are inside the stacks of the Wayback Machine, some of them still running, and still looking unchanged since those heady days of promises and online wishes.

And if the unthinkable happens to them, we’ll be ready.
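For readers who want to look up an old page such as www.spacejam.com themselves, here is a minimal Python sketch of how Wayback Machine links are commonly put together. It assumes the public replay pattern `https://web.archive.org/web/<timestamp>/<url>` and the Wayback availability endpoint at `https://archive.org/wayback/available`; the function names and the sample response are illustrative only and are not taken from this post.

```python
import json
from urllib.parse import urlencode


def wayback_url(url, timestamp):
    """Build a replay link for the capture closest to `timestamp`.

    `timestamp` is a string of digits in YYYYMMDDhhmmss order; this sketch
    assumes a shorter prefix such as "1996" is also accepted.
    """
    return f"https://web.archive.org/web/{timestamp}/{url}"


def availability_query(url, timestamp):
    """Build a query URL for the Wayback availability endpoint (assumed)."""
    query = urlencode({"url": url, "timestamp": timestamp})
    return "https://archive.org/wayback/available?" + query


def closest_snapshot(response_text):
    """Return the closest capture URL from an availability response, or None."""
    snapshots = json.loads(response_text).get("archived_snapshots", {})
    closest = snapshots.get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Hypothetical response body, shaped like the JSON the endpoint is assumed to return.
SAMPLE_RESPONSE = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/19970101000000/http://example.com/",
            "timestamp": "19970101000000",
            "status": "200",
        }
    }
})

print(wayback_url("www.spacejam.com", "1996"))
print(availability_query("www.spacejam.com", "1996"))
print(closest_snapshot(SAMPLE_RESPONSE))
```

The sketch makes no network calls. Fetching the query URL with any HTTP client and passing the body to `closest_snapshot` would be the natural next step.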




Filecoin Foundation Grants 50,000 FIL to the Internet Archive

Amidst the speculative boom for NFTs and crypto-currencies, one decentralized technology foundation is taking the long view by investing in deep history and the far future. 

Today, the Filecoin Foundation announced a 50,000 FIL grant to the Internet Archive – the largest single donation in the digital library’s 25-year history. 

“Holy Crow! This is a big deal,” said Brewster Kahle, the Internet Archive’s founder. “And what are we going to do with it? We’re going to invest it in making the Internet Archive more decentralized, so that our digital history is available from thousands of computers, not just a few. The idea is to make a robust and private Internet that has a history that will persist over decades and maybe centuries.”

Filecoin is a decentralized storage system designed to preserve humanity’s most important information. The creators of Filecoin envisioned an independent foundation that would serve as the long-term governance body for the Filecoin ecosystem. In awarding the grant to the Internet Archive, Filecoin Foundation board chair, Marta Belcher, stressed the two organizations’ “common goal of preserving the web and fostering its future.”

It was back in 2015 that Protocol Labs’ founder, Juan Benet, first visited the Internet Archive, to share his vision for an academic conference dedicated to preserving “humanity’s greatest treasures using decentralized storage.” Building on these conversations, the Internet Archive organized the Decentralized Web Summit in 2016 in San Francisco, the first gathering of its kind. Back then, a decentralized web was mostly a concept, with little working code.

Decentralized technologists, Trent McConaghy of Ocean and Juan Benet of Protocol Labs at the 2016 Decentralized Web Summit at the Internet Archive in San Francisco.

Since 2016, the Internet Archive has worked with several decentralized tech startups to create a decentralized prototype of the digital library. And when the Filecoin mainnet took off in 2020, public domain audiobooks and films from the Internet Archive were stored on Filecoin servers. Together, the two organizations created the Filecoin Archives, a community-led project to curate, disseminate and preserve important open-access information that is often at risk of being lost.

“It’s wonderful to see Filecoin come of age. We started six years ago by putting out a call to make a Decentralized Web, a web that would serve us better than the current web–one that is now starting to be dominated by just a few tech behemoths. Can we make a game with many winners?” asked Kahle. “Filecoin has made a huge step forward by deploying decentralized storage at the exabyte level. That’s very different from AWS (Amazon Web Services). It has many participants, not just one player. And its protocols are open-source. We want to see more technologies like this. This was the original vision of the Decentralized Web that the Internet Archive was hoping for five, six years ago. And it’s starting to come to fruition and Filecoin is a leader in that area.”

Although purveyors of cryptocurrencies are often accused of being driven only by short-term gain, in this group Kahle sees a different motivation. “This donation by the Filecoin Foundation is significant financially for the Internet Archive, but I’d say it’s a more interesting one than that,” said the Internet Hall of Fame engineer. “It’s a donation by a new generation of technologists that are building interesting new technologies…bringing the Archive along with it to make it so that history is preserved –that the Internet Archive makes it into this next generation. That is an interesting thing! You don’t often see that. But the Filecoin Foundation, Filecoin and IPFS, and Juan Benet himself have always been interested in preserving history and how history can be woven into the present and the future of these technologies.”

Author and Open Source Advocate VM Brasseur: Internet Archive ‘Legitimately Useful’ for Lending and Preservation of Her Work

In her 20-year career in the tech industry, VM (Vicky) Brasseur has championed the use of free and open source software (FOSS). She hails it as good for businesses and the community, writing and presenting extensively about its merits.

VM Brasseur, Raleigh, North Carolina, 2018. Credit: Peter Adams Photography

To spread the word, Brasseur has made her book, Forge Your Future With Open Source, available for borrowing through the Internet Archive. She’s also saved all of her blogs, articles, talks and slides in the Wayback Machine for preservation and access to anyone.  

“I do it to share the knowledge,” Brasseur said. “Uploading the resources to Internet Archive ensures that more people will be able to see it and will be able to see it forever.”

As soon as her book was published by The Pragmatic Programmers in 2018, Brasseur said she wanted to have it represented in the Internet Archive. She donated a copy so it could be available through Controlled Digital Lending (CDL).

“I think CDL is great. I love libraries,” Brasseur said. “To me, I don’t see how CDL is any different from walking into my local branch of the public library, picking up one of the copies that they have, going up to the circ desk, and taking it home. How is that different from the Internet Archive? They have one copy of my book and check it out one copy at a time. It just happens to be an e-book version. I, frankly, don’t see the material difference.”

A supporter of the Internet Archive since its inception, Brasseur says she’s a regular user of the Wayback Machine. It’s been useful for her to be able to do research and for others to find her body of work. Recently, she revamped her blog and removed some pages—later getting a request from someone who wanted some of the deleted material. Brasseur provided a Wayback Machine link to where she’d stored them, making it easy for that person to find the missing pages. “It’s a gift. It’s legitimately useful,” she said. “Having the Wayback means that other people can still have access” to materials she no longer has on her website.

Borrow the book through the Internet Archive, or purchase a copy for your own library.

Brasseur has led software development departments and teams, providing technical management and strategic consulting for businesses, and helping companies understand and implement FOSS. She says she wrote her book not just for programmers; it is intended to be inclusive and for anyone interested in FOSS, including technical writers, designers, project managers, those involved in security issues, and people in all other roles in the software development process.

In the book, she helps walk readers through why they might want to contribute to FOSS and how to best embrace the practices involved. The book has been positively received and was #1 on the BookAuthority list of 18 Best New Software Development Books To Read In 2018. Recently, it has been picked up by people transitioning to telecommuting and looking for resources for doing collaborative work.

“Obviously, I do want people to buy the book, but I’m also strongly pro library, as most intelligent publishers are. My publisher is a big fan of making sure that their books are available in libraries,” Brasseur said. “So the Internet Archive is a library that anyone can access all over the world. And it just makes it a lot easier to make sure that the book gets in the hands of people.”

Brasseur is committed to helping people contribute to open source; for people who can’t afford to buy the book, checking it out from the library is an alternative. “If they can get a copy from Internet Archive, then they can learn how to contribute and they can make a difference from wherever they are in the world. Nigeria, Thailand, Netherlands, or Montana. You don’t have to worry if your local library has it,” she said. “In these times, in particular, it’s very difficult to get to your library. This is a great service that the Internet Archive is providing.”


Forge Your Future with Open Source by VM Brasseur is available for purchase through a variety of retailers and local book stores.

Early Web Datasets & Researcher Opportunities

In July, we announced our partnership with the Archives Unleashed project as part of our ongoing effort to make new services available for scholars and students to study the archived web. Joining the curatorial power of our Archive-It service, our work supporting text and data mining, and Archives Unleashed’s in-browser analysis tools will open up new opportunities for understanding the petabyte-scale volume of historical records in web archives.

As part of our partnership, we are releasing a series of publicly available datasets created from archived web collections. Alongside these efforts, the project is also launching a Cohort Program providing funding and technical support for research teams interested in studying web archive collections. These twin efforts aim to help build the infrastructure and services to allow more researchers to leverage web archives in their scholarly work. More details on the new public datasets and the Cohort Program are below.

Early Web Datasets

The first in our series of public datasets from the web collections is oriented around the theme of the early web. These are, of course, datasets intended for data mining and for researchers using computational tools to study large amounts of data, so they lack the informational or nostalgic value of looking at archived webpages in the Wayback Machine. If the latter is more your interest, here is an archived GeoCities page with unicorn GIFs.

GeoCities Collection (1994–2009)

As one of the first platforms for creating web pages without technical expertise, GeoCities lowered the barrier to entry for a new generation of website creators. GeoCities displayed at least 38 million pages before it was terminated by Yahoo! in 2009. This dataset collection contains a number of individual datasets that include data such as domain counts, image graph and web graph data, and binary file information for a variety of file formats, including audio, video, text, and image files. A GraphML file is also available for the domain graph.

GeoCities Dataset Collection: https://archive.org/details/geocitiesdatasets
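
For readers new to GraphML, here is a minimal sketch of how a domain graph in that format might be explored with Python's standard library. The inline sample, the function name `inbound_link_counts`, and the assumption that node ids are domain names are all hypothetical; the real file in the collection may be laid out differently (for example, with domains stored in `data` elements), so treat this as a starting point rather than a description of the dataset.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard GraphML XML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Tiny made-up sample standing in for a real domain graph file.
# Assumption: each node id is a domain name and each edge is a link.
SAMPLE_GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="geocities.com"/>
    <node id="yahoo.com"/>
    <node id="example.org"/>
    <edge source="geocities.com" target="yahoo.com"/>
    <edge source="geocities.com" target="example.org"/>
    <edge source="example.org" target="yahoo.com"/>
  </graph>
</graphml>
"""


def inbound_link_counts(graphml_text):
    """Count how many edges point at each node in a GraphML document."""
    root = ET.fromstring(graphml_text)
    counts = Counter()
    for edge in root.iterfind(".//g:edge", NS):
        counts[edge.get("target")] += 1
    return counts


counts = inbound_link_counts(SAMPLE_GRAPHML)
for domain, n in counts.most_common():
    print(domain, n)
```

For a downloaded file, the same function could be given the file's contents as text. For very large graphs, a streaming parser such as `ET.iterparse` would likely be a better fit than loading the whole document at once.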

Friendster (2003–2015)

Friendster was an early and widely used social networking site where users were able to establish and maintain layers of shared connections with other users. This dataset collection contains graph files that allow data-driven research to explore how certain pages within Friendster linked to each other. It also contains a dataset that provides some basic metadata about the individual files within the archival collection.

Friendster Dataset Collection: https://archive.org/details/friendsterdatasets

Early Web Language Datasets (1996–1999)

These two related datasets were generated from the Internet Archive’s global web archive collection. The first dataset, “Parallel Language Records of the Early Web (1996–1999),” provides multilingual records, or URLs of websites that have the same text represented in multiple languages. Such multi-language text from websites is a rich source for parallel language corpora and can be valuable in machine translation. The second dataset, “Language Annotations of the Early Web (1996–1999),” is a metadata set that annotates the language of over four million websites using Compact Language Detector (CLD3).

Early Web Language collection: https://archive.org/details/earlywebdatasets

Archives Unleashed Cohort Program

Applications are now being accepted from research teams interested in performing computational analysis of web archive data. Five cohort teams of up to five members each will be selected to participate in the program from July 2021 to June 2022. Teams will:

  • Participate in cohort events, training, and support, with a closing event held at the Internet Archive in San Francisco, California, USA, tentatively in May 2022. Prior events will be virtual or in-person, depending on COVID-19 restrictions
  • Receive bi-monthly mentorship via support meetings with the Archives Unleashed team
  • Work in the Archive-It Research Cloud to generate custom datasets
  • Receive funding of $11,500 CAD to support project work. Additional support will be provided for travel to the Internet Archive event

Applications are due March 31, 2021. Please visit the Archives Unleashed Research Cohorts webpage for more details on the program and instructions on how to apply.

Milton Public Library Reaches Patrons Through Controlled Digital Lending

Leaders at the Milton Public Library (MPL) in Canada say they are continually questioning their operations and looking for ways to better serve their patrons. That’s why the Ontario institution joined the Internet Archive’s Open Libraries program.

“We are always keen to innovate, in meaningful ways,” said Mark Williams, MPL chief executive officer and chief librarian. “Why would we not want to be in this partnership that expands our collection, but also extends assets to other people’s collections in a digital realm? It was a no-brainer.”

Williams said that in deciding to become part of Open Libraries in September 2019, the library focused on the interests of the public rather than on concerns about publishers.

Mark Williams, Milton Public Library

“If it challenges the status quo for the benefit of readers, wherever those readers are, then I think we should engage,” Williams said.

As it happens, the timing of its membership was fortuitous. With COVID-19 disrupting access to the print collection at its branches, being part of Open Libraries meant broader access to digital materials for patrons quarantined at home.

MPL has been a central part of the Milton, Ontario, community since 1855, serving a population of more than 120,000 through three physical libraries and its website (with a bookmobile and four new branches in the pipeline over the next 10 years). Library services were forced to be flexible in the past year as health circumstances changed in the province.

The three MPL locations closed on March 17, 2020, under a state of emergency in Ontario. By May, a phased reopening allowed libraries to begin limited operations. During the state of emergency, librarians pivoted to providing access to services only through virtual interactions and the website was changed to focus on promoting electronic resources. As restrictions eased, MPL provided curbside, contactless pickup. Eventually, 50 to 100 patrons were allowed inside the buildings with safety protocols. The libraries had to close again when COVID-19 cases spiked in the winter, and then reopened in February.

“The staff have been remarkably agile and good at adapting their approach,” Williams said. “We’ve done the best we possibly could to ensure the public library services continued, but the way we deliver it is different than anyone would have expected.”

In addition to joining Open Libraries, MPL donated 30,000 books to the Internet Archive. Williams said the expanded access to content in the larger online library has been a boon to the public. MPL would have spread the word about access to Open Libraries regardless of the pandemic, he said, but that outreach was likely accelerated because there was no choice but to focus on digital offerings.

Milton Public Library

“The lockdown highlighted the ability for us to raise awareness about the partnership and introduce it to more patrons,” Williams said. MPL is creating a new portal on its website that will be dedicated to Open Libraries. In the meantime, it has been promoting the program’s availability, and the response has been positive.

“We’ve seen overwhelming demand,” Williams said. “Patrons think it’s a fantastic option for them to have increased materials than we currently have available.”

The transition to becoming part of the Open Libraries program was seamless, said Williams, and he’s encouraging other libraries to consider joining.

“I hope if other libraries sign up, they will be equally inspired by the partnership. The content is amazing,” Williams said. “Our patrons think it’s phenomenal. Our board thinks it’s a great idea, philosophically. Everyone believes this is an important service addition.”

To browse the books now available for lending through Milton Public Library’s participation in the Open Libraries program, please visit: https://archive.org/details/miltonpubliclibrary-ol. Learn how your library can participate in the Open Libraries program.