How the Internet Archive is Ensuring Permanent Access to Open Access Journal Articles

Internet Archive has archived and identified 9 million open access journal articles, but the next 5 million are getting harder

Open Access journals, such as New Theology Review (ISSN: 0896-4297) and Open Journal of Hematology (ISSN: 2075-907X), made their research articles available for free online for years. With a quick click or a simple query, students anywhere in the world could access their articles, and diligent Wikipedia editors could verify facts against original articles on vitamin deficiency and blood donation.  

But some journals, such as these titles, are no longer available from their publishers’ websites, and are only available through the Internet Archive’s Wayback Machine. Since 2017, the Internet Archive has joined others in concentrating on archiving all scholarly literature and making it permanently accessible.

The World Wide Web has made it easier than ever for scholars to collaborate, debate, and share their research. Unfortunately, the structure of today’s web means that content can disappear just as easily: as of today the official publisher websites and DOI redirects for both of the above journals go nowhere or have been replaced with unrelated content.
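One way a reader might check whether a vanished page survives in the Wayback Machine is to query its public availability endpoint. The sketch below is illustrative only: it assumes the endpoint at `https://archive.org/wayback/available` still returns JSON with an `archived_snapshots.closest` object, and the function names and sample payload are hypothetical rather than anything the Internet Archive team describes using.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint (assumed unchanged).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(page_url, timestamp=None):
    """Build the availability query URL for a page, optionally near a timestamp (YYYYMMDD...)."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return the URL of the closest available capture, or None if there is none."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(page_url, timestamp=None):
    """Perform the network request; not called here so the sketch runs offline."""
    with urlopen(build_query(page_url, timestamp), timeout=30) as response:
        return closest_snapshot(json.load(response))


# Hypothetical response, shaped like the endpoint's documented output.
sample = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20170101000000/http://example.com/",
            "timestamp": "20170101000000",
            "status": "200",
        }
    }
}
print(closest_snapshot(sample))
print(closest_snapshot({"archived_snapshots": {}}))  # no capture found
```

Calling `lookup("example.com")` would run the same check against the live service. An empty `archived_snapshots` object means no capture was found for that URL.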


Wayback Machine captures of Open Access journals now “vanished” from publisher websites

Vigilant librarians saw this problem coming decades ago, when the print-to-digital migration was getting started. They insisted that commercial publishers work with contract digital preservation organizations (such as Portico, LOCKSS, and CLOCKSS) to ensure long-term access to expensive journal subscription content. Efforts have been made to preserve open articles as well, such as Public Knowledge Project’s Private LOCKSS Network for OJS journals and national hosting platforms like the SciELO network. But a portion of all scholarly articles continues to fall through the cracks.

Researchers found that 176 open access journals have already vanished from their publishers’ websites over the past two decades, according to a recent preprint article by Mikael Laakso, Lisa Matthias, and Najko Jahn. These periodicals were from all regions of the world and represented all major disciplines: sciences, humanities and social sciences. There are over 14,000 open access journals indexed by the Directory of Open Access Journals, and the paper suggests another 900 of those are inactive and at risk of disappearing. The preprint has struck a nerve, receiving news coverage in Nature and Science.

In 2017, with funding support from the Andrew W. Mellon Foundation and the Kahle/Austin Foundation, the Internet Archive launched a project to preserve all publicly accessible research documents, with particular emphasis on open access materials. Our first job was to quantify the scale of the problem.

Monitoring independent preservation of Open Access journal articles published from 1996 through 2019. Categories are defined in the article text.

Of the 14.8 million known open access articles published since 1996, the Internet Archive has archived, identified, and made available through the Wayback Machine 9.1 million of them (“bright” green in the chart above). In the jargon of Open Access, we are counting only “gold” and “hybrid” articles, which we expect to be available directly from the publisher, as opposed to preprints such as those in arxiv.org or institutional repositories. Another 3.2 million are believed to be preserved by one or more contracted preservation organizations, based on records kept by Keepers Registry (“dark” olive in the chart). These copies are not intended to be accessible to anybody unless the publisher’s copy becomes inaccessible, in which case they are “triggered” and become accessible.

This leaves at least 2.4 million Open Access articles at risk of vanishing from the web (“None”, red in the chart). While many of these are still on publishers’ websites, they have proven difficult to archive.
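To put those counts in proportion, the short calculation below converts the three categories into shares of the 14.8 million total. It uses only the rounded figures quoted in the text; the category keys are illustrative labels, and treating the leftover 0.1 million as a rounding gap is an assumption rather than something the article states.

```python
# Figures quoted in the text, in millions of open access articles.
TOTAL_KNOWN = 14.8
CATEGORIES = {
    "bright": 9.1,  # archived and accessible through the Wayback Machine
    "dark": 3.2,    # held by contracted preservation organizations
    "none": 2.4,    # no known independent copy
}


def coverage_shares(total, categories):
    """Return each category's share of the total as a percentage, rounded to one decimal."""
    return {name: round(100 * count / total, 1) for name, count in categories.items()}


shares = coverage_shares(TOTAL_KNOWN, CATEGORIES)
# Difference between the stated total and the sum of the rounded categories
# (assumed here to be a rounding artifact).
unaccounted = round(TOTAL_KNOWN - sum(CATEGORIES.values()), 1)

for name, share in shares.items():
    print(f"{name}: {share}%")
print(f"unaccounted (millions): {unaccounted}")
```

On these rounded inputs, roughly 61.5% of known articles are openly archived, 21.6% are held in dark archives, and 16.2% have no known independent copy.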

One of our goals is to archive as many of the articles on the open web as we can, and to keep up with the growing stream of new articles published every day. Another is to look back over the vast petabytes of web content in the Wayback Machine, back to 1996, and find any content that we might already have but that is not easily discoverable. Both of these projects are amenable to software automation, but they are made more difficult by the evolving nature of HTML and PDFs and their diverse character sets and encodings. To that end, we have approached this project not just as a technical one, but also as a collaborative one that aims to add another piece to the distributed infrastructure supporting open scholarship.

To expand our reach, we built an editable catalog (https://fatcat.wiki) with an open API to allow anybody to contribute. As the software is free and open source, as is the data, we invite others to reuse and link to the content we have archived. We have also indexed and made searchable much of the literature to help manage our work and to help others find out whether we have archived particular articles. We want to make scholarly material permanently available, and available in new ways, including via large datasets for analysis and “meta research.”

We also want to acknowledge the many partnerships and collaborations that have supported this work, many of which are key parts of the open scholarly infrastructure, including ISSN, DOAJ, LOCKSS, Unpaywall, Semantic Scholar, CiteSeerX, Crossref, DataCite, and many others. We also want to acknowledge the many Internet Archive staff and volunteers who have contributed to this work, including Bryan Newbold, Martin Czygan, Paul Baclace, Jefferson Bailey, Kenji Nagahashi, David Rosenthal, Victoria Reich, Ellen Spertus, and others.

If you would like to participate in this project, please contact the Internet Archive at webservices@archive.org.

When An Island Shuts Down: Aruba & the National Emergency Library

The island nation of Aruba, population 110,000 and part of the Kingdom of the Netherlands, lies 18 miles north of Venezuela.

On March 15, the small island nation of Aruba, part of the Dutch Caribbean, closed its borders to visitors. Cruise ships packed with tourists stopped coming. Casinos, libraries and schools shut their doors, as Aruba’s 110,000 residents locked down to halt the spread of COVID-19.

That’s when the Biblioteca Nacional Aruba (National Library of Aruba) swung into action. 

Librarians quickly gathered reading lists from students, parents and schools. With high school graduation exams just a month away, the required literature books would be crucial. Aruban students are tested on books in Dutch, English, Spanish and their native language of Papiamento. “Just before your literary final exams, you need to re-read the books,” explained Peter Scholing, who leads digitization efforts at the National Library of Aruba. “The libraries are closed. Your school libraries are closed. You can order from Amazon, but it takes weeks and weeks to arrive. If you are in an emergency, then you hope your books are online.”

Peter Scholing of the National Library of Aruba also works with UNESCO, preserving cultural heritage

Scholing was relieved to discover that most of the required literature in English and Spanish was available in the Internet Archive’s National Emergency Library. As library staff moved to work from home, they grabbed the tools to digitize the books in Papiamento that were missing. Many local authors were easy to track down, and most gladly gave permission for free downloads or lending of their works. Scholing reports, “Some of them choose digital lending. But a lot of them say, ‘Well it was a limited print run… I’ve sold all the copies of my books, now you can just make it available for download.’”

Preservation Pays Off

Classroom in Aruba, 1944, filled with children of expatriates working in the oil refineries.

For many years, the library’s small Special Collections staff had been diligently digitizing key collections: photographs, historic texts, newspapers, and perhaps the world’s largest collection of texts in Papiamento. But with few technical resources, the National Library of Aruba had no way to provide access to those works. Scholing says the Internet Archive proved to be the “missing link.” In March 2019, the Library was able to unveil its new Digital Collection, 18,800 texts, videos and audio now accessible to the world on archive.org. Today, with libraries and schools closed, these materials are the keys to unlocking the doors to online learning.

“We didn’t imagine something like the Covid crisis could happen,” said Scholing. “But for our preservation efforts, this is the Big One. We are really lucky to be able to provide access to information that we couldn’t otherwise without the Internet Archive.”

This Papiamento literary journal is among the 18,800 items now online thanks to the Biblioteca Nacional Aruba

When Waitlists Won’t Work

Novels, biographies and non-fiction titles in Papiamento are part of the Aruban curriculum and now many are accessible online

Although Scholing had permission from the authors to lend their recent books, several times we accidentally reinstituted the waiting list, since the National Emergency Library does not include books from the last five years. That meant students reading a work would suddenly have to wait, sometimes for weeks, to move up the waiting list. Scholing wrote to us immediately: “There must be an alternative. I’m getting emails from students and teachers already.”

Eventually we worked out the kinks so Aruba’s books in the National Emergency Library wouldn’t get taken down. In addition, hundreds of texts in Papiamento from 1844-2020 are now available without a waitlist. It’s part of a bigger vision on the island to teach students to read and write the language they speak at a higher level. “A lot of textbooks come straight from the Netherlands… you are reading about snow, trains and windmills,” Scholing explained. “It’s better to use something from a newspaper or magazine produced locally… It’s their own context. It speaks more to them.”

He even received this note from a local author, written in Papiamento:

Peter aprecia, (Dear Peter,)

Hopi admiracion pa e trabou cu bo ta desplegando pa Aruba y nos hendenan.

(A lot of admiration for the work that you are carrying out for Aruba and for our people.)

This week, schools in Aruba are scheduled to reopen. Since March, the library has tripled the number of items in its digital collection, and visitors have increased by 300%. Scholing sees this as evidence that the National Emergency Library will have lasting benefit. “All the thresholds and barriers to access this unique information have been lifted, once you put it online.”

You can now access newspapers, photos, maps, government publications, literature and rare books from Aruba in their collection at the Internet Archive.