
Internet Archive Participates in DOAJ-Led Collaboration to Improve the Preservation of OA Journals

Since 2017, Internet Archive has pursued dedicated technical and partnership work to help preserve and provide perpetual access to open access scholarly literature and other outputs. See our original announcement related to this work and a recent update on progress. The official press release below announces an exciting new multi-institutional collaboration in this area.

The Directory of Open Access Journals (DOAJ), the CLOCKSS Archive, Internet Archive, Keepers Registry/ISSN International Centre and Public Knowledge Project (PKP) have agreed to partner to provide an alternative pathway for the preservation of small-scale, APC-free, Open Access journals.

The recent study authored by M. Laakso, L. Matthias, and N. Jahn has revived academia’s concern over the disappearance of the scholarly record disseminated in Open Access (OA) journals.

Their research focuses on OA journals at risk of vanishing, “especially small-scale and APC-free journals […] with limited financial resources” that often “opt for lightweight technical solutions” and “cannot afford to enroll in preservation schemes.” Using data available in the Directory of Open Access Journals, the authors conclude that just under half of the journals indexed in DOAJ participate in preservation schemes. Their findings “suggest that current approaches to digital preservation are successful in archiving content from larger journals and established publishing houses but leave behind those that are more at risk.” They call for new preservation initiatives “to develop alternative pathways […] better suited for smaller journals that operate without the support of large, professional publishers.”

Answering that call, the joint initiative proposed by the five organisations aims to offer an affordable archiving option to OA journals with no author fees (“diamond” OA) registered with DOAJ, and to raise awareness among the editors and publishers of these journals about the importance of enrolling with a preservation solution. DOAJ will act as a single interface with CLOCKSS, PKP and Internet Archive and facilitate a connection to these services for interested journals. Lars Bjørnshauge, DOAJ Managing Editor, said: “That this group of organisations are coming together to find a solution to the problem of “vanishing” journals is exciting. It comes as no surprise that journals with little to no funding are prone to disappearing. I am confident that we can make a real difference here.”

Reports regarding the effective preservation of the journals’ content will be aggregated by the ISSN International Centre (ISSN IC) and published in the Keepers Registry. Gaëlle Béquet, ISSN IC Director, commented: “As the operator of the Keepers Registry service, the ISSN International Centre receives inquiries from journal publishers looking for archiving solutions. This project is a new step in the development of our service to meet this need in a transparent and diverse way involving all our partners.”

About 50% of the journals identified by DOAJ as having no archiving solution in place use Open Journal Systems (OJS). Therefore, the initiative will also identify and encourage journals on PKP’s OJS platform to preserve their content in the PKP Preservation Network (PKP PN), or to use another supported solution if the OJS instance isn’t new enough to be compatible with the PN integration (OJS 3.1.2+).

The partners will then follow up by assessing the success and viability of the initiative with an aim to open it up to new archiving agencies and other groups of journals indexed in DOAJ to consolidate preservation actions and ensure service diversity.

DOAJ will act as the central hub where publishers will indicate that they want to participate. Archiving services, provided by CLOCKSS, Internet Archive and PKP, will expand their existing capacities. These agencies will report their metadata to the Keepers Registry to provide an overview of the archiving efforts.

Project partners are currently exploring business and financial sustainability models and outlining areas for technical collaboration.


DOAJ is a community-curated list of peer-reviewed, open access journals and aims to be the starting point for all information searches for quality, peer-reviewed open access material. DOAJ’s mission is to increase the visibility, accessibility, reputation, usage and impact of quality, peer-reviewed, open access scholarly research journals globally, regardless of discipline, geography or language. DOAJ will work with editors, publishers and journal owners to help them understand the value of best practice publishing and standards and apply those to their own operations. DOAJ is committed to being 100% independent and maintaining all of its services and metadata as free to use or reuse for everyone.

CLOCKSS is a not-for-profit joint venture among the world’s leading academic publishers and research libraries whose mission is to build a sustainable, international, and geographically distributed dark archive with which to ensure the long-term survival of Web-based scholarly publications for the benefit of the greater global research community. https://www.clockss.org.

Internet Archive is a non-profit digital library, a top-200 website at https://archive.org/, and an archive of over 60PB comprising millions of free books, movies, software, music, websites, and more. The Internet Archive partners with over 800 libraries, universities, governments, non-profits, scholarly communications, and open knowledge organizations around the world to advance the shared goal of “Universal Access to All Knowledge.” Since 2017, Internet Archive has pursued partnerships and technical work with a focus on preserving all publicly accessible research outputs, especially at-risk, open access journal literature and data, and providing mission-aligned, non-commercial open infrastructure for the preservation of scholarly knowledge.

Keepers Registry, hosted by the ISSN International Centre, an intergovernmental organisation under the auspices of UNESCO, is a global service that monitors the archiving arrangements for continuing resources including e-serials. A dozen archiving agencies all around the world currently report to Keepers Registry. The Registry has three main purposes: 1/ to enable librarians, publishers and policy makers to find out who is looking after what e-content, how, and with what terms of access; 2/ to highlight e-journals which are still “at risk of loss” and need to be archived; 3/ to showcase the archiving organizations around the world, i.e. the Keepers, which provide the digital shelves for access to content over the long term.

PKP is a multi-university and long-standing research project that develops (free) open source software to improve the quality and reach of scholarly publishing. For more than twenty years, PKP has played an important role in championing open access. Open Journal Systems (OJS) was released in 2002 to help reduce cost as a barrier to creating and consuming scholarship online. Today, it is the world’s most widely used open source platform for journal publishing: approximately 42% of the journals in the DOAJ identify OJS as their platform/host/aggregator. In 2014, PKP launched its own Private LOCKSS Network (now the PKP PN) to offer OJS journals unable to invest in digital preservation a free, open, and trustworthy service. 

For more information, contact: 

DOAJ: Dom Mitchell, dom@doaj.org

CLOCKSS: Craig Van Dyck, cvandyck@clockss.org

Internet Archive: Jefferson Bailey, jefferson@archive.org

Keepers Registry: Gaëlle Béquet, gaelle.bequet@issn.org

PKP: James MacGregor, jbm9@sfu.ca

How the Internet Archive is Ensuring Permanent Access to Open Access Journal Articles

Internet Archive has archived and identified 9 million open access journal articles; the next 5 million are getting harder

Open Access journals, such as New Theology Review (ISSN: 0896-4297) and Open Journal of Hematology (ISSN: 2075-907X), made their research articles available for free online for years. With a quick click or a simple query, students anywhere in the world could access their articles, and diligent Wikipedia editors could verify facts against original articles on vitamin deficiency and blood donation.  

But some journals, such as these titles, are no longer available from the publishers’ websites, and are only available through the Internet Archive’s Wayback Machine. Since 2017, the Internet Archive has joined others in concentrating on archiving all scholarly literature and making it permanently accessible.

The World Wide Web has made it easier than ever for scholars to collaborate, debate, and share their research. Unfortunately, the structure of today’s web means that content can disappear just as easily: as of today the official publisher websites and DOI redirects for both of the above journals go nowhere or have been replaced with unrelated content.
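For readers who want to check whether a vanished page has a capture, one option is the Wayback Machine's public availability endpoint. The following is only a minimal sketch and is not taken from the project itself: the helper names are hypothetical, the sample response is invented for illustration, and the endpoint's behaviour should be confirmed against the current Wayback Machine API documentation.

```python
from urllib.parse import urlencode

# Public availability endpoint of the Wayback Machine.
WAYBACK_AVAILABILITY = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build a query URL asking for the capture closest to `timestamp`.

    `timestamp` is an optional string such as "2015" or "20150101".
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_AVAILABILITY + "?" + urlencode(params)


def closest_snapshot(response):
    """Return the URL of the closest capture from a decoded JSON response,
    or None when no capture is reported as available."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


# Invented sample response, shaped like the documented JSON, for illustration only.
SAMPLE_RESPONSE = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/2015/http://example.org/",
            "timestamp": "2015",
            "status": "200",
        }
    }
}
```

Fetching the built URL (for example with `urllib.request.urlopen`) and passing the decoded JSON to `closest_snapshot` yields either a capture URL or `None`.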


Wayback Machine captures of Open Access journals now “vanished” from publisher websites

Vigilant librarians saw this problem coming decades ago, when the print-to-digital migration was getting started. They insisted that commercial publishers work with contract digital preservation organizations (such as Portico, LOCKSS, and CLOCKSS) to ensure long-term access to expensive journal subscription content. Efforts have been made to preserve open articles as well, such as Public Knowledge Project’s Private LOCKSS Network for OJS journals and national hosting platforms like the SciELO network. But a portion of all scholarly articles continues to fall through the cracks.

Researchers found that 176 open access journals have already vanished from their publishers’ websites over the past two decades, according to a recent preprint article by Mikael Laakso, Lisa Matthias, and Najko Jahn. These periodicals were from all regions of the world and represented all major disciplines: sciences, humanities and social sciences. There are over 14,000 open access journals indexed by the Directory of Open Access Journals, and the paper suggests another 900 of those are inactive and at risk of disappearing. The preprint has struck a nerve, receiving news coverage in Nature and Science.

In 2017, with funding support from the Andrew W. Mellon Foundation and the Kahle/Austin Foundation, the Internet Archive launched a project focused on preserving all publicly accessible research documents, with a particular focus on open access materials. Our first job was to quantify the scale of the problem.

Monitoring independent preservation of Open Access journal articles published from 1996 through 2019. Categories are defined in the article text.

Of the 14.8 million known open access articles published since 1996, the Internet Archive has archived, identified, and made available through the Wayback Machine 9.1 million of them (“bright” green in the chart above). In the jargon of Open Access, we are counting only “gold” and “hybrid” articles which we expect to be available directly from the publisher, as opposed to preprints, such as those in arxiv.org or institutional repositories. Another 3.2 million are believed to be preserved by one or more contracted preservation organizations, based on records kept by Keepers Registry (“dark” olive in the chart). These copies are not intended to be accessible to anybody unless the publisher becomes inaccessible, in which case they are “triggered” and become accessible.

This leaves at least 2.4 million Open Access articles at risk of vanishing from the web (“None”, red in the chart). While many of these are still on publishers’ websites, they have proven difficult to archive.
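To put the three categories in proportion, the short sketch below converts the article counts quoted above into shares of the 14.8 million total. The variable and function names are hypothetical. The three counts sum to 14.7 million rather than 14.8 million, which this sketch assumes is a rounding effect in the published figures (the text says "at least" 2.4 million).

```python
# Article counts in millions, as quoted in the text above.
TOTAL_MILLIONS = 14.8
COUNTS_MILLIONS = {
    "bright": 9.1,  # archived and accessible via the Wayback Machine
    "dark": 3.2,    # held by contracted preservation organizations
    "none": 2.4,    # no known independent preservation (a lower bound)
}


def shares(counts, total):
    """Return each category's share of the total, in percent, to one decimal."""
    return {name: round(100 * value / total, 1) for name, value in counts.items()}


if __name__ == "__main__":
    for name, pct in shares(COUNTS_MILLIONS, TOTAL_MILLIONS).items():
        print(f"{name}: {pct}%")
```

On these figures, roughly 61% of known open access articles are in the "bright" category, about 22% are in dark archives, and about 16% have no known independent copy.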

One of our goals is to archive as many of the articles on the open web as we can, and to keep up with the growing stream of new articles published every day. Another is to look back over the vast petabytes of web content in the Wayback Machine, back to 1996, and find any content that we might already have but that is not easily findable or discoverable. Both of these projects are amenable to software automation, but are made more difficult by the evolving nature of HTML and PDFs and their diverse character sets and encodings. To that end, we have approached this project not just as a technical one, but also as a collaborative one that aims to add another piece to the distributed infrastructure supporting open scholarship.

To expand our reach, we built an editable catalog (https://fatcat.wiki) with an open API to allow anybody to contribute. As the software is free and open source, as is the data, we invite others to reuse and link to the content we have archived. We have also indexed and made searchable much of the literature to help manage our work and help others find out whether we have archived particular articles. We want to make scholarly material permanently available, and available in new ways, including via large datasets for analysis and “meta research.”
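As an illustration of how the open API might be used to check whether a particular article is in the catalog, here is a minimal sketch. The base URL `https://api.fatcat.wiki/v0` and the `/release/lookup` path with a `doi` parameter are assumptions on our part and should be checked against the catalog's own API documentation. The function names and the example DOI are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed API base URL; verify against the catalog's API documentation.
FATCAT_API = "https://api.fatcat.wiki/v0"


def release_lookup_url(doi):
    """Build a lookup URL for the catalog record of the article with this DOI.

    The `/release/lookup` path and `doi` parameter are assumptions.
    """
    return FATCAT_API + "/release/lookup?" + urlencode({"doi": doi})


def fetch_json(url, timeout=30):
    """Fetch a URL and decode the JSON body (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical DOI, used only to show the URL shape.
    print(release_lookup_url("10.1234/example.5678"))
```

Only `fetch_json` touches the network, so the URL construction can be checked offline before any request is made.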

We also want to acknowledge the many partnerships and collaborations that have supported this work, many of which are key parts of the open scholarly infrastructure, including ISSN, DOAJ, LOCKSS, Unpaywall, Semantic Scholar, CiteSeerX, Crossref, DataCite, and many others. We also want to acknowledge the many Internet Archive staff and volunteers who have contributed to this work, including Bryan Newbold, Martin Czygan, Paul Baclace, Jefferson Bailey, Kenji Nagahashi, David Rosenthal, Victoria Reich, Ellen Spertus, and others.

If you would like to participate in this project, please contact the Internet Archive at webservices@archive.org.

When An Island Shuts Down: Aruba & the National Emergency Library

The island nation of Aruba, population 110,000 and part of the Kingdom of the Netherlands, lies 18 miles north of Venezuela.

On March 15, the small island nation of Aruba, part of the Dutch Caribbean, closed its borders to visitors. Cruise ships packed with tourists stopped coming. Casinos, libraries and schools shut their doors, as Aruba’s 110,000 residents locked down to halt the spread of COVID-19.

That’s when the Biblioteca Nacional Aruba (National Library of Aruba) swung into action. 

Librarians quickly gathered reading lists from students, parents and schools. With high school graduation exams just a month away, the required literature books would be crucial. Aruban students are tested on books in Dutch, English, Spanish and their native language of Papiamento. “Just before your literary final exams, you need to re-read the books,” explained Peter Scholing, who leads digitization efforts at the National Library of Aruba. “The libraries are closed. Your school libraries are closed. You can order from Amazon, but it takes weeks and weeks to arrive. If you are in an emergency, then you hope your books are online.”

Peter Scholing of the National Library of Aruba also works with UNESCO, preserving cultural heritage

Scholing was relieved to discover that most of the required literature in English and Spanish was available in the Internet Archive’s National Emergency Library. As library staff moved to work from home, they grabbed the tools to digitize the books in Papiamento that were missing. Many local authors were easy to track down and most gladly gave permission for free downloads or lending of their works. Scholing reports, “Some of them choose digital lending. But a lot of them say, ‘Well it was a limited print run….I’ve sold all the copies of my books, now you can just make it available for download.’”

Preservation Pays Off

Classroom in Aruba, 1944, filled with children of expatriates working in oil refineries.

For many years, the library’s small Special Collections staff had been diligently digitizing key collections: photographs, historic texts, newspapers, and perhaps the world’s largest collection of texts in Papiamento. But with few technical resources, the National Library of Aruba had no way to provide access to those works. Scholing says the Internet Archive proved to be the “missing link.” In March 2019, the Library was able to unveil its new Digital Collection, 18,800 texts, videos and audio now accessible to the world on archive.org. Today, with libraries and schools closed, these materials are the keys to unlocking the doors to online learning.

“We didn’t imagine something like the Covid crisis could happen,” said Scholing. “But for our preservation efforts, this is the Big One. We are really lucky to be able to provide access to information that we couldn’t otherwise without the Internet Archive.”

This Papiamento literary journal is among the 18,800 items now online thanks to the Biblioteca Nacional Aruba

When Waitlists Won’t Work

Novels, biographies and non-fiction titles in Papiamento are part of the Aruban curriculum and now many are accessible online

Although Scholing had permission from the authors to lend their recent books, several times we accidentally reinstituted the waiting list, since the National Emergency Library does not include books from the last five years. That meant students reading a work would suddenly have had to wait, sometimes for weeks, to move up the waiting list. Scholing wrote to us immediately: “There must be an alternative. I’m getting emails from students and teachers already.”

Eventually we worked out the kinks so Aruba’s books in the National Emergency Library wouldn’t get taken down. In addition, hundreds of texts in Papiamento from 1844-2020 are now available without a waitlist. It’s part of a bigger vision on the island to teach students to read and write the language they speak at a higher level. “A lot of textbooks come straight from the Netherlands…you are reading about snow, trains and windmills,” Scholing explained. “It’s better to use something from a newspaper or magazine produced locally…It’s their own context. It speaks more to them.”

He even received this note from a local author, written in Papiamento:

Peter aprecia, (Dear Peter,)

Hopi admiracion pa e trabou cu bo ta desplegando pa Aruba y nos hendenan.

(A lot of admiration for the work that you are carrying out for Aruba and for our people.)

This week, schools in Aruba are scheduled to reopen. Since March, the library has tripled the number of items in its digital collection, and visitors have increased by 300%. Scholing sees this as evidence that the National Emergency Library will have lasting benefit. “All the thresholds and barriers to access this unique information have been lifted, once you put it online.”

You can now access newspapers, photos, maps, government publications, literature and rare books from Aruba in their collection at the Internet Archive.