Looking for a research paper but can’t find a copy in your library’s catalog or popular search engines? Give Internet Archive Scholar a try! We might have a PDF from a “vanished” Open Access publisher in our web archive, an author’s pre-publication manuscript from their archived faculty webpage, or a digitized microfilm version of an older publication.
We hope Internet Archive Scholar will aid researchers and librarians looking for specific open access papers that may not be otherwise available to them. Judith van Stegeren (@jd7g on Twitter), a PhD candidate in the Netherlands, encountered just such a situation recently when sharing a workshop paper on procedural generation in computer games: “Towards Qualitative Procedural Generation” by Mark R. Johnson, originally presented at the Computational Creativity & Games Workshop in 2016. The papers for this particular year of the workshop are not indexed in the usual bibliographic catalogs, and the original workshop website hosting the Open Access papers is no longer accessible. Fortunately, copies of all the 2016 workshop papers were captured in the Wayback Machine, and can be found today by searching IA Scholar by title or conference name.
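For readers who want to check by hand whether a vanished page was captured, the sketch below builds a query for the Wayback Machine's public availability endpoint and reads the reply. It is an illustration only, not how IA Scholar itself finds papers: the workshop address is a hypothetical placeholder, and the sample reply is hand-written in the shape the endpoint returns rather than real data.

```python
import json
from urllib.parse import urlencode

# Public Wayback Machine availability endpoint.
WAYBACK_AVAILABLE = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build a query asking for the capture of page_url closest to timestamp."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp  # YYYYMMDD or longer, e.g. "20160701"
    return WAYBACK_AVAILABLE + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the URL of the closest available capture, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Hypothetical address of a vanished workshop site, for illustration only.
query = availability_url("example.org/workshop2016/paper.pdf", "20160701")

# Hand-written sample reply in the endpoint's response shape; not real data.
sample = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20160701000000/"
                   "http://example.org/workshop2016/paper.pdf",
            "timestamp": "20160701000000",
            "status": "200",
        }
    }
})

print(query)
print(closest_snapshot(sample))
```

To run it against the live service, fetch `query` with `urllib.request.urlopen` and pass the decoded body to `closest_snapshot`.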
As another example, dozens of papers from the Open Journal of Hematology are no longer resolvable via DOI. As mentioned in a previous blog post, the publisher’s website vanished and has been replaced with unrelated advertisements. But before that happened, the papers were captured in the Wayback Machine and indexed in our catalog, and they can now be searched in full.
IA Scholar is a simple, access-oriented interface to content identified across several Internet Archive collections, including web archives, archive.org files, and digitized print materials. The full text of articles is searchable for users that are hunting for particular phrases or keywords. This complements our existing full-text search index of millions of digitized books and other documents on archive.org.
The service builds on Fatcat, an open catalog we have developed to identify at-risk and web-published open scholarly outputs that can benefit from long-term preservation, additional metadata, and perpetual access. Fatcat includes resources that may be useful to librarians and archivists, such as bulk metadata dumps, a read/write API, command-line tool, and file-level archival metadata. If you are interested in collaborating with us, or are a researcher interested in text analysis applications, we have a public chat channel or can be contacted by email at email@example.com.
IA Scholar marks a milestone in our work initiated in 2018 to leverage the automation and scale of web and API harvesting in providing open infrastructure for the preservation of and perpetual access to scholarly materials from the public web. We particularly want to thank the Mellon Foundation for their original and ongoing support of this work, our many current partners, and the other collaborators, contributors, and volunteers.
The Internet Archive is bringing more periodicals and scholarly resources to students directly and by working with disability offices in the United States, Canada and elsewhere.
As more students with disabilities pursue higher education, demand is growing for books, journal articles and other learning materials to be available in accessible formats. This includes digitizing print materials for people who are blind or have low vision, those with dyslexia or attention deficit/hyperactivity disorder (ADHD), and people with limited mobility who might have difficulty holding print documents.
The Internet Archive is part of an expanding effort to make it easier for people with print disabilities to access information by digitizing books, periodicals, and microfilm needed to succeed in school and beyond. Once print materials are converted to machine-readable formats, users can listen with a screenreader, text-to-speech software or other forms of audio delivery, starting, stopping, and slowing down the information flow, as well as changing the colors of the text and page background.
With 10 percent or more of students at colleges in the United States requesting accessibility accommodations (Government Accountability Office, 2009, p.37), providing digitized learning materials is critical. Each semester Disability Service Offices (DSOs) on campuses respond to student requests to convert materials into accessible formats—often doing so in silos with limited budgets.
Libraries are being called into action to coordinate the delivery of accessible instructional materials. Doing its part to improve access to knowledge for all, Internet Archive is collaborating with others to share its collection and streamline the search process.
A level playing field
“There is a need for a fast turnaround with materials. Students [with print disabilities] need a level playing field,” said John Unsworth, dean of libraries at the University of Virginia. “The library is not just here for the able-bodied.”
UVA is working with the Internet Archive, Bookshare, and the HathiTrust to reduce duplication of efforts across the country to convert text materials to accessible formats. Together, they are participating in the Federating Repositories of Accessible Materials for Higher Education (FRAME) project funded with a $1 million grant from The Andrew W. Mellon Foundation.
Since 2019, the partners have established Educational Materials Made Accessible (EMMA), a hub and repository for digitized materials. The pilot includes seven universities: George Mason University, University of Virginia, Texas A&M University, University of Illinois at Urbana-Champaign, Northern Arizona University, University of Wisconsin-Madison and Vanderbilt University.
“When looking at the intersection between copyright and civil rights…civil rights win every time”
– John Unsworth, university librarian, University of Virginia
EMMA provides DSO staff (on behalf of students) with a central place to retrieve—and library staff re-deposit—machine-readable texts from the Internet Archive, HathiTrust, and Bookshare. It provides a searchable database to locate materials requested by students more efficiently. Users can filter by repository, format and accessibility features—which will become more valuable as texts are remediated. The project relies on the Internet Archive as a large digital repository to provide a federated network of storage and delivery, as well as technical expertise.
Unsworth said the goal of EMMA is to speed up access to materials and help DSOs avoid duplication. If faculty tinker with a syllabus and add a book at the last minute, students with print disabilities need to be able to have a copy they can use at the same time their peers do. “It’s the nature of education that what you need to read changes during the semester,” Unsworth said. “[Students with print disabilities] can’t get materials at the last minute when everyone else has had it for two weeks.”
Often, libraries are not involved in collecting, cataloguing, or preserving educational materials for people with disabilities on their own campus, or making them discoverable to others. EMMA is designed to connect DSOs and libraries on the same campus, and with other institutions. Once materials are remediated, DSOs put them in a drop box; the library then validates them with the new metadata and uploads them. “Libraries shoulder the burden of sharing, and by doing that, they help fulfill their mission,” Unsworth said.
Despite publisher warnings about what DSOs can do with their remediated content, Unsworth said concerns are not supported by law. “When looking at the intersection between copyright and civil rights…civil rights win every time,” Unsworth said. “Libraries are used to pushing back on publisher claims. Libraries bring a willingness to stand up to appropriate use rights.”
A coordinating hub for materials was desperately needed and, Unsworth said, something DSOs have been waiting to have for years.
“Everyone should have the same shot at succeeding”
– Angella Anderson, disability specialist, University of Illinois at Urbana-Champaign
Based on a student’s syllabus, Angella Anderson, a disability specialist at the University of Illinois at Urbana-Champaign, arranges for needed accessible materials for students at all levels—from undergraduates to law students to doctoral students. “We have several students who—without this service—would have had significant challenges being successful in their programs.”
Now, with EMMA, if a book or journal article a student needs is already shared on the hub, the DSO can download it and save time. Anderson estimates it has cut her time searching for learning materials by half. “The problem we’ve all had over the years is that we are converting the same book at the same time. That’s a huge resource drain,” Anderson said, noting the potential benefit of EMMA. “Everyone should have the same shot at succeeding at whatever it is they want to do, so I feel this will be extremely useful to a lot of schools and a lot of students.”
Canadian efforts advance
In Canada, the Internet Archive supports work of the Accessible Content E-Portal (ACE), a service of the Ontario Council of University Libraries. At the Internet Archive digitization center at the University of Toronto, staff digitize on demand and prioritize requests received by ACE from students who need materials for accessibility. The turnaround used to take weeks, but Andrea Mills, digitization program manager, said the system has been improved and students with print disabilities now can get materials digitized often in less than two days.
Mills said requested materials most often include non-fiction research books and novels, often printed between 1990 and 2010, before e-books were widely available. Elsewhere in Canada at the University of Alberta, another Internet Archive scanning center provides the same service, through their Accessibility Resources office, to students who have qualifying perceptual challenges.
“Sometimes people not part of the mainstream are forgotten,” Mills said. “It may only be a handful of users who have this need, and not represent a high number of downloads or uses, but these are people who truly need assistance.”
Librarians: Join our free program to qualify your patrons to access the Internet Archive’s resources for users with print disabilities. Individuals can gain access by having a qualifying authority, like the Vermont Mutual Aid Society, enroll them in its program.
Like campuses across the country, Howard University in Washington, D.C., shut down last March when COVID-19 hit. Most of its nearly 6,000 undergraduate students have been remote learning ever since.
Without access to the physical library, demand for e-books has increased. The university recently joined the Open Libraries program to expand the digital materials that students can borrow. Through the program, users can check out a digital version of a book the library owns using controlled digital lending (CDL).
Amy Phillips, head of technical services for Howard University Libraries, learned about the opportunity last fall through the Washington Research Library Consortium. Howard is one of nine D.C.-area libraries in the nonprofit consortium, which recently collaborated with the Internet Archive to do an overlap analysis of its shared collection. When the digital materials became available to use for free through the consortium, Howard decided to join, too.
An analysis of books in the Howard collection by International Standard Book Number (ISBN), run by metadata librarian Alisha Strother, showed that more than 14,000 books matched a copy that the Internet Archive had acquired and digitized. Howard joined the Open Libraries program in January. This means that students can now check out these Howard books from across the country as they engage in online instruction.
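An ISBN overlap analysis of this kind can be sketched in a few lines. The version below is a minimal illustration under stated assumptions, not the method Howard or the Internet Archive actually used: the function names and sample records are made up, ISBN-10s are converted to ISBN-13 so both forms of the same book match, and existing check digits are not verified.

```python
def normalize_isbn(raw):
    """Return an ISBN-13 string for a 10- or 13-character ISBN, or None if malformed."""
    s = raw.replace("-", "").replace(" ", "").upper()
    if len(s) == 13 and s.isdigit():
        return s
    if len(s) == 10 and s[:9].isdigit():
        # Drop the ISBN-10 check character, add the 978 prefix,
        # then recompute the ISBN-13 check digit (weights 1, 3, 1, 3, ...).
        core = "978" + s[:9]
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    return None


def overlap(library_isbns, digitized_isbns):
    """Return the set of ISBN-13s that appear in both collections."""
    lib = {normalize_isbn(x) for x in library_isbns}
    dig = {normalize_isbn(x) for x in digitized_isbns}
    return (lib & dig) - {None}  # discard the marker for malformed entries


# Made-up sample records, for illustration only.
library = ["0-306-40615-2", "978-1-4028-9462-6", "not-an-isbn"]
digitized = ["9780306406157", "9780262033848"]

print(sorted(overlap(library, digitized)))  # ['9780306406157']
```

Normalizing before comparing matters because the same edition may be catalogued under its ISBN-10 in one system and its ISBN-13 in another.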
“I see this as being an important resource for students to be able to access materials from anywhere,” Phillips said. “And I think it will have value and be heavily utilized even when we are back on campus.”
Howard is one of nearly 100 historically black colleges and universities (HBCUs) in the United States. One of Howard’s most important entities is the Moorland-Spingarn Research Center, which is recognized as one of the world’s largest and most comprehensive repositories for the documentation of the history and culture of people of African descent in Africa, the Americas and other parts of the world. Portions of its materials are also now available for digital borrowing through the Open Libraries program. The collection will now have greater exposure since it had previously only been accessible onsite for researchers who scheduled appointments.
“This opens up a premier collection to public usage. From a scholarly and cultural point of view, this material is very much in demand,” Phillips said. “Looking forward, we think it will get a lot of traffic.”
COVID-19 has disproportionately affected people of color, prompting Howard to be cautious and extend online learning into the spring semester for almost all students, Phillips said. The university is doing all it can to connect students with resources, and its libraries have been investing more in digital items. But budgets are limited and licensing agreements curb the library’s ability to broadly lend e-books.
“The Internet Archive has been an important way to open up more library materials to students,” said Phillips, adding that it’s new and just beginning to be promoted to students and faculty. “We’re excited and we know this will have a positive impact on student success and scholarship.”
Sometimes they arrive tied up in string because their binding is broken. Others are in envelopes to protect the brittle pages from further damage.
Aging books are sent from libraries to the Internet Archive for preservation. Thanks to the careful work of the nearly 70 people who scan at digitization centers in the United States, United Kingdom and Canada, the books get a second life with a new audience.
Scanners sometimes call these “Last Chance Books” and they take pride in restoring them. As they turn the pages one at a time to be photographed and digitized, they develop a daily cadence, but it must be adjusted for fragile materials.
“We do our best with the flaking or cracking pages,” said Andrea Mills, digitization program manager for the Internet Archive stationed in Toronto, Canada. “You have to be really cautious that the flake doesn’t fall off and cover a word. It’s almost like a puzzle.”
Some books that land at the Internet Archive digitization centers date back to the 1700s. They are fiction and nonfiction, journals and pamphlets covering a range of topics. And, it can be surprising to learn what reviving the material means to patrons.
“We chuckled when we digitized a book on sea captains. We thought – who will care? And then a year later, it had hundreds of views,” said Elizabeth MacLeod, senior manager of satellite digitization services who manages remote operations out of Wilmington, North Carolina.
Digitization helps preserve materials that are no longer in circulation at their holding library because they are falling apart. It also gives new exposure to books that are out of print that may otherwise be forgotten.
Both Mills and MacLeod began working for the Internet Archive more than 10 years ago as book scanners – also known as Scribe operators. Mills has an arts degree in jewelry design and teaching; MacLeod studied biology. They were both drawn to the mission of the Internet Archive and share a passion for connecting people with resources.
Over the years, Mills and MacLeod have worked closely with librarians and archivists around the world to digitize their collections, learning more with each project. They now manage digitization sites, many embedded in libraries, in 10 countries and upwards of 30 locations, and support them with training and best practices. Digitizing is a somewhat solitary task and some people “get in the zone” while scanning; others are very chatty or listen to music, Mills said.
Many employees have worked together for nearly a decade and there is a friendly, collaborative vibe at the centers. “We have all sorts of people—artists, printers and photographers. They are people who are meticulous and love books,” Mills said. A recent viral video shared on the Internet Archive’s Twitter account features Scribe operator Eliza Zhang, who has worked at the Archive for more than ten years. Book conservators from larger institutional partners also offer additional training for Internet Archive operators on best practices for handling their unique collections.
MacLeod says the scanners are all committed to providing a service to readers and it’s satisfying to help people with disabilities connect with books. “It’s energizing to be part of an organization that is thinking outside the box,” she said. “I want people to be able to have more access to whatever they are trying to find.”
Added Mills: “I’m an information junky. I love the search and the hunt and the finding the answer. The power of the internet and digitization is that you can find that answer faster. It just sort of opens up the possibilities of what you can do.”
Written by Professor Tom Gally, University of Tokyo, and Katie Barrett, Internet Archive & JET Program Alum. Translations by Tomoki Sakakibara, University of Tokyo.
(日本語はページ下部にあります。 Scroll down for Japanese version.)
As our global society grows ever more connected, it can be easy to assume that all of human history is just one click away. Yet language barriers and physical access still present major obstacles to deeper knowledge and understanding of other cultures, even on the world wide web. That is why the Internet Archive is thrilled to announce a new partnership with the University of Tokyo General Library. Spearheaded by Masaya Nakatake as a member of the UTokyo Academic Archives Project Office, the Internet Archive partnership provides expanded access and a digital backup for some of the library’s most precious artifacts.
Since June 2020, our Collections team has worked in tandem with library staff to ingest thousands of digital files from the General Library’s servers, mapping the metadata for over 4,000 priceless scrolls, texts, and papers. The collection, representing meticulous digitization efforts by Japanese historians and scholars, showcases hundreds of years of rich Japanese history expressed through prose, poetry, and artwork.
Most of the works are written in Japanese, but some of them include illustrations that can be appreciated by anyone now. A search through the collection for 地震 (jishin, “earthquake”), for example, yields a fascinating set of depictions of earthquakes and their impact in past centuries.
In one satirical illustration, thought to date from shortly after the 1855 Edo earthquake, courtesans and others from the demimonde, who suffered greatly in the disaster, are shown beating the giant catfish that was believed to cause earthquakes. The men in the upper left-hand corner represent the construction trades; they are trying to stop the attack on the fish, as rebuilding from earthquakes was a profitable business for them.
Other highlights are high-resolution images from the Kamei Collection of original etchings from Opere di Giovanni Battista Piranesi, Francesco Piranesi e d’altri, originally published by Firmin Didot Freres in Paris between 1835 and 1839.
We hope this partnership and collection will expand access to history and culture from Japan and spur a new generation of usage and scholarship.
About the University of Tokyo General Library
The University of Tokyo was established in 1877 as the first national university in Japan. As a leading research university, UTokyo offers courses in essentially all academic disciplines at both undergraduate and graduate levels and conducts research across the full spectrum of academic activity. The University of Tokyo Library System is composed of 30 libraries, with the General Library being the largest among them. While providing services to the researchers and students of UTokyo, the General Library also plays a central role in the operation and management of the Library System. The General Library’s history can be traced back more than 140 years to the university’s founding and it now houses approximately 1.3 million books, including rare collections inherited from academies in the Edo period.
About the Internet Archive
The Internet Archive is one of the largest libraries in the world and home of the Wayback Machine, a repository of 475 billion webpages. Founded in 1996 by Internet Hall of Fame member Brewster Kahle, the Internet Archive now serves more than 1.5 million patrons each day, providing access to 70 petabytes of data—books, web pages, music, television, and software—and working with more than 800 library and university partners to create a digital library, accessible to all. To make a donation to the Internet Archive, please visit https://archive.org/donate/
Academics, legal experts, and authors explained the thoughtful reasoning and compelling need for libraries to engage in Controlled Digital Lending (CDL) at a webinar hosted by the Internet Archive and Library Futures on February 11. A recording of the session is now available.
The panel dispelled myths about CDL, the digital lending model in which a library lends a digital version of a print book it owns. Emphasizing the limited and controlled aspect of the practice, the speakers said CDL allows libraries to fulfill their mission of serving the public in the digital age. The global pandemic only underscores the importance of providing flexibility in how people can access information.
Isn’t CDL digital piracy? No, CDL is not like Napster, said Kyle K. Courtney, copyright advisor at Harvard University, referring to the music file-sharing service. Twenty years ago, the actions of Napster were ruled illegal because it made unlimited reproductions of MP3 music available to anyone, anywhere.
“CDL uses technology to replicate a library’s right to loan works in a digital format, one user at a time,” Courtney said. Libraries are using rights they already have, leveraging the same technology as publishers to make sure that the books are controlled when they’re loaned and are not duplicated, copied or redistributed.
“Libraries are not pirates. There is a vast difference between the Napster mission and the library mission,” Courtney said. “We can loan books to patrons. Only now we’re harnessing that right in the digital space.”
In laying out the rationale behind CDL, Courtney described the “superpower” granted to libraries by Congress through copyright law to serve the public. The “fair use” section of the law allows libraries to responsibly lend materials and, experts say, logically covers both print and digital works.
The idea of “fair use” has been around as long as there has been copyright, and it applies to new technologies, said Michelle Wu, attorney and law librarian at the webinar. The Internet Archive did not invent CDL. Wu is the visionary behind CDL, developing the concept in 2002 as a way to protect a library’s print collection from natural disaster—an imperative she faced in rebuilding a library destroyed by flooding.
Just as libraries lend out entire books, fair use allows the scanning of whole books, said panelist Sandra Aya Enimil, copyright librarian and contract specialist at Yale University. The law makes no mention of the amount of material that can be made available under “fair use,” so for libraries to fulfill their purpose they can make complete books—whether in print or digital—available to patrons, she said.
It’s a myth that librarians need author and publisher permission for CDL, explained Jill Hurst-Wahl, copyright scholar and professor emerita in Syracuse University’s School of Information Studies. “Authors and publisher control ends at the time a book is published, then fair use begins,” she said. “Once a work is legally acquired by you, by a library, the copyright owners’ rights are exhausted.”
Library lending is viewed as fair use, in part, because it is focused on socially beneficial, non-commercial outcomes, like literacy, said Hurst-Wahl. Also, libraries loan physical books without concern about the market effect—so the same rules apply if a digital version of the book is substituted.
CDL does not harm authors or publisher sales, the panelists emphasized. Indeed, it can provide welcome exposure.
“The reality is that CDL can help authors by enhancing discoverability, availability and accessibility of their works,” said Brianna Schofield, executive director of Authors Alliance, speaking at the event. “It helps authors to spread their ideas, and it helps authors to build their audiences.”
Many of the books that are circulated by CDL are rare, out-of-print books that would otherwise be unavailable. This source material can be useful for writers as they develop their creative works.
“Digital and physical libraries contribute to a healthy publishing ecosystem and increase sales and engagement for creative works,” said Jennie Rose Halperin, executive director of Library Futures, a newly formed nonprofit coalition advocating for libraries to operate in the digital space. Research shows that leveraged digitization increases sales of physical editions by about 34% and increases the likelihood of any sale by 92%, particularly for less popular and out-of-print works.
Because digitized versions can be made more readily available, CDL can extend access to library collections to people with print disabilities or mobility issues, the panelists noted. CDL also allows libraries to preserve material in safe, digital formats with the best interest of the public, not profits, at the center of their work.
“People love books and will buy if they’re able. But we have to remember that paper books and even some ebooks do not serve the needs of all readers,” said Andrea Mills, digitization program manager at the Internet Archive and lead on the Archive’s accessibility efforts. “Accessibility is a human right that must be vigilantly protected.”
For anyone interested in learning more about how to get involved with CDL, the Internet Archive now has 2 million books available to borrow for free, and an active program for libraries that want to make their collections available through CDL.
“The CDL community of practice is thriving,” said Chris Freeland, director of Open Libraries at the Internet Archive. “We are in a pandemic. Libraries are closed. Schools are closed. CDL just makes sense and solves problems of access.”
The glass rises and falls. Quickly and efficiently, a woman turns the pages to the rhythmic beep of the cameras. She never misses a beat.
In its first 48 hours, this tweet about book scanning at the Internet Archive went viral, reaching 7.7 million people. More than 1.5 million people viewed the video, liking it 70,000 times and retweeting it 24,000 more. At the center of it all sits Eliza Zhang, a book scanner at the Internet Archive’s headquarters in San Francisco since 2010. When I asked Eliza what she likes about her job, she replied, “Everything! I find everything interesting. I don’t feel it is boring. Every collection is important to me.”
Eliza, a college graduate from southern China, immigrated to the United States in 2009, seeking a new life and new opportunities. She landed in San Francisco in the midst of an economy-crushing recession. But through a city program called JobsNOW, the Internet Archive hired Eliza and scores of other job seekers, training them to digitize, quality-check, and upload metadata for books, newspapers, periodicals and manuals. Often our digitizing staff are making these analog texts available online for the first time.
Raising the glass with a foot pedal, adjusting the two cameras, and shooting the page images are just the beginning of Eliza’s work. Some books, like the Bureau of Land Management publication featured in the video, have myriad fold-outs. Eliza must insert a slip of paper to remind her to go back and shoot each fold-out page, while at the same time inputting the page numbers into the item record. The job requires keen concentration.
If this experienced digitizer accidentally skips a page, or if an image is blurry, the publishing software created by our engineers will send her a message to return to the Scribe and scan it again.
Listening to 70s and 80s R & B while she works, Eliza spends a little time each day reading the dozens of books she handles. The most challenging part of her job? “Working with very old, fragile books. The paper is very thin. I always wear rubber fingertips and sometimes gloves when I scan newspapers, because of the ink,” she explained.
Tweets Spark a New Interest in Digitization
Eliza is one of about 70 Scribe operators at the Internet Archive, working in digitization centers embedded in libraries across the United States, United Kingdom, and Canada. The operations are led by Elizabeth MacLeod, who manages our remote operations, and Andrea Mills, who is stationed at the University of Toronto, with support from managers and operators in each center.
“We try to meet libraries where they are,” said MacLeod, who manages remote operations from her home office in North Carolina. “From digitizing a few shipments a year at one of our regional centers to setting up and staffing full-service digitization within the library itself, we have a flexible approach to our library partnerships.”
Across Twitter, another common question arose: “Why hasn’t this job been automated?” To many, the repetitive act of turning the pages in a book and photographing them seems like the natural task for a robot. In fact, some 20 years ago, we tested commercial book scanners that feature a vacuum-powered page-turning arm. It turns out those automated scanners didn’t really work well for brittle books, rare volumes, and other special collections—the kinds of material our library partners ask us to digitize.
“Clean, dry human hands are the best way to turn pages,” said Mills, from her socially-distanced office at the University of Toronto. In her 15 years on the job, she has worked with hundreds of librarians to hone our digitization operations, balancing our need to preserve the original pages with minimal impact during the imaging process. “Our goal is to handle the book once and to care for the original as we work with it,” Mills explained.
So what does it take to be a Scribe operator? “It takes a level of zen,” wrote Brewster Kahle, founder and digital librarian of the Internet Archive, responding to one of the many threads about the video that popped up on Reddit. “It takes concentration and a love of books. For those who love working with books and libraries, it fits well.”
As for the hardware used for digitization, like much at the Internet Archive, the equipment is engineered and purpose-built for the job. In the viral video, Eliza is operating the original Scribe machine, designed more than 15 years ago, and Scribe software that was developed in-house and refined continuously over years of operation. “The variation in books makes [automation] difficult to do quickly and without damage,” Kahle elaborates. “We do not disbind the books, which also makes automation more difficult.”
18,000 Books and Climbing
In the decade Eliza has been working with the Internet Archive, she has scanned more than 3 million pages, 14,000 foldouts, and 18,000 items (mostly books).
And what about all the sudden social media attention? Eliza shrugs. She’s never been on Twitter before. “My goal is to guarantee zero errors,” she said. “I want to give our readers a satisfying experience.”
Digitize With Us
The Covid-19 pandemic has both created higher demand for digital content and shuttered some of our scanning centers for health and safety. We have reopened following local and national health guidelines and continue to engage with new libraries on their digitization projects.
The Internet Archive is wholly dependent on Ubuntu and the Linux communities that create a reliable, free (as in beer), free (as in speech), rapidly evolving operating system. It is hard to overestimate how important that is to creating services such as the Internet Archive.
When we started the Internet Archive in 1996, Sun and Oracle donated technology and we bought tape robots. By 1999, we had shifted to inexpensive PCs in a cluster, running various Linux distributions.
At this point, almost everything that runs on the servers of the Internet Archive is free and open-source software. (I believe our JP2 compression library may be the only piece of proprietary software we use.)
For a decade now, we have been upgrading our operating system on the cluster to the long-term support server Linux distribution of Ubuntu. Thank you, thank you. And we have never paid anything for it, but we submit code patches as the need arises.
Does anyone know the number of contributors to all the Linux projects that make up the Ubuntu distribution? How many tens or hundreds of thousands? Staggering.
Ubuntu has ensured that every six months a better release comes out, and every two years a long-term release comes out. Like clockwork. Kudos. I am sure it is not easy, but it is inspiring, valuable and important to the world.
We started with Linux in 1997 and with Ubuntu server release Warty Warthog in 2004, and we are in the process of moving to Focal (Ubuntu 20.04).
Depending on free and open software is the smartest technology move the Internet Archive ever made.
The Internet Archive has reached a new milestone: 2 million. That’s how many modern books are now in its lending collection—available free to the public to borrow at any time, even from home.
“We are going strong,” said Chris Freeland, a librarian at the Internet Archive and director of the Open Libraries program. “We are making books available that people need access to online, and our patrons are really invested. We are doing a library’s work in the digital era.”
The lending collection is an encyclopedic mix of purchased books, ebooks, and donations from individuals, organizations, and institutions. It has been curated by Freeland and other librarians at the Internet Archive according to a prioritized wish list that has guided collection development. The collection has been purpose-built to reach a wide base of both public and academic library patrons, and to contain books that people want to read and access online—titles that are widely held by libraries, cited in Wikipedia and frequently assigned on syllabi and course reading lists.
“The Internet Archive is trying to achieve a collection reflective of great research and public libraries like the Boston Public Library,” said Brewster Kahle, digital librarian and founder of the Internet Archive, who began building the diverse library more than 20 years ago.
“Libraries from around the world have been contributing books so that we can make sure the digital generation has access to the best knowledge ever written,” Kahle said. “These wide ranging collections include books curated by educators, librarians and individuals, that they see are critical to educating an informed populace at a time of massive disinformation and misinformation.”
Every day about 3,500 books are digitized in one of 18 digitization centers operated by the Archive worldwide. While there’s no exact way of identifying a singular 2 millionth book, the Internet Archive has chosen a representative title that helped push past the benchmark to highlight why its collection is so useful to readers and researchers online.
On December 31, The dictionary of costume by R. Turner Wilcox was scanned and added to the Archive, putting the collection over the 2 million mark. The book was first published in 1969 and reprinted throughout the 1990s, but is now no longer in print or widely held by libraries. This particular book was donated to Better World Books via a book bank just outside of London in August 2020, then made its way to the Internet Archive for preservation and digitization.
As expected from the title, the book is a dictionary of terms associated with costumes, textiles and fashion, and was compiled by an expert, Wilcox, the fashion editor of Women’s Wear Daily from 1910 to 1915. Given its authoritative content, the book made it onto the Archive’s wish list because it is frequently cited in Wikipedia, including on pages like Petticoat and Gown.
Now that the book has been digitized, Wikipedia editors can update citations to the book and include a direct link to the cited page. For example, users reading the Petticoat page can see that page 267 of the book has been used to substantiate the claim that both men and women wore a longer underskirt called a “petticote” in the fourteenth century. Clicking on that reference will take users directly to page 267 in The dictionary of costume, where they can read the dictionary entry for petticoat and verify that information for themselves.
An additional reason why this work is important is that there is no commercial ebook available for The dictionary of costume. This book is one of the millions of titles that reached the end of its publishing lifecycle in the 20th century, so there is no electronic version available for purchase. That means that the only way of accessing this book online and verifying these citations in Wikipedia—doing the kind of research that students of all ages perform in our connected world—is through a scanned copy, such as the one now available at the Internet Archive.
Donations play an important role
Increasingly, the Archive is preserving many books that would otherwise be lost to history or the trash bin.
In recent years, the Internet Archive has received donations of entire library collections. Marygrove College gave more than 70,000 books and nearly 3,000 journal volumes for digitization and preservation in 2019 after the small liberal arts college in Detroit closed. The well-curated collection, known for its social justice, education and humanities holdings, is now available online at https://archive.org/details/marygrovecollege.
Just like The dictionary of costume, many of the books supplied for digitization come to the Archive from Better World Books. Over the past 10 years of their partnership, the online bookseller has donated millions of books to be digitized and preserved by the Archive. Better World Books acquires books from thousands of libraries, from book suppliers, and through a network of book donation drop boxes (known as “book banks” in the UK). If a title is not suitable for resale and it’s on the Archive’s wish list, the book is set aside for donation.
“We view our role as helping maximize the life cycle and value of each and every single book that a library client, book supplier or donor entrusts to us,” said Dustin Holland, president and chief executive officer of Better World Books. “We make every effort to make books available to readers and keep books in the reading cycle and out of the recycle stream. Our partnership with the Internet Archive makes all this possible.”
The Archive provides another channel for customers to find materials, Holland added.
“We view archive.org as a way of discovering and accessing books,” said Holland. “Once a book is discoverable, the more interest you are going to create in that book and the greater the chance it will end up in a reader’s hands as a new or gently used book.”
Having books freely available for borrowing online serves people with a variety of needs, including those with limited access to libraries because of disabilities or transportation issues, people in rural areas, and those who live in under-resourced parts of the world.
Sean, an author in Oregon, said he goes through older magazines for design ideas, especially from cultures that he wouldn’t be exposed to otherwise: “It gives me a wider understanding of my small place in the global historical context.” One parent from San Francisco said she uses the lending library to learn skills like hand drawing so that she can draw characters and landscapes and interact more deeply with her child.
The need for information is more urgent than ever.
“We are all homeschoolers now. This pandemic has driven home how important it is to have online access to quality information,” Kahle said. “It’s gratifying to hear from teachers and parents that are now given the tools to work with their children during this difficult time.”
Kahle’s vision is to have every reference in Wikipedia be linked to a book and for every student writing a high school report to have access to the best published research on their subject. He wants the next generation to become the authors of the books that should be in the library, and to be the most informed electorate possible.
Adds Kahle: “Thank you to all who have made this possible – all the funders, all the donors, the thousands who have sent books to be digitized. If we all work together, we can do another million this year.”
Many students do not have direct or unrestricted access to their local libraries during our current health crisis. One of our goals as librarians and stewards is to bring books to these learners of all ages as they continue their educations at home.
As a step toward this, we have created a collection of California State Suggested Reading that is based on resources from the California Department of Education. This is intended to help students, teachers and their families find books for further learning. (As with any collection, we recommend that adults review items for age-appropriateness before passing them on to children.)
We’ve also created resource lists for different areas of interest in the Kid Friendly section of our help center. These are fun to explore by yourself, in company, or over the internet with friends, and cover topics like: