Tag Archives: Books

Library as Laboratory Recap: Analyzing Biodiversity Literature at Scale

At a recent webinar hosted by the Internet Archive, leaders from the Biodiversity Heritage Library (BHL) shared how its massive open access digital collection documenting life on the planet is an invaluable resource for scientists and ordinary citizens alike.

“The BHL is a global consortium of the leading natural history museums, botanical gardens, and research institutions, big and small, from all over the world. Working together and in partnership with the Internet Archive, these libraries have digitized more than 60 million pages of scientific literature available to the public,” said Chris Freeland, director of Open Libraries and moderator of the event.

Established in 2006 with a commitment to inspiring discovery through free access to biodiversity knowledge, BHL has 19 members and 22 affiliates, plus 100 worldwide partners contributing data. The BHL has content dating back nearly 600 years alongside current literature that, when liberated from the print page, holds immense promise for advancing science and solving today’s pressing problems of climate change and the loss of biodiversity.

Martin Kalfatovic, BHL program director and associate director of the Smithsonian Libraries and Archives, noted in his presentation that Charles Darwin and colleagues famously said “the cultivation of natural science cannot be efficiently carried on without reference to an extensive library.”

“Today, the Biodiversity Heritage Library is creating this global, accessible open library of literature that will help scientists, taxonomists, environmentalists, a host of people working with our planet, to actually have ready access to these collections,” Kalfatovic said. BHL’s mission is to improve research methodology by working with its partner libraries and the broader biodiversity and bioinformatics community. BHL draws about 142,000 visitors each month and has served 12 million users overall.

Most of the BHL’s materials are from collections in the global north, primarily in large, well-funded institutions. Digitizing these collections helps level the playing field, providing researchers in all parts of the world equal access to vital content.

The vast collection includes species descriptions, distribution records showing where species live, climate records, the history of scientific discovery, and information on extinct species. To date, BHL has made over 176,000 titles and 281,000 volumes available. Through a partnership with the Global Names Architecture project, more than 243 million instances of taxonomic (Latin) names have been found in BHL content.

Kalfatovic underscored the value of BHL content in understanding the environment in the wake of recent troubling news from the Sixth Assessment Report (AR6) published by the Intergovernmental Panel on Climate Change about the impact of the earth’s warming.

Biodiversity Heritage Library by the numbers.

“The outlook for the planet is challenging,” he said. “By unlocking this historic data, we can find out where we’ve been over time to find out more about where we need to be in the future.”

JJ Dearborn, BHL data manager, discussed how digitization transforms physical books into digital objects that can be shared with “anyone, at any time, anywhere.” She described the Wikimedia ecosystem as “fertile ground for open access experimentation,” crediting the organization with giving BHL the ability to reach new audiences and transform its data into 5-star linked open data. “Dark data” locked up in legacy formats, JP2 image files, and OCR text is a source of valuable checklist, species occurrence, and event sampling data that the larger biodiversity community can use to improve humanity’s collective ability to monitor biodiversity loss and the destructive impacts of climate change at scale.

The majority of the world’s data today is siloed, unstructured, and unused, Dearborn explained. This “dark data” “represents an untapped resource that could really transform human understanding if it could be truly utilized,” she said. “It might represent a gestalt leap for humanity.” 

The event was the fifth in a series of six sessions highlighting how researchers in the humanities use the Internet Archive. The final session of the Library as Laboratory series will be a series of lightning talks on May 11 at 11am PT / 2pm ET—register now!

Helping Ukrainian Scholars, One Book at a Time

The Internet Archive is proud to partner with Better World Books to support Ukrainian students and scholars. With a $1 donation at checkout during your purchase at betterworldbooks.com, you will help provide verifiable information to Ukrainian scholars all over the world through Wikipedia.

Since 2019, the Internet Archive has worked with the Wikipedia community to strengthen citations to published literature. Working in collaboration with Wikipedians and data scientists, Internet Archive has linked hundreds of thousands of citations in Wikipedia to books in our collection, offering Wikipedia editors and readers single-click access to the verifiable facts contained within libraries. 

Recently, our engineers analyzed the citations in the Ukrainian-language Wikipedia and were able to connect citations to more than 17,000 books that have already been digitized by the Internet Archive. For example, the page for Геноміка (English translation: Genomics) links to a science textbook published in 2002. Through this work, we discovered that there are more than 25,000 additional cited books that we don’t have in our collection, and that’s where you can help!
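
For readers curious how a match like this might work, here is a minimal sketch in Python. It is not the Internet Archive's actual pipeline. It assumes each citation carries an ISBN and that the digitized collection can be represented as a set of ISBN-13 strings; the function names and sample values are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Optional, Set, Tuple


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a 10-character ISBN to its 13-digit form (978 prefix)."""
    core = "978" + isbn10[:9]  # drop the old ISBN-10 check character
    # ISBN-13 check digit: weights alternate 1, 3 across the first 12 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def normalize_isbn(raw: str) -> Optional[str]:
    """Return an ISBN-13 string, or None if the input is not ISBN-shaped.

    Assumption: only length and characters are checked here;
    the checksum of the incoming ISBN is not validated.
    """
    cleaned = raw.replace("-", "").replace(" ", "").upper()
    if not cleaned.isascii():
        return None
    if (len(cleaned) == 10 and cleaned[:9].isdigit()
            and (cleaned[9].isdigit() or cleaned[9] == "X")):
        return isbn10_to_isbn13(cleaned)
    if len(cleaned) == 13 and cleaned.isdigit():
        return cleaned
    return None


def split_citations(cited: Iterable[str],
                    digitized: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split cited ISBNs into (already digitized, still to acquire)."""
    cited_isbns = {n for n in map(normalize_isbn, cited) if n is not None}
    linkable = cited_isbns & digitized  # citations that can be linked today
    wanted = cited_isbns - digitized    # books not yet in the collection
    return linkable, wanted


if __name__ == "__main__":
    # Hypothetical sample data, not real Wikipedia citations.
    cited = ["0-306-40615-2", "978-1-4028-9462-6", "not an isbn"]
    digitized = {"9780306406157"}
    linkable, wanted = split_citations(cited, digitized)
    print("linkable:", sorted(linkable))
    print("wanted:", sorted(wanted))
```

Normalizing every identifier to one form before comparing means the same book cited as an ISBN-10 in one article and an ISBN-13 in another is counted once. The "wanted" set corresponds to the books this campaign hopes to acquire.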

Now through the end of June, when you make a $1 donation at checkout during your purchase at betterworldbooks.com, your donation will go to acquire books that are cited in the Ukrainian-language Wikipedia. Books acquired will be donated to Internet Archive for digitization and preservation. Once digitized, the books will be linked from their citations in Wikipedia, offering readers the ability to check facts in published literature. Books will be available for borrowing by one person at a time at archive.org, and will also be available for scholars to request via interlibrary loan. With your help, we can ensure that Ukrainian scholars and people studying Ukraine have access to authoritative, factual information about Ukrainian history and culture. 

Thank you for making a difference by buying books from Better World Books and helping Ukrainian students and scholars with your donation.

Event Recap: Why Trust a Corporation to Do a Library’s Job?

Although people are increasingly turning to Google to search for information, a corporate search engine is not the same as a trusted librarian. And while libraries are used to buying and preserving books, they are now often unable to buy and own digital materials because of publisher licensing restrictions.

The tension between the interests of business and the public was the focus of a conversation hosted by the Internet Archive and Library Futures on April 28. Wendy Hanamura moderated the event with guest panelists Joanne McNeil, author of Lurking: How a Person Became a User; Darius Kazemi, an internet artist and cofounder of Feel Train, a creative technology cooperative in Portland, Oregon; and Jennie Rose Halperin, executive director of Library Futures.

A recording of the event is now available:

Doing an online Google search can feel private because you are doing it alone at home, but corporations are accumulating your information and using it, said McNeil. The tools are imperfect, and using them involves trade-offs.

“The experiences that a user has on the internet can be quite profound, creative, and very human,” McNeil said. “But to participate with a lot of the social media and websites, especially nowadays, you are dealing with corporations and you don’t have the elements of control.”

In Lurking, McNeil traces the evolution of the internet and how it has profoundly changed the way people communicate. She also examines concerns that people have online including privacy, safety, identity and anonymity. In the book, McNeil contrasts the short-term memories of companies with the preservation mission and public accountability of libraries.

Kazemi noted that when working with librarians on research, there is an understanding of privacy, something that is lacking when engaging online. “It’s a totally different accountability chain,” he said.

Rather than giving your personal information away on a social media network, Kazemi advocates having individuals or even libraries maintain small, independently run online communities (see https://runyourown.social).

“Facebook can’t understand norms of what passes for civic discourse in every location on the planet. It’s impossible,” Kazemi said. “Libraries already spend time thinking about the norms of their communities,” making it natural to have content moderation at the local level.

Halperin said it’s important for public libraries to have autonomy to be able to fulfill their mission. Her work with the nonprofit Library Futures centers on advocacy for an equitable publishing ecosystem that serves authors, users and communities.

“Artificial scarcity that’s put on digital objects—as a way to create a market for digital books—is really hurting the public,” she said. “I think it’s one of the most important consumer protection issues right now.”

McNeil said the best thing to happen to her, as an author, is for people to read her book. Whether readers buy it or borrow it from a library (in print or electronically), she wants to reach the largest audience.

The panelists said that by working together, libraries can provide tools that reflect the public’s values and teach users smart digital citizenship. When corporations control what people have access to in searching, they are embedding bias into the distribution of information, said Halperin. “Libraries must engage in more than just individual information seeking needs, but also in the information seeking needs of communities.”

Register Now: A (re)Introduction to Book Scanning at the Internet Archive

The Internet Archive has been partnering with libraries to digitize their collections for more than 15 years. Following a recent viral video featuring our book digitization efforts, and increased demands for e-resources, we’ve had renewed interest in our book scanning partnerships, with libraries wondering how we might be able to help them reach their patrons through digitization. Join scanning center managers Andrea Mills and Elizabeth MacLeod for a virtual event to learn about the ways in which the Internet Archive can help turn your print collections digital, and the impacts that these digital collections are having on remote learners.

Registration for the virtual event is free and open to the public. The live session is being offered twice to accommodate schedules and flexibility; if you are interested in joining, you only need to register for one session:
March 24 @ 10am ET / 2pm GMT
March 25 @ 1pm ET / 5pm GMT

Meet Eliza Zhang, Book Scanner and Viral Video Star

The glass rises and falls. Quickly and efficiently, a woman turns the pages to the rhythmic beep of the cameras. She never misses a beat.

In its first 48 hours, this tweet about book scanning at the Internet Archive went viral, reaching 7.7 million people. More than 1.5 million people viewed the video, liking it 70,000 times and retweeting it 24,000 more. At the center of it all sits Eliza Zhang, a book scanner at the Internet Archive’s headquarters in San Francisco since 2010. When I asked Eliza what she likes about her job, she replied, “Everything! I find everything interesting. I don’t feel it is boring. Every collection is important to me.”

Eliza, a college graduate from southern China, immigrated to the United States in 2009, seeking a new life and new opportunities. She landed in San Francisco in the midst of an economy-crushing recession. But through a city program called JobsNOW, the Internet Archive hired Eliza and scores of other job seekers, training them to digitize, quality control, and upload metadata for books, newspapers, periodicals and manuals. Often our digitizing staff are making these analog texts available online for the first time.

Eliza Zhang in front of the Scribe (featured in the viral video) that she has operated for more than a decade.

Raising the glass with a foot pedal, adjusting the two cameras, and shooting the page images are just the beginning of Eliza’s work. Some books, like the Bureau of Land Management publication featured in the video, have myriad fold-outs. Eliza must insert a slip of paper to remind her to go back and shoot each fold-out page, while at the same time inputting the page numbers into the item record. The job requires keen concentration.

If this experienced digitizer accidentally skips a page, or if an image is blurry, the publishing software created by our engineers will send her a message to return to the Scribe and scan it again.

Brittle, delicate fold-outs, like this page from “Early London theatres” (1894), make digitization a time-intensive task best handled by a human operator.

Listening to 70s and 80s R&B while she works, Eliza spends a little time each day reading the dozens of books she handles. The most challenging part of her job? “Working with very old, fragile books. The paper is very thin. I always wear rubber fingertips and sometimes gloves when I scan newspapers, because of the ink,” she explained.

Tweets Spark a New Interest in Digitization

Eliza is one of about 70 Scribe operators at the Internet Archive, working in digitization centers embedded in libraries across the United States, United Kingdom, and Canada. The operations are led by Elizabeth MacLeod, who manages our remote operations, and Andrea Mills, who is stationed at the University of Toronto, with support from managers and operators in each center.

“We try to meet libraries where they are,” said MacLeod, who manages remote operations from her home office in North Carolina. “From digitizing a few shipments a year at one of our regional centers to setting up and staffing full-service digitization within the library itself, we have a flexible approach to our library partnerships.”

Across Twitter, another common question arose: “Why hasn’t this job been automated?” To many, the repetitive act of turning the pages in a book and photographing them seems like the natural task for a robot. In fact, some 20 years ago, we tested commercial book scanners that feature a vacuum-powered page-turning arm. It turns out those automated scanners didn’t really work well for brittle books, rare volumes, and other special collections—the kinds of material our library partners ask us to digitize.

Scribe operators and staff at Internet Archive’s former digitization center in San Francisco, ca. 2011.

“Clean, dry human hands are the best way to turn pages,” said Mills, from her socially distanced office at the University of Toronto. In her 15 years on the job, she has worked with hundreds of librarians to hone our digitization operations, balancing our need to preserve the original pages with minimal impact during the imaging process. “Our goal is to handle the book once and to care for the original as we work with it,” Mills explained.

So what does it take to be a Scribe operator? “It takes a level of zen,” wrote Brewster Kahle, founder and digital librarian of the Internet Archive, responding to one of the many threads about the video that popped up on Reddit. “It takes concentration and a love of books. For those who love working with books and libraries, it fits well.”

As for the hardware used for digitization, like much at the Internet Archive, the equipment is engineered and purpose-built for the job. In the viral video, Eliza is operating the original Scribe machine, designed more than 15 years ago, and Scribe software that was developed in-house and refined continuously over years of operation. “The variation in books makes [automation] difficult to do quickly and without damage,” Kahle elaborated. “We do not disbind the books, which also makes automation more difficult.”

18,000 Books and Climbing

In the decade Eliza has been working with the Internet Archive, she has scanned more than 3 million pages, 14,000 foldouts, and 18,000 items (mostly books).

And what about all the sudden social media attention? Eliza shrugs. She’s never been on Twitter before. “My goal is to guarantee zero errors,” she said. “I want to give our readers a satisfying experience.”

Digitize With Us

The Covid-19 pandemic has both created higher demand for digital content and shuttered some of our scanning centers for health and safety. We have reopened following local and national health guidelines and continue to engage with new libraries on their digitization projects.

If your library is interested in learning more about the Internet Archive’s digitization services, visit https://archive.org/scanning or contact us at digitallibraries@archive.org.

Bay State College ‘Flips to Digital’ by Donating Entire College Library to the Internet Archive

Bay State College’s Boston Campus has donated its entire undergraduate library to the Internet Archive so that the digital library can preserve and scan the books, while allowing Bay State to gain much needed open space for student collaboration. In donating its 11,000-volume collection, centered on fashion, criminal justice, allied health, and business books, for scanning, Bay State’s Boston campus decided to “flip entirely to digital.”

When it came to what to do with the books, Jessica Neave, librarian at Bay State College, had to get creative. “I didn’t have a library close by willing to take our collection,” Neave explained. Shortly after reaching out to our partners at Better World Books, she stumbled upon the Inside Higher Education article about the Marygrove College Library donation. This led Neave to our physical item donation form, where she laid out her library’s tight timeline to deaccession its entire print collection. “You guys made it so easy,” Bay State’s librarian said. “It couldn’t have been any easier!”

Internet Archive team members having fun with the task of packing and shipping an entire library collection.

Under the direction of Neave, an Internet Archive team packed and shipped the 11,000 books in the first week of December.

Considering the future of Bay State’s books, its librarian is hopeful, noting, “Thanks to the Internet Archive, the books can live on as a cohesive collection.” Patrons can look forward to thumbing through historic fashion and textile books, texts on the history of the Civil Rights Movement, graphic novels, and even Bay State’s collection of historically banned young adult books.

Protecting Books From Harm With Controlled Digital Lending

Photo by: Jon Schultz, Director, University of Houston Law Library

Michelle Wu began working at the University of Houston Law Library in the wake of flooding from Tropical Storm Allison in 2001. Some parts of the city had 14 feet of water and the library took in at least 8 feet. Law books on the lower level were underwater and the lingering humidity produced mold that destroyed much of the remaining collection.

Michelle Wu, Georgetown Law Library

“I wanted to create a model that would allow libraries to be able to preserve collections while respecting copyright in a world where natural disasters are a growing threat,” said Wu, now associate dean for library services and professor of law at the Georgetown Law Library in Washington, D.C. “Digitizing a collection and storing it under existing standards ensures that there is always a backed-up copy somewhere. During and after any disaster, the user would never lose access and the government would not have to reinvest to rebuild collections.” Controlled Digital Lending (CDL), the digital equivalent of traditional library lending, is a model that achieves these purposes.

For libraries with fewer resources, CDL can also be a tool to maximize public dollars and improve access. Once a library determines that its community no longer has a need for a certain CDL book (or as many copies as owned), the extra copies can be shared with libraries that never had access and would never have access without collaborative efforts.

“It’s a way of wealth sharing without much cost to communities,” Wu said. “Storage, digitization, and system costs would have already been budgeted by the lending library, CDL requires no shipping costs to be paid by either party, and the lending library’s community won’t feel the loss of copies as local need has decreased.”

“It’s a way to build a more robust collection for all of us to use. It helps the community and society at large in the long term,” said Wu. “That’s not something any of us can do alone. The only way we will do it is if we do it together.”

Internet Archive awarded grant from Arcadia Fund to digitize university press collections

Internet Archive has received a $1 million grant from Arcadia, a charitable fund of Lisbet Rausing and Peter Baldwin, to digitize titles from university press collections and make them available via controlled digital lending. The project, Unlocking University Press Books, will bring more than 15,000 titles online from university presses. This project extends the successful pilot with MIT Press, which has already made more than 400 books available for digital learners around the world.

Today, for many learners, if a book isn’t digital or discoverable through a web search, it’s as if it doesn’t exist. Large-scale digitization projects have brought millions of books online, largely from the nineteenth and early twentieth centuries, but almost a century of knowledge still lives only on the printed page, inaccessible to scholars, journalists and online learners.

To bring important twentieth century scholarship online, the Internet Archive seeks partnerships with university presses to digitize their publications. These materials represent the preeminent scholarly output of research universities, presenting research and analysis of use to policymakers and scholars, and providing materials that help shape and inform a literate culture.

“Every online user should have access to a great digital library,” said Brewster Kahle, Digital Librarian of the Internet Archive. “We are grateful to Arcadia for their support of this project, which will make the unique research published by university presses available to even wider audiences.”

“We are very excited about this transformational program,” said Dean Smith, Director of Cornell University Press. “We take our mission as the nation’s first university press seriously—to make high-quality, peer-reviewed scholarship discoverable and accessible to the world. The Internet Archive is perfectly aligned with that mission and will greatly assist us in taking bold actions to unearth these titles and provide access options.”

To participate in the project, please complete our signup form.  Please contact Chris Freeland, Director of Open Libraries, at chrisfreeland@archive.org with additional questions.

Celebrating 100 million tasks (uploading and modifying archive.org content)

Just over 8-1/2 years ago, I wrote a multi-process daemon in PHP that we refer to as “catalogd”.  It runs 24 hours a day, 7 days a week, no rest!

It is in charge of uploading all content to our archive.org servers, and all changes to uploaded files.

We recently passed the 100 millionth “task” (upload or edit to an archive “item”).

After starting with a modest 100 or so tasks/day, we currently run nearly 100,000 tasks/day. We’ve done some minor scaling, but for the most part, the little daemon has become our little daemon that could!
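
A quick back-of-the-envelope check puts those numbers in perspective. The sketch below is only an illustration: it treats "just over 8-1/2 years" as exactly 8.5 years and "nearly 100,000 tasks/day" as exactly 100,000, and the function names are made up for this example.

```python
DAYS_PER_YEAR = 365.25  # average year length, including leap years

# Figures taken from the post, rounded as noted (assumptions).
TOTAL_TASKS = 100_000_000   # tasks completed so far
YEARS_RUNNING = 8.5         # "just over 8-1/2 years", treated as exact
CURRENT_RATE = 100_000      # "nearly 100,000 tasks/day", rounded up


def average_tasks_per_day(total_tasks: int, years: float) -> float:
    """Lifetime average throughput in tasks per day."""
    return total_tasks / (years * DAYS_PER_YEAR)


def years_to_complete(tasks: int, tasks_per_day: float) -> float:
    """Years needed to run `tasks` at a constant daily rate."""
    return tasks / tasks_per_day / DAYS_PER_YEAR


if __name__ == "__main__":
    avg = average_tasks_per_day(TOTAL_TASKS, YEARS_RUNNING)
    eta = years_to_complete(TOTAL_TASKS, CURRENT_RATE)
    print(f"Lifetime average: about {avg:,.0f} tasks/day")
    print(f"Next 100 million at today's rate: about {eta:.1f} years")
```

Under those assumptions the lifetime average works out to roughly 32,000 tasks/day, about a third of today's rate, and the next 100 million tasks would take under three years if the current pace simply held.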

Here’s to the next 100 million tasks at archive.org!

-tracey

Open Library Buying e-Books from Publishers

The Internet Archive is on a campaign to buy e-Books from publishers and authors, making more digital books available to readers who prefer using laptops, reading devices or library computers. Publishers such as Smashwords, Cursor and A Book Apart have already contributed e-Books to OpenLibrary.org, offering niche titles and the works of best-selling “indy” authors including Amanda Hocking and J.A. Konrath.

“Libraries are our allies in creating the best range of discovery mechanisms for writers and readers—enabling open and browser-based lending through the OpenLibrary.org means more books for more readers, and we’re thrilled to do our part in achieving that.” – Richard Nash, founder of Cursor.

American libraries spend $3-4 billion a year on publishers’ materials. OpenLibrary.org and its more than 150 partnering libraries around the US and the world are leading the charge to increase their combined digital book catalog of 80,000+ (mostly 20th century) and 2 million+ older titles.

“As demand for e-Books increases, libraries are looking to purchase more titles to provide better access for their readers.” – Digital Librarian Brewster Kahle, Founder of the Internet Archive.

This new twist on the traditional lending model promises to increase e-book use and revenue for publishers. OpenLibrary.org offers an e-Book lending library and digitized copies of classics and older books as well as books in audio and DAISY formats for qualified readers.