The Internet Archive has been partnering with libraries to digitize their collections for more than 15 years. Following a recent viral video featuring our book digitization efforts, and with increased demand for e-resources, we’ve seen renewed interest in our book scanning partnerships, with libraries asking how we might help them reach their patrons through digitization. Join scanning center managers Andrea Mills and Elizabeth MacLeod for a virtual event to learn how the Internet Archive can help turn your print collections digital, and about the impact these digital collections are having on remote learners.
Registration for the virtual event is free and open to the public. The live session is being offered twice to accommodate different schedules; if you are interested in joining, you only need to register for one session:
March 24 @ 10am ET / 2pm GMT
March 25 @ 1pm ET / 5pm GMT
The glass rises and falls. Quickly and efficiently, a woman turns the pages to the rhythmic beep of the cameras. She never misses a beat.
In its first 48 hours, this tweet about book scanning at the Internet Archive went viral, reaching 7.7 million people. More than 1.5 million people viewed the video, liking it 70,000 times and retweeting it 24,000 times. At the center of it all sits Eliza Zhang, who has been a book scanner at the Internet Archive’s headquarters in San Francisco since 2010. When I asked Eliza what she likes about her job, she replied, “Everything! I find everything interesting. I don’t feel it is boring. Every collection is important to me.”
Eliza, a college graduate from southern China, immigrated to the United States in 2009, seeking a new life and new opportunities. She landed in San Francisco in the midst of an economy-crushing recession. But through a city program called JobsNOW, the Internet Archive hired Eliza and scores of other job seekers, training them to digitize books, newspapers, periodicals, and manuals, check the scans for quality, and upload metadata. Our digitizing staff often make these analog texts available online for the first time.
Raising the glass with a foot pedal, adjusting the two cameras, and shooting the page images are just the beginning of Eliza’s work. Some books, like the Bureau of Land Management publication featured in the video, have myriad fold-outs. Eliza must insert a slip of paper to remind her to go back and shoot each fold-out page, while at the same time inputting the page numbers into the item record. The job requires keen concentration.
If this experienced digitizer accidentally skips a page, or if an image is blurry, the publishing software created by our engineers will send her a message to return to the Scribe and scan it again.
Listening to ’70s and ’80s R&B while she works, Eliza spends a little time each day reading the dozens of books she handles. The most challenging part of her job? “Working with very old, fragile books. The paper is very thin. I always wear rubber fingertips and sometimes gloves when I scan newspapers, because of the ink,” she explained.
Tweets Spark a New Interest in Digitization
Eliza is one of about 70 Scribe operators at the Internet Archive, working in digitization centers embedded in libraries across the United States, United Kingdom, and Canada. The operations are led by Elizabeth MacLeod, who manages our remote operations, and Andrea Mills, who is stationed at the University of Toronto, with support from managers and operators in each center.
“We try to meet libraries where they are,” said MacLeod, who manages remote operations from her home office in North Carolina. “From digitizing a few shipments a year at one of our regional centers to setting up and staffing full-service digitization within the library itself, we have a flexible approach to our library partnerships.”
Across Twitter, another common question arose: “Why hasn’t this job been automated?” To many, the repetitive act of turning the pages of a book and photographing them seems like a natural task for a robot. In fact, some 20 years ago, we tested commercial book scanners that featured a vacuum-powered page-turning arm. It turned out those automated scanners did not work well for brittle books, rare volumes, and other special collections, the kinds of material our library partners ask us to digitize.
“Clean, dry human hands are the best way to turn pages,” said Mills, from her socially distanced office at the University of Toronto. In her 15 years on the job, she has worked with hundreds of librarians to hone our digitization operations so that the imaging process has minimal impact on the original pages. “Our goal is to handle the book once and to care for the original as we work with it,” Mills explained.
So what does it take to be a Scribe operator? “It takes a level of zen,” wrote Brewster Kahle, founder and digital librarian of the Internet Archive, responding to one of the many threads about the video that popped up on Reddit. “It takes concentration and a love of books. For those who love working with books and libraries, it fits well.”
Like much at the Internet Archive, the digitization hardware is engineered and purpose-built for the job. In the viral video, Eliza is operating the original Scribe machine, designed more than 15 years ago, and Scribe software that was developed in-house and refined continuously over years of operation. “The variation in books makes [automation] difficult to do quickly and without damage,” Kahle elaborated. “We do not disbind the books, which also makes automation more difficult.”
18,000 Books and Climbing
In the decade Eliza has been working with the Internet Archive, she has scanned more than 3 million pages, 14,000 fold-outs, and 18,000 items (mostly books).
And what about all the sudden social media attention? Eliza shrugs. She’s never been on Twitter before. “My goal is to guarantee zero errors,” she said. “I want to give our readers a satisfying experience.”
Digitize With Us
The Covid-19 pandemic has both created higher demand for digital content and shuttered some of our scanning centers for health and safety reasons. We have reopened in line with local and national health guidelines and continue to engage with new libraries on their digitization projects.
Bay State College’s Boston campus has donated its entire undergraduate library to the Internet Archive so that the digital library can preserve and scan the books, while Bay State gains much-needed open space for student collaboration. Bay State’s Boston campus decided to “flip entirely to digital,” donating its 11,000-volume collection of fashion, criminal justice, allied health, and business books to be scanned.
When it came to what to do with the books, Jessica Neave, librarian at Bay State College, had to get creative. “I didn’t have a library close by willing to take our collection,” Neave explained. Shortly after reaching out to our partners at Better World Books, she came across the Inside Higher Ed article about the Marygrove College Library donation. This led Neave to our physical item donation form, where she laid out her library’s tight timeline to deaccession its entire print collection. “You guys made it so easy,” Neave said. “It couldn’t have been any easier!”
Under the direction of Neave, an Internet Archive team packed and shipped the 11,000 books in the first week of December.
Considering the future of Bay State’s books, its librarian is hopeful, noting, “Thanks to the Internet Archive, the books can live on as a cohesive collection.” Patrons can look forward to thumbing through historic fashion and textile books, texts on the history of the Civil Rights Movement, graphic novels, and even Bay State’s collection of historically banned young adult books.
Michelle Wu began working at the University of Houston Law Library in the wake of flooding from Tropical Storm Allison in 2001. Some parts of the city had 14 feet of water, and the library took in at least 8 feet. Law books on the lower level were underwater, and the lingering humidity produced mold that destroyed much of the remaining collection.
“I wanted to create a model that would allow libraries to be able to preserve collections while respecting copyright in a world where natural disasters are a growing threat,” said Wu, now associate dean for library services and professor of law at the Georgetown Law Library in Washington, D.C. “Digitizing a collection and storing it under existing standards ensures that there is always a backed-up copy somewhere. During and after any disaster, the user would never lose access and the government would not have to reinvest to rebuild collections.” Controlled Digital Lending (CDL), the digital equivalent of traditional library lending, is a model that achieves these purposes.
For libraries with fewer resources, CDL can also be a tool to maximize public dollars and improve access. Once a library determines that its community no longer has a need for a certain CDL book (or as many copies as owned), the extra copies can be shared with libraries that never had access and would never have access without collaborative efforts.
“It’s a way of wealth sharing without much cost to communities,” Wu said. “Storage, digitization, and system costs would have already been budgeted by the lending library, CDL requires no shipping costs to be paid by either party, and the lending library’s community won’t feel the loss of copies as local need has decreased.”
“It’s a way to build a more robust collection for all of us to use. It helps the community and society at large in the long term,” said Wu. “That’s not something any of us can do alone. The only way we will do it is if we do it together.”
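The core rule behind the lending model Wu describes is usually stated as an “owned-to-loaned” ratio: a library lends no more digital copies of a title at once than the print copies it owns and keeps out of circulation. Below is a minimal sketch of that checkout rule, assuming a single library’s in-memory register; `LoanRegister`, `checkout`, and `checkin` are hypothetical names for illustration, not the Internet Archive’s actual lending system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LoanRegister:
    """Hypothetical CDL register: simultaneous digital loans never exceed owned print copies."""

    # book id -> number of print copies owned and held out of circulation
    owned_copies: dict[str, int]
    # book id -> ids of patrons currently borrowing the digital copy
    active_loans: dict[str, set[str]] = field(default_factory=dict)

    def checkout(self, book_id: str, patron_id: str) -> bool:
        """Lend a digital copy only if an owned print copy is free to back the loan."""
        loans = self.active_loans.setdefault(book_id, set())
        if patron_id in loans:
            return True  # this patron already has the book on loan
        if len(loans) >= self.owned_copies.get(book_id, 0):
            return False  # every owned copy is already lent out
        loans.add(patron_id)
        return True

    def checkin(self, book_id: str, patron_id: str) -> None:
        """Return a digital copy so the next patron can borrow it."""
        self.active_loans.get(book_id, set()).discard(patron_id)


register = LoanRegister(owned_copies={"example-title": 1})
print(register.checkout("example-title", "alice"))  # True
print(register.checkout("example-title", "bob"))    # False: the only copy is out
register.checkin("example-title", "alice")
print(register.checkout("example-title", "bob"))    # True
```

In this sketch, sharing a copy the local community no longer needs, as in Wu’s wealth-sharing example, would amount to lowering `owned_copies` in one library’s register and raising it in the receiving library’s.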
The Internet Archive has received a $1 million grant from Arcadia, a charitable fund of Lisbet Rausing and Peter Baldwin, to digitize titles from university press collections and make them available via controlled digital lending. The project, Unlocking University Press Books, will bring more than 15,000 titles from university presses online. This project extends the successful pilot with MIT Press, which has already made more than 400 books available to digital learners around the world.
Today, for many learners, if a book isn’t digital or discoverable through a web search, it’s as if it doesn’t exist. Large-scale digitization projects have brought millions of books online, largely from the nineteenth and early twentieth centuries, but almost a century of knowledge still lives only on the printed page, inaccessible to scholars, journalists and online learners.
To bring important twentieth-century scholarship online, the Internet Archive seeks partnerships with university presses to digitize their publications. These materials represent the preeminent scholarly output of research universities, presenting research and analysis useful to policymakers and scholars, and providing materials that help shape and inform a literate culture.
“Every online user should have access to a great digital library,” said Brewster Kahle, Digital Librarian of the Internet Archive. “We are grateful to Arcadia for their support of this project, which will make the unique research published by university presses available to even wider audiences.”
“We are very excited about this transformational program,” said Dean Smith, Director of Cornell University Press. “We take our mission as the nation’s first university press seriously—to make high-quality, peer-reviewed scholarship discoverable and accessible to the world. The Internet Archive is perfectly aligned with that mission and will greatly assist us in taking bold actions to unearth these titles and provide access options.”
Just over eight and a half years ago, I wrote a multi-process daemon in PHP that we refer to as “catalogd”. It runs 24 hours a day, 7 days a week, no rest!
It handles all content uploads to our archive.org servers, as well as all changes to uploaded files.
We recently passed the 100 millionth “task” (upload or edit to an archive “item”).
After starting with a modest 100 or so tasks/day, we currently run nearly 100,000 tasks/day. We’ve done some minor scaling, but for the most part, the little daemon has turned out to be the little daemon that could!
Here’s to the next 100 million tasks at archive.org!
“Libraries are our allies in creating the best range of discovery mechanisms for writers and readers—enabling open and browser-based lending through the OpenLibrary.org means more books for more readers, and we’re thrilled to do our part in achieving that.” – Richard Nash, founder of Cursor.
American libraries spend $3-4 billion a year on publishers’ materials. OpenLibrary.org and its more than 150 partner libraries in the US and around the world are leading the charge to grow their combined digital book catalog of more than 80,000 (mostly 20th-century) titles and more than 2 million older titles.
“As demand for e-Books increases, libraries are looking to purchase more titles to provide better access for their readers.” – Digital Librarian Brewster Kahle, Founder of the Internet Archive.
Þorsteinn Hallgrímsson, formerly of the National Library of Iceland, had a big idea: digitize all Icelandic literature all the way to the current day and make it available to everyone interested in reading it. The Internet Archive was eager to be a part of this bold vision. I am in Iceland now, and because of the financial crisis and the Icelandic reaction to the US Department of Justice’s subpoena of the tweets and Facebook account of a sitting member of the Icelandic Parliament, this project may have the momentum it needs to happen.
Ingibjörg Steinunn Sverrisdóttir, the National Librarian, and Katrín Jakobsdóttir, the Minister of Culture, met to discuss this possibility this week. I have met with several other ministers and parliamentarians in the last few days to discuss how this could be done.
The total literature of Iceland is under 50,000 books, which 12 people using the Internet Archive’s Scribe scanners could easily scan in 2 years. David Lesperance, a lawyer from Canada who has helped support the Room to Read project, has offered to fundraise for this project; the Internet Archive has offered scanning technology, training, and backend software; and the Library has offered to administer the project. They may decide to use a digital lending system that limits access to each book to one person at a time, balancing the interests of writers and publishers while still providing some access to everything, from anywhere, forever, for free. Egill Helgason, of the Icelandic TV network, interviewed Brewster about this (photo below, video on the Archive).
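The scanning estimate for Iceland implies a fairly modest per-person pace. A rough back-of-the-envelope check, assuming about 250 working days a year (an assumption for illustration, not a figure from the project):

```python
# Back-of-the-envelope check of the Icelandic scanning estimate.
TOTAL_BOOKS = 50_000         # upper bound on Iceland's total literature (from the text)
SCANNERS = 12                # people operating Scribe scanners (from the text)
YEARS = 2                    # project duration (from the text)
WORKING_DAYS_PER_YEAR = 250  # assumption: roughly five working days a week

books_per_person_year = TOTAL_BOOKS / (SCANNERS * YEARS)
books_per_person_day = books_per_person_year / WORKING_DAYS_PER_YEAR

print(f"{books_per_person_year:.0f} books per person per year")  # 2083 books per person per year
print(f"{books_per_person_day:.1f} books per person per day")    # 8.3 books per person per day
```

The daily figure depends heavily on the assumed number of working days; since the total is an upper bound, the real pace needed would be somewhat lower.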
If they decide to go ahead, Iceland could be the first country to have its complete literature go online. Fingers crossed.
The next step beyond this that is interesting to many here is to have Iceland become a “Switzerland of Bits,” where the laws will help protect the historical record from foreign or corporate danger. This is being promoted by Birgitta Jónsdóttir, a member of parliament. The Internet Archive works with many libraries around the world, and everyone wants to make sure that the digital copies are safe for the long term. Iceland is taking steps to be a good place for this.
As an aside, given all their inexpensive “green” electricity from hydroelectric and geothermal plants, I found it interesting that they are growing some vegetables under lights in the long winters as a way to become more self-sufficient. With LED lights that can be tuned to produce specific wavelengths at different stages of the growth cycle, this approach could be a fairly energy-efficient way to grow food for their people.
We are excited to see commercial books from many publishers being made available through web browser technology from the Google eBookstore. As a standards-based system, reading in a browser offers an opportunity for many more people to actively participate in the evolving digital book ecology.
The advantage of “books in browsers” over dedicated devices and even app store-based selling is that books can come from any website, be read on many more devices, and be findable with standard search technologies.
The Google eBook Reader
Buying books that are delivered in a browser is now being demonstrated on a massive scale by Google. This is great news as it shows that the security measures offered are good enough for commercial players.
Lending books through a browser in a way that recreates the traditional library checkout system was demonstrated at the Books in Browsers 2010 summit at the Internet Archive. Lending and vending books through browsers can pave the way for many winners:
Authors can find wider distribution for their work.
Publishers both big and small can now distribute books directly to readers.
Booksellers can find new and larger audiences for their products.
Device makers can offer access to millions of books instantly.
Libraries can continue to loan books in the way that patrons expect.
Readers could start to get universal access to all knowledge.
I am especially excited to see the possibilities in platform-independent social reading and beautifully designed ebooks that could come from browser-based books.