The goal of the News Measures Research Project is to examine the health of local community news by analyzing the amount and type of local news coverage in a sample of communities. In order to generate a random and unbiased sample of communities, the team used US Census data. Prior research suggested that average income in a community is correlated with the amount of local news coverage; thus the team decided to focus on three different income brackets (high, medium and low) using the Census data to break up the communities into categories. Rural areas and major cities were eliminated from the sample in order to reduce the number of outliers; this left a list of 1,559 communities ranging in population from 20,000 to 300,000 and in average household income from $21,000 to $215,000. Next, a random sample of 100 communities was selected, and a rigorous search process was applied to build a list of 663 news outlets that cover local news in those communities (based on Web searches and established directories such as Cision).
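The sampling step described above could be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the community records are synthetic, the income cutoffs for the three brackets are hypothetical, and the function names are invented here. Only the population bounds and the sample size come from the text. The text does not say whether the draw was made within each bracket or across the whole list, so the sketch draws one simple random sample and labels each pick afterward.

```python
import random


def income_bracket(income, low_cutoff, high_cutoff):
    """Label a community by average household income (cutoffs are hypothetical)."""
    if income < low_cutoff:
        return "low"
    if income < high_cutoff:
        return "medium"
    return "high"


def sample_communities(communities, sample_size=100,
                       min_pop=20_000, max_pop=300_000, seed=0):
    """Drop communities outside the population range, then draw a
    simple random sample without replacement."""
    eligible = [c for c in communities
                if min_pop <= c["population"] <= max_pop]
    rng = random.Random(seed)
    return rng.sample(eligible, min(sample_size, len(eligible)))


# Synthetic stand-in for Census records (not real data).
_rng = random.Random(42)
communities = [
    {
        "name": f"community-{i}",
        "population": _rng.randint(5_000, 500_000),
        "income": _rng.randint(21_000, 215_000),
    }
    for i in range(5_000)
]

sample = sample_communities(communities)
brackets = [income_bracket(c["income"], 45_000, 90_000) for c in sample]
```

Fixing the seed keeps the draw reproducible, which matters when the same sample has to be revisited by coders later.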
The News Measures Research Project web captures provide a unique snapshot of local news in the United States. The work is focused on analyzing the nature of local news coverage at a local level, while also examining the broader nature of local community news. At the local level, the 100 community sample provides a way to look at the nature of local news coverage. Next, a team of coders analyzed content on the archived web pages to assess what is being covered by a given news outlet. Often, the websites that serve a local community are simply aggregating content from other outlets, rather than providing unique content. The research team was most interested in understanding the degree to which local news outlets are actually reporting on topics that are pertinent to a given community (e.g. local politics). At the global level, the team looked at interaction between community news websites (e.g. sharing of content) as well as automated measures of the amount of coverage.
The primary data for the researchers was the archived local community news data, but in addition, the team worked with census data to aggregate other measures such as circulation data for newspapers. These data allowed the team to examine how the amount and type of local news changes depending on the characteristics of the community. Because the team was using multiple datasets, the Web data is just one part of the puzzle. The WAT data format proved particularly useful for the team in this regard. Using the WAT file format allowed the team to avoid digging deeply into the data – rather, the WAT data allowed the team to examine high level structure without needing to examine the content of each and every WARC record. Down the road, the WARC data allows for a deeper dive, but the lighter metadata format of the WAT files has enabled early analysis.
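To illustrate what examining high level structure from WAT metadata might look like, here is a small sketch that counts the external hosts a page links to without touching the page content. It is not the research team's code. The JSON field names (`Envelope`, `WARC-Header-Metadata`, `Payload-Metadata`, `HTML-Metadata`, `Links`) follow the layout WAT records commonly use, but they should be treated as an assumption and checked against real files; the sample record and the `.example` hosts are made up.

```python
import json
from collections import Counter
from urllib.parse import urljoin, urlparse


def outbound_hosts(wat_payload):
    """Count the external hosts a page links to, using only the
    metadata in one WAT record payload (a JSON string)."""
    envelope = json.loads(wat_payload).get("Envelope", {})
    page_url = envelope.get("WARC-Header-Metadata", {}).get("WARC-Target-URI", "")
    links = (
        envelope.get("Payload-Metadata", {})
        .get("HTTP-Response-Metadata", {})
        .get("HTML-Metadata", {})
        .get("Links", [])
    )
    page_host = urlparse(page_url).netloc
    hosts = Counter()
    for link in links:
        # Resolve relative links against the page URL before comparing hosts.
        target = urljoin(page_url, link.get("url", ""))
        host = urlparse(target).netloc
        if host and host != page_host:
            hosts[host] += 1
    return hosts


# Made-up record in the assumed WAT layout.
sample_record = json.dumps({
    "Envelope": {
        "WARC-Header-Metadata": {"WARC-Target-URI": "http://localnews.example/story"},
        "Payload-Metadata": {
            "HTTP-Response-Metadata": {
                "HTML-Metadata": {
                    "Links": [
                        {"path": "A@/href", "url": "/about"},
                        {"path": "A@/href", "url": "http://wire.example/a"},
                        {"path": "A@/href", "url": "http://wire.example/b"},
                        {"path": "A@/href", "url": "http://cityhall.example/agenda"},
                    ]
                }
            }
        },
    }
})

hosts = outbound_hosts(sample_record)
```

Aggregating counts like these across outlets is one way link-level interaction between community news sites could be measured before any deeper dive into the WARC content.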
The Internet Archive is hosting a FOIAPALOOZA to celebrate Aaron Swartz and to provide a yearly showcase of his many interests. Aaron’s work focused on civic awareness and activism and we will spend Saturday together keeping his prescient vision alive.
Doors are open for the hackathon and the daytime programming on Saturday at 10 am. The reception will be on Saturday evening at 6:00 pm, with the main program starting at 8 pm and a music and dance party afterward.
FOIAPALOOZA: Aaron filed many FOIA requests and inspired lots of journalists, including the now-legendary Jason Leopold, to use them as a tool for evidence-based journalism. So we decided to focus on FOIA and public records requests at this year’s San Francisco event. We aim to not only teach folks how to file their own requests but also to let them dig into the information we have received back from the 200+ requests we filed this last year with our Police Surveillance Project. FOIAPALOOZA speakers include Tracy Rosenberg & Mike Katz-McCabe from Oakland Privacy — an organization that just won an EFF Pioneer Award! — as well as Freddy Martinez from Lucy Parsons Labs, Ryan Shapiro from Property of the People and Brewster Kahle and Tracey Jaquith from the Internet Archive.
Announced today, Better World Books, the world’s leading socially conscious online bookseller, is now owned by Better World Libraries, a mission-aligned, not-for-profit organization that is affiliated with longtime partner, the Internet Archive. This groundbreaking partnership will allow both organizations to pursue their collective mission of making knowledge universally accessible to readers everywhere. This new relationship will provide additional resources and newfound synergies backed by a shared enthusiasm for advancing global literacy. Together, the two organizations are expanding the digital frontier of book preservation to ensure books are accessible to all for generations to come.
This new relationship will allow Better World Books to provide a steady stream of books to be digitized by the Internet Archive, thereby growing its digital holdings to millions of books. Libraries that work alongside Better World Books will now make a bigger impact than ever. Any book that does not yet exist in digital form will go into a pipeline for future digitization, preservation and access. According to Brewster Kahle, Founder and Digital Librarian at the Internet Archive, “The Better World Books origin story is inspiring, and the service they provide to libraries is invaluable. These are our kind of people. We share their values, and we are proud to partner with Better World Books and libraries around the world to promote the goal of universal access to all knowledge.”
Better World Books was born in 2003, when a group of recent college graduates sold their used textbooks online. Their success eventually led to the creation of a revolutionary new business model where used books are collected primarily from libraries, booksellers, colleges, and universities in six countries and then are either resold online, donated or recycled. To date, Better World Books has donated almost 27 million books worldwide, has raised close to $29 million for libraries and literacy, and has saved more than 326 million books from landfills. With the backing of the not-for-profit Better World Libraries, Better World Books will enhance these valuable services to libraries and readers.
According to Jim Michalko, former president of The Research Libraries Group, Inc., and a Better World Books board member, “This new relationship is a win for the library community. One of the biggest challenges facing libraries today is responsibly removing materials from their shelves so they can bring in more desirable materials or repurpose space to fit community needs. Better World Books has always been a trusted partner in this activity. Now, libraries can provide books to Better World Books knowing that a digital copy will be created and preserved if one doesn’t yet exist. That’s responsible collection management.”
The Internet Archive has long been committed to digitizing books and library materials so they can be accessed by users all over the world. Through digitization, these materials can be used by researchers in large-scale, data-driven computing investigations, preserved in both digital and physical form, and where appropriate, loaned to readers.
Dustin Holland, the newly appointed President and CEO of Better World Books, underscores, “We exist to make a difference in the world, and our customers make that possible. We are honored to join the Internet Archive family, and our partnership allows us to extract the maximum value out of every book we collect at scale, while continuing to delight readers all over the world.”
Better World Books remains the bookstore that customers know and love, with its operations now enhanced for the services it offers to libraries and readers alike. Shop with confidence and help us celebrate this exciting new partnership with a 10% discount on purchases made through the end of the year using discount code “BWBIA”. At checkout, you will have an opportunity to round up your purchase amount to benefit the Internet Archive and directly support our mission.
For more about the Internet Archive click here. For more about Better World Books click here.
The following blog post was written by freelance writer Caralee Adams about the Internet Archive’s Library Leaders Forum, held on October 23 at San Francisco Public Library.
As enthusiasm grows for making library collections more accessible, the Internet Archive hosted an event to build a community of practice around Controlled Digital Lending (CDL). A diverse group gathered for the 2019 Library Leaders Forum Oct. 23 to share stories and strategies for libraries to expand their reach by lending out digital books based on their physical collections.
Why is this important?
“At the Internet Archive, we have a strong belief that everyone deserves to learn. We want to offer up the greatest digital library the Internet has ever seen to the world for free,” said Chris Freeland, Director of the Internet Archive’s Open Libraries program. “We think that everyone, regardless of where they live, should have ready access to a great library. More importantly, we think it should be available on phones and mobile devices that people turn to today. We want to make sure they have access to vetted, trusted information that’s held in libraries.”
The mantra of CDL: “Own one, loan one.” The idea is that a library can make a choice of lending either a physical copy or a digital version of a book.
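The “own one, loan one” rule can be pictured as a simple invariant: the number of copies out on loan, physical plus digital, never exceeds the number of copies the library owns. The sketch below is only an illustration of that idea. The class and method names are hypothetical, and it does not describe how the Internet Archive or any partner library actually implements lending.

```python
class ControlledLendingTitle:
    """Tracks loans of one title under an "own one, loan one" rule."""

    def __init__(self, owned_copies):
        self.owned_copies = owned_copies
        self.physical_loans = set()
        self.digital_loans = set()

    def copies_out(self):
        """Total copies currently on loan in either format."""
        return len(self.physical_loans) + len(self.digital_loans)

    def lend(self, patron, digital=True):
        """Lend one copy if an owned copy is still free; return True on success."""
        already_borrowing = patron in self.physical_loans or patron in self.digital_loans
        if already_borrowing or self.copies_out() >= self.owned_copies:
            return False
        (self.digital_loans if digital else self.physical_loans).add(patron)
        return True

    def give_back(self, patron):
        """Return whichever copy the patron holds, freeing it for the next reader."""
        self.physical_loans.discard(patron)
        self.digital_loans.discard(patron)


title = ControlledLendingTitle(owned_copies=1)
first = title.lend("alice", digital=True)    # the single owned copy goes out digitally
second = title.lend("bob", digital=False)    # refused: no owned copy is free
title.give_back("alice")
third = title.lend("bob", digital=False)     # succeeds once the copy is returned
```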
The Internet Archive has been doing CDL since 2011, beginning with the Boston Public Library. Now two dozen other libraries of all sizes in the U.S. and Canada have embraced the model. Librarians from some of those institutions spoke about their passion for the practice at the forum.
The meeting provided an overview of the legal issues, policy considerations, and examples of CDL in action. The appeal to library leaders gathered was to endorse CDL, join Open Libraries, donate books to the Internet Archive for scanning, and volunteer to help with a new serials project.
Helping libraries see what’s possible
Michael Lambert, City Librarian at the San Francisco Public Library, which hosted the event, shared his institution’s experience as an early partner with the Internet Archive on Open Libraries and CDL. Beginning with city government documents and historical materials, SFPL created an entire scanning department. To date, the library has digitized 13,000 books and documents with the Internet Archive, which have received over 7.5 million views. Since November 2018, SFPL has donated 30,000 copyrighted books to the Internet Archive as part of its community distribution program.
“Having this alternative virtual lending site as an option has been great,” Lambert said. “Librarians have been able to confidently weed excess, outdated materials from our collection, secure in knowledge that the books will not disappear, but rather have a new life where people around the world can read and research the materials that SFPL has meticulously collected over the decades.”
The Internet Archive embodies library values: persistence, comprehensiveness and accessibility, said Lambert. “The Archive has become a crucial part of the broad library information eco-system,” he said. “They have provided examples that have challenged traditional libraries. The Internet Archive helps other libraries see what’s possible.”
What Internet Archive Founder Brewster Kahle hopes is possible is that digitization will allow more online sources to be linked to books, providing people with trusted information.
“If Wikipedia is the encyclopedia of the Internet, we are trying to build the library of the Internet,” Kahle explained at the forum. “Let’s make it really easy for people to go deeper.”
So far, the Internet Archive has turned 122,000 references on Wikipedia to digitized book links through its online library. Still, a century of books is missing after 1923 because of copyright laws. Kahle called on libraries to help fill that gap.
As part of that strategy, the Internet Archive is trying to institutionalize CDL, a practice that has been successfully working in a handful of libraries for eight years with no negative pushback. Yet, it has not been widely embraced. Kahle appealed to libraries to endorse CDL and donate books for scanning to address the larger goal of universal access to knowledge.
Framing the approach
The forum hosted experts to explain the legal underpinnings of CDL and discuss how the concept fits into the overall push to level the playing field for access to information.
Lila Bailey of the Internet Archive moderated a conversation with Kyle Courtney, Copyright Advisor at Harvard University, David Hansen, Associate University Librarian at Duke University, and Michelle Wu, Associate Dean for Library Services and Professor of Law at the Georgetown Law Library in Washington, D.C.
They have written a paper spelling out how libraries can practice CDL within the confines of the fair use doctrine in current copyright law. Copyright law established in 1976 and dating back to 1950 does not reflect the digital reality today and it should allow flexibility for libraries to lend out one book at a time – no matter what the format – digital or print, they maintain.
To garner broad support for the concept of CDL, John Bergmayer of the nonprofit, Public Knowledge, spoke about the need to build relationships with lawmakers and educate them on the issue. This summer, he led a group engaged in CDL to The Hill in Washington, D.C. to brief members of Congress and their aides on the importance of expanding access to library materials through CDL.
“You have to make a project matter to the politicians,” explained Bergmayer. In the case of CDL, it’s about outlining the benefits of providing access to rural patrons, protecting materials from damage from disasters, saving libraries money, and helping K-12 school libraries, among others. “You want to get people to do the right thing for their reasons, not your reason — and show how your issue affects voters.”
Heather Joseph, Executive Director of the Scholarly Publishing & Academic Resources Coalition (SPARC), said CDL fits into the larger open agenda that advocates for unrestricted access to research. “It’s a vision based on opportunity,” said Joseph. “An old tradition and a new technology have converged to make possible an unprecedented public good.”
Now more than ever, in an era of “fake news,” and “alternative facts,” free, immediate access to high-quality vetted, source material is crucial for scholars, scientists, students, journalists, policymakers – everyone, she said.
“CDL is a pragmatic, incremental step towards open that operates in a way that’s respectful of libraries current operations and of copyright. It moves the needle towards open,” said Joseph. “CDL can contribute to collective movement towards a full vision of open access to knowledge.”
Opening Doors for Students
Making digital books more widely available to students has the potential for remedying inequities in education. Nationwide, public school districts have lost 20 percent of their libraries and librarians in recent years. Lisa Petrides, founder of the non-profit Institute for the Study of Knowledge Management in Education, has embraced CDL as a model to build a Universal School Library (USL) and connect students – particularly from under-resourced schools – to relevant materials that increasingly are digital.
“CDL holds the potential to broaden access to knowledge in public schools in a way that schools haven’t even begun to tap,” said Petrides, who is trying to curate an inclusive collection of 15,000 high-quality digitized books. “We are taking an equity lens in terms of diversity.”
The Detroit School of Arts will be piloting USL, and Librarian Karen Lemmons said she was excited to be able to offer her high school students books they can access while they are on the go. “This might give them an opportunity to read in between practices. They can pull out their phone and read a few pages. It’s mobile and flexible,” said Lemmons, noting that reading is closely linked to student achievement. “Our students really do want to be the best.”
Lemmons said she wants to be a model for other urban schools. “We want to be a driving force to get other libraries involved,” said Lemmons. “This is a data-driven district and we will need data to show reading more makes a difference in student performance.”
When the prestigious Phillips Academy in Andover, Massachusetts, recently was doing a $20 million renovation to its library, the Internet Archive approached it about digitizing their collection. The library already had its books packed on pallets, but instead of storing them decided to have them all scanned, explained Michael Barker, Director of Academy Research, Information and Library Services.
“We had this very well-intentioned idea to create a space for learners of the 21st century. It’s all good. It is a space of immense privilege. But it takes a vision to think well beyond our campus to say that belongs to every learner. That opportunity is to digitize the entire collection – that’s why we are all in,” said Barker of the school’s decision to participate in CDL. “It goes to the heart of what Phillips was founded on. This school is for youth from every quarter and we try to live out that ideal as a private school for a public purpose.”
Next, Barker said he would like to see peer prep schools join the CDL model to further expand access to schools without the same resources.
CDL in Action
As the first library to use the CDL approach, the Boston Public Library recently extended its offerings by scanning its historic Alice Jordan Collection of 250,000 children’s books that were in storage. It has also digitized city directories, cookbooks and other fact-based documents in its catalog. Recently, it got permission from Boston-based publisher Houghton Mifflin to digitize its entire trade collection that is housed at BPL.
Expanding its CDL involvement, BPL’s Tom Blake challenged participants to bring another partner library next year to the forum.
“This is the first time, I feel like it’s less about digitization and scanning and more about us, as librarians, leveraging not just our collections, but our historical collection policies with each other,” said Blake, who has been attending the Library Leaders Forum for 10 years.
In discussing how to improve the CDL process, meeting participants suggested adjusting the amount of time users can check out titles and allowing for short-term loans. Perhaps smarter return and wait-list notifications could be developed to encourage faster processing of books. Others suggested re-branding Digital Rights Management (DRM) software with a different moniker that would be more appealing to librarians.
In Sonoma County, California, Geoffrey Skinner said its 14 public library branches have just started to participate in CDL. The library first scanned documents in the history and genealogy library, then digitized its specialized wine library.
“We are doing a massive weed of our closed stacks. By taking those materials to the Internet Archive, we will have digital access back,” said Skinner. Having library materials online will benefit many of the county’s rural users who otherwise travel far to access the physical books and provide access for print-disabled patrons.
Justin Gardner, Special Collections Librarian at the American Printing House for the Blind in Louisville, Kentucky, said digitizing 9,000 books in its collection has preserved rare and fragile documents, including books autographed by Helen Keller. Also, although the institution is located in Kentucky, digitization gives people interested in its materials access from anywhere.
“We are becoming the go-to place for visual impairment materials,” said Gardner. Now these research documents are in an accessible form for people who have visual impairments and have never been able to read these materials before they were digitized.
At the forum, Mike Buschman of the Washington State Library announced that the Chief Officers of State Library Agencies (COSLA) voted to endorse CDL. “It feels like it’s entering a new, good phase – a traction phase,” he said.
Kahle emphasized the need for CDL to be a community project and build a deeper collection. “We have to brave up,” he said. “We just act in good faith. We aren’t pirates. We are trying to do the right thing.”
Chief Librarian and CEO at the Hamilton Public Library in Canada Paul Takala said his institution is an enthusiastic supporter of CDL. With a long history of innovation, moving forward with digitizing is the right move – despite the technical challenges – to make information more accessible to patrons, he said.
“Deeper collaboration is needed. It’s hard to get adequate resources,” said Takala. “As a library community, we are generally risk averse. When we talk about CDL, I think we need to take a more balanced view…. If we make what’s available in our community to other communities – and others make their collections available – then everyone wins.”
Dale Askey, Vice Provost at the University of Alberta, said he liked Takala’s challenge to pull more Canadian institutions past their risk aversion to embrace CDL. “It’s great to see people aligning behind these principles and taking this to scale,” said Askey, whose university has scanned an historic collection of education materials with zero negative impact. “There is a strong history and impulse at the university to do things with maximum benefit to the largest possible community.”
Princeton Theological Seminary is piloting CDL and has created a secure area in its library for the physical collection, so that when a digital copy is checked out, the physical copy will reside there. Participating in the program has great potential benefits for the seminary’s reach, according to Managing Director of the Library Evelyn Frangakis.
“The PTS comprehensive theological collection is in high demand and the CDL library allows increased accessibility to all users, including those with various print disabilities,” said Frangakis. “I think CDL is gaining momentum. That’s really heartening for broad access to the materials that we are able to contribute to this program. It’s going to continue to grow.”
Ross Mounce, Director of Open Access Programmes at Arcadia, a charitable fund of Lisbet Rausing and Peter Baldwin in London, said he was encouraged by participation in the forum, adding that the action points were clear and that institutions can choose their level of engagement.
“It’s nice seeing things moving forward. At the end of the day, it just makes sense,” said Mounce of CDL. “If you own a physical copy of a book, you should be able to loan a digital version of it. Libraries should be able to lend books.”
Added Wu of Georgetown: “I’m delighted there has been a lot more buy-in in recent years. The voices and the participants are much more diverse. Libraries [like Phillips] are willing to go all in and that’s remarkable. It is true that if we get more of those, I think we will see a true movement across the nation.”
As part of the many releases and announcements for our October Annual Event, we created The Whole Earth Web Archive. The Whole Earth Web Archive (WEWA) is a proof-of-concept to explore ways to improve access to the archived websites of underrepresented nations around the world. Starting with a sample set of 50 small nations we extracted their archived web content from the Internet Archive’s web archive, built special search and access features on top of this subcollection, and created a dedicated discovery portal for searching and browsing. Further work will focus on improving IA’s harvesting of the national webs of these and other underrepresented countries as well as expanding collaborations with libraries and heritage organizations within these countries, and via international organizations, to contribute technical capacity to local experts who can identify websites of value that document the lives and activities of their citizens.
Archived materials from the web play an increasingly necessary role in representation, evidence, historical documentation, and accountability. However, the web’s scale is vast, it changes and disappears quickly, and it requires significant infrastructure and expertise to collect and make permanently accessible. Thus, the community of National Libraries and Governments preserving the web remains overwhelmingly represented by well-resourced institutions from Europe and North America. We hope the WEWA project helps provide enhanced access to archived material otherwise hard to find and browse in the massive 20+ petabytes of the Wayback Machine. More importantly, we hope the project provokes a broader reflection upon the lack of national diversity in institutions collecting the web and also spurs collective action towards diminishing the overrepresentation of “first world” nations and peoples in the overall global web archive.
As with prior special projects by the Web Archiving & Data Services team, such as GifCities (search engine for animated Gifs from the Geocities web collection) or Military Industrial Powerpoint Complex (ebooks of Powerpoints from the archive of the .mil (military) web domain), the project builds on our exploratory work to provide improved access to valuable subsets of the web archive. While our Archive-It service gives curators the tools to build special collections of the web, we also work to build unique collections from the pre-existing global web archive.
The preliminary set of countries in WEWA were determined by selecting the 50 “smallest” countries as measured by number of websites registered on their national web domain (aka ccTLD) — a somewhat arbitrary measurement, we acknowledge. The underlying search index is based on internally-developed tools for search of both text and media. Indices are built from features like page titles or descriptive hyperlinks from other pages, with relevance ranking boosted by criteria such as number of inbound links and popularity and include a temporal dimension to account for the historicity of web archives. Additional technical information on search engineering can be found in “Exploring Web Archives Through Temporal Anchor Texts.”
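As a rough picture of how such ranking signals might combine, the toy scorer below matches query terms against page titles and anchor texts, boosts the result by inbound link count, and filters by capture year. Everything in it is a hypothetical illustration: the field names, the logarithmic boost, and the year filter are assumptions chosen for brevity, not the Internet Archive's internal tooling, and the documents use made-up `.example` URLs.

```python
import math


def score(doc, query_terms, year=None):
    """Score one archived page: term matches in the title and in anchor
    texts from other pages, boosted by the number of inbound links."""
    # Temporal dimension: skip pages with no captures in the requested year.
    if year is not None and not (doc["first_year"] <= year <= doc["last_year"]):
        return 0.0
    words = (doc["title"] + " " + " ".join(doc["anchor_texts"])).lower().split()
    matches = sum(words.count(term.lower()) for term in query_terms)
    if matches == 0:
        return 0.0
    return matches * (1.0 + math.log1p(doc["inbound_links"]))


def search(docs, query, year=None):
    """Return URLs of matching pages, best score first."""
    terms = query.split()
    scored = [(score(doc, terms, year), doc["url"]) for doc in docs]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [url for value, url in ranked if value > 0]


docs = [
    {"url": "http://health.gov.example/", "title": "Ministry of Health",
     "anchor_texts": ["health ministry"], "inbound_links": 10,
     "first_year": 2005, "last_year": 2015},
    {"url": "http://news.example/health", "title": "Health news",
     "anchor_texts": [], "inbound_links": 0,
     "first_year": 2010, "last_year": 2019},
    {"url": "http://tourism.example/", "title": "Tourism board",
     "anchor_texts": ["visit us"], "inbound_links": 50,
     "first_year": 2000, "last_year": 2019},
]

all_years = search(docs, "health")
only_2018 = search(docs, "health", year=2018)
```

Anchor texts matter in this setting because an archived page can be described by the links pointing at it even when its own text is sparse or missing.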
We intend both to do more targeted, high-quality archiving of these and other smaller national webs and also to undertake active outreach to national and heritage institutions in these nations, and to related international organizations, to ensure this work is guided by broader community input. If you are interested in contributing to this effort or have any questions, feel free to email us at webservices [at] archive [dot] org. Thanks for browsing the WEWA!
On Tuesday, the Internet Archive joined Public Knowledge, the Wikimedia Foundation and the Samuelson Law, Technology and Public Policy Clinic from Berkeley Law to brief the Congressional Internet Caucus on efforts to combat misinformation online. Misinformation is a complex issue but one of the root causes is a lack of easy, reliable ways for Internet users to distinguish good information from bad, or authoritative sources from propaganda. The panel highlighted our recent work to weave books into Wikipedia articles, giving users the ability to dig deeper and fact check assertions in just one click.
The Internet Archive has transformed 130,000 references to books in Wikipedia into live links to 50,000 digitized Internet Archive books in several Wikipedia language editions including English, Greek, and Arabic. And we are just getting started. By working with Wikipedia communities and scanning more books, both users and robots will link many more book references directly into Internet Archive books. In these cases, diving deeper into a subject will be a single click.
“I want this,” said Brewster Kahle’s neighbor Carmen Steele, age 15, “at school I am allowed to start with Wikipedia, but I need to quote the original books. This allows me to do this even in the middle of the night.”
For example, the Wikipedia article on Martin Luther King, Jr cites the book To Redeem the Soul of America, by Adam Fairclough. That citation now links directly to page 299 inside the digital version of the book provided by the Internet Archive. There are 66 cited and linked books on that article alone.
Readers can see a couple of pages to preview the book and, if they want to read further, they can borrow the digital copy using Controlled Digital Lending in a way that’s analogous to how they borrow physical books from their local library.
“What has been written in books over many centuries is critical to informing a generation of digital learners,” said Brewster Kahle, Digital Librarian of the Internet Archive. “We hope to connect readers with books by weaving books into the fabric of the web itself, starting with Wikipedia.”
You can help accelerate these efforts by sponsoring books or funding the effort. It costs the Internet Archive about $20 to digitize and preserve a physical book in order to bring it to Internet readers. The goal is to bring another 4 million important books online over the next several years. Please donate or contact us to help with this project.
“Together we can achieve Universal Access to All Knowledge,” said Mark Graham, Director of the Internet Archive’s Wayback Machine. “One linked book, paper, web page, news article, music file, video and image at a time.”
Every day hundreds of millions of web pages are archived to the Internet Archive’s Wayback Machine. Tens of millions of them are submitted by users like you using our Save Page Now service. You can now do that in a way that is easier, faster and better than ever before.
Save Page Now (SPN) just got a major upgrade as a result of a total code rewrite, adding a slew of new and awesome features, with more on the way.
Let’s explore what’s new with Save Page Now
You can now save all the “outlinks” of a web page with a single click. By selecting the “save outlinks” checkbox you can save the requested page (and all the embedded resources that make up that page) and also all linked pages (and all the embedded resources that make up those pages). Often, a request to archive a single web page, with outlinks, will cause us to archive hundreds of URLs, every one of which is shown via the SPN interface as it is archived.
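For readers wondering what counts as an “outlink,” the sketch below collects the links found on a page and builds the list of URLs that a capture with outlinks would cover. It only illustrates the concept using Python's standard library. It is not how Save Page Now works internally, the function and class names are hypothetical, and it ignores embedded resources such as images and scripts.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class OutlinkCollector(HTMLParser):
    """Collects absolute http(s) URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.outlinks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(self.base_url, href))
        is_web_url = absolute.startswith(("http://", "https://"))
        if is_web_url and absolute not in self.outlinks:
            self.outlinks.append(absolute)


def capture_queue(page_url, html):
    """Return the requested page followed by each distinct page it links to."""
    collector = OutlinkCollector(page_url)
    collector.feed(html)
    return [page_url] + [url for url in collector.outlinks if url != page_url]


html = (
    '<a href="/about">About</a> '
    '<a href="https://other.example/x#top">X</a> '
    '<a href="mailto:editor@news.example">Mail</a> '
    '<a href="/about">About again</a> '
    '<a name="anchor-only">No href</a>'
)
queue = capture_queue("https://news.example/", html)
```

Even this small page yields three URLs to capture, which hints at why a single request with outlinks can grow to hundreds of URLs on a real site.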
When users are logged in with their free Archive.org account, SPN-generated archives can be saved to that user’s “My web archive” public gallery of archived pages.
Have you ever wanted to archive all the web pages linked from an email message? Well, you are in luck because now you can forward that email to “email@example.com” and after a few minutes you will get an email back filled with Wayback Machine playback URLs.
Some of you might like the new “First capture” badge you will see if any of the URLs you submit to be archived (including outlinked URLs and URLs included in emails) have not been archived yet. And, yes, for those of you who are feeling competitive, we are planning to launch a “leader board” soon. Let the games begin!
Maybe you want the URLs embedded in a web-based PDF file, RSS feed, or JSON file archived. The new SPN will parse those files and archive all the URLs they contain. To use this feature, simply submit PDF/RSS or JSON URLs to SPN, and don’t forget to select the “capture outlinks” checkbox.
This new version of SPN is also being used as the back-end support for a number of Wayback Machine services, including the iOS and Android apps as well as the Chrome, Firefox and Safari browser extensions. And, in case you wondered, those apps and extensions will also be getting major updates very soon.
And, yes, of course SPN has a brand new API that you can use to automate a range of Web archiving projects. Please write to us at firstname.lastname@example.org if you would like to learn more about the API.
We have often gotten requests to archive URLs from a Google Sheet. We now support that feature for authorised users. Please write to us for access to this advanced capability at email@example.com.
We LOVE hearing about ways we can make the Wayback Machine better. In fact most of these new SPN features started with your user suggestions.
Please let us know what you think. Good, bad, or otherwise. Who knows, the next cool SPN feature might be invented by you!
And remember, “If you see something, save something!”
Announced today, Phillips Academy has received the Hero Award from the Internet Archive for its leadership in adopting controlled digital lending for school libraries. The Hero Award is presented annually to an organization that exhibits leadership in making its holdings available to digital learners all over the world. When Phillips Academy was renovating its Oliver Wendell Holmes Library, librarian Michael Barker wanted to update more than the physical space. This was also an opportunity to bring the private preparatory high school up to speed digitally and, in the process, share its vast book collection with others.
Barker, Director of Academy Research, Information and Library Services, has embraced Controlled Digital Lending (CDL), where a library digitizes a book it owns and lends out one secured digital version to one user at a time. In this case, the Andover, Massachusetts school owns 80,000 books.
“With the closure of so many high school libraries, this allows us to share the collection we’ve built up over 100 years with all other high schools,” Barker said. “I can’t think of any better way the library could contribute its private resources for a public purpose.”
Phillips, which has roughly 1,100 students in grades 9-12, has been active in the Digital Public Library of America. It has already digitized about 4,000 of its titles published prior to 1923.
With all the books already boxed up for the renovation, the school’s decision to expand its CDL project was clear: “There would never be a better time than now,” Barker said. This summer it shipped most of the remaining volumes to be digitized by Internet Archive at its scanning facility in the Philippines.
Sharing the cost of scanning and shipping with Internet Archive was critical to making the digitization possible, Barker said. The books are expected back early in 2020 and will be placed back on library shelves over spring break.
Rather than keeping most books on display, the renovated Phillips library includes more open space for collaboration. It was last updated in 1987 and was not wired for a world that included the Internet. Renovations began in early 2018 and the newly updated facility opened to students this fall.
Barker said the library was originally designed like a “book fortress.” Its center now has room for students to study together, while some books are on shelves around the periphery. Most books are now in the attic and basement, where they can be called up for lending.
“One local benefit of CDL is that students don’t necessarily need to call the book from the attic. With a digital version there is no delay in getting the book,” Barker said.
As Barker awaits the return of the book collection from the Philippines, he is tracking the shipment (which went on two separate ships and was insured). In the meantime, Phillips is preparing to share the news of its vast collection becoming open to students everywhere. Barker is excited to offer the school’s resources openly and said it’s particularly timely as school library budgets are being cut, making it hard for libraries to fulfill their mission.
“The truth of the matter is that some schools don’t have libraries anymore,” Barker said. “If other schools like us got involved in CDL in the same way and shared their copies, many public schools would not have to worry about their students having access to collections in the same way they might be doing now. I encourage others to explore it and jump in. It seems like it can only get stronger the more libraries that join.”
NOTE: Come meet Mike Barker and learn more about Phillips Academy when he speaks at Internet Archive’s World Night Market, Wednesday 10/23 from 5-10 PM. Tickets available here.
Imagine if your favorite song or nostalgic recording from childhood was lost forever. This could be the fate of hundreds of thousands of audio files stored on vinyl, except that the Internet Archive is now expanding its digitization project to include LPs.
Unfortunately, many of these audio files were never translated into digital formats and are therefore locked in their physical recording. In order to prevent them from disappearing forever when the vinyl is broken, warped, or lost, the Internet Archive is digitizing these at-risk recordings so that they will remain accessible for future listeners.
“The LP was our primary musical medium for over a generation. From Elvis, to the Beatles, to the Clash, the LP was witness to the birth of both Rock & Roll and Punk Rock. It was integral to our culture from the 1950s to the 1980s and is important for us to preserve for future generations.”
– CR Saikley, Director of Special Projects, Internet Archive
Since all of the information on an LP is printed, the digitization process must begin by cataloging data. High-resolution scans are taken of the cover art, the disc itself and any inserts or accompanying materials. The record label, year recorded, track list and other metadata are supplemented and cross-checked against various external databases.
“We’re really trying to capture everything about this artifact, this piece of media. As an archivist, that’s what we want to represent, the fullness of this physical object.”
– Derek Fukumori, Internet Archive Engineer
Once cataloged, the LPs are then digitized. The Internet Archive partners with Innodata Knowledge Services, an organization focused on machine learning and digital data transformation, to complete the digitization process at its facilities in Cebu, Philippines. An Innodata worker digitizes 12 LPs at a time, setting turntables to play and record by hand, then turning each record over to the next side. Since each LP is digitized in real time, it takes a full 20 minutes to record an average LP side. By operating 12 turntables simultaneously, the team expects to be able to digitize ten LPs per hour.
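The throughput figures can be checked with a little arithmetic. The sketch below assumes two sides per LP and takes the 20 minutes per side and 12 turntables from the description above; the function names are hypothetical. The difference between pure playback time and the expected rate is an inference from those numbers, not a figure the team has reported, and it presumably covers handling such as loading and flipping records.

```python
def max_lps_per_hour(turntables, minutes_per_side, sides_per_lp=2):
    """Upper bound on LPs per hour if the turntables never sat idle."""
    minutes_per_lp = minutes_per_side * sides_per_lp
    return turntables * 60 / minutes_per_lp


def minutes_per_batch(turntables, lps_per_hour):
    """Wall-clock minutes one full batch takes at a given overall rate."""
    return turntables * 60 / lps_per_hour


playback_minutes = 20 * 2                    # two sides at 20 minutes each
ceiling = max_lps_per_hour(12, 20)           # 18.0 LPs per hour with zero handling time
batch = minutes_per_batch(12, 10)            # 72.0 minutes per batch at ten LPs per hour
implied_overhead = batch - playback_minutes  # 32.0 minutes per batch not spent on playback
```

In other words, ten LPs per hour sits comfortably below the 18 per hour that uninterrupted playback would allow.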
Once recorded, there is a large FLAC file for each side of the LP, which needs to be segmented so listeners can easily begin at the desired song. There are two different algorithms used for segmenting; the first one looks at images of the vinyl disc to locate gaps in its grooves, which usually line up with gaps between songs. A second algorithm listens to the audio file to find the silent spaces between songs. When these two algorithms align, our engineers have a good measure of confidence that the machine has found the proper tracks.
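As a rough illustration of the audio side of this idea, the sketch below scans a list of samples for long quiet stretches and then keeps only the boundaries that a second source agrees with. It is not the Internet Archive's algorithm: the function names, the amplitude threshold, the minimum gap length and the tolerance are all assumptions made for the example, and the image-based boundaries are simply passed in as a list of times.

```python
def find_silent_gaps(samples, sample_rate, threshold=0.01, min_gap_seconds=1.5):
    """Return (start, end) times in seconds of runs where |sample| stays below threshold.

    samples is a sequence of floats in the range -1.0 to 1.0 (an assumption).
    """
    min_len = int(min_gap_seconds * sample_rate)
    gaps = []
    run_start = None
    for i, sample in enumerate(samples):
        if abs(sample) < threshold:
            if run_start is None:
                run_start = i  # a quiet run begins here
        else:
            # The quiet run just ended; keep it only if it lasted long enough.
            if run_start is not None and i - run_start >= min_len:
                gaps.append((run_start / sample_rate, i / sample_rate))
            run_start = None
    # A quiet run that lasts until the end of the side is reported too.
    if run_start is not None and len(samples) - run_start >= min_len:
        gaps.append((run_start / sample_rate, len(samples) / sample_rate))
    return gaps


def agreed_boundaries(audio_times, image_times, tolerance=2.0):
    """Keep audio boundaries that have an image-based boundary within tolerance seconds."""
    return [t for t in audio_times if any(abs(t - u) <= tolerance for u in image_times)]
```

A fixed threshold like this is exactly what struggles with live applause or a quiet classical passage, which is one reason a human check remains useful.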
These algorithms currently predict the correct segmentation with about 80% accuracy, but some audio files are more difficult. For example, recordings of live music fill in the spaces between songs with applause, while classical music utilizes silence as part of a song. In order to account for these anomalies, digitized LP files are always checked manually before being added to the online database.
Currently, there are more than 900 LPs from the Boston Public Library LP collection available on Archive.org. The Internet Archive continues to digitize the remainder of the BPL collection in addition to more than 285,000 LPs that have been donated by others. The organization aims to engage a greater community of LP and 78 rpm enthusiasts by welcoming contributions and improvements to the recorded metadata. Many of the audio files online can be listened to in full, but some of the albums are only available in 30 second snippets due to rights issues.
For decades, vinyl records were the dominant storage medium for every type of music and are ingrained in the memories and culture of several generations. Despite the challenges, the Internet Archive is determined to preserve these at-risk records so that they can be heard online by new audiences of scholars, researchers, and music lovers around the world.
ABOUT THE AUTHOR: Faye Lessler is a California-born, Brooklyn-based freelance writer and founder of lifestyle blog, Sustaining Life. She is an expert in mission-driven communications and enjoys writing while sipping black tea in a beam of sunshine.