
What Information Should we be Preserving in Filecoin?

The folks at Protocol Labs love their rockets. And outer space. And exploration.

So when Filecoin, their cryptocurrency-fueled decentralized storage network, launched recently, it was no surprise they called it Filecoin Liftoff. In the payload of that Filecoin rocket are treasures from the Internet Archive:

For 15 years, LibriVox has harnessed a global army of volunteers, creating 14,200 free public domain audiobook projects in 100 different languages. Where else can you listen to Jules Verne’s 20,000 Leagues Under the Sea in French, Spanish, English, German or Dutch…for free? Now, phrases of Shakespeare, Poe, Joyce and Dante will be stored across the Filecoin mainnet, broken into packets to be reconstituted when needed—perhaps in a new century.

The same destiny awaits the home movies, stock footage, educational and amateur films in the public domain, lovingly curated by the Prelinger Archives founder, Rick Prelinger. He encourages creatives to download and reuse these videos, creating countless new works like this one by musician Jordan Paul:

Now filmmakers and connoisseurs can sleep easier, knowing that a new, distributed copy of those films lives in the Filecoin network (along with the main copy and multiple backups in the Internet Archive’s repositories).

So what’s next, Filecoin explorers?

Today, Protocol Labs and the Internet Archive are happy to announce the Filecoin Archives, a new community project to curate, disseminate and preserve important open access information often at risk of being lost. You can get involved in so many ways: by nominating information to be stored, uploading it to the Internet Archive, or preserving the data as a Filecoin node while earning Filecoin for sharing your storage capacity.

What information should we be preserving? Please tell us!

How about 166,000 public domain books (60 terabytes) from the Library of Congress? Including 2100 texts about Abraham Lincoln and slavery?

Or Open Access Journal articles? (The Internet Archive has collected 9.1 million of them.)

It takes a host of global voices with diverse viewpoints to ensure that humanity’s most precious knowledge is represented online and preserved. So we need to hear from you. What open access information or datasets are you interested in preserving?

Between now and November 5, please send us your ideas and vote on the others. We will gather your suggestions, add our own, and publish the list from which we will select information to preserve across a global network of Filecoin nodes.

How to send us your suggestions 

Look for the tweet from @JuanBenet and reply to it with:

  • The Name of the Dataset.
  • The size in GB or TB.
  • An HTTP or @IPFS link to the data.
  • Why it matters.
  • #FilecoinArchives

Bonus points if the data is already stored in the Internet Archive or if you upload it there. Vote for ideas by retweeting them and please help us spread the word!

Juan Benet presents his early vision at the 2016 Decentralized Web Summit at the Internet Archive in San Francisco.

In 2015, a young developer named Juan Benet wandered into the Internet Archive headquarters. He painted a picture of a decentralized stack, something he now calls Web3, where the storage, transport and other layers would be distributed across many machines. Together with the DWeb community, we have imagined a web with our values written into the code: values such as privacy, security, reliability, and control over one’s own identity.  With the launch of Filecoin’s mainnet, a piece of that new web is perhaps within reach. 

Now it’s up to us to make sure the payload includes humanity’s most important knowledge.

Library Leaders Forum Explores Impact of Controlled Digital Lending

The third and final session of the 2020 Library Leaders Forum wrapped up Tuesday with a focus on how Controlled Digital Lending gives communities broader access to knowledge. A full recording of the session is now available online.

Michelle Wu was honored with the Internet Archive Hero Award for her vision in developing the legal concept behind CDL. In her remarks, the attorney and law librarian shared her thoughts on the development and future of the lending practice. Wu does not see the theory that she designed 20 years ago as revolutionary, but rather a logical application of copyright law that allows libraries to fulfill their mission.

Despite current legal challenges, Wu predicts CDL can continue if libraries make themselves and their users heard.

“We must make sure that the public interests served are fully described, visible and clear to lawmakers and courts at the time they make their decisions,” Wu said. “If we do that, I believe the public interest will prevail and CDL will survive.”

The pandemic has underscored the need for digital access to materials and changed attitudes about CDL among libraries that had previously been risk averse to the practice, Wu said.  

“The closing of our libraries due to COVID has changed that mindset permanently,” Wu said. “It showed how the desire to avoid risk resulted in the actual and widespread harm to populations, depriving them of content at a time when access was more important than ever.”

Because of the pandemic, libraries are now empowered to try innovative practices to serve their patrons.

“With this new heightened awareness, I think the future of access is brighter,” Wu said. “Not only do I think CDL will flourish, but there seems to be a very real chance that libraries will more aggressively fight to regain some of the public interest benefits of copyright that they’ve lost over the years.”

In the future, Wu maintained that CDL can ensure a balance for full and equal access to knowledge for every person.

“Reliable access to information is the great equalizer,” Wu said. “Information shapes each of us, and lack of it is part of what increases our divide.”

(A complete profile of Wu’s work can be found here.)

The event also included a virtual ribbon-cutting ceremony marking the reopening of the Marygrove College Library. The Internet Archive now houses the college’s 70,000-volume library online and has preserved the physical copies, after the institution closed its campus in 2019 and donated its entire collection for digitization. The move preserves books that reflect the college’s rich history of social justice and education programs that largely served women, African Americans and low-income students in Detroit.

“The knowledge that [the books] would still be available and still be utilized just keeps us going as we wrap up the college,” said Marygrove President Elizabeth Burns at the Forum. “It’s a sad, sad time, but it is also a time where we know the impact of the college will continue…It’s a very tangible measure of Marygrove for the future.”

Chris Freeland, director of Open Libraries at the Internet Archive, moderated a panel with Marygrove librarian Mary Kickham-Samy, Mike Hawthorne, a librarian at nearby Wayne State University, and Brenda Bryant, dean and director of Marygrove’s social justice program, to talk about the transformation of the library into a digital format.

“It’s exciting! I’m thrilled that it won’t be in just one small corner,” said Bryant of the library’s move online and value to scholars. Bryant built the nation’s first Master of Arts program in social justice at Marygrove and considered the library one of the best kept secrets on campus. “Like my activist friend Elena Herrada [said], the collection was important because in Detroit, reading is an act of resistance.” 

For more about Marygrove’s story, read our online profile.

Want Some Terabytes from the Internet Archive to Play With?

There are many computer science, decentralized storage, and digital humanities projects looking for data to play with. You came to the right place: the Internet Archive offers cultural information to web users and data miners alike.

While many of our collections have rights issues and so require agreements and conversation, there are many that are openly available for public, bulk downloading.

Here are three collections: one of movies, another of audiobooks, and a third of scanned public domain books from the Library of Congress. If you have a Mac or Linux machine, you can use it to run these command lines. If you run each for a little while and then stop it, you will get just a few of the items (so you do not need to download terabytes).

These items are also available via bittorrent, but we find the Internet Archive command line tool is really helpful for this kind of thing:

$ curl -LOs https://archive.org/download/ia-pex/ia
$ chmod +x ia
$ ./ia download --search="collection:prelinger" # 17TB of public domain movies
$ ./ia download --search="collection:librivoxaudio" # 20TB of public domain audiobooks
$ ./ia download --search="collection:library_of_congress" # 166,000 public domain books from the Library of Congress (60TB)
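
If you would rather fetch a fixed number of items than interrupt a bulk download, one option is to list the identifiers first and download only the first few. This is a minimal sketch, not an official recipe: it assumes the `ia` binary from the commands above is in the current directory and that its `search --itemlist` and `download --itemlist` options are available in your version. The function names `first_n` and `download_first_items`, and the `items-<collection>.txt` file name, are made up for illustration.

```shell
#!/bin/sh
# Keep only the first N lines (one identifier per line) from stdin.
first_n() {
  head -n "$1"
}

# Download the first N items of a collection (hypothetical helper).
# Usage: download_first_items prelinger 5
download_first_items() {
  collection=$1
  n=$2
  # List identifiers only, keep the first N, and save them to a file.
  ./ia search "collection:$collection" --itemlist | first_n "$n" > "items-$collection.txt"
  # Download just the items named in that file.
  ./ia download --itemlist="items-$collection.txt"
}
```

Nothing is downloaded until you call `download_first_items`, for example `download_first_items librivoxaudio 3`.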

Here is a way to figure out how much data is in each:

apt-get install jq > /dev/null
./ia search "collection:library_of_congress" -f item_size | jq -r .item_size | paste -sd+ - | bc | numfmt --grouping
./ia search "collection:librivoxaudio" -f item_size | jq -r .item_size | paste -sd+ - | bc | numfmt --grouping
./ia search "collection:prelinger" -f item_size | jq -r .item_size | paste -sd+ - | bc | numfmt --grouping
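
On machines where jq, bc or numfmt are not installed (a stock Mac, for instance), the same total can be approximated with sed and awk alone. This sketch assumes, as the pipeline above does, that `ia search ... -f item_size` prints one JSON object per line with a numeric `item_size` field in bytes. The helper names `sum_item_sizes` and `bytes_to_tb` are hypothetical, and the conversion uses 1 TB = 10^12 bytes.

```shell
#!/bin/sh
# Sum the item_size values found in line-delimited JSON on stdin.
# Lines without a numeric item_size field are skipped.
sum_item_sizes() {
  sed -n 's/.*"item_size": *\([0-9][0-9]*\).*/\1/p' |
    awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Convert a byte count to terabytes (10^12 bytes), one decimal place.
bytes_to_tb() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1e12 }'
}

# Example use (needs network access and the ia tool):
#   total=$(./ia search "collection:prelinger" -f item_size | sum_item_sizes)
#   bytes_to_tb "$total"
```

The sed expression is deliberately simple and would need adjusting if the tool's output format differs from the assumption above.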

Sorry to say we do not yet have a support group for people using these tools or finding out what data is available, so for the time being you are pretty much on your own.

Michelle Wu Receives Internet Archive Hero Award for Establishing the Legal Basis for Controlled Digital Lending

Michelle Wu, Internet Archive Hero Award 2020 recipient

Michelle Wu is leading libraries to think and act in new ways to fulfill their missions.

For nearly two decades, she has advocated for preserving and expanding access to materials by responsibly digitizing collections. Using her expertise as an attorney, law librarian and professor, Wu crafted the legal theory behind Controlled Digital Lending (CDL) and has dedicated much of her career to showing libraries how to put the concept into practice.

To honor her innovative and tireless work, Wu has been named the recipient of the 2020 Internet Archive Hero Award. The annual award recognizes those who have exhibited leadership in making information available for digital learners all over the world. Past recipients have included Phillips Academy, the Biodiversity Heritage Library, and the Grateful Dead. Michelle received the award during the Library Leaders Forum final session on October 20.

“Michelle Wu was ahead of her time in understanding the transition to the digital era and brought library lending into our new landscape,” said Brewster Kahle, founder of the Internet Archive.

“Not only did Michelle see a problem coming, she did something about it,” Kahle says. “It’s a combination of being both a visionary on how the world could work and then making concrete steps to get us there.”

With library buildings closed now for safety, the demand for digital materials has grown. The pandemic magnifies the importance of using CDL as a strategy to expand services to the public, says Pamela Samuelson, a distinguished professor of law and information management at the University of California, Berkeley, who admires Wu’s insights as a scholar and librarian.

“She set the example and made people feel comfortable with a concept that was initially a little bit questionable,” says Samuelson. In her copyright classes, Samuelson now draws on Wu’s work to inform her students.

“Michelle’s articles explaining the concept have been very useful for students to have not just the reader’s perspective, or law student’s perspective, but how librarians are really taking the challenge of the digital age,” Samuelson says. “They are making good things happen to carry on the grand tradition of libraries to facilitate as much access as lawfully possible to the public they serve.”

Looking back on her career, Wu says she sort of fell into law. She abandoned plans for medical school after helping her roommate at the University of California San Diego study for the Law School Admission Test. Fascinated with the logic puzzles, she took the LSAT on a whim and did well enough to get a scholarship.

“I found I loved the theory of the law, looking at issues from all sorts of angles and finding a path through,” says Wu, who enrolled at the California Western School of Law and worked part-time at the San Diego County Law Library. She soon realized that the adversarial nature of the legal process didn’t suit how she viewed the law. Law librarianship was a better match, one grounded in collaboration and a commitment to using legal knowledge to educate and assist users in finding meaningful solutions to their legal problems. A year after earning her J.D., Wu got her master’s degree in librarianship with a certificate in law librarianship at the University of Washington.

She landed her first job at George Washington University Law School Library. In 2001, she was hired by the University of Houston School of Law. It was there, following the massive destruction of the school’s library due to Tropical Storm Allison, that Wu focused on the need to protect materials through digitization.

“Is there a better way for libraries to prepare society for a world in which there are a growing number of natural disasters?” Wu recalls wondering. “There are so many risks to our collections, and society depends on long-term access for this information.”

Wu developed the theory for a digitization program designed with copyright in mind. What came to be known as CDL, she says, strikes a balance between the interest of the users and copyright owners. A library can lend out only the number of copies that it has legitimately acquired, though the copy can be in any format. The flexibility in format facilitates more effective access for a wide variety of users, including those who live remotely or have trouble physically coming to a library building, while also ensuring the preservation of content in situations like natural disasters.

After Houston, Wu worked at the Hofstra School of Law and Georgetown University Law Center. As both a library director and law professor, Wu says she has been well-positioned to advocate for CDL and reason with the skeptics.

“I haven’t heard a lot of substantive objections. I have heard fear, which is common and understandable anytime you are changing the status quo, but it is something that must be overcome for advancement,” says Wu. “In talking with others about CDL, I focus on what CDL is and what it is intended to accomplish, which pushes people to engage deeply instead of rejecting the idea out of fear. From my perspective, CDL is the purest form of balance in copyright that you are going to find in a world of technology, and that balance is difficult to deny when you examine CDL in detail.”

Kyle K. Courtney,  the copyright advisor and program manager at the Harvard Library Office for Scholarly Communication, says from the first time he met Wu, he was inspired by her ideas and willingness to challenge norms. Her research was a major influence on Courtney’s work and career. Together, they co-authored a position statement on CDL.

“It is great to meet your heroes sometimes — and even better to be able to work with them side by side,” says Courtney. “She is not a theoretical scholar. This is what’s awesome: She puts the cutting-edge CDL copyright system to work. That’s why she’s a trailblazer in both words and action, putting libraries at the forefront in our field.”

Wu’s leadership has helped advance the collaborative work of libraries and enabled more transparency in sharing information, says Courtney. He and Wu have presented on CDL at several conferences, and last year they discussed the concept with Congressional staff on Capitol Hill.

“She is one of the hardest working members of the library field I know,” Courtney says. “She’s oriented toward practical results and addresses 21st century challenges in multiple environments – public, private and academic. She is a person of remarkable integrity.”

Courtney says Wu’s recognition showcases what leaders in librarianship should aspire to: a successful record of progressive scholarship,  influence on the next generation of librarians and a legacy of hard work that reflects an enthusiasm for libraries.

Sharing the story of CDL on Capitol Hill, Lila Bailey, policy counsel for the Internet Archive, says she was struck by Wu’s ability to connect with staffers. “Michelle explains things in such a clear, intuitive, practical way,” says Bailey, who also has collaborated with Wu on research. “She’s so competent and conscientious.”

Wu has been committed to spreading her knowledge of both academic and practical aspects of the CDL to librarians and policymakers across the country. “She is somebody who came up with a legal theory and spent her career creating a proof of concept for why this is important,” Bailey says. “The Internet Archive sets this very ambitious vision of universal access to all knowledge then it tries to live up to the vision. Michelle embodies this ethos of the Internet Archive to be the change you want to see in the world.”

In June, Wu retired from academia, but she continues to research and mentor emerging librarians. Too often, outside of the sciences, academia gives more weight to the risks of innovation than to the opportunities that creative problem-solving can provide, but Wu says that attitude doesn’t serve the public in the best way.

“We can’t sit back and expect everyone automatically to understand the importance of libraries long term. We have to stand up for what we believe, advocate for it, and find solutions that better serve society in an ever-changing world,” Wu says.

Advertising powers the Web. What if it just doesn’t work?

On October 14, the Internet Archive presented a book talk with author Tim Hwang, New York Times tech reporter Kashmir Hill, and technologist Desigan Chinniah, discussing Hwang’s new book, “Subprime Attention Crisis.”

Is the Ad-Tech model powering the Internet really just the next financial bubble?

That is the question at the heart of a significant new book by Internet researcher Tim Hwang, Subprime Attention Crisis: Advertising and the Time Bomb at the Heart of the Internet. If you don’t already know Tim, he is a polymath: former Google AI policy wonk, lawyer, polemicist. In other words, just the kind of thinker we think you should know. Watch the video of a virtual book event with Tim here:

Subprime Attention Crisis makes the case that the core advertising model driving Google, Facebook, and many of the most powerful companies on the internet is—at its heart—a multibillion dollar financial bubble. Drawing parallels to the 2008 subprime mortgage crisis, Tim shines a spotlight on the lack of transparency, flawed incentives, and outright fraud that keep this machine running.

On October 14, the Internet Archive hosted a talk with the author and New York Times technology reporter Kashmir Hill. Their discussion tackled:

  • Why data-driven, online advertising may be much, much less effective than it looks
  • The long-term impact of the COVID-19 recession on the media and online ads
  • Whether or not the giants of Big Tech are already “too big to fail”

This discussion focused not only on the problems of advertising, but also on the future, and how we might be able to transition to a better, more financially robust internet. Joining the discussion was Desigan Chinniah, who co-leads Grant for the Web—a $100 million fund launched by Coil, Mozilla, and Creative Commons to spur open standards and new economic models for the web beyond advertising.

NOTE: We urge you to purchase a copy of Tim’s new book, Subprime Attention Crisis, via our local bookseller, The Booksmith. The first 50 purchasers will receive an autographed copy.

Library Leaders Forum: how to empower communities affected by COVID-19

This year’s virtual Library Leaders Forum closes on Tuesday, following three weeks of inspiring discussion about the future of libraries in the digital age. The final session will focus on the impact of controlled digital lending on communities, particularly those affected by COVID-19. 

In last week’s session, we heard from librarians on the frontline of the COVID-19 response. Panelists shared how controlled digital lending has empowered libraries to get vital resources to those in need, despite lockdowns. “We were aware of [controlled digital lending] beforehand, but this pandemic has made us acutely aware of the need and opportunity,” said Stanford University’s chief technology strategist Tom Cramer. If you missed it, you can read a detailed recap of the session or watch the full recording.

The session demonstrated the power of digital tools for reaching marginalized communities in lockdown and beyond. We were therefore pleased to announce that Internet Archive is joining Project ReShare, a group of organizations developing an open-source resource sharing platform for libraries. Resource sharing, like controlled digital lending, has the power to break down the access barriers associated with commercial platforms. 

The next session will focus on the impact that controlled digital lending is having on libraries and the communities they serve. Internet Archive founder and digital librarian Brewster Kahle will present the Internet Archive Hero Award to Michelle Wu, the visionary behind the practice. We’ll learn what inspired Michelle and how her work has empowered libraries during the current pandemic. There’s still time to register for free.

We also have a very special event taking place during the session to which everyone is invited. Join us for the grand reopening of Marygrove College Library and find out how digitization saved a valuable archive from being split up and lost. The event will help place the Forum’s discussions in a real-world context by showing the impact of controlled digital lending on one African American community. It will also explore the power of digitization for preserving key elements of our cultural heritage. Registration is free for this special event.

The Library Leaders Forum may be drawing to a close, but the library community can stay connected through the #EmpoweringLibraries campaign. The campaign builds on the work of the Forum by raising awareness of the positive impact of controlled digital lending. We hope the community will unite to protect this key library practice and make knowledge accessible for all.

Internet Archive Joins Project ReShare


The Internet Archive is the newest library to join Project ReShare, a group of organizations coming together to develop an open source resource sharing platform for libraries.

“Internet Archive is pleased to partner with Project ReShare and its member libraries and consortia to build the next generation of library resource sharing tools,” says Brewster Kahle, founder of the Internet Archive. “We believe in community-developed software and support library efforts to build systems that address the ever-present challenges of connecting readers and learners with books.”

The project was formed in 2018 in response to concern about market consolidation and the pace of innovation among vendors serving libraries. Rather than rely solely on commercial providers, members wanted to be able to set their own priorities.

“We felt we needed to introduce some additional alternatives,” says Jill Morris, chair of the Project ReShare steering committee and executive director of the Pennsylvania Academic Library Consortium Inc. (PALCI). “Libraries need to be able to share ideas and resources with each other to best support their patron bases.”

As a Project ReShare member, the Internet Archive will have a voice in the project’s direction as it works directly with libraries, consortia, and other organizations to improve the value and impact of resource sharing networks and the tools used to support them.

“We are thrilled to have the Internet Archive share their expertise and contribute to the vision of ReShare,” says Morris.

The project is resulting in productive competition and a new suite of options unavailable in the past. By creating space to devise technology- and system-agnostic approaches, Project ReShare enables libraries to make decisions in the best interest of good patron service rather than being forced into an ecosystem with limited choice, adds Morris.

“From my own experience working in an academic library, managing a print collection is a major undertaking,” says Chris Freeland, director of Open Libraries at the Internet Archive. “We’re excited to join Project ReShare and the community that is developing new ways of connecting library patrons to the resources they need.”

Other ReShare members include library consortia (ConnectNY, GWLA, MCLS, PALCI, TAL, and TRLN), commercial entities (Knowledge Integration and Index Data) and university libraries (Grand Valley State University, Louisiana State University, Michigan State, Millersville University, Texas A&M, University of Alabama, and University of Chicago).

Library Leaders Forum Kicks off with Policy Discussion: Ways to Better Serve the Public

COVID-19 has made it clear that digital access to books and other library materials is more important than ever. Yet, the information ecosystem is not working as well as it should.

These issues and more were explored during the Policy session of the Library Leaders Forum, a three-week series focused on empowering libraries and the communities they serve through digital lending.  In addition to the panel discussion, special announcements were made about the donation and digitization of a rare Frederick Douglass pamphlet, and that Michelle Wu will receive the Internet Archive Hero Award at the final session of the Library Leaders Forum on October 20.

A captioned video of the entire session is now online and available for all to view.

Panel discussion

For the Policy session panel, librarians, authors, publishers, and advocates came together to discuss the role libraries should play in improving the digital landscape for the communities they serve. Potential policy solutions, such as copyright and labor law reforms, as well as collective action and boycotts to pressure publishers were discussed.

“Our country is struggling to find a common set of facts. The truth often lives behind paywalls while misinformation and disinformation go viral,” said Lila Bailey, policy counsel with the Internet Archive, moderating the discussion. “Equal access to information is foundational to our democratic society and it’s part of why libraries exist.”

Digital materials hold the promise for expanded access, but the outcome is not guaranteed. As publishers refuse to sell e-books, but rather license them, libraries are responding with a variety of strategies including Controlled Digital Lending – the digital equivalent of traditional lending. 

As libraries evolve with the changing landscape, leaders need tools to change for the better. Brewster Kahle, founder of the Internet Archive, said the balance of power is up for grabs and publishers are pushing for control.

“We need librarians to be trained to push back,” Brewster said. “We are fighters for our patrons. We should stand by libraries and help empower them.”

Carmi Parker, librarian for the Whatcom County Library System in Washington state, said the average price of e-book licensing more than tripled over the past decade and libraries are forced to repurchase more frequently. When Macmillan recently limited libraries to buying one e-book in the first eight weeks after publication (instead of dozens of copies of best sellers), Parker’s library consortium launched a boycott. After 1,200 other public libraries joined the protest, the publisher bowed to the pressure and dropped the practice.

“The concern here is this pattern of increasing prices and increasingly limited licenses that impede our ability to offer books to our patrons,” Parker says. “We think that we sent the message that embargoes are not OK, but we still have the crippling prices and limitations. We need to use print lending as a model for how these e-books should work. That’s why I’m interested in Controlled Digital Lending because that’s exactly what it does.”

Kyle K. Courtney, copyright advisor and program manager at Harvard University, said CDL is a complementary model that helps libraries fulfill their mission of long-term preservation and access.

“CDL has emerged as one of several answers to deal with these access issues now,” Courtney says. “CDL helps fill this digital void by harnessing the library’s special role in copyright to broaden digital access. We are craving this kind of digital access.”

Some panelists underscored it was important to embrace new forms of dissemination, but that CDL was an incomplete solution in need of refinement.

Many authors are coming around to the idea that sharing their works openly can only help them gain readers, said Dean Smith, director of Duke University Press.

“We are focused on smart and sustainable Open Access,” says Smith, who adds that OA usage has made his press more relevant. CDL is especially useful for titles that are out of print to bring scholarship that is buried back into circulation, he said. Smith suggested a possible “buy button” be added to books offered on Internet Archive as a way to entice more participation in CDL.

There should be several ways for writers to market and sell their books beyond the large publishers and online outlets, according to Cory Doctorow, a science fiction author, activist and journalist, and special advisor to the Electronic Frontier Foundation. He is a supporter of the Internet Archive and believes libraries should be able to scan books for CDL.

Doctorow’s policy wish list for improving digital access includes reforming copyright law, changing labor laws so writers can form strong unions, subjecting mergers to strict scrutiny, forcing breakups of monopolistic firms in publishing, distribution and retail, increasing arts funding, and creating a Library of Congress rights database.

Meredith Rose, senior policy counsel for Public Knowledge, said that the pandemic might be moving public opinion on some of these issues and lead lawmakers to consider new measures. CDL could be pitched as a solution to help address distance learning, public health, misinformation, disability rights and other relevant concerns.

Looking ahead

Next week’s session of the Library Leaders Forum will focus on the community of practice that has developed around Controlled Digital Lending. The panel discussion will bring together the librarians, technologists and educators who are developing the next generation of library tools that incorporate and build upon Controlled Digital Lending. Registration is free and available now.

On Bookstores, Libraries & Archives in the Digital Age

The following was a guest post by Brewster Kahle in Against The Grain (ATG). See the original article from September 28, 2020 on the ATG website here.

Back in 2006,  I was honored to give a keynote at the meeting of the Society of American Archivists, when the president of the Society presented me with a framed blown-up letter “S.”  This was an inside joke about the Internet Archive being named in the singular, Archive, rather than the plural Archives. Of course, he was right, as I should have known all along. The Internet Archive had long since grown out of being an “archive of the Internet”—a singular collection, say of web pages—to being “archives on the Internet,” plural.  My evolving understanding of these different names might help focus a discussion that has become blurry in our digital times: the difference between the roles of publishers, bookstores, libraries, archives, and museums. These organizations and institutions have evolved with different success criteria, not just because of the shifting physical manifestation of knowledge over time, but because of the different roles each group plays in a functioning society. For the moment, let’s take the concepts of Library and Archive.

The traditional definition of a library is that it is made up of published materials, while an archive is made up of unpublished materials. Archives play an important function that must be maintained—we give frightfully little attention to collections of unpublished works in the digital age. Think of all the drafts of books that have disappeared once we started to write with word processors and kept the files on fragile computer floppies and disks. Think of all the videotapes of lectures that are thrown out or were never recorded in the first place. 

Bookstores: The Thrill of the Hunt

Let’s try another approach to understanding distinctions between bookstores, libraries and archives. When I was in my 20s living in Boston—before Amazon.com and before the World Wide Web (but during the early Internet)—new and used bookstores were everywhere. I thought of them as catering to the specialized interests of their customers: small, selective, and only offering books that might sell and be taken away, with enough profit margin to keep the store in business. I loved them. I especially liked the used bookstore owners—they could peer into my soul (and into my wallet!) to find the right book for me. The most enjoyable aspect of the bookstore was the hunt—I arrived with a tiny sheet of paper in my wallet with a list of the books I wanted, would bring it out and ask the used bookstore owners if I might go home with a bargain. I rarely had the money to buy new books for myself, but I would give new books as gifts. While I knew it was okay to stay for a while in the bookstore just reading, I always knew the game.

Libraries: Offering Conversations not Answers

The libraries that I used in Boston—MIT Libraries, Harvard Libraries, the Boston Public Library—were very different. I knew of the private Boston Athenæum but I was not a member, so I could not enter. Libraries for me seemed infinite, but still tailored to individual interests. They had what was needed for you to explore and if they did not have it, the reference librarian would proudly proclaim: “We can get it for you!” I loved interlibrary loans—not so much in practice, because they were slow, but because they gave you a glimpse of a network of institutions sharing what they treasured with anyone curious enough to want to know more. It was a dream straight out of Borges’ imagination (if you have not read Borges’ short stories, they are not to be missed, and they are short. I recommend you write them on the little slip of paper you keep in your wallet.) I couldn’t afford to own many of the books I wanted, so it turned off that acquisitive impulse in me. But the libraries allowed me to read anything, old and new. I found I consumed library books very differently. I rarely even brought a book from the shelf to a table; I would stand, browse, read, learn and search in the aisles. Dipping in here and there. The card catalog got me to the right section and from there I learned as I explored.

Libraries were there to spark my own ideas. The library did not set out to tell a story as a museum would. It was for me to find stories, to create connections, have my own ideas by putting things together. I would come to the library with a question and end up with ideas.  Rarely were these facts or statistics—but rather new points of view. Old books, historical newspapers, even the collection of reference books all illustrated points of view that were important to the times and subject matter. I was able to learn from others who may have been far away or long deceased. Libraries presented me with a conversation, not an answer. Good libraries cause conversations in your head with many writers. These writers, those librarians, challenged me to be different, to be better. 

Staying for hours in a library was not an annoyance for the librarians—it was the point. Yes, you could check books out of the library, and I would, but mostly I did my work in the library—a few pages here, a few pages there—a stack of books in a carrel with index cards tucked into them and with lots of handwritten notes (uh, no laptops yet).

But libraries were still specialized. To learn about draft resisters during the Vietnam War, I needed access to a law library. MIT did not have a law collection and this was before Lexis/Nexis and Westlaw. I needed to get to the volumes of case law of the United States.  Harvard, up the road, had one of the great law libraries, but as an MIT student, I could not get in. My MIT professor lent me his ID that fortunately did not include a photo, so I could sneak in with that. I spent hours in the basement of Harvard’s Law Library reading about the cases of conscientious objectors and others. 

But why was this library of law books not available to everyone? It stung me. It did not seem right. 

A few years later I would apply to library school at Simmons College to figure out how to build a digital library system that would be closer to the carved words over the Boston Public Library’s door in Copley Square:  “Free to All.”  

Archives: A Wonderful Place for Singular Obsessions

When I quizzed the archivist at MIT, she explained what she did and how the MIT Archives worked. I loved the idea, but did not spend any time there—it was not organized for the busy undergraduate. The MIT Library was organized for easy access; the MIT Archives included complete collections of papers, notes, ephemera from others, often professors. It struck me that the archives were collections of collections. Each collection faithfully preserved and annotated.  I think of them as having advertisements on them, beckoning the researcher who wants to dive into the materials in the archive and the mindset of the collector.

So in this formulation, an archive is a collection, archives are collections of collections.  Archivists are presented with collections, usually donations, but sometimes there is some money involved to preserve and catalog another’s life work. Personally, I appreciate almost any evidence of obsession—it can drive toward singular accomplishments. Archives often reveal such singular obsessions. But not all collections are archived, as it is an expensive process.

The cost of archiving collections is changing, especially with digital materials, as is the cost of cataloging and searching those collections. But it is still expensive. When the Internet Archive takes on a physical collection, say of records, or old repair manuals, or materials from an art group, we have to weigh the costs and the potential benefits to researchers in the future.

Archives take the long view. One hundred years from now is not an endpoint, it may be the first time a collection really comes back to light.

Digital Libraries: A Memex Dream, a Global Brain

So when I helped start the Internet Archive, we wanted to build a digital library—a “complete enough” collection, and “organized enough” that everything would be there and findable. A Universal Library. A Library of Alexandria for the digital age. Fulfilling the memex dream of Vannevar Bush (do read “As We May Think”), of Ted Nelson’s Xanadu, of Tim Berners-Lee’s World Wide Web, of Danny Hillis’ Thinking Machine, Raj Reddy’s Universal Access to All Knowledge, and Peter Russell’s Global Brain.

Could we be smarter by having people, the library, networks, and computers all work together?  That is the dream I signed on to.

I dreamed of starting with a collection—an Archive, an Internet Archive. This grew to be a collection of collections: Archives. Then a critical mass of knowledge complete enough to inform citizens worldwide: a Digital Library. A library accessible by anyone connected to the Internet, “Free to All.”

ABOUT THE AUTHOR

Brewster Kahle, Founder & Digital Librarian, Internet Archive

A passionate advocate for public Internet access and a successful entrepreneur, Brewster Kahle has spent his career intent on a singular focus: providing Universal Access to All Knowledge. He is the founder and Digital Librarian of the Internet Archive, one of the largest digital libraries in the world, which serves more than a million patrons each day. Creator of the Wayback Machine and lender of millions of digitized books, the Internet Archive works with more than 800 library and university partners to create a free digital library, accessible to all.

Soon after graduating from the Massachusetts Institute of Technology, where he studied artificial intelligence, Kahle helped found the company Thinking Machines, a parallel supercomputer maker. He is an Internet pioneer, creating the Internet’s first publishing system, called Wide Area Information Server (WAIS). In 1996, Kahle co-founded Alexa Internet, with technology that helps catalog the Web, selling it to Amazon.com in 1999. Elected to the Internet Hall of Fame, Kahle is also a Fellow of the American Academy of Arts and Sciences, a member of the National Academy of Engineering, and holds honorary library doctorates from Simmons College and the University of Alberta.