Author Archives: Brewster Kahle

What is Democracy’s Library?

Illustration created with MidJourney

Democracies require an educated citizenry to flourish, and because of this, democratic governments at all levels spend billions of dollars publishing reports, manuals, books, and videos so that all can read and learn. That is the good news. The bad news is that in our digital age, much of this is not accessible. Democracy’s Library aims to change this.

The aim of the Internet Archive Democracy’s Library is to collect, preserve, and make freely available all the published works of all the democracies (federal, provincial, and municipal government publications) so that we can efficiently learn from each other to solve our biggest challenges in parallel and in concert.

Democracy’s Library is the foundational information of free people.

We call this “Democracy’s Library” because Democracy is an open system that trusts its citizens to learn, grow and have independent agency. Democratic governments publish openly because they want important information spread widely.  There are no paywalls to the works of government, or there shouldn’t be. 

We need access to all the river reports so we can understand and manage our declining clean water. Access to agricultural research to help us farm more sustainably. To materials research to build better products and devices. To local hearings on project results so other cities can overcome the same challenges. To training materials and textbooks for many professions. All free, and in ways you can find them.

Bringing free public access to the public domain is the opportunity of the Internet: an infrastructure that can distribute information, once it has been collected and organized, at effectively no cost.

Yes, this will cost a small fortune, but it is within our grasp to collect and organize billions of documents and datasets, preserve the materials for the ages, and make them available for many purposes. While scoping projects in the United States and Canada have now begun, we estimate this project will cost at least $100 million. The big money has not been committed yet, and we’re still fundraising. But to get things kicked off, Filecoin Foundation (FF) and Filecoin Foundation for the Decentralized Web (FFDW) are supporting the project. The Internet Archive has ramped up collecting government websites and datasets as well as digitizing print materials with many library partners.

Thankfully, we do not have the rights and paywall problems that have been strangling the Internet’s best feature: an essentially free information distribution system.   

Democracy’s Library can be a free public library available on your phone and your laptop.  

Democracy’s Library will be the foundation of new services, both non-commercial and commercial, that leverage language understanding, machine learning, automatic translation, speech recognition, and visualizations.

Democracies publish openly, so let’s take advantage of this. Let’s leverage our library system not just to lease commercial publishers’ database products, but to build open collections that everyone can use and reuse without limitation.

Let’s build direct conduits from governments into Democracy’s Library for long-term preservation and access. A public-public partnership that long served us in the paper era, and that took a pause in the mainframe era of commercial databases, can flourish again in the Internet era.

“Public Access to the Public Domain” can be a rallying cry for Democracy’s Library.

Democracy’s Library can be a flowering of information services for free people.

Please join in and help. Jamie Joyce of the Internet Archive (jamiejoyce@archive.org) is leading the effort in the United States, and Andrea Mills of the Internet Archive Canada (andrea@archive.org) is leading the Canadian effort. The project is overseen by Brewster Kahle (brewster@archive.org).

If you’d like to stay connected, sign up for the #EmpoweringLibraries newsletter.

Have ideas?  Have materials?  Have a use case?  Have resources to bring to bear?   This can only happen if we work together.  

Let’s build Democracy’s Library, together.

Digital Books wear out faster than Physical Books

Ever try to read a physical book passed down in your family from 100 years ago?  Probably worked well. Ever try reading an ebook you paid for 10 years ago?   Probably a different experience. From the leasing business model of mega publishers to physical device evolution to format obsolescence, digital books are fragile and threatened.

Those of us tending libraries of digitized and born-digital books know that they need constant maintenance (reprocessing, reformatting, re-invigorating) or they will not be readable or read. Fortunately, this is what libraries do (if they are not sued to stop it). Publishers try to introduce new ideas into the public sphere. Libraries acquire these and keep them alive for generations to come.

And, to serve users with print disabilities, we have to keep up with the ever-improving tools they use.

Mega-publishers are saying electronic books do not wear out, but this is not true at all. The Internet Archive processes and reprocesses the books it has digitized as new optical character recognition technologies come around, as new text understanding technologies open new analysis, and as formats change from DjVu to DAISY to epub1 to epub2 to epub3 to PDF/A and on and on. This work takes thousands of computer-months and programmer-years. This is what libraries have signed up for: our long-term custodial roles.

Also, the digital media they reside on change, too: from Digital Linear Tape to PATA hard drives to SATA hard drives to SSDs. If we do not actively tend our digital books, they become unreadable very quickly.

Then there is cataloging and metadata. If we do not keep up with the ever-changing expectations of digital learners, then our books will not be found. This is ongoing and expensive.

Our paper books have lasted hundreds of years on our shelves and are still readable. Without active maintenance, we will be lucky if our digital books last a decade.

Also, how we use books and periodicals in the decades after they are published changes from how they were originally intended to be used. We are seeing researchers use books and periodicals in machine learning investigations to find trends that were never easy to find in a one-by-one world, or in the silos of the publisher databases. Preparing these books for this type of analysis is time-consuming and now threatened by publishers’ lawsuits.

If we want future access to our digital heritage we need to make some structural changes:  changes to institution and publisher behaviors as well as supportive funding, laws, and enforcement.

The first step is to recognize that preserving and providing access to our digital heritage is a big job and one worth doing. Then, find ways that institutions (educational, government, non-profit, and philanthropic) can make preservation a part of their daily responsibility.

Long live books.

Illustration: generated with Midjourney AI.

We have added a Mastodon Server

The Internet Archive has recently set up its own server for Mastodon, a federated, decentralized, open source social media package that has garnered lots of attention lately.

We use it in the ways that we use Twitter now (we are not leaving Twitter):
• @internetarchive@mastodon.archive.org for events, announcements, and fun things
• Staff accounts (e.g. my account @brewsterkahle@mastodon.archive.org) for, well, whatever.

Why?  We need a game with many winners, not just a few powerful players.  

Through our dweb work, the Internet Archive has catalyzed decentralized web technologies through conferences, summits, meet-ups and camps for 6 years. We need new tech to help with privacy, robustness, and work around issues of disinformation and corporate consolidation.  Mastodon is built on open standards so others can build alternative clients and integrate it into other systems.  

Looking forward to many social media alternatives: Bluesky, Matrix, and many others.

Personally, I want to see the evolution and combination of features of Slack, Twitter, SMS, Signal, email, Discord, Facebook, IRC, Zoom, Google Meet, and other ways we communicate. While we are at it, how about a more integrated environment of Zendesk, Jira, WordPress, and Google Docs? Free and open technologies that invite interoperability while communities maintain control would be ideal. And in my day-to-day I would love fewer systems to monitor that also limit my direct exposure to celebrities, influencers, and politicians. Oh, I can dream…

Twitter
Facebook
Mastodon
Donations
Physical donations

Please help us learn, this time about Mastodon.   Thank you, all!

Guide to the exhibition galleries of the Department of Geology and Palaeontology in the British Museum, p. 18

“Doors Open”: Go Behind-the-Scenes at the Physical Archive of the Internet Archive

Please join us on October 18th, 6:00-8:00 pm, as we take a peek behind the doors of the Physical Archive in Richmond, California.

In anticipation of launching Democracy’s Library on October 19th, we are excited to offer a behind-the-scenes tour of our physical collections of books, music, film, and video in Richmond, California.

With this special insider event we are opening the doors to an often unseen place. See the lifecycle of physical books acquired by the Internet Archive — donation, preservation, digitization, and access. We’ll also present samples from generous donations and acquisitions of books, records, microfiche, and film, and demonstrate the Archive’s high-end motion-picture film scanner.

We look forward to offering this glimpse into a very important part of the Internet Archive in its mission to bring Universal Access to All Knowledge. 

Light refreshments will be provided

RSVP HERE

Cost: $10

DOORS OPEN: 6 PM – 8 PM

ADDRESS: 2512 Florida Avenue, Richmond, CA

THANK YOU FOR REGISTERING IN ADVANCE 

Ukrainian Book Drive: Please Contribute

(CC photo credit)

The Internet Archive is requesting donations of Ukrainian books and books useful to Ukrainians. The books will be preserved, digitized and lent (for free to one user at a time) over the Internet. The Internet Archive is prioritizing the digitization and hosting of relevant materials for Ukrainians.

Already the University of Toronto and the University of Alberta have sponsored the digitization of sizable Ukrainian collections, and the collections on archive.org now total over 8,000 items in Ukrainian.

But we need much more to support Ukrainians, many of whom are displaced and do not have access to their schools and libraries.

We need your help.  Together we can preserve all published works and make them as widely available as we can.  

The Internet Archive provides free downloading of public domain materials, services for those with print disabilities, free Controlled Digital Lending of books, free interlibrary loan services, free hosting for materials that are uploaded to archive.org, and supports web archiving efforts.  These services can be more relevant to Ukrainians with your help.

Please donate physical books and other materials, upload relevant materials to archive.org, and also consider financial support for our activities.  

Turns Out It’s Not the Technology, It’s the People

25 years ago, Brewster Kahle founded the Internet Archive, now one of the world’s largest digital libraries.

NOTE: On October 21, 2021, the Internet Archive celebrated its 25th anniversary in a virtual event featuring this keynote address by Founder & Digital Librarian, Brewster Kahle. You can watch the talk here or read the transcript below.

Universal Access to All Knowledge has been the dream for millennia, from the Library of Alexandria on forward. The idea is that if you’re curious enough to want to know something, you can get access to that information. That was the promise of the printing press or Andrew Carnegie’s public libraries, which fueled so much citizenship and democracy in the United States. The Internet was the opportunity to really make this dream come true.

What we have is an opportunity that happens maybe only once a millennium. The opportunity that comes only when we change how knowledge is recorded and shared. From oral to manuscript, manuscript to printing, and now from printing to digital. I was lucky enough to be there in 1980 and thought: what a fantastic opportunity to try to influence that transition.

From Life magazine, Volume 19, Number 11, Sept 10, 1945

Of course, we were building on the vision of many before us. This dream of having an interlocking publishing system had been around for a long time. Vannevar Bush’s 1945 article “As We May Think” was very much on people’s minds in the 1980s. There was Ted Nelson’s Xanadu—a world of hypertext. Doug Engelbart’s way of annotating and enabling you to build on the works of others.

The key thing was not the computers. Actually, it was the network. It was the ability to communicate with each other. Sure, anybody could go and write word processing documents. That’s good. But can you make everybody a publisher? Can everyone find their voice and their community no matter where they are in the world? And can people write in a way that allows others to build on their work? By 1996, we had built that. It was the World Wide Web.

With this global publishing network, the Web, we could finally build the library. It was time to build the library. In 1996, I thought: Why don’t we just build this thing? I mean, how hard could it be? Sure, maybe we’re going to have to go and digitize a whole library, but that couldn’t be that hard, right?

And so, a group of us said, let’s do this. We started by archiving the most transient of media, which was the World Wide Web’s pages. We did that for five years before we even made the Wayback Machine. The idea was to record what people were publishing and be able to go and use that in new and different ways. Could we build a library to preserve all of that material, but then add computers to the mix, so that something new and magic happens? Could we connect people, connect ideas, build on each other’s concepts with computers and these new AI things that we knew were coming? Ultimately, could we make the world smarter?

Could we make people smarter by being better connected? Not just because they could read what other people were writing, but because machines would help filter information, scan vast amounts of knowledge, emphasize what is most important, provide context to the deluge.

In many ways, we have achieved this, but not completely enough: now people are writing and sharing knowledge, but it is intermingled with misinformation — purposefully false information.  We still don’t have the tools to filter out the lies, and in many ways, we have business models that prosper when misinformation is widely shared. So while the dream of access may be at hand, we lack the tools and responsible organizations to help us make good use of the flood of data now at our fingertips. Given how new our digital transition is, this may not be that surprising, but it is an urgent issue that faces us. We need to fight misinformation and build data-mining tools to leverage all this knowledge to help people make better decisions — to be smarter.  

This is our challenge for our next 25 years.

When we started the Internet Archive, I felt this project needed to be done in the open and as a non-profit. We needed to have not just one or two search engines, we needed lots and lots of different organizations building their new ideas on top of the whole knowledge base of humanity. We could help by being a library for this new digital world.

Caslon & Brewster Kahle in front of the Carnegie Library in Pittsburgh, October 9, 2002.

The libraries I grew up with were vast and free, and came with librarians who helped me understand and find things I needed to know. In our new digital world, that future is not guaranteed. It may be that most people will just feed on what they can access for free, placed there because it’s promoted by somebody. If we don’t solve this–getting quality published material to the internet population–we’re going to bring up a generation educated on whatever dreck they can find online. So we have to build not only universal access to lots of webpages, but access to the right and best information: Universal Access to All Knowledge. That is going to require changes to existing business models and adjustments by long-standing institutions. We need an Internet with many winners. If we have an Internet with just a few winners, some big corporations and large governments that are controlling too much of what’s online, then we will all lose.

A library alone cannot solve all of these issues, but it is a necessary component: needed infrastructure in a digital world.

On October 12, 2012, the Internet Archive reached 10 petabytes of data stored in its repository.

25 years ago, I thought building this new library would largely be a technological process, but I was wrong.  It turns out that it’s mostly a people process. Crucially, the Internet Archive has been supported by hundreds of organizations. About 800 libraries have helped build the web collections that are in the Wayback Machine. Over 1000 libraries have contributed books to be digitized into the collections—now 5 million volumes strong. And beyond that, people with expertise in, say, railway timetables, Old Time Radio, 78 RPM records—they’ve been donating physical media and uploading digital files to our servers that you see here in this room. Last year, well over 100 million people used the resources of the Internet Archive, and over 100,000 people made a financial donation to support us.  This has truly been a global project– the people’s library.

I love the weird and wacky stuff of the Internet, just the fun and frolicy things. You go online and see these things like, wow, that’s remarkable.

Yesterday, I was looking through the uploads from Kevin Hubler. He donated the collection his father built over his lifetime. His father collected everything a particular singer, Buddy Clark, had ever done. Clark was a 1940s big band singer who died when he was 37. So I could listen to records, see sheet music, and dive into details, all thanks to Kevin Hubler. I love this: going down rabbit holes and learning something deeply. This was a tribute to Buddy Clark, but also to Kevin and his father, who prepared and preserved something they loved for the future.

That we’re able to enjoy each other and to express our wackiness– that’s the win of the World Wide Web!  That’s the thing that you wouldn’t get if it were all just more channels of television.  Yes, the internet and the World Wide Web are a bit of the Wild West, but would you want it any other way?  Isn’t that where the fun and interesting things come from?

Today, it is still the people’s internet. That’s the internet that I wanted to support by starting the Internet Archive. The World Wide Web is an experiment in radical sharing where people feel that they’re better off, not worse off, building on other people’s works. 

I’m hopeful and optimistic that we can build this next 25 years to be as interesting and fun as the last. That we can usher in another level of technology, another 25 years of blossoming, interesting ideas.

Douglas Lurton, Grandfather & Author

I want to  end this talk with a personal story– my grandfather Douglas Lurton was a publisher and an author who died before I was born. Last weekend I searched for his name using full text search in the 20 million texts now on the Archive and found this quotation from him in a newspaper from West Sacramento: “Take the tools in hand and carve your own best life.” — Douglas Lurton

Now, I would like to extend my grandfather’s advice: “Let us all take our tools in hand, and together, carve our own best future.”

Let’s keep the trust.

Internet Archive Joins IDS Project for Interlibrary Loan

The Internet Archive is pleased to announce it has joined the Information Delivery Services (IDS) Project, a mutually supportive resource-sharing cooperative whose 120 members include public and private academic libraries from across the country. As a member of the IDS Project, the Internet Archive expands its ability to support libraries and library patrons by making two million monographs and three thousand periodicals in its physical collections available for non-returnable interlibrary loan (ILL) fulfillment.

“The Internet Archive is a wonderful addition to the IDS Project’s team of libraries.  It is a great honor to be able to help IA reach more libraries and more patrons through the integration with IDS Logic,” said Mark Sullivan, Executive Director of the IDS Project.

If you want to learn more about the IDS Project and the Internet Archive, I will be speaking at the 17th Annual IDS Summer Conference on July 29th.

In addition to the IDS Project, the Internet Archive is also piloting a program with libraries through RapidILL. If there are other resource sharing efforts that we should investigate as we expand our ILL service, please reach out to me at brewster@archive.org.

Reflections as the Internet Archive turns 25

Photo by Rory Mitchell, The Mercantile, 2020 – CC by 4.0
(L-R) Brewster Kahle, Tamiko Thiel, Carl Feynman at Thinking Machines, May 1985. Photo courtesy of Tamiko Thiel.

A Library of Everything

As a young man, I wanted to help make a new medium that would be a step forward from Gutenberg’s invention hundreds of years before. 

By building a Library of Everything in the digital age, I thought the opportunity was not just to make it available to everybody in the world, but to make it better–smarter than paper. By using computers, we could make the Library not just searchable, but organizable; make it so that you could navigate your way through millions, and maybe eventually billions of web pages.

The first step was to make computers that worked for large collections of rich media. The next was to create a network that could tap into computers all over the world: the Arpanet that became the Internet. Next came augmented intelligence, which came to be called search engines. I then helped build WAIS–Wide Area Information Server–that helped publishers get online to anchor this new and open system, which came to be enveloped by the World Wide Web.  

By 1996, it was time to start building the library.

This library would have all the published works of humankind. This library would be available not only to those who could pay the $1 per minute that LexisNexis charged, or only at the most elite universities. This would be a library available to anybody, anywhere in the world. Could we take the role of a library a step further, so that everyone’s writings could be included–not only those with a New York book contract? Could we build a multimedia archive that contains not only writings, but also songs, recipes, games, and videos? Could we make it possible for anyone to learn about their grandmother in a hundred years’ time?

From the San Francisco Chronicle, Business Section, May 7, 1988. Photo by Jerry Telfer.

Not about an Exit or an IPO

From the beginning, the Internet Archive had to be a nonprofit because it contains everybody else’s things. Its motives had to be transparent. It had to last a long time.

In Silicon Valley, the goal is to find a profitable exit, either through acquisition or IPO, and go off to do your next thing. That was never my goal. The goal of the Internet Archive is to create a permanent memory for the Web that can be leveraged to make a new Global Mind. To find patterns in the data over time that would provide us with new insights, well beyond what you could do with a search engine.  To be not only a historical reference but a living part of the pulse of the Internet.

John Perry Barlow, lyricist for the Grateful Dead & founder of the Electronic Frontier Foundation, accepting the Internet Archive Hero Award, October 21, 2015. Photograph by Brad Shirakawa – CC by 4.0

Looking Way Back

My favorite things from the early era of the Web were the dreamers. 

In the early Web, we saw people trying to make a more democratic system work. People tried to make publishing more inclusive.

We also saw the other parts of humanity: the pornographers, the scammers, the spammers, and the trolls. They, too, saw the opportunity to realize their dreams in this new world. At the end of the day, the Internet and the World Wide Web–it’s just us. It’s just a history of humankind. And it has been an experiment in sharing and openness.

The World Wide Web at its best is a mechanism for people to share what they know, almost always for free, and to find their community no matter where they are in the world.

Brewster Kahle speaking at the 2019 Charleston Library Conference. Photo by Corey Seeman – CC by 4.0

Looking Way Forward

Over the next 25 years, we have a very different challenge. It’s solving some of the big problems with the Internet that we’re seeing now. Will this be our medium or will it be theirs? Will it be for a small controlling set of organizations or will it be a common good, a public resource? 

So many of us trust the Web to find recipes, how to repair your lawnmower, where to buy new shoes, who to date. Trust is perhaps the most valuable asset we have, and squandering that trust will be a global disaster. 

We may not have achieved Universal Access to All Knowledge yet, but we still can.

In another 25 years, we can have writings from not a hundred million people, but from a billion people, preserved forever. We can have compensation systems that aren’t driven by advertising models that enrich only a few. 

We can have a world with many winners, with people participating, finding communities of like-minded people they can learn from all over the world.  We can create an Internet where we feel in control. 

I believe we can build this future together. You have already helped the Internet Archive build this future. Over the last 25 years, we’ve amassed billions of pages, 70 petabytes of data to offer to the next generation. Let’s offer it to them in new and exciting ways. Let’s be the builders and dreamers of the next twenty-five years.

See a timeline of Key Moments in Access to Knowledge, videos & an invitation to our 25th Anniversary Virtual Celebration at anniversary.archive.org.

Internet Archive Launches New Pilot Program for Interlibrary Loan

Photo by Alfons Morales on Unsplash

The pandemic has resulted in a renewed focus on resource sharing among libraries. In addition to joining resource sharing organizations like the Boston Library Consortium, the Internet Archive has started to participate in the longstanding library practice of interlibrary loan (ILL). 

Internet Archive is now making two million monographs and three thousand periodicals in its physical collections available for non-returnable fulfillment through a pilot program with RapidILL, a prominent ILL coordination service. To date, more than seventy libraries have added the Internet Archive to their reciprocal lending list, and Internet Archive staff are responding to, on average, twenty ILL requests a day. If your library would like to join our pilot in Rapid, please reach out to Mike Richins at Mike.Richins@exlibrisgroup.com and request that Internet Archive be added to your library’s reciprocal lending list.

If there are other resource sharing efforts that we should investigate as we pilot our ILL service, please reach out to Brewster Kahle at brewster@archive.org.

Thank you Ubuntu and Linux Communities

The Internet Archive is wholly dependent on Ubuntu and the Linux communities that create a reliable, free (as in beer), free (as in speech), rapidly evolving operating system. It is hard to overestimate how important that is to creating services such as the Internet Archive.

When we started the Internet Archive in 1996, Sun and Oracle donated technology and we bought tape robots. By 1999, we shifted to inexpensive PCs in a cluster, running varying Linux distributions.

At this point, almost everything that runs on the servers of the Internet Archive is free and open-source software. (I believe our JP2 compression library may be the only piece of proprietary software we use.)

For a decade now, we have been upgrading our operating system on the cluster to the long-term support server Linux distribution of Ubuntu. Thank you, thank you. And we have never paid anything for it, but we submit code patches as the need arises.

Does anyone know the number of contributors to all the Linux projects that make up the Ubuntu distribution? How many tens or hundreds of thousands? Staggering.   

Ubuntu has ensured that every six months a better release comes out, and every two years a long-term release comes out. Like clockwork. Kudos. I am sure it is not easy, but it is inspiring, valuable and important to the world.

We started with Linux in 1997 and with Ubuntu server release Warty Warthog in 2004, and we are in the process of moving to Focal (Ubuntu 20.04).

Depending on free and open software is the smartest technology move the Internet Archive ever made.

1998: https://www.sfgate.com/business/article/Archiving-the-Internet-Brewster-Kahle-makes-3006888.php

Internet Archive servers running at the Bibliotheca Alexandrina, circa 2002.

2002: https://archive.org/about/bibalex.php

2013: https://www.theguardian.com/technology/2013/apr/26/brewster-kahle-internet-archive


2021: Internet Archive