Looking Back at the Million Book Project

Years ago, many people rejected the idea of reading a book on a screen. Fortunately, others had a vision for the potential of digitizing the world’s knowledge.

One of those pioneers was Carnegie Mellon Professor Raj Reddy. The Internet Archive recently hosted a virtual event to honor him and celebrate the 20th anniversary of his Million Book Project. Participants included Reddy, Vint Cerf of Google, Moriel Schottlender of the Wikimedia Foundation, Brewster Kahle of the Internet Archive, Mike Furlough of HathiTrust, and Liz Ridolfo of the University of Toronto.

Since Reddy first set out his dream of providing universal access to all human knowledge, instantly, to anyone, anywhere in the world, others have embraced the mission. Advocates of mass digitization discussed the tremendous impact that open access to creative works online has had on society, the challenges ahead, and the potential if more books are unleashed.

“There are tens of millions of digitized books available on the internet now. Many of these are born digital. Many more are being converted from print copies,” said Mike Furlough, executive director at HathiTrust, which has a collection of 17.5 million digital books. “This is really a human accomplishment that represents decades, if not centuries, of intellectual labor, physical labor to steward and preserve these items.”

Reddy said he knew his vision two decades ago was just the beginning and there is a huge amount of room to improve the utility of digital works. “It’s time for us to put our heads together to find a way to create digital libraries and archives that are far more useful than what we have today,” he said.

Many agreed more must be done to expand efforts, build a sustainable infrastructure and raise awareness of the shifting role of libraries to provide digital materials.

Internet Archive Founder Brewster Kahle said Reddy was right that bringing our full history online for the next generation is important, but it’s not been easy technically or institutionally.

“If we’ve ever wondered why you’d want digital books, the year 2020 told us why. The global pandemic hit and shut down school libraries, public libraries, and college libraries,” Kahle said. “We got calls from professors, teachers and homeschoolers, desperate to find some way in their Zoom classrooms to bring books to kids.”

The Internet Archive responded, explaining how libraries could extend access digitally to books that were in their physical collections. This helped make a big difference on the ground, and Kahle says policies are changing so libraries are confident in serving their digital learners. For instance, as libraries spend $12 billion a year on materials, Kahle said they should be able to purchase (not lease) e-books to fulfill their mission of service to users.

There was also a push among panelists for digitization to be more inclusive of works from all kinds of authors, recognizing what is being scanned is what’s already been obtained by libraries. “I think we should ask more questions: What aren’t we digitizing? What are the economic or political forces that are constraining our choices and what corrective measures can we take?” Furlough said.

Future interaction with knowledge involves the digitization of books, and expanding the diversity of voices is critical, said Moriel Schottlender, principal system architect with the Wikimedia Foundation.

“Making resources available to anyone online is key and this is really what we’re striving for,” said Schottlender, noting Wikipedia’s mission is to be a beacon of factual information that is verifiable, neutral and transparent. “Our goal is that everyone in the world should be able to contribute to the sum of all knowledge. But not everyone has equal access to knowledge, to books, to journals, to libraries, to educational materials…We use digitization to increase equity.”

There is growing demand for all kinds of digital information, said Liz Ridolfo, special collections projects librarian at University of Toronto Libraries. Donors want items digitized for a variety of reasons, including to protect rare items, to reach a broader audience, and to free up physical space for other materials. Especially during the pandemic, Ridolfo said, it has been useful to have a curated collection of online teaching and reference materials.

Vint Cerf, vice president and internet evangelist at Google, said people are increasingly going online to get answers to questions—often turning to YouTube to view how-to videos. That demand for “just-in-time learning” is not a substitute for long-form content, he said, but it’s an interesting phenomenon that may draw people to the internet to learn more.

Looking ahead, Reddy said there is a need for big change to address the broken copyright law. His aspiration is that by 2031, there will be a frictionless, streamlined copyright regime, in which authors register for no fee, but can extend the copyright of a work indefinitely if they want by paying a prescribed fee. For users, he proposes access to copyrighted material for fair use in less than five minutes. They could pay a required fee, as prescribed by the data for a single copy use. If the copyright is not registered with the national digital library, then fines for copyright violations of unregistered copyrighted material should be nominal.

“Let’s take Raj’s vision here and make it come true,” Kahle said. “Who should argue against the streamlined system where fair uses are easy, where compensation is understood, where there’s registration and the actual copyrighted materials are in repositories that are long-term protected? Let’s just do this.”

24 Arts Organizations join the Collaborative ART Archive (CARTA)

Earlier this summer, the Internet Archive announced its partnership with the New York Art Resources Consortium (NYARC) to form a collaborative, web-based art resources preservation and access initiative. We are now thrilled to announce that the initiative has kicked off with a diverse roster of 24 participating member institutions throughout the United States and Canada.

The Collaborative ART Archive (CARTA) project has a mission to collect, preserve, and provide access to vital arts content from the web by supporting a vibrant, growing collaboration of art and museum libraries. With funding from federal agencies and foundations, the Internet Archive is able to expand CARTA to a diverse set of museums and art libraries worldwide and to broaden the ways the resulting collections can be discovered and used by both scholars and patrons.

The arts institutions actively participating in this program so far include:

  • American Craft Council
  • American Folk Art Museum
  • ART | library deco
  • Art Gallery of Ontario
  • Art Institute of Chicago
  • Fashion Institute of Technology
  • Getty Research Institute (Getty Library)
  • Harvard University – Fine Arts Library
  • Harvard University – Graduate School of Design
  • Indianapolis Museum of Art at Newfields
  • Leonardo/ISAST
  • Maryland Institute College of Art
  • Museum of Contemporary Art of Georgia
  • National Gallery of Art Library
  • National Gallery of Canada
  • New York Art Resources Consortium
  • Philadelphia Museum of Art
  • San Francisco Museum of Modern Art
  • Sterling and Francine Clark Art Institute Library
  • The Corning Museum of Glass
  • The Menil Collection
  • The Metropolitan Museum of Art
  • The Nelson-Atkins Museum of Art, Spencer Reference Library
  • University of Hawaii at Manoa, Hamilton Library

Membership in the program includes national and regional art and museum libraries throughout the United States and Canada committed to the preservation of 21st century art historical resources on the web. One of our early supporters and current CARTA member Amelia Nelson, Director of Library and Archives at The Nelson-Atkins Museum of Art, noted the increased risk of losing art history on the web in comparison to earlier generations of artists: “Websites are the letters, exhibition postcards, exhibition reviews and newspaper articles of today’s artists and artistic communities, but they aren’t resources that scholars can find in archives like the physical materials that document the careers of earlier generations of artists. I worry that as we lose these sites, we are also losing the potential for scholars to place this moment in the canon of art history and culture broadly. This initiative will build a collaborative and sustainable way for art libraries to pool their limited resources, with the technical, administrative, and organizational expertise of the Internet Archive, to ensure that this content is available for future generations.”

The initial group of member institutions has identified a first set of more than 150 valuable and at-risk websites, articles, and other materials on five primary collection topics: Local Arts Organizations; Artists Websites; Art Galleries; Auction Houses (Catalogs/Price Lists); and Art Criticism. These collections will continue to grow and evolve over the course of the project, capturing thousands of websites and many terabytes of data.

Untitled Art website, nominated by NYARC for inclusion in the CARTA Art Fairs and Events collection.

We’re actively seeking more US-based arts institutions to participate in the project as we continue to grow our collections of web-based art history resources. Collaborative members attend meetings every two months to coordinate curation and other group activities as well as participate in subcommittees focused on collection development, metadata, end-user/researcher engagement, and outreach. If you are involved with an art and/or museum library interested in joining this collaborative project, please complete this form.

2021 Library Leaders Forum Recap

This year’s Library Leaders Forum brought more than 1,300 people together for virtual discussions across the month of October. All of the public sessions were recorded and are available for viewing at https://www.libraryleadersforum.org. Check out the following highlights:

Library Leaders Forum Sessions

October 13
Session I: Community Dialogue
Hear from library leaders as they navigate the challenges of the ebook marketplace & their concerns about the future of library collections. Watch now

October 20
Session II: Community Impact
Hear firsthand from educators & librarians about the value of digitized library collections for the patrons, students, and communities they serve. Watch now


2021 Internet Archive Hero Award

Librarians Kanta Kapoor & Lisa Radha Weaver have been named the recipients of the 2021 Internet Archive Hero Award for helping their communities stay connected to digital books during the pandemic. Watch the awards ceremony


Conference Workshops

October 7
Controlled Digital Lending: Unlocking the Library’s Full Potential

Hear from the authors of the new CDL policy document. Watch now

October 12
Empowering Libraries Through Controlled Digital Lending

Learn how CDL works, the benefits of the Open Libraries program, and the impact that the program is having for partner libraries and the communities they serve. Watch now

October 27
Resource Sharing with the Internet Archive

Learn about the Internet Archive’s new resource sharing initiatives and how your library can participate. Watch now

Turns Out It’s Not the Technology, It’s the People

25 years ago, Brewster Kahle founded the Internet Archive, now one of the world’s largest digital libraries.

NOTE: On October 21, 2021, the Internet Archive celebrated its 25th anniversary in a virtual event featuring this keynote address by Founder & Digital Librarian, Brewster Kahle. You can watch the talk here or read the transcript below.

Universal Access to All Knowledge has been the dream for millennia, from the Library of Alexandria on forward. The idea is that if you’re curious enough to want to know something, that you can get access to that information. That was the promise of the printing press or Andrew Carnegie’s public libraries — fueling so much citizenship and democracy in the United States. The Internet was the opportunity to really make this dream come true.

What we have is an opportunity that happens maybe only once a millennium. The opportunity that  comes only when we change how knowledge is recorded and shared. From oral to manuscript, manuscript to printing, and now from printing to digital. I was lucky enough to be there in 1980 and thought: what a fantastic opportunity to try to influence that transition.

From Life magazine, Volume 19, Number 11, Sept 10, 1945

Of course, we were building on the vision of many before us. This dream of having an interlocking publishing system had been around for a long time. Vannevar Bush’s 1945 article “As We May Think” was very much on people’s minds in the 1980s. There was Ted Nelson’s Xanadu—a world of hypertext. Doug Engelbart’s way of annotating and enabling you to build on the works of others.

The key thing was not the computers. Actually, it was the network. It was the ability to communicate with each other. Sure, anybody could go and write word processing documents. That’s good. But can you make everybody a publisher? Can everyone find their voice and their community no matter where they are in the world? And can people write in a way that allows others to build on their work? By 1996, we had built that. It was the World Wide Web.

With this global publishing network, the Web, we could finally build the library. It was time to build the library. In 1996, I thought: Why don’t we just build this thing? I mean, how hard could it be? Sure, maybe we’re going to have to go and digitize a whole library, but that couldn’t be that hard, right?

And so, a group of us said, let’s do this. We started by archiving the most transient of media, which was the World Wide Web’s pages. We did that for five years before we even made the Wayback Machine. The idea was to record what people were publishing and be able to go and use that in new and different ways. Could we build a library to preserve all of that material, but then add computers to the mix, so that something new and magic happens? Could we connect people, connect ideas, build on each other’s concepts with computers and these new AI things that we knew were coming? Ultimately, could we make the world smarter?

Could we make people smarter by being better connected? Not just because they could read what other people were writing, but because machines would help filter information, scan vast amounts of knowledge, emphasize what is most important, provide context to the deluge.

In many ways, we have achieved this, but not completely enough: now people are writing and sharing knowledge, but it is intermingled with misinformation — purposefully false information.  We still don’t have the tools to filter out the lies, and in many ways, we have business models that prosper when misinformation is widely shared. So while the dream of access may be at hand, we lack the tools and responsible organizations to help us make good use of the flood of data now at our fingertips. Given how new our digital transition is, this may not be that surprising, but it is an urgent issue that faces us. We need to fight misinformation and build data-mining tools to leverage all this knowledge to help people make better decisions — to be smarter.  

This is our challenge for our next 25 years.

When we started the Internet Archive, I felt this project needed to be done in the open and as a non-profit. We needed to have not just one or two search engines, we needed lots and lots of different organizations building their new ideas on top of the whole knowledge base of humanity. We could help by being a library for this new digital world.

Caslon & Brewster Kahle in front of the Carnegie Library in Pittsburgh, October 9, 2002.

The libraries I grew up with were vast and free, and came with librarians who helped me understand and find things I needed to know. In our new digital world, that future is not guaranteed. It may be that most people will just feed on what they can access for free, placed there because it’s promoted by somebody. If we don’t solve this–getting quality published material to the internet population–we’re going to bring up a generation educated on whatever dreck they can find online. So we have to build not only universal access to lots of webpages, but access to the right and best information– Universal Access to All Knowledge. That is going to require changes to existing business models and adjustments by long-standing institutions. We need an Internet with many winners. If we have an Internet with just a few winners, some big corporations and large governments that are controlling too much of what’s online, then we will all lose.

A library alone cannot solve all of these issues, but it is a necessary component, needed infrastructure in a digital world.

On October 12, 2012, the Internet Archive reached 10 petabytes of data stored in its repository.

25 years ago, I thought building this new library would largely be a technological process, but I was wrong.  It turns out that it’s mostly a people process. Crucially, the Internet Archive has been supported by hundreds of organizations. About 800 libraries have helped build the web collections that are in the Wayback Machine. Over 1000 libraries have contributed books to be digitized into the collections—now 5 million volumes strong. And beyond that, people with expertise in, say, railway timetables, Old Time Radio, 78 RPM records—they’ve been donating physical media and uploading digital files to our servers that you see here in this room. Last year, well over 100 million people used the resources of the Internet Archive, and over 100,000 people made a financial donation to support us.  This has truly been a global project– the people’s library.

I love the weird and wacky stuff of the Internet, just the fun and frolicy things. You go online and see these things like, wow, that’s remarkable.

Yesterday, I was looking through the uploads from Kevin Hubler. He donated the collection his father built over his lifetime. His father collected everything a particular singer, Buddy Clark, had ever done. Clark was a 1940s big band singer who died when he was 37. So I could listen to records, see sheet music, and dive into details, all thanks to Kevin Hubler. I love this– going down rabbit holes and learning something deeply. This was a tribute to Buddy Clark, but also to Kevin and his father– who prepared and preserved something they loved for the future.

That we’re able to enjoy each other and to express our wackiness– that’s the win of the World Wide Web!  That’s the thing that you wouldn’t get if it were all just more channels of television.  Yes, the internet and the World Wide Web are a bit of the Wild West, but would you want it any other way?  Isn’t that where the fun and interesting things come from?

Today, it is still the people’s internet. That’s the internet that I wanted to support by starting the Internet Archive. The World Wide Web is an experiment in radical sharing where people feel that they’re better off, not worse off, building on other people’s works. 

I’m hopeful and optimistic that we can build this next 25 years to be as interesting and fun as the last. That we can usher in another level of technology, another 25 years of blossoming, interesting ideas.

Douglas Lurton, Grandfather & Author

I want to end this talk with a personal story– my grandfather Douglas Lurton was a publisher and an author who died before I was born. Last weekend I searched for his name using full text search in the 20 million texts now on the Archive and found this quotation from him in a newspaper from West Sacramento: “Take the tools in hand and carve your own best life.” — Douglas Lurton

Now, I would like to extend my grandfather’s advice. “Let us all take our tools in hand, and together, carve our own best future.”

Let’s keep the trust.

LaTurbo Avedon’s Hypertext Wishes

As part of the Internet Archive’s 25th Anniversary celebration we asked artist LaTurbo Avedon to contemplate what the year 2046 and the future of the internet might look like through the lens of their own art practice.

LaTurbo Avedon introduces the work Hypertext Wishes, inviting viewers to follow a virtual token as it passes into a contemplative well of the Internet. Avedon has spent the past decade developing a body of work that illuminates the ever-growing intensity between users and virtual experiences, pursuing creative environments that deepen the meaning of memories found in the metaverse. They curate and design Panther Modern, a file-based exhibition space that encourages artists to create site-specific installations for the Internet.

Here is a clip from Hypertext Wishes, available for viewing at Internet Archive Headquarters:

The imagery from Hypertext Wishes is made available on the Internet Archive here: https://archive.org/details/15_20211020_20211020_1907

LaTurbo Avedon is an avatar and artist, creating work that emphasizes the practice of non-physical identity and authorship. Their process of character creation continues through gaming, performance and exhibitions. Their work has appeared internationally, including at The Whitney Museum (New York City), The Manchester International Festival (UK), Transmediale (Berlin), Haus der elektronischen Künste (Basel), HMKV (Dortmund), Barbican Center (London), Galeries Lafayette (Paris), and TRANSFER Gallery (New York).

Celebrating Kanta Kapoor: 2021 Internet Archive Hero Award Recipient

Kanta Kapoor, manager of support services, Milton Public Library, Milton, Ontario.

Kanta Kapoor was the first in her family to go to a university. Growing up in New Delhi, she was determined to become an independent woman, and she knew education was the key to success.

“I understand the value of knowledge—to survive in this world, to make a living and make informed decisions,” said Kapoor, who excelled in school and worked at public and university libraries in India for several years before moving to Canada in 2012.

Kapoor developed an expertise in emerging technologies and became an advocate for open sharing of information. Now, she is manager of support services at the Milton Public Library (MPL) in Ontario. In that role, she helped MPL become an early adopter of the Internet Archive’s Open Libraries program, which offers digital access to the physical books that a library owns through the library practice known as controlled digital lending (CDL).

For her efforts to broaden access and embrace innovative practices, Kapoor has been named a recipient of the 2021 Internet Archive Hero Award. The annual award recognizes those who have exhibited leadership in making information available for digital learners all over the world. Past recipients have included Michelle Wu, Phillips Academy, the Biodiversity Heritage Library, and the Grateful Dead.

Kanta helping patrons at Milton Public Library.

In her career, Kapoor has focused on leveraging technology to improve services to the community. She has a master’s degree in library science and gained a specialty in open-source software and data management through additional graduate studies at the University of Toronto.

Kapoor said she was drawn to MPL in 2019 because the leadership team was forward thinking and there was an opportunity to expand community-led projects.

“We were challenged to think outside of the box and become champions throughout Canadian public libraries to stay ahead of the curve,” Kapoor said.  

In her newly created position, she helped improve services for patrons and library staff alike with new technology, mobile apps and digitization of materials. When she was introduced to the Open Libraries program, Kapoor said she was impressed by the ability to provide millions of digitized books to users across the world. MPL decided this was the direction it wanted to go and became one of the first public libraries in Canada to embrace CDL and embed a link to Open Libraries in its catalogue.

Mark Williams, MPL’s chief librarian and chief executive officer, credits Kapoor’s strong leadership skills with building the partnership with the Internet Archive, which helped the MPL community during the earliest days of COVID-19 closures.

“It meant we were able to provide our patrons with access to tens of thousands of digitized materials at a time when they were more welcome than ever, during the pandemic lockdowns, while also being able to donate over 40,000 items for the benefit of a truly global audience,” he said. “We are incredibly fortunate that Kanta is part of the MPL team and her compassion, graciousness, humility and ultimately exemplary leadership have been put to good use.”

Milton Public Library, Milton, Ontario

MPL expanded its partnership by donating physical items to the Archive, obtained a state-of-the-art digitization scanner, and became involved with Library Futures, a coalition of libraries and other stakeholders championing equitable access to knowledge.

Kapoor has helped promote materials available through CDL on the library’s web page, newsletters, and social media. So far, the response by users has been positive and Kapoor is reaching across her professional networks to educate her colleagues about the potential benefits.

“I encourage my fellow librarians to participate in this wonderful project to help their communities out,” Kapoor said. “In my career, I’ve seen many changes—and it’s still evolving. We need to continue to adapt and embrace new technology. I would like to see more libraries joining hands together to serve the community.”

Celebrating Lisa Radha Weaver: 2021 Internet Archive Hero Award Recipient

Lisa Radha Weaver, director of collections and program development, Hamilton Public Library, Hamilton, Ontario.

As a child, Lisa Radha Weaver says she spent most Sunday afternoons at the Kitchener Public Library in Ontario. She has fond memories of the friendly library staff helping her load up as many books as she could carry home.

Then, as a college student at Trent and Queen’s Universities, Weaver again was struck by how kind and generous the people were behind the reference desk at the library. Finally, she asked: How do you get this job?

Weaver learned about the pathway to become a professional librarian. So, after finishing her undergraduate degree in education, she earned her master of library and information science at Western University in London, Ontario.

“I knew that I wanted to serve the public in the same way that I had always been served at all the libraries that I had the privilege of growing up with in the first half of my life,” said Weaver, now director of collections and program development at Hamilton Public Library (HPL) in Ontario.

But that public service role was tested in the spring of 2020 when HPL closed due to COVID-19, as she and her fellow library staff were left wondering how they were going to get books to members who were now locked out of their physical collection. Weaver had been instrumental in helping HPL become an early adopter of the Internet Archive’s Open Libraries program, which offers digital access to the physical books that a library owns. Because of the collections team’s hard work, HPL patrons had access to tens of thousands of books from the safety of their homes, and could continue to read and learn while the physical library remained closed.

Lisa Radha Weaver presents Hamilton Public Library’s 1-Millionth eBook user, Connie Vissers, with a special HPL tote bag prize on October 28, 2020 at the Terryberry Branch.

In recognition of her contributions over her 20-plus-year career, and her foresight in leading HPL into new digital lending practices, Weaver has been named a recipient of the 2021 Internet Archive Hero Award. The annual award recognizes those who have exhibited leadership in making information available for digital learners all over the world. Past recipients have included Michelle Wu, Phillips Academy, the Biodiversity Heritage Library, and the Grateful Dead.

Weaver has long been committed to broadening access to information. Not everyone is as lucky as she was to have an adult bring them to the library, she says. Others don’t live nearby or work hours that limit their ability to physically visit a branch. To serve the changing needs of users, she has embraced digitizing collections and innovative outreach. 

Weaver led efforts at HPL to become an early adopter of Controlled Digital Lending, as well as to identify special collections to donate to the Internet Archive for digitization.

“CDL means removing barriers to access to collections in a way that is sustainable, accessible and equitable. With one library card, users have access to THE library, not just your local branch, system, region, province, state or even country,” Weaver said. “CDL means great breadth and depth in collections access. No one library can have all the books. CDL helps all libraries work together to best support each member to find what they are looking for, when and where they are looking for it.”

The timing of HPL’s embrace of CDL in the fall of 2019 was fortuitous. When the physical buildings had to close due to the pandemic in March 2020 for three months, the library was positioned to provide users with digital access to its collection through the Internet Archive.

“Our hearts were a little bit less heavy, knowing that at least that part of our collection continued to be accessible to people,” Weaver said. “We had positive feedback.”

HPL also beefed up its own virtual library collection and created a range of online programming. Weaver says it developed an online reference system so users could call, email or chat to get connected to the resources or collections, which was especially helpful to teachers and students. Staff also phoned older members of the library to just check in and some were thankful to learn about new ways to access the library online.

Weaver says her team at the library is fearless and collaborative in how they approach their work.

She credits support from her administration and a green light from the library’s legal team with the success of CDL at Hamilton. Management promotes the notion of a “freedom to fail card” to encourage risk-taking, which she says she seized upon to embark on the practice. Also, the library obtained a legal opinion, which it shared widely, backing up the notion that it was well within the library’s rights to participate. “Those two things really allowed us to step forward confidently with the Internet Archive in this project,” Weaver said.

Hamilton Public Library, Hamilton, Ontario.

Since 2019, Weaver has joined the call for wider acceptance of CDL. She has participated in several panel presentations with librarians to explain the details of CDL. She has also lobbied with others in Washington, D.C., making the case to lawmakers on Capitol Hill for policy that supports the practice. Weaver is known for her professionalism and thoughtfulness in promoting the benefit of CDL.

“The ‘c’ in CDL is controlled. One copy, one use,” Weaver said. “We already own these books. Why did we buy these books, if not for the broader library community to access? None of us are closing our libraries because we are running out of books, so doesn’t it make sense to share? Most people buy into that idea.”

Before joining HPL in 2018, Weaver was with the Toronto District School Board as manager of collections and extension services for 13 years. In that role, she coordinated operations with the largest library system in Canada and worked with diverse communities to expand digital access to learning materials for students. Weaver was honored by the Ontario School Library Association with the 2006 Mover and Shaker Award and the 2016 Award for Technical Service.

The motivation in all her work is simple: “I just really believe the library should be there for everyone, where they are and when they need it.”  

Olia Lialina gives good wishes with her artwork Perpetual Calendar

As part of the Internet Archive’s 25th Anniversary celebration, we asked artist Olia Lialina to contemplate what the year 2046 and the future of the internet might look like through the lens of her own art practice.

Olia Lialina’s artwork Perpetual Calendar builds upon the rich digital folklore tradition of starting the day on your social network by wishing each other a good one in the form of an image, often animated, and most likely glittering. With https://haveagood.today/ you can go to the future and the past, checking what day of the week you were born, or on what day of the week New Year’s Eve 2071 is going to be. At the same time, you can see it as a flip through her archived collection of the graphics that represent an important layer of the vernacular web. At the beginning of the century, the tradition of wishing a good (nice, great, sexy,…) Monday (Tuesday, Humpday,…) with a self-made or found graphic replaced “Welcome to My Home Page” greetings and relieved the ever-growing urge for updates.

Views of the year 2046 on Perpetual Calendar by Olia Lialina
Perpetual Calendar by Olia Lialina

Olia Lialina (b. 1971, Moscow) is among the best-known participants in the net.art scene of the 1990s – an early-days, network-based art pioneer. Her early work had a great impact on recognizing the Internet as a medium for artistic expression and storytelling. This century, her continuous and close attention to Internet architecture, ‘net.language’ and Web vernacular – in both artistic and publishing projects – has made her an important voice in contemporary art and new media theory. Lialina is a co-author of Digital Folklore Reader and keeper of the One Terabyte of Kilobyte Age archive (together with Dragan Espenschied). She is an Animated GIF model and professor for Art and Design Online at Merz Akademie in Stuttgart, Germany.

DWeb Meetup September 2021 — Preserving Humanity’s Greatest Assets

The September 2021 DWeb Meetup explored the potential and reality of decentralized storage with two projects leading the way toward storing highly valuable cultural data at scale.

Watch the recording of the event and learn more about the speakers below.

The September 2021 DWeb Meetup was held virtually on Tuesday, September 28, 2021 at 10am PT, optimized for American/European time zones. Wendy Hanamura welcomed attendees and kicked off the meetup. Brewster Kahle, the founder of the Internet Archive, set the stage for the discussion by emphasizing the need for a more secure and decentralized web. The Meetup also broached the possibility of a DWeb Camp in the fall of 2022.

The discussion explained the differences between the IPFS and Filecoin systems and how they work together, then delved into the two projects led by Arkadiy Kukarkin and Jonathan Dotan, which are at the cutting edge of storing large-scale data of high cultural significance in the Filecoin network. They discussed the challenges, successes, and future opportunities presented by these efforts.

Lastly, attendees welcomed Eseohe “Ese” Ojo, the new DWeb Projects Organizer, and said farewell to Mai Ishikawa Sutton as she goes off to grad school in Japan. Mai will continue to stay connected with the DWeb community and can be reached on Twitter @maira. Ese can be reached at dweb@archive.org or on Twitter @EseoheOjo. The meetup wrapped up with socializing and networking in Gather.town.

The next DWeb Meetup, “DWeb Meetup Nov 2021 – Centering Respect, Trust and Equity in the DWeb,” is scheduled for Thursday, November 4, 2021 at 5pm PT, optimized for Asia time zones. At this meetup, we will hear the latest in the DWeb and from our featured speaker Coraline Ada Ehmke on centering respect, trust, and equity in the DWeb. You can read Coraline’s blog post on the DWeb principle of Mutual Respect here.

We’re interested in hearing from DWeb projects about the breakthroughs, challenges, and new roadmaps they might be exploring. For anyone interested in participating in lightning rounds at this meetup, let us know here.

Featured Speakers



Image of Arkadiy Kukarkin (Twitter: @parkan)

Arkadiy Kukarkin, DWeb engineer for the Internet Archive. Arkadiy explained this nonprofit’s history with decentralization, from BitTorrent to today. He is leading a new project to explore how the Internet Archive could better decentralize its historical archives using Filecoin. He’s starting with End-of-Term data — all US government websites as they appear at the end and beginning of each Presidential Administration — starting with the 2016-2017 transition. At this talk, Arkadiy revealed his roadmap, lessons learned, and future direction.



Image of Jonathan Dotan 

Jonathan Dotan, Founder of the Starling Lab, the first major research lab devoted to Web3 technologies. It is affiliated with Stanford and USC. Jonathan returned to the DWeb Meetup to bring us up-to-date on the USC Shoah Foundation Project, which preserves testimony of survivors of genocide on decentralized storage at huge scale. How does the process work, and how do we keep these precious artifacts safe?

Visit GetDWeb.net to learn more about the decentralized web. You can also follow us on Twitter at @GetDWeb for ongoing updates.

Internet Archive Releases Refcat, the IA Scholar Index of over 1.3 Billion Scholarly Citations

As part of our ongoing efforts to archive and provide perpetual access to at-risk, open-access scholarship, we have released Refcat (“reference” + “catalog”), the citation index culled from the catalog that underpins our IA Scholar service for discovering the scholarly literature and research outputs within Internet Archive. This first release of the Refcat dataset contains over 1.3 billion citations extracted from over 60 million metadata records and over 120 million scholarly artifacts (articles, books, datasets, proceedings, code, etc.) that IA Scholar has archived through web harvesting, digitization, integrations with other open knowledge services, and partnerships and joint initiatives.

Refcat is one of the larger citation graph datasets of scholarly literature, and it is unusual in containing a notable portion of citations from works that do not have a DOI or other persistent identifier. We hope this dataset will be a valuable community resource alongside other critical knowledge graph projects, including those with which we are collaborating, such as OpenCitations and Wikicite.

The Refcat dataset is released under a CC0 license and is available for download from archive.org. The related software created for the extraction and matching process, including exact and fuzzy citation matching (refcat and fuzzycat), is also released as open-source tools. For those interested in technical details about the project, a white paper authored by IA engineers, including Martin Czygan, who led work on Refcat, is available on arxiv.org, and the project is also described in our catalog user guide.
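To give a rough feel for the difference between exact and fuzzy citation matching, here is a minimal sketch in Python. It is an illustration only and does not reproduce how refcat or fuzzycat actually work: the `normalize` and `match_kind` helpers, the title-only comparison, and the 0.9 threshold are all assumptions made for this example.

```python
import re
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def match_kind(cited: str, candidate: str, threshold: float = 0.9) -> str:
    """Classify a pair of titles as 'exact', 'fuzzy', or 'none'.

    The threshold is a hypothetical value chosen for illustration.
    """
    a, b = normalize(cited), normalize(candidate)
    if a == b:
        return "exact"
    # Similarity ratio in [0, 1] from the standard library's difflib.
    ratio = SequenceMatcher(None, a, b).ratio()
    return "fuzzy" if ratio >= threshold else "none"
```

A citation whose title differs from a catalog record only in case or punctuation counts as exact here, while a small typo would still be caught as a fuzzy match. Real citation matching typically compares more fields than the title, such as authors, year, and venue.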

What does Refcat mean for regular users of IA Scholar? Refcat results from work to interconnect material within IA Scholar with other resources archived in Internet Archive, making browsing and lookups easier and supporting overall citation integrity and persistence. For example, the citations in Refcat contain over 25 million web links. We were able to match ~14 million of these to archived web pages in Wayback Machine, and found that ~18% of these matched web citations are no longer available on the live web. Web links in citations that are not yet in Wayback Machine have been added to ongoing web harvests. We also matched over 20 million citations to books that are available for lending in our Open Library service and over 1 million citations to Wikipedia entries.
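Readers who want to check a single cited web link themselves can use the public Wayback Machine availability endpoint. The sketch below only builds the query URL and reads a response that has already been parsed from JSON. It is not a description of how the Refcat pipeline matched links at scale, the helper names are made up for this example, and the response shape is assumed from the public API documentation.

```python
from typing import Optional
from urllib.parse import urlencode

# Public availability endpoint of the Wayback Machine.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the lookup URL for a cited link (hypothetical helper)."""
    params = {"url": url}
    if timestamp:
        # Optional YYYYMMDD-style hint for the closest snapshot.
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the archived URL from a parsed JSON response, or None.

    Assumes the shape {"archived_snapshots": {"closest": {...}}}.
    """
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

Fetching the query URL with any HTTP client and passing the decoded JSON to `closest_snapshot` gives either an archived copy to link to or `None` when no snapshot is available.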

Besides interconnection, Refcat will allow users to see which works have cited a specific scholarly resource (i.e. “cited by” or “inbound citations”), which will support improved discovery features. Finally, knowing the full “knowledge graph” of IA Scholar helps us better identify important scholarly material that we have not yet archived, thus improving the overall quality and extent of the collection. This, in turn, aids scholars by ensuring their open-access work is archived and accessible forever, especially for those whose publisher may not have the resources for long-term preservation, and it ensures that related outputs like research registrations or datasets are also archived, matched to the article of record, and available into the future.
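Conceptually, a “cited by” view is the citation graph read in reverse. As a small sketch, assuming citations are available as hypothetical (citing, cited) identifier pairs rather than in Refcat's actual file format, the inbound citations can be collected like this:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_cited_by(edges: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Invert (citing, cited) pairs into a cited -> sorted list of citing works."""
    cited_by = defaultdict(set)
    for citing, cited in edges:
        # A set drops duplicate citations from the same citing work.
        cited_by[cited].add(citing)
    return {work: sorted(citers) for work, citers in cited_by.items()}
```

For example, `build_cited_by([("a", "c"), ("b", "c")])` maps work `c` to the two works that cite it. The same inverted view also shows which frequently cited works are missing from a collection.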

The Refcat release is a milestone of Phase Two of our project, “Ensuring the Persistent Access of Long Tail Open Access Journal Literature,” first announced in 2018 and supported by funding from the Andrew W. Mellon Foundation. Current work focuses on citation integrity within the IA Scholar archive, partnerships and services, such as our role in the multi-institutional Project Jasper and our partnership with Center for Open Science, and the addition of secondary scholarly outputs to IA Scholar, including datasets, software, and other non-article/book scholarly materials. Look out for a plethora of announcements about other IA Scholar milestones in the coming months!