Documentation for Public APIs at the Internet Archive

Internet Archive is well known for our interactive user services.  These include the Wayback Machine, the archive.org website, and OpenLibrary.  Less well known are the programmatic tools, or APIs (Application Programming Interfaces), that allow users and computer programs to access archived information “at scale.”

Our APIs evolved over time, adapting to address specific projects and expanding as we introduced new services and capabilities into our operations.  Although not entirely uniform, these APIs were created to encourage developers to add media to archive.org as well as to consume and repurpose metadata and media.

“Items” are the organizational units of Internet Archive.  Our primary APIs interact with items to perform fundamental actions:

  • Write and read metadata to and from Items
  • Write and read media or other files to and from Items
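As a rough illustration of the "read metadata" action, here is a minimal Python sketch. It assumes the read endpoint is reachable at https://archive.org/metadata/ followed by the item identifier, and that the JSON record contains a "metadata" object and a "files" list. The function names are hypothetical helpers invented for this example; consult the official documentation for the authoritative interface.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the read endpoint lives at this base URL; check the official docs.
METADATA_BASE = "https://archive.org/metadata/"


def metadata_url(identifier: str) -> str:
    """Build the read URL for one item's metadata record."""
    return METADATA_BASE + quote(identifier, safe="")


def fetch_item(identifier: str, timeout: float = 10.0) -> dict:
    """Download and parse an item's record (needs network access)."""
    with urlopen(metadata_url(identifier), timeout=timeout) as response:
        return json.load(response)


def summarize_item(record: dict) -> dict:
    """Pull a title and the file names out of a parsed record.

    Assumes the record has a "metadata" object and a "files" list,
    which is an assumption of this sketch, not something the post states.
    """
    metadata = record.get("metadata", {})
    files = record.get("files", [])
    return {
        "title": metadata.get("title"),
        "file_names": [f.get("name") for f in files],
    }
```

Calling summarize_item(fetch_item(identifier)) with a real item identifier would exercise both steps; writing metadata or files requires authentication and is not sketched here.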

We have recently introduced two new capabilities:

  • Report the interaction and activity that an item has experienced
  • Discover what changes have happened to Internet Archive content

Documentation and examples for using our most important APIs have now been organized at a single location.  We invite our community to review this documentation and put the information and content in the Internet Archive to use.

Posted in News | Leave a comment

Working to Keep Positive Copyright Provisions in Canada

We have said previously that Canada is doing a relatively good job of achieving the appropriate balance in its laws between user rights and the rights of authors and publishers.

The Internet Archive joined the Internet Archive Canada today in filing a brief to the Canadian Standing Committee on Industry, Science and Technology (INDU) under that country’s statutory review of its copyright laws.

Our message to INDU is mostly: “don’t backpedal”. We do suggest that if Canada decides to extend its copyright term by 20 years pursuant to the USMCA, it add a balancing provision allowing libraries to make those older works available to the public.

Our brief here.

Posted in Announcements, News | 1 Comment

Stories that Move Us

Wendy Hanamura, Director of Partnerships, Internet Archive

I have always been a storyteller. It’s how I make sense of the world and share what I value most. And it’s why I have come to love December. Because during this month, when we ask our community to support us, you also take the time to tell us what the Internet Archive means to you.

Thank you!  Thank you for the thousands of messages you send us each day of our campaign. By reading them, I learned what you cherish, how you like to pass your time. I recognize among you poets and pragmatists, idealists and those deeply worried about our future.

Your stories move us to keep improving—to do more.

Here are a few that I’d like to share:

When my boyfriend died, he left behind a ticket stub to a concert that he took me to. I had no idea that he had held on to it for over 30 years. You were able to help me find a recording of that first Grateful Dead concert I ever went to. Listening to it brought back the magic of that night. Thank you.   Robyn

I used free internet resources when I was a penniless student. Now that I have a job, I want to help other penniless students.  Stephen

One of 4 million digital books available on archive.org.

I am a recently retired professor of anthropology, and I am thrilled that I have access to resources that I once only had access to through my university library.  My university ends access to both email accounts and library access upon retirement. Apparently, they assume that retirees immediately lose interest in research when they retire.  Sad. Linda

I am house-bound, reading my only enjoyment. On a fixed income, I appreciate what you provide and wish I could do more to support it.     Barbara

Without BBC radio plays I do not see how I could get through another Canadian winter. . .  Don

I love to read.  I have Chronic Lymphocytic Leukemia, so it’s hard to go out to shop for books. THANK YOU for this opportunity to read books.  —D.G

I’m a student and I’m doing research about techno, house, clubs and rave culture.  So your site is like a gold mine for me!    Elsa

I’ve searched so many websites for the same opportunities the Internet Archive offers, but was satisfied with none. With the Internet Archive library I feel joyous, happy and calm—cause I know it’s right there. Like my preferred name, I am just a happy reader. Happy Reader

 

Website of the Western Montana Mycological Association, captured in the Wayback Machine on November 22, 2011.

Thanks for helping keep open the only webport our tiny nonprofit has been able to offer since being attacked by WordPress hackers. The information is hard to find and invaluable to educators, poison control centers, and recreationists.   —Western Montana Mycological Association

I donated because civilization devolves into tribal skulduggery when knowledge is allowed to perish. This we must not allow.    —Jamaal

You are like an old hardware store full of vintage nuts and bolts…please stick around!           Happy Surfer

The remedy for Internet Alzheimer’s… Steve

“Wonder in Aliceland” Blog, captured in the Wayback Machine on May 13, 2010.

My daughter’s blog, Wonderinaliceland.blog.com ‘disappeared’ from the web some time ago and my friend Jonathan used your site to retrieve some of her wonderful writing. She has a brain tumor and will not be with us much longer. Her writing was her main way of dealing with her illness over the last eight years.        Peter

I did because I had the option to do, not the obligation, and I love it.  —Tiochan

 

Thank you letters from the Internet Archive to our donors, mailed with vintage stamps.

When you write to us, we like to write back.  So if you find a letter with lots of beautiful stamps in your mailbox, you’ll know who it is from. Although we offer millions of free digital books and billions of Web pages throughout time, we at the Internet Archive still appreciate a finely crafted 15-cent stamp.

And if you find our services useful, I hope you will make a donation and send us your own stories.  Thank you.

 

Posted in Announcements, News | 3 Comments

Archiving as Activism: Environmental Justice in the Trump Era

By the Environmental Data & Governance Initiative

In November 2016, the U.S. elected a new president who had sworn to roll back important environmental protections and dismantle the EPA, and who had once called climate change a “hoax.” In the context of warming global temperatures, rising tides, and oil pipeline battles, a dozen colleagues at universities and nonprofits across the country got together online and decided to do something. We were concerned about the continued existence of federal environmental agencies, particularly in their abilities to protect the most vulnerable among us, as well as the preservation and accessibility of important environmental and climate data. More broadly, we were concerned with the collective investment in public research and agencies.

From our initial email we grew into the Environmental Data & Governance Initiative (pronounced “edgy”), today a North America-wide network that includes 175 members from more than 30 different academic institutions, 10 nonprofits, and caring and committed volunteers who come from a broad spectrum of work and life backgrounds. Our work has included crowd-sourced archiving of federal environmental datasets, monitoring and reporting on changes to federal environmental agency websites, and interviewing employees at EPA and OSHA. Major news outlets have reported on us, from the Washington Post to CNN to the New York Times, and we have contributed to and helped shape an ongoing, national discussion on the value of federal environmental protections, and the need for accessible and accountable data infrastructures and publicly-engaged forms of data stewardship.

DataRescue event in San Francisco in February 2017. Photo by Jamie Lyons.

So much of what we have been able to accomplish over the past two years is enabled by the Internet Archive, and in particular the Wayback Machine. For example, our first event in December 2016 sought to archive EPA websites, prior to Trump’s inauguration, by nominating key pages and datasets for inclusion in the Wayback Machine. This project grew over the subsequent 5 months, as over 49 DataRescue events were held across the country, and over 63,000 web pages from environmental agencies like EPA, NOAA, NASA, and OSHA were nominated to the archive. The DataRescue project ended in June 2017, but not before raising important questions about the politics of data accessibility and stewardship.

Through DataRescue we began partnering with the Internet Archive, which has become essential in another EDGI project: tracking ongoing changes at federal agency websites. Initially using a fee-based software program, Versionista, to crawl government web pages (currently crawling 42,000 URLs), we have been able to locate and report on the removal or alteration of web content on climate, non-renewable energy sources, and important environmental treaties. This kind of work increasingly relies on the Wayback Machine, and our reports systematically include references and screenshots from it. In our commitment to building participatory and responsive civic technologies and data infrastructure (partly inspired by the Internet Archive), we also developed our own web monitoring software, called Scanner, that is free and open-source, and which we plan to turn into a public platform. We are partnering with the Internet Archive to develop its functionality.

Example of screenshot comparisons (using Versionista) on the EPA website, where references to “climate change” have been deleted.

Let us end with a few words about why this work, and our partnership with the Internet Archive, is so important.

Our current federal records laws are outdated: they do not require online publication or webpage preservation, even though online research and access are today the norm (and the expectation).

Many of us who work with vulnerable communities on environmental justice issues have seen how access to online state environmental data is essential for social groups seeking to learn about and document environmental harms in their community. Data access is a justice issue.

Beyond mere access, we need creative, participatory, community-based, transparent, accountable, and justice-oriented data infrastructures, and new communities of data practice and care. We need these not only to enable government and industry accountability, but to help usher in a better, more just world. The Internet Archive’s commitment to participatory archiving, archiving vulnerable content, and free access, has both inspired and enabled EDGI’s work, and we are glad to partner with the Internet Archive to continue building this important data ecology and community of practice.

–Lindsey Dillon & EDGI

Lindsey Dillon is an Assistant Professor of Sociology at the University of California, Santa Cruz. She is one of the founding members of EDGI.

Posted in Announcements, News | 3 Comments

DJ Spooky’s QUANTOPIA: THE EVOLUTION OF THE INTERNET

We live in a world that is full of algorithms. We have an unconscious relationship to code and numbers. Even creativity is now quantified by data. — DJ Spooky

The Internet Archive is pleased to announce that tickets are now on sale for …

More Galleries | 4 Comments

Join us for A Grand Re-Opening of the Public Domain

 

Screen shot from Cecil B. DeMille’s 1923 silent classic, “The Ten Commandments.” On January 1, 2019, this film and tens of thousands of other works will enter the public domain.

It’s time to celebrate!  For the first time in decades, new creative works such as Cecil B. DeMille’s 1923 silent film, “The Ten Commandments,” Kahlil Gibran’s classic “The Prophet,” and Virginia Woolf’s third novel, “Jacob’s Room,” will enter the public domain on the first day of 2019. Please join us for a Grand Re-opening of the Public Domain, featuring a keynote address by Creative Commons’ founder, Lawrence Lessig, on January 25, 2019.  Co-hosted by the Internet Archive and Creative Commons, this celebration will feature legal thought leaders, lightning talks, demos, and the chance to play with these new public domain works. The event will take place at the Internet Archive in San Francisco, and is free and open to the public.

RSVP now before the tickets run out

Kahlil Gibran’s “The Prophet” will enter the public domain on January 1st!

The public domain is our shared cultural heritage, a near limitless trove of creativity that’s been reused, remixed, and reimagined over centuries to create new works of art and science. The public domain forms the building blocks of culture because these works are not restricted by copyright law. Generally, works come into the public domain when their copyright term expires. But U.S. copyright law has greatly expanded over time, so that now many works don’t enter the public domain for a hundred years or more. Ever since the 1998 Copyright Term Extension Act, no new works have entered the public domain (well, none due to copyright expiration). But for the first time this January, tens of thousands of books, films, visual art, sheet music, and plays published in 1923 will be free of intellectual property restrictions, and anyone can use them for any purpose at all.

Cartoons featuring Felix the Cat from 1923 are among the tens of thousands of works that will be fully accessible starting in 2019.

Join the creative, legal, library, and advocacy communities plus an amazing lineup of people who will highlight the significance of this new class of public domain works. Presenters include Larry Lessig, political activist and Harvard Law professor; Corynne McSherry, legal director of the Electronic Frontier Foundation; Cory Doctorow, science fiction author and co-editor of Boing Boing; Pam Samuelson, copyright scholar; Jamie Boyle, the man who literally wrote the book on the public domain; and many others.

Continue the celebration at the world premiere of DJ Spooky’s “Quantopia” at the Yerba Buena Center in SF on January 25.

In the evening, the celebration continues as we move to Yerba Buena Center for the Arts for the world premiere of Quantopia: The Evolution of the Internet by Paul D. Miller, aka DJ Spooky. The live concert synthesizes data and art, both original and public domain materials, in tribute to the depth and high stakes of free speech and creative expression involved in our daily use of media. Attendees of our Grand Re-Opening of the Public Domain event will receive an Internet Archive code for a 20% discount on tickets to Quantopia.

If you’d like to support the work we do at the Internet Archive, including making these 1923 works available to you for free on January 1, please donate here.

Posted in Announcements, Event, News, Upcoming Event | 1 Comment

Big Tech Comedy Roast at the Internet Archive

Politics got you down? Feeling like you need some laughter in your life? Have we got the event for you!

Come join us at the Internet Archive for a Big Tech Comedy Roast presented by former Amazon executive and entrepreneur Vahid Razavi, author of “Ethics in Tech, or The Lack Thereof and Age of Nepotism”.

The evening is hosted by Will Durst, and also brings us the comedic talents of Francesca Fiorentini, Chloe McGovern, Nichole Spain and Jared Horning.

The New York Times on Will Durst: “Quite possibly the best political satirist working in the country today.” The Boston Globe: “A modern day Will Rogers.”

Get Tickets Here $12.00

Friday, December 14, 2018
6:00 pm Doors Open
7:00 pm Program
Internet Archive
300 Funston Avenue
San Francisco, CA 94118

 

Posted in Event, Upcoming Event | Leave a comment

Now you can donate your favorite altcoin to the Internet Archive

Got Clams? Maybe some extra XRP lying around? Is your Litecoin portfolio flush and you’d like to share the love? Now you can! Thanks to Changelly, the Internet Archive is able to accept donations in a whole new variety of altcoins. Our crypto-donations page recently got a fresh, new look, and now with the Changelly button, we can accept more than 100 forms of cryptocurrency.

How It Works
If you’d like to support us in Dogecoin, Dash, or one of the many other supported altcoins, simply click the Changelly ‘Pay with altcoins’ button and choose the currency you’d like to donate. Changelly then converts it to the equivalent value in Bitcoin and sends it to the Internet Archive’s public Bitcoin address.

We still happily accept donations in Bitcoin, Bitcoin Cash, Ethereum and Zcash via the Internet Archive’s public addresses on the cryptocurrency contributions page.

Why we care about crypto
The Internet Archive has been a long-time participant in cryptocurrencies: we have been accepting Bitcoin donations since 2011, and our staff receives year-end bonuses and some salary in BTC. It’s been amazing to see crypto donations grow enormously year after year. Many thanks to the Bitcoin, Ethereum, and Zcash communities. We hope the Changelly button will help bring to light and further support the various tokens in the ecosystem.

Posted in News | 22 Comments

Open Libraries Forum: recap & reflection

Open Libraries Forum, October 18, 2018

On October 18, more than 50 library leaders from across the U.S. and Canada joined us at our headquarters in San Francisco for our Open Libraries Forum.  We’ve been holding an annual forum for more than ten years, using the opportunity to bring together thought leaders from across library communities and cultural heritage organizations to envision a digital future for our collections and services.

This year we focused specifically on advancing the Open Libraries program, and its vision of digitizing four million modern books.  Discussions focused on the legal and operational issues related to delivering digital books and services for the print-disabled community; integrating our controlled digital lending service with emerging ebook platforms and consortia; and building a digital library that is widely used, frequently cited, and representative of the diverse voices in our communities.  To learn more about the discussions, we’ve assembled the breakout reports given at the end of the day in a publicly available Google Doc.

Takeaways

As the Director of Open Libraries, I learned a tremendous amount from the participants about how Open Libraries can be used in their institutions, and the needs they have in communicating the value of Open Libraries to their stakeholders, peers, and patrons.  Given time to reflect on the day in full, I’ve summarized four main takeaways:

Get the word out

  • We need to continue promoting Open Libraries through education, marketing, and investing in communities.
  • We should provide FAQs, tools, and training to help libraries and users get the most out of Open Libraries.

Invest in our partnerships

  • We need to be at the meetings where likely adopters gather, and collaborate with partners we need for service integration.
  • We need to be very clear about what services and features are on offer.
  • We should use steering groups and existing networks as two-way methods for communication.
  • We should use inclusion as a lens on all aspects of Open Libraries by investing in and partnering with underrepresented communities.

Learn from our users

  • We need to understand who our users are & their research/reading needs (collections as well as tools).
  • We need fewer switches between platforms and services to provide a coherent experience for our patrons.
  • We need analytics of what’s being used in an environment that respects reader privacy.

Enhance our tools

  • We need tools for communities and institutions to customize ways of viewing content in Open Libraries, as well as to sort and find content by format, subject, theme, and institution.
  • We need tools that encourage libraries to participate by solving existing problems—to improve operations (such as managing waitlists), to collaborate in collection development, and to enable community curation.
  • We need greater visibility of what APIs are available and how to consume machine-readable data from Open Libraries.

Open Libraries & you

If you are a librarian who is interested in learning how Open Libraries can benefit your patrons, please visit: http://openlibraries.online

If you are a reader or researcher who is looking for free access to digital books, please view our Open Libraries collections at: https://archive.org/details/inlibrary

For additional information or questions, please reach out to me via email.

Posted in News | Comments Off on Open Libraries Forum: recap & reflection

Wasted: A case study for controlled digital lending


The recent nomination and appointment of Brett Kavanaugh to the U.S. Supreme Court offered a timely opportunity to demonstrate how controlled digital lending can be used by libraries to circulate digital copies of books that are out of print or not widely held.  The basic premise of controlled digital lending is “own one, loan one”: rather than loaning a physical book in their collection, libraries can choose instead to loan a scanned version of that book to one user at a time, while the physical book remains on the shelf.

A key player during the confirmation hearing was Mark Judge, a friend of Kavanaugh’s who wrote the book Wasted: Tales of a GenX Drunk, describing his raucous, alcohol-fueled high school years. Judge’s memoir was published in 1997 by Hazelden Publishing, the publishing arm of the Hazelden Betty Ford Foundation, which runs the recovery centers where Judge was treated for addiction.  The book had a limited print run and subsequent shelf life—it was not widely held by libraries outside of those focusing on addiction and recovery. Interest skyrocketed once Judge’s book entered the public consciousness, but because the book was no longer being sold by the publisher and used copies were scarce, when available at all, its price on Amazon.com topped out just under $2,000.

Boston Public Library (BPL), a long-time scanning partner with Internet Archive, located a copy of Wasted in their research stacks. Those books are only available for use within the library, so the book was never going to circulate. Tom Blake, Manager of Content Discovery at BPL, sent the book down to be scanned by Internet Archive book scanners in their in-house digitization center.  Internet Archive staff digitized the book using the same procedures and equipment that have been used to digitize more than 55,000 books from BPL’s collection since the partnership with Internet Archive began in 2007. Using existing workflows and post-production processes, the physical book was scanned and turned into a digital book complete with page images, OCR text, and mobile-friendly formats before being placed online at https://archive.org/details/wastedtalesofgen00judg.

Wasted is still under copyright but out of print. BPL reached out to the publisher, Hazelden, to ask whether Wasted could be made available digitally without restrictions. In the years after the book was published, however, copyright had reverted from the publisher back to Judge, so Hazelden was unable to grant permission. Because BPL wanted to fulfill its mission of providing access to a book of cultural and political significance, BPL put the non-circulating copy it owns into Internet Archive’s controlled digital lending service, where it can be digitally loaned to users one at a time. Books in controlled digital lending follow the same circulation patterns as those in a traditional library; a user has access to the book for 14 days, and if a book is checked out users can join a waitlist for their turn to read it.
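The circulation rules described above can be sketched in a few lines of Python. This is an illustrative model only, not the Internet Archive's lending software: the class and method names are hypothetical, and the only facts taken from the post are the "own one, loan one" limit, the 14-day loan, and the waitlist.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import date, timedelta

LOAN_DAYS = 14  # loan period stated in the post


@dataclass
class LendableTitle:
    """One title under "own one, loan one": loans never exceed owned copies."""

    owned_copies: int
    loans: dict = field(default_factory=dict)       # user -> due date
    waitlist: deque = field(default_factory=deque)  # users in arrival order

    def borrow(self, user: str, today: date) -> str:
        """Lend a copy if one is free; otherwise queue the user."""
        if user in self.loans:
            return "already borrowed"
        if len(self.loans) < self.owned_copies:
            self.loans[user] = today + timedelta(days=LOAN_DAYS)
            return "borrowed"
        if user not in self.waitlist:
            self.waitlist.append(user)
        return "waitlisted"

    def give_back(self, user: str, today: date) -> None:
        """Return a copy and hand it to the next person waiting, if any."""
        if self.loans.pop(user, None) is None:
            return  # this user had no active loan
        if self.waitlist:
            next_user = self.waitlist.popleft()
            self.loans[next_user] = today + timedelta(days=LOAN_DAYS)
```

With owned_copies=1, a second borrower is placed on the waitlist until the first returns the book, which mirrors the single BPL copy in this case study.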

As BPL’s copy of the book was being digitized and published online, two electronic copies of Wasted were uploaded to the Internet Archive’s Community Texts collection, which does not have the same access restrictions as controlled digital lending. Instead, user uploads are governed by the Internet Archive’s Copyright Policy.  One copy was noticed by Twitter users and its URL was widely circulated online, drawing considerable interest from the public and media.  At this point, there were three copies of Wasted available: two uploaded by users into Community Texts with no access restrictions, and one scanned by BPL and available to one user at a time. During the few days that all three copies were online, several news outlets began noticing the different modes of access.  Slate offered perhaps the best analysis of making Wasted available via controlled digital lending, reaching out to academics and legal scholars about the legality of the move, with the general agreement that the Archive’s actions were very likely to be legal.

Mark Judge eventually e-mailed Internet Archive and requested that the copies of Wasted be taken down. Internet Archive takes prompt action on takedown requests. In this case, given that the book was out of print, we made an appeal to him to allow the book to be made available without restriction. Judge denied our request, as he is working on plans for the book to be republished. Ultimately, we came to an agreement that only the two openly downloadable copies in Community Texts would be removed; the copy made available from BPL through controlled digital lending would remain online.

Before the two unrestricted copies were taken down, they were viewed more than 27,000 times.  Compare that with the 28 borrows to date that have occurred with the BPL copy, and the waitlist that numbers more than 400, and you’ll quickly come to realize that for controlled digital lending to work at scale, more physical copies are needed to loan against, especially for titles like this that enter the public zeitgeist and become part of a major news story.
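A back-of-the-envelope calculation shows why more copies matter. The sketch below is an upper bound under simplifying assumptions that are not in the post: every borrower keeps the book for the full 14 days, all copies have just been checked out, and the waitlist is rounded to 400 people. The function name is hypothetical.

```python
def worst_case_wait_days(position: int, copies: int, loan_days: int = 14) -> int:
    """Upper bound on the wait for the person at `position` in the waitlist.

    Assumes every loan runs its full term and all copies were just checked out.
    """
    if position < 1 or copies < 1:
        raise ValueError("position and copies must be at least 1")
    rounds = -(-position // copies)  # ceiling division without floats
    return rounds * loan_days


print(worst_case_wait_days(400, copies=1))   # 5600 days, roughly 15 years
print(worst_case_wait_days(400, copies=20))  # 280 days
```

In practice many readers return books early, so real waits would be shorter, but the proportions hold: each additional lendable copy divides the queue.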

And that’s where other libraries come in. The Internet Archive’s Open Libraries project is bringing together libraries that are committed to making their collections available via controlled digital lending and pooling their physical collections in order to make more lendable copies of digital books available to their users and the world.   

The takeaway here is that controlled digital lending is a viable, but limited, way for libraries to provide digital access to the physical copies on their shelves.  This case study demonstrates the ways in which controlled digital lending works, and its limitations of scale for titles with wide appeal. A significant way of addressing those limitations is for more libraries to join Open Libraries and lend digitized versions of their print collections, making more copies of books available for loan and getting more books into the hands of researchers and readers all over the world.

Posted in Announcements, News | 14 Comments