Save the Date – Public Domain Day 2020

The Internet Archive is excited to announce that we will be celebrating Public Domain Day on January 30, 2020 in our nation’s capital. We have an exciting line-up of local musicians and artists, as well as projects from around the country – Join Us! The event is free and open to the public.

Register now!

Posted in Event, News, Upcoming Event | Leave a comment

Top 10 Reasons to Support the Internet Archive

DONATE NOW TO SUPPORT THIS NON-PROFIT LIBRARY FOR ALL

Today, the Internet Archive launches its End of Year Fundraising Campaign. We’re lucky to have a 2-to-1 Matching Grant for the next few weeks, so your impact will be tripled if you give today. Of the 1.1 million people who use the Internet Archive each day, only a tiny percentage donate. Why give? Here are ten great reasons to support the Internet’s non-profit library for all:

#1. The Wayback Machine has fixed 11 million broken links in Wikipedia, making the web more reliable.

#2. We’re home to the live recordings of the Grateful Dead. (And 7800 other bands!)

#3. The Internet Archive is working with the people of Bali to keep their culture and language alive by scanning and transcribing the world’s largest online collection of Balinese Palm Leaf manuscripts.

#4. In our Music Collection, you can now read the liner notes for John Coltrane’s album with Johnny Hartman and many LPs and CDs.

#5. Readers! You’re borrowing half a million books each month with complete reader privacy. That’s among the 3.8 million online books we have to choose from.

#6. Who knows what evil lurks in the hearts of men? Only the Shadow knows…along with listeners of our Old Time Radio collection.

#7. Super Munchers and 2500 other MS-DOS games you can play in your browser.

#8. What was the sound of America from 1898 to 1960? Just listen to our 150,000 78rpm recordings. Including 4172 polkas to dance to!

#9. The superpowers of Free-range archivist, Jason Scott.

#10. Information you can trust has never been more important. Our mission is to preserve the best knowledge of humankind and share it with everyone.

If you listen to music and radio, read books, play vintage video games, or reference past web sites at the Internet Archive, we ask that you chip in and help keep us going strong in 2020! Don’t forget, we have a 2-to-1 matching grant, so if you donate $5 it becomes $15 for the Internet Archive today.

DONATE NOW

Posted in Announcements, News | 26 Comments

Archiving Online Local News with the News Measures Research Project

Over the past two years Archive-It, Internet Archive’s web archiving service, has partnered with researchers at the Hubbard School of Journalism and Mass Communication at University of Minnesota and the Dewitt Wallace Center for Media and Democracy at Duke University in a project designed to evaluate the health of local media ecosystems as part of the News Measures Research Project, funded by the Democracy Fund. The project is led by Phil Napoli at Duke University and Matthew Weber at University of Minnesota. Project staff worked with Archive-It to crawl and archive the homepages of 663 local news websites representing 100 communities across the United States. Seven crawls were run on single days from July through September and captured over 2.2TB of unique data and 16 million URLs. Initial findings from the research detail how local communities cover core topics such as emergencies, politics and transportation. Additional findings look at the volume of local news produced by different media outlets, and show the importance of local newspapers in providing communities with relevant content. 

The goal of the News Measures Research Project is to examine the health of local community news by analyzing the amount and type of local news coverage in a sample of communities. In order to generate a random and unbiased sample of communities, the team used US Census data. Prior research suggested that average income in a community is correlated with the amount of local news coverage; thus the team decided to focus on three different income brackets (high, medium and low) using the Census data to break up the communities into categories. Rural areas and major cities were eliminated from the sample in order to reduce the number of outliers; this left a list of 1,559 communities ranging in population from 20,000 to 300,000 and in average household income from $21,000 to $215,000. Next, a random sample of 100 communities was selected, and a rigorous search process was applied to build a list of 663 news outlets that cover local news in those communities (based on Web searches and established directories such as Cision).
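
The sampling steps described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the toy records, the field names (`population`, `income`), the bracket cut points and the fixed random seed are all assumptions made for the example, and the post does not say whether the sample was drawn evenly across income brackets.

```python
import random

def income_bracket(income, low_cut, high_cut):
    """Label a community low/medium/high. The cut points are assumed, not taken from the study."""
    if income < low_cut:
        return "low"
    if income < high_cut:
        return "medium"
    return "high"

def sample_communities(communities, k=100, min_pop=20_000, max_pop=300_000, seed=0):
    """Drop rural areas and major cities by population, then draw k communities at random."""
    eligible = [c for c in communities if min_pop <= c["population"] <= max_pop]
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return rng.sample(eligible, min(k, len(eligible)))

# Hypothetical toy data standing in for Census records.
communities = [
    {"name": f"town-{i}", "population": 10_000 + 1_000 * i, "income": 21_000 + 500 * i}
    for i in range(400)
]
sample = sample_communities(communities)
```

Filtering before sampling keeps the outliers out of the pool entirely, so every remaining community has the same chance of being picked.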

The News Measures Research Project web captures provide a unique snapshot of local news in the United States. The work is focused on analyzing the nature of local news coverage at a local level, while also examining the broader nature of local community news. At the local level, the 100 community sample provides a way to look at the nature of local news coverage. Next, a team of coders analyzed content on the archived web pages to assess what is being covered by a given news outlet. Often, the websites that serve a local community are simply aggregating content from other outlets, rather than providing unique content. The research team was most interested in understanding the degree to which local news outlets are actually reporting on topics that are pertinent to a given community (e.g. local politics). At the global level, the team looked at interaction between community news websites (e.g. sharing of content) as well as automated measures of the amount of coverage.

The primary data for the researchers was the archived local community news data, but in addition, the team worked with census data to aggregate other measures such as circulation data for newspapers. These data allowed the team to examine how the amount and type of local news changes depending on the characteristics of the community. Because the team was using multiple datasets, the Web data is just one part of the puzzle. The WAT data format proved particularly useful for the team in this regard. Using the WAT file format allowed the team to avoid digging deeply into the data – rather, the WAT data allowed the team to examine high level structure without needing to examine the content of each and every WARC record. Down the road, the WARC data allows for a deeper dive, but the lighter metadata format of the WAT files has enabled early analysis.
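
To give a feel for why WAT records are lighter to work with, here is a minimal Python sketch that pulls the page title and outbound links from a single WAT JSON payload without touching the page content. The sample record is invented for illustration, and the field names (`Envelope`, `Payload-Metadata`, `HTML-Metadata`, `Links`) follow the WAT layout as it is commonly documented; they should be checked against real files before being relied on.

```python
import json

def summarize_wat_record(wat_json):
    """Return the target URI, page title and anchor links from one WAT JSON payload."""
    envelope = json.loads(wat_json)["Envelope"]
    uri = envelope.get("WARC-Header-Metadata", {}).get("WARC-Target-URI")
    html = (envelope.get("Payload-Metadata", {})
                    .get("HTTP-Response-Metadata", {})
                    .get("HTML-Metadata", {}))
    title = html.get("Head", {}).get("Title")
    # Keep only hyperlinks (paths starting with "A@"), skipping images and scripts.
    links = [link["url"] for link in html.get("Links", [])
             if link.get("path", "").startswith("A@") and "url" in link]
    return {"uri": uri, "title": title, "links": links}

# Invented sample record; real WAT payloads carry many more fields.
sample_record = json.dumps({
    "Envelope": {
        "WARC-Header-Metadata": {"WARC-Target-URI": "http://news.example.com/"},
        "Payload-Metadata": {"HTTP-Response-Metadata": {"HTML-Metadata": {
            "Head": {"Title": "Example Local News"},
            "Links": [
                {"path": "A@/href", "url": "/politics/council-vote"},
                {"path": "IMG@/src", "url": "/logo.png"},
                {"path": "A@/href", "url": "http://wire.example.org/story"},
            ],
        }}},
    }
})
summary = summarize_wat_record(sample_record)
```

A link list like this is enough to ask, for example, how often a local homepage points to content hosted by other outlets.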

Stay tuned for more updates as research utilizing this data continues! The websites selected will continue to be archived and much of the data are publicly available.

Posted in Archive-It, Web & Data Services | Comments Off on Archiving Online Local News with the News Measures Research Project

7th Annual Aaron Swartz Day at the Internet Archive

The Internet Archive is hosting a FOIAPALOOZA to celebrate Aaron Swartz and to provide a yearly showcase of his many interests. Aaron’s work focused on civic awareness and activism and we will spend Saturday together keeping his prescient vision alive.

Doors are open for the hackathon and the daytime programming on Saturday at 10 am. The reception will be on Saturday evening at 6:00 pm with the main program starting at 8 pm with a music and dance party afterward.

FOIAPALOOZA:  Aaron filed many FOIA requests and inspired lots of journalists, including the now-legendary Jason Leopold, to use them as a tool for evidence-based journalism. So we decided to focus on FOIA and public records requests at this year’s San Francisco event. We aim to not only teach folks how to file their own requests but also to let them dig into the information we have received back from the 200+ requests we filed this last year with our Police Surveillance Project. FOIAPALOOZA speakers include Tracy Rosenberg & Mike Katz-McCabe from Oakland Privacy — an organization that just won an EFF Pioneer Award! — as well as Freddy Martinez from Lucy Parsons Labs, Ryan Shapiro from Property of the People and Brewster Kahle and Tracey Jaquith from the Internet Archive.

Get Tickets Here

Saturday, November 9, 2019
10:00 am Doors Open for Hackathon and Daytime Programming
11:00 am Programming starts
6:00 pm Reception
8:00 pm Evening Program

Internet Archive
300 Funston Avenue
San Francisco, CA 94118

Posted in Event, Past Event | 7 Comments

For the Love of Literacy–Better World Books and the Internet Archive Unite to Preserve Millions of Books

Better World Books

Announced today, Better World Books, the world’s leading socially conscious online bookseller, is now owned by Better World Libraries, a mission-aligned, not-for-profit organization that is affiliated with longtime partner, the Internet Archive.  This groundbreaking partnership will allow both organizations to pursue their collective mission of making knowledge universally accessible to readers everywhere. This new relationship will provide additional resources and newfound synergies backed by a shared enthusiasm for advancing global literacy. Together, the two organizations are expanding the digital frontier of book preservation to ensure books are accessible to all for generations to come.

This new relationship will allow Better World Books to provide a steady stream of books to be digitized by the Internet Archive, thereby growing its digital holdings to millions of books. Libraries that work alongside Better World Books will now make a bigger impact than ever. Any book that does not yet exist in digital form will go into a pipeline for future digitization, preservation and access.  According to Brewster Kahle, Founder and Digital Librarian at the Internet Archive, “The Better World Books origin story is inspiring, and the service they provide to libraries is invaluable. These are our kind of people. We share their values, and we are proud to partner with Better World Books and libraries around the world to promote the goal of universal access to all knowledge.” 

Better World Books was born in 2003, when a group of recent college graduates sold their used textbooks online. Their success eventually led to the creation of a revolutionary new business model where used books are collected primarily from libraries, booksellers, colleges, and universities in six countries and then are either resold online, donated or recycled. To date, Better World Books has donated almost 27 million books worldwide, has raised close to $29 million for libraries and literacy, and has saved more than 326 million books from landfills. With the backing of the not-for-profit Better World Libraries, Better World Books will enhance these valuable services to libraries and readers. 

According to Jim Michalko, former president of The Research Libraries Group, Inc., and a Better World Books board member, “This new relationship is a win for the library community. One of the biggest challenges facing libraries today is responsibly removing materials from their shelves so they can bring in more desirable materials or repurpose space to fit community needs. Better World Books has always been a trusted partner in this activity. Now, libraries can provide books to Better World Books knowing that a digital copy will be created and preserved if one doesn’t yet exist. That’s responsible collection management.” 

The Internet Archive has long been committed to digitizing books and library materials so they can be accessed by users all over the world. Through digitization, these materials can be used by researchers in large-scale, data-driven computing investigations, preserved in both digital and physical form, and where appropriate, loaned to readers. 

Dustin Holland, the newly appointed President and CEO of Better World Books, underscores, “We exist to make a difference in the world, and our customers make that possible. We are honored to join the Internet Archive family, and our partnership allows us to extract the maximum value out of every book we collect at scale, while continuing to delight readers all over the world.”

Better World Books remains the bookstore that customers know and love, with its operations now enhanced for the services it offers to libraries and readers alike. Shop with confidence and help us celebrate this exciting new partnership with a 10% discount on purchases made through the end of the year using discount code “BWBIA”. At checkout, you will have an opportunity to round up your purchase amount to benefit the Internet Archive and directly support our mission.

For more about the Internet Archive click here. For more about Better World Books click here.

Posted in News | 18 Comments

Controlled Digital Lending Takes Center Stage at Library Leaders Forum

The following blog post was written by freelance writer Caralee Adams about the Internet Archive’s Library Leaders Forum, held on October 23 at San Francisco Public Library.

As enthusiasm grows for making library collections more accessible, the Internet Archive hosted an event to build a community of practice around Controlled Digital Lending (CDL). A diverse group gathered for the 2019 Library Leaders Forum Oct. 23 to share stories and strategies for libraries to expand their reach by lending out digital books based on their physical collections.

Why is this important?

Chris Freeland, Director of Open Libraries

“At the Internet Archive, we have a strong belief that everyone deserves to learn. We want to offer up the greatest digital library the Internet has ever seen to the world for free,” said Chris Freeland, Director of the Internet Archive’s Open Libraries program. “We think that everyone, regardless of where they live, should have ready access to a great library. More importantly, we think it should be available on phones and mobile devices that people turn to today. We want to make sure they have access to vetted, trusted information that’s held in libraries.”

The mantra of CDL: “Own one, loan one.” The idea is that a library can make a choice of lending either a physical copy or a digital version of a book.
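
The “own one, loan one” rule can be pictured as a simple counter. The Python sketch below is a hypothetical illustration only: the class and method names are invented, and it is not a description of how the Internet Archive's lending system is built. It shows just the invariant that physical and digital loans combined never exceed the number of copies owned.

```python
class ControlledTitle:
    """Tracks loans of one title under an 'own one, loan one' rule (illustrative only)."""

    def __init__(self, owned_copies):
        self.owned_copies = owned_copies
        self.on_loan = 0  # physical and digital loans counted together

    def check_out(self):
        """Lend a copy in either format if one is free; return whether the loan succeeded."""
        if self.on_loan >= self.owned_copies:
            return False
        self.on_loan += 1
        return True

    def check_in(self):
        """Return a copy, freeing it for the next reader."""
        if self.on_loan > 0:
            self.on_loan -= 1

title = ControlledTitle(owned_copies=1)
first_loan = title.check_out()   # succeeds: the single owned copy is free
second_loan = title.check_out()  # refused: that copy is already out
title.check_in()
third_loan = title.check_out()   # succeeds again after the return
```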

The Internet Archive has been doing CDL since 2011, beginning with the Boston Public Library. Now two dozen other libraries of all sizes in the U.S. and Canada have embraced the model. Librarians from some of those institutions spoke about their passion for the practice at the forum.

The meeting provided an overview of the legal issues, policy considerations, and examples of CDL in action. The appeal to library leaders gathered was to endorse CDL, join Open Libraries, donate books to the Internet Archive for scanning, and volunteer to help with a new serials project.

Helping libraries see what’s possible

Library Leaders Forum attendees at San Francisco Public Library

Michael Lambert, City Librarian at the San Francisco Public Library, which hosted the event, shared his institution’s experience as an early partner with the Internet Archive on Open Libraries and CDL. Beginning with city government documents and historical materials, SFPL created an entire scanning department. To date, the library has digitized 13,000 books and documents with the Internet Archive, which have received over 7.5 million views. Since November 2018, SFPL has donated 30,000 copyrighted books to the Internet Archive as part of its community distribution program.

“Having this alternative virtual lending site as an option has been great,” Lambert said. “Librarians have been able to confidently weed excess, outdated materials from our collection, secure in knowledge that the books will not disappear, but rather have a new life where people around the world can read and research the materials that SFPL has meticulously collected over the decades.”

The Internet Archive embodies library values: persistence, comprehensiveness and accessibility, said Lambert. “The Archive has become a crucial part of the broad library information eco-system,” he said. “They have provided examples that have challenged traditional libraries. The Internet Archive helps other libraries see what’s possible.”

Brewster Kahle, Internet Archive, and Dale Askey, University of Alberta

Internet Archive Founder Brewster Kahle hopes that digitization will allow more online sources to be linked to books, providing people with trusted information.

“If Wikipedia is the encyclopedia of the Internet, we are trying to build the library of the Internet,” Kahle explained at the forum. “Let’s make it really easy for people to go deeper.”

So far, the Internet Archive has turned 122,000 references on Wikipedia into digitized book links through its online library. Still, a century of books is missing after 1923 because of copyright laws. Kahle called on libraries to help fill that gap.

As part of that strategy, the Internet Archive is trying to institutionalize CDL, a practice that has been successfully working in a handful of libraries for eight years with no negative pushback. Yet, it has not been widely embraced. Kahle appealed to libraries to endorse CDL and donate books for scanning to address the larger goal of universal access to knowledge.

Framing the approach

The forum hosted experts to explain the legal underpinnings of CDL and discuss how the concept fits into the overall push to level the playing field for access to information.

Lila Bailey of the Internet Archive moderated a conversation with Kyle Courtney, Copyright Advisor at Harvard University, David Hansen, Associate University Librarian at Duke University, and Michelle Wu, Associate Dean for Library Services and Professor of Law at the Georgetown Law Library in Washington, D.C.

They have written a paper spelling out how libraries can practice CDL within the confines of the fair use doctrine in current copyright law. Copyright law established in 1976 and dating back to 1950 does not reflect the digital reality today and it should allow flexibility for libraries to lend out one book at a time – no matter what the format – digital or print, they maintain.

John Bergmayer, Public Knowledge, talks with Lila Bailey, Internet Archive, and Mike Buschman, Washington State Library

To garner broad support for the concept of CDL, John Bergmayer of the nonprofit, Public Knowledge, spoke about the need to build relationships with lawmakers and educate them on the issue. This summer, he led a group engaged in CDL to The Hill in Washington, D.C. to brief members of Congress and their aides on the importance of expanding access to library materials through CDL.

“You have to make a project matter to the politicians,” explained Bergmayer. In the case of CDL, it’s about outlining the benefits of providing access to rural patrons, protecting materials from damage from disasters, saving libraries money, and helping K-12 school libraries, among others. “You want to get people to do the right thing for their reasons, not your reason — and show how your issue affects voters.”

Heather Joseph, Executive Director of the Scholarly Publishing & Academic Resources Coalition (SPARC), said CDL fits into the larger open agenda that advocates for unrestricted access to research. “It’s a vision based on opportunity,” said Joseph. “An old tradition and a new technology have converged to make possible an unprecedented public good.”

Now more than ever, in an era of “fake news,” and “alternative facts,” free, immediate access to high-quality vetted, source material is crucial for scholars, scientists, students, journalists, policymakers – everyone, she said.

“CDL is a pragmatic, incremental step towards open that operates in a way that’s respectful of libraries current operations and of copyright. It moves the needle towards open,” said Joseph. “CDL can contribute to collective movement towards a full vision of open access to knowledge.”

Opening Doors for Students

Lisa Petrides, Founder and CEO of ISKME

Making digital books more widely available to students has the potential for remedying inequities in education. Nationwide, public school districts have lost 20 percent of their libraries and librarians in recent years. Lisa Petrides, founder of the non-profit Institute for the Study of Knowledge Management in Education, has embraced CDL as a model to build a Universal School Library (USL) and connect students – particularly from under-resourced schools – to relevant materials that increasingly are digital.

“CDL holds the potential to broaden access to knowledge in public schools in a way that schools haven’t even begun to tap,” said Petrides, who is trying to curate an inclusive collection of 15,000 high-quality digitized books. “We are taking an equity lens in terms of diversity.”

Karen Lemmons, Detroit School of Arts

The Detroit School of Arts will be piloting USL and Librarian Karen Lemmons said she was excited to be able to offer her high school students books they can access while they are on the go. “This might give them an opportunity to read in between practices. They can pull out their phone and read a few pages. It’s mobile and flexible,” said Lemmons, noting that reading is closely linked to student achievement. “Our students really do want to be the best.”

Lemmons said she wants to be a model for other urban schools. “We want to be a driving force to get other libraries involved,” said Lemmons. “This is a data-driven district and we will need data to show reading more makes a difference in student performance.”

When the prestigious Phillips Academy in Andover, Massachusetts, recently was doing a $20 million renovation to its library, the Internet Archive approached it about digitizing their collection. The library already had its books packed on pallets, but instead of storing them decided to have them all scanned, explained Michael Barker, Director of Academy Research, Information and Library Services.

“We had this very well-intentioned idea to create a space for learners of the 21st century. It’s all good. It is a space of immense privilege. But it takes a vision to think well beyond our campus to say that belongs to every learner. That opportunity is to digitize the entire collection – that’s why we are all in,” said Barker of the school’s decision to participate in CDL. “It goes to the heart of what Phillips was founded on. This school is for youth from every quarter and we try to live out that ideal as a private school for a public purpose.”

Next, Barker said he would like to see peer prep schools join the CDL model to further expand access to schools without the same resources.

CDL in Action

As the first library to use the CDL approach, the Boston Public Library recently extended its offerings by scanning its historic Alice Jordan Collection of 250,000 children’s books that were in storage. It has also digitized city directories, cookbooks and other fact-based documents in its catalog. Recently, it got permission from Boston-based publisher Houghton Mifflin to digitize its entire trade collection that is housed at BPL.

Expanding its CDL involvement, BPL’s Tom Blake challenged participants to bring another partner library next year to the forum.

“This is the first time, I feel like it’s less about digitization and scanning and more about us, as librarians, leveraging not just our collections, but our historical collection policies with each other,” said Blake, who has been attending the Library Leaders Forum for 10 years.

Michael Kostukovsky discusses Controlled Digital Lending

In discussing how to improve the CDL process, meeting participants suggested adjusting the amount of time users checked out titles and allowing for short-term loans. Perhaps smarter return and wait-list notifications could be developed to encourage faster processing of books. Others suggested re-branding Digital Rights Management (DRM) software with a different moniker that would be more appealing to librarians.

In Sonoma County, California, Geoffrey Skinner said its 14 public library branches have just started to participate in CDL. It first scanned documents in the history and genealogy library, then digitized its specialized wine library.

“We are doing a massive weed of our closed stacks. By taking those materials to the Internet Archive, we will have digital access back,” said Skinner. Having library materials online will benefit many of the county’s rural users who otherwise travel far to access the physical books, and will provide access for print-disabled patrons.

Justin Gardner, Special Collections Librarian at the American Printing House for the Blind in Louisville, Kentucky, said digitizing 9,000 books in its collection has preserved rare and fragile documents, including books autographed by Helen Keller. Also, although the organization is located in Kentucky, digitization gives people anywhere who are interested in its materials access to them.

“We are becoming the go-to place for visual impairment materials,” said Gardner. Now these research documents are in an accessible form for people who have visual impairments and have never been able to read these materials before they were digitized.

Moving forward

Mike Buschman, Washington State Library

At the forum, Mike Buschman of the Washington State Library announced that the Chief Officers of State Library Agencies (COSLA) voted to endorse CDL. “It feels like it’s entering a new, good phase – a traction phase,” he said.

Kahle emphasized the need for CDL to be a community project and build a deeper collection. “We have to brave up,” he said.  “We just act in good faith. We aren’t pirates. We are trying to do the right thing.”

Chief Librarian and CEO at the Hamilton Public Library in Canada Paul Takala said his institution is an enthusiastic supporter of CDL. With a long history of innovation, moving forward with digitizing is the right move – despite the technical challenges – to make information more accessible to patrons, he said.

“Deeper collaboration is needed. It’s hard to get adequate resources,” said Takala. “As a library community, we are generally risk averse. When we talk about CDL, I think we need to take a more balanced view…. If we make what’s available in our community to other communities – and others make their collections available – then everyone wins.”

Dale Askey, Vice Provost at the University of Alberta, said he liked Takala’s challenge to pull more Canadian institutions past their risk aversion to embrace CDL. “It’s great to see people aligning behind these principles and taking this to scale,” said Askey, whose university has scanned an historic collection of education materials with zero negative impact. “There is a strong history and impulse at the university to do things with maximum benefit to the largest possible community.”

Princeton Theological Seminary is piloting CDL and has created a secure area in its library for the physical collection, so that when a digital copy is checked out, the physical copy will reside there. Participating in the program has great potential benefits for the seminary’s reach, according to Managing Director of the Library Evelyn Frangakis.

“The PTS comprehensive theological collection is in high demand and the CDL library allows increased accessibility to all users, including those with various print disabilities,” said Frangakis. “I think CDL is gaining momentum. That’s really heartening for broad access to the materials that we are able to contribute to this program. It’s going to continue to grow.”

Ross Mounce, Director of Open Access Programmes at Arcadia, a charitable fund of Lisbet Rausing and Peter Baldwin in London, said he was encouraged by participation in the forum and said action points were clear and institutions can choose their level of engagement.

“It’s nice seeing things moving forward. At the end of the day, it just makes sense,” said Mounce of CDL. “If you own a physical copy of a book, you should be able to loan a digital version of it. Libraries should be able to lend books.”

Added Wu of Georgetown: “I’m delighted there has been a lot more buy-in in recent years. The voices and the participants are much more diverse. Libraries [like Phillips] are willing to go all in and that’s remarkable. It is true that if we get more of those, I think we will see a true movement across the nation.”

Posted in Books Archive, Education Archive, Event, Lending Books, News | 8 Comments

The Whole Earth Web Archive

As part of the many releases and announcements for our October Annual Event, we created The Whole Earth Web Archive. The Whole Earth Web Archive (WEWA) is a proof-of-concept to explore ways to improve access to the archived websites of underrepresented nations around the world. Starting with a sample set of 50 small nations we extracted their archived web content from the Internet Archive’s web archive, built special search and access features on top of this subcollection, and created a dedicated discovery portal for searching and browsing. Further work will focus on improving IA’s harvesting of the national webs of these and other underrepresented countries as well as expanding collaborations with libraries and heritage organizations within these countries, and via international organizations, to contribute technical capacity to local experts who can identify websites of value that document the lives and activities of their citizens.

[Whole Earth Web Archive screenshot]

Archived materials from the web play an increasingly necessary role in representation, evidence, historical documentation, and accountability. However, the web’s scale is vast, it changes and disappears quickly, and it requires significant infrastructure and expertise to collect and make permanently accessible. Thus, the community of National Libraries and Governments preserving the web remains overwhelmingly represented by well-resourced institutions from Europe and North America. We hope the WEWA project helps provide enhanced access to archived material otherwise hard to find and browse in the massive 20+ petabytes of the Wayback Machine. More importantly, we hope the project provokes a broader reflection upon the lack of national diversity in institutions collecting the web and also spurs collective action towards diminishing the overrepresentation of “first world” nations and peoples in the overall global web archive.

As with prior special projects by the Web Archiving & Data Services team, such as GifCities (search engine for animated Gifs from the Geocities web collection) or Military Industrial Powerpoint Complex (ebooks of Powerpoints from the archive of the .mil (military) web domain), the project builds on our exploratory work to provide improved access to valuable subsets of the web archive. While our Archive-It service gives curators the tools to build special collections of the web, we also work to build unique collections from the pre-existing global web archive.

The preliminary set of countries in WEWA was determined by selecting the 50 “smallest” countries as measured by number of websites registered on their national web domain (aka ccTLD), a somewhat arbitrary measurement, we acknowledge. The underlying search index is based on internally-developed tools for search of both text and media. Indices are built from features like page titles or descriptive hyperlinks from other pages, with relevance ranking boosted by criteria such as number of inbound links and popularity, and include a temporal dimension to account for the historicity of web archives. Additional technical information on search engineering can be found in “Exploring Web Archives Through Temporal Anchor Texts.”
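
For readers curious how such a selection and ranking might look in code, here is a toy Python sketch. It is not the Internet Archive's search implementation, which the post describes only as internally developed: the function names, the scoring formula, the `link_weight` parameter and the sample numbers are all assumptions for illustration, and the temporal dimension is left out entirely.

```python
import heapq
import math

def smallest_cctlds(site_counts, n=50):
    """Return the n ccTLDs with the fewest registered websites, smallest first."""
    return heapq.nsmallest(n, site_counts, key=site_counts.get)

def score(doc, query_terms, link_weight=0.5):
    """Toy relevance score: term matches in title and anchor text, boosted by inbound links."""
    words = (doc["title"] + " " + " ".join(doc["anchor_texts"])).lower().split()
    matches = sum(words.count(term.lower()) for term in query_terms)
    if matches == 0:
        return 0.0
    # log1p dampens the boost so heavily linked pages do not swamp text relevance.
    return matches * (1.0 + link_weight * math.log1p(doc["inbound_links"]))

# Invented numbers and documents, purely for demonstration.
picked = smallest_cctlds({"tv": 5, "us": 1000, "nu": 7}, n=2)
well_linked = {"title": "Tuvalu government news", "anchor_texts": ["government portal"], "inbound_links": 10}
unlinked = {"title": "Tuvalu government news", "anchor_texts": ["government portal"], "inbound_links": 0}
off_topic = {"title": "Fishing report", "anchor_texts": [], "inbound_links": 50}
```

In this sketch a page must match the query before links count at all, so a popular but unrelated page never outranks a relevant one.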

We intend both to do more targeted, high-quality archiving of these and other smaller national webs and to undertake active outreach to national and heritage institutions in these nations, and to related international organizations, to ensure this work is guided by broader community input. If you are interested in contributing to this effort or have any questions, feel free to email us at webservices [at] archive [dot] org. Thanks for browsing the WEWA!

Posted in Archive-It, News, Wayback Machine - Web Archive, Web & Data Services | Comments Off on The Whole Earth Web Archive

Fighting Misinformation Online

On Tuesday, the Internet Archive joined Public Knowledge, the Wikimedia Foundation, and the Samuelson Law, Technology and Public Policy Clinic from Berkeley Law to brief the Congressional Internet Caucus on efforts to combat misinformation online. Misinformation is a complex issue, but one of its root causes is a lack of easy, reliable ways for Internet users to distinguish good information from bad, or authoritative sources from propaganda. The panel highlighted our recent work to weave books into Wikipedia articles, giving users the ability to dig deeper and fact-check assertions in just one click.

We would like to thank the Congressional Internet Caucus Academy and Representative Anna Eshoo for sponsoring this conversation.

Posted in Announcements, Books Archive, Lending Books, News | 15 Comments

Weaving Books into the Web—Starting with Wikipedia

[announcement video, Wired]

The Internet Archive has transformed 130,000 references to books in Wikipedia into live links to 50,000 digitized Internet Archive books in several Wikipedia language editions including English, Greek, and Arabic. And we are just getting started. As we work with Wikipedia communities and scan more books, both users and robots will be able to link many more book references directly to Internet Archive books. In these cases, diving deeper into a subject will take a single click.

Moriel Schottlender, Senior Software Engineer, Wikimedia Foundation, speech announcing this program

“I want this,” said Brewster Kahle’s neighbor Carmen Steele, age 15. “At school I am allowed to start with Wikipedia, but I need to quote the original books. This allows me to do this even in the middle of the night.”

For example, the Wikipedia article on Martin Luther King, Jr. cites the book To Redeem the Soul of America, by Adam Fairclough. That citation now links directly to page 299 inside the digital version of the book provided by the Internet Archive. There are 66 cited and linked books on that article alone.

In the Martin Luther King, Jr. article of Wikipedia, page references can now take you directly to the book.

Readers can see a couple of pages to preview the book and, if they want to read further, they can borrow the digital copy using Controlled Digital Lending in a way that’s analogous to how they borrow physical books from their local library.

“What has been written in books over many centuries is critical to informing a generation of digital learners,” said Brewster Kahle, Digital Librarian of the Internet Archive. “We hope to connect readers with books by weaving books into the fabric of the web itself, starting with Wikipedia.”

You can help accelerate these efforts by sponsoring books or funding the effort. It costs the Internet Archive about $20 to digitize and preserve a physical book in order to bring it to Internet readers. The goal is to bring another 4 million important books online over the next several years.  Please donate or contact us to help with this project.

From a presentation on October 23, 2019 by Moriel Schottlender, Tech lead at the Wikimedia Foundation.

“Together we can achieve Universal Access to All Knowledge,” said Mark Graham, Director of the Internet Archive’s Wayback Machine. “One linked book, paper, web page, news article, music file, video and image at a time.”


Posted in Announcements, Books Archive, Lending Books, News, Open Library, Wayback Machine - Web Archive | 25 Comments

The Wayback Machine’s Save Page Now is New and Improved

Every day, hundreds of millions of web pages are archived to the Internet Archive’s Wayback Machine. Tens of millions of them are submitted by users like you through our Save Page Now service. You can now do that in a way that is easier, faster, and better than ever before.

Save Page Now (SPN) just got a major upgrade as a result of a total code rewrite, adding a slew of new and awesome features, with more on the way.  

Let’s explore what’s new with Save Page Now    

You can now save all the “outlinks” of a web page with a single click. By selecting the “save outlinks” checkbox you can save the requested page (and all the embedded resources that make up that page) and also all linked pages (and all the embedded resources that make up those pages). Often, a request to archive a single web page, with outlinks, will cause us to archive hundreds of URLs, every one of which is shown via the SPN interface as it is archived.
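For readers curious what “outlinks” means in practice, the following Python sketch collects the links found on a page using only the standard library. It is an illustration under stated assumptions, not how SPN is implemented: the OutlinkParser class and extract_outlinks function are hypothetical names, and the sketch looks only at anchor tags in static HTML, whereas a browser-based crawler can also discover links created by JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class OutlinkParser(HTMLParser):
    """Hypothetical helper that gathers the absolute URLs of <a href> links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.outlinks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(self.base_url, href))
        # Keep only web links, skipping duplicates and links back to the page itself.
        if not absolute.startswith(("http://", "https://")):
            return
        if absolute == self.base_url or absolute in self.outlinks:
            return
        self.outlinks.append(absolute)


def extract_outlinks(html, base_url):
    """Return the distinct outlinks of an HTML document, in page order."""
    parser = OutlinkParser(base_url)
    parser.feed(html)
    return parser.outlinks
```

Each URL returned by extract_outlinks would then be a candidate for its own capture, which is why a single request with outlinks can grow into hundreds of archived URLs.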

My Web Archive keeps a record of the pages you personally saved in the Wayback Machine using Save Page Now.

The new and improved SPN is based on the modern, server-side Brozzler software, which is capable of running web page JavaScript when saving a URL. With this new approach, we can replay the original more faithfully than was possible before.  And, because this software is actively supported by several developers, bugs are quickly fixed, and new features added at a rapid pace. 

When users are logged in with their free Archive.org account, SPN-generated archives can be saved to that user’s “My Web Archive” public gallery of archived pages.

In addition to capturing more high-quality archives of web page elements (HTML, JavaScript, image files, etc.), SPN can now also produce a screenshot. If a screenshot of an archived page is available, we will display an icon on the corresponding playback page; selecting the icon shows the screenshot.

Have you ever wanted to archive all the web pages linked from an email message?  Well, you are in luck because now you can forward that email to “savepagenow@archive.org” and after a few minutes you will get an email back filled with Wayback Machine playback URLs. 
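A Wayback Machine playback URL combines a capture timestamp with the original address. The short Python sketch below shows that shape; the playback_url function is a hypothetical helper written for illustration, and it assumes the familiar 14-digit UTC timestamp form (YYYYMMDDhhmmss) seen in web.archive.org links.

```python
from datetime import datetime, timezone

WAYBACK_PREFIX = "https://web.archive.org/web"


def playback_url(original_url, captured_at):
    """Build a playback URL from an original URL and a timezone-aware capture time."""
    # Timestamps in playback URLs are expressed in UTC as YYYYMMDDhhmmss.
    stamp = captured_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{original_url}"


if __name__ == "__main__":
    when = datetime(2019, 10, 23, 12, 0, 0, tzinfo=timezone.utc)
    print(playback_url("https://example.com/", when))
```

The example capture time and example.com address are placeholders, not real captures.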

Some of you might like the new “First capture” badge you will see if any of the URLs you submit to be archived (including outlinked URLs and URLs included in emails) have not been archived yet. And, yes, for those of you who are feeling competitive, we are planning to launch a “leaderboard” soon. Let the games begin!

Maybe you want the URLs embedded in a web-based PDF file, RSS feed, or JSON file archived. The new SPN will parse those files and archive all the URLs they contain. To use this feature, simply submit PDF, RSS, or JSON URLs to SPN, and don’t forget to select the “save outlinks” checkbox.

This new version of SPN is also being used as the back-end support for a number of Wayback Machine services, including the iOS and Android apps as well as the Chrome, Firefox and Safari browser extensions. And, in case you wondered, those apps and extensions will also be getting major updates very soon.

And, yes, of course SPN has a brand new API that you can use to automate a range of Web archiving projects. Please write to us at info@archive.org if you would like to learn more about the API.

We have often gotten requests to archive URLs from a Google Sheet. We now support that feature for authorised users. Please write to us for access to this advanced capability at info@archive.org.

We LOVE hearing about ways we can make the Wayback Machine better. In fact, most of these new SPN features started with your user suggestions.

Please let us know what you think. Good, bad, or otherwise. Who knows, the next cool SPN feature might be invented by you!

And remember, “If you see something, save something!”

Posted in Announcements, News | 10 Comments