Google Library Project Legal: Let the Robots Read!


The decade-long legal battle over Google’s massive book-scanning project is finally over, and it’s a huge win for libraries and fair use. On Monday, the Supreme Court declined to hear an appeal by the Authors Guild, which had argued that Google’s scanning of millions of books was an infringement of copyright on a grand scale. The Supreme Court’s decision means that the Second Circuit ruling, which held that Google’s creation of a database including millions of digital books is fair use, still stands. The appeals court explained how its fair use rationale aligns with the very purpose of copyright law: “[W]hile authors are undoubtedly important intended beneficiaries of copyright, the ultimate, primary intended beneficiary is the public, whose access to knowledge copyright seeks to advance by providing rewards for authorship.”

Google Books gives readers and internet users the world over access to millions of works that had previously been hidden away in the archives of our most elite universities. As a Google representative said in a statement, “The product acts like a card catalog for the digital age by giving people a new way to find and buy books while at the same time advancing the interests of authors.”

Google began scanning books in partnership with a group of university libraries in 2004. In 2005, author and publisher groups filed a class action lawsuit to put a stop to the project. The parties agreed to settle the lawsuit in a manner that would have forever changed the legal landscape around book rights. The District Court judge rejected the settlement in 2011, based on concerns about competition, access, and fairness, and so litigation over the core question of fair use resumed.

Judge Chin, Judge Leval, and the Supreme Court all made the right decisions along the long and winding path to Google’s victory. Libraries around the country are now free to rely on fair use as they determine how to manage their own digitization projects–encouraging innovation and increasing our access to human knowledge.


Truck and Back Again: The Internet Archive Truck Takes a Detour

When one of our employees came out of his home over the weekend, he saw an empty parking space. Granted, in San Francisco, that’s a pretty precious thing, but since this empty parking space had held the Internet Archive Truck for the previous two days, he was not feeling particularly lucky.

After a staff conversation and a call to the city to see whether the truck had been towed, it soon became obvious that no, somebody had stolen the Truck.

This in itself is not news: thousands of vehicles are stolen in the Bay Area every year. What made this theft unusual was the nature of the vehicle stolen: the Truck is a pretty unique-looking vehicle.


Once the report was filed with the police and a few more checks were made to ensure that the truck was absolutely, positively missing and presumed stolen, we announced the theft on Twitter. The announcement garnered tens of thousands of views, and the news spread far. Thanks to everyone who got the word out.

What was not expected, besides the initial theft, was that a lot of people wondered why the Internet Archive, essentially a website, would have a truck. So, here’s a little bit about why.

Besides providing older websites, books, movies, music, software and other materials to millions of visitors a day, the Internet Archive also has buildings for physical storage located in Richmond, just outside the limits of San Francisco. In these buildings, we hold copies of books we’ve scanned, audio recordings, software boxes, films, and a variety of other materials that we are either turning digital or holding for the future. It turns out you can’t be a 100% online experience – physical life just gets in the way. We also have multiple data centers and the need to transport equipment between them.

Therefore, we’ve had a hard-working vehicle for getting these materials around: a 2003 GMC Savana Cutaway G3500, often parked out front of the Archive’s 300 Funston Avenue address and making up to several trips a week between our various locations.

In a touch of whimsy, the truck has had a unique paint job for most of its life with the Archive. Notably, this isn’t even the first mural it had on its sides; here is a shot with the previous mural:

[Image: the truck with its previous mural]

We’re not sure of the motivation in stealing this rather unique and noticeable vehicle, and there seems to be some evidence it was driven around the city for a while after it was taken. But yesterday, we were contacted by the San Francisco Police Department with really great news:

The Truck has been recovered!

Left abandoned by the side of the road, the truck was found and is about to be returned to the Archive. With good luck, it will soon be back in service, helping us prepare and transport materials related to our mission: to bring the world’s knowledge to everyone.

Again, thanks to everyone who sounded the original call for the truck’s return, and to the SFPD for getting hold of the truck so quickly after it was gone.


Join us for “How Digital Memory is Shaping our Future” with Abby Smith Rumsey – April 26

[Photo: Abby Smith Rumsey, by Cindi de Channes]

What is the future of human memory? What will people know about us when we are gone?

Abby Smith Rumsey, historian and author, has explored these important questions and more in her new book When We Are No More: How Digital Memory is Shaping Our Future.

On the evening of Tuesday, April 26 at 7 p.m., the Internet Archive hosts Abby Smith Rumsey as she takes us on a journey of human memory from prehistoric times to the present, highlighting the turning points in technology that have allowed us to understand more about the history of the world around us.

Each step along the way – from paintings on cave walls to cuneiform on clay tablets, from the Gutenberg printing press to the recent technological advances of digital storage – shows how humans have adapted to the increasing need for new methods to share knowledge with a widening community. In addition to these milestones of human communication, the development of machinery in the industrial age helped unlock the geological record of the physical world around us, changing how our societies think about time and change to the natural environment on a grand scale.

Examining the past helps us understand where the future might lead us. Yet with our current methods of digital storage, what will still be accessible and what steps can we take to make sure knowledge persists? Out of the vast amounts of data that we are capable of saving, what will be considered important? Only time will tell, and it will be when “we are no more.” The Internet Archive, under the leadership of Brewster Kahle, is one organization playing an important role in bringing our civilization’s record of knowledge into the future. Smith Rumsey will share her insights into how we can leave a legacy for those in the future to best understand our lives, our struggles, our passions – our very humanity.

We hope you’ll join us for an enlightening evening with this thought-provoking author, historian and librarian.

Event Info:
How Digital Memory is Shaping Our Future:  A Conversation with Abby Smith Rumsey
Tuesday, April 26, 2016
Internet Archive, 300 Funston Avenue, San Francisco, CA 94118

Doors open at 6:30 p.m.; talk begins at 7:00 p.m.
Reception and book signing to follow presentation

This event is free and open to the public.  Please RSVP to our Eventbrite at:
http://www.eventbrite.com/e/abby-smith-rumsey-how-digital-memory-is-shaping-our-future-tickets-22473471759

For more information about Abby Smith Rumsey and her book, please visit her website at www.rumseywrites.com.


Upcoming changes in epub generation

Epub is a format for ebooks that is used on book reader devices. It is often mostly text, but can incorporate images. The Internet Archive offers epubs in two cases: when a user uploads them, and when they are created from other formats, such as scanned books or uploaded PDFs that were made up of images of pages.

The Internet Archive creates them from images of pages using “optical character recognition” (OCR) technology. The OCR output is then reformatted into the epub format (currently epub v2). These files are sometimes created “on-the-fly” and sometimes created as files and stored in our item directories. All “on-the-fly” epubs use the newest code, whereas stored ones use the code available at the time of generation.

Because of a change in the output format of our OCR engine last August, many of the epubs generated between then and last week have been faulty. Newly generated epubs are now fixed, and we will soon be going back to fix the faulty ones that were stored. We have also discovered that some of the older epubs are faulty, and it is difficult to know which.

To fix this, we are shifting to “on-the-fly” generation for all epubs so that all epubs get the newest code. This is how we already generate daisy, mobi, and many zip files as well. To access the epub for a book we have scanned, the URL is https://archive.org/download/ID/ID.epub, for instance https://archive.org/download/recordofpennsylv00linn/recordofpennsylv00linn.epub.
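As a small illustration of the URL pattern above, here is a minimal Python sketch that builds the epub URL from an item identifier. The function name `epub_url` is our own for this example and is not part of any Internet Archive library.

```python
from urllib.parse import quote


def epub_url(identifier: str) -> str:
    """Build the download URL for a scanned book's epub.

    Follows the pattern https://archive.org/download/ID/ID.epub
    described in this post. `epub_url` is a hypothetical helper name.
    """
    # Percent-encode the identifier so unusual characters stay URL-safe.
    safe_id = quote(identifier, safe="")
    return f"https://archive.org/download/{safe_id}/{safe_id}.epub"


if __name__ == "__main__":
    print(epub_url("recordofpennsylv00linn"))
```

Called with the identifier from the example above, the function returns the same URL given in the text.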

More generally, an epub can be generated for any item that contains an abbyy format file, unless the ocr field in the item’s meta.xml says “language not currently OCRable”. For instance, in an item’s file list, the presence of an abbyy file downloadable at http://archive.org/download/file_abbyy.gz will mean a corresponding epub file can be downloaded at http://archive.org/download/file.epub.
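The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the caller has already read the ocr field from meta.xml (or `None` if the field is absent) and has the item’s file names as a list. The helper names `epub_name_for` and `derivable_epubs` are hypothetical, not an Internet Archive API.

```python
from typing import List, Optional

ABBYY_SUFFIX = "_abbyy.gz"
NOT_OCRABLE = "language not currently OCRable"


def epub_name_for(abbyy_name: str) -> Optional[str]:
    """Map an abbyy file name to its epub name, e.g. file_abbyy.gz -> file.epub."""
    if not abbyy_name.endswith(ABBYY_SUFFIX):
        return None
    return abbyy_name[: -len(ABBYY_SUFFIX)] + ".epub"


def derivable_epubs(ocr_field: Optional[str], file_names: List[str]) -> List[str]:
    """List the epub names that should be downloadable for an item.

    ocr_field:  value of the ocr field in meta.xml, or None if absent
                (assumed to be supplied by the caller).
    file_names: names from the item's file list.
    """
    # Items flagged as not OCRable have no epub to generate.
    if ocr_field is not None and NOT_OCRABLE in ocr_field:
        return []
    epubs = []
    for name in file_names:
        epub = epub_name_for(name)
        if epub is not None:
            epubs.append(epub)
    return epubs


if __name__ == "__main__":
    print(derivable_epubs(None, ["file_abbyy.gz", "file.pdf"]))
```

An item with no abbyy file, or one flagged as not OCRable, yields an empty list.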


New video shows rich resources available at Political TV Ad Archive

Since our launch on January 22, the Political TV Ad Archive has archived more than 1,080 ads with more than 155,000 airings. We’ve trained hundreds of journalists, students, and other interested members of the public with face-to-face trainings. But much as we would like to, we can’t talk to each of you individually. That’s why we created this video.

Watch the video for an overview of the project, the wealth of information it provides, and how fact checkers and journalists have been using it to enrich their reporting. It is a great introduction for educators to use with students, for civic groups to engage their membership in the political process, and for reporters who want to get the basics on how to use the site.

And remember: we want to hear from you about how you are using the Political TV Ad Archive. Please drop us an email at politicalad@archive.org or tweet us @PolitAdArchive. Over the week ahead, we’ll be highlighting examples of how educators have used the project in their classrooms. We’d love to feature examples of how other members of the public are using this collection to deepen understanding of the 2016 elections.

Going forward, we are tracking ads in the New York City, Philadelphia, San Francisco, and Washington, DC markets. These markets will provide a window on political ads appearing in several upcoming primary states: California, Maryland, New Jersey, New York, and Pennsylvania. 

Enjoy!


Getting back to “View Source” on the Web: the Movable Web / Decentralized Web

The Web 1.0 moved so fast partly because you could “View Source” on a webpage you liked and then modify and re-use it to make your own webpages. This even worked with pages with JavaScript programs—you could see how it worked, modify and re-use it. The Web jumped forward.

Then came Web 2.0, where the big thing was interaction with “APIs,” or application programming interfaces. This meant that the guts of a website were on the server and you only got to ask approved questions to get approved answers, or it would specially format a webpage for you with your answer on it. The plus side was that websites had more dynamic webpages, but learning from how others did things became harder.

Power to the People went to Power to the Server.

Can we get both?  I believe we can, and with a new Web built on top of the existing Web.  A “decentralized web” or a “movable web” has many privacy and archivability features, but another feature could be knowledge reuse.  In this way, the set of files that make up a website—text/HTML, programs, and data—are available to the user if they want to see them.

The decentralized Web works by having a p2p distribution of the files that make up the website, and then the website runs in your browser.  By being completely portable, the website has all the pieces it needs: text, programs, and data.  It can all be versioned, archived, and examined.

[Upcoming Summit on the Decentralized Web at the Internet Archive June 8th, 2016]

For instance, this demo has the pages of a blog in a peer-to-peer file system called IPFS, but also the search engine for the site, in JavaScript, that runs locally in the browser. The browser downloads the pages, the JavaScript, and the search-engine index from many places on the net and then displays them. The complete website, including its search engine and index, is therefore downloadable and inspectable.
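To make the idea of a search engine that travels with the site more concrete, here is a minimal sketch of a prebuilt index that could be shipped alongside the pages and queried locally. It is written in Python for brevity and is not the demo’s code (the demo’s search engine is in JavaScript). The page IDs, sample text, and function names are all invented for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Build an inverted index: word -> set of page IDs containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the IDs of pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical pages standing in for the blog posts of a portable site.
    pages = {
        "post-1": "View Source on the Web",
        "post-2": "The decentralized Web and IPFS",
    }
    index = build_index(pages)
    print(search(index, "decentralized web"))
```

Because the index is ordinary data and the search function is ordinary code, both can be downloaded, versioned, and inspected along with the pages themselves.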

This new Web could be a way to distribute datasets because the data would move with programs that could make use of it, thus helping document the dataset. This use of the decentralized Web became clear to me by talking with Karissa McKelvey and Max Ogden of the DAT Data project, which is working on distributing scientific datasets.

What if scientific papers evolved to become movable websites (or call them “distributed websites” or “decentralized websites”)?  That way, the text of the paper, the code, and the data would all move around together documenting itself.  It could be archived, shared, and examined.

Now that would be “View Source” we could all live with and learn from.


The Internet Archive, ALA, and SAA Brief Filed in TV News Fair Use Case

The Internet Archive, joined by the American Library Association, the Association of College and Research Libraries, the Association of Research Libraries, and the Society of American Archivists, filed an amicus brief in Fox v. TVEyes on March 23, 2016. In the brief, the Internet Archive and its partners urge the court to issue a decision that will support rather than hinder the development of comprehensive archives of television broadcasts.

The case involves a copyright dispute between Fox News and TVEyes, a service that records all content broadcast by more than 1,400 television and radio stations and transforms the content into a searchable database for its subscribers. Fox News sued TVEyes in 2013, alleging that the service violates its copyright. TVEyes asserted that its use of Fox News content is protected by fair use.

Drawing on the Internet Archive’s experience with its TV News Archive and Political TV Ad Archive, the friend-of-the-court brief highlights the public benefits that flow from archiving and making television content available for public access. “The TV News Archive allows the public to view previously aired broadcasts–as they actually went out over the air–to evaluate and understand statements made by public officials, members of the news media, advertising sponsors, and others, encouraging public discourse and political accountability,” said Roger Macdonald, Director of the TV Archive.

Moreover, creating digital databases of television content allows aggregated information about the broadcasts themselves to come to light, unlocking researchers’ ability to process, mine, and analyze media content as data. “Like library collections of books and newspapers, television archives like the TV News Archive and the Political TV Ad Archive allow anyone to thoughtfully assess content from these influential media, enhancing the work of journalists, scholars, teachers, librarians, civic organizations, and other engaged citizens,” said Tomasz Barczyk, a Berkeley Law student from the Samuelson Law, Technology & Public Policy Clinic who helped author the brief.

The brief also explains the importance of fostering a robust community of archiving organizations. Because television broadcasts are ephemeral, content is easily lost if efforts are not made to preserve it systematically.  In fact, a number of historically and culturally significant broadcasts have already been lost, from BBC news coverage of 9/11 to early episodes of Doctor Who. Archiving services prevent this disappearance by collecting, indexing, and preserving broadcast content for future public access.

A decision in this case against fair use would chill these services and could result in the loss of significant cultural resources. “This is an important case for the future of digital archives,” explained William Binkley, the other student attorney who worked on the brief. “If the court rules against TVEyes, there’s a real risk it could discourage efforts by non-profits to create searchable databases of television clips. That would deprive researchers and the general public of a tremendously valuable source of knowledge.”

The Internet Archive would like to thank Tomasz Barczyk, William Binkley, and Brianna Schofield from the Samuelson Law, Technology & Public Policy Clinic at Berkeley Law for helping to introduce an important library perspective as the Second Circuit court considers this case with important cultural implications.


Three takeaways after logging 1,032 political ads in the primaries

The Political TV Ad Archive launched on January 22, 2016, with the goal of archiving airings of political ads across 20 local broadcast markets in nine key primary states and embedding fact checks and source checks of those ads by our journalism partners. We’re now wrapping up this first phase of the project, and are preparing for the second, where we’ll fundraise so we can apply the same approach to political ads in key 2016 general election battleground states.

But first: here are some takeaways from our collection after logging 1,032 ads. Of those ads, we captured 263 airing at least 100 times apiece, for a total all together of more than 145,000 airings.

1. Only a small number of ads earned “Pants on Fire!” or “Four Pinocchio” fact checking ratings. Just four ads received the worst ratings possible from our fact-checking partners.

Donald Trump’s campaign won the only “Pants on Fire” rating awarded by fact checking partner PolitiFact for a campaign ad: “Trump’s television ad purports to show Mexicans swarming over ‘our southern border.’ However, the footage used to support this point actually shows African migrants streaming over a border fence between Morocco and the Spanish enclave of Melilla, more than 5,000 miles away,” wrote PolitiFact reporters C. Eugene Emery Jr. and Louis Jacobson in early January, when Trump released the ad, his very first paid ad of the campaign. The ad aired more than 1,800 times, most heavily in the early primary states of Iowa and New Hampshire.

Trump also won a “four Pinocchio” rating from the Washington Post’s Fact Checker for this ad, which accuses John Kasich of helping “Wall Street predator Lehman Brothers destroy the world economy.” “[I]t’s preposterous and simply not credible to say Kasich, as one managing director out of 700, in a firm of 25,000, ‘helped’ the firm ‘destroy the world economy,’” wrote reporter Michelle Ye Hee Lee.

Two other ads received the “four Pinocchio” rating from the Washington Post’s Fact Checker. This one, from Ted Cruz’s campaign, claims that Marco Rubio supported an immigration plan that would have given President Obama the authority to admit Syrian refugees, including ISIS terrorists. “[T]his statement is simply bizarre,” wrote Glenn Kessler. “With or without the Senate immigration bill, Obama had the authority to admit refugees, from any country, under the Refugee Act of 1980, as long as they are refugees and are admissible….What does ISIS have to do with it? Nothing. Terrorists are not admissible under the laws of the United States.”

This one, from Conservative Solutions PAC, the super PAC supporting Rubio, claims that there was only one “Republican hopeful” who had “actually done something” to dismantle the Affordable Care Act, by inserting a provision preventing protection for insurance companies from losses if they didn’t do accurate estimates on the premiums in the first three years of the law. “Rubio goes way too far in claiming credit here,” wrote Kessler. “He raised initial concerns about the risk-corridor provision, but the winning legislative strategy was executed by other lawmakers.”

Overall, our fact-checking and journalism partners (the Center for Responsive Politics, the Center for Public Integrity, FactCheck.org, PolitiFact, and the Washington Post’s Fact Checker) wrote 57 fact- and source-checks of 50 ads sponsored by presidential campaigns and outside groups. (The American Press Institute and Duke Reporters’ Lab, also partners, provided training and tools for journalists fact checking ads.)

Of the 25 fact checks done by PolitiFact, 60 percent of the ads earned “Half True,” “Mostly True,” and “True” ratings, with the remainder earning “Mostly False,” “False,” and “Pants on Fire” ratings. The Washington Post’s Fact Checker, the other fact-checking group that uses ratings, fact-checked 11 ads. Of these, seven earned ratings of three or four Pinocchios. A series of ads featuring former employees and students denouncing Trump University, from a “dark money” group that doesn’t disclose its donors, earned the coveted “Geppetto Checkmark” for accuracy. Those ads aired widely in Florida and Ohio leading up to the primaries there.

The ad that produced the most fact checks and source checks was this attack ad on John Kasich from the very same group, the American Future Fund. Robert Farley of FactCheck.org wrote, “An ad from a conservative group attacks Ohio Gov. John Kasich as an ‘Obama Republican,’ and misleadingly claims his budget ‘raised taxes by billions, hitting businesses hard and the middle class even harder.’” PolitiFact Ohio reporter Nadia Pflaum gave the ad a “False” rating; Michelle Ye Hee Lee of the Washington Post’s Fact Checker awarded it “Three Pinocchios.” The Center for Public Integrity described the American Future Fund as “a conservative nonprofit linked to the billionaire brothers Charles and David Koch that since 2010 has inundated federal and state races with tens of millions of dollars.”

This ad from Donald Trump’s campaign earned a “Pants on Fire” rating from PolitiFact.

2. Super Campaign Dodger, and other creative ways to experience and analyze political ads. Journalists did some serious digging into the downloadable metadata the Political TV Ad Archive provides here to analyze trends in presidential ad campaigns.

The Economist mashed up data about airings in Iowa and New Hampshire with polling data and asked the question: Does political advertising work? The answer: “a bit of MEH” (or “minimal-effects hypothesis”). In other words, voters are persuaded, but just the littlest bit.

Farai Chideya of FiveThirtyEight and Kate Stohr of Fusion delved into data on anti-Trump ads airing ahead of the Florida primary—which Trump went on to win handily, despite the onslaught.

Nick Niedzwiadek plumbed the collection when writing about political ad gaffes for The Wall Street Journal. Nadja Popovich of The Guardian graphed Bernie Sanders’s surge in ad airings in Nevada, ahead of the contest there.

William La Jeunesse of Fox News reported on negative ads here. Philip Bump of The Washington Post used gifs to illustrate just how painful it was to be a TV-watching voter in South Carolina in the lead up to the primary there.

And in what was the most interactive use of the project’s metadata, Andrew McGill, a senior associate editor for The Atlantic, created an old-style video game, where the viewer uses the space key on a computer keyboard to try to dodge all the ads that aired on Iowa airwaves ahead of the caucuses there. For links to other journalists’ uses of the Political TV Ad Archive, click here.

3. Candidates’ campaigns dominated; super PACs favored candidates who failed. In our collection, candidates’ official campaigns sponsored the most ad airings—63 percent. Super PACs accounted for another 27 percent, and nonprofit groups, often called “dark money” groups because they do not disclose their donors, accounted for nine percent of ad airings.

Bernie Sanders’ and Hillary Clinton’s campaigns had the most ad airings: 29,347 and 26,891, respectively. Of the GOP candidates, who faced a more divided competition, it was Marco Rubio’s campaign that had the most airings (11,798), and Donald Trump was second, with 9,590. However, in the Republican field, super PACs played a much bigger role, particularly those advocating for candidates who have since pulled out of the race. Conservative Solutions PAC, the super PAC that supported Marco Rubio in his candidacy, showed 12,851 airings; Right to Rise, which supported Jeb Bush, had 12,543.

This pair of issue ads, sponsored by the AARP (aka the American Association of Retired Persons), aired at least 9,653 times; the ads focus on Social Security and have been broadcast across the markets monitored by the Political TV Ad Archive.

The biggest non-news shows that featured political ads were “Jeopardy!,” “Live With Kelly and Michael,” and “Wheel of Fortune.” Fusion analyzed the most popular entertainment shows targeted by presidential candidates and mashed the results up with Nielsen data about viewership. For example, Bernie Sanders’ campaign favored “Jimmy Kimmel Live,” while Hillary Clinton’s campaign favored “The Ellen DeGeneres Show.”

 


The Political TV Ad Archive–which is a project of the Internet Archive’s TV News Archive–is now conducting a thorough review of this project, which was funded by a grant from the Knight News Challenge, an initiative of the John S. and James L. Knight Foundation. The Challenge is a joint effort of the Rita Allen Foundation, the Democracy Fund, and the Hewlett Foundation.

Stay tuned for news of the Political TV Ad Archive’s plans for covering future primaries in California, New York, Pennsylvania, and beyond, and of our fundraising for the second phase of this project: tracking ads in key battleground states in the general elections.

This post is cross posted at the Political TV Ad Archive.


Save our Safe Harbor: Submission to Copyright Office on the DMCA Safe Harbor for User Contributions

The United States Copyright Office is seeking feedback on how the “notice and takedown” system created by the Digital Millennium Copyright Act, also known as the “DMCA Safe Harbors,” is working. Congress decided that in this country, users of the Internet should be allowed to share their ideas with the world via Internet platforms. In order to facilitate this broad goal, Congress established a system that protects platforms from liability for the copyright infringement of their users, as long as the platforms remove material when a copyright holder complains. The DMCA also allows users to challenge improper takedowns.

We filed comments this week, explaining that the DMCA is generally working as Congress intended it to. These provisions allow platforms like the Internet Archive to provide services such as hosting and making available user-generated content without the risk of getting embroiled in lawsuit after lawsuit. We also offered some thoughts on ways the DMCA could work better for nonprofits and libraries, for example, by deterring copyright holders from using the notice and takedown process to silence legitimate commentary or criticism.

The DMCA Safe Harbors, while imperfect, have been essential to the growth of the Internet as an engine for innovation and free expression. We are happy to provide our perspective on this important issue to the Copyright Office.


Guess what we find in books? A look inside our Midwest Regional Digitization Center – by Jeff Sharpe

The history of a book isn’t captured merely by the background of the author or its publishing date or its written content. Most books were purchased and read by someone; they are from a specific time and place. That too is part of each book’s history. Sometimes in digitizing books we find pressed flowers or a single leaf or pieces of paper that were used as bookmarks then forgotten. We even found a desiccated chameleon in one book. When we find something like that at the Internet Archive’s Digitization Centers, we digitize the object because it is part of the history of that book. We see our mission as archiving each book exactly as it was found, so that when you flip through a book, you are seeing it as if you had the physical copy in your hands, not just black text on a white page.

Take for example this book from the Lincoln Financial Foundation Collection:  The Life and Speeches of Henry Clay. In the chapter on Clay’s speeches, you can see what Abraham Lincoln highlighted, points he thought worthy of noting.

[Image: Lincoln’s notations in the book]

In fact, by seeing what Lincoln underscored as he read this book and by reading his notes, you get a glimpse into what may have shaped his ideas; how he might have then used certain concepts to express his thoughts and policies about slavery and its abolition. The history of this book, which was held and read and annotated by Abraham Lincoln, had a direct effect on the history of this nation. A historic book that also has a history of its own.

We’ve digitized over 125,000 items here at the Midwest Regional Digitization Center at the Allen County Public Library in Fort Wayne, Indiana. In several books we digitized for the University of Pittsburgh’s Darlington Collection, we found some treasures. In one, we found a note written in 1803 by William Henry Harrison, then governor of the Indiana Territory. (Scroll down the pages to see the letters in situ.)

In another we found a promissory note by Aaron Burr from 1796 for a large sum of money. Burr was a controversial person to say the least. He was not only a Revolutionary War hero, Thomas Jefferson’s Vice President and a presidential candidate himself, but also the man who shot and killed Alexander Hamilton in a duel.

Once, someone at the University of Pittsburgh contacted me regarding an item a digital reader had made them aware of: a previously unknown, original survey report written by none other than Daniel Boone! He asked me if I knew anything about it. I verified that we had found and digitized it, along with the note by Aaron Burr and the letter by William Henry Harrison. I got a shocked reply, “Where??” Apparently digitizing not only opened up access to these books, it also rediscovered long-lost manuscripts stuck between the pages, penned by important figures in American history.

The history of these books turned out to contain the history of this country, highlighted in a very personal way. Whether it is someone pressing a violet between the pages, Abe Lincoln researching abolition, or a forgotten survey report by Daniel Boone, sometimes the material we digitize can bring our past alive.  What will you discover lodged between the pages in our three million digital books?

Take a tour of the Midwest Regional Digitization Center with Jeff Sharpe in this recent video.


Jeff Sharpe is Senior Digitization Manager for the Midwest Region.

Jeff’s work experience in administration and research led him to the Internet Archive’s digitization center in the Allen County Public Library in Fort Wayne, Indiana. He’s proud of his role in helping to bring well over a hundred thousand books online for universal access, including more than fifteen thousand items digitized by volunteers at the Midwest Center. Jeff is a voracious reader and loves books. He has a passion for history and archaeology, particularly that of the Mayan civilization, which has led him to travel extensively to Mayan ruins. He enjoys, among other things, bicycle riding, gardening, and hanging out with his wife, two kids, and their two dogs.

Posted in Announcements, News | 11 Comments

CASH BOX Music Magazine to Come Online

The Swem Library at the College of William & Mary in Virginia has received a grant from the Council on Library and Information Resources (CLIR) to digitize its entire run of Cash Box, a music trade magazine published from 1942 to 1996. Swem Library is partnering with the Internet Archive to scan all 190,000 pages of the 163-volume collection and create an online portal for reading and downloading the digital images.

“We are overjoyed to be able to unleash decades of music industry information to the public,” said Dean of University Libraries Carrie Cooper. “Swem Library has been gearing up for a greater emphasis on the digitization of unique and rare collections that are of interest to the public and scholars. We are grateful to have partners like CLIR to support our efforts to expose the hidden treasures of our library.”

The grant is part of CLIR’s Digitizing Hidden Special Collections and Archives awards program, a national competition that funds the digitization of rare and unique content held by libraries and institutions that would otherwise be unavailable to the public. The program is funded by the Andrew W. Mellon Foundation.

An alternative to Billboard Magazine, Cash Box included regional chart data; hit songs by city, radio station, and record sales; popularity by jukebox; and charts by genre including country and R&B. It also featured stories on artists, news of tours, insider gossip, album summaries and photographs found nowhere else. Later issues included sections relating to the music industry in Canada, Europe, Japan and Mexico.

“We are very excited to make this important and internationally significant resource for the study of music history and popular culture more widely accessible,” said Jay Gaidmore, director of the library’s Special Collections Research Center. “Since acquiring these issues in 2010, we have received more requests for copies and information from Cash Box than from any other individual collection held in Special Collections.”

Filling requests for copies of Cash Box materials has been difficult, Gaidmore said, due to the library’s lack of resources. Researchers who need immediate access to the collection typically must travel to Williamsburg. Making the collection available online will put this resource into the hands of researchers across the globe.

Philip Gentry, assistant professor of music history at the University of Delaware, is one of those researchers. As a scholar and teacher of American music in the post-war era, Gentry believes Cash Box provides a crucial alternative to Billboard, which primarily focused on mainstream music.

“[Cash Box’s] formula relied more heavily upon jukebox ‘plays,’ and thus are often a much more reliable window into trends of more subcultural markets such as African American-dominated rhythm and blues or white working-class country,” he said.

Gentry is currently working on a project documenting anti-communist blacklisting in popular music during the McCarthy era. He has found very little discussion on the topic in Billboard, but has seen hints that it was more openly discussed in Cash Box.

Not only is Gentry excited to see Cash Box digitized for his own scholarship, he sees impact on his teaching as well.

“Digitization makes possible a whole world of classroom assignments,” he said. “Unlike with older primary sources, very few institutions have undertaken the commitment to properly archive and make accessible collections of the recent past. And yet, teaching research skills and the tools of critical reading is no less important for students engaging with popular culture of the American twentieth century.”

The project will begin in February and is expected to be completed by December 2016. The collection will be made freely and publicly available through the Swem Library website and here at the Internet Archive.

This article was republished by permission of our partners at the Swem Library.  It first appeared in January 2016.  

Posted in News | 4 Comments

Saving 500 Apple II Programs from Oblivion

Among the tens of thousands of computer programs now emulated in the browser at the Internet Archive, a long-growing special collection has hit a milestone: the 4am Collection is now past 500 available Apple II programs preserved for the first time.

To understand this achievement, it’s best to explain what 4am (an anonymous person or persons) has described as their motivations: to track down Apple II programs, especially ones that have never been duplicated or widely distributed, and remove the copy protection that prevents them from being digitized. After this, the now playable floppy disk is uploaded to the Internet Archive along with extensive documentation about what was done to the original program to make it bootable. Finally, the Internet Archive’s play-in-a-browser emulator, called JSMESS (a Javascript port of the MAME/MESS emulator), allows users to click on the screenshot and begin experiencing the Apple II programs immediately, without requiring installation of emulators or the original software.

In fact, all the screenshots in this entry link to playable programs!

If you’re not familiar with the Apple II software library that has existed over the past few decades, a very common situation is that, for the most groundbreaking and famous programs produced for this early home computer, only the “cracked” versions persist. Off the shelf, the programs would include copy protection routines that went so far as to modify the performance of the floppy drive, or force the Apple II’s operating system to rewrite itself to behave in strange ways.

Because hackers (in the “hyper-talented computer programmers” sense) would take the time to walk through the acquired floppy disks and remove copy protection, those programs are still available to use and transfer, play and learn from.

One side effect, however, was that these hackers, young or proud of the work they’d done, would modify the graphics of the programs to announce the effort they’d put behind it, or remove/cleave away particularly troublesome or thorny routines that they couldn’t easily decode, meaning that modern access to these programs is limited to incomplete or modified versions. For examples of the many ways these “crack screens” might appear, I created an extensive gallery of them a number of years ago. (Note that there are both monochrome and color versions of the same screen, and these are just screen captures, not playable versions.) They would also focus almost exclusively on games, especially arcade games, meaning any programs that didn’t fall into the “arcade entertainment” section of the spectrum of Apple II programs were left by the wayside entirely.

With an agnostic approach to the disks being preserved, 4am has brought to light many programs that fall almost into the realm of lore and legend, only existing as advertisements in old computer magazines or in catalog listings of computer stores long past.

It gets better.

Easily missed if you’re not looking for it are the brilliant and humorous write-ups done by 4am to explain, completely, the process of removing the copy protection routines. The techniques used by software companies to prevent an Apple II floppy drive from making a duplicate while also allowing the program to boot itself were extensive, challenging, and intense. Some examples of these write-ups include this one for “Cause and Effect”, a 1988 education program, as well as this excellent one for “The Quarter Mile”, another educational program. (To find the write-up for a given 4am item in the collection click on the “TEXT” link on the right side of the item’s web page.)

These extensive write-ups shine a light on one of the core situations about these restored computer programs.

As 4am has wryly said over the years, “Copy Protection Works!” – if the copy protection of a floppy disk-based Apple II program was strong and the program did not have the attention of obsessed fans or fall into the hands of collectors, its disappearance and loss was almost guaranteed. Because many educational and productivity software programs were specialized and not as intensely pursued/wanted as “games” in all their forms, those less-popular genres suffer from huge gaps in recovered history. Sold in small numbers, these floppy disks are subject to bit rot, neglect, and being tossed out with the inevitable turning of the wheels of time.

This collection upends that situation: by focusing on acquiring as many different unduplicated Apple II programs as possible, 4am are using their skills to ensure an extended life and documented reference materials for what would otherwise disappear.

Classifying Animals with Backbones title screen

Already, the collection has garnered some attention – the “Classifying Animals With Backbones” educational program linked above has a guest review from one of the creators describing the process of the application coming to life. And the write-up of a particularly thorny copy protection scheme on a 1982 game of Burger Time went viral (in a good way) and was read 25,000 times when it was uploaded to the Archive.

In a few cases, the amount of effort behind the copy protection schemes and the concerned engineering involved in removing the copy protection are epics in themselves.

Speed Reader II 091286 screen 3 - main menu

As an example, this educational program Speed Reader II contains extensive copy protection routines, using tricks and traps to resist any attempts to understand its inner workings and misleading any potential parties who are duplicating it. 4am do their best to walk the user through what’s going on, and even if you might not understand the exact code and engineering involved, it leaves the reader smarter for having browsed through it.

This project has been underway for years and is now at the 500 newly-preserved program mark – that’s 500 different obscure programs preserved for the first time, which you can play and experience on the archive.

Get cracking!

Algernon title screen

(The usual notes: The “Play in Browser” technology used at the Internet Archive is still relatively new, and works best on modern machines running the newest versions of browsers, especially Firefox, Chrome and Brave. Javascript (not Java) needs to be enabled on the machine for this to work. (By default on all browsers, it is.) The manuals for many of the programs are not directly available, so some experimentation is required, although educational programs were often designed to be understood by their audiences without any manuals. Thanks to 4am for housing their collection at the Internet Archive and the many individuals on the MAME and JSMESS teams who have made this emulation possible.)

Posted in Emulation, Software Archive | 12 Comments

Distributed Preservation Made Simple

Library partners of the Internet Archive now have at their fingertips an easy way – from a Unix-like command line in a terminal window – to download digital collections for local preservation and access.

This post will show how to use an Internet Archive command-line tool (ia) to download all items in a collection stored on Archive.org, and how to keep your local copy in sync with the Archive.org collection.

To use ia, the only requirement is to have Python 2 installed on a Unix-like operating system (e.g. Linux or Mac OS X). Python 2 is pre-installed on Mac OS X and most Linux systems, so there is nothing more that needs to be done except to open up a terminal and follow these steps:

1.  Download the latest binary of the ia command-line tool by running the following command in your terminal:

curl -LO https://archive.org/download/ia-pex/ia

2. Make the binary executable:

chmod +x ia

3. Make sure you have the latest version of the binary, version 1.0.0:

./ia --version

4. Configure ia with your Archive.org credentials (this step is only needed if you need privileges to access the items):

./ia configure

5. Download a collection:

./ia download --search 'collection:solarsystemcollection'

or

./ia download --search 'collection:JangoMonkey'

The above command to “Download a collection” will, for example, download all files from all items in the NASA Solar System collection or from the band JangoMonkey. If re-run, it will by default skip over any files already downloaded, as rsync does, which can help keep your local collection in sync with the collection on Archive.org.
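The skip-on-rerun behavior can be pictured with a small Python sketch. This is not how ia is implemented; files_to_fetch and the file names are hypothetical, and the only check made here is whether a file of the same name already exists locally (a real tool may also compare sizes or checksums).

```python
import os
import tempfile


def files_to_fetch(remote_names, destdir):
    """Return the remote file names that are not yet present in destdir.

    Hypothetical helper: a file counts as "already downloaded" if a file
    with the same relative name exists locally.
    """
    return [
        name for name in remote_names
        if not os.path.exists(os.path.join(destdir, name))
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as destdir:
        # Pretend an earlier run already saved one of the two files.
        open(os.path.join(destdir, "a.jpg"), "w").close()
        print(files_to_fetch(["a.jpg", "b.jpg"], destdir))
```

Running the same check again after every download is what makes a repeated run cheap: only the names still missing are fetched.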

If you would like to download only certain file types, you can use the --glob option. For example, if you only wanted to download JPEG files, you could use a command like:

./ia download --search 'collection:solarsystemcollection' --glob '*.jpeg|*.jpg'
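As a rough illustration of how a pipe-separated pattern such as '*.jpeg|*.jpg' selects files, here is a Python sketch using the standard fnmatch module. Splitting the pattern on "|" and the helper name matches_glob are assumptions made for illustration, not a description of ia's internals; the file names are invented.

```python
from fnmatch import fnmatch


def matches_glob(filename, glob_pattern):
    """True if filename matches any of the '|'-separated shell patterns."""
    return any(fnmatch(filename, p) for p in glob_pattern.split("|"))


# Invented file names standing in for the files of one item.
names = ["saturn.jpg", "saturn.jpeg", "saturn.tif", "saturn_meta.xml"]
selected = [n for n in names if matches_glob(n, "*.jpeg|*.jpg")]
print(selected)
```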

Note that by default ia will download files into your current working directory. If you launch a terminal window without moving to a new directory, the files will be downloaded to your user directory. To download to a different directory, you can either cd into that directory or use the --destdir parameter like so:

mkdir solarsystemcollection-collection

./ia download --search 'collection:solarsystemcollection' --destdir solarsystemcollection-collection

Downloading in Parallel

GNU Parallel is a powerful command-line tool for executing jobs in parallel. When used with ia, downloading items in parallel is as easy as:

./ia search 'collection:solarsystemcollection' --itemlist | parallel --no-notice -j4 './ia download {} --glob="*.jpg|*.jpeg"'

The -j option controls how many jobs run in parallel (i.e. how many items are downloaded at a time). Depending on the machine you are running the command on, you might get better performance by increasing or decreasing the number of simultaneous jobs. By default, GNU Parallel will run one job per CPU.
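The same bounded-parallelism idea can be sketched in Python with the standard concurrent.futures module. The download_item function below is a hypothetical stand-in that only returns a label and does no network I/O; in practice it would run whatever download step you use. The jobs argument plays the role of -j4.

```python
from concurrent.futures import ThreadPoolExecutor


def download_item(identifier):
    """Hypothetical stand-in for downloading one item; no network I/O."""
    return "downloaded " + identifier


def download_all(identifiers, jobs=4):
    """Run download_item over identifiers with at most `jobs` at a time."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() returns results in the same order as the input.
        return list(pool.map(download_item, identifiers))


results = download_all(["item-a", "item-b", "item-c"], jobs=4)
print(results)
```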

GNU Parallel can be installed with Homebrew on Mac OS X (brew install parallel), or your favorite package manager on Linux (e.g. on Ubuntu: apt-get install parallel, on Arch Linux: pacman -S parallel, etc.). For more details, please refer to: https://www.gnu.org/software/parallel/

For more options and details, use the following command:

./ia download --help

Finally, to see what else the ia command-line tool can do:

./ia --help

Documentation of the ia command-line tool is available at: https://internetarchive.readthedocs.org/en/latest/cli.html

There you have it. Library partners, download and store your collections now using this command-line tool from the Internet Archive. If you have any questions or issues, please write to info (at) archive.org. We are trying to make distributed preservation simple and easy!

 

Posted in News, Technical | 6 Comments

Next Librarian of Congress: Carla Hayden

Carla Hayden

The President has nominated Carla Hayden to be the next Librarian of Congress. I have met her through IMLS and support her for this position.

As a public librarian, she can bring an access and public service orientation to a position that has traditionally been focused on Congress’ needs and collecting valuable materials.

The Library of Congress is both a powerful symbol and a fabulous organization. Its collections are unbelievable: there are employees in Cairo and Delhi collecting the best that humanity has produced. The Library has high collecting standards and has resisted attempts to put restrictions on access.

For instance, the Library of Congress has actively pursued web archiving since 2000 and made these collections more available than almost any other institution. As the home of the US Copyright Office, the Library can keep the constitutional balance in mind as copyright laws evolve.

All of these features of the Library play into the strengths of Carla Hayden, who can help shape a potent institution for our new century.

-brewster

 

Posted in Announcements | 1 Comment

Fair Use & Access to All Human Knowledge

This is Fair Use Week, an annual recognition of the most important user right in U.S. copyright law. Today we celebrate fair use and fair dealing along with a host of other participating groups and organizations.

The fundamental goal of fair use aligns with the Internet Archive’s mission of providing universal access to all human knowledge. Fair use is often called the “safety valve” of copyright law, built in to ensure that the protection granted to authors doesn’t stifle the very creativity and innovation it was designed to promote. Libraries serve as guardians of the public’s access to information and facilitate education, research, scholarship, creativity, and discovery—activities essential to the functioning of our democratic society. Fair use plays a similar role in the legal world, allowing access and reuse of materials in order to criticize or comment on them, for educational purposes, or in ways that alter the original with a new message or meaning.

Over the years, the flexible nature of fair use has supported the creation and use of new technologies, like the VCR for home recording of television programs, or search engines for the web. It has also helped libraries to adapt to new technologies and bring traditional library functions into the digital age, for example, by allowing libraries to digitize books in their collections for the purposes of building search tools and providing access to the blind and print disabled. Fair use allows artists and musicians to reuse materials to comment on society and the world around them, bloggers to use photos of the people and organizations they are criticizing, and citizens to use videos to comment on the effectiveness of their elected officials. Fair use also allows regular people to engage with our culture, from debating the color of a dress to making creative mashups of existing works.

People across the web have engaged in the creative remixing of materials hosted here at the Archive. For example, we have a collection dedicated to mashups created from the Prelinger film archives. Take a look at one of our favorites: https://archive.org/details/bonobocirrus

Want to make your own mashup from our collection, but not sure how fair use works exactly? Check out this guide to best practices in fair use for online video, which provides some helpful guidelines for applying fair use. Fair Use Week is the perfect time to learn about and exercise your own fair use rights.

Posted in Announcements, News | 3 Comments

Internet Archive’s Youngest Volunteer– by b. George

At two-and-a-half months, Zinnia Dupler takes the cake as the youngest volunteer to give us a hand here at the Internet Archive. Strapped to her mom, Lindsey, the duo is hard at work out here in our Richmond warehouse, as we sort about 100,000 LPs. Ten minutes after taking this photo, I encountered a little musical gem on the other side of the warehouse – but we hid it from her crying eyes.

It was in a pile of records being boxed by slightly older interns working on the 48,000 seventy-eights we got from the Batavia Public Library in Illinois, part of the Barrie H. Thorp Collection.

Now this is the first time we’ve had a chance to have a look at this great collection, and so far, it’s quite a surprise. At least the first pallet has been box-after-box of hillbilly, country, and western swing records. Now I used to think I knew a bit about music. But after this, it’s back to school for me. Just so many artists I’ve never heard of or held a record by. You know, like the Burpin’ Baby warbler, Cactus Pryor and his Pricklypears!

In the ‘G’s alone there’s Curly Gribbs, Lonnie Glosson and the Georgians. Geeez! Did you know that Hank Snow had a recordin’ kid, Jimmy, and he cut “Rocky Mountain Boogie” on 4 Star Records, or that Cass Daley, star of stage and screen, was the “Queen of Musical Mayhem?” Me neither. The Davis Sisters, turns out, included a young Skeeter! There was also a Black Gospel group named the Davis Sisters, also from the 40s, and we got some of those seventy-eights also. Then there’s them Koen Kobblers, Bill Mooney and his Cactus Twisters, and Ozie Waters and the Colorado Hillbillies. No matter that they should be named the Colorado Mountaineers–they’re new to me.

B. George is the Music Curator for the Internet Archive. He is also the co-founder and Director of the ARChive of Contemporary Music in NYC. ARC is a partner of the Internet Archive, where B. George and his staff help to curate the physical and digital music collections.

 

Posted in News | 2 Comments

Internet Archive Does Windows: Hundreds of Windows 3.1 Programs Join the Collection

Microsoft Windows was, to some people, too little, too late.

When Windows was released as Version 1.0 in 1985, the graphic revolution was already happening elsewhere, with other computer operating systems – but Microsoft was determined to catch up, no matter what it cost or took. Version 1.0 of their new multi-tasking navigation program (it was not quite an “Operating System”) appeared and immediately got marks for being a step in the right direction, but not quite a leap. Later versions, including versions 2.0 and 2.1, finished out the late 1980s with a set of graphics-oriented programs that could be run from DOS and allow the use of a mouse/keyboard combination (still new at the time) and a chance for Microsoft to be one of the dominant players in graphical interfaces. It also got them a lawsuit from Apple, which ultimately resulted in a many-years court case and a settlement in 1997 that possibly saved Apple.

Meanwhile, the Windows shell started to become more and more like an operating system, and the introduction of Windows 3.0 and 3.1 brought stability, flexibility, and ease-of-programming to a very wide audience, and cemented the still-dominant desktop paradigms in use today.

In 2015, the Internet Archive started the year with the arrival of the DOS Collection, where thousands of games, applications and utilities for DOS became playable in the browser with a single click. The result has been many hundreds of thousands of visitors to the programs, and many hours of research and entertainment.

This year, it’s time to upgrade to Windows.

We’ve now added over 1,000 programs that run, in your browser, in a Windows 3.1 environment. This includes many games, lots of utilities and business software, and what would best be called “Apps” of the 1990s – programs that did something simple, like provide a calculator or a looping animation, that could be done by an individual or small company to great success.

Indeed, the colorful and unique look of Windows 3/3.1 is a 16-bit window into what programs used to be like, and depending on the graphical whims of the programmers, could look futuristic or incredibly basic. For many who might remember working in that environment, the view of the screenshots of some of the hosted programs will bring back long-forgotten memories. And clicking on these screenshots will make them come alive in your browser.

When they focused on it, a developer could produce something truly unique and beautiful within the Windows 3.x environment. Observe this Role-Playing Game “Merlin”:

screenshot_01

But on the whole, the simple libraries for generating clickable boxes and rendering fonts, and an intent to “get the job done” meant that a lot of the programs would look like this instead:

payoff

(Then again, how complicated and arty does a program to calculate amortization amounts have to be?)

Windows 3.1 continues to be in use in a few corners of the world – those easily-written buttons-and-boxes programs drive companies, restaurants, and individual businesses with a dogged determination and extremely low hardware requirements (a recent news story revealed at least one French airport that depended on one).

Many people, though, moved on to Microsoft’s later operating systems, like Windows 95, ME, Vista, 7, and so on. Microsoft itself stopped officially supporting Windows 3.1 in 2001, 15 years ago.

But Windows 3.1 still holds a special place in computer history, and we’re pleased to give you a bridge back to this lost trove of software.

If you need a place to start without being overwhelmed, come visit the Windows Showcase, where we have curated a sample set of particularly interesting software programs from 20 years ago.

As is often the case with projects like this, volunteers contributed significant time to help bring this new library of software online. Justin Kerk did the critical scripting and engineering work to require only 2 megabytes to run the programs, as well as ensure that the maximum number of Windows 3.1 applications work in the browser-based emulator. (Justin thanks Eric Phelps, who in 1994 wrote the SETINI.EXE configuration program). db48x did loader programming to ensure we could save lots of space. James Baicoianu did critical metadata and technical support. As always, the emulation for Windows and DOS-based programs comes via EM-DOSBOX, which is a project by Boris Gjenero to port DOSBOX into Javascript; his optimization work has been world-class. And, of course, a huge thanks to the many contributing parties of the original DOSBOX project.

Posted in Emulation, Software Archive | 100 Comments

How Will We Explore Books in the 21st Century?

I love working with the Internet Archive’s collections, especially the growing book collection. As an engineer and sometimes scholar, I know there’s a lot of human knowledge inside books that’s difficult to discover. What new things could we do to help our users discover knowledge in books?

Today, most people access books through card catalog search and full-text search — both essentially 20th century technologies. If you ask for something broad or ambiguous, because you don’t know what you’re looking for yet, any attempt to present a short list of the most relevant results is likely to be overly narrow, not inspiring discovery or serendipity.

For the past few months, I’ve been experimenting with a new way to visualize book contents. This experiment starts with one simple idea: Most sentences contain related things. If I see a concept and a year together in a sentence, the odds are that the two are related. Consider this sentence:

A new, Gregorian Calendar, was introduced by Pope Gregory XIII in 1582.

I’ll explain in a minute how I figured out that Gregorian Calendar and Pope Gregory XIII are things, and that 1582 is a year. Given that, what can we learn from the sentence? We can guess that these things and the year are probably associated with each other. This guess is sometimes wrong, but let’s try adding together data from around a hundred thousand books and see what happens:

GregCal-Example

Three years have a relatively large number of sentences containing “Gregorian calendar” and that year. Are these important dates in the history of the Gregorian Calendar? Yes: in 1582, Pope Gregory XIII had Catholic countries adopt this new calendar, replacing the Julian calendar. In 1752, England adopted it, and in 1918, after the Russian Revolution, Bolshevik Russia adopted it.
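The counting idea behind this chart can be sketched in a few lines of Python. This is a simplified illustration rather than the code used in the experiment: apart from the first sentence, the sample sentences are made up, the year pattern is a plain four-digit match, and count_years is a hypothetical helper.

```python
import re
from collections import Counter

# Naive pattern: any standalone 4-digit number from 1000 to 2099.
YEAR = re.compile(r"\b(1\d{3}|20\d{2})\b")


def count_years(sentences, thing):
    """Count how many sentences mention `thing` together with each year."""
    counts = Counter()
    needle = thing.lower()
    for sentence in sentences:
        if needle in sentence.lower():
            # set() so a year repeated within one sentence counts once.
            counts.update(set(YEAR.findall(sentence)))
    return counts


sentences = [
    "A new, Gregorian Calendar, was introduced by Pope Gregory XIII in 1582.",
    "England adopted the Gregorian calendar in 1752.",
    "The Gregorian calendar dates from 1582.",
    "The Julian calendar was introduced in 46 BC.",
]
counts = count_years(sentences, "Gregorian calendar")
print(counts)
```

Summed over many books, years that co-occur with a thing unusually often stand out as peaks, which is what the chart displays.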

 

Let’s take a look at some of the actual book sentences from the most popular year, 1582:

1582

The Cambridge handbook of physics formulas

The routine is designed around FORTRAN or C integer arithmetic and is valid for dates from the onset of the Gregorian calendar, 15 October 1582.

The Cambridge history of English literature, 1660-1780

In 1582 Pope Gregory XIII (hence the name Gregorian Calendar) ordered ten days to be dropped from October to make up for the errors that had crept into the so-called Julian Calendar instituted by Julius Caesar, which made the year too long and added a day every one hundred and twenty-eight years.

Chinese history : a manual

They give year, month, and day in cyclical characters and their equivalent in the Western calendar (using the modern Gregorian calendar even for pre-1582 dates).

The crest of the peacock : non-European roots of mathematics

Clavius was a member of the commission that ultimately reformed the Gregorian calendar in 1582.

You can give the experiment a try at https://books.archivelab.org/dateviz/.

Now that you’ve seen what the experiment looks like, let’s look at some of the details of building this visualization. (The code can be found on GitHub at https://github.com/wumpus/visigoth/.)

We need a way to find dates in sentences. Sometimes it’s obvious that something is a date: “January 31, 2016” or “Jan 2016.” Other times it’s more ambiguous: a 4 digit number might be a year, or it might be a section of a US law (“15 U.S.C. § 1692”), or a page number in a book. What I ended up doing was creating a series of patterns (see https://github.com/wumpus/visigoth/blob/master/visigoth/dateparse.py) that look for English helper words (“In 2016”, “before 1812”) before guessing that a 4-digit number is a date. While this technique has both false positives and false negatives, it works well enough not to hurt the visualization significantly.
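A minimal Python sketch of the helper-word idea follows. The word list and the regular expression are illustrative assumptions and are much cruder than the patterns in the linked dateparse.py; find_years is a hypothetical name.

```python
import re

# Accept a 4-digit number as a year only when an English helper word
# comes directly before it. The word list is an assumption.
HELPER_YEAR = re.compile(
    r"\b(?:in|before|after|since|until|by)\s+(\d{4})\b",
    re.IGNORECASE,
)


def find_years(sentence):
    """Return the 4-digit years introduced by a helper word."""
    return [int(y) for y in HELPER_YEAR.findall(sentence)]


print(find_years("In 2016 we scanned books printed before 1812."))
print(find_years("See 15 U.S.C. § 1692 for details."))
```

The second call finds nothing because the statute number has no helper word in front of it. A phrase like "by 1692 votes" would still be a false positive, which is the kind of error the text accepts.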

The next item is generating the list of things (people, places, concepts, etc.) in a sentence. There are many techniques for doing this, ranging from computationally-expensive machine-learning libraries like the Stanford NER library, to using human-generated lists such as the US Library of Congress Name Authority Files. There’s also the complication of disambiguating things like “John Smith.” (Which “John Smith” of the hundreds do we mean?) To match the simple nature of the other algorithms in this experiment, I decided to use a very simple dataset: English Wikipedia article titles. Not only is this a comprehensive collection of encyclopedic things, but there are numerous human-generated “redirects,” which provide a list of synonyms for most article titles. For example, “Western calendar” is a redirect to “Gregorian Calendar,” and in fact numerous books do use the term “Western calendar” to refer to the Gregorian calendar.
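The redirect lookup can be pictured as a dictionary from surface phrases to canonical article titles. The tiny table below is hand-written for illustration (only the “Western calendar” redirect is taken from the text above), and find_things is a hypothetical helper; a real run would load titles and redirects from a Wikipedia data dump.

```python
# Map lowercase phrases to canonical titles. Entries other than
# "western calendar" are assumptions added for the example.
REDIRECTS = {
    "gregorian calendar": "Gregorian Calendar",
    "western calendar": "Gregorian Calendar",
    "pope gregory xiii": "Pope Gregory XIII",
}


def find_things(sentence):
    """Return the sorted canonical titles whose phrases occur in sentence."""
    text = sentence.lower()
    return sorted({title for phrase, title in REDIRECTS.items() if phrase in text})


print(find_things("Dates are given in the Western calendar."))
```

Plain substring matching like this does nothing about the “John Smith” ambiguity mentioned above; it only folds synonyms onto one title.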

Our next task is ranking. Two aspects of this visualization use ranks. First, the suggestions that come up while users are typing in the “thing” box are ordered by Wikipedia article popularity. Eventually we’ll have enough usage of this visualization that we can use our own users’ data to put suggestions in a better order. Until then, using Wikipedia popularity is a good way to make suggestions more relevant.

A ranking of the books themselves is useful in two ways. First, it’s used to pick which example sentences are shown for a given pair of thing/date. Second, given that I only had enough computational resources to process a fraction of the scanned books in the Internet Archive’s collection, I chose 82,000 books using the same ranking scheme. This ranking scheme doesn’t have to be that good in order to deliver a lot of benefit, so I chose a superficial approach of awarding points to academic book publishing houses, book references in Wikipedia articles, and book popularity data from Better World Books, which is a used bookseller & a partner of the Internet Archive.
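A point-based ranking of this kind might look like the following Python sketch. The field names, the weights, and the sample books are all invented for illustration; the text does not say how many points each signal was worth.

```python
def book_score(book):
    """Sum points from three signals; the weights are illustrative guesses."""
    score = 0
    if book.get("academic_publisher"):
        score += 5  # bonus for an academic publishing house
    score += 2 * book.get("wikipedia_citations", 0)  # references in articles
    score += book.get("popularity", 0)  # bookseller popularity signal
    return score


# Invented sample records.
books = [
    {"title": "A", "academic_publisher": True, "wikipedia_citations": 1},
    {"title": "B", "popularity": 3},
    {"title": "C", "academic_publisher": True, "wikipedia_citations": 4, "popularity": 2},
]
ranked = sorted(books, key=book_score, reverse=True)
print([b["title"] for b in ranked])
```

Taking the top of such a list gives both the subset of books to process and the order in which example sentences are shown.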

What’s the result of the experiment? A relatively simple set of algorithms applied to a small collection of high-quality books seems to be both interesting and fun for users. As a next step, I would like to extend it to include a better list of “things” and to extract data from many more books. In a few years, we might have access to 100 times as many scanned books. By then, I hope to find several other new ways to explore book content.

Posted in Announcements, Books Archive, News

(Educational) Film of the Week: A Shooting Gallery Called America (NBC, 1975)

Because of their role as pedagogical tools directed at students and the general public, educational films have often been the subject of controversy, especially when they tackle fraught social issues from a particular point of view. While it might seem like the debate on gun control, mass shootings and police violence has only recently mushroomed to extraordinary proportions — at least as far as its coverage in the print, broadcast and electronic press is concerned — the issue has a much longer history, including in documentaries and non-theatrical films.

One such film, which originated as a TV documentary special on NBC but whose inclusion in the Internet Archive’s educational films collection indicates its distribution in the K-12 and college film circuit, bears the rather poignant title A Shooting Gallery Called America (1975).

The early 1970s were a period fraught with debate about gun control, especially after the assassinations of Martin Luther King, Jr. and Robert Kennedy.

Interestingly enough, a pamphlet issued the previous year by the National Coalition to Ban Handguns had the exact same title, providing evidence of a coordinated campaign for gun control that deployed statistics, testimonies and visual materials calculated to have an emotional impact.

The program caused as polarized a response in 1975 as one would expect a similar broadcast to cause today. NBC received thousands of letters from supporters of both sides of the debate (starting before the program had even been broadcast!) with arguments that have remained almost constant to the present day.

Said one: “We can give you our opinion of your Sunday, March 2nd special ‘Shooting Gallery Called America.’ It stank.

“We found it nothing more than a rehash of the same tired old theme: blame the instrument, not the criminal.”

Another read: “I would like to commend NBC for its coverage of the gun problems in this country. The special, A Shooting Gallery Called America, was very informative. I would like to see it again.”

Producer Lucy Jarvis, who would go on to direct many similar documentaries on social causes, later recalled the storm of controversy unleashed by this special:

“People knew we were doing it, and we began to get lots of mail,” she said. “Probably they were alerted by a national organization. Because there was such an emotional reaction, I didn’t want the program to go until I was doubly sure that everything was checked out.”

As a result, the airing date was pushed back on two occasions.

The statistics presented by the journalists (number of handguns and rifles, number of victims in shooting crimes and accidents) have only gotten worse with the passage of four decades. But the visual vocabulary established by documentaries like this one, from footage of shooting ranges to interviews at gun shows on the one hand and with families of victims of gun violence on the other, will be more than familiar to viewers of cable and network news in 2016.

As a recent article revisiting the program and its reception forty years ago put it in a rather rhetorical fashion: “Why has nothing changed in 40 years?”

Posted in Movie Archive, News