We Can Rebuild It: Using the Internet Archive to Discover Original Order

Guest post by: Amanda Hill, Archivist of the Community Archives of Belleville and Hastings County, a member of the Community Webs program and a contributor to the Internet Archive.

One of the things archivists get excited about is the importance of ‘original order’. This is the idea that the arrangement of records by their creator has significance to our understanding of the records themselves. Wherever possible, archivists will try to determine the original order of materials in their care.

An item received at the Community Archives of Belleville and Hastings County in 2015 presented something of a puzzle in this respect. It was a scrapbook from the First World War, of newspaper clippings and other memorabilia which had been pasted into a printed book. The binding of the book had partially come apart and the early pages of the scrapbook had been jumbled into no particular order, with clippings dated 1917 mixed in with those from 1916.

Examination of the scrapbook revealed that its owner was Alice Deacon, born in Belleville, Ontario, on September 27th, 1899. She was the child of Daniel Deacon and his wife, Catherine Dugan. During the First World War, the Deacons were living at 107 Station Street, Belleville. They were Roman Catholics and Alice was probably a student at St. Michael’s Academy on Church Street.

Alice had three older brothers: James, Frederick and Francis (Frank). Frank joined the Canadian Expeditionary Force on March 23rd, 1916 in Belleville and it may have been this event which triggered Alice’s interest in the war. Frank’s service record is available from Library and Archives Canada.

The scrapbook mainly comprises wartime cuttings from The Daily Intelligencer newspaper. In it, Alice carefully recorded references to Belleville boys overseas, sometimes annotating the clippings with her own observations about whether a man had returned from the front, or which school he had attended.

Alongside the newspaper extracts are other more personal items, such as postcards, theatre programs, calling cards, invitations and ticket stubs. This page illustrates some of the variety:

Here we find an invitation, two pressed flowers “from ruins of a French village, May 1917” and a picture “off a box of chocolates Jim gave me for my birthday, 1916.”

Alice did not begin with blank pages: she used a copy of Richardson’s New Method for the Piano-Forte, originally published in 1859 by Nathan Richardson. In between Alice’s pastings, we can see parts of the text of the underlying book. Some pages still had visible page numbers; most did not, but the majority had at least some legible words and phrases. This was the key to re-creating Alice’s original order.

We discovered that the Richardson book had been digitized by the University of North Carolina at Chapel Hill and was available online through the Internet Archive.

This digital copy proved essential in discovering the original order of the scrapbook. Using the Internet Archive’s searching facility, we were able to locate the identifiable words and match them to the page numbers of the original book. Once all the pages were identified, it was a simple matter to put them back into the order they would have been in when the book was intact.
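The matching step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the process we actually followed (we used the Internet Archive’s own search box). The page texts, leaf labels and function names below are invented for the example and are not taken from the Richardson book.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and reduce anything that is not a letter or digit to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_page(fragments, pages):
    """Return the page number whose text contains the most fragments, or None if nothing matches."""
    needles = [n for n in (normalize(f) for f in fragments) if n]
    best_page, best_hits = None, 0
    for number, text in pages.items():
        haystack = f" {normalize(text)} "
        hits = sum(1 for n in needles if f" {n} " in haystack)
        if hits > best_hits:
            best_page, best_hits = number, hits
    return best_page


def restore_order(leaves, pages):
    """Sort loose leaves by their matched page number; unmatched leaves go last."""
    matched = {label: find_page(frags, pages) for label, frags in leaves.items()}
    return sorted(matched, key=lambda label: (matched[label] is None, matched[label] or 0))


# Hypothetical sample data: page number -> text of the digitized book.
pages = {
    12: "Exercise for the right hand. Play slowly at first.",
    13: "The scale of C major, ascending and descending.",
    40: "Amusement: a simple waltz for both hands.",
}

# Hypothetical loose leaves, each with the words still legible between the clippings.
leaves = {
    "leaf A": ["simple waltz"],
    "leaf B": ["right hand", "play slowly"],
    "leaf C": ["scale of C major"],
    "leaf D": ["illegible smudge"],
}

print(restore_order(leaves, pages))
# ['leaf B', 'leaf C', 'leaf A', 'leaf D']
```

Leaves with no legible match (like "leaf D" here) are placed at the end, where they can be positioned by hand using other clues such as the dates of the clippings.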

Alice’s brother Frank came home safely from the war and was demobilized on May 23rd, 1919. Alice worked as a stenographer and bookkeeper in Belleville until 1929, when she married Leo Houlihan in St. Michael’s Church. She then left Belleville to live with Leo in Lindsay, Ontario. She died in 1955 and was buried in the Our Lady of Mercy Roman Catholic cemetery in Sarnia, Ontario.

Her scrapbook arrived back in Belleville by mail, sixty years after Alice’s death. Thanks are due to the anonymous donor for sharing this glimpse into a young woman’s wartime life, and also to our colleagues at the University of North Carolina and the Internet Archive for making it possible to reconstruct the scrapbook as it was when Alice first created it.

We have mainly been sharing our newspaper collection at the Internet Archive, but once Alice’s scrapbook was digitized, we felt compelled to share it there, too!

A New Approach To Understanding War Through Television News: Introducing The TV News Visual Explorer & The Belarusian, Russian & Ukrainian TV News Archive

For more than 20 years, the Internet Archive’s Television News Archive has monitored television news, preserving more than 9.5 million broadcasts totaling more than 6.6 million hours from across the world, with a continuous archive spanning the past decade. Today just a small sliver of that archive is accessible to journalists and scholars, because video at this scale is so hard to navigate: no human could fast forward through that much television news and make sense of it. The small fraction of programs that contain closed captioning, speech recognition transcripts or OCR’d onscreen text can be keyword searched through the TV Explorer and TV AI Explorer, but for the majority of this global multi-decade archive, there has until now been no way for researchers to assess and understand the narratives of television news at scale, especially the visual landscape that distinguishes television from other forms of media and which is so central to understanding many of the world’s biggest stories from war to pandemics to the economy.

As the TV News Archive enters its third decade, it is increasingly exploring the ways in which it can preserve the domestic and international response to global events as it did with 9/11 two decades ago. As a first step towards this vision, over the last few months the Archive has preserved more than 46,000 broadcasts from domestic Belarusian, Russian and Ukrainian television news channels, including (in the order they were added to the Archive) Russia Today (part of the Archive since July 2010 but included in this collection starting January 1), Russian channels 1TV, NTV and Russia 1 (from March 26) and Russia 24 (from April 25), Ukrainian channel Espreso (from April 25) and Belarusian channel Belarus 24 (from May 16).

Why preserve television news coverage in a time of war? For journalists today it makes it possible to digest and report on how the war is being framed and narrated, with an eye towards how these narratives influence and shape popular support for the conflict and its potential future trajectory. For future generations of scholars, it makes it possible to look back at the contemporary information environment and prevailing public information, perspectives, and narratives.

While there are myriad options for the general public to watch these channels today in realtime, there is no research-oriented archival interface designed for journalists and scholars to understand their coverage at the scale of days to months, to scan for key visuals and events and to comment, discuss and illustrate how nations are portraying major stories.

To address this critical need, today we are tremendously excited to unveil the Television News Visual Explorer, a collaboration of the GDELT Project, the Internet Archive’s Television News Archive and the Media-Data Research Consortium to explore new approaches to enabling rapid exploration and understanding of the visual landscape of television news.

The Visual Explorer converts each broadcast into a grid of thumbnails, one every 4 seconds, displayed in a grid six frames wide and scrolling vertically through the entire program, making it possible to skim an hour-long broadcast in a matter of seconds. Clicking on any thumbnail plays a brief 30 second clip of the broadcast at that point, making it trivial to rapidly triage a broadcast for key moments. The underlying thumbnails can even be downloaded as a ZIP file to enable non-consumptive computational analysis, from OCR to augmented search.
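To give a sense of the numbers involved, here is a small Python sketch of the grid arithmetic. The 4-second interval, six-frame width and 30-second clip come from the description above. The function names are invented for illustration, and the sketch assumes that the first thumbnail is taken at 0 seconds and that a clip starts at its thumbnail’s timestamp, neither of which is confirmed here. It does not reflect the Visual Explorer’s actual implementation.

```python
import math

SECONDS_PER_THUMBNAIL = 4  # from the text: one thumbnail every 4 seconds
GRID_WIDTH = 6             # from the text: six frames per row
CLIP_SECONDS = 30          # from the text: a click plays a 30 second clip


def thumbnail_count(duration_seconds: int) -> int:
    """Number of thumbnails for a broadcast, assuming the first is taken at 0 seconds."""
    return math.ceil(duration_seconds / SECONDS_PER_THUMBNAIL)


def grid_rows(duration_seconds: int) -> int:
    """Number of rows needed to display every thumbnail six across."""
    return math.ceil(thumbnail_count(duration_seconds) / GRID_WIDTH)


def grid_position(offset_seconds: int) -> tuple[int, int]:
    """Zero-based (row, column) of the thumbnail covering a moment in the broadcast."""
    index = offset_seconds // SECONDS_PER_THUMBNAIL
    return divmod(index, GRID_WIDTH)


def clip_window(row: int, column: int) -> tuple[int, int]:
    """Start and end, in seconds, of the clip assumed to play when a thumbnail is clicked."""
    start = (row * GRID_WIDTH + column) * SECONDS_PER_THUMBNAIL
    return start, start + CLIP_SECONDS


one_hour = 60 * 60
print(thumbnail_count(one_hour))  # 900
print(grid_rows(one_hour))        # 150
print(grid_position(125))         # (5, 1)
print(clip_window(5, 1))          # (124, 154)
```

Under these assumptions an hour-long broadcast becomes 900 thumbnails in 150 rows, which is why it can be scrolled through in seconds rather than watched in real time.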

Machines today can catalog the basic objects and activities they see in video and generate transcripts of their spoken and written words, but the ability to contextualize and understand the meaning of all that coverage remains a uniquely human capability. No person could watch the entirety of the Archive’s 6.6 million hours of broadcasts, yet even just the 46,000 broadcasts in this new collection would be difficult for a single researcher to watch or even fast forward through in their entirety. Television’s linear format means coverage has historically been consumed a single moment at a time like a flashlight in a darkened warehouse. In contrast, this new interface makes it possible to see an entire broadcast all at once in a single display, making television news “skimmable” for the first time.

The Visual Explorer and this new research collection of Belarusian, Russian and Ukrainian television news coverage represent early glimpses into a new initiative reimagining how memory institutions like the Archive can make their vast television news archives more accessible to scholars, journalists and informed citizens. Beneath the simple and intuitive interface lies an immensely complex and highly experimental set of workflows prototyping both an entirely new scholarly and journalistic interface to television news and entirely new approaches to rapidly archiving international television coverage of global events.

Over the coming weeks, additional channels from the TV News Archive will become available through the new Visual Explorer, as well as a variety of experiments with the new lenses that tools like automatic transcription and translation can offer in helping journalists and scholars make sense of such vast realtime archives.

Get Started With The Television News Visual Explorer!

About Kalev Leetaru

For more than 25 years, GDELT’s creator, Dr. Kalev H. Leetaru, has been studying the web and building systems to interact with and understand the way it is reshaping our global society. One of Foreign Policy Magazine’s Top 100 Global Thinkers of 2013, his work has been featured in the presses of over 100 nations and fundamentally changed how we think about information at scale and how the “big data” revolution is changing our ability to understand our global collective consciousness.

New additions to the Internet Archive for May 2022

Many items are added to the Internet Archive’s collections every month, by us and by our patrons. Here’s a roundup of some of the new media you might want to check out. Logging in might be required to borrow certain items.

Notable new collections from our patrons: 

Books – 52,300 New items in May

This month we’ve added books on varied subjects in more than 20 languages. Click through to explore, but here are a few interesting items to start with:

Audio Archive – 89,325 New Items in May

The audio archive contains recordings ranging from alternative news programming, to Grateful Dead concerts, to Old Time Radio shows, to book and poetry readings, to original music uploaded by our users. Explore.

LibriVox Audiobooks – 92 New Items in May

Founded in 2005, LibriVox is a community of volunteers from all over the world who record audiobooks of public domain texts in many different languages. Explore.

78 RPMs and Cylinder Recordings – 112 New Items in May

Listen to this collection of 78rpm records, cylinder recordings, and other recordings from the early 20th century. Explore.

Live Music Archive – 807 New Items in May

The Live Music Archive is a community committed to providing the highest quality live concerts in a lossless, downloadable format, along with the convenience of on-demand streaming (all with artist permission). Explore.

Netlabels – 223 New Items in May

This collection hosts complete, freely downloadable/streamable, often Creative Commons-licensed catalogs of ‘virtual record labels’. These ‘netlabels’ are non-profit, community-built entities dedicated to providing high quality, non-commercial, freely distributable MP3/OGG-format music for online download in a multitude of genres. Explore.

Movies – 110 New Items in May

Watch feature films, classic shorts, documentaries, propaganda, movie trailers, and more! Explore.

Special Event: Universal Access to All Knowledge @ The New York Society Library

Saturday, June 4, 2:00 PM ET
The New York Society Library, 53 East 79th Street, Manhattan
or by livestream


Register now for the in-person session or the livestream.

Watch event recording:

Join Brewster Kahle of the Internet Archive for a special two-part presentation and discussion on using this massive resource and on the societal and policy issues affecting access to knowledge.

This is a great time to be an archivist and librarian—digital memory is ever more important and more difficult to manage.

Advances in computing and communications mean that we can cost-effectively store every book, sound recording, movie, software package, and public webpage ever created and provide access to these collections via the Internet to students and adults all over the world. By using mostly existing institutions and funding sources, we can build this, as well as compensate authors, within the current worldwide library budget. Technological advances, for the first time since the loss of the Library of Alexandria, may allow us to collect all published knowledge in a similar way. But now we can take the original goal another step further to make all the published works of humankind accessible to everyone, no matter where they are in the world.
 
Will we allow ourselves to re-invent our concept of libraries and archives to expand and to use the new technologies? This is fundamentally a societal and policy issue. These issues are reflected in our governments’ spending priorities, and in law.

This event takes place in two interrelated hours:
2:00-3:00 PM – The Internet Archive: What It Is and How to Use It
3:00-4:00 PM – Universal Access to All Knowledge: Technologies, Societies, Legalities

Music Library Association Opens Publications at Internet Archive

For librarians who specialize in caring for music collections, it can be challenging to keep up with the latest technology and resources in the profession. The Music Library Association recently helped address this problem by making many of its publications openly available online.

The MLA donated 21 of its monographs to the Internet Archive for digitization and worked with authors to make the material free to the public under Creative Commons licenses. 

The new collection of backlist titles includes information on careers in music librarianship and history of the field. It also covers planning and building music library collections, which can be complicated and involve individual creators and small publishers, said Kathleen DeLaurenti, who helped lead the partnership with the Internet Archive in her role as MLA’s first open access editor. There are also valuable materials on music library approaches to technical services—everything from how to preserve music materials to how to bind and catalog them.

“Increasingly in librarianship, we have people who are being tasked to do this work who don’t have a specialized background, especially in smaller organizations, rural places, and public libraries,” DeLaurenti said. “We’re really excited to be able to make this content available to folks who may not have access to professional development in those spaces, and who may be looking for some materials to bolster their training and their own work.”

The MLA has been publishing new research of interest to music librarians since the 1970s and wanted to find a platform to make the information easier to discover, said DeLaurenti, director of the Arthur Friedheim Library at the Peabody Institute of Johns Hopkins University in Baltimore. The Internet Archive provided the open infrastructure to share and leverage the work of the MLA, which is a small organization with about 1,000 members.

While the MLA began with 21 of the monographs, it is working to obtain rights clearance for an additional 20 titles and DeLaurenti hopes the online collection will grow. So far, authors have been excited that the association is making their work available as it increases access for scholars with the potential for more citations of their research.

The audience for the online collection will likely be “accidental music librarians”—people tasked with music library responsibilities who aren’t musicians but are looking for professional development resources in the area, DeLaurenti said, as well as individuals considering music librarianship as a career.

“As libraries are looking at what kinds of open infrastructure is out there and available, I think the work that the Internet Archive has done through COVID has really changed our perception and how they can work as a potential collaborator in that space,” DeLaurenti said. “We hope to continue different kinds of collaborations with [the Archive] in the future.”

LEARN MORE

Knowledge Rights 21 Calls for Action on Library Rights

Last week, Knowledge Rights 21 released a strong call to action to ensure that libraries can continue serving their centuries-old role in society of providing access to knowledge to the public. Knowledge Rights 21 is an Arcadia-funded project advocating for copyright and open access reform across Europe.

In their Position Statement on eBooks and eLending, Knowledge Rights 21 explains that government action is urgently needed because the market for eBooks now operates outside of the current copyright law that permits libraries to acquire, lend and preserve physical books. Monopolistic behavior by commercial publishers, including refusals to sell, embargoes, high prices, and restrictive licensing terms, has frustrated libraries’ ability to undertake collection development, hurting those who rely on libraries for education, research, and cultural participation.

The Position Statement demands that “governments must wake up and act now before the rights of citizens to access information and learning through libraries are eroded any further.” The Statement proposes the following clarifications in EU law:

1. The right for libraries to acquire, preserve and make a digital reproduction of an analogue and / or an electronic book / audiobook that has been made available in the market under sale or licence;
2. No more copies than have been acquired under 1 above, shall be loaned to members of the public at any one time. Libraries should have the right to lend directly to users, as well as via other libraries as part of interlibrary loan;
3. Neither contracts nor technical protection measures shall be enforceable to prevent this;
4. Any loans made under this shall require the payment of [Public Lending Right] monies by public libraries in line with existing practice with paper and or audiobooks.

The Internet Archive agrees that action on this issue is important and necessary. We are defending these principles in US court, in the lawsuit brought by four of the world’s largest publishers over our controlled digital lending program. We look forward to working with Knowledge Rights 21 and the library community “to help libraries not only to survive, but also to flourish” as the EU Court of Justice said in its landmark case supporting eBook lending by libraries.

Recent Report from IFLA: How well did copyright laws serve libraries during COVID-19?

The short answer to this question from a report recently published by IFLA appears to be: not very well at all. The report documents a worldwide survey of 114 libraries, 83% of which said they had copyright-related challenges providing materials during pandemic-related facility closures. The report also provides direct quotes from a series of interviews of library professionals, discussing the challenges they faced and often how difficult digital access to necessary materials such as textbooks has been throughout the last two years. As one librarian from the United States explains:

Times were tough. We were scrambling and worried about so many things – including the health and safety of our students, faculty and colleagues – and trying to spin up as much as possible in the way of service. There are certainly some vendors that we personally like the interactions with, but it felt to me like the publishers saw this as an opportunity just to make more money and not really an opportunity to build stronger connections with us and our library. They offered free things for a very limited period of time.

The report is well worth reading in its 22-page entirety. You can find it here.

Preserving Pro-Democracy Books From Shuttered Hong Kong Bookstore

Albert Wan ran Bleak House Books, an independent bookstore in Hong Kong, for nearly five years, before closing it in late 2021. The changing political climate and crackdown on dissent within Hong Kong made life too uncertain for Wan, his wife and two children. 

As they were preparing to move, Wan packed a box of books at risk of being purged by the government. He brought them on a plane back to the United States in January and donated them to the Internet Archive for preservation. 

The collection includes books about the pro-democracy protests of 2019, among them some photography books and a limited-edition book of essays by young journalists who covered the protests. There was also a book about the Tiananmen Square massacre, along with volumes about Hong Kong politics, culture and history, most written in Chinese.

“In Hong Kong, because the government is restricting and policing speech in a way that is even causing libraries to remove books from shelves, I thought that it would be good to digitize books about Hong Kong that might be in danger of disappearing entirely,” Wan said.

Hearing that Bleak House Books would be shutting its doors, the Internet Archive reached out and offered to digitize its remaining books. As it happens, Wan said his inventory was dwindling quickly. So, he gathered contributions from others, and along with some from his own collection, donated about thirty books and some periodicals to the Internet Archive for preservation and digitization. Wan said he was amazed at how flexible and open the Archive was in the process, assisting with shipping and scanning the materials at no cost to him. (See Hong Kong Community Collection.)

Now, Wan wants others to do the same.

“There are still titles out there that have never been digitized and might be on the radar for being purged or sort of hidden from public view,” Wan said. “The hope is that more people would contribute and donate those kinds of books to the Archive and have them digitized so that people still have access to them.”

Do you have books you’d like to donate to the Internet Archive? Learn more.

Wan said he likes how the Internet Archive operates using controlled digital lending (CDL), where items can be borrowed one at a time, not infringing on the rights of the authors while providing broad public access.

Before his family moved to Hong Kong for his wife’s university teaching job, Wan was a civil rights and criminal defense attorney in private practice. Now, they are all getting settled in Rochester, New York, where Wan plans to open another bookshop.

Memorial Day BBQ, Live Music and Lost Landscapes at the Internet Archive – Monday May 30, 2022

Calling all SF Cineastes and Archivists!

LOST LANDSCAPES is BACK!! With BBQ and live music too!

Come join the fun this Memorial Day and hang out with us at the Internet Archive.

$1 hotdogs, live music by the Traveling Wilburys Revue, and then on to a screening of Prelinger Archives’ “Lost Landscapes: Earth, Fire, Air, Water: California Infrastructures”.

Date: Monday, May 30, 2022
When:
5:30 PM BBQ – 6:30 PM Live Music – 8:15 PM Film Screening
Where:
300 Funston Ave., San Francisco, CA
Cost:
$15.00

GET YOUR TICKETS HERE

Goodbye Facebook. Hello Decentralized Social Media?

The pending sale of Twitter to Elon Musk has generated a buzz about the future of social media and just who should control our data.

Wendy Hanamura, director of partnerships at the Internet Archive, moderated an online discussion on April 28, “Goodbye Facebook, Hello Decentralized Social Media?”, about the opportunities and dangers ahead. The webinar is part of a series of six workshops, “Imagining a Better Online World: Exploring the Decentralized Web.”

Watch the session recording:

The session featured founders of some of the top decentralized social media networks including Jay Graber, chief executive officer of R&D project Bluesky, Matthew Hodgson, technical co-founder of Matrix, and Andre Staltz, creator of Manyverse. Unlike Twitter, Facebook or Slack, Matrix and Manyverse have no central controlling entity. Instead the peer-to-peer networks shift power to the users and protect privacy. 

If Twitter is indeed bought and people are disappointed with the changes, the speakers expressed hope that the public will consider other social networks. “A crisis of this type means that people start installing Manyverse and other alternatives,” Staltz said. “The opportunity side is clear.” Still, if other platforms are not ready during the transition period, there is some risk that users will feel stuck and not switch, he added.

Hodgson said there are reasons to be both optimistic and pessimistic about Musk purchasing Twitter. The hope is that he will use his powers for good, making it available to everybody and empowering people to block the content they don’t want to see. The risk is that with no moderation, Hodgson said, people will be obnoxious to one another without sufficient controls to filter, and the system will melt down. “It’s certainly got potential to be an experiment. I’m cautiously optimistic on it,” he said.

People who work in decentralized tech recognize the risk that comes when one person can control a network and act for good or bad, Graber said. “This turn of events demonstrates that social networks that are centralized can change very quickly,” she said. “Those changes can potentially disrupt or drastically alter people’s identity, relationships, and the content that they put on there over the years. This highlights the necessity for transition to a protocol-based ecosystem.” 

When a platform is user-controlled, it is resilient to disruptive change, Graber said. Decentralization enables immutability so change is hard and is a slow process that requires a lot of people to agree, added Staltz.

The three leaders spoke about how decentralized networks provide a sustainable alternative and are gaining traction. Unlike major players that own user data and monetize personal information, decentralized networks are controlled by users and information lives in many different places.

“Society as a whole is facing a lot of crises,” Graber said. “We have the ability to, as a collective intelligence, to investigate a lot of directions at once. But we don’t actually have the free ability to fully do this in our current social architecture…if you decentralize, you get the ability to innovate and explore many more directions at once. And all the parts get more freedom and autonomy.”

Decentralized social media is structured to change the balance of power, added Hanamura: “In this moment, we want you to know that you have the power. You can take back the power, but you have to understand it and understand your responsibility.”

The webinar was co-sponsored by DWeb and Library Futures, and presented by the Metropolitan New York Library Council (METRO).

The next event in the series, Decentralized Apps, the Metaverse and the “Next Big Thing,” will be held Thursday, May 26, from 4 to 5 p.m. EST. Register here.