Category Archives: Announcements

Thank you Ubuntu and Linux Communities

The Internet Archive is wholly dependent on Ubuntu and the Linux communities that create a reliable, free (as in beer), free (as in speech), rapidly evolving operating system. It is hard to overestimate how important that is to creating services such as the Internet Archive.

When we started the Internet Archive in 1996, Sun and Oracle donated technology and we bought tape robots. By 1999, we shifted to inexpensive PCs in a cluster, running varying Linux distributions.

At this point, almost everything that runs on the servers of the Internet Archive is free and open-source software. (I believe our JP2 compression library may be the only piece of proprietary software we use.)

For a decade now, we have been upgrading our operating system on the cluster to the long-term support server Linux distribution of Ubuntu. Thank you, thank you. And we have never paid anything for it, but we submit code patches as the need arises.

Does anyone know the number of contributors to all the Linux projects that make up the Ubuntu distribution? How many tens or hundreds of thousands? Staggering.   

Ubuntu has ensured that every six months a better release comes out, and every two years a long-term release comes out. Like clockwork. Kudos. I am sure it is not easy, but it is inspiring, valuable and important to the world.

We started with Linux in 1997, began using Ubuntu Server with the Warty Warthog release in 2004, and are in the process of moving to Focal (Ubuntu 20.04).

Depending on free and open software is the smartest technology move the Internet Archive ever made.

1998: https://www.sfgate.com/business/article/Archiving-the-Internet-Brewster-Kahle-makes-3006888.php

Internet Archive servers running at the Bibliotheca Alexandrina circa 2002.

2002: https://archive.org/about/bibalex.php

2013: https://www.theguardian.com/technology/2013/apr/26/brewster-kahle-internet-archive

2021: Internet Archive

Internet Archive’s Modern Book Collection Now Tops 2 Million Volumes

The Internet Archive has reached a new milestone: 2 million. That’s how many modern books are now in its lending collection—available free to the public to borrow at any time, even from home.

“We are going strong,” said Chris Freeland, a librarian at the Internet Archive and director of the Open Libraries program. “We are making books available that people need access to online, and our patrons are really invested. We are doing a library’s work in the digital era.”

The lending collection is an encyclopedic mix of purchased books, ebooks, and donations from individuals, organizations, and institutions. It has been curated by Freeland and other librarians at the Internet Archive according to a prioritized wish list that has guided collection development. The collection has been purpose-built to reach a wide base of both public and academic library patrons, and to contain books that people want to read and access online—titles that are widely held by libraries, cited in Wikipedia and frequently assigned on syllabi and course reading lists.

“The Internet Archive is trying to achieve a collection reflective of great research and public libraries like the Boston Public Library,” said Brewster Kahle, digital librarian and founder of the Internet Archive, who began building the diverse library more than 20 years ago.

“Libraries from around the world have been contributing books so that we can make sure the digital generation has access to the best knowledge ever written,” Kahle said. “These wide-ranging collections include books curated by educators, librarians and individuals, that they see as critical to educating an informed populace at a time of massive disinformation and misinformation.”

The 2 million modern books are part of the Archive’s larger collection of 28 million texts that include older books in the public domain, magazines, and documents. Beyond texts, millions of movies, television news programs, images, live music concerts, and other sound recordings are also available, as well as more than 500 billion web pages that have been archived by the Wayback Machine. Nearly 1.5 million unique patrons use the Internet Archive each day, and about 17,000 items are uploaded daily.

Presenting the (representative) 2 millionth book

Every day about 3,500 books are digitized in one of 18 digitization centers operated by the Archive worldwide. While there’s no exact way of identifying a singular 2 millionth book, the Internet Archive has chosen a representative title that helped push past the benchmark to highlight why its collection is so useful to readers and researchers online.

On December 31, The dictionary of costume by R. Turner Wilcox was scanned and added to the Archive, putting the collection over the 2 million mark. The book was first published in 1969 and reprinted throughout the 1990s, but is now no longer in print or widely held by libraries. This particular book was donated to Better World Books via a book bank just outside of London in August 2020, then made its way to the Internet Archive for preservation and digitization. 

“The dictionary of costume” by R. Turner Wilcox, now available for borrowing at archive.org.

As expected from the title, the book is a dictionary of terms associated with costumes, textiles and fashion, and was compiled by an expert, Wilcox, the fashion editor of Women’s Wear Daily from 1910 to 1915. Given its authoritative content, the book made it onto the Archive’s wish list because it is frequently cited in Wikipedia, including on pages like Petticoat and Gown.

Now that the book has been digitized, Wikipedia editors can update citations to the book and include a direct link to the cited page. For example, users reading the Petticoat page can see that page 267 of the book has been used to substantiate the claim that both men and women wore a longer underskirt called a “petticote” in the fourteenth century. Clicking on that reference will take users directly to page 267 in The dictionary of costume, where they can read the dictionary entry for petticoat and verify that information for themselves.

Screenshots showing how Wikipedia users can verify references that cite “The dictionary of costume” with a single click.

An additional reason why this work is important is that there is no commercial ebook available for The dictionary of costume. This book is one of the millions of titles that reached the end of its publishing lifecycle in the 20th century, so there is no electronic version available for purchase. That means that the only way of accessing this book online and verifying these citations in Wikipedia—doing the kind of research that students of all ages perform in our connected world—is through a scanned copy, such as the one now available at the Internet Archive. 

Donations play an important role

Increasingly, the Archive is preserving many books that would otherwise be lost to history or the trash bin.

In recent years, the Internet Archive has received donations of entire library collections. Marygrove College gave more than 70,000 books and nearly 3,000 journal volumes for digitization and preservation in 2019 after the small liberal arts college in Detroit closed. The well-curated collection, known for its social justice, education and humanities holdings, is now available online at https://archive.org/details/marygrovecollege.

Several seminaries have donated substantial or complete collections to the Archive to preserve items or to give them a new life as their libraries were being moved or downsized. Digital access is now available for items from the Claremont School of Theology, Hope International University, Evangelical Seminary, Princeton Theological Seminary, and Anabaptist Mennonite Biblical Seminary.

Just like The dictionary of costume, many of the books supplied for digitization come to the Archive from Better World Books. In its partnership over the past 10 years, the online book seller has donated millions of books to be digitized and preserved by the Archive. Better World Books acquires books from thousands of libraries and book suppliers, and through a network of book donation drop boxes (known as “book banks” in the UK). If a title is not suitable for resale and it’s on the Archive’s wish list, the book is set aside for donation.

“We view our role as helping maximize the life cycle and value of each and every single book that a library client, book supplier or donor entrusts to us,” said Dustin Holland, president and chief executive officer of Better World Books. “We make every effort to make books available to readers and keep books in the reading cycle and out of the recycle stream. Our partnership with the Internet Archive makes all this possible.”

The Archive provides another channel for customers to find materials, Holland added.

“We view archive.org as a way of discovering and accessing books,” said Holland. “Once a book is discoverable, the more interest you are going to create in that book and the greater the chance it will end up in a reader’s hands as a new or gently used book.”

Impact

Having books freely available for borrowing online serves people with a variety of needs, including those with limited access to libraries because of disabilities or transportation issues, people in rural areas, and those who live in under-resourced parts of the world.

Sean, an author in Oregon, said he goes through older magazines for design ideas, especially from cultures that he wouldn’t be exposed to otherwise: “It gives me a wider understanding of my small place in the global historical context.” One parent from San Francisco said she uses the lending library to learn skills like hand drawing, so she can draw characters and landscapes and interact more deeply with her child.

The need for information is more urgent than ever.

“We are all homeschoolers now. This pandemic has driven home how important it is to have online access to quality information,” Kahle said. “It’s gratifying to hear from teachers and parents that are now given the tools to work with their children during this difficult time.”

Kahle’s vision is to have every reference in Wikipedia be linked to a book and for every student writing a high school report to have access to the best published research on their subject. He wants the next generation to become authors of the books that should be in the library and the most informed electorate possible.

Adds Kahle: “Thank you to all who have made this possible – all the funders, all the donors, the thousands who have sent books to be digitized. If we all work together, we can do another million this year.”

Take action

If you’re interested in making a physical donation to the Internet Archive, there are instructions and an online form that start the process in the Internet Archive’s Help Center: How do I make a physical donation to the Internet Archive?

Suggested Reading List for At Home Learners

Many students do not have direct or unrestricted access to their local libraries during our current health crisis. One of our goals as librarians and stewards is to bring books to these learners of all ages as they continue their educations at home.

As a step toward this, we have created a collection of California State Suggested Reading that is based on resources from the California Department of Education. This is intended to help students, teachers and their families find books for further learning. (As with any collection, we recommend that adults review items for age-appropriateness before passing them on to children.)

You can find another trove of books curated for students in the Universal School Library.

And we’ve also created some resource lists for different areas of interest in the Kid Friendly section of our help center. These are fun to explore by yourself, in company or over the internet with friends, and cover a range of topics.

We hope this is a helpful springboard for your journey into any topic.

Stay safe and healthy, and thanks for using the archive!

Community gathers for an online celebration of Michelson Cinema Research Library

Lillian Michelson was celebrated as a “force of nature” librarian devoted to helping Hollywood filmmakers get the details right at an event on January 27 to unveil a new online home at the Internet Archive for her extensive collection of books, photos, scrapbooks and clippings.

The Michelson Cinema Research Library was opened with an animated version of the research icon cutting a virtual ribbon to an audience of more than 300 people watching online. The public got a first glimpse of 1,300 books that are now digitally available—part of a million items in the rich collection that Michelson donated to the Internet Archive in December.

“Now, for the first time, anyone, anywhere on the planet can go roam into the halls of Lillian’s research library,” said Thomas Walsh, a production designer and former president of the Art Directors Guild, speaking on a panel at the event. “It’s a really unique, eclectic collection. The books go back to the 1700s. Nothing she had is in print anymore. It’s an extraordinary range of material.”

The Michelson Cinema Research Library included some 1,600 boxes of photographs, clipping files and books, used by production designers and art directors to create the visual look for a movie.

Walsh looked for nearly eight years to find a place to house the content, which art directors and others relied on for creating accurate visual backstories to movies. Whether it was finding blueprints of a nuclear submarine or photos of the interior of a 1950s police station, Michelson was respected for being tenacious in pursuit of answers to inform movie productions. In Michelson’s decades of research, she worked on movies such as Rosemary’s Baby, Scarface, Fiddler on the Roof, Full Metal Jacket, The Graduate, and The Birds.

At the Internet Archive Physical Repository, Brewster Kahle greets the arrival of materials from the Michelson Cinema Research Library in December 2020.

Brewster Kahle, Digital Librarian and founder of the Internet Archive, said he was amazed when he opened up the first boxes from the Michelson Library and saw the variety and extent of raw materials. Making an internet equivalent of the library will be a huge challenge, but one that also is a great opportunity.

“It’s not just a hodge-podge of used books. It is a complete collection that served a community. It comes with a focus,” Kahle said of the Michelson Cinema Research Library that filled some 1,600 boxes on 45 pallets. “The Internet Archive is starting to receive whole libraries. The idea of bringing those online is not just bringing those books and materials online. It’s bringing a community online.”

Daniel Raim, Academy Award-nominated director and panelist at the event, described Michelson as a storyteller whose work was central to helping create a movie’s narrative.

“I always found it fascinating to spend time in Lillian’s library—and now online at the Internet Archive—it sparks your imagination,” said Raim, who produced and directed the 2015 documentary Harold and Lillian: A Hollywood Love Story, about Michelson, now 92, and her husband, Harold, a storyboard artist and production designer, who died in 2007. The film pays tribute to the beloved couple and their contribution behind the scenes to some of the greatest movies of the past 50 years.

After the panel discussion, the Internet Archive hosted a viewing party of the documentary.

Bay State College ‘Flips to Digital’ by Donating Entire College Library to the Internet Archive

Bay State College’s Boston Campus has donated its entire undergraduate library to the Internet Archive so that the digital library can preserve and scan the books, while allowing Bay State to gain much-needed open space for student collaboration. In donating its 11,000-volume collection, centered on fashion, criminal justice, allied health, and business books, for scanning, Bay State’s Boston campus decided to “flip entirely to digital.”

When it came to what to do with the books, Jessica Neave, librarian at Bay State College, had to get creative. “I didn’t have a library close by willing to take our collection,” Neave explained. Shortly after reaching out to our partners at Better World Books, she stumbled upon the Inside Higher Education article about the Marygrove College Library donation. This led Neave to our physical item donation form, where she laid out her library’s tight timeline to deaccession its entire print collection. “You guys made it so easy,” Bay State’s librarian said. “It couldn’t have been any easier!”

Internet Archive team members having fun with the task of packing and shipping an entire library collection.

Under the direction of Neave, an Internet Archive team packed and shipped the 11,000 books in the first week of December.

Considering the future of Bay State’s books, its librarian is hopeful, noting, “Thanks to the Internet Archive, the books can live on as a cohesive collection.” Patrons can look forward to thumbing through historic fashion and textile books, texts on the history of the Civil Rights Movement, graphic novels, and even Bay State’s collection of historically banned young adult books.

More than 100 years of Editor & Publisher Now Fully Accessible Online on the Internet Archive

[press: niemanlab]

Editor & Publisher Magazine, 1901

When Mike Blinder acquired Editor & Publisher magazine in October 2019, he inherited boxes of back issues that he put in a climate-controlled storage unit near his home in Tampa, Florida. Leafing through the old — and sometimes brittle — pages of the journalism trade publication, he noticed a reference to microfilm access to the content.

Blinder, a media consultant for more than 40 years, says he turned detective trying to track down the missing E&P on microfilm. The odyssey would take him to past owners and bankruptcy documents before discovering that the canisters of microfilm had been purchased and given to the Internet Archive to digitize and make available to the public for free.

When Blinder called Brewster Kahle of the Internet Archive and found out we had the microfilm for his back issues, he was very excited to learn not only that the microfilm was safe, but also that the Internet Archive would digitize all of the issues at no cost to him. Blinder enthusiastically gave permission for the full 100-year history to be read and downloaded by anyone, anywhere – along with E&P’s International Yearbook and Market Guide. Going beyond the Internet Archive’s traditional lending system ensures the collection can be indexed by search engines and made maximally useful to readers and researchers.

Mike Blinder, Publisher

“I just went nuts,” Blinder recalls of learning about the project earlier this year. “I read history all the time. The fact that content about this incredible industry was available to humanity was exceptionally exciting.”

The ability to research these archived issues has been truly exciting, especially for those looking up historical documents, many with a personal or family connection. Amy Levine is the daughter of the former publisher of sixteen small newspapers in Northern California. When Levine looked up the past issues featuring her father Mort and his legacy, she was all over it. “I loved it,” said Levine, “and I showed my father – he loved sharing his past accomplishments with me.”

Hiring a company to scan the stacks of print copies would have been a massive and expensive undertaking that Blinder says he didn’t have the funds to do. Turning microfilm into digital content is much easier and the process was underway at the Internet Archive at no cost to him.

“It was good news,” Blinder says. “[The press] is part of the constitution. Our founding fathers told us that we needed to exist.”

While Blinder says he thinks there is enough demand to charge for access to this collection of magazines, he’s glad the information is freely available for journalism students, scholars, and the general public.  Blinder plans to tell his colleagues in the media business about the newly established digital collection and says he’s confident there will be an audience for the material.

“There are a lot of people who study our industry. We are in such a crisis now,” says Blinder of the competition from social media and struggles of daily newspapers. Over the years, E&P has chronicled changes in the business during times of recession, war, and cultural change. It includes awards for publishers of the year and young journalism professionals. Blinder anticipates the collection will appeal to people who want to look back at past trends.

Since purchasing E&P last year, Blinder and his wife, Robin, have been able to turn the operation around, doubling its revenues and tripling its audience.

“We love journalism. We want to talk about journalism as a business,” Blinder says. “We are an independent voice for the industry.”

Now you can read 100 years of history in the archive of Editor & Publisher magazine.

Pastor: “Profound Gift” to Discover Books Through Internet Archive’s Program for Users with Print Disabilities

Authorized readers have special access to millions of digitized books through the Internet Archive’s program, connecting patrons with print disabilities to a vast digital library.

Doug Wilson says he’s a bit of a “bookaholic.” As senior pastor of a nondenominational church in West Covina, California, he surrounds himself with books at his office and study at home.

Pastor Doug Wilson is an avid reader who benefits from Internet Archive’s program for people with print disabilities.

“I’m a voracious reader and love learning,” said Wilson, who recalls taking a wagon to the library every week as a kid and bringing back a stack of books. “I find life intriguing.”

The 64-year-old said his vision has never been great, but within the last year he noticed it was worsening. Wilson was struggling to read print with small font sizes or in low light. An avid user of the Internet Archive, he learned about the Archive’s program for users with print disabilities, which allows authorized users to skip waitlists for the ebook collection and download protected EPUBs and PDFs. Wilson applied for the program and was granted access, with great results.

“It’s so helpful to be able to have multiple resources open at once on my computer,” said Wilson, who looks up material online from ancient thought to contemporary theology for his sermons. “It’s been wonderful to find something on just about anything.”  

Jessamyn West of the Vermont Mutual Aid Society, who helps qualified users with print disabilities gain access to Internet Archive’s lending library.

Signing up for the program was easy and fast, said Wilson. Students and researchers associated with a university can obtain access through their university library or student success center. For those outside higher ed, the process is run by Jessamyn West at the Vermont Mutual Aid Society, who receives requests through an online form from patrons around the world. People who qualify for the program include those with blindness, low-vision, dyslexia, brain injuries and other cognition problems who need extra time to interact with materials. Since October 2018, West has welcomed more than 5,600 users into the program.

“It makes a real difference to people’s lives,” said West. “Especially nowadays when many people are stuck at home and working with limited resources, having a world of accessible books available to them opens doors and expands horizons. I’ve seen people checking out books on drawing and painting, books about art history and comparative religions, and just a lot of fiction. The collection is truly extensive.”

With expanded access to digitized books, Wilson said he has been reconnecting with works written by many of his mentors through the digital theological collections. “It’s been a profound gift to discover those books that have been influential in my life. Getting access has been another way to be encouraged and mentored even from a distance,” said Wilson, adding he is in a season of life when many of those people are passing away. “The service has been so generous and a supplement to my own library.”

Doug Wilson reading to one of his youngest parishioners, his grandson Luca.

Being able to find the exact resource he needs from home and late at night is a convenience that Wilson said he values. Wilson has enjoyed books from Marygrove Library, a collection full of religious and social justice materials that was recently donated and made available online.

In an era with competing forms of information and disinformation, Wilson said the Internet Archive is important. “Wisdom is hard to come by,” said Wilson. “We are barraged with data in our culture. It can be hard to ferret out what’s real. To have access to actual information and works that have stood the test of time is a godsend.”

To learn more about the Internet Archive’s program for users with print disabilities, and to verify eligibility, please visit the program web page.

Happy Martin Luther King Jr. Day!

Today the United States commemorates the life of Dr. Martin Luther King Jr., one of history’s most influential advocates for peace, equality, and civil rights. As a free digital library, the Internet Archive is home to thousands of books, texts, videos, images, and other materials on his work and impact. Here are a few ways you can use our materials to celebrate the life of Dr. King!

Watch

Dr. King was a major participant in the 1963 March on Washington for Jobs and Freedom, one of the largest rallies for human rights in American history—watch original newsreel footage of the March here! You can also listen to part of a commencement speech Dr. King gave at Hofstra University in 1965 and see contemporary reporting on his receipt of the 1964 Nobel Peace Prize.

Read

The Internet Archive’s collection of texts contains thousands of works both by and about Martin Luther King Jr., ranging from books for children to collections of his speeches. Our new Marygrove College Library collection includes several books on Dr. King, as well as the Civil Rights Movement and social justice.

If you’re interested in reading more on the African-American experience, you can also check out the #1000BlackGirlBooks collection and the Zora Canon. We’ve created some handy resource guides that include Antiracist & Racial Equality Reading Lists and Racial Equality Books for Kids. Finally, through the Community Webs program, our partners at the Schomburg Center for Research in Black Culture created the #HashtagSyllabusMovement web archive collection, which contains crowdsourced reading lists highlighting social justice issues within the Black community—a great place to start if you’re looking for antiracist reading material!

Contribute

The Internet Archive contains millions of items that have been uploaded, donated, or submitted by our users; your contributions make up a crucial part of our library. If you own any civil rights books, records, or physical media that you would like to see added to the archive, feel free to donate them! If you already have digital media—such as video, images, or audio of Martin Luther King Day celebrations or multimedia tributes—then feel free to upload it to the Internet Archive. And as always, if you see something online that you think should be added to our historical record, you can use the Wayback Machine’s Save Page Now feature to preserve it for posterity.

We hope you have a safe and happy Martin Luther King Jr. Day. Enjoy the archive!

-The Internet Archive Team


If you enjoyed this blog post and want to help support the Internet Archive, you can make a tax-deductible donation here. Thank you for helping us provide Universal Access To All Knowledge. 

Public Domain Day Short Film Contest Highlights Works of 1925

A still from the winning submission Danse des Aliénés (Dance of the Insane), featuring actress Greta Garbo.

Filmmakers responded with enthusiasm and creativity to a call from the Internet Archive to make short films using newly available content from 1925 in celebration of Public Domain Day. They discovered a new freedom in being able to remix film clips with Greta Garbo, magazine covers with flappers, and sheet music from standards like “Sweet Georgia Brown” – all downloadable for free and reusable without restriction.

For the contest, vintage images and sounds were woven into films of 2-3 minutes that conveyed a sense of whimsy, nostalgia, and humor. While some were abstract and others educational, they all showcased ingenuity and possibility when materials are openly available to the public.

“The Internet Archive has spent 24 years collecting and archiving content from around the world… now is the time to see what people can do with it,” said Amir Saber Esfahani, director of special arts projects at the Internet Archive. He was a judge in the December short-film contest along with Carey Hott, professor of art and design at the University of San Francisco, and Brewster Kahle, digital librarian and founder of the Internet Archive.

The judges reviewed 23 entries and chose a winner based on creativity, variety of 1925 content (including lists of all sources), and fit for the event (fun, interesting and captivating). These new creative works may also be available for reuse, as indicated by the license term selected by the creator.

First place: Danse des Aliénés

Joshua Curry, a digital artist from San Jose, won first place for his submission, Danse des Aliénés (Dance of the Insane), in which he layers pieces of film on three panes with images rising and falling to the music of “Danse Macabre” (Dance of Death) performed by the Philadelphia Symphony Orchestra. The format was inspired by the poem dramatized in triptych in the short film In Youth, Beside the Lonely Sea. His creation included flashes of Greta Garbo, ghosts from Koko Sees Spooks and colorful designs flowing in and out of the frames.

Curry, who has been making experimental videos since the 1980s, says the project was a perfect fit for his artistic techniques, where he likes to stress and transform film in new ways. His film had a glitchy, broken feel that is in line with the aesthetic he often uses. (See his other work at lucidbeaming.com.)

“I wanted it to be evocative and for people to appreciate it as a stand-alone piece of art,” says Curry, 49. “My visual goal was to produce something challenging that a wide variety of people could connect to – despite being mostly abstract and sourced from 95-year-old content.”

Filmmaker Joshua Curry.

While Curry’s studio is modern and full of electronic equipment, working with the 1925 content and hearing music with cartoonish voices making novelty popping sounds with their cheeks was a welcome break. “One day when I was choosing the music, I was driving around the city listening to songs and felt like I was transported back in time,” Curry says.

He says it was also a pleasure to have easy access to the public domain content without commercial gatekeeping or legal obstacles, which he often encounters with digital material he wants to remix. As it happens, Curry just completed a class in multimedia copyright. He says he works hard to operate within the rules because he wants his video creations to survive online and not be taken down because of copyright infringement allegations.

Having the works for this project in the public domain meant less time trying to get the content and more time to focus on the creative process. “It was like being a little kid who was told he couldn’t have cake and then one day saying: ‘Dive in!’,” Curry said of the access to the 1925 material in the Internet Archive.

Receiving the contest’s top honors was particularly meaningful, says Curry, because he works in Silicon Valley where the Internet Archive has “great nerd cred” and is a library that people revere.

“I was proud to win with weirdness,” Curry says. “My piece was abstract, without narration or titles, and an authentic tribute to the pioneering work of the experimental films I made use of.”

To learn more about Curry’s inspirations and to hear from him directly, watch the director’s commentary that was captured during the Public Domain Day event.

Second place: Vanishing Ink

Second place went to Alaro Brandon for Vanishing Ink, a film with a montage of clips from 16 movies including The Last Laugh and Hold My Baby and four songs: “Norwegian Dances,” “Song of the Vagabonds,” “I Want to be Happy” and “Hawaiian Ripples.”

Third place: Fashion of the 1920s

Arden Spivack-Teather, 12, and Sissel Ramirez, 13, both of San Francisco, won third place for their short film, Fashion of the 1920s. It traces the evolution of women’s clothing from tight-fitting styles that required corsets to drop-waisted, loose dresses popularized by flappers. “Women could finally be chic and comfortable at the same time,” the film notes. “Every time you notice a fabulous flowing frock, thank the 20s.”

Watch Fashion of the 1920s.

Arden found out about the contest through her mom, Cari Spivack, a staff member at the Internet Archive, and decided to partner with her friend Sissel, a classmate since kindergarten with whom she had collaborated on a winning science fair project in fifth grade. On Zoom and FaceTime, the girls looked through old McCall’s magazines and decided to focus on the changing style of women’s clothing.

From left, filmmakers Arden Spivack-Teather and Sissel Ramirez.

“It was really fun to use our creativity and find things that would look good together,” said Arden, who had never before made a film. Although the research, script and editing were a challenge, she says she hopes to do it again.

Spivack said she enjoyed seeing her daughter explore the material in the Archive, giggling and musing at the kitchen table about the tonics and ads she discovered. “It was exactly what I was hoping would happen — that they would be gripped by fascination of a time period that was long gone. They could travel back and learn on their own, paging through a magazine just like someone their age would have in 1925.”

The diversity of approaches people had with the films was impressive, added Spivack.

“It’s a good introduction to what can be done with old materials. You can use them to learn and to educate others. Or you can reuse them to make something that’s completely unexpected or never seen before,” Spivack said. “As archivists everything is important. But you don’t know why until you see what it can turn into or what it informs in the future.”

Honorable mentions

  • Yo Hey Look! by Adam Dziesinski, which pieced together film clips where something caught an actor’s eye, from a baby in a wicker stroller to a woman with a bob haircut dancing to a man in a bowler hat laughing.
  • 1925 Magazine Cover Recreation by Michaela Giles, a time-lapse film of her using oil pastels, pencils, paints, and pens to draw a profile of a woman gracing the front of a vintage publication.
  • Public Domain Day by Subhashish Panigrahi, which explained the basics of how copyright works with text interspersed with cartoon clips, colorful paintings, and magazine covers.
  • 25 Dad Jokes from 1925 by Anirvan Chatterjee, a compilation of jokes gathered from vintage middle school and high school yearbooks from Iowa, Pennsylvania, Massachusetts, Oregon, and California. Among the corny humor: “Why is the ocean so angry? It’s been crossed too many times,” and “What are the three most often words used in school? I don’t know.”

The films were shown at the December 17 Public Domain Day virtual party, where the creators were asked to discuss their projects in breakout rooms. You can view a livestream of the event here.

Radio Ngrams Dataset Allows New Research into Public Health Messaging

Guest post by Dr. Kalev Leetaru

Radio remains one of the most-consumed forms of traditional media today, with 89% of Americans listening to radio at least once a week as of 2018, a number that is actually increasing during the pandemic. News is the most popular radio format and 60% of Americans trust radio news to “deliver timely information about the current COVID-19 outbreak.”

Local talk radio is home to a diverse assortment of personality-driven programming that offers unique insights into the concerns and interests of citizens across the nation. Yet radio has remained stubbornly inaccessible to scholars due to the technical challenges of monitoring and transcribing broadcast speech at scale.

Debuting this past July, the Internet Archive’s Radio Archive uses automatic speech recognition technology to transcribe this vast collection of daily news and talk radio programming into searchable text dating back to 2016, and continues to archive and transcribe a selection of stations through the present, making them browsable and keyword searchable.

Ngrams dataset

Building on this incredible archive, the GDELT Project and I have transformed it into a research dataset of radio news ngrams spanning 26 billion English language words across portions of 550 stations, from 2016 to the present.

You can keyword search all 3 million shows, but for researchers interested in the deeper linguistic patterns of radio news, the new ngrams dataset includes 1-5grams at 10-minute resolution, covering all four years and updated every 30 minutes. For those less familiar with the concept, “ngrams” are word frequency tables: the transcript of each broadcast is broken into words, and for each 10-minute block of airtime on each station, a list is compiled of all the words spoken in those 10 minutes and how many times each was mentioned.
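To make the idea concrete, here is a minimal Python sketch of how such a table could be built from a transcript. It is an illustration only, not the pipeline used to produce the dataset: the input format (a list of time-offset and word pairs), the function names, and the choice not to let ngrams span block boundaries are all assumptions made for this example.

```python
from collections import Counter

BLOCK_SECONDS = 600  # one 10-minute block of airtime


def ngrams(words, n):
    """Return every run of n consecutive words, joined by spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_tables(timed_words, max_n=5, block_seconds=BLOCK_SECONDS):
    """Build per-block frequency tables for 1-grams through max_n-grams.

    timed_words is a hypothetical input format: a list of
    (seconds_since_start, word) pairs in the order they were spoken.
    Returns {block_index: {n: Counter mapping ngram to count}}.
    Assumption: ngrams do not cross block boundaries.
    """
    blocks = {}
    for offset, word in timed_words:
        block = int(offset // block_seconds)
        blocks.setdefault(block, []).append(word.lower())

    tables = {}
    for block, words in blocks.items():
        tables[block] = {
            n: Counter(ngrams(words, n)) for n in range(1, max_n + 1)
        }
    return tables
```

Calling `ngram_tables` on a transcript yields one set of tables per 10-minute block, so `tables[0][1]` holds single-word counts for the first ten minutes and `tables[0][2]` holds two-word phrase counts for the same block.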

Some initial research using these ngrams

How can researchers use this kind of data to gain new insights into radio news?

The graph below looks at pronoun usage on BBC Radio 4 FM, comparing the percentage of words spoken each day that were either “we” words (“we”, “us”, “our”, “ours”, “ourselves”) or “me” words (“i”, “me”, “i’m”). “Me” words are used more than twice as often as “we” words, but look closely at February 2020: as the pandemic began sweeping the world, “we” words start increasing as governments adopted language emphasizing togetherness.
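The daily percentages behind a comparison like this could be computed along the following lines. This is a hedged sketch rather than the code used for the graph: it assumes the 1-gram counts have already been summed per day into a plain dictionary, and the function name and input layout are hypothetical.

```python
WE_WORDS = {"we", "us", "our", "ours", "ourselves"}
ME_WORDS = {"i", "me", "i'm"}


def pronoun_shares(daily_counts):
    """Return {day: (we_percent, me_percent)} of all words spoken that day.

    daily_counts is an assumed layout: {day: {word: count}} built from
    1-gram tables. Words are lowercased and curly apostrophes are
    normalized so that "I’m" matches "i'm".
    """
    shares = {}
    for day, counts in daily_counts.items():
        total = we = me = 0
        for word, count in counts.items():
            key = word.lower().replace("\u2019", "'")
            total += count
            if key in WE_WORDS:
                we += count
            elif key in ME_WORDS:
                me += count
        if total:  # skip days with no transcribed words
            shares[day] = (100 * we / total, 100 * me / total)
    return shares
```

Plotting the two values per day over time would give curves comparable in spirit to the orange and blue lines described here, subject to the transcription errors noted later in the post.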

“We” (orange) vs. “Me” (blue) words on BBC Radio 4 FM, showing increase of “we” words beginning in February 2020 as Covid-19 progresses

TV vs. Radio

Combined with the television news ngrams that I previously created, it is possible to compare how topics are being covered across television and radio.

The graph below compares the percentage of spoken words that mentioned Covid-19 since the start of this year across BBC News London (television) versus radio programming on BBC World Service (international focus) and BBC Radio 4 FM (domestic focus).

All three show double surges at the start of the year as the pandemic swept across the world, a peak in early April and then a decrease since. Yet BBC Radio 4 appears to have mentioned the pandemic far less than the internationally-focused BBC World Service, though the two are now roughly equal even as the pandemic has continued to spread. Overall, television news has emphasized Covid-19 more than radio.

Covid-19 mentions on Television vs. Radio. The chart compares BBC News London (TV) in blue, versus BBC World Service (Radio) in orange and BBC Radio 4 FM (Radio) in grey.

For now, you can download the entire dataset to explore on your own computer, but there will also be an interactive visualization and analysis interface available sometime in mid-spring.

It is important to remember that these transcripts are generated through computer speech recognition, so they are imperfect and do not properly recognize all words or names, especially rare or novel terms like “Covid-19.” Some experimentation may be required to yield the best results.

The graphs above just barely scratch the surface of the kinds of questions that can now be explored through the new radio news ngrams, especially when coupled with television news and 152-language online news ngrams.

From transcribing 3 million radio broadcasts into ngrams to describing a decade of television news frame by frame, cataloging the objects and activities of half a billion online news images, to inventorying the tens of billions of entities and relationships in half a decade of online journalism, it is becoming increasingly possible to perform multimodal analysis at the scale of entire archives.

Researchers can ask questions that for the first time simultaneously look across audio, video, imagery and text to understand how ideas, narratives, beliefs and emotions diffuse across mediums and through the global news ecosystem. Helping to seed the future of such at-scale research, the Internet Archive and GDELT are collaborating with a growing number of media archives and researchers through the newly formed Media Data Research Consortium to better understand how critical public health messaging is meeting the challenges of our current global pandemic.

About Kalev Leetaru

For more than 25 years, GDELT’s creator, Dr. Kalev H. Leetaru, has been studying the web and building systems to interact with and understand the way it is reshaping our global society. One of Foreign Policy Magazine’s Top 100 Global Thinkers of 2013, his work has been featured in the presses of over 100 nations and fundamentally changed how we think about information at scale and how the “big data” revolution is changing our ability to understand our global collective consciousness.