Author Archives: Alexis Rossi

Great Books by Women Authors

On March 8th, the New York Public Library’s Gwen Glazer published a wonderful list of books in celebration of International Women’s Day: 365 Books by Women Authors to Celebrate International Women’s Day All Year.

In the spirit of continuing to celebrate female authors past the confines of Women’s History Month, we’ve gathered some of these books into a special collection called Great Books by Women Authors to make it easier to find your next exceptional read. You will also find these books via Open Library as listed below. Happy reading!

Great Books by Women Authors
Leila Aboulela, The Kindness of Enemies
Susan Abulhawa, The Blue Between Sky and Water
Chimamanda Ngozi Adichie, Half of a Yellow Sun
Anna Akhmatova, The Complete Poems of Anna Akhmatova
Michelle Alexander, The New Jim Crow
Svetlana Alexievich, Voices From Chernobyl
Clare Allan, Poppy Shakespeare
Sarah Addison Allen, Lost Lake
Isabel Allende, Eva Luna
Karin Altenberg, Island of Wings
Julia Alvarez, In the Time of the Butterflies
Tahmima Anam, The Good Muslim
Natacha Appanah, The Last Brother
Chloe Aridjis, Asunder
Bridget Asher, All of Us and Everything
Margaret Atwood, Oryx and Crake
Jane Austen, Pride and Prejudice
Mariama Bâ, Scarlet Song
Toni Cade Bambara, Those Bones Are Not My Child
Gioconda Belli, The Inhabited Woman
Karen Bender, Refund
Elizabeth Bishop, Geography III
Katherine Boo, Behind the Beautiful Forevers
Charlotte Bronte, Jane Eyre
Emily Bronte, Wuthering Heights
Gwendolyn Brooks, The Bean Eaters
Lauren Beukes, The Shining Girls
NoViolet Bulawayo, We Need New Names
Judith Butler, Gender Trouble: Feminism and the Subversion of Identity
Leonora Carrington, The Hearing Trumpet
Theresa Hak Kyung Cha, Dictee
Susan Choi, American Woman
Kate Chopin, The Awakening
Sonya Chung, Long for This World
Caryl Churchill, Top Girls
Lucille Clifton, Mercy
Simin Daneshvar, Sutra & Other Stories
Tsitsi Dangarembga, Nervous Conditions
Edwidge Danticat, Claire of the Sea Light
Meghan Daum, The Unspeakable
Dola de Jong, The Tree and the Vine
Grazia Deledda, After the Divorce
Anita Desai, Clear Light of Day
Emily Dickinson, The Poems of Emily Dickinson
Joan Didion, Democracy
Rita Dove, On the Bus With Rosa Parks
Yasmine El Rashidi, Chronicle of a Last Summer
Nawal El Saadawi, Woman at Point Zero
George Eliot, Middlemarch
Buchi Emecheta, The Joys of Motherhood
Leslie Feinberg, Stone Butch Blues
Elena Ferrante, My Brilliant Friend
Penelope Fitzgerald, The Blue Flower
Paula Fox, Desperate Characters
Lauren Francis-Sharma, Til the Well Runs Dry
Ru Freeman, On Sal Mal Lane
Rivka Galchen, Atmospheric Disturbances
Mary Gaitskill, The Mare
Petina Gappah, The Book of Memory
Elena Garro, First Love & Look for My Obituary
Louise Gluck, Faithful and Virtuous Night
Nadine Gordimer, The Conservationist
Jorie Graham, Erosion
Linda LeGarde Grover, The Dance Boots
Paula Gunn Allen, America the Beautiful: Last Poems
Marilyn Hacker, Names
Radclyffe Hall, The Well of Loneliness
Lorraine Hansberry, A Raisin in the Sun
Eve Harris, The Marrying of Chani Kaufman
Saidiya Hartman, Lose Your Mother: A Journey Along the Atlantic Slave Route
Shirley Hazzard, The Transit of Venus
Bessie Head, The Collector of Treasures
Amy Hempel, Reasons to Live
Cristina Henriquez, The Book of Unknown Americans
Christine Dwyer Hickey, The Cold Eye of Heaven
Patricia Highsmith, The Price of Salt
Arlie Hochschild, The Second Shift
Alice Hoffman, Survival Lessons
Sara Sue Hoklotubbe, Deception on All Accounts
bell hooks, Feminism is for Everybody: Passionate Politics
Keri Hulme, The Bone People
Dương Thu Hương, Paradise of the Blind
Hồ Xuân Hương, Spring Essence
Ulfat Idilbi, Grandfather’s Tale
Elfriede Jelinek, Women As Lovers
Han Kang, The Vegetarian
Mary Karr, The Liars’ Club
Kazue Kato, Blue Exorcist
Rupi Kaur, Milk and Honey
Porochista Khakpour, The Last Illusion
Vénus Khoury-Ghata, A House at the Edge of Tears
Suki Kim, Without You, There Is No Us
Jamaica Kincaid, See Now Then
Barbara Kingsolver, The Poisonwood Bible
Maxine Hong Kingston, The Woman Warrior
Natsuo Kirino, Out
Sana Krasikov, One More Year
Jean Kwok, Girl in Translation
Jhumpa Lahiri, The Lowland
Laila Lalami, Secret Son
Nella Larsen, Passing
Adrian Nicole LeBlanc, Random Family
Harper Lee, To Kill A Mockingbird
Yiyun Li, Kinder Than Solitude
Gloria Lisé, Departing at Dawn
Clarice Lispector, The Hour of the Star
Inverna Lockpez, Cuba: My Revolution
Alia Mamdouh, The Loved Ones
Dacia Maraini, The Silent Duchess
Ronit Matalon, The Sound of Our Steps
Ayana Mathis, The Twelve Tribes of Hattie
Eimear McBride, A Girl Is a Half-Formed Thing
Carson McCullers, The Heart is a Lonely Hunter
Claire Messud, The Woman Upstairs
Ai Mi, Under the Hawthorn Tree
Gabriela Mistral, Selected Poems of Gabriela Mistral
Nadifa Mohamed, Black Mamba Boy
Lorrie Moore, Bark
Marianne Moore, The Poems of Marianne Moore
Toni Morrison, Sula
Bharati Mukherjee, The Tree Bride
Alice Munro, Family Furnishings
Iris Murdoch, A Severed Head
Eileen Myles, School of Fish
Azar Nafisi, The Republic of Imagination: America in Three Books
Celeste Ng, Everything I Never Told You
Hualing Nieh, Mulberry and Peach
Sara Nović, Girl at War
Adaobi Tricia Nwaubani, I Do Not Come to You by Chance
Silvina Ocampo, Thus Were Their Faces
Nnedi Okorafor, Binti
Julie Otsuka, The Buddha in the Attic
Helen Oyeyemi, Mr. Fox
Ruth Ozeki, All Over Creation
Cynthia Ozick, Foreign Bodies
ZZ Packer, Drinking Coffee Elsewhere
Grace Paley, The Little Disturbances of Man
Suzan-Lori Parks, Topdog/Underdog
Shahrnush Parsipur, Kissing the Sword
Ann Patchett, Bel Canto
Anna Politkovskaya, A Russian Diary
Katha Pollitt, Pro: Reclaiming Abortion Rights
Claudia Rankine, Citizen
Alifa Rifaat, Distant View of a Minaret and Other Stories
Suzanne Rivecca, Death Is Not An Option
Riverbend, Baghdad Burning
Arundhati Roy, The God of Small Things
Vedrana Rudan, Night
Sonia Sanchez, Does Your House Have Lions?
Sappho, The Complete Works of Sappho
Noo Saro-Wiwa, Looking for Transwonderland: Travels in Nigeria
Åsne Seierstad, The Angel of Grozny
Anne Sexton, The Complete Poems of Anne Sexton
Murasaki Shikibu, The Tale of Genji
Kyung-sook Shin, Please Look After Mom
Sei Shonagon, The Pillow Book
Ana María Shua, The Weight of Temptation
Leslie Marmon Silko, Almanac of the Dead
Tracy K. Smith, Life on Mars
Betty Smith, A Tree Grows in Brooklyn
Marivi Soliven, The Mango Bride
Rebecca Solnit, A Field Guide to Getting Lost
Susan Sontag, Styles of Radical Will
Ahdaf Soueif, The Map of Love
Gertrude Stein, Fernhurst, Q.E.D., and Other Early Writings
Aoibheann Sweeney, Among Other Things, I’ve Taken Up Smoking
Elizabeth Crane, When the Messenger Is Hot
Amy Tan, The Valley of Amazement
Valerie Taylor, The Girls in 3-B
Lygia Fagunda Telles, The Girl in the Photograph
Lynne Tillman, No Lease on Life
Dubravka Ugresic, Thank You For Not Reading
Chika Unigwe, On Black Sisters Street
Kirstin Valdez Quade, Night at the Fiestas
Jean Valentine, Little Boat
Lara Vapnyar, There Are Jews in My House
Marja-Liisa Vartio, The Parson’s Widow
Josefina Vicens, The Empty Book
Alice Walker, The Color Purple
Sarah Waters, Fingersmith
Eudora Welty, The Optimist’s Daughter
Phillis Wheatley, The Poetry of Phillis Wheatley
Zoe Wicomb, You Can’t Get Lost in Cape Town
Joy Williams, The Visiting Privilege
G. Willow Wilson, Ms. Marvel
Virginia Woolf, Orlando
Alexis Wright, Carpentaria
Sarah E. Wright, This Child’s Gonna Live
Tiphanie Yanique, Land of Love and Drowning
Samar Yazbek, Cinnamon
Banana Yoshimoto, Kitchen
Haifa Zangana, Dreaming of Baghdad

Radio Ngrams Dataset Allows New Research into Public Health Messaging

Guest post by Dr. Kalev Leetaru

Radio remains one of the most-consumed forms of traditional media today, with 89% of Americans listening to radio at least once a week as of 2018, a number that is actually increasing during the pandemic. News is the most popular radio format and 60% of Americans trust radio news to “deliver timely information about the current COVID-19 outbreak.”

Local talk radio is home to a diverse assortment of personality-driven programming that offers unique insights into the concerns and interests of citizens across the nation. Yet radio has remained stubbornly inaccessible to scholars due to the technical challenges of monitoring and transcribing broadcast speech at scale.

Debuting this past July, the Internet Archive’s Radio Archive uses automatic speech recognition technology to transcribe a vast collection of daily news and talk radio programming, dating back to 2016, into searchable text. It continues to archive and transcribe a selection of stations through the present, making them browsable and keyword searchable.

Ngrams data set

Building on this incredible archive, the GDELT Project and I have transformed it into a research dataset of radio news ngrams spanning 26 billion English-language words across portions of 550 stations, from 2016 to the present.

You can keyword search all 3 million shows, but for researchers interested in diving into the deeper linguistic patterns of radio news, the new ngrams dataset includes 1- to 5-grams at 10-minute resolution, covering all four years and updated every 30 minutes. For those less familiar with the concept, “ngrams” are word frequency tables: the transcript of each broadcast is broken into words, and for each station and each 10-minute block of airtime, a list is compiled of every word spoken in those 10 minutes and how many times it was mentioned. Bigrams through 5-grams do the same for sequences of two to five consecutive words.
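To make that description concrete, here is a minimal Python sketch of how per-station, 10-minute ngram tables could be built. It is not GDELT’s actual pipeline: the input shape (station, timestamp, transcript text), the lowercase whitespace tokenization, and the function names are all assumptions for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime


def ngrams(words: list[str], n: int) -> list[str]:
    """Return every run of n consecutive words, joined by spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def block_start(ts: datetime, minutes: int = 10) -> datetime:
    """Round a timestamp down to the start of its 10-minute block."""
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)


def count_ngrams(segments, n: int = 1) -> dict:
    """Build one frequency table per (station, 10-minute block).

    segments: iterable of (station, timestamp, transcript_text) tuples.
    Assumptions: tokens are lowercase whitespace-separated words, and
    ngrams are not joined across segment boundaries.
    """
    tables = defaultdict(Counter)
    for station, ts, text in segments:
        words = text.lower().split()
        tables[(station, block_start(ts))].update(ngrams(words, n))
    return tables
```

Calling `count_ngrams(segments, n=1)` gives the single-word tables described above; `n=2` through `n=5` give the phrase tables.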

Some initial research using these ngrams

How can researchers use this kind of data to gain new insights into radio news?

The graph below looks at pronoun usage on BBC Radio 4 FM, comparing the percentage of words spoken each day that were either “we” words (“we”, “us”, “our”, “ours”, “ourselves”) or “me” words (“i”, “me”, “i’m”). “Me” words are used more than twice as often as “we” words, but look closely at February 2020: as the pandemic began sweeping the world, “we” words started increasing as governments began adopting language that emphasized togetherness.
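The “we” vs. “me” comparison boils down to simple arithmetic on the 1-gram tables: merge each day’s 10-minute tables, then divide the pronoun counts by that day’s total word count. A minimal Python sketch, assuming the tables are available as `Counter` objects keyed by block start time (a hypothetical in-memory layout, not the dataset’s file format) and that transcripts spell “i'm” with a straight apostrophe:

```python
from collections import Counter, defaultdict
from datetime import date, datetime

WE_WORDS = {"we", "us", "our", "ours", "ourselves"}
ME_WORDS = {"i", "me", "i'm"}  # assumes a straight apostrophe in transcripts


def aggregate_by_day(block_counts: dict[datetime, Counter]) -> dict[date, Counter]:
    """Merge 10-minute 1-gram tables into one table per calendar day."""
    days = defaultdict(Counter)
    for start, counts in block_counts.items():
        days[start.date()].update(counts)
    return days


def pronoun_shares(day_counts: Counter) -> tuple[float, float]:
    """Return (% of words that are "we" words, % that are "me" words)."""
    total = sum(day_counts.values())
    if total == 0:
        return 0.0, 0.0
    we = sum(day_counts[w] for w in WE_WORDS)
    me = sum(day_counts[w] for w in ME_WORDS)
    return 100 * we / total, 100 * me / total
```

Running `pronoun_shares` over each day’s merged table yields two percentage series of the kind plotted in the graph.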

“We” (orange) vs. “Me” (blue) words on BBC Radio 4 FM, showing increase of “we” words beginning in February 2020 as Covid-19 progresses

TV vs. Radio

Combined with the television news ngrams that I previously created, it is possible to compare how topics are being covered across television and radio.

The graph below compares the percentage of spoken words that mentioned Covid-19 since the start of this year across BBC News London (television) versus radio programming on BBC World Service (international focus) and BBC Radio 4 FM (domestic focus).

All three show double surges at the start of the year as the pandemic swept across the world, a peak in early April, and a decrease since. Yet BBC Radio 4 appears to have mentioned the pandemic far less than the internationally focused BBC World Service, though the two are now roughly equal even as the pandemic has continued to spread. Overall, television news has emphasized Covid-19 more than radio.

Covid-19 mentions on Television vs. Radio. The chart compares BBC News London (TV) in blue, versus BBC World Service (Radio) in orange and BBC Radio 4 FM (Radio) in grey.

For now, you can download the entire dataset to explore on your own computer, but an interactive visualization and analysis interface will also be available sometime in mid-Spring.

It is important to remember that these transcripts are generated through computer speech recognition, so they are imperfect: they do not properly recognize all words or names, especially rare or novel terms like “Covid-19.” Some experimentation may be required to yield the best results.

The graphs above just barely scratch the surface of the kinds of questions that can now be explored through the new radio news ngrams, especially when coupled with television news and 152-language online news ngrams.

From transcribing 3 million radio broadcasts into ngrams to describing a decade of television news frame by frame, cataloging the objects and activities of half a billion online news images, to inventorying the tens of billions of entities and relationships in half a decade of online journalism, it is becoming increasingly possible to perform multimodal analysis at the scale of entire archives.

Researchers can ask questions that for the first time simultaneously look across audio, video, imagery and text to understand how ideas, narratives, beliefs and emotions diffuse across mediums and through the global news ecosystem. Helping to seed the future of such at-scale research, the Internet Archive and GDELT are collaborating with a growing number of media archives and researchers through the newly formed Media Data Research Consortium to better understand how critical public health messaging is meeting the challenges of our current global pandemic.

About Kalev Leetaru

For more than 25 years, GDELT’s creator, Dr. Kalev H. Leetaru, has been studying the web and building systems to interact with and understand the way it is reshaping our global society. Named one of Foreign Policy Magazine’s Top 100 Global Thinkers of 2013, he has had his work featured in the presses of over 100 nations, and it has fundamentally changed how we think about information at scale and how the “big data” revolution is changing our ability to understand our global collective consciousness.

January 1st brings public domain riches from 1925

On January 1st, 2021, many books, movies and other media from 1925 will enter the public domain in the United States. Some of them are quite famous — jump ahead to see lists of those well known books and movies that you can enjoy on the Internet Archive — or take the scenic route with me.

Book cover: Mrs. Dalloway by Virginia Woolf

What does this all mean? Essentially, many items created in 1925 in the US that are still under copyright will become free and open for people to use in any way they see fit in the new year. But check out Duke Law’s Center for the Study of the Public Domain article for a more in-depth explanation.

We have a party every year to celebrate the new works entering the public domain, and this year is no exception. Join us on Thursday, Dec. 17th to toast these newly available additions.

Traveling from Home

As part of this yearly ritual, I explore our collections to unearth these newly freed items, and I invariably run across a few things that hit a nerve. This year, it started with this intertitle in “Isn’t Life Terrible?” Less than 20 seconds into this 1925 film, and suddenly I’m dumped back into 2020.

Silent film intertitle that reads, "Charley Chase as The poor young man with only two places to go -- Front yard and back yard"

Rude, right? I don’t even have a front yard to enjoy during shelter in place.

But the magic of media is that it can transport us to different places and times. Photo books like Picturesque Italy, Picturesque Mexico, and Picturesque Palestine, Arabia and Syria show us both how much and how little has changed in the past 95 years.

Screen shot thumbnail images from the book Picturesque Italy. The 12+ photos feature tourist sites in Venice, Italy like the Doges Palace, the Bridge of Sighs, and Piazza San Marco.

Gondolas still glide under the Bridge of Sighs, and the Tower of Pisa is still leaning, but the 1925 version of the Colosseum certainly lacks today’s fake gladiator photo ops.

Looking at the past with the eyes of today

Every toe dipped into the past has the potential to surprise or shock. The story of a pantry shelf, an outline history of grocery specialties, is only mildly interesting on the surface. Essentially, it’s a sales pitch to food manufacturers encouraging them to advertise in a set of women’s magazines. The book contains short case histories of successful food brands like Maxwell House Coffee, Campbell Soup, Coca-Cola, etc. (all of whom advertise with them, naturally).

The book gives you a glimpse of why people were so enthusiastic about mass-produced, packaged foods. Unsanitary conditions, bugs in your sugar, milk going bad overnight: these are things modern shoppers never think about.

It puts this glowing praise of Kraft Cheese into perspective: “…a pasteurized product, blended to obtain a uniformity of quality and flavor, a thing greatly lacking in ordinary types of cheese.” (page 149)

That’s pretty entertaining if you’re a cheese lover. I think most people would agree that Kraft cheese is no longer on the cutting edge.

But keep poking around and you find a much deeper cultural divergence. While The story of a pantry shelf is extolling the virtues of the home economics training available at Cornell, you stumble across this horrifying sentence (page 12).

Passage from "The Story of a Pantry Shelf" which reads, "Indeed, the Practice House, where students learn housekeeping in its every phase, even includes the complete care of a baby, adopted each year by Cornell for the benefit of these 'mothers' who, under the direction of trained Home Economics women, feed, bathe, dress and tend an infant from the tender age of two weeks throughout the session."

I was not expecting to read about orphaned babies being used as “learning aids” while flipping through stories about Jell-O. Intellectually, I know that attitudes towards children have changed over the years — the Fair Labor Standards Act, which set federal standards for child labor, wasn’t even passed until 1938. But this casual aside tossed in amongst the marketing hype still packs an emotional punch. It’s important to remember how far we have come.

Even writing that was forward-thinking for the time, like the booklet Homo-sexual life, is terribly backward by today’s standards. It’s from the Little Blue Book series; we have many that were published in 1925, and the publisher kept them coming for many years. The series provided working-class people with inexpensive access to all kinds of topics including philosophy, sexuality, science, religion, law, and government. After WWII, they published criticism of J. Edgar Hoover, and the founder was subsequently targeted by the FBI for tax evasion. But in 1925, they were going strong, and one of their prolific writers was Clarence Darrow.

Controversies of the Age

Darrow was writing about prohibition for the Little Blue Book series in 1925, but that is also the year he defended John T. Scopes for teaching evolution in his Tennessee classroom. The Scopes Trial generated a huge amount of publicity, pitting religion against science, and even giving rise to popular songs like these two 78rpm recordings from 1925.

The John T. Scopes Trial (The Old Religion’s Better After All) by Vernon Dalhart and Company

Monkey Biz-ness (Down in Tennessee) by International Novelty Orchestra with Billy Murray


Like the Scopes trial, prohibition had its passionate adherents and detractors. This was the “Roaring 20s” — the year The Great Gatsby was published — with speakeasies and flappers and iconic cocktails. And yet the pro-prohibition silent film Episodes in the Life of a Gin Bottle follows a bottle around as it lures people into a state of dissolution.

We even see an entire book about throwing parties that includes no alcoholic beverages at all.

The more things change, the more they stay the same

But as much as some things have changed, other aspects of our lives remain unchanged. People still want to tell you about their pets, rely on self help books, read stories to their kids, follow celebrities, tell each other jokes, and make silly videos.

And the most unchanging part of this particular season, of course — children still anticipate the arrival of Santa Claus with questions, wishes and schemes.

The silent film Santa Claus features two children who want to know where Saint Nick lives and how he spends his time. We follow him to the North Pole (Alaska in disguise) to see Santa’s workshop, snow castle, reindeer, and friends and neighbors. Jack Frost, introduced around 14:20, appears to be wearing the prototype for Ralphie’s bunny suit in “A Christmas Story” (but with a magic wand). Stick around for the sleigh crash at 20:45, and right around 22:20 Santa wipes out on the ice.

And just in case you’re still doing your holiday shopping, I feel like I should pass on a recommendation from this ad in a 1925 issue of The Billboard: Armadillo Baskets make beautiful Christmas gifts. And you can still buy vintage versions online – trust me, I looked. You’re welcome.

Advertisement with a picture of an armadillo and a basket made from an armadillo. Text reads, "Armadillo Baskets Make Beautiful Christmas Gifts. From these nine-banded horn-shelled little animals we make beautiful baskets. We are the original dealers in Armadillo Baskets. We take their shells, polish them, and then line with silk. They make ideal work baskets, etc. Let us tell you about these unique baskets. Write for Free Booklet. Apelt Armadillo Co., Comfort, Texas."

The Famous Stuff

And now on to the blockbusters of 1925…

Books First Published in 1925

Movies Released in 1925

Juneteenth – Freedom Day

The Emancipation Proclamation went into effect on January 1st, 1863, legally freeing 3.5 million enslaved people in the Confederate states. But of course, this executive order from President Abraham Lincoln came in the midst of the United States Civil War, which didn’t end until April of 1865, so in much of the Confederacy the order could not be enforced until the war was over.

Juneteenth celebrates when enslaved people actually became free in 1865. The date, June 19th, commemorates General Gordon Granger of the Union Army announcing the executive order in Galveston, Texas, freeing all enslaved people in Texas.

Community access TV stations around the country have shown local celebrations of Juneteenth for years, and we thought this 2013 talk by Dr. Shennette Garrett-Scott at the Allen Public Library in Texas (via Allen City TV) was particularly helpful in understanding the history of this important day.

More resources:

Pretend you’re here with Internet Archive Zoom backgrounds

Have you seen these gorgeous library backgrounds you can use to pretend you’re amongst the smell of old books and hushed page turning?

When I saw them I got a little jealous and thought, “computers are just as soothing!” So without further ado, welcome to your Internet Archive virtual Zoom backgrounds.

We’ve got a pretty majestic building you could sit in front of. There’s free wifi.

Or you can come inside and sit in the Great Room with us, stained glass dome and all.

Sit quietly amongst the pews with our little Internet Archivist sculptures by Nuala Creed.

Or have them be your backup dancers / Greek chorus on all your calls.

You can sit amongst the films waiting to be digitized.

Or pretend to be digitizing them yourself.

Scan books seated in front of a Table Top Scribe.

Or sit with the constant hum of busy servers in the background.

Happy Pi(e) Day

In honor of the esteemed mathematical constant, we invite you to celebrate Pi Day with us!

If you’re a math geek, we have you covered:

If your mathematical knowledge could use a little refresher, maybe try this one instead:
Sir Cumference and the Dragon of Pi: A Math Adventure.

You could listen to multiple people recite the first 50 digits of pi in various styles, including to the tune of the Battle Hymn of the Republic (my personal favorite), in the voice of Bullwinkle, as an infomercial, in Latin, while laughing, in Morse Code, and while eating actual pie.

If you’re just obsessive, here’s

Have insomnia? Listen to the first 1,000 digits of pi for 9.5 minutes straight… problem solved!

But if you’d rather celebrate by eating pie, we can help you make one! The Best Title Award definitely goes to Pies and Tarts with Schmecks Appeal by the inimitable Edna Staebler. A close second goes to Tarts with Tops On by Tamasin Day-Lewis. But take your pick from a wide array of pie cookbooks to find the right one for you.

And most importantly, we wish you infinite pi(e).

A Love Letter to the People Who Build the Internet Archive

When you visit a public library, you get to meet the librarians and others who build and care for those collections. You know there are people who empty the garbage cans, who put back the borrowed books, who maintain the computers, and who determine what ends up on the shelf.

A digital library, on the other hand, is “just” a web site.  You don’t really see the people who build it – we are often anonymous. But the Internet Archive wasn’t built by computers and algorithms.

From its inception, the Internet Archive has been built by thousands of people who understand that we have an opportunity to use the Internet to give everyone access to knowledge. Every person on the planet should have the opportunity to learn and to make a contribution.

This goal – Universal Access to All Knowledge – inspires the people who have built the Internet Archive over the past 23 years.  

People clean and repair the buildings that we occupy. People do payroll, choose our health plans, answer the phones, plan our events, reply to user emails, clean up spam, and pay our bills. People design and build the computers that hold the collections. People construct the network that carries data to every corner of the world. People write software that processes, backs up, and delivers files. People design and test and build interfaces. People digitize analog media and type in metadata. People curate collections, establish collaborations, and manage projects.


There’s no way I can mention all of these people by name. Even if I listed every employee from the past 23 years, I would still be missing the volunteers, the people from other organizations who worked on joint projects with us, the pro bono lawyers, the delightfully compulsive collectors, the funding organizations, the idea generators, our sounding boards for crazy ideas, the individuals who have donated money or materials, and the hundreds of thousands of people who have uploaded media into the archive.


Libraries are built by people, for people. Thank you so much to all of the people who have contributed to building the Internet Archive, whether they were employees or part of our huge group of friends and family. We would not be here without you, and we hope you will continue to help bring universal access to all knowledge in the future.

Happy Valentine’s Day!



Want to read like a celebrity?

Apparently you’re not alone. I ran across a list of celebrities’ favorite books and thought you might like to check out a few. (See what I did there? Librarian pun.) Happy reading!

Anna Kendrick
All Quiet on the Western Front by Erich Maria Remarque
Slaughterhouse-Five by Kurt Vonnegut
The Things They Carried by Tim O’Brien

Bill Murray
Huckleberry Finn by Mark Twain
A Story Like the Wind by Laurens Van Der Post
A Far Off Place by Laurens Van Der Post
The Plague by Albert Camus

Bill Murray
(photo by Georges Biard, CC BY-SA 3.0, from Wikimedia Commons)

Emma Watson
Le Petit Prince by Antoine de Saint-Exupéry

Olivia Munn
Replay by Ken Grimwood

Michelle Obama
Song of Solomon by Toni Morrison

Kit Harington
1984 by George Orwell

Dolly Parton
The Little Engine That Could by Watty Piper
(And check out Dolly Parton’s Imagination Library, which gives free books to kids!)

Dolly Parton
(photo by Josef Just [CC BY-SA 3.0], from Wikimedia Commons)

Robin Williams
Foundation trilogy by Isaac Asimov (or individually at 1, 2, 3)

Daniel Radcliffe
The Master and Margarita by Mikhail Bulgakov

Rachel McAdams
When You Are Engulfed in Flames by David Sedaris

Zooey Deschanel
A Supposedly Fun Thing I’ll Never Do Again by David Foster Wallace

Donald Glover
The Curious Incident of the Dog in the Night-Time by Mark Haddon
Extremely Loud and Incredibly Close by Jonathan Safran Foer

Donald Glover
(photo by NASA/Bill Ingalls [Public domain], via Wikimedia Commons)

Alec Baldwin
The Phantom Tollbooth by Norton Juster

Hillary Clinton
The Brothers Karamazov by Fyodor Dostoyevsky
Runaway by Alice Munro

Jessica Biel
Tender Is the Night by F. Scott Fitzgerald

Chelsea Handler
Mawson’s Will by Lennard Bickel
One Thousand White Women by Jim Fergus
Anna Karenina by Leo Tolstoy

Keira Knightley
The Passion by Jeanette Winterson

J. K. Rowling
The Woman Who Walked Into Doors by Roddy Doyle

Halle Berry
Some Love, Some Pain, Sometime by J. California Cooper

Jamie Chung
The Orphan Master’s Son by Adam Johnson

Jamie Chung
(photo by David Shankbone [CC BY 3.0], from Wikimedia Commons)

Jennifer Lawrence
The Catcher in the Rye by J. D. Salinger
Raise High the Roof Beam, Carpenters; and Seymour by J. D. Salinger

Lady Gaga
Letters to a Young Poet by Rainer Maria Rilke

Jon Hamm
Arcadia by Tom Stoppard

Cher
Music for Chameleons by Truman Capote
Stranger in a Strange Land by Robert A. Heinlein

Kesha
Still Life with Woodpecker by Tom Robbins

Anne Hathaway
The Secret Garden by Frances Hodgson Burnett

Zoe Saldana
Rita Hayworth and Shawshank Redemption by Stephen King

Zoe Saldana
(photo by Gage Skidmore [CC BY-SA 3.0], from Wikimedia Commons)


George R. R. Martin
The Lord of the Rings by J. R. R. Tolkien

Matt Damon
A People’s History of the United States by Howard Zinn

Nas
Pryor Convictions by Richard Pryor

Natalie Portman
Cloud Atlas by David Mitchell

Bill Gates
The Better Angels of Our Nature by Steven Pinker

Joan Didion
Victory by Joseph Conrad

Making Out-of-Print Pre-1942 books available with “Last 20” provision

About a year and a half ago, the Internet Archive launched a collection of older books that were determined to qualify for the “Last 20” provision in Copyright Law, also known as Section 108(h) for the lawyers. As I understand this provision, it states that published works in the last twenty years of their copyright term may be digitized and distributed by libraries, archives and museums under certain circumstances. At the time, the small number of books that went into the collection were hand-researched by a team of legal interns. As you can imagine, this is a process that would be difficult to perform one-by-one for a large and ever-growing corpus of works.
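The date side of the “Last 20” test is simple arithmetic. Here is a minimal Python sketch, assuming a 95-year term counted from publication and running through December 31 of its final year (the same rule under which 1925 works enter the public domain on January 1st, 2021). It covers only the date window, not the other circumstances Section 108(h) attaches, such as whether a copy can still be obtained at a reasonable price. The function names are hypothetical.

```python
TERM_YEARS = 95  # assumption: 95-year term counted from publication


def last_20_window(pub_year: int, term_years: int = TERM_YEARS) -> tuple[int, int]:
    """First and last calendar years of the final 20 years of the term.

    Assumes the term ends on December 31 of (pub_year + term_years).
    """
    final_year = pub_year + term_years
    return final_year - 19, final_year


def in_last_20(pub_year: int, current_year: int, term_years: int = TERM_YEARS) -> bool:
    """True if current_year falls inside the work's last 20 years of term."""
    first, last = last_20_window(pub_year, term_years)
    return first <= current_year <= last
```

For example, `last_20_window(1925)` gives `(2001, 2020)`, and under these assumptions a year like 2017 covers works published in 1941 but not yet those from 1942.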

So we set out to automate it. Amazon has an API with book information, so I figured that with a little data massaging it shouldn’t be too hard to build a piece of software to do that job for us. Pull the metadata from our MARC (MAchine-Readable Cataloging) records, send it to Amazon, and presto!

I was wrong. It was hard.

Library Catalog Names are different from Booksellers’ Names

Library-generated metadata is often very detailed, which leads to problems when we try to match the metadata provided by librarians to the metadata used on consumer-oriented web sites. For example, an author listed in a MARC record might appear as 

Purucker, G. de (Gottfried), 1874-1942

But when you look on Amazon, that same author appears as 

G. de Purucker

If we search Amazon for the full author string from the MARC record (including full name and birth and death dates), we may miss potential matches. And this is just one simple example. We have to transform every author field we get from MARC using a set of rules that may continue to expand as we find new problems to solve. Here are the current rules just for transforming this one field:

General rules for transforming MARC author to Amazon author:

  • Maintain all accented or non-Roman characters as-is
  • If there are no commas, semicolons or parentheses in the string, use the whole string as-is
  • If there are no commas in the string, but there are semicolons and/or parentheses, use anything before the first semicolon or parenthesis as the entire author string
  • If there are commas in the string:
    • Everything before the first comma should be used as the author’s last name
    • Everything after the first comma but BEFORE any of these should be used as the author’s first name:
      • comma [ , ],
      • semicolon [ ; ],
      • open parentheses [ ( ]
      • any number [0-9]
      • end of string
    • Remaining information should be discarded
  • Periods [ . ], apostrophes [ ' ], and other symbols should not be used to delimit any name and should be kept as-is in the transformed string.
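To make the rules above concrete, here is a minimal Python sketch of them. It is our illustration written from the rule list, not the Archive's actual code; the function name `marc_author_to_amazon` is hypothetical, and it assumes the output should put the first name before the last name, as in the Amazon example above.

```python
import re

# Matches the first-name portion: everything up to (not including) a comma,
# semicolon, open parenthesis, or any digit 0-9. Can match the empty string.
_FIRST_NAME = re.compile(r"[^,;(0-9]*")

def marc_author_to_amazon(marc_author: str) -> str:
    """Turn a MARC-style author heading into a bookseller-style name (sketch)."""
    s = marc_author.strip()
    if "," not in s:
        # No commas: keep everything before the first semicolon or parenthesis.
        # If neither is present, this keeps the whole string unchanged.
        return re.split(r"[;()]", s, maxsplit=1)[0].strip()
    # Everything before the first comma is the last name.
    last, rest = s.split(",", 1)
    # Everything after it, up to the next delimiter or end of string, is the first name.
    first = _FIRST_NAME.match(rest).group(0)
    # Whatever follows (fuller names, dates, etc.) is discarded.
    # Periods, apostrophes, accents and non-Roman characters pass through untouched.
    return f"{first.strip()} {last.strip()}".strip()
```

For example, `marc_author_to_amazon("Purucker, G. de (Gottfried), 1874-1942")` returns `"G. de Purucker"`, the form Amazon uses.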

An Account of the Saga of the Never-ending Title: as told to the author by three blah blah blahs…

Some older books have really long titles. The MARC record contains the entire title, of course! Why wouldn’t it?! But consumer-oriented sites like Amazon often carry these books with shortened or modified titles.  

For example, here’s the title of a real page-turner:

American authors, 1600 – 1900 a biographical dictionary of American literature ; compl. in 1 vol. with 1300 biographies and 400 portraits

But on Amazon that title is:

American Authors 1600-1900: A Biographical Dictionary of American Literature (Wilson Authors)

As you can imagine, it’s far more difficult to reliably match books with longer titles. A human can look at those two titles and think “yeah, that’s probably the same book,” but software isn’t nearly as good at that kind of judgment.
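One way to give software a little of that human judgment is to normalize both titles and compare only their leading words, so that a catalog title's extra trailing detail does not count against the match. The sketch below is our illustration of that idea, not a description of the matching the Archive actually built; the function names and the choice of Python's `difflib.SequenceMatcher` are assumptions.

```python
import re
import unicodedata
from difflib import SequenceMatcher

def normalize_title(title: str) -> list[str]:
    """Casefold, drop parenthetical series names, and turn punctuation into spaces."""
    t = unicodedata.normalize("NFKC", title).casefold()
    t = re.sub(r"\([^)]*\)", " ", t)   # e.g. "(Wilson Authors)"
    t = re.sub(r"[^\w\s]", " ", t)     # commas, colons, dashes, semicolons, periods
    return t.split()

def title_similarity(catalog_title: str, store_title: str) -> float:
    """Word-level similarity (0.0 to 1.0) over the words both titles have room for."""
    a, b = normalize_title(catalog_title), normalize_title(store_title)
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    # Compare only the leading n words, so a long catalog title's extra
    # material ("compl. in 1 vol. ...") is not penalized.
    return SequenceMatcher(None, a[:n], b[:n]).ratio()
```

With the two titles above, `title_similarity` returns 1.0, while an unrelated title such as "Pride and Prejudice" scores 0.0. The caveat is that comparing only leading words can make a short title look identical to a longer, different one, so a score like this would need to be combined with author and date checks.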

*$%!@$* Serials

Now that the librarians have had a laugh, let’s explain that for everybody else! Think back to the days of yore when you went to the library and looked things up in a physical card catalog. If you wanted to know where a serial or periodical was located within the library collections, you really just needed one card to tell you that. It’s on this shelf in this area and the collection contains these years.

Great! Except when you’re looking at digital versions of these serials, they are distinct entities – they have different dates, different topics, different authors sometimes, etc. And yet they often still have just one MARC record – the digital equivalent of that one card in the catalog.

And that means that the publication dates pulled from the MARC records are sometimes very wrong.

For example, we have several items from the annual series The Book of Knowledge – 1947, 1957, 1958, 1959, 1974…  The date provided in the MARC file for all of these is 1940.

As you can imagine, when we are filtering texts by year for various purposes, serials are a consistent issue.
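Because the catalog date cannot be trusted for serials, one defensive step is to flag items whose own label mentions a year that disagrees with the MARC date. The Python sketch below is a hypothetical check of our own; it assumes each scanned item carries a per-volume label such as "The Book of Knowledge 1957" and that relevant years fall between 1500 and 2099.

```python
import re

# Four-digit years from 1500 to 2099, matched as whole words.
YEAR = re.compile(r"\b(1[5-9]\d\d|20\d\d)\b")

def serial_date_conflict(marc_year: int, item_label: str) -> bool:
    """True if the item's label names one or more years and none is the MARC year."""
    years = {int(y) for y in YEAR.findall(item_label)}
    return bool(years) and marc_year not in years
```

Applied to The Book of Knowledge volumes above, every label from 1947 onward would be flagged against the MARC date of 1940, so those items could be set aside for review instead of being filtered by the wrong year.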

Even when we have a correct date, Amazon does not match very well on volume and other serial or periodical-based information.  For example, when we search for a particular month of a magazine, we are likely to match an entirely different month of that same magazine.
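A similar guard can catch the wrong-month problem: extract the year and month from both the issue we want and the candidate the search returned, and reject the candidate unless both agree. This is a hypothetical Python sketch; it assumes English month names appear in both titles, and the helper names are ours.

```python
import calendar
import re
from typing import Optional, Tuple

# Map full and abbreviated English month names to numbers ("march" and "mar" -> 3).
# calendar uses the current locale; the default C locale gives English names.
MONTHS = {name.lower(): i for i, name in enumerate(calendar.month_name) if name}
MONTHS.update({abbr.lower(): i for i, abbr in enumerate(calendar.month_abbr) if abbr})
YEAR = re.compile(r"\b(1[5-9]\d\d|20\d\d)\b")

def issue_key(label: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (year, month) found in an issue label, with None for anything missing."""
    text = label.lower()
    year = YEAR.search(text)
    month = next((MONTHS[w] for w in re.findall(r"[a-z]+", text) if w in MONTHS), None)
    return (int(year.group(1)) if year else None, month)

def same_issue(wanted: str, candidate: str) -> bool:
    """Accept a candidate only if its year and month both match the wanted issue."""
    w, c = issue_key(wanted), issue_key(candidate)
    return None not in w and w == c
```

This is deliberately strict: a label with no recognizable month never matches. It is also naive about words like "may" that double as month names, which a real matcher would have to handle.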

Not All Metadata is Good Metadata

Unbelievably, librarians do make mistakes. Sometimes the data we have from MARC records has typos, or a MARC record for a different publication date was attached to the book. For example, we have an author named Fkorence A Huxley, but her name is really Florence. Not according to the MARC record, though! Fat-finger errors don’t just happen on phones. Another example: we scanned a book originally published in 1924, and *republished* in 1971. We have the 1971 version. But the MARC record tells us it’s from 1924.

Essentially, our search is only as good as our metadata. If there are typos, or the wrong MARC record, or wrong data, our search and/or filtering will not be accurate.
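Software cannot fix bad metadata, but it can be made a little more forgiving of typos. One common approach, offered here as an illustrative Python sketch rather than something this project is known to use, is to treat two names as a probable match when they are within a small edit distance (Levenshtein distance) of each other after normalization. The one-edit threshold and the function names are assumptions.

```python
import re

def normalize_name(name: str) -> str:
    """Casefold and turn punctuation into spaces, so "A." and "A" compare equal."""
    return " ".join(re.sub(r"[^\w\s]", " ", name.casefold()).split())

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]

def names_probably_match(a: str, b: str, max_edits: int = 1) -> bool:
    """True if the normalized names differ by at most max_edits characters."""
    return levenshtein(normalize_name(a), normalize_name(b)) <= max_edits
```

Here "Fkorence A Huxley" and "Florence A. Huxley" differ by one substitution and match. This only helps with typos, though; a wrong MARC record attached to a book, like the 1924/1971 example, has to be caught some other way.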

Commercial APIs Are Not Built to Solve Library Problems

Amazon’s API is built to sell books to end users. Yes, it helps you find a particular book, but the other data the API contains about availability, formats and pricing is less accurate. Because the Section 108(h) exemption for libraries (read more here) involves knowing whether copies are being sold at reasonable prices, we need to know about these aspects of a book to determine whether it qualifies. But Amazon’s API is incomplete in this area. So we found ourselves needing to use the API to find a match for the title and author, and then go to the product page and scrape it to get accurate availability and pricing information.

This increased the complexity of the programming required to use Amazon as a source of information, and greatly lengthened the process of building tools for this purpose.
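The resulting two-stage lookup can be sketched as follows. This is a hypothetical Python outline, not the Archive's code: `find_product_id` stands in for whatever wrapper calls the product-search API, and `scrape_offers` for the page scraper. Both are passed in as plain functions, so no real Amazon API calls or page structure are assumed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Offer:
    price: float      # listed price (currency assumed, e.g. USD)
    condition: str    # e.g. "new" or "used"

def lookup_offers(
    title: str,
    author: str,
    find_product_id: Callable[[str, str], Optional[str]],
    scrape_offers: Callable[[str], List[Offer]],
) -> Optional[List[Offer]]:
    """Return current offers for a book, or None if no catalog match was found."""
    # Stage 1: use the search API only to identify the product.
    product_id = find_product_id(title, author)
    if product_id is None:
        return None  # no match right now; the book could still appear later
    # Stage 2: read availability and pricing from the product page itself,
    # because the API's own offer data was not reliable enough for this purpose.
    return scrape_offers(product_id)
```

Passing both stages in as functions keeps the matching logic testable with stand-ins, and lets the scraper be replaced when a page layout changes without touching the search step.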

Everything changes

We are making a determination about whether a book meets the qualifications for Section 108(h) at a particular point in time. Even with all of the issues discussed here, the accuracy of the data we can now pull about book availability and price is high. But it’s only accurate for the moment that we pull the data, because Amazon’s marketplace is constantly changing.  If we don’t find a book on Amazon today, that doesn’t mean it won’t appear on the site tomorrow. 

Because of this, when we make an item available to the public via Section 108(h), we write into the item’s metadata the date on which the determination was made. 
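Recording the date can be as simple as stamping it onto the item's metadata at the moment of the check. The Python sketch below is illustrative only; the field name `section108h_determination_date` and the helper functions are hypothetical, not the Archive's actual metadata schema.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class Determination:
    identifier: str          # the item's identifier
    determined_on: dt.date   # the day availability and price were checked

def make_determination(identifier: str, today: Optional[dt.date] = None) -> Determination:
    """Stamp a determination with today's date unless a date is supplied."""
    return Determination(identifier, today or dt.date.today())

def metadata_fields(det: Determination) -> Dict[str, str]:
    """Metadata to write onto the item, with the date in ISO 8601 form."""
    return {"section108h_determination_date": det.determined_on.isoformat()}
```

Keeping the date explicit means anyone reading the record later knows exactly which snapshot of the marketplace the determination reflects.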

Who Wants In!?

Since I’ve made this process sound SO appealing, I would imagine that any number of other library institutions are going to line up around the block wanting to try it out for themselves. Or not. But here’s the good news! If we digitize your books, the Internet Archive may be able to do the Section 108(h) determination on your behalf. Please contact us if you would like to participate.

*A MARC record is a MAchine-Readable Cataloging record. Essentially, it is the digital equivalent of the physical card from a card catalog. 

A Public Peek into 1923

Commercial radio broadcasting began in the 1920s, bringing entertainment, news and music into people’s homes. Now, instead of needing to play a 78rpm disc on your phonograph, you could just tune in to listen to popular songs.

And in 1923 that means you would have been listening to one of the many versions of “Yes! We Have No Bananas” written by Frank Silver and Irving Cohn.  

You could listen to the Billy Jones version (play below), the Billy Murray version, a Yiddish version, or an Italian version, among others.

Yes! We Have No Bananas by Billy Jones from the 78rpm collection

Then you could have moved on to dancing the Charleston, popularized by the song of the same name from the 1923 musical “Runnin’ Wild.”   And with the explosion of recordings by African American musicians, you could also enjoy “Baby Won’t You Please Come Home” by Bessie Smith and “Dipper Mouth Blues” by Louis Armstrong.

Autogyro (1934)

In the news of the day you saw the first flight of an autogyro (the precursor to the helicopter).

Jack Dempsey defended his World Heavyweight Championship title against Tommy Gibbons and Luis Firpo.

And Howard Carter’s team finally entered the burial chamber of King Tutankhamen, as covered in books, sheet music and song.

But why are we focusing on 1923? Because for the first time in 20 years, new works are entering the public domain in the United States (read more: 1, 2, 3). And those works were all published in, you guessed it, 1923.

Settle in with a Reese’s Peanut Butter Cup, a Butterfinger, or a refreshing Popsicle (all invented in 1923!) while you watch Cecil B. DeMille’s The Ten Commandments, The White Sister starring Lillian Gish, or The Hunchback of Notre Dame starring Lon Chaney. Or any one of 50 other films available on archive.org from that year.

After your movie marathon, you can turn to your “new” reading materials to learn about sewing the latest women’s fashions, try an old recipe from a cookbook (we recommend the Marshmallow Loaf), learn about theatrical lighting, construct yourself a bungalow (um, check the latest building codes first), grab some sheet music, read up on Benito Mussolini, and learn “How You Can Keep Fit” from Rudolph Valentino (!).

Finally, settle in to read some Robert Frost, Virginia Woolf, Edith Wharton, or Kahlil Gibran. And while you’re here, take a look at the 20,000 other texts we have available from 1923. 

We look forward to introducing you to 1924 NEXT January!