A Good Day for the Open Web

Photo by Claire Anderson on Unsplash

Today the Supreme Court resolved a decade of copyright litigation by supporting interoperability and openness, ruling that reimplementing an API by copying its declarations is legal fair use, even (or perhaps especially) when you’re building a competitive service. This was a case of two massive companies – Oracle and Google – fighting over Java, Android, and billions of dollars. But it was also about the quintessential user’s right and one of crucial importance to libraries: fair use. And after last year’s Georgia v. Public.Resource.Org decision, it has become the latest in a long line of Supreme Court decisions broadly supportive of fair use.

In a 6-2 decision, the Supreme Court held that Google’s copying of many declarations associated with the Java SE API (including only those lines of code that were needed to allow programmers to put their accrued talents to work in a new and transformative program with their own implementing code) was a fair use of that material as a matter of law. That means that this ruling applies to all APIs, not just the one at issue here.

“This decision is a win for the Open Web. In our digital world, businesses, nonprofits, libraries and individual developers use APIs everyday,” says Brewster Kahle, Internet Archive’s founder and Internet Hall of Famer. “We have seen copyright used as a tool to create enclosures and walled gardens. But the Court was clear: copyright cannot be used to harm the public interest.”

Importantly, the Court held that reimplementing the Java API was fair use even though Google copied the material intentionally. That fact actually supported a finding of fair use. That’s because Google’s purpose was “to allow programmers to work in a different computing environment without discarding a portion of a familiar programming language.” Put another way, Google’s actions were in support of interoperability. And fair use protects it.
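
The distinction the Court drew between an API's declarations and its implementing code can be illustrated with a small sketch. The case concerned Java; the Python below is only an analogy, and every class and function name in it is hypothetical. Both classes expose the same declaration (a `max` function taking two integers), but each body is written independently, so a program written against the declaration runs unchanged on either one.

```python
from functools import reduce


class OriginalMath:
    """Hypothetical original library."""

    @staticmethod
    def max(a: int, b: int) -> int:  # declaration: name, parameters, return type
        return a if a >= b else b    # implementing code


class ReimplementedMath:
    """Hypothetical reimplementation for a new environment."""

    @staticmethod
    def max(a: int, b: int) -> int:       # same declaration as above
        return (a + b + abs(a - b)) // 2  # independently written implementing code


def largest(math_api, values):
    """Programmer's code: depends only on the shared declaration of `max`."""
    return reduce(math_api.max, values)
```

Here `largest` stands in for a programmer's existing skills and code: it relies only on the shared declaration, which is what lets it move to a new environment without being rewritten.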

In contrast, Oracle sought to profit from the developers’ familiarity by locking them into its own environment and forcing Google to pay for a license–what the Court described as a “tax”–in order to access it. The Court held this kind of “tax”, in derogation of interoperability, did not further the goals of copyright. That was because, it explained, copyright seeks to incentivize the creation of new works. Incentivizing the creation of new works was deemed more important than allowing for the monopolization of aspects of the old. That was particularly true here, where Google copied these lines of code not because of their “creativity or beauty but because they would allow programmers to bring their skills to a new smartphone computing environment.” Enforcing copyright in these circumstances “risks causing creativity-related harms to the public,” frustrating the goals of copyright.

While many hoped that the Court would rule directly on the question of software copyrightability, which may have more squarely helped small projects take on goliaths, this ruling remains a very good thing. It is a win for interoperability, a win for fair use, and a win for the open principles that form the foundation of so much of the internet today.

“We have to wonder whether a system that took ten years and tens of million dollars worth of litigation to reach this outcome reflects a copyright system that is as fair as we need it to be,” says Brewster Kahle. “Today, thank goodness the fair use system was reaffirmed. This decision will have broad, positive benefits for openness, innovation and competition.”

Welcome to the Webspace Jam

It stood as either a memorial, embarrassment or in-joke: the promotional website for the 1996 film Space Jam, a comedy-action-sports film starring Michael Jordan and the Warner Brothers Looney Tunes characters.

Created at a time when the exact relevance of websites in the spectrum of mass media promotion was still being worked out, www.spacejam.com held many of the fashionable attributes of a site in 1996: an image map that you could click on, a repeating star background, and a screen resolution that years of advancement have long left in the dust. The limits of HTML coding and computer power were pushed as far as they could go. The intended audience was a group of people primarily using dial-up modems and single-threaded browsers to connect to what was still called The Information Superhighway.

By all rights, the Space Jam site should have died back in the 1990s, lost in the shifting sands of pop culture attention and flashier sites arriving with each passing day.

But it didn’t die, go offline or get replaced with a domain hosting advertisement or a 404.

Unlike a lot of websites from the 1990s, the Space Jam movie site simply didn’t change.

It persisted.

Just as every city seems to have that one bar or restaurant that can trace itself back for over a century, this one website became known, to people who looked for it, as a strange exception – unchanging, unshifting, with someone paying for the hosting and advertising a movie that, while a lot of fun, was not necessarily an Oscar-winning cinematic experience. You could go to the site and be instantly transported back to a World Wide Web that in many ways felt like ancient history, absolutely gone.

Years turned into decades.

For those in the know who paid close attention to this odd online relic, the real mystery was that the site was not actually static: someone was making modifications to the code of the website, the settings and web hosting, to jump past several notable shifts in how websites work, to ensure that deprecated features and unaccounted-for browser issues were handled. That costs money; that’s the work of people. Somehow, this silly movie site represented the held-out flame that with a small bit of care and dedication, a website could live forever, like we were once promised.

It wasn’t just a clickable brochure – it became a beacon in the dark, a touchstone for some who were just children when the World Wide Web was started, and who grew up with this online world, which has shifted and consolidated and closed and tracked us.

Then the unthinkable happened.

In 2021, the sequel arrived.

It is abundantly clear the abnormally long life of the original 1996 site helped see the sequel through the endless mazes and corridors of Hollywood development turnaround.

Because websites and online presence are the way that movies are now promoted, the very place that spawned this consistent brand through decades had to go. A new Space Jam site was created, using the www.spacejam.com domain.

In a nod to its beginnings, the 1996 website still exists, shoved into a back room; adding /1996 to the URL will give you the old site as it used to appear before this year, and a small note in the corner lets you know you could optionally visit this once-dependable hangout.

But now the site is broken.

Links from around the net to the Space Jam site, to specific sub-pages and specific images, now break. A browser arriving at the spacejam.com page from a link elsewhere will see Just Another Movie Promotion Site, utilizing all the current fads: layered windows to YouTube videos (which will break), JavaScript calls (which will break) and a dedication to being as flashy, generically designed and film-promoting as literally any other movie site currently up. Links that worked for decades have been cast aside for the spotlight of the moment.

The word is disposable.

There’s still one place you can see the old site, as it was once arranged, though.

The same year the Space Jam movie and website arrived, another website started: The Internet Archive.

Unlike Space Jam, the Internet Archive’s site did change constantly. You can use the Wayback Machine to see all the changes as they came and went; over half-a-million captures have been done on archive.org.
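
For readers who want to look up an old page themselves, archived copies are addressed by a URL that combines a timestamp with the original address. The sketch below builds such an address. The helper name `wayback_url` and the example date are illustrative assumptions, not a claim that a capture exists at that exact moment; the Wayback Machine generally serves the capture closest to the requested timestamp.

```python
from datetime import datetime

# Public prefix used by Wayback Machine capture URLs.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, when: datetime) -> str:
    """Build a Wayback Machine address for `original_url` near the moment `when`.

    The timestamp is formatted as YYYYMMDDhhmmss.
    """
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


# Example: ask for the Space Jam site as it looked around the end of 1996.
# The date is arbitrary and chosen only for illustration.
example = wayback_url("http://www.spacejam.com/", datetime(1996, 12, 27))
```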

We have changed across the last 25 years, but we also have not.

The ideas that the Web should keep URLs running, that the interdependent linking and reference cooked into it from day one should be a last-resort change, and that the experience of online should be one of flow and not of constant interruptions, still live here.

Hundreds of webpages that have also survived since the time of Space Jam are inside the stacks of the Wayback Machine, some of them still running, and still looking unchanged since those heady days of promises and online wishes.

And if the unthinkable happens to them, we’ll be ready.




Filecoin Foundation Grants 50,000 FIL to the Internet Archive

Amidst the speculative boom for NFTs and crypto-currencies, one decentralized technology foundation is taking the long view by investing in deep history and the far future. 

Today, the Filecoin Foundation announced a 50,000 FIL grant to the Internet Archive – the largest single donation in the digital library’s 25-year history. 

“Holy Crow! This is a big deal,” said Brewster Kahle, the Internet Archive’s founder. “And what are we going to do with it? We’re going to invest it in making the Internet Archive more decentralized, so that our digital history is available from thousands of computers, not just a few. The idea is to make a robust and private Internet that has a history that will persist over decades and maybe centuries.”

Filecoin is a decentralized storage system designed to preserve humanity’s most important information. The creators of Filecoin envisioned an independent foundation that would serve as the long-term governance body for the Filecoin ecosystem. In awarding the grant to the Internet Archive, Filecoin Foundation board chair, Marta Belcher, stressed the two organizations’ “common goal of preserving the web and fostering its future.”

It was back in 2015 that Protocol Labs’ founder, Juan Benet, first visited the Internet Archive, to share his vision for an academic conference dedicated to preserving “humanity’s greatest treasures using decentralized storage.” Building on these conversations, the Internet Archive organized the Decentralized Web Summit in 2016 in San Francisco, the first gathering of its kind. Back then, a decentralized web was mostly a concept, with little working code.

Decentralized technologists, Trent McConaghy of Ocean and Juan Benet of Protocol Labs at the 2016 Decentralized Web Summit at the Internet Archive in San Francisco.

Since 2016, the Internet Archive has worked with several decentralized tech startups to create a decentralized prototype of the digital library. And when the Filecoin mainnet took off in 2020, public domain audiobooks and films from the Internet Archive were stored on Filecoin servers. Together, the two organizations created the Filecoin Archives, a community-led project to curate, disseminate and preserve important open access information that is often at risk of being lost.

“It’s wonderful to see Filecoin come of age. We started six years ago by putting out a call to make a Decentralized Web, a web that would serve us better than the current web–one that is now starting to be dominated by just a few tech behemoths. Can we make a game with many winners?” asked Kahle. “Filecoin has made a huge step forward by deploying decentralized storage at the exabyte level. That’s very different from AWS (Amazon Web Services). It has many participants, not just one player. And its protocols are open-source. We want to see more technologies like this. This was the original vision of the Decentralized Web that the Internet Archive was hoping for five, six years ago. And it’s starting to come to fruition and Filecoin is a leader in that area.”

Although purveyors of cryptocurrencies are often accused of being driven only by short-term gain, in this group Kahle sees a different motivation. “This donation by the Filecoin Foundation is significant financially for the Internet Archive, but I’d say it’s a more interesting one than that,” said the Internet Hall of Fame engineer. “It’s a donation by a new generation of technologists that are building interesting new technologies…bringing the Archive along with it to make it so that history is preserved – that the Internet Archive makes it into this next generation. That is an interesting thing! You don’t often see that. But the Filecoin Foundation, Filecoin and IPFS, and Juan Benet himself have always been interested in preserving history and how history can be woven into the present and the future of these technologies.”

Calls Intensify to Allow Libraries to Narrow Digital Divide

At an event discussing disinformation and the digital divide, U.S. Senator Ron Wyden from Oregon said he was committed to supporting a balanced copyright system that promotes fair use, digital lending, and the work of libraries.

“Libraries provide vital public services by making high quality resources available to everybody. And that’s true no matter what you’ve got in your bank account or your zip code,” said Wyden, noting he is the son of a librarian.  “If the system is filled with draconian copyright laws and digital restrictions that make it hard for real news to be read, shared, and discussed, that particular vacuum is filled with more misinformation and lies.”

Wyden’s remarks were part of the webinar, Burying Information – Big Tech & Access to Information, sponsored by the Institute for Technology Law & Policy at Georgetown University, Public Knowledge and Library Futures on March 24. A recording of the event is now available.

Big special interests have always pushed for tighter restrictions on content, Wyden said, and now powerful corporations are trying to get a tighter grip on the internet. He cautioned that the proposed Digital Copyright Act is not the answer, saying he would fight for more balanced intellectual property laws and support libraries to provide easy, free access to reliable information from trustworthy sources.

“We want a game with many winners. We want to have many authors, publishers, booksellers, libraries—and everyone a reader,” said Internet Archive Founder Brewster Kahle at the event. “The only way to do that is to have a level playing field that doesn’t have monopoly control.”

The pandemic has underscored the need for digital content to be readily available to the public. Libraries should be able to lend and preserve digital materials just as they have with print materials for years. However, many large publishers refuse to sell e-books to libraries and instead impose restrictive licensing agreements.

“We’re seeing a change in the environment, which means you still need a card to get access to books, but it’s no longer a library card, it’s increasingly a credit card,” said Heather Joseph, executive director of SPARC, a global advocacy organization working to make education and research open and equitable by design for everyone. “We really need interventions that work to combat that shift, to flip that dynamic.”

To expand access to knowledge, Internet Archive has been digitizing materials and respectfully lending them one copy at a time through Controlled Digital Lending (CDL) since 2011. The widespread practice is embraced by more than 80 libraries as part of Internet Archive’s Open Libraries program, and other implementations are growing across the country as demand increases.

“If you actually take a look at how [CDL] operates, the lending function is really no more and no less than what libraries are able to do in print. It’s just changed formats,” said Michelle Wu, an attorney and law librarian who pioneered the concept of CDL. The practice can serve people who aren’t able to physically get to a library because they live in a rural area, have a disability that limits transportation, work odd hours, or are ill or quarantined during a pandemic. Libraries want to reward authors for creating their works, but also ensure the public has access to those works, Wu said.
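
The one-copy-at-a-time rule Wu describes can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the Internet Archive's actual lending system: the class name, patron identifiers, and methods are all hypothetical. The only rule it enforces is that the number of copies on loan never exceeds the number of copies the library owns.

```python
class ControlledLendingTitle:
    """Hypothetical model of one title lent under a controlled, owned-to-loaned rule."""

    def __init__(self, title, owned_copies):
        self.title = title
        self.owned_copies = owned_copies  # copies the library actually owns
        self.borrowers = set()            # patrons who currently hold a loan

    def checkout(self, patron):
        """Lend a copy only if one is free; return True when the loan succeeds."""
        if patron in self.borrowers or len(self.borrowers) >= self.owned_copies:
            return False
        self.borrowers.add(patron)
        return True

    def checkin(self, patron):
        """Return a copy, freeing it for the next patron."""
        self.borrowers.discard(patron)
```

With a single owned copy, a second patron's `checkout` fails until the first patron calls `checkin`, which mirrors a print book sitting on one borrower's shelf at a time.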

It would be a better use of public funds for libraries to be able to purchase ebooks, rather than paying repeatedly for licensing fees, said Wu. Also, a library that digitizes its collection ensures access in an emergency, such as a pandemic, and preservation in the case of a natural disaster, saving the government money in having to replace damaged materials.

To counter disinformation, the public needs reliable information—and libraries are at the center of this battle, said SPARC’s Joseph.

“We can’t amplify content that we can’t access. And that’s really at the root of what libraries do for society,” Joseph said. “We’ve always been the equalizer in providing access to this high-quality information.” Rather than libraries being a trusted and critical distribution channel, they are being treated by publishers as adversaries, which Joseph said is a dangerous trend.

The discussion touched on a variety of remedies including legislative protections to enshrine practices like CDL, antitrust regulations, and building market competition. The work of Library Futures was highlighted as an avenue for concerned citizens to raise their voices, and panelists underscored the need for action that reflects the best interest of the public.

“This is not just an inconvenience, it’s not just an additional expense to us as consumers. It’s creating an enormous divide in who can access critical knowledge,” Joseph said of publishers’ actions to restrict access to digital content. “The right to access knowledge is a human right. And a world in which one player—or worse a company—decides who’s in and who’s out is unacceptable.”

Great Books by Women Authors

On March 8th, New York Public Library’s Gwen Glazer published a wonderful list of books in celebration of International Women’s Day: 365 Books by Women Authors to Celebrate International Women’s Day All Year.

In the spirit of continuing to celebrate female authors past the confines of Women’s History Month, we’ve gathered some of these books into a special collection called Great Books by Women Authors to make it easier to find your next exceptional read. You will also find these books via Open Library as listed below. Happy reading!

Great Books by Women Authors
Leila Aboulela, The Kindness of Enemies
Susan Abulhawa, The Blue Between Sky and Water
Chimamanda Ngozi Adichie, Half of a Yellow Sun
Anna Akhmatova, The Complete Poems of Anna Akhmatova
Michelle Alexander, The New Jim Crow
Svetlana Alexievich, Voices From Chernobyl
Clare Allan, Poppy Shakespeare
Sarah Addison Allen, Lost Lake
Isabel Allende, Eva Luna
Karin Altenberg, Island of Wings
Julia Alvarez, In the Time of the Butterflies
Tahmima Anam, The Good Muslim
Nathacha Appanah, The Last Brother
Chloe Aridjis, Asunder
Bridget Asher, All of Us and Everything
Margaret Atwood, Oryx & Crake
Jane Austen, Pride and Prejudice
Mariama Bâ, Scarlet Song
Toni Cade Bambara, Those Bones Are Not My Child
Gioconda Belli, The Inhabited Woman
Karen Bender, Refund
Elizabeth Bishop, Geography III
Katherine Boo, Behind the Beautiful Forevers
Charlotte Bronte, Jane Eyre
Emily Bronte, Wuthering Heights
Gwendolyn Brooks, The Bean Eaters
Lauren Beukes, The Shining Girls
NoViolet Bulawayo, We Need New Names
Judith Butler, Gender Trouble: Feminism and the Subversion of Identity
Leonora Carrington, The Hearing Trumpet
Theresa Hak Kyung Cha, Dictee
Susan Choi, American Woman
Kate Chopin, The Awakening
Sonya Chung, Long for This World
Caryl Churchill, Top Girls
Lucille Clifton, Mercy
Simin Daneshvar, Sutra & Other Stories
Tsitsi Dangarembga, Nervous Conditions
Edwidge Danticat, Claire of the Sea Light
Meghan Daum, Unspeakable
Dola de Jong, The Tree and the Vine
Grazia Deledda, After the Divorce
Anita Desai, Clear Light of Day
Emily Dickinson, The Poems of Emily Dickinson
Joan Didion, Democracy
Rita Dove, On the Bus With Rosa Parks
Yasmine El Rashidi, Chronicle of a Last Summer
Nawal El Saadawi, Woman at Point Zero
George Eliot, Middlemarch
Buchi Emecheta, The Joys of Motherhood
Leslie Feinberg, Stone Butch Blues
Elena Ferrante, My Brilliant Friend
Penelope Fitzgerald, The Blue Flower
Paula Fox, Desperate Characters
Lauren Francis-Sharma, Til the Well Runs Dry
Ru Freeman, On Sal Mal Lane
Rivka Galchen, Atmospheric Disturbances
Mary Gaitskill, The Mare
Petina Gappah, The Book of Memory
Elena Garro, First Love & Look for My Obituary
Louise Gluck, Faithful and Virtuous Night
Nadine Gordimer, The Conservationist
Jorie Graham, Erosion
Linda LeGarde Grover, The Dance Boots
Paula Gunn Allen, America the Beautiful: Last Poems
Marilyn Hacker, Names
Radclyffe Hall, The Well of Loneliness
Lorraine Hansberry, A Raisin in the Sun
Eve Harris, The Marrying of Chani Kaufman
Saidiya Hartman, Lose Your Mother: A Journey Along the Atlantic Slave Route
Shirley Hazzard, The Transit of Venus
Bessie Head, The Collector of Treasures
Amy Hempel, Reasons to Live
Cristina Henriquez, The Book of Unknown Americans
Christine Dwyer Hickey, The Cold Eye of Heaven
Patricia Highsmith, The Price of Salt
Arlie Hochschild, The Second Shift
Alice Hoffman, Survival Lessons
Sara Sue Hoklotubbe, Deception on All Accounts
bell hooks, Feminism is for Everybody: Passionate Politics
Keri Hulme, The Bone People
Dương Thu Hương, Paradise of the Blind
Hồ Xuân Hương, Spring Essence
Ulfat Idilbi, Grandfather’s Tale
Elfriede Jelinek, Women As Lovers
Han Kang, The Vegetarian
Mary Karr, The Liars’ Club
Kazue Kato, Blue Exorcist
Rupi Kaur, Milk and Honey
Porochista Khakpour, The Last Illusion
Vénus Khoury-Ghata, A House at the Edge of Tears
Suki Kim, Without You, There Is No Us
Jamaica Kincaid, See Now Then
Barbara Kingsolver, The Poisonwood Bible
Maxine Hong Kingston, The Woman Warrior
Natsuo Kirino, Out
Sana Krasikov, One More Year
Jean Kwok, Girl in Translation
Jhumpa Lahiri, The Lowland
Laila Lalami, Secret Son
Nella Larsen, Passing
Adrian Nicole LeBlanc, Random Family
Harper Lee, To Kill A Mockingbird
Yiyun Li, Kinder Than Solitude
Gloria Lisé, Departing at Dawn
Clarice Lispector, The Hour of the Star
Inverna Lockpez, Cuba: My Revolution
Alia Mamdouh, The Loved Ones
Dacia Maraini, The Silent Duchess
Ronit Matalon, The Sound of Our Steps
Ayana Mathis, The Twelve Tribes of Hattie
Eimear McBride, A Girl Is a Half-Formed Thing
Carson McCullers, The Heart is a Lonely Hunter
Claire Messud, The Woman Upstairs
Ai Mi, Under the Hawthorn Tree
Gabriela Mistral, Selected Poems of Gabriela Mistral
Nadifa Mohamed, Black Mamba Boy
Lorrie Moore, Bark
Marianne Moore, The Poems of Marianne Moore
Toni Morrison, Sula
Bharati Mukherjee, The Tree Bride
Alice Munro, Family Furnishings
Iris Murdoch, A Severed Head
Eileen Myles, School of Fish
Azar Nafisi, The Republic of Imagination: America in Three Books
Celeste Ng, Everything I Never Told You
Hualing Nieh, Mulberry and Peach
Sara Nović, Girl at War
Adaobi Tricia Nwaubani, I Do Not Come to You by Chance
Silvina Ocampo, Thus Were Their Faces
Nnedi Okorafor, Binti
Julie Otsuka, The Buddha in the Attic
Helen Oyeyemi, Mr. Fox
Ruth Ozeki, All Over Creation
Cynthia Ozick, Foreign Bodies
ZZ Packer, Drinking Coffee Elsewhere
Grace Paley, The Little Disturbances of Man
Suzan-Lori Parks, Topdog/Underdog
Shahrnush Parsipur, Kissing the Sword
Ann Patchett, Bel Canto
Anna Politkovskaya, A Russian Diary
Katha Pollitt, Pro: Reclaiming Abortion Rights
Claudia Rankine, Citizen
Alifa Rifaat, Distant View of a Minaret and Other Stories
Suzanne Rivecca, Death Is Not An Option
Riverbend, Baghdad Burning
Arundhati Roy, The God of Small Things
Vedrana Rudan, Night
Sonia Sanchez, Does Your House Have Lions?
Sappho, The Complete Works of Sappho
Noo Saro-Wiwa, Looking for Transwonderland: Travels in Nigeria
Åsne Seierstad, The Angel of Grozny
Anne Sexton, The Complete Poems of Anne Sexton
Murasaki Shikibu, The Tale of Genji
Kyung-sook Shin, Please Look After Mom
Sei Shonagon, The Pillow Book
Ana María Shua, The Weight of Temptation
Leslie Marmon Silko, Almanac of the Dead
Tracy K. Smith, Life on Mars
Betty Smith, A Tree Grows in Brooklyn
Marivi Soliven, The Mango Bride
Rebecca Solnit, A Field Guide to Getting Lost
Susan Sontag, Styles of Radical Will
Ahdaf Soueif, The Map of Love
Gertrude Stein, Fernhurst, Q.E.D., and other early writings
Aoibheann Sweeney, Among Other Things, I’ve Taken Up Smoking
Elizabeth Crane, When the Messenger Is Hot
Amy Tan, The Valley of Amazement
Valerie Taylor, The Girls in 3-B
Lygia Fagundes Telles, The Girl in the Photograph
Lynne Tillman, No Lease on Life
Dubravka Ugresic, Thank You For Not Reading
Chika Unigwe, On Black Sisters Street
Kirstin Valdez Quade, Night at the Fiestas
Jean Valentine, Little Boat
Lara Vapnyar, There Are Jews in My House
Marja-Liisa Vartio, The Parson’s Widow
Josefina Vicens, The Empty Book
Alice Walker, The Color Purple
Sarah Waters, Fingersmith
Eudora Welty, The Optimist’s Daughter
Phillis Wheatley, The Poetry of Phillis Wheatley
Zoe Wicomb, You Can’t Get Lost In Cape Town
Joy Williams, The Visiting Privilege
G. Willow Wilson, Ms. Marvel
Virginia Woolf, Orlando
Alexis Wright, Carpentaria
Sarah E. Wright, This Child’s Gonna Live
Tiphanie Yanique, Land of Love and Drowning
Samar Yazbek, Cinnamon
Banana Yoshimoto, Kitchen
Haifa Zangana, Dreaming of Baghdad

Major SciFi Discovery Hiding in Plain Sight at the Internet Archive

Fans of science fiction learned last week that the word “robot” was first used in 1920—a full three years earlier than originally thought.

The “massively important yet obvious” change in date was confirmed with a search of the Internet Archive, which has a digitized first edition of the Czech play, R.U.R. Rossum’s Universal Robots, published in 1920. There on the title page, hiding in plain sight in an English-language subtitle to the work, is the earliest known use of the word “robot.”

This important piece of information is one of many little-known facts captured in the Historical Dictionary of Science Fiction. The project was completed this year by historian Jesse Sheidlower, who credits two things with enabling him to publish this project, decades in the making. “One, we had a pandemic so I had a lot of enforced time at home that I could spend on it,” explained Sheidlower. “The second was the existence of the Internet Archive. Because it turns out the Internet Archive has the Pulp Magazine collection that holds almost all the science fiction pulps from this core period.”

The New York-based lexicographer—a person who compiles dictionaries—sat down with the Internet Archive’s Director of Partnerships, Wendy Hanamura, to demonstrate how he goes about his work.

The comprehensive, online dictionary includes not only definitions, but also how nearly 1,800 sci-fi terms were first used, and their context over time. From “actifan” to “zine,” the historical evolution of the core vocabulary of science fiction is now online, linked to original sources in the Internet Archive and beyond.  

The project began nearly twenty years ago at the Oxford English Dictionary (OED) as the Science Fiction Citations Project. The idea was that science fiction fans would send in references from mid-20th century pop culture materials that weren’t otherwise archived in libraries. Back then, volunteers mailed in citations they found in books and magazines, and moderators entered the details into a database of these crowdsourced references. In 2007, the project resulted in Brave New Words: The Oxford Dictionary of Science Fiction, edited by Jeff Prucher.

Amazing Stories, v13, n5, 1935. View hundreds of classic issues in the Amazing Stories Collection.

Sheidlower moved on from the OED in 2013. But the potential of this dictionary of science fiction never left him. Sheidlower’s vision was to make the resource even more useful to the public by completing the work and offering it for free use. In 2020, OED gave him permission to dive back in. Working from home during the pandemic, the editor discovered the Internet Archive had a rich Pulp Magazine collection that he could tap into from his desk in New York.

“Instead of hoping that someone, somewhere might have something and send it in, I could just search at the Archive. It made research much easier,” Sheidlower says. He then linked each piece of cited information to its original source online, giving readers a way to keep reading for more detail.

In January, the first public version of the dictionary was made available via a new website, built by Sheidlower. 

Because it is in a digital format, readers can search for terms—such as “transporter” or “hyperspace”—and be directed to the entry, complete with quotes and links to click through to the original source where it first appeared. There are also hundreds of pending entries that are being considered for inclusion in the dictionary, which is a living document that can be updated in response to reader suggestions, Sheidlower says.

Response so far to the revised dictionary has been positive from readers and the media. “I hope that the dictionary is of broad interest to anyone,” Sheidlower says. “Anyone, almost anywhere, can have access to the same kind of resources now. You don’t need to have people physically in libraries reading through absolutely everything, you can do a lot of searching online. The barrier to entry for this kind of research is reduced. Anyone can make contributions.”

Sheidlower comes to this work with a background in classics, linguistics, Latin and the history of the English language. He worked in the dictionary department at Random House before moving to the OED. Sheidlower also does language consulting for television shows such as Amazon’s “The Man in the High Castle,” to ensure that the expressions used match the historical period.

A New Portal for the Decentralized Web and its Guiding Principles

For a long time, we’ve felt that the growing, diverse, global community interested in building the decentralized Web needed an entry point. A portal into the events, concepts, voices, and resources critical to moving the Decentralized Web forward.

This is why we created getdweb.net to serve as a portal, a welcoming entry point for people to learn and share strategies, analysis, and tools around how to build a decentralized Web.

Screenshot of https://getdweb.net/

It began at DWeb Camp 2019, when designer Iryna Nezhynska of Jolocom led a workshop to imagine what form that portal should take. Over the next 18 months, Iryna steered a dedicated group of DWeb volunteers through a process to create this new website. If you are new to the DWeb, it should help you learn about its core concepts. If you are a seasoned coder, it should point you to opportunities nearby. For our nine local nodes, it should be a clearinghouse and archive for past and future events.

Above all, the new website was designed to clearly state the principles we believe in as a community, the values we are trying to build right into the code.

At our February DWeb Meetup, our designer Iryna took us on a tour of the new website and the design concepts that support it.

Then John Ryan and I (Associate Producer of DWeb Projects) shared the first public version of the Principles of the DWeb and described the behind-the-scenes process that went into developing them. They were developed in consultation with dozens of community members, including technologists, organizers, academics, policy experts, and artists. These DWeb Principles are a starting point, not an end point, and they remain open for iteration.

As stewards, we felt that we needed to crystallize the shared vision of this community, to demonstrate how and why we are building a Decentralized Web. Our aim is to identify our guiding principles through discussion and distill them into a living document that we can point to. It is to create a set of practical guiding values as we design and build the Web of the future.

Quote from Behind the Scenes of the Decentralized Web Principles

You can watch the video of the event, including the presentation about the new website and the first public version of the DWeb Principles, below.

Author and Open Source Advocate VM Brasseur: Internet Archive ‘Legitimately Useful’ for Lending and Preservation of Her Work

In her 20-year career in the tech industry, VM (Vicky) Brasseur has championed the use of free and open source software (FOSS). She hails it as good for businesses and the community, writing and presenting extensively about its merits.

VM Brasseur, Raleigh, North Carolina, 2018. Credit: Peter Adams Photography

To spread the word, Brasseur has made her book, Forge Your Future With Open Source, available for borrowing through the Internet Archive. She’s also saved all of her blogs, articles, talks, and slides in the Wayback Machine so that they are preserved and accessible to anyone.

“I do it to share the knowledge,” Brasseur said. “Uploading the resources to Internet Archive ensures that more people will be able to see it and will be able to see it forever.”

As soon as her book was published by The Pragmatic Programmers in 2018, Brasseur said she wanted to have it represented in the Internet Archive. She donated a copy so it could be available through Controlled Digital Lending (CDL).

“I think CDL is great. I love libraries,” Brasseur said. “To me, I don’t see how CDL is any different from walking into my local branch of the public library, picking up one of the copies that they have, going up to the circ desk, and taking it home. How is that different from the Internet Archive? They have one copy of my book and check it out one copy at a time. It just happens to be an e-book version. I, frankly, don’t see the material difference.”

A supporter of the Internet Archive since its inception, Brasseur says she’s a regular user of the Wayback Machine. It’s been useful for her to be able to do research and for others to find her body of work. Recently, she revamped her blog and removed some pages—later getting a request from someone who wanted some of the deleted material. Brasseur provided a Wayback Machine link to where she’d stored them, making it easy for that person to find the missing pages. “It’s a gift. It’s legitimately useful,” she said. “Having the Wayback means that other people can still have access” to materials she no longer has on her website.

Borrow the book through the Internet Archive, or purchase a copy for your own library.

Brasseur has led software development departments and teams, provided technical management and strategic consulting for businesses, and helped companies understand and implement FOSS. She says she wrote her book to be inclusive: it is intended not just for programmers but for anyone interested in FOSS, including technical writers, designers, project managers, those involved in security issues, and people in all other roles in the software development process.

In the book, she walks readers through why they might want to contribute to FOSS and how best to embrace the practices involved. The book has been positively received and was #1 on the BookAuthority list of 18 Best New Software Development Books To Read In 2018. Recently, it has been picked up by people transitioning to telecommuting and looking for resources on collaborative work.

“Obviously, I do want people to buy the book, but I’m also strongly pro library, as most intelligent publishers are. My publisher is a big fan of making sure that their books are available in libraries,” Brasseur said. “So the Internet Archive is a library that anyone can access all over the world. And it just makes it a lot easier to make sure that the book gets in the hands of people.”

Brasseur is committed to helping people contribute to open source; for people who can’t afford to buy the book, checking it out from the library is an alternative. “If they can get a copy from Internet Archive, then they can learn how to contribute and they can make a difference from wherever they are in the world. Nigeria, Thailand, Netherlands, or Montana. You don’t have to worry if your local library has it,” she said. “In these times, in particular, it’s very difficult to get to your library. This is a great service that the Internet Archive is providing.”


Forge Your Future with Open Source by VM Brasseur is available for purchase through a variety of retailers and local book stores.

Early Web Datasets & Researcher Opportunities

In July, we announced our partnership with the Archives Unleashed project as part of our ongoing effort to make new services available for scholars and students to study the archived web. Joining the curatorial power of our Archive-It service, our work supporting text and data mining, and Archives Unleashed’s in-browser analysis tools will open up new opportunities for understanding the petabyte-scale volume of historical records in web archives.

As part of our partnership, we are releasing a series of publicly available datasets created from archived web collections. Alongside these efforts, the project is also launching a Cohort Program providing funding and technical support for research teams interested in studying web archive collections. These twin efforts aim to help build the infrastructure and services that will allow more researchers to use web archives in their scholarly work. More details on the new public datasets and the Cohort Program are below.

Early Web Datasets

The first in our series of public datasets from the web collections is oriented around the theme of the early web. These are, of course, datasets intended for data mining and for researchers using computational tools to study large amounts of data, so they lack the informational or nostalgia value of looking at archived webpages in the Wayback Machine. If the latter is more your interest, here is an archived GeoCities page with unicorn GIFs.

GeoCities Collection (1994–2009)

As one of the first platforms for creating web pages without technical expertise, GeoCities lowered the barrier to entry for a new generation of website creators. GeoCities displayed at least 38 million pages before Yahoo! terminated it in 2009. This dataset collection contains a number of individual datasets covering domain counts, image graph and web graph data, and binary file information for a variety of formats, including audio, video, text, and image files. A GraphML file is also available for the domain graph.

GeoCities Dataset Collection: https://archive.org/details/geocitiesdatasets
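For readers who have not worked with GraphML before, here is a minimal sketch of how a domain graph in that format might be explored with Python's standard library. The sample graph, the domain names, and the function name `inbound_link_counts` are invented for illustration; the actual node identifiers and attributes in the GeoCities files may differ, so check the dataset documentation before relying on this.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard GraphML XML namespace.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Tiny made-up sample standing in for a real domain graph file.
# The real dataset's node ids and attributes are an assumption here.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="a.example"/>
    <node id="b.example"/>
    <node id="c.example"/>
    <edge source="a.example" target="b.example"/>
    <edge source="c.example" target="b.example"/>
    <edge source="b.example" target="a.example"/>
  </graph>
</graphml>
"""


def inbound_link_counts(graphml_text):
    """Count how many edges point at each node (domain) in the graph."""
    root = ET.fromstring(graphml_text)
    counts = Counter()
    for edge in root.iterfind(".//g:edge", GRAPHML_NS):
        counts[edge.get("target")] += 1
    return counts


counts = inbound_link_counts(SAMPLE)
# List domains from most to least linked-to.
for domain, n in counts.most_common():
    print(domain, n)
```

For a full-size graph file, a dedicated graph library or a streaming parser would likely be a better fit than loading everything into memory this way.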

Friendster (2003–2015)

Friendster was an early and widely used social networking site where users were able to establish and maintain layers of shared connections with other users. This dataset collection contains graph files that allow data-driven research to explore how certain pages within Friendster linked to each other. It also contains a dataset that provides some basic metadata about the individual files within the archival collection.

Friendster Dataset Collection: https://archive.org/details/friendsterdatasets

Early Web Language Datasets (1996–1999)

These two related datasets were generated from the Internet Archive’s global web archive collection. The first dataset, “Parallel Language Records of the Early Web (1996–1999),” provides multilingual records, or URLs of websites that have the same text represented in multiple languages. Such multi-language text from websites is a rich source for parallel language corpora and can be valuable in machine translation. The second dataset, “Language Annotations of the Early Web (1996–1999),” is another metadata set that annotates the language of over four million websites using Compact Language Detector (CLD3).

Early Web Language collection: https://archive.org/details/earlywebdatasets

Archives Unleashed Cohort Program

Applications are now being accepted from research teams interested in performing computational analysis of web archive data. Five cohort teams of up to five members each will be selected to participate in the program from July 2021 to June 2022. Teams will:

  • Participate in cohort events, training, and support, with a closing event held at the Internet Archive in San Francisco, California, USA, tentatively in May 2022. Prior events will be virtual or in person, depending on COVID-19 restrictions
  • Receive bi-monthly mentorship via support meetings with the Archives Unleashed team
  • Work in the Archive-It Research Cloud to generate custom datasets
  • Receive funding of $11,500 CAD to support project work. Additional support will be provided for travel to the Internet Archive event

Applications are due March 31, 2021. Please visit the Archives Unleashed Research Cohorts webpage for more details on the program and instructions on how to apply.

Milton Public Library Reaches Patrons Through Controlled Digital Lending

Leaders at the Milton Public Library (MPL) in Canada say they are continually questioning their operations and looking for ways to better serve their patrons. That’s why the Ontario institution joined the Internet Archive’s Open Libraries program.

“We are always keen to innovate in meaningful ways,” said Mark Williams, MPL chief executive officer and chief librarian. “Why would we not want to be in this partnership that expands our collection, but also extends assets to other people’s collections in a digital realm? It was a no-brainer.”

Williams said that when the library decided to become part of Open Libraries in September 2019, its focus was on the interests of the public rather than on concerns about publishers.

Mark Williams, Milton Public Library

“If it challenges the status quo for the benefit of readers, wherever those readers are, then I think we should engage,” Williams said.

As it happens, the timing of its membership was fortuitous. With COVID-19 disrupting access to the print collection at its branches, being part of Open Libraries meant broader access to digital materials for patrons quarantined at home.

MPL has been a central part of the Milton, Ontario, community since 1855, serving a population of more than 120,000 through three physical libraries and its website, with a bookmobile and four new branches in the pipeline over the next 10 years. Library services were forced to be flexible in the past year as health circumstances changed in the province.

The three MPL locations closed on March 17, 2020, under a state of emergency in Ontario. By May, a phased reopening allowed libraries to begin limited operations. During the state of emergency, librarians pivoted to providing access to services only through virtual interactions and the website was changed to focus on promoting electronic resources. As restrictions eased, MPL provided curbside, contactless pickup. Eventually, 50 to 100 patrons were allowed inside the buildings with safety protocols. The libraries had to close again when COVID-19 cases spiked in the winter, and then reopened in February.

We’ve seen overwhelming demand…Patrons think it’s a fantastic option…

Mark Williams, Milton Public Library

“The staff have been remarkably agile and good at adapting their approach,” Williams said. “We’ve done the best we possibly could to ensure the public library services continued, but the way we deliver it is different than anyone would have expected.”

In addition to joining Open Libraries, MPL donated 30,000 books to the Internet Archive. Williams said the expanded access to content in the larger online library has been a boon to the public. MPL would have spread the word about access to Open Libraries regardless of the pandemic, he said, but that effort was likely accelerated because there was no choice but to focus on digital offerings.

Milton Public Library

“The lockdown highlighted the ability for us to raise awareness about the partnership and introduce it to more patrons,” Williams said. MPL is creating a new portal on its website dedicated to Open Libraries. In the meantime, the library has been promoting the program’s availability, and the response has been positive.

“We’ve seen overwhelming demand,” Williams said. “Patrons think it’s a fantastic option for them to have increased materials than we currently have available.”

The transition to becoming part of the Open Libraries program was seamless, said Williams, and he’s encouraging other libraries to consider joining.

“I hope if other libraries sign up, they will be equally inspired by the partnership. The content is amazing,” Williams said. “Our patrons think it’s phenomenal. Our board thinks it’s a great idea, philosophically. Everyone believes this is an important service addition.”

To browse the books now available for lending through Milton Public Library’s participation in the Open Libraries program, please visit: https://archive.org/details/miltonpubliclibrary-ol. Learn how your library can participate in the Open Libraries program.