Taking the Words Out of My Mouth (With AI)

Everyone has a different idea of what they’d do with a time machine. Mine’s pretty simple: Head back to 2012, find myself working on a side project to film a documentary, and grab my younger self by the lapels and shout, “A 1099 IS NOT A REIMBURSEMENT! GET AN ACCOUNTANT!” before whatever energy sending me into the past gave out.

That simple mistake on my part had truly stunning financial consequences. When the tax bills and penalties started hitting my mailbox around 2014, it became a mass of stress. The IRS is undefeated in the award for Most Intense Collection Letters, and they were coming on a regular basis, even as I started directing more and more of my paycheck towards paying the debt.

A few friends generously sent me money to help beat back the interest and costs, but the combination of this and other debts had me backed into a corner, so I decided on a simple plan: Run a Patreon campaign where I told stories and opinions in short episodes, which were then supported by the audience, with all the proceeds going into debt repayment. The resulting podcast, Jason Scott Talks His Way Out of It, helped dig me out of that hole.

But it had an interesting side effect – coming up with new topics of discussion and consideration, once a week, meant that I had to mine deeper and deeper into my own outlook and relevant stories. After the first fifty episodes, I turned further inward. After one hundred, it became equal parts emotional and technical. And as I’m heading towards three hundred episodes, I’m surprised I have anything left to say at all. But I apparently do, and having now been doing this podcast weekly for six years, its episodes span a truly panoramic spectrum of topics.

(This is what a standing recording booth looks like – putting your head in an Audio Mailbox to maintain quiet background noise while speaking. It gets very hot in there.)

The Patreon gave patrons early access to the episodes, but the episodes are all open and uploaded some months later to the general world, including this collection at Internet Archive. Download, listen, remix, whatever you’d like – you have my complete permission and blessing.

Episodes have been uploaded to the Internet Archive since 2019, but in 2022, an opportunity try out new technology came up – the Whisper project, open sourced and instantly downloadable, could be implemented for transcription, either as part of video or just a basic audio file. And thanks to the project, I had many audio files, and began experimenting with using Whisper against them.

Speech recognition, the process of turning spoken words in a microphone or pre-recorded audio files into written words or issued commands, has been around a very long time – decades and decades. The Internet Archive is excellent for doing a dive into historical citations; a fast “text contents” search found these points of discussion in a 1979 issue of the Silicon Gulch Gazette newsletter:

What has changed is the combination of much faster computers, much more analysis of speech, and advances in cross-referencing the resulting training to make chips and, in this case, a program that is using other disciplines within computer science to pattern-match audio, to the point of adding capitalization and punctuation from the implications in the words. Turning this against my growing collection of podcasts, it wasn’t long before I’d say what has continued to be a theme: when it works, it’s shockingly good, and when it doesn’t, it’s shockingly bad.

As an experiment and exploration, it was very useful to let the program run, shoot out a block of text, and generate the resulting timing blocks for the purposes of subtitles or transcription:

…but it would have been foolhardy to 100% walk away and let it do transcription without a second human-driven scan through the results to find mistakes. I’ve been that human, and I’ve seen things.

I’ve seen the resulting transcriptions do great jobs with proper name capitalization, odd and challenging punctuation, and paragraph breaks. I’ve also seen it knock itself silly on my New York accent and non-obscure phrasing, and definitely making a poor guess on my made-up word “Cowicature”. The algorithm works great, except when it doesn’t.

And here we get to a turn of phrase I’ve come to adapt, which is an alternate term for AI: “Algorithmic Intensity”. The human need to give life and will to machinery is a very long-lived one; but most who look at the code behind this mechanism would agree – it’s just code. The only difference is that the amount of computing power and data to derive the outcome dwarfs numbers considered unattainable a decade or two ago.

And the speed can’t be beat – 5 years of weekly podcasts took roughly 4 hours to completely transcribe by Whisper, and that amount of time was simply because it was set as a lower-priority action in the queueing system. Knowing how long the total time for all the uploaded episodes of my podcast are sitting in that collection is a little involved, but my back of napkin estimation is it’s two and a half solid days of me talking about technology and emotion, ranging subjects from programming and compilers to summer camp and family. I can assure you – I was not going to transcribe these podcasts anytime soon, and I was not going to ask someone to do it. While I’m proud of my work, I’m in no position to be able to record four podcasts in a month and create paragraphs of text from them.

Or, for that matter, descriptive summaries.

Which brings me to the latest, continual tinkering with the tools and environment available to analyze materials with algorithms. I began asking a large language model to look at the generated transcriptions and create a summary of a given podcast episode.

Two-plus years into generalized algorithmic intensity access, it’s still very much a lumpy and oddly spell-casting endeavor. Instead of asking it to “summarize this transcription”, my request reads like a headmaster at a school or a Dungeons and Dragons game text:

You read transcripts of a podcast and carefully write out descriptions, in the form of narrative paragraphs, to accurately describe the content of the podcast. Longer and more complete descriptions are better, and encouraged. You describe the main subjects, conclusions by the participants, and provide helpful context for the subjects. The podcast you listen to is:

…followed by the transcription of the podcast, time-codes and all.

A matrix of calculation, fast beyond my reckoning but not less mysterious-and-not-mysterious as CPUs and networking itself, begins analyzing the language in the transcriptions, cross-connecting ideas mentioned, occasionally volunteering more information based on matches to terms, and within a few minutes, language comes out.

…in this language, it’s easy to find places where there’s an algorithm in the mix, a machine putting out syllables and phrases based on what I said. Like a kaleidoscope or a magnifying glass, there are occasional strange distortions and what approaches funhouse-mirror reflections of what I put in.

And oddly enough, it brings up memories.

It reminds me of being in my first neighborhood and all the kids circled around a fun and weird toy, a tape recorder, where you pressed two buttons and the … cassette, it was called? Would turn slowly and you would shout whatever came to the top of your head, press STOP, and then rewind and hear your own voice. It was distorted and weird, but it was my own voice, and I’d not heard it before from outside my own head. The world shifted, a little bit.

It reminds me of being in a computer lab at MIT in my twenties, playing games on a lunch break during a temp job I had on campus, and finding out that if I went to this obscure IP number and downloaded this very large binary file and ran it, this gargantuan program would pop up and begin asking me what “address” to go to, and if I knew some of the addresses, they would turn into full-color, fully-laid out, beautiful print-like presentations along a grey background. And, to my utter disbelief, looking at a menu option called “view source” would let me see the bizarre “code” behind these lovely looking pages. It was called Hypertext Mark-Up Language, HTML, and messing with that odd and ever-changing set of written directives inside textfiles would be a significant part of my next quarter century.

There have been all these times the times that computers (on my desk or lab or out in some nebulous “net” that girdled the Earth) provided me with a new set of buttons and knobs to press. To play around. To see what hums and clicks, and to do my best to understand what is happening underneath. In many ways, these computers have done very well by me; and they have also brought stress, sleepness nights and confusion; even anger.

Put another way, these newest concepts and tools, these Intense Algorithms, sweeping the news and the discussion around virtual watercoolers, are the latest in a long line of topics about what computers will bring to human lives. It’s a deep and varied discussion and it deserves consideration and time, and respect of the opinions of others.

And while (hopefully) the world processes this newest use of computers, we’ll see where, if anywhere, they’ll help with the mass of information and digital materials that sites like the Internet Archive provide.

Asking an LLM to look at the full text of a nearly 200-page book about the raising of ducks, such as the legendary DUCKS; AND HOW TO MAKE THEM PAY (1924), provides a paradise of questions you can ask the book as if it was able to answer in full sentences:

Q: How, exactly, do you make ducks pay?

A: The book “Ducks: and how to make them pay” by William Cook outlines various methods for making duck farming profitable. It covers the selection of suitable duck breeds, breeding techniques, and the proper management of feeding and housing. Additionally, it provides insights into marketing strategies for selling ducks and their eggs. The goal is to guide readers on how to efficiently manage duck farms to maximize income from their operations.

Q: What is, in your opinion, the best duck?

A: Choosing the “best” duck depends on what you’re looking for! For farming purposes, the Pekin duck is often considered ideal because it grows quickly and has a high yield of meat. If egg production is your priority, the Khaki Campbell duck is highly recommended, as it is one of the most prolific egg layers. For those interested in ornamental ducks, breeds like the Mandarin or Wood Ducks are known for their striking appearances. Each breed has its own advantages depending on your goals and the specific requirements of your duck farming or hobbyist endeavors.

(…as an aside, the book has one of the best bookplates inside the front cover, one which only a human being would truly apprecate.)

And that’s how I see the near future with this newest use of computers – an implementation of these tools to make materials easier to find, to summarize and help point researchers and students, and allowing new ways to work with a staggering amount of information in the online stacks. Hiding from it won’t be a solution; but asking hard questions of it might be.

Meanwhile, six years of my spoken-word memories of the ways computers have affected my life are sitting in a collection, open to all and welcome for anyone to use to tinker with. Have a great time.

A Happy Ending For Some Manual Labor (And a Call for Support)

A great saga of rescue and preservation is coming towards its end, and there’s a chance to bask in the victory, and help push towards its conclusion.

I got word in 2015 of a collection of manuals inside a business that was getting out of the manuals business, and while a lot of well-meaning people talked a good game, they wanted to cherry-pick (people getting rid of stuff hate cherry-pickers), and I drove down to show I was serious, and after a week of work with MANY volunteers and contributors, we ended up with pallets of documentation inside boxes, tens of thousands of unique manuals, many nowhere else.

Then they were stored in a storage unit. Then they were stored in a closed coffee house. Then they were transported to Internet Archive’s Physical Archive. Then they were stored until last year, 2023.

Last year, a group called DLARC, doing digitizing and indexing projects around ham radio and radio technology, worked with me and the archive to sort four pallets of the manuals for products related to the history of radio/network technology, and off they went overseas to be scanned. And as of this month, the evaluated, professionally-scanned and available-to-the-world manuals are finished, except for a few stragglers. The loop has closed!

You can browse the collection of thousands of scanned manuals here:

The Manuals Plus Collection

And now, the pitch.

The company doing the digitizing does lots of digitizing for the Internet Archive. They are well-paid and legitimate professional contractors who are sent the items, and who do careful scanning to the best of the materials’ ability to provide access to the information, and then do quality checks, and then upload them. When they’re humming, they’re processing a pallet every couple of weeks (with lots of mitigating factors).

I’ve negotiated a situation where, if money is sent in, the remaining pallets that should be scanned can just be sent along without sorting them for DLARC funds, DLARC will fund any that happen to overlap with their mission, and the rest will just be done.

That’s if money is sent in.

How much money? The number approaches hundreds of thousands of dollars. So I’m looking for both big-ticket supporters (who can mail me at jscott@archive.org) or individuals.

Here is a specialized donation link: https://archive.org/donate/?origin=jssm-dlarc

If we make less than we need to scan them all, then we’ll only scan up to where it’s paid for. I believe we can close it out, but if the interest/money isn’t there, then it isn’t there – fair enough. Browse the collection as it grows into thousands of manuals as it is and consider if you want to be part of all that. That’s definitely happened.

But what a happy ending it would be to push all these manuals through the process, and close it up. That’s why I’m popping up to talk about it, and why I hope you would consider contributing towards it, for a non-profit that deserves your support generally.

Internet Archive Stands Firm on Library Digital Rights in Final Brief of Hachette v. Internet Archive Lawsuit

Today, the Internet Archive has taken a decisive final step in our ongoing battle for libraries’ digital rights by submitting the final appellate reply brief [PDF] in Hachette v. Internet Archive, the publishers’ lawsuit against our library. This move reaffirms Internet Archive’s unwavering commitment to fulfilling our mission of providing universal access to all knowledge, even in the face of steep legal challenges.

READ THE FINAL APPELLATE REPLY BRIEF

Statement from Brewster Kahle, founder and digital librarian of the Internet Archive:
“Resolving this should be easy—just sell ebooks to libraries so we can own, preserve and lend them to one person at a time. This is a battle for the soul of libraries in the digital age.”

This process has taken nearly four years to work through the legal system, and in that time we’ve often fielded the question, “Why should I care about this lawsuit?” By restricting libraries’ ability to lend the books they own digitally, the publishers’ license-only business model and litigation strategies perpetuate inequality in access to knowledge.

Throughout this legal battle, Internet Archive has remained steadfast in our mission to defend the core values of libraries—preservation, access, and education. This fight is not just about protecting the Internet Archive’s digital lending program; it’s about standing up for the digital rights of all libraries and ensuring that future generations have equal access to the wealth of knowledge contained within them.

Aruba’s Bold Support of Library Digital Rights, by Brewster Kahle

Aruba’s Prime Minister, Evelyn Wever-Croes: “Give them the opportunity to search for the truth.”

Last week Aruba launched the island nation’s digital heritage portal online: Coleccion Aruba. As trumpeted in Wired:  “The Internet Archive Just Backed Up an Entire Caribbean Island,” but really the credit goes to Aruba. Digitizing their national cultural heritage (100k items) and putting it online for free public access is a huge achievement.

I met with the Prime Minister (pictured above), the Minister of Culture, and the Minister of Education who backed the efforts made by the National Librarian, National Archivist, and their digital strategist. Never have I seen such unified support for cultural preservation and access. They brought together people from the Dutch islands and the Internet Archive to share the news and to inspire and to lead.

Aruba was the first to sign onto the Four Digital Rights of Memory Institutions: right to Collect, Preserve, provide Access, and interlibrary Collaboration. These are bad times when we have to reclaim these rights that are being taken from all libraries, but Aruba is making a stand. Go Aruba!

Aruba’s National Librarian, Astrid Britten, signs the Four Rights, as the National Archivist, Raymond Hernandez, and Brewster Kahle look on.

If libraries are reduced to only subscribing to commercial database products rather than owning and curating collections, we will be beholden to external corporations and subject to their whims over what’s in licensed collections, and how patrons can access them. The “Spotify for Books” model is not the way we want our libraries to go. 

To top it off, the Prime Minister, Evelyn Wever-Croes, inspired us when she told us that for the next generation, we need to “Give them the opportunity to search for the truth.” Yes.

Inspiring to see a country lead so well. I hope we have the honor of working with other nations that will also assert Digital Rights for Libraries, and live by those principles.

– Brewster Kahle

Diversifying Access to the Local Historical Record with Community Webs

Community Webs partners on the NEH supported,  Increasing Access to Diverse Public Library Local History Collections
Partners on the NEH supported, Increasing Access to Diverse Public Library Local History Collections

Since 2017, Community Webs has partnered with public libraries and heritage organizations to document and diversify the historical record. These organizations have collectively archived over 100 terabytes of web-based community heritage materials, including more than 800 collections documenting the lives of those often underrepresented in history. In 2023, Community Webs began offering collection digitization and access with support from the  National Historical Publications and Records Commission (NHPRC). Today, Community Webs is happy to announce $345,000 in additional support from the National Endowment for the Humanities to digitize and provide open access to more than 411,000 local history collection items from seven Community Webs partners: Athens-Clarke County Library, Belen Public Library, District of Columbia Public Library, Evanston History Center, Jersey City Free Public Library, San Francisco Public Library, and William B. Harlan Memorial Library. 

Community Webs partner collections include a diverse range of content from across the country representing the life of immigrants, Black, and minority communities throughout US history. This includes records created by and for them, such as the Julius Hobson Papers from District of Columbia Public Library, the Belen Harvey House Collection from Belen Public Library, and the Local and Regional Family Histories collection from the William B. Harlan Memorial Library. 

ACE Newsletter, Vol. 1, No. 3, Julius Hobson Papers on Federal Job Discrimination
ACE Newsletter, Vol. 1, No. 3, Julius Hobson Papers on Federal Job Discrimination (source)

The collections also contain items that document city and municipal agencies that significantly impact minority communities. Digitization of this material will produce a deeper understanding of how systems of power and legal structures can regulate or even erase minority community histories, especially in regards to housing and economic opportunities. For example, the Athens City Engineer Records from Athens-Clarke County Library, the African American Housing and History collection from Evanston History Center, and the San Francisco Redevelopment Agency Records from San Francisco Public Library show the impact of urban redevelopment on Black and minority neighborhoods. The Municipal Records and agency scrapbooks from Jersey City Free Public Library show the ways that politics and economic changes impacted immigrant and minority communities. 

Ashley Shull, Collections Coordinator, Athens-Clarke County Library shares what this project means to the community:

“The opportunity to be involved in a project proposal like this with the Internet Archive and our other library partners is invaluable to our community. The increased access to our Athens City Engineer collection will provide, not only local citizens, but academic researchers from around the world as well as current Athens-Clarke County Government officials insight into the past planning activities of our community. This is especially important as our local government embarks on a new Comprehensive Community Plan.”

John Beekman, Chief Librarian, Jersey City Free Public Library, also emphasized the impact of access to important city records:

“The Jersey City Free Public Library is honored to work with esteemed libraries from across the country on this innovative project spearheaded by the Internet Archive’s Community Webs program. The municipal minutes and records that make up the bulk of our contribution contain a wealth of information, not only on the workings of city government and agencies, but the people whose work is recorded there. Names and activities present in these records that never made the news will now be discoverable through search rather than the needle-in-a-haystack experience of poring over individual volumes of minutes. Making these materials accessible will provide a tool for enriching the record of city life across the 19th and 20th centuries.”

Hunters Point housing phase one map with unit totals, an Francisco Redevelopment Agency Records. Hunters Point Project Area A. Photograph
Hunters Point housing phase one map with unit totals, an Francisco Redevelopment Agency Records. Hunters Point Project Area A. Photographs (source)

The Community Webs program’s core goals are to increase the diversity of voices represented in the accessible historical record and to forge authentic partnerships between public libraries and heritage organizations that are members of Community Webs and the communities, individuals, and researchers they serve. Digitizing these collections will expand the overall amount and diversity of locally-focused community archives available online to users, and will augment the web and digital collections that are already aggregated by Community Webs. Records will also be shared with the Digital Public Library of America, further strengthening collection discovery. 


The Internet Archive and Community Webs are thankful for support from the National Endowment for the Humanities.

Learn more about Community Webs members, projects, and collections on our blog. Get in touch with us at commwebs@archive.org to discover ways to partner to preserve local history!

Essential Music Concerts From Home: Curated for the Internet Archive by Rob Evanoff

Amid the trials of the early pandemic, the Internet Archive’s transition to remote work in March 2020 brought the challenge of maintaining engagement for our all-staff virtual meetings. In April 2020, we devised a creative solution: biweekly performances by musicians preceding our Monday and Friday meetings. Dubbed “Essential Music Concerts from Home,” this initiative mirrored the enticement of providing donuts or snacks to draw attendees to a staff gathering. Now, as we mark its 4th anniversary, we extend our gratitude to Producer/Manager Rob Evanoff for his contributions, bringing over 50 artists to our virtual stage.

In tribute to Rob’s impact, we’d like to highlight several of the artists he represents.


Carlos Calvo

Carlos Calvo is a celebrated and versatile musician, composer, and educator. His repertoire includes contemporary and flamenco music genres. Renowned in the Los Angeles entertainment and media industry, Calvo is highly sought-after for his talents as a composer for television and film.


Joanna Pearl

Joanna Pearl exudes an unmistakable passion for music. Pearl’s powerful vocal prowess and authentic songwriting capture the essence of her musical journey. “I write from the heart and always try to relate to others by writing what I’m feeling. It’s a direct reflection of who I am.”


Afton Wolfe

Afton Wolfe has embraced various roles from philosopher to lawyer to musician. At his core, Wolfe is deeply connected to the rich heritage of rock, blues, and soul, with roots firmly planted in Mississippi.


Teni Rane

Teni Rane has a universally appealing vintage vocal style that captures the essence of everyday life. She explores her craft with a distinct fusion of Americana-folk-pop and a touch of jazz.


King Corduroy

King Corduroy is inspired by the authentic charm of American roots music. As a modern songwriter, he has been traversing the musical universe for years, crafting his unique brand of “Cosmic Southern Soul” along the way.


Ash & Eric

Ash & Eric had a musical partnership. As they played together, their musical partnership blossomed into love. Together, they have cultivated a vibrant community of supporters bound by their shared passion for music and storytelling.


If you would like to perform for one our 10 minute concerts please contact bz@archive.org.

Aruba Becomes First Country to Endorse Statement Protecting Digital Rights of Memory Institutions

From left: Aruba’s National Librarian, Astrid Britten (Director, Biblioteca Nacional Aruba), signs the statement protecting memory organizations online as Raymond Hernandez (Director, Archivo Nacional Aruba) and Brewster Kahle (Founder, Internet Archive) look on.

This was a week of firsts in Aruba. The small island nation in the southern Caribbean launched its new heritage portal, the Aruba Collection (Coleccion Aruba), and it became the first country to sign a statement to protect the digital rights of libraries & other memory institutions.

Internet Archive founder Brewster Kahle and Chris Freeland, director of library services at the Archive, attended the signing ceremony in Aruba, a country in the Kingdom of the Netherlands located 18 miles north of Venezuela.

Support for the statement, Four Digital Rights For Protecting Memory Institutions Online, was spearheaded by Peter Scholing, information scientist and researcher at the country’s national library, Biblioteca Nacional Aruba (BNA). Last fall, he learned about the need for library digital rights to be championed during a conference at the Internet Archive in San Francisco. While much of that discussion was based on the 2022 report, “Securing Digital Rights for Libraries: Towards an Affirmative Policy Agenda for a Better Internet,” authored by Lila Bailey and Michael Menna, and focused on protecting library access to e-books, Scholing was interested in Aruba making a broader statement—one encompassing all memory institutions and the diverse types of materials they house.

“Over the last few months we’ve brainstormed about these digital rights and how to broaden the statement to make it relevant to not only libraries, but also for memory institutions and GLAMs in general,” said Scholing, using the acronym for galleries, libraries, archives & museums. “In that sense, it has become a near universal declaration for open access to information, in line with the United Nations’ Sustainable Development Goals (UN 2030 Agenda/Sustainable Development Goals, #16.10) or other statements on open access to documentary, cultural or digital heritage. This aligns almost perfectly with what we aim to achieve here on Aruba—universal access to “our” information.”

Many memory institutions on the island have long worked together to digitize collections including books, government documents, photos and videos. The statement reinforces the importance of libraries, archives, museums and other memory institutions being able to fulfill their mission by preserving knowledge for the public to access.

Initial Signing Organizations

  • Archivo Nacional Aruba (ANA)
  • Aruban National Committee for UNESCO’s Memory of the World Programme
  • Biblioteca Nacional Aruba (BNA)
  • Coleccion Aruba
  • Museo Arkeologico Nacional Aruba (MANA)
  • Stichting Monumentenfonds Aruba
  • Union di Organisacionnan Cultural Arubano (UNOCA)

The statement asserts that the rights and responsibilities that memory institutions have always enjoyed offline must also be protected online. To accomplish this goal, libraries, archives and museums must have the legal rights and practical ability to:

  • Collect digital materials, including those made available only via streaming and other restricted means, through purchase on the open market or any other legal means, no matter the underlying file format;
  • Preserve those materials, and where necessary repair or reformat them, to ensure their long-term existence and availability;
  • Provide controlled access to digital materials for advanced research techniques and to patrons where they are—online;
  • Cooperate with other memory institutions, by sharing or transferring digital collections, so as to provide more equitable access for communities in remote and less well-funded areas.

DOWNLOAD THE STATEMENT

In Aruba, Scholing said library and archive leaders believed strongly that these rights should be upheld with a public endorsement. Michael Menna, co-author of the statement and the 2022 report, saw this as a key first step in building a coalition of memory institutions.

“Aruba has been brave to make such a clear and unequivocal statement about the many challenges facing libraries, archives, and museums,” said Menna. “Simply put, these essential institutions need better protections to adapt their services to today’s media environment. Hopefully, after hearing Aruba speak out, others can follow suit.”

Report co-author Lila Bailey, senior policy counsel at the Internet Archive, said that seeing the statement embraced and endorsed by memory institutions is rewarding.

“It is a thrill to see Aruba leading the way towards a better digital future for memory institutions worldwide,” said Bailey. “These institutions must meet the needs of a modern public using the best tools available. It is good public policy and basic common sense that libraries, archives and museums should be not only permitted but encouraged to leverage digital technologies to serve their essential public functions.”

The statement can be endorsed by governments, organizations, and individuals following a verification process. If you are interested in signing the statement, or would like to learn more, please complete the initial online inquiry, or e-mail Chris Freeland, Internet Archive’s director of library services, at chrisfreeland@archive.org.

Aruba Launches Digital Heritage Portal, Preserving Its History and Culture for Global Access

Many know Aruba as a popular tourist destination with beautiful beaches. The small island nation just north of Venezuela is also home to 110,000 inhabitants with a rich history—that many are working to preserve.

Aruba’s memory institutions have been digitizing materials for years. Initially, residents and international scholars could only view the items at the library on the island. But now with the help of Internet Archive, the Aruba Collection (Coleccion Aruba) is available to anyone for free from anywhere.

EXPLORE THE COLECCION ARUBA

A celebration of the heritage portal’s launch is being held via livestream on April 8.

COLLABORATION IS KEY

Digitizing the island’s historic materials was a collaborative effort. After Aruba became a country within the Kingdom of the Netherlands in 1986, the national library (Biblioteca Nacional Aruba; Aruba National Library – BNA) and the national archives (Archivo Nacional Aruba; National Archives of Aruba – ANA) were established. Leaders from the two institutions worked together to curate and scan artifacts including newspapers, government reports, and cultural items.

“Aruba has a challenging past due to migration, colonization, and slavery,” said Peter Scholing, information specialist/researcher at BNA, the national library. “That means there has been a diaspora of people coming in and spreading out throughout the world—the same goes for our collection and documents.”

Locating materials to digitize involved several local institutions on the island. Because the materials are scattered, Aruba has branched out to collaborate with others in the Caribbean, Venezuela, Netherlands and the United States. The local leaders established protocols and standards for the collection, but didn’t have enough resources to make the materials available in a robust digital library.

Kaart van het Eiland Aruba (1825) / Map of the Island of Aruba (1825)

Connecting with the Internet Archive to host the digital collection provided the missing piece of the puzzle, according to leaders in Aruba. “Because of the reality of our small island state, we don’t have much funding for big company servers,” said Raymond Hernandez, head of the Aruba National Archives (ANA). “If you have a limited budget, it’s not possible. The dream has come true, thanks to the Internet Archive. We are very grateful.”

The new portal on the Internet Archive devoted to Aruba includes links to the several other institutions such as UNOCA (Union di Organisacionnan Cultural Arubano (UNOCA),  Museo Arkeologico Nacional Aruba (MANA); National Archaeological Museum Aruba; Stichting Monumentenfonds Aruba (SMFA); Monuments Fund of Aruba), Departamento di Cultura Aruba, University of Aruba, TeleAruba and UNESCO Aruba.

EXPLORE THE COLECCION ARUBA

The collection has more than 100,000 items to date — nearly a one-to-one ratio for the island’s population. This includes about 40,000 documents, 60,000 images, 900 videos, 45 audio files and seven 3D objects for a total of 67 thematic and/or institutional (sub)collections.

As an additional layer of protection, the materials are being uploaded to the Filecoin  decentralized storage network, thanks to a longstanding relationship between the Internet Archive and Filecoin Foundation for the Decentralized Web (FFDW).

[See paper on the Aruba Model – Coleccion Aruba: Intersectoral Collaboration on Aruba as a Model for the (Dutch) Caribbean : A collaborative approach for preservation and access of collections in small island states]

RESEARCH USE OF THE COLLECTION

Chelsea Schields, University of California, Irvine

For Chelsea Schields, associate professor of history at the University of California, Irvine, the materials were so compelling and easy to use that she integrated them into her undergraduate course, “Oil and Capitalism.” Students learn about the global history of petroleum and develop research skills to build an argument based on evidence. “Students use the Aruba Collection to write research papers related to the culture of oil towns,” Schields said. “It is often their favorite part of the course because they get to dig into the sources themselves and identify the themes that resonate across those materials.”

Unlike other primary source collections, which are often cumbersome and hidden behind a costly paywall, the diverse sources found enabled students to write papers on topics ranging from migrant domestic workers in Aruba to the spatial organization of oil towns. 

In her own research for a book on the social histories of oil refineries on Aruba and Curaçao, Schields said the Aruba heritage portal was extremely useful when the COVID-19 pandemic restricted travel in the summer of 2020. “The Aruba Collection provided such an indispensable, bottom-up portrait of the history of the island’s Lago Refinery, which at its peak was among the largest plants in the world,” she said. “From photographs of refinery workers and their families to digitized copies of employee publications, these sources allowed me to see the labor required to transform oil into the commodities we rely upon today.”

Adi Martis, Utrecht University (emeritus)

Since the launch of Coleccion Aruba, Adi Martis said he uses the website almost every day. The emeritus associate professor at Utrecht University in The Netherlands appreciates how easy it is to access a variety of materials in national archives and the national library collections.  For example, by combining data from digitized historical maps and land ownership register books from the Aruban Land Registry, users can gain an insight into the history of land ownership on the island, he said. 

By applying AI-based, Handwritten Text Recognition (HTR) algorithms, the digitized, difficult-to-read handwritten texts are made accessible to the public and transformed into searchable data. Martis said in some cases, digitized archives from Aruba, Curaçao and the Netherlands are combined and search results are sometimes surprising—in particular with data about the history of slavery. Users can search using different keywords and the site can even create family trees, which normally can be difficult because the slaves had no surnames.

“For the past 50 years I have been doing archival research and I must admit that I am proud of my small island that was able to achieve such incredible results in such a short time with the help of Internet Archive,” Martis said.

Jan Bant, a doctoral student in history from Aruba who lives in The Netherlands, relied heavily on the Coleccion Aruba when doing research for  his master’s thesis in 2020 during the COVID-19 lockdown. Although he was unable to return to the island, he accessed journals and newspaper articles from Europe to examine Aruba’s political climate between in the 1970s and 80s. Being able to enter key words and dates in the search function was particularly helpful in locating sources. Bant was able to uncover documents about protests, revealing the country’s somewhat radical traditions of commenting on world affairs despite its image as a calm player in the Caribbean, he said.

As Bant continues his PhD research on the role of sports in Dutch Caribbean communities, he is tapping into the Coleccion Aruba, including materials about the oil refinery and laborers who brought baseball to the island.  

Bant contributed back to the portal by uploading his completed master’s thesis, which was completed in 2021. “There is a lot of research about Aruba that gets written but it’s never really used—often because people don’t know where to find it,” Bant said. “The Aruba Collection can also serve well as a repository to store research that has been done about Aruba. That’s what I think is very valuable.”

SERVING PATRONS

Aruba’s UNOCA Managing Director Ray-Anne Hernandez said the heritage portal allows users to easily search her foundation’s work of arts and culture. Researchers now can go to one place to locate digitized images and documents. 

“We have collections that we want to share and have accessible to the public, so this was a logical step to be part of this collaboration,” Hernadez said. “In the collection, we have history. We have art, music, and education. It’s so much more than we initially thought it would be and that fills us with great pride and great joy. It’s not just that we made a website. It’s something that’s continually growing and everybody is using it.”

The Dutch Caribbean Digital Heritage Week will be held on Aruba April 8-12. For the first day, April 8, a day-long symposium is planned, titled “Connecting our Shared Heritage: Linking (Dutch) Caribbean Heritage Institutions and Collections”, with keynote speeches from Brewster Kahle (Internet Archive), Eppo van Nispen (Dutch Network for Digital Heritage NDE and Netherlands Institute for Sound and Vision), and contributions from a wide range of heritage professionals from across the Dutch Caribbean, and the world. It will be livestreamed via https://coleccion.aw/stream.

[Editorial note: For another take on the Coleccion Aruba, see, “The Internet Archive Just Backed Up an Entire Caribbean Island” from Wired.]

Happy National Library Week 2024!

At the Internet Archive, we’re celebrating the power of libraries to transform lives and communities during this year’s National Library Week (April 7-13, 2024). From preserving the past to shaping the future, libraries are vital hubs of knowledge, connection, and inspiration.

To mark this special week, we’re shining a spotlight on our incredible admin team. They work tirelessly behind the scenes, ensuring our events and virtual spaces are welcoming to all members of our global community.

Join us in thanking them for their dedication and hard work! Watch the video to meet the faces behind the scenes and learn more about how we strive to make knowledge accessible to everyone, everywhere.

Check out more about National Library Week here: https://www.ala.org/conferencesevents/celebrationweeks/natlibraryweek

Essential Music Concerts From Home: The Variety

In early March 2020, much like the rest of the United States, the staff of the Internet Archive transitioned to fully remote work in anticipation of the prolonged pandemic. This change was monumental and, like all workplaces, we discovered the challenge of sustaining a feeling of connection, morale, and joy within the team.

Recognizing this challenge, our Director of Media & Access, Alexis Rossi, came up with a creative solution. It was already part of our workplace culture to have two weekly all-staff meetings—one at 10am PT Monday morning, and another at Friday lunch. As everyone moved to joining those meetings from home, Alexis began hosting short concerts before them by performers, particularly musicians, to uplift our team’s spirits. These concerts provided not only entertainment, but also a means of keeping our team engaged and the performers booked during uncertain times.

The initiative began with a performance by Alexis’s friend, Jefferson Bergey, whose talent for musical theater and captivating stage presence set the stage. At the time, we envisioned organizing these concerts for just a few months, as none of us could predict the duration of the pandemic.

Fast forward several years and our work world has undergone a profound transformation. Encouraged by the overwhelmingly positive response from our now mostly remote staff, we decided to continue the program, thus giving birth to “Essential Music Concerts From Home.” As we approach our fourth anniversary in April, we reflect on how this simple yet impactful idea has helped sustain our remote workplace culture through the years. We thought it would be fun to offer you a glimpse into some of the unique musical encounters enjoyed by the Internet Archive staff with some exceptionally talented musicians.


Jefferson Bergey

Jefferson Bergey is a professional musician and cherished figure in the Bay Area, known as “Fun for Hire.” His musical style epitomizes versatility, adapting to any desired vibe or genre with ease. Drawing from the rich foundations of jazz, blues, pop, folk, bluegrass, and rock, his songs are crafted with a distinct flair for musical theater. He is such a popular Bay Area performer, there’s even a burger named after him.


Jeanie & Chuck Poling

Jeanie & Chuck Poling have been making music together since 1982. Their act, Jeanie and Chuck’s Country Roundup, specializes in honky tonk and bluegrass tunes played on acoustic instruments. Their performances are known for blending music, humor, and showmanship to entertain audiences. Additionally, Chuck has served as the emcee at the Rooster Stage at Hardly Strictly Bluegrass since 2012.


Joliet

Joliet, hailing from Kansas City, is an independent singer/songwriter and live music streamer. Her vocal style is both distinctive and commanding. With her bold and expansive sound, Joliet offers up heartfelt and captivating charm. She plays live on platforms such as Smule and Twitch, where she has introduced her original compositions to audiences worldwide.


Ben Cosgrove

Ben Cosgrove is a nomadic composer, pianist, and multi-instrumentalist rooted in northern New England. Across his artistic journey, Ben’s compositions and performances have been shaped by his profound fascination with landscape, geography, place, and the environment.


Cello Joe

Cello Joe, also known as Joey Chang, defies convention within the realm of cellists. Cello Joe combines the cello with beatboxing, vocals, and live looping to create a unique fusion. His performances blend classical music with hip hop elements, showcasing his ability to generate rhythmic beats using both his cello and vocal talents in real-time. He is known for being the “Wildest Beatboxing Cellist in the West”.


Glitterfox

Glitterfox is a Portland Oregon based band. At the heart of Glitterfox are the band’s songwriters and frontpersons, the married couple Solange Igoa and Andrea Walker. Drawing from their personal struggles and experiences as queer, neurodivergent individuals, they infuse their songwriting with raw emotion. They imbue their music with a passion for Americana, grunge, and dance genres.


Rob Reich

Rob Reich epitomizes the essence of the San Francisco music scene, serving as a cornerstone of its vibrant underground community. Renowned for his eclectic style, he blends robust melodic concepts, rhythmic dynamism, and a penchant for irreverence and innovation. 


Please note that these recordings were conducted via Zoom, which often leads to lower fidelity audio quality. For a more immersive experience, we encourage you to explore these artists further on their respective websites.

If you would like to perform for one our 10 minute concerts please contact bz@archive.org.