
A Quarter In, A Quarter-Million Out: 10 Years of Emulation at Internet Archive

10 years ago, the Internet Archive made an announcement: It was possible for anyone with a reasonably powerful computer running a modern browser to have software emulated, running as it did back when it was fresh and new, with a single click. Now, a decade later, we have surpassed 250,000 pieces of software running at the Archive and it might be a great time to reflect on how different the landscape has become since then.

Anyone can come up with an idea, and the idea of taking the then-quite-mature Javascript language, available in all major browsers, and having it run complicated programs was not new.

With the rise of a cross-compiler named Emscripten, the idea of taking rather-complicated programs written in other languages and putting them into Javascript was kind of new.

That all being the case, the idea of taking a by-then 20-year-old super-emulator called MAME, using Emscripten to cross-compile it into Javascript, and then running the resulting code in the browser at Internet Archive to make computers and consoles run, was very new.

It was also, objectively, madness.

Well over a thousand hours of work went into the project from a very wide range of volunteers who poured galactic amounts of time into making the project a reality. Along the way, changes were made to Emscripten, the Firefox, Internet Explorer, and Chrome Browsers, MAME, and the Internet Archive’s codebase to accommodate this dream.

It was announced in the Fall of 2013, well over a year after the project started.

Additional announcements came with each expansion of the types of software being emulated, and it became huge news, leading to millions of visitors coming to try it out.

By any measure, a quarter of a million items later, it has been a huge, huge success.

The rest of this blog entry is pretty pictures and beautiful links, but before we move on, it’s once again important to highlight people who provided major contributions, including Justin Kerk, Daniel Brooks, Vitorio Miliano, James Baicoianu, John Vilk, Tracey Jaquith, Jim Nelson, and Hank Bromley. Dozens more developers spent evenings, weekends, and months to make this system happen. Thank you to everyone involved.

The joy of watching a computer boot up in the browser was (and is) a miraculous feeling. And after that feeling, comes a quick comfort with the situation: Of course we can run computers inside our browsers. Of course we can make most anything we want run in these browser-based computers. What’s next?

Within a short time after our 2013 announcement, the archive was running hundreds, then thousands of individual programs, floppy disks and even cassette-based software from computing’s past.

As emulators besides MAME were added, it became necessary to create a framework for a versatile and understandable method to load emulators. This framework eventually got a name: THE EMULARITY.

In the decade of the Emularity’s existence, the Archive’s software emulation has expanded into directions nobody could have fully expected to work when the project started.

Here are some highlights:

HyperCard Stacks for the Apple Macintosh, which represent a critical period in content creation and computer information architecture, have been restored to easy access, with thousands of stacks to try instantly.

Plastic Electronic Handheld Games, once a staple of toys in the 1970s through the 1990s, have been able to live once again, including the original housing that these simple (and not so simple) machines relied on instead of graphics.

As the uploads veered into the many thousands, it became more and more difficult for new adventurous users to figure out what, if any, software was at the archive to check out. This has led to specialized collections focused on one type of program, like the Computer Chess Club. People can use these collections as gateways to quickly testing the waters of now-decades of computer and software history, seeing the turns and twists of countless lost companies and individuals who squeezed every last bit of wonder and spectacle out of these underpowered boxes.

The Calculator Drawer took things to a new level when entire calculators could be emulated, including their unique looks, accompanied by a “drawer of manuals” to browse through if you had to learn (or re-learn) how to make these machines run.

The Woz-a-Day Collection, in many ways, represents the logical end for the role that the Internet Archive’s Emularity can provide for software history. The project is the effort of the software historian 4am, who has spent years on its maintenance. Methodically preserving Apple II software from the original floppy disks, incorporating every last bit and track of the disks with no modifications, and allowing the best fidelity of these programs as they originally were offered, 4am allows some of these programs to be playable for the first time in decades.

With each new batch of added emulated systems and machines has come a greater and greater pool of users, toying with historical software or playing long-forgotten or never-remembered games with a new level of convenience and willingness to try them out.

At this milestone of a decade into this experimental adventure, Internet Archive continues to grow its collection, to test and automate the functioning of both uploaded and self-maintained collections of software, and to provide a vast and necessary service in the preservation of historical software.

And, of course, we all get to enjoy some really great games.

Here’s to what another ten years will bring us!

IMLS National Leadership Grant Supports Expansion of the ARCH Computational Research Platform

In June, we announced the official launch of Archives Research Compute Hub (ARCH), our platform for supporting computational research with digital collections. The Archiving & Data Services group at IA has long provided computational research services via collaborations, dataset services, product features, and other partnerships and software development. In 2020, in partnership with our close collaborators at the Archives Unleashed project, and with funding from the Mellon Foundation, we pursued cooperative technical and community work to make text and data mining services available to any institution building, or researcher using, archival web collections. This led to the release of ARCH, with more than 35 libraries and 60 researchers and curators participating in beta testing and early product pilots. Additional work supported expanding the community of scholars doing computational research using contemporary web collections by providing technical and research support to multi-institutional research teams.

We are pleased to announce that ARCH recently received funding from the Institute of Museum and Library Services (IMLS), via their National Leadership Grants program, supporting ARCH expansion. The project, “Expanding ARCH: Equitable Access to Text and Data Mining Services,” entails two broad areas of work. First, the project will create user-informed workflows and conduct software development that enables a diverse set of partner libraries, archives, and museums to add digital collections of any format (e.g., image collections, text collections) to ARCH for users to study via computational analysis. Working with these partners will help ensure that ARCH can support the needs of organizations of any size that aim to make their digital collections available in new ways. Second, the project will work with librarians and scholars to expand the number and types of data analysis jobs and resulting datasets and data visualizations that can be created using ARCH, including allowing users to build custom research collections that are aggregated from the digital collections of multiple institutions. Expanding the ability for scholars to create aggregated collections and run new data analysis jobs, potentially including artificial intelligence tools, will enable ARCH to significantly increase the type, diversity, scope, and scale of research it supports.

Collaborators on the Expanding ARCH project include a set of institutional partners that will be closely involved in guiding functional requirements, testing designs, and using the newly-built features intended to augment researcher support. Primary institutional partners include University of Denver, University of North Carolina at Chapel Hill, Williams College Museum of Art, and Indianapolis Museum of Art, with additional institutional partners joining in the project’s second year.

Thousands of libraries, archives, museums, and memory organizations work with Internet Archive to build and make openly accessible digitized and born-digital collections. Making these collections available to as many users in as many ways as possible is critical to providing access to knowledge. We are thankful to IMLS for providing the financial support that allows us to expand the ARCH platform to empower new and emerging types of access and research.

Internet Archive + IIIF

Making IIIF Official at the Internet Archive

A joint blog post between the Internet Archive and the IIIF Community


After eight years hosting an experimental IIIF service for public benefit, the Internet Archive is moving forward with important steps to make its International Image Interoperability Framework (IIIF) service official. Each year, the Internet Archive receives feedback from friends and partners asking about our long-term plans for supporting IIIF. In response, the Internet Archive is announcing an official IIIF service which aims to increase the resourcing and reliability of the Internet Archive’s IIIF service, upgrade the service to utilize the latest version 3.0 of the IIIF specification, and graduate the service from the domain to The upgrade also expands the Internet Archive’s IIIF support beyond images to also include audio, movies, and collections — enabling deep zoom on high-resolution images, comparative item analysis, portability across media players, annotation support, and more.

An image visually detailing each step of how a URL for a conceptual IIIF service run by "" may be used to crop, zoom, rotate, and color correct an image and then download the result as a jpeg. Image from


In 2015, a team of enthusiastic Internet Archive volunteers from a group called Archive Labs implemented an experimental IIIF service to give partners and patrons new ways of using images and texts. You can read more about the project’s origins and ambitions in this 2015 announcement blog post. The initial service provided researchers with an easy, standardized way to crop and reference specific regions of images. (Maybe you can tell whose eyes these are?) By making Internet Archive images and texts IIIF-compatible, they may be opened using any number of compatible IIIF viewer apps, each offering their own advantages and unique features. For instance, Mirador is a “multi-up” viewer that makes it easy for researchers to view different images side by side and then zoom into or annotate different areas of interest within each image.
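
To make the cropping idea concrete, here is a minimal sketch of how a IIIF Image API request URL is put together. The path layout (identifier, region, size, rotation, quality, format) follows the published IIIF Image API specification; the base URL `https://iiif.example.org/image`, the item identifier, and the helper names are hypothetical placeholders for illustration, not the Internet Archive's actual endpoint or code.

```python
from urllib.parse import quote


def pixel_region(x, y, width, height):
    """Format a crop rectangle as the IIIF 'x,y,w,h' region parameter."""
    return f"{x},{y},{width},{height}"


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation=0, quality="default", fmt="jpg"):
    """Build {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}.

    The identifier is percent-encoded so that any '/' inside it is not
    mistaken for a path separator.
    """
    encoded_id = quote(identifier, safe="")
    return (f"{base.rstrip('/')}/{encoded_id}/{region}/{size}/"
            f"{rotation}/{quality}.{fmt}")


# Hypothetical server and item, used only for illustration.
BASE = "https://iiif.example.org/image"

# Crop a 640x480 region starting at (100, 200), scale it to 320 pixels wide,
# rotate it 90 degrees, and request a grayscale JPEG.
example_url = iiif_image_url(
    BASE,
    "my-item/page-1",
    region=pixel_region(100, 200, 640, 480),
    size="320,",
    rotation=90,
    quality="gray",
)
print(example_url)
```

Because the crop, scale, and rotation all live in the URL itself, a link like this can be cited or embedded and any IIIF-compatible server or viewer should interpret it the same way.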

Since its launch more than seven years ago, the IIIF labs service has received millions of requests from more than 15 universities and GLAM (galleries, libraries, archives and museums) organizations across the globe, including University of Texas, UCD Digital Library, Havana University, Digital Library of Georgia, BioStor, Emory University, and McGill University. In this time, the broader IIIF ecosystem itself has blossomed to include hundreds of participating institutions. For all its benefits, the labs IIIF service has been considered “unofficial,” hosted on the separate domain, and several partners have voiced interest in the Internet Archive adopting it as an officially supported service. Today, several members of the IIIF community are collaborating with the Internet Archive to make this happen. 

Josh Hadro, managing director of the IIIF Consortium (IIIF-C), sees the Internet Archive as filling a critical role “in serving the average Internet user who may not benefit from the same access to or affiliation with infrastructure offered by traditional research institutions.” The IIIF-C promotes interoperability as a core element of IIIF: the ability to streamline access to information and make cultural materials as easy to use and reuse as possible. Because the Internet Archive enables any patron to upload eligible materials, everyone has the opportunity to benefit from IIIF’s capabilities. IIIF-C counts the Internet Archive as a natural ally because of its ongoing support of open collections delivered via open web standards and protocols. With this project, IIIF-C hopes to make the Internet Archive a go-to resource online that facilitates IIIF work for students and scholars unaffiliated with the kinds of institutions that historically have provided IIIF infrastructure. This is an essential step toward a strategic goal of lowering barriers to IIIF usage and adoption worldwide.

In service of this outcome, the Internet Archive has teamed up with a number of IIIF community members to officialize and upgrade the IIIF service in order to make the best use of the new capabilities introduced into the IIIF specifications in recent years.

In the coming weeks, we’ll share more details about the IIIF improvements that will become available to users of the Internet Archive. First, we want to lay out our current plan for the update, including backwards compatibility affordances, to ensure existing consumers have the information they need to successfully migrate from the unofficial to the official IIIF API.


Both the original IIIF labs service the Internet Archive has been running, as well as the new upcoming official IIIF service, wouldn’t have been possible without huge support from volunteers within the IIIF community and Internet Archive staff. A big thank you to the following folks who are making this effort to bring IIIF into production possible:

Stay tuned for more details on the new functionality soon, and if you have questions or would like to get involved in helping us test the new setup, get in touch with IIIF-C at For more updates, including the September 13 IIIF Consortium community call announcing the Internet Archive’s IIIF service, please visit the IIIF community calendar at

Technical Notes & FAQs for Partners

This technical section is intended for partners who currently rely on the IIIF API who may be seeking further details on how these changes might affect them.

What is changing? Previously, partners accessed the Internet Archive’s IIIF labs API from the domain. As part of the effort to graduate from labs to production, the IIIF API will move to the domain. Because we don’t want to break any of the amazing projects and exhibits that patrons have created using the existing IIIF capabilities on the domain, we’re migrating the API in phases. 

Phasing migration. The first phase will introduce a new and improved, official Internet Archive IIIF 3.0 service on the subdomain. The unofficial, legacy service will continue to run on the for a grace period, allowing partners to migrate. Once we’ve gathered enough data to be confident requests are being satisfactorily fulfilled by the new official service, the legacy service will be “sunset” and any request to it will redirect to use the official service. At this point, all requests for IIIF manifests and IIIF images (whether to or will default to the latest 3.0 version of the IIIF APIs and be answered by A specifiable “version” endpoint will be available for consumers whose applications require manifests and images to be served using the IIIF v2.0 legacy format. More details, examples, and technical documentation will be made available on this topic in the coming weeks and will eventually be accessible from

Possible Breaking Changes.
1. When the service was originally launched, was set up to redirect to as a convenience. Regrettably, during the first phase of development, will no longer be a redirect for and instead will run the new official IIIF service. As a result, partners whose code or applications reference (expecting it to redirect to will experience a breaking change and will need to either update their references to explicitly refer to the legacy “” service, or update their code to use the Internet Archive’s new official service. As far as we can tell, we’re unaware of partners currently referencing “”  within public projects on Github or Gitlab and so we hope no one is affected. Still, we want to give fair warning here. For those starting a new project and looking to use the Internet Archive’s IIIF offerings today, we strongly recommend using the endpoint.
2. Some partners migrating from the v2 to v3 API who have been saving annotations may also experience breaking changes because canvas and manifest identifiers for version 3 are necessarily different from version 2 identifiers. We will be doing our best, for the time being, to ensure version 2.0 manifests remain accessible from the address (via redirects) and will retain the canvas identifiers.
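
Partners who want to check whether they are affected could start by auditing the IIIF URLs stored in their own projects. The snippet below is a minimal sketch of that audit, assuming you already have your manifest and image URLs in a list; the function name and the host `legacy-iiif.example.org` are hypothetical stand-ins and should be replaced with the legacy host your application actually references.

```python
from urllib.parse import urlsplit


def find_legacy_references(urls, legacy_host):
    """Return the URLs whose host matches legacy_host (case-insensitive).

    Strings that do not parse to a URL with a host are skipped.
    """
    legacy_host = legacy_host.lower()
    return [u for u in urls if (urlsplit(u).hostname or "") == legacy_host]


# Hypothetical stored references, used only for illustration.
stored_urls = [
    "https://legacy-iiif.example.org/item-a/manifest.json",
    "https://iiif.example.org/item-a/manifest.json",
    "https://LEGACY-IIIF.example.org/item-b/info.json",
    "not a url",
]

for url in find_legacy_references(stored_urls, "legacy-iiif.example.org"):
    print("needs review:", url)
```

The sketch only flags references for review and does not rewrite them, since version 3 identifiers differ from version 2 identifiers and a simple host swap may not be enough for saved annotations.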

DLARC Amateur Radio Library Tops 90,000 Items

Internet Archive’s Digital Library of Amateur Radio & Communications has grown to more than 90,000 resources related to amateur radio, shortwave listening, amateur television, and related topics. The newest additions to the free online library include ham radio magazines and newsletters from around the world, podcasts, and discussion forums.

Additions to the newsletter category include The Capitol Hill Monitor, a newsletter for and by scanner radio enthusiasts in the Washington, D.C. region — a complete run from 1992 through today. DLARC has also added more than 300 issues of Florida Skip and its follow-on magazine, SKIP CyberHam, donated by the family of the publisher. Both Capitol Hill Monitor and Florida Skip are online for the first time, scanned from the original paper.

DLARC has also added newsletters from an additional 35 ham radio clubs in the United States and Canada, including hundreds of issues from the Orange County (California) Amateur Radio Club, the Northern California Contest Club, Palo Alto Amateur Radio Association, Acadiana (Lafayette, Louisiana) Amateur Radio Association, Mesilla Valley (New Mexico) Radio Club, and others. 

New additions of Canadian club newsletters include 900 issues from the Lakehead Amateur Radio Club in Ontario, the Montreal Amateur Radio Club, and the Halifax Amateur Radio Club. Raleigh (North Carolina) Amateur Radio Society contributed more than 700 issues of its Exciter newsletter, which DLARC scanned for the first time. Fort Wayne (Indiana) Radio Club has contributed newsletters and other material documenting its 100-year history. The Society of Wireless Pioneers, a program of the California Historical Radio Society, contributed documents going back to its founding in 1968.

The Cal Poly Amateur Radio Club donated hundreds of radio manuals, catalogs, and magazines — literally emptying file cabinets of material. DLARC has scanned them all and made the trove available online.

DLARC has expanded its collection of e-mail and Usenet conversations about ham radio from the early days of the Internet, with the addition of thousands of messages from Glowbugs Digest, an early Internet discussion list about tube-based radios. This collection includes posts spanning November 1995 through March 1998.

DLARC has also added more than 750 books and articles written by Donald Lancaster, the American author, inventor, and microcomputer pioneer who died earlier this year; and scans of hundreds of vintage electronics and radio catalogs.

New additions of podcasts and videos include 200 episodes of the defunct Southgate Vibes podcast from the UK; the Ham Radio Guy podcast; and archives of ham radio YouTube channels KM6LYW Radio and HB9BLA Wireless. More than 1,400 historic recordings and contemporary audio clips are available courtesy of The Shortwave Radio Audio Archive.

Digital Library of Amateur Radio & Communications is funded by a grant from Amateur Radio Digital Communications (ARDC) to create a free digital library for the radio community, researchers, educators, and students. DLARC invites radio clubs and individuals to submit material in any format. If you have questions about the project or material to contribute, contact:

Kay Savetz, K6KJN
Program Manager, Special Collections

DWeb Camp: Exploring Governance & AI

Written contributions by Val Elefante, Jenny Fan, Dazza Greenwood, Cent Hosten, Ronen Tamari, Joshua Tan, Riley Wong, and Jacky Zhao

The Metagovernance Project (aka Metagov) returned to DWeb Camp for the second year in a row, this year as a DWeb Sponsor, supporting the event by curating some of the camp’s governance and AI sessions. In this blog, we hear from Josh Tan, co-curator of the AI track, and governance researchers from Metagov who helped co-create the governance track. 

To get a sense of our work, watch this video documenting our Redwood Parliament program at DWeb Camp 2022. 

AI Meets the Decentralized Web

What does the DWeb community talk about when they talk about AI? Perhaps more mysteriously, what brings an AI company like OpenAI out to the woods outside of San Francisco to talk about the decentralized web?

At this year’s DWeb Camp, Metagov worked with OpenAI, the Internet Archive, and the Foresight Institute to curate a selection of AI speakers and workshops. The programming featured presentations by Aza Raskin (Center for Humane Technology), Jason Kwon (OpenAI), Che Chang (OpenAI), Rosie Campbell (OpenAI), Doc Searls, Stephen Hood (Mozilla), Philip Rosedale (Second Life), and many, many others. The planning was led by Allison Duettmann of Foresight and Joshua Tan of Metagov, with critical support from Wendy Hanamura of the Internet Archive.

One of the key questions raised concerned the challenges and risks of open-source AI. For example, in Aza Raskin’s picture of possible AI futures, open source might lead us to a future where everyone, everywhere has access to the intelligence needed to design viruses, imitate public figures, or manipulate elections. Yet, in a conversation on open-source AI models featuring Stephen Hood from Mozilla, James Baicoianu from Stability AI, Philip Rosedale, and Qianqian Ye, everyone agreed that “the cat is out of the bag” when it comes to open-source AI. Open-source AI is already here, and it’s not going away.

We didn’t necessarily come away with a conclusion so much as a better sense of the question. From Josh’s closing remarks: “I honestly wrestle with this. I honestly do not know, and it feels weird, it feels very weird to be a student of the legends who built the open internet and ask, should [AI] be open? It reminds me of a question we ask ourselves as a liberal society—is it possible to be too open as a society? Do open societies ultimately bring about their own downfalls?”

Governance at DWeb

Can We Trust Our Fellow “Digital Citizens”?

This session, led by Metagov contributor Jenny Fan, was a round table discussion around the provocation: can civic responsibilities for online “citizens” exist in an analogous way to how civic duties exist in offline communities? As one participant quoted, “The scarcest resource is legitimacy,” and appropriately, the conversation was framed in the context of the dearth of legitimate forms of community governance and content moderation for online communities. Though participants were not primarily governance researchers, we ended up with a comprehensive and thought-provoking survey of existing projects in this space.

We broke down the challenges of online “citizenship” into identity, reputation, intrinsic and extrinsic motivation, the issues of delegating trust to other users, and how the level of effort required affects online community engagement. Participants mentioned references as wide-ranging as existing political science research (liquid democracy, quadratic voting, radical markets), Web 2-adjacent projects (Periscope, Twitter community notes), Web 3-adjacent projects (Klairos, Nouns DAO’s zero knowledge voting, and one participant’s experience IRL at Zuzalu’s pop-up community), and more. In particular, participants highlighted the challenges of shifting typically extrinsic motivators for civic behavior to intrinsic motivation, given the cost-incentive structure of the internet. As one participant put it aptly, “The offline world is full of sticks, but the internet only has carrots.”

D20 Governance Playthrough

D20 Governance is a project focused on exploring modular governance through unstable communication environments and simulations. It aims to estrange the quotidian act of communication as a way of revealing ways in which interactions in online communities are infrastructurally prefigured by forms and norms of linguistic interoperability and implicit feudalism. D20 Governance aims to surface this revelation as a way of foregrounding the metagoverning architectures that order online communications, and catalyze experiences that empower communities to imagine and form more creative, flexible, experimental, and intentional patterns of self-governance. The current iteration of D20 Governance takes form as a Discord bot, and extends the composable governance mapping tool, CommunityRule. The D20 Governance working group is led by Janita Chalam, Val Elefante, Hazel, and Cent Hosten, and is supervised by Metagov research director Ellie Rennie.

For DWeb Camp we ran our first playtest with a group of eight campers placed into a “Build A Community” simulation where they had to name their community, decide on an animating purpose, and decide on their first action. The playtest had participants eloquently reciting Shakespearean renditions of their LLM-transformed posts and revolting against consensus as a decision-making mechanism. Stay tuned for future playtest announcements in the newsletter. 

D20 Governance Teaser

Let Us Imagine A Communally-Owned Internet

This year at DWeb Camp, Jacky Zhao and Spencer Chang hosted a session asking campers to gather their collective imaginations and dreams for what a communally-owned internet could look like. Collectively, the group had a lot of dystopian fiction and a lot of reminiscing, but not a lot of forward-looking dreams for the web. Dreaming, to us, felt like an important piece of fiction that rallies people to articulate a vision they want to make a reality. In hosting this session, we recalled Ruha Benjamin: “to see things as they really are, you must imagine them for what they might be”.

The session focused on circulating five sheets of paper, each with a question on it:

  • What do you wish the Internet evoked for you?
  • What would co-owning digital spaces look like?
  • What is your digital neighborhood?
  • Where have you felt agency online?
  • What is/was your favorite place on the internet?

Each question was meant to evoke certain modes of questioning. In the discussion, the group spent a significant amount of time discussing the feeling that life on the internet feels like living on rented ground and an overwhelming feeling that we have no agency over our digital environments anymore. Some reminisced over Minecraft and building their own forums and webrings. Others wondered why modern platforms like Facebook or Twitter no longer have these affordances. The group closed by wondering how to give people the ability to be architects of their own digital homes again.

Reclaiming agency and ability to communally construct our digital spaces starts with people willing to dream and fight for it. In many ways, this session (and the greater DWeb Camp as a whole) felt like a gathering of people who haven’t given up on the inherent good of the internet and are fighting for this future.

(excerpted from a longer reflection)

Design Charrette on LLM LLC Governance Rules

The session, led by’s Dazza Greenwood, focused on an ongoing open-source project developing an algorithmically managed LLC using LLM technology.  This is similar to the Wyoming DAO LLC approach insomuch as there is a role for “algorithmically managed” LLCs, but there is no smart contract, blockchain, or decentralization involved.  Rather, the algorithmic manager is an LLM operating according to “constitutional rules” encoded into the software running the manager operations and communications.  The current codebase is designed as a Discord bot with email integration and is being tested and iterated against a handful of relatively legal and business use cases.  The Metagov-related aspect of this project is the architectural component where a set of rules governing the behavior and actions of the LLM LLC are specified.  A DWeb Camp breakout group discussed the project overall and read aloud the current version of the Constitutional Rules, as the starting point for an engaging and constructive conversation and light design charrette.  For more information, see the current code base, and this demo presentation of the project given to the Wyoming legislature.

Challenges and Triumphs in Community Self-Governance

This session, led by Metagov researcher Val Elefante, began with an overview of Metagov’s frameworks and tools including implicit feudalism, modular politics, CommunityRule, and a demo of CollectiveVoice. It was then followed by a rapid-fire collective brainstorm of challenges that communities face when it comes to online governance. Responses included: scale (from small to larger communities, from “not serious” to “serious” decisions), too many proposals, loss of institutional memory due to platform switching, not easy to experiment, and not many available models.

The group then brainstormed ways of solving some of these governance problems using modular governance frameworks including: a randomly-selected jury system for voting on proposals, organization and summary of relevant information for easy decision-making, improved deliberation formats, and using tech to facilitate in-person governance.

Val Elefante presenting at DWeb Camp

Qualitative Governance

How can community governance frameworks incorporate holistic, cooperative, and emergent processes? How can community governance embrace differing needs and wants, encourage agency, and promote whole group purpose and wellness? 

Facilitated by cooperative governance researcher Riley Wong, this Qualitative Governance session sought to co-create possible answers to these questions and more by naming and observing qualities of effective governance; describing the emotional experience of how effective governance feels; identifying and speculating about practices that create these experiences; and ideating ways to integrate and experiment with these practices within our own communities. 

For some, effective governance was described as transparent, creative, honest, flowing, participatory, inclusive, resilient, signal boosting, and fun. It can feel energizing, activating, safe, emergent, warm, playful, joyful, caring, compassionate, holonic, empowering, euphoric, and open-hearted. Practices that can create this experience may include shared rituals, reflection, personal check-ins, “yes, and…”s, trust and relationship building, acknowledging consent, shared maintenance, space for tension processing, voluntary flows, ownership, mini-juries for direct democracy, and dancing. Integration of these practices may involve playing, prioritizing, ceding power, building trust, and celebrating stories. 

Feeling safe as a necessary foundation for navigating differences, feeling seen and heard by others, building trust in the community, and keeping “epistemological humility” were also overarching themes and discussion points throughout the session. Follow-up discussions highlighted personal experiences of governance where community members felt valued, heard, safe, and trusting, and therefore empowered to take on more risk and responsibility. 

Tech for Listening to Each Other Online

In this session, led by Metagov member Ronen Tamari, participants reflected on the dynamics of (figuratively) “speaking” vs “listening” in online spaces such as social media. We have lots of tools for speaking, enabling us to effortlessly broadcast our opinions to wide audiences. On the other hand, listening feels under-served: we lack tools to help sift through noise and distractions on social media and end up doom-scrolling or wandering aimlessly across platforms. 

What would better tech for listening look like?

We did some embodied listening exercises to get a better sense for what listening in the real world involves. We then tried to apply the insights we gained to listening in the social media context; what does empathetic and active listening feel like online, and how can we create a shared sense of reality beyond reality-distorting algorithmic echo chambers?

Brainstorming together was a delight (“One of my fav events from the whole weekend,” as one participant wrote to us); we covered a lot of topics (and whiteboards), from AI and human-powered curation to the design of new tools, norms, and rituals. We shared contact details to keep the listening conversation going as we left the luxury of intimate shared physical spaces behind and headed back to our noisy digital metropolises.

Citizen Journalist Traces the Science to Debunk Public Health Misinformation

Sarah Barry wanted to become a fighter for something—but she didn’t know exactly what.

Citizen journalist Sarah Barry

“I was frustrated with all that was going on in the world. I knew I couldn’t wave a magic wand and fix everything, but I wanted to help in some small way,” said the 28-year-old who lives in Columbus, Ohio, and works in IT.

She decided to leverage her research skills to help correct misinformation about vaccines and public health.

For Barry, the Wayback Machine has been critical in tracking the science and sharing what she’s discovered. Without the Internet Archive, she said, valuable internet history that she needs to do effective research would have been completely lost.

“I use the Internet Archive to look up old links and resources that have since gone defunct,” said Barry. “I also use the Archive to actively input web pages that need to be saved or saved again to ensure that any resources I’m currently using are saved for mine or other’s future reference.”

She has turned into a citizen journalist and independent activist, volunteering for nonprofit organizations to better inform the public. Barry has given public presentations on her findings and provided materials to reporters that have appeared in a variety of news outlets.

 As a millennial, Barry said she grew up being active online and has long used the Internet Archive as a tool.  “It’s a common language among people like me who do research,” she said. “We all know the Internet Archive is legit.”

DWeb Fellows 2023: Lighting the Path Towards a Better Web

By Mai Ishikawa Sutton and Nicolás Pace

Photo of Fellows 2023 cohort giving their closing statement

The design and development of most network technologies remains in the hands of the few. In light of this, the right to privacy and freedom of expression can end up being a privilege controlled by large corporations that are incentivized to profit from our digital connections. Meanwhile, a homogenized internet makes it difficult for individuals and communities to express multiple identities and have the agency to determine their own networks.

Thankfully, around us you can always find people who in their day-to-day work contribute to developing a fairer reality for everyone – one that defends environmental justice and social inclusion, innovation at the service of life, and a world where all worlds fit, both online and offline.

The DWeb Fellowship invites people from around the world to come to California for DWeb Camp. This year, we had 36 Fellows – they traveled from India, Cambodia, Argentina, Cuba, Kenya, Malawi, Germany, Italy, and from many other places overseas, as well as from across North America and the Bay Area. We selected these exceptional individuals because they invite and challenge us to transform our reality and co-create a vision of a better Web. 

And in practice, they are the embodiment of the DWeb Principles. The DWeb Principles reflect what we aim for as we work to build a decentralized web – the distributed protocols, applications, organizations, culture, and everything in between that make it possible to manifest the webs of digital connection that make us better humans for each other and all other life on this planet. Our Fellows work to realize the promise of a decentralized Web – where power is decentralized and control over digital infrastructure is meaningfully distributed. They use and build interoperable, free and open source tools to uplift communities in some of the most challenging contexts. They come from open and transparent organizations that govern their projects in a way that actively pursues equity, mutual trust, and respect. And they demonstrate how network technologies can bring about justice and advance individual and collective agency by prioritizing relationships and building communities of care.

In honor of the summer solstice, the longest day of the year, we asked the Fellows to participate in our opening ceremony. One of our Fellows, Kanyon “Coyote Woman” Sayers-Roods, led us in a song in the language of the Costanoan Ohlone-Mutsun and Chumash people, those native to the area that is now known as Northern California. As the Fellows each lit a candle around us, we recognized them as leaders lighting the way towards a better, truly decentralized web – one that distributes power and ensures that individuals and communities share the privileges and responsibilities to steward the network technologies they rely on.

We were lucky to have them at Camp this year to share their perspectives, wisdom, and stories with us. As organizers of DWeb Camp, we continue to strive to find ways to amplify their voices in this movement and support their work. 

2023 DWeb Fellows

Photo of Akhilesh Thite

Akhilesh Thite is an Indian tech enthusiast with a passion for decentralization. He is the founder of P2P Labs, an open-source organization with a focus on building curated web3 infrastructure tools for the decentralized internet, leveraging the IPFS protocol. He is currently developing a minimal p2p web browser named Peersky. Akhilesh is often found participating in hackathons or working on devgrants; he has won eight Web3 hackathons. His goal is to develop decentralized tools that significantly contribute to the betterment of humanity.

Photo of Amber Gallant

Amber Gallant is a Master’s student at the iSchool at the University of British Columbia. She is a librarian, writer, and open-source enthusiast with professional interests in data ethics and digital commoning spaces. She currently acts as the project manager of the Guardians of the Record Lab, a group that conducts research into maintaining and protecting the integrity of records in human rights contexts and investigates the use of decentralized archival technologies for this purpose. She is also completing an original research project through Blockchain@UBC, where she is examining humanitarian blockchain projects and the data rights of users in conflict contexts through the lens of data justice.

Photo of Andrew Chou

Andrew Chou is a technologist based in NYC who tends to explore the various corners of the internet. He currently works as a developer with Digital Democracy and Manyverse, building offline-first applications that are designed on the basis of decentralization and autonomy.

Photo of Anh Lê

Anh Lê is a transdisciplinary researcher and artist based in Lenapehoking/NYC. Recently, they’ve built community-owned internet infrastructure with Community Tech NY/Community Technology Collective and designed advocacy campaigns to support Southeast Asian movement building in NYC. They are currently pursuing their Masters in International Affairs at The New School, where their research focuses on border technologies, migration, and digital rights.

Photo of Arky Ambati Rakesh

Arky Ambati Rakesh is a technologist and a visual storyteller based in Southeast Asia. Arky has contributed to open source projects aimed at providing equitable access to digital tools and an open web. Over the past decade, Arky has been involved with Free/Libre and Open Source communities and has worked with organizations such as Braille Without Borders (BWB), NGO Resource Center and Mozilla in Asia and Africa.

Photo of Barbara Gonzalez Segovia

Barbara Gonzalez Segovia (she/they) is a BIPOC, queer feminist who sees herself as a social activist. She is passionate about amplifying people’s voices from anti-racist and anti-oppressive lenses, both in her professional and personal life. She values kindness and vulnerability, and is fully committed to infusing the world with joy. These days Barbara works with Digital Democracy, supporting grassroots communities and earth defenders utilizing tech tools to defend their ancestral lands. She has over a decade of experience in community development, indigenous rights, and gender equality. Her work has focused on program planning, community outreach, and organizational development, particularly within Indigenous organizations and indigenous nations from different countries in South America.

Photo of Benson Tilya

Benson Tilya is a conservation manager and seedbank analyst at Saving Africa’s Nature in Tanzania. He has been instrumental in the encouragement, support and monitoring of SANA projects in Saadani National Park villages in Tanzania, and has engaged in conservation activities such as seed banking, greenhouse management and restoration of the forest corridor via tree planting projects. He stands on the thesis that technology and nature don’t have to act as antagonists; that the science behind digital technology can and should work in tandem with the respect for the natural world to subvert deforestation and promote long-term environmentally conscientious solutions.

Photo of Blake Stoner

Blake Stoner is a grassroots reporter, social entrepreneur, and tech enthusiast with a history of community advocacy. After working on over 10 grassroots campaigns, he noticed many communities across the United States of America needed more representation to highlight their culture and concerns. He believes that an important challenge to address right now is the growing crisis of news deserts that disproportionately leave communities of color ill-represented and uninformed. In response, he founded Vngle, a grassroots news network which provides an equitable decentralized approach to local reporting and brings nonpartisan coverage to underreported geographic and demographic areas. Through a gig-economy model, it verifies and trains local citizens with smartphones to serve as reporters and editors. Through scaling, Vngle seeks to make verifiable news mainstream, where anyone can check the origin of where, when, & how stories are captured through a public ledger.

Photo of brandon king

brandon king is a dj/sound-selector, multidisciplinary artist, and cultural organizer from the Atlantic Ocean by way of Hampton Roads VA, who creates installations exploring African Diasporic identities, honoring his ancestors’ stories through archival and found materials, sound collages, painting, film, and other forms. he is a founding member of Cooperation Jackson, a cooperative network in Jackson, Mississippi, and currently serves as the Executive of Resonate Coop, an international, open source, music streaming platform cooperative. he is also a member of the NYC based artist collective PTP (Purple Tape Pedigree) and is currently an MFA candidate at Queens College focusing on Social Practice and Installation.

Photo of Calum Bowden

Calum Bowden is an artist working with organizations as a medium. He collaborates on stories, games, and platforms that relink the cultural with technology, economics, politics and ecology. He co-founded Trust and Black Swan. Trust is a network of utopian conspirators, a sandbox for creative, technical, and critical projects, and site of experimentation for new ways of learning together. Trust is a hybrid online and physical space in Berlin for inquiry into emerging social and political phenomena through the lenses of aesthetic, narrative, game, technical, climate and design research. Since 2018, Trust has developed a public programme that includes lectures, installations, residency programmes, reading groups, working groups, live-streamed participatory events, and online resources. Trust incubates software projects that build a creative culture of the commons.

Photo of Camille Nibungco

Camille Nibungco is a designer currently based in Los Angeles, CA. They most recently helped build the Angelena Atlas project, a crowd-sourced intersectional community network/resource for marginalized folks in Los Angeles. They currently work in the healthcare tech space and are interested in decentralized technologies/web3 as a tool for working class sovereignty, labor, and grassroots change.

Photo of Chia Amisola

Chia Amisola is an internet + ambient artist born and raised in the Philippines, and now based in San Francisco. Their (web)site-specific art is an act of worldmaking, constructing spaces, systems, and tools that posit worlds where creation is synonymous with liberation. Ambience is political: their environments tackle infrastructure, poetics, labor, and maintenance. Simply put, they wish to gather all the people they love in one place and explore how the internet might be that place. Chia is the Founder of Developh and the Philippine Internet Archive. They graduated from Yale University in 2022 with a BA in Computing & the Arts, receiving the Sudler Prize.

Photo of Cody Harris

Cody Harris is a technical volunteer with Seattle Community Network and assisted with the deployment and operations of the DWeb network in 2022. He has volunteered at the Connections Museum in Seattle, a hands-on museum of vintage (mostly Bell System) telecom equipment, giving tours and working on the exhibits since 2019. At ToorCamp 2022, he participated in a performance art project with the ShadyTel hacker collective establishing a telecom bureaucracy and deploying an analog switched telephone network to connect campers’ landline phones, modems, and fax machines.

Photo of Esther Jang

Esther Jang is a PhD student in Computer Science at the University of Washington. Her research focuses on community networks in both rural remote and urban contexts, and especially how communities of practice can build and sustain technical infrastructures. She has helped install community networks in the Philippines, Mexico, Tanzania, and various states around the US. She is currently a lead organizer and installer for the Seattle Community Network, which seeks to build community-owned and maintained Internet access infrastructure to support digital equity in Seattle and Tacoma. She serves as a Director at the Local Connectivity Lab, a 501(c)(3) nonprofit focusing on technology research, deployment, and teaching in support of community networks around the world. In her free time, she is an avid jazz singer and plays with a band called Django Junction in Seattle.

Photo of fauno

fauno’s work and activism are focused on investigating, adapting and implementing ecological and resilient technologies, especially autonomous, collectively managed infrastructure. In the last five years he has been working almost exclusively on resilient web sites using Jekyll and developing a platform for updating and hosting them called Sutty.

Photo of Jack Fox Keen

Jack Fox Keen is the Data Empowerment Lead for the Guardian Project’s ProofMode application, a cryptographically verifiable way of providing visual evidence of the world around us. Jack has been doing data analytics for non-profits for the last two years, after graduating from Florida State University with a degree in biomathematics and scientific computing. They will be starting a PhD program at UC Santa Cruz this September, where they will focus on explainable artificial intelligence. They are focused on ethical data acquisition and analysis, pulling inspiration and guidance from many realms of life, including intersectional feminism, queer theory, and decolonial studies.

Photo of Jacky Zhao

Jacky Zhao is an independent researcher and open source maintainer. Currently, he is exploring what agentic, interoperable, and communal technology looks like in his research practice: how might we create infrastructures and technologies that empower the residents of the web to have access to the same tools as the architect? On a broader level, he cares deeply about creating spaces that enable others to have more agency: agency to ask questions without judgement; agency to do what they are intrinsically drawn toward; agency to play (because what’s the point if we can’t have a bit of fun?). In his spare time, he works with Hypha Worker Co-op on Distributed Press and is a core contributor at verses.

Photo of James Gondwe

James Gondwe is the founder and Director of Centre for Youth and Development. His passion for decentralized approaches to digital literacy and connectivity has positioned him at the forefront of exploring the transformative role of ICT, including the internet, in enabling opportunities for marginalized communities. James is a recipient of the Royal Commonwealth Queens Young Leaders Associate Fellowship, 2016 One Young World Ambassador, honored with the Trust Conference Changemakers Award, and is a recipient of the African Community Networks Summit Fellowship. Through his unwavering dedication to community empowerment, he drives change by bridging the digital divide and creating opportunities for marginalized individuals and communities.

Photo of Kanyon Coyote Woman Sayers-Roods

Kanyon Coyote Woman Sayers-Roods is an Ohlone Mutsun and Chumash Native American whose art serves as a heartfelt expression of her Native heritage. Kanyon is a dedicated and active member of the Native Community, assuming various roles as an artist, poet, activist, student, and teacher, inspiring emerging scholars to explore their creative paths and embrace decolonization. Having graduated with an A.S. and B.S. with honors from the Art Institute of CA, majoring in Web Design and Interactive Media, Kanyon weaves together her knowledge of the digital world and her ancestral knowledge of the land. In addition to her artistic pursuits, Kanyon also serves as the CEO of Kanyon Konsulting and acts as a caretaker for Indian Canyon, a “Federally recognized Indian Country” situated between San Francisco and Monterey.

Photo of Luisa Bagope

Luisa Bagope is a documentary director interested in cyber as well as natural and human technology. With support from APC she has been documenting community network activities in the global south and was an active participant of PSP Community Network (Portal sem Porteiras) for 3 years. Luisa coordinated the Nodes That Bond project: a collective learning process centered around technology that happened through circular encounters amongst women. Focusing on feminist methods of community-based organization, she now continues to work with communication as a potency for social transformation in the Afluentes Association, in Monteiro Lobato, Brasil.

Photo of Marcela Guerra

Marcela Guerra is a writer, artisan, and mother. She learned with Oankali that humans have an inevitable tendency to hierarchy. Even though she recognizes this tendency in all the relationships she can witness, she challenges herself to imagine non-hierarchical technologies, especially the communication ones. Marcela is part of the Portal sem Porteiras association (PSP), which runs a community internet network. She is a co-creator of the project Nodes That Bond, which takes place in the PSP network, and a member of the collective Sítio do Astronauta, which teaches electronic handicraft. She is also part of Marlu Studio, which develops methodologies for the creation of community fictions.

Photo of Mark Anthony Hernandez Motaghy

Mark Anthony Hernandez Motaghy is an artist and cultural worker of Mexican and Iranian descent. Operating with mediums such as experimental video, as well as installation, books, and oral histories, Mark’s practice explores the digital commons, care-based economies, and sociotechnical imaginaries. They recently published the zine-book Rehearsing Solidarity: Learning from Mutual Aid with Thick Press. The book archives how mutual aid groups assembled solidarity digital infrastructures for the COVID-19 crisis and how they sustainably reassembled for sustaining communal care. Currently, they are a fellow at Ujima Boston Project, providing artistic and editorial direction for a new magazine on art, culture, and the solidarity economy.

Photo of Maurice Haedo Sanabria

Maurice Haedo Sanabria is an industrial designer passionate about technology and its impact on society. His work focuses on the circulation of information and the creation of goods through open collaboration, especially in Cuba, where material scarcity and limited Internet connectivity have forced society to seek creative alternatives. Five years ago, he transformed his own home in Downtown Havana into a hackerspace/laboratory called Copincha. (In Cuban slang, “pincha” means work, so “Copincha” can be understood as “collective work”.) Inspired by “DIY” and “do it together” philosophies, Copincha’s members use collaborative, open-source methods to share knowledge and develop solutions to local challenges through transdisciplinary, resilient and ecological practices.

Photo of Muhammad Noor

A Rohingya himself, Muhammad Noor has established several Rohingya institutions and trained several highly-regarded members of the Rohingya community worldwide. His most notable contributions include the digitization and Unicode encoding of the first Rohingya alphabet, serving as the chairman of Rohingya Football Club, authoring “Born to Struggle: The Child of Rohingya Refugees and His Inspiring Journey,” and working on several assignments with the UN High Commission for Refugees, the Red Cross, the International Organization for Migration, and the International Network of Human Rights. Noor is the Co-Founder of Rohingya Vision (RVISION), the world’s first Rohingya satellite television channel.

Photo of Nicolás Pace

Nicolás Pace is the technology and innovation co-coordinator within the LOCNET initiative, which supports organizations and communities in exploring innovative approaches to the use of technology in the context of community networks in the global south. Nicolás has traveled to more than 15 countries to build bridges between community networks and to understand the diversity and complexity of the field.

Photo of Qianqian (Q) Ye

Qianqian (Q) Ye is a Chinese artist, creative technologist, and educator based in Los Angeles. Trained as an architect, she creates digital, physical, and social spaces exploring issues around gender, immigration, power, and technology. Her most recent collaborative project, The Future of Memory, was a recipient of the Mozilla Creative Media Award. At the Processing Foundation, Qianqian is the Lead of p5.js, an open-source art and education platform that prioritizes access and diversity in learning to code, with over 1.5 million users. She currently teaches creative coding as an Adjunct Assistant Professor at USC Media Arts + Practice and 3D Arts at Parsons School of Design. For 2022-2023, Qianqian is a NYU ITP/IMA Project fellow and Civic Media Fellow at USC Annenberg Innovation Lab.

Photo of Risper A Rose

Risper A Rose works with the low cost community wireless network, TunapandaNET, in Nairobi, Kenya, as a gender and community engagement expert. She is involved in digital outreach, understanding women and their usage of connectivity, amplifying meaningful usage and utilization of connectivity, and conducting impact assessment studies of connectivity in the community. She has handled tech-centered advisories and training on digital rights, digital inclusion, digital advocacy, and digital protection and privacy. Her main focus is on gender justice, community capacity development, community research using human-centered design, stakeholder engagement, and public participation in policymaking. She holds a Bachelor of Arts in Gender and Development (with Honors) Degree from Kenyatta University.

Photo of Saqib Sheikh

Saqib Sheikh’s work centers on advocacy, social inclusion, and educational access for refugees and stateless people. He serves as Project Director for the Rohingya Project, a grassroots initiative for the empowerment of the Rohingya diaspora using blockchain technology. He is also a co-founder and advisor for the Refugee Coalition of Malaysia (RCOM) where he focuses on creating formal pathways for refugee placement in higher education institutes in Malaysia. A journalist by training, Saqib received his Master’s in Communication from Purdue University, and is currently a PhD researcher at the S. Rajaratnam School of International Studies (RSIS), Singapore, researching the use of technology for legitimization of stateless communities.

Photo of Sheley Gomes

Sheley Gomes is a POC, queer feminist, researcher and activist for digital and human rights, as well as the right to communication, and is part of non-profit organisations in both Brazil and Europe. Her research spans contexts including Latin American, Western European, and Sub-Saharan African countries, investigating the role of media, its ownership, and freedom of expression in those different scenarios. Her focus is especially on new media technologies and their impacts on marginalised communities.

Photo of Stacco Troncoso

Stacco Troncoso teaches and writes on the Commons, P2P politics and economics, open culture, post-growth futures, Platform and Open Cooperativism, decentralised governance, blockchain, and more. He is the co-founder of (, project lead for Commons Transition, and co-founder of the P2P translation collective Guerrilla Translation. His work in communicating commons culture extends to public speaking and relationship-building with prefigurative communities, policymakers, and potential commoners.

Photo of Subhashish Panigrahi

Subhashish Panigrahi is interested in research and building resources in the intersection of community, tech, and media. A public interest archivist, non-fiction filmmaker, and civil society leader, he has served and catalyzed many open knowledge/internet communities through his work at Wikimedia, Mozilla, Internet Society and the Internet Society. He currently serves as the director of the Law for All Initiative at Ashoka. A National Geographic Explorer, he has made ten critically acclaimed documentaries, focusing on endangered languages, digital rights, and the open internet movement in South Asia. He founded OpenSpeaks and co-founded O Foundation in 2017, both building openly-licensed media and resources for low- and medium-resourced languages through participatory means.

Photo of TB Dinesh

TB Dinesh is a community media activist with a background in Computer Science. The recent focus of their work is on infrastructure for encouraging people from marginalised communities to document their ways of life to help tell their stories. This involves helping create a Community Owned Wifimesh (COWMesh) with Libre Routers, Bamboo towers, ASPi client kiosks and Internet independent services with Janastu. Services include audio-video fragment-annotating tools, voice communication and negotiation of traffic vouchers. Set in a remote rural hilly forest region, near Bangalore, India, their Lab is open for visitors and residents who wish to creatively engage in creating a replicable model of self-determined future Community Networks. Anthillhacks is their end of year annual event where everyone is invited to live with their community.

Photo of Tommi Marmo

Tommi Marmo is a self-described “enthusiastic and curious 22 years old weirdo from Italy.” He is the co-founder of Scambi Festival, a cultural event focused on interactive workshops which is organized exclusively by a staff of volunteers under 25 years old coming from all over Europe. He just graduated in Philosophy, International Studies, and Economics at Ca’ Foscari University of Venice. Tommi is a dreamer and an activist concerning the need of a deeper sociological and philosophical analysis of the Internet, at its essential core. In 2020, he deleted all of his mainstream social media accounts and created, which he considers the virtual representation of his mind. He is the admin of Pan, a Fediverse node.

Victor von Sydow is a member of Coolab, a co-operative lab that builds community telecommunication projects promoting autonomous infrastructures through technical training and community activation. He is interested in research and strategy development focused on systemic and infrastructural conditions that shape socio-economic, political, and institutional realities. To this extent, he develops and operationalises experimental approaches to organisational design, policy, finance and rights.

Photo of Xin Xin

Xin Xin is an artist currently making socially-engaged software that explores the possibilities of reshaping language and power relations. Through mediating, subverting, and innovating modes of social interaction in the digital space, Xin invites participants to relate to one another and experience togetherness in new and unfamiliar ways. As an artist, their work has been exhibited internationally at Ars Electronica, Eyebeam, DIS, Kunstverein Wolfsburg, and the Gene Siskel Film Center. They were an Eyebeam Rapid Response for a Better Digital Future Fellow and a Sundance Art of Practice Fellow. As an organizer, Xin co-founded voidLab, an LA-based intersectional feminist collective dedicated to women, trans, and queer folks. They were the Director for Processing Community Day 2019 and they serve on the Processing Foundation Board.


We want to extend our deep gratitude to the sponsors who made this Fellowship program possible: Filecoin Foundation for the Decentralized Web, Ford Foundation, Ethereum Foundation, Storj, RSS3, Planet, Gitcoin, NextID, and Paul Lindner.

Build, Access, Analyze: Introducing ARCH (Archives Research Compute Hub)

We are excited to announce the public availability of ARCH (Archives Research Compute Hub), a new research and education service that helps users easily build, access, and analyze digital collections computationally at scale. ARCH brings together two strands of work: the Internet Archive’s more than a decade of experience supporting computational research, by providing large-scale data to researchers and dataset-oriented service integrations like ARS (Archive-It Research Services), and a collaboration with the Archives Unleashed project of the University of Waterloo and York University. Development of ARCH was generously supported by the Mellon Foundation.

ARCH Dashboard

What does ARCH do?

ARCH helps users easily conduct and support computational research with digital collections at scale – e.g., text and data mining, data science, digital scholarship, machine learning, and more. Users can build custom research collections relevant to a wide range of subjects, generate and access research-ready datasets from collections, and analyze those datasets. In line with best practices in reproducibility, ARCH supports open publication and preservation of user-generated datasets. ARCH is currently optimized for working with tens of thousands of web archive collections, covering a broad range of subjects, events, and timeframes, and the platform is actively expanding to include digitized text and image collections. ARCH also works with various portions of the overall Wayback Machine global web archive totaling 50+ PB going back to 1996, representing an extensive archive of contemporary history and communication.

ARCH, In-Browser Visualization

Who is ARCH for? 

ARCH is for any user who seeks an accessible approach to working with digital collections computationally at scale. Possible users include, but are not limited to, researchers exploring disciplinary questions, educators seeking to foster computational methods in the classroom, journalists tracking changes in web-based communication over time, and librarians and archivists seeking to support the development of computational literacies across disciplines. Recent research efforts making use of ARCH include analyses of COVID-19 crisis communications, health misinformation, Latin American women’s rights movements, and post-conflict societies during reconciliation.

ARCH, Generate Datasets

What are core ARCH features?

Build: Leverage ARCH capabilities to build custom research collections that are well scoped for specific research and education purposes.

Access: Generate more than a dozen different research-ready datasets (e.g., full text, images, PDFs, graph data, and more) from digital collections with the click of a button. Download generated datasets directly in-browser or via API.

Analyze: Easily work with research-ready datasets in interactive computational environments and applications like Jupyter Notebooks, Google Colab, Gephi, and Voyant, and produce in-browser visualizations.

Publish and Preserve: Openly publish datasets in line with best practices in reproducible research. All published datasets will be preserved in perpetuity. 

Support: Make use of synchronous and asynchronous technical support, online trainings, and extensive help center documentation.

How can I learn more about ARCH?

To learn more about ARCH please reach out via the following form

Unveiling the Hidden Truth: UCSF Industry Documents Library Empowers Research Into Tobacco, Drug and Related Industries

Whether you are a teacher, filmmaker, journalist, scientist or historian, having access to recordings about the tobacco, drug and other industries can be invaluable.

Still frames from a Marlboro commercial compilation.

For more than fifteen years, archivists at the University of California, San Francisco (UCSF) Industry Documents Library (IDL) have curated a collection of more than 5,000 video and audio files documenting the marketing, manufacturing, sales, and scientific research of tobacco, chemical, drug, and food products, as well as materials produced by public health advocates. As of 2023, the collection has received more than 300,000 views.

This wealth of information is available to the public through the UCSF Industry Archives Videos on the Internet Archive. The recordings include commercials, focus groups, internal corporate meetings and communications, depositions of tobacco industry employees, and government hearings.

Most of the files were made public beginning in 1998, following a lawsuit involving 46 states against tobacco manufacturers. In the settlement, the court ordered the companies to restrict advertising and release internal documents. “The industry put out misinformation for years to hold off on regulations,” said Rachel Taketa, IDL processing and reference archivist at UCSF. Having access to these materials provides new insight into marketing strategies that can help the public be on the lookout for future industry activities.

“It provides transparency and accountability,” said Kate Tasker, IDL managing archivist at UCSF. Examples from the collection are marketing campaigns and materials that targeted marginalized groups, in particular women and the African American and LGBTQ+ communities. “We talk to community advocacy organizations that often say it is powerful to show these videos to a group where it lays out clearly what the industry was doing to their community. It empowers people and inspires them to take action.”

Senate hearings regarding S1883, the Tobacco Education Control Act of 1990.

UCSF archivists say the partnership with the Internet Archive provides users with two different access points and expands the audience for the collection beyond academics. The Medical Heritage Library has also added videos and audio files from UCSF into its larger collection on the Internet Archive, spreading the materials’ reach even further.

Next, the UCSF archivists are looking to develop new ways of working with and accessing the collection, using automated transcription to enable data scientists to analyze the recordings in new ways. The IDL is also adding opioid industry recordings to the collection as part of its work on the Opioid Industry Documents Archive, a collaboration with Johns Hopkins University. These new recordings will enable the public to learn more about the circumstances leading to the opioid crisis.

“It’s exciting to be connected to such an innovative organization as the Internet Archive,” Tasker said. “It’s out in front of a lot of big issues that most digital archives are facing. Whenever we’re looking to do something with a new media type, format, or a new way of distributing content to people, archivists and librarians look to what the Internet Archive is doing as a guide.”

Let us serve you, but don’t bring us down

What just happened today, as best we know:

Tens of thousands of requests per second for our public domain OCR files were launched from 64 virtual hosts on Amazon’s AWS services. (Even by web standards, tens of thousands of requests per second is a lot.)

This activity brought the service down for all users for about an hour.

We are thankful to our engineers who could scramble on a Sunday afternoon on a holiday weekend to work on this.

We got the service back up by blocking those IP addresses.

But another 64 addresses started the same type of activity a couple of hours later.

We figured out how to block this new set, but again at the cost of about an hour of outage.


How this could have gone better for us:

Those wanting to use our materials in bulk should start slowly, and ramp up. 

Also, if you are starting a large project, please contact us; we are here to help.

If you find yourself blocked, please don’t just start again; reach out.
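For those writing bulk-download scripts, the advice above can be sketched roughly as follows. This is a minimal illustration and not anything the Archive has published: the class name `RampUpThrottle`, the function `fetch_all`, the starting rate, the ceiling, and the choice of HTTP status codes treated as "blocked" are all assumptions made for the example. The idea is simply to make one request at a time, begin well under one request per second, speed up only while responses stay healthy, and stop entirely (rather than restarting from new addresses) on anything that looks like a block.

```python
import time
import urllib.error
import urllib.request


class RampUpThrottle:
    """Start slowly, speed up gradually, and stop completely when blocked."""

    def __init__(self, start_rps=0.5, max_rps=5.0, growth=1.1):
        self.rps = start_rps      # current requests per second (assumed start)
        self.max_rps = max_rps    # hypothetical ceiling; agree on one with the host
        self.growth = growth      # multiplier applied after each healthy response
        self.blocked = False

    def delay(self):
        """Seconds to wait before sending the next request."""
        return 1.0 / self.rps

    def record(self, status):
        """Adjust the rate based on the HTTP status of the last response."""
        if status in (403, 429):
            # Assumed to mean "you are blocked": stop and reach out.
            self.blocked = True
        elif status >= 500:
            # The server looks overloaded: halve the rate.
            self.rps = max(self.rps / 2, 0.1)
        else:
            # Healthy response: ramp up a little, never past the ceiling.
            self.rps = min(self.rps * self.growth, self.max_rps)


def _http_status(url):
    """Fetch a URL and return its HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            resp.read()
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def fetch_all(urls, throttle, fetch=_http_status, sleep=time.sleep):
    """Fetch URLs one at a time; returns (url, status) pairs for requests made."""
    results = []
    for url in urls:
        if throttle.blocked:
            break  # do not retry or move to other hosts; contact the site
        sleep(throttle.delay())
        status = fetch(url)
        throttle.record(status)
        results.append((url, status))
    return results
```

The `fetch` and `sleep` parameters exist only so the loop can be exercised without touching the network. In real use, an appropriate ceiling is something to work out with the site's operators before a large job starts.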

Again, please use the Internet Archive, but don’t bring us down in the process.