A Deep Dive into Openness

Laying a Shared Foundation for a Decentralized Web

In this guest post, Richard Whitt builds on his prepared remarks from the “Defining Our Terms” conversations at the Decentralized Web Summit in 2018. The remarks have been modified to account for some excellent contemporaneous feedback from the session.

Some Widespread Confusion on Openness 

So, we all claim to love the concept of openness — from our economic markets, to our political systems, to our trade policies, even to our “mindedness.” And yet, we don’t appear to have a consistent, well-grounded way of defining what it means to be open, and why it is a good thing in any particular situation. This is especially the case in the tech sector, where we talk at great length about the virtues of open standards, open source, open APIs, open data, open AI, open science, open government, and of course open Internet.

Even over the past few months, there are obvious signs of this rampant and growing confusion in the tech world. 

Recently, former FCC Chairman and current cable lobbyist Michael Powell gave a television interview in which he decried what he calls “the mythology… of Nirvana-like openness.” He claims this mythology includes the notions that information always wants to be free and that openness is always good. Interestingly, he also claims that the Internet is moving towards what he calls “narrow tunnels,” meaning that only a relative few entities control the ways that users interact with the Web.

Here are but three recent examples of tech openness under scrutiny:

  • Facebook and the data mining practices of Cambridge Analytica: was the company too open with its data sharing practices and its APIs?
  • Google and the $5B fine on Android in the EU: was the company not open enough with its open source OS?
  • Network neutrality at the FCC, through its rise and fall, rise, and now fall again. Proponents of network neutrality claim they are seeking to save an open Internet; opponents of network neutrality insist they are seeking to restore an open Internet. Can both be right, or perhaps neither?

The concept of openness seems to share some commonalities with decentralization and federation, and with the related edge/core dichotomy. To be open, to be decentralized, to be on the edge, generally is believed to be good. To be closed, to be centralized, to be from the core, is bad. Or at least, that appears to be the natural bias of many of the folks at the Summit. 

Whether the decentralized Web, however we define it, is synonymous with openness, or merely related in some fashion, is an excellent question.

The Roots of Openness

First, at a very basic level, openness is an ancient thing. In the Beginning, everything was outside. Or inside. Or both.

A foundational aspect of systems is the notion that they have boundaries. But that is only the beginning of the story. A boundary is merely a convenient demarcation between what is deemed the inner and what is deemed the outer. Determining where one system ends and another begins is not such a straightforward task.

It turns out that in nature, the boundaries between the inner and the outer are not nearly as firm and well-defined as many assume. Many systems display what are known as semi-permeable boundaries. Even a human being, in its physical body, in its mental and emotional “spaces,” can be viewed as having extensible selves, reaching far into the environment. And in turn, that environment reaches deep into the self. Technologies can be seen as one form of extension of the properties of the physical and mental self.

The world is made up of all types of systems, from simple to complex, natural to human-made. Most systems are not static, but constantly changing patterns of interactions. They exist to survive, and even to flourish, so long as they gain useful energy in their interactions with their environments.

“Homeostasis” is a term describing the tendency of a system to seek a stable state by adapting and tweaking and adjusting to its internal and external environments. There is no set path to achieving that stability, however — no happy medium, no golden mean, no end goal. In fact, the second law of thermodynamics tells us that systems are constantly holding off the universe’s relentless drive towards maximum entropy. Only in the outright death of the system do the inner and the outer conjoin once again.

Human beings too are systems, a matrix of cells organized into organs, organized into the systems of life. The most complex of these systems might be the neurological system, which in turn regulates our own openness to experience of the outside world. According to psychologists, openness to experience is one of the Big Five personality traits. This is considered a crucial element in each of us because it often cuts against the grain of our DNA and our upbringing. Surprise and newness can be perceived as a threat, based on how our brains are wired. After all, as someone once said, we are all descendants of nervous monkeys. Many of our more adventurous, braver, more open forebears probably died out along the way. Some of them, however, discovered and helped create the world we live in today. 

From a systems perspective, the trick is to discover and create the conditions that optimize how the particular complex system functions. Whether a marketplace, a political system, a community, the Internet — or an individual.

Second, it may be useful to include networks and platforms in our consideration of a systems approach to openness.

There is no firm consensus here. But for many, a network is a subset of a system, while a platform is a subset of a network. All share some common elements of complex adaptive systems, including emergence, tipping points, and the difficulty of controlling the resource, versus managing it.

The subsets also have their own attributes. So, networks show network effects, while platforms show platform economic effects.

From the tech business context, openness may well look different — and more or less beneficial — depending on which of these systems structures is in play, and where we place the boundaries. An open social network premised on acquiring data for selling advertising may not be situated the same as an open source mobile operating system ecosystem, or Internet traffic over an open broadband access network. The context of the underlying resource is all-important and, as such, changes the value (and even the meaning) of openness.

Third, as Michael Powell correctly calls out, this talk about being open or closed cannot come down to simple black and white dichotomies. In fact, using complex systems analysis, these two concepts amount to what is called a polarity. Neither pole is an absolute unto itself, but in fact exists, and can only be defined, in terms of its apparent opposite.

And this makes sense, right? After all, there is no such thing as a completely open system. At some point, such a system loses its very functional integrity, and eventually dissipates into nothingness. Nor is there such a thing as a completely closed system. At some point it becomes a sterile, desiccated wasteland, and simply dies from lack of oxygen.

So, what we tend to think of as the open/closed dichotomy is in fact a set of systems polarities which constitute a continuum. Perhaps the decentralized Web could be seen in a similar way, with some elements of centralization — like naming conventions — useful and even necessary for the proper functioning of the Web’s more decentralized components.

Fourth, the continuum between the more open and the more closed changes and shifts with time. Being open is a relative concept. It depends for meaning on what is going on around it. This means there is no such thing as a fixed point, or an ending equilibrium. That is one reason to prefer the term “openness” to “open,” as it suggests a moving property, an endless becoming, rather than a final resting place, a being. Again, more decentralized networks may have similar attributes of relative tradeoffs. This suggests as well that the benefits we see from a certain degree of openness are not fixed in stone.

Relatedly, a system is seen as open as compared to its less open counterpart. In this regard, the openness that is sought can be seen as reactive, a direct response to the more closed system it has been, or could be. Open source is so because it is not proprietary. Open APIs are so because they are not private. Could it be that openness actually is a reflexive, even reactionary concept? And can we explore its many aspects free from the constraints of the thing we don’t wish for it to be?

Fifth, openness as a concept seems not to be isolated, but spread all across the Internet, as well as all the various markets and technologies that underlie and overlay the Internet. Even if it is often poorly understood and sometimes misused, openness is still a pervasive subject.

Just on the code (software) and computational sides, the relevant domains include:

  • Open standards, such as the Internet Protocol
  • Open source, such as Android
  • Open APIs, such as Venmo
  • Open data, such as EU Open Data Portal
  • Open AI, such as RoboSumo

How we approach openness in each of these domains potentially has wide-ranging implications for the others. 

“Open” Source

One quick example is open source.

The Mozilla Foundation recently published a report acknowledging the obvious: there is no such thing as a single “open source” model. Instead, the report highlights no fewer than 10 different open source archetypes, from “B2B” to “Rocket Ship to Mars” to “Bathwater.” A variety of factors are at play in each of the ten proposed archetypes, including component coupling, development speed, governance structure, community standards, types of participants, and measurements of success.

Obviously, if the smart folks at Mozilla have concluded that open source can and does mean many different things, it must be true. And that same nuanced thinking probably is suitable as well for all the other openness domains.

So, in sum, openness is a systems polarity, a relational and contextual concept, a reflexive move, and a pervasive aspect of the technology world.

Possible Openness Taxonomies

Finally, here are a few proposed taxonomies that would be useful to explore further:

Means versus End

Is openness a tool (a means), or an outcome (an end)? Or both? And if both, when is it best employed in one way compared to another? There are different implications for what we want to accomplish.

The three Fs

Generally speaking, openness can refer to one of three things: a resource, a process, or an entity. 

  • The resource is the virtual or physical thing subject to being open.
  • The process is the chosen way for people to create and maintain openness, and which itself can be more or less open. 
  • The entity is the body of individuals responsible for the resource and the process. Again, the entity can be more or less open.

Perhaps a more alliterative way of putting this is that the resource is the function, the chosen process is the form, and the chosen entity is the forum.

For example, in terms of the Internet, the Internet Protocol and other design elements constitute the function, the RFC process is the form, and the IETF is the forum. Note that all these are relatively open, but obviously in different ways. Also note that a relatively closed form and forum can yield a more open function, or vice versa.

Form and forum need not follow function. But the end result is probably better if it does. 

So, in all cases of openness, we should ask: What is the Form, what is the Forum, and what is the Function?

Scope of Openness

Openness can also be broken down by scope, or the different degrees of access provided.

This can run the gamut, from the bare minimum of awareness that a resource or process or entity even exists, to transparency about what it entails, to accessing and utilizing the resource, to having a reasonable ability to provide input into it, influence its operation, control its powers, and ultimately own it outright. One can see it as the steps involved in identifying and approaching a house, eventually possessing the treasure buried inside, or even forging that treasure into something new.

Think about the Android OS, for example, and how its labelling as an open source platform does, or does not, comport with the full scope of openness, and perhaps how those degrees have shifted over time. Clearly it matches up to one of Mozilla’s ten open source archetypes — but what are the tradeoffs, who has made them and why, and what is the full range of implications for the ecosystem? That would be worth a conversation.

Interestingly, many of these degrees of openness seem to be rooted in traditional common carrier law and regulation, going back decades if not centuries.

  • Visibility and Transparency: the duty to convey information about practices and services
  • Access: the norms of interconnection and interoperability
  • Reasonable treatment: the expectation of fair, reasonable, and nondiscriminatory terms and conditions
  • Control: the essential facilities and common carriage classifications

In fact, in late July 2018, US Senator Mark Warner (D-VA) released a legislative proposal to regulate the tech platforms, with provisions that utilize many of these same concepts.

Openness as Safeguards Taxonomy

Finally, openness has been invoked over the years by policymakers, such as the Congress and the FCC in the United States. Often it has been employed as a means of “opening up” a particular market, such as the local telecommunications network, to benefit another market sector, like the information services and OTT industries.

Over time, these market entry safeguards have fallen into certain buckets. The debates over broadband access networks are one interesting example:

  • definitional — basic/enhanced dichotomy
  • structural — Computer II structural separation
  • functional — Computer III modular interfaces 
  • behavioral — network neutrality
  • informational — transparency

In each case, it would be useful if stakeholders engaged in a thorough analysis of the scope and tradeoffs of openness, as defined from the vantage points of the telecom network owners, the online services, and the ultimate end users.

The larger point, however, is that openness is a potentially robust topic that will influence the ways all of us think about the decentralized Web.

Richard S. Whitt is an experienced corporate strategist and technology policy attorney. Currently he serves as Fellow in Residence with the Mozilla Foundation, and Senior Fellow with the Georgetown Institute for Technology Law and Policy. As head of NetsEdge LLC, he advises companies on the complex governance challenges at the intersection of market, technology, and policy systems. He is also president of the GLIA Foundation, and founder of the GLIAnet Project.

Identity in the Decentralized Web

By Jim Nelson

In today’s world, why do platforms require so many accounts for a single person? (Courtesy of Jolocom)

In July of 2018, more than 1000 people gathered at the Decentralized Web Summit to share the latest decentralized protocols for the Web. Over three days, groups took deep dives into the “roadblock” issues we must surmount to reach scale, including identity. The following report by Jim Nelson explains what identity might look like in a decentralized world.

In B. Traven’s The Death Ship, American sailor Gerard Gales finds himself stranded in post-World War I Antwerp after his freighter departs without him.  He’s arrested for the crime of being unable to produce a passport, sailor’s card, or birth certificate—he possesses no identification at all.  Unsure how to process him, the police dump Gales on a train leaving the country. From there, Gales endures a Kafkaesque journey across Europe, escorted from one border to another by authorities who do not know what to do with a man lacking any identity.  “I was just a nobody,” Gales complains to the reader.

As The Death Ship demonstrates, the concept of verifiable identity is a cornerstone of modern life. Today we know well the process of signing in to shopping websites, checking email, doing some banking, or browsing our social network.  Without some notion of identity, these basic tasks would be impossible.

Courtesy of Jolocom

That’s why at the Decentralized Web Summit 2018, questions of identity were a central topic.  Unlike the current environment, in a decentralized web users control their personal data and make it available to third-parties on a need-to-know basis.  This is sometimes referred to as self-sovereign identity: the user, not web services, owns their personal information.

The idea is that web sites will verify you much as a bartender checks your ID before pouring a drink.  The bar doesn’t store a copy of your card and the bartender doesn’t look at your name or address; only your age is pertinent to receive service.  The next time you enter the bar the bartender once again asks for proof of age, which you may or may not relinquish. That’s the promise of self-sovereign identity.

At the Decentralized Web Summit, questions and solutions were bounced around in the hopes of solving this fundamental problem.  Developers spearheading the next web hashed out the criteria for decentralized identity, including:

  • secure: to prevent fraud, maintain privacy, and ensure trust between all parties
  • self-sovereign: individual ownership of private information
  • consent: fine-tuned control over what information third-parties are privy to
  • directed identity: manage multiple identities for different contexts (for example, your doctor can access certain aspects while your insurance company accesses others)
  • and, of course, decentralized: no central authority or governing body holds private keys or generates identifiers

One problem with decentralized identity is that these criteria often compete, pulling in opposite directions.

Courtesy of Jolocom

For example, while security seems like a no-brainer, with self-sovereign identity the end-user is in control (and not Facebook, Google, or Twitter).  It’s incumbent on them to secure their information. This raises questions of key management, data storage practices, and so on. Facebook, Google, and Twitter pay full-time engineers to do this job; handing that responsibility to end-users shifts the burden to someone who may not be so technically savvy.  The inconvenience of key management and such also creates more hurdles for widespread adoption of the decentralized web.
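To make that key-management burden concrete, here is a minimal sketch of the kind of keypair an end user would have to generate and guard themselves; it assumes the third-party Python cryptography package, and real identity wallets layer passphrase protection, backup, and rotation on top of something like this.

```python
# A minimal sketch of the key handling that self-sovereign identity pushes
# onto the end user (illustrative only; assumes the third-party
# "cryptography" package is installed).
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import ed25519

# Generate a signing keypair that only the user holds.
private_key = ed25519.Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# The public half is what gets shared with services that need to verify you.
print(public_key.public_bytes(
    encoding=serialization.Encoding.Raw,
    format=serialization.PublicFormat.Raw,
).hex())

# Persist the private key. Losing this file means losing the identity;
# leaking it means someone else can act as you.
pem = private_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)
with open("identity_key.pem", "wb") as key_file:
    key_file.write(pem)
```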

The good news is, there are many working proposals today attempting to solve the above problems.  One of the more promising is DID (Decentralized Identifier).

A DID is simply a URI, a familiar piece of text to most people nowadays.  Each DID references a record stored in a blockchain. DIDs are not tied to any particular blockchain, and so they’re interoperable with existing and future technologies.  DIDs are cryptographically secure as well.

DIDs require no central authority to produce or validate.  If you want a DID, you can generate one yourself, or as many as you want.  In fact, you should generate lots of them.  Each unique DID gives the user fine-grained control over what personal information is revealed when interacting with a myriad of services and people.
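To make this concrete, here is a minimal sketch of a DID and the kind of DID document it resolves to, loosely following the W3C draft data model; the “example” method, the identifier, and the key value below are placeholders rather than a real registered DID.

```python
# An illustrative DID and DID document, loosely following the W3C draft.
# The method name, identifier, and key material are placeholders.
did = "did:example:123456789abcdefghi"

did_document = {
    "@context": "https://www.w3.org/ns/did/v1",
    "id": did,
    # Public key material that lets anyone verify signatures produced by
    # whoever controls the matching private key.
    "publicKey": [
        {
            "id": did + "#keys-1",
            "type": "Ed25519VerificationKey2018",
            "controller": did,
            "publicKeyBase58": "PLACEHOLDER-BASE58-KEY",
        }
    ],
    # Keys that may be used to authenticate as this DID.
    "authentication": [did + "#keys-1"],
}
```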

If you’re interested in learning more, I recommend reading Michiel Mulders’ article on DIDs, “the Internet’s ‘missing identity layer’.”  The DID working technical specification is being developed by the W3C.  And for those looking for code and community, check out the Decentralized Identity Foundation.

(While DIDs are promising, they are a nascent technology.  Other options are under development.  I’m using DIDs as an example of how decentralized identity might work.)

What does the future hold for self-sovereign identification?  From what I saw at the Decentralized Web Summit, I’m certain a solution will be found.

Prior to joining the Internet Archive, Jim Nelson was lead engineer and Executive Director of the Yorba Foundation, an open-source nonprofit. In the past he’s worked at XTree Company, Starlight Networks, and a whole lot of Silicon Valley startups you’ve probably never heard of. Jim also writes novels and short fiction. You can read more at j-nelson.net.

The Internet Archive’s 2019 Artists in Residency Exhibition

Still from Meeting Mr. Kid Pix (2019) by Jeffrey Alan Scudder and Matt Doyle

The Internet Archive’s 2019 Artists in Residency Exhibition

New works by Caleb Duarte, Whitney Lynn, and Jeffrey Alan Scudder

Exhibition: June 29 – August 17, 2019

Ever Gold [Projects]
1275 Minnesota Street
Suite 105
San Francisco, CA, 94107

Hours: Tuesday – Saturday, 12-5 pm and by appointment

Ever Gold [Projects] is pleased to present The Internet Archive’s 2019 Artists in Residency Exhibition, a show organized in collaboration with the Internet Archive as the culmination of the third year of this non-profit digital library’s visual arts residency program. This year’s exhibition features work by artists Caleb Duarte, Whitney Lynn, and Jeffrey Alan Scudder.

The Internet Archive visual arts residency, organized by Amir Saber Esfahani, is designed to connect emerging and mid-career artists with the Archive’s millions of collections and to show what is possible when open access to information intersects with the arts. During this one-year residency, selected artists develop a body of work that responds to and utilizes the Archive’s collections in their own practice.

Building on the Internet Archive’s mission to preserve cultural heritage artifacts, artist Caleb Duarte’s project focuses on recording oral histories and preserving related objects. Duarte’s work is intentionally situated within networks peripheral to the mainstream art world in order to establish an intimate relationship with the greater public. His work is produced through situational engagement with active sites of social and cultural resistance and strives to extend the expressions of marginalized communities through a shared authorship.

During his residency at the Internet Archive, Duarte visited communities in temporary refugee camps that house thousands of displaced immigrants in Tijuana, Mexico. By recording oral histories and producing sculptural objects, participants exercised their ability to preserve their own histories, centered around the idea of home as memory; the objects come to represent such a place. Using the Internet Archive, Duarte was able to preserve these powerful stories of endurance and migration that otherwise might be subject to the ongoing processes of erasure. The preservation of these memories required transferring the objects and oral histories into a digital format, some of which are carefully and thoughtfully curated into the Internet Archive’s collections for the public to access. For the exhibition at Ever Gold [Projects], Caleb has created an architectural installation representing ideas of “human progress,” using the same materials from Home Depot that we use to construct our suburban homes: white walls, exposed wooden frames, and gated fences. These materials and the aesthetics of their construction form a direct visual link to the incarceration of immigrant children. This installation is juxtaposed with raw drawings on drywall and video documentation of sculptural performances and interviews created at the temporary refugee camps in Tijuana.

Artist Whitney Lynn’s project builds on previous work in which Lynn questions representations of the archetypal temptress or femme fatale. This type of character is the personification of a trap, a multifaceted idea that interests Lynn. Many of her recent projects are influenced by the potential of an object designed to confuse or mislead. For her residency at the Internet Archive, Lynn has turned her attention to the ultimate femme fatale—the mythological siren. Taking advantage of the Archive’s catalog of materials, Lynn tracks the nature of the siren’s depiction over time. From their literary appearance in Homer’s Odyssey (where they are never physically described), to ancient Greek bird-creatures (occasionally bearded and often featured on funerary objects), to their current conflation with mermaids, sirens have been an object for much projection. Around the turn of the century, topless mermaids begin to appear in Odyssey-related academic paintings, but in the Odyssey not only are the Sirens never physically described, but their lure is knowledge—they sing of the pain of war, claim that they know everything on earth, and say that whoever listens can “go on their way with greater knowledge.” In Homer’s iconic story, Odysseus’s men escape temptation and death because they stuff their ears with wax and remain blissfully ignorant, while Odysseus survives through bondage. The Internet Archive’s mission statement is to provide “universal access to all knowledge” and the myth of the siren is both a story about forbidden knowledge and an example of how images can reflect and reinforce systems of power. Lynn’s investigation of the siren brings up related questions about the lines between innocence and ignorance, and the intersections of knowledge, power, and privilege.

Programmer and digital painter Jeffrey Alan Scudder’s project centers around Kid Pix, an award-winning and influential painting app designed for children released in 1989 by Craig Hickman. The user interface of Kid Pix was revolutionary—it was designed to be intuitive (violating certain Apple guidelines to reduce dialog boxes and other unwieldy mechanics), offered unusual options for brushes and tools, and had a sense of style and humor that would prove hard to beat for competitor products. The original binaries of Kid Pix and related digital ephemera are in the collections of the Internet Archive. As part of his practice, Scudder writes his own digital drawing and painting software, and has always wanted to meet Hickman. As part of his residency with the Internet Archive, he visited Hickman at his home in Oregon. In a video directed by Matthew Doyle, Scudder and Hickman discuss software, art, and creativity. Hickman donated his collection of Kid Pix-related artifacts and ephemera to the Computer History Museum, and the exhibition will include a display of these materials alongside Scudder’s work. In addition to the video work and the selection of artifacts on view, Scudder will present a whiteboard drawing/diagram about his work with the Internet Archive.

During the exhibition, Jeffrey Alan Scudder will produce a new iteration of Radical Digital Painting, an ongoing performance project which often includes other artists. Radical Digital Painting is named after Radical Computer Music, a project by Danish artist Goodiepal, with whom Scudder has been touring in Europe over the last two years. In 2018 alone, Jeffrey gave more than 45 lecture-performances on digital painting and related topics in the United States and Europe. On July 20 at 5 pm, Radical Digital Painting presents THE BUG LOG, a project by Ingo Raschka featuring Julia Yerger and Jeffrey Alan Scudder.

Please contact info@evergoldprojects.com with any inquiries.

More about the artists:

Caleb Duarte (b. 1977, El Paso, Texas) lives and works in Fresno. Duarte is best known for creating temporary installations using construction-type frameworks such as beds of dirt, cement, and objects suggesting basic shelter. His installations within institutional settings become sites for performance as interpretations of his community collaborations. Recent exhibitions include Bay Area Now 8 at Yerba Buena Center for the Arts (San Francisco, 2018); Emory Douglas: Bold Visual Language at Los Angeles Contemporary Exhibitions (2018); A Decolonial Atlas: Strategies in Contemporary Art of the Americas at Vincent Price Art Museum (Monterey Park, CA, 2017); Zapantera Negra at Fresno State University (Fresno, CA, 2016); and COUNTERPUBLIC at the Luminary (St. Louis, MO, 2015).

Whitney Lynn (b. 1980, Williams Air Force Base) lives and works between San Francisco and Seattle. Lynn employs expanded forms of sculpture, performance, photography, and drawing in her project-based work. Mining cultural and political histories, she reframes familiar narratives to question dynamics of power. Lynn’s work has been included in exhibitions at the San Francisco Museum of Modern Art; Torrance Art Museum; Yerba Buena Center for the Arts (San Francisco); RedLine Contemporary Art Center (Denver); and Exit Art (New York). She has completed project residencies at the de Young Museum (San Francisco, 2017) and The Neon Museum (Las Vegas, 2016). She has created site-responsive public art for the San Diego International Airport, the San Francisco War Memorial Building, and the City of Reno City Hall Lobby. Lynn has taught at Stanford University, the San Francisco Art Institute, and UC Berkeley, and is currently an Assistant Professor in Interdisciplinary Visual Arts at the University of Washington.

Jeffrey Alan Scudder (b. 1989, Assonet, Massachusetts) lives and works between Maine and Massachusetts. Scudder spends his time programming and making pictures. He attended Ringling College of Art & Design (BFA, 2011) and Yale School of Art (MFA, 2013). He has taught at UCLA and Parsons School of Design at The New School, and worked at the design studio Linked by Air. Recent exhibitions include drawings at 650mAh (Hove, 2018); INTENTIONS BASED ON A FUTURE WHICH HAS ALREADY HAPPENED at Naming Gallery (Oakland, CA, 2018); Radical Digital Painting at Johannes Vogt Gallery (New York, 2018); Imaginary Screenshots at Whitcher Projects (Los Angeles, 2017); drawinghomework.net Presents at February Gallery (Austin, 2017); New Dawn at Neumeister Bar-Am (Berlin, 2017); and VIDEO MIXER at Yale School of Art (New Haven, 2015). In 2018 alone, Jeffrey gave over 45 lecture-performances on digital painting and related topics in the United States and Europe. Selected recent lecture-performance venues include Weber State University (Ogden, Utah, 2019); 650mAh (Hove, 2018); Chaos Communication Congress (Leipzig, 2018); the ZKM Museum (Karlsruhe, Germany, 2018); Estonian Academy of Arts (Tallinn, Estonia, 2018); Bauhaus University (Weimar, Germany, 2018); and Yale School of Art (New Haven, 2018).

About the Internet Archive:

At the Internet Archive, we believe passionately that access to knowledge is a fundamental human right. Founded by Brewster Kahle with the mission to provide “Universal Access to All Knowledge,” this digital library serves as a conduit for trusted information, connecting learners with the published works of humankind. Like the internet itself, the Internet Archive is a critical part of the infrastructure delivering the power of ideas to knowledge seekers and providers. Over 23 years, we have preserved more than 45 petabytes of data, including 330 billion web pages, 3.5 million digital books, and millions of audio, video and software items, making them openly accessible to all while respecting our patrons’ privacy. Each day, more than one million visitors use or contribute to the Archive, making it one of the world’s top 300 sites. As a digital library, we seek to transform learning and research by making the world’s scholarly data and information linked, accessible and preserved forever online.

Internet Archive Partners with University of Edinburgh to Provide Historical Web Data Supporting Machine Translation

The Internet Archive will provide portions of its web archive to the University of Edinburgh to support the School of Informatics’ work building open data and tools for advancing machine translation, especially for low-resource languages. Machine translation is the process of automatically converting text in one language to another.

The ParaCrawl project is mining translated text from the web in 29 languages.  With over 1 million translated sentences available for several languages, ParaCrawl is often the largest open collection of translations for each language.  The project is a collaboration between the University of Edinburgh, University of Alicante, Prompsit, TAUS, and Omniscien with funding from the EU’s Connecting Europe Facility.  Internet Archive data is vastly expanding the data mined by ParaCrawl and therefore the number of translated sentences collected. Led by Kenneth Heafield of the University of Edinburgh, the overall project will yield open corpora and open-source tools for machine translation as well as the processing pipeline.

Archived web data from IA’s general web collections will be used in the project.  Because translations are particularly scarce for Icelandic, Croatian, Norwegian, and Irish, the IA will also use customized internal language classification tools to prioritize and extract data in these languages from archived websites in its collections.

The partnership expands on IA’s ongoing effort to provide computational research services to large-scale data mining projects focusing on open-source technical developments for furthering the public good and open access to information and data. Other recent collaborations include providing web data for assessing the state of local online news nationwide, analyzing historical corporate industry classifications, and mapping online social communities. As well, IA is expanding its work in making available custom extractions and datasets from its 20+ years of historical web data. For further information on IA’s web and data services, contact webservices at archive dot org.

Please Donate 78rpm Records to the Internet Archive’s Great 78 Project

Good news: we have funding to preserve at least another 250,000 sides of 78rpm records, and we are looking for donations to digitize and physically preserve. We try to do a good job of digitizing and hosting the recordings and then thousands of people listen, learn, and enjoy these fabulous recordings.  

If you have 78s (or other recordings) that you would like to find a good home for, please think of us — we are a non-profit and your donations will be tax-deductible, digitized for all to hear, and physically preserved. If you are interested in donating recordings of any type or appropriate books, please start with this form and we will contact you immediately.

We are looking for anything we do not already have. (We are finding 80% duplication rates sometimes, so we are trying to find larger or more niche collections).  We will physically preserve all genres, but our current funding has directed us to prioritize digitization of non-classical and non-opera.

We can pay for packing and shipping, and are getting better at the logistics for collections of a few thousand and up.  These are fragile objects and we are having good luck avoiding damage.

Tina Argumedo Collection
Daniel McNeil
Boston Public Library

The collections get highlighted and if you submit a story we will post it prominently. For instance: Boston Public Library, Daniel McNeil and Tina Argumedo’s Argentinian Tango collection.

The reason to highlight the donors is twofold: one is to celebrate the donor and their story, and the other is to help contextualize these recordings for different generations. These stories help users find meaning in the materials and find things they want to listen to. This way we can lead new listeners to love this music as the original collectors have.

Working together we can broaden this collection to works from around the world and different cultural groups in each country.

If you are a private individual or an institution and have records to contribute, even if they are not 78s, please start with this simple form, or email info@archive.org, or call +1-415-561-6767 and we will contact you immediately. Thank you.

Sneak Peek: Wellness Workshops at Dweb Camp

Site of the DWeb Camp 2019

We’re excited to gather technologists, creatives and visionaries together for an amazing 4-day weekend, July 18-21, at the first DWeb Camp.  As we work to build a better internet in this beautiful natural location, it’s a chance to consider the impact that our technology has on societies, ecosystems and the world.

To explore the significant role we play in these complex systems, DWeb Camp will feature a series of wellness workshops to help us deepen our connection with each other and our ecosystem at large. For many, life on the internet can become disembodying over time; we lose our grounding in the reality of the natural environment. What happens when we think about our networks in the context of both the digital and physical realms?

Convene a conversation around the firepit with the ocean at your back. It’s a chance to look up from our screens and connect.

DWeb Camp takes place on farm land that has a unique history. The stewards of this farm are cultivating new methods of growing food and generating energy. The farm is not just a pretty backdrop–we hope it will catalyze some meaningful discussions about our connections to each other and the planet.

At DWeb Camp, anyone can propose a workshop or discussion—there will be plenty of time for self-organizing. In the meantime, here are some workshops focusing on wellness that you can look forward to:

Regenerative Agriculture: This is a system of farming principles and practices that increase biodiversity, enrich soils and improve watersheds. Our hosts are building a regenerative farm  that can capture carbon and water in soil, reversing the global trends of CO2 accumulation.

Learn how the farm is using no-till gardening, beneficial insects, and microbiology to nourish the soil. Get your hands dirty transplanting crops while workshop leaders, Cassie and Jared, teach us about these regenerative practices that actually improve the planet.

Permaculture: Are mushroom spawn and worm castings your jam? Then join Cassie and Jared in a hands-on lesson on using King Stropharia mushrooms to create the richest soil for your garden. They’ll show you how to build a kitchen compost worm bin so your soil can be as rich as theirs!

Decentralized Renewable Energy: The Farm is on a path to installing an energy super-system: integrating wind, solar and water to produce more energy than they need for themselves. Joshua Tree, founder of Butterfly Power, will lead a workshop from Powerhouse 1, his mobile renewable energy trailer. How can each of us offset our own carbon footprint? If you bring a solar kit, Joshua Tree will help you install a solar panel on your car or RV, while exploring the future of mobile renewable energy. Order your solar kit ahead of time to do your own or help someone else!

Grow Your Own Edible Mushrooms: Growing edible mushrooms is easy and fun. Expert mycologist, Stephanie Manara, will show you how to inoculate a log for different types of mushrooms, leading to multiple years worth of edible fungi! Take yours home or put together a mushroom grow kit to spread in your garden later. Along the way, Stephanie will share the benefits, science and lore of mycelia!

Plant Walk: With paper microscopes in hand, take a guided tour of  the native plants and seeds at the Farm. Seed saving is one of the most powerful skills a farmer (or backyard gardener) can practice. Learn how to become a seed steward from Steve Peters, founder of the Organic Seed Alliance. The planet’s seed diversity is rapidly decreasing, but we can change that starting right in our own backyards.

Fermentation: For centuries, our ancestors have understood the benefits of fermentation—from food preservation to using microbes for good gut health. Cassie will lead a workshop on fermenting vegetables for probiotic health and delicious cuisine.

Farm Tour: Hop in a 4-wheel drive vehicle with Bill, the Farm’s resident historian. He’ll trace the land’s roots, from the Amah Mutsun tribe through today’s vision for these largely uncultivated 700 acres. From ocean to forest, creek to lake, Bill will share the story of this amazing stretch of Pacific coast land.

Making Honey: Experience the bees of the Pacific Coast and help make a batch of Lion’s Mane Mushroom honey. Tasting is encouraged!

Hemp Workshop: What other plant can be transformed into a medicine, paper, textiles, clothing, biodegradable plastics, paint, insulation, biofuel, food, or animal feed? For 10,000 years, hemp has been woven into useful products. The Farm is soon to be a center for thriving hemp production. Josh West takes you on a tour through history, a greenhouse, and the many uses of this versatile plant.

Cacao Ceremony: Cacao is a fruit best known for its use in producing chocolate. But cacao is also a natural stimulant that can lead you to a warming, heart-opening experience. Matt Siegel, founder of the Envision Festival, will lead participants through a ceremony using Ecuadorian, fair-trade cacao. Come open your heart and increase your ability to connect—to yourself, to others and to the planet.

Stargazing and Myths: Take a hike up the hill to one of the area’s great stargazing spots.  Weather permitting, Joanna and Ben will show you how to identify the constellations and share myths that the Greeks spun out of the stars.

Join Us – Reserve Your Ticket today

Interested in leading a workshop? Want to organize a discussion? You’re welcome to! Please let us know here. We will share more information on organizing workshops.

Money and Utopia at the Internet Archive

Guest blog post by Author Finn Brunton

The history of money is history itself. From the accounting and contracts of Sumerian cuneiform tablets (the earliest written language) to buried coin hoards, stamps and letters of credit, Incan khipu knot-counts, or the maps and censuses written in the service of levying taxes, part of the great archive from which history is made are the records of cash, debt, credit, assets, and coinage. 

A lot of that archive is durable: cowry shells, wooden tally-sticks, clay tablets, coins buried under floorboards or in the hulls of sunken ships. (Rai stones, the indigenous currency of the Micronesian island of Yap, will outlast us all.) And a lot of that archive gives people their own incentives to preserve and maintain: saving precious metals, stock certificates, banknotes, deeds, or the proofs of kinship debts and IOUs. But most of the money transacted now is electronic. How could you write the history of digital cash?

That’s where the Internet Archive comes in. About eight years ago I began work in earnest on a book about the prehistory of cryptocurrency: the technologies, visions, subcultures, and fantasies that drove the project of building digital objects that could work like cash — anonymous transactions with money that could prove itself, as a dollar does, rather than needing the identities of the transactors, like a credit card. 

Digging up this history meant a crash course in the history of money itself — and, as strange as this might sound, the history of the history of money: how people thought about what money meant and how to read it at different times, with collections like the Newman Numismatic Portal and documents like the playwright and poet Joseph Addison’s marvelous 1726 “Dialogues Upon the Usefulness of Ancient Medals,” a kind of dreamy, melancholy short story about coins, poetry, and the legacies of the past.

It also meant learning about various utopian projects which used new forms of money and economic schemes to try to change society — these were often led by the sort of ahead-of-their-time, or out-of-this-world, characters whose archives are, to put it gently, difficult to find. Not for the Internet Archive, though: the documents of the strange project of American Technocracy in the 1930s — like an autobiography written from the perspective of the economic price system! — are, through phone, tablet, or mouse, at one’s fingertips.

The book, focused as it is on small circles of monetary and cryptographic utopians in the Bay Area of California from the 1970s through the arrival of Bitcoin, also required study of the subcultures, publications, and movements within which my subjects crossed paths and dreamed big dreams — venues like Mondo 2000 (individual issues are incredibly rich time capsules) and the people around Ted Nelson’s amazing Xanadu. But, of course, many of these people were among the first to leave print behind and begin writing and publishing primarily online — especially on the fragile, ephemeral Web. Which is where the Wayback Machine came in! Here crucial developments that would otherwise be lost were preserved, like Hal Finney’s “reusable proof of work” token system — an important step toward what would become Bitcoin and subsequent cryptocurrency and blockchain systems.

The book that I built using all these archives was written over the course of several years in many places, from the back seat of a car in the Colorado Rockies to a family farm in rural Quebec, a laundromat in New Hampshire, and a cabin in Finland, but anywhere that I could get the faintest wireless signal, these archives — and many more — were with me. (As a user of the search engine DuckDuckGo, “!archive” and “!wayback” are my favorite, reflexive search operators.) Some of the earliest discussions of computerized, digital money happened in the context of dreams of what networked computing could be: the world’s libraries and archives, across all media, on your “home information terminal,” available at a gesture. With the Internet Archive, that utopia is at last being realized.

BOOK LAUNCH EVENT
Join us Tuesday, June 25th at the Internet Archive in San Francisco for the book launch of DIGITAL CASH: The Unknown History of the Anarchists, Utopians, and Technologists Who Created Cryptocurrency by Finn Brunton.

Date: Tuesday, June 25th 2019
Time: Doors Open: 6:00 PM
In Conversation with Finn Brunton: 6:30 – 7:45 PM
Reception: 7:45 – 9:00 PM
Light refreshments will be served. Finn Brunton’s book will also be available for purchase and signing during the reception, courtesy of The Green Arcade bookstore.
Where: Internet Archive
300 Funston Ave
SF, CA 94118

About the Author: Finn Brunton (finnb.net) is the author of Spam: A Shadow History of the Internet (2013) and Digital Cash: The Unknown History of the Anarchists, Technologists, and Utopians Who Created Cryptocurrency (2019), and the co-author of Obfuscation: A User’s Guide for Privacy and Protest (2015) and Communication (2019). He teaches in Media, Culture, and Communication at New York University.

Branding the Decentralized Web

The new DWeb logo draws inspiration from our colorful code base!

The Decentralized Web is a concept. It’s a set of technologies. It’s a network of builders and designers and dreamers. What started with small gatherings in San Francisco, London, Los Angeles, Toronto and Berlin is growing into a global movement. So how do we visually convey the identity of this idea we call the “DWeb”?

Designer Iryna Nezhynska has created a flexible new brand identity for the DWeb community.

That’s the assignment that Berlin-based designer, Iryna Nezhynska, volunteered to take on. Born in Ukraine, she developed her skills in brand agencies based in Warsaw, designing for some of the world’s largest brands: Lindt Chocolates, Mercedes/Daimler, John Deere. But Nezhynska’s joy came from working with startups, helping small teams express their collective vision. Today, she is the digital designer for Jolocom, the Berlin-based team building tools for self-sovereign identity. “Visual design is in my bones,” Nezhynska told me. “Branding is the air I breathe.”

“If you are a visual communication designer—you aren’t a graphic designer or an artist,” Nezhynska explained. “You are a translator of the emotions/ideas/values that a group wants to communicate. You translate it into a visual language.” So the first step is understanding the “personality” of the brand. How it should feel and speak. Nezhynska began by launching a survey for community members in Berlin, Toronto and San Francisco. Collectively, we imagined the DWeb’s personality to be “friendly,” “open-minded” and “playful.” But it is also a dynamic change-agent, prioritizing people over technology.

When it came to designing the logo, Nezhynska first had to consider the ubiquitous image of connected nodes that have come to represent the blockchain. “That graphic is so overused,” she ruminated. “I tried to find out who actually came up with it and why it became the blockchain symbol. The blockchain graphic is random, but it has movement. I wanted to build on the common associations of the blockchain, but push the idea further. So I thought, what if we take the graphic and delete the sharp edges, transforming the lines into the D shape? Could we make the cluster of dots ‘live’ within the D? I want people to play with the brand. You can use the D or just suggest it.”

The logo’s central shape is a simple dot. “The concept is that we have different size dots. There are many people building Web 3.0, but they have different influence. They are living and moving. Small dots can become bigger over time. Yet it’s one community,” Nezhynska explained. In her flexible design concept, the dot can be a balloon, a fingerprint, a face. A speck of light. She sought to create pattern and consistency, yet enough free space to allow people to play.

“A logo should be breathing and have a heartbeat.  Every dot changes size. Every dot beats differently.  A beautiful cacophony.” –Iryna Nezhynska, designer

Next, given the values of the Decentralized Web around open source code and free iteration, Nezhynska looked for an open source font and landed on Lab Mono, a typeface created by a Berlin-based coder and designer, Martin Wecke.

Yet designing the logo was only the beginning of the brand assignment. With the DWeb personality firmly in mind, Nezhynska next set off in search of the right visuals, colors and applications in the real world to create a mood board. Taken together, the logo and mood board create a memorable look and feel.  “The moodboard should be the brand’s ‘North Star,’” Nezhynska explained. “If you apply it consistently across all visual touchpoints, even if you delete the logo, you should still be able to identify the brand. It’s that consistent. That recognizable.”

How can the world use this brand identity? We see the DWeb as a global community, adding new nodes in cities around the globe. Each city can adopt its own color: SF might be yellow, Berlin red.

At DWeb Camp, Nezhynska will lead a  workshop to design UX/UI for a central landing page that will direct you to DWeb groups and events around the world. She envisions a community akin to Creative Commons, now in 150 cities with local ambassadors creating events: meet ups, camps, an Annual Summit.  “I hope the DWeb can keep the money aspects at bay,” the designer mused. “If you run a local community you aren’t promoting your own brand, you are promoting the community.”

And five years from now? “If the community keeps the brand alive and growing, in five years the visual won’t be so important,” Nezhynska said. “But the tonal voice will be important. Friendly. Playful. Human. The brand personality should persist.”

To build the DWeb Brand with Iryna Nezhynska, register for DWeb Camp, July 18-21. Join her workshop to extend the global brand.

Cult of the Dead Cow Book Reading and Discussion —Tuesday June 18 at 6pm

Watch a recording from the evening here!

Join us on June 18, 2019 at the Internet Archive for a book reading and panel discussion about — and with — some of the original hacking supergroup, the Cult of the Dead Cow. Modern security owes much to this irreverent group, whose members pioneered both smart independent security research and hacking for human rights.

The event is in celebration of the new book by veteran technology reporter, Joseph Menn, entitled Cult of the Dead Cow: How the Original Hacking Supergroup Might Just Save the World. Light refreshments and small snacks will be provided, and books will be available for purchase. Tickets are free, but donations are greatly appreciated. The event will also be live-streamed on our YouTube channel.

RSVP HERE

EFF and the Internet Archive Present:
Cult of the Dead Cow Book Reading & Discussion

Date: Tuesday June 18, 2019
Time: 6:00-9:00 pm
Where: Internet Archive
300 Funston Ave. SF, CA 94118

Schedule:
• Reception: 6:00-7:00 pm
• Reading by Joseph Menn: 7:00-7:15 pm
• Panel Discussion: 7:15-8:15 pm
• Post-Panel Mingling: 8:15-9:00 pm

Speaker:
Joseph Menn – author of Cult of the Dead Cow: How the Original Hacking Supergroup Might Just Save the World

Panel:
MC: Cindy Cohn – Executive Director of EFF
Chris Rioux – BO2k (Back Orifice 2000) author and Veracode founder
Window Snyder – cDc fellow traveler and former core security staffer at Microsoft and Apple and now Square
Omega – formerly anonymous cDc text file editor

GET YOUR FREE TICKETS HERE

The IA Client – The Swiss Army Knife of Internet Archive

As someone who’s uploaded hundreds of thousands of items to the Internet Archive’s stacks and who has probably done a few million transactions with the materials over the years, I just “know” about the Internet Archive python client. If you’re someone who wants to interact with the site as a power user (or are looking for an excuse to), it’ll help you to know about it too.

You might even be the kind of power user who is elbowing me out of the way saying “show me the code and show me the documentation”. Well, the documentation is here and the code is here. Have a great time.

Boy, they run fast.

So, for everyone still around, a little history about how this client came along and how, if you have a certain set of tasks and interactions you want to conduct with the massive treasures of archive.org, it might enable you to do some amazing things indeed. If you’ve never done command-line scripting before, here’s a great excuse to learn.

Started in 2012 and overseen primarily by Archive employee Jake Johnson, the internetarchive client (which is generally just called “ia”) is both a set of libraries and a command-line program for doing a wide range of activities and actions with the archive without having to come in through the website. There is a range of advantages over using the web interface, chief among them that it can be called as a command-line request and return the results (success, failure, other information) right into your scripts. It is coded to be in lock-step with our APIs and system, and does its best to respect capacity as well as return informative messages about success or errors.

The command comes in the form of ia [command], where command is one of a variety of functions:

  • It is possible to do an ia search command and return the item identifiers of every item that matches your query, which can then be fed to other scripts or utilized as a checklist for your own research.
  • The ia metadata command will return as much metadata as possible, including file sizes, metadata pairs, content type, and other useful information baked into every object in the collections.
  • The ia list command will tell you all the different files within an item identifier, to see which you might specifically want.
  • The ia download and ia upload commands let you pull down and upload items to the archive, setting all the attributes for uploads and adding conditions and specific matches for downloads.
  • The ia tasks command lets your scripts know how the addition of your items to the archive’s collections went, as well as where they stand in terms of post-processing.

These are, in fact, all the commands a user might find themselves in desperate need of when the size or complexity of a task means clicking endlessly in a browser is just not going to cut it.
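To give a flavor of the library side of the client, here is a rough sketch of those same operations using the internetarchive Python package; the query, identifiers, and file names are placeholders, and exact keyword arguments may vary between versions.

```python
# A rough sketch of the commands above, via the "internetarchive" Python
# library. The query, identifiers, and file names are placeholders.
from internetarchive import download, get_item, search_items, upload

# ia search: print the identifier of every item matching a query.
for result in search_items('collection:nasa AND mediatype:texts'):
    print(result['identifier'])

# ia metadata / ia list: inspect an item's metadata and its files.
item = get_item('some-item-identifier')
print(item.metadata.get('title'))
for f in item.files:
    print(f['name'], f.get('size'))

# ia download: pull down only the files you care about.
download('some-item-identifier', glob_pattern='*.pdf')

# ia upload: create (or add to) an item, setting metadata as you go.
upload('some-new-identifier',
       files=['paper.pdf'],
       metadata={'title': 'Example Upload', 'mediatype': 'texts'})
```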

The client was originally created so that the Archive could run many of its own processes via scripts that would provide clear error messages, give accurate status updates, and allow the scripts to understand what was working or what needed modification. Many internal teams either use this client or depend on its output for information to do their tasks. With over six years of development behind it, the tool is very mature and utilized thousands of times a day internally.

In my case, here are some automated or semi-automated tasks I use the ia client command set to do, often daily:

  • Analyze the text of a set of documents to provide me with best guesses as to their publication date, which I then sign off on
  • Take a donation of several hundred PDF files and turn them into individual items in a collection, including taking metadata from a .CSV sheet (a rough sketch of this workflow appears after this list)
  • Compare and contrast screenshots within an item to find the best one and make that a thumbnail for the item
  • Maintain “Pipelines” that pull from content located elsewhere (like the Bitsavers documentation project or the DNA Lounge) and place the resulting items into the Archive with no human intervention
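As an illustration of the second task, a CSV-driven bulk upload might look roughly like the following; the column names, file layout, and target collection are assumptions for the sake of example, not the exact script I use.

```python
# A hedged sketch of a CSV-driven bulk upload: each row of the spreadsheet
# supplies an item identifier, a path to a PDF, and a title. Column names
# and the target collection are assumptions for illustration.
import csv

from internetarchive import upload

with open('donation.csv', newline='') as fh:
    for row in csv.DictReader(fh):
        upload(
            row['identifier'],                 # item to create on archive.org
            files=[row['pdf_path']],           # the PDF belonging to this item
            metadata={
                'title': row['title'],
                'mediatype': 'texts',
                'collection': 'opensource',    # placeholder collection
            },
        )
        print('uploaded', row['identifier'])
```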

For people who are using the Archive simply to play with and enjoy its many different materials, be they website histories, movies, music, or books, this tool is probably not what you need.

But for the scripting-comfortable folks… for people who want to become scripting-comfortable folks… for people who are maintaining collections or working hard with multiple uploads and doing a lot of manual work to enter metadata… this multi-tool of Internet Archive access is exactly what you need.

As mentioned above, the documentation is here and the code is here. Have a great time.