Author Archives: Alexis Rossi

New Views Stats for the New Year

We began developing a new system for counting views statistics on archive.org a few years ago. We had received feedback from our partners and users asking for more fine-grained information than the old system could provide. People wanted to know where their views were coming from geographically, and how many came from people vs. robots crawling the site.

The new system will debut in January 2019. In the couple of weeks leading up to that, you may see some inconsistencies in view counts as the new numbers roll out across tens of millions of items.

With the new system you will see changes on both items and collections.

Item page changes

An “item” refers to a media item on archive.org – this is a page that features a book, a concert, a movie, etc. Here are some examples of items: Jerky Turkey, Emma, Gunsmoke.

On item pages the lifetime views will change to a new number. This new number will be the sum of lifetime views from the legacy system through 2016, plus total views from the new system for the past two years (January 2017 through December 2018). Because we are replacing the 2017 and 2018 views numbers with data from the new system, the lifetime views number for an item may go down. The section on how the new system differs from the legacy system, further down in this post, explains why this occurs.
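To make the arithmetic concrete, here is a minimal sketch in Python. The function name and all of the numbers are hypothetical and only illustrate how replacing two years of legacy counts with new-system counts can lower a lifetime total.

```python
def lifetime_views(legacy_by_year, new_by_year, cutoff_year=2016):
    """Legacy views through cutoff_year plus new-system views after it."""
    legacy_part = sum(v for year, v in legacy_by_year.items() if year <= cutoff_year)
    new_part = sum(v for year, v in new_by_year.items() if year > cutoff_year)
    return legacy_part + new_part


# Hypothetical yearly view counts for a single item.
legacy = {2015: 400, 2016: 500, 2017: 600, 2018: 700}
new = {2017: 450, 2018: 520}

old_total = sum(legacy.values())         # all legacy years: 2200
new_total = lifetime_views(legacy, new)  # (400 + 500) + (450 + 520) = 1870
```

In this made-up case the displayed lifetime number drops from 2200 to 1870, even though no views were "lost". The 2017 and 2018 views were simply recounted under the new rules.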

Collection page changes

Soon on collection page About tabs (example) you will see 2 separate views graphs. One will be for the old legacy system views through the end of 2018. The other will contain 2 years of views data from the new system (2017 and 2018). Moving forward, only the graph representing the new system will be updated with views numbers. The legacy graph will “freeze” as of December 2018.

Both graphs will be on the page for a limited time, allowing you to compare your collections stats between the old and new systems.  We will not delete the legacy system data, but it may eventually move to another page. The data from both systems is also available through the views API.

People vs. Robots

The graph for new collection views will additionally contain information about whether the views came from known “robots” or “people.”  Known robots include crawlers from major search engines, like Google or Bing. It is important for these robots to crawl your items – search engines are a major source of traffic to all of the items on archive.org. The robots number here is your assurance that search engines know your items exist and can point users to them.  The robots numbers also include access from our own internal robots (which is generally a very small portion of robots traffic).

One note about robots: they like text-based files more than audio/visual files.  This means that text items on the archive that have a publicly accessible text file (the djvu.txt file) get more views from robots than other types of media in the archive. Search engines don’t just want the metadata about the book – they want the book itself.

“People” are a little harder to define. Our confidence about whether a view comes from a person varies – in some cases we are very sure, and in others it’s more fuzzy, but in all cases we know the view is not from a known robot. So we have chosen to class these all together as “people,” as they are likely to represent access by end users.
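The rule described above (anything that is not a known robot is classed as "people") can be sketched as follows. This is only an illustration: the post does not say how robots are detected, so matching on the user-agent string and the two marker names below are assumptions, not the Archive's actual method or list.

```python
# Illustrative markers only; the real list of known robots is not given in the post.
KNOWN_ROBOT_MARKERS = ("googlebot", "bingbot")


def classify_view(user_agent):
    """Return "robots" for a known robot, otherwise "people"."""
    ua = user_agent.lower()
    if any(marker in ua for marker in KNOWN_ROBOT_MARKERS):
        return "robots"
    # Everything that is not a *known* robot is grouped together as "people",
    # whatever the level of confidence about who is behind the request.
    return "people"
```

The design choice worth noticing is the default: uncertainty falls on the "people" side, which matches the post's statement that the only thing known for sure about that group is that it is not a known robot.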

What counts as a view in the new system

  • Each media item in the archive has a views counter.
  • The view counter is increased by 1 when a user engages with the media file(s) in an item.
    • Media engagement includes experiencing the media through the player in the item page (pressing play on a video or audio player, flipping pages in the online bookreader, emulating software, etc.), downloading files, streaming files, or borrowing a book.
    • All types of engagements are treated in the same way – they are all views.
  • A single user can only increase the view count of a particular item once per day.
    • A user may view multiple media files in a single item, or view the same media file in a single item multiple times, but within one day that engagement will only count as 1 view.
  • Collection views are the sum of all the view counts of the items in the collection.
    • When an item is in more than one collection, the item’s view counts are added to each collection it is in. This includes “parent” collections if the item is in a subcollection.
    • When a user engages with a collection page (sorting, searching, browsing etc.), it does NOT count as a view of the collection.
    • Items sometimes move in or out of collections. The views number on a collection represents the sum of the views of the items that are in the collection at that time. For example, the September 1, 2018 views number for the collection represents the sum of the views on items that were in the collection on September 1, 2018. If an item later moves out of that collection, the collection does not lose the views from September 1, 2018.
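The counting rules in the list above can be sketched in a few lines of Python. This is a simplified model, not the production system: the class name, method name, and the idea of passing in the item's collections for that day are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


class ViewCounter:
    """Toy model: one view per user, per item, per day."""

    def __init__(self):
        self._seen = set()                        # (user, item, day) already counted
        self.item_views = defaultdict(int)        # item -> total views
        self.collection_daily = defaultdict(int)  # (collection, day) -> views

    def record(self, user, item, day, collections):
        """Record an engagement; return True if it counted as a new view.

        `collections` lists every collection the item is in on `day`,
        including parent collections.
        """
        key = (user, item, day)
        if key in self._seen:
            return False  # same user, same item, same day: still 1 view
        self._seen.add(key)
        self.item_views[item] += 1
        for collection in collections:
            # Daily totals are never rewritten, so a collection keeps a
            # day's views even if the item later moves out of it.
            self.collection_daily[(collection, day)] += 1
        return True


counter = ViewCounter()
sept1 = date(2018, 9, 1)
counter.record("alice", "item1", sept1, ["sub", "parent"])  # counts
counter.record("alice", "item1", sept1, ["sub", "parent"])  # ignored: same day
counter.record("bob", "item1", sept1, ["sub", "parent"])    # counts: different user
```

After these three calls the item has 2 views, and both the subcollection and its parent show 2 views for September 1, 2018.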

How the new system differs from the legacy system

When we designed the new system, we implemented some changes in what counted as a “view,” added some functionality, and repaired some errors that were discovered.  

  • The legacy system updated item views once per day and collection views once per month. The new system will update both item and collection views once per day.
  • The legacy system updated item views ~24 hours after a view was recorded.  The new system will update the views count ~4 days after the view was recorded. This time delay in the new system will decrease to ~24 hours at some point in the future.
  • The legacy system had no information about geographic location of users. The new system has approximate geolocation for every view. This geographic information is based on obfuscated IP addresses. It is accurate at a general level, but does not represent an individual user’s specific location.
  • The legacy system had no information about how many views were caused by robots crawling the site. The new system shows us how well the site is crawled by breaking out media access by robots (vs. interactions from people).
  • The legacy system did not count all book reader interactions as views.  The new system counts bookreader engagements as a view after 2 interactions (like page flips).
  • On audio and video items, the legacy system sometimes counted views when users saw *any* media in the item (like thumbnail images). On audio items, the new system counts only engagements with the audio files; on video items, it counts only engagements with the video files.
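Two of the rule changes above lend themselves to a short sketch. The function names are hypothetical, and reading "after 2 interactions" as "two or more interactions" is an assumption about the threshold rather than something the post states precisely.

```python
BOOKREADER_MIN_INTERACTIONS = 2  # from the post; ">=" is an assumed reading


def bookreader_counts_as_view(interactions):
    """A bookreader session counts once it reaches the interaction threshold."""
    return interactions >= BOOKREADER_MIN_INTERACTIONS


def av_counts_as_view(item_mediatype, accessed_file_types):
    """On audio/video items, only files of the item's own media type count.

    Viewing a thumbnail image on an audio item, for example, is not a view.
    """
    return item_mediatype in accessed_file_types
```

For example, `bookreader_counts_as_view(1)` is false while `bookreader_counts_as_view(2)` is true, and `av_counts_as_view("audio", ["image"])` is false because only a thumbnail was seen.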

In some cases, the differences above can lead to drastic changes in views numbers for both items and collections. While this may be disconcerting, we think the new system more accurately reflects end user behavior on archive.org.

If you have questions regarding the new stats system, you may email us at info@archive.org.

Decades of music celebrating Audiovisual Heritage

In honor of World Day for Audiovisual Heritage (October 27) we’d like to take you on a brief tour through seven decades of digitized music and audio recordings from 1900 through 1970.  We’ve been working to digitize 78rpm discs for the Great 78 Project to preserve the heritage of the first half of the 20th century, and now we’re turning our eyes toward vinyl LPs that have fallen out of print in the Unlocked Recordings collection.

1905 – A Picnic For Two

1906 – Talmage on Infidelity (very judgy)


1912 – Till the Sands of the Desert Grow Cold

1916 – I’ll Take you Home Again, Kathleen


1920 – I Want a Jazzy Kiss (as opposed to a bluesy kiss)

1937 – A Cowboy Honeymoon (hint: includes yodeling)


1939 – The Red Army Chorus of the U.S.S.R. (when we were pals)

1945 – “Don’t you Worry ‘Bout That Mule” (spoiler alert – he ain’t goin’ blind)


1947 – Everything is Cool (so sayeth Bab’s 3 Bips & a Bop)

1950 – When both accordions and Hi-Fi were hip


1950 – “They’re all dressed up to go swinging and, Man, they’re a gas!” (Sonny Burke from the back cover)

1957 – Amongst fierce competition, this gem wins Most Nightmare Inducing Cover Image


1958 – Dance music from Israel

1959 – This intensely sleepy version of “Makin’ Whoopee” will send you to sleep in the lounge.


1960 – My next story is a little risque (and so is the one after that)

1961 – Recorded live at the Second City Cabaret Theatre, Chicago, Ill.


1961 – Easy winner for the worst song opening we’ve ever heard, enjoy Tiger Rag from The Percussive Twenties.

1962 – Significant improvement on the Tiger Rag from the Doowackadoodlers


1963 – “Adults only” saucy comedy

1966 – Organ-ized wins best pun, as well as having “Popular songs arranged for organ” by “Brazil’s #1 Organist”


1966 – The music stylings of Mrs. Miller are not to be missed – personal favorites are “Hard day’s night” and “These boots are made for walkin'”

1966 – The “You Don’t Have to be Jewish” Players are falling in love


1969 – The Begatting of the President

World’s largest collection of Tibetan Buddhist literature now available on the Internet Archive

The Buddhist Digital Resource Center (BDRC) and Internet Archive (IA) announced today that they are making a large corpus of Buddhist literature available via the Internet Archive. This collection represents the most complete record of the words of the Buddha available in any language, plus many millions of pages of related commentaries, teachings and works such as medicine, history, and philosophy.

BDRC founder E. Gene Smith sits at the computer with Buddhist monks and others

BDRC’s founder, E. Gene Smith, spent decades collecting and preserving Tibetan texts in India before starting the organization in 1999. Since then, as a neutral organization they have been able to work on both sides of the Himalayas in search of rare texts.

Several months ago in a remote monastery in Northeast Tibet, a BDRC employee photographed an old work and sent it in to their library. It was a text that the tradition has always known about, but which was long considered to have been lost. Its very existence was unknown to anyone outside of the caretakers of the monastery that had safeguarded it for centuries.

The Kadampa school, active in the 11th and 12th centuries, was known to scholars – they knew who had started the tradition and where it fit in the history of Buddhism – but most of the writings from that period had not survived the centuries. And yet suddenly here was a lost classic of this tradition, the only surviving manuscript of the work: The exposition on the graduated path by Kadam Master Sharawa Yontan Drak (1070-1141). Dozens of pithy sayings are attributed to Sharawa in later works but this writing of his is never directly cited in the classics of the genre that date back to the fifteenth century and before.

The exposition on the graduated path by Kadam Master Sharawa Yontan Drak (1070-1141).

BDRC’s digitizers never know what they will find when they arrive at a new location, but their work has uncovered missing links, beautiful woodblock versions of known texts, writings of previously unknown authors, and texts by famous people that they thought had been lost to time. While the manuscript above is an amazing find, it is by no means the only one their work has unearthed.

Children holding a manuscript in its box

This work highlights the importance of preserving cultures before they disappear or are too dispersed to gather together. In its efforts to make all of Buddhist literature available, BDRC is also digitizing fragile palm leaf manuscripts in Thailand, Sanskrit texts in Nepal, and the entire Tibetan collection of the National Library of Mongolia. Brewster Kahle, founder of Internet Archive, said, “In 2011 we announced that we had digitized every historic work in Balinese, and this year we are making Tibetan literature available. We hope that this is a trend that will see the literatures of many more cultures become openly available.”

Children studying Buddhist teachings

This is not an academic pursuit. Many Tibetans have left their homeland, spreading to India and around the world. Younger generations who have been displaced and raised in other societies may not have the opportunity to grow up with these traditional teachings. The work of the BDRC is to make those teachings available to everyone.

Jeff Wallman, Executive Director Emeritus of BDRC and Jann Ronis, Executive Director of BDRC, addressed their reasons for making this information available on the Internet Archive: “The founding mission of BDRC is to make the treasures of Buddhist literature available to all on the Internet. We recognize that you cannot preserve culture; you can only create the right conditions for culture to preserve itself. We hope that by making these texts available via the Internet Archive, we can spur a new generation of usage. Openness ensures preservation.”

Buddhist monks

The BDRC’s extensive collection is used by laypeople and monks alike. Karmapa Ogyen Trinley Dorje is a frequent user of their collection. He and other traveling teachers call on the BDRC’s library for references and works when they are away from their libraries, or whenever they need a rare text that they could not otherwise access.

Chokyi Nyima Rinpoche, the Abbot of Ka-Nying Shedrub Ling Monastery in Nepal, and a well regarded teacher of Tibetan Buddhism around the world, is gratified that the teachings of Buddha have been made available. “We can share the entire body of literature with every Tibetan who can use it. These texts are sacred, and should be free.”

BDRC’s home office is in Cambridge, Massachusetts, with additional offices and digitization centers in Hangzhou, China; Bangkok, Thailand; Kathmandu, Nepal; and at the National Library of Mongolia in Ulaanbaatar where it is establishing a project in collaboration with the Asian Classics Input Project (ACIP).

Internet Archive and BDRC are both delighted to join forces on sharing the Buddhist literary tradition for the benefit of humanity.

About Buddhist Digital Resource Center

BDRC is a 501(c)(3) nonprofit dedicated to seeking out, preserving, organizing, and disseminating Buddhist literature. Joining digital technology with scholarship, BDRC ensures that the treasures of the Buddhist literary tradition are not lost, but are made available for future generations. BDRC would like every monastery, every Buddhist master, every scholar, every translator, and every interested reader to have access to the complete range of Buddhist literature, regardless of social, political, or economic circumstances. BDRC is headquartered in Harvard Square in Cambridge, Massachusetts.

About Internet Archive

The Internet Archive is a 501(c)(3) nonprofit digital library based in San Francisco that specializes in offering broad public access to digitized and born-digital books, music, movies and Web pages.

Contacts:
Jann Ronis, BDRC, jann@tbrc.org
Jeff Wallman, BDRC, jeffwallman@tbrc.org
Brewster Kahle, Internet Archive, brewster@archive.org

Images of Afghanistan 1987-1994

Afghan Media Resource Center’s correspondent interviewing a Muj Commander, 1991

Journalists and others risk their lives to keep the public informed in times of conflict. War imagery provides us with important information in the moment, and creates a trove of invaluable archival content for the future.

Please be aware that this collection contains some disturbing photos of violence and its aftermath (though we have not included any in this blog post).

The Afghan Media Resource Center (AMRC) was founded in Peshawar, Pakistan, in 1987, by a team of media trainers working under contract to Boston University. The goal of the project was to help Afghans produce and distribute accurate and reliable accounts of the Afghan war to news agencies and television networks throughout the world. Beginning in the early 1980s, amidst a news blackout imposed by the Soviet-backed Kabul government, foreign journalists had become targets to be captured or killed. The AMRC was an effort to overcome the substantial obstacles encountered by media representatives in bringing events surrounding the Afghan-Soviet war to world attention.

An armed Muj posing for the camera, 1988

Beginning in 1987, a series of six-week training sessions was conducted at the AMRC’s original home in University Town, Peshawar, Pakistan. Qualified Afghans were recruited from all major political parties, all major ethnic groups and all regions of Afghanistan, to receive professional training in print journalism, photo journalism and video news production. Haji Sayed Daud, a former television producer and journalist at Kabul TV before the Soviet invasion, was named AMRC Director.

After the completion of their training, 3-person teams were dispatched on specific stories throughout Afghanistan’s 27 provinces, with 35mm cameras, video cameras, notebooks, and audio tape recorders. Photo materials were distributed internationally through SYGMA and Agence France-Presse (AFP). Video material was syndicated and broadcast by VisNews (now Reuters), with 150 broadcasters in 87 countries, Euronews and London-based WTN (now Associated Press), Thames Television, ITN, Swedish, French, Pakistani and other regional networks.

A young girl carrying clean drinking water, 1989

In 2000 AMRC began publishing a popular and influential newspaper in Kabul: ERADA (Intention). With one interruption, ERADA publication continued until 2012.

Beyond the AMRC archive, the AMRC conducted dozens of training programs and workshops for writers and radio journalists, including training programs for Refugee Women in Development (REFWID). The AMRC also established radio and TV studios in the provincial capital, Jalalabad, and produced radio and TV programs, including educational radio dramas, for a variety of international organizations. AMRC also conducted public opinion polls in Afghanistan, including an extensive Media Use Survey in Afghanistan, financed by InterMedia, a Washington, D.C. group.

Armed Muj pulling out an unexploded missile, 1989

The AMRC collection spans a critical period in Afghanistan’s history (1987 – 1994) and includes 76,000 photographs, 1,175 hours of video material, 356 hours of audio material, and many stories from print media.

An Afghan weaving carpet, 1990

In 2012 AMRC received a grant to digitize the entire AMRC archive, to preserve the collection at the U.S. Library of Congress. AMRC senior media advisors Stephen Olsson and Nick Mills were trained in the digitization processes by the Library of Congress, then spent two weeks in Kabul training the AMRC staff. The digitization and metadata sheets (in English, Dari and Pashto) were completed in 2016, and were welcomed into the Library of Congress with a formal ceremony.  We are now making the entire AMRC collection available through our on-line partner, The Internet Archive.

Now the entire collection is readily available to scholars, researchers and publishers.  All royalties for commercial use of the photo images and video material will continue to support the non-profit work of the AMRC.

78s Bring the Past to Life

For many of us, music is an integral part of our memories.  It evokes a period of time in our lives, or inspires specific recollections.  Music can also conjure times long past, outside of our personal memories.

 

When we watch this movie, we see and hear Argentina in the early 20th Century.  The music in this clip came from a 78 collected in Buenos Aires by Tina Argumedo, part of her personal collection of hundreds of discs.

Tina Argumedo (L), D’Anna Alexander (C), Lucretia Hug (R)

Tina began collecting 78s in the 1930s. The world was waking up from the Great Depression, Buenos Aires was celebrating its 400th birthday, and tango was moving from working class barrooms to grand, middle-class dance halls. Tina was a 20-year-old newlywed, and she and her husband loved to dance. She began to collect the music she loved, and she continued to build her collection of 78s for 20 years.

When Tina’s husband died, she moved to the U.S. to live with her daughter, Lucretia Hug. She sold almost everything she owned, packed up her life, and moved to a foreign country. Her 78s were among the few things she brought with her. She passed her love of music down to her children and grandchildren and shared this music with her family, sitting on the couch in the living room with her 78s on shelves around them.

A few years ago when Tina passed away, her family had to decide what to do with this collection.  They no longer had room to store the discs, but they knew how important this music had been to Tina and couldn’t imagine throwing the 78s away or donating them to some charity to be disposed of piecemeal.  Fortunately, a family friend (and Argentinian composer), Débora Simcovich, offered to take the discs and keep them together.

And then, just a few months ago, Tina’s granddaughter, D’Anna Alexander, heard about The Great 78 Project and she remembered her grandmother’s collection.  She and Débora agreed that the best tribute to Tina would be to donate her collection to the Internet Archive for digitization and preservation, so that people all over the world could appreciate this music and the collection that Tina built. It is a way to preserve and celebrate their family’s heritage and culture.

We have begun to digitize Tina’s collection and you can hear the first discs now in the Tina Argumedo & Lucretia Hug Collection.

More than 3 million recordings were produced on 78rpm discs, and many of them have never made the leap forward to modern formats. In some cases, these 78s contain the only version of these performances, and they are mostly inaccessible to people today.  Few people have equipment to play a 78, and the discs themselves are frighteningly fragile.

These 78rpm discs contain an entire era of our musical history, from about 1898 through the 1950s when LP records were introduced.  If we do not digitize the music on these discs, we will lose it. 

The Great 78 Project aims to digitize as many of those 3 million recordings as we can find. Currently we have digitized 50,000 recordings, and we continue to add about 5,000 each month.  We are preserving discs from more than 20 collectors and institutions around the world, and we continue to look for more people with collections they would like to share.  If you would like to work with us on this collection, please let us know at info@archive.org.

If You See Something, Save Something – 6 Ways to Save Pages In the Wayback Machine

In recent days many people have shown interest in making sure the Wayback Machine has copies of the web pages they care about most. These saved pages can be cited, shared, linked to – and they will continue to exist even after the original page changes or is removed from the web.

There are several ways to save pages and whole sites so that they appear in the Wayback Machine.  Here are 6 of them.

1. Save Page Now

Put a URL into the form, press the button, and we save the page.  You will instantly have a permanent URL for your page.

At the moment, there are a few exceptions for this method – some sites prohibit crawling, a few have SSL (security) settings that make it break – but this method will work for most pages.  The feature saves the page you enter including the images and CSS.  It does not save any of the outlinks, and can’t be used to initiate a crawl of an entire web site. We do not keep your IP address, so your submission is anonymous.
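For readers who prefer a link to a form, a capture address can also be built by hand. The sketch below assumes the commonly used `https://web.archive.org/save/<url>` address form, which this post does not spell out. The function name is hypothetical, and the code only builds the address without making any request.

```python
SAVE_PREFIX = "https://web.archive.org/save/"  # assumed address form, see above


def save_page_now_url(page_url):
    """Build the address that asks the Wayback Machine to capture page_url."""
    if not page_url.startswith(("http://", "https://")):
        raise ValueError("page_url must start with http:// or https://")
    return SAVE_PREFIX + page_url


capture_link = save_page_now_url("https://example.com/page")
```

Opening `capture_link` in a browser should behave like submitting the same URL through the form, with the same exceptions described above.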

2. Chrome extension

Install the Wayback Machine Chrome extension in your browser.  Go to a page you want to archive, click the icon in your toolbar, and select Save Page Now. We will save the page and give you a permanent URL.

The same provisos from “Save Page Now” apply – there are some pages where it won’t work, and it only saves one page at a time.  One plus to installing the extension though is that now as you surf around, when you run into a missing page we will alert you if we have a saved copy.

We also have a “Wayback Machine” Firefox add-on

A “Wayback Machine” Safari Extension

A “Wayback Machine” iOS app

And a “Wayback Machine” Android app

3. Wikipedia JavaScript Bookmarklet

Nobody loves a primary source more than a Wikipedia editor.  To that end, they offer a Wayback Machine JavaScript Bookmarklet that allows you to quickly save a web page from any browser.

4. Volunteer for Archive Team

Archive Team is an entirely volunteer driven group who are interested in saving Internet history.  Many of the sites and pages they save end up in the Wayback Machine.  Visit the Archive Team site to learn more about how to volunteer with them.

5. Sign up for an Archive-It Account

Archive-It is a subscription service provided by Internet Archive that allows you to run your own crawling projects without any technical expertise.  Tell us what to crawl and how often to crawl it, and we execute the crawl and put the results in the Wayback Machine.

Archive-It is a paid subscription service with technical and web archivist support. This option is most appropriate for organizations that have a mandate to save certain types or categories of web content on a regular basis. If your institution is a current Archive-It partner, contact them for how you can contribute.

6. End of Term Archive

Every time the US government administration changes, Internet Archive works with partners to make a copy of government-related sites and web presences.  We call it the End of Term Archive.  You can help us discover new government sites by using the Nomination Tool to suggest pages or sites.  These nominations are added to the crawl and end up in the Wayback Machine.

The Internet Archive has been saving web pages for 20 years.  This archive has been built by thousands of people, and we would like you to help.  Use one of the methods above to make sure we have the pages you care about.

 

Robots.txt Files and Archiving .gov and .mil Websites


The Internet Archive is collecting webpages from over 6,000 government domains, over 200,000 hosts, and feeds from around 10,000 official federal social media accounts. Some have asked if we ignore URL exclusions expressed in robots.txt files.

The answer is a bit complicated.  Historically, sometimes yes and sometimes no; but going forward the answer is “even less so.”

Robots.txt files live at the top level of a website at a URL like this: https://example.com/robots.txt. This standard was developed in 1994 to guide search engine crawlers in a variety of ways, including some areas to avoid crawling. This standard is used by Google, for instance.
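As a small illustration of how a crawler reads these files, the sketch below uses Python's standard `urllib.robotparser` module. The robots.txt content and the crawler name "ExampleCrawler" are made up for the example, and nothing here describes how the Internet Archive's own crawlers are implemented.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt that asks all crawlers to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse the text directly, no network access

# A crawler that honors robots.txt checks each URL before fetching it.
allowed = parser.can_fetch("ExampleCrawler", "https://example.com/index.html")
blocked = parser.can_fetch("ExampleCrawler", "https://example.com/private/report.html")
```

Here `allowed` is true and `blocked` is false. A crawl that ignores exclusion directives simply skips this check.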

These files were useful 20 years ago for the Internet Archive’s crawlers, but have become less and less so over the years because many sites have not actively maintained the files from the point of view of archiving. Also, large websites or hosted websites often do not make it easy for their users to edit these files, and large websites increasingly guide or block crawlers with technological measures. Another problem is that domain names change hands, so a current robots.txt file may not be relevant to an earlier era of the site. For those who want to exclude their sites, we now encourage webmasters to send exclusion requests to info@archive.org and to specify the time period the request applies to.

Our end-of-term crawls of .gov and .mil websites in 2008, 2012, and 2016 have ignored exclusion directives in robots.txt in order to get more complete snapshots. Other crawls done by the Internet Archive and other entities have had different policies.  We have had little or no negative feedback on this, and little or no positive feedback — in fact little feedback at all. The Wayback Machine has also been replaying the captured .gov and .mil webpages for some time in the beta wayback, regardless of robots.txt.   

Overall, we hope to capture government and military websites well, and hope to keep this valuable information available to users in the future.

Lending Launches on Archive.org, Plus Bookreader Updates

We have been loaning digital books through Open Library since 2010. We started with about 10,000 books in the lending collections, and soon there will be more than 500,000 books available.  

Today we launch lending on Archive.org, so patrons no longer need to go to Open Library to borrow books. The same parameters for borrowing apply — books are free to borrow for logged in users, and they can be borrowed for a period of 2 weeks.

For Open Library users, the lending path has changed a bit — see this post for more information.

For Archive.org users, you’re going to see many more modern books available in the coming weeks. These books will appear in collections and search results with a blue “Borrow” notice on them.

Logged in users will be able to borrow the book from the book’s details page where you see the full metadata. Remember, creating an account on archive.org is free, and so is borrowing books.

When you click “Borrow This Book” you will be taken to the new bookreader.  You can search, use the read aloud feature, zoom in and out, and change the number of pages you see at once. The book will be available in your browser for 2 weeks as long as you are connected to the Internet.

If you prefer to read your book offline, you can download a PDF or EPUB version of the book to be read in Adobe Digital Editions (free download).  You must install Adobe Digital Editions before you can read the offline version of your book.

When you want to return the book, you can return it from Adobe Digital Editions (if you chose to download) or from the bookreader.

In addition to the new borrow features, we have updated the bookreader to display better on mobile devices. The layout now changes when you are on a very small screen in order to make it easier to use.  You will see one page at a time, and some of the functions are located in the menu on the left.

If you would like to download an offline copy of the book accessible through Adobe Digital Editions (don’t forget to download the app first!), open the menu and choose “Loan Information.”

From here you can download a PDF or EPUB to read offline, or return the book.

We hope you will explore the books available for lending, and enjoy the features of the new bookreader.

Many thanks to: Richard Caceres, Brenton Cheng, Carolyn Li-Madeo, Tracey Jaquith, Jessamyn West, Jeff Kaplan, John Lekashman, Dwalu Khasu, John Gonzalez and Alexis Rossi.

The Evolving Internet Archive

The new archive.org site

The new version of the archive.org site has been evolving over the past 6 months in response to the feedback we’ve received from thousands of our awesome users.

If you haven’t been following along, you can review a little bit of the journey through these blog posts:

Why change the site at all?  The posts above help answer that, but in brief:

  • 35% of our ~3 million daily users are on mobile/tablet devices, and the classic site is not easy to use on small formats.
  • The new tools we want to offer our users would be difficult to implement in the old site architecture.
  • The classic site was built a long time ago, using methods that are outdated.  Finding programmers who have the skills to work in that environment is becoming increasingly difficult, and the ramp up time for new employees is painful.  The redesign has given us an opportunity to start pulling the front end (what you see) apart from the back end, so they can evolve separately.

Percent of archive.org users viewing the new site: blue represents people in classic archive.org (v1), red represents people in the new version (v2)

Currently about 85% of archive.org users are in the new version. Over the next few weeks we will be asking the remaining 15% to try it out.  For the time being, users will be able to exit the new archive.org and return to the “classic” version. But the classic will not always be available or supported, so please give the new version a try and give us feedback if there are things on the site that you don’t like, can’t find, or that seem like bugs.  (When you click “exit” you will have an opportunity to give us feedback.)

We have made several video tours that introduce you to the new site. I recommend starting with the site tour, below.

The original download button

In the past few months we have received more than 16,000 feedback emails from people using the new version.  The redesign team reads every single one of them.  Some just say, “I love it!” and some immediately say, “I hate it!”  But a great many of you have also taken the time to share a little more – something you missed from the old site, a question about the new tools, concern about accessibility, suggestions for how to adjust things, etc.

Download menu open by default

We took that input — along with information from user tests, interviews with some of our power users, chats with partners — and tried to identify areas of the interface that seemed to be working well, and other areas that were not.

The evolution of downloading files from items is a great example of the process we’ve been following.  The original design for item pages de-emphasized download as a feature. Our conversations with users told us that most people wanted to hit a play button, not download a file.

You could still download in the original design, of course, but you had to click a button to get options and then click again if you wanted specific files.

But when we opened the new site up to more users, we got many comments from people who either disliked the extra clicking, didn’t like leaving the page to get individual files, didn’t understand what the options represented, or couldn’t find the download options at all.

The first thing we tried was just opening up the download menu by default.  Instead of just seeing the black download button on the page, you now also saw a menu of options.  More people saw the download, but feedback made it clear that users still had issues.

What if we make it blue? (Nope!)

We thought that turning the Download header blue might make the download options more visible, so that people would see them faster.  We did an A/B test with 50% of users seeing each option, and neither option really won.  And the feedback about this feature continued to be negative.

It became clear that we needed to rethink the design of the download options altogether, trying to keep it clean-looking and easy to use while also satisfying the concerns of our most advanced users.

We set some goals for the download changes based on the feedback we had received:

  • must be able to download an individual file without leaving the item page
  • if there is only one file in a particular format, you should only need one click to download it
  • improve the ability to download groups of files (e.g. “just give me all the FLAC files”)

The current version of downloads allows you to consume individual media files without leaving the page and gives you a lot more options for downloading groups of files from an item.  Since we released the new Download Options feature, the negative feedback about this feature has dropped off almost entirely.  So we think we’re on the right track!  We have created a short video tour for the downloads feature if you want to learn more.

New Download Options feature, illustrating how to display individual files

The download changes are just one example of how much your feedback has helped us identify areas of confusion on the site and understand how to improve things.  Here are a few more examples:

  • A-Z filters available when sorting by title or creator
  • better experience for people with JavaScript disabled
  • fixes to improve software emulation
  • default search results to List view (instead of image-based Thumbnail view)
  • pull user page images from Gravatar if available (if user has not uploaded one)

We have a lot more in store for the new site – better accessibility for people with visual impairments, tools for creating your own collections, improved playback for multimedia items, etc.  As these features trickle into the site, we hope you will continue to share your questions and ideas with us – you are truly helping us to make the archive a better place for everyone.

This project receives support from the John S. and James L. Knight Foundation’s Knight News Challenge.

What’s new with v2

As many of you have already seen, we are working on the next generation of the archive.org web site, which we call Version 2.0 (v2). It’s in beta right now, so go check it out!

Version 1 (v1) showing the banner to try the BETA Version 2 (v2)

We get a lot of feedback from the people who have elected to try out v2, and we read ALL of it. As themes emerge about what people are having trouble with, we make changes to the design and then we pay attention to subsequent feedback to try to gauge whether we solved the problem (or not).

Volume prepended to title

The goal of this redesign is to make the site more inviting and easier to use. Right now our work is focused on how the site looks and how things are organized on the page. For the most part, everything that is available to you in Version 1 (v1) of the site is available to you in v2 – but those things may be in different places!

Rights information displayed in About tab

We have a lot of long-time users of the site, and we know that any major changes will cause them to have to relearn where things are and how to accomplish the things they already know how to do on v1. This kind of major change can be very annoying, so we’re working hard to make sure you only need to relearn things once. While we will be adding more features as time goes by, we expect those changes to be incremental and not to affect the basic layout of pages.

If you’ve been using v2, you’ve probably noticed some changes over the last few weeks. I’ll discuss some of those changes here, and some of them are highlighted in the included images.

The collection About tab contains a longer description, info about contributors, and stats for reviews, forums, views and items

Volume information.  We have a lot of journals and books with Volume information that was not showing in search, collection or account pages. The volume information is now prepended to the title for easier visual scanning within a collection.

Live Music. Rights information for a collection is now displayed on the About tab. We also changed the way shows are described in band collections to list the date and venue before the band name, making it easier to visually scan the items in a collection.

Mobile. On most mobile devices we decreased the initial number of search results from 50 to 25 in order to reduce the page load time.

Collections Page

Go to list view for a collection and click the “Show details” checkbox

Collection description. The description area for the collection at the top of the page has been shortened. We encourage collection builders to add useful descriptions, and you can see the additional information in the new About tab.

Click to see additional collections for an item

About tab. The About tab replaces the Contributors tab. We wanted to have a place for all of the information about a collection, and “Contributors” didn’t cover it. The new About tab contains the longer description for a collection, rights information (when it exists), data about how many reviews and forum posts are in that collection, and the content from the previous Contributors tab – the collection creator, people who have added to the collection, and charts for Views and Items over time.  You will also find related collections listed on the About tab below the graphs. Parent collections and subcollections still show up in the Collections tab, since they are part of a collection’s direct hierarchy.

The See All Files page

Collection tab. The Collection tab has a few changes as well. In list view, you can now “show details” for each item if you want to see more information.

Item Pages

Additional collections. If an item belongs to more than one collection, you can choose to view those additional collections.

Upload tile on user account page

Stream only. When an item is not available for download, you will see a “Stream Only” notification where the “Download” button normally appears. We made some visual changes to this notification to make it seem less button-like.

Favorites list sorted by Date Favorited

See All Files. In the “see all files” view, “playable” media files are pushed to the top, just under the “all files” options for torrent and zip. Files are grouped logically, with the original first and bolded and the derivative files listed below.

User Account Page

Uploads. Your Uploads tab has a new “Upload” tile in it, just to make uploading easier to find. You can still upload from anywhere on the site by clicking the upload icon at the top of the page, of course.

Favorites. Your Favorites list (called bookmarks in v1) will now display your favorites sorted by “date favorited” so that you can see your most recently favorited items first.

Tell Us!

As always, please use the Beta feedback link in the top right corner to let us know what you think.  Is everything awesome?  Are you confused about where to find something?  Tell us!

If you’re interested in a more detailed running log of changes from our lead developer, Tracey Jaquith, you can get the “nerd version” here: https://archive.org/CHANGELOG.txt

This project receives support from the John S. and James L. Knight Foundation’s Knight News Challenge.