Author Archives: Alexis Rossi

Images of Afghanistan 1987-1994

Afghan Media Resource Center’s correspondent interviewing a Muj Commander, 1991

Journalists and others risk their lives to keep the public informed in times of conflict. War imagery provides us with important information in the moment, and creates a trove of invaluable archival content for the future.

Please be aware that this collection contains some disturbing photos of violence and its aftermath (though we have not included any in this blog post).

The Afghan Media Resource Center (AMRC) was founded in Peshawar, Pakistan, in 1987 by a team of media trainers working under contract to Boston University. The goal of the project was to help Afghans produce and distribute accurate and reliable accounts of the Afghan war to news agencies and television networks throughout the world. Beginning in the early 1980s, amid a news blackout imposed by the Soviet-backed Kabul government, foreign journalists had become targets to be captured or killed. The AMRC was an effort to overcome the substantial obstacles that media representatives encountered in bringing events surrounding the Afghan-Soviet war to world attention.

An armed Muj posing for the camera, 1988

Beginning in 1987, a series of six-week training sessions was conducted at the AMRC’s original home in University Town, Peshawar, Pakistan. Qualified Afghans were recruited from all major political parties, all major ethnic groups and all regions of Afghanistan to receive professional training in print journalism, photojournalism and video news production. Haji Sayed Daud, a former television producer and journalist at Kabul TV before the Soviet invasion, was named AMRC Director.

After the completion of their training, 3-person teams were dispatched on specific stories throughout Afghanistan’s 27 provinces, with 35mm cameras, video cameras, notebooks, and audio tape recorders. Photo materials were distributed internationally through SYGMA and Agence France-Presse (AFP). Video material was syndicated and broadcast by VisNews (now Reuters), with 150 broadcasters in 87 countries, Euronews and London-based WTN (now Associated Press), Thames Television, ITN, Swedish, French, Pakistani and other regional networks.

A young girl carrying clean drinking water, 1989

In 2000 AMRC began publishing a popular and influential newspaper in Kabul: ERADA (Intention). With one interruption, ERADA publication continued until 2012.

Beyond the AMRC archive, the AMRC conducted dozens of training programs and workshops for writers and radio journalists, including training programs for Refugee Women in Development (REFWID). The AMRC also established radio and TV studios in the provincial capital, Jalalabad, and produced radio and TV programs, including educational radio dramas, for a variety of international organizations. AMRC also conducted public opinion polls in Afghanistan, including an extensive Media Use Survey in Afghanistan, financed by InterMedia, a Washington, D.C. group.

Armed Muj pulling out an unexploded missile, 1989

The AMRC collection spans a critical period in Afghanistan’s history (1987–1994) and includes 76,000 photographs, 1,175 hours of video material, 356 hours of audio material, and many stories from print media.

An Afghan weaving carpet, 1990

In 2012 AMRC received a grant to digitize the entire AMRC archive, to preserve the collection at the U.S. Library of Congress. AMRC senior media advisors Stephen Olsson and Nick Mills were trained in the digitization processes by the Library of Congress, then spent two weeks in Kabul training the AMRC staff. The digitization and metadata sheets (in English, Dari and Pashto) were completed in 2016, and were welcomed into the Library of Congress with a formal ceremony.  We are now making the entire AMRC collection available through our on-line partner, The Internet Archive.

Now the entire collection is readily available to scholars, researchers and publishers.  All royalties for commercial use of the photo images and video material will continue to support the non-profit work of the AMRC.

78s Bring the Past to Life

For many of us, music is an integral part of our memories.  It evokes a period of time in our lives, or inspires specific recollections.  Music can also conjure times long past, outside of our personal memories.

 

When we watch this movie, we see and hear Argentina in the early 20th Century.  The music in this clip came from a 78 collected in Buenos Aires by Tina Argumedo, part of her personal collection of hundreds of discs.

Tina Argumedo (L), D’Anna Alexander (C), Lucretia Hug (R)

Tina began collecting 78s in the 1930s.  The world was waking up from the Great Depression, Buenos Aires was celebrating its 400th birthday, and tango was moving from working class barrooms to grand, middle-class dance halls.  Tina was a 20 year old newlywed, and she and her husband loved to dance.  She began to collect the music she loved, and she continued to build her collection of 78s for 20 years.

When Tina’s husband died, she moved to the U.S. to live with her daughter, Lucretia Hug.  She sold almost everything she owned, packed up her life, and moved to a foreign country.  Among the few things she brought with her were her 78s.  She passed her love of music down to her children and grandchildren and shared this music with her family, sitting on the couch in the living room with her 78s on shelves around them.

A few years ago when Tina passed away, her family had to decide what to do with this collection.  They no longer had room to store the discs, but they knew how important this music had been to Tina and couldn’t imagine throwing the 78s away or donating them to some charity to be disposed of piecemeal.  Fortunately, a family friend (and Argentinian composer), Débora Simcovich, offered to take the discs and keep them together.

And then, just a few months ago, Tina’s granddaughter, D’Anna Alexander, heard about The Great 78 Project and she remembered her grandmother’s collection.  She and Débora agreed that the best tribute to Tina would be to donate her collection to the Internet Archive for digitization and preservation, so that people all over the world could appreciate this music and the collection that Tina built. It is a way to preserve and celebrate their family’s heritage and culture.

We have begun to digitize Tina’s collection and you can hear the first discs now in the Tina Argumedo & Lucretia Hug Collection.

More than 3 million recordings were produced on 78rpm discs, and many of them have never made the leap forward to modern formats. In some cases, these 78s contain the only version of these performances, and they are mostly inaccessible to people today.  Few people have equipment to play a 78, and the discs themselves are frighteningly fragile.

These 78rpm discs contain an entire era of our musical history, from about 1898 through the 1950s when LP records were introduced.  If we do not digitize the music on these discs, we will lose it. 

The Great 78 Project aims to digitize as many of those 3 million recordings as we can find. Currently we have digitized 50,000 recordings, and we continue to add about 5,000 each month.  We are preserving discs from more than 20 collectors and institutions around the world, and we continue to look for more people with collections they would like to share.  If you would like to work with us on this collection, please let us know at info@archive.org.

If You See Something, Save Something – 6 Ways to Save Pages In the Wayback Machine

In recent days many people have shown interest in making sure the Wayback Machine has copies of the web pages they care about most. These saved pages can be cited, shared, linked to – and they will continue to exist even after the original page changes or is removed from the web.

There are several ways to save pages and whole sites so that they appear in the Wayback Machine.  Here are 6 of them.

1. Save Page Now

Put a URL into the form, press the button, and we save the page.  You will instantly have a permanent URL for your page.

save page now

At the moment, there are a few exceptions for this method – some sites prohibit crawling, a few have SSL (security) settings that make it break – but this method will work for most pages.  The feature saves the page you enter including the images and CSS.  It does not save any of the outlinks, and can’t be used to initiate a crawl of an entire web site. We do not keep your IP address, so your submission is anonymous.
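For readers who would rather script this than use the form, here is a minimal sketch. It assumes that Save Page Now accepts requests of the form https://web.archive.org/save/ followed by the page address, which is not stated in this post, and the function name is hypothetical. The sketch only builds the request URL and does not send it, so the exceptions described above still apply.

```python
from urllib.parse import urlsplit

# Assumption: Save Page Now accepts https://web.archive.org/save/<page URL>.
# This prefix is not given in the post.
SAVE_PREFIX = "https://web.archive.org/save/"


def save_page_now_url(page_url: str) -> str:
    """Return the URL that would ask the Wayback Machine to save page_url."""
    parts = urlsplit(page_url)
    # Only absolute http(s) addresses make sense as pages to save.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {page_url!r}")
    return SAVE_PREFIX + page_url


if __name__ == "__main__":
    print(save_page_now_url("https://example.com/news/story.html"))
```

Opening the resulting address in a browser should have the same effect as submitting the form, one page at a time.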

2. Chrome extension

Install the Wayback Machine Chrome extension in your browser.  Go to a page you want to archive, click the icon in your toolbar, and select Save Page Now. We will save the page and give you a permanent URL.

Chrome extension allows save page now

The same provisos from “Save Page Now” apply – there are some pages where it won’t work, and it only saves one page at a time.  One plus to installing the extension though is that now as you surf around, when you run into a missing page we will alert you if we have a saved copy.
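The lookup behind that alert can be approximated from a script. The sketch below assumes the Wayback Machine availability endpoint at https://archive.org/wayback/available and a JSON reply containing an archived_snapshots object with a closest entry. Neither is described in this post, the function names are hypothetical, and the sample reply is made up for illustration rather than captured from the live service.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumption: endpoint and response layout are not given in the post.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url: str) -> str:
    """Build the lookup URL for a page that might have a saved copy."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": page_url})


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the saved copy's URL from a JSON reply, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


if __name__ == "__main__":
    # Illustrative reply only, not real service output.
    sample = json.dumps({
        "archived_snapshots": {
            "closest": {
                "available": True,
                "url": "http://web.archive.org/web/20170101000000/http://example.com/",
                "timestamp": "20170101000000",
                "status": "200",
            }
        }
    })
    print(availability_query("example.com/missing"))
    print(closest_snapshot(sample))
```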

We also have a “Wayback Machine” Firefox add-on

A “Wayback Machine” Safari Extension

A “Wayback Machine” iOS app

And a “Wayback Machine” Android app

3. Wikipedia JavaScript Bookmarklet

Nobody loves a primary source more than a Wikipedia editor.  To that end, they offer a Wayback Machine JavaScript Bookmarklet that allows you to quickly save a web page from any browser.

wikipedia wayback bookmarklet

4. Volunteer for Archive Team

Archive Team is an entirely volunteer driven group who are interested in saving Internet history.  Many of the sites and pages they save end up in the Wayback Machine.  Visit the Archive Team site to learn more about how to volunteer with them.

Archive Team

5. Sign up for an Archive-It Account

Archive-It is a subscription service provided by Internet Archive that allows you to run your own crawling projects without any technical expertise.  Tell us what to crawl and how often to crawl it, and we execute the crawl and put the results in the Wayback Machine.

Archive-It

Archive-It is a paid subscription service with technical and web archivist support. This option is most appropriate for organizations that have a mandate to save certain types or categories of web content on a regular basis. If your institution is a current Archive-It partner, contact them to find out how you can contribute.

6. End of Term Archive

Every time the US government administration changes, Internet Archive works with partners to make a copy of government-related sites and web presences.  We call it the End of Term Archive.  You can help us discover new government sites by using the Nomination Tool to suggest pages or sites.  These nominations are added to the crawl and end up in the Wayback Machine.

End of term archive nomination tool

 

The Internet Archive has been saving web pages for 20 years.  This archive has been built by thousands of people, and we would like you to help.  Use one of the methods above to make sure we have the pages you care about.

 

Robots.txt Files and Archiving .gov and .mil Websites


The Internet Archive is collecting webpages from over 6,000 government domains, over 200,000 hosts, and feeds from around 10,000 official federal social media accounts. Some have asked if we ignore URL exclusions expressed in robots.txt files.

The answer is a bit complicated.  Historically, sometimes yes and sometimes no; but going forward the answer is “even less so.”

Robots.txt files live on the top level of a website at a URL like this: https://example.com/robots.txt. This standard was developed in 1994 to guide search engine crawlers in a variety of ways, including some areas to avoid crawling. This standard is used by Google, for instance.
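To make the mechanism concrete, here is a small sketch of how a crawler that honors these files might consult one, using Python's standard urllib.robotparser module. The robots.txt content and the crawler name ExampleCrawler are hypothetical, and nothing here reflects how the Internet Archive's own crawlers are implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for page in ("https://example.com/public/index.html",
                 "https://example.com/private/report.html"):
        print(page, allowed(ROBOTS_TXT, "ExampleCrawler", page))
```

A crawler that ignores exclusion directives, as described below for the end-of-term crawls, would simply skip this check.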

These files were useful 20 years ago for the Internet Archive’s crawlers, but have become less and less so over the years because many sites have not actively maintained the files from the point of view of archiving. Also, large websites or hosted websites often do not make it easy for their users to edit these files, and large websites increasingly guide or block crawlers with technological measures. Another problem is knowing when a domain name has changed hands, in which case the current robots.txt file is not relevant to the earlier era. Over time, we have come to encourage webmasters who want to exclude their sites to send exclusion requests to info@archive.org and to specify the time period the request applies to.

Our end-of-term crawls of .gov and .mil websites in 2008, 2012, and 2016 have ignored exclusion directives in robots.txt in order to get more complete snapshots. Other crawls done by the Internet Archive and other entities have had different policies.  We have had little or no negative feedback on this, and little or no positive feedback — in fact little feedback at all. The Wayback Machine has also been replaying the captured .gov and .mil webpages for some time in the beta wayback, regardless of robots.txt.   

Overall, we hope to capture government and military websites well, and hope to keep this valuable information available to users in the future.

Lending Launches on Archive.org, Plus Bookreader Updates

We have been loaning digital books through Open Library since 2010. We started with about 10,000 books in the lending collections, and soon there will be more than 500,000 books available.  

Today we launch lending on Archive.org, so patrons no longer need to go to Open Library to borrow books. The same parameters for borrowing apply — books are free to borrow for logged in users, and they can be borrowed for a period of 2 weeks.

For Open Library users, the lending path has changed a bit — see this post for more information.

For Archive.org users, you’re going to see many more modern books available in the coming weeks. These books will appear in collections and search results with a blue “Borrow” notice on them.

Logged in users will be able to borrow the book from the book’s details page where you see the full metadata. Remember, creating an account on archive.org is free, and so is borrowing books.

When you click “Borrow This Book” you will be taken to the new bookreader.  You can search, use the read aloud feature, zoom in and out, and change the number of pages you see at once. The book will be available in your browser for 2 weeks as long as you are connected to the Internet.

If you prefer to read your book offline, you can download a PDF or EPUB version of the book to be read in Adobe Digital Editions (free download).  You must install Adobe Digital Editions before you can read the offline version of your book.

When you want to return the book, you can return it from Adobe Digital Editions (if you chose to download) or from the bookreader.

In addition to the new borrow features, we have updated the bookreader to display better on mobile devices. The layout now changes when you are on a very small screen in order to make it easier to use.  You will see one page at a time, and some of the functions are located in the menu on the left.

If you would like to download an offline copy of the book accessible through Adobe Digital Editions (don’t forget to download the app first!), open the menu and choose “Loan Information.”

From here you can download a PDF or EPUB to read offline, or return the book.

We hope you will explore the books available for lending, and enjoy the features of the new bookreader.

Many thanks to: Richard Caceres, Brenton Cheng, Carolyn Li-Madeo, Tracey Jaquith, Jessamyn West, Jeff Kaplan, John Lekashman, Dwalu Khasu, John Gonzalez and Alexis Rossi.

The Evolving Internet Archive

The new archive.org site

The new version of the archive.org site has been evolving over the past 6 months in response to the feedback we’ve received from thousands of our awesome users.

If you haven’t been following along, you can review a little bit of the journey through these blog posts:

Why change the site at all?  The posts above help answer that, but in brief:

  • 35% of our ~3 million daily users are on mobile/tablet devices, and the classic site is not easy to use on small formats.
  • The new tools we want to offer our users would be difficult to implement in the old site architecture.
  • The classic site was built a long time ago, using methods that are outdated.  Finding programmers who have the skills to work in that environment is becoming increasingly difficult, and the ramp up time for new employees is painful.  The redesign has given us an opportunity to start pulling the front end (what you see) apart from the back end, so they can evolve separately.
percent of archive.org users viewing the new site

Blue represents people in classic archive.org (v1), red represents people in the new version (v2)

Currently about 85% of archive.org users are in the new version. Over the next few weeks we will be asking the remaining 15% to try it out.  For the time being, users will be able to exit the new archive.org and return to the “classic” version. The classic version will not always be available or supported, though, so please give the new version a try and give us feedback if there are things on the site that you don’t like, can’t find, or that seem like bugs.  (When you click “exit” you will have an opportunity to give us feedback.)

We have made several video tours that introduce you to the new site. I recommend starting with the site tour, below.

The original download button

In the past few months we have received more than 16,000 feedback emails from people using the new version.  The redesign team reads every single one of them.  Some just say, “I love it!” and some immediately say, “I hate it!”  But a great many of you have also taken the time to share a little more – something you missed from the old site, a question about the new tools, concern about accessibility, suggestions for how to adjust things, etc.

Download menu open by default

We took that input — along with information from user tests, interviews with some of our power users, chats with partners — and tried to identify areas of the interface that seemed to be working well, and other areas that were not.

The evolution of downloading files from items is a great example of the process we’ve been following.  The original design for item pages de-emphasized download as a feature. Our conversations with users told us that most people wanted to hit a play button, not download a file.

You could still download in the original design, of course, but you had to click a button to get options and then click again if you wanted specific files.

But when we opened the new site up to more users, we got many comments from people who either disliked the extra clicking, didn’t like leaving the page to get individual files, didn’t understand what the options represented, or couldn’t find the download options at all.

The first thing we tried was just opening up the download menu by default.  Instead of just seeing the black download button on the page, you now also saw a menu of options.  More people saw the download, but feedback made it clear that users still had issues.

What if we make it blue?  (Nope!)

We thought perhaps if we increased the visibility of the download options by turning the Download header blue that people would see it faster.  We did an A/B test with 50% of users seeing each option — neither option really won.  And the feedback about this feature continued to be negative.
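The post does not say how users were divided between the two versions. One common approach, sketched here purely as an assumption, is to hash a stable user identifier so that each visitor always lands in the same half. The function name, the experiment label and the identifiers are all hypothetical.

```python
import hashlib


def ab_bucket(user_id: str, experiment: str = "download-header-color") -> str:
    """Assign a user to bucket "A" or "B", with roughly half of users in each."""
    # Hashing experiment + user keeps the assignment stable across visits
    # and independent between different experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return "A" if digest[0] % 2 == 0 else "B"


if __name__ == "__main__":
    counts = {"A": 0, "B": 0}
    for n in range(1000):
        counts[ab_bucket(f"user-{n}")] += 1
    print(counts)
```

Comparing how often each bucket finds and uses the download options is then what tells you whether either design "won".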

It became clear that we needed to rethink the design of the download options altogether, trying to keep it clean-looking and easy to use while also satisfying the concerns of our most advanced users.

We set some goals for the download changes based on the feedback we had received:

  • must be able to download an individual file without leaving the item page
  • if there is only one file in a particular format, you should only need one click to download it
  • improve the ability to download groups of files (e.g. “just give me all the FLAC files”)
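These goals can be illustrated with a short sketch. It is not the site's actual code: the function names are hypothetical, and it assumes, for simplicity, that a file's format can be read from its extension.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_format(filenames: List[str]) -> Dict[str, List[str]]:
    """Group an item's files by format, taken here from the file extension."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        fmt = name.rsplit(".", 1)[-1].upper() if "." in name else "UNKNOWN"
        groups[fmt].append(name)
    return dict(groups)


def download_label(fmt: str, files: List[str]) -> str:
    """Describe the download option shown for one format."""
    if len(files) == 1:
        # Only one file in this format: a single click can fetch it directly.
        return f"{fmt}: download {files[0]}"
    # Several files: offer the whole group, or let the user pick one.
    return f"{fmt}: {len(files)} files (download all or pick one)"


if __name__ == "__main__":
    item_files = ["track01.flac", "track02.flac", "track01.mp3", "info.txt"]
    for fmt, files in group_by_format(item_files).items():
        print(download_label(fmt, files))
```

Grouping first is what makes a request like "just give me all the FLAC files" a single choice rather than one click per file.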

The current version of downloads allows you to consume individual media files without leaving the page and gives you a lot more options for downloading groups of files from an item.  Since we released the new Download Options feature, the negative feedback about this feature has dropped off almost entirely.  So we think we’re on the right track!  We have created a short video tour for the downloads feature if you want to learn more.

New Download Options feature, illustrating how to display individual files

The download changes are just one example of how much your feedback has helped us identify areas of confusion on the site and understand how to improve things.  Here are a few more examples:

  • A-Z filters available when sorting by title or creator
  • better experience for people with javascript disabled
  • fixes to improve software emulation
  • default search results to List view (instead of image-based Thumbnail view)
  • pull user page images from gravatar if available (if user has not uploaded one)

We have a lot more in store for the new site – better accessibility for sight disabled people, tools for creating your own collections, improved playback for multimedia items, etc.  As these features trickle into the site, we hope you will continue to share your questions and ideas with us – you are truly helping us to make the archive a better place for everyone.

This project receives support from the John S. and James L. Knight Foundation’s Knight News Challenge.

What’s new with v2

As many of you have already seen, we are working on the next generation of the archive.org web site, which we call Version 2.0 (v2). It’s in beta right now, so go check it out!

Version 1 (v1) showing the banner to try the BETA Version 2 (v2)

We get a lot of feedback from the people who have elected to try out v2, and we read ALL of it. As themes emerge about what people are having trouble with, we make changes to the design and then we pay attention to subsequent feedback to try to gauge whether we solved the problem (or not).

Volume prepended to title

The goal of this redesign is to make the site more inviting and easier to use. Right now our work is focused on how the site looks and how things are organized on the page. For the most part, everything that is available to you in Version 1 (v1) of the site is available to you in v2 – but those things may be in different places!

Rights information displayed in About tab

We have a lot of long-time users of the site, and we know that any major changes will cause them to have to relearn where things are and how to accomplish the things they already know how to do on v1. This kind of major change can be very annoying, so we’re working hard to make sure you only need to relearn things once. While we will be adding more features as time goes by, we expect those changes to be incremental and not to affect the basic layout of pages.

If you’ve been using v2, you’ve probably noticed some changes over the last few weeks. I’ll discuss some of those changes here, and some of them are highlighted in the included images.

The collection About tab contains a longer description, info about contributors, and stats for reviews, forums, views and items

Volume information.  We have a lot of journals and books with Volume information that was not showing in search, collection or account pages. The volume information is now prepended to the title for easier visual scanning within a collection.

Live Music. Rights information for a collection is now displayed on the About tab. We also changed the way shows are described in band collections to list the date and venue before the band name, making it easier to visually scan the items in a collection.

Mobile. On most mobile devices we decreased the initial number of search results from 50 to 25 in order to lighten the page load time.

Collections Page

Go to list view for a collection and click the "Show details" checkbox

Collection description. The description area for the collection at the top of the page has been shortened. We encourage collection builders to add useful descriptions, and you can see the additional information in the new About tab.

Click to see additional collections for an item

About tab. The About tab replaces the Contributors tab. We wanted to have a place for all of the information about a collection, and “Contributors” didn’t cover it. The new About tab contains the longer description for a collection, rights information (when it exists), data about how many reviews and forum posts are in that collection, and the content from the previous Contributors tab – the collection creator, people who have added to the collection, and charts for Views and Items over time.  You will also find related collections listed on the About tab below the graphs. Parent collections and subcollections still show up in the Collections tab, since they are part of a collection’s direct hierarchy.

The See All Files page

Collection tab. The Collection tab has a few changes as well. In list view, you can now “show details” for each item if you want to see more information.

Item Pages

Additional collections. If an item belongs to more than one collection, you can choose to view those additional collections.

Upload tile on user account page

Stream only. When an item is not available for download, you will see a “Stream Only” notification where the “Download” button normally appears. We made some visual changes to this notification to make it seem less button-like.

Favorites list sorted by Date Favorited

See All Files. In the “see all files” view, “playable” media files are pushed to the top, just under the “all files” options for torrent and zip. Files are grouped logically, with the original first and bolded and the derivative files listed below.
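One way to read that ordering is sketched below: files are grouped with the original they were derived from, groups that contain playable media come first, and within each group the original precedes its derivatives. This is an interpretation, not the site's implementation. The ItemFile type, its original field and the set of playable extensions are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumption: which extensions count as "playable" is not stated in the post.
PLAYABLE_EXTENSIONS = {"mp3", "ogg", "mp4", "flac"}


@dataclass
class ItemFile:
    name: str
    # Name of the original this file was derived from; None for an original.
    original: Optional[str] = None


def is_playable(f: ItemFile) -> bool:
    return f.name.rsplit(".", 1)[-1].lower() in PLAYABLE_EXTENSIONS


def order_files(files: List[ItemFile]) -> List[ItemFile]:
    """Playable groups first; inside each group, original before derivatives."""
    groups: Dict[str, List[ItemFile]] = {}
    for f in files:
        groups.setdefault(f.original or f.name, []).append(f)

    def group_key(root: str):
        has_playable = any(is_playable(m) for m in groups[root])
        return (not has_playable, root)  # False sorts before True

    ordered: List[ItemFile] = []
    for root in sorted(groups, key=group_key):
        ordered.extend(sorted(groups[root],
                              key=lambda m: (m.original is not None, m.name)))
    return ordered


if __name__ == "__main__":
    files = [ItemFile("notes.txt"),
             ItemFile("show.mp3", original="show.flac"),
             ItemFile("show.flac"),
             ItemFile("show.ogg", original="show.flac")]
    print([f.name for f in order_files(files)])
```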

User Account Page

Uploads. Your Uploads tab has a new “Upload” tile in it, just to make uploading easier to find. You can still upload from anywhere on the site by clicking the upload icon at the top of the page, of course.

Favorites. Your Favorites list (called bookmarks in v1) will now display your favorites sorted by “date favorited” so that you can see your most recently favorited items first.

Tell Us!

As always, please use the Beta feedback link in the top right corner to let us know what you think.  Is everything awesome?  Are you confused about where to find something?  Tell us!

If you’re interested in a more detailed running log of changes from our lead developer, Tracey Jaquith, you can get the “nerd version” here: https://archive.org/CHANGELOG.txt

This project receives support from the John S. and James L. Knight Foundation’s Knight News Challenge.

Update to Terms of Use

Internet Archive’s terms of use were written in March of 2001, and they haven’t changed once – until today.  The terms were written before the Wayback Machine was launched (in October 2001) when we had 4 billion web pages with no public access and 360 Prelinger Archive movies in the archive.  Now we have 435 billion web pages and more than 15 million public audio, video and text items.  Times have changed, and we have made a small change to our terms to reflect this.

In the interest of transparency, we want to show you exactly what the change is.

We have made small changes in paragraphs two and three of the terms.  The previous version of these sections is in red below:

“…You agree not to interfere with the work of other users or Archive personnel, servers, or resources. Further, you agree not to recirculate your password to other people or organizations or to copy offsite any part of the Collections without written permission. Please report any unauthorized use of your password promptly to info@archive.org…

“…You agree to abide by all applicable laws and regulations, including intellectual property laws, in connection with your use of the Archive. In particular, you certify that your use of any part of the Archive’s Collections will be noncommercial and will be limited to noninfringing or fair use under copyright law. In using the Archive’s site, Collections, and/or services…”

This is the new version with the changed portion in green type:

“…You agree not to interfere with the work of other users or Archive personnel, servers, or resources. Further, you agree not to recirculate your password to other people or organizations. Please report any unauthorized use of your password promptly to info@archive.org…

“…You agree to abide by all applicable laws and regulations, including intellectual property laws, in connection with your use of the Archive. In particular, you certify that your use of any part of the Archive’s Collections will be limited to noninfringing or fair use under copyright law. If a Creative Commons or other license has been declared for particular material on the Archive, to the extent you trust the declaration and declarer (which is rarely the Internet Archive), you may use the content according to the terms and conditions of the applicable license. In using the Archive’s site, Collections, and/or services…”

Thank you for continuing to use the amazing resources housed in the Internet Archive.

UPDATE 12/31/14:  The change on 12/30 applied to the language in the third paragraph of the terms.  On 12/31 we made an additional small change to the language in the second paragraph, and modified the text of this post to reflect both changes.

Redesigning Archive.org

Last week we announced a new beta version of the archive.org site.  The beta is the first step toward inviting people to participate in building libraries together.

1997

2000

2001

2002

2005

2014

2014 beta site

Why redesign the site?

The Wayback Machine was launched in 2001, and the current look of the site debuted in 2002 when we added movies, texts, software, and music.  There have been minor design changes and we’ve added features over the years to make the library materials more usable, but the current interface has just accumulated over time.  We have not “rethought” the site in a holistic way in the past 12 years.

A lot has changed since 2002, for the Internet Archive and on the web.  In 2002 the archive contained 5,000 non-Wayback items, about half movies from the Prelinger Archive and half live music concerts from the Etree.org community with a few books and pieces of software sprinkled in. Those 5,000 files added up to about 3 terabytes of data.  Today we have more than 20 million media items that add up to about 10,000 terabytes of data (that’s not including 435 billion saved web pages that take up an additional 10,000 terabytes of space).

As we added more stuff to the archive, people came to visit.  We ended 2002 with about 9,000 registered users.  Today we have just a hair under 2 million registered users, and around 2.5 million individuals use the library materials every day.

Having thousands of movies available on the Internet in 2002 was actually pretty rare (remember, YouTube didn’t exist until 2005). Those 5,000 media items couldn’t be played on our site – you had to download them to your own computer to watch or listen. It was very difficult to add your own files to the Internet Archive – and who would have had the bandwidth to do it anyway?  In 2002 only 21% of U.S. homes had “high speed” internet connections.  High speed back then meant 200 kb per second. [1]

And of course, we can’t forget mobile. About 20-30% of our users today are on mobile devices, and the current web site is not serving them well.

Over the years the archive has grown immensely in terms of material and patrons. Our mission is Universal Access to All Knowledge.  And we think we can do better both with Access and with gathering All Knowledge if we have new tools and a better interface for the site.

Why this interface?

We started talking about the redesign in January of this year.  (Well, honestly we’ve been talking about it since 2006, but this was the first serious, archive-wide project.)

First we found a wonderful Creative Director, David Merkoski, and hired a great designer, Kristen Schlott.  We interviewed people, both users of the archive and people who had never heard of us, and asked them questions about how they use media. We examined how our site was being used, and talked about the intricacies and complications that come with archiving 20 million disparate things. We researched how other sites deal with large amounts of media. We used our current collections and use cases to understand how different designs would perform. Our lead developer, Tracey Jaquith, built prototypes and we user tested them. We talked to some of our power users and partners about our plans and showed them the prototype to get feedback. We had a LOT of meetings.

Idea clustering after user interviews

During this process we realized that we needed to find a way to open the archive up to more participation.  The Internet Archive has built some important and useful collections, both with partners and on our own.  We digitize 1,000 books per day.  We archive 1 billion URLs every week.  We capture television 24 hours per day, every single day.  But there is a lot of media out there in the world, and we can’t save all of it for the future without the help of experts.

Who are the experts?  You!  There are some amazing collections of media in the archive, out on the web, and sitting around on shelves and in basements that have been created by the people who know and care the most about saving those things and making sure their collections are complete and well described.  We want to create a place for those people to build communities around their interests where they can safely store these amazing collections and show them to as many people as possible.  If we all work together, we can create the most useful library the world has ever seen.

WHEN!?

Today the beta has the same basic functions as the current site, with some great additions: more visual cues to help you find things, facets on collections to quickly get you where you want to go, easy searching within collections, user pages, and many more.  We think it’s already an improvement over the current site – otherwise, we wouldn’t be showing it to you yet!

But the tools that will allow you to create your own collections and collaborate with others are still being built.  These features will be released in stages so that we can test them out in the beta and see how they work for people.  We will use feedback from patrons – both what you tell us, and the usage logs for the beta – to make decisions about how things will evolve. (Don’t worry, we aren’t keeping IP addresses — the beta respects user privacy.) When you’re in the beta, you’re going to run into things that might not work quite the way you expected, or that have suddenly changed since you used them yesterday. Sometimes it will be slow or you’ll find bugs. New things will appear, and other things may disappear. New tools will suddenly start working. We hope that for our intrepid beta users, this will be part of the fun. (Because we certainly think it’s fun!)

What new things are coming?

To some extent, this remains to be seen.  We will in part make decisions based on how the beta is used, so please use it!

Our current ideas include: speeding up the site; allowing patrons to create their own collections; improving accessibility for the print disabled; and adding ways for patrons to collaborate around collections and items.

There’s a lot more to come.  We hope you will explore all of these new options with us, and help us build the library.  If you would like to give us feedback, please write to us at info at archive dot org, or leave comments here.

Popular subjects in our book collection

We took a leisurely stroll through half a million books today, and we noticed that lots of the books were congregating around some popular categories.  This isn’t an exhaustive list; we just thought it would be nice to share a little of the landscape with you.  Click through to download or borrow these books through our Open Library site.