In honor of World Day for Audiovisual Heritage (October 27) we’d like to take you on a brief tour through seven decades of digitized music and audio recordings from 1900 through 1970. We’ve been working to digitize 78rpm discs for the Great 78 Project to preserve the heritage of the first half of the 20th century, and now we’re turning our eyes toward vinyl LPs that have fallen out of print in the Unlocked Recordings collection.
The Buddhist Digital Resource Center (BDRC) and Internet Archive (IA) announced today that they are making a large corpus of Buddhist literature available via the Internet Archive. This collection represents the most complete record of the words of the Buddha available in any language, plus many millions of pages of related commentaries, teachings and works such as medicine, history, and philosophy.
BDRC’s founder, E. Gene Smith, spent decades collecting and preserving Tibetan texts in India before starting the organization in 1999. Since then, as a neutral organization, they have been able to work on both sides of the Himalayas in search of rare texts.
Several months ago in a remote monastery in Northeast Tibet, a BDRC employee photographed an old work and sent it in to their library. It was a text that the tradition has always known about, but which was long considered to have been lost. Its very existence was unknown to anyone outside of the caretakers of the monastery that had safeguarded it for centuries.
The Kadampa school, active in the 11th and 12th centuries, was known to scholars – they knew who had started the tradition and where it fit in the history of Buddhism – but most of the writings from that period had not survived the centuries. And yet suddenly here was a lost classic of this tradition, the only surviving manuscript of the work: The exposition on the graduated path by Kadam Master Sharawa Yontan Drak (1070-1141). Dozens of pithy sayings are attributed to Sharawa in later works, but this writing of his is never directly cited in the classics of the genre that date back to the fifteenth century and before.
BDRC’s digitizers never know what they will find when they arrive at a new location, but their work has uncovered missing links, beautiful woodblock versions of known texts, writings of previously unknown authors, and texts by famous people that they thought had been lost to time. While the manuscript above is an amazing find, it is by no means the only one their work has unearthed.
This work highlights the importance of preserving cultures before they disappear or are too dispersed to gather together. In its efforts to make all of Buddhist literature available, BDRC is also digitizing fragile palm leaf manuscripts in Thailand, Sanskrit texts in Nepal, and the entire Tibetan collection of the National Library of Mongolia. Brewster Kahle, founder of Internet Archive, said, “In 2011 we announced that we had digitized every historic work in Balinese, and this year we are making Tibetan literature available. We hope that this is a trend that will see the literatures of many more cultures become openly available.”
This is not an academic pursuit. Many Tibetans have left their homeland, spreading to India and around the world. Younger generations who have been displaced and raised in other societies may not have the opportunity to grow up with these traditional teachings. The work of the BDRC is to make those teachings available to everyone.
Jeff Wallman, Executive Director Emeritus of BDRC, and Jann Ronis, Executive Director of BDRC, addressed their reasons for making this information available on the Internet Archive: “The founding mission of BDRC is to make the treasures of Buddhist literature available to all on the Internet. We recognize that you cannot preserve culture; you can only create the right conditions for culture to preserve itself. We hope that by making these texts available via the Internet Archive, we can spur a new generation of usage. Openness ensures preservation.”
The BDRC’s extensive collection is used by laypeople and monks alike. Karmapa Ogyen Trinley Dorje is a frequent user of their collection. He and other traveling teachers call on the BDRC’s library for references and works when they are away from their libraries, or whenever they need a rare text that they could not otherwise access.
Chokyi Nyima Rinpoche, the Abbot of Ka-Nying Shedrub Ling Monastery in Nepal and a well-regarded teacher of Tibetan Buddhism around the world, is gratified that the teachings of Buddha have been made available. “We can share the entire body of literature with every Tibetan who can use it. These texts are sacred, and should be free.”
BDRC’s home office is in Cambridge, Massachusetts, with additional offices and digitization centers in Hangzhou, China; Bangkok, Thailand; Kathmandu, Nepal; and at the National Library of Mongolia in Ulaanbaatar where it is establishing a project in collaboration with the Asian Classics Input Project (ACIP).
Internet Archive and BDRC are both delighted to join forces on sharing the Buddhist literary tradition for the benefit of humanity.
BDRC is a 501(c)(3) nonprofit dedicated to seeking out, preserving, organizing, and disseminating Buddhist literature. Joining digital technology with scholarship, BDRC ensures that the treasures of the Buddhist literary tradition are not lost, but are made available for future generations. BDRC would like every monastery, every Buddhist master, every scholar, every translator, and every interested reader to have access to the complete range of Buddhist literature, regardless of social, political, or economic circumstances. BDRC is headquartered in Harvard Square in Cambridge, Massachusetts.
The Internet Archive is a 501(c)(3) nonprofit digital library based in San Francisco that specializes in offering broad public access to digitized and born-digital books, music, movies and Web pages.
Journalists and others risk their lives to keep the public informed in times of conflict. War imagery provides us with important information in the moment, and creates a trove of invaluable archival content for the future.
Please be aware that this collection contains some disturbing photos of violence and its aftermath (though we have not included any in this blog post).
The Afghan Media Resource Center (AMRC) was founded in Peshawar, Pakistan, in 1987, by a team of media trainers working under contract to Boston University. The goal of the project was to assist Afghans to produce and distribute accurate and reliable accounts of the Afghan war to news agencies and television networks throughout the world. Beginning in the early 1980s, amidst a news blackout imposed by the Soviet-backed Kabul government, foreign journalists had become targets to be captured or killed. The AMRC was an effort to overcome the substantial obstacles encountered by media representatives in bringing events surrounding the Afghan-Soviet war to world attention.
Beginning in 1987, a series of six-week training sessions was conducted at the AMRC’s original home in University Town, Peshawar, Pakistan. Qualified Afghans were recruited from all major political parties, all major ethnic groups and all regions of Afghanistan, to receive professional training in print journalism, photojournalism and video news production. Haji Sayed Daud, a former television producer and journalist at Kabul TV before the Soviet invasion, was named AMRC Director.
In 2000 AMRC began publishing a popular and influential newspaper in Kabul: ERADA (Intention). With one interruption, ERADA publication continued until 2012.
The AMRC collection spans a critical period in Afghanistan’s history (1987 to 1994) and includes 76,000 photographs, 1,175 hours of video material, 356 hours of audio material, and many stories from print media.
In 2012 AMRC received a grant to digitize the entire AMRC archive, to preserve the collection at the U.S. Library of Congress. AMRC senior media advisors Stephen Olsson and Nick Mills were trained in the digitization processes by the Library of Congress, then spent two weeks in Kabul training the AMRC staff. The digitization and metadata sheets (in English, Dari and Pashto) were completed in 2016, and were welcomed into the Library of Congress with a formal ceremony. We are now making the entire AMRC collection available through our on-line partner, The Internet Archive.
For many of us, music is an integral part of our memories. It evokes a period of time in our lives, or inspires specific recollections. Music can also conjure times long past, outside of our personal memories.
When we watch this movie, we see and hear Argentina in the early 20th Century. The music in this clip came from a 78 collected in Buenos Aires by Tina Argumedo, part of her personal collection of hundreds of discs.
Tina began collecting 78s in the 1930s. The world was waking up from the Great Depression, Buenos Aires was celebrating its 400th birthday, and tango was moving from working-class barrooms to grand, middle-class dance halls. Tina was a 20-year-old newlywed, and she and her husband loved to dance. She began to collect the music she loved, and she continued to build her collection of 78s for 20 years.
When Tina’s husband died, she moved to the U.S. to live with her daughter, Lucretia Hug. She sold almost everything she owned, packed up her life, and moved to a foreign country. Among the few things she brought with her were her 78s. She passed her love of music down to her children and grandchildren and shared this music with her family, sitting on the couch in the living room with her 78s on shelves around them.
A few years ago when Tina passed away, her family had to decide what to do with this collection. They no longer had room to store the discs, but they knew how important this music had been to Tina and couldn’t imagine throwing the 78s away or donating them to some charity to be disposed of piecemeal. Fortunately, a family friend (and Argentinian composer), Débora Simcovich, offered to take the discs and keep them together.
And then, just a few months ago, Tina’s granddaughter, D’Anna Alexander, heard about The Great 78 Project and she remembered her grandmother’s collection. She and Débora agreed that the best tribute to Tina would be to donate her collection to the Internet Archive for digitization and preservation, so that people all over the world could appreciate this music and the collection that Tina built. It is a way to preserve and celebrate their family’s heritage and culture.
We have begun to digitize Tina’s collection and you can hear the first discs now in the Tina Argumedo & Lucretia Hug Collection.
More than 3 million recordings were produced on 78rpm discs, and many of them have never made the leap forward to modern formats. In some cases, these 78s contain the only version of these performances, and they are mostly inaccessible to people today. Few people have equipment to play a 78, and the discs themselves are frighteningly fragile.
These 78rpm discs contain an entire era of our musical history, from about 1898 through the 1950s when LP records were introduced. If we do not digitize the music on these discs, we will lose it.
The Great 78 Project aims to digitize as many of those 3 million recordings as we can find. Currently we have digitized 50,000 recordings, and we continue to add about 5,000 each month. We are preserving discs from more than 20 collectors and institutions around the world, and we continue to look for more people with collections they would like to share. If you would like to work with us on this collection, please let us know at firstname.lastname@example.org.
In recent days many people have shown interest in making sure the Wayback Machine has copies of the web pages they care about most. These saved pages can be cited, shared, linked to – and they will continue to exist even after the original page changes or is removed from the web.
There are several ways to save pages and whole sites so that they appear in the Wayback Machine. Here are 6 of them.
1. Save Page Now
Put a URL into the form, press the button, and we save the page. You will instantly have a permanent URL for your page.
At the moment, there are a few exceptions for this method – some sites prohibit crawling, a few have SSL (security) settings that make it break – but this method will work for most pages. The feature saves the page you enter including the images and CSS. It does not save any of the outlinks, and can’t be used to initiate a crawl of an entire web site. We do not keep your IP address, so your submission is anonymous.
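For readers who would rather script this than use the form, a capture can reportedly be requested by fetching a URL under `https://web.archive.org/save/`. That endpoint pattern is not described in this post, so treat it as an assumption, and note that the helper name `save_page_now_url` is hypothetical. The sketch below only builds the request URL and makes no network call.

```python
from urllib.parse import urlsplit

# Assumed endpoint prefix for Save Page Now; it is not stated in this post.
SAVE_PREFIX = "https://web.archive.org/save/"


def save_page_now_url(page_url: str) -> str:
    """Return the URL to request in order to ask for a capture of page_url."""
    parts = urlsplit(page_url)
    # Only absolute http(s) URLs make sense as capture targets.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {page_url!r}")
    return SAVE_PREFIX + page_url


if __name__ == "__main__":
    print(save_page_now_url("https://example.com/news/story.html"))
```

As with the form, a request like this would cover a single page only; it would not start a crawl of a whole site.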
2. Chrome extension
Install the Wayback Machine Chrome extension in your browser. Go to a page you want to archive, click the icon in your toolbar, and select Save Page Now. We will save the page and give you a permanent URL.
The same provisos from “Save Page Now” apply – there are some pages where it won’t work, and it only saves one page at a time. One plus to installing the extension though is that now as you surf around, when you run into a missing page we will alert you if we have a saved copy.
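How the extension checks for a saved copy is not described here. One way to do a similar lookup yourself is the Wayback availability endpoint at `https://archive.org/wayback/available`, which returns JSON describing the closest snapshot. The sketch below assumes that endpoint and response shape; the function names are hypothetical, the sample response in the usage section is made up for illustration, and no network call is made.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed lookup endpoint; it is not mentioned in this post.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(page_url: str) -> str:
    """Build the lookup URL for a page that may have gone missing."""
    return AVAILABILITY_API + "?" + urlencode({"url": page_url})


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the archived URL from a lookup response, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


if __name__ == "__main__":
    # Made-up response text, shaped like the assumed JSON format.
    sample = json.dumps({
        "archived_snapshots": {
            "closest": {
                "available": True,
                "url": "http://web.archive.org/web/20170101000000/http://example.com/missing",
                "timestamp": "20170101000000",
                "status": "200",
            }
        }
    })
    print(availability_query("example.com/missing"))
    print(closest_snapshot(sample))
```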
We also have a “Wayback Machine” Firefox add-on
A “Wayback Machine” Safari Extension
A “Wayback Machine” iOS app
And a “Wayback Machine” Android app
4. Volunteer for Archive Team
Archive Team is an entirely volunteer-driven group that is interested in saving Internet history. Many of the sites and pages they save end up in the Wayback Machine. Visit the Archive Team site to learn more about how to volunteer with them.
5. Sign up for an Archive-It Account
Archive-It is a subscription service provided by Internet Archive that allows you to run your own crawling projects without any technical expertise. Tell us what to crawl and how often to crawl it, and we execute the crawl and put the results in the Wayback Machine.
Archive-It is a paid subscription service with technical and web archivist support. This option is most appropriate for organizations that have a mandate to save certain types or categories of web content on a regular basis. If your institution is a current Archive-It partner, contact them for how you can contribute.
6. End of Term Archive
Every time the US government administration changes, Internet Archive works with partners to make a copy of government-related sites and web presences. We call it the End of Term Archive. You can help us discover new government sites by using the Nomination Tool to suggest pages or sites. These nominations are added to the crawl and end up in the Wayback Machine.
The Internet Archive has been saving web pages for 20 years. This archive has been built by thousands of people, and we would like you to help. Use one of the methods above to make sure we have the pages you care about.
The Internet Archive is collecting webpages from over 6,000 government domains, over 200,000 hosts, and feeds from around 10,000 official federal social media accounts. Some have asked if we ignore URL exclusions expressed in robots.txt files.
The answer is a bit complicated. Historically, sometimes yes and sometimes no; but going forward the answer is “even less so.”
Robots.txt files live on the top level of a website at a URL like this: https://example.com/robots.txt. This standard was developed in 1994 to guide search engine crawlers in a variety of ways, including some areas to avoid crawling. This standard is used by Google, for instance.
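To make the format concrete, here is a minimal sketch of how a crawler that honors robots.txt might evaluate it, using Python's standard `urllib.robotparser` module. The robots.txt content, the user agent name `example-crawler`, and the helper `allowed` are all made up for illustration; this is not the Internet Archive's crawler code.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content: every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/index.html",
                "https://example.com/private/report.html"):
        print(url, allowed(ROBOTS_TXT, "example-crawler", url))
```

A crawler that respects the file skips any URL for which this check returns False.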
These files were useful 20 years ago for the Internet Archive’s crawlers, but have become less and less so over the years because many sites have not actively maintained the files from the point of view of archiving. Also, large websites or hosted websites often do not make it easy for their users to edit these files, and large websites increasingly guide or block crawlers with technological measures. Another problem is knowing when a domain name changes hands, so a current robots.txt file is not relevant to a different era. As time has gone on, for those who want to exclude their sites we encourage webmasters to send exclusion requests to email@example.com and encourage them to specify what time period they apply to.
Our end-of-term crawls of .gov and .mil websites in 2008, 2012, and 2016 have ignored exclusion directives in robots.txt in order to get more complete snapshots. Other crawls done by the Internet Archive and other entities have had different policies. We have had little or no negative feedback on this, and little or no positive feedback — in fact little feedback at all. The Wayback Machine has also been replaying the captured .gov and .mil webpages for some time in the beta wayback, regardless of robots.txt.
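The policy described above can be pictured as a per-host decision made before any robots.txt check. The sketch below is a hypothetical illustration only: the function name `honors_robots_txt` and the suffix list are assumptions drawn from the .gov and .mil crawls mentioned here, not the Archive's actual crawler configuration.

```python
from urllib.parse import urlsplit

# Hypothetical list of domain suffixes whose robots.txt exclusions are ignored,
# based on the .gov and .mil end-of-term crawls described above.
IGNORE_ROBOTS_SUFFIXES = (".gov", ".mil")


def honors_robots_txt(url: str) -> bool:
    """Return False for hosts under .gov or .mil, True for everything else."""
    host = (urlsplit(url).hostname or "").lower()
    return not host.endswith(IGNORE_ROBOTS_SUFFIXES)


if __name__ == "__main__":
    for url in ("https://www.example.gov/data",
                "https://base.example.mil/",
                "https://example.com/"):
        print(url, honors_robots_txt(url))
```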
Overall, we hope to capture government and military websites well, and hope to keep this valuable information available to users in the future.
We have been loaning digital books through Open Library since 2010. We started with about 10,000 books in the lending collections, and soon there will be more than 500,000 books available.
Today we launch lending on Archive.org, so patrons no longer need to go to Open Library to borrow books. The same parameters for borrowing apply: books are free to borrow for logged-in users, and they can be borrowed for a period of 2 weeks.
For Archive.org users, you’re going to see many more modern books available in the coming weeks. These books will appear in collections and search results with a blue “Borrow” notice on them.
Logged-in users will be able to borrow the book from the book’s details page where you see the full metadata. Remember, creating an account on archive.org is free, and so is borrowing books.
When you click “Borrow This Book” you will be taken to the new bookreader. You can search, use the read aloud feature, zoom in and out, and change the number of pages you see at once. The book will be available in your browser for 2 weeks as long as you are connected to the Internet.
If you prefer to read your book offline, you can download a PDF or EPUB version of the book to be read in Adobe Digital Editions (free download). You must install Adobe Digital Editions before you can read the offline version of your book.
When you want to return the book, you can return it from Adobe Digital Editions (if you chose to download) or from the bookreader.
In addition to the new borrow features, we have updated the bookreader to display better on mobile devices. The layout now changes when you are on a very small screen in order to make it easier to use. You will see one page at a time, and some of the functions are located in the menu on the left.
If you would like to download an offline copy of the book accessible through Adobe Digital Editions (don’t forget to download the app first!), open the menu and choose “Loan Information.”
From here you can download a PDF or EPUB to read offline, or return the book.
We hope you will explore the books available for lending, and enjoy the features of the new bookreader.
Many thanks to: Richard Caceres, Brenton Cheng, Carolyn Li-Madeo, Tracey Jaquith, Jessamyn West, Jeff Kaplan, John Lekashman, Dwalu Khasu, John Gonzalez and Alexis Rossi.
The new version of the archive.org site has been evolving over the past 6 months in response to the feedback we’ve received from thousands of our awesome users.
If you haven’t been following along, you can review a little bit of the journey through these blog posts:
- Building Libraries Together: New Tools for a New Direction (10/28/14)
- Redesigning Archive.org (11/5/2014)
- What’s New with V2 (2/12/2015)
Why change the site at all? The posts above help answer that, but in brief:
- 35% of our ~3 million daily users are on mobile/tablet devices, and the classic site is not easy to use on small formats.
- The new tools we want to offer our users would be difficult to implement in the old site architecture.
- The classic site was built a long time ago, using methods that are now outdated. Finding programmers who have the skills to work in that environment is becoming increasingly difficult, and the ramp-up time for new employees is painful. The redesign has given us an opportunity to start pulling the front end (what you see) apart from the back end, so they can evolve separately.
Currently about 85% of archive.org users are in the new version. Over the next few weeks we will be asking the remaining 15% to try it out. For the time being, users will be able to exit the new archive.org and return to the “classic” version — but the classic will not always be available or supported, so please give the new version a try and give us feedback if there are things on the site that you don’t like, can’t find, or that seem like bugs. (When you click “exit” you will have an opportunity to give us feedback.)
We have made several video tours that introduce you to the new site. I recommend starting with the site tour, below.
In the past few months we have received more than 16,000 feedback emails from people using the new version. The redesign team reads every single one of them. Some just say, “I love it!” and some immediately say, “I hate it!” But a great many of you have also taken the time to share a little more – something you missed from the old site, a question about the new tools, concern about accessibility, suggestions for how to adjust things, etc.
We took that input — along with information from user tests, interviews with some of our power users, chats with partners — and tried to identify areas of the interface that seemed to be working well, and other areas that were not.
The evolution of downloading files from items is a great example of the process we’ve been following. The original design for item pages de-emphasized download as a feature. Our conversations with users told us that most people wanted to hit a play button, not download a file.
You could still download in the original design, of course, but you had to click a button to get options and then click again if you wanted specific files.
But when we opened the new site up to more users, we got many comments from people who either disliked the extra clicking, didn’t like leaving the page to get individual files, didn’t understand what the options represented, or couldn’t find the download options at all.
The first thing we tried was just opening up the download menu by default. Instead of just seeing the black download button on the page, you now also saw a menu of options. More people saw the download, but feedback made it clear that users still had issues.
We thought that turning the Download header blue might make the download options more visible, so that people would find them faster. We ran an A/B test in which 50% of users saw each option, and neither option really won. The feedback about this feature continued to be negative.
It became clear that we needed to rethink the design of the download options altogether, trying to keep it clean-looking and easy to use while also satisfying the concerns of our most advanced users.
We set some goals for the download changes based on the feedback we had received:
- must be able to download an individual file without leaving the item page
- if there is only one file in a particular format, you should only need one click to download it
- improve the ability to download groups of files (e.g. “just give me all the FLAC files”)
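The second and third goals above amount to grouping an item's files by format. Here is a minimal sketch of that idea, assuming the format can be read from the file extension. The filenames and both function names are hypothetical, and the site's real logic is not described in this post.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def group_by_format(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Group filenames by upper-cased extension, e.g. {'FLAC': [...]}."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        fmt = PurePosixPath(name).suffix.lstrip(".").upper() or "UNKNOWN"
        groups[fmt].append(name)
    return dict(groups)


def one_click_formats(groups: Dict[str, List[str]]) -> List[str]:
    """Formats with exactly one file, which could be downloaded in one click."""
    return sorted(fmt for fmt, names in groups.items() if len(names) == 1)


if __name__ == "__main__":
    # Made-up file list for a single item.
    files = ["track01.flac", "track02.flac", "track01.mp3", "track02.mp3", "show.pdf"]
    groups = group_by_format(files)
    print(groups["FLAC"])             # the "just give me all the FLAC files" case
    print(one_click_formats(groups))  # formats that need only one click
```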
The current version of downloads allows you to consume individual media files without leaving the page and gives you a lot more options for downloading groups of files from an item. Since we released the new Download Options feature, the negative feedback about this feature has dropped off almost entirely. So we think we’re on the right track! We have created a short video tour for the downloads feature if you want to learn more.
The download changes are just one example of how much your feedback has helped us identify areas of confusion on the site and understand how to improve things. Here are a few more examples:
- A-Z filters available when sorting by title or creator
- fixes to improve software emulation
- default search results to List view (instead of image-based Thumbnail view)
- pull user page images from gravatar if available (if user has not uploaded one)
We have a lot more in store for the new site – better accessibility for sight disabled people, tools for creating your own collections, improved playback for multimedia items, etc. As these features trickle into the site, we hope you will continue to share your questions and ideas with us – you are truly helping us to make the archive a better place for everyone.
This project receives support from the John S. and James L. Knight Foundation’s Knight News Challenge.
As many of you have already seen, we are working on the next generation of the archive.org web site, which we call Version 2.0 (v2). It’s in beta right now, so go check it out!
We get a lot of feedback from the people who have elected to try out v2, and we read ALL of it. As themes emerge about what people are having trouble with, we make changes to the design and then we pay attention to subsequent feedback to try to gauge whether we solved the problem (or not).
The goal of this redesign is to make the site more inviting and easier to use. Right now our work is focused on how the site looks and how things are organized on the page. For the most part, everything that is available to you in Version 1 (v1) of the site is available to you in v2 – but those things may be in different places!
We have a lot of long-time users of the site, and we know that any major changes will cause them to have to relearn where things are and how to accomplish the things they already know how to do on v1. This kind of major change can be very annoying, so we’re working hard to make sure you only need to relearn things once. While we will be adding more features as time goes by, we expect those changes to be incremental and not to affect the basic layout of pages.
If you’ve been using v2, you’ve probably noticed some changes over the last few weeks. I’ll discuss some of those changes here, and some of them are highlighted in the included images.
Volume information. We have a lot of journals and books with Volume information that was not showing in search, collection or account pages. The volume information is now prepended to the title for easier visual scanning within a collection.
Live Music. Rights information for a collection is now displayed on the About tab. We also changed the way shows are described in band collections to list the date and venue before the band name, making it easier to visually scan the items in a collection.
Mobile. On most mobile devices we decreased the initial number of search results from 50 to 25 in order to lighten the page load time.
Collection description. The description area for the collection at the top of the page has been shortened. We encourage collection builders to add useful descriptions, and you can see the additional information in the new About tab.
About tab. The About tab replaces the Contributors tab. We wanted to have a place for all of the information about a collection, and “Contributors” didn’t cover it. The new About tab contains the longer description for a collection, rights information (when it exists), data about how many reviews and forum posts are in that collection, and the content from the previous Contributors tab – the collection creator, people who have added to the collection, and charts for Views and Items over time. You will also find related collections listed on the About tab below the graphs. Parent collections and subcollections still show up in the Collections tab, since they are part of a collection’s direct hierarchy.
Collection tab. The Collection tab has a few changes as well. In list view, you can now “show details” for each item if you want to see more information.
Additional collections. If an item belongs to more than one collection, you can choose to view those additional collections.
Stream only. When an item is not available for download, you will see a “Stream Only” notification where the “Download” button normally appears. We made some visual changes to this notification to make it seem less button-like.
See All Files. In the “see all files” view, “playable” media files are pushed to the top, just under the “all files” options for torrent and zip. Files are grouped logically, with the original first and bolded and the derivative files listed below.
User Account Page
Uploads. Your Uploads tab has a new “Upload” tile in it, just to make uploading easier to find. You can still upload from anywhere on the site by clicking the upload icon at the top of the page, of course.
Favorites. Your Favorites list (called bookmarks in v1) will now display your favorites sorted by “date favorited” so that you can see your most recently favorited items first.
As always, please use the Beta feedback link in the top right corner to let us know what you think. Is everything awesome? Are you confused about where to find something? Tell us!
If you’re interested in a more detailed running log of changes from our lead developer, Tracey Jaquith, you can get the “nerd version” here: https://archive.org/CHANGELOG.txt
In the interest of transparency, we want to show you exactly what the change is.
We have made small changes in paragraphs two and three of the terms. The previous version of these sections is in red below:
“…You agree not to interfere with the work of other users or Archive personnel, servers, or resources. Further, you agree not to recirculate your password to other people or organizations or to copy offsite any part of the Collections without written permission. Please report any unauthorized use of your password promptly to firstname.lastname@example.org…
“…You agree to abide by all applicable laws and regulations, including intellectual property laws, in connection with your use of the Archive. In particular, you certify that your use of any part of the Archive’s Collections will be noncommercial and will be limited to noninfringing or fair use under copyright law. In using the Archive’s site, Collections, and/or services…”
This is the new version with the changed portion in green type:
“…You agree not to interfere with the work of other users or Archive personnel, servers, or resources. Further, you agree not to recirculate your password to other people or organizations. Please report any unauthorized use of your password promptly to email@example.com…
“…You agree to abide by all applicable laws and regulations, including intellectual property laws, in connection with your use of the Archive. In particular, you certify that your use of any part of the Archive’s Collections will be limited to noninfringing or fair use under copyright law. If a Creative Commons or other license has been declared for particular material on the Archive, to the extent you trust the declaration and declarer (which is rarely the Internet Archive), you may use the content according to the terms and conditions of the applicable license. In using the Archive’s site, Collections, and/or services…”
Thank you for continuing to use the amazing resources housed in the Internet Archive.
UPDATE 12/31/14: The change on 12/30 applied to the language in the third paragraph of the terms. On 12/31 we made an additional small change to the language in the second paragraph, and modified the text of this post to reflect both changes.