Author Archives: Brewster Kahle

5,000 78rpm sides for the Great 78 Project are now posted

From the David Chomowicz and Esther Ready Collection.

This month’s transfers of 5,000 78rpm sides for the Great 78 Project are now posted.

Many are Latin American music from the David Chomowicz and Esther Ready Collection.

Others are square dance music, with and without the calls, from the Larry Edelman Collection. (Thank you David Chomowicz, Esther Ready, and Larry Edelman for the donations.)

We are still working on some of the display issues with this month’s materials, so some changes are yet to come.

From the Larry Edelman Collection.

Unfortunately, we have only found dates for about half of this month’s batch using our automatic techniques of looking through 78discography.com, 45worlds, Discogs, DAHR, and full-text searching of Cashbox Magazine. There are currently over 2,000 songs with missing dates.

If you like internet sleuthing, or working with our scanned discographies or your own, and would like to join in on finding dates and reviews, please jump in. We have a Slack channel for those doing this.

Congratulations to B George’s group, George Blood’s group, and the collections group at the Internet Archive for another large batch of largely disappeared 78s.

Books from 1923 to 1941 Now Liberated!

[press: boingboing]

The Internet Archive is now leveraging a little-known, and perhaps never used, provision of US copyright law, Section 108(h), which allows libraries to scan and make available materials published from 1923 to 1941 if they are not being actively sold. Elizabeth Townsend Gard, a copyright scholar at Tulane University, calls this “Library Public Domain.” She and her students helped make the first scanned books of this era available online in a collection named for the author of the bill making this necessary: The Sonny Bono Memorial Collection. Thousands more books will be added in the near future as we automate. We hope this will encourage libraries that have been reluctant to scan beyond 1923 to start mass scanning their books and other works, at least up to 1942.

While good news, it is too bad it is necessary to use this provision.

Trend of Maximum U.S. General Copyright Term by Tom W Bell

If the Founding Fathers had their way, almost all works from the 20th century would be public domain by now (14-year copyright term, renewable once if you took extra actions).

Some corporations saw adding works to the public domain to be a problem, and when Sonny Bono got elected to the House of Representatives, representing Riverside County, near Los Angeles, he helped push through a law extending copyright’s duration another 20 years to keep things locked-up back to 1923.  This has been called the Mickey Mouse Protection Act due to one of the motivators behind the law, but it was also a result of Europe extending copyright terms an additional twenty years first. If not for this law, works from 1923 and beyond would have been in the public domain decades ago.

Lawrence Lessig

Creative Commons founder Larry Lessig fought the new law in court as unreasonable, unneeded, and ridiculous. In support of Lessig’s fight, the Internet Archive made an Internet bookmobile to celebrate what could be done with the public domain. We drove the bookmobile across the country to the Supreme Court to make books during the hearing of the case. Alas, we lost.

Internet Archive Bookmobile in front of
Carnegie Library in Pittsburgh: “Free to the People”

There is, however, an exemption from this extension of copyright, but only for libraries and only for works that are not actively for sale: we can scan them and make them available. Professor Townsend Gard had two legal interns work with the Internet Archive last summer to figure out how to automate finding appropriate scanned books that could be liberated, and they hand-vetted the first books for the collection. Professor Townsend Gard has just released an in-depth paper giving libraries guidance on how to implement Section 108(h), based on her work with the Archive and other libraries. Together, we have called these “Last Twenty” Collections, as libraries and archives can copy and distribute to the general public qualified works in the last twenty years of their copyright.

Today we announce the “Sonny Bono Memorial Collection” containing the first books to be liberated. Anyone can download, read, and enjoy these works that have been long out of print. We will add another 10,000 books and other works in the near future. “Working with the Internet Archive has allowed us to do the work to make this part of the law usable,” reflected Professor Townsend Gard. “Hopefully, this will be the first of many “Last Twenty” Collections around the country.”

Now is the chance for libraries and citizens who have been reluctant to scan works beyond 1923 to push forward to 1941, and the Internet Archive will host them. “I’ve always said that the silver lining of the unfortunate Eldred v. Ashcroft decision was the response from people to do something, to actively begin to limit the power of the copyright monopoly through action that promoted open access and CC licensing,” says Carrie Russell, Director of ALA’s Program of Public Access to Information. “As a result, the academy and the general public has rediscovered the value of the public domain. The Last Twenty project joins the Internet Archive, the HathiTrust copyright review project, and the Creative Commons in amassing our public domain to further new scholarship, creativity, and learning.”

We thank and congratulate Team Durationator and Professor Townsend Gard for all the hard work that went into making this new collection possible. Professor Townsend Gard, along with her husband, Dr. Ron Gard, has started a company, Limited Times, to assist libraries, archives, and museums in implementing Section 108(h), “Last Twenty” collections, and other aspects of the copyright law.

Prof. Elizabeth
Townsend Gard

Tomi Aina
Law Student

Stan Sater
Law Student

Hundreds of thousands of books can now be liberated. Let’s bring the 20th century to 21st-century citizens. Everyone, rev your cameras!

Why Bitcoin is on the Internet Archive’s Balance Sheet

 

A foundation was curious as to why we have Bitcoin on our balance sheet, and I thought I would explain it publicly.

The Internet Archive explores how bitcoin and other Internet innovations can be useful in the non-profit sphere, and this is part of it. We want to see how donated bitcoin can be used, not just sold off. We are doing this publicly so others can learn from us. And it is fun. And it is interesting.

We started receiving donations in bitcoin in 2011. The first year we got about 2,700, and we sold them to an employee who was heavily involved (for the prevailing $2 per bitcoin). The next year, we held onto them and offered them to employees as an optional way to get their salary; a third took some. We set up an ATM at the Internet Archive. We got the sushi place next door to take bitcoins, and encouraged our employees to buy books at Green Apple Books in bitcoin. We set up a vanity address. We started taking bitcoin in our swag store. We tried (and failed) to get our credit union to help bitcoin firms.

Another year we gave a small amount as an xmas bonus to those who set up a wallet (from a matching grant of bitcoins from me).

We paid vendors and contractors in bitcoin when they wanted it. We started getting micropayments from the Brave Browser. We hosted a movie with filmmakers on living on bitcoin. We publicly tested whether people were stealing bitcoins like the press was saying (they didn’t steal ours).

A few years later, the price had gone up so much that I personally bought some at the going rate to decrease financial risk to the Internet Archive, but I did not just cash those in for dollars. We may seem like geniuses, but we are not: we saw the price go down as well, and we did not sell out then either.

Recently the Zcash folks helped us set up a Zcash address, and we would love people to donate there.

What we are doing is trying to “play the game” and see how it works for non-profits. It is not an investment for us, it is testing a technology in an open way. If you want to see the donations to us in bitcoin, they are here. Zcash here.

Bitcoin donations have been decreasing in recent years, which may reflect that we are not moving with the times. I am hoping that someone will say, gosh, I will donate a thousand bitcoins to these guys who have been so good :). Here is to hoping.

So the Internet Archive has some bitcoin on its balance sheet to be a living example of an organization that is trying this innovative Internet technology. We do the same with bittorrent, tor, and decentralized web tech.

Please donate, and we will put your donations to good use supporting the Internet Archive’s mission.

 

Dreaming of Semantic Audio Restoration at a Massive Scale

I believe we can do a fabulous job of bringing the music from the 78rpm era back to vibrant life if we really understand wear and if we could model the instruments and voices.

In other words, I believe we could reconstruct a performance by semantically modeling the noise and distortion we want to get rid of, as well as modeling the performer’s instruments.

To follow this reasoning—what if we knew we were examining a piano piece and knew what notes were being played on what kind of piano and exactly when and how hard for each note—we could take that information to make a reconstruction by playing it again and recording that version. This would be similar to what optical character recognition (OCR) does with images of pages with text—it knows the language and it figures out the words on the page and then makes a new page in a perfect font. In fact, with the OCR’ed text, you can change the font, make it bigger, and reflow the page to fit on a different device.

What if we OCR’ed the music? This might work well for the instrumental accompaniment; a voice, if any, we would handle differently. We could have a model of the singer’s voice based on not only this recording and other recordings of this song, but also all other recordings of that singer. With those models we could reconstruct the voice without any noise or distortion at all.

We would balance the reconstructed and the raw signals to maintain the subtle variations that make great performances. Some of the original noise could also be kept for context, much as digital filmmakers sometimes add scratched-film effects.

So, there can be a wide variety of restoration tools if we make the jump into semantics and big data analysis.

The Great 78 Project will collect and digitize over 400,000 78rpm recordings and make them publicly available, creating a rich data set for large-scale analysis. These transfers are being done with four different stylus shapes and sizes at the same time, all recorded as 96kHz/24-bit lossless samples, and in stereo (even though the records are mono, this provides more information about the contours of the groove). This means each groove has 8 different high-resolution readings (four styli times two channels) for every 11 microns. Furthermore, there are often multiple copies of the same recording that would have been stamped and used differently. So, modeling the wear on the record and using that to reconstruct what would have been on the master may be possible.

Many important records from the 20th century, such as jazz, blues, and ragtime, have only a few performers on each, so modeling those performers, instruments, and performances is quite possible.  Analyzing whole corpuses is now easier with modern computers, which can provide insights beyond restoration as well as understand playing techniques that are not commonly understood.

If we build full semantic models of instruments, performers, and pieces of music, we could even create virtual performances that never existed.  Imagine a jazz performer virtually playing a song that had not been written in their lifetime. We could have different musician combinations, or singers performing with different cadences. Areas for experimentation abound once we cross the threshold of full corpus analysis and semantic modeling.

We hope the technical work done on this project will have a far-reaching effect on a full media type, since the Great 78 Project will digitize and hold a large percentage of all 78rpm records ever produced from 1908 to 1950. Therefore, any techniques that are built upon these recordings can be used to restore a great many records.

Please dive in and have fun with a great era of music and sound.

 

(We get a sample every 11 microns when digitizing the outer rim of a 10″ 78rpm record at 96kHz. Given that we now have 8 different readings of that, with 24-bit resolution, we hopefully can get a good idea of the groove. There are optical techniques that are very cool, but those have their own issues, I am told.

10″ * 3.14 = 31.4″ circumference = 80cm/revolution

@ 78rpm: 60 seconds/minute / 78 revolutions/minute = .77 seconds/revolution

80cm/revolution / (.77 seconds/revolution) = 104cm/second

96k samples/second

104cm/second / (96k samples/second) = 11 microns)
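
For anyone who wants to redo the back-of-envelope arithmetic above with other disc sizes or sample rates, here is a minimal sketch in Python. The function name and its parameters are made up for illustration; the only inputs are the figures already used above (a 10″ disc, 78rpm, 96kHz).

```python
import math

def microns_per_sample(diameter_inches=10.0, rpm=78.0, sample_rate_hz=96_000):
    """Groove distance covered between two samples at the outer rim.

    Hypothetical helper: it just repeats the arithmetic in the post,
    using math.pi and 2.54 cm/inch instead of rounded values.
    """
    circumference_cm = math.pi * diameter_inches * 2.54   # about 80 cm per revolution
    seconds_per_rev = 60.0 / rpm                          # about 0.77 s per revolution
    groove_speed_cm_per_s = circumference_cm / seconds_per_rev  # about 104 cm/s
    cm_per_sample = groove_speed_cm_per_s / sample_rate_hz
    return cm_per_sample * 10_000                         # 1 cm = 10,000 microns

if __name__ == "__main__":
    print(f"{microns_per_sample():.1f} microns per sample")
```

With the default values this comes out just under 11 microns, in line with the rounded figure above. Grooves closer to the label move more slowly under the stylus, so passing a smaller diameter gives a finer spacing.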

 

A Few Advanced Search Tips

The Internet Archive’s search engine is based on Elasticsearch and implemented by Aaron Ximm. Learning how to use the search engine helps not only with the website but also with the command line tools for working with the Internet Archive. Here are some tips.

You can search within just one collection:
https://archive.org/search.php?query=Casey%20Jones%20AND%20collection%3AGratefulDead

or with a particular field set, like just searching for Patsy Montana in the 78rpm collection:
https://archive.org/details/78rpm?and[]=creator:%22patsy%20montana%22

There are title, creator, date, year, description, and many other metadata fields, which you can find by looking at a particular item’s metadata like so:
https://archive.org/metadata/78_give-me-a-home-in-montana_patsy-montana-the-prairie-ramblers_gbia0005195b/metadata

Searching for external-identifiers is tricky because of dealing with the embedded colons, which can throw off the parsing of the search string. If you’re looking for a specific full external-identifier, you can “escape” the colons by enclosing the target value in double quotes, like this:

https://archive.org/details/georgeblood?&and[]=external-identifier%3A%22urn%3Apubcat%3Ano-publisher%3A39981%22

But if you want to use a wildcard, you have to drop the double quotes. In that case, you need to remove any embedded colons by replacing them with `*`, like this:

https://archive.org/details/georgeblood?&and[]=external-identifier:urn*pubcat*no-publisher*399*
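
The two external-identifier forms above can be generated with a few lines of Python. This is only a sketch: the function names are invented here, and the URL layout simply mirrors the example links in this post rather than any documented API. Note that `urllib.parse.quote` percent-encodes the colon after the field name as `%3A`, which should decode to the same query.

```python
from urllib.parse import quote

def exact_identifier_clause(urn):
    # Double quotes "escape" the embedded colons for an exact match.
    return f'external-identifier:"{urn}"'

def wildcard_identifier_clause(urn_prefix):
    # Wildcards do not work inside quotes, so each colon becomes "*"
    # and a trailing "*" matches the rest of the identifier.
    return "external-identifier:" + urn_prefix.replace(":", "*") + "*"

def collection_url(collection, clause):
    # Hypothetical helper; the URL shape is copied from the examples above.
    return f"https://archive.org/details/{collection}?and[]={quote(clause, safe='*')}"

if __name__ == "__main__":
    print(collection_url("georgeblood",
                         exact_identifier_clause("urn:pubcat:no-publisher:39981")))
    print(collection_url("georgeblood",
                         wildcard_identifier_clause("urn:pubcat:no-publisher:399")))
```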

ISBN searching: https://archive.org/search.php?query=isbn%3A9780964015319 (ISBNs can also be in related-external-identifiers, which helps if you want to find different editions that are fundamentally the same; thank you to OCLC’s xISBN service for the help there).

LCCN searching: https://archive.org/search.php?query=lccn%3A94072390

OCLC numbers are in two places, but mostly: https://archive.org/search.php?query=oclc-id%3A31773958

Dates: If you want to find a book with a particular date in the date field: https://archive.org/details/Boston_College_Library?&and[]=date:1914

If you want to find all books that have a date in the date field: https://archive.org/details/Boston_College_Library?&and[]=date:*

All books that do not have any date field: https://archive.org/details/Boston_College_Library?&and[]=NOT%20date:*

You can also search by the number of bytes in an item: e.g.

https://archive.org/details/georgeblood?&and[]=item_size:[300000000%20TO%201000000000]%20AND%20publicdate:[2017-04-30%20TO%202099-01-01]
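
Range queries like the one above follow a `field:[low TO high]` pattern joined with `AND`, so they are easy to assemble programmatically. The sketch below uses made-up helper names and only reproduces the query string shown in the example link; it is an illustration, not an official client.

```python
from urllib.parse import quote

def range_clause(field, low, high):
    # Inclusive range in the same syntax as the example: field:[low TO high]
    return f"{field}:[{low} TO {high}]"

def and_query(*clauses):
    # Every clause must match.
    return " AND ".join(clauses)

def details_url(collection, query):
    # Hypothetical helper; keeps ":" "[" and "]" readable and encodes spaces as %20.
    return f"https://archive.org/details/{collection}?and[]={quote(query, safe=':[]')}"

if __name__ == "__main__":
    query = and_query(
        range_clause("item_size", 300000000, 1000000000),
        range_clause("publicdate", "2017-04-30", "2099-01-01"),
    )
    print(details_url("georgeblood", query))
```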

If you want to search the external identifier field, it is a bit tricky because it has “:” in the field. If you replace each “:” with “?”, it kind of works. So these are the MGM records whose catalog numbers start with “30”: https://archive.org/details/georgeblood?sort=-reviewdate&and%5B%5D=external%5C-identifier:urn?pubcat?mgm?30*

Books Donated for MacArthur Foundation 100&Change Challenge from BookMooch Users

Thank you, Richard from Georgia, for Theories of Development.

Thank you to the people who are starting to send books to the Internet Archive to be digitized. The Internet Archive already digitizes books, but as semifinalists for a $100 million grant from the MacArthur Foundation, we are ramping up. Our proposal is to bring 4 million of the most beloved and important books to learners by helping all libraries become digital libraries.

BookMooch is an online book exchange community whose members list which books they have and which books they want. When you send a book, you earn a point; to receive a book, you spend a point. Some people have surplus points, which they have generously donated to the Internet Archive to help us build our collection.

To start on our 4 million book quest, we are looking at the most assigned books on course syllabi (as aggregated by the OpenSyllabus project). We gave this list to the founder of BookMooch, John Buckman, and he found that 61,000 were held by community members, with hundreds available right now.

Thank you, Suzanne from North Carolina, for Independence and Nationhood: Scotland, 1306-1469

The first books are starting to arrive, and there is much rejoicing!  Onward!

Thank you to Cindy from Massachusetts for Holy Land: A Suburban Memoir

Thank you, S Krashen, from California for Language Two

KittenFeed– sometimes a suddenly popular site leverages the Wayback Machine

From the leader of the Wayback Machine project:

Lucy (a 17-year-old girl from SF) set up TrumpScratch.com

Press reports suggest she received a cease and desist letter from the Trump organization in New York (update: now that letter is contested).

She changed the site to KittenFeed.com

At some point KittenFeed.com was re-directed to the Wayback Machine.

It overwhelmed our servers so our engineers re-configured our cache to support the 5 meg MP3 on the page (the Rick Roll audio).

KittenFeed.com appears now to have re-directed to facescratch.com, thereby not leveraging the Wayback Machine.

We found it interesting to see the Wayback Machine, meant for historical research, used on a live site.

Apple Pie Potluck and Constitutional Law Teach-In — Friday Feb 17th 5:30-9PM


Initial information — more details to come:
In honor of the General Strike:

Constitutional Law Teach-in at the Internet Archive with EFF and Others

EFF and other lawyers will lead a conversation about the current issues and threats in constitutional law. Focusing on specific sections and amendments we will talk about current cases on censorship, surveillance, search and seizure, and more.

Workshops on using encryption tools, and maybe musical performances, will accompany the teach-in.
If you want to present, perform, or have other ideas, please email us.

When: Friday, February 17th 5:30pm-9pm (program 6-8)
Where: Internet Archive
300 Funston Ave. SF, CA 94118
Potluck-style: Please bring apple pie or other food
Reserve your free ticket here
Streamed via Facebook Live
Donations welcome

Lawyers Attending:

  • Cindy Cohn – Executive Director of EFF
  • Corynne McSherry – Legal Director of EFF
  • Victoria Baranetsky – First Look Media Technology Legal Fellow for the Reporters Committee for Freedom of the Press
  • Geoff King – Lecturer at UC Berkeley, and Non-Residential Fellow at Stanford Center for Internet and Society
  • Bill Fernholz – Lecturer In Residence at Berkeley Law

For those who cannot attend in person, we will stream the event on Facebook Live, so make sure you’re following us on Facebook.

Upgraded Secure Communications Applications I am Now Using

I am upgrading the security of my communications while keeping them easy to use. I thought I would share what I currently use in case it is helpful to copy, and I would appreciate comments.

I want end-to-end encryption so nobody can intercept what I am saying (unless they have infected my phone or computer, but that is another issue), and bonus points for making it so that it is unknown who I am communicating with and when (private metadata and traffic). Skype, phonecalls, sms/texts, slack and email are now known to not be private (at least by default) thanks to Edward Snowden. This is too bad since I still use these. (Slack is not end-to-end encrypted even for direct messages, which it could and should be.) So far I have only partially achieved the first step: end-to-end encryption. I am migrating to:

  • txt and sms replacement, somewhat phonecalls: Signal for point-to-point instant messaging replacing sms and skype. Free software, free of cost, and open source, works on smart phones. I have donated.
  • skype texting replacement: Signal for laptops and with a chrome-based desktop Signal app on my Mac (which is what I mostly use). It uses phone numbers as identifiers, which is kind of a pain. EFF friend called this “best of breed” for security. Small development staff. There is a tip for updating it to show names rather than phone numbers: go to the … menu, go to settings, at the bottom is update contacts.
  • skype video/slack audiovideo replacement: appear.in for 1-on-1 and small group video chat that is end-to-end encrypted, replacing Skype for me. This does not require a download or an account. Go to the homepage, type a bunch of characters to make a meeting room, then send the resulting url to someone and they can use that throw-away meeting room. Super easy. Uses webrtc (now standard in browsers), and https with it; they say it is end-to-end encrypted. They have an iphone app as well, but I don’t know about its security. This does not seem designed for super high security, but seems to be pretty good.
  • webex replacement: zoom.us for larger group video chats, replacing Webex for me. Free of cost for most of my uses, easy to use (requires download, but is super easy). It says it is end-to-end encrypted, with a little lock icon when in use and encrypted.
  • Facetime occasionally on my iphone replacing cellphone calls to friends with an iphone. Apple says that it is end-to-end encrypted.
  • Thunderbird + Enigmail to sign all email, receive encrypted email, and sometimes send encrypted email, with an organizational email server (archive.org not gmail). Enigmail is moderately hard to set up; I had help in a meetup. Cost free, and I believe free and open source software. I am donating.
  • encrypted notes file (the mac Notes app) on my mac for high priority secure notes. It syncs the encrypted file with my iphone via icloud.
  • Breadwallet, bitcoin wallet on my iphone, for small amounts of bitcoin for casual purchases. Super easy and a full wallet (does not hang off a server). Love this wallet. Cost free. I invested a tiny amount of money in the company– great guys.
  • Torbrowser for private web browsing beyond Firefox’s Private browsing feature. Free and open source software, cost free. I have donated.
  • On Macintosh os/x it’s easy to turn on full disk encryption (FileVault). Go to the “Security and Privacy” setting and turn on FileVault. If you do, be sure *not* to accept its offer to store the key in iCloud. Write down the “recovery key”, and hide it somewhere away from the computer. The security of this approach is based on the security of your normal login password, so if it’s lame, change it to something that can’t be guessed or brute forced easily.  (from a commenter, Eric Blossom)
  • Web search: DuckDuckGo or StartPage.com. (from a commenter, Reinout)

Any comments or ideas are welcome. I realize I have traded off security for ease of use. I hope stronger tools get easier, and I suggest we all invest in tools through donations and development help. I wish I knew my mac and iphone were not compromised. Not sure how to do that.

I have tried ricochet as an instant messaging client that secures who I am talking to via Tor. It is easy to use, but few people I know use it, so I don’t use it often. I have tried encrypting my email using pgp via enigmail but have run into trouble with others being able to read it, so I do not encrypt email by default. As an aside, encryption is related in a funny way to content-addressable systems, which is a different subject, but this is magic and the future.

(earlier version of this post is on http://brewster.kahle.org )

Micropayments to Archive.org by using the Brave Browser (and bitcoin)

I hope Ted Nelson is proud. The Internet Archive just signed up for getting micropayments from participating Brave Browser users.  Brave Browser is an alt-browser for controlling ads, mostly, but they added a micropayments feature (beta).

You need to put in some bitcoin that will then be distributed to the sites you visit in a month. Cool! (They help you get bitcoin.)

We don’t expect it will raise the money we need to make a copy of archive.org in Canada, but we are glad to participate in this program.  Thank you, Brave, and our intrepid users.