Author Archives: Brewster Kahle

Discover Books Donates Large Numbers of Books

Internet Archive is proud to partner with Discover Books, a major used book seller, to help let the stories in books live on. Discover Books is donating books that the Internet Archive does not yet own and that would otherwise have gone to a landfill. Through this process the Internet Archive has more books to digitize and preserve.

Together we are giving books the longest life possible both in print and online.

Thank you to discoverbooks.com.

Upcoming changes in epub generation

Epub is a format for ebooks that is used on book reader devices.   It is often mostly text, but can incorporate images. The Internet Archive offers these in two cases:  when a user uploads them, and when they are created from other formats, such as scanned books or uploaded PDFs that were made up of images of pages.

The Internet Archive creates them from images of pages using “optical character recognition” (OCR) technology. This is then reformatted into the epub format (currently epub v2). These files are sometimes created “on-the-fly” and sometimes created as files and stored in our item directories.   All “on-the-fly” epubs use the newest code, where stored ones use the code available at the time of generation.

Based on a change in the format from our OCR engine last August, many of the epubs generated between then and last week have been faulty. Newly generated epubs are now fixed, and we will soon be going back to fix the faulty ones that were stored. We have also discovered that some of the older epubs are faulty, and it is difficult to know which.

To fix this we are shifting to the “on-the-fly” generation for all epubs so that all epubs get the newest code.   This is how we already generate daisy, mobi, and many zip files as well.   To access the epubs for the books we have scanned the URL is https://archive.org/download/ID/ID.epub, for instance https://archive.org/download/recordofpennsylv00linn/recordofpennsylv00linn.epub.

More generally, an epub can be generated for an item when the ocr field in its meta.xml does not say “language not currently OCRable” and the item contains an abbyy format file. For instance, in an item’s file list, the presence of an abbyy file downloadable at http://archive.org/download/file_abbyy.gz will mean a corresponding epub file can be downloaded at http://archive.org/download/file.epub.
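The URL patterns described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `epub_url` and `derivable_epubs` are hypothetical, the check on the ocr field is a simple substring match, and the `_abbyy.gz` suffix is taken from the example above. It is not an official Internet Archive API.

```python
from typing import Iterable, List, Optional
from urllib.parse import quote

# Base URL taken from the examples in the post.
DOWNLOAD_BASE = "https://archive.org/download"
# Suffix of the abbyy file, as in the "file_abbyy.gz" example (assumption).
ABBYY_SUFFIX = "_abbyy.gz"
# Text the post says marks an item as not OCRable.
NOT_OCRABLE = "language not currently OCRable"


def epub_url(identifier: str) -> str:
    """Build the epub URL for a scanned book: /download/ID/ID.epub."""
    safe_id = quote(identifier, safe="")
    return f"{DOWNLOAD_BASE}/{safe_id}/{safe_id}.epub"


def derivable_epubs(ocr_field: Optional[str], file_names: Iterable[str]) -> List[str]:
    """Return the epub file names that should be downloadable for an item.

    ocr_field is the value of the ocr field from meta.xml, or None if the
    field is absent. Substring matching is an assumption of this sketch.
    """
    if ocr_field is not None and NOT_OCRABLE in ocr_field:
        return []
    return [
        name[: -len(ABBYY_SUFFIX)] + ".epub"
        for name in file_names
        if name.endswith(ABBYY_SUFFIX)
    ]


if __name__ == "__main__":
    print(epub_url("recordofpennsylv00linn"))
    print(derivable_epubs(None, ["file_abbyy.gz", "file.pdf"]))
```

The first function covers the scanned-book case, and the second applies the more general rule to an item's file list.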

Getting back to “View Source” on the Web: the Movable Web / Decentralized Web

The Web 1.0 moved so fast partly because you could “View Source” on a webpage you liked and then modify and re-use it to make your own webpages. This even worked with pages with JavaScript programs—you could see how it worked, modify and re-use it. The Web jumped forward.

Then came Web 2.0, where the big thing was interaction with “APIs,” or application programming interfaces. This meant that the guts of a website were on the server and you only got to ask approved questions to get approved answers, or it would specially format a webpage for you with your answer on it. The plus side was that websites had more dynamic webpages, but learning from how others did things became harder.

Power to the People went to Power to the Server.

Can we get both?  I believe we can, and with a new Web built on top of the existing Web.  A “decentralized web” or a “movable web” has many privacy and archivability features, but another feature could be knowledge reuse.  In this way, the set of files that make up a website—text/HTML, programs, and data—are available to the user if they want to see them.

The decentralized Web works by having a p2p distribution of the files that make up the website, and then the website runs in your browser.  By being completely portable, the website has all the pieces it needs: text, programs, and data.  It can all be versioned, archived, and examined.

[Upcoming Summit on the Decentralized Web at the Internet Archive June 8th, 2016]

For instance, this demo has the pages of a blog in a peer-to-peer file system called IPFS, but also the search engine for the site, in JavaScript, that runs locally in the browser. The browser downloads the pages, the JavaScript, and the search-engine index from many places on the net and then displays them. The complete website, including its search engine and index, is therefore downloadable and inspectable.

This new Web could be a way to distribute datasets because the data would move with programs that could make use of it, thus helping document the dataset. This use of the decentralized Web became clear to me by talking with Karissa McKelvey and Max Ogden of the DAT Data project, who are working on distributing scientific datasets.

What if scientific papers evolved to become movable websites (or call them “distributed websites” or “decentralized websites”)?  That way, the text of the paper, the code, and the data would all move around together documenting itself.  It could be archived, shared, and examined.

Now that would be “View Source” we could all live with and learn from.

Save our Safe Harbor: Submission to Copyright Office on the DMCA Safe Harbor for User Contributions

The United States Copyright Office is seeking feedback on how the “notice and takedown” system created by the Digital Millennium Copyright Act, also known as the “DMCA Safe Harbors,” is working. Congress decided that in this country, users of the Internet should be allowed to share their ideas with the world via Internet platforms. In order to facilitate this broad goal, Congress established a system that protects platforms from liability for the copyright infringement of their users, as long as the platforms remove material when a copyright holder complains. The DMCA also allows users to challenge improper takedowns.

We filed comments this week, explaining that the DMCA is generally working as Congress intended it to. These provisions allow platforms like the Internet Archive to provide services such as hosting and making available user-generated content without the risk of getting embroiled in lawsuit after lawsuit. We also offered some thoughts on ways the DMCA could work better for nonprofits and libraries, for example, by deterring copyright holders from using the notice and takedown process to silence legitimate commentary or criticism.

The DMCA Safe Harbors, while imperfect, have been essential to the growth of the Internet as an engine for innovation and free expression. We are happy to provide our perspective on this important issue to the Copyright Office.

Next Librarian of Congress: Carla Hayden

Carla Hayden

The President has nominated Carla Hayden to be the next Librarian of Congress.    I have met her through IMLS and support her for this position.

As a public librarian, she can bring an access and public service orientation to a position that has traditionally been focused on Congress’ needs and collecting valuable materials.

The Library of Congress is both a powerful symbol and a fabulous organization.   Its collections are unbelievable– there are employees in Cairo and Delhi collecting the best that humanity has produced. The Library has high collecting standards and has resisted restrictions from being put on access.

For instance, the Library of Congress has actively pursued web archiving since 2000 and made these collections more available than almost any other institution. As the home of the US Copyright Office, the Library can keep the constitutional balance in mind as copyright laws evolve.

All of these features of the Library play into the strengths of Carla Hayden who can help shape a potent institution for our new century.

-brewster

Internet Credit Union 2011-2015: RIP

[background, NYTimes]

Internet Credit Union board and staff in April 2014.

Rest in Peace

[previous, NYtimes]

Dear National Credit Union Administration,

You win, we lose.
This is our notice of Voluntary Liquidation of the Internet Credit Union.
We write this hoping many of you will read it.

You did not need to crush this credit union.
You did not need to make it take over 18 months to charter, forcing 5,000 changes to our application documents.
You did not need to restrict our total loan portfolio to $37,000 when we had $1,000,000 in reserve for bad loans.
You did not need to keep us from lending $8,000 to a student, forcing him from college.
You did not need to keep us from originating mortgages for permanently affordable housing.
You did not need to keep us from working with other credit unions on their loans and our loans.
You did not need to force us to revoke the membership of our migrant farm worker members.
You did not need to send people into our offices every month for 2 years taking up our time.
You did not need to treat us like children. And yes, your people said they were “treating us like children” in a meeting with our board.
You didn’t need to laugh when Occupy sent you an application to start a credit union, snarking “That is not going to happen.” And yes, an NCUA official said that while we were in your offices in DC.
You do not need to overreact every time the Federal Reserve sends you a letter like you did over bitcoin.
You do not need to sow fear into every small and medium sized credit union with your subjective rating system.

You can stop hurting, and start helping.

You need to stand up to the banks that want our membership rules to be arbitrary and shifting under us.
Technology has made starting and running a Credit Union easy.
We need you to not shut down 200 to 300 Credit Unions a year.
We need you to not start 1 or 2 Credit Unions a year, but start 500 credit unions a year (as used to happen before the NCUA existed).
You can create a 5 page application, that any group can submit in a month to get started.
You can leave new credit unions alone for a couple of years.
Clean house of your agents of shutdown, and replace them with agents of start-up.
Thousands of communities are not geographically clumped, let them start credit unions.
27% of our citizens are “underbanked” – your organization is part of the reason for this.
We need a distributed and robust banking system for deposits, transactions, and grassroots credit.
NCUA: Please fix yourself.

We need you… We need you to change.

Yours Sincerely,
Members of the Board of the, now dead, Internet Credit Union

Internet Credit Union 2011-2015

sources: Credit Union National Association, NCUA (via the Wayback Machine)

How You Can Put Knowledge into the Hands of Millions

Dear Friends,

Today is #GivingTuesday, the one day you are encouraged to give to your favorite charities. This GivingTuesday, I hope the Internet Archive will be at the top of your list. By giving a small amount, you can put knowledge in the hands of millions of people, for years and years.

I’ve always believed in public libraries. Now is no different. We need a library for the digital generation. A special place we can go to learn and explore. That’s why I founded the Internet Archive—to give everyone access to our cultural treasures. Forever. For free.

I made it a non-profit because this library is powered and enabled by everyone else. By those who are building the collections and those who are using the collections. Other people are not working for us, we are working for them. I thought a non-profit was the right way to do that.

In our Wayback Machine, we’re saving one billion Web captures each week. People download 20 million books on our site each month. The key is to keep improving—and to keep it free. That’s where you can help us.

The Internet Archive is a non-profit library built on trust. Reader privacy is very important to us, so we don’t run ads that track your behavior. We don’t sell your personal information. But we still need to pay for the increasing costs of servers, staff and rent.

This is the one time of year I ask you to help keep the Internet Archive free and free of ads. Please consider donating $25, $50, $75 or whatever you can afford. It’s a small amount to inform millions. Help us do more. I promise you–it’s money well spent.

Thank you.

Brewster Kahle

Founder & Digital Librarian, Internet Archive

Difficult Times at our Credit Union

Brewster Kahle, Chairman of the Internet Archive Federal Credit Union, November 2015

[NYtimes story, Motherboard, BoingBoing. Liquidation.]

All deposits are safe, all loans performing, great and dedicated staff, wonderful members. So why difficult? Despite five years of effort, $1 million in donations spent (from the Internet Archive) and $1 million in the bank to back any bad loans (from the Kahle/Austin Foundation), we are only further from our goal: To create a financial institution that can justly serve our communities. It now looks likely that overwhelming regulatory burden will force us to give up our quest. But don’t worry, even in this case we have more than enough money for all depositors and can place our few outstanding loans. So all is safe, but we thought we should give an update.

Started in New Brunswick, New Jersey, in January 2011 and then chartered 19 months later, we invested in growing our membership based on a dream of a new kind of credit union, but now our membership is shrinking because the regulators (the National Credit Union Administration, or NCUA) kept tightening our requirements for membership. Also, the services they allow us to offer have been restricted to payday-like loans and small car loans– ones we were not excited to offer in the first place and not lucrative enough to break even. So now we serve only about 100 members and a total of about 400 account holders, and we are not even serving them in ways we wanted to. We wanted to do 3 types of things, none of which we are succeeding at, which makes losing money even harder to endure.

Brewster Kahle and Jordan Modell, at our grand opening celebration November 2012.

We wanted to help the under-served, but the restrictions made this too difficult. We tried to offer student loans, but we were limited to lending only $5,000. This was a particular problem when, for example, an under-documented local Rutgers student with a 700+ credit score and a part-time job needed $8,000 to stay in school but others would not help him. We sought an exception from the NCUA, but they said no. In another case, we worked with a migrant farm workers association to offer their members access to the credit union. We set up a system that allowed them to send money back home with much lower fees than organizations such as Western Union. We had set up services to help undocumented workers so they could pay their fair share of taxes and put them on a path to citizenship. But after a ruling that we could accept members of this migrant farm workers association as members, a lower-level examiner reversed the decision, and we had to move all of those members to non-members, effectively killing our relationship with the association. Also, our members wanted to send outgoing wire transfers, but the NCUA would not allow it, resulting in many members leaving. You probably get the idea; we certainly did.

Internet Credit Union board and staff in April 2014.

We wanted to create permanently affordable housing by offering targeted mortgages. With the banking crisis leading to millions going into foreclosure, we thought we could find ways to help. With abundant capital for our credit union (in this case donations) and experience from Jordan Modell, a banker of over 20 years, we built a great team, board, and partnerships, and we gave it a whirl. We were encouraged by how generous the other credit unions and community members were. But as I said, the regulators never let us lend more than $5,000 to anyone, much less originate mortgages. The Internet Archive has made progress anyway in affordable housing by starting a “Foundation House,” but unfortunately the credit union is not allowed to build on this example.

We also wanted to make a model so that thousands of credit unions could be started to serve their local communities as happened in the 1930’s and 40’s. Our CEO, Jordan Modell, wrote a blog so others might learn from us and spent hundreds of hours with others wanting to start credit unions. But even though it is logistically easy to start and run a fully functional credit union given the technology back-end services available today, the regulators made it very hard to succeed. After a year and a half of full time work on our application, and their demands for 5,236 changes (really) to our application documents, we were learning they were not interested in new credit unions. We found that generally only a handful of new federal credit unions are allowed to start each year. One of the four in our year was a Navajo credit union that spent 44 months getting through the process. We were stunned to find we were the first full service credit union chartered in New Jersey since the NCUA was formed in 1970.

After years of losing money I asked our board, which included several that ran credit unions, “How do we break even?” They said we should find those that need services that are financially lucrative, and they suggested we look to our unique relationship with the Internet. Since we were prevented from making significant loans, we thought that maybe we could get enough deposits and invest them in CDs and wait for the NCUA to let us serve the communities. Fortunately, an opportunity was opening up. In 2013 bitcoin firms were in the news and banks were closing their accounts. The Internet Archive had some experience with bitcoin because people had donated bitcoins for a couple of years. The Internet Archive used them as partial pay for interested employees, to buy books at the neighborhood bookstore, and to buy sushi next door. I suggested the credit union present at a bitcoin conference, which was well received. Our credit union asked permission from the NCUA to bank bitcoin companies and they granted it. We opened accounts for three small firms. All good– until it wasn’t. The NCUA suddenly demanded we close the accounts. So we reluctantly closed them, forcing one of the companies into bankruptcy. The NCUA suggested we open accounts for the individual customers of one of the failed firms so they could receive their money. But then the NCUA kept auditing and investigating us at a level that often took more hours than what we spent on all member services combined. They have been in our branch now around once a month for 2 years, driving up our costs and driving down our services.

sources: Credit Union National Association, NCUA (via the Wayback Machine)

I don’t think it is just us. For sure, we made mistakes but also had unusual advantages: experienced banker CEO, almost unlimited capital, and a market that wanted alternative banking options. I now believe it is not just us, because 200 to 300 credit unions are shut down every year, many of them by the NCUA, which was started in 1970. Only a few are allowed to start. All the while, it has never been easier to create and operate a small full-service credit union, complete with debit cards, ATM’s, and online banking. We have heard many tales from other credit unions and the associations that try to help new ones that echo our experiences. “By any measure, the future for small credit unions looks bleak,” says the Financial Brand. We now know first hand how they go after small and medium sized credit unions and force them to merge their assets into bigger credit unions. If you have an account in a credit union, especially a small or medium sized one, I would worry that they will go after yours.

I told my tale of woe to a friend, John Markoff, at a party and he suggested I tell it to another New York Times reporter Nathaniel Popper, who turned out to be interested. Few go to the press because there is little upside as the regulators hold absolute power and could react negatively to critical press.

We decided to go to the press as part of our original idea– share our experience so others may learn from us. Unfortunately, we do not have a successful model for others to copy. It may have just been us, but I don’t think so. The United States may not be the place to sustain a grassroots community banking system, at least one that has anything to do with the existing regulators. Maybe the regulators will change, and there are some bringing up issues. Maybe people will build a new system, but so far, the US regulators are aggressively resistant. Maybe some other country will be interested in new ideas and welcome entrepreneurs. Maybe other credit unions that have felt crushed by the regulators will come forward and tell their stories creating momentum for change. I see a system as unhealthy if regulators put 200 to 300 institutions out of business every year for decades on end while only allowing a few to start.

Part of the reason the regulators may act this way is how technically insecure the money system is. As an engineer, when I looked at how the transaction systems work, I was shocked to see few technological safeguards. I imagine there is major fraud activity. Ironically, the bankers and regulators need exactly the technologists that they are pushing away.

All in all, we are sad. Many people have spent years building a new credit union and we have little to show for it. We had hopes. When I was young I had a passbook from my village’s savings and loan– they helped me save my paperboy money so I could spend it on my stamp collection. But through the 80’s I saw the regulators, and the de-regulators, take our beloved savings and loans across the country and roll them up and blow them up– mine was gone in 1989. I wonder if this is what is happening to our small and medium sized credit unions.

More as it happens, but these are difficult times for our credit union. Thank you for all of the help.

Appendix:

Number of Credit Unions in the United States and the number change each year (NCUA started in 1970). Source: CUNA.

Year Number Yearly Change
1939 8,035
1940 9,224 1,189
1941 10,316 1,092
1942 10,272 -44
1943 10,158 -114
1944 8,930 -1,228
1945 8,823 -107
1946 8,944 121
1947 9,130 186
1948 9,320 190
1949 10,062 742
1950 10,586 524
1951 11,278 692
1952 12,280 1,002
1953 13,690 1,410
1954 15,067 1,377
1955 16,192 1,125
1956 17,246 1,054
1957 18,191 945
1958 18,860 669
1959 19,512 652
1960 20,094 582
1961 20,604 510
1962 20,984 380
1963 21,363 379
1964 21,800 437
1965 22,109 309
1966 22,680 571
1967 23,029 349
1968 23,420 391
1969 23,866 446
1970 23,687 -179 Year the NCUA Started
1971 23,267 -420
1972 23,098 -169
1973 22,982 -116
1974 22,940 -42
1975 22,677 -263
1976 22,581 -96
1977 22,382 -199
1978 22,203 -179
1979 21,981 -222
1980 21,465 -516
1981 20,784 -681
1982 19,897 -887
1983 19,095 -802
1984 18,375 -720
1985 17,654 -721
1986 16,928 -726
1987 16,274 -654
1988 15,709 -565
1989 15,121 -588
1990 14,549 -572
1991 13,989 -560
1992 13,385 -604
1993 12,960 -425
1994 12,551 -409
1995 12,230 -321
1996 11,887 -343
1997 11,659 -228
1998 11,392 -267
1999 11,016 -376
2000 10,684 -332
2001 10,355 -329
2002 10,041 -314
2003 9,709 -332
2004 9,346 -363
2005 9,011 -335
2006 8,662 -349
2007 8,396 -266
2008 8,089 -307
2009 7,830 -259
2010 7,605 -225
2011 7,351 -254
2012 7,070 -281
2013 6,795 -275
2014 6,513 -282

Experimenting with One Million Album Covers

Rising to the challenge to create an image search engine using a corpus of one million album covers, Professor Trenary of Western Michigan University led a class project that found many exact matches (same file) and many near matches.

Their algorithm matched some that were not the same because it used rough shape matching, and many images were just of the CD or LP label which matched.

While not at a point of being ready for production use for the Archive, they wrote a nice report on their findings that might be useful to others.   The Internet Archive hopes to enable many more studies using the data in the collection.

Thank you to Brandon Arrendondo,  James Jenkins, Austin Jones, and Professor Trenary.