Author Archives: brenton

In Memoriam of Python 2


Today, on the Day of the Dead 2023, we at the Internet Archive honor the death of Python 2. Having mostly emerged from one of the greatest software upgrade SNAFUs in history (the migration from Python 2 to Python 3), we now shed a tear for that old version that served us so well.

When Python 3 was launched in 2008, it contained a number of significant improvements which nevertheless broke compatibility with the previous version of Python at the syntax, string-handling, and library levels. As terrible as this sounds, breaking changes are fairly normal for a major software upgrade.
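For readers who never lived through it, here is a minimal sketch of one well-known break at each of those three levels, written for Python 3. The helper names are ours and purely illustrative; none of this is code from our migration.

```python
import importlib.util


def py2_print_statement_compiles() -> bool:
    """Syntax: Python 2 accepted `print "hello"`; Python 3 only has print()."""
    try:
        compile('print "hello"', "<py2-snippet>", "exec")
    except SyntaxError:
        return False
    return True


def mixing_bytes_and_text_fails() -> bool:
    """Strings: Python 3 refuses to silently combine bytes with text."""
    try:
        b"abc" + "def"
    except TypeError:
        return True
    return False


def module_exists(name: str) -> bool:
    """Libraries: check whether a module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("py2 print statement compiles:", py2_print_statement_compiles())
    print("bytes + str raises TypeError:", mixing_bytes_and_text_fails())
    # urllib2 was a Python 2 module; its contents moved under urllib in Python 3.
    print("urllib2 available:", module_exists("urllib2"))
    print("urllib.request available:", module_exists("urllib.request"))
```

Any one of these is a small fix on its own. The pain came from hitting all three kinds at once across a large codebase.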

Rather, the chaos that followed was rooted in the fact that unlike most software transitions of this sort, it could not be done incrementally. Instead of being offered a way to gradually upgrade, remaining compatible with both versions and spreading the incremental costs over time, developers were given a risky all-or-nothing choice. The result has been a reluctant, glacial, expensive migration that continues to plague the world.

At the Internet Archive, we did not begin our migration in earnest until 2021, starting with Open Library and then this year focusing on Archive.org and its underlying services. However, we are now happy to declare migration of our core storage service, S3, which underlies all of the millions of items stored in the Archive, complete. We are grateful for the intensive efforts over many months by Chris, Scott, and Tracey, and everyone who supported them!

There are just a few more projects to go, but we are nearly there. And come our next OS upgrade, Python 2 will be but the whisper of a memory, preserved in the Archive and honored on a day like today. Rest in peace, Python 2. And please stay dead.

Farewell to IE11

At the end of the movie “Titanic,” from her makeshift raft Rose Calvert promises Jack Dawson, “I will never let go,” but then, well, a floating board is only so big…

On June 1, we will gently release Internet Explorer, version 11, from the list of browsers supported on our website Archive.org into the oceanic depths of the obsolete. To give you an idea of what this means to us, a member of the UX team composed this little remembrance:

We hate you. Good-bye.

Why the ichor? Why the bile? No doubt one too many sleepless nights struggling to make our website layout work with this venerable browser, released in 2013, which lacks support for so many features that are now standard in today’s browsers: module imports, web components, CSS Grid, ES6, the list goes on. Like its ancestor IE6, version 11 has clung to life far longer than it should have.

Though Microsoft support for it will not officially end until 2025, Microsoft’s Chief of Security, Chris Jackson, recently recommended in a blog post that people stop using IE11 as their default browser. It is considered a “compatibility solution,” something you should only use for services that require it. Our analytics indicate that a mere 0.8% of our users use IE11 to browse the site. (Even worldwide usage is at 1-3%, the bulk of it from a country in which we are blocked.)

Plus, maintaining compatibility with IE11 — with its need for polyfills, transpilation, and other workarounds — gets expensive, especially for a small team such as ours. Generously supported by donations from people like you, we are committed to doing the greatest good with the resources we have, making the world’s knowledge available to as many people as possible. IE11 is a distraction, with a diminished and ever diminishing return on our efforts.

So farewell, IE11. We will sleep better and rise with a little more spring in our step, knowing that your phase with us has reached its conclusion.

Searching Through Everything

With over 20 million items in the Internet Archive’s many collections, having a good way to search through them to find exactly what you want is crucial. It is equally important to be able to filter the data in flexible ways so that you see subsets of the data most relevant to you. We are pleased to offer two new features that might change everything about how you search.

Faceted Filtering

Once you’ve executed a site search, either from the search form at the top right of every page or by going to the search page directly, you’ll see a bunch of new checkboxes down the left-hand side, in addition to the search results. These checkboxes are grouped into categories, such as “Media Type” and “Topics & Subjects”.

Clicking any of the checkboxes adds the corresponding term to the search criteria, allowing you to more precisely define the filtered set of search results. Checkmarking more than one term within the same category causes items that match any of the selected terms to be displayed, whereas checkmarking terms from two different categories means that only items matching both terms will be shown. Play around with it, and you’ll see how intuitive it is. Checking or unchecking new terms causes search results to be re-filtered on the fly.
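If you like to see rules spelled out, the combination logic described above (any term within a category, all categories together) can be sketched in a few lines of Python. This is only an illustration: the Item class, the sample titles, and the facet values are hypothetical, and it is not how our search backend is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Item:
    title: str
    # Facet category name -> the values this item carries in that category.
    facets: Dict[str, Set[str]] = field(default_factory=dict)


def matches(item: Item, selected: Dict[str, Set[str]]) -> bool:
    """OR within a category, AND across categories."""
    for category, terms in selected.items():
        if not terms:
            continue  # nothing checked in this category, so it does not filter
        if not (item.facets.get(category, set()) & terms):
            return False  # item has none of the checked terms in this category
    return True


def filter_items(items: List[Item], selected: Dict[str, Set[str]]) -> List[Item]:
    return [item for item in items if matches(item, selected)]


# Hypothetical sample data, for illustration only.
items = [
    Item("A", {"Media Type": {"texts"}, "Topics & Subjects": {"history"}}),
    Item("B", {"Media Type": {"audio"}, "Topics & Subjects": {"history"}}),
    Item("C", {"Media Type": {"texts"}, "Topics & Subjects": {"poetry"}}),
]
selected = {"Media Type": {"texts", "audio"}, "Topics & Subjects": {"history"}}
result_titles = [item.title for item in filter_items(items, selected)]
```

With both media types checked and one subject checked, items A and B pass the filter, while C is dropped because it matches a media type but not the subject.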

We were looking for a way to provide a more powerful, visual approach to filtering search results. When we user-tested the faceted search interface, our testers loved it. It was a familiar interface already in use throughout the Internet which offered both simplicity and richness.

Full-Text Search (in Beta)

Every day, we see an average of 50,000 hits on our search pages, as you, our users, search for title, creator, and various other metadata about the items we’ve archived. But you have long asked when you would be able to search not only across all items but within them as well. For years you’ve been able to search within the text of a single book using our BookReader, but never before have you been able to search across and within all 9 million available text items at the Internet Archive in a single shot. Until now.


And here’s all you have to do: On the search page, after entering your search query in the text field, checkmark “Search full text of books” just underneath the text field, and then click or tap “GO”. That’s it! In seconds, you’ll have the results of searching through millions of texts. Note that the facets at the left work a little differently from non-full-text searches; just click or tap one to add it as a filter criterion.

At the moment, we’re still in beta. Suffice it to say, we’ve faced quite a number of challenges in configuring and populating our full-text search engine, from creating the Elasticsearch clusters to dealing with optical character recognition (OCR) issues related to strange fonts, running page headers, or language recognition. We are continuing to make improvements, and still have a ways to go.

But please use it! Try searching for some phrase that’s stuck in your head from a book long ago forgotten, and see what comes up. You now have the contents of 9 million texts at your fingertips.