With over 20 million items in the Internet Archive’s many collections, having a good way to search through them to find exactly what you want is crucial. It is equally important to be able to filter the data in flexible ways so that you see subsets of the data most relevant to you. We are pleased to offer two new features that might change everything about how you search.
Once you’ve executed a site search, either from the search form at the top right of every page or by going to the search page directly, you’ll see a bunch of new checkboxes down the left-hand side, in addition to the search results. These checkboxes are grouped into categories, such as “Media Type” and “Topics & Subjects”.
Clicking any of the checkboxes adds the corresponding term to the search criteria, allowing you to more precisely define the filtered set of search results. Checkmarking more than one term within the same category causes items that match any of the selected terms to be displayed, whereas checkmarking items from two different categories means that only items matching both terms will be shown. Play around with it, and you’ll see how intuitive it is. Checking or unchecking new terms causes search results to be re-filtered on the fly.
We were looking for a way to provide a more powerful, visual approach to filtering search results. When we user-tested the faceted search interface, our testers loved it. It was a familiar interface already in use throughout the Internet which offered both simplicity and richness.
Full-Text Search (in Beta)
Every day, we see an average of 50,000 hits on our search pages, as you, our users, search for title, creator, and various other metadata about the items we’ve archived. But you have long asked when you would be able to search not only across all items but within them as well. For years you’ve been able to search within the text of a single book using our BookReader, but never before have you been able to search across and within all 9 million available text items at the Internet Archive in a single shot. Until now.
And here’s all you have to do: On the search page, after entering your search query in the text field, checkmark “Search full text of books” just underneath the text field, and then click or tap “GO”. That’s it! In seconds, you’ll have the results of searching through millions of texts. Note that the facets at the left work a little differently from non-full-text searches; just click or tap one to add it as a filter criterion.
At the moment, we’re still in beta. Suffice to say, we’ve faced quite a number of challenges in configuring and populating our full-text search engine, from creating the Elasticsearch clusters to dealing with optical character recognition (OCR) issues related to strange fonts, running page headers, or language recognition. We are continuing to make improvements, and still have a ways to go.
But please use it! Try searching for some phrase that’s stuck in your head from a book long ago forgotten, and see what comes up. You now have the contents of 9 million texts at your fingertips.