A Few Advanced Search Tips

The Internet Archive’s search engine is based on Elasticsearch and was implemented by Aaron Ximm. Learning how to use it helps not only on the website but also with the command-line tools for working with the Internet Archive. Here are some tips.

You can limit a search to a single collection:
https://archive.org/search.php?query=Casey%20Jones%20AND%20collection%3AGratefulDead
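If you build these URLs from a script, a small helper can do the percent-encoding for you. This is only a sketch: the function name `search_url` is made up for illustration, and it assumes nothing more than the `search.php?query=` URL pattern shown above.

```python
from urllib.parse import quote, urlencode


def search_url(terms, collection=None):
    """Build an archive.org search URL, optionally limited to one collection.

    Hypothetical helper: it only reproduces the URL pattern shown above.
    """
    query = terms
    if collection:
        query = f"{terms} AND collection:{collection}"
    # quote_via=quote encodes spaces as %20 (the default would use '+').
    return "https://archive.org/search.php?" + urlencode(
        {"query": query}, quote_via=quote
    )


url = search_url("Casey Jones", collection="GratefulDead")
print(url)
```

Called with the values above, this should reproduce the Grateful Dead URL exactly.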

You can also restrict a search to a particular field, such as items in the 78rpm collection whose creator is Patsy Montana:
https://archive.org/details/78rpm?and[]=creator:%22patsy%20montana%22

There are title, creator, date, year, description, and many other metadata fields. You can see which ones an item uses by looking at its metadata, like so:
https://archive.org/metadata/78_give-me-a-home-in-montana_patsy-montana-the-prairie-ramblers_gbia0005195b/metadata
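To list an item's field names from a script, something like the following might work. It is a sketch under assumptions: the helper names are invented, and the idea that the fields sit under a top-level "result" key in the JSON response is my assumption about the response shape, so the code falls back to the whole payload if that key is missing.

```python
import json
from urllib.request import urlopen


def metadata_url(identifier):
    """Return the metadata URL for an item, following the pattern above."""
    return f"https://archive.org/metadata/{identifier}/metadata"


def fetch_field_names(identifier):
    """Fetch an item's metadata and return its field names, sorted.

    Needs network access. Assumption: the fields are wrapped in a
    top-level "result" object; if not, the payload itself is used.
    """
    with urlopen(metadata_url(identifier)) as resp:
        payload = json.load(resp)
    fields = payload.get("result", payload)
    return sorted(fields)


item = "78_give-me-a-home-in-montana_patsy-montana-the-prairie-ramblers_gbia0005195b"
print(metadata_url(item))
```

Calling `fetch_field_names(item)` would then give you the names you can use as search fields for that item.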

Searching external-identifier values is tricky because of their embedded colons, which can throw off the parsing of the search string. If you’re looking for a specific, complete external-identifier, you can “escape” the colons by enclosing the target value in double quotes, like this:

https://archive.org/details/georgeblood?&and[]=external-identifier%3A%22urn%3Apubcat%3Ano-publisher%3A39981%22

But if you want to use a wildcard, you have to drop the double quotes. In that case, you need to get rid of any embedded colons by replacing each one with `*`, like this:

https://archive.org/details/georgeblood?&and[]=external-identifier:urn*pubcat*no-publisher*399*
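The two cases (quoted exact match versus unquoted wildcard) can be wrapped in one small function. This is a sketch, and `external_identifier_clause` is a hypothetical name; it only builds the clause text that goes after `and[]=` in the URLs above.

```python
from urllib.parse import quote


def external_identifier_clause(value, wildcard=False):
    """Build an external-identifier search clause (hypothetical helper).

    Exact match: wrap the value in double quotes so the colons are safe.
    Wildcard: quotes must go, so every colon is replaced with '*'.
    """
    if wildcard:
        return "external-identifier:" + value.replace(":", "*")
    return f'external-identifier:"{value}"'


exact = external_identifier_clause("urn:pubcat:no-publisher:39981")
fuzzy = external_identifier_clause("urn:pubcat:no-publisher:399*", wildcard=True)

print(exact)
print(fuzzy)
# Percent-encode the exact clause for use in a URL.
print(quote(exact, safe=""))
```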

ISBN searching: https://archive.org/search.php?query=isbn%3A9780964015319

ISBNs can also appear in related-external-identifiers, which is useful if you want to find different editions that are fundamentally the same (thank you to OCLC’s xISBN service for the help there).

LCCN searching: https://archive.org/search.php?query=lccn%3A94072390

OCLC numbers are stored in two places, but mostly in oclc-id: https://archive.org/search.php?query=oclc-id%3A31773958

Dates: If you want to find a book with a particular date in the date field: https://archive.org/details/Boston_College_Library?&and[]=date:1914

If you want to find all books that have a date in the date field: https://archive.org/details/Boston_College_Library?&and[]=date:*

All books that do not have any date field: https://archive.org/details/Boston_College_Library?&and[]=NOT%20date:*
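The three date filters above differ only in the clause after `and[]=`, so they can be generated by one function. This is a sketch with a made-up name, `date_clause`, and it only covers the three cases shown.

```python
from urllib.parse import quote


def date_clause(year=None, present=True):
    """Return a date filter clause (hypothetical helper).

    year given      -> items with that date
    present=True    -> items with any date
    present=False   -> items with no date at all
    """
    if year is not None:
        return f"date:{year}"
    return "date:*" if present else "NOT date:*"


base = "https://archive.org/details/Boston_College_Library?&and[]="
for clause in (date_clause(1914), date_clause(), date_clause(present=False)):
    # Keep ':' and '*' readable; only the space needs encoding here.
    print(base + quote(clause, safe=":*"))
```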

You can also search by the number of bytes in an item, here combined with a range on publicdate:

https://archive.org/details/georgeblood?&and[]=item_size:[300000000%20TO%201000000000]%20AND%20publicdate:[2017-04-30%20TO%202099-01-01]
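Range clauses like these follow a simple `field:[low TO high]` pattern, so they are easy to assemble in code. The sketch below uses a hypothetical `range_clause` helper and just rebuilds the query from the URL above.

```python
from urllib.parse import quote


def range_clause(field, low, high):
    """Return an inclusive range clause such as item_size:[1 TO 10]."""
    return f"{field}:[{low} TO {high}]"


clause = " AND ".join(
    [
        range_clause("item_size", 300_000_000, 1_000_000_000),
        range_clause("publicdate", "2017-04-30", "2099-01-01"),
    ]
)

# Leave ':', '[' and ']' unencoded so the URL matches the style used above.
url = "https://archive.org/details/georgeblood?and[]=" + quote(clause, safe=":[]")
print(url)
```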

Another way to search the external-identifier field, which is tricky because the values contain “:”, is to replace each colon with “?”, which matches any single character. It kind of works. For example, these are the MGM records whose catalog numbers start with “30”: https://archive.org/details/georgeblood?sort=-reviewdate&and%5B%5D=external%5C-identifier:urn?pubcat?mgm?30*