Daily Archives: October 24, 2016

FAQs for some new features available in the Beta Wayback Machine

The Beta Wayback Machine has some new features, including keyword search for finding websites and a summary of the types of media captured from a website.

How can I use the Wayback Machine’s Site Search to find websites? The Site Search feature of the Wayback Machine is based on an index built by evaluating terms from hundreds of billions of links to the homepages of more than 350 million sites. Search results are ranked by the number of captures in the Wayback Machine and the number of relevant links to the site’s homepage.

Can I find sites by searching for words that are in their pages? No, at least not yet. Site Search for the Wayback Machine will help you find the homepages of sites, based on words people have used to describe those sites, as opposed to words that appear on pages from sites.

Can I search sites with text from multiple languages? Yes! In fact, you can search for any Unicode character. If you can generate characters with your computer, you should be able to use them to search for sites via the Wayback Machine. Go ahead, try searching for правда.

Can I still find sites in the Wayback Machine if I just know the URL? Yes, just enter a domain or URL the way you have in the past and press the “Browse History” button.

What is the “Summary of <site>” link above the graph on the calendar page telling me? It shows you the breakdown of the web captures for a given domain by content type (text, images, videos, PDFs, etc.). In addition, it shows the number of captures, URLs and new URLs by year, for all the years available via the Wayback Machine, so you can see how a certain site has changed over time.

What are the sources of your captures? When you roll over individual web captures (which pop up when you roll over the dots on the calendar page for a URL), you may notice some text links that show up above the calendar, along with the word “why”. Those links take you to the Collection of web captures associated with the specific web crawl the capture came from. Every day hundreds of web crawls contribute to the web captures available via the Wayback Machine. Behind each, there is a story about factors like who, why, when and how.

Why are some of the dots on the calendar page different colors? We color the dots, and links, associated with individual web captures, or multiple web captures, for a given day according to the web server’s response. Blue means the result code the crawler got for the related capture was a 2nn (good); green means the crawler got a 3nn (redirect); orange means the crawler got a 4nn (client error); and red means the crawler got a 5nn (server error). Most of the time you will probably want to select the blue dots or links.
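The color rule above amounts to a lookup on the first digit of the status code. Here is a minimal sketch in Python; the function name `dot_color` and the "unknown" fallback are assumptions made for illustration, not part of the Wayback Machine itself.

```python
def dot_color(status_code: int) -> str:
    """Map an HTTP status code to the calendar dot color described in the FAQ."""
    colors = {2: "blue", 3: "green", 4: "orange", 5: "red"}
    # Integer division by 100 gives the status class (2nn -> 2, 3nn -> 3, ...).
    # The FAQ does not cover codes outside 2nn-5nn, so "unknown" is a
    # placeholder chosen for this sketch.
    return colors.get(status_code // 100, "unknown")


if __name__ == "__main__":
    for code in (200, 301, 404, 503):
        print(code, dot_color(code))
```

So a capture that returned 404 would show as an orange dot, and one that returned 200 as a blue dot.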

Can I find sites by searching for a word specific to that site? Yes, by adding in “site:<domain>” your results will be restricted to the specified domain. E.g. “site:gov clinton” will search for sites related to the term “clinton” in the domain “gov”.
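To make the behavior of the “site:” operator concrete, here is a rough sketch in Python of how such a query could be split into a domain restriction and search terms. The names `parse_query` and `matches_site` are hypothetical, and the suffix-matching rule is an assumption; the actual Wayback Machine implementation is not described in this post.

```python
from typing import List, Optional, Tuple


def parse_query(query: str) -> Tuple[Optional[str], List[str]]:
    """Split a query into an optional site: restriction and the remaining terms."""
    domain = None
    terms = []
    for token in query.split():
        # A token such as "site:gov" sets the domain restriction.
        if token.lower().startswith("site:") and len(token) > len("site:"):
            domain = token[len("site:"):].lower()
        else:
            terms.append(token)
    return domain, terms


def matches_site(host: str, domain: Optional[str]) -> bool:
    """Assumed rule: a host matches if it equals the domain or ends with '.<domain>'."""
    if domain is None:
        return True
    host = host.lower()
    return host == domain or host.endswith("." + domain)


if __name__ == "__main__":
    print(parse_query("site:gov clinton"))
    print(matches_site("www.whitehouse.gov", "gov"))
```

Under these assumptions, “site:gov clinton” yields the restriction “gov” and the single term “clinton”, and a host such as www.whitehouse.gov falls inside the restriction.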

Open Library New Features and Fixes

The Open Library team has added pages for 200,000 new modern works and rolled out a brigade of fixes and features.

screen shot of book reader

These changes were prioritized by feedback from Open Library patrons:

  • Full-text search through all books hosted on the Internet Archive is back online and is faster than ever. You can try the new feature, for example, to see over 115,000 places where works reference Benjamin Franklin’s maxim: “Little strokes fell great oaks”.
  • The Book Reader has been updated: it looks great on mobile devices and provides a much clearer and simpler book borrowing experience. Try out the new Book Reader and see for yourself!

There are a few small changes in the BookReader that we think you’ll like specifically. EPUB and PDF loans can be initiated from within an existing BookReader loan. What this means for Open Library users is two pretty cool things you’ve long requested:

  • Users who start loans from the BookReader can borrow either EPUB or PDF formats, and switch formats during the loan period.
  • Users who start loans from the BookReader can return loans early, even EPUBs and PDFs.

 

screen shot showing onscreen areas to download and return books

We hope these changes will delight readers, empower developers, and help the community to make even more quality contributions. The path ahead looks even more promising. With clear direction and exciting redesign concepts in the works, the Open Library team is eager to bring you an Open Library at the cutting edge of the 21st century while giving you access to five centuries of texts.

image from old reading textbook

Thank you to Jessamyn West, Brenton Cheng, Mek Karpeles, Giovanni Damiola, Richard Caceres, and the many volunteers in the community.

[from the Open Library blog]

Beta Wayback Machine – Now with Site Search!

Wayback Machine with Site Search
For the last 15 years, users of the Wayback Machine have browsed past versions of websites by entering URLs into the main search box and clicking on Browse History. With the generous support of The Laura and John Arnold Foundation, we’re adding an exciting new feature to this search box: keyword search!

With this new beta search service, users can now find the home pages of over 361 million websites preserved in the Wayback Machine just by typing in keywords that describe these sites (e.g. “new york times”). As they type keywords into the search box, they are presented with a list of relevant archived websites, with snippets containing:

  • a link to the archived versions of the site’s home page in the Wayback Machine
  • a thumbnail image of the site’s homepage (when available)
  • a short description of the site’s homepage
  • a capture summary of the site
    • number of unique URLs by content type (webpage, image, audio, video)
    • number of valid web captures over the associated time period

keyword search in wayback machine

Key Features

  • Search as you type
    • Instant results as you type — predictive, interactive and speedy
  • Multilingual
    • Search in any language or using symbols — expanding scope and utility
  • Site-based Filtering
    • Limit results to certain websites or domains using the site: operator (e.g. site:edu)

Behind the Scenes

  • Search index was built by processing over 250 billion webpages archived over 20 years
    • Index contains more than a billion terms collected from over 400 billion hyperlinks to the homepages of websites
  • Search results are ranked based on the number of relevant hyperlinks to the site’s homepage and the total number of web captures from the site

Example queries

We hope that this service, to search and discover archived web resources through time, will create new opportunities for scholarly work and innovation.

A big Thank You to: Vinay Goel, Kenji Nagahashi, Mark Graham, Bill Lubanovic, John Lekashman, Greg Lindahl, Vangelis Banos, Richard Caceres, Zijian He, Eugene Krevenets, Benjamin Mandel, Rakesh Pandey, Wendy Hanamura and Brewster Kahle