Tag Archives: search

The Whole Earth Web Archive

As part of the many releases and announcements for our October Annual Event, we created The Whole Earth Web Archive. The Whole Earth Web Archive (WEWA) is a proof-of-concept to explore ways to improve access to the archived websites of underrepresented nations around the world. Starting with a sample set of 50 small nations we extracted their archived web content from the Internet Archive’s web archive, built special search and access features on top of this subcollection, and created a dedicated discovery portal for searching and browsing. Further work will focus on improving IA’s harvesting of the national webs of these and other underrepresented countries as well as expanding collaborations with libraries and heritage organizations within these countries, and via international organizations, to contribute technical capacity to local experts who can identify websites of value that document the lives and activities of their citizens.

whole earth web archive screenshot

Archived materials from the web play an increasingly necessary role in representation, evidence, historical documentation, and accountability. However, the web’s scale is vast, it changes and disappears quickly, and it requires significant infrastructure and expertise to collect and make permanently accessible. Thus, the community of National Libraries and Governments preserving the web remains overwhelmingly represented by well-resourced institutions from Europe and North America. We hope the WEWA project helps provide enhanced access to archived material otherwise hard to find and browse in the massive 20+ petabytes of the Wayback Machine. More importantly, we hope the project provokes a broader reflection upon the lack of national diversity in institutions collecting the web and also spurs collective action towards diminishing the overrepresentation of “first world” nations and peoples in the overall global web archive.

As with prior special projects by the Web Archiving & Data Services team, such as GifCities (search engine for animated Gifs from the Geocities web collection) or Military Industrial Powerpoint Complex (ebooks of Powerpoints from the archive of the .mil (military) web domain), the project builds on our exploratory work to provide improved access to valuable subsets of the web archive. While our Archive-It service gives curators the tools to build special collections of the web, we also work to build unique collections from the pre-existing global web archive.

The preliminary set of countries in WEWA were determined by selecting the 50 “smallest” countries as measured by number of websites registered on their national web domain (aka ccTLD) — a somewhat arbitrary measurement, we acknowledge. The underlying search index is based on internally-developed tools for search of both text and media. Indices are built from features like page titles or descriptive hyperlinks from other pages, with relevance ranking boosted by criteria such as number of inbound links and popularity and include a temporal dimension to account for the historicity of web archives. Additional technical information on search engineering can be found in “Exploring Web Archives Through Temporal Anchor Texts.”

We intend both to do more targeted, high-quality archiving of these and other smaller national webs and also have undertaking active outreach to national and heritage institutions in these nations, and to related international organizations, to ensure this work is guided by broader community input. If you are interested in contributing to this effort or have any questions, feel free to email us at webservices [at] archive [dot] org. Thanks for browsing the WEWA!

Open Library New Features and Fixes

OpenLibrary team has added pages for 200,000 new modern works and rolled out a brigade of fixes and features.

screen shot of book reader

Prioritized by feedback from openlibrary patrons,

  • Full-text search through all books hosted on the Internet Archive is back online and is faster than ever. You can try the new feature, for example, to see over 115,000 places where works reference Benjamin Franklin’s maxim: “Little strokes fell great oaks”.
  • Updated new Book Reader, which looks great on mobile devices and provides a much clearer and simpler book borrowing experience. Try out the new Book Reader and see for yourself!

There are a few small changes in the BookReader that we think you’ll like specifically. EPUB and PDF loans can be initiated from within an existing BookReader loan. What this means for Open Library users is two pretty cool things you’ve long requested:

  • Users who start loans from the BookReader can borrow either EPUB or PDF formats, and switch formats during the loan period.
  • Users who start loans from the BookReader can return loans early, even EPUBs and PDFs.

 

screen shot showing onscreen areas to download and return books

We hope these changes will delight readers, empower developers, and help the community to make even more quality contributions. The path ahead looks even more promising. With clear direction and exciting redesign concepts in the works, the Open Library team is eager to bring you an Open Library at the cutting edge of the 21st century while giving you access to five centuries’ of texts.

image from old reading textbook

Thank you to Jessamyn West, Brenton Cheng, Mek Karpeles, Giovanni Damiola, Richard Carceres, and the many volunteers in the community.

[from the Open Library blog]