Every day hundreds of millions of web pages are archived to the Internet Archive’s Wayback Machine, tens of millions of them submitted by users like you through our Save Page Now service. You can now do that in a way that is easier, faster and better than ever before.
Save Page Now (SPN) just got a major upgrade as a result of a total code rewrite, adding a slew of new and awesome features, with more on the way.
Let’s explore what’s new with Save Page Now
You can now save all the “outlinks” of a web page with a single click. By selecting the “save outlinks” checkbox you can save the requested page (and all the embedded resources that make up that page) and also all linked pages (and all the embedded resources that make up those pages). Often, a request to archive a single web page with outlinks will cause us to archive hundreds of URLs, every one of which is shown in the SPN interface as it is archived.
When users are logged in with their free Archive.org account, SPN-generated archives can be saved to that user’s “My web archive” public gallery of archived pages.
Have you ever wanted to archive all the web pages linked from an email message? Well, you are in luck because now you can forward that email to “firstname.lastname@example.org” and after a few minutes you will get an email back filled with Wayback Machine playback URLs.
Some of you might like the new “First capture” badge you will see if any of the URLs you submit to be archived (including outlinked URLs and URLs included in emails) have not been archived yet. And, yes, for those of you who are feeling competitive, we are planning to launch a “leader board” soon. Let the games begin!
Maybe you want the URLs embedded in a web-based PDF file, RSS feed, or JSON file archived. The new SPN will parse those files and archive all the URLs they contain. To use this feature, simply submit PDF, RSS, or JSON URLs to SPN, and don’t forget to select the “capture outlinks” checkbox.
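To give a feel for what "parsing a file for URLs" can involve, here is a minimal sketch in Python. It is not SPN's actual code: the function names `urls_from_json` and `urls_from_rss` and the URL pattern are assumptions made up for illustration, and a real parser would handle many more cases (PDF files, Atom feeds, relative links).

```python
import json
import re
import xml.etree.ElementTree as ET

# Deliberately simple pattern (an assumption): an http(s) URL runs until
# whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def urls_from_json(text):
    """Collect URLs found in any string value of a JSON document."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, str):
            found.extend(URL_PATTERN.findall(node))

    walk(json.loads(text))
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(found))


def urls_from_rss(text):
    """Collect the text of every <link> element in an RSS 2.0 feed."""
    root = ET.fromstring(text)
    links = [el.text.strip() for el in root.iter("link") if el.text and el.text.strip()]
    return list(dict.fromkeys(links))
```

Each function returns a de-duplicated list of URLs in the order they appear, which could then be queued for archiving one by one.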
This new version of SPN is also being used as the back-end support for a number of Wayback Machine services, including the iOS and Android apps as well as the Chrome, Firefox and Safari browser extensions. And, in case you wondered, those apps and extensions will also be getting major updates very soon.
And, yes, of course SPN has a brand new API that you can use to automate a range of Web archiving projects. Please write to us at email@example.com if you would like to learn more about the API.
We have often gotten requests to archive URLs from a Google Sheet. We now support that feature for authorised users. Please write to us for access to this advanced capability at firstname.lastname@example.org.
We LOVE hearing about ways we can make the Wayback Machine better. In fact, most of these new SPN features started with your user suggestions.
Please let us know what you think, whether good, bad, or otherwise. Who knows, the next cool SPN feature might be invented by you!
And remember, “If you see something, save something!”
It’s a good feature, but wouldn’t it be better to choose another way of sending email?
Can I pass in a ‘sitemap.xml’ for true bulk archiving?
“wouldn’t it be better to choose another way of sending email?”
Hmmm… not sure what you are suggesting. What other way did you have in mind?
I’ve been using a Save Page Now bookmarklet that doesn’t work anymore since this feature was launched. It simply appends the URL:
I looked at the new source and the problem is the form now requires a fancy POST method. Why break what has worked?
This is such valuable information
I really like reading this kind of informative blog.
Thank you for sharing.
To save a page using the new brozzler features, do you need to choose to save the page as an IA item to your library, or can you just save it to web.archive.org/web/URL like usual?
The Wayback Machine data is stored in WARC or ARC files which are written at web crawl time by the Heritrix crawler (or other crawlers) and stored as regular files in the archive.org storage cluster. Playback is accomplished by binary searching a 2-level index of pointers into the WARC data.
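As a rough illustration of the lookup described above, here is a minimal in-memory sketch of a binary search over a 2-level index. The key format, file names, and the functions `build_two_level_index` and `lookup` are hypothetical and chosen only for this example; the real index lives on disk and is far larger.

```python
from bisect import bisect_left, bisect_right


def build_two_level_index(records, block_size):
    """Split sorted (key, pointer) records into blocks.

    Returns (top, blocks): `top` holds the first key of each block (level 1),
    and each block is a (keys, pointers) pair of parallel lists (level 2).
    """
    records = sorted(records)
    top, blocks = [], []
    for start in range(0, len(records), block_size):
        chunk = records[start:start + block_size]
        keys = [key for key, _ in chunk]
        pointers = [pointer for _, pointer in chunk]
        top.append(keys[0])
        blocks.append((keys, pointers))
    return top, blocks


def lookup(top, blocks, key):
    """Return the pointer stored for `key`, or None if it is not indexed."""
    # Level 1: find the last block whose first key is <= the search key.
    block_number = bisect_right(top, key) - 1
    if block_number < 0:
        return None
    # Level 2: binary search inside that single block.
    keys, pointers = blocks[block_number]
    position = bisect_left(keys, key)
    if position < len(keys) and keys[position] == key:
        return pointers[position]
    return None


# Hypothetical records: "URL timestamp" -> (WARC file name, byte offset).
records = [
    ("example.org/b 2019", ("file2.warc", 500)),
    ("example.org/a 2019", ("file1.warc", 0)),
    ("example.org/c 2019", ("file1.warc", 900)),
    ("example.org/d 2019", ("file3.warc", 42)),
    ("example.org/e 2019", ("file3.warc", 77)),
]
top, blocks = build_two_level_index(records, block_size=2)
pointer = lookup(top, blocks, "example.org/d 2019")
```

The point of the two levels is that only the small top-level list needs to be searched first; after that, a single block is examined rather than the whole index.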
It isn’t working. I tested saving torrentfreak.com via an email to email@example.com that contained only that URL and no other text. I got no reply to indicate it tried to save pages, and when I look through this Google Chrome add-on: https://chrome.google.com/webstore/detail/save-to-the-wayback-machi/eebpioaailbjojmdbmlpomfgijnlcemk it says the last time it was saved was before now.
In my opinion, change the way you send email