More than 9 million broken links on Wikipedia are now rescued

As part of the Internet Archive’s aim to build a better Web, we have been working to make the Web more reliable — and are pleased to announce that 9 million formerly broken links on Wikipedia now work because they go to archived versions in the Wayback Machine.

22 Wikipedia Language Editions with more than 9 million links now pointing to the Wayback Machine.

For more than 5 years, the Internet Archive has been archiving nearly every URL referenced in close to 300 Wikipedia sites as soon as those links are added or changed, at a rate of about 20 million URLs per week.

And for the past 3 years, we have been running a software robot called IABot on 22 Wikipedia language editions, looking for broken links (URLs that return a ‘404’, or ‘Page Not Found’). When IABot discovers a broken link, it searches the Wayback Machine and other web archives for an archived copy to replace it with. Restoring links ensures Wikipedia remains accurate and verifiable and thus meets one of Wikipedia’s three core content policies: ‘Verifiability’.
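The lookup step described above can be sketched against the Wayback Machine's public availability endpoint. This is a minimal illustration, not IABot's actual code: the function names are hypothetical, and the status check and timeout are assumptions made for the example.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the availability query for a URL (hypothetical helper).

    `timestamp` is an optional YYYYMMDD[hhmmss] string; the service
    returns the snapshot closest to it.
    """
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the archived URL from an availability response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    # Assumption for this sketch: only accept snapshots captured with HTTP 200.
    if closest and closest.get("available") and closest.get("status") == "200":
        return closest.get("url")
    return None


def find_archive(url: str, timestamp: Optional[str] = None) -> Optional[str]:
    """Ask the Wayback Machine for a snapshot of `url` (needs network access)."""
    with urlopen(build_query(url, timestamp), timeout=30) as response:
        return closest_snapshot(json.load(response))
```

Passing the date a citation was added as `timestamp` is one way to prefer a snapshot close to what the editor originally saw; a production bot would also need rate limiting, retries and fallbacks to other archives.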

To date we have successfully used IABot to edit and “fix” the URLs of nearly 6 million external references that would have otherwise returned a 404. In addition, members of the Wikipedia community have fixed more than 3 million links individually. Now more than 9 million URLs, on 22 Wikipedia sites, point to archived resources from the Wayback Machine and other web archive providers.

 

 

Figure: a broken link (left) and the rescued page (right).

One way to measure the real-world benefit of this work is by counting the number of click-throughs from Wikipedia to the Wayback Machine. During a recent 10-day period, the Wikimedia Foundation started measuring external link click-throughs as part of a new research project (in collaboration with a team of researchers at Stanford and EPFL) to study how Wikipedia readers use citations and external links. Preliminary results suggest that the most popular external destination, by far, was the Wayback Machine, with three times as many clicks as the next most popular site, books.google.com. In real numbers, on average, more than 25,000 clicks/day were made from the English Wikipedia to the Wayback Machine.

From “Research:Characterizing Wikipedia Citation Usage/First Round of Analysis”

Running IABot on a given Wikipedia site requires both technical integration and operations support as well as the approval of each related Wikipedia community. Two key people have worked on this project.

Maximilian Doerr, known in the Wikipedia world as “Cyberpower”, is a longtime volunteer with the Wikipedia community and now a consultant to the Internet Archive. He is the author of the InternetArchiveBot (IABot) software.

Stephen Balbach is a longtime volunteer with the Wikipedia community who collaborates with Max and the Internet Archive. He has authored programs that find and fix data errors, verify existing archives on Wikipedia, and discover new archives amongst Wayback’s billions of pages and across dozens of other web archive providers.

The number of rescued links, and the quality of the edits, are the result of Max and Stephen’s dedicated, creative and patient work.

What have we learned?

We learned that links to resources on the live web are fragile and not a persistently reliable way to refer to those resources. See “49% of the Links Cited in Supreme Court Decisions Are Broken”, The Atlantic, 2013.

We learned that archiving live-web linked resources, as close as possible to the time they are linked, is required to ensure we capture those links before they go bad.

We learned that the issue of “link rot” (when once-good links return a 404, 500 or other complete failure) is only part of the problem, and that “content drift” (when the content related to a URL changes over time) is also a concern. In fact, “content drift” may be a bigger problem for reliably using external resources because there is no way for the user to know that the content they are looking at is not what the editor originally intended.

We learned that by working in collaboration with staff members of the Wikimedia Foundation, volunteers from the Wikipedia communities, paid contractors and the archived resources of the Wayback Machine and other web archives, we can have a material impact on the quality and reliability of Wikipedia sites and in so doing support our mission of “helping to make the web more useful and reliable”.
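The distinction drawn above between “link rot” and “content drift” can be made concrete with a small sketch. It is illustrative only and does not describe how IABot or the Wayback Machine detect either condition: the function names are hypothetical, treating any status of 400 or above as a failure is an assumption, and comparing exact hashes is a deliberately crude stand-in for a real similarity measure.

```python
import hashlib
from typing import Optional


def is_link_rot(status_code: int) -> bool:
    """Treat any 4xx/5xx response as a complete failure (assumption)."""
    return status_code >= 400


def fingerprint(body: str) -> str:
    """Hash page text after collapsing whitespace, so trivial
    reformatting does not register as a change."""
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def classify(status_code: int, live_body: Optional[str],
             archived_fingerprint: str) -> str:
    """Compare the live page with the fingerprint taken at citation time."""
    if is_link_rot(status_code):
        return "link rot"
    if live_body is not None and fingerprint(live_body) != archived_fingerprint:
        return "content drift"
    return "unchanged"
```

Link rot is visible from the status code alone, while content drift can only be detected if something was recorded at the time of citation. That is why capturing a page close to the moment it is linked matters. An exact hash will also flag harmless changes such as rotating ads, so a practical check would need a fuzzier comparison.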

What is next?

We will expand our efforts to check and edit more Wikipedia sites and increase the speed at which we scan those sites and fix broken links.

We will improve our processes to archive externally referenced resources by taking advantage of the Wikimedia Foundation’s new “EventStreams” web service.

We will explore how we might expand our link checking and fixing efforts to other media and formats, including more web pages, digital books and academic papers.

We will investigate and experiment with methods to support authors’ and editors’ use of archived resources (e.g. using Wayback Machine links in place of live-web links).

We will continue to work with the Wikimedia Foundation, and the Wikipedia communities world-wide, to advance tools and services to promote and support the use of persistently available and reliable links to externally referenced resources.
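The “EventStreams” service mentioned above delivers events over Server-Sent Events (SSE), so a consumer mostly needs to parse `data:` lines and pick out newly added external URLs to archive. The sketch below is not the Internet Archive's pipeline: the function names are hypothetical, and the event fields `added_links`, `link` and `external` are assumptions about the payload shape that should be checked against the stream's published schema.

```python
import json
from typing import Iterable, Iterator, List


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per SSE event.

    An event is a run of `data:` lines terminated by a blank line;
    other fields (`event:`, `id:`) are ignored in this sketch.
    """
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips a single leading space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data_parts:
            yield json.loads("\n".join(data_parts))
            data_parts = []


def added_external_links(event: dict) -> List[str]:
    """Return external URLs added by an edit (field names are assumed)."""
    return [
        item["link"]
        for item in event.get("added_links", [])
        if item.get("external")
    ]
```

In use, `lines` would be the decoded lines of a long-lived HTTP response from the stream, and each URL returned by `added_external_links` would be queued for capture.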

111 thoughts on “More than 9 million broken links on Wikipedia are now rescued”

  6. pvoberstein

    You’re doing God’s[12] work here![12][13][15]

    20 million URLs a _week_ is far more than I thought were being ingested. Very impressive!

    > We will investigate and experiment with methods to support authors and editors use of archived resources (e.g. using Wayback Machine links in place of live-web links).

    On a more serious note, I believe that that may require some change in editorial culture. I’m generally in the habit of always adding the “|archive-url” parameter, but I’ve actually had those URLs _removed_ by other editors, who have stated that they don’t generally add archive links _until after_ the cited page is 404’d. It’d be nice if we could be as proactive/anticipatory as possible.

  10. Nemo

    Excellent! (Also, click-tracking is a rather invasive investigation technique, but good to know about all those clicks.)

    I’m very interested in «We will explore how we might expand our link checking and fixing efforts to other media and formats, including […] academic papers». OAbot is already able to identify broken links within the journal citation templates, but it doesn’t do anything about it yet.
    https://phabricator.wikimedia.org/T196255

  15. Judith Anderson

    I have inserted links in Wiki pages that I know are subject to drift, because I’m generating some of that drift. I’m referring to sites such as “Trove” on the Australian National Library site. The OCR interpretations of newspaper pages are being compared to the original scans by volunteers and edited. The originals are part of the site and can be found, but there isn’t any indication with the link that parts of the linked material are subject to drift. Have you considered adding “boilerplate” language that will tell readers, or is that the intent of the “retrieved on day/month/year” note?

  21. For Old Hack

    Thank you. As a former editor, I have fixed many links, first of course, looking into the internet archive.

    Thank you for the Internet Archive; it’s a wholly more useful tool than Wikipedia, because it reflects history like a mirror, not like a shouting match.

  80. Assignment Writing UK

    The Wayback Machine is basically a tool for Internet age as a dictionary, but it is more than an Internet age tool. You can use it as a research tool for your assignment writing. During an election year, if you have an assignment to write “What did the candidates say about health care 10 years ago, and how does that compare to now” with the Wayback Machine, you can find out rather easily. It is much harder to hide.
