More than 9 million broken links on Wikipedia are now rescued

As part of the Internet Archive’s aim to build a better Web, we have been working to make the Web more reliable — and are pleased to announce that 9 million formerly broken links on Wikipedia now work because they go to archived versions in the Wayback Machine.

22 Wikipedia Language Editions with more than 9 million links now pointing to the Wayback Machine.

For more than 5 years, the Internet Archive has been archiving nearly every URL referenced in close to 300 Wikipedia sites as soon as those links are added or changed, at a rate of about 20 million URLs per week.

And for the past 3 years, we have been running a software robot called IABot on 22 Wikipedia language editions, looking for broken links (URLs that return a ‘404’, or ‘Page Not Found’). When it discovers broken links, IABot searches the Wayback Machine and other web archives for archived copies to replace them. Restoring links ensures Wikipedia remains accurate and verifiable and thus meets one of Wikipedia’s three core content policies: ‘Verifiability’.
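The check-and-replace loop described above can be sketched roughly as follows. This is a minimal illustration, not IABot's actual code: the names `wayback_url` and `rescue_links`, and the injected `is_dead` and `find_snapshot` callables, are hypothetical. The only detail taken from the Wayback Machine itself is its public URL pattern, `https://web.archive.org/web/<timestamp>/<original URL>`.

```python
from typing import Callable, Dict, Iterable, Optional


def wayback_url(original_url: str, timestamp: str) -> str:
    """Build a Wayback Machine URL for a capture taken at `timestamp`."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


def rescue_links(
    urls: Iterable[str],
    is_dead: Callable[[str], bool],
    find_snapshot: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Map each dead URL that has a known capture to an archived replacement.

    `is_dead` and `find_snapshot` are hypothetical stand-ins for a live-web
    check (for example, "does this URL return 404?") and an archive lookup
    that returns a capture timestamp, or None when no capture is known.
    """
    replacements: Dict[str, str] = {}
    for url in urls:
        if not is_dead(url):
            continue  # the link still works, so leave it alone
        timestamp = find_snapshot(url)
        if timestamp is None:
            continue  # no archive found, so there is nothing to swap in
        replacements[url] = wayback_url(url, timestamp)
    return replacements


# Toy data standing in for real HTTP checks and archive queries.
statuses = {
    "http://example.com/a": 200,
    "http://example.com/b": 404,
    "http://example.com/c": 404,
}
snapshots = {"http://example.com/b": "20180101000000"}

fixed = rescue_links(
    statuses,
    is_dead=lambda u: statuses[u] == 404,
    find_snapshot=snapshots.get,
)
print(fixed)
```

Passing the checks in as functions keeps the sketch runnable offline. A real bot would replace them with HTTP requests and archive queries, and would need rate limiting and retries that are omitted here.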

To date we have successfully used IABot to edit and “fix” the URLs of nearly 6 million external references that would have otherwise returned a 404. In addition, members of the Wikipedia community have fixed more than 3 million links individually. Now more than 9 million URLs, on 22 Wikipedia sites, point to archived resources from the Wayback Machine and other web archive providers.

 

 

[Images: Broken Link (left), Rescued Page (right)]

One way to measure the real-world benefit of this work is by counting the number of click-throughs from Wikipedia to the Wayback Machine. During a recent 10-day period, the Wikimedia Foundation started measuring external link click-throughs, as part of a new research project (in collaboration with a team of researchers at Stanford and EPFL) to study how Wikipedia readers use citations and external links. Preliminary results suggest that, by far, the most popular external destination was the Wayback Machine, with three times as many clicks as the next most popular site, books.google.com. In real numbers, on average, more than 25,000 clicks per day were made from the English Wikipedia to the Wayback Machine.

From “Research:Characterizing Wikipedia Citation Usage/First Round of Analysis”

Running IABot on a given Wikipedia site requires technical integration and operations support, as well as the approval of each related Wikipedia community. Two key people have worked on this project.

Maximilian Doerr, known in the Wikipedia world as “Cyberpower”, is a longtime volunteer with the Wikipedia community and now a consultant to the Internet Archive. He is the author of the InternetArchiveBot (IABot) software.

Stephen Balbach is a longtime volunteer with the Wikipedia community who collaborates with Max and the Internet Archive. He has authored programs that find and fix data errors, verify existing archives on Wikipedia, and discover new archives amongst Wayback’s billions of pages and across dozens of other web archive providers.

The number of rescued links and the quality of the edits are the result of Max and Stephen’s dedicated, creative and patient work.

What have we learned?

We learned that links to resources on the live web are fragile and not a persistently reliable way to refer to those resources. See “49% of the Links Cited in Supreme Court Decisions Are Broken”, The Atlantic, 2013.

We learned that archiving live-web linked resources as close as possible to the time they are linked is required to ensure we capture those links before they go bad.

We learned that the issue of “link rot” (when once-good links return a 404, 500 or other complete failure) is only part of the problem, and that “content drift” (when the content related to a URL changes over time) is also a concern. In fact, “content drift” may be a bigger problem for reliably using external resources because there is no way for the user to know that the content they are looking at is not what the editor originally intended.
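The difference between the two failure modes can be made concrete with a small sketch. It assumes, purely for illustration, that a checker has the HTTP status of the live page and a hash of the page body recorded when the citation was added. The names `content_hash` and `classify_link` are hypothetical, and an exact hash comparison is a deliberate simplification: real pages change in trivial ways (ads, timestamps) that a practical drift detector would need to ignore.

```python
import hashlib


def content_hash(body: str) -> str:
    """Return the SHA-256 hex digest of a page body."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def classify_link(status: int, live_body: str, cited_hash: str) -> str:
    """Classify a cited link as 'link rot', 'content drift', or 'ok'."""
    if status >= 400:
        # 404, 500 and similar: the page cannot be retrieved at all.
        return "link rot"
    if content_hash(live_body) != cited_hash:
        # The page loads, but it is no longer what the editor cited.
        return "content drift"
    return "ok"


# Toy example: the hash recorded when the citation was added.
original = "Population: 1,000"
cited = content_hash(original)

print(classify_link(404, "", cited))
print(classify_link(200, "Population: 2,500", cited))
print(classify_link(200, original, cited))
```

Link rot is visible from the status code alone, while content drift only shows up when there is a record of what the page used to say. That is why archiving close to the time of citation matters.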

We learned that by working in collaboration with staff members of the Wikimedia Foundation, volunteers from the Wikipedia communities, paid contractors and the archived resources of the Wayback Machine and other web archives, we can have a material impact on the quality and reliability of Wikipedia sites and in so doing support our mission of “helping to make the web more useful and reliable”.

What is next?

We will expand our efforts to check and edit more Wikipedia sites and increase the speed at which we scan those sites and fix broken links.

We will improve our processes to archive externally referenced resources by taking advantage of the Wikimedia Foundation’s new “EventStreams” web service.

We will explore how we might expand our link checking and fixing efforts to other media and formats, including more web pages, digital books and academic papers.

We will investigate and experiment with methods to support authors’ and editors’ use of archived resources (e.g. using Wayback Machine links in place of live-web links).

We will continue to work with the Wikimedia Foundation, and the Wikipedia communities world-wide, to advance tools and services to promote and support the use of persistently available and reliable links to externally referenced resources.

111 Responses to More than 9 million broken links on Wikipedia are now rescued

  6. pvoberstein says:

    You’re doing God’s[12] work here![12][13][15]

    20 million URLs a _week_ is far more than I thought were being ingested. Very impressive!

    > We will investigate and experiment with methods to support authors and editors use of archived resources (e.g. using Wayback Machine links in place of live-web links).

    On a more serious note, I believe that that may require some change in editorial culture. I’m generally in the habit of always adding the “|archive-url” parameter, but I’ve actually had those URLs _removed_ by other editors, who have stated that they don’t generally add archive links _until after_ the cited page is 404’d. It’d be nice if we could be as proactive/anticipatory as possible.

  7. Adam says:

    Talk about being a fixer.

  8. Wallaroo says:

    Thank you!

  12. Nemo says:

    Excellent! (Also, click-tracking is a rather invasive investigation technique, but good to know about all those clicks.)

    I’m very interested in «We will explore how we might expand our link checking and fixing efforts to other media and formats, including […] academic papers». OAbot is already able to identify broken links within the journal citation templates, but it doesn’t do anything about it yet.
    https://phabricator.wikimedia.org/T196255

  17. .mau. says:

    Awesome!

  18. Die Jones says:

    Thank you

  19. Judith Anderson says:

    I have inserted links in Wiki pages that I know are subject to drift, because I’m generating some of that drift. I’m referring to sites such as “Trove” on the Australian National Library site. The OCR interpretations of newspaper pages are being compared to the original scans by volunteers and edited. The originals are part of the site and can be found, but there isn’t any indication with the link that parts of the linked material are subject to drift. Have you considered adding “boiler plate” language that will tell readers, or is that the intent of the “retrieved on day/month/year” note?

  25. For Old Hack says:

    Thank you. As a former editor, I have fixed many links, first of course, looking into the internet archive.

    Thank you for the internet archive, it’s a wholly more useful tool than Wikipedia, because it reflects history like a mirror, not like a shouting match.

  29. something something says:

    Absolutely brilliant, really appreciate what you’re doing here.

  54. treebeardie says:

    Good work. Thanks.

  86. The Wayback Machine is basically a tool for Internet age as a dictionary, but it is more than an Internet age tool. You can use it as a research tool for your assignment writing. During an election year, if you have an assignment to write “What did the candidates say about health care 10 years ago, and how does that compare to now” with the Wayback Machine, you can find out rather easily. It is much harder to hide.

  87. Vipin says:

    This is absolutely awesome stuff.

    That is the reason I just love archive . org

    You guys are simply the best.

  100. Ace says:

    very cool to see this rescue! wonder if this was part of the fund the pineapple fund guy who donated 1 million in bitcoin to archive.org and now we see such a big move! nicely done

Comments are closed.