One of my pet peeves about technical documentation is dead links. There is nothing worse than having the answer to your question be a link that leads to the dreaded 404. Unfortunately, this is quite a common situation. Without some kind of automated link checking, any documentation that links to other resources will eventually include stale links.
Fortunately, checking for dead links does not have to be difficult. Iridium is a free and open source end-to-end testing tool, and when combined with a CI server like Travis CI, it can be turned into a comprehensive automated link checking solution.
In this article, I will show you how to take advantage of the scheduling functionality in Travis CI and the power of Iridium to build an automated link checking solution.
First up, we need a Travis CI settings file that can be used to execute Iridium. I’ve documented the process of building this configuration in a previous DZone article, so I won’t go into much more detail here.
sudo: required
dist: trusty
language: java
jdk:
  - oraclejdk8
addons:
  firefox: "56.0"
before_install:
  - export CHROME_BIN=/usr/bin/google-chrome
  - sudo apt-get update
  - sudo apt-get install -y libappindicator1 fonts-liberation
  - wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
  - sudo dpkg -i google-chrome*.deb
  - export DISPLAY=:99.0
  - sh -e /etc/init.d/xvfb start
  - wget https://s3.amazonaws.com/iridium-release/IridiumApplicationTesting.jar
script: >
  java
  -DtestSource=https://raw.githubusercontent.com/mcasperson/IridiumLinkCheck/master/linkcheck.feature
  -DtestDestination=Chrome
  -DfailAllAfterFirstScenarioError=false
  -jar IridiumApplicationTesting.jar
The only thing I want to point out about this configuration file is that we are making use of the failAllAfterFirstScenarioError system property when launching Iridium. By setting this to false, we instruct Iridium to run all the scenarios regardless of whether any of them fail. This ensures all the web pages we want to check will be examined.
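If you want to try the same run on your own machine before handing it over to Travis CI, the command can be assembled from the system properties shown in the configuration above. This is only a sketch: the helper name `build_iridium_command` is my own invention, and it assumes `java` is on your PATH and that the Iridium JAR has already been downloaded into the working directory.

```python
import shlex

# Values copied from the Travis CI configuration above.
TEST_SOURCE = (
    "https://raw.githubusercontent.com/mcasperson/"
    "IridiumLinkCheck/master/linkcheck.feature"
)
JAR = "IridiumApplicationTesting.jar"  # assumed to be in the working directory


def build_iridium_command(test_source, browser="Chrome", fail_fast=False, jar=JAR):
    """Return the java command line as a list of arguments (hypothetical helper)."""
    properties = {
        "testSource": test_source,
        "testDestination": browser,
        # false means every scenario runs even if an earlier one failed
        "failAllAfterFirstScenarioError": str(fail_fast).lower(),
    }
    args = ["java"]
    args += [f"-D{name}={value}" for name, value in properties.items()]
    args += ["-jar", jar]
    return args


if __name__ == "__main__":
    command = build_iridium_command(TEST_SOURCE)
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(command))
    # To actually launch the run, uncomment the next two lines:
    # import subprocess
    # subprocess.run(command, check=True)
```

Keeping the properties in a dictionary makes it easy to flip `failAllAfterFirstScenarioError` while experimenting without retyping the whole command.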
Next, we need an Iridium test script that will perform the link checking. From Iridium’s point of view, link checking involves opening a page, opening all the links on that page, and checking to see if any of the HTTP requests failed.
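To make the idea concrete, here is a rough sketch of the same three steps in plain Python. This is not how Iridium works internally: Iridium drives a real browser and watches every request the page makes, while this simplified version only follows `<a href>` links with the standard library. All of the names here (`LinkCollector`, `extract_links`, `fetch_status`, `find_dead_links`) are made up for the illustration.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects the target of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip in-page anchors and non-HTTP schemes.
        if not href or href.startswith(("#", "mailto:", "javascript:")):
            return
        self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Step 1 and 2: read the page and list the links on it."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def fetch_status(url):
    """Return the HTTP status code, or None if the request never completed."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=30) as response:
            return response.status
    except HTTPError as error:
        return error.code
    except OSError:
        return None


def find_dead_links(html, base_url, status_of=fetch_status):
    """Step 3: return the links that failed or answered with 400 or above."""
    dead = []
    for link in extract_links(html, base_url):
        status = status_of(link)
        if status is None or status >= 400:
            dead.append(link)
    return dead
```

The `status_of` parameter exists so the function can be exercised without touching the network, for example by passing in the `get` method of a dictionary of canned status codes.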
In this example, I am checking some documentation I wrote for my day job at Octopus Deploy.
This documentation links out to a number of external sites, and those sites have a number of issues, mostly to do with missing images. I’m not overly concerned about those images not loading (nor could I do anything about it anyway), so I’ve added steps that block access to the images and other resources and respond with an HTTP 201, so that the requests appear to have succeeded.
This feature makes use of the Cucumber Scenario Outline functionality to rerun a Scenario multiple times with different inputs. This is a convenient way to test all the documentation pages I am responsible for without copying and pasting scenarios.
Feature: Check for dead links

  Scenario Outline: Launch App
    # There are some page requests that we are not interested in verifying.
    # These requests are blocked with a 201 to make them seem like they were successful.
    Given I block access to the URL regex ".*?realtime\.services\.disqus\.com.*" with response "201"
    And I block access to the URL regex ".*?semver\.org.*" with response "201"
    And I block access to the URL regex ".*?img\.youtube\.com.*" with response "201"
    And I block access to the URL regex ".*?apis\.google\.com.*" with response "201"
    And I block access to the URL regex ".*?accounts\.google\.com.*" with response "201"
    And I block access to the URL regex ".*?favicon\.ico$" with response "201"
    And I block access to the URL regex ".*?pippio.com.*" with response "201"
    And I block access to the URL regex ".*?static.jboss.org.*\.png" with response "201"
    And I block access to the URL regex ".*?www.jboss.org.*?\.png" with response "201"
    And I block access to the URL regex ".*?jbossremoting.jboss.org.*\.png" with response "201"
    And I block access to the URL regex ".*?customer-context-gateway.atlassian.com.*" with response "201"
    And I open the page "<url>"
    Then I open all links in new tabs and then close the tabs
    And I verify that there were no HTTP errors

    Examples:
      | url                                                                      |
      | https://octopus.com/docs/deploying-applications/deploy-java-applications |
      | https://octopus.com/docs/api-and-integration/bamboo/bamboo-plugin        |
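The block rules in the feature above are ordinary regular expressions, so it can be handy to try a pattern against a few sample URLs before committing it. The sketch below does that in Python. It assumes the pattern has to match the whole URL (which is why the patterns start with `.*?`); I have not confirmed that this is exactly how Iridium applies them, so treat it as an approximation. The function name `is_blocked` is hypothetical.

```python
import re

# A few of the patterns from the feature above, copied as written.
BLOCK_PATTERNS = [
    r".*?realtime\.services\.disqus\.com.*",
    r".*?favicon\.ico$",
    r".*?static.jboss.org.*\.png",
]


def is_blocked(url, patterns=BLOCK_PATTERNS):
    """True when any pattern matches the entire URL (assumed matching rule)."""
    return any(re.fullmatch(pattern, url) for pattern in patterns)


if __name__ == "__main__":
    samples = [
        "https://realtime.services.disqus.com/api",
        "https://octopus.com/favicon.ico",
        "https://static.jboss.org/images/logo.png",
        "https://octopus.com/docs/index.html",
    ]
    for url in samples:
        print(url, "blocked" if is_blocked(url) else "checked")
```

One thing this makes easy to spot is that an unescaped `.` matches any character, so a pattern such as `.*?static.jboss.org.*\.png` is slightly more permissive than it looks. For a block list that is usually harmless, but escaping the dots keeps the rule precise.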
The final step is to run this build automatically. Travis CI allows you to schedule a build every day, week, or month. This is perfect, as a daily run of the build allows me to quickly pick up on any broken links.
If there are any errors, you will see output like this in the report:
And I verify that there were no HTTP errors # ValidationStepDefinitions.verifyHttpCodes(String)
  au.com.agic.apptesting.exception.HttpResponseException: The following URLs returned HTTP errors
  https://octopus.com/img/layout/logo-2x.png
    at au.com.agic.apptesting.steps.ValidationStepDefinitions.verifyHttpCodes(ValidationStepDefinitions.java:283)
    at ✽.And I verify that there were no HTTP errors(2822153383616609457webapptester8681345984955767574.feature:18)
This report tells me that the request for https://octopus.com/img/layout/logo-2x.png failed, so this image is missing and should be added to the site.
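If the build log gets long, the failing URLs can be pulled out of it with a few lines of Python. This sketch assumes the message always has the shape shown in the report above, with the URLs listed straight after the text "The following URLs returned HTTP errors" and followed by the stack trace. The function name `failed_urls` is hypothetical.

```python
MARKER = "The following URLs returned HTTP errors"


def failed_urls(report):
    """Return the URLs listed after each occurrence of MARKER in the report text."""
    urls = []
    # Everything before the first marker is ignored; each later chunk starts
    # with the URL list for one failed scenario.
    for chunk in report.split(MARKER)[1:]:
        for token in chunk.split():
            if token.startswith(("http://", "https://")):
                urls.append(token)
            else:
                # The first non-URL token (the "at" of a stack frame) ends the list.
                break
    return urls


if __name__ == "__main__":
    sample = (
        "au.com.agic.apptesting.exception.HttpResponseException: "
        "The following URLs returned HTTP errors\n"
        "https://octopus.com/img/layout/logo-2x.png\n"
        "at au.com.agic.apptesting.steps.ValidationStepDefinitions.verifyHttpCodes"
        "(ValidationStepDefinitions.java:283)"
    )
    for url in failed_urls(sample):
        print(url)
```

Because the text is split on whitespace, the function does not care whether the URLs appear on the same line as the message or on the lines that follow it.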
Now I have a build running every day that will check my documentation for dead links, and I can take advantage of the reporting provided by Travis CI to notify me of any errors. The whole setup is free, and it only took a few minutes to implement.
If you are interested in writing end-to-end tests like the one shown above, check out the Iridium documentation, and take a look at the course on Udemy. Iridium itself is a free and open source project on GitHub.