
How to Find Broken Links and Email Yourself a 404 Not Found Report


How to create an automated tool for checking broken links to improve UI and PageRank.




Last week we showed you how to use the 404 Error Scanner microservice to crawl through every page and find broken links and bad URLs on your website.

We wanted to take this a step further and turn the raw JSON response into a nicely formatted email report. This report can be sent to our developer, content, and marketing teams to help ensure that broken links get fixed. The emailed report lists each broken link and the page it was found on.

This broken link report will be useful for keeping tabs on redirects you've previously set up, and catching the ones that no longer resolve properly.

Improve your SEO rankings by 301-redirecting broken URLs to relevant pages on your site, pointing users, and Google, in the right direction. Otherwise, internal links that point to a 404 waste PageRank, and a page that keeps returning 404 can eventually be dropped from Google's index.

So here's a quick Python script for creating a broken link email report, using Algorithmia's 404 Error Scanner algorithm and Mailgun. Try running it every few weeks to keep tabs on your site's link health.

Step 1: Install the Algorithmia Client

We're doing this tutorial in Python, but this microservice could easily be built using any of the supported clients, like JavaScript, Ruby, Java, cURL, etc. Check out the Python client guide for more information on calling the Algorithmia API.

Install the client, along with Requests (which we'll use to call Mailgun), from PyPI: pip install algorithmia requests

You’ll also need a free Algorithmia account, which includes 5,000 free credits a month. Sign up, and then grab your API key.

Step 2: Create a Free Mailgun Account

To email our report, we're going to use Mailgun, an email service provider with an API built for developers. Their free tier will cover the number of reports we plan to send every month, so sign up for a free account.

Once you're signed up, grab both the base API URL and your API key from your Mailgun dashboard.
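Pasting keys into the script works for a quick test, but if you plan to run it on a schedule you may prefer to read them from the environment. A minimal sketch, assuming you export the keys under the hypothetical variable names ALGORITHMIA_API_KEY, MAILGUN_API_KEY and MAILGUN_DOMAIN (pick whatever names suit your setup):

```python
import os

# Hypothetical variable names; rename to match your deployment.
REQUIRED_VARS = ("ALGORITHMIA_API_KEY", "MAILGUN_API_KEY", "MAILGUN_DOMAIN")


def load_credentials(env=None):
    """Return the API keys and Mailgun domain as a dict.

    Reads from os.environ by default; pass a dict to override (handy in tests).
    Raises RuntimeError naming every variable that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

You could then pass creds["ALGORITHMIA_API_KEY"] to Algorithmia.client() and use creds["MAILGUN_DOMAIN"] in the Mailgun URL in place of the hard-coded placeholders.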

Step 3: Check for Broken Links

Create a new Python file (we'll call it 404.py). We're going to import Algorithmia and Requests at the top. Requests will be used to POST our message to Mailgun over HTTP, rather than dealing with SMTP.

Next, we create the Algorithmia client and our input. Replace ALGORITHMIA_API_KEY with your key. Then set the URL of the site you want to check for broken links.

To call the microservice, we pipe our input into the web/ErrorScanner algorithm.

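import Algorithmia
import requests

# Algorithmia API key here
client = Algorithmia.client("ALGORITHMIA_API_KEY")

# The site to scan and the crawl depth
example_input = {
  "url": "http://algorithmia.com/",
  "depth": 3
}

# Allow up to 2000 seconds for the crawl (the default is 5 minutes)
res = client.algo("web/ErrorScanner").set_options(timeout=2000).pipe(example_input)

# A list of {"brokenLink": ..., "refPage": ...} pairs
broken_links = res.result["brokenLinks"]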

We're also going to set a timeout. By default, the timeout is 5 minutes, but the 404 Error Scanner algorithm first has to crawl all the pages on your site using the Site Mapper algorithm. This can take a long time if you have a large site.

We'll set the timeout to 2,000 seconds (about 33 minutes) just to be safe.
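With a crawl depth of 3, the same broken URL may be reported once for every page that links to it. If you'd rather list each broken URL once with all of its referring pages, a grouping step could run before formatting. This is a sketch, not part of the tutorial's script; group_by_broken_link is a hypothetical helper that assumes each item carries the brokenLink and refPage keys the script reads:

```python
from collections import defaultdict


def group_by_broken_link(broken_links):
    """Map each broken URL to a sorted list of the pages that link to it.

    Assumes items shaped like {"brokenLink": ..., "refPage": ...}.
    Duplicate (link, page) pairs collapse into one entry.
    """
    grouped = defaultdict(set)
    for pair in broken_links:
        grouped[pair["brokenLink"]].add(pair["refPage"])
    # Sort both the broken URLs and their referring pages for a stable report.
    return {link: sorted(pages) for link, pages in sorted(grouped.items())}
```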

Step 4: Send Broken Link Report

Now we’re ready to format the results and send our email.

We need to iterate through our list of broken links from the previous step. We'll use a for loop to create a series of strings in this format:

broken link: {brokenLink} (referring page: {refPage})

Each string is then appended to email_str.
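Repeated += works fine for a short report; building a list and joining it is the more idiomatic Python pattern. A sketch of a hypothetical format_report() helper that produces the same lines (unlike the loop, it leaves no trailing blank line after the last entry):

```python
def format_report(broken_links):
    """Return the email body: one line per broken link, blank line between."""
    # Same "broken link: ... (referring page: ...)" format as the tutorial.
    lines = [
        f"broken link: {pair['brokenLink']} (referring page: {pair['refPage']})"
        for pair in broken_links
    ]
    # "\n\n" between entries gives the same spacing the loop produces.
    return "\n\n".join(lines)
```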

We're going to keep this simple and copy the example API call from Mailgun's quickstart guide. Replace YOUR_DOMAIN_NAME and YOUR_API_KEY with the values from your Mailgun dashboard, and set the recipient addresses.

email_str = ""

# Iterate through our list of broken links
# to create a string we can add to the body of our email
for linkPair in broken_links:
    email_str += "broken link: " + linkPair["brokenLink"] + " (referring page: " + linkPair["refPage"] + ")" + "\n\n"

# Print the report body so you can check it before it's sent
print(email_str)

# Send the email
def send_simple_message():
    return requests.post(
        # Mailgun Documentation: https://documentation.mailgun.com/quickstart-sending.html#send-via-api
        "https://api.mailgun.net/v3/YOUR_DOMAIN_NAME/messages",
        auth=("api", "YOUR_API_KEY"),
        data={"from": "Excited User <mailgun@YOUR_DOMAIN_NAME>",
              "to": ["bar@example.com", "YOU@YOUR_DOMAIN_NAME"],
              "subject": "Hello",
              "text": email_str})

# Print the response from Mailgun (e.g. <Response [200]>)
print(send_simple_message())

This is also where you'd set the From, To, and Subject of your email. The body of the message is the email_str we built in the previous step.
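If you'd rather not depend on Requests, the same POST can be built with the standard library. This sketch uses a hypothetical build_mailgun_request() helper to construct the request for Mailgun's messages endpoint (HTTP Basic auth with the user api, plus from/to/subject/text form fields) without sending it:

```python
import base64
import urllib.parse
import urllib.request


def build_mailgun_request(domain, api_key, sender, recipients, subject, text):
    """Build, but do not send, a POST to Mailgun's messages endpoint."""
    url = f"https://api.mailgun.net/v3/{domain}/messages"
    body = urllib.parse.urlencode(
        {"from": sender, "to": recipients, "subject": subject, "text": text},
        doseq=True,  # expand the recipients list into repeated "to" fields
    ).encode("utf-8")
    # HTTP Basic auth: username "api", password = your Mailgun API key.
    token = base64.b64encode(f"api:{api_key}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request
```

Passing the result to urllib.request.urlopen() would send it; a non-2xx response raises urllib.error.HTTPError.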

Step 5: Running Your Broken Link Report

Okay, let’s run this. Fire up the command line and type: python 404.py

The script will run, first calling the error scanner algorithm from Algorithmia to find all the broken links on the site. Then, it'll format all the broken links into strings. Finally, we call our send_simple_message() function, which makes a POST request to Mailgun to send the report.
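On a healthy site the scan may come back empty, and an empty report isn't worth an email. A sketch of one way to guard against that; run_report and its scan, send and format_body parameters are hypothetical hooks you would wire to the Algorithmia call, a Mailgun send function that takes the body as an argument, and the string-building loop:

```python
def run_report(scan, send, format_body):
    """Scan the site and email a report only if something is broken.

    scan() returns a list of broken-link pairs, format_body(links) returns
    the email text, and send(text) delivers it. Returns the link count.
    """
    broken_links = scan()
    if not broken_links:
        print("No broken links found; skipping email.")
        return 0
    send(format_body(broken_links))
    return len(broken_links)
```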

Go check your email. Voilà!


Conclusion

That was a simple way of creating a nicely formatted broken link report email. From here, you could deploy this script to Heroku and use the Scheduler add-on to run it automatically. Scheduler's longest interval is daily, so for a weekly or monthly report, have the script exit early on the other days.
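If your scheduler only offers a daily interval, one way to get a weekly report is to check the date when the script starts and exit early on other days. A minimal sketch; is_report_day is a hypothetical helper using Python's weekday() numbering (Monday is 0, Sunday is 6):

```python
import datetime


def is_report_day(today=None, weekday=0):
    """Return True if `today` (default: the current date) falls on `weekday`."""
    today = today or datetime.date.today()
    return today.weekday() == weekday
```

At the top of 404.py you might add `if not is_report_day(): sys.exit(0)`; for a monthly report, checking `datetime.date.today().day == 1` would work the same way.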

Or, consider using AWS Lambda and have it run whenever your team publishes a new blog post, or deploys front-end changes to your site. For more, check out the Algorithmia AWS Lambda blueprint.

Get the full 404 Error Scanner recipe on GitHub

import Algorithmia
import requests

# Algorithmia API key here
client = Algorithmia.client("ALGORITHMIA_API_KEY")

example_input = {
  "url": "http://algorithmia.com/",
  "depth": 3
}

# Allow up to 2000 seconds for the crawl (the default is 5 minutes)
res = client.algo("web/ErrorScanner").set_options(timeout=2000).pipe(example_input)

broken_links = res.result["brokenLinks"]

email_str = ""

# Iterate through our list of broken links
# to create a string we can add to the body of our email
for linkPair in broken_links:
    email_str += "broken link: " + linkPair["brokenLink"] + " (referring page: " + linkPair["refPage"] + ")" + "\n\n"

# Print the report body so you can check it before it's sent
print(email_str)

# Send the email
def send_simple_message():
    return requests.post(
        # Mailgun Documentation: https://documentation.mailgun.com/quickstart-sending.html#send-via-api
        "https://api.mailgun.net/v3/YOUR_DOMAIN_NAME/messages",
        auth=("api", "YOUR_API_KEY"),
        data={"from": "Excited User <mailgun@YOUR_DOMAIN_NAME>",
              "to": ["bar@example.com", "YOU@YOUR_DOMAIN_NAME"],
              "subject": "Hello",
              "text": email_str})

# Print the response from Mailgun (e.g. <Response [200]>)
print(send_simple_message())


Topics:
seo, marketing, 404 error, reporting tools, algorithm

Published at DZone with permission of Matt Kiser, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
