DZone

Muffet: Quick and Easy Link Checking

If you need a way to check the integrity of the links on your site, then we have a solution for you. Read on for the details.

by Lorna Mitchell · Sep. 07, 18 · Tutorial

In my not-so-new job I work on Nexmo's developer portal, and that means a lot of documents, a lot of links, and just a lot to keep track of! One thing I worry about is changing something and breaking links from somewhere else. I wanted a way to list the existing links, find the broken ones, and include internal links like http://example.com/home#something as well, since all our titles are linkable in that way.

Enter: muffet.

This is a brilliant and easy tool. These notes are mostly for my own reference, as I had to figure a few things out as I went along.

Finding Broken Links

This tool can spider through your site, follow all the links, and show you any that are broken (including those internal links, unless you specifically turn that off). It caches the results, so it's not hitting that cookies policy linked from your footer for every page it checks!

My command:

muffet -c 4 --exclude linkedin [url] | tee links.txt 

Setting the concurrency very low seemed to help get through the link checking without issue when running on my laptop. I'm really not sure what the right settings are here but I had success with this one.

I'm excluding LinkedIn here because we link it on every page and it returns a status code 999 to spiders.

Tools are everything: I'll give a shout out to tee, a utility that both outputs to the terminal and writes the same output to a file. The file lists each page the tool checked, followed by a list of links and their status codes. Once I had the file, I could work with grep to find particular patterns of links I was interested in. Also, if there's something showing up that you don't care about, grep -v [pattern] will exclude it from your grepped results.
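The grep workflow described above can be sketched roughly as follows. The sample file is invented for illustration and only approximates the output described here (a page URL followed by indented lines holding a status code and a link); the URLs and the patterns are hypothetical, and real muffet output may look different.

```shell
# Hypothetical stand-in for links.txt; real muffet output may differ.
links_file="$(mktemp)"
{
  printf 'https://example.com/\n'
  printf '\t404\thttps://example.com/docs/old-page\n'
  printf '\t500\thttps://example.com/api/status\n'
  printf 'https://example.com/about\n'
  printf '\t404\thttps://example.com/docs/missing\n'
  printf '\t404\thttps://example.com/images/logo.png\n'
} > "$links_file"

# Keep only the lines that match a pattern of interest.
docs_links="$(grep 'docs' "$links_file")"

# Exclude lines you do not care about, here anything mentioning images.
without_images="$(grep -v 'images' "$links_file")"

printf 'Links matching docs:\n%s\n\n' "$docs_links"
printf 'Everything except images:\n%s\n' "$without_images"
```

The two filters can also be chained, for example grep 'docs' links.txt | grep -v 'old-page', to narrow the list step by step.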

I also loved using wc -l links.txt to get an immediate sense of how many errors we have. It's not an accurate count, because the file includes the page titles as well as the failed links, but it gives you a sense of scale.
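If a closer count is useful, one possible approach is to count only the link lines. This is a sketch under an assumption that is not confirmed by the text above: that link lines start with whitespace while page lines do not. The sample data is hypothetical, so check the pattern against your own links.txt before relying on it.

```shell
# Hypothetical stand-in for links.txt; real muffet output may differ.
links_file="$(mktemp)"
{
  printf 'https://example.com/\n'
  printf '\t404\thttps://example.com/docs/old-page\n'
  printf '\t500\thttps://example.com/api/status\n'
  printf 'https://example.com/about\n'
  printf '\t404\thttps://example.com/docs/missing\n'
  printf '\t404\thttps://example.com/images/logo.png\n'
} > "$links_file"

# wc -l counts every line, including the page lines.
total_lines="$(wc -l < "$links_file")"

# Count only lines that begin with whitespace (assumed to be the failed links).
failed_links="$(grep -c '^[[:space:]]' "$links_file")"

echo "all lines: $total_lines, failed links: $failed_links"
```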

Identifying Links to One Site from Another

Like most organizations, we have more than one website and it can be easy to miss when a change in one would cause a broken link on the other. For this I used muffet's -v switch to show me ALL links, not just the broken ones.

muffet -v -c 4 --exclude linkedin [url] | tee all-links.txt 

This shows all the links and enables me to build a map of the links from one site. Then I take the file and look at just the ones I'm interested in (the ones on that developer portal I mentioned) with a command like this:

grep "developer.nexmo.com" all-links.txt | sort | uniq 

And now I can see everything that links into the site and that I should be mindful of (or that we already broke, oops! Luckily there weren't many of those).
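A small variation on the command above can also show how often each target is linked. This is only a sketch: the sample all-links.txt content and the URL paths in it are hypothetical, and the layout of real verbose output may differ. It uses grep -F so the dots in the hostname are matched literally rather than as regular-expression wildcards.

```shell
# Hypothetical stand-in for all-links.txt; real muffet -v output may differ.
all_links_file="$(mktemp)"
{
  printf 'https://www.example.com/\n'
  printf '\t200\thttps://developer.nexmo.com/api\n'
  printf '\t200\thttps://www.example.com/pricing\n'
  printf 'https://www.example.com/pricing\n'
  printf '\t200\thttps://developer.nexmo.com/api\n'
  printf '\t200\thttps://developer.nexmo.com/messaging/sms/overview\n'
} > "$all_links_file"

# Unique matching lines; sort -u does the same job as sort | uniq.
unique_targets="$(grep -F 'developer.nexmo.com' "$all_links_file" | sort -u)"

# Count how many times each target appears, most-linked first.
link_counts="$(grep -F 'developer.nexmo.com' "$all_links_file" | sort | uniq -c | sort -rn)"

printf 'Unique targets:\n%s\n\n' "$unique_targets"
printf 'Link counts:\n%s\n' "$link_counts"
```

The counts give a rough idea of which pages on the portal would cause the most breakage elsewhere if they moved.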

Hopefully if you have a similar requirement, this tool could help you too. I'm not sure I'd run it as a build step as it takes a long time but I'm considering scheduling it to do a regular check on sites. I'd be interested to hear how others are using this tool too.

Published at DZone with permission of Lorna Mitchell, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
