Case Study: Broken Link Checker
How to check a website for broken links using Selenium WebDriver
--
While Selenium WebDriver is mainly used for web test automation, remember that Selenium is an automation framework, so it can be used for other automation tasks beyond testing. In this case study, I will show a simple way to scrape a website’s links and check whether they are broken, using Selenium WebDriver with C#.
This article goes a small step beyond scraping and, with a little programming know-how, checks specifically for broken links.
Note: This will be a simple unoptimised script just for demonstration purposes.
Test Design
1. Create a list of URLs to check (to_check_list), starting with a site’s home URL.
2. Retrieve one URL from the to_check_list.
3. Check whether that URL is broken.
4. If it is a valid URL, visit the URL using Selenium WebDriver.
5. Get a list of hyperlinks from the new page, and add them to the to_check_list (unless they are already there).
6. Repeat Steps 2–5 until all URLs in the to_check_list have been checked.
7. Print out all the broken links.
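The design above can be sketched in C# without a browser. This is only a rough illustration under stated assumptions: isBroken and getLinks are hypothetical hooks (not Selenium APIs) standing in for the link check and the page scrape, the HashSet named seen is an added convenience for the "unless already there" rule, and the three-page site in Main is made up.

```csharp
using System;
using System.Collections.Generic;

public static class LinkCrawlSketch
{
    // Walks the to-check list with an index, following the numbered design steps.
    // isBroken and getLinks are hypothetical hooks supplied by the caller.
    public static List<string> FindBrokenLinks(
        string homeUrl,
        Func<string, bool> isBroken,
        Func<string, IEnumerable<string>> getLinks)
    {
        var urls = new List<string> { homeUrl };    // step 1: start with the home URL
        var seen = new HashSet<string>(urls);       // assumption: fast "already there" lookup
        var brokenLinks = new List<string>();
        int pointer = 0;                            // index of the URL being checked

        while (pointer < urls.Count)                // step 6: repeat until all are checked
        {
            string url = urls[pointer++];           // step 2: take the next URL
            if (isBroken(url))                      // step 3: record broken URLs, do not visit
            {
                brokenLinks.Add(url);
                continue;
            }
            foreach (string link in getLinks(url))  // steps 4-5: visit and collect links
            {
                if (seen.Add(link)) urls.Add(link); // only queue links not seen before
            }
        }
        return brokenLinks;                         // step 7: caller prints these
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up site map; "/missing" has no entry, so it counts as broken.
        var site = new Dictionary<string, string[]>
        {
            ["/"] = new[] { "/about", "/missing" },
            ["/about"] = new[] { "/", "/contact" },
            ["/contact"] = new string[0],
        };
        var broken = LinkCrawlSketch.FindBrokenLinks(
            "/", url => !site.ContainsKey(url), url => site[url]);
        Console.WriteLine(string.Join(", ", broken)); // prints "/missing"
    }
}
```

In a real script, getLinks would typically be backed by the browser (for example driver.FindElements(By.TagName("a")) and reading each element’s href attribute), and isBroken by an HTTP request to the URL. Because new links are appended while the pointer only moves forward, the same list serves as both the queue and the record of what has been visited.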
For this example, I tried the script on two practice sites: https://travel.agileway.net and https://whenwise.agileway.net.
Test Steps
1. Create a list of URLs to check — starting with the home URL
Simply create a list, urls, containing just the home URL. I’ve also created an empty list, brokenLinks, to collect the broken URLs found in later steps. The variable pointer keeps track of the current URL (its index in urls); every entry before it has already been visited.
List<string> urls = new List<string>(); // URLs to check
urls.Add(site_host); // site_host holds the home URL, e.g. "https://travel.agileway.net"
List<string> brokenLinks = new List<string>();
int pointer = 0; // the index of "urls" we are currently checking
2. Retrieve one URL from the “to check list”
We will use the index pointer to get the URL, so retrieving the URL is easy:
while(urls.Count > pointer) {
var url =…