How to Detect Broken Links Using Playwright

Fri Dec 27 2024 · 10 min read

Broken links lead to poor SEO performance and frustrate users, making their detection a crucial task for automation testers. Here's how testers can use Playwright to detect broken links effectively.

Detecting Broken Links with Playwright

  • Playwright lets testers interact with webpages programmatically, so a short script is enough to find broken links. A broken link typically returns a 4xx or 5xx HTTP status code, such as 404 Not Found.
  • Here’s a step-by-step approach to detecting broken links with Playwright.

1. Fetching All Links from a Page

  • To begin, you need to gather all the links on a webpage. Playwright's selectors can collect every anchor (<a>) tag on the page, and the href of each tag holds the URL you want to check.

Code for fetching all links from a page

const { chromium } = require('playwright');

(async () => {
    const browser = await chromium.launch();
    const page = await browser.newPage();

    // Navigate to the target website
    await page.goto('https://example.com');

    // Extract all links from the page
    const links = await page.$$eval('a', elements => elements.map(el => el.href));

    console.log('Links found:', links);

    await browser.close();
})();

This script navigates to your target website, reads the href property of every anchor tag (which the browser resolves to an absolute URL), and logs the results. Anchors without an href yield an empty string, and the list can contain duplicates or non-HTTP links such as mailto: addresses.
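The raw list from the page usually benefits from some cleanup before any status checks run. The sketch below shows one possible way to do that in plain Node.js. The helper name normalizeLinks and its exact rules (drop empty values, keep only http/https, strip "#fragment" parts, remove duplicates) are assumptions made for illustration, not part of Playwright.

```javascript
// Hypothetical helper: the name and filtering rules are illustrative only.
function normalizeLinks(hrefs, baseUrl) {
    const seen = new Set();
    for (const href of hrefs) {
        if (!href) continue; // anchors without an href yield an empty string
        let url;
        try {
            url = new URL(href, baseUrl); // resolves relative paths against the page URL
        } catch {
            continue; // skip values that cannot be parsed as a URL
        }
        // Skip mailto:, tel:, javascript: and other non-HTTP schemes
        if (url.protocol !== 'http:' && url.protocol !== 'https:') continue;
        url.hash = ''; // "#section" fragments point at the same document
        seen.add(url.href);
    }
    return [...seen];
}

const sample = [
    'https://example.com/about',
    'https://example.com/about#team',
    '/contact',
    'mailto:hello@example.com',
    '',
];
console.log(normalizeLinks(sample, 'https://example.com'));
```

With the sample input, only https://example.com/about and https://example.com/contact remain. In a Playwright script, the array returned by page.$$eval could be passed through a helper like this before the links are checked.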

2. Checking Link Status Codes

  • Once you’ve collected the URLs, the next step is verifying their status codes. Broken links typically return response codes of 4xx or 5xx. Playwright’s built-in request functionality makes it simple to check the status of each link.
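To make the 4xx/5xx rule concrete, here is a minimal sketch of how a status code might be sorted into categories. The function names classifyStatus and isBroken, and the category labels, are assumptions chosen for this example.

```javascript
// Hypothetical helpers for illustration; the category names are assumptions.
function classifyStatus(status) {
    if (status >= 500) return 'server-error'; // 5xx: the server failed
    if (status >= 400) return 'client-error'; // 4xx: e.g. 404 Not Found
    if (status >= 300) return 'redirect';     // 3xx: the link points elsewhere
    if (status >= 200) return 'ok';           // 2xx: the link works
    return 'unknown';
}

// A link counts as broken when its status falls in the 4xx or 5xx range
function isBroken(status) {
    const kind = classifyStatus(status);
    return kind === 'client-error' || kind === 'server-error';
}

for (const status of [200, 301, 404, 503]) {
    console.log(status, classifyStatus(status), isBroken(status));
}
```

Keeping the classification in one small function means the threshold only has to change in one place if, for example, you later decide to report redirects as well.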

Script to validate link status codes

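const { chromium } = require('playwright');

(async () => {
    const browser = await chromium.launch();
    const page = await browser.newPage();

    await page.goto('https://example.com');

    // Extract all links, keeping only unique http(s) URLs
    const hrefs = await page.$$eval('a', elements => elements.map(el => el.href));
    const links = [...new Set(hrefs)].filter(href => href.startsWith('http'));

    // Check each link's status
    for (const link of links) {
        try {
            // page.request sends the request from Node, so browser CORS rules
            // do not block cross-origin links the way an in-page fetch would
            const response = await page.request.head(link);
            const status = response.status();

            if (status >= 400) {
                console.log(`Broken link found: ${link} (status: ${status})`);
            } else {
                console.log(`Valid link: ${link} (status: ${status})`);
            }
        } catch (error) {
            // Thrown for network-level failures such as DNS errors or timeouts
            console.log(`Error fetching URL: ${link}. Error: ${error.message}`);
        }
    }

    await browser.close();
})();

This script checks every link collected from the page. It sends a HEAD request through page.request, Playwright's API request context, which returns the status without downloading the response body. Because the request is not made from inside the page, cross-origin links are not blocked by the browser's CORS policy, which would cause an in-page fetch to fail for most external URLs. Links returning a status code of 400 or higher are flagged as broken. Some servers reject HEAD requests (for example with 405 Method Not Allowed), so a link flagged this way may deserve a follow-up GET request.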
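Checking links one at a time can be slow on pages with many links. As a rough sketch, the helper below runs several checks in parallel while capping how many are in flight at once. The name checkAll, the limit parameter, and the injected getStatus function are all assumptions for illustration. In a real script, getStatus might wrap a Playwright request call. Here it is stubbed so the example runs without a browser.

```javascript
// Hypothetical helper: runs `getStatus` over `links` with at most `limit` calls in flight.
// `getStatus` is an assumed async function: (url) => HTTP status number.
async function checkAll(links, getStatus, limit = 5) {
    const results = new Array(links.length);
    let next = 0;

    async function worker() {
        while (next < links.length) {
            const index = next++; // claim the next unchecked link
            const url = links[index];
            try {
                const status = await getStatus(url);
                results[index] = { url, status, broken: status >= 400 };
            } catch (error) {
                // Network failures have no status code; record them as broken
                results[index] = { url, status: null, broken: true, error: error.message };
            }
        }
    }

    // Start up to `limit` workers that pull links from the shared queue
    const workerCount = Math.max(1, Math.min(limit, links.length));
    await Promise.all(Array.from({ length: workerCount }, worker));
    return results;
}

// Demo with a stubbed status lookup (made-up data, not real responses)
const fakeStatuses = { 'https://example.com/': 200, 'https://example.com/missing': 404 };
const stub = async (url) => {
    if (!(url in fakeStatuses)) throw new Error('connection refused');
    return fakeStatuses[url];
};

checkAll(Object.keys(fakeStatuses).concat('https://example.invalid/'), stub, 2)
    .then(results => console.log(results));
```

Results come back in the same order as the input links, which keeps the output stable between runs. A modest limit is usually kinder to the site under test than firing every request at once.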

3. Logging Results

  • Properly logging the results is critical. This allows you to document broken links and investigate further. By saving the output into a file or database, you create a permanent reference for debugging.

Script to log the results into a text file

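const fs = require('fs');
const { chromium } = require('playwright');

(async () => {
    const browser = await chromium.launch();
    const page = await browser.newPage();
    await page.goto('https://example.com');

    // Collect the href of every anchor tag on the page
    const links = await page.$$eval('a', elements => elements.map(el => el.href));

    // Initialize an empty report
    const brokenLinksReport = [];

    // Check each link's status and append the broken ones to the report
    for (const link of links) {
        try {
            const response = await page.request.head(link);
            if (response.status() >= 400) {
                brokenLinksReport.push(`Broken link: ${link} (status: ${response.status()})`);
            }
        } catch (error) {
            // No status code is available when the request itself fails
            brokenLinksReport.push(`Unreachable link: ${link} (error: ${error.message})`);
        }
    }

    // Write to a file
    fs.writeFileSync('broken_links_report.txt', brokenLinksReport.join('\n'), 'utf8');
    console.log('Broken links have been logged to broken_links_report.txt');

    await browser.close();
})();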

This method ensures that broken links are saved in a readable and reusable format. Readers of the report can quickly assess problem areas on the site.
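If the report needs to be opened in a spreadsheet or imported into a database, a structured format may be easier to work with than free text. The following is a minimal sketch of a CSV builder. The function name toCsv, the column names, and the shape of the result objects (url, status, foundOn) are assumptions made for this example.

```javascript
// Hypothetical report builder; the column names and result shape are assumptions.
function toCsv(results) {
    // Wrap every value in quotes and double any embedded quotes, per CSV convention
    const escape = value => `"${String(value).replace(/"/g, '""')}"`;
    const rows = results
        // Keep only failures: no response at all, or a 4xx/5xx status
        .filter(r => r.status === null || r.status >= 400)
        .map(r => [r.url, r.status ?? 'no response', r.foundOn].map(escape).join(','));
    return ['"url","status","found_on"', ...rows].join('\n');
}

// Made-up sample data for demonstration
const sampleResults = [
    { url: 'https://example.com/', status: 200, foundOn: 'https://example.com/' },
    { url: 'https://example.com/old-page', status: 404, foundOn: 'https://example.com/' },
    { url: 'https://example.invalid/', status: null, foundOn: 'https://example.com/' },
];
console.log(toCsv(sampleResults));
```

The returned string can be written out with fs.writeFileSync in the same way as the text report. Recording the page each link was found on makes it easier to locate the link when fixing it.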