THE AUTO-MATE | Your Trusted Partner for Automation Testing Solutions and Training

27 Dec, 2024

Post by: Admin

How to Detect Broken Links Using Playwright

Identifying and resolving broken links is essential for maintaining a seamless user experience and ensuring website functionality. Broken links lead to poor SEO performance and frustrate users, making their detection a crucial task for automation testers. With Playwright, a modern testing framework, this process can be automated efficiently. Here's how testers can use Playwright to detect broken links effectively.

Detecting Broken Links with Playwright

Playwright is a versatile tool that allows testers to interact with webpages programmatically. By leveraging its capabilities, automation testers can identify broken links through simple scripts. Broken links usually result in HTTP responses like 404 or other 4xx/5xx status codes.
Here’s a step-by-step approach to detecting broken links with Playwright.
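As a small illustration of the rule above, the check for a "broken" status can be written as a one-line helper. This is only a sketch: the name isBrokenStatus is hypothetical and not part of Playwright, and it assumes that any 4xx or 5xx code counts as broken.

```javascript
// Hypothetical helper (not a Playwright API): decide whether an HTTP
// status code marks a link as broken.
function isBrokenStatus(status) {
    // 4xx = client errors (e.g. 404 Not Found), 5xx = server errors (e.g. 503)
    return status >= 400 && status <= 599;
}

console.log(isBrokenStatus(200)); // false
console.log(isBrokenStatus(301)); // false (a redirect, not an error)
console.log(isBrokenStatus(404)); // true
console.log(isBrokenStatus(503)); // true
```

Redirects (3xx) are treated as healthy here. Whether that is right for a given site is a judgment call.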

1. Fetching All Links from a Page

To begin, you need to gather all the links on a webpage. Playwright provides powerful selectors to scrape all anchor (<a>) tags. These tags usually contain the URLs or links you want to check.

Code for fetching all links from a page

                    
const { chromium } = require('playwright');

(async () => {
    const browser = await chromium.launch();
    const page = await browser.newPage();

    // Navigate to the target website
    await page.goto('https://example.com');

    // Extract all links from the page
    const links = await page.$$eval('a', elements => elements.map(el => el.href));

    console.log('Links found:', links);

    await browser.close();
})();

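This script navigates to the target website, reads the href property of every anchor tag, and logs the results. Because el.href is the resolved property rather than the raw attribute, relative paths come back as absolute URLs, and non-HTTP links such as mailto: addresses are included in the list as well.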
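The collected list often contains duplicates, in-page anchors, and links that cannot be checked over HTTP. One possible cleanup step, sketched below, keeps only unique http(s) URLs. The helper name normalizeLinks is an assumption made for illustration. It relies only on the standard URL class that ships with Node.js.

```javascript
// Hypothetical helper: keep only unique http(s) URLs, ignoring #fragments.
function normalizeLinks(links) {
    const seen = new Set();
    for (const link of links) {
        let url;
        try {
            url = new URL(link);
        } catch {
            continue; // not an absolute URL (for example, an empty href)
        }
        // Skip mailto:, tel:, javascript: and other non-HTTP schemes
        if (url.protocol !== 'http:' && url.protocol !== 'https:') continue;
        url.hash = ''; // the fragment is never sent to the server
        seen.add(url.href);
    }
    return [...seen];
}
```

In the script above it could be applied right after the $$eval call, for example const uniqueLinks = normalizeLinks(links);.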

2. Checking Link Status Codes

Once you’ve collected the URLs, the next step is verifying their status codes. Broken links typically return response codes of 4xx or 5xx. Playwright’s built-in request functionality makes it simple to check the status of each link.

Script to validate link status codes

                    
const { chromium } = require('playwright');

(async () => {
    const browser = await chromium.launch();
    const page = await browser.newPage();

    await page.goto('https://example.com');

    // Extract all links
    const links = await page.$$eval('a', elements => elements.map(el => el.href));

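    // Check each link's status with Playwright's built-in request API (page.request).
    // It runs outside the page, so cross-origin links are not blocked by CORS.
    for (const link of links) {
        // Skip mailto:, tel:, javascript: and other non-HTTP links
        if (!/^https?:/i.test(link)) continue;

        try {
            const response = await page.request.head(link);
            const status = response.status();

            if (status >= 400) {
                console.log(`Broken link found: ${link} (status: ${status})`);
            } else {
                console.log(`Valid link: ${link} (status: ${status})`);
            }
        } catch (error) {
            // Network-level failures: DNS errors, timeouts, refused connections
            console.log(`Error fetching URL: ${link}. Error: ${error.message}`);
        }
    }

    await browser.close();
})();

This code snippet checks every HTTP link fetched from the webpage. It sends a HEAD request through page.request, so only the response headers are transferred and the status comes back quickly. Calling fetch inside page.evaluate would fail on most cross-origin links because of the browser's CORS restrictions, which is why the request is made from Playwright itself. Links returning status codes 400 or higher are flagged as broken. Some servers reject HEAD requests, so a link flagged this way may be worth retrying with GET before it is treated as broken.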
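Some servers do not implement HEAD and answer with 405 or 501 even though the page itself is fine. A possible refinement, sketched here under assumptions, is to retry such links with GET. The helper name checkLink and the shape of its return value are hypothetical. The request argument is assumed to behave like Playwright's page.request, that is, to expose head and get methods that resolve to a response with a status() method.

```javascript
// Hypothetical helper: `request` is expected to behave like Playwright's
// page.request (head/get resolving to a response with a status() method).
async function checkLink(request, url) {
    try {
        let response = await request.head(url);
        // Some servers do not support HEAD; retry with GET before judging
        if (response.status() === 405 || response.status() === 501) {
            response = await request.get(url);
        }
        const status = response.status();
        return { url, status, broken: status >= 400 };
    } catch (error) {
        // Network-level failure: DNS error, timeout, refused connection
        return { url, status: null, broken: true, error: error.message };
    }
}
```

Inside the loop this could be called as const result = await checkLink(page.request, link);. Passing the request object in as a parameter also makes the helper easy to exercise with a stub, without launching a browser.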

3. Logging Results

Properly logging the results is critical. This allows you to document broken links and investigate further. By saving the output into a file or database, you create a permanent reference for debugging.

Script to log the results into a text file

                    
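const fs = require('fs');
const { chromium } = require('playwright');

(async () => {
    const browser = await chromium.launch();
    const page = await browser.newPage();

    await page.goto('https://example.com');

    // Extract all links
    const links = await page.$$eval('a', elements => elements.map(el => el.href));

    // Initialize an empty report
    const brokenLinksReport = [];

    // Check each link's status and append broken links to the report
    for (const link of links) {
        // Skip mailto:, tel:, javascript: and other non-HTTP links
        if (!/^https?:/i.test(link)) continue;

        try {
            const response = await page.request.head(link);
            if (response.status() >= 400) {
                brokenLinksReport.push(`Broken link: ${link} (status: ${response.status()})`);
            }
        } catch (error) {
            // Links that could not be reached at all are reported too
            brokenLinksReport.push(`Unreachable link: ${link} (error: ${error.message})`);
        }
    }

    await browser.close();

    // Write to a file
    fs.writeFileSync('broken_links_report.txt', brokenLinksReport.join('\n'), 'utf8');

    console.log('Broken links have been logged to broken_links_report.txt');
})();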

This method ensures that broken links are saved in a readable and reusable format. Readers of the report can quickly assess problem areas on the site.
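If the report is meant to be consumed by other tools, or loaded into a database later, a structured format may be more convenient than plain text. The sketch below builds a JSON summary. The function names buildReport and saveReport, and the assumption that results is an array of { url, status } objects with status set to null for unreachable links, are illustrative choices rather than anything prescribed by Playwright.

```javascript
const fs = require('fs');

// Hypothetical helper: `results` is assumed to be an array of
// { url, status } objects, with status === null for unreachable links.
function buildReport(results) {
    const broken = results.filter(r => r.status === null || r.status >= 400);
    return {
        checked: results.length,
        brokenCount: broken.length,
        broken,
    };
}

// Hypothetical helper: write the summary to disk as pretty-printed JSON.
function saveReport(results, filePath) {
    const report = buildReport(results);
    fs.writeFileSync(filePath, JSON.stringify(report, null, 2), 'utf8');
}
```

A call such as saveReport(results, 'broken_links_report.json') would then produce a file that can be parsed with JSON.parse or imported by a reporting tool.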
