How to Detect Broken Links Using Playwright
Finding and fixing broken links is essential for a smooth user experience and a working website. Broken links can hurt SEO and frustrate users, so detecting them is a natural task for automation testers. With Playwright, a modern testing framework, the check can be automated. Here's how testers can use Playwright to detect broken links.
Detecting Broken Links with Playwright
- Playwright lets testers interact with webpages programmatically, so a short script is enough to find broken links. A broken link typically returns a 4xx client error such as 404, returns a 5xx server error, or fails to respond at all.
- Here’s a step-by-step approach to detecting broken links with Playwright.
1. Fetching All Links from a Page
- To begin, gather all the links on the page. Playwright's selectors can collect every anchor (<a>) tag, and each tag's href attribute holds the URL you want to check.
Code for fetching all links from a page
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  // Navigate to the target website
  await page.goto('https://example.com');

  // Extract all links from the page. Selecting a[href] skips anchors that
  // have no href; el.href returns the URL resolved to an absolute address.
  const links = await page.$$eval('a[href]', elements => elements.map(el => el.href));
  console.log('Links found:', links);

  await browser.close();
})();
This script navigates to your target website, fetches all href attributes from anchor tags, and logs them. You’ll notice that Playwright makes it straightforward to collect data from any page.
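The collected list often contains duplicates, mailto: links, and several anchors that point at different fragments of the same page. As a minimal sketch of cleaning it up before any status checks, the helper below keeps only unique http(s) URLs. The name normalizeLinks and the sample URLs are assumptions made for illustration; they are not part of Playwright.

```javascript
// Hypothetical helper: keep only unique http(s) URLs, ignoring #fragments.
function normalizeLinks(hrefs) {
  const seen = new Set();
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href);
    } catch {
      continue; // skip empty or malformed values
    }
    // Drop mailto:, tel:, javascript: and other non-web schemes
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue;
    url.hash = ''; // same page, different in-page anchor
    seen.add(url.href);
  }
  return [...seen];
}

// Illustrative input only
const sample = [
  'https://example.com/about#team',
  'https://example.com/about',
  'mailto:hello@example.com',
  '',
];
console.log(normalizeLinks(sample));
```

Passing the array returned by page.$$eval through a filter like this keeps the later status checks from requesting the same page several times.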
2. Checking Link Status Codes
- Once you’ve collected the URLs, the next step is verifying their status codes. Broken links typically return response codes of 4xx or 5xx. Playwright’s built-in request functionality makes it simple to check the status of each link.
Script to validate link status codes
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');

  // Extract all links (el.href resolves relative URLs to absolute ones)
  const links = await page.$$eval('a[href]', elements => elements.map(el => el.href));

  // Check each link's status
  for (const link of links) {
    try {
      // page.request sends the request from Node.js, so browser CORS rules
      // do not block cross-origin links the way an in-page fetch() would
      const response = await page.request.head(link);
      const status = response.status();
      if (status >= 400) {
        console.log(`Broken link found: ${link} (status: ${status})`);
      } else {
        console.log(`Valid link: ${link} (status: ${status})`);
      }
    } catch (error) {
      console.log(`Error fetching URL: ${link}. Error: ${error.message}`);
    }
  }

  await browser.close();
})();
This snippet checks every link collected from the page. It sends a HEAD request through page.request, which returns the status without downloading the response body. Because the request is issued outside the page, the browser's CORS restrictions do not apply, so links to other domains can be checked as well. Links returning status codes of 400 or higher are flagged as broken.
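One caveat with HEAD requests is that some servers answer them with 405 (Method Not Allowed) or 501 (Not Implemented) even though the page loads normally, which would produce false positives. A hedged sketch of one way to handle this is to retry with GET in those cases. The function name checkLink and the shape of the returned object are assumptions for illustration. The request parameter is assumed to behave like Playwright's APIRequestContext (for example page.request), whose head() and get() methods resolve to a response with a status() method.

```javascript
// Sketch: `request` is assumed to behave like Playwright's APIRequestContext
// (for example `page.request`).
async function checkLink(request, url) {
  try {
    let response = await request.head(url);
    // 405/501 usually mean "HEAD not supported", so retry with GET
    if (response.status() === 405 || response.status() === 501) {
      response = await request.get(url);
    }
    const status = response.status();
    return { url, status, broken: status >= 400 };
  } catch (error) {
    // Network-level failures (DNS, timeout, refused connection) have no status
    return { url, status: null, broken: true, error: error.message };
  }
}
```

Inside the loop over the links this could be called as await checkLink(page.request, link). Returning a plain object instead of logging directly keeps the check separate from the reporting step.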
3. Logging Results
- Properly logging the results is critical. This allows you to document broken links and investigate further. By saving the output into a file or database, you create a permanent reference for debugging.
Script to log the results into a text file
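const fs = require('fs');
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  const links = await page.$$eval('a[href]', elements => elements.map(el => el.href));

  // Initialize an empty report
  const brokenLinksReport = [];

  for (const link of links) {
    try {
      const response = await page.request.head(link);
      // Append broken links to the report
      if (response.status() >= 400) {
        brokenLinksReport.push(`Broken link: ${link} (status: ${response.status()})`);
      }
    } catch (error) {
      // A request that fails outright is reported as broken too
      brokenLinksReport.push(`Broken link: ${link} (error: ${error.message})`);
    }
  }

  // Write to a file
  fs.writeFileSync('broken_links_report.txt', brokenLinksReport.join('\n'), 'utf8');
  console.log('Broken links have been logged to broken_links_report.txt');
  await browser.close();
})();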
This method ensures that broken links are saved in a readable and reusable format. Readers of the report can quickly assess problem areas on the site.
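If the report will be consumed by another tool, such as a CI job or a dashboard, a structured format may be easier to work with than plain text. The sketch below builds a small JSON summary. The function names buildReport and writeReport and the result shape { url, status } are assumptions for illustration, with status set to null when a request failed before any response arrived.

```javascript
const fs = require('fs');

// Hypothetical result shape: { url, status }, where status is a number,
// or null when the request failed before any response arrived.
function buildReport(results, checkedAt = new Date().toISOString()) {
  const broken = results.filter(r => r.status === null || r.status >= 400);
  return {
    checkedAt,
    totalLinks: results.length,
    brokenCount: broken.length,
    broken,
  };
}

// Write the report as pretty-printed JSON and return it to the caller.
function writeReport(results, filePath) {
  const report = buildReport(results);
  fs.writeFileSync(filePath, JSON.stringify(report, null, 2), 'utf8');
  return report;
}
```

A call such as writeReport(results, 'broken_links_report.json') would then produce a file that records when the check ran, how many links were checked, and which ones failed.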