
Check For Broken Links

Keeping your website free of broken links matters for both a good user experience and your site’s SEO health. If checking each link by hand sounds tedious, don’t worry! In this guide, we’ll show you how to automate the process using Python, a popular programming language, and Beautiful Soup, a powerful Python library for parsing HTML and XML files.

What You’ll Need

  • A Mac or PC: This guide works for both.
  • Python Installed: Some computers already have Python installed, but we’ll check to make sure.
  • An Internet Connection: To install necessary tools and access your website’s sitemaps.
  • A Text Editor: Anything from Notepad (Windows) or TextEdit (Mac) to more advanced editors like VS Code or Sublime Text. If you use TextEdit, choose Format > Make Plain Text so the file is saved as plain text.

Step 1: Check If Python Is Installed

  1. Open Terminal (Mac) or Command Prompt (Windows).
  2. Type python3 --version and press Enter. On Windows, if that command isn’t recognised, try python --version or py --version.
    • If you see a version number, you’re all set!
    • If not, download and install Python from python.org.
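If you’d like to double-check the version from inside Python too, here is a small sketch. The script in Step 4 uses f-strings, which need Python 3.6 or newer, so this checks against that minimum; the helper name python_is_recent_enough is our own illustration. Newer releases of some libraries (requests included) need a more recent Python, so installing a current Python 3 release is the safest choice.

```python
import sys

# f-strings, used throughout the link checker, need Python 3.6 or newer.
MIN_VERSION = (3, 6)


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    # Plain string concatenation, so this file still runs on very old Pythons.
    print("Running Python " + sys.version.split()[0])
    if python_is_recent_enough():
        print("Your Python is recent enough for this guide.")
    else:
        print("Please install a newer Python from python.org.")
```

Save it as, say, check_python.py and run python3 check_python.py.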

Step 2: Install Beautiful Soup

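  1. In your Terminal or Command Prompt, type pip3 install requests beautifulsoup4 lxml and press Enter. The script uses requests to download pages, and Beautiful Soup needs lxml to read XML sitemaps.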
  2. Wait for the installation to finish.
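To confirm everything the Step 4 script imports is available, you could run a quick check like this sketch. It assumes the script needs requests, bs4 (installed as beautifulsoup4) and lxml, which Beautiful Soup uses for its 'xml' parser; the helper name missing_packages is hypothetical.

```python
import importlib.util

# Module name -> name to pass to pip if it's missing.
REQUIRED = {"requests": "requests", "bs4": "beautifulsoup4", "lxml": "lxml"}


def missing_packages(required=REQUIRED):
    """Return the pip names of any required modules that can't be found."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages. Run: pip3 install " + " ".join(missing))
    else:
        print("All packages installed.")
```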

Step 3: Find Your Website’s Sitemaps

Your website’s sitemap is a roadmap of all the pages and content on your site. You’ll typically find this at yourwebsite.com/sitemap.xml. If you have multiple sitemaps, note down their URLs.
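Some sites publish a sitemap index (for example sitemap_index.xml) whose loc entries point to other sitemaps rather than to pages. The sketch below tells the two apart and expands one level of index. It assumes your sitemaps use the standard sitemaps.org namespace, and it uses Python’s built-in xml.etree.ElementTree instead of Beautiful Soup so it runs without extra packages; parse_sitemap and collect_page_urls are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return ("index", child_sitemap_urls) or ("urlset", page_urls)."""
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag.endswith("sitemapindex") else "urlset"
    locs = [loc.text.strip() for loc in root.iterfind(".//sm:loc", NS) if loc.text]
    return kind, locs


def collect_page_urls(sitemap_url, fetch_xml):
    """Return page URLs, expanding one level of sitemap index if present.

    fetch_xml(url) must return the sitemap's XML as str or bytes.
    """
    kind, locs = parse_sitemap(fetch_xml(sitemap_url))
    if kind == "urlset":
        return locs
    pages = []
    for child_url in locs:
        child_kind, child_locs = parse_sitemap(fetch_xml(child_url))
        if child_kind == "urlset":
            pages.extend(child_locs)
    return pages
```

With requests installed, you could call it as collect_page_urls("https://www.your-domain.com/sitemap_index.xml", lambda u: requests.get(u, timeout=10).content).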

Step 4: Create Your Python Script

  1. Open your text editor and create a new file named check_broken_links.py.
  2. Copy the following script into your file:
```python
import requests
from bs4 import BeautifulSoup

# Seconds to wait for a server before giving up on a request.
TIMEOUT = 10


def fetch_urls_from_sitemap(sitemap_url):
    """Fetch URLs from a single sitemap."""
    response = requests.get(sitemap_url, timeout=TIMEOUT)
    response.raise_for_status()  # fail clearly if the sitemap itself is missing
    # The 'xml' parser requires the lxml package.
    soup = BeautifulSoup(response.content, 'xml')
    return [loc.text.strip() for loc in soup.find_all('loc')]


def check_link_for_404(url):
    """Return True if the given URL returns a 404 status code."""
    try:
        response = requests.head(url, allow_redirects=True, timeout=TIMEOUT)
        if response.status_code == 404:
            print(f"404 FOUND: {url}")
            return True
    except requests.RequestException as e:
        print(f"Error checking {url}: {e}")
    return False


def main(sitemap_urls):
    """Process each sitemap and check its URLs for 404s."""
    for sitemap_url in sitemap_urls:
        print(f"Processing {sitemap_url}")
        try:
            urls = fetch_urls_from_sitemap(sitemap_url)
        except requests.RequestException as e:
            print(f"Could not fetch sitemap {sitemap_url}: {e}")
            continue
        found_broken = False
        for url in urls:
            if check_link_for_404(url):
                found_broken = True
        if not found_broken:
            print(f"No broken links found in {sitemap_url}.")


# List of your sitemap URLs
sitemap_urls = [
    'https://www.your-domain.com/post-sitemap1.xml',
    'https://www.your-domain.com/page-sitemap.xml',
    # Add any additional sitemap URLs here.
]

if __name__ == '__main__':
    main(sitemap_urls)
```
  3. Replace the example URLs ('https://www.your-domain.com/post-sitemap1.xml' and 'https://www.your-domain.com/page-sitemap.xml') with your actual sitemap URLs.
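The script uses requests.head, which asks only for a page’s headers. Some servers don’t support HEAD and answer 405 (Method Not Allowed) or 501 (Not Implemented) even for pages that exist, which could also hide a real 404. Here is a sketch of a fallback that retries with a full GET in those cases. It uses Python’s built-in urllib rather than requests so it runs without extra packages, and fetch_status and status_with_fallback are hypothetical names; in the script itself, the equivalent change would be to call requests.get(url, allow_redirects=True, timeout=10) when requests.head returns 405 or 501.

```python
import urllib.error
import urllib.request

# Status codes suggesting the server doesn't support HEAD requests.
HEAD_NOT_SUPPORTED = {405, 501}


def fetch_status(url, method, timeout=10):
    """Return the HTTP status code for url using the given method.

    Network failures (urllib.error.URLError) are left for the caller to handle.
    """
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises 4xx/5xx responses as exceptions; keep their status code.
        return error.code


def status_with_fallback(url, fetch=fetch_status):
    """Try a lightweight HEAD first and retry with GET if HEAD is rejected."""
    status = fetch(url, "HEAD")
    if status in HEAD_NOT_SUPPORTED:
        status = fetch(url, "GET")
    return status
```

Passing fetch as a parameter keeps the retry logic separate from the networking, which makes it easy to test without hitting a real server.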

Step 5: Run Your Script

  1. Save your check_broken_links.py file.
  2. Open Terminal or Command Prompt and navigate to the folder where your file is saved.
    • Use the cd command to change directories, e.g., cd Downloads.
  3. Type python3 check_broken_links.py and press Enter (on Windows, use python or py in place of python3 if needed).
  4. The script will run and print out any broken links it finds.
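On a large site the printed output can be hard to scan, and the same URL may appear in more than one sitemap. One way to extend the script, sketched here with hypothetical helper names, is to check each unique URL only once and save the broken ones to a CSV file you can open in a spreadsheet. get_status stands in for whatever check you use (for example, a function built on requests.head) and is assumed to return a status code, or None when the request fails.

```python
import csv


def find_broken(urls, get_status):
    """Check each unique URL once; return (url, status) for 404s and failures.

    get_status(url) is assumed to return an int status code, or None on failure.
    """
    broken = []
    for url in dict.fromkeys(urls):  # drops duplicates but keeps the original order
        status = get_status(url)
        if status is None or status == 404:
            broken.append((url, status))
    return broken


def write_report(broken, path="broken_links.csv"):
    """Save (url, status) pairs to CSV; failed requests get an empty status."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "status"])
        writer.writerows(broken)
```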

Pro Tip: Before you run the script, it might be useful to review a comprehensive checklist of best practices, including security, tagging, and analytics, to ensure your website is fully optimised. Check out this Tool For Developers Checklist for more details.

Congratulations!

You’ve just automated checking for broken links on your website! This script is just a starting point. As you get more comfortable, you can expand it to meet your specific needs.

About the author: Michael Masa

Why should you listen to me? With a rich marketing background and a passion for sharing knowledge, I have dedicated the last 9 years of my life to the field. I have worked as Marketing Director and have been instrumental in shaping the marketing strategy of one of Europe’s leading insurers, BAVARIA AG.

Prior to my current role, I spent 12 years as Sales Director, managing a team of 12 dynamic people and applying the latest sales techniques to drive success. This experience allowed me to hone my leadership skills and gain a deep understanding of the sales industry.

I am now at the helm of Dealers League, a marketing agency that not only creates and manages websites for businesses, but also focuses on the importance of effective marketing strategies. Recognising the need for continuous learning in this fast-paced industry, we offer courses on the latest marketing techniques.

My varied experience in sales and marketing gives me a unique insight into how these two crucial areas intersect. I look forward to sharing my knowledge and insights with you through this blog.
