Automate a Redirect Map with Python: A Step-by-Step Guide

When you manage large websites or oversee site migrations, building redirect maps is an essential but slow task. If you’ve ever had to manually match hundreds or even thousands of URLs between an old and a new site, you know how error-prone it can be. This article shows how to automate the process with Python, saving hours of work and reducing the chance of mismatches.

Why Automate a Redirect Map?

Redirect maps are vital for maintaining SEO equity when URLs change. They help ensure that users and search engines can still find content even when it’s been moved or renamed. Manually building these maps is often challenging, especially for larger sites, but with automation, you can:

  • Save time by eliminating manual URL comparisons.
  • Reduce errors with a reliable script that identifies the best URL matches.
  • Focus on more complex tasks by allowing automation to handle basic redirects.

How the Script Works

This script automates the matching process by analyzing content similarities between two sets of URLs. Here’s a breakdown:

  1. URL Import: Two TXT files are imported — one for the source website URLs and one for the target website URLs.
  2. Content Extraction: Using BeautifulSoup, the script fetches each URL and extracts the text of its paragraph (<p>) elements, which leaves out most navigation and layout markup.
  3. Content Matching: The Python library PolyFuzz calculates a TF-IDF similarity score (from 0 to 1) between the content of each source URL and the target URLs, and returns the closest content match.
  4. Results Export: The results, including similarity scores, are saved in a CSV file so you can manually review low-similarity URLs and ensure accurate redirects.
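
To make the content-matching step more concrete, here is a minimal standard-library sketch of TF-IDF cosine similarity over character trigrams. It is an illustration of the general technique only, not PolyFuzz's actual implementation, and the helper names (char_ngrams, tfidf_vectors, cosine, best_matches) are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split lowercased text into overlapping character n-grams."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(documents, n=3):
    """Build one sparse TF-IDF vector (a dict) per document."""
    counts = [Counter(char_ngrams(doc, n)) for doc in documents]
    # Document frequency: in how many documents each n-gram appears
    doc_freq = Counter(gram for count in counts for gram in count)
    total = len(documents)
    vectors = []
    for count in counts:
        # Smoothed IDF so n-grams shared by every document keep a small weight
        vectors.append({
            gram: tf * (math.log((1 + total) / (1 + doc_freq[gram])) + 1)
            for gram, tf in count.items()
        })
    return vectors


def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse vectors, from 0 to 1."""
    dot = sum(weight * vec_b.get(gram, 0.0) for gram, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_matches(source_texts, target_texts):
    """For each source text, return (index of best target, similarity)."""
    vectors = tfidf_vectors(source_texts + target_texts)
    source_vecs = vectors[:len(source_texts)]
    target_vecs = vectors[len(source_texts):]
    results = []
    for vec in source_vecs:
        scores = [cosine(vec, target) for target in target_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        results.append((best, round(scores[best], 3)))
    return results
```

Calling best_matches(source_texts, target_texts) with two lists of page text returns, for each source page, the position of the most similar target page and a score between 0 and 1, which is the same shape of result the script later writes to the CSV.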

The Python Script

Here’s the Python code you can use to automate your redirect map:

# Import necessary libraries
from bs4 import BeautifulSoup, SoupStrainer
from polyfuzz import PolyFuzz
import concurrent.futures
import csv
import pandas as pd
import requests

# Import URLs, skipping blank lines so they are not requested as URLs
with open("source_urls.txt", "r") as file:
    url_list_a = [line.strip() for line in file if line.strip()]

with open("target_urls.txt", "r") as file:
    url_list_b = [line.strip() for line in file if line.strip()]

# Create a content scraper using BeautifulSoup
def get_content(url_argument):
    # A timeout keeps one unresponsive URL from stalling the whole run
    page_source = requests.get(url_argument, timeout=10).text
    # Parse only <p> elements and join their text into one string
    strainer = SoupStrainer('p')
    soup = BeautifulSoup(page_source, 'lxml', parse_only=strainer)
    paragraph_list = [element.text for element in soup.find_all('p')]
    content = " ".join(paragraph_list)
    return content

# Scrape the URLs for content
with concurrent.futures.ThreadPoolExecutor() as executor:
    content_list_a = list(executor.map(get_content, url_list_a))
    content_list_b = list(executor.map(get_content, url_list_b))

content_dictionary = dict(zip(url_list_b, content_list_b))

# Get content similarities via PolyFuzz
model = PolyFuzz("TF-IDF")
model.match(content_list_a, content_list_b)
data = model.get_matches()

# Map similarity data back to URLs
def get_key(argument):
    for key, value in content_dictionary.items():
        if argument == value:
            return key
    # No target content matched: return None rather than the last URL checked
    return None

with concurrent.futures.ThreadPoolExecutor() as executor:
    result = list(executor.map(get_key, data["To"]))

# Create a dataframe for the final results
# Note: the "% Identical" column holds PolyFuzz's similarity score (0 to 1)
to_zip = list(zip(url_list_a, result, data["Similarity"]))
df = pd.DataFrame(to_zip, columns=["From URL", "To URL", "% Identical"])

# Export to a CSV file
df.to_csv("redirect_map.csv", index=False)
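
One caveat with mapping matched content back to URLs: if two target pages have identical extracted text, a content-to-URL lookup can only return one of them. The sketch below is an optional alternative under that assumption. It builds the lookup once and keeps every URL that shares the same content; the function names build_content_index and urls_for_content are hypothetical and not part of the script above.

```python
from collections import defaultdict


def build_content_index(urls, contents):
    """Group target URLs by their extracted content."""
    index = defaultdict(list)
    for url, content in zip(urls, contents):
        index[content].append(url)
    return dict(index)


def urls_for_content(index, content):
    """Return every URL whose content equals `content` (empty list if none)."""
    if content is None:
        return []
    return index.get(content, [])
```

With the script's variables, the index would be built as build_content_index(url_list_b, content_list_b), and any lookup that returns more than one URL signals duplicate target pages that deserve a manual decision.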

How to Use the Script

  1. Prepare Your Data:
    • Create two TXT files: source_urls.txt and target_urls.txt. List the URLs of the old site (source) and the new site (target) in these files, one URL per line.
  2. Run the Script:
    • Install the dependencies (requests, beautifulsoup4, lxml, pandas, and polyfuzz), then run the script. It extracts the content from each URL, compares the two sets, and generates a CSV file (redirect_map.csv) with the redirect matches and their similarity scores.
  3. Review the Results:
    • Manually check URLs with a low similarity score to ensure they redirect to the most appropriate content.
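
The review step can be narrowed down with a short helper. This is a minimal sketch that assumes the CSV uses the column names written by the script ("From URL", "To URL", "% Identical") and that the similarity value is a score between 0 and 1. The function name low_similarity_rows and the 0.8 threshold are illustrative assumptions, so pick a cutoff that suits your site.

```python
import csv


def low_similarity_rows(csv_path, threshold=0.8):
    """Return rows whose similarity is below `threshold`, lowest first."""
    flagged = []
    with open(csv_path, newline="", encoding="utf-8") as file:
        for row in csv.DictReader(file):
            score = float(row["% Identical"])
            # Flag weak matches and rows with no target URL at all
            if score < threshold or not row["To URL"]:
                flagged.append((row["From URL"], row["To URL"], score))
    return sorted(flagged, key=lambda item: item[2])


if __name__ == "__main__":
    for from_url, to_url, score in low_similarity_rows("redirect_map.csv"):
        print(f"{score:.3f}  {from_url} -> {to_url or '(no match)'}")
```

Sorting the weakest matches first puts the redirects most likely to be wrong at the top of the review list.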

Final Thoughts

Automating a redirect map saves time, reduces human error, and allows SEOs and developers to focus on higher-level tasks. By using Python and libraries like BeautifulSoup and PolyFuzz, you can quickly build an efficient workflow for redirect management.

Implement this method today to streamline your site migrations and ensure a smooth user experience!


Daniel Dye

Daniel Dye is the President of NativeRank Inc., a premier digital marketing agency that has grown into a powerhouse of innovation under his leadership. With a career spanning decades in the digital marketing industry, Daniel has been instrumental in shaping the success of NativeRank and its impressive lineup of sub-brands, including MarineListings.com, LocalSEO.com, MarineManager.com, PowerSportsManager.com, NikoAI.com, and SearchEngineGuidelines.com. Before becoming President of NativeRank, Daniel served as the Executive Vice President at both NativeRank and LocalSEO for over 12 years. In these roles, he was responsible for maximizing operational performance and achieving the financial goals that set the foundation for the company’s sustained growth. His leadership has been pivotal in establishing NativeRank as a leader in the competitive digital marketing landscape. Daniel’s extensive experience includes his tenure as Vice President at GetAds, LLC, where he led digital marketing initiatives that delivered unprecedented performance. Earlier in his career, he co-founded Media Breakaway, LLC, demonstrating his entrepreneurial spirit and deep understanding of the digital marketing world. In addition to his executive experience, Daniel has a strong technical background. He began his career as a TAC 2 Noc Engineer at Qwest (now CenturyLink) and as a Human Interface Designer at 9MSN, where he honed his skills in user interface design and network operations. Daniel’s educational credentials are equally impressive. He holds an Executive MBA from the Quantic School of Business and Technology and has completed advanced studies in Architecture and Systems Engineering from MIT. His commitment to continuous learning is evident in his numerous certifications in Data Science, Machine Learning, and Digital Marketing from prestigious institutions like Columbia University, edX, and Microsoft. 
With a blend of executive leadership, technical expertise, and a relentless drive for innovation, Daniel Dye continues to propel NativeRank Inc. and its sub-brands to new heights, making a lasting impact in the digital marketing industry.
