Using Python for Web Servers

Using Python in combination with .htaccess files is a great way to automate and manage SEO-related tasks, especially when it comes to setting up redirects or optimizing site URLs. The .htaccess file is a configuration file for Apache web servers that allows you to control website behavior, including redirects, URL rewrites, and access control.

Here’s how you can use Python to generate .htaccess rules for various SEO tasks:

1. Redirects

Redirects are important for SEO, especially when URLs change. A 301 redirect informs search engines that a page has permanently moved to a new location, preserving link equity.

Example: Redirecting Old URLs to New URLs

Let’s say you want to automate 301 redirects for several old URLs to new URLs using a Python script.

Python Script to Generate .htaccess Redirect Rules:

import pandas as pd

# Load your old and new URLs from a CSV or Excel file
data = {
    'old_url': ['/old-page1', '/old-page2', '/old-page3'],
    'new_url': ['/new-page1', '/new-page2', '/new-page3']
}

# Convert it into a DataFrame (you can also load from an external file)
df = pd.DataFrame(data)

# Define a function to create redirect rules for htaccess
def create_redirect_rules(row):
    return f"Redirect 301 {row['old_url']} {row['new_url']}"

# Apply the function to each row of the DataFrame
df['htaccess_rule'] = df.apply(create_redirect_rules, axis=1)

# Save the rules into a .htaccess file
# Note: mode 'w' overwrites any existing .htaccess in the current directory
with open('.htaccess', 'w') as f:
    f.write("# Redirects\n")
    for rule in df['htaccess_rule']:
        f.write(rule + "\n")

print("htaccess file created with the redirect rules.")

Explanation:

  • We store the old and new URLs in a DataFrame.
  • We then apply a function to each row to generate the redirect rules in .htaccess format.
  • Finally, we write these rules to a .htaccess file.

The output in the .htaccess file will look like this:

Redirect 301 /old-page1 /new-page1
Redirect 301 /old-page2 /new-page2
Redirect 301 /old-page3 /new-page3
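
The script above hard-codes its URL pairs. As a rough sketch of the CSV route mentioned in its first comment, the following reads the pairs with the standard csv module, drops self-redirects, and collapses chains (A to B, B to C becomes A to C) so that visitors and crawlers follow a single hop. The column names old_url and new_url mirror the DataFrame above; the helper names load_pairs, collapse_chains, and build_redirect_rules, and the inline sample data, are assumptions made for illustration.

```python
import csv
import io

def load_pairs(csv_file):
    """Read old_url/new_url columns into a dict, skipping self-redirects."""
    pairs = {}
    for row in csv.DictReader(csv_file):
        old, new = row["old_url"].strip(), row["new_url"].strip()
        if old and new and old != new:
            pairs[old] = new  # a later duplicate of old_url overrides an earlier one
    return pairs

def collapse_chains(pairs):
    """Point every old URL at its final destination (A -> B -> C becomes A -> C)."""
    resolved = {}
    for old in pairs:
        target, seen = pairs[old], {old}
        # Follow the chain until it leaves the mapping or loops back on itself
        while target in pairs and target not in seen:
            seen.add(target)
            target = pairs[target]
        if target not in seen:  # drop entries that end in a redirect loop
            resolved[old] = target
    return resolved

def build_redirect_rules(pairs):
    """Format each pair as an Apache 'Redirect 301' line."""
    return [f"Redirect 301 {old} {new}" for old, new in pairs.items()]

# Hypothetical sample data standing in for a real CSV file
sample = io.StringIO(
    "old_url,new_url\n"
    "/old-page1,/new-page1\n"
    "/older-page1,/old-page1\n"
    "/same,/same\n"
)
rules = build_redirect_rules(collapse_chains(load_pairs(sample)))
print("\n".join(rules))
```

To read from disk instead, pass an open file object, for example open('redirects.csv', newline='') (the filename is hypothetical), in place of the StringIO sample.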

2. Rewriting URLs for SEO

Another common SEO task is to rewrite URLs to make them friendlier to users and search engines. For example, you can expose a query-string URL (/page.php?id=123) under a clean URL (/page/123).

Python Script to Generate URL Rewrite Rules:

import pandas as pd

# Example data: list of dynamic URLs and their clean counterparts
data = {
    'dynamic_url': ['/page.php?id=123', '/post.php?id=456', '/article.php?id=789'],
    'clean_url': ['/page/123', '/post/456', '/article/789']
}

# Convert it into a DataFrame
df = pd.DataFrame(data)

# Define a function to create URL rewrite rules
def create_rewrite_rules(row):
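    # The substitution side of a RewriteRule is not a regex, so the "?" needs no escaping
    pattern = row['clean_url'].lstrip('/')  # .htaccess patterns omit the leading slash
    return f"RewriteRule ^{pattern}$ {row['dynamic_url']} [L,QSA]"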

# Apply the function to each row of the DataFrame
df['htaccess_rewrite_rule'] = df.apply(create_rewrite_rules, axis=1)

# Save the rewrite rules into a .htaccess file
with open('.htaccess', 'w') as f:
    f.write("# Rewrite Rules\n")
    f.write("RewriteEngine On\n")  # RewriteRule lines have no effect unless the engine is on
    for rule in df['htaccess_rewrite_rule']:
        f.write(rule + "\n")

print("htaccess file created with the rewrite rules.")

Explanation:

  • Here, we have dynamic URLs and their clean counterparts.
  • The function create_rewrite_rules generates one RewriteRule line for each pair.
  • RewriteEngine On is written once at the top, since RewriteRule directives have no effect unless the rewrite engine is enabled.
  • When a visitor requests a clean URL, Apache internally serves the matching dynamic URL; the address shown in the browser does not change.

The output in the .htaccess file will look like this:

# Rewrite Rules
RewriteEngine On
RewriteRule ^page/123$ /page.php?id=123 [L,QSA]
RewriteRule ^post/456$ /post.php?id=456 [L,QSA]
RewriteRule ^article/789$ /article.php?id=789 [L,QSA]
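
Listing one RewriteRule per ID means the file grows with every new page. Where all IDs are numeric, a single pattern rule per script can cover them. The following is a sketch under that assumption: create_pattern_rule and simulate are hypothetical helpers, and simulate only approximates mod_rewrite's matching with Python's re module (it strips the leading slash, as Apache does for .htaccess patterns).

```python
import re

def create_pattern_rule(section, script, param="id"):
    """Build one RewriteRule mapping /<section>/<digits> to /<script>?<param>=<digits>."""
    return f"RewriteRule ^{section}/([0-9]+)$ /{script}?{param}=$1 [L,QSA]"

def simulate(rule, request_path):
    """Rough Python approximation of how the rule would rewrite a path."""
    _, pattern, substitution, _ = rule.split()
    match = re.match(pattern, request_path.lstrip("/"))
    if match is None:
        return None  # the rule would not apply to this path
    # Replace the $1 backreference with the captured ID
    return substitution.replace("$1", match.group(1))

# Assumed mapping of clean URL sections to the scripts that serve them
sections = {"page": "page.php", "post": "post.php", "article": "article.php"}
rules = [create_pattern_rule(name, script) for name, script in sections.items()]

print("\n".join(rules))
print(simulate(rules[0], "/page/123"))
```

The simulation is only a quick sanity check before deployment; the rules should still be tested on the actual Apache server, and they require RewriteEngine On to be present in the file.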

3. Forcing HTTPS or Removing www for SEO

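Enforcing HTTPS and serving the site from a single hostname (here, the version without www) are common SEO practices, because they keep the same page from being indexed under several URLs. You can use Python to automate the creation of these .htaccess rules for multiple domains.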

Python Script to Generate HTTPS and www Removal Rules:

import re

# List of domains (for multi-site or multi-domain management)
domains = ['example1.com', 'example2.com']

# Define HTTPS and www removal rules
def create_https_www_rules(domain):
    # Escape the dots so they match literally in the RewriteCond patterns
    host = re.escape(domain)
    return f"""
# Force HTTPS and remove www for {domain}
RewriteEngine On
RewriteCond %{{HTTP_HOST}} ^(www\\.)?{host} [NC]
RewriteCond %{{HTTPS}} off
RewriteRule ^ https://{domain}%{{REQUEST_URI}} [L,R=301]
RewriteCond %{{HTTP_HOST}} ^www\\.{host} [NC]
RewriteRule ^ https://{domain}%{{REQUEST_URI}} [L,R=301]
"""

# Write the rules to a .htaccess file
with open('.htaccess', 'w') as f:
    for domain in domains:
        f.write(create_https_www_rules(domain))

print("htaccess file created with HTTPS and www removal rules.")

Explanation:

  • This script loops through a list of domains and creates rules for forcing HTTPS and removing www from URLs.
  • Each HTTPS rule carries a host condition for its own domain. Without it, the first domain's rule would redirect every plain-HTTP request to example1.com, including requests meant for example2.com.
  • The dots in each hostname are escaped so that they match a literal dot rather than any character.
  • The output .htaccess file will contain rules like:

# Force HTTPS and remove www for example1.com
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?example1\.com [NC]
RewriteCond %{HTTPS} off
RewriteRule ^ https://example1.com%{REQUEST_URI} [L,R=301]
RewriteCond %{HTTP_HOST} ^www\.example1\.com [NC]
RewriteRule ^ https://example1.com%{REQUEST_URI} [L,R=301]

# Force HTTPS and remove www for example2.com
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?example2\.com [NC]
RewriteCond %{HTTPS} off
RewriteRule ^ https://example2.com%{REQUEST_URI} [L,R=301]
RewriteCond %{HTTP_HOST} ^www\.example2\.com [NC]
RewriteRule ^ https://example2.com%{REQUEST_URI} [L,R=301]
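
Each of the three scripts opens .htaccess in write mode, so running one replaces whatever the previous one produced. Below is a minimal sketch of one way to combine the generated sections into a single file while keeping a backup of any existing copy. The function name write_htaccess, the .bak suffix, and the section labels are assumptions; the demo writes into a temporary directory so it does not touch a real site.

```python
import shutil
import tempfile
from pathlib import Path

def write_htaccess(sections, path=".htaccess"):
    """Write labelled rule sections to one file, backing up any existing copy."""
    target = Path(path)
    if target.exists():
        # Keep the previous version so a bad rule set can be rolled back
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    lines = []
    for label, rules in sections.items():
        lines.append(f"# {label}")
        lines.extend(rules)
        lines.append("")  # blank line between sections
    target.write_text("\n".join(lines), encoding="utf-8")
    return target

# Demo in a temporary directory with rules taken from the examples above
with tempfile.TemporaryDirectory() as tmp:
    out = write_htaccess(
        {
            "Redirects": ["Redirect 301 /old-page1 /new-page1"],
            "Rewrite Rules": [
                "RewriteEngine On",
                "RewriteRule ^page/123$ /page.php?id=123 [L,QSA]",
            ],
        },
        path=Path(tmp) / ".htaccess",
    )
    content = out.read_text(encoding="utf-8")
    print(content)
```

Passing the rule lists produced by the earlier scripts as the dictionary values would yield one file containing all three groups of rules.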

Summary of SEO Uses for Python and .htaccess:

  • Automating 301 redirects: Useful when you’re migrating pages or changing URLs.
  • URL rewrites: Helps create clean, user-friendly URLs, which is better for SEO.
  • Enforcing HTTPS and removing www: Ensures consistent and secure URL formats for SEO.

This combination of Python and .htaccess allows for efficient and scalable management of website SEO optimizations.


Daniel Dye

Daniel Dye is the President of NativeRank Inc., a premier digital marketing agency that has grown into a powerhouse of innovation under his leadership. With a career spanning decades in the digital marketing industry, Daniel has been instrumental in shaping the success of NativeRank and its impressive lineup of sub-brands, including MarineListings.com, LocalSEO.com, MarineManager.com, PowerSportsManager.com, NikoAI.com, and SearchEngineGuidelines.com. Before becoming President of NativeRank, Daniel served as the Executive Vice President at both NativeRank and LocalSEO for over 12 years. In these roles, he was responsible for maximizing operational performance and achieving the financial goals that set the foundation for the company’s sustained growth. His leadership has been pivotal in establishing NativeRank as a leader in the competitive digital marketing landscape. Daniel’s extensive experience includes his tenure as Vice President at GetAds, LLC, where he led digital marketing initiatives that delivered unprecedented performance. Earlier in his career, he co-founded Media Breakaway, LLC, demonstrating his entrepreneurial spirit and deep understanding of the digital marketing world. In addition to his executive experience, Daniel has a strong technical background. He began his career as a TAC 2 Noc Engineer at Qwest (now CenturyLink) and as a Human Interface Designer at 9MSN, where he honed his skills in user interface design and network operations. Daniel’s educational credentials are equally impressive. He holds an Executive MBA from the Quantic School of Business and Technology and has completed advanced studies in Architecture and Systems Engineering from MIT. His commitment to continuous learning is evident in his numerous certifications in Data Science, Machine Learning, and Digital Marketing from prestigious institutions like Columbia University, edX, and Microsoft. 
With a blend of executive leadership, technical expertise, and a relentless drive for innovation, Daniel Dye continues to propel NativeRank Inc. and its sub-brands to new heights, making a lasting impact in the digital marketing industry.
