Let’s take a closer look at the SEO Python script, breaking it down step by step and explaining each block of code in plain language.
1. Keyword Research Using Google Ads Keyword Planner API
Objective:
The goal is to fetch relevant keyword suggestions, search volume, and competition data. This helps SEO practitioners find keywords with potential high traffic and low competition.
How It Works:
- We connect to Google’s Ads API (specifically the Keyword Planner service) using the googleads library.
- By providing seed keywords (e.g., “SEO”, “digital marketing”), the API returns related keywords.
- For each keyword, we retrieve the search volume and competition level to guide keyword targeting decisions.
Code Breakdown:
from googleads import adwords
import pandas as pd

# Function to retrieve keyword ideas using Google Ads API
def get_keyword_ideas(client, seed_keywords):
    targeting_idea_service = client.GetService('TargetingIdeaService', version='v201809')
    selector = {
        'ideaType': 'KEYWORD',
        'requestType': 'IDEAS',
        'requestedAttributeTypes': ['KEYWORD_TEXT', 'SEARCH_VOLUME', 'COMPETITION'],
        'searchParameters': [{
            'xsi_type': 'RelatedToQuerySearchParameter',
            'queries': seed_keywords
        }],
        'paging': {
            'startIndex': '0',
            'numberResults': '100'  # Limit to 100 results for speed
        }
    }

    # Fetch data from the API
    page = targeting_idea_service.get(selector)

    # Parse the API response
    keyword_data = []
    if 'entries' in page:
        for result in page['entries']:
            data = {}
            for attribute in result['data']:
                if attribute in ['KEYWORD_TEXT', 'SEARCH_VOLUME', 'COMPETITION']:
                    data[attribute] = result['data'][attribute]['value']
            keyword_data.append(data)

    return pd.DataFrame(keyword_data)  # Return as DataFrame for easy analysis
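To try the function end to end, a minimal usage sketch might look like the following. It assumes a googleads.yaml file with valid credentials exists in your home directory, and note that the legacy AdWords API this script targets has since been replaced by the newer Google Ads API (KeywordPlanIdeaService), so treat this as illustrative rather than production-ready.
from googleads import adwords

# Load credentials from ~/googleads.yaml (the path and credential setup are assumptions)
client = adwords.AdWordsClient.LoadFromStorage()
keywords_df = get_keyword_ideas(client, ['SEO', 'digital marketing'])
print(keywords_df.head())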
- Why is this important?: You can identify keywords that are high in search volume but low in competition, giving you the opportunity to rank more easily and gain more traffic.
2. Website Crawling for SEO Data
Objective:
To extract SEO-critical elements from a webpage, such as the title tag, meta description, H1 tags, and internal links.
How It Works:
- We use the requests library to make an HTTP request to the website and retrieve its HTML content.
- BeautifulSoup parses this HTML to find the title tag, meta description, H1 tags, and internal links.
- This helps in on-page SEO analysis by ensuring all important elements are present and optimized.
Code Breakdown:
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Function to crawl a website and retrieve SEO elements
def crawl_website(url):
    response = requests.get(url)  # Fetch the page content
    soup = BeautifulSoup(response.text, 'html.parser')  # Parse the content

    # Extract important SEO elements
    title = soup.find('title').text if soup.find('title') else None
    meta_desc = soup.find('meta', attrs={'name': 'description'})
    meta_desc_content = meta_desc['content'] if meta_desc else None
    h1_tags = [h1.text for h1 in soup.find_all('h1')]
    internal_links = [a['href'] for a in soup.find_all('a', href=True) if url in a['href']]

    return {
        'Title': title,
        'Meta Description': meta_desc_content,
        'H1 Tags': h1_tags,
        'Internal Links': internal_links
    }
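A quick usage sketch (example.com is just a placeholder domain):
seo_elements = crawl_website('https://example.com')
print(seo_elements['Title'])
print(seo_elements['H1 Tags'])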
- Why is this important?: Crawling ensures that your on-page SEO elements are optimized. Missing meta tags, poorly structured H1 tags, or broken internal links could hurt rankings.
3. On-Page Analysis using TF-IDF
Objective:
To find the most important terms (keywords) on a webpage based on the TF-IDF algorithm. This is useful for understanding how well your page is optimized around specific keywords.
How It Works:
- TF-IDF (Term Frequency-Inverse Document Frequency) quantifies how important a word is to a document relative to a collection of documents.
- High TF-IDF values indicate that a term appears often in the document but rarely elsewhere in the collection, making it a strong signal of what the page is about (see the short illustration below).
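As a minimal illustration of that idea, here are two made-up one-line documents rather than real pages; terms shared by both documents receive lower weights than terms unique to one of them.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    'seo tips for keyword research',
    'seo tips for link building',
]
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(docs)

# Shared terms ('seo', 'tips', 'for') score lower in the first document
# than the terms unique to it ('keyword', 'research').
print(dict(zip(vectorizer.get_feature_names_out(), matrix.toarray()[0].round(2))))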
Code Breakdown:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Function to perform TF-IDF analysis on page content
def tfidf_analysis(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    text = soup.get_text()  # Extract full text from page

    tfidf = TfidfVectorizer(stop_words='english')
    tfidf_matrix = tfidf.fit_transform([text])  # Apply TF-IDF on page text

    feature_names = tfidf.get_feature_names_out()  # Extract terms
    importance_scores = tfidf_matrix.toarray().flatten()  # Get scores

    tfidf_data = pd.DataFrame({
        'Term': feature_names,
        'Importance': importance_scores
    })
    return tfidf_data.sort_values(by='Importance', ascending=False).head(10)
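Usage follows the same pattern as before (placeholder URL):
top_terms = tfidf_analysis('https://example.com')
print(top_terms)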
- Why is this important?: Helps identify if the right keywords are emphasized on the page. Ensures that the most relevant keywords appear frequently enough in your content.
4. Backlink Retrieval Using Ahrefs API
Objective:
Retrieve backlinks pointing to your website, which is crucial for off-page SEO analysis.
How It Works:
- Ahrefs API provides detailed backlink data including the source URL, anchor text, and link type (follow or nofollow).
- This script helps you analyze the quality and quantity of backlinks, which play a significant role in determining your website’s domain authority.
Code Breakdown:
import requests
import pandas as pd

# Function to fetch backlinks from Ahrefs API
def get_backlinks(api_token, domain):
    ahrefs_url = f'https://apiv2.ahrefs.com?from=backlinks&target={domain}&mode=domain&output=json&token={api_token}'
    response = requests.get(ahrefs_url)
    backlinks = response.json()

    backlink_list = []
    for backlink in backlinks['backlinks']:
        backlink_list.append({
            'Source URL': backlink['referring_page'],
            'Anchor Text': backlink['anchor'],
            'Link Type': backlink['type']
        })
    return pd.DataFrame(backlink_list)  # Return backlink data as DataFrame
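A usage sketch, with 'YOUR_AHREFS_TOKEN' standing in for a real API token; the endpoint and response fields above are taken from the script as written, so verify them against the API documentation for your Ahrefs plan.
backlinks_df = get_backlinks('YOUR_AHREFS_TOKEN', 'example.com')
print(backlinks_df.head())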
- Why is this important?: Backlinks from reputable websites can drastically improve your site’s ranking, while low-quality or spammy backlinks may harm it.
5. Content Recommendations Based on Readability and Keyword Density
Objective:
Evaluate your content’s readability score and keyword density to ensure the content is both SEO-friendly and user-friendly.
How It Works:
- Readability is measured using the Flesch Reading Ease score, which indicates how easy it is to read the content.
- Keyword Density calculates how frequently a keyword appears in the text relative to the total number of words. This ensures the content is neither over-optimized nor under-optimized (a quick worked example follows below).
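For reference, the Flesch Reading Ease score is computed as 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words), so shorter sentences and shorter words push the score higher (easier to read). Keyword density is a simple ratio: a keyword that appears 12 times in an 800-word article, for example, has a density of (12 / 800) × 100 = 1.5%.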
Code Breakdown:
import requests
from bs4 import BeautifulSoup
import textstat

# Function to analyze content readability and keyword density
def analyze_readability_and_density(url, keyword):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    text = soup.get_text()

    # Readability score (Flesch Reading Ease)
    readability_score = textstat.flesch_reading_ease(text)

    # Keyword density
    word_count = text.lower().split().count(keyword.lower())
    total_words = len(text.split())
    keyword_density = (word_count / total_words) * 100 if total_words else 0

    return {
        'Readability Score': readability_score,
        'Keyword Density (%)': keyword_density
    }
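A quick usage sketch (placeholder URL and keyword):
metrics = analyze_readability_and_density('https://example.com', 'SEO')
print(metrics)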
- Why is this important?: If the content is too difficult to read, users will bounce. If the keyword density is too high, it could trigger search engines to penalize the page for keyword stuffing.
6. Automated Sitemap Creation
Objective:
To generate an XML sitemap for your website that search engines like Google use to crawl and index your pages.
How It Works:
- Crawled internal links are structured in a sitemap XML format.
- This sitemap is used by search engines to efficiently crawl the website and index its pages.
Code Breakdown:
from urllib.parse import urljoin

# Function to create a sitemap from crawled internal links
def create_sitemap(base_url, internal_links):
    sitemap = '<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    for link in internal_links:
        full_link = urljoin(base_url, link)  # Resolve full URL from relative path
        sitemap += f'  <url>\n    <loc>{full_link}</loc>\n  </url>\n'
    sitemap += '</urlset>'

    # Write the sitemap to an XML file
    with open('sitemap.xml', 'w') as f:
        f.write(sitemap)
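A quick usage sketch with a few hypothetical internal paths:
create_sitemap('https://example.com', ['/about', '/blog', '/contact'])
# Writes sitemap.xml to the current working directory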
- Why is this important?: A well-structured sitemap helps search engines crawl and index your website efficiently, improving visibility in search engine results.
Combining Everything into an SEO Pipeline
The final part of the script combines all of the above processes into a comprehensive SEO pipeline. This pipeline can automate the analysis of keyword research, on-page SEO, backlinks, content recommendations, and sitemap creation.
# Assumes `client` (an initialized googleads AdWordsClient) and `api_token`
# (an Ahrefs API token) have been configured earlier in the script.
def seo_pipeline(domain):
    print("Starting SEO Pipeline...")

    # Step 1: Keyword Research
    print("Step 1: Retrieving Keywords...")
    seed_keywords = ['SEO', 'marketing']
    keywords_df = get_keyword_ideas(client, seed_keywords)

    # Step 2: Website Crawl
    print("Step 2: Crawling Website...")
    seo_data = crawl_website(domain)

    # Step 3: On-Page Analysis
    print("Step 3: Analyzing On-Page Content...")
    tfidf_result = tfidf_analysis(domain)

    # Step 4: Backlink Analysis
    print("Step 4: Retrieving Backlinks...")
    backlinks_df = get_backlinks(api_token, domain)

    # Step 5: Content Optimization
    print("Step 5: Analyzing Content for Improvements...")
    keyword = 'SEO'
    content_analysis = analyze_readability_and_density(domain, keyword)

    # Step 6: Generate Sitemap
    print("Step 6: Generating Sitemap...")
    create_sitemap(domain, seo_data['Internal Links'])

    return {
        'Keywords': keywords_df,
        'On-Page Analysis': tfidf_result,
        'Backlinks': backlinks_df,
        'Content Optimization': content_analysis,
    }

# Run the SEO pipeline
domain = 'https://example.com'
result = seo_pipeline(domain)
print(result)
Final Thoughts:
Each step in the pipeline addresses a critical aspect of SEO. By automating these tasks, you can significantly reduce the time required for SEO audits and improvements while gaining insights into your website’s performance. This pipeline can be expanded further depending on your needs, such as adding competitor analysis, automating reporting, or handling advanced content optimization techniques.