
Let’s take a closer look at this SEO Python script, breaking it down step by step with detailed, simplified explanations of the corresponding code.


1. Keyword Research Using Google Ads Keyword Planner API

Objective:
The goal is to fetch relevant keyword suggestions, search volume, and competition data. This helps SEO practitioners find keywords with potential high traffic and low competition.

How It Works:

  • We connect to Google’s keyword data through the googleads library and the TargetingIdeaService of the legacy AdWords API. (Google has since replaced this with the newer Google Ads API and its KeywordPlanIdeaService, but the flow shown here is the same.)
  • By providing seed keywords (e.g., “SEO”, “digital marketing”), the API returns related keywords.
  • For each keyword, we retrieve the search volume and competition level to guide keyword targeting decisions.

Code Breakdown:

from googleads import adwords
import pandas as pd

# Function to retrieve keyword ideas using Google Ads API
def get_keyword_ideas(client, seed_keywords):
    targeting_idea_service = client.GetService('TargetingIdeaService', version='v201809')

    selector = {
        'ideaType': 'KEYWORD',
        'requestType': 'IDEAS',
        'requestedAttributeTypes': ['KEYWORD_TEXT', 'SEARCH_VOLUME', 'COMPETITION'],
        'searchParameters': [{
            'xsi_type': 'RelatedToQuerySearchParameter',
            'queries': seed_keywords
        }],
        'paging': {
            'startIndex': '0',
            'numberResults': '100'  # Limit to 100 results for speed
        }
    }

    # Fetch data from the API
    page = targeting_idea_service.get(selector)

    # Parse the API response: each entry's 'data' field is a list of
    # key/value attribute pairs
    keyword_data = []
    if 'entries' in page:
        for result in page['entries']:
            data = {}
            for attribute in result['data']:
                if attribute['key'] in ['KEYWORD_TEXT', 'SEARCH_VOLUME', 'COMPETITION']:
                    data[attribute['key']] = getattr(attribute['value'], 'value', None)
            keyword_data.append(data)

    return pd.DataFrame(keyword_data)  # Return as DataFrame for easy analysis

  • Why is this important? You can identify keywords with high search volume but low competition, which offer the chance to rank more easily and gain more traffic.
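
Once the keywords are in a DataFrame, filtering for that sweet spot takes one line. A minimal sketch with illustrative sample rows (the column values and thresholds here are assumptions for demonstration, not real API output):

```python
import pandas as pd

# Sample rows in the shape get_keyword_ideas returns (values are made up)
keywords = pd.DataFrame({
    'KEYWORD_TEXT': ['seo audit', 'digital marketing', 'seo tools', 'local seo'],
    'SEARCH_VOLUME': [12000, 90000, 8000, 4500],
    'COMPETITION': [0.2, 0.9, 0.4, 0.15],
})

# Keep keywords with healthy volume but low competition
opportunities = keywords[
    (keywords['SEARCH_VOLUME'] >= 5000) & (keywords['COMPETITION'] <= 0.3)
]

print(opportunities['KEYWORD_TEXT'].tolist())  # ['seo audit']
```

Adjust the thresholds to fit your niche; what counts as "low competition" varies widely by industry.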

2. Website Crawling for SEO Data

Objective:
To extract SEO-critical elements from a webpage, such as the title tag, meta description, H1 tags, and internal links.

How It Works:

  • We use the requests library to make an HTTP request to the website, retrieving its HTML content.
  • BeautifulSoup parses this HTML to find the title tag, meta description, H1 tags, and internal links.
  • This helps in on-page SEO analysis by ensuring all important elements are present and optimized.

Code Breakdown:

import requests
from bs4 import BeautifulSoup
import pandas as pd

# Function to crawl a website and retrieve SEO elements
def crawl_website(url):
    response = requests.get(url)  # Fetch the page content
    soup = BeautifulSoup(response.text, 'html.parser')  # Parse the content

    # Extract important SEO elements
    title = soup.find('title').text if soup.find('title') else None
    meta_desc = soup.find('meta', attrs={'name': 'description'})
    meta_desc_content = meta_desc['content'] if meta_desc else None
    h1_tags = [h1.text for h1 in soup.find_all('h1')]
    # Treat same-site absolute links and relative links as internal
    internal_links = [a['href'] for a in soup.find_all('a', href=True)
                      if a['href'].startswith('/') or url in a['href']]

    return {
        'Title': title,
        'Meta Description': meta_desc_content,
        'H1 Tags': h1_tags,
        'Internal Links': internal_links
    }

  • Why is this important? Crawling verifies that your on-page SEO elements are present and optimized. Missing meta tags, poorly structured H1 tags, or broken internal links can hurt rankings.
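
The dictionary returned by crawl_website lends itself to simple automated checks. A small sketch (the 60-character title limit is a common guideline, not a fixed rule):

```python
def audit_seo_elements(seo_data):
    """Flag common on-page issues in a crawl_website-style dict."""
    issues = []
    if not seo_data.get('Title'):
        issues.append('Missing <title> tag')
    elif len(seo_data['Title']) > 60:
        issues.append('Title longer than ~60 characters')
    if not seo_data.get('Meta Description'):
        issues.append('Missing meta description')
    h1_tags = seo_data.get('H1 Tags', [])
    if len(h1_tags) == 0:
        issues.append('No H1 tag')
    elif len(h1_tags) > 1:
        issues.append('Multiple H1 tags')
    return issues

sample = {'Title': 'Home', 'Meta Description': None,
          'H1 Tags': ['Welcome', 'News'], 'Internal Links': []}
print(audit_seo_elements(sample))
# ['Missing meta description', 'Multiple H1 tags']
```

Checks like these turn a raw crawl into an actionable punch list.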

3. On-Page Analysis using TF-IDF

Objective:
To find the most important terms (keywords) on a webpage based on the TF-IDF algorithm. This is useful for understanding how well your page is optimized around specific keywords.

How It Works:

  • TF-IDF (Term Frequency-Inverse Document Frequency) quantifies the importance of a word relative to all other words on the page.
  • High TF-IDF values indicate that a term is important to the document compared to other terms in a collection of documents.

Code Breakdown:

import requests
from bs4 import BeautifulSoup
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Function to perform TF-IDF analysis on page content
def tfidf_analysis(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    text = soup.get_text()  # Extract full text from page

    tfidf = TfidfVectorizer(stop_words='english')
    tfidf_matrix = tfidf.fit_transform([text])  # Apply TF-IDF on page text

    feature_names = tfidf.get_feature_names_out()  # Extract terms
    importance_scores = tfidf_matrix.toarray().flatten()  # Get scores

    tfidf_data = pd.DataFrame({
        'Term': feature_names,
        'Importance': importance_scores
    })

    return tfidf_data.sort_values(by='Importance', ascending=False).head(10)

  • Why is this important? It shows whether the right keywords are emphasized on the page and whether the most relevant terms appear often enough in your content.

4. Backlink Retrieval Using Ahrefs API

Objective:
Retrieve backlinks pointing to your website, which is crucial for off-page SEO analysis.

How It Works:

  • Ahrefs API provides detailed backlink data including the source URL, anchor text, and link type (follow or nofollow).
  • This script helps you analyze the quality and quantity of backlinks, which play a significant role in determining your website’s domain authority.

Code Breakdown:

import requests
import pandas as pd

# Function to fetch backlinks from Ahrefs API
def get_backlinks(api_token, domain):
    ahrefs_url = f'https://apiv2.ahrefs.com?from=backlinks&target={domain}&mode=domain&output=json&token={api_token}'
    response = requests.get(ahrefs_url)
    backlinks = response.json()

    backlink_list = []
    for backlink in backlinks['backlinks']:
        backlink_list.append({
            'Source URL': backlink['referring_page'],
            'Anchor Text': backlink['anchor'],
            'Link Type': backlink['type']
        })

    return pd.DataFrame(backlink_list)  # Return backlink data as DataFrame

  • Why is this important? Backlinks from reputable websites can drastically improve your site’s ranking, while low-quality or spammy backlinks may harm it.
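
With the backlinks in a DataFrame, quick quality summaries follow naturally. A sketch with made-up sample rows (the 'dofollow'/'nofollow' labels are illustrative; the exact values Ahrefs returns may differ):

```python
import pandas as pd

# Sample rows in the shape get_backlinks returns (values are made up)
backlinks = pd.DataFrame({
    'Source URL': ['https://a.com/post', 'https://b.org/page', 'https://c.net/blog'],
    'Anchor Text': ['seo guide', 'click here', 'seo tools'],
    'Link Type': ['dofollow', 'nofollow', 'dofollow'],
})

# How many links of each type, and what share pass link equity?
summary = backlinks['Link Type'].value_counts()
dofollow_share = summary.get('dofollow', 0) / len(backlinks) * 100
print(f'{dofollow_share:.0f}% of backlinks pass link equity')  # 67%
```

A similar groupby on anchor text helps spot over-optimized anchors, which can look unnatural to search engines.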

5. Content Recommendations Based on Readability and Keyword Density

Objective:
Evaluate your content’s readability score and keyword density to ensure the content is both SEO-friendly and user-friendly.

How It Works:

  • Readability is measured using the Flesch Reading Ease score, which indicates how easy it is to read the content.
  • Keyword Density calculates how frequently a keyword appears in the text relative to the total number of words. This ensures the content is neither over-optimized nor under-optimized.

Code Breakdown:

import requests
from bs4 import BeautifulSoup
from textstat import flesch_reading_ease

# Function to analyze content readability and keyword density
def analyze_readability_and_density(url, keyword):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    text = soup.get_text()

    # Readability score (Flesch Reading Ease)
    readability_score = flesch_reading_ease(text)

    # Keyword density (simple single-word count; multi-word phrases
    # would need substring matching instead)
    words = text.lower().split()
    total_words = len(words) or 1  # Avoid division by zero on empty pages
    word_count = words.count(keyword.lower())
    keyword_density = (word_count / total_words) * 100

    return {
        'Readability Score': readability_score,
        'Keyword Density (%)': keyword_density
    }

  • Why is this important? If the content is too difficult to read, users will bounce. If the keyword density is too high, search engines may penalize the page for keyword stuffing.
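
The two raw metrics become actionable once compared against thresholds. A sketch using common rules of thumb (a Flesch score of 60+ for a general audience, density between roughly 0.5% and 3%); these are conventions, not official limits:

```python
def content_recommendations(readability_score, keyword_density):
    """Turn raw readability and density numbers into actionable flags.

    Thresholds are widely used rules of thumb, not search-engine rules.
    """
    notes = []
    if readability_score < 60:
        notes.append('Content may be hard to read; aim for shorter sentences')
    if keyword_density > 3:
        notes.append('Keyword density above ~3% risks looking like stuffing')
    elif keyword_density < 0.5:
        notes.append('Keyword barely appears; consider using it more naturally')
    return notes

print(content_recommendations(45.0, 4.2))
# ['Content may be hard to read; aim for shorter sentences',
#  'Keyword density above ~3% risks looking like stuffing']
```

Feeding the dict returned by analyze_readability_and_density into a helper like this closes the loop from measurement to recommendation.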

6. Automated Sitemap Creation

Objective:
To generate an XML sitemap for your website that search engines like Google use to crawl and index your pages.

How It Works:

  • Crawled internal links are structured in a sitemap XML format.
  • This sitemap is used by search engines to efficiently crawl the website and index its pages.

Code Breakdown:

from urllib.parse import urljoin

# Function to create a sitemap from crawled internal links
def create_sitemap(base_url, internal_links):
    sitemap = '<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    for link in internal_links:
        full_link = urljoin(base_url, link)  # Resolve full URL from relative path
        sitemap += f'  <url>\n    <loc>{full_link}</loc>\n  </url>\n'
    sitemap += '</urlset>'

    # Write the sitemap to an XML file
    with open('sitemap.xml', 'w') as f:
        f.write(sitemap)

  • Why is this important? A well-structured sitemap helps search engines crawl and index your website efficiently, improving visibility in search engine results.
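
Since the sitemap is plain XML, it is worth verifying that the generated markup actually parses before submitting it to Search Console. A quick check with the standard library, rebuilding the string the same way create_sitemap does:

```python
import xml.etree.ElementTree as ET

# Build a sitemap string exactly as create_sitemap does
links = ['https://example.com/', 'https://example.com/about']
sitemap = '<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
for link in links:
    sitemap += f'  <url>\n    <loc>{link}</loc>\n  </url>\n'
sitemap += '</urlset>'

# Parse it back and list the URLs; a malformed file would raise ParseError
root = ET.fromstring(sitemap)
ns = '{http://www.sitemaps.org/schemas/sitemap/0.9}'
locs = [url.find(f'{ns}loc').text for url in root.findall(f'{ns}url')]
print(locs)  # ['https://example.com/', 'https://example.com/about']
```

Note that the sitemaps.org protocol also requires URLs to be entity-escaped, so characters like & in query strings should be escaped before being written into the loc element.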

Combining Everything into an SEO Pipeline

The final part of the script combines all of the above processes into a comprehensive SEO pipeline. This pipeline can automate the analysis of keyword research, on-page SEO, backlinks, content recommendations, and sitemap creation.

def seo_pipeline(domain):
    # Assumes `client` (an initialized googleads client) and `api_token`
    # (an Ahrefs API key) have been set up earlier in the script.
    print("Starting SEO Pipeline...")

    # Step 1: Keyword Research
    print("Step 1: Retrieving Keywords...")
    seed_keywords = ['SEO', 'marketing']
    keywords_df = get_keyword_ideas(client, seed_keywords)

    # Step 2: Website Crawl
    print("Step 2: Crawling Website...")
    seo_data = crawl_website(domain)

    # Step 3: On-Page Analysis
    print("Step 3: Analyzing On-Page Content...")
    tfidf_result = tfidf_analysis(domain)

    # Step 4: Backlink Analysis
    print("Step 4: Retrieving Backlinks...")
    backlinks_df = get_backlinks(api_token, domain)

    # Step 5: Content Optimization
    print("Step 5: Analyzing Content for Improvements...")
    keyword = 'SEO'
    content_analysis = analyze_readability_and_density(domain, keyword)

    # Step 6: Generate Sitemap
    print("Step 6: Generating Sitemap...")
    create_sitemap(domain, seo_data['Internal Links'])

    return {
        'Keywords': keywords_df,
        'On-Page Analysis': tfidf_result,
        'Backlinks': backlinks_df,
        'Content Optimization': content_analysis,
    }

# Run the SEO pipeline
domain = 'https://example.com'
result = seo_pipeline(domain)
print(result)

Final Thoughts:

Each step in the pipeline addresses a critical aspect of SEO. By automating these tasks, you can significantly reduce the time required for SEO audits and improvements while gaining insights into your website’s performance. This pipeline can be expanded further depending on your needs, such as adding competitor analysis, automating reporting, or handling advanced content optimization techniques.


Daniel Dye

Daniel Dye is the President of NativeRank Inc., a premier digital marketing agency that has grown into a powerhouse of innovation under his leadership. With a career spanning decades in the digital marketing industry, Daniel has been instrumental in shaping the success of NativeRank and its impressive lineup of sub-brands, including MarineListings.com, LocalSEO.com, MarineManager.com, PowerSportsManager.com, NikoAI.com, and SearchEngineGuidelines.com.

Before becoming President of NativeRank, Daniel served as the Executive Vice President at both NativeRank and LocalSEO for over 12 years. In these roles, he was responsible for maximizing operational performance and achieving the financial goals that set the foundation for the company’s sustained growth. His leadership has been pivotal in establishing NativeRank as a leader in the competitive digital marketing landscape. Daniel’s extensive experience includes his tenure as Vice President at GetAds, LLC, where he led digital marketing initiatives that delivered unprecedented performance. Earlier in his career, he co-founded Media Breakaway, LLC, demonstrating his entrepreneurial spirit and deep understanding of the digital marketing world.

In addition to his executive experience, Daniel has a strong technical background. He began his career as a TAC 2 Noc Engineer at Qwest (now CenturyLink) and as a Human Interface Designer at 9MSN, where he honed his skills in user interface design and network operations. Daniel’s educational credentials are equally impressive. He holds an Executive MBA from the Quantic School of Business and Technology and has completed advanced studies in Architecture and Systems Engineering from MIT. His commitment to continuous learning is evident in his numerous certifications in Data Science, Machine Learning, and Digital Marketing from prestigious institutions like Columbia University, edX, and Microsoft.
With a blend of executive leadership, technical expertise, and a relentless drive for innovation, Daniel Dye continues to propel NativeRank Inc. and its sub-brands to new heights, making a lasting impact in the digital marketing industry.
