Leveraging Python for Competitor Analysis: Character, Word Count, and More

In the competitive world of SEO, it helps to understand the textual strategies your competitors use on their pages. Google’s algorithm no longer places as much emphasis on text length as it once did, but the quantity and structure of content on competitor sites can still provide useful benchmarks. This article covers how to use Python to count characters and words, analyze keyword density, and even build simple text spinners for repetitive tasks.

Let’s dive into how Python can help automate these processes, starting with simple character and word counting.

1. Counting Characters and Words in a Text

Counting the number of characters and words from a competitor’s webpage allows you to establish benchmarks for content length and depth. Even though the length of content might not be a primary ranking factor in today’s Google algorithm, it still gives insight into how much content high-ranking competitors use. This can help inform your own content strategy.

Luckily, this is an easy task with Python. Calling len() on a string returns its number of characters (spaces and punctuation included), and calling len() on the list returned by .split() gives the number of words. Here’s an example using a text fragment:

text = "PageSpeed Insights API is a very powerful tool as it can give us lots of data to enhance the speed performance in a bulk way for many pages and we can even store this data in a database to analyze the speed evolution over time."

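# len() on a string counts every character, including spaces and punctuation
number_characters = len(text)
# split() with no argument splits on any run of whitespace, one item per word
number_words = len(text.split())

print(f"Characters: {number_characters}, Words: {number_words}")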

For the fragment above, this script reports 227 characters (spaces and punctuation included) and 45 words. It’s a simple and effective way to gauge how much content a page carries.
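In practice the text comes from a competitor’s page rather than a hard-coded string. The following is a minimal sketch, assuming you already have the page’s HTML as a string, that pulls out the text with the standard library’s html.parser before counting. The class name, the function name, and the sample HTML are hypothetical and only serve the illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped tags
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical HTML standing in for a downloaded competitor page
sample_html = """
<html><head><style>p {color: red;}</style></head>
<body><h1>PageSpeed basics</h1><p>Measure first, then optimize.</p>
<script>console.log("ignored");</script></body></html>
"""

page_text = extract_text(sample_html)
print(f"Characters: {len(page_text)}, Words: {len(page_text.split())}")
```

Real pages also contain navigation, footers, and other template text, so the counts from a sketch like this are best treated as rough benchmarks.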

2. Analyzing Word Occurrences and Keyword Density

Beyond word counts, understanding how often certain words appear on a page can provide valuable insight into your competitors’ keyword strategy. This is essential for SEO, as it allows you to gauge which keywords they are targeting most heavily.

To analyze word occurrences and keyword density, we can use a dictionary to count how often each word appears:

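count_words = dict()
# Lowercase each token and strip surrounding punctuation so that
# "Data" and "data." are counted as the same word
words = [word.strip(".,;:!?()\"'").lower() for word in text.split()]

for word in words:
    if not word:
        continue  # token was punctuation only
    if word in count_words:
        count_words[word] += 1
    else:
        count_words[word] = 1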

sorted_count = sorted(count_words.items(), key=lambda kv: kv[1], reverse=True)[0:20]
print(sorted_count)

This code outputs the 20 most frequent words in the text, sorted in descending order of occurrence. Common words such as articles, prepositions, and conjunctions will likely dominate the results, so you can apply a stopword list to exclude terms that don’t provide meaningful SEO insight.
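As a rough illustration of stopword filtering, the sketch below removes a handful of common words before counting. The STOPWORDS set and the sample sentence are assumptions made for this example; a real analysis would rely on a much fuller list. It uses collections.Counter from the standard library in place of the manual dictionary.

```python
from collections import Counter

# Illustrative stopword list only; a real analysis would use a fuller set
STOPWORDS = {"a", "an", "and", "as", "for", "in", "is", "it", "of", "the", "this", "to", "we"}

# Hypothetical sample text
sample = "The speed of a page matters and the speed of the server matters too"

tokens = [word.lower() for word in sample.split()]
filtered = [word for word in tokens if word not in STOPWORDS]

# most_common(n) returns the n most frequent (word, count) pairs
top_words = Counter(filtered).most_common(3)
print(top_words)
```

With the stopwords removed, content words such as "speed" and "matters" rise to the top instead of "the" and "of".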

You can also extend this logic to check for two-word phrases (bigrams), which are often more useful when analyzing keyword strategies. Here’s how to generate a list of the most frequent bigrams:

count_bigrams = dict()
words = text.split(" ")

# Start at index 1 so each word can be paired with the word before it
for i in range(1, len(words)):
    bigram = words[i-1] + " " + words[i]
    if bigram in count_bigrams:
        count_bigrams[bigram] += 1
    else:
        count_bigrams[bigram] = 1

sorted_bigrams = sorted(count_bigrams.items(), key=lambda kv: kv[1], reverse=True)[0:20]
print(sorted_bigrams)

Once you have the bigrams sorted by frequency, it’s often helpful to add keyword density information. Keyword density is calculated by dividing the number of occurrences of a word or phrase by the total word count, then multiplying by 100 to express it as a percentage. Here’s how to add keyword density to the list:

for i in range(len(sorted_bigrams)):
    occurrences = sorted_bigrams[i][1]
    keyword_density = round((occurrences / len(words)) * 100, 2)
    sorted_bigrams[i] = (sorted_bigrams[i][0], occurrences, f"{keyword_density}%")

print(sorted_bigrams)

The output will show each bigram along with its number of occurrences and keyword density as a percentage. This data can be extremely valuable in formulating your content strategy, allowing you to target similar or alternative keywords based on your findings.
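The counting and density steps above can also be folded into one reusable helper. This is a minimal sketch, not the article’s original code: the function name ngram_density, its parameters, and the sample sentence are hypothetical, and it uses the same density definition as above (occurrences divided by total word count).

```python
from collections import Counter


def ngram_density(text, n=2, top=5):
    """Return (phrase, occurrences, density %) for the most frequent n-grams."""
    words = text.lower().split()
    if len(words) < n:
        return []
    # Slide a window of n words across the text
    ngrams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(ngrams)
    # Density as defined above: occurrences / total word count, as a percentage
    return [
        (phrase, occurrences, round(occurrences / len(words) * 100, 2))
        for phrase, occurrences in counts.most_common(top)
    ]


# Hypothetical sample text
sample = "page speed matters because page speed affects users"
result = ngram_density(sample, n=2, top=3)
for row in result:
    print(row)
```

Setting n=1 gives single-word density and n=3 gives trigrams, so the same helper covers several phrase lengths.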

3. Automating Content Creation with Text Spinners

While Google’s algorithm has become advanced enough to detect spun content, there are still cases where generating slight variations of text is helpful. For instance, if you need to create multiple versions of meta titles or descriptions for a large set of similar products, Python can help you automate this task efficiently.

A simple example of a text spinner could involve generating titles for a real estate company, where the only difference is the neighborhood and whether a property is for rent or for sale. Here’s a Python script that demonstrates this concept:

neighborhoods = ["Eixample", "Poblenou", "Gracia", "Poblesec", "Bogatell", "Montjuic", "Sants"]
statuses = ["For Rent", "For Sale"]

for neighborhood in neighborhoods:
    for status in statuses:
        print(f"Flat {status} in {neighborhood}, Barcelona - MySite")

This code generates meta titles like:

Flat For Rent in Eixample, Barcelona - MySite
Flat For Sale in Eixample, Barcelona - MySite
...

This can be useful for quickly producing variations of titles or other repetitive content. However, keep in mind that overusing this technique can result in content that lacks originality, which could negatively impact SEO.
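When generating titles in bulk, it can help to check their length before publishing. The sketch below builds the same combinations with itertools.product and flags long titles. MAX_TITLE_LENGTH is an assumed rule of thumb for this example, not an official limit, and the shortened neighborhood list is only for brevity.

```python
from itertools import product

neighborhoods = ["Eixample", "Poblenou", "Gracia"]
statuses = ["For Rent", "For Sale"]

# Assumed guideline, not an official limit: titles much longer than this
# are often truncated in search results
MAX_TITLE_LENGTH = 60

# product() yields every (neighborhood, status) pair, like the nested loops
titles = [
    f"Flat {status} in {neighborhood}, Barcelona - MySite"
    for neighborhood, status in product(neighborhoods, statuses)
]

too_long = [title for title in titles if len(title) > MAX_TITLE_LENGTH]

print(f"{len(titles)} titles generated, {len(too_long)} over {MAX_TITLE_LENGTH} characters")
```

Collecting the titles in a list instead of printing them directly also makes it easy to write them to a CSV file or feed them into a CMS import later.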

Conclusion

Python provides a powerful, flexible toolset for analyzing competitor content and automating routine SEO tasks. Whether you’re counting words and characters to get a sense of competitor content, analyzing keyword density to optimize your own strategy, or generating multiple versions of repetitive content, Python can simplify the process. By leveraging these techniques, you can gain deeper insights into your competitors’ strategies and streamline your own SEO efforts.


Daniel Dye

Daniel Dye is the President of NativeRank Inc., a premier digital marketing agency that has grown into a powerhouse of innovation under his leadership. With a career spanning decades in the digital marketing industry, Daniel has been instrumental in shaping the success of NativeRank and its impressive lineup of sub-brands, including MarineListings.com, LocalSEO.com, MarineManager.com, PowerSportsManager.com, NikoAI.com, and SearchEngineGuidelines.com. Before becoming President of NativeRank, Daniel served as the Executive Vice President at both NativeRank and LocalSEO for over 12 years. In these roles, he was responsible for maximizing operational performance and achieving the financial goals that set the foundation for the company’s sustained growth. His leadership has been pivotal in establishing NativeRank as a leader in the competitive digital marketing landscape. Daniel’s extensive experience includes his tenure as Vice President at GetAds, LLC, where he led digital marketing initiatives that delivered unprecedented performance. Earlier in his career, he co-founded Media Breakaway, LLC, demonstrating his entrepreneurial spirit and deep understanding of the digital marketing world. In addition to his executive experience, Daniel has a strong technical background. He began his career as a TAC 2 Noc Engineer at Qwest (now CenturyLink) and as a Human Interface Designer at 9MSN, where he honed his skills in user interface design and network operations. Daniel’s educational credentials are equally impressive. He holds an Executive MBA from the Quantic School of Business and Technology and has completed advanced studies in Architecture and Systems Engineering from MIT. His commitment to continuous learning is evident in his numerous certifications in Data Science, Machine Learning, and Digital Marketing from prestigious institutions like Columbia University, edX, and Microsoft. 
With a blend of executive leadership, technical expertise, and a relentless drive for innovation, Daniel Dye continues to propel NativeRank Inc. and its sub-brands to new heights, making a lasting impact in the digital marketing industry.
