Leveraging Python for Competitor Analysis: Character, Word Count, and More
In the competitive world of SEO, it’s important to understand the textual strategies your competitors are employing on their pages. While Google’s algorithm has evolved and no longer places as much emphasis on text length as it once did, understanding the quantity and structure of content on competitor sites can still provide valuable insights. This article will cover how to use Python to count characters, words, analyze keyword density, and even create text spinners for repetitive tasks.
Let’s dive into how Python can help automate these processes, starting with simple character and word counting.
1. Counting Characters and Words in a Text
Counting the number of characters and words from a competitor’s webpage allows you to establish benchmarks for content length and depth. Even though the length of content might not be a primary ranking factor in today’s Google algorithm, it still gives insight into how much content high-ranking competitors use. This can help inform your own content strategy.
Luckily, this is an easy task with Python. The len() function counts characters, while .split() can be used to count words. Here's an example using a text fragment:
text = "PageSpeed Insights API is a very powerful tool as it can give us lots of data to enhance the speed performance in a bulk way for many pages and we can even store this data in a database to analyze the speed evolution over time."
number_characters = len(text)
number_words = len(text.split(" "))
print(f"Characters: {number_characters}, Words: {number_words}")
This script will return the number of characters (227) and words (45) in the given text. It's a simple and effective way to gain insight into the general content structure.
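Note that raw counts can be skewed by capitalization and punctuation: splitting on spaces treats "speed" and "speed," as different tokens. A minimal sketch of a more robust count, assuming you want to lowercase the text and strip punctuation before counting:

```python
import string

text = ("PageSpeed Insights API is a very powerful tool as it can give us "
        "lots of data to enhance the speed performance in a bulk way for "
        "many pages and we can even store this data in a database to "
        "analyze the speed evolution over time.")

# Lowercase and remove punctuation so "Speed" and "speed," count as one token.
normalized = text.lower().translate(str.maketrans("", "", string.punctuation))
words = normalized.split()

print(f"Characters (excluding spaces): {len(normalized.replace(' ', ''))}")
print(f"Words: {len(words)}")
```

Using .split() with no argument also splits on any run of whitespace, which avoids counting empty strings when a page's text contains double spaces or newlines.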
2. Analyzing Word Occurrences and Keyword Density
Beyond word counts, understanding how often certain words appear on a page can provide valuable insight into your competitors’ keyword strategy. This is essential for SEO, as it allows you to gauge which keywords they are targeting most heavily.
To analyze word occurrences and keyword density, we can use a dictionary to count how often each word appears:
count_words = dict()
words = text.split(" ")
for word in words:
    if word in count_words:
        count_words[word] += 1
    else:
        count_words[word] = 1

sorted_count = sorted(count_words.items(), key=lambda kv: kv[1], reverse=True)[0:20]
print(sorted_count)
This code will output the 20 most frequent words in the text, sorted in descending order of occurrence. However, it’s important to note that common words like prepositions or conjunctions might dominate the results. To filter these out, you can implement a stopword list to exclude common terms that don’t provide meaningful SEO insights.
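As a minimal sketch of that filtering step, the loop can simply skip any word found in a stopword set. The short list below is illustrative only; a real project would use a fuller list (for example from NLTK) matched to the page's language:

```python
# Illustrative stopword set; in practice, use a complete list for the
# language of the page you are analyzing.
stopwords = {"a", "an", "and", "as", "can", "for", "in", "is", "it", "of",
             "over", "the", "this", "to", "us", "very", "we"}

text = ("PageSpeed Insights API is a very powerful tool as it can give us "
        "lots of data to enhance the speed performance in a bulk way for "
        "many pages and we can even store this data in a database to "
        "analyze the speed evolution over time.")

count_words = {}
for word in text.lower().split(" "):
    word = word.strip(".,")  # drop trailing punctuation before counting
    if word and word not in stopwords:
        count_words[word] = count_words.get(word, 0) + 1

sorted_count = sorted(count_words.items(), key=lambda kv: kv[1], reverse=True)[0:20]
print(sorted_count)
```

With the stopwords removed, content-bearing terms like "speed" and "data" rise to the top of the list instead of prepositions and articles.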
You can also extend this logic to check for two-word phrases (bigrams), which are often more useful when analyzing keyword strategies. Here’s how to generate a list of the most frequent bigrams:
count_bigrams = dict()
words = text.split(" ")
for i in range(1, len(words)):
    bigram = words[i-1] + " " + words[i]
    if bigram in count_bigrams:
        count_bigrams[bigram] += 1
    else:
        count_bigrams[bigram] = 1

sorted_bigrams = sorted(count_bigrams.items(), key=lambda kv: kv[1], reverse=True)[0:20]
print(sorted_bigrams)
Once you have the bigrams sorted by frequency, it’s often helpful to add keyword density information. Keyword density is calculated by dividing the number of occurrences of a word or phrase by the total word count. Here’s how to add keyword density to the list:
for i in range(len(sorted_bigrams)):
    occurrences = sorted_bigrams[i][1]
    keyword_density = round((occurrences / len(words)) * 100, 2)
    sorted_bigrams[i] = (sorted_bigrams[i][0], occurrences, f"{keyword_density}%")

print(sorted_bigrams)
The output will show each bigram along with its number of occurrences and keyword density as a percentage. This data can be extremely valuable in formulating your content strategy, allowing you to target similar or alternative keywords based on your findings.
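The counting and density steps above generalize naturally to any n-gram size. The helper below is a sketch of that consolidation (the function name top_ngrams is my own); it uses the same density formula as the bigram example, dividing occurrences by the total word count:

```python
def top_ngrams(text, n=2, limit=20):
    """Return the `limit` most frequent n-grams with their keyword density.

    Density = occurrences / total word count, expressed as a percentage.
    """
    words = text.split(" ")
    counts = {}
    for i in range(len(words) - n + 1):
        ngram = " ".join(words[i:i + n])
        counts[ngram] = counts.get(ngram, 0) + 1
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:limit]
    return [(ngram, occ, f"{round(occ / len(words) * 100, 2)}%")
            for ngram, occ in ranked]

text = ("PageSpeed Insights API is a very powerful tool as it can give us "
        "lots of data to enhance the speed performance in a bulk way for "
        "many pages and we can even store this data in a database to "
        "analyze the speed evolution over time.")

print(top_ngrams(text, n=2)[:5])   # top bigrams
print(top_ngrams(text, n=3)[:5])   # top trigrams
```

Passing n=1 reproduces the single-word analysis from the previous section, so one function can cover all three cases.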
3. Automating Content Creation with Text Spinners
While Google’s algorithm has become advanced enough to detect spun content, there are still cases where generating slight variations of text is helpful. For instance, if you need to create multiple versions of meta titles or descriptions for a large set of similar products, Python can help you automate this task efficiently.
A simple example of a text spinner could involve generating titles for a real estate company, where the only difference is the neighborhood and whether a property is for rent or for sale. Here’s a Python script that demonstrates this concept:
neighborhoods = ["Eixample", "Poblenou", "Gracia", "Poblesec", "Bogatell", "Montjuic", "Sants"]
statuses = ["For Rent", "For Sale"]
for neighborhood in neighborhoods:
    for status in statuses:
        print(f"Flat {status} in {neighborhood}, Barcelona - MySite")
This code generates meta titles like:
Flat For Rent in Eixample, Barcelona - MySite
Flat For Sale in Poblenou, Barcelona - MySite
...
This can be useful for quickly producing variations of titles or other repetitive content. However, keep in mind that overusing this technique can result in content that lacks originality, which could negatively impact SEO.
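If the template grows beyond two variable slots, nested loops become unwieldy: each new slot means another level of indentation. A sketch of the same idea using itertools.product, with a hypothetical third slot for property type added for illustration:

```python
from itertools import product

# Hypothetical extra slot: property type. Any number of slots can be
# combined without adding a nested loop per slot.
property_types = ["Flat", "Penthouse", "Studio"]
statuses = ["For Rent", "For Sale"]
neighborhoods = ["Eixample", "Poblenou", "Gracia", "Poblesec",
                 "Bogatell", "Montjuic", "Sants"]

titles = [f"{ptype} {status} in {hood}, Barcelona - MySite"
          for ptype, status, hood in product(property_types, statuses, neighborhoods)]

print(len(titles))   # 3 * 2 * 7 = 42 combinations
print(titles[0])     # Flat For Rent in Eixample, Barcelona - MySite
```

From here, the list of titles could be written to a CSV for bulk upload or review, which tends to be more practical than printing when the number of combinations is large.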
Conclusion
Python provides a powerful, flexible toolset for analyzing competitor content and automating routine SEO tasks. Whether you’re counting words and characters to get a sense of competitor content, analyzing keyword density to optimize your own strategy, or generating multiple versions of repetitive content, Python can simplify the process. By leveraging these techniques, you can gain deeper insights into your competitors’ strategies and streamline your own SEO efforts.