Google Alerts Using Python
Step 1: Installing google-alerts for Python
First, let’s install the google-alerts library for Python and initialize our Google Alerts session. To begin, run the following command in your terminal:
pip install google-alerts
Once installed, you’ll need to configure your Google Alerts session by entering your email and password. This can be done with the following setup command:
google-alerts setup --email <your-email-address> --password '<your-password>'
However, because the library has not been updated since 2020, it only works with ChromeDriver and Google Chrome version 84. You’ll need to install both ChromeDriver v84 and Google Chrome v84; be careful not to overwrite your existing Chrome installation in the process. After installation, seed the Google Alerts session using:
google-alerts seed --driver /tmp/chromedriver --timeout 60
Step 2: Creating Your First Alert
Once the session is seeded, you can start creating alerts using Python, either in a script or Jupyter notebook. First, authenticate using the following code:
from google_alerts import GoogleAlerts
ga = GoogleAlerts('<your_email_address>', '<your_password>')
ga.authenticate()
Now, to create your first Google Alert for a specific search term, such as “Barcelona” in Spain, use this:
ga.create("Barcelona", {'delivery': 'RSS', "language": "es", 'monitor_match': 'ALL', 'region' : "ES"})
If successful, the function will return an object containing details such as the search term, language, region, match type, and the RSS link for that alert.
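For example, you can capture that returned object and pull out the RSS feed URL needed in Step 3. The key names below ('rss_link' and 'monitor_id') are what recent versions of the library return; print the object to confirm them in your installation:
# Capture the details of the newly created alert
alert = ga.create("Barcelona", {'delivery': 'RSS', 'language': 'es', 'monitor_match': 'ALL', 'region': 'ES'})
print(alert)  # inspect the full response to see the available keys
rss_feed = alert[0]['rss_link']      # the RSS feed URL used in Step 3
monitor_id = alert[0]['monitor_id']  # handy if you later want to delete the alert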
Note: Unfortunately, it’s not currently possible to create an alert that monitors a term across all countries. If you leave the language and region fields blank, it defaults to English and the USA.
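If you do need coverage for several countries, one simple workaround is to create a separate alert per region. A quick sketch, with example region codes you should adjust to the countries you care about:
# Create one alert per region of interest instead of a single global alert
regions = ['ES', 'US', 'GB']  # example region codes
for region in regions:
    ga.create("Barcelona", {'delivery': 'RSS', 'language': 'en', 'monitor_match': 'ALL', 'region': region})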
To review active alerts, use:
ga.list()
If you need to delete an alert, reference its monitor_id and use:
ga.delete("<monitor_id>")
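For example, to remove the “Barcelona” alert you would look up its monitor_id in the list of active alerts (assuming ga.list() returns the same dictionaries as ga.create() above):
# Find the alert for a given search term and delete it by its monitor_id
for alert in ga.list():
    if alert['term'].lower() == 'barcelona':
        ga.delete(alert['monitor_id'])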
Step 3: Parsing the RSS Feed
With the alert set up, let’s move on to parsing the RSS feed. We will extract key information such as the alert ID, title, publication date, URL, and content. Using the requests library and BeautifulSoup, we can extract and structure the data:
import requests
from bs4 import BeautifulSoup as Soup
# Fetch the RSS (Atom) feed that was returned when the alert was created
r = requests.get('<your RSS feed>')
soup = Soup(r.text, 'xml')  # the 'xml' parser requires the lxml package
# The first <id>, <title>, <updated> and <link> elements describe the feed itself, so skip them with [1:]
id_alert = [x.text for x in soup.find_all("id")[1:]]
title_alert = [x.text for x in soup.find_all("title")[1:]]
published_alert = [x.text for x in soup.find_all("published")]
update_alert = [x.text for x in soup.find_all("updated")[1:]]
# Each entry link is a Google redirect; extract the target URL between "url=" and "&ct="
link_alert = [[x["href"].split("url=")[1].split("&ct=")[0]] for x in soup.find_all("link")[1:]]
content_alert = [x.text for x in soup.find_all("content")]
# Combine the fields of each entry into one row per alert result
compiled_list = [[id_alert[x], title_alert[x], published_alert[x], update_alert[x], link_alert[x], content_alert[x]] for x in range(len(id_alert))]
This code will generate a comprehensive list with all relevant fields for each alert. If desired, you can save the results to an Excel file using Pandas:
import pandas as pd
df = pd.DataFrame(compiled_list, columns=["ID", "Title", "Published on", "Updated on", "Link", "Content"])
df.to_excel('new_alerts.xlsx', header=True, index=False)  # writing .xlsx files requires an Excel engine such as openpyxl
Step 4: Automating Outreach to Sites
Leveraging Google Alerts with Python becomes particularly useful when automating outreach to websites that mention your brand or specific keywords. Using Python, you can scrape each URL and attempt to locate contact information such as email addresses.
Here’s a sample script that finds email addresses within the page content and collects links to contact pages:
import re
import requests
from bs4 import BeautifulSoup as Soup
for iteration in link_alert:
    # Fetch the page that mentioned the search term
    request_link = requests.get(iteration[0])
    soup = Soup(request_link.text, 'html.parser')
    body = soup.find("body").text
    # Look for email addresses in the page text, filtering out image file names that match the pattern
    match = [x for x in re.findall(r'[\w.+-]+@[\w-]+\.[\w.-]+', body) if ".png" not in x]
    # Collect links whose anchor text suggests a contact page
    contact_urls = []
    links = soup.find_all("a")
    for y in links:
        href = y.get("href")
        if href and "contact" in y.text.lower():
            contact_urls.append(href)
    # Store the findings alongside the article URL
    iteration.append(match)
    iteration.append(contact_urls)
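Before automating any outreach, it’s worth printing what was scraped for each mention. The quick check below assumes the loop above has already run, so every entry in link_alert now holds the article URL followed by the lists of email addresses and contact-page links:
# Print a short summary of the scraping results for each mention
for url, emails, contact_pages in link_alert:
    print(url)
    print("  emails found:", emails or "none")
    print("  contact pages:", contact_pages or "none")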
Once you’ve compiled the list of email addresses, you can even automate sending thank-you emails using smtplib:
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
import smtplib
msg = MIMEMultipart()
password = '<your email address password>'
URL = '<URL of the article that mentioned your brand>'  # e.g. one of the links collected in Step 3

msg['From'] = "<your email address>"
msg['To'] = "<Receiver email address>"
msg['Subject'] = "Thank you for mentioning my brand!"

# Build the HTML body of the thank-you message
message = "<p>Dear Sir or Madam,<br><br>Thank you for mentioning my brand in your article: " + URL + ". I would appreciate it if you could include a link to my website at https://www.example.com.<br><br>Thanks in advance!</p>"
msg.attach(MIMEText(message, 'html'))

# Connect to the SMTP server over TLS and send the message
# (Gmail typically requires an app password rather than your normal account password)
server = smtplib.SMTP('smtp.gmail.com', 587)
server.starttls()
server.login('<your email address>', password)
server.sendmail(msg['From'], msg['To'], msg.as_string())
server.quit()
By combining these steps, you can automate both the monitoring of online mentions and the outreach process, streamlining your brand management and awareness efforts.
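As a rough sketch of how these pieces fit together, the snippet below loops over the scraped results from Step 4 and sends the thank-you message to every address that was found. It reuses the smtplib code from above in a small helper function; the SMTP settings, credentials, and message text are placeholders you should replace with your own.
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

SENDER = '<your email address>'
PASSWORD = '<your email address password>'

def send_thank_you(recipient, url):
    # Build and send a single thank-you email for one mention
    msg = MIMEMultipart()
    msg['From'] = SENDER
    msg['To'] = recipient
    msg['Subject'] = "Thank you for mentioning my brand!"
    message = ("<p>Dear Sir or Madam,<br><br>Thank you for mentioning my brand in your article: "
               + url + ". I would appreciate it if you could include a link to my website at "
               "https://www.example.com.<br><br>Thanks in advance!</p>")
    msg.attach(MIMEText(message, 'html'))
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.starttls()
    server.login(SENDER, PASSWORD)
    server.sendmail(msg['From'], msg['To'], msg.as_string())
    server.quit()

# Send one email per address found during the scraping step
for url, emails, contact_pages in link_alert:
    for email_address in emails:
        send_thank_you(email_address, url)
In practice you may also want to rate-limit this loop and keep a record of addresses you have already contacted, so repeat mentions don’t trigger duplicate emails.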