Monitor Your Competitors With AWS Lambda and Python
Be the first to know about price changes using web scraping with AWS Lambda and Python.
In this tutorial, we are going to see how to monitor a competitor's web page for changes using Python, AWS Lambda, and the Serverless Framework.
We're going to build a CRON job that scrapes the pricing table of ScrapingBee (my company's website) and checks whether the prices have changed.
It could be done for lots of other use cases, like receiving an alert each time a new job is posted on a job board, or an apartment on a rental website, for example.
Serverless refers to the execution of code inside ephemeral containers (Function as a Service, or FaaS). Some cloud providers also call these "cloud functions."
Generally, you can trigger the function's execution with different mechanisms such as:
- An HTTP call to a REST API
- A job in a message queue
- A log (in CloudWatch, for example)
- An IoT event
Cloud functions can be a really good fit for different use cases, like when you don't care about latency/cold start for your CRON jobs, or when you need to "glue" different services together with some API calls.
Our example is a perfect use case: we're going to scrape the pricing table of a competitor to get an alert in case it changes. Web scraping is I/O bound: most of the time is spent waiting for an HTTP response from the server. You don't need a high-end CPU for this, nor a lot of RAM.
Prerequisites
In order to scaffold and deploy our project to AWS Lambda, we will use the Serverless Framework. It's an amazing project that makes building and configuring your cloud functions really easy with a simple configuration file. It handles many different clouds (AWS, Google Cloud, Azure...) and different languages.
In order to install the CLI, you will need Node.js on your system and an AWS account. You can follow the instructions here.
Creating the Project
Now that you've installed the Serverless CLI, we can create a new Python project for AWS with:

```shell
serverless create --template aws-python3 --name cron-scraping --path cron-scraping
```
In order to scrape ScrapingBee's pricing table, we will use the Requests and BeautifulSoup packages:

```shell
pip install requests
pip install beautifulsoup4
pip freeze > requirements.txt
```
Without Serverless, this can be painful because you need to package your dependencies into a zip file yourself and upload everything to AWS. With Serverless, you can use a plugin that reads your requirements.txt file directly and handles the dependencies for you.
In order to do so, initialize an npm package and install the plugin:

```shell
npm init
npm install --save serverless-python-requirements
```

Accept all the defaults during `npm init`, and then add this to your serverless.yml:
```yaml
# serverless.yml
plugins:
  - serverless-python-requirements

custom:
  pythonRequirements:
    dockerizePip: non-linux
```
You can also follow this guide if you want to know more about this.
Web Scraping
We are going to scrape the different prices on the pricing table that you can see below. If a price we extract is not $9, $29, or $99, we will send an alert to a Slack channel (or an email).
We are going to use the Requests package to get the HTML code, and BeautifulSoup to parse it and select the different prices in the table.
In our case, we can select the prices with this CSS selector:

```css
.price.color-1 span.a
```
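To see how this selector behaves, here is a minimal sketch run against a hand-written HTML snippet (the markup below is illustrative only, not ScrapingBee's actual page structure):

```python
from bs4 import BeautifulSoup

# Illustrative markup only; the real page may differ.
html = """
<div class="price color-1"><span class="a">$9</span></div>
<div class="price color-1"><span class="a">$29</span></div>
<div class="price color-1"><span class="a">$99</span></div>
"""

soup = BeautifulSoup(html, "html.parser")
# Select every <span class="a"> inside an element with both "price" and "color-1" classes.
prices = [tag.text for tag in soup.select(".price.color-1 span.a")]
print(prices)  # ['$9', '$29', '$99']
```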
Now let's code! In serverless.yml, a handler is specified: the name of the Python function that will be executed. It takes two parameters. The first is event, generally a Python dictionary containing the data your function needs (in our case it will be empty, since we don't need any parameters). The second, context, is an object with properties about the execution of your Lambda function, such as the function name, version, and memory limit.
```python
import requests
from bs4 import BeautifulSoup

def hello(event, context):
    base_url = "https://www.scrapingbee.com"
    known_prices = ["$9", "$29", "$99"]
    status = "Nothing changed"

    r = requests.get(base_url)
    soup = BeautifulSoup(r.text, 'html.parser')
    prices = soup.select('.price.color-1 span.a')

    for price in prices[:3]:
        if price.text.strip() not in known_prices:
            status = f'Something changed: {price.text}'

    response = {
        "statusCode": 200,
        "body": status
    }

    return response
```
We then fetch ScrapingBee's home page, parse the HTML with BeautifulSoup, and select the prices with the appropriate CSS selector. If a price differs from the known prices ($9/$29/$99), we change the status.
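The comparison itself needs no network access, so it can be factored into a small pure function and exercised offline before deploying. A minimal sketch (the `check_prices` helper name is my own, not part of the original code):

```python
def check_prices(extracted, known_prices=("$9", "$29", "$99")):
    """Return a status string describing whether any extracted price is unknown."""
    status = "Nothing changed"
    for price in extracted[:3]:
        if price.strip() not in known_prices:
            status = f"Something changed: {price}"
    return status
```

The handler could then call `check_prices([p.text for p in prices])` instead of inlining the loop.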
Instead of only setting a status, we could send a Slack notification to a channel. It's really easy: you just have to create a Slack app to get a webhook URL, as explained here.
And then, with Requests:
```python
payload = {"text": "A price was updated on ScrapingBee's pricing table"}
slack_request = requests.post(
    WEBHOOK_URL, json=payload, headers={"Content-Type": "application/json"}
)
```
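One way to wire this into the handler is to read the webhook URL from an environment variable and post only when something actually changed. A sketch, assuming a `SLACK_WEBHOOK_URL` environment key and a `notify_if_changed` helper name that are my own choices:

```python
import os

import requests

def notify_if_changed(status):
    """Post to Slack only when the scraper found a change; return the payload sent, or None."""
    if status == "Nothing changed":
        return None
    payload = {"text": f"ScrapingBee pricing alert: {status}"}
    # SLACK_WEBHOOK_URL is an assumed environment variable name; skip the
    # HTTP call entirely when no webhook is configured.
    webhook_url = os.environ.get("SLACK_WEBHOOK_URL")
    if webhook_url:
        requests.post(webhook_url, json=payload,
                      headers={"Content-Type": "application/json"})
    return payload
```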
Deployment, Invocation, and CRON
Deploying your function to AWS is really easy with Serverless:

```shell
serverless deploy
```
In order to invoke your function:

```shell
serverless invoke -f hello --log
```
We don't want to do this manually, so we are going to add a few lines to our configuration file (serverless.yml) to invoke the function automatically once a day:
```yaml
functions:
  hello:
    handler: handler.hello
    events:
      - schedule: rate(1 day)
```
You can learn more about schedule expressions here.
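Besides `rate(...)`, the schedule event also accepts AWS cron expressions, which is useful if you want the check to run at a specific time rather than on a fixed interval. For example, to run every day at 9:00 UTC:

```yaml
functions:
  hello:
    handler: handler.hello
    events:
      # AWS cron format: cron(minutes hours day-of-month month day-of-week year)
      - schedule: cron(0 9 * * ? *)
```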
Going Further
This was a little introduction to the serverless framework and how easy it is to build and deploy simple scripts to AWS Lambda.
There are many more things to explore, such as integration with other AWS services, like triggering a function with an HTTP call through API Gateway.
Another interesting topic is AWS Lambda Layers, which were introduced recently. Layers allow you to handle dependencies (including binaries) in your Lambda execution environment.