How I built my ideal daily Python newsletter with AWS and Python

I'm a heavy Python user. I've used it to do my job nearly every day for the last decade, so it's valuable for me to keep tabs on new developments in the Python ecosystem. Amazing programmers talk about new Python developments on Hacker News, Reddit, and Twitter all the time, so I created a daily newsletter that collects the top Python stories from all three platforms. You can see the finished product at compellingpython.com.

Secret weapon number one: AWS Lambda with Docker

Using Docker containers on Lambda has been a bit of a game-changer for me, mainly because it's so easy to add extra supporting files (Jinja HTML templates, packages, etc.) to the container without an elaborate custom build system.

Sending a daily email is a perfect application for AWS Lambda, because all it takes is a single invocation a day that runs for at most 15 minutes with 128 MB of RAM. That comes out to about 6 cents per month (yes, you read that correctly, 6 cents per month), which is essentially free.
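The 6-cent figure checks out. Here's the arithmetic as a small Python sketch, assuming the x86 on-demand list price of $0.0000166667 per GB-second and a 30-day month, and ignoring per-request charges and the Lambda free tier:

```python
# Back-of-the-envelope Lambda compute cost for one max-length run per day.
# Assumptions: x86 on-demand price of $0.0000166667 per GB-second and a 30-day month;
# per-request charges and the Lambda free tier are ignored.
PRICE_PER_GB_SECOND = 0.0000166667


def monthly_lambda_cost(memory_mb: int, seconds_per_run: float, runs_per_month: int) -> float:
    """Return the monthly compute cost in US dollars."""
    memory_gb = memory_mb / 1024  # Lambda bills memory in GB: 128 MB = 0.125 GB
    gb_seconds = memory_gb * seconds_per_run * runs_per_month
    return gb_seconds * PRICE_PER_GB_SECOND


cost = monthly_lambda_cost(memory_mb=128, seconds_per_run=15 * 60, runs_per_month=30)
print(f"${cost:.3f} per month")  # 3,375 GB-seconds, roughly $0.056
```

Real runs are probably shorter than the 15-minute timeout, so this is an upper bound. At 3,375 GB-seconds a month it also sits well inside the Lambda free tier of 400,000 GB-seconds.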

With Docker Lambdas, all I have to do is write a Dockerfile that looks like this:

FROM public.ecr.aws/lambda/python:3.9

COPY requirements.txt .
RUN pip3 install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"

COPY hn.py reddit.py twitter.py article_processor.py app_handler.py ${LAMBDA_TASK_ROOT}/
COPY templates/ ${LAMBDA_TASK_ROOT}/templates/

CMD [ "app_handler.handler" ]

This installs the Python packages listed in requirements.txt and copies in my source files: the modules that read data from HN, Reddit, and Twitter, the article processor, the main Lambda file (app_handler.py), and the Jinja templates. Finally, the CMD line tells the Lambda runtime where the entry point is: a function called handler in app_handler.py.
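CMD [ "app_handler.handler" ] means the Lambda runtime imports app_handler.py and calls its handler function with the invocation event and a context object. The post doesn't show the real handler, so this is a hypothetical skeleton of that entry point; the comment inside stands in for the actual pipeline.

```python
# app_handler.py: hypothetical skeleton; the post doesn't show the real module.
from typing import Any


def handler(event: dict, context: Any) -> dict:
    """Entry point named by the Dockerfile's CMD ("app_handler.handler").

    Lambda calls this with the trigger's event payload and a context object.
    """
    # A scheduled EventBridge event carries "source": "aws.events";
    # a manual test invocation may send an empty dict instead.
    trigger = event.get("source", "manual")

    # Placeholder for the real work: fetch HN/Reddit/Twitter posts, summarize
    # the articles, render the Jinja2 template, and send the email via SES.

    return {"status": "ok", "trigger": trigger}
```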

After I build the Docker image and push it to ECR (the ECR console lists the exact push commands for your repository), it's time to deploy the image to Lambda with AWS CDK.
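For reference, those push commands follow a standard pattern. This sketch only assembles them as strings and doesn't run anything. The account ID, region, and tag in the usage are hypothetical; the repo name matches the one referenced in the CDK code below, and the tag you push is the one you'd later export as CDK_DOCKER_TAG.

```python
def ecr_push_commands(account_id: str, region: str, repo: str, tag: str) -> list[str]:
    """Return shell commands that log in to ECR, then build, tag, and push an image."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    image_uri = f"{registry}/{repo}:{tag}"
    return [
        # Authenticate Docker against the private registry (the token lasts 12 hours)
        f"aws ecr get-login-password --region {region} "
        f"| docker login --username AWS --password-stdin {registry}",
        # Build locally, give the image its full ECR name, then upload it
        f"docker build -t {repo}:{tag} .",
        f"docker tag {repo}:{tag} {image_uri}",
        f"docker push {image_uri}",
    ]


# Hypothetical account ID, region, and tag
for cmd in ecr_push_commands("123456789012", "us-east-1", "python_newsletter", "v1"):
    print(cmd)
```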

Secret weapon number two: AWS CDK

Infrastructure as Code: so valuable, yet so tedious. In the past I would always groan inwardly when it was time to write Terraform or CloudFormation to template out my AWS resources, but in 2019 Amazon released CDK. CDK supports Python, so I can now write all my IaC in Python, which is a joy! Python CDK code synthesizes into CloudFormation templates (under the hood, the Python library calls into CDK's TypeScript core through jsii, which runs it on Node.js; more on that later), so all the state information lives in AWS and I can use drift detection on my stacks. I know this is a relatively new tool, but I'm surprised it's not used more widely.

So the first step in building the newsletter is deploying the Docker Lambda I discussed above using CDK. This Lambda will obtain data from Hacker News, Reddit, and Twitter, filter it, and then send me a nicely formatted email.

To provision the Lambda along with an IAM role, this is all the CDK code I need:

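import os

from aws_cdk import Duration
from aws_cdk import aws_ecr as ecr
from aws_cdk import aws_iam as iam
from aws_cdk import aws_lambda as lambda_

# CDK v2 imports. The code below runs inside a Stack subclass's __init__, so `self` is the stack.

# 1. IAM role the Lambda assumes, built from broad AWS-managed policies
lambda_role = iam.Role(self, id="python-newsletter-lambda",
    role_name="PythonNewsletterRole",
    assumed_by=iam.ServicePrincipal("lambda.amazonaws.com"),
    managed_policies=[
        iam.ManagedPolicy.from_aws_managed_policy_name("service-role/AWSLambdaVPCAccessExecutionRole"),
        iam.ManagedPolicy.from_aws_managed_policy_name("service-role/AWSLambdaBasicExecutionRole"),
        iam.ManagedPolicy.from_aws_managed_policy_name("AmazonS3FullAccess"),
        iam.ManagedPolicy.from_aws_managed_policy_name("AmazonSESFullAccess"),
    ],
)

# 2. Reference to the existing ECR repo the image was pushed to
repo = ecr.Repository.from_repository_name(self, "NewsletterRepo", "python_newsletter")

# 3. Docker-image Lambda pinned to the tag in the CDK_DOCKER_TAG environment variable
newsletter_lambda = lambda_.DockerImageFunction(self,
    "PythonNewsletterLambda",
    code=lambda_.DockerImageCode.from_ecr(
        repository=repo,
        tag=os.environ["CDK_DOCKER_TAG"],
    ),
    role=lambda_role,
    timeout=Duration.minutes(15),
)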

There are three pieces here:

  1. Generating the IAM role. I'm not being super locked-down in this example, since I'm giving the Lambda full S3 access and full SES (Simple Email Service, more on that in the next section) access. You can generate custom locked-down policies here instead of using the AWS-managed policies.
  2. Creating a reference to an ECR repo. ECR is Amazon's Docker image registry service, and this is the repo I pushed my Docker image to in the first section of this article. If you're new to CDK, note that when you reference an existing resource, you should use that resource's from_* lookup methods (here, ecr.Repository.from_repository_name) rather than defining a new one; the CDK ECR documentation lists the available options.
  3. Deploying the Docker Lambda. All I had to do was specify which image tag in the ECR repo to deploy (via the CDK_DOCKER_TAG environment variable), pass in the role I created, and set the timeout to 15 minutes, the maximum Lambda allows.
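As for locking the role down, here's a sketch of what a tighter policy might look like as a plain IAM policy document. It assumes the function only needs to read and write objects in a single bucket (the post doesn't say what S3 is used for) and send mail from one verified SES identity; the bucket name, account ID, and identity are hypothetical. In CDK, each statement would become an iam.PolicyStatement(actions=[...], resources=[...]) attached with lambda_role.add_to_policy(...), and you'd keep AWSLambdaBasicExecutionRole so the function can still write its logs.

```python
import json

# Hypothetical names: substitute your own bucket and verified SES identity.
BUCKET = "my-newsletter-bucket"
SES_IDENTITY_ARN = "arn:aws:ses:us-east-1:123456789012:identity/example.com"


def newsletter_policy(bucket: str, ses_identity_arn: str) -> dict:
    """Build an IAM policy document scoped to one bucket and one SES identity."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level access to a single bucket instead of AmazonS3FullAccess
                "Sid": "ReadWriteNewsletterBucket",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
            {
                # Sending only, from one identity, instead of AmazonSESFullAccess
                "Sid": "SendFromVerifiedIdentity",
                "Effect": "Allow",
                "Action": ["ses:SendEmail", "ses:SendRawEmail"],
                "Resource": [ses_identity_arn],
            },
        ],
    }


print(json.dumps(newsletter_policy(BUCKET, SES_IDENTITY_ARN), indent=2))
```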

That's it! Now when I run cdk deploy, my Lambda is deployed with all the proper configuration, and if I ever need to change anything, I just rerun cdk deploy. The final step is to schedule the Lambda to run daily with a cron-style rule in EventBridge.
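EventBridge cron expressions differ from classic crontab: they have six fields and are evaluated in UTC. This small helper (a hypothetical name, not part of any AWS library) builds a once-a-day expression, and the trailing comment sketches how the same schedule might be wired to the Lambda in the CDK stack, assuming a 13:00 UTC send time.

```python
def eventbridge_daily_cron(hour_utc: int, minute: int = 0) -> str:
    """Return an EventBridge cron expression that fires once a day at hour_utc:minute UTC.

    EventBridge cron has six fields (minutes, hours, day-of-month, month,
    day-of-week, year), and one of day-of-month / day-of-week must be "?".
    """
    if not (0 <= hour_utc <= 23 and 0 <= minute <= 59):
        raise ValueError("hour_utc must be 0-23 and minute must be 0-59")
    return f"cron({minute} {hour_utc} * * ? *)"


print(eventbridge_daily_cron(13))  # cron(0 13 * * ? *)

# In the CDK stack, a roughly equivalent rule would be
# (with aws_cdk.aws_events as events, aws_cdk.aws_events_targets as targets):
#
#   events.Rule(self, "DailyNewsletterSchedule",
#       schedule=events.Schedule.cron(minute="0", hour="13"),
#       targets=[targets.LambdaFunction(newsletter_lambda)],
#   )
```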

Secret weapon number three: Python and AWS SES

OK, I acknowledge that Python is about the least secret weapon there is. But now that I've described the infrastructure, you might be wondering what's inside this magical little email-sending Lambda function. I'll leave the tedious API calls to get the data from HN/Reddit/Twitter as an exercise for the reader, but I do want to talk briefly about the design of the code package – particularly how the emails are sent.

Here's the basic process:

  • First, I grab all the data from the community APIs for the previous day and rank it by upvotes.
  • Next, I take the top three Python posts from each community and fetch the full HTML of each linked page.
  • After parsing out the main content of the page (doesn't always work), I send the main body text to my summarizer algorithm (post about that later!) and get a summary of the article.
  • I take all the metadata about the top three articles for each community and use it to populate a Jinja2 HTML template.
  • Finally, I send out the HTML emails to myself (and my discerning friends and colleagues who have signed up at compellingpython.com) using AWS SES!
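The ranking and selection steps above boil down to a filter, a group-by, and a sort. Here's a minimal sketch, assuming each platform's results have already been normalized into a common record. The Post class and the keyword filter are hypothetical: a simple title match on "python" stands in for whatever topic filter the real code uses.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical normalized record for a post from any of the three platforms."""
    community: str  # e.g. "hn", "reddit", "twitter"
    title: str
    url: str
    upvotes: int    # HN points, Reddit upvotes, or Twitter likes


def top_posts(posts: list[Post], per_community: int = 3,
              keyword: str = "python") -> dict[str, list[Post]]:
    """Keep posts whose title mentions `keyword` (lowercase), then the top-voted few per community."""
    by_community: dict[str, list[Post]] = {}
    for post in posts:
        if keyword in post.title.lower():  # naive stand-in topic filter
            by_community.setdefault(post.community, []).append(post)
    return {
        community: sorted(group, key=lambda p: p.upvotes, reverse=True)[:per_community]
        for community, group in by_community.items()
    }
```

Because sorted is stable, posts with equal votes keep the order the API returned them in.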

SES (Amazon's Simple Email Service) is fantastic for an application like this, because it's about the cheapest way there is to send bulk email programmatically. At the time of writing, you can send 62,000 emails a month for free, and after that it's 10 cents per thousand. Ten dollars to send a hundred thousand emails without maintaining your own SMTP infrastructure is kind of a ridiculous deal.
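The ten-dollar figure is the per-thousand price alone; with the free allotment applied, the bill is lower still. This tiny calculator makes the arithmetic explicit. It uses the numbers quoted above as defaults, which may not match current SES pricing, and it ignores extras such as attachment charges.

```python
def ses_monthly_cost(emails: int, free_emails: int = 62_000,
                     price_per_thousand: float = 0.10) -> float:
    """Monthly SES sending cost in US dollars, using the prices quoted above as defaults."""
    billable = max(0, emails - free_emails)  # only emails beyond the free allotment are charged
    return billable / 1000 * price_per_thousand


print(round(ses_monthly_cost(100_000, free_emails=0), 2))  # 10.0: per-thousand price only
print(round(ses_monthly_cost(100_000), 2))                 # 3.8: with the free allotment
```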

There are some drawbacks, like having to carefully manage your own deliverability metrics and email lists (most other hosted email services handle these for you), but as a simple email-sending utility, I have no complaints. If you want to pay more, you can get dedicated IPs, but for fun projects like this one I just use the shared IPs that SES provides for free.
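For the sending step itself, here's a minimal sketch built on boto3's SES send_email call. The client is passed in (normally boto3.client("ses")) so the function can be exercised with a stand-in; the function name and addresses are hypothetical, and this isn't necessarily how the real newsletter code is structured. Sending one message per recipient keeps subscribers' addresses hidden from each other.

```python
from typing import Any, Iterable


def send_newsletter(ses_client: Any, sender: str, recipients: Iterable[str],
                    subject: str, html_body: str) -> list[str]:
    """Send one SES email per recipient and return the SES message IDs."""
    message_ids = []
    for address in recipients:
        # One message per recipient so nobody sees the rest of the list
        response = ses_client.send_email(
            Source=sender,
            Destination={"ToAddresses": [address]},
            Message={
                "Subject": {"Data": subject, "Charset": "UTF-8"},
                "Body": {"Html": {"Data": html_body, "Charset": "UTF-8"}},
            },
        )
        message_ids.append(response["MessageId"])
    return message_ids


# Usage (requires AWS credentials and a verified sender identity):
#   import boto3
#   send_newsletter(boto3.client("ses"), "newsletter@example.com",
#                   ["me@example.com"], "Today's top Python stories", html)
```

Keep in mind that SES only sends from verified identities, and while your account is in the SES sandbox, recipients must be verified too.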

That's it! That's the service. There's one other aspect of this newsletter that I haven't discussed: collecting and managing the email addresses themselves. I have a separate Lambda function for this that I might write up at a later date. For now, take care and have a great week.

Joe

Subscribe to The Cloud Consultant
