Building a Serverless Email Document Extraction Solution with AWS Textract: Part 3 - Routing Objects for Downstream Processing

In the previous post of this series, we tackled how to land inbound emails routed to an entire domain using SES, a Lambda function, and an S3 bucket. Since the whole point of these posts is parsing image-based invoice documents with AWS Textract, you’re probably wondering how we get from files in S3 to magical, OCR-extracted text. This post gets us most of the way there and addresses some points of improvement in our original solution. We also add the function to our serverless application that actually retrieves our extracted text from Textract. In a future post, we’ll stand up a DynamoDB table for storing our outputs and then look at ways to interact with the data we’ve stored there.

AWS SSO: The Good, the Bad, and the Ugly

I recently set up AWS SSO on an engagement. In this instance, I was setting it up using AWS Managed Active Directory as the identity provider. My first impression was how incredibly easy it was to integrate SSO with Managed AD. The SSO console basically prompts you to select your Managed AD instance from a dropdown list. You connect it, and that’s it. AWS Managed AD comes with an OU already set up for you that SSO federates with.

Why You Won't Be Running c7n-org in an AWS Lambda Function

I recently had the good fortune to take on a really fun project at work. First off, the client was incredibly easy to work with, which makes any project (even something I might consider tedious and boring, like migration work) a win in my book. In any case, this wasn’t a boring project: the client asked us to roll out Cloud Custodian across their entire AWS footprint, which at this point consists of an AWS Organization with a decent number of accounts (and more to follow).

Building a Serverless Email Document Extraction Solution with AWS Textract: Part 2 - Landing Inbound Emails

In the first post of this series, we outlined a serverless, email-based workflow for extracting relevant information from auto maintenance invoices. Even in this age of accelerated digital transformation, there are still many scenarios in business and life where we receive data that is not in a machine-friendly format; we are building this solution to address these kinds of situations. We use the Serverless Framework to build the core of this solution. We create a few resources and configuration items manually in this post, but you are certainly free to manage these elements with something like CloudFormation or Terraform if desired. This post focuses on the resources highlighted in the figure below, where we design a solution to land incoming emails with S3 and SES and sort them with a Lambda function:

Solution Focus

Building a Serverless Email Document Extraction Solution with AWS Textract: Part 1 - Overview

Earlier this year, I tried to consolidate all of my automotive maintenance histories into a database-backed system that offered the lowest-friction way possible to keep up with my records. At the time, I settled on building out a solution using Airtable. I was able to set up a solution very quickly. I am honestly quite happy with the outcome, except that I am still manually keying records into either the Airtable app or the Airtable website from the paper records that my auto shop gives me on every visit. Ideally, I would like a solution that handles the data extraction from the paper records I get from my shop and stores it in a structured format that I can easily consume.

AWS Certification Journey: 11 Certifications

11 AWS certifications. What does it mean to have 11 AWS certifications? In the most literal interpretation, it means that I have passed (at least) 11 certification exams. Beyond that, it means a lot of things, many of which I would never have expected. It means that many people on LinkedIn now believe that I am their personal learning coach, there to provide them with personalized learning plans and advice. In a similar vein, I have also received quite a few questions and requests from people who now believe me to be their personal AWS Q&A service.

Thoughts on the AWS ML (Beta) Exam

I was able to squeeze in the beta AWS ML exam the week before Christmas. Given that it was several weeks ago, some of the other resources on Medium may be more informative, but I’ll throw my two cents out here for anyone who may be interested. Generally speaking, know the different types of machine learning models (particularly those supported by SageMaker) and the situations in which each is applicable. These include:

Working With Azure AD in AWS and Moving from Azure SQL to RDS SQL Server

For the last week, I worked on a migration of a fairly sizable Azure SQL instance into AWS’ RDS service (running SQL Server). It was an intense project due to the timeline and the size of the database. Of course, this required access to both services through their web consoles, their CLIs, and native database interfaces. The client was controlling access to AWS and Azure using Azure AD, so I had to figure out federated access to the AWS API/CLI (since we built out their new environment using Terraform).

One-Liners - Get AWS AZ Counts

OK, so not truly a one-liner, but a nice quick-n-dirty way to get a count of all active AZs for each region in your AWS account:

    # Print a bold header row aligned with the data columns
    printf '%s%-15s | %5s%s\n' "$(tput bold)" "Region" "# AZs" "$(tput sgr0)"
    # Loop over every region enabled for the account
    for region in $(aws ec2 describe-regions | jq -r '.Regions[].RegionName'); do
      # Count the AZs the CLI reports for this region
      num_azs=$(aws ec2 describe-availability-zones --region "${region}" | jq -r '.AvailabilityZones | length')
      printf '%-15s | %5s\n' "${region}" "${num_azs}"
    done

This requires jq and the AWS CLI to be installed.

Getting the AWS Big Data Certification

“Why am I so nervous? I haven’t felt like this since… I took the first one.” After five AWS exams (including the two Professional-level exams) and a handful of other certification exams, I didn’t really expect to be nervous taking this exam. Oddly enough, this is exactly how I felt going in, during the exam, and in that inexorable “moment” between clicking the Submit Exam button and actually seeing the result on the page.