Posts

Building a Serverless Email Document Extraction Solution with AWS Textract: Part 3 - Routing Objects for Downstream Processing

In the previous post of this series, we tackled how to land inbound emails routed to an entire domain using SES, a Lambda function, and an S3 bucket. As the whole point of these posts is parsing image-based invoice documents using AWS Textract, you’re probably wondering how we get from files in S3 to magic, OCR-extracted text. This post gets us most of the way there, addressing some points of improvement on our original solution. In addition, this post adds the function to our serverless application that actually gets our extracted text back from Textract. In a future post, we’ll stand up a DynamoDB table for storing our outputs and then look at ways to interact with the data we’ve stored there.

AWS SSO: The Good, the Bad, and the Ugly

I recently set up AWS SSO on an engagement. In this instance, I was setting it up using AWS Managed Active Directory as the identity provider. My first impression was how incredibly easy it was to set up the integration of SSO with the Managed AD. The SSO console basically prompts you for your Managed AD instance from a dropdown list. You connect it, and that’s it. AWS Managed AD comes with an OU already set up for you that SSO federates with.

Why You Won't Be Running c7n-org in an AWS Lambda Function

I recently had the good fortune to take on a really fun project at work. First off, the client was incredibly easy to work with, which makes any project (even something I might consider tedious and boring, like migration work) a win in my book. In any case, this wasn’t a boring project – the client asked us to roll out Cloud Custodian across their entire AWS footprint – which at this point consists of an AWS Organization with a decent number of accounts (and more to follow).

Building a Serverless Email Document Extraction Solution with AWS Textract: Part 2 - Landing Inbound Emails

In the first post of this series, we looked at a solution to allow us to define a serverless, email-based workflow to extract relevant information from auto maintenance invoices. Even in this age of accelerated digital transformation, there are still many scenarios in business and life where we receive data that is not in a machine-friendly format; we are building this solution to address these kinds of situations. We use the Serverless Framework to build the core of this solution. We create a few resources and configuration items in this post manually, but you are certainly free to manage these elements with something like CloudFormation or Terraform if desired. This post focuses on the resources highlighted in the figure below, where we design a solution to land incoming emails with S3 and SES, and sort them with a Lambda function:

[Figure: Solution Focus]

Building a Serverless Email Document Extraction Solution with AWS Textract: Part 1 - Overview

Earlier this year, I tried to consolidate all of my automotive maintenance histories into a database-backed system that was the lowest-friction means possible for me to keep up with my records. At the time, I settled on building out a solution using Airtable. I was able to set up a solution very quickly. I am honestly quite happy with the outcome, except that I am still manually keying records into either the Airtable app or their site based on the paper records that my auto shop gives me on every visit. Ideally, I would like a solution that handles the data extraction from the paper records I get from my shop and stores it in a structured format that I can easily consume.

Eight Month (Blogging) Hiatus: Writing My First Book

True story: my goal this year was to post here weekly or bi-weekly. I chronicled my intention to score the Cloud Native Computing Foundation’s Certified Kubernetes Administrator (CKA) certification in early January. I loaded that post with more intent to write about the journey than ever. I had a spreadsheet for tracking my efforts; I was going to post about my experiences frequently, the whole nine. Then this happened… As I’m sitting here writing this post, I can’t help but laugh as I look at the timestamps of these two events.

AWS Certification Journey: 11 Certifications

11 AWS certifications. What does it mean to have 11 AWS certifications? In the basest interpretation, it means that I have passed (at least) 11 certification exams. Other than that, it means a lot of things, many of which I would have never expected. It means that: many people on LinkedIn now believe that I am their personal learning coach, there to provide them with personalized learning plans and advice; in a similar vein, I have also received quite a few questions/requests from people who now believe me to be their personal AWS Q&A service.

Review of Bespoke Post: Why I Cannot Recommend Their Service

I signed up for Bespoke Post last October, receiving two orders and skipping one. The second box showed up in December: a decanter and two glasses (their Parlor box), except one of the glasses was shattered. “No big deal,” I reason to myself. This sort of thing probably happens every now and then, so I pull up Bespoke’s site and find their support email. I send an email explaining the situation and receive some great news in reply:

AWS Batch Cloud Custodian Docker Starter Pack

I recently implemented a series of AWS Batch jobs for a client. While most of these were for implementing, well, batch jobs, in the form of reporting functions, I decided to try deploying Cloud Custodian using the same framework, as it basically involved creating an additional CloudFormation nested stack, building Custodian policies, and baking them into a container to deploy through Batch. Getting everything up and going was a fair bit of work, so I wanted to encapsulate my learnings into something others could use.

Passing the Google Cloud Professional Data Engineer Exam

This one’s short, sweet, and to the point. Here are my GCP Professional Data Engineer exam notes. Good luck!