Custom App Metrics with Datadog [Part 2]

In my last post, we discovered how to customize some of Datadog’s pre-packaged integrations to quickly build actionable insights for SQL-backed applications. In this post, we are going to dive a bit deeper and look at how we might integrate Datadog with pre-built or custom metrics tooling, such as a shell script.

As I wrapped up the last post, I alluded to a metric “gathered via a different route”; this metric keeps a count of the number of individual client processes running on the host. It is generated using a lightweight shell script called from a cron job; for a general point of reference, the shell script looks like this:

#!/usr/bin/env bash

num_procs=$(ps awx | grep textract.py | grep -v grep | wc -l)
echo $num_procs

I called this script textract-count and placed it in my PATH so that I can call it without path prefixing.
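As an aside, the same count can be produced with a single grep. The following is only a sketch of that variation, not the script I actually run; the function name count_matching is made up for illustration, and it assumes the process listing arrives on stdin so the counting logic can be tried without any real processes running.

```shell
# count_matching PATTERN (hypothetical helper)
# Reads a process listing on stdin and prints how many lines match
# PATTERN, which is treated as a grep basic regular expression.
count_matching() {
    # grep -c prints the count itself. It exits with status 1 when the
    # count is 0, so the status is masked to keep "zero processes" from
    # looking like a failure to the caller.
    grep -c -- "$1" || true
}

# Usage: writing the pattern as [t]extract keeps grep from matching its
# own command line, which replaces the "grep -v grep" step.
#   ps awx | count_matching '[t]extract\.py'
```

Because grep -c emits a bare number, there is no need for the extra wc -l stage.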

Before discovering how easy it is to include and alert on these types of metrics in Datadog, I had my cron job push a quick message to an SNS topic which pings my mobile should the number fall below a tolerable threshold. In the course of building out the dashboard from the previous post, I discovered it is quite trivial to push the value to Datadog and wrap it in enough context to expose it for alerting and dashboarding.

Datadog has a blog post covering the mechanics of constructing the (text-based) message and how to send it to the DogStatsD daemon running locally (which is a prerequisite to using this method), so I am not going to rehash that content here. The only issue I encountered with their documentation is where the writer employs this syntax to echo out his string to localhost:8125:

$ echo -n "datadogstring" >/dev/udp/localhost/8125

/dev/tcp and /dev/udp are not real device files. Bash itself interprets these paths when they appear in a redirection, and it only does so if it was compiled with that feature enabled. The bash on my Ubuntu distro did not have it (nor did I care to recompile bash). I was able to use netcat to work around the issue like this:

$ echo -n "datadogstring" | nc -4u -w1 localhost 8125
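If you are not sure which camp your shell falls into, a quick probe can tell you. This is a minimal sketch under a few assumptions: the function names have_udp_redirect and send_datagram are invented for illustration, DogStatsD is assumed to be listening on 127.0.0.1:8125 as above, and the fallback assumes nc is installed.

```shell
# have_udp_redirect (hypothetical helper)
# Succeeds only if the running shell treats /dev/udp/HOST/PORT as a
# network redirection. Opening the path in a subshell sends no data;
# without the feature the open fails because no such file exists.
have_udp_redirect() {
    ( : > /dev/udp/127.0.0.1/8125 ) 2>/dev/null
}

# send_datagram MESSAGE (hypothetical helper)
# Uses the bash redirection when it is available and falls back to
# netcat otherwise.
send_datagram() {
    if have_udp_redirect; then
        printf '%s' "$1" > /dev/udp/127.0.0.1/8125
    else
        printf '%s' "$1" | nc -4u -w1 localhost 8125
    fi
}

# Usage:
#   send_datagram "datadogstring"
```

The probe runs in a subshell so that a failed redirection cannot abort the calling script.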

Netcat is available for most distros, is a quick yum install or apt-get install away, and is probably much easier than recompiling bash. With a working way to push a metric in hand, I swapped out my SNS-based cron task for one that simply runs an echo command like the above and looks like this:

#!/usr/bin/env bash
PATH=$PATH:/usr/local/bin
# Gauge datagram (<metric>:<value>|g|#<tag>) sent over UDP to the local DogStatsD
echo -n "extractions.processes.by_host.$HOSTNAME:$(textract-count)|g|#document_parser" | nc -4u -w1 localhost 8125
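One thing the one-liner does not guard against is textract-count printing nothing, in which case a datagram with an empty value would be sent. A small formatting function can refuse to build the string in that case. This is a sketch rather than what runs in my cron job; format_gauge is a hypothetical name, and it assumes the value is always a whole number, which holds for a process count.

```shell
# format_gauge NAME VALUE [TAG] (hypothetical helper)
# Prints a DogStatsD gauge datagram, NAME:VALUE|g or NAME:VALUE|g|#TAG.
# Prints nothing and returns 1 if VALUE is empty or not a whole number.
format_gauge() {
    local name="$1" value="$2" tag="${3:-}"
    case "$value" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: refuse to emit
    esac
    if [ -n "$tag" ]; then
        printf '%s:%s|g|#%s' "$name" "$value" "$tag"
    else
        printf '%s:%s|g' "$name" "$value"
    fi
}

# Usage: only send when a valid datagram was built.
#   msg=$(format_gauge "extractions.processes.by_host.$HOSTNAME" \
#         "$(textract-count)" document_parser) &&
#       printf '%s' "$msg" | nc -4u -w1 localhost 8125
```

Keeping the formatting separate from the sending also makes it easy to check the string by eye before pointing it at DogStatsD.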

Now that the metrics are available in Datadog, they are part of a wider ecosystem of available metrics, such as EC2 and host-level metrics, making it much easier to analyse issues when they arise and perform RCAs when things go awry. Also, now I can add it to my dashboard!

Datadog Dashboard
