Some notes as I am working on setting up monitoring of some servers.

Pingdom - External active monitoring of availability of various services (ping, http, etc.) including transactions.

Pager Duty

New Relic

Terraform

The world of StatsD

StatsD is a combination of a number of services and protocols.

  • StatsD client (talks to the daemon over UDP)
  • StatsD daemon (forwards the data in batches to the data aggregator)
  • Data aggregator (some kind of database)
  • Graphing service (uses the data in the aggregator)
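
To make the daemon's role a bit more concrete, here is a minimal sketch of the "collect, then forward in batches" idea. It is not how any particular StatsD daemon is implemented. The class name DaemonBuffer and its methods are made up for illustration, it only understands plain counter lines of the form name:value|c, and it ignores sample rates and every other metric type.

```python
from collections import defaultdict


class DaemonBuffer:
    """Hypothetical stand-in for the buffering a StatsD daemon does."""

    def __init__(self):
        # Running totals collected since the last flush.
        self.counts = defaultdict(int)

    def receive(self, line: str) -> None:
        """Handle one line as a client would send it, e.g. 'requests:1|c'."""
        name, rest = line.split(":", 1)
        value, kind = rest.split("|", 1)
        if kind == "c":  # only plain counters are handled in this sketch
            self.counts[name] += int(value)

    def flush(self) -> dict:
        """Return everything collected so far as one batch and start over."""
        batch = dict(self.counts)
        self.counts.clear()
        return batch


buffer = DaemonBuffer()
buffer.receive("requests:1|c")
buffer.receive("requests:2|c")
batch = buffer.flush()  # one batch that would be handed to the data aggregator
```

A real daemon would call something like flush() on a timer and send the batch to the backend, so the backend sees one write per interval instead of one per data point.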

Data points can be supplied by our application, using one of the StatsD client libraries available for most programming languages. They can also be supplied by tools related to our application (e.g. the tools used to deploy a new version). There can also be a generic data collector that gathers server-level data (e.g. CPU load, memory usage, disk usage).

The StatsD clients send the data points via UDP to the StatsD daemon, which usually runs on the same machine. UDP is fire-and-forget, so the data collection has minimal impact on our service, and even if the daemon is down our service is not affected. Only the metric data is lost.
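
The fire-and-forget behaviour can be sketched with nothing but the standard library. This is only an illustration, not a replacement for a real client library. The function name send_metric is made up, and the host and port are assumptions (8125 is the usual StatsD default, but check the daemon's configuration).

```python
import socket


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> bool:
    """Send one StatsD line over UDP without ever raising into the caller.

    Returns True if the datagram was handed to the OS, False on a local
    socket error. UDP gives no confirmation that a daemon received it.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(line.encode("ascii"), (host, port))
        return True
    except OSError:
        # Monitoring must not break the application, so errors are swallowed.
        return False


# Count one request. If no daemon is listening, the datagram is simply dropped.
send_metric("requests:1|c")
```

There is no connection setup and no waiting for a reply, which is why a stopped daemon costs us the data point but not the request.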

There are several StatsD daemon implementations. Apparently the one by Etsy is the most popular.

There are several data aggregators and graphing tools (StatsD backends):

Graphite is an Open Source backend. Further reading:

  • Build your own monitoring dashboard with Graphite, Statsd, & Grafana
  • How To Install and Use Graphite on an Ubuntu 14.04 Server

Ganglia

DataDog is an integrated service providing data aggregation, graphs, and integration with lots of other tools and services.

Plain database backends (MySQL, InfluxDB, MongoDB)?

Metric types used in StatsD

  • count (incr, decr) (e.g. every request)
  • amount (e.g. elapsed time)
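
On the wire, each data point is a short text line. As far as I can tell, the "count" above corresponds to the StatsD counter type (suffix c) and an "amount" such as elapsed time to the timer type (suffix ms, value in milliseconds). The sketch below only builds those lines. The helper names counter and timer and the metric names are made up for illustration.

```python
import time


def counter(name: str, delta: int = 1) -> str:
    """Build a counter line; a negative delta acts as a decrement."""
    return f"{name}:{delta}|c"


def timer(name: str, millis: int) -> str:
    """Build a timer line; the value is an elapsed time in milliseconds."""
    return f"{name}:{millis}|ms"


start = time.perf_counter()
# ... the work being measured would go here ...
elapsed_ms = int((time.perf_counter() - start) * 1000)

print(counter("requests"))                # requests:1|c
print(counter("open_connections", -1))    # open_connections:-1|c
print(timer("request_time", elapsed_ms))  # value depends on the measured time
```

Real client libraries usually wrap this in calls such as incr, decr and a timing helper, so the application code never has to build the lines by hand.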

Other tools