Some notes as I am working on setting up monitoring of some servers.
Pingdom - External, active monitoring of the availability of various services (ping, HTTP, etc.), including transactions.
The world of StatsD
StatsD is a combination of a number of services and protocols.
- StatsD client (talking to the daemon over UDP)
- StatsD daemon (talking in batches with the data aggregator)
- Data aggregator (some kind of a database)
- Graphing service (uses the data in the aggregator)
Data points can be supplied by our application, using a StatsD client library; these are available for most programming languages. They can also be supplied by tools related to our application (e.g. the tools used to deploy a new version). There can also be a generic data collector that gathers server-related data (e.g. CPU load, memory usage, disk usage).
The StatsD clients send the data points via UDP to the StatsD daemon, which usually runs on the same machine. Because UDP is fire-and-forget, data collection has minimal impact on our service, and even if the daemon is down, our service is not affected. Only the data points sent in the meantime are lost.
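To make the fire-and-forget idea concrete, here is a minimal sketch of what a client does under the hood. It assumes the daemon listens on 127.0.0.1 port 8125 (the usual StatsD default) and uses the plain-text `name:value|type` line format. The metric names and the helper functions `format_metric` and `send_metric` are made up for illustration; in a real application I would use an existing client library rather than this.

```python
import socket


def format_metric(name: str, value: int, metric_type: str) -> bytes:
    """Build one StatsD datagram in the form <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> bool:
    """Send one datagram and never raise; a failed send is simply dropped."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            # UDP has no handshake, so this does not wait for the daemon.
            sock.sendto(payload, (host, port))
        return True
    except OSError:
        # Monitoring must not break the service itself.
        return False


# Hypothetical metric names: a counter ("c") and a timing in milliseconds ("ms").
send_metric(format_metric("myapp.requests", 1, "c"))
send_metric(format_metric("myapp.response_time", 320, "ms"))
```

Nothing here waits for a reply, so the calls return right away whether or not a daemon is listening.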
There are several StatsD daemon implementations. Apparently the one by Etsy is the most popular.
There are several data aggregators and graphing tools (StatsD backends):
DataDog is an integrated service providing data aggregation, graphs, and integration with lots of other tools and services.
Plain database backends (MySQL, InfluxDB, MongoDB) ?
Metric types used in StatsD
- count (incr, decr) (e.g. every request)
- amount (e.g. elapsed time)
- collectd to collect statistics and save them in RRDtool or send them to a StatsD data collector (without the need for a StatsD daemon).
- Grafana to display graphs
- Munin - Practical resource monitoring with Munin
- Collecting data with the Carbon daemon
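Coming back to the two StatsD metric types listed above (counts and amounts), here is a rough sketch of what a daemon might do with them before passing a batch on to the aggregator. It assumes the plain-text `name:value|c` (count) and `name:value|ms` (timing) line format. The function `aggregate`, the `.mean` and `.max` suffixes and the metric names are my own illustration, not the output format of any particular daemon, and sample rates are ignored to keep it short.

```python
from collections import defaultdict


def aggregate(lines):
    """Fold raw StatsD lines into one summary per metric for a single flush."""
    counters = defaultdict(int)
    timers = defaultdict(list)
    for line in lines:
        name, rest = line.split(":", 1)
        value, metric_type = rest.split("|", 1)
        if metric_type == "c":
            # Counts are summed; a decrement arrives as a negative value.
            counters[name] += int(value)
        elif metric_type == "ms":
            # Amounts are kept as samples and summarised at flush time.
            timers[name].append(float(value))
    summary = dict(counters)
    for name, samples in timers.items():
        summary[name + ".mean"] = sum(samples) / len(samples)
        summary[name + ".max"] = max(samples)
    return summary


batch = [
    "myapp.requests:1|c",
    "myapp.requests:1|c",
    "myapp.requests:-1|c",
    "myapp.response_time:320|ms",
    "myapp.response_time:280|ms",
]
print(aggregate(batch))
```

The point is that a count collapses into a single total per batch, while an amount such as elapsed time has to be summarised (mean, max, etc.) because a plain sum of timings is not very useful.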