Replacement Server Monitoring – Part 2: Building the replacement

27 Mar 2018 / 3 Comments / in Company / by Gary Rixon

This is part two of a three-part series of blog posts about picking a replacement monitoring solution, getting it running and ready, and finally moving our customers over to it.

In our last post we discussed our need for a replacement monitoring system and our pick for the software stack we were going to build it on. If you haven’t already, you should go and read that before continuing with this blog post.

This post details the setup and configuration of the different components so that they work together, along with some additional customisations we made to get the functionality we wanted.

Component Installation

As mentioned in the previous entry in this series, InfluxData, the creators of the TICK stack, provide package repositories with pre-built, ready-to-use packages. This means we don’t have to configure and compile source code before using it: a few commands install and run the software with very predictable results, whereas compiling often takes many commands and sometimes gives wildly varying results. Great stuff.

All components are available from the same repository. Here’s how you install them (the example shown is for an Ubuntu 16.04 “Xenial” system):

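# Import the InfluxData package signing key
curl -sL https://repos.influxdata.com/influxdb.key | sudo apt-key add -
# Load DISTRIB_ID and DISTRIB_CODENAME for this release
source /etc/lsb-release
# Add the repository for this distribution (the ,, lowercases the ID and needs bash 4+)
echo "deb https://repos.influxdata.com/${DISTRIB_ID,,} ${DISTRIB_CODENAME} stable" | sudo tee /etc/apt/sources.list.d/influxdb.list
# Install and start InfluxDB
sudo apt-get update && sudo apt-get install influxdb
sudo systemctl start influxdb

The steps are identical for the other components: Telegraf, Chronograf and Kapacitor. The key and repository only need adding once per machine, so you’ll just need to replace “influxdb” with the correct name in the final two commands (the apt-get install and systemctl start lines).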
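
Since only the package name changes between components, the repeated steps could be wrapped in a small loop. The following is a minimal sketch under a few assumptions: the repository and key are already in place, the service names match the package names, and the run and install_tick helpers are hypothetical names introduced purely for illustration.

```shell
#!/usr/bin/env bash
# Sketch: install and start all four TICK components in one go.
# Assumes the InfluxData repository and key have already been added.

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: stops at the first command that fails.
install_tick() {
  for pkg in influxdb telegraf chronograf kapacitor; do
    run sudo apt-get install -y "$pkg" || return 1
    run sudo systemctl start "$pkg" || return 1
  done
}

# Preview the commands without touching the system:
#   DRY_RUN=1 install_tick
```

Running it with DRY_RUN=1 first is a cheap way to check what would be executed before letting it loose on a server.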

Configuring and linking the components

As all of the components are created by the same people, InfluxData, linking them together is fortunately very easy (another reason we went with the TICK stack). I’ll show you what additional configuration was put in place for each component and how we then linked them together. Note that the components are covered out of order here, as configuring some of them is a prerequisite to linking them to others.

InfluxDB

The main change that we make to InfluxDB is to have it listen for connections over HTTPS, meaning any data flowing to/from it will be encrypted. (To do this, you will need an SSL certificate and key pair; obtaining that pair is outside the scope of this blog post.) We also require authentication for logins, and disable the query log. We then restart InfluxDB for these changes to take effect.

sudo vim /etc/influxdb/influxdb.conf

[http]
    enabled = true
    bind-address = "0.0.0.0:8086"
    auth-enabled = true
    log-enabled = false
    https-enabled = true
    https-certificate = "/etc/influxdb/ssl/reporting-endpoint.dogsbodytechnology.com.pem"

sudo systemctl restart influxdb

Note that the file given in the “https-certificate” parameter must, of course, exist on your system and be readable by the user that InfluxDB runs as.
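
Because the configuration above sets only “https-certificate” and no separate private key path, our understanding is that InfluxDB 1.x will look for the private key in that same PEM file. A quick sanity check before restarting could look something like the sketch below. The helper name is hypothetical, and the check only looks for the PEM header lines; it does not validate the certificate itself.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a combined PEM file before restarting InfluxDB.

# Hypothetical helper: succeeds only if the file is readable by the current
# user and contains both a certificate block and a private key block.
pem_has_cert_and_key() {
  pem="$1"
  [ -r "$pem" ] || return 1
  grep -q -e '-----BEGIN CERTIFICATE-----' "$pem" || return 1
  grep -q -e '-----BEGIN .*PRIVATE KEY-----' "$pem" || return 1
}

# Example (substitute the path from your own https-certificate setting):
#   pem_has_cert_and_key /path/to/your.pem && echo "PEM looks complete"
```

This only tests readability for whoever runs the check, so it is worth confirming the file permissions for the InfluxDB service user separately.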

We then need to create an administrative user like so:

influx -ssl -host ivory.dogsbodyhosting.net
> CREATE USER admin WITH PASSWORD 'superstrongpassword' WITH ALL PRIVILEGES

Telegraf

The customisations for Telegraf involve telling it where to report its metrics, and what metrics to record. We have an automated process, using Ansible, for rolling these customisations out to customer servers, which we’ll cover in the next part of this series. Make sure you check back for that. Essentially, these are the changes that are made:

sudo vim /etc/telegraf/telegraf.d/outputs.conf

[[outputs.influxdb]]
  urls = ["https://reporting-endpoint.dogsbodytechnology.com:8086"]
  database = "3340ad1c-31ac-11e8-bfaf-5ba54621292f"
  username = "3340ad1c-31ac-11e8-bfaf-5ba54621292f"
  password = "supersecurepassword"
  retention_policy = ""
  write_consistency = "any"
  timeout = "5s"

The above dictates that Telegraf should connect securely over HTTPS and tells it the username, database and password to use for its connection.
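
The output block assumes that the database and user already exist on the InfluxDB side. How they are provisioned isn’t covered in this post, so treat the following purely as one possible approach: generate the InfluxQL statements for a server and run them as the admin user. The helper name and the choice of a write-only grant are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print the InfluxQL needed for one reporting server.

# Hypothetical helper. $1 = identifier used as both database and user name,
# $2 = password for that user (must not contain a single quote).
influxql_for_server() {
  id="$1"
  pw="$2"
  printf 'CREATE DATABASE "%s"\n' "$id"
  printf "CREATE USER \"%s\" WITH PASSWORD '%s'\n" "$id" "$pw"
  printf 'GRANT WRITE ON "%s" TO "%s"\n' "$id" "$id"
}

# Example: review the statements, then paste them into an authenticated influx shell.
#   influxql_for_server 3340ad1c-31ac-11e8-bfaf-5ba54621292f supersecurepassword
```

Giving each server its own database and a user that can only write to it would limit what a compromised server could read or change, which is why the sketch grants WRITE rather than ALL.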

We also need to tell Telegraf what metrics it should record. This is configured like so:

[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
  report_active = true
[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs"]
[[inputs.diskio]]
[[inputs.net]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.procstat]]
  pattern = "."

The above tells Telegraf what metrics to report, and customises how they are reported a little. For example, we tell it to ignore some pseudo-filesystems in the disk section, as these aren’t important to us.
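
When the input sections are spread over several files, it can be handy to confirm which plugins are actually enabled. The sketch below simply extracts the plugin names from the section headers of a config file; the helper name is hypothetical and commented-out sections are deliberately skipped.

```shell
#!/usr/bin/env bash
# Sketch: list the input plugins enabled in a Telegraf config file.

# Hypothetical helper: prints one plugin name per line, e.g. "cpu".
# Lines starting with "#" do not match, so disabled plugins are ignored.
list_telegraf_inputs() {
  sed -n 's/^[[:space:]]*\[\[inputs\.\([A-Za-z0-9_]*\)]].*$/\1/p' "$1"
}

# Example (use whichever file holds your input sections):
#   list_telegraf_inputs /etc/telegraf/telegraf.conf
```

Telegraf also has a --test flag which gathers one round of metrics and prints them rather than sending them to the configured outputs, which is a more thorough check once the file parses.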

Kapacitor

The customisations for Kapacitor primarily tell it which InfluxDB instance it should use, and the channels it should use for sending out alerts:

sudo vim /etc/kapacitor/kapacitor.conf
    [http]
    log-enabled = false

    [logging]
    level = "WARN"

    [[influxdb]]
    name = "ivory.dogsbodyhosting.net"
    urls = ["https://reporting-endpoint.dogsbodytechnology.com:8086"]
    username = "admin"
    password = "supersecurepassword"

    [pushover]
    enabled = true
    token = "yourpushovertoken"
    user-key = "yourpushoveruserkey"

    [smtp]
    enabled = true
    host = "localhost"
    port = 25
    username = ""
    password = ""
    from = "alerts@example.com"
    to = ["sysadmin@example.com"]

As you can probably work out, we use Pushover and email to send/receive our alert messages. This is subject to change over time. During the development phase, I used the Slack output.
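
Kapacitor’s config file is TOML, which only accepts straight quotes, and snippets copied from a web page can easily pick up typographic ones. A small check such as the following flags any offending lines before Kapacitor is restarted. It is only a sketch, and the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: find typographic (curly) quotes in a config file.

# Hypothetical helper: prints matching lines with their line numbers and
# returns 0 if any were found, 1 if the file is clean.
find_curly_quotes() {
  grep -n -e '“' -e '”' -e '‘' -e '’' "$1"
}

# Example:
#   find_curly_quotes /etc/kapacitor/kapacitor.conf && echo "fix the quotes above"
```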

Grafana (in place of Chronograf)

Although the TICK stack offers its own visualisation (and control) tool, Chronograf, we ended up using the very popular Grafana instead. At the time we were building the replacement solution, Chronograf, although very pretty, was somewhat lacking in features, and the features that did exist were sometimes buggy. Please do note that Chronograf was the only component still in beta at that point. It has since had a full release and another ~5 months of development, so you should definitely try it out for yourself before jumping straight to Grafana. We intend to re-evaluate Chronograf ourselves soon, especially as it is able to control the other components in the TICK stack, something which Grafana does not offer at all.

The Grafana install is pretty straightforward, as it also has a package repository:

sudo vim /etc/apt/sources.list.d/grafana.list
    deb https://packagecloud.io/grafana/stable/debian/ jessie main
sudo apt update
sudo apt install grafana

We then of course make some customisations. The important part here is setting the base URL, which is required because we’ve got Grafana running behind an nginx reverse proxy. (We love nginx and use it wherever we get the chance. We won’t detail those customisations here though, as they’re not strictly related to the monitoring solution, and Grafana works just fine on its own.)

sudo vim /etc/grafana/grafana.ini
    [server]
    domain = display-endpoint.dogsbodytechnology
    root_url = %(protocol)s://%(domain)s:/grafana
sudo systemctl restart grafana-server
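
The %(protocol)s and %(domain)s placeholders in root_url are filled in from the other settings in the same [server] section. To see what a given template resolves to, a substitution along these lines can help. This is a rough sketch that mimics the interpolation with sed rather than anything Grafana itself provides, and the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: show what a root_url template resolves to for given [server] values.

# Hypothetical helper. $1 = template, $2 = protocol, $3 = domain.
# The values must not contain "|" or "&", which are special to sed here.
expand_root_url() {
  printf '%s\n' "$1" | sed -e "s|%(protocol)s|$2|g" -e "s|%(domain)s|$3|g"
}

# Example:
#   expand_root_url '%(protocol)s://%(domain)s:/grafana' https display-endpoint.dogsbodytechnology
```

The resolved value needs to match the address that users type into their browser, including the /grafana sub-path that the reverse proxy serves.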

Summary

The steps above left us with a very powerful and customisable monitoring solution, which worked fantastically for us. Be sure to check back for future instalments in this series. In part 3 we cover setting up alerts with Kapacitor, creating awesome visualisations with Grafana, and getting all of our hundreds of customers’ servers reporting in and alerting.

Part three is here.

Replacement Server Monitoring

  • Part 1: Picking a Replacement
  • Part 2: Building the replacement (you are here)
  • Part 3: Kapacitor alerts and going live!

Feature image background by tomandellystravels licensed CC BY 2.0.

Tags: monitoring
