Dogsbody is proud to announce StatusPile

What is StatusPile?

Simply put – it’s your status page of status pages.

Most service providers have some kind of status page. When something goes wrong, you have to visit each provider’s page to find out where the issue lies.

With StatusPile you only need to visit one place to see all statuses at a glance.

Log in via Auth0 to create your own customised dashboard, then check that single dashboard to see which providers have issues.

Each tile also links directly to that provider’s status page, for when you need the detail.

Oh and did we mention, it’s completely free to use?

Why did we build StatusPile?

One of Dogsbody’s stated aims is to give back to the open source community.

This project has certainly done that. StatusPile is already helping developers and DevOps people around the world. We are also actively encouraging contributions and additions to the service provider list.

The code is on GitHub.  Feel free to contribute: fork the code, submit a pull request, or suggest providers you would like to see added in the future.

We asked Dogsbody’s founder what’s behind the project.

“We needed a status dashboard for the services we monitor for our clients. We couldn’t find one, so we built one.
It’s simple to use, works with Auth0 authentication and is available for forking. We hope you find it as useful as we do.”
– Dan Benton, founder of Dogsbody Technology

How do I get started?

1) Visit StatusPile.com.

2) Customise your dashboard.

3) Log in with your favourite platform (or email) to save your configuration.

(Hat tip to Auth0 for the free license).

We hope you find it as useful as we do. We plan to add more providers and features over the coming months – so why not check it out today?

Tripwire – How and Why

Open Source Tripwire is a powerful tool to have access to.  It is used by the MOD to monitor systems, and is based on code contributed by Tripwire – a company that provides security products and solutions.  If you need to ensure the integrity of your filesystem, Tripwire could be the perfect tool for you.

What is Tripwire?

Open Source Tripwire is a popular host-based intrusion detection system (IDS).  It monitors filesystems and alerts when changes occur, allowing you to detect intrusions or unexpected changes and respond accordingly.  Tripwire gives you great flexibility over which files and directories to monitor, which you specify in a policy file.

How does it work?

Tripwire keeps a database of file and directory metadata.  It can then be run regularly to report on any changes.

If you install Tripwire from Ubuntu’s repo as per the instructions below, a daily cron job will be set up to send you an email report.  The general view with alerting is that no news is good news, but due to the nature of Tripwire it’s useful to receive the daily email either way; that way you’ll notice if Tripwire gets disabled.

Before we start

Before setting up Tripwire please check the following:

  • You’ve configured email on your server.  If not, you’ll need to do that first; we’ve got a guide.
  • You’re manually patching your server.  Make sure you don’t have unattended upgrades running (see the manual updates section), as unless you’re co-ordinating Tripwire with your patching process it will be hard for you to distinguish between expected and unexpected changes.
  • You’re prepared to put some extra time into maintaining this system for the benefit of knowing when your files change.

Installation on Ubuntu

sudo apt-get update
sudo apt-get install tripwire

You’ll be prompted to create your site and local keys; make sure you record them in your password manager.

In your preferred editor, open /etc/tripwire/twpol.txt.

The changes you make here depend on what you’re looking to monitor: the default config has good coverage of system files, but is unlikely to be monitoring your website files if that’s something you wanted to do.  For example, I’ve needed to remove /proc and some of the files in /root that haven’t existed on systems I’ve been monitoring.
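For instance, if you did want Tripwire to watch your website files, a rule along these lines could be added to the policy file (a sketch only; the /var/www path and rule name are assumptions for illustration, $(ReadOnly) is one of Tripwire’s built-in property masks, and $(SIG_MED) is a severity variable defined in the default Ubuntu policy):

(
  rulename = "Website files",
  severity = $(SIG_MED)
)
{
  /var/www -> $(ReadOnly) ;
}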

Then create the signed policy file and then the database:

sudo twadmin --create-polfile /etc/tripwire/twpol.txt
sudo tripwire --init

At this point it’s worth running a check to make sure it completes with no errors.

sudo tripwire --check

Finally, I’d manually run the daily cron job to check that the email comes through to you.

sudo /etc/cron.daily/tripwire

Day to day usage

Changing files

After you make changes to your system, you’ll need to run a report to check what Tripwire sees has changed.

sudo tripwire --check

You can then update the signed database.  This will open the report, allowing you to check that you’re happy with the changes before exiting.

sudo tripwire --update -r /var/lib/tripwire/report/$HOSTNAME-yyyyMMdd-HHmmss.twr

You’ll need your local key in order to update the database.

Changing policy

If you decide you’d like to monitor or exclude more files, you can update /etc/tripwire/twpol.txt.  If you’re monitoring this file, you’ll need to update the database as per the above section.  After that you can update the signed policy file (you’ll need your site and local keys for this).

sudo tripwire --update-policy /etc/tripwire/twpol.txt

 

As you can see, Tripwire can be an amazingly powerful tool in any security arsenal.  We use it as part of our maintenance plans and encourage others to do the same.

 

Feature image by Nathalie licensed CC BY 2.0.

The Cloud Native Computing Foundation

The Cloud Native Computing Foundation (CNCF) is:

an open source software foundation dedicated to making cloud native computing universal and sustainable

They do this by hosting and “incubating” projects they see as valuable, helping them to develop and reach maturity, where they can be used widely in cloud environments.

CNCF has over 350 members including the world’s largest public cloud and enterprise software companies as well as dozens of innovative startups

The CNCF is also backed by the Linux Foundation, who are fast becoming one of the most recognised organisations in the industry. They support the open source community as a whole, aiming to protect and accelerate development of the Linux kernel, along with many other things.

Why should I care?

The CNCF is exciting as, for me at least, it provides a bit of a portal into the way that the industry is moving at the moment.  It showcases both the current behemoths of cloud computing software stacks and the projects that are likely to replace or supplement them in the future. The CNCF splits their projects into three main categories:

  • Graduated
  • Incubating
  • Sandbox

Graduated projects are ones that have reached maturity and see wide adoption. At the time of writing these are Kubernetes, Prometheus, Envoy, CoreDNS and containerd. If you’ve even been dabbling in the cloud/Linux community, then you’ve probably heard of at least a few of these projects.

Incubating projects are ones that haven’t quite hit the prime time yet, but are well on their way. These currently include projects such as rkt, a container engine that’s a potential competitor for Docker, CNI (Container Network Interface), which focuses on configuring networking within containers, and etcd, a key-value store designed for storing critical system data.

I find the CNCF useful for guiding me on what pieces of software I should be learning to enhance my skill set, as they’re likely to be desirable in the short to medium term. It’s also one of the first places I’m likely to check for a piece of software that fits a particular need, as I know that CNCF projects are going to be active, well supported, and have lots of related Stack Overflow questions / GitHub issues for when I’m getting started.

Training and Certification

The CNCF also offers some training and certification options. These are useful for proving that you’re familiar and capable with some of the technologies they support. At the time of writing, the training courses and certifications they offer are all Kubernetes based (which is by no means a bad thing), but I’m sure they will offer more in the future.

In summary, the CNCF acts as a sort of central hub for a lot of the hottest and biggest projects right now. Even if you don’t have a particular need for them at this time, it’s good to know what’s out there, as well as what’s coming over the hill, and it’s useful for that reason alone.

 

Featured image by chuttersnap on Unsplash

What are Status Pages?

A status page allows a supplier of a service to let their customers know about outages and issues with their service.  They can be used to show planned maintenance and can hook into email or other update methods, but typically they are a website first and foremost.  Status pages are great; they make things easier for everyone and save time.  If you think you’re having an issue related to a provider, you can quickly look at their status page to see if they’re already aware of it before deciding whether or not to contact them.  If they have already acknowledged the problem, it also means you don’t need to spend time working out what has changed at your end.

We have one

Our status page, status.dogsbody.com, has been running for quite a while. We suggest our customers use it as their first port of call when they spot something odd.  If you’ve got your own server(s) that we maintain, we’ll contact you directly if we start seeing issues.  The status page covers the below…

Support – the methods we usually use to communicate with you

  • Email
  • Telephone
  • Slack

Hosting – our shared hosting servers

  • Indigo (our WordPress only hosting)
  • Purple (our general purpose hosting)
  • Violet (our cPanel hosting)

When we schedule maintenance or have issues, we’ll update you via the status page.  If you are an Indigo or Purple customer and want to be notified of issues or maintenance, go ahead and subscribe.

You want one

Having an (up-to-date) status page improves your users’ experience.  It gives them a quick way to find out what’s going on.  This means they’ll have a better understanding of, and (usually) be more tolerant of, issues you’re already dealing with.

Having a status page is likely to cut down on the number of similar questions you get if you have an outage.  We’ve been really happy with the self-hosted open source software we’re using – Cachet.  We wanted to make sure our status page doesn’t go down at the same time as our other services, so we’ve used a different server provider to our main infrastructure.  If you want to avoid worrying about that sort of thing, we’ve seen a lot of people using statuspage.io and status.io.

Feature image background by Wolfgang.W. licensed CC BY 2.0.

How will the Ubuntu 14.04 EOL affect me?

In April 2019, Ubuntu 14.04 reaches end of life (EOL).
We recommend that you update to Ubuntu 18.04.

Over time technology and security evolves, new bugs are fixed and new threats prevented, so in order to maintain a secure infrastructure it is important to keep all software and systems up to date.

Operating systems are key to security, providing the libraries and technologies behind NGINX, Apache and anything else running your application. Old operating systems don’t support the latest technologies which new releases of software depend on, leading to compatibility issues.

Leaving old Ubuntu 14.04 systems running past April 2019 puts you at risk of:

  • Security vulnerabilities of the system in question
  • Making your network more vulnerable as a whole
  • Software incompatibility
  • Compliance issues (PCI)
  • Poor performance and reliability

Ubuntu End of life dates:

Ubuntu LTS (long term support) releases come with 5 years of support. This means that after 5 years they receive no maintenance updates, including security updates.

  • Ubuntu 14.04 : April 2019
  • Ubuntu 16.04 : April 2021
  • Ubuntu 18.04 : April 2023
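If you’re not sure which Ubuntu release a server is running, a quick check from the command line will tell you (both commands below are standard on Ubuntu):

lsb_release -d
cat /etc/os-release

The first prints a one-line description such as “Ubuntu 14.04.x LTS”; the second shows the full version and codename details.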

Faster:

Just picking up your files and moving them from Ubuntu 14.04 to Ubuntu 18.04 will speed up your site due to the newer software.

  • Apache 2.4.7 -> Apache 2.4.29
  • NGINX 1.4.6 -> NGINX 1.14.0
  • MySQL 5.5 -> MySQL 5.7
  • PHP 5.5 -> PHP 7.2

Are you still using an old operating system?

Want to upgrade?

Not sure if this affects you?

Drop us a line and see what we can do for you!

 

Feature image by See1,Do1,Teach1 licensed CC BY 2.0.

PHP 5.6 will go end of life on 31 Dec 2018

A quick public service announcement: PHP 5.6 goes end of life (EOL) on 31 December 2018.  This means that known security flaws will no longer be fixed, so any sites you have running on it will become vulnerable; it is important you update them to a newer version.

We recommend updating to the latest stable version (at the time of writing this is PHP 7.2).  As this is a major upgrade you will want to test your site on the new version and may need to get your developers to update some code before moving over.
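If you’re not sure which PHP version a server is running, the command line will tell you (note this shows the CLI version; if your web server uses a different PHP build, a phpinfo() page will confirm the version serving your site):

php -v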

Unsure if you are affected, or want a hand upgrading? Get in touch!

Everyone loves a good graph and Jordi Boggiano has provided this great overview of the PHP versions out there in the wild!

Feature image by See1,Do1,Teach1 licensed CC BY 2.0.

Duplicacy: Backup to the cloud

Duplicacy is an open source backup tool which supports a large number of storage back-ends, including many cloud offerings, whilst also providing many other useful features. We recently implemented a duplicacy-based backup solution for a customer, and wanted to share our experience to help out anybody looking to implement duplicacy themselves.

Installation

Duplicacy is written in Go, meaning it can easily be downloaded and compiled from source. However, this involves installing Go on the system you wish to back up, which may not always be an option. Fortunately, duplicacy also provides binary releases, which can be downloaded and executed with ease.

To install duplicacy on a Linux system, the steps are as follows:

wget https://github.com/gilbertchen/duplicacy/releases/download/v2.1.0/duplicacy_linux_x64_2.1.0
sudo mv duplicacy_linux_x64_2.1.0 /usr/local/bin/duplicacy
sudo chmod +x /usr/local/bin/duplicacy

You can then run duplicacy by simply running the “duplicacy” command in your terminal.

Setting up your storage

As mentioned above, duplicacy supports an impressive number of storage back-ends. As of the time of writing, they are:

  • Local disk
  • SFTP
  • Dropbox
  • Amazon S3
  • Wasabi
  • DigitalOcean Spaces
  • Google Cloud Storage
  • Microsoft Azure
  • Backblaze B2
  • Google Drive
  • Microsoft OneDrive
  • Hubic
  • OpenStack Swift
  • WebDAV (under beta testing)
  • pcloud (via WebDAV)
  • Box.com (via WebDAV)

The two options that we’ve used are SFTP and AWS (Amazon Web Services) S3. To back up a system over SFTP, all you need is a working SFTP user on the remote system. No additional setup is required.
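For example, initialising a repository against SFTP storage looks something like this (a sketch; the snapshot name, user, host and path are placeholders, and the path is interpreted relative to the SFTP user’s home directory):

sudo duplicacy init your_repo_name sftp://backupuser@backup.example.com/duplicacy-storage

Duplicacy will prompt for credentials for the SFTP user the first time the storage is used.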

The setup for Amazon S3 is a little more involved. In summary, the steps are:

  • Create an Amazon S3 bucket in your preferred region
  • Create an IAM policy granting permissions on this bucket
  • Create an IAM user and assign them this policy
  • Configure duplicacy to use this user and bucket

Creating an S3 bucket

Creating a bucket is pretty straightforward. Log in to your AWS account, go to the S3 service, click “Create bucket”, give your bucket a name, select a region, and you’re done. There are some other options when creating a bucket, but these are not relevant to this post so I’ll not cover them here.
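If you prefer the command line, the same can be done with the AWS CLI (assuming it is installed and configured with credentials allowed to create buckets; the bucket name and region below are just examples):

aws s3 mb s3://dbt-gary-duplicacy-backup-example --region eu-west-2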

Creating an IAM policy

IAM stands for Identity and Access Management, and is central to many operations in AWS. To create your policy, navigate to the IAM service in AWS, select “policies” on the left, and click the big blue “Create policy” button at the top.

On this screen, choose the “JSON” tab. This is where we’ll specify the guts of our policy. It should look something like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "arn:aws:s3:::*"
        },
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::dbt-gary-duplicacy-backup-example",
                "arn:aws:s3:::dbt-gary-duplicacy-backup-example/*"
            ]
        }
    ]
}

You’ll need to replace “dbt-gary-duplicacy-backup-example” with the name of the S3 bucket you created in the last step.

When you’re happy with your policy, click “Review policy”, followed by “Save changes”.

Creating an IAM user and assigning the policy

From the home of the IAM service, now click “Users” on the left, followed by the big blue “Add user” button at the top. Provide a name for your user, and check the “Programmatic access” box below. Click next.

On the next screen, click “Attach existing policies directly”. At the top of the list of policies now listed below, click the “Filter: Policy type” drop-down, and select “Customer managed”. Check the box for your IAM policy, and click “review” to continue, followed by “Create user” on the next page.

Your IAM user and policy have now been created.

Ensure that you save the details now presented to you. You will need these to configure duplicacy.
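The equivalent steps with the AWS CLI would look roughly like this (a sketch; the user and policy names are examples, and you would substitute your own account ID in the policy ARN):

aws iam create-user --user-name duplicacy-backup
aws iam attach-user-policy --user-name duplicacy-backup --policy-arn arn:aws:iam::123456789012:policy/duplicacy-backup-policy
aws iam create-access-key --user-name duplicacy-backup

The create-access-key output contains the access key ID and secret access key you’ll need in the next section.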

Configuring duplicacy

On the system you wish to back up, navigate to the directory you wish to back up. For example, on the system we configured, this was the “/home” directory. You can now configure duplicacy. The steps are as follows:

sudo duplicacy init your_repo_name s3://your-region@amazon.com/your_bucket_name
sudo duplicacy set -key s3_id -value your_access_key
sudo duplicacy set -key s3_secret -value your_secret_key

There are a number of strings you’ll need to replace in the above snippet:

your_repo_name – The name you’d like to give to this set of backups. For example, “johns-desktop”.

your_bucket_name – The name you gave your S3 bucket in the steps above.

your_region – This is the AWS region you selected for your bucket above. Please see this table, using the “region” column that corresponds to your region name. For example, “eu-west-2” for the London region.

your_access_key – This is the access key for the IAM user you created above. It will be a long string of random looking characters.

your_secret_key – This is the secret key for the IAM user you created above. It will again be a long string of random looking characters. Make sure you keep this safe, as anybody who has it can access your backups!

Running a backup

If all went well with the above, then you’re ready to run your first backup. This is as easy as running:

sudo duplicacy backup

This will back up all files under the current directory. Depending on the number and size of files, this may take some time.
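Backups are most useful when they run automatically. One way to schedule a nightly run is a cron entry along these lines (a sketch; the 3am schedule, the /home repository path and the log file location are assumptions to adapt to your setup):

# /etc/cron.d/duplicacy - run a backup every night at 03:00
0 3 * * * root cd /home && /usr/local/bin/duplicacy backup >> /var/log/duplicacy.log 2>&1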

Including/excluding certain files/directories from your backups

Duplicacy offers powerful filtering functionality, allowing for fine-grained control over which files and directories you want to back up. These can be somewhat confusing to configure, but are very useful once you’ve got the hang of them. We may do a follow-up post covering these, so be sure to check back in the future.

Restoring backups from duplicacy

In order to restore from duplicacy, you need to configure your system to interact with your backups. If you’re restoring on the same system the backups were taken on, you need not take any additional steps. If you’re restoring to a different system, you need to follow the installation and duplicacy configuration steps shown above.

Once things are configured, you can view the available backups like so:

sudo duplicacy list

Note that you must be in the correct directory on your system (the one where you initialised your repo) in order to view the backups.

This will give you a list of your backups:

Snapshot johns-desktop revision 1 created at 2018-04-12 07:29 -hash
Snapshot johns-desktop revision 2 created at 2018-04-12 12:03 
Snapshot johns-desktop revision 3 created at 2018-04-17 17:37 
Snapshot johns-desktop revision 4 created at 2018-04-18 11:10 
Snapshot johns-desktop revision 5 created at 2018-04-18 14:38 
Snapshot johns-desktop revision 6 created at 2018-04-20 03:02 
Snapshot johns-desktop revision 7 created at 2018-04-21 03:02 
Snapshot johns-desktop revision 8 created at 2018-04-22 03:02 
Snapshot johns-desktop revision 9 created at 2018-04-23 03:02

As you can see, there are revision numbers and the corresponding times and dates for these revisions. A revision is just another name for a backup.

You can then restore a particular backup. For example, to restore revision 7:

sudo duplicacy restore -r 7

Again, depending on the number and size of files in this backup, this may take some time.

Duplicacy offers some really cool features alongside the restore command. For example, you can see the contents of a file in a backup with the “cat” command, and compare differences between two backups with the “diff” command. You can see all of the options here.
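For instance, something like the following (the revision numbers and file path are examples):

sudo duplicacy cat -r 7 path/to/your/file.txt
sudo duplicacy diff -r 6 -r 7 path/to/your/file.txt

The first prints the file as it was in revision 7; the second shows what changed in that file between revisions 6 and 7.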

Selective restores

One of the more useful restore options is to only restore a certain file or directory from your backup. This can be accomplished with the following command:

sudo duplicacy restore -r 7 path/to/your/file.txt

This can also be extended to restore everything under a directory, like so:

sudo duplicacy restore -r 7 path/to/your/directory\*

Summary

Duplicacy is an extremely powerful and portable backup tool, allowing for reliable and fine grained backups of your data. If you have any questions on duplicacy or would like any help setting it up, please leave a comment below or contact us and we’ll be happy to help. Thanks for reading.

Feature image background by 111692634@N04/ licensed CC BY-SA 2.0.


Cloud Storage Pricing Comparison

For a long time AWS has been the go-to for cloud storage, but the competition has heated up over the last few years. We keep a close eye on the various storage offerings so that we can recommend the best solutions to our customers.  So where are we now?  Who’s the current “winner”?

Obviously, the best provider will depend entirely on what you want to use it for. We frequently use cloud storage for backups. It is a perfect use case: you can sync your backups to multiple geographical locations at the touch of a button, storage space grows with you, and it doesn’t require anything extra on our servers. Backups, of course, are not the only option.

Here is a handful of use cases for cloud storage:

  • Backups (especially off-site Backups)
  • Online File Sharing
  • Content delivery networks (CDNs)
  • Team Collaboration
  • Infrequently accessed files storage
  • Storage with unparalleled availability (uptime) and durability (data corruption)
  • Static sites (such as my personal site, which is hosted directly out of an AWS S3 bucket)

The Data

Below is an abridged version of the data we keep on various providers. This spreadsheet should be kept up to date as prices change in the future.

An Example

As we said above, we regularly use cloud storage for server backups. In this example I am backing up 20GB of data every day. It is stored for 3 months. Each month a backup is downloaded to verify its integrity. This equates to:

  • 1860GB of stored data
  • 620GB uploaded
  • 20GB downloaded
  • 3100 PUT requests
  • 100 GET requests

And the winners are (yearly price)…

  1. £113.73 – Backblaze B2
  2. £321.57 – Azure
  3. £335.29 – Digital Ocean Spaces
  4. £386.29 – Google Cloud Storage
  5. £410.96 – IBM Cloud
  6. £419.33 – AWS S3
  7. £1,581.60 – Rackspace

At the time of writing, Backblaze provide the cheapest storage per GB by miles, but with two large caveats. They only have two data centres, and because of that cannot match the redundancy of bigger companies like AWS.  They also do not have a UK data centre, which can cause a potential compliance issue as data has to be stored in the US.

Azure is our current recommendation for new cloud storage setups. They are the second cheapest per GB stored, have a UK based data centre and also provide great control over data redundancy. Digital Ocean are the next cheapest, but because of the minimum $5 spend they may not be for everyone.

Gotchas

Of course, what is right for you will also depend on your current server setup. If you are using AWS for data processing and analytics it makes sense to use them, as data transfer within AWS is free.

Most cloud providers price in US dollars, which we have converted to UK sterling.  This means that exchange rates can greatly affect storage prices.  Azure was the only provider to offer UK sterling prices directly.

Be sure to check the durability (the chance that files will become corrupted) of your data as well as its availability (the chance that you cannot access a file).

The options are limitless. Interested in what cloud storage can do for you? Drop us a line today!

 

Feature image background by gothopotam licensed CC BY 2.0.

How will Debian 7 end of life affect me?

On 31st May 2018, Debian 7 “Wheezy” reaches end of life (EOL).
We recommend that you update to Debian 9 “Stretch”.

Over time technology and security evolves, new bugs are fixed and new threats prevented, so in order to maintain a secure infrastructure it is important to keep all software and systems up to date.  Once an operating system reaches end of life it no longer receives updates, so it will be left with known security holes.

Operating systems are key to security, providing the libraries and technologies behind NGINX, Apache and anything else running your application. Old operating systems don’t support the latest technologies which new releases of software depend on, leading to compatibility issues.

Leaving old Debian 7 systems running past May 2018 puts you at risk of:

  • Security vulnerabilities of the system in question
  • Making your network more vulnerable as a whole
  • Software incompatibility
  • Compliance issues (PCI)
  • Poor performance and reliability

Debian End of life dates:

  • Debian 7 : 31st May 2018
  • Debian 8 : April 2020
  • Debian 9 : June 2022
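If you’re not sure which Debian release a server is running, a quick check from the command line will tell you:

cat /etc/debian_version
lsb_release -d

The first prints the point release (for example “7.11” on the last Wheezy release); the second needs the lsb-release package installed but gives a friendlier description.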

Faster:

Just picking up your files and moving them from Debian 7 to Debian 9 will speed up your site due to the newer software.

  • Apache 2.2.22 -> Apache 2.4.25
  • PHP 5.4 -> PHP 7.0
  • MySQL 5.5 -> MariaDB 10.1

Are you still using an old operating system?

Want to upgrade?

Not sure if this affects you?

Drop us a line and see what we can do for you!

Feature image by See1,Do1,Teach1 licensed CC BY 2.0.

Turning Prometheus data into metrics for alerting

As you may have seen in previous blog posts, we have a Warboard in our office which shows the current status of the servers we manage. Most of our customers are using our new alerting stack, but some have their own monitoring solutions which we want to integrate with. One of these was Prometheus. This blog post covers how we transformed raw Prometheus values into percentages which we could display on our Warboard and create alerts against.

Integrating with Prometheus

In order to get a summary of the data from Prometheus to display on the Warboard we first needed to look at what information the Node Exporter provided and how it was tagged. Node Exporter is the software which makes the raw server stats available for Prometheus to collect. Given that our primary concerns are CPU, memory, disk IO and disk space usage we needed to construct queries to calculate them as percentages to be displayed on the Warboard.

Prometheus makes the data accessible through its API on "/api/v1/query?query=<query>". Most of the syntax is fairly logical, the general rule being that if we take an average or maximum value we need to specify "by (instance)" in order to keep each server separate. Node Exporter mostly returns raw values from the kernel rather than trying to manipulate them.  This is nice as it gives you freedom to decide how to use the data, but it does mean we have to give our queries a little extra consideration:

CPU

(1 - avg(irate(node_cpu{mode="idle"}[10m])) by (instance)) * 100

CPU usage is being reported as an integer that increases over time, so we need to calculate the current percentage of usage ourselves. Fortunately, Prometheus has the rate and irate functions for us. Since rate is mostly for use in calculating whether or not alert thresholds have been crossed and we are just trying to display the most recent data, irate seems a better fit. We are currently taking data over the last 10 minutes to ensure we get data for all servers, even if they’ve not reported very recently. As total CPU usage isn’t being reported, it is easiest to use the idle CPU usage and calculate the total as 100% – idle%, rather than trying to add up all of the other CPU usage metrics. Since we want separate data for each server we need to group by instance.
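To test a query by hand, it can be run against the API mentioned above with curl (a sketch, assuming Prometheus is listening on localhost:9090; the -G and --data-urlencode options take care of URL-encoding the query string):

curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=(1 - avg(irate(node_cpu{mode="idle"}[10m])) by (instance)) * 100'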

Memory

((node_memory_MemTotal - node_memory_MemFree) / node_memory_MemTotal) * 100

The memory query is very simple; the only interesting thing to mention is that MemAvailable wasn’t added until Linux 3.14, so we are using MemFree to get consistent values from every server.

Disk IO

(max(avg(irate(node_disk_io_time_ms[10m])) by (instance, device)) by (instance))/10

Throughout setting up alerting I feel disk IO has been the “most interesting” metric to calculate. For both Telegraf, which we discuss setting up here, and Node Exporter I found looking at the kernel docs most useful for confirming that disk “io_time” was the correct metric to calculate disk IO as a percentage from. Since we need a percentage we have to rule out anything dealing with bytes or blocks as we don’t want to benchmark or assume the speed of every disk. This leaves us with “io_time” and “weighted_io_time”. “weighted_io_time” might give the more accurate representation of how heavily disks are being used; it multiplies the time waited by a process, by the total number of processes waiting. However we need to use “io_time” in order to calculate a percentage or we would have to factor in the number of processes running at a given time. If there are multiple disks on a system, we are displaying the disk with the greatest IO as we are trying to spot issues so we only need to consider the busiest device. Finally we need to divide by 1000 to convert to seconds and multiply by 100 to get a percentage.

Disk Space

max(((node_filesystem_size{fstype=~"ext4|vfat"} - node_filesystem_free{fstype=~"ext4|vfat"}) / node_filesystem_size{fstype=~"ext4|vfat"}) * 100) by (instance)

As Node Exporter is returning 0 filesystem size for nsfs volumes and there are quite a few temporary and container filesystems that we aren’t trying to monitor, we either need to exclude ones we aren’t interested in or just include those that we are. As with disk IO, many servers have multiple devices / mountpoints so we are just displaying the fullest disk, since again we are trying to spot potential issues.

If you are looking to set up your own queries, I would recommend having a look through the Prometheus functions here and running some ad hoc queries from the "/graph" section of Prometheus in order to look at what data you have available.

If you need any help with Prometheus monitoring then please get in touch and we’ll be happy to help.