A short guide to MySQL database optimization

MySQL is a very popular open source database, but many people install it and then forget about it. Spending a little time on MySQL database optimization can reap huge returns …

In this article, I want to show you a couple of the first places to look when you need to pinpoint bottlenecks or tweak the MySQL configuration.

MySQL slow log

The slow log records any queries that take longer than a given number of seconds.

This can help you to identify poorly written or demanding queries.  You can then refactor them or use concepts like “indexes” to speed them up.

It’s often helpful to start with a high “long query time” to flag up just the longest queries, and then gradually reduce it as you deal with each one in turn.

To enable the slow log, create the following /etc/mysql/mysql.conf.d/mycustomconfigs.cnf or add the following lines to your my.cnf file…


… then restart MySQL, to load in the new values.
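For reference, MySQL’s slow log is controlled by the slow_query_log, slow_query_log_file and long_query_time settings. As a minimal sketch, the Python helper below renders a [mysqld] fragment using them. The helper name, the log path and the 10 second starting threshold are assumptions for illustration, so adjust them to suit your own server.

```python
def render_slow_log_config(long_query_time=10.0,
                           log_file="/var/log/mysql/mysql-slow.log"):
    """Return a [mysqld] config fragment that enables the slow query log.

    The default threshold and log path are illustrative assumptions.
    """
    lines = [
        "[mysqld]",
        "slow_query_log = 1",
        f"slow_query_log_file = {log_file}",
        f"long_query_time = {long_query_time:g}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Start with a high threshold, then lower it as each query is dealt with.
    print(render_slow_log_config(long_query_time=10))
```

Lowering long_query_time in small steps keeps the log short enough to work through one query at a time.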

Improve the query

Once you’ve found a slow query, it’s worth considering if there is a simpler way to get the same information.

If you can improve the performance of the query you might be able to skip looking into why the old one was slow.


If you’re still looking to improve your query, we need to dig into how MySQL is actually running the query and why it’s slow.  This will give us a better idea of how to fix it.

For these examples I ran a few basic queries against this auto-generated employee data.

Let’s suppose your slow query is:

SELECT AVG(hire_date) FROM employees WHERE emp_no IN (SELECT emp_no FROM dept_manager)

To see how it will be executed, prefix it with EXPLAIN:

mysql> EXPLAIN SELECT AVG(hire_date) FROM employees WHERE emp_no IN (SELECT emp_no FROM dept_manager);
| id | select_type | table        | partitions | type   | possible_keys | key     | key_len | ref                           | rows | filtered | Extra                  |
| 1  | SIMPLE      | dept_manager | NULL       | index  | PRIMARY       | PRIMARY | 8       | NULL                          | 24   | 100.00   | Using index; LooseScan |
| 1  | SIMPLE      | employees    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | employees.dept_manager.emp_no | 1    | 100.00   | NULL                   |
2 rows in set, 1 warning (0.00 sec)

It’s worth having a quick look at the documentation on the output format.  Three of the columns to check are:

  • key – is the index that will be used, the one you’d expect?
  • rows – is the number of rows to be examined as low as possible?
  • filtered – is the estimated percentage of examined rows that match the query’s conditions as high as possible?
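As a rough sketch of how you might apply these checks to several queries at once, the Python snippet below takes EXPLAIN rows as dictionaries and flags the obvious problems. The function name and the two rules are my own simplifications rather than anything MySQL provides, and the sample values are copied from the EXPLAIN output in this article.

```python
def review_explain_rows(rows):
    """Return human-readable warnings for rows of EXPLAIN output.

    Each row is a dict keyed by EXPLAIN column name. The checks are rough
    heuristics for illustration, not rules from the MySQL documentation.
    """
    warnings = []
    for row in rows:
        table = row["table"]
        if row["key"] is None:
            warnings.append(f"{table}: no index used (type={row['type']})")
        if row["type"] == "ALL":
            warnings.append(f"{table}: full table scan of ~{row['rows']} rows")
    return warnings


# Values copied from the EXPLAIN output shown in this article.
explain_rows = [
    {"table": "dept_manager", "type": "index", "key": "PRIMARY", "rows": 24},
    {"table": "employees", "type": "eq_ref", "key": "PRIMARY", "rows": 1},
]
unindexed = [{"table": "employees", "type": "ALL", "key": None, "rows": 299025}]

print(review_explain_rows(explain_rows))  # both tables use an index
print(review_explain_rows(unindexed))     # flags the missing index and the scan
```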

Suppose we’re regularly making the following query:

SELECT * FROM employees WHERE gender = 'M'

This has type: ALL meaning that all rows in the table will be scanned.

It therefore makes sense here to add an index.

After doing so, the type changes to ref – MySQL can simply return the rows matching the index rather than checking every row.

As you’d expect, this halves the estimated number of rows examined:

mysql> EXPLAIN SELECT * FROM employees WHERE gender = 'M';
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
| 1  | SIMPLE      | employees | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 299025 | 50.00    | Using where |
1 row in set, 1 warning (0.00 sec)

mysql> CREATE INDEX gender ON employees(gender);
Query OK, 0 rows affected (0.97 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> EXPLAIN SELECT * FROM employees WHERE gender = 'M';
| id | select_type | table     | partitions | type | possible_keys | key    | key_len | ref   | rows   | filtered | Extra |
| 1  | SIMPLE      | employees | NULL       | ref  | gender        | gender | 1       | const | 149512 | 100.00   | NULL  |
1 row in set, 1 warning (0.01 sec)

Third Party Tools

I should mention that there are a bunch of tools that can help you find bottlenecks and really hone your MySQL database optimization techniques.

For websites written in PHP, we’re big fans of New Relic APM. This tool will allow you to sort pages based on their load time.  You can then dig deeper into whether the application code or database queries have the most room for improvement.

Once you’ve narrowed things down, you can start implementing improvements.

It’s worth having a search for other application monitoring providers to see if tools such as Datadog or Dynatrace suit you better.

MySQL Tuner

MySQL Tuner is a tool which looks at a database’s usage patterns to suggest configuration improvements.  Make sure you run it on a live system that has been running for at least 24 hours; otherwise it won’t have access to enough data to make relevant recommendations.

Before you download and use the tool I’ll echo its warning:

It is extremely important for you to fully understand each change you make to a MySQL database server. If you don’t understand portions of the script’s output, or if you don’t understand the recommendations, you should consult a knowledgeable DBA or system administrator that you trust. Always test your changes on staging environments, and always keep in mind that improvements in one area can negatively affect MySQL in other areas.

Once you run MySQL Tuner, it will log in to the database, read a bunch of metrics and print out some recommendations.

It’s worth grouping the related ones and reading up on any you haven’t come across.  After that you can test out changes in your staging environment.

One common improvement is to set skip-name-resolve.  This saves a little bit of time on each connection by not performing DNS lookups.  Before you do this make sure you aren’t using DNS in any of your grant statements (you’re just using IP addresses or localhost).

Your friendly SysAdmins

Of course, we are also here to help and regularly advise customers on changes that can be made to their infrastructure.

Give us a shout if you think we can help you too.


Feature image by Joris Leermakers licensed CC BY-SA 2.0.

Exploring character encoding types

Morse code was first used to transfer information in the 1840s.  As you’re probably aware, it uses a series of dots and dashes to represent each character.

Computers need a way to represent characters in binary form – as a series of ones and zeros – equivalent to the dots and dashes used by Morse code.


A widely used way for computers to encode information is ASCII (American Standard Code for Information Interchange), created in the 1960s.

ASCII uses a string of seven ones and zeros to represent each character: the letters A-Z in upper and lower case, the numbers 0-9, common symbols and a set of control characters. That gives 128 characters in total.
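To make the seven bits concrete, here is a small Python sketch that prints the bit pattern for a few characters and checks whether a piece of text fits within ASCII. The is_ascii helper is a name made up for this example.

```python
# Every ASCII character fits in seven bits, so code points run from 0 to 127.
for char in ["A", "a", "0", "~"]:
    code_point = ord(char)
    print(char, code_point, format(code_point, "07b"))


def is_ascii(text):
    """Return True if every character in text can be encoded as ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


print(is_ascii("hello"))  # plain English letters are covered
print(is_ascii("café"))   # é is outside the 128 ASCII characters
```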

8 bit encoding

As you’d expect, ASCII is well suited for use in America. However, it’s missing many characters that are frequently used in other countries.

For example, it doesn’t include characters like é, £ or €.

Due to ASCII’s popularity, it’s been used as a base to create many different encodings.  All these different encodings add an extra eighth bit, doubling the possible number of characters and using the additional space for characters used by differing groups …

  • Latin 1 – Adds Western Europe and Americas (Afrikaans, Danish, Dutch, Finnish, French, German, Icelandic, Irish, Italian, Norwegian, Spanish and Swedish) characters.
  • Latin 2 – Adds Latin-written Slavic and Central European (Czech, Hungarian, Polish, Romanian, Croatian, Slovak, Slovene) characters.
  • Latin 3 – Adds Esperanto, Galician, Maltese, and Turkish characters.
  • Latin 4 – Adds Scandinavia/Baltic, Estonian, Latvian, and Lithuanian characters (is an incomplete predecessor of Latin 6).
  • Cyrillic – Adds Bulgarian, Byelorussian, Macedonian, Russian, Serbian and Ukrainian characters.
  • Arabic – Adds Non-accented Arabic characters.
  • Modern Greek – Adds Greek characters.
  • Hebrew – Adds Non-accented Hebrew characters.
  • Latin 5 – Same as Latin 1 except for Turkish instead of Icelandic characters
  • Latin 6 – Adds Lappish/Nordic/Eskimo languages. Adds the last Inuit (Greenlandic) and Sami (Lappish) letters that were missing in Latin 4 to cover the entire Nordic area characters.
  • etc.

All of this still doesn’t give global coverage though! There’s also the problem that you can’t mix different encodings in a single document, should you ever need to use characters from different character sets.
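The clash between these 8 bit encodings is easy to see in practice. The sketch below, using Python’s built-in codecs, decodes the same single byte as Latin 1, Cyrillic (ISO 8859-5) and Greek (ISO 8859-7). The byte value 0xE9 is just an arbitrary pick for the example.

```python
raw = bytes([0xE9])  # one byte with the eighth bit set

# The same byte maps to a different character in each 8-bit encoding.
decoded = {
    name: raw.decode(name)
    for name in ("latin-1", "iso-8859-5", "iso-8859-7")
}
for name, char in decoded.items():
    print(f"{name}: {char!r}")

# A character from one set may not exist in another at all.
try:
    "é".encode("iso-8859-5")
except UnicodeEncodeError as error:
    print("cannot encode:", error.reason)
```

Without knowing which encoding was used to write a file, there is no way to tell which of those characters the author meant.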

We need an alternative …


Unicode seeks to unify all the characters into one set.

This simplifies communication, as everyone can use a shared character set and doesn’t need to convert between them.

Unicode allows for over a million characters!

One of the most popular ways to encode Unicode is UTF-8.  UTF-8 has a variable width: depending on the character being encoded, either 8, 16, 24 or 32 bits are used.

For characters in the ASCII character set, only 8 bits need to be used.

Another way to encode Unicode is UTF-32, which always uses 32 bits. This fixed width is simpler, but it often uses significantly more space than UTF-8.
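You can see the difference in width with a few lines of Python. This is only a quick sketch: the encoded_sizes helper is a made-up name, and the four sample characters were chosen to hit each of the UTF-8 widths.

```python
def encoded_sizes(char):
    """Return (UTF-8 bytes, UTF-32 bytes) needed to store one character."""
    # "utf-32-le" omits the byte order mark so only the character is counted.
    return len(char.encode("utf-8")), len(char.encode("utf-32-le"))


for char in ["A", "é", "€", "😊"]:
    utf8_bytes, utf32_bytes = encoded_sizes(char)
    print(f"{char!r}: UTF-8 {utf8_bytes * 8} bits, UTF-32 {utf32_bytes * 8} bits")
```

For mostly English text, UTF-8 therefore takes roughly a quarter of the space that UTF-32 does.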


You probably don’t need telling, but Emoji are picture characters.

For a long time, knowledge workers have created smiley faces and more complex emoticons using symbols.

To take this a step further, emoji provide a wealth of characters.

The data transferred is always the same, but the pictures used differ between different platforms. Depending on the device you’re viewing this on, our smiley face emoji, 😊, will look different.

The popularity of emoji has actually helped to push support for Unicode, which includes emoji as part of its character set.

I’ve pulled out a few recently added ones and you can see more on the Unicode website.

U+1F996 added in 2017 – T-Rex 🦖
U+1F99C added in 2018 – Parrot 🦜
U+1F9A5 added in 2019 – Sloth 🦥


Feature image by Thomas licensed CC BY-SA 2.0.

WP-CLI – The Swiss Army Knife For WordPress

WP-CLI (WordPress Command-Line Interface) is an open source project providing a command-line interface for managing WordPress sites. It is an extremely powerful and versatile tool, being able to carry out pretty much any operation that would normally be carried out via the web control panel, along with some additional functions that are only available via the CLI.

We use WP-CLI extensively here at Dogsbody Technology. It allows us to streamline and automate our WordPress set up and maintenance routine, so we thought we’d spread the word and get everybody else in on the action.


There are a few installation methods for WP-CLI, all of which are documented here. We typically use the Phar installation method, which is as simple as:

curl -O https://raw.githubusercontent.com/wp-cli/builds/gh-pages/phar/wp-cli.phar
chmod +x wp-cli.phar
sudo mv wp-cli.phar /usr/local/bin/wp

Basic Usage

Unless otherwise instructed, WP-CLI will operate on the site contained in your current working directory. So if you want to work on a particular site you’ll need to “cd” to the installation directory before running your command, or alternatively you can pass the --path argument to WP-CLI. e.g.

wp --path=/var/www/dogsbody.com plugin install yoast

Creating a new site

As well as managing existing sites, WP-CLI can also set up new ones. You’ll need to create a MySQL database and user, but beyond that WP-CLI can handle the rest. A basic site download/install procedure may look something like this:

wp core download --locale=en_GB
wp core config --dbname=database_name --dbuser=database_user --dbpass=database_password
wp core install --url=www.dogsbody.com --title="Dogsbody's Website" --admin_user=dogsbody --admin_password=admin_password --admin_email= --skip-email

Re-Writing a site

We often have customers wanting to take an existing site and reconfigure it to work on a new domain, or wanting to add HTTPS to an existing site and update everything to be served securely. WP-CLI makes this otherwise quite complex process much easier with its search/replace feature:

wp search-replace 'https://www.dogsbodytechnology.com' 'https://www.dogsbody.com' --skip-columns=guid

(It’s advisable to skip the guid column, as the guid of posts/pages within WordPress should never change.)

In summary, WP-CLI is a very powerful tool and one that anybody who often works with WordPress sites should at least be aware of. It can save you heaps of time and help you avoid mistakes.

If you want any help with WP-CLI, then please contact us. Or if you want some seriously fast and secure WordPress hosting, be sure to check out our WordPress hosting.

Happy World IPv6 Day 2019

Today IPv6 is 7 years old. While IPv6 was drafted in 1998, a global permanent deployment of IPv6 happened on 6 June 2012.

Unlike Google Play or the Raspberry Pi, which were launched in the same year, IPv6 adoption seems to be lagging behind, with misinformation on the increase and organisations simply ignoring the fact that it exists.

Currently IPv4 and IPv6 coexist on the Internet. Companies such as Sky completed their roll-out of IPv6 way back in 2016, so if you still think the ‘internet doesn’t run on IPv6’ then you are very much mistaken.

Google’s IPv6 adoption graph shows how increasingly important IPv6 is, and will be, to your business.

What is IPv6?

IPv6 uses a 128-bit address, theoretically allowing 2¹²⁸, or approximately 3.4×10³⁸ addresses. The actual number is slightly smaller, as multiple ranges are reserved for special use or completely excluded from use. The total number of possible IPv6 addresses is more than 7.9×10²⁸ times as many as IPv4, which uses 32-bit addresses and provides approximately 4.3 billion addresses.
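These figures are easy to check for yourself. The short Python sketch below works out the theoretical totals from the address widths and cross-checks them against the standard library’s ipaddress module. It ignores the reserved ranges mentioned above.

```python
import ipaddress

ipv4_total = 2 ** 32
ipv6_total = 2 ** 128

print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:.1e}")
print(f"IPv6 / IPv4:    {ipv6_total // ipv4_total:.1e}")

# The standard library agrees: the all-encompassing networks have these sizes.
assert ipaddress.ip_network("0.0.0.0/0").num_addresses == ipv4_total
assert ipaddress.ip_network("::/0").num_addresses == ipv6_total
```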

Why do I need to worry about it?

IPv4 has fewer than 4.3 billion addresses available, which may seem a crazy amount, but it has been known since the internet became more popular back in the 1980s that the addresses would run out!  The addition of millions of mobile devices over the last few years has not helped this at all. Sure enough, IPv4 is now in the final stages of exhausting its unallocated address space, and yet it still carries most Internet traffic.

Are you and your business ready for IPv6?

Do you have IPv6 on your server? Does your monitoring solution monitor both IPv4 and IPv6?

Dogsbody Technology Server monitoring and management has included monitoring of IPv6 from its launch 6 years ago but we are still amazed at how many companies don’t support IPv6.  We still have trouble finding suppliers that fully support it and there is now an ongoing race for people to make an operating system that is IPv6 only from the ground up.

Certainly.  We try to set all servers up with IPv6 as standard.

Feature image by Phil Benchoff licensed CC BY 2.0.

Infographic: Losing the automatic right to .uk

If you own any third-level domains ending .uk then the matching second-level .uk domain may have been reserved for you until 25 June 2019.  For example, if you own example.co.uk your ability to register the shorter example.uk may have been reserved.

From 1 July 2019, any reserved .uk domains that have not been registered will be released for general registration, meaning they can be registered by anyone.

What should I do?

Assuming you want the shorter .uk version of a domain then there are a number of checks to go through.  You can check the rights to a domain on the Nominet website or we have made the following handy infographic to get you started…

Rights to a .uk domain

Why is this happening?

In June 2014 (5 years ago) Nominet, the controllers of the .uk Top Level Domain (TLD), decided to allow people to register second level domains.  That is, to allow people to register example.uk (second level domain names) instead of being forced to register example.co.uk, example.org.uk, example.ltd.uk etc. (third level domain names).

They wanted to make it fair for existing rights holders and domain owners to obtain one of the shorter .uk domains and so locked access to stop anyone registering any domains that already existed as third level domains for 5 years.

Five years later and that time is now up.  In July 2019 anyone will be able to register any second level .uk domain no matter whether the equivalent third level domain is registered or not.

I’m eligible – How do I register the .uk version of my domain?

Contact your current registrar who will be able to help you with this. Remember you need to register the .uk domain name yourself before 6am BST (UTC+1) on the 25th of June 2019.

There is a .uk domain I want but am not eligible  – what can I do?

Wait… If the eligible party doesn’t purchase it then it becomes publicly available to be purchased by anyone from the 1st July 2019.
We plan on doing a follow-up blog post nearer the time on “name dropping” services that can be used to grab the domains you want when they become available.

If any of this is too much for you then give us a shout, we are here to help.  Do remember though… a domain name is for life, not just for Christmas 😉

More detailed information on this subject can be found on nominet.uk.

Please feel free to share this infographic with anyone you feel may find it useful.

Dogsbody is proud to announce StatusPile

What is StatusPile?

Simply put – it’s your status page of status pages.

Most service providers have some kind of status page. When something goes wrong, you have to visit every provider’s page to find out where the issue lies.

With StatusPile you need to visit just one place to see all statuses at-a-glance. 

Log in via Auth0 to create your very own customised dashboard, then visit just that one dashboard to see which provider has issues.

Each tile also links directly to that provider’s status page, for when you need the detail.

Oh and did we mention, it’s completely free to use?

Why did we build StatusPile?

One of Dogsbody’s stated aims is to give back to the open source community.

This project has certainly done that. StatusPile is already helping developers and DevOps people around the world. We are also actively encouraging contributions and additions to the service provider list.

The code is on GitHub.  Feel free to contribute, fork the code, submit a pull request or submit any suggestions of providers you would like to see added in the future.

We asked Dogsbody’s founder what’s behind the project.

“We needed a status dashboard for the services we monitor for our clients. We couldn’t find one, so we built one.
It’s simple to use, works with Auth0 authentication and is available for forking. We hope you find it as useful as we do.”
– Dan Benton, founder of Dogsbody Technology

How do I get started?

1) Visit StatusPile.com.

2) Customise your dashboard.

3) Log in to save your configuration, using your favourite platform (or email).

(Hat tip to Auth0 for the free license).

We hope you find it as useful as we do. We plan to add more providers and features over the coming months – so why not check it out today?

Tripwire – How and Why

Open Source Tripwire is a powerful tool to have access to.  Tripwire is used by the MOD to monitor systems.  The tool is based on code contributed by Tripwire – a company that provides security products and solutions.  If you need to ensure the integrity of your filesystem, Tripwire could be the perfect tool for you.

What is Tripwire

Open Source Tripwire is a popular host-based intrusion detection system (IDS).  It is used to monitor filesystems and alert when changes occur.  This allows you to detect intrusions or unexpected changes and respond accordingly.  Tripwire gives you great flexibility over which files and directories you monitor, which you specify in a policy file.

How does it work

Tripwire keeps a database of file and directory metadata.  Tripwire can then be run regularly to report on any changes.
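To illustrate the general idea (and only the idea: this is not how Tripwire itself is implemented, and it has none of Tripwire’s signing or policy features), here is a minimal Python sketch that records a size and checksum for every file under a directory and later compares two such snapshots. The function names are made up for this example.

```python
import hashlib
import os


def snapshot(root):
    """Map each file under root to (size, SHA-256 digest)."""
    records = {}
    for directory, _subdirs, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(directory, filename)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            records[path] = (os.path.getsize(path), digest)
    return records


def compare(baseline, current):
    """Return sorted lists of added, removed and modified paths."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    modified = sorted(
        path for path in set(baseline) & set(current)
        if baseline[path] != current[path]
    )
    return added, removed, modified
```

You would take one snapshot as the trusted baseline, take another on each later run, and report whatever compare returns.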

If you install Tripwire from Ubuntu’s repo as per the instructions below, a daily cron job will be set up to send you an email report.  The general view with alerting is that no news is good news, but due to the nature of Tripwire it’s useful to receive the daily email; that way you’ll notice if Tripwire gets disabled.

Before we start

Before setting up Tripwire please check the following:

  • You’ve configured email on your server.  If not you’ll need to do that first, we’ve got a guide.
  • You’re manually patching your server.  Make sure you don’t have unattended upgrades running (see the manual updates section) as unless you’re co-ordinating Tripwire with your patching process it will be hard for you to distinguish between expected and unexpected changes.
  • You’re prepared to put some extra time into maintaining this system for the benefit of knowing when your files change.

Installation on Ubuntu

sudo apt-get update
sudo apt-get install tripwire

You’ll be prompted to create your site and local keys. Make sure you record them in your password manager.

In your preferred editor open /etc/tripwire/twpol.txt

The changes you make here depend on what you’re looking to monitor. The default config has good coverage of system files but is unlikely to be monitoring your website files, if that’s something you want to do.  For example, I’ve needed to remove /proc and some of the files in /root that haven’t existed on systems I’ve been monitoring.

Then create the signed policy file and then the database:

sudo twadmin --create-polfile /etc/tripwire/twpol.txt
sudo tripwire --init

At this point it’s worth running a check. You’ll want to make sure it has no errors.

sudo tripwire --check

Finally I’d manually run the daily cron to check the email comes through to you.

sudo /etc/cron.daily/tripwire

Day to day usage

Changing files

After you make changes to your system you’ll need to run a report to check what Tripwire sees as having changed.

sudo tripwire --check

You can then update the signed database.  This will open up the report allowing you to check you’re happy with the changes before exiting.

sudo tripwire --update -r /var/lib/tripwire/report/$HOSTNAME-yyyyMMdd-HHmmss.twr
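The report filename includes the hostname and a timestamp, so it is different on every run. As a small sketch, the hypothetical Python helper below picks the most recently modified report to pass to the update command. It assumes reports live in /var/lib/tripwire/report, as in the command above, and that you run it as a user who can read that directory (normally root).

```python
from pathlib import Path


def latest_report(report_dir="/var/lib/tripwire/report"):
    """Return the most recently modified .twr file in report_dir, or None."""
    reports = sorted(
        Path(report_dir).glob("*.twr"),
        key=lambda path: path.stat().st_mtime,
    )
    return reports[-1] if reports else None


if __name__ == "__main__":
    # Prints the path to use with: sudo tripwire --update -r <path>
    print(latest_report())
```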

You’ll need your local key in order to update the database.

Changing policy

If you decide you’d like to monitor or exclude some more files you can update /etc/tripwire/twpol.txt.  If you’re monitoring this file you’ll need to update the database as per the above section.  After that you can update the signed policy file (you’ll need your site and local keys for this).

sudo tripwire --update-policy /etc/tripwire/twpol.txt


As you can see, Tripwire can be an amazingly powerful tool in any security arsenal.  We use it as part of our maintenance plans and encourage others to do the same.


Feature image by Nathalie licensed CC BY 2.0.

The Cloud Native Computing Foundation

The Cloud Native Computing Foundation (CNCF) is:

an open source software foundation dedicated to making cloud native computing universal and sustainable

They do this by hosting and “incubating” projects they see as valuable, helping them to develop and reach maturity, where they can be used widely in cloud environments.

CNCF has over 350 members including the world’s largest public cloud and enterprise software companies as well as dozens of innovative startups

The CNCF is also backed by the Linux Foundation, who are fast becoming one of the most recognised organisations in the industry. They support the open source community as a whole, aiming to protect and accelerate development of the Linux kernel, along with many other things.

Why should I care?

The CNCF is exciting as, for me at least, it provides a bit of a portal into the way that the industry is moving at the moment.  It showcases both the current behemoths of cloud computing software stacks, along with projects that are likely to replace or supplement them in the future. The CNCF split their projects into 3 main categories:

  • Graduated
  • Incubating
  • Sandbox

Graduated projects are ones that have reached maturity and see wide adoption. At the time of writing, these projects are Kubernetes, Prometheus, Envoy, CoreDNS and containerd. If you’ve been even dabbling in the cloud/Linux community, then you’ve probably heard of at least a few of these projects.

Incubating projects are ones that haven’t quite hit the prime time yet, but are well on their way. These currently include projects such as rkt, a container engine that’s a potential competitor for Docker, CNI (Container Network Interface), which focuses on configuring networking within containers, and etcd, a key-value store designed for storing critical system data.

I find the CNCF useful for guiding me on what pieces of software I should be learning to enhance my skill set, as they’re likely to be desirable in the short to medium term. It’s also one of the first places I’m likely to check for a piece of software that fits a particular need, as I know that CNCF projects are going to be active, well supported, and have lots of related Stack Overflow questions / GitHub issues for when I’m getting started.

Training and Certification

The CNCF also offer some training and certification options. This is useful to prove that you’re familiar and capable with some of the technologies they support. At the time of writing, the training courses and certifications they offer are all Kubernetes-based (which is by no means a bad thing), but I’m sure they will offer more in the future.

In summary, the CNCF acts as a sort of central hub for a lot of the hottest and biggest projects right now. Even if you don’t have a particular need for them at this time, it’s good to know what’s out there, as well as what’s coming over the hill, and the CNCF is useful for that reason alone.


Featured image by chuttersnap on Unsplash

What are Status Pages?

A status page allows a supplier of a service to let their customers know about outages and issues with their service.  They can be used to show planned maintenance and can hook into e-mail or other update methods, but first and foremost they are a website.  Status pages are great; they make things easier for everyone and save time.  If you think you’re having an issue related to a provider you can quickly look at their status page to see if they’re already aware of it before deciding whether or not to contact them.  If they have already acknowledged the problem it also means you don’t need to spend time working out what has changed at your end.

We have one

Our status page, status.dogsbody.com, has been running for quite a while. We suggest our customers use it as their first port of call when they spot something odd.  If you’ve got your own server(s) that we maintain, we’ll contact you directly if we start seeing issues.  The status page covers the below…

Support – methods we usually communicate with you

  • Email
  • Telephone
  • Slack

Hosting – our shared hosting servers

  • Indigo (our WordPress only hosting)
  • Purple (our general purpose hosting)
  • Violet (our cPanel hosting)

When we schedule maintenance or have issues we’ll update you via the status page.  If you are an Indigo or Purple customer and want to be notified of issues or maintenance go ahead and subscribe.

You want one

Having an (up to date) status page improves your users’ experience.  It gives them a quick way to find out what’s going on.  This means they’ll have a better understanding and (usually) be more tolerant of issues you’re already dealing with.

Having a status page is likely to cut down on the number of similar questions you get if you have an outage.  We’ve been really happy with the self-hosted open source software we’re using – Cachet.  We wanted to make sure our status page doesn’t go down at the same time as our other services.  So we’ve used a different server provider to our main infrastructure.  If you want to avoid worrying about that sort of thing, we’ve seen a lot of people are using statuspage.io and status.io.

Feature image background by Wolfgang.W. licensed CC BY 2.0.

Duplicacy: Backup to the cloud

Duplicacy is an open source backup tool which supports a large number of storage back-ends, including many cloud offerings, whilst also providing many other useful features. We recently implemented a duplicacy-based backup solution for a customer, and wanted to share our experience to help out anybody looking to implement duplicacy themselves.


Duplicacy is written in Go, meaning it can be easily downloaded and compiled on the CLI. However, this involves installing Go on the system you wish to back up, which may not always be an option. Fortunately, duplicacy also provides binary releases, which can be downloaded and executed with ease.

To install duplicacy on a Linux system, the steps are as follows:

wget https://github.com/gilbertchen/duplicacy/releases/download/v2.1.0/duplicacy_linux_x64_2.1.0
sudo mv duplicacy_linux_x64_2.1.0 /usr/local/bin/duplicacy
sudo chmod +x /usr/local/bin/duplicacy

You can then run duplicacy by simply running the “duplicacy” command in your terminal.

Setting up your storage

As mentioned above, duplicacy supports an impressive number of storage back-ends. As of the time of writing, they are:

  • Local disk
  • SFTP
  • Dropbox
  • Amazon S3
  • Wasabi
  • DigitalOcean Spaces
  • Google Cloud Storage
  • Microsoft Azure
  • Backblaze B2
  • Google Drive
  • Microsoft OneDrive
  • Hubic
  • OpenStack Swift
  • WebDAV (under beta testing)
  • pcloud (via WebDAV)
  • Box.com (via WebDAV)

The two options that we’ve used are SFTP and AWS (Amazon Web Services) S3. To back up a system over SFTP, all you need is a working SFTP user on the remote system. No additional set up is required.

The set up for Amazon S3 is a little more involved. In summary, the steps are:

  • Create an Amazon S3 bucket in your preferred region
  • Create an IAM policy granting permissions on this bucket
  • Create an IAM user and assign them this policy
  • Configure duplicacy to use this user and bucket

Creating an S3 bucket

Creating a bucket is pretty straightforward. Log in to your AWS account, go to the S3 service, click “Create bucket”, give your bucket a name, select a region, done. There are some other options when creating a bucket but these are not relevant to this post so I’ll not cover them here.

Creating an IAM policy

IAM stands for Identity and Access Management, and is central to many operations in AWS. To create your policy, navigate to the IAM service in AWS, select “policies” on the left, and click the big blue “Create policy” button at the top.

On this screen, choose the “JSON” tab. This is where we’ll specify the guts of our policy. It should look something like this:

{
    "Version": "2012-10-17",
    "Statement": [{
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "arn:aws:s3:::*"
        }, {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::dbt-gary-duplicacy-backup-example",
                "arn:aws:s3:::dbt-gary-duplicacy-backup-example/*"
            ]
    }]
}

You’ll need to replace “dbt-gary-duplicacy-backup-example” with the name of the S3 bucket you created in the last step.

When you’re happy with your policy, click “Review policy”, followed by “Save changes”.

Creating an IAM user and assigning the policy

From the home of the IAM service, now click “Users” on the left, followed by the big blue “Add user” button at the top. Provide a name for your user, and check the “Programmatic access” box below. Click next.

On the next screen, click “Attach existing policies directly”. At the top of the list of policies now listed below, click the “Filter: Policy type” drop-down, and select “Customer managed”. Check the box for your IAM policy, and click “review” to continue, followed by “Create user” on the next page.

Your IAM user and policy have now been created.

Ensure that you save the details now presented to you. You will need these to configure duplicacy.

Configuring duplicacy

On the system you wish to back up, navigate to the directory you wish to back up. For example, on the system we configured, this was the “/home” directory. You can now configure duplicacy. The steps are as follows:

sudo duplicacy init your_repo_name s3://your_region@amazon.com/your_bucket_name
sudo duplicacy set -key s3_id -value your_access_key
sudo duplicacy set -key s3_secret -value your_secret_key

There are a number of strings you’ll need to replace in the above snippet:

your_repo_name – The name you’d like to give to this set of backups. For example, “johns-desktop”

your_bucket_name – The name you gave your S3 bucket in the steps above.

your_region – This is the AWS region you selected for your bucket above. Please see this table, using the “region” column that corresponds to your region name. For example, “eu-west-2” for the London region.

your_access_key – This is the access key for the IAM user you created above. It will be a long string of random-looking characters.

your_secret_key – This is the secret key for the IAM user you created above. It will again be a long string of random-looking characters. Make sure you keep this safe, as anybody who has it can access your backups!
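
Putting the placeholders together, the three configuration commands could be wrapped in a small script like the one below. It is only a sketch: storage_url and configure_repo are hypothetical helper names, the DUPLICACY variable is my own addition, and the script assumes it is run from the directory you want to back up, with root privileges if the files need them.

```shell
#!/bin/sh
# Sketch: wrap the three duplicacy configuration commands.
# DUPLICACY can be overridden for a dry run.
DUPLICACY="${DUPLICACY:-duplicacy}"

# Build the storage URL in the form used by the init command above.
# Arguments: region, bucket name.
storage_url() {
    printf 's3://%s@amazon.com/%s\n' "$1" "$2"
}

# Run the configuration steps in order, stopping at the first failure.
# Arguments: repo name, region, bucket name, access key, secret key.
configure_repo() {
    "$DUPLICACY" init "$1" "$(storage_url "$2" "$3")" &&
    "$DUPLICACY" set -key s3_id -value "$4" &&
    "$DUPLICACY" set -key s3_secret -value "$5"
}

# Usage (placeholder values):
#   configure_repo johns-desktop eu-west-2 my-duplicacy-backups "$ACCESS_KEY" "$SECRET_KEY"
```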

Running a backup

If all went well with the above, then you’re ready to run your first backup. This is as easy as running:

sudo duplicacy backup

This will back up all files under the current directory. Depending on the number and size of files, this may take some time.
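
Because the backup has to be started from the repository directory, a small wrapper makes it easier to run from cron or another scheduler. This is a sketch under my own assumptions: run_backup is a hypothetical helper, and the script path and schedule in the crontab comment are examples only.

```shell
#!/bin/sh
# Sketch: run a backup from a given repository directory.
# DUPLICACY can be overridden for a dry run.
DUPLICACY="${DUPLICACY:-duplicacy}"

# Change into the repository directory in a subshell, then run the backup there.
# Fails without running anything if the directory does not exist.
run_backup() {
    ( cd "$1" && "$DUPLICACY" backup )
}

# Usage:
#   run_backup /home
#
# Example root crontab entry (hypothetical script path), daily at 03:00:
#   0 3 * * * /usr/local/bin/duplicacy-backup.sh
```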

Including/excluding certain files/directories from your backups

Duplicacy offers powerful filtering functionality, allowing for fine-grained control over which files and directories you want to back up. The filters can be somewhat confusing to configure, but are very useful once you’ve got the hang of them. We may do a follow-up post covering these, so be sure to check back in the future.

Restoring backups from duplicacy

In order to restore from duplicacy, you need to configure your system to interact with your backups. If you’re restoring on the same system the backups were taken on, you need not take any additional steps. If you’re restoring to a different system, you need to follow the installation and duplicacy configuration steps shown above.

Once things are configured, you can view the available backups like so:

sudo duplicacy list

Note that you must be in the correct directory on your system (the one where you initialised your repo) in order to view the backups.

This will give you a list of your backups:

Snapshot johns-desktop revision 1 created at 2018-04-12 07:29 -hash
Snapshot johns-desktop revision 2 created at 2018-04-12 12:03 
Snapshot johns-desktop revision 3 created at 2018-04-17 17:37 
Snapshot johns-desktop revision 4 created at 2018-04-18 11:10 
Snapshot johns-desktop revision 5 created at 2018-04-18 14:38 
Snapshot johns-desktop revision 6 created at 2018-04-20 03:02 
Snapshot johns-desktop revision 7 created at 2018-04-21 03:02 
Snapshot johns-desktop revision 8 created at 2018-04-22 03:02 
Snapshot johns-desktop revision 9 created at 2018-04-23 03:02

As you can see, each line shows a revision number and the date and time that revision was created. A revision is just another name for a backup.
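
If you want to pick out the most recent revision in a script, the list output is regular enough to parse. The snippet below is a sketch that assumes every line follows the “Snapshot name revision N created at ...” layout shown above. latest_revision is a hypothetical helper, not a duplicacy feature.

```shell
#!/bin/sh
# Sketch: read `duplicacy list` output on stdin and print the highest revision number.
# Assumes lines of the form: Snapshot <name> revision <N> created at <date> <time>
latest_revision() {
    awk '$1 == "Snapshot" && $3 == "revision" && $4 + 0 > max { max = $4 + 0 }
         END { if (max) print max }'
}

# Usage:
#   sudo duplicacy list | latest_revision
```

Nothing is printed if the input contains no snapshot lines, so a calling script can treat empty output as “no backups found”.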

You can then restore a particular backup. For example, to restore revision 7:

sudo duplicacy restore -r 7

Again, depending on the number and size of files in this backup, this may take some time.

Duplicacy offers some really cool features alongside the restore command. For example, you can see the contents of a file in a backup with the “cat” command, and compare differences between two backups with the “diff” command. You can see all of the options here.
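
As a rough illustration, cat and diff are run as duplicacy commands in their own right, each taking the revision number with -r. The wrapper names show_file and diff_file below are hypothetical, as is the DUPLICACY variable. Check the options against the documentation for your version before relying on them.

```shell
#!/bin/sh
# Sketch: inspect backups without restoring them.
# DUPLICACY can be overridden for a dry run.
DUPLICACY="${DUPLICACY:-duplicacy}"

# Print one file as it was stored in the given revision.
# Arguments: revision, path relative to the repository root.
show_file() {
    "$DUPLICACY" cat -r "$1" "$2"
}

# Compare the same file between two revisions.
# Arguments: first revision, second revision, path relative to the repository root.
diff_file() {
    "$DUPLICACY" diff -r "$1" -r "$2" "$3"
}

# Usage (run from the repository directory):
#   show_file 7 path/to/your/file.txt
#   diff_file 6 7 path/to/your/file.txt
```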

Selective restores

One of the more useful restore options is to only restore a certain file or directory from your backup. This can be accomplished with the following command:

sudo duplicacy restore -r 7 path/to/your/file.txt

This can also be extended to restore everything under a directory, like so:

sudo duplicacy restore -r 7 path/to/your/directory\*


Duplicacy is an extremely powerful and portable backup tool, allowing for reliable and fine-grained backups of your data. If you have any questions on duplicacy or would like any help setting it up, please leave a comment below or contact us and we’ll be happy to help. Thanks for reading.

Feature image background by 111692634@N04/ licensed CC BY-SA 2.0.