Common warning signs before server outages

Everyone knows that server outages and downtime are costly. They directly affect your business in a number of ways, including:

  • Loss of opportunities
  • Damage to your brand
  • Data loss
  • Lost sales
  • Lost trust

It’s essential to stay on top and ahead of any potential downtime.

Here are three areas where you need to be ahead of the curve:

Know your limits / server resources

Physical resource shortages

A common cause of downtime is from running out of server resources.

Whether it is RAM, CPU, disk space or something else, running out risks data corruption, crashing programs and severe slowdowns, to say the least. It is essential to monitor your server resources regularly.

One of the most important, yet most overlooked, metrics is disk space. In our opinion, running out of disk space is one of the most preventable issues facing IT systems.

When you run out of disk space, your system can no longer save files, which leads to lost data and data corruption.

Often your website might still look like it is up and running and it’s only when a customer interacts with it, perhaps uploading new data or adding an item to a shopping basket, that you find it then fails to work.

We see this happen most frequently when there is a “run-away log file” that keeps expanding until everything stops on the server!

Platforms like Magento are particularly prone to this, as they often have unchecked application logs.

Internally, we record all server resource metrics every 10 seconds onto our MINDER stack and alerts will be raised well in advance of disk space running out. You don’t need to be this ‘advanced’ – you could simply have a script check current disk space hourly and email you if it is running out.
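As a minimal sketch of that simpler approach, the Python script below reports when a filesystem passes a usage threshold. The 90% threshold, the `/` path and the function name are illustrative assumptions, and sending the email is left to whatever already works on your server (for example, a cron job that mails any output it produces).

```python
import shutil


def disk_alert(path="/", threshold_percent=90.0):
    """Return a warning string if `path` is fuller than the threshold, else None."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    used_percent = usage.used / usage.total * 100
    if used_percent >= threshold_percent:
        return f"Disk space warning: {path} is {used_percent:.1f}% full"
    return None


if __name__ == "__main__":
    # Hypothetical usage: run hourly and only print when there is a problem,
    # so that a scheduler which emails job output stays quiet otherwise.
    message = disk_alert("/", 90.0)
    if message:
        print(message)
```

The figure is calculated from raw byte counts, so it may differ slightly from what other tools report on filesystems that reserve space for the root user.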

Configured resource shortages

Another common cause of resource limits is a misconfigured server.

You could have a huge server with more CPU cores, RAM and storage than you could dream of using, but if your software isn’t configured to use it, it won’t matter.

For example, if you were using PHP-FPM and hadn’t configured it correctly, it would only have five processes running to process PHP. This means that in a traffic spike the first five requests would be served as normal, but anything beyond the fifth would be queued until one of the first five had been served. This would, of course, needlessly slow the site down for visitors.
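To put rough numbers on that queue, here is a small illustrative calculation. It assumes every request arrives at the same moment and takes the same time to serve (one second here), which real traffic will not do. The function name and the figures are made up for the example.

```python
def wait_before_start(position, workers=5, service_time=1.0):
    """Seconds that request number `position` (1-based) waits for a free worker.

    Assumes all requests arrive together and each takes `service_time` seconds.
    """
    batches_ahead = (position - 1) // workers  # full rounds of work queued in front
    return batches_ahead * service_time
```

Under these assumptions `wait_before_start(5)` is 0.0, `wait_before_start(6)` is 1.0 and `wait_before_start(50)` is 9.0, so the 50th visitor in a sudden burst waits nine seconds before PHP even starts on their request.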

Issues like this are often flagged up in server logs, letting you know when you hit these configured limits, so it is good to keep your eyes on them. These logs can also indicate that your site is getting busier and help you to grow your infrastructure in good time, along with your visitors.
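A simple way to keep an eye on those logs is to count how often the limit is reported. The sketch below looks for the text PHP-FPM typically logs when its process limit is reached. Treat the exact wording of the marker as an assumption to check against your own log files.

```python
def count_limit_warnings(log_lines, marker="reached pm.max_children"):
    """Count log lines that mention the configured process limit being hit.

    `marker` is an assumed substring; adjust it to match your own logs.
    """
    return sum(1 for line in log_lines if marker in line)
```

You could pass it an open log file (a file object yields its lines one at a time) and raise an alert, or simply take note, when the count starts climbing week on week.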

You might be thinking, “why are there these arbitrary limits getting in my way? I don’t need these at all”.

Well, it is good to have these limits so that in the case of an unusual traffic spike, everything will run slowly but, importantly, it will work! If they are set too high, or not set at all, you might hit the aforementioned “physical limits” issue, risking data corruption and crashing.

Did you know that, by default, NGINX runs with only one single-threaded worker?

Providers

As a small business, it is normally impossible to do everything in house – and why would you want to, when you need to focus on your business?

So it is good to step back every once in a while and document your suppliers.

Even if you only own a simple website, suppliers could include:

  • Domain registrar (OpenSRS, Domainbox, …)
  • DNS providers (Route 53, DNS Made Easy, …)
  • Server hosting (Rapidswitch, Linode, AWS EC2, …)
  • Server maintenance (Dogsbody Technology, …)
  • Website software updates (WordPress, Magento, …)
  • Website plugin updates (Akismet, W3 Total Cache, …)
  • Content Delivery Network (Cloudflare, Akamai, …)
  • Third parties (Sagepay, Worldpay, …)

All of these providers need to keep their software and/or services up to date. Some will cause more impact on you than others.

Planned maintenance

Looking at server hosting, all servers need maintenance every now and again, perhaps to load in a recent security update or to migrate you away from ageing hardware.

The most important point here is to be aware of it.

All reputable providers will send notifications about upcoming maintenance windows and depending on the update they will let you reschedule the maintenance for a more convenient time – reducing the effect on your business.

It is also good to have someone (like us) on call in case it doesn’t go to plan. Maintenance work might start in the dead of night, but if no one realises it’s still broken at 09:00, heads might roll!

Unplanned maintenance

Not all downtime is planned. Even giants like Facebook and Amazon have unplanned outages now and again.

This makes it critical to know where to go if your provider is having issues. Most providers have support systems where you can reach their technical team. Our customers can call us up at a moment’s notice.

Another good first port of call is a provider’s status page, where you can see any current (as well as past or future) maintenance or issues. For example, if you use Linode, you can see issues on their status page.

Earlier this year, we developed StatusPile, a web app that combines provider status information into one place, making it easier for you to see issues at a glance.

Uptime monitoring

This isn’t really a warning sign, but it’s impossible to foresee everything. The above areas are great places to start, but they can’t cover you for the unexpected.

That’s where uptime monitoring comes in. Regardless of the cause, you need to know when your site goes down and you need to know fast.

We recommend monitoring your website at least once a minute with a provider like Pingdom or AppBeat.

Proper configuration

Just setting up uptime monitoring is one thing, but it is imperative to configure it properly. You can tell someone to “watch the turkey in the oven” and so they watch the turkey burn!

I’ve seen checks which make sure a site returns a webpage, but that is no use if the page says “error connecting to database”!

Good website monitoring checks that the page returned includes both the correct status code and the expected site content. If your website only connects to another component, such as your Docker application, for specific actions, then you should test those actions specifically as well.
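As a rough sketch of such a check, the Python below treats a page as healthy only when the status code is 200 and a phrase you expect to see is present. The function names, the 10 second timeout and the idea of matching a single phrase are assumptions for illustration. Hosted monitoring services offer equivalent options.

```python
import urllib.error
import urllib.request


def page_is_healthy(status, body, expected_text):
    """A page counts as healthy only with a 200 status AND the expected content."""
    return status == 200 and expected_text in body


def check_url(url, expected_text, timeout=10):
    """Fetch `url` and apply the health rule; any network failure counts as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return page_is_healthy(response.status, body, expected_text)
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, DNS failures, refused connections and timeouts.
        return False
```

With this rule, a page that loads with a 200 status but contains “error connecting to database” instead of your expected phrase is reported as down, which is exactly the case a plain “did something come back?” check misses.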

Are you checking your entire website stack?

[Image: a cartoon dog sitting at a table with fire all around him; in the next frame he says “this is fine”.]

Who is responsible?

A key part of uptime monitoring – and a number of other items I have mentioned – is that it alerts the right people and that they action those alerts.

If your uptime alerts flag an outage and they are sent to an accounts team, it’s unlikely they’ll be able to take action. Equally, if an alert comes in late in the evening when no one is around, your site might be down until 09:00 the next morning.
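One way to picture the routing problem is as a small rule that decides who gets an alert depending on when it arrives. The sketch below is purely illustrative: the office hours, the parameter names and the two-contact model are assumptions, and it ignores weekends and holidays, which a real rota would need to handle.

```python
from datetime import time


def alert_recipient(now, office_contact, on_call_contact,
                    office_open=time(9, 0), office_close=time(17, 30)):
    """Pick who receives an alert based on the time of day it is raised."""
    if office_open <= now < office_close:
        return office_contact
    return on_call_contact
```

The point of writing the rule down, in code or on paper, is that every hour of the day ends up with a named person who can actually act on the alert.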

This is where our maintenance service comes in. We have a support team on call 24/7, ready to jump on any issues.

Phew, that was a lot! We handle all of this and more. Contact us and see how we can give you peace of mind.

Feature image by Andrew Klimin licensed CC BY 2.0.

Troubleshooting ‘server down’ issues in Linux

This post will show you the keys to troubleshooting ‘server down’ issues in Linux, cover some common causes for outages and how you can investigate and resolve them. A well configured Linux server offers an incredible level of stability and performance. Inevitably though, outages are pretty much a fact of life.

First though, it’s important to know what exactly an outage is.

An outage is pretty universally defined as…

A period of time where a service is unavailable or when the output of a service is incomplete, corrupted, or otherwise unable to provide normal levels of interaction.

For example, we’d consider both of the following to be outages:

• A website is not returned at all when a user requests it
• A site is returned, but all of the images are missing

Outages come in many shapes and sizes however. Here are some we commonly deal with, as we manage and monitor 100s of customer servers every day.

Service failure

A piece of software on the server stops functioning, either in an unexpected way or altogether. This is usually pretty black and white to investigate.

Is the service running? If not, then you can try starting it back up.

If the service still fails to start, try checking the logs for information.

Typically, services only fail when a configuration changes somewhere, so that is usually the first place to start any investigation.
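The first of those questions, “is the service running?”, can be scripted. This is a minimal sketch that assumes the server uses systemd. It relies on `systemctl is-active` exiting with status 0 only when the unit is active. The function name and the injectable `run` parameter are illustrative choices, the latter so the logic can be exercised on a machine without systemd.

```python
import subprocess


def service_is_active(unit, run=subprocess.run):
    """Return True if systemd reports `unit` as active.

    `run` defaults to subprocess.run and can be swapped out for testing.
    """
    result = run(["systemctl", "is-active", "--quiet", unit], check=False)
    return result.returncode == 0
```

For example, `service_is_active("nginx")` would tell you whether a unit called `nginx` is up. If a service still refuses to start, `journalctl -u` followed by the unit name is a common place to look for its logs on systemd machines.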

Resource exhaustion

Resources on the server(s) are so strained that requests are extremely slow, or fail due to a timeout.

It’s fairly easy to spot when CPU or RAM is being used up with a simple `top` command.
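If you want the same information in a script rather than on screen, something like the following sketch works on Linux. It reads the one-minute load average and parses text in the format of `/proc/meminfo`. The function names are hypothetical, and it assumes a kernel recent enough to report `MemAvailable`.

```python
import os


def memory_used_percent(meminfo_text):
    """Estimate the percentage of memory in use from /proc/meminfo contents."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])  # first field is the number
    total = values["MemTotal"]
    available = values["MemAvailable"]
    return (total - available) / total * 100


def load_per_core():
    """One-minute load average divided by the number of CPU cores."""
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)
```

As a rough rule of thumb, a load per core that stays well above 1 suggests that work is queuing for the CPU, although on Linux processes waiting on disk IO also count towards the load average.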

Disk usage should also be monitored, but many don’t think about disk IO being the bottleneck. This can be especially true in virtualised environments where IO is limited, sometimes intentionally, by the likes of AWS.

Resource exhaustion can also be caused by an attack, whether that is someone probing for vulnerabilities or a denial of service (DoS). Typically these requests need to be blocked further upstream.

Physical issues

In this age of the cloud, it’s easy to forget that servers are still physical (even though they are owned by someone else). Physical hardware does still break and these things need to be checked and monitored.

Hard drives are still one of the most common elements to fail.  This is why RAID is still important in this day and age.

Memory issues are some of the hardest to diagnose as they cause instability to the system, making it fall over in new and interesting ways (we’ve seen them all!).

Fans can also fail, causing the system to overheat and other parts to fail sooner.

We had a new customer come to us recently, because their system was unstable… two of the three fans had failed, causing memory corruption and a single disk error in their RAID array which they knew nothing about, as they had no monitoring.
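A degraded array like that one is straightforward to detect if you are using Linux software RAID (md), where `/proc/mdstat` shows each array’s members as `U` (up) or `_` (missing). The sketch below is only an illustration under that assumption: hardware RAID controllers need their vendor’s own tools, and the parsing assumes arrays are named in the usual `md0`, `md1` style.

```python
import re


def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status shows a missing member, e.g. [U_]."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)  # remember which array the next lines describe
            continue
        status = re.search(r"\[([U_]+)\]\s*$", line)
        if status and current and "_" in status.group(1):
            degraded.append(current)
    return degraded
```

Run regularly against the contents of `/proc/mdstat`, a non-empty result is a good reason to send someone an alert before a second disk fails.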

Network issues

Your site is down, your server is unreachable and apparently dead in the water… or is it?  Could it be absolutely fine but no one can talk to it?

You might be thinking, “what’s the difference?” The answer is what needs to be done to resolve the issue. Chances are the problem is upstream from your server and there may be nothing you can actually control yourself, in which case playing a waiting game is your only option.

“Netsplits” are still surprisingly common on the internet. This is where two parts of the internet are still “working” but unable to talk to each other.

Uptime monitoring utilities can come in handy here. These utilities constantly attempt to connect to your site and/or applications, from various locations around the globe and send notifications if there are issues. The logs can also help you determine if only certain geographies are unable to access your service, or if the problem is truly global.
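The regional-versus-global question comes down to comparing results from several probe locations. Here is a minimal sketch of that comparison. The location names and the shape of the input (a mapping of location to whether the site was reachable) are assumptions, since each monitoring provider reports this in its own format.

```python
def classify_outage(probe_results):
    """Summarise uptime probes given as {location: reachable?}."""
    failed = sorted(loc for loc, ok in probe_results.items() if not ok)
    if not failed:
        return "up"
    if len(failed) == len(probe_results):
        return "down everywhere"
    return "unreachable from: " + ", ".join(failed)
```

A result that names only some locations points towards a network problem between those locations and your server, whereas “down everywhere” makes the server itself, or its immediate provider, the more likely suspect.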

ISP issues

Typically a mixture of one or more of the above outage types is happening, but is affecting the ISP that is hosting your server/site. When a service provider’s systems experience problems, this can cause cascading issues with anything built on top of, or relying on their systems.

This is typically resolved with a support ticket to the provider(s) having problems.

In our experience, the only way to mitigate this type of issue is to host across multiple providers. This is not for the faint of heart but is possible with enough planning.

Some final thoughts on troubleshooting ‘server down’ issues in Linux

The lines between the outage types we’ve listed can often be blurred, as one can often have a “domino effect” leading to others. For example, a failing hard disk can cause read/writes to slow down massively, which can lead to requests taking longer than usual, which can lead to resource exhaustion as things get tied up.

Debugging outages can be time consuming if there are lots of moving parts to your application/service. We find our members of staff develop a gut instinct over years and years of debugging problems and honing their skills.

If you struggle with troubleshooting ‘server down’ issues on the Linux boxes in your infrastructure, whether shared or self-hosted, if any of the above seems too much for you, or if you’d just like somebody else to take your worries away, please contact us and I’m sure we can help.

5 benefits of IT managed services

For start ups, small and medium sized businesses, the benefits of IT managed services can be huge.

IT managed services providers, like our team here at Dogsbody, have the ability and skills to grow with your business. This means you don’t have to go through the pain of recruiting, setting up and maintaining an in-house team which would need to cover every eventuality, niche, new technology and idea.

Choosing a managed service provider (MSP) is time consuming and stressful. Establishing trust is a huge part of this process, which can’t happen overnight. In most cases you will be handing over the keys to your kingdom to a 3rd party, which is a scary thought.

“Your choice of MSP needs to become an extension to your business – a long term partner not simply a provider.”

Dan Benton, founder Dogsbody

So, if you’re prepared to make that leap of faith, what are some of the benefits of IT managed services?

Benefit #1: Cost savings and stability

Consider, first of all, the cost of recruiting a number of different full-time specialists into your business. Then consider the cost of running that team: not just their salaries but training, holidays, pensions and sick leave. Then consider the cost of hardware and software for each team member (and of keeping it up to date).

Many people think of outsourcing as expensive because they see the monthly costs and the occasional large invoice. However, this is a fraction of the price of hiring directly.

With IT managed services, you’re ‘borrowing’ an on-demand, experienced team (in our case Linux sysadmins), who do their job for multiple customers every day.

Instantly, you can remove the worries of having cover for holidays and sickness with no organisation required on your behalf.

24/7 services means you also don’t have to worry about the cost and time of on-call rotas or importantly, being disturbed in the middle of the night.

An IT managed service provider usually offers plans or packages with a fixed monthly cost (we do), which means you know what you are paying and can budget accordingly.

Benefit #2: Expertise across platforms

Because they work for multiple customers, managed service providers like Dogsbody have the advantage of knowing what to look out for.

We look for the pitfalls …

We see trending issues across multiple customers/platforms with similar set ups …

And we keep up to date with the latest news on vulnerabilities, new technologies, features, security and latest regulations.

Technology moves fast.

When a large project looms, resources can be scaled to accommodate. Support or maintenance contracts can be increased or decreased (if allowed) depending on requirements.

Any good managed service provider will also monitor more than you probably imagined could be monitored.

Dogsbody monitors a LOT for our customers. StatusPile is our SaaS service – a status page of status pages …

Proactive monitoring means potential issues can be caught and mitigated before they become real issues. Problems can also be highlighted and identified impartially, which means your managed service provider can advise on the best ways to increase your overall operational efficiency. They should be doing this proactively. If they’re not, you may want to talk to us!

Benefit #3: Freeing your valuable staff

Companies often try to save money by carrying out expert tasks internally.

Do you get Sue the website developer to maintain the servers your business critical website resides on?

This is fine until something big comes up that takes Sue and her team away from their ‘day job’. Worse still, Sue’s team may end up attempting to fix issues they have little knowledge of. Dogsbody is often contacted by companies with server emergencies who need expert help to resolve a situation. It’s one of the most common ways our long term relationships start.

As an example, server maintenance isn’t a revenue earner for most businesses, but it is essential to reduce the risk of data breaches or of old vulnerabilities being exploited. Having your server(s) offline and losing revenue through lost sales, compensation, legal action or financial penalties is one of the most stressful things any business owner can endure.

Having servers and infrastructure managed correctly has huge benefits for a business, including improved uptime, protection against threats and compliance with the likes of PCI or GDPR.

A clear benefit of IT managed services is that Sue and her team have time to focus on their role, generating revenue, and driving your business forward.

Benefit #4: Managed service providers do just one thing well

One of the biggest benefits of IT managed services is specialism.

Managed services providers do what they do well. After all, it’s their business.

“Dogsbody often turns down lucrative projects if it’s not our core expertise. We hope your MSP would do the same.”

Dan Benton, founder Dogsbody

Our niche is Linux. We don’t do Microsoft, we don’t do Apple and we don’t manage printers. This means we don’t dilute our skills trying to support everyone and everything.

Being a specialist, not a generalist, means you know you’re dealing with the best in the business, which we know builds trust.

Benefit #5: A technology partner can grow with your business

Growing your business takes time, planning, patience and of course money. An MSP can provide a firm long term foundation on which the business can then grow.

No company knows definitively what they will need in 2, 3 or 4 years’ time, especially when it comes to technology. We have helped our customers build various iterations of their technology stacks as they have grown, giving them access to expertise they didn’t even realise was a requirement when they started working with us.

When a large project looms, resources can be easily scaled to accommodate. Support or maintenance contracts can be increased or decreased (if allowed) depending on requirements, so the cost of scaling up may be nearly negligible.

Outsourcing your IT to a proficient managed service provider will always help you to build a stronger foundation for the future.

Agree with those benefits of IT managed services?

If you like what you see and want to put us to the test, get in touch.

Our specialism is Linux Managed Services but we’re happy to point you in the right direction if you’ve other requests too.