Troubleshooting ‘server down’ issues in Linux

This post covers the keys to troubleshooting ‘server down’ issues in Linux: some common causes of outages, and how you can investigate and resolve them. A well-configured Linux server offers an incredible level of stability and performance. Even so, outages are pretty much a fact of life.

First though, it’s important to know what exactly an outage is.

An outage is pretty universally defined as…

A period of time where a service is unavailable or when the output of a service is incomplete, corrupted, or otherwise unable to provide normal levels of interaction.

For example, we’d consider both of the following to be outages:

• A website not being returned at all when a user requests it
• A site being returned, but with all of its images missing

Outages come in many shapes and sizes, however. Here are some we commonly deal with as we manage and monitor hundreds of customer servers every day.

Service failure

A piece of software on the server stops functioning, either in an unexpected way or altogether. This is usually pretty black and white to investigate.

Is the service running? If not, then you can try starting it back up.

If the service still fails to start, try checking the logs for information.

Typically, services only fail when a configuration changes somewhere, so recent changes are usually the first place to start any investigation.
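
As a rough sketch of that routine, assuming a systemd-based server, the Python below asks `systemctl` whether a unit is active and pulls its most recent journal entries with `journalctl`. The helper names (`service_state`, `recent_logs`, `next_step`) are illustrative assumptions, not part of any tool.

```python
import subprocess


def service_state(unit):
    """Return the state printed by `systemctl is-active` for a unit.

    Typical values are "active", "inactive" or "failed". Returns "unknown"
    when systemctl cannot be run (for example on a non-systemd host).
    """
    try:
        result = subprocess.run(
            ["systemctl", "is-active", unit],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return "unknown"
    return result.stdout.strip() or "unknown"


def recent_logs(unit, lines=20):
    """Return the last few journal lines for a unit, or "" if unavailable."""
    try:
        result = subprocess.run(
            ["journalctl", "-u", unit, "-n", str(lines), "--no-pager"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout


def next_step(state):
    """Map a service state to the next troubleshooting step described above."""
    if state == "active":
        return "running: look elsewhere"
    if state in ("inactive", "failed"):
        return "not running: try starting it, then check the logs"
    return "state unclear: check the logs and recent configuration changes"
```

For example, `next_step(service_state("nginx"))` would suggest a next step for a hypothetical `nginx` unit, and `recent_logs("nginx")` would return the log lines to read if a restart fails.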

Resource exhaustion

Resources on the server(s) are so strained that requests are extremely slow, or fail due to a timeout.

It’s fairly easy to spot when CPU or RAM is being used up with a simple `top` command.
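
Where you want the same numbers in a script rather than an interactive `top` session, one option is to read them from `/proc`. This is a minimal sketch, assuming a Linux kernel recent enough to report `MemAvailable` in `/proc/meminfo`; the function names are illustrative.

```python
import os


def load_per_cpu(loadavg_text, cpu_count):
    """Return the 1-minute load average divided by the number of CPUs."""
    one_minute = float(loadavg_text.split()[0])
    return one_minute / max(cpu_count, 1)


def memory_available_percent(meminfo_text):
    """Return MemAvailable as a percentage of MemTotal."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])  # first field is the number
    return 100.0 * values["MemAvailable"] / values["MemTotal"]


def snapshot():
    """Read the live values from /proc on the current machine."""
    with open("/proc/loadavg") as handle:
        load = load_per_cpu(handle.read(), os.cpu_count() or 1)
    with open("/proc/meminfo") as handle:
        memory = memory_available_percent(handle.read())
    return {"load_per_cpu": load, "memory_available_percent": memory}
```

As a rule of thumb rather than a hard limit, a sustained load per CPU well above 1, or a very low available-memory percentage, is a hint that the server is running out of headroom.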

Disk usage should also be monitored, but many don’t think about disk IO being the bottleneck. This can be especially true in virtualised environments where IO is limited, sometimes intentionally, by the likes of AWS.
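
Both checks can be scripted. The sketch below uses Python’s `shutil.disk_usage` for capacity and, as one rough proxy for an IO bottleneck, the share of CPU time spent in `iowait` between two readings of `/proc/stat`. The function names and the sampling approach are assumptions for illustration; a tool such as `iostat` gives a fuller picture.

```python
import shutil


def disk_used_percent(path="/"):
    """Return the percentage of the filesystem holding `path` that is used."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def cpu_times(stat_text):
    """Return the counters from the aggregate "cpu" line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before, after):
    """Share of CPU time spent waiting on IO between two samples.

    Only the first eight counters (user through steal) are summed, because
    guest time is already included in user time. Index 4 is iowait.
    """
    deltas = [later - earlier for earlier, later in zip(before, after)][:8]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0
```

In practice you would read `/proc/stat` twice, a few seconds apart, and pass both results of `cpu_times` to `iowait_percent`. A persistently high value alongside modest CPU usage suggests the disks, not the processor, are holding things up.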

Resource exhaustion can be due to an attack, which might be a scan for vulnerabilities or a denial of service (DoS) attempt. Typically these requests need to be blocked further upstream.

Physical issues

In this age of the cloud, it’s easy to forget that servers are still physical (even though they are owned by someone else). Physical hardware does still break and these things need to be checked and monitored.

Hard drives are still one of the most common elements to fail.  This is why RAID is still important in this day and age.

Memory issues are some of the hardest to diagnose as they cause instability to the system, making it fall over in new and interesting ways (we’ve seen them all!).

Fans can also fail, causing the system to overheat and other parts to fail sooner.

We had a new customer come to us recently, because their system was unstable… two of the three fans had failed, causing memory corruption and a single disk error in their RAID array which they knew nothing about, as they had no monitoring.
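
A failed member in a Linux software RAID (md) array shows up in `/proc/mdstat` as an underscore in the member status, for example `[U_]` instead of `[UU]`. Below is a minimal sketch of a check that could feed your monitoring, assuming md RAID (hardware RAID controllers need their vendor’s tools instead); the function name is illustrative.

```python
import re


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\w+)\s*:", line)
        if header:
            current = header.group(1)  # e.g. "md0"
            continue
        status = re.search(r"\[([U_]+)\]", line)  # e.g. "[UU]" or "[U_]"
        if current and status:
            if "_" in status.group(1):
                degraded.append(current)
            current = None  # one status line per array
    return degraded


def check_local_raid(path="/proc/mdstat"):
    """Check the current machine; returns [] when md RAID is not in use."""
    try:
        with open(path) as handle:
            return degraded_arrays(handle.read())
    except OSError:
        return []
```

An empty list means no md array reported a missing member. It does not cover SMART warnings, memory errors or fan failures, which need their own checks.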

Network issues

Your site is down, your server is unreachable and apparently dead in the water… or is it?  Could it be absolutely fine but no one can talk to it?

You might be thinking, “what’s the difference?”. The answer is what needs to be done to resolve the issue. Chances are the problem is upstream from your server and there may be nothing you can actually control yourself, in which case playing a waiting game is your only answer.

“Netsplits” are still surprisingly common on the internet. This is where two parts of the internet are still “working” but unable to talk to each other.

Uptime monitoring utilities can come in handy here. These utilities constantly attempt to connect to your site and/or applications, from various locations around the globe and send notifications if there are issues. The logs can also help you determine if only certain geographies are unable to access your service, or if the problem is truly global.
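
To make the idea concrete, here is a minimal sketch of the kind of probe such a utility runs, using only the Python standard library. It checks from a single location, so it cannot tell a regional network problem from a global one; the function names and the wording of the results are illustrative assumptions.

```python
import http.client
import socket
import urllib.error
import urllib.request


def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_status(url, timeout=5.0):
    """Return the HTTP status code, or None if no valid response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with an error status
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return None


def diagnose(host, port, url):
    """Give a rough first guess at where the problem lies."""
    if not tcp_reachable(host, port):
        return "unreachable from here: server down or a network issue in between"
    status = http_status(url)
    if status is None:
        return "port open but no HTTP response: suspect the service"
    if status >= 500:
        return f"server error {status}: suspect the application"
    return f"responding with status {status}"
```

Calling `diagnose("example.com", 443, "https://example.com/")`, with `example.com` standing in for your own site, from several networks gives a crude version of the multi-location view described above.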

ISP issues

Typically, one or more of the above outage types is happening, but at the ISP that hosts your server or site rather than on your own systems. When a service provider’s systems experience problems, this can cause cascading issues with anything built on top of, or relying on, their systems.

This is typically resolved with a support ticket to the provider(s) having problems.

In our experience, the only way to mitigate this type of issue is to host across multiple providers. This is not for the faint of heart but is possible with enough planning.

Some final thoughts on troubleshooting ‘server down’ issues in Linux

The lines between the outage types we’ve listed can often be blurred, as one can have a “domino effect” leading to others. For example, a failing hard disk can cause reads and writes to slow down massively, which can lead to requests taking longer than usual, which can lead to resource exhaustion as things get tied up.

Debugging outages can be time consuming if there are lots of moving parts to your application/service. We find our members of staff develop a gut instinct over years and years of debugging problems and honing their skills.

If you struggle with troubleshooting ‘server down’ issues on the Linux boxes in your infrastructure, whether shared or self-hosted, if any of the above seems too much for you, or if you’d just like somebody else to take your worries away, please contact us and I’m sure we can help.

5 benefits of IT managed services

For start ups, small and medium sized businesses, the benefits of IT managed services can be huge.

IT managed services providers, like our team here at Dogsbody, have the ability and skills to grow with your business. This means you don’t have to go through the pain of recruiting, setting up and maintaining an in-house team which would need to cover every eventuality, niche, new technology and idea.

Choosing a managed service provider (MSP) is time consuming and stressful. Establishing trust is a huge part of this process, which can’t happen overnight. In most cases you will be handing over the keys to your kingdom to a 3rd party, which is a scary thought.

“Your choice of MSP needs to become an extension to your business – a long term partner not simply a provider.”

Dan Benton, founder Dogsbody

So, if you’re prepared to make that leap of faith, what are some of the benefits of IT managed services?

Benefit #1: Cost savings and stability

Consider, first of all, the costs of recruiting a number of different full-time specialists into your business. Then consider the cost of running that team… not just their salaries but training, holidays, pensions and sick leave. Then consider the cost of hardware and software for each team member (and keeping it up to date).

Many people think of outsourcing as expensive because they see the monthly costs and the occasional large invoice. However, this is a fraction of the price of hiring directly.

With IT managed services, you’re ‘borrowing’ an on-demand, experienced team (in our case Linux sysadmins), who do their job for multiple customers every day.

Instantly, you can remove the worries of having cover for holidays and sickness with no organisation required on your behalf.

A 24/7 service means you also don’t have to worry about the cost and time of on-call rotas or, importantly, being disturbed in the middle of the night.

IT managed service providers usually offer plans or packages with a fixed monthly cost (we do), which means you know what you are paying and can budget accordingly.

Benefit #2: Expertise across platforms

Working for multiple customers, managed service providers like Dogsbody have the advantage of knowing what to look out for.

We look for the pitfalls …

We see trending issues across multiple customers/platforms with similar set ups …

And we keep up to date with the latest news on vulnerabilities, new technologies, features, security and latest regulations.

Technology moves fast.

When a large project looms, resources can be scaled to accommodate. Support or maintenance contracts can be increased or decreased (if allowed) depending on requirements.

Any good managed service provider will also monitor more than you probably imagined could be monitored.

Dogsbody monitors a LOT for our customers. StatusPile is our SaaS service – a status page of status pages …

Proactive monitoring means potential issues can be caught and mitigated before they become real issues. Problems can also be highlighted and identified impartially, which means your managed service provider can advise on the best ways to increase your overall operational efficiency. They should be doing this proactively. If they’re not, you may want to talk to us!

Benefit #3: Freeing your valuable staff

Companies often try to save money by carrying out expert tasks internally.

Do you get Sue the website developer to maintain the servers your business critical website resides on?

This is fine until something big comes up, which can take Sue and her team away from their ‘day job’. Worse still, Sue’s team may end up attempting to fix issues they have little knowledge of. Dogsbody is often contacted by companies with server emergencies who need expert help to resolve a situation. It’s one of the most common ways our long-term relationships start.

As an example, server maintenance isn’t a revenue earner for most businesses, but it is essential to reduce the risk of data breaches or of old vulnerabilities being exploited. Having your server(s) offline and losing revenue through lost sales, compensation, legal action or financial penalties is one of the most stressful things any business owner can endure.

Having servers and infrastructure managed correctly has huge benefits for a business, including improved uptime, protection against threats and ensuring compliance with the likes of PCI or GDPR.

A clear benefit of IT managed services is that Sue and her team have time to focus on their role, generating revenue, and driving your business forward.

Benefit #4: Managed service providers do just one thing well

One of the biggest benefits of IT managed services is specialism.

Managed services providers do what they do well. After all, it’s their business.

“Dogsbody often turns down lucrative projects if it’s not our core expertise. We hope your MSP would do the same.”

Dan Benton, founder Dogsbody

Our niche is Linux. We don’t do Microsoft, we don’t do Apple, we don’t manage printers. This means we don’t dilute our skills trying to support everyone and everything.

Being a specialist, not a generalist, means you know you’re dealing with the best in the business, which we know builds trust.

Benefit #5: A technology partner can grow with your business

Growing your business takes time, planning, patience and of course money. An MSP can provide a firm long term foundation on which the business can then grow.

No company knows definitively what it will need in 2, 3 or 4 years’ time, especially when it comes to technology. We have helped our customers build various iterations of their technology stacks as they have grown, giving them access to expertise they didn’t even realise was a requirement when they started working with us.

[Image: Benefits of IT managed services, growth no matter which direction]

When a large project looms, resources can be easily scaled to accommodate. Support or maintenance contracts can be increased or decreased (if allowed) depending on requirements, so the cost of scaling up may be nearly negligible.

Outsourcing your IT to a proficient managed service provider will always help you to build a stronger foundation for the future.

Agree with those benefits of IT managed services?

If you like what you see and want to put us to the test, get in touch.

Our specialism is Linux Managed Services but we’re happy to point you in the right direction if you’ve other requests too.