
Troubleshooting ‘server down’ issues in Linux

21 Aug 2019/0 Comments/in Managed Services/by Dan Benton

This post covers the keys to troubleshooting ‘server down’ issues in Linux: some common causes of outages, and how you can investigate and resolve them. A well-configured Linux server offers an incredible level of stability and performance. Even so, outages are pretty much a fact of life.

First though, it’s important to know what exactly an outage is.

An outage is pretty universally defined as…

A period of time where a service is unavailable or when the output of a service is incomplete, corrupted, or otherwise unable to provide normal levels of interaction.

For example, we’d consider both of the following to be outages:

• A website not being returned at all when a user requests it
• A site being returned, but with all of its images missing

Outages come in many shapes and sizes, however. Here are some we commonly deal with as we manage and monitor hundreds of customer servers every day.

Service failure

A piece of software on the server stops functioning, either in an unexpected way or altogether. This is usually pretty black and white to investigate.

Is the service running? If not, then you can try starting it back up.

If the service still fails to start, try checking the logs for information.

Typically, services only fail when a configuration changes somewhere, so recent configuration changes are usually the first place to start any investigation.
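
As a rough sketch of the "is the service running?" check, here is one way it could be scripted. It assumes a systemd-based distribution (this post does not depend on any particular init system), and `service_is_active` is a hypothetical helper name rather than anything from a standard tool.

```python
import shutil
import subprocess
from typing import Optional


def service_is_active(unit: str) -> Optional[bool]:
    """Report whether a systemd unit is active.

    Returns True or False based on `systemctl is-active`, or None when
    systemctl is not installed (assumption: systemd is the init system).
    """
    if shutil.which("systemctl") is None:
        return None
    # --quiet suppresses the state text; exit status 0 means "active".
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0
```

Calling `service_is_active("nginx")` (the unit name is only an example) returns `False` when the unit is stopped or failed. On systemd hosts, `systemctl start <unit>` and `journalctl -u <unit> -n 50 --no-pager` are the usual next steps for starting it back up and reading its most recent log lines.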

Resource exhaustion

Resources on the server(s) are so strained that requests are extremely slow, or fail due to a timeout.

It’s fairly easy to spot when CPU or RAM is being used up with a simple `top` command.
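
If you would rather script this than watch `top`, the same numbers are available from the kernel. The sketch below assumes a Linux host with `/proc/meminfo`; the function names are our own for illustration, and any thresholds you apply to the results are a judgment call.

```python
import os


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field_name: value}.

    Most fields are reported in kB; a few (such as HugePages_Total) are counts.
    """
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_available_percent(meminfo: dict) -> float:
    """Share of RAM the kernel estimates is available without swapping."""
    return 100.0 * meminfo["MemAvailable"] / meminfo["MemTotal"]


def load_per_cpu() -> float:
    """One-minute load average divided by the number of CPUs."""
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)
```

To use it, read `/proc/meminfo` into a string, pass it to `parse_meminfo`, and feed the result to `memory_available_percent`. A per-CPU load that stays well above 1.0, or a very low available-memory percentage, is a hint to look more closely at what is consuming the resource.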

Disk usage should also be monitored, but many don’t think about disk IO being the bottleneck. This can be especially true in virtualised environments where IO is limited, sometimes intentionally, by the likes of AWS.
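
Both disk space and IO stalls can be checked from a script. This is a minimal sketch, assuming a Linux kernel with pressure stall information enabled (4.20 or later) so that `/proc/pressure/io` exists; the function names are illustrative.

```python
import shutil


def disk_used_percent(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def parse_io_pressure(text: str) -> dict:
    """Parse /proc/pressure/io-style text into {"some": {...}, "full": {...}}.

    Each avgN value is the percentage of the last N seconds in which
    tasks were stalled waiting on IO.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result
```

A high `avg10` on the "some" line while CPU and RAM look healthy suggests that disk IO, rather than compute, is what requests are waiting on.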

Resource exhaustion can also be due to an attack, whether that is someone scanning for vulnerabilities or a deliberate denial of service (DoS). Typically these requests need to be blocked further upstream.
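
Before blocking anything, it helps to see who is sending the traffic. The sketch below counts requests per client in a web access log. It assumes the client address is the first whitespace-separated field on each line, as in the common and combined log formats; `top_clients` is a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_clients(log_lines: Iterable[str], limit: int = 5) -> List[Tuple[str, int]]:
    """Count requests per client, assuming the address is the first field."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return counts.most_common(limit)
```

Passing an open log file to `top_clients` works because file objects iterate line by line. A handful of addresses accounting for most of the requests is a reasonable starting list to hand to whoever controls the upstream firewall or CDN.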

Physical issues

In this age of the cloud, it’s easy to forget that servers are still physical (even though they are owned by someone else). Physical hardware does still break and these things need to be checked and monitored.

Hard drives are still one of the most common elements to fail. This is why RAID is still important in this day and age.

Memory issues are some of the hardest to diagnose as they cause instability to the system, making it fall over in new and interesting ways (we’ve seen them all!).

Fans can also fail, causing the system to overheat and other parts to fail sooner.

A new customer came to us recently because their system was unstable. Two of the three fans had failed, causing memory corruption and a single disk error in their RAID array, which they knew nothing about as they had no monitoring.

Network issues

Your site is down, your server is unreachable and apparently dead in the water… or is it?  Could it be absolutely fine but no one can talk to it?

You might be thinking, “what’s the difference?” The answer is what needs to be done to resolve the issue. Chances are the problem is upstream from your server and there may be nothing you can actually control yourself, in which case playing a waiting game is your only option.

“Netsplits” are still surprisingly common on the internet. This is where two parts of the internet are each still “working” but are unable to talk to each other.

Uptime monitoring utilities can come in handy here. These utilities constantly attempt to connect to your site and/or applications from various locations around the globe, and send notifications if there are issues. Their logs can also help you determine if only certain geographies are unable to access your service, or if the problem is truly global.
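
The core of such a utility is small. The sketch below shows a single TCP reachability probe and a way to interpret results gathered from several places. It is an illustration under assumptions, not how any particular monitoring product works, and the location names in the usage note are hypothetical.

```python
import socket
from typing import Dict


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def classify_outage(results: Dict[str, bool]) -> str:
    """Summarise probe results keyed by probe location (True = reachable)."""
    if not results:
        return "unknown"
    if all(results.values()):
        return "up"
    if not any(results.values()):
        return "global outage"
    return "partial outage"
```

Each probe location would run `can_connect` against your site and report back. Results such as `{"london": True, "sydney": False}` classify as a partial outage, which points towards a network problem between you and some users rather than a dead server.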

ISP issues

Typically, a mixture of one or more of the above outage types is happening, but it is affecting the ISP that hosts your server or site rather than your server itself. When a service provider’s systems experience problems, this can cause cascading issues with anything built on top of, or relying on, their systems.

This is typically resolved with a support ticket to the provider(s) having problems.

In our experience, the only way to mitigate this type of issue is to host across multiple providers. This is not for the faint of heart but is possible with enough planning.

Some final thoughts on troubleshooting ‘server down’ issues in Linux

The lines between the outage types we’ve listed can often be blurred, as one can have a “domino effect” leading to others. For example, a failing hard disk can cause reads and writes to slow down massively, which can lead to requests taking longer than usual, which can lead to resource exhaustion as things get tied up.

Debugging outages can be time-consuming if there are lots of moving parts to your application or service. We find that our staff develop a gut instinct over years and years of debugging problems and honing their skills.

If you struggle with troubleshooting ‘server down’ issues on the Linux boxes in your infrastructure, whether shared or self-hosted, if any of the above seems too much for you, or if you’d just like somebody else to take your worries away, please contact us and I’m sure we can help.
