Contact us: 01276 818576
Dogsbody Technology

5 things you need to know when working with big logs

12 Mar 2019 / in Knowledge Base / by Rob Hooper

With everything being logged, the logs on a busy server can get very big and very noisy. The bigger your logs, the harder it is to extract the information you want, so it is essential to have a number of analysis techniques up your sleeve.

In the case of an outage, logs are indispensable for seeing what happened. If you’re under attack, it will be logged. Everything is logged, so it is essential to pay attention.
– From my last blog post, why there’s nothing quite like Logcheck.

These are our top five tips when working with large log files.

1. tail

The biggest issue with log files is their size: logs can easily grow into gigabytes. Most text editors used with other text files (vim, nano, gedit etc.) load the whole file into memory, which is not an option when the file is larger than your system's memory.

The tail command avoids this by returning only the last lines of the log file. Rather than reading the whole file into memory, it loads only the final bytes of the file.

Log files nearly always have new lines appended to the bottom, meaning they are already in chronological order. tail therefore gets you the most recent logs.

A good technique is to use tail to export a section of the log (in this example, the last 5000 lines). You can then comb through a smaller extract (perhaps with the tools below) without needing to look through every single log line, which reduces resource usage on the server. Here /var/log/big-logfile.log stands in for the path of your own log.

tail -n 5000 /var/log/big-logfile.log > ~/logfile.log

You may also find the head command useful; it works just like tail but returns the top lines of a file.
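head and tail can also be combined to pull a slice out of the middle of a file. The following is a minimal sketch, not a definitive recipe: the slice function and the generated sample.log are made up for illustration.

```shell
#!/bin/sh
# Build a small stand-in log so the example is self-contained.
# (sample.log and its contents are invented for this sketch.)
workdir=$(mktemp -d)
seq 1 100 | sed 's/^/line /' > "$workdir/sample.log"

# Print lines START..END of FILE.
# head stops reading after END lines, then tail keeps only the
# last (END - START + 1) of them.
slice() {
    start=$1
    end=$2
    file=$3
    head -n "$end" "$file" | tail -n "$((end - start + 1))"
}

# Lines 41 to 45 of the sample log
slice 41 45 "$workdir/sample.log"
```

Because head stops reading once it has the lines it needs, this stays cheap when the slice is near the top of a very large file.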

2. grep is your best friend.

Perhaps you are only interested in certain lines in your log file. For that you need grep.

For example, if you are only interested in a specific time window, this grep returns all of the log lines from 5th March 2019 between 11:30 and 11:39.

grep "05/Mar/2019:11:3" logfile.log

When using grep you need to know what is in the log file and how it is formatted; head and tail can help there.
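One way to get a feel for a log's format is to number the whitespace-separated fields of its first line. This is only a sketch: the first_line_fields function and both sample log lines are invented for illustration, and your own logs may well be laid out differently.

```shell
#!/bin/sh
# Two made-up sample logs in typical web server access and error styles.
workdir=$(mktemp -d)
cat > "$workdir/access.log" <<'EOF'
203.0.113.43 - - [05/Mar/2019:11:31:02 +0000] "GET /about HTTP/1.1" 200 734
EOF
cat > "$workdir/error.log" <<'EOF'
2019/03/05 11:31:02 [error] 1234#0: *5 open() "/var/www/html/missing" failed (2: No such file or directory)
EOF

# Print each whitespace-separated field of the first line of FILE,
# prefixed with its field number (the $1, $2, ... that awk would use).
first_line_fields() {
    head -n 1 "$1" | awk '{ for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i }'
}

first_line_fields "$workdir/access.log"
first_line_fields "$workdir/error.log"
```

In this sample the access log writes the date as 05/Mar/2019 in field 4 while the error log writes 2019/03/05 in field 1, so a timestamp pattern that matches one file will not match the other.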

Be careful not to make assumptions: logs are often written in different formats, even when they are created by the same application (for example, a web server's access and error logs).

So far I have only used grep inclusively, but you can also use it to exclude lines. For example, the command below returns all logs from 5th March between 11:30 and 11:39 and then removes lines from two IPs. You can use this to remove your office IPs from your log analytics.

grep "05/Mar/2019:11:3" logfile.log | grep -vF -e '203.0.113.43' -e '203.0.113.44'
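A plain prefix such as 11:3 only covers neat ten-minute blocks. For a window that crosses one, an extended regular expression can help. The sketch below assumes the access log timestamp format used above; the function names and sample lines are invented for illustration.

```shell
#!/bin/sh
# Made-up access log lines around the window of interest.
workdir=$(mktemp -d)
cat > "$workdir/access.log" <<'EOF'
203.0.113.43 - - [05/Mar/2019:11:29:58 +0000] "GET / HTTP/1.1" 200 512
198.51.100.7 - - [05/Mar/2019:11:35:10 +0000] "GET /about HTTP/1.1" 200 734
203.0.113.44 - - [05/Mar/2019:11:41:27 +0000] "GET /contact HTTP/1.1" 200 301
198.51.100.7 - - [05/Mar/2019:11:44:59 +0000] "GET /blog HTTP/1.1" 200 990
198.51.100.9 - - [05/Mar/2019:11:45:00 +0000] "GET / HTTP/1.1" 200 512
EOF

# Lines from 11:35:00 up to and including 11:44:59.
# -E enables extended regex so ( | ) works without backslashes.
in_window() {
    grep -E '05/Mar/2019:11:(3[5-9]|4[0-4]):' "$1"
}

# The same window with the two office IPs removed.
# -v inverts the match; -F treats each -e pattern as a fixed string,
# so the dots in the addresses are not regex wildcards.
in_window_external() {
    in_window "$1" | grep -vF -e '203.0.113.43' -e '203.0.113.44'
}

in_window_external "$workdir/access.log"
```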

3. Unique identifiers

grep is at its best when working with unique identifiers; as you saw above, we focussed on a specific timestamp. This can be extended to any unique identifier, but what do you look for?

A great unique identifier for web server logs is the visitor's IP address. It can be used to follow their session and see all of the URLs they visited on the site. Unless they are trying to obfuscate it, the IP address persists everywhere the visitor goes, so it can be used when collating logs across multiple servers.

grep "203.0.113.43" server1-logfile.log server2-logfile.log
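To turn those matches into a readable session, the timestamp and URL can be pulled out and sorted. This sketch swaps grep for awk so the IP is compared exactly against the first field. It assumes the common access log layout where field 4 is the timestamp and field 7 the request path; the session function and sample lines are made up.

```shell
#!/bin/sh
# Made-up access logs from two servers.
workdir=$(mktemp -d)
cat > "$workdir/server1-logfile.log" <<'EOF'
203.0.113.43 - - [05/Mar/2019:11:30:01 +0000] "GET / HTTP/1.1" 200 512
198.51.100.7 - - [05/Mar/2019:11:30:05 +0000] "GET /about HTTP/1.1" 200 734
203.0.113.43 - - [05/Mar/2019:11:31:40 +0000] "GET /contact HTTP/1.1" 200 301
EOF
cat > "$workdir/server2-logfile.log" <<'EOF'
203.0.113.43 - - [05/Mar/2019:11:30:55 +0000] "GET /blog HTTP/1.1" 200 990
EOF

# Print "timestamp path" for every request from IP across the given files.
# $1 == ip is an exact comparison, so 203.0.113.43 cannot accidentally
# match a longer address that merely contains it.
# The plain text sort is only chronological within a single day.
session() {
    ip=$1
    shift
    awk -v ip="$ip" '$1 == ip { print $4, $7 }' "$@" | LC_ALL=C sort
}

session 203.0.113.43 "$workdir/server1-logfile.log" "$workdir/server2-logfile.log"
```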

Some software includes its own unique identifiers. For example, email software like Postfix logs a unique ID against each email it processes. You can use this identifier to collate all logs related to a specific email, and this approach will also show whether the email has been stuck in the system for days.

This command will retrieve all logs with the unique identifier 123ABC123A from all files whose names start with “mail.log” (for example mail.log.1 and mail.log.3.gz).

zgrep '123ABC123A' mail.log*
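When the matches span rotated files, it can help to list the oldest file first and drop the filename prefixes so the output reads as one timeline. A rough sketch, assuming zgrep is available alongside gzip; the trace_id function and the sample mail log lines are invented for illustration.

```shell
#!/bin/sh
# Made-up mail logs: one rotated and compressed, one current.
workdir=$(mktemp -d)
printf '%s\n' \
    'Mar  4 23:59:58 mx postfix/smtpd[101]: 123ABC123A: client=unknown[198.51.100.7]' \
    'Mar  4 23:59:59 mx postfix/smtpd[101]: 999FFF999F: client=unknown[198.51.100.9]' \
    | gzip > "$workdir/mail.log.1.gz"
printf '%s\n' \
    'Mar  5 00:00:03 mx postfix/smtp[202]: 123ABC123A: to=<user@example.com>, status=deferred' \
    > "$workdir/mail.log"

# Print every line mentioning ID from the given files, in the order given.
# -h suppresses the "filename:" prefix that appears when searching several files.
# zgrep reads both compressed and plain files.
trace_id() {
    id=$1
    shift
    zgrep -h -e "$id" "$@"
}

# Oldest file first, so the output is in chronological order
trace_id 123ABC123A "$workdir/mail.log.1.gz" "$workdir/mail.log"
```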

Taking points 2 and 3 one step further with a little bit of command line magic, this command returns the IP addresses of the most frequent site visitors on the 5th of March during the 11 AM hour. The counts are sorted in ascending order, so the busiest IP is on the last line.

grep "05/Mar/2019:11:" nginx-access.log | awk '{ print $1 }' | sort | uniq -c | sort -n | tail
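The same pipeline can be wrapped in a small function with the busiest visitor listed first. This is a sketch under the assumption that the IP address is the first field of each line; top_visitors and the sample data are made up for illustration.

```shell
#!/bin/sh
# Made-up access log: three visitors in the 11 AM hour, one request at 12:00.
workdir=$(mktemp -d)
cat > "$workdir/nginx-access.log" <<'EOF'
203.0.113.43 - - [05/Mar/2019:11:01:00 +0000] "GET / HTTP/1.1" 200 512
198.51.100.7 - - [05/Mar/2019:11:05:00 +0000] "GET / HTTP/1.1" 200 512
203.0.113.43 - - [05/Mar/2019:11:17:00 +0000] "GET /about HTTP/1.1" 200 734
198.51.100.9 - - [05/Mar/2019:11:20:00 +0000] "GET / HTTP/1.1" 200 512
198.51.100.7 - - [05/Mar/2019:11:42:00 +0000] "GET /blog HTTP/1.1" 200 990
203.0.113.43 - - [05/Mar/2019:11:59:00 +0000] "GET /contact HTTP/1.1" 200 301
198.51.100.9 - - [05/Mar/2019:12:00:00 +0000] "GET / HTTP/1.1" 200 512
EOF

# Print the N busiest client IPs among lines containing PATTERN (default N=10).
# sort | uniq -c counts requests per IP; sort -rn puts the highest count first.
top_visitors() {
    pattern=$1
    file=$2
    n=${3:-10}
    grep -F -- "$pattern" "$file" | awk '{ print $1 }' | sort | uniq -c | sort -rn | head -n "$n"
}

top_visitors "05/Mar/2019:11:" "$workdir/nginx-access.log" 2
```

Changing the awk field (for example to $7 for the request path in this layout) gives the most requested URLs instead.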

4. Logrotate

As I have said before, logs build up quickly over time, and to keep them manageable it is good to rotate them. This means that rather than one huge log file you have multiple smaller files. Logrotate is a system tool which does this; in fact, you will likely find that it is already installed.

It stores its configs in /etc/logrotate.d, and most software provides its own configs to rotate its logs.

If you are still dealing with large log files then it may well be time to edit these configs.

A quick win might be rotating the file daily rather than weekly.

You can also configure logrotate to rotate files based on size rather than date.
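As an illustration of both ideas, the snippet below writes a sample logrotate config to a temporary file. Treat it as a sketch rather than a recommended policy: the /var/log/myapp path, the 14-file retention and the 100M threshold are all assumptions to adjust for your own application.

```shell
#!/bin/sh
# Write a sample config to a scratch directory rather than /etc/logrotate.d.
# (myapp and every value below are placeholders for illustration.)
workdir=$(mktemp -d)
cat > "$workdir/myapp" <<'EOF'
/var/log/myapp/*.log {
    # rotate once a day rather than weekly
    daily
    # also rotate early if a file grows past 100 MB
    maxsize 100M
    # keep 14 rotated files before deleting the oldest
    rotate 14
    # gzip rotated files, but leave the newest one uncompressed
    compress
    delaycompress
    # do not complain if the log is missing, and skip empty logs
    missingok
    notifempty
}
EOF

# If logrotate is installed, a dry run shows what it would do
# without changing anything:
#   logrotate -d "$workdir/myapp"
cat "$workdir/myapp"
```

Note that size limits are only checked when logrotate actually runs, which on many systems is once a day, so a size-based rule may need logrotate to be scheduled more often to have much effect.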

5. AWS Athena

AWS Athena takes your log analytics to the next level. With it you can turn your text log files into a database and search them with SQL queries. This is great when you are working with huge volumes of log data. To make this easier, Athena natively supports the Apache log format and only charges you for the queries you make.

AWS have lots of good documentation on setting up Athena and tying it into Apache logs.

Fighting huge log files? Not getting the insights you want? Contact us and see how we can help.

 

Feature image by Ruth Hartnup licensed CC BY 2.0.

Tags: Amazon AWS, log management, logs
© Copyright 2010-2026 Dogsbody Technology Ltd - Registered in England and Wales 07236558