Getting Unique Counts from a Log File

by Geoffrey Papilion · Jun. 24, 13 · Interview

Two colleagues of mine ask a very similar question in interviews. The question is not particularly hard, nor does it require a lot of thought to solve, but it's something that, as a developer or an ops person, you might find yourself needing to do. The question is: given a log file of a particular format, tell me how many times something occurs in that log file. For example, tell me the number of unique IP addresses in an access log, and the number of times each IP visited this system.

It's amazing how many people don't know what to do with this. One of my peers asks people to do this using the command line; the other tells candidates they can do it any way they want. I like this question because it's VERY practical; I do tasks like this every day, and I expect the people I work with to be able to do them too.

A More Concrete Example

I like the shell solution because it's basically a one-liner. So let's walk through it using access logs as an example.

Here is a very basic sample of a common access_log I threw together for this:

127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
192.168.0.1 - - [10/Oct/2000:13:55:41 -0700] "GET /missing.html HTTP/1.0" 404 506
192.168.0.2 - - [10/Oct/2000:13:55:48 -0700] "GET /missing.html HTTP/1.0" 404 506
192.168.0.5 - - [10/Oct/2000:13:56:42 -0700] "GET /missing.html HTTP/1.0" 404 506
192.168.0.6 - - [10/Oct/2000:13:57:05 -0700] "GET /missing.html HTTP/1.0" 404 506
192.168.0.1 - - [10/Oct/2000:13:58:36 -0700] "GET /missing2.html HTTP/1.0" 404 506
192.168.0.1 - - [10/Oct/2000:13:59:28 -0700] "GET /exitst.html HTTP/1.0" 200 1506
192.168.0.3 - - [10/Oct/2000:14:15:20 -0700] "GET /exitst.html HTTP/1.0" 200 1506
192.168.0.7 - - [10/Oct/2000:14:16:32 -0700] "GET /missing3.html HTTP/1.0" 404 506
192.168.0.7 - - [10/Oct/2000:14:20:54 -0700] "GET /exitst.html HTTP/1.0" 200 1506
192.168.0.8 - - [10/Oct/2000:13:22:42 -0700] "GET /exitst.html HTTP/1.0" 200 1506

Let's say you want to count the number of times each unique IP address has visited this system. Using nothing more than awk, sort, and uniq, you can find the answer. Pull the first field with awk, pipe that through sort, and then through uniq -c, which collapses adjacent duplicate lines and prefixes each with its count (this is why the sort has to come first). This isn't fancy, but it returns the result very quickly without a whole lot of fuss.

Like so:

~/Projects/access_logs$ awk '{print $1}' < access_logs | sort | uniq -c
      1 127.0.0.1
      3 192.168.0.1
      1 192.168.0.2
      1 192.168.0.3
      1 192.168.0.5
      1 192.168.0.6
      2 192.168.0.7
      1 192.168.0.8
~/Projects/access_logs$ 

This gives you each hostname or IP, and the number of times it has contacted this server.
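The same pipeline can also answer the first half of the interview question, the number of distinct IP addresses, and pick out the busiest visitor. The following is a minimal sketch rather than part of the original walkthrough: it recreates the sample log in a temporary file so it runs on its own, and the variable names (`log`, `unique_ips`, `top_ip`) are illustrative assumptions.

```shell
# Recreate the sample access log in a temporary file (hypothetical location).
log=$(mktemp)
cat > "$log" <<'EOF'
127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
192.168.0.1 - - [10/Oct/2000:13:55:41 -0700] "GET /missing.html HTTP/1.0" 404 506
192.168.0.2 - - [10/Oct/2000:13:55:48 -0700] "GET /missing.html HTTP/1.0" 404 506
192.168.0.5 - - [10/Oct/2000:13:56:42 -0700] "GET /missing.html HTTP/1.0" 404 506
192.168.0.6 - - [10/Oct/2000:13:57:05 -0700] "GET /missing.html HTTP/1.0" 404 506
192.168.0.1 - - [10/Oct/2000:13:58:36 -0700] "GET /missing2.html HTTP/1.0" 404 506
192.168.0.1 - - [10/Oct/2000:13:59:28 -0700] "GET /exitst.html HTTP/1.0" 200 1506
192.168.0.3 - - [10/Oct/2000:14:15:20 -0700] "GET /exitst.html HTTP/1.0" 200 1506
192.168.0.7 - - [10/Oct/2000:14:16:32 -0700] "GET /missing3.html HTTP/1.0" 404 506
192.168.0.7 - - [10/Oct/2000:14:20:54 -0700] "GET /exitst.html HTTP/1.0" 200 1506
192.168.0.8 - - [10/Oct/2000:13:22:42 -0700] "GET /exitst.html HTTP/1.0" 200 1506
EOF

# Distinct client addresses: sort -u de-duplicates, wc -l counts the lines.
# tr strips the padding some wc implementations add around the number.
unique_ips=$(awk '{print $1}' "$log" | sort -u | wc -l | tr -d ' ')
echo "unique IPs: $unique_ips"    # prints: unique IPs: 8

# Busiest address: sort the counts numerically in reverse and keep the first line.
top_ip=$(awk '{print $1}' "$log" | sort | uniq -c | sort -rn | head -1 | awk '{print $2}')
echo "busiest IP: $top_ip"        # prints: busiest IP: 192.168.0.1

rm -f "$log"
```

Replacing `head -1` with `head -10` would give a top-ten list instead of a single winner.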

Upping the Complexity


Now for something more complex: let's say you want to find the most commonly requested document that returns a 404. Again, we can do this in a shell one-liner. We still need awk, sort, and uniq, but this time we'll also use tail. First, awk examines the status field (9) and prints the URL field (7) if the status is 404. Then sort, uniq -c, and sort -n order the results by count, lowest first. Finally, tail prints only the last line, and a second awk prints just the requested document.

So here is what this looks like:

~/Projects/access_logs$ awk '{if($9=="404"){print $7}}' access_logs | sort | uniq -c | sort -n | tail -1 | awk '{print $2}'
/missing.html
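When a pipeline like this gives a surprising answer, it can help to stop before `tail` and look at the intermediate counts. The sketch below does that, assuming the same sample log (recreated in a temporary file so the snippet is self-contained); the `log` and `counts` variables are illustrative names, not part of the original one-liner.

```shell
# Recreate the sample access log in a temporary file (hypothetical location).
log=$(mktemp)
cat > "$log" <<'EOF'
127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
192.168.0.1 - - [10/Oct/2000:13:55:41 -0700] "GET /missing.html HTTP/1.0" 404 506
192.168.0.2 - - [10/Oct/2000:13:55:48 -0700] "GET /missing.html HTTP/1.0" 404 506
192.168.0.5 - - [10/Oct/2000:13:56:42 -0700] "GET /missing.html HTTP/1.0" 404 506
192.168.0.6 - - [10/Oct/2000:13:57:05 -0700] "GET /missing.html HTTP/1.0" 404 506
192.168.0.1 - - [10/Oct/2000:13:58:36 -0700] "GET /missing2.html HTTP/1.0" 404 506
192.168.0.1 - - [10/Oct/2000:13:59:28 -0700] "GET /exitst.html HTTP/1.0" 200 1506
192.168.0.3 - - [10/Oct/2000:14:15:20 -0700] "GET /exitst.html HTTP/1.0" 200 1506
192.168.0.7 - - [10/Oct/2000:14:16:32 -0700] "GET /missing3.html HTTP/1.0" 404 506
192.168.0.7 - - [10/Oct/2000:14:20:54 -0700] "GET /exitst.html HTTP/1.0" 200 1506
192.168.0.8 - - [10/Oct/2000:13:22:42 -0700] "GET /exitst.html HTTP/1.0" 200 1506
EOF

# Same pipeline as above, stopped before tail: one line per 404 URL,
# prefixed by its count, sorted so the most requested URL comes last.
counts=$(awk '$9 == "404" {print $7}' "$log" | sort | uniq -c | sort -n)
printf '%s\n' "$counts"

# The last line carries the winner; the second field is the URL.
top_404=$(printf '%s\n' "$counts" | tail -1 | awk '{print $2}')
echo "most requested 404: $top_404"    # prints: most requested 404: /missing.html

rm -f "$log"
```

Note that if two documents tie for the highest count, `tail -1` reports only one of them, so the full listing is worth a glance.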

Of course, there are many other ways to do this. This is a very simple way to do it, and the best part is that you can count on these tools being on almost every *nix system.
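As one illustration of those other ways, awk can do the counting itself with an associative array, which avoids the sort and uniq stages entirely. This is a sketch under the same assumptions as the walkthrough (status in field 9, URL in field 7), not a replacement for the one-liner; the array name `hits` and the variables `max`, `top`, and `log` are made up for the example.

```shell
# Recreate the sample access log in a temporary file (hypothetical location).
log=$(mktemp)
cat > "$log" <<'EOF'
127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
192.168.0.1 - - [10/Oct/2000:13:55:41 -0700] "GET /missing.html HTTP/1.0" 404 506
192.168.0.2 - - [10/Oct/2000:13:55:48 -0700] "GET /missing.html HTTP/1.0" 404 506
192.168.0.5 - - [10/Oct/2000:13:56:42 -0700] "GET /missing.html HTTP/1.0" 404 506
192.168.0.6 - - [10/Oct/2000:13:57:05 -0700] "GET /missing.html HTTP/1.0" 404 506
192.168.0.1 - - [10/Oct/2000:13:58:36 -0700] "GET /missing2.html HTTP/1.0" 404 506
192.168.0.1 - - [10/Oct/2000:13:59:28 -0700] "GET /exitst.html HTTP/1.0" 200 1506
192.168.0.3 - - [10/Oct/2000:14:15:20 -0700] "GET /exitst.html HTTP/1.0" 200 1506
192.168.0.7 - - [10/Oct/2000:14:16:32 -0700] "GET /missing3.html HTTP/1.0" 404 506
192.168.0.7 - - [10/Oct/2000:14:20:54 -0700] "GET /exitst.html HTTP/1.0" 200 1506
192.168.0.8 - - [10/Oct/2000:13:22:42 -0700] "GET /exitst.html HTTP/1.0" 200 1506
EOF

# Single pass: tally each 404 URL in an array, then report the largest tally.
result=$(awk '
    $9 == "404" { hits[$7]++ }              # count only lines whose status is 404
    END {
        for (url in hits)                   # iteration order is unspecified
            if (hits[url] > max) { max = hits[url]; top = url }
        print top
    }' "$log")
echo "$result"    # prints: /missing.html

rm -f "$log"
```

The trade-off is that the array holds one entry per distinct URL in memory, whereas the sort-based pipeline leaves that bookkeeping to sort.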

Published at DZone with permission of Geoffrey Papilion, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
