
Apache Log Parsing with Python

by Steven Lott · Feb. 07, 12 · Web Dev Zone

How much do I love Python?  Consider this little snippet that parses Apache logs.

import re
from collections import defaultdict, namedtuple

format_pat= re.compile( 
    r"(?P<host>[\d\.]+)\s" 
    r"(?P<identity>\S*)\s" 
    r"(?P<user>\S*)\s"
    r"\[(?P<time>.*?)\]\s"
    r'"(?P<request>.*?)"\s'
    r"(?P<status>\d+)\s"
    r"(?P<bytes>\S*)\s"
    r'"(?P<referer>.*?)"\s' # [SIC]
    r'"(?P<user_agent>.*?)"\s*' 
)

Access = namedtuple('Access',
    ['host', 'identity', 'user', 'time', 'request',
    'status', 'bytes', 'referer', 'user_agent'] )

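def access_iter( source_iter ):
    """Yield an Access for each matching line of each log in source_iter."""
    for log in source_iter:
        # Each log is an iterable of lines, such as an open file.
        for line in (l.rstrip() for l in log):
            match= format_pat.match(line)
            if match:  # Lines that do not fit the pattern are skipped.
                yield Access( **match.groupdict() )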

That's about it.  Each access log row is now a first-class Access object that can be processed pleasantly by high-level Python applications.

Cool things.

  1. Adjacent string literals are concatenated automatically, so the regular expression can be broken up into bits to make it readable.
  2. When the namedtuple field names match the regular expression group names, we can trivially turn the match.groupdict() into a named tuple.
  3. By using a generator, the other parts of the application can simply loop through the results without tying up memory to create vast intermediate structures.
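
To see the first two points in action, here is a minimal sketch that runs the same pattern against a single line. The sample line is made up for illustration (it follows the usual combined log format), and the variable names sample_line and access are assumptions, not part of the original module.

```python
import re
from collections import namedtuple

# Adjacent string literals are joined into a single pattern string.
format_pat = re.compile(
    r"(?P<host>[\d\.]+)\s"
    r"(?P<identity>\S*)\s"
    r"(?P<user>\S*)\s"
    r"\[(?P<time>.*?)\]\s"
    r'"(?P<request>.*?)"\s'
    r"(?P<status>\d+)\s"
    r"(?P<bytes>\S*)\s"
    r'"(?P<referer>.*?)"\s'
    r'"(?P<user_agent>.*?)"\s*'
)

Access = namedtuple('Access',
    ['host', 'identity', 'user', 'time', 'request',
    'status', 'bytes', 'referer', 'user_agent'])

# Illustrative sample line only; not taken from a real log.
sample_line = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)

match = format_pat.match(sample_line)
# The group names equal the field names, so groupdict() unpacks directly.
access = Access(**match.groupdict())

print(access.host, access.status, access.request)
```

Note that every field comes back as a string, so a status of 200 is the string "200" until the application converts it.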

A couple of years back, a sysadmin was trying to justify spending money on a log analyzer product.  I suggested they (at the very least) get an open source log analyzer.

I also suggested that they learn Python and save themselves the pain of working with a (potentially) complex tool.  Given this as a common library module, log analysis applications are remarkably easy to write.
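
As a rough idea of what such an application might look like, the sketch below counts responses by status code and totals bytes by host. The helper names status_counts and bytes_by_host are hypothetical, and the two in-memory "logs" are made-up sample data standing in for open files.

```python
import re
from collections import defaultdict, namedtuple

format_pat = re.compile(
    r"(?P<host>[\d\.]+)\s"
    r"(?P<identity>\S*)\s"
    r"(?P<user>\S*)\s"
    r"\[(?P<time>.*?)\]\s"
    r'"(?P<request>.*?)"\s'
    r"(?P<status>\d+)\s"
    r"(?P<bytes>\S*)\s"
    r'"(?P<referer>.*?)"\s'
    r'"(?P<user_agent>.*?)"\s*'
)

Access = namedtuple('Access',
    ['host', 'identity', 'user', 'time', 'request',
    'status', 'bytes', 'referer', 'user_agent'])

def access_iter(source_iter):
    """Yield an Access for each matching line of each log."""
    for log in source_iter:
        for line in (l.rstrip() for l in log):
            match = format_pat.match(line)
            if match:
                yield Access(**match.groupdict())

def status_counts(accesses):
    """Hypothetical helper: number of requests per status code."""
    counts = defaultdict(int)
    for access in accesses:
        counts[access.status] += 1
    return dict(counts)

def bytes_by_host(accesses):
    """Hypothetical helper: total bytes sent to each host."""
    totals = defaultdict(int)
    for access in accesses:
        # Apache writes "-" when no bytes were sent; treat that as zero.
        totals[access.host] += int(access.bytes) if access.bytes.isdigit() else 0
    return dict(totals)

# Made-up sample data; each list stands in for one open log file.
log_a = [
    '10.0.0.1 - - [07/Feb/2012:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.0"\n',
    '10.0.0.2 - - [07/Feb/2012:10:00:01 +0000] "GET /missing HTTP/1.1" 404 - "-" "curl/7.0"\n',
]
log_b = [
    'not an access log line\n',
    '10.0.0.1 - - [07/Feb/2012:10:00:02 +0000] "GET /about HTTP/1.1" 200 1024 "-" "curl/7.0"\n',
]

counts = status_counts(access_iter([log_a, log_b]))
totals = bytes_by_host(access_iter([log_a, log_b]))
print(counts)
print(totals)
```

With real logs, the list passed to access_iter would hold open file objects instead of lists of strings. The generator is consumed once per report here, which keeps each helper simple at the cost of reading the input twice.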


Source: http://slott-softwarearchitect.blogspot.com/2012/01/apache-log-parsing.html