Apache Log Parsing with Python

How much do I love Python? Consider this little snippet that parses Apache access logs in the Combined Log Format.

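import re
from collections import namedtuple

# Apache "combined" log format, written as adjacent raw-string literals
# that Python joins into one pattern. host matches numeric IPv4 addresses only.
format_pat = re.compile(
    r"(?P<host>[\d\.]+)\s"
    r"(?P<identity>\S*)\s"
    r"(?P<user>\S*)\s"
    r"\[(?P<time>.*?)\]\s"
    r'"(?P<request>.*?)"\s'
    r"(?P<status>\d+)\s"
    r"(?P<bytes>\S*)\s"
    r'"(?P<referer>.*?)"\s'  # [sic]: the HTTP header really is spelled "Referer"
    r'"(?P<user_agent>.*?)"\s*'
)

# Field names match the regex group names, so groupdict() maps straight onto them.
Access = namedtuple('Access',
    ['host', 'identity', 'user', 'time', 'request',
     'status', 'bytes', 'referer', 'user_agent'])

def access_iter(source_iter):
    """Yield an Access for every parseable line of every log in source_iter.

    Each item of source_iter is an iterable of lines, such as an open file.
    Lines that don't match the pattern are silently skipped.
    """
    for log in source_iter:
        for line in (l.rstrip() for l in log):
            match = format_pat.match(line)
            if match:
                yield Access(**match.groupdict())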

That's about it. Each access log row is now a first-class Access object (a named tuple) that high-level Python applications can process pleasantly.
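Here is a minimal usage sketch. It assumes an `io.StringIO` as a stand-in for an open log file and uses a made-up sample line. Because `access_iter` expects an iterable of logs, even a single file goes in a list.

```python
import io
import re
from collections import namedtuple

# Same pattern, named tuple and generator as above.
format_pat = re.compile(
    r"(?P<host>[\d\.]+)\s"
    r"(?P<identity>\S*)\s"
    r"(?P<user>\S*)\s"
    r"\[(?P<time>.*?)\]\s"
    r'"(?P<request>.*?)"\s'
    r"(?P<status>\d+)\s"
    r"(?P<bytes>\S*)\s"
    r'"(?P<referer>.*?)"\s'
    r'"(?P<user_agent>.*?)"\s*'
)

Access = namedtuple('Access',
    ['host', 'identity', 'user', 'time', 'request',
     'status', 'bytes', 'referer', 'user_agent'])

def access_iter(source_iter):
    for log in source_iter:
        for line in (l.rstrip() for l in log):
            match = format_pat.match(line)
            if match:
                yield Access(**match.groupdict())

# A made-up combined-format line; StringIO stands in for an open log file.
sample = io.StringIO(
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08"\n'
)

for access in access_iter([sample]):
    print(access.host, access.status, access.request)
    # prints: 127.0.0.1 200 GET /apache_pb.gif HTTP/1.0
```

Every captured field is a string, including `status` and `bytes`, so convert them with `int()` before doing arithmetic.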

A few cool things:

  1. Adjacent string literal concatenation means the regular expression can be broken up into bits, which keeps it readable.
  2. Because the named tuple's field names match the regular expression's group names, we can trivially turn match.groupdict() into a named tuple.
  3. Because access_iter is a generator, the other parts of the application can simply loop through the results without tying up memory in vast intermediate structures.
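The sketch below exercises all three points. The `endless_log` generator is hypothetical and stands in for a log too large to load. Because `itertools.islice` stops after three records, only three lines are ever produced and parsed. The `assert` compares the pattern's `groupindex` with `Access._fields` to confirm that the names line up, which is what makes `Access(**match.groupdict())` work.

```python
import itertools
import re
from collections import namedtuple

# Point 1: adjacent literals are joined at compile time into one pattern.
format_pat = re.compile(
    r"(?P<host>[\d\.]+)\s"
    r"(?P<identity>\S*)\s"
    r"(?P<user>\S*)\s"
    r"\[(?P<time>.*?)\]\s"
    r'"(?P<request>.*?)"\s'
    r"(?P<status>\d+)\s"
    r"(?P<bytes>\S*)\s"
    r'"(?P<referer>.*?)"\s'
    r'"(?P<user_agent>.*?)"\s*'
)

Access = namedtuple('Access',
    ['host', 'identity', 'user', 'time', 'request',
     'status', 'bytes', 'referer', 'user_agent'])

# Point 2: the group names and the field names are the same set.
assert set(format_pat.groupindex) == set(Access._fields)

def access_iter(source_iter):
    for log in source_iter:
        for line in (l.rstrip() for l in log):
            match = format_pat.match(line)
            if match:
                yield Access(**match.groupdict())

def endless_log():
    """Hypothetical stand-in for a huge log: yields lines forever."""
    for n in itertools.count():
        yield (f'10.0.0.{n % 256} - - [01/Jan/2012:00:00:00 +0000] '
               f'"GET /page/{n} HTTP/1.1" 200 512 "-" "curl/7.21"\n')

# Point 3: only the lines actually consumed are generated and parsed.
first_three = list(itertools.islice(access_iter([endless_log()]), 3))
for a in first_three:
    print(a.host, a.request)  # 10.0.0.0 GET /page/0 HTTP/1.1, and so on
```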

A couple of years back, a sysadmin was trying to justify spending money on a log analyzer product.  I suggested they (at the very least) get an open source log analyzer.

I also suggested that they learn Python and save themselves the pain of working with a (potentially) complex tool. With this as a common library module, log analysis applications are remarkably easy to write.
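To show how short such an application can be, here is a sketch of a hypothetical `summarize` report. It counts hits per status code and totals bytes per host over the parsed records. The three log lines are invented for illustration. Apache writes `-` in the bytes field when no body was sent, so the sketch counts that case as zero.

```python
import re
from collections import defaultdict, namedtuple

format_pat = re.compile(
    r"(?P<host>[\d\.]+)\s"
    r"(?P<identity>\S*)\s"
    r"(?P<user>\S*)\s"
    r"\[(?P<time>.*?)\]\s"
    r'"(?P<request>.*?)"\s'
    r"(?P<status>\d+)\s"
    r"(?P<bytes>\S*)\s"
    r'"(?P<referer>.*?)"\s'
    r'"(?P<user_agent>.*?)"\s*'
)

Access = namedtuple('Access',
    ['host', 'identity', 'user', 'time', 'request',
     'status', 'bytes', 'referer', 'user_agent'])

def access_iter(source_iter):
    for log in source_iter:
        for line in (l.rstrip() for l in log):
            match = format_pat.match(line)
            if match:
                yield Access(**match.groupdict())

def summarize(accesses):
    """Hypothetical report: hits per status code and bytes sent per host."""
    hits_by_status = defaultdict(int)
    bytes_by_host = defaultdict(int)
    for a in accesses:
        hits_by_status[a.status] += 1
        # "-" means no body was sent; count it as zero bytes.
        bytes_by_host[a.host] += int(a.bytes) if a.bytes.isdigit() else 0
    return dict(hits_by_status), dict(bytes_by_host)

# Invented sample lines; any iterable of lines (e.g. an open file) works.
log = [
    '1.2.3.4 - - [01/Jan/2012:10:00:00 +0000] "GET / HTTP/1.1" 200 1000 "-" "Mozilla/5.0"',
    '1.2.3.4 - - [01/Jan/2012:10:00:05 +0000] "GET /missing HTTP/1.1" 404 200 "-" "Mozilla/5.0"',
    '5.6.7.8 - - [01/Jan/2012:10:01:00 +0000] "HEAD / HTTP/1.1" 304 - "-" "curl/7.21"',
]
statuses, host_bytes = summarize(access_iter([log]))
print(statuses)    # {'200': 1, '404': 1, '304': 1}
print(host_bytes)  # {'1.2.3.4': 1200, '5.6.7.8': 0}
```

Against real files, you might pass something like `access_iter(open(name) for name in glob.glob('access_log*'))`. The file pattern there is only a placeholder.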


Source: http://slott-softwarearchitect.blogspot.com/2012/01/apache-log-parsing.html
