Drupal Log Analysis Using the ELK Stack

Sure, logging is hard for lumberjacks, but why does it have to be so hard for developers? How does ELK stack up when it comes to easing the pain of your log analysis?

While most developers and DevOps teams will admit that logging is important, many will still avoid the task if possible. Although log files contain a wealth of valuable information and should therefore be the first place to look when troubleshooting errors and events, they are often opened only as a last resort.

The reason for this is simple: Log files are not easy. They're not easy to access, they're not easy to collect, and they're not easy to read. Often, they can't even be found to start with. These problems have only intensified over the past few years, with applications being built on top of distributed infrastructures and containerized architectures.

Drupal applications add another layer of complexity: they are complex creatures to begin with, and they offer developers only basic logging features. Drupal developers can define the message type and the severity level (for example, "emergency" or "debug") for logs and have the messages saved to the database. Drupal 8 also provides a logging class (that replaces Watchdog) to write custom logs to the database.

But for modern apps, querying the database for error messages and analyzing Drupal and PHP logs is not enough. There are web server and database logs to sift through as well, and in a normal-sized production environment, this means a ton of data.

No.

A more solid solution is required that will allow you to centralize all of the streams of log data being generated by the app, query this data to identify correlations and anomalies, and monitor the environment for events.

Enter the ELK Stack. The most popular and fastest-growing open source log analytics platform, ELK allows you to build a centralized logging system that can pull logs from as many sources as you define and then analyze and visualize the data.

To show an example of using ELK, this article will go through the steps of establishing a pipeline of logs from your Drupal application into the Logz.io ELK Stack. You can, if you like, use any instance of the stack to perform the exact same procedure.

My Environment

A few words on the environment I'm using for this tutorial. I'm using an AWS Ubuntu 14.04 instance and have installed Drupal 8 on top of the standard LAMP stack. For instructions on how to get this set up, I recommend reading this Cloud Academy post.

Note: You will need to install the GD extension because this is a minimum requirement for Drupal 8.

Preparing the Log Files

My first step is to prepare the log files that we want to track and analyze. In the case of a standard LAMP stack, this usually means web server logs, PHP error logs (which include Drupal errors as well), and MySQL logs.

PHP errors, such as undefined variables and unknown functions, are logged by default into the Apache error log file (/var/log/apache2/error.log), which is convenient in some cases. But to make our analysis work easier, it's better to separate the two log streams.

To do this, I'm going to access my 'php.ini' file and define a new path for PHP errors:

error_log=/var/log/php_errors.log


Next, I'm going to restart Apache and verify the change using phpinfo().
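As a complement to phpinfo(), the edited file itself can be checked before Apache is restarted. The following is a minimal sketch in Python, not part of the original setup: find_error_log is a hypothetical helper that scans php.ini-style text and reports the last uncommented error_log value, on the assumption that a later assignment overrides an earlier one. It does not handle trailing inline comments, and phpinfo() remains the authoritative view of the configuration PHP actually loaded.

```python
import re


def find_error_log(ini_text):
    """Hypothetical helper: return the last uncommented error_log value, or None."""
    value = None
    for line in ini_text.splitlines():
        stripped = line.strip()
        # php.ini treats lines that start with ';' as comments.
        if not stripped or stripped.startswith(";"):
            continue
        match = re.match(r"error_log\s*=\s*(.*)$", stripped)
        if match:
            # Drop surrounding whitespace and optional double quotes.
            value = match.group(1).strip().strip('"')
    return value


# Made-up excerpt of a php.ini file, used only for illustration.
sample = """
; error_log = syslog
log_errors = On
error_log=/var/log/php_errors.log
"""

print(find_error_log(sample))
```

In practice, the text would be read from whichever php.ini file your Apache PHP module loads; that path varies between installations, so it is left out here.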

Installing Filebeat

While there are numerous ways to forward data into ELK, I'm going to ship my log files using Filebeat, a log shipper created by Elastic that tails defined log files and sends the traced data to Logstash or Elasticsearch.

To install Filebeat from the repository, I'm going to first download and install the Public Signing Key:

$ curl https://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -


Next, I'm going to save the repository definition to /etc/apt/sources.list.d/beats.list:

$ echo "deb https://packages.elastic.co/beats/apt stable main" | sudo tee -a /etc/apt/sources.list.d/beats.list


Finally, I'm going to run apt-get update and install Filebeat:

$ sudo apt-get update && sudo apt-get install filebeat


Now, because Logz.io uses TLS as an added security layer, my next step before configuring the data pipeline is to download a certificate and move it to the correct location:

$ wget https://raw.githubusercontent.com/cloudflare/cfssl_trust/master/intermediate_ca/COMODORSADomainValidationSecureServerCA.crt

$ sudo mkdir -p /etc/pki/tls/certs

$ sudo cp COMODORSADomainValidationSecureServerCA.crt /etc/pki/tls/certs/


Configuring Filebeat

My next step is to configure Filebeat to track my log files and forward them to the Logz.io ELK Stack. To demonstrate this configuration, I'm going to show how to define tracking for my PHP and Apache log files. (The process is similar for MySQL logs as well.)

In the Filebeat configuration file at /etc/filebeat/filebeat.yml, I'm going to define a prospector for each type of log. I'm also going to add some Logz.io-specific fields (codec and user token) to each prospector.

The configuration is as follows:

################### Filebeat Configuration Example ############################

############################# Filebeat #####################################

filebeat:
    # List of prospectors to fetch data.
    prospectors:
    # This is a text lines files harvesting definition
    -
        paths:
            - /var/log/php_errors.log
        fields:
            logzio_codec: plain
            token: tWMKrePSAcfaBSTPKLZeEXGCeiVMpuHb
        fields_under_root: true
        ignore_older: 24h
        document_type: php
    -
        paths:
            - /var/log/apache2/*.log
        fields:
            logzio_codec: plain
            token: tWMKrePSAcfaBSTPKLZeEXGCeiVMpuHb
        fields_under_root: true
        ignore_older: 24h
        document_type: apache
    registry_file: /var/lib/filebeat/registry


In the Output section, I'm going to define the Logz.io Logstash host (listener.logz.io:5015) as the output destination for our logs and the location of the certificate used for authentication.

############################# Output ########################################

# Configure what outputs to use when sending the data collected by the beat.
output:
    logstash:
        # The Logstash hosts
        hosts: ["listener.logz.io:5015"]
        tls:
            # List of root certificates for HTTPS server verifications
            certificate_authorities: ['/etc/pki/tls/certs/COMODORSADomainValidationSecureServerCA.crt']


Now, if I were using the open-source ELK stack, I could ship directly to Elasticsearch or use my own Logstash instance. The configuration for either of these outputs, in this case, is straightforward:

output:
    logstash:
        hosts: ["localhost:5044"]
    elasticsearch:
        hosts: ["localhost:9200"]


Save your Filebeat configuration.

Beautifying the PHP Logs

Logstash, the component of the ELK Stack that is in charge of parsing the logs before forwarding them to Elasticsearch, can be configured to manipulate the data to make the logs more readable and easier to analyze (a.k.a., log "beautification" or "enhancement").

In this case, I'm going to use the grok plugin to parse the PHP logs. If you're using Logz.io, grokking is done by us. But if you're using the open-source ELK, you can simply apply the following configuration directly to your Logstash configuration file (/etc/logstash/conf.d/xxxx.conf):

if [type] == "php" {
    grok {
        match => [
            "message", "\[%{MONTHDAY:day}-%{MONTH:month}-%{YEAR:year} %{TIME:time} %{WORD:zone}\] PHP %{DATA:level}\:  %{GREEDYDATA:error}"
        ]
    }
    mutate {
        add_field => [ "timestamp", "%{year}-%{month}-%{day} %{time}" ]
        remove_field => [ "zone", "month", "day", "time" ,"year"]
    }
    date {
        match => [ "timestamp" , "yyyy-MMM-dd HH:mm:ss" ]
        remove_field => [ "timestamp" ]
    }
}
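To see what this filter extracts, it can help to reproduce the same parsing outside Logstash. The sketch below is a rough Python equivalent and an approximation only: the regular expression stands in for the grok pattern, parse_php_line is a hypothetical name, and the sample line is reconstructed from the fields of the sample document shown later in this article rather than copied from a real log file.

```python
import re
from datetime import datetime

# Approximation of the grok pattern above: date parts, time, zone, level, error.
PHP_LINE = re.compile(
    r"\[(?P<day>\d{1,2})-(?P<month>[A-Za-z]{3})-(?P<year>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) (?P<zone>\w+)\] "
    r"PHP (?P<level>.*?):  (?P<error>.*)"
)


def parse_php_line(line):
    """Hypothetical helper: return level, error and timestamp, or None if no match."""
    match = PHP_LINE.match(line)
    if match is None:
        return None
    parts = match.groupdict()
    # Mirrors the mutate + date steps: build "yyyy-MMM-dd HH:mm:ss" and parse it.
    # %b expects English month abbreviations under the default locale.
    stamp = "{year}-{month}-{day} {time}".format(**parts)
    return {
        "level": parts["level"],
        "error": parts["error"],
        "timestamp": datetime.strptime(stamp, "%Y-%b-%d %H:%M:%S"),
    }


# Assumed sample line (reconstructed, not taken from an actual log file).
line = ("[05-Jul-2016 11:09:39 UTC] PHP Notice:  Undefined variable: kernel "
        "in /var/www/html/index.php on line 19")
event = parse_php_line(line)
print(event["level"], event["timestamp"].isoformat())
```

Like the Logstash configuration, this sketch discards the zone after matching, so the resulting timestamp carries no time zone information.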


Verifying the Pipeline

It's time to make sure the log pipeline into ELK is working as expected.

First, make sure Filebeat is running:

$ cd /etc/init.d

$ ./filebeat status


And if not, enter:

$ sudo ./filebeat start


Next, open up Kibana (integrated into the Logz.io user interface). Apache logs and PHP errors will begin to show up in the main display area.

In this case, we're getting an undefined variable error that I have simulated by editing the 'index.php' file. Note that since I have other logs coming into my system from other data sources, I'm using the following Kibana query to search for the two log types we have defined in Filebeat:

type:php OR type:apache

[Image: searching for PHP or Apache Drupal logs in Kibana]
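If you run the open-source stack and want to issue the same search against Elasticsearch directly, the Kibana query string can be wrapped in a query_string query. The following Python sketch only builds the request body; build_type_query is a hypothetical helper, and sending the body to your cluster's search endpoint is deliberately left out because host and index names depend on your setup.

```python
import json


def build_type_query(types):
    """Hypothetical helper: build a query_string body matching any of the given types."""
    query = " OR ".join("type:{}".format(name) for name in types)
    return {"query": {"query_string": {"query": query}}}


body = build_type_query(["php", "apache"])
print(json.dumps(body, indent=2))
```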


Analyzing the Logs

To start making sense of the data being ingested and indexed by Elasticsearch, I'm going to select one of the messages in the main display area; this will give me an idea of what information is available.

Now, remember the different types that we defined for the Filebeat prospectors? To make the list of log messages more understandable, select the 'type,' 'response,' and 'level' fields from the list of mapped fields on the left. The 'type' field comes from the document_type set on each Filebeat prospector, while fields such as 'level' are extracted by the grok patterns applied in Logstash.

[Image: Drupal log messages in Kibana]


Open one of the messages and view the information that has been shipped into the system:

{
    "_index": "logz-dkdhmyttiiymjdammbltqliwlylpzwqb-160705_v1",
    "_type": "php",
    "_id": "AVW6v83dflTeqWTS7YdZ",
    "_score": null,
    "_source": {
        "level": "Notice",
        "@metadata": {
            "beat": "filebeat",
            "type": "php"
        },
        "source": "/var/log/php_errors.log",
        "message": "Undefined variable: kernel in /var/www/html/index.php on line 19",
        "type": "php",
        "tags": [
            "beats-5015"
        ],
        "@timestamp": "2016-07-05T11:09:39.000Z",
        "zone": "UTC",
        "beat": {
            "hostname": "ip-172-31-37-159",
            "name": "ip-172-31-37-159"
        },
        "logzio_code": "plain"
    },
    "fields": {
        "@timestamp": [
            1467716979000
        ]
    },
    "highlight": {
        "type": [
            "@kibana-highlighted-field@php@/kibana-highlighted-field@"
        ]
    },
    "sort": [
        1467716979000
    ]
}
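Note that the timestamp appears twice in this document: as an ISO 8601 string under "_source" and as epoch milliseconds under "fields" and "sort". A short Python sketch shows that the two values describe the same instant. The hit dictionary below is a trimmed copy of the document above, keeping only the keys needed for the comparison.

```python
from datetime import datetime, timezone

# Trimmed copy of the sample hit: only the two timestamp representations.
hit = {
    "_source": {"@timestamp": "2016-07-05T11:09:39.000Z"},
    "fields": {"@timestamp": [1467716979000]},
}

# Elasticsearch reports the date field as milliseconds since the Unix epoch.
epoch_ms = hit["fields"]["@timestamp"][0]
as_utc = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)

# Parse the ISO string from _source for comparison.
from_source = datetime.strptime(
    hit["_source"]["@timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ"
).replace(tzinfo=timezone.utc)

print(as_utc.isoformat())
print(as_utc == from_source)
```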


Visualizing the Logs

One of the advantages of using the ELK Stack is its ability to create visualizations on top of the data stored in Elasticsearch. This allows you to create monitoring dashboards that can be used to efficiently keep tabs on your environment.

As an example, I'm going to create a line chart that shows the different PHP and Drupal errors being logged over time.

Selecting the Visualize tab in Kibana, I'm going to pick the line chart visualization type from the selection of available visualizations. Then, I'm going to base the visualization on a new search and use the query 'type:php' to search for only PHP and Drupal events.

All that's left now is to configure the visualization. Easier said than done, right? The truth is that creating visualizations in Kibana can be complicated at times, and it usually takes some trial and error before you get the best results.

We're going to keep it simple. We're using a simple count aggregation for the Y-axis and a date histogram cross-referenced with the 'level' field.
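To make the idea concrete, here is a small Python sketch of what a count aggregation over a date histogram, split by 'level', computes. It is an illustration only, not how Kibana or Elasticsearch implement it: count_by_hour_and_level is a hypothetical helper, the one-hour bucket size is an assumption, and apart from the first event (taken from the sample document above) the events are made up.

```python
from collections import Counter
from datetime import datetime


def count_by_hour_and_level(events):
    """Hypothetical helper: count events per (hour bucket, level) pair."""
    counts = Counter()
    for timestamp, level in events:
        # Truncate to the start of the hour, like a date histogram with 1h interval.
        bucket = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(bucket, level)] += 1
    return counts


# Illustrative events; only the first one mirrors the sample document.
events = [
    (datetime(2016, 7, 5, 11, 9, 39), "Notice"),
    (datetime(2016, 7, 5, 11, 30, 0), "Notice"),
    (datetime(2016, 7, 5, 11, 45, 12), "Warning"),
    (datetime(2016, 7, 5, 12, 1, 5), "Notice"),
]

counts = count_by_hour_and_level(events)
for (bucket, level), total in sorted(counts.items()):
    print(bucket.isoformat(), level, total)
```

Each (bucket, level) pair corresponds to one point on the chart: the bucket gives the position on the X-axis, the level selects the line, and the count is the Y value.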

The configuration for our line chart visualization looks as follows:

[Image: line chart visualization configuration for Drupal log analysis]


Hit the green Play button to see a preview of the visualization:

[Image: preview of the Drupal log analysis line chart]


A common visualization for web application environments is a map of web server requests. This gives you a general picture of where requests are coming from (and in this case, from where yours truly is writing this post).

Selecting the Tile Map visualization this time, I'm going to change my Kibana query to:

type:apache


Then, the configuration is simple:

[Image: tile map visualization configuration]


Of course, these are merely basic demonstrations of how to visualize your log data in Kibana and how ELK can be used to analyze and monitor Drupal applications. The sky's the limit. You can build much more complex visualizations and even create your own custom Kibana visualization type if you like.

Once you have a series of visualizations for monitoring your Drupal app, you can collect them in a dashboard, giving you a general overview of your environment.

Published at DZone with permission of Daniel Berman, DZone MVB. See the original article here.
