
Logging at Scale With Node.js


Logging and using your insights effectively is important. Learn how to simply set this up for Node.js using the ELK stack.


Paying attention to your logging strategy from the beginning of your project is a good idea; otherwise, you might run into problems. This might sound like an empty warning, but take a step back from your code and think long and hard about how you're logging right now and what you're doing with that information.

If you don't really have a use for it, then you might as well stop logging, but if you're actually getting insights from it, either when troubleshooting or through some sort of analytics tool, then make sure you can keep doing so after you've scaled your architecture up (or down). Can you trust that you'll still be able to process your logging data under an elastic scaling architecture?

Our end goal when it comes to dealing with logs is depicted in the diagram below, where you can see multiple instances of several different services sending their logging messages to a centralized system. This system can either be an in-house cluster (yes, it should be a cluster, or something capable of scaling like one, because it will need to keep up with your architecture) or a third-party service (such as Splunk, Loggly, or Logz.io, to name just a few).


An example of a centralized logging architecture, where multiple instances of different services are sending their logging information into a single system.

The way these services send their data to their destination will vary depending on the nature of that system, but usually standard ways will be provided (the most common ones are either RESTful APIs or agents you can install on your servers and configure to send the data to a remote location by themselves).

There are two very common mistakes developers make when logging in new systems. They aren't necessarily hard to fix, but they require attention when scaling, and they'll need to be addressed if we want to get anywhere near the ideal scenario described above.

You’re Just Logging Into stdout and stderr

And what makes it even worse, you’re not wrapping the output function/method of your language of choice into a construct under your control. In other words, and in the Node.js universe, you’re logging using console.log and console.error.

This is great for small projects and quick PoCs (proofs of concept), but if you're interested in getting anything out of your logs, then you need to do something about it. As you might've guessed by now, both stdout and stderr are local to each server instance, so if you start scaling your application onto multiple servers, you'll have to deal with distributed logs that aren't being saved anywhere (or maybe they are, depending on your setup).

Lucky for you, there are several ways to solve this, again depending on where you're currently standing. For instance, if you're using PM2 or something like it, you'll get access to the logs for all instances of your process within the same server (see chapter 3 for more details on PM2) simply by running the following command:

 $ pm2 logs 

This will work, even if you’re not saving the data anywhere, since PM2 will catch all your output and save it automatically, just in case. But that’ll only get you halfway since we also need to send those log files into a centralized location.
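
To give you an idea of what that looks like in practice, here is a quick sketch (the app.js entry point and the yourapp process name are just assumed placeholders). By default, PM2 writes the captured stdout and stderr of each process to files under ~/.pm2/logs, named after the process:

 $ pm2 start app.js --name yourapp 
 $ ls ~/.pm2/logs/ 

The exact file names can vary a bit with your PM2 version and whether you're running in cluster mode, which is why the Filebeat configuration further down uses a wildcard to match them.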

Because there are so many options out there, and so many variations of similar solutions, I'm just going to cover a simple one, assuming you have an ELK (Elasticsearch, Logstash, and Kibana) cluster configured and ready to receive logs somewhere in your architecture. This will act as the centralized logging and analytics system from before.

So what you want to do in this situation is configure something that will ship the log files stored by PM2 into Logstash, which, in turn, will apply any transformations you might need to the data and then send it to Elasticsearch, where it will be indexed for your consumption through Kibana. Simple enough, isn't it?

This might sound like a lot at first glance, especially if this is your first time dealing with something like this, but it is a scalable way of going about it. If you do it right, you gain the ability to withstand failures and downtime on your Elasticsearch cluster, you get back-pressure on your logging pipeline (making sure you're not overwhelming your analytics platform), and so on.

To achieve this, you'll install and configure Filebeat on all your servers (all the ones that need to send data out, anyway). Filebeat is essentially a log shipper that follows a standard protocol called Beats. This shipper (and its associated protocol) is the result of several years of iteration by the team at Elastic to get the best and most lightweight log shipper possible.

To install it, you can download it from the official website, and to configure it, you can edit the filebeat.yml file (located in its installation folder; in my case, that was /etc/filebeat), making it look like this:

filebeat.prospectors:
- input_type: log
  paths:
    - [YOURHOMEFOLDER]/.pm2/logs/yourapp*.log
  document_type: yourapp-name
  fields_under_root: true

output.logstash:
  hosts: ["LOGSTASH-HOST:5044"]

Configuration content to make Filebeat send the logged data into Logstash.

That configuration will pull the content of the log files for your app (stored in their default location by PM2) and ship it into a Logstash server. You need to replace the placeholders YOURHOMEFOLDER and LOGSTASH-HOST with the actual values to make it work.

With that, you can start the shipper using this command:

 $ sudo filebeat -e -c /etc/filebeat/filebeat.yml 

TIP: I would recommend making sure that command runs every time your server starts; otherwise, you'll stop sending data after the first server reboot.
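
One way to do that, assuming you installed Filebeat from the official deb or rpm package (which ships with a service definition), is to let the init system manage it instead of launching the binary by hand:

 $ sudo systemctl enable filebeat 
 $ sudo systemctl start filebeat 

If you installed it some other way, a process manager or an init script of your choice will do the same job of bringing the shipper back up after a reboot.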

With that, you're ready to retrieve your log files (even if you were crazy enough to simply log with console.log). But you still need to configure Logstash to make sure you can parse these logs, transform them (if needed), and then index them into Elasticsearch. So stop smiling, and keep reading.

On your Logstash server, assuming you've already installed it, you need to configure it to use the Beats input plugin and output that data into Elasticsearch. In other words, you need to create a configuration file that looks like this:

input {
  beats {
    port => 5044
  }
}

output {
  elasticsearch { hosts => ["ELASTIC-HOST:9200"] }
}

Note that the configuration above will only receive and index data; it will not transform anything (transformation being one of the key benefits of Logstash). So if you want to do some extra tweaking of your logs before indexing them, I recommend looking at the full Logstash documentation. Also, make sure the port configured for the Beats plugin matches the port specified in the Filebeat config file.
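
Just to give you an idea of what such tweaking could look like, here is a sketch of the same configuration with a filter section added. It assumes your application writes each log line as a JSON object and uses Logstash's json and mutate filter plugins; adjust it to whatever format your logs actually have:

input {
  beats {
    port => 5044
  }
}

filter {
  # Parse the raw log line as JSON (only useful if your app logs structured JSON).
  json {
    source => "message"
  }
  # Add an extra field to every event before indexing it (the name and value are arbitrary).
  mutate {
    add_field => { "pipeline" => "filebeat-logstash" }
  }
}

output {
  elasticsearch { hosts => ["ELASTIC-HOST:9200"] }
}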

You’re now set. Congratulations, you’ve managed to avoid a major problem by using the right set of tools.

Note: Even if you have some form of workaround in place to centralize your logs, using console.log and console.error for logging purposes is far from ideal. Creating a simple wrapper around these methods (at the very least) will grant you more control over the log formats, extra information you might want to add, and so on. Just say no to using console.log and console.error, I beg you.
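
To make that point concrete, here is a minimal sketch of such a wrapper (the logger.js name, the SERVICE_NAME variable, and the JSON line format are all just assumptions; shape it to whatever your pipeline needs):

// logger.js - a thin wrapper around console.log/console.error
const SERVICE_NAME = process.env.SERVICE_NAME || 'yourapp-name';

function write(level, message, meta) {
  // One JSON object per line: easy for Logstash (or anything else) to parse later on.
  const entry = Object.assign({
    timestamp: new Date().toISOString(),
    service: SERVICE_NAME,
    level: level,
    message: message
  }, meta);

  const line = JSON.stringify(entry);
  // Still writing to stdout/stderr, so PM2 keeps capturing everything as before.
  if (level === 'error') {
    console.error(line);
  } else {
    console.log(line);
  }
}

module.exports = {
  info: (message, meta) => write('info', message, meta),
  error: (message, meta) => write('error', message, meta)
};

Then, anywhere in your code, something like const logger = require('./logger'); logger.info('user created', { userId: 42 }); gives you one structured, parseable line per event instead of free-form text.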

If, on the other hand, you weren't using anything like PM2 to catch the output of your process and save it into a file, you're out of luck. You've lost your logs to the black hole that lives inside every server, and there is no way for you to retrieve them. So don't do it like this.

You’re Logging Into a Single File

Now, this is a better scenario than the previous one, even though it's still far from ideal. You're now correctly wrapping your output function/method with something you can control (i.e., you have your own logger). You're even saving that information into a log file, which is great, but because you're now in control of the what, where, and when of your logging, you need to consider other things, like:

  • File size: How much space can you allocate to your logs? Are you sure you're not filling up your hard disk, possibly causing your application to fail due to lack of space?

  • History: How much history do you want to keep in your file? This will depend on your application's logging needs. If you need to keep a lot of debugging information in your files, then a lot of history is not recommended, since you'd end up with huge files. If, on the other hand, you're not logging a lot of events, you might as well keep as much as you can (always taking the previous point into account).

You could potentially take care of both from inside your own code by adding extra logic to your logger to make sure you keep the size and history of your logs in check. You can also use an external tool, such as the classic logrotate command-line utility, which is already part of most (if not all) Linux distributions.

In order to use this utility to solve your problems, you'll have to create a configuration file, something like the one below, and then execute logrotate, as explained further down.

/your/app/path/logfile.log {
  compress
  rotate 5
  size 300M
}

With that configuration, your log file will be rotated whenever it reaches 300 MB in size, and after the fifth rotation, the oldest file will be removed (in other words, history is kept for up to five rotations). You can now execute logrotate, specifying the path to the new configuration file, like so:

 $ logrotate /path/to/your-new-configuration-file.conf 
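
Keep in mind that logrotate only does its work when it's actually executed, so you'll want it to run periodically. One option, assuming a fairly standard Linux setup, is to place your rules under /etc/logrotate.d/, where the distribution's daily logrotate run will pick them up automatically:

 $ sudo cp your-new-configuration-file.conf /etc/logrotate.d/yourapp 

Alternatively, you can schedule the exact command above yourself with cron.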

This is definitely the preferred way of handling this logic, instead of having to write it directly into your own logger's code. But you're not there yet: you now have your own log file, and you're properly making sure it doesn't grow out of hand, but you still need to send its content to a centralized location. You can look at the previous section to understand how to configure Filebeat and Logstash; the sketch below shows the one part of that setup that changes.
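
For reference, a sketch of that Filebeat prospector pointed at this single, rotated file (the path is the same placeholder used in the logrotate example, and the rest mirrors the earlier configuration):

filebeat.prospectors:
- input_type: log
  paths:
    - /your/app/path/logfile.log
  document_type: yourapp-name
  fields_under_root: true

output.logstash:
  hosts: ["LOGSTASH-HOST:5044"]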

With that last bit, you’re ready to move on with your development, because now you have, once again, a stable logging system inside your platform.

I hope you enjoyed reading this article; it is but a simple extract from my next book about scaling your Node.js applications. So if you found it useful or interesting, remember to visit https://www.fernandodoglio.com/ for more information on the subject.


Topics:
node.js, javascript, logging, scale, performance, tutorial
