Journald is a log data storage and collection system. Here's an overview of Journald, and why it's great, with a glance at Syslog issues and Journald improvements.
Journald is a system service for collecting and storing log data, introduced with systemd. It tries to make it easier for system administrators to find interesting and relevant information among an ever-increasing amount of log messages.
In keeping with this goal, one of the main changes in journald was to replace simple plain text log files with a special file format optimized for log messages. This file format allows system administrators to access relevant messages more efficiently. It also brings some of the power of database-driven centralized logging implementations to individual systems.
At the same time, journald does not include a well-defined remote logging implementation. Instead, it relies on existing syslog implementations to relay messages to a central log host, thereby losing most of the benefits of the new system.
Syslog Has Several Key Problems
Syslog is the standard solution for logging on UNIX. The term describes both a protocol (RFC 5424 ff) and a C API (syslog(3)), but is also commonly used for the implementations of both (such as rsyslog or syslog-ng).
In the normal configuration, these syslog implementations write log messages to plain text files. While UNIX has a lot of great tools to work with plain text files, a lack of structure is the source of pretty much all of syslog’s problems.
Finding information in large plain text files with lots of unrelated information can be difficult. Syslog implementations generally allow administrators to split up their files according to pre-defined topics, but they then end up with many smaller files and no easy way to correlate information between files.
Additionally, the syslog protocol does not provide a means of separating messages by application-defined targets. For example, web servers can emit log messages per virtual host. Because syslog cannot deal with such meta information, the web servers generally write their own access logs so that the main system log is not flooded with web server status messages. This creates additional sources of possible log messages an admin has to keep in mind, with additional places where these messages are configured.
Simple plain text files also require log rotation to prevent them from becoming too large. In log rotation, existing log files are renamed and compressed. Any programs that watch syslog messages for problems have to deal with this somehow. One common tool for this, the logcheck package, runs in a cron job and uses some heuristics to figure out when a log file was rotated and when to restart parsing a file. It is not unlikely that some log messages are lost in this process. Because of log rotation problems, some programs include the ability to directly notify admins of problems by email instead of using logging.
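To make the rotation problem concrete, here is a minimal sketch of the kind of heuristic a log watcher might use. It is not logcheck's actual implementation; the function name `read_new_lines` and the `state` dictionary are assumptions made up for this illustration. It remembers the inode and byte offset from the previous run and starts over when either suggests the file was replaced.

```python
import os

def read_new_lines(path, state):
    """Return lines appended since the last call.

    `state` is a dict that carries the inode and byte offset between calls.
    If the inode changed or the file shrank, assume rotation and start at 0.
    """
    st = os.stat(path)
    rotated = state.get("inode") != st.st_ino or st.st_size < state.get("offset", 0)
    if rotated:
        # Anything written to the old file after the previous call is never
        # seen by this reader: this is where messages get lost.
        state["offset"] = 0
    with open(path, "rb") as f:
        f.seek(state["offset"])
        data = f.read()
        state["offset"] = f.tell()
    state["inode"] = st.st_ino
    return data.decode("utf-8", errors="replace").splitlines()
```

Even this small sketch has a blind spot: lines written between the last read and the rename end up in the rotated file and are skipped.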
There are a few more problems with plain text files. Because each message in a log file is terminated by a newline, a log message cannot itself contain newlines. This makes it hard for programs to emit multi-line information such as backtraces when an error occurs, and log parsing software must often do a lot of work to combine log messages spread over multiple lines.
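The multi-line problem is easy to reproduce. The following sketch writes a Python backtrace into a newline-delimited log and then reads it back line by line, the way a line-oriented tool would. The record layout, host name, and program tag are illustration values, not the output of a real syslog daemon.

```python
import traceback

def write_record(message):
    # Plain text logs hold one record per line: the newline is the record separator.
    # The timestamp, host name, and program tag are made-up illustration values.
    return f"Jan  1 00:00:00 host app[123]: {message}\n"

try:
    1 / 0
except ZeroDivisionError:
    backtrace = traceback.format_exc().rstrip("\n")

log_text = write_record("request failed") + write_record(backtrace)

# A line-oriented reader sees every line of the backtrace as a separate record.
records = log_text.splitlines()

# Only the first backtrace line carries the timestamp and program tag;
# the rest look like unrelated lines that a parser has to stitch back together.
continuation_lines = [r for r in records if not r.startswith("Jan")]
```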
Journald Makes Improvements by Adding Structure to Log Files
Journald tries to address these problems mainly by replacing plain text files with a more structured format. At the same time, it retains full syslog compatibility by providing the same API in C, supporting the same protocol, and also forwarding plain-text versions of messages to an existing syslog implementation.
The format, as well as the journald API, allows for structured data. That is, log messages do not consist only of a fixed list of fields with the main log message in a single free-form body; applications can also define their own fields for a message. The format allows for quick access to and retrieval of messages by all fields. It also does away with log rotation: the space-optimized format does not require renaming files to archive entries, and journald automatically limits the maximum size of the journal on disk. This removes a lot of the difficulties programs face when dealing with log files.
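As a rough illustration of what "structured" means here, the sketch below serializes one entry the way systemd's documentation describes the native journal protocol: single-line values as `KEY=value`, and values containing newlines as the field name followed by a 64-bit little-endian length and the raw data. The function name `encode_entry` and the `VHOST` field are assumptions for this example, and actually sending the payload to journald's socket is left out.

```python
import struct

def encode_entry(fields):
    """Serialize a dict of journal fields into one datagram payload (bytes)."""
    out = bytearray()
    for name, value in fields.items():
        key = name.encode("ascii")
        data = value if isinstance(value, bytes) else str(value).encode("utf-8")
        if b"\n" in data:
            # Multi-line or binary value: name, newline, 64-bit little-endian
            # length, the data itself, and a trailing newline.
            out += key + b"\n" + struct.pack("<Q", len(data)) + data + b"\n"
        else:
            out += key + b"=" + data + b"\n"
    return bytes(out)

entry = encode_entry({
    "MESSAGE": "Payment failed\nretrying in 5s",  # newline survives intact
    "PRIORITY": "3",
    "VHOST": "shop.example.com",  # hypothetical application-defined field
})
```

In practice an application would normally use the C API or a language binding rather than building this by hand. The point is only that newlines and extra fields are part of the format, not something bolted on.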
As this structured file format does not work well with standard UNIX tools that are optimized for plain text, there is a command line tool that can query these files, journalctl(1). This program utilizes the journal format to give very fast access to entries filtered by date, emitting program, program PID, UID, service, or other elements. This makes it possible, for example, to use systemctl for a quick status check showing the last few log entries of a service, as the journal files give fast access to entries by specific services. It is even possible to follow new journal entries of the specified type using the -f command line option, much like tail -f on a log file would. Additionally, journalctl can not only access the log files of the current system but also backups in single files or directories of other systems.
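For programs that want to consume the journal, one option is to call journalctl and ask for JSON output. The sketch below is one way to do that, assuming journalctl is on the PATH. The helper names are made up for this example, while the options themselves (`--unit`, `--since`, `--lines`, `--follow`, `--output=json`, `--no-pager`) are standard journalctl options.

```python
import json
import subprocess

def journalctl_command(unit=None, since=None, lines=None, follow=False, matches=()):
    """Build a journalctl argument list that emits one JSON object per entry."""
    cmd = ["journalctl", "--no-pager", "--output=json"]
    if unit:
        cmd.append(f"--unit={unit}")
    if since:
        cmd.append(f"--since={since}")
    if lines is not None:
        cmd.append(f"--lines={lines}")
    if follow:
        cmd.append("--follow")  # keeps running, like tail -f
    cmd += list(matches)  # FIELD=value matches such as "_PID=1234"
    return cmd

def parse_entries(output):
    """With --output=json, each non-empty line is one journal entry."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]

def read_entries(unit, lines=10):
    """Run a finite query (no --follow) and return the parsed entries."""
    result = subprocess.run(
        journalctl_command(unit=unit, lines=lines),
        capture_output=True, text=True, check=True,
    )
    return parse_entries(result.stdout)
```

A call such as `read_entries("ssh.service", lines=5)` would return the last five entries of that unit as dictionaries, with fields like `MESSAGE` and `_PID` available by name instead of by regular expression.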
In short, the log file format of journald makes it much easier for programs to retrieve only the information they want to know about, in a structured way, with the ability to easily follow the stream of new log messages in real time.
Remote Logging Issues Limit Journald Usefulness in Modern Computing Infrastructures
There is a big but: modern computing infrastructure involves so many systems that reading logs on individual machines becomes impractical. Centralized logging, where log messages from different systems are sent to a central log host and usually stored in a database, is increasingly becoming the standard way of logging. These log hosts address many of the same syslog issues that journald does, providing quick access to log messages by certain criteria, following new messages to generate reports, etc.
As these centralized logging systems utilize text-based syslog daemons, they have to extract application-specific fields from the single-line plain text log message using heuristics and tailored regular expressions. Journald allows applications to send key-value fields that the centralized systems could use directly instead of relying on these heuristics. Sadly, journald does not come with a usable remote logging solution. The program systemd-journal-remote is more of a proof-of-concept than an actually useful tool, lacking good authentication among other things. Third-party packages such as journald-forwarder send journal logs to Loggly directly but are very new and experimental. As such, remote logging still has to go through existing syslog implementations and the syslog protocol and therefore cannot make use of many of the additional features of journald, in particular the structured data.
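The "heuristics and tailored regular expressions" mentioned above typically look something like the following sketch. The line layout is the traditional "timestamp host program[pid]: message" form, and the login-failure pattern, host name, and program name are hypothetical; a real central log host needs one such pattern per message shape it cares about.

```python
import re

# Traditional "timestamp host program[pid]: message" layout. Real-world lines
# vary, which is exactly why this counts as a heuristic.
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)

# Application-specific pattern for one hypothetical message shape.
LOGIN_FAILURE = re.compile(r"login failed for user (?P<user>\S+) from (?P<ip>\S+)")

def extract_fields(line):
    """Recover fields from a plain text line; returns None if the line does not parse."""
    m = SYSLOG_LINE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    detail = LOGIN_FAILURE.search(fields["message"])
    if detail:
        fields.update(detail.groupdict())
    return fields
```

If the application rewords its message, the second pattern silently stops matching and the `user` and `ip` fields disappear. Key-value fields sent through journald would not have that failure mode.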
While you can send structured data through syslog using the native format in RFC 5424, little general support exists for this capability. Current releases of rsyslog provide the module imjournal to import some of the journal's structured fields into syslog, but this only uses a partial mapping of fields where available and does not support application-defined fields. Performance also seems to be problematic enough that rsyslog actively discourages using this module.
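For reference, this is roughly what structured data looks like on the wire in RFC 5424. The sketch formats a single message with one structured data element; the function names, host, application, and field names are assumptions for illustration, and `32473` is the enterprise number reserved for documentation examples.

```python
def escape_param(value):
    # RFC 5424 requires '\', '"' and ']' to be backslash-escaped in a PARAM-VALUE.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")

def format_rfc5424(facility, severity, timestamp, host, app, procid, msgid,
                   sd_id, params, msg):
    """Format one RFC 5424 message with at most one structured data element."""
    pri = facility * 8 + severity  # PRI combines facility and severity
    sd = " ".join(f'{k}="{escape_param(v)}"' for k, v in params.items())
    element = f"[{sd_id} {sd}]" if params else "-"  # "-" means no structured data
    return f"<{pri}>1 {timestamp} {host} {app} {procid} {msgid} {element} {msg}"

line = format_rfc5424(
    facility=16, severity=3,              # local0, error
    timestamp="2024-01-01T00:00:00Z",
    host="web01", app="shopd", procid="4242", msgid="-",
    sd_id="shop@32473",                   # hypothetical SD-ID
    params={"vhost": "shop.example.com"},
    msg="Payment failed",
)
```

The format can carry the fields, but both the sending daemon and the receiving log host have to be configured to preserve and understand them, which is where the lack of general support shows.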
An alternative would be to use a text format like JSON to encode fields, but there are no converters available to do this on the journal or syslog level, so it requires full application support, which is not easy to manage.
Even though some key benefits of journald are lost, its tight integration with systemd allows the journal to include some messages that syslog alone would miss, for example early-boot messages. Thus journald provides some benefits even in a full remote logging setup.
Journald replaces the plain text files of syslog with a binary format that:
- Allows for log messages with multiple fields and multi-line text
- Stores these messages in a space-efficient way that does not require renaming files for maintenance
- Gives fast access to messages given specific criteria, much like a database would
At the same time, it retains full syslog compatibility by providing the same API and also forwarding messages to existing syslog implementations.
For modern systems where centralized logging is used, though, journald relies on its integration with existing syslog implementations to route messages to the central log host. By doing so, it loses some of its key benefits.
This article was written by Jorgen Schäfer.
Published at DZone with permission of Karen Sowa. See the original article here.