Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Elastic Stack 6: What You Need to Know

DZone's Guide to

Elastic Stack 6: What You Need to Know

As with any major release of the stack, there are quite a large number of changes in Elastic Stack 6.0. Read on to get the scoop!

· Big Data Zone ·
Free Resource

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

Elastic Stack 6 was released last November, and now's a good time as any to evaluate whether to upgrade. To help you guys make that call, we are going to take a look at some of the major changes included in the different components in the stack and review the main breaking changes.

The main changes to Elasticsearch are designed to boost performance, improve reliability, and make recovery easier. The most significant changes in Logstash are focused on allowing users to run multiple self-contained pipelines on the same JVM. There are no major changes to Kibana, but a relatively large amount of minor usability improvements were added. With the exception of the addition of Auditbeat, the main changes made to Beats are focused on improving the performance of and enhancing existing log shippers.

Let's take a closer look.

Elasticsearch 6

Changes to Elasticsearch are mostly internal and shouldn't require most organizations to alter how they configure or administer the Elasticsearch, with the big exception being the change to mapping types.

Sparse Doc Values

A sparse values situation (when documents do not have values for each of the fields in our indices) results in the use of a large amount of disk space and file-system cache. The change to Lucene 7 allows Elasticsearch to now support sparse doc values, a new encoding format that reduces disk space and improves query throughput.

Index Sorting

Lucene 7 also brings the ability to specify a sort order to indices. This boosts performance by allowing for the sorting of indices during re-indexing (while documents are written) instead of when documents are read. The indices are written to disk in the order of the specified sort.

Specifying a sort of indices means that a search can terminate when it has found the documents requested by the query. For example, if the sort is alphabetical, then a search for entries beginning with "E" can quit when it reaches "F" because all items beginning with E have been read.

No More Mapping Types

This is one of the most talked-about changes in Elasticsearch 6, and a somewhat controversial one at that.

In this version, indices can only have one mapping type. Indices created in 5.x with multiple mapping types will continue to function as before. The plan is to remove mapping types completely in version 7.

The main reason for this move is to simplify the understanding and usage of the underlying data structure in Elasticsearch. Comparisons to RDBMS databases have led to a faulty understanding that types can be compared to tables. This has led in turn, to an expectation for fields to be independent across types whereas they must be of the same field type.

While this change fundamentally changes the way we index data, and as such has received a decent amount of criticism in the community, it also promises to speed searches by forcing users to use indices in a fashion tailored to the underlying database structure. The common existing practice of treating indices like tables and types like tables is suboptimal. Enforcing a single type per index should provide for significant performance increases.

Better Shard Recovery

A new feature called Sequence IDs promises to guarantee more successful and efficient shard recovery.

Every index, update, and delete operation receives an ID that is logged in the primary shard's transaction log. A replica can now refer to the operations recorded in this log and use them to update itself without needing to copy all the files, thus making recovery much faster. You'll be able to configure how long to keep these transaction logs.

Replicas can run unacknowledged and different operations — meaning that in case of a primary shard failing, the replicas will be able to sync with the new primary shard without waiting for the next recovery.

Upgrades

Updating to the new Elasticsearch version is made easier with a series of upgrade improvements that aim at tackling some of the traditional hurdles facing upgrade procedures.

A new rolling restart feature negates the need for a full cluster restart and thus minimizes downtime. Elasticsearch 5.x indices will be able to be searched using cross-cluster searching: a new approach to cross-cluster operations that replaces the traditional tribe-node based approach. Deprecation logs have been reinforced with important info on breaking changes. And if you're an X-Pack user, you will be able to manage the upgrade via the UI.

Logstash 6

Logstash was originally intended to handle one type of event per instance, and prior to this version, each Logstash instance supported only a single event pipeline. Users can circumvent this restriction using conditional statements in the configuration, which often leads to a new set of problems.

Logstash now supports native support for multiple pipelines. These pipelines are defined in a pipelines.yml file, which is loaded by default. If Logstash is started with the -r flag, it will periodically reread the pipelines.yml file looking for changes, making dynamic multiple pipelines possible for a single Logstash instance.

Each pipeline's independent configuration means that each pipeline can be assigned different levels of resources. A single Logstash instance could, for example, have multiple pipelines with a single worker thread and a high-volume pipeline with ten worker threads. Combining this with Logstash's ability to periodically reload configurations means dynamic reconfiguration without having to tear the entire application down to meet changing needs.

X-Pack users will be able to manage multiple pipelines within Kibana. This solution uses Elasticsearch to store pipeline configurations and allows for on-the-fly reconfiguration of Logstash pipelines.

Another noteworthy addition to Logstash is a new conversion tool to help those using Elasticsearch Ingest Nodes to begin using Logstash. The tool allows users to enter their Ingest Node configuration and outputs a Logstash pipeline configuration.

Last but not least, a new pipeline viewer now allows users to monitor and analyze Logstash pipelines in Kibana. Pipelines are displayed on a graph where each component is accompanied by relevant metrics. I explored the pipeline viewer in this article.

Kibana 6

As mentioned in the introduction to this article, there are no major changes to the stack's UI, but there is a nice list of usability improvements that users will most likely enjoy.

The main changes to the UI include a new CSV export option, new colors for better contrast, and improved screen reading and keyboard navigation.

Dashboarding in Kibana has two new features. A new full-screen mode was added when viewing dashboards to allow users to enjoy all the Kibana goodness. In addition, the new dashboard only mode enables administrators to share dashboards safely. Using a new kibana_dashboard_only_user role, specific users can be given limited access to Kibana with read-only data visibility and with no ability to modify or delete any of the dashboards.

Also of note is a new querying language called Kuery that seeks to address the issues of previous search mechanisms and that is meant to ultimately simplify querying and filtering in Kibana. Disabled by default, the new language supports all the query types that are supported by the Elasticsearch DSL, including those not supported by Lucene query syntax (such as aggregations and visualizations) — directly in the query language.

Beats 6

Auditbeat is the only new shipper released in this version and is basically a native and easy-to-use auditing tool for the Elastic Stack. I tried the beta back in September and am pretty sure this will become one of the more popular beats.

Most of the other changes center around improving support and performance of existing beats, mostly Filebeat and Metricbeat.

Filebeat and Metricbeat have a tighter integration with Docker and Kubernetes now with new processors that add metadata to the logs collected from these environments, such as details on the container name, image, labels, pod name, and so forth. A new Kubernetes module was also added to Metricbeat that gives insights into your Kubernetes deployment.

Additional changes to Filebeat and Metricbeat include deployment manifests for easier setup in Kubernetes and a new modules.d directory that contains configuration files for the different supported modules.

A new set of commands for Beats allows users to list, enable, and disable modules, export configurations and the Elasticsearch mapping template, and test Logstash or Elasticsearch connectivity.

Last but not least, internal changes were introduced in the Beats pipeline architecture to improve performance, including a transition to an asynchronous pipeline that allows other processes to continue even as one hangs in a wait state. This also made the Filebeat internal spooler obsolete. For Metricbeat specifically, polling frequency was reduced and sharding behavior changed; that, coupled with the changes made to Elasticsearch, results in less storage used.

What Will Break?

We're going to end this piece with a summary of the main gotchas that need to be considered before upgrading. Keep in mind this is a partial list of breaking changes only. As always, we recommend you read the official documentation carefully and test the upgrade process.

Elasticsearch

  • As mentioned above, the main breaking change in this version is the gradual removal of mapping types from indices, so Elasticsearch 6.0 will be able to read indices created in version 5.0 or above only.
  • Elasticsearch 6.0 requires a reindexing before full functionality can be achieved; this is because of a number of changes to the Elasticsearch indices. These are accompanied by changes to the CAT, Document, and Java APIs and the Search and Query DSL, as well as REST changes. The result is a number of key changes that affect Elasticsearch deeply, altering everything from the structure of queries and scripts to how internal components communicate.

A full list of the breaking changes is available here.

Logstash

Changes in Logstash 6 involve several breaking changes, including a number of changes to the input and output options. There are also numerous plugin changes and a change to the config.reload.interval configuration option, which now uses time value strings (5m, 10s, etc.) instead of millisecond time values.

A full list of the breaking changes is available here.

Kibana

To migrate existing Kibana installations to Kibana 6.0, the Kibana index needs to be re-indexed. A full list of the breaking changes is available here.

Beats

  • The main breaking change in Beats 6 is the removal of the Filebeat internal spooler, as described above. This results in the removal of some of the configuration options and the addition of others.
  • Following the changes made to the pipeline architecture, there is no longer an option to enable two outputs at the same time (you can use multiple outputs in Logstash instead or run multiple instances of your beat shipper).
  • If you're outputting to Logstash from your beat, you now need to include a version in the Logstash index setting.

A full list of the breaking changes is available here.

Summing It Up

As with any major release of the stack, there are quite a large number of changes in Elastic Stack 6.0. Many of these changes stem from the upgrade to the Lucene 7 database engine, but just as many are part of a general push towards increased efficiency and performance.

Elastic Stack 6.0 also brings a number of important security changes and enhancements and sets the stage for many more. This follows the needs of users deploying Elastic Stack in production environments, where complex security requirements are increasingly the standard.

In many ways, version 6.0 is a transitionary release. It is a midpoint between traditional deployments of the stack that clung to concepts of data organization and production deployments belonging to relational database designs and more modern approaches.

That a major version of the stack comes with a need to re-index, changes to the index structure, and a number of configuration changes to various plugins should come as no surprise. Migration from previous versions will need to be planned carefully and, above all, tested.

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.

Topics:
big data ,elasticsearch ,elastic stack 6 ,logstash ,kibana ,jvm ,data visualization ,data analytics

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}