Over a million developers have joined DZone.

Performing Complex Searches on Package Inspector Data

DZone's Guide to

Performing Complex Searches on Package Inspector Data

Learn how to run PQL queries to quickly see which nodes in your infrastructure may be vulnerable to a given CVE and be proactive about keeping your infrastructure secure.

· Big Data Zone ·
Free Resource

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

Identifying which nodes in your estate are potentially vulnerable to a newly released CVE (common vulnerability or exposure) is often the first step in helping to keep your infrastructure secure. A useful new tool for sysadmins or SREs (site reliability engineers) seeking this type of situational awareness is Puppet Enterprise’s Package Inspector. It allows you to see which versions of any specific package are installed on your systems, whether Puppet manages these packages or not.

Viewing this information in the console and exporting it in batches lets you get access to this package data. A potentially more flexible and granular way to access and further process this data is by use of Puppet query language (PQL) queries against the data in the PuppetDB. These queries can be submitted using either the PuppetDB CLI or directly to the PuppetDB query API endpoint.

This post will show how you can formulate and run PQL queries to quickly get a picture of which nodes in your infrastructure may be vulnerable to a given CVE and get a head start on keeping your infrastructure secure.


In this worked example, we will show you how to extract a list of all the nodes that have potentially vulnerable versions of OpenSSL, as flagged in a recent CVE (CVE-2016-6304, released on 22 September 2016, and modified on 26 Sept.). We will use an API call to the PuppetDB query endpoint and form the query itself using PQL.

First, some information on the CVE, taken from the CVE site.

As a result of the issue described in this CVE, a malicious client can send an excessively large online certificate status protocol (OCSP) status request extension. If that client continually requests renegotiation, sending a large OCSP status request extension each time, then there will be unbounded memory growth on the server. This will eventually lead to a denial-of-service (DOS) attack through memory exhaustion. This issue affects the following OpenSSL versions:

  • 1.0.1t, 1.0.1s, 1.0.1r, 1.0.1q, 1.0.1p, 1.0.1o, 1.0.1n, 1.0.1m, 1.0.1l, 1.0.1k, 1.0.1j, 1.0.1i, 1.0.1h, 1.0.1g, 1.0.1f, 1.0.1e, 1.0.1d, 1.0.1c, 1.0.1b, 1.0.1a, 1.0.1
  • 1.0.2h, 1.0.2g, 1.0.2f, 1.0.2e, 1.0.2d, 1.0.2c, 1.0.2b, 1.0.2a, 1.0.2
  • 1.1.0

It has been fixed in the following versions:

  • 1.0.1u
  • 1.0.2i
  • 1.1.0a

So How Do I Know How Big My Problem Is?

You can get a list of all affected nodes by searching for OpenSSL in the console. First, go to the Packages view and individually export all the lines that have an affected version. But it's easier and quicker to extract the information directly from PuppetDB using the PuppetDB CLI tool or the query API, and output the results in a CSV file for further processing.

The basis for this query is the package_inventory API endpoint as documented here.

A query like the following will return all packages installed (whether managed by Puppet or not) on a specific node as detailed in the documentation on the Package Inspector feature here:

    [root@glisan-master centos]# curl -s -X GET http://localhost:8080/pdb/query/v4 --data-urlencode "query=package_inventory[package_name, version]{     certname = 'glisan-agent1'}" | jq .
        "package_name": "deep_merge",
        "version": "1.0.1"
        "package_name": "facter",
        "version": "3.6.4"
        "package_name": "fast_gettext",
        "version": "1.1.0"
      .... Output Truncated....
        "package_name": "passwd",
        "version": "1:4.2-3.1ubuntu5.3"
        "package_name": "uidmap",
        "version": "1:4.2-3.1ubuntu5.3"
    [root@glisan-master centos]#

The output above was fetched using a direct call to the PuppetDB API and formatted for viewing using the jq tool, which will also be used to format the output of our query as a CSV file. Explaining the use of the jq tool is outside the scope of this post, but the documentation is widely available, such as at the author’s documentation site.

This same PQL query can also be used to return information on any of the queryable fields in the records, which include certname, package_name, version, and provider. The key is to construct the query so that the required information is returned, and the query parameters match all required nodes.

For example, the following query will return the certname, package_name, and version of all records that match the query, which in this case is all records that have a package_name of openssl :

    query=package_inventory[certname, package_name, version]{ package_name = 'openssl'}

Within the query, it is possible to use regex matches, so you can create a query to match multiple versions of a specific package. This means that you can construct a query that matches all nodes that have a version of OpenSSL flagged as potentially vulnerable in the CVE above, and which will return only the certname (hostname) and version information. Here's an example:

   query=package_inventory[certname, version]{ package_name = 'openssl' and (version ~ '1.0.1[^u-z]' or version ~ '1.0.2[^i-z]' or version ~ '1.1.0[^a-z]')}"
        "certname": "glisan-master",
        "version": "1:1.0.1e-42.el7.9"
        "certname": "glisan-agent2",
        "version": "1:1.0.1e-42.el7.9"
        "certname": "glisan-agent1",
        "version": "1.0.2g-1ubuntu4.6"

You can use the jq tool to output the information as a CSV by piping the output from curl to the following jq command, optionally adding a further sed statement to strip additional quotation characters:

    jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv' | sed s/\"//g

This means that the crafted PQL query (run using the PuppetDB CLI tool) looks like the example below, after the output has been piped to jq for formatting into a CSV format that can be redirected into a file for further processing or analysis:

   [root@glisan-master centos]# puppet query “package_inventory[certname, version]{ package_name = 'openssl' and (version ~ '1.0.1[^u-z]' or version ~ '1.0.2[^i-z]' or version ~ '1.1.0[^a-z]')}” | jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv' | sed s/\"//g
    [root@glisan-master centos]#

In this specific infrastructure, there were three nodes potentially at risk due to CVE-2016-6304. They may need to receive patches.

Knowledge Is Power

This method allows you to capture all the required information in a single PQL query, and allows you to build more complex queries for greater control over the information returned.

There is no such thing as a silver bullet in InfoSec and OpSec. But you can get much better situational awareness of your infrastructure, and much more easily identify the parts of your infrastructure that could be vulnerable to the next CVE when you take advantage of Puppet Enterprise’s Package Inspector and the PuppetDB CLI tool, or the PuppetDB query APIs.

If you have questions, please visit the Ask Puppet site or send an email to the pe-users mailing list.

Learn More

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.

pquery ,puppet ,big data ,searching ,api ,cve ,vulnerabilities ,data security

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}