Solving for Endpoint Compliance in a Cloud-First Landscape
In a cloud-based BYOD environment, finding the right security solution is a top priority.
LifeOmic is a decidedly “cloud-first” startup company. We’re all-in on AWS services, delivering most of our applications on serverless technology stacks. We leverage SaaS services whenever we can and federate identity and access to these services via Okta. We have an almost-zero on-premises technology footprint. This frees us to focus our security time and effort where it matters most: the data in the cloud.
LifeOmic’s evolving Zero-Trust approach to securing cloud data is informed by our Top Ten security principles. In particular, we’ve chosen to air-gap our production environments: our engineers cannot directly access production data under normal operating conditions. Doing so requires following a heavyweight, highly audited “break glass” emergency procedure. We also believe that security that isn’t usable is worthless. We work hard to make security invisible, automatic, and, where possible, pleasant. This means aligning natural incentives with our security policies, so that people want to do the secure thing.
As an emerging leader in the precision health space, LifeOmic is HIPAA-compliant and its Precision Health Cloud platform is HITRUST CSF certified. One of our mandatory regulatory controls is the ability to provide evidence that our endpoint device configurations comply with certain screensaver, firewall, disk encryption, and security patch settings.
We’re a developer-focused company in many ways, and our developers value the freedom to use the hardware, operating system, and tools of their choice to get their work done. For our company, those choices have meant almost 100% laptops: mostly Macs, with a growing number of Linux devices and very few Windows devices. LifeOmic is decidedly “Bring Your Own Device” (BYOD) when it comes to IT: aside from a few one-time settings changes and a few applications we ask employees to install at onboarding, we are hands-off when it comes to our employees’ devices. Since we’ve put so many other controls in place around securing access to our data, granting our employees this much freedom isn’t such a crazy choice. In fact, it’s liberating! It does, however, demand a lightweight strategy for endpoint compliance, one that can easily come into tension with our decentralized provisioning and minimal-management approach.
This might be summarized as “How do we flexibly demonstrate compliance for user-controlled devices in a pleasant and usable way?”
Iteration One: Chef InSpec
Our first iteration of this strategy, in late 2017, sought to leverage Chef InSpec, which provides a concise language for describing security and compliance rules and a mechanism for checking that they are followed. We really liked its concise and expressive DSL, and so began to look for ways to execute InSpec profiles on our endpoints. A quick proof of concept seemed to indicate that we could indeed leverage the inspec CLI tool to generate evidence. Where, then, to store the output? The natural choice at the time seemed to be the Chef Compliance Server, which is now deprecated in favor of Chef Automate. It provided an API for storing and retrieving InSpec profiles, as well as storing and retrieving results from individual InSpec audit runs, and some nice eye candy in the form of compliance dashboards and graphs. At the time, there was no hosted SaaS offering for this service.
Standing up Chef Compliance required a dedicated EC2 instance that was not autoscalable. This introduced a single point of failure (SPOF) and cut against the grain of LifeOmic’s otherwise nearly instance-free, serverless architecture. Once it was deployed, however, we discovered that certain technical details at the time prevented inspec from reporting directly to Compliance Server. The two could be made to talk to each other, but only by adding chef-client as a dependency in order to run the Audit cookbook, which itself required further dependencies in the form of Ruby gems.
Since this was the company’s only use of Chef, there was no need or desire to run Chef Server. We still needed a lightweight update mechanism for our agents, however, so we created a “cookbook-updating cookbook” and inserted it into the chef-client run-list alongside the Audit cookbook. It ensured that the latest cookbooks were retrieved from S3 before Audit ran.
With these mechanisms in place, we set out in early 2018 to implement the compliance checks needed for our HITRUST assessment. We quickly found that the OS resources available via the InSpec DSL did not have exactly what we needed. This forced us to frequently “shell out” to system binaries with InSpec’s command resource, do string comparisons on the command output, and include many conditional OS platform and version checks. As a result, we lost most of the benefits of the beautiful DSL that InSpec offered.
Once the rules were developed, Compliance Server was configured properly, and inspec and all the Audit cookbook dependencies were installed on all our laptops, we were able to get HITRUST-specific compliance data flowing and generate some pretty charts.
This mechanism worked (indeed, we passed our Q1 2018 HITRUST assessment with flying colors), but we knew immediately that we wanted to overhaul our approach. Like fixing the proverbial “hole in the bucket,” every step along the journey of this first iteration seemed to add complexity and overhead, and in sum the cost outweighed the benefit. This is in no way a knock against Chef or any of its products. The “Chef Way” just wasn’t a good fit for our environment (BYOD) or our appetite for operational overhead (averse). In particular, we weren’t happy with the SPOF introduced by Compliance Server, and we didn’t want or need full administrative control over the configuration of our endpoint devices via chef-client.
Iteration Two: Osquery + SumoLogic
Our next idea was to leverage Osquery, which allows you to query your host device like a SQL database. We liked the cross-platform nature of the tool, and while using SQL to query for operating system details might seem a bit arbitrary, it is easy to reason about and is a tested and sensible choice for a “lingua franca.”
Osquery supported most of our evidence requirements out of the box, and it was pretty easy to construct query packs for the tool. While we were considering where the query results would be stored and analyzed, we evaluated two Osquery-related SaaS offerings: Kolide Fleet and Uptycs. These were both powerful security analytics tools, but since we were hard at work building out our own JupiterOne security platform, we weren’t really in the market for another security analysis tool. We decided to take a lightweight approach and simply have osqueryd periodically execute our query packs and log the results to disk. We would then forward these logs to SumoLogic, a SaaS log-monitoring and SIEM tool we were already using.
The technical implementation was straightforward, and only required the installation and configuration of Osquery and one other dependency, Fluentd for log forwarding.
Once we had log data flowing to SumoLogic, it was pretty simple to create dashboards for real-time monitoring of the log results.
This worked fairly well as a lightweight approach to endpoint compliance monitoring, but there were some sharp edges we weren’t satisfied with:
Screensaver settings seemed broken for some versions of macOS.
Osquery required administrator access to run as a system service, and to access almost anything of interest on Linux.
Query performance was hard to troubleshoot, and extending
Steps needed to remediate individual endpoints were opaque to the end-user.
We wanted this compliance data to flow through our JupiterOne platform.
What we really wanted was a lightweight, user-friendly way to notify the user when their machine was out of compliance, empowering and informing them with specific instructions for remediating just those bits that need attention. This tool shouldn’t need excessive privileges in order to do that, and it should report its findings to our own JupiterOne platform for asset inventory, compliance reporting, and security analysis.
Iteration Three: Stethoscope + JupiterOne
Enter Netflix’s Stethoscope app. This project’s goals and features aligned perfectly with our own needs.
Stethoscope empowers the user to maintain their own compliance readiness in a transparent and non-invasive way. It does not need administrative privileges to run, and won’t accidentally launch CPU-consuming queries. In addition to its clean UI, it exposes a GraphQL API on localhost for other tools to interface with.
Stethoscope targets just the security compliance checks needed for common frameworks, and accepts a simple JSON policy configuration that specifies which configuration values, application versions, and OS versions should pass compliance.
When we started using Stethoscope in earnest, Linux was not supported. It was very easy to contribute to this open-source project and add the support we needed for our use. Big shout-out to Rob McVey at Netflix for being so responsive and providing such great feedback and help!
Rather than fork Stethoscope to add custom integration logic for JupiterOne, we chose to ship a sidecar agent, written in Golang for easy cross-platform distribution. This agent is responsible for:
performing initial one-time activation/registration with JupiterOne
retrieving the Stethoscope policy configuration associated with that endpoint’s JupiterOne account
daemonizing as a background system service
periodically hitting the localhost GraphQL endpoint exposed by Stethoscope to scan the device
reporting those GraphQL results back to JupiterOne
On the administrative side, JupiterOne provides a simple configuration pane for specifying the policy each device should be bound to, how often it should execute, and which email addresses should receive invite/activation emails.
Stethoscope (via Electron) provides an easy auto-upgrade mechanism, and since the minimum required stethoscopeVersion can be configured via policy, Stethoscope’s native OS notifications alert users when their agent requires an update. This minimizes hassle should a security update to Stethoscope itself become necessary.
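At its core, a minimum-version gate like this is a version comparison. Here is a minimal Go sketch, assuming plain MAJOR.MINOR.PATCH versions and a policy value already stripped of its ">=" prefix. Pre-release and build suffixes are simply discarded, a simplification compared with full semantic-versioning precedence, and the version numbers in main are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits "MAJOR.MINOR.PATCH" into integers; missing parts default
// to 0, and any "-pre" or "+build" suffix is dropped for simplicity.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	v = strings.TrimPrefix(v, "v")
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) > 3 {
		return out, fmt.Errorf("invalid version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// meetsMinimum reports whether installed >= minimum, comparing component-wise.
func meetsMinimum(installed, minimum string) (bool, error) {
	a, err := parseVersion(installed)
	if err != nil {
		return false, err
	}
	b, err := parseVersion(minimum)
	if err != nil {
		return false, err
	}
	for i := 0; i < 3; i++ {
		if a[i] != b[i] {
			return a[i] > b[i], nil
		}
	}
	return true, nil
}

func main() {
	ok, err := meetsMinimum("1.3.1", strings.TrimPrefix(">=2.0.0", ">="))
	if err != nil {
		panic(err)
	}
	fmt.Println(ok) // false: in this sketch, the point where an update prompt would be raised
}
```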
Endpoint Security Analysis With JupiterOne
As a SecOps engineer, I can easily ask JupiterOne for our current compliance status (names changed to protect the “innocent”).
Find Person that OWNS Device that MONITORS HostAgent with compliant=false return tree
Looks like we currently have four non-compliant users. Drilling down into any of those graph nodes gives more detail, showing individual compliance check results. Let’s pick host Stormbreaker.
Inspecting these results, we find the host above seems to allow remote login, which violates our policy. That might be interesting, but let’s ask JupiterOne for all non-compliant users who also have access to AWS, which is definitely worth digging into:
Find HostAgent with compliant=false that MONITORS Device that OWNS Person that IS User that ASSIGNED Application with shortName = 'aws' return tree
Only one result now, and it’s the same Stormbreaker host. Time to have a quick chat with this user, and perhaps set up an alert based on this query.
Of course, JupiterOne’s powerful and expressive query language enables us to ask lots of other useful questions related to real-time endpoint compliance, and see those results in graph, table, or JSON format.
As a cloud-first, developer-heavy startup, finding the right tradeoffs for endpoint compliance has been an iterative process. We’re quite pleased with the results so far, and excited about using and contributing back to the active Stethoscope open-source project going forward.
Our Stethoscope-powered compliance evidence helped LifeOmic pass its 2019 HITRUST re-certification, and we’re now able to offer the same “easy button” to our JupiterOne users, with installation options for recent macOS, Windows 10, and Ubuntu Linux platforms.
If you’d like to maximize your security efforts and reduce security operations complexity, check out JupiterOne. And if you’d like to work on security that matters, solving real-world security, compliance, and operations issues, we’re hiring!
Published at DZone with permission of Ryan Hilliard.
Opinions expressed by DZone contributors are their own.