Auto Remediation of GuardDuty Findings for a Compromised ECS Cluster in AWSVPC Network Mode

The solution described in this blog helps quarantine the EC2 instance, and the ECS cluster running on it, in the event of a malware attack.

By Joyanta Banerjee · Feb. 23, 23 · Tutorial

Summary

It is of utmost importance for enterprises to protect their IT workloads, whether running on AWS or on other clouds, against a broad range of malware (including computer viruses, worms, spyware, botnet software, and ransomware).

Amazon GuardDuty Malware Protection helps customers detect such malicious files without deploying an agent. Once findings are received, customers need to automate the necessary remediation actions. When Execution:ECS/MaliciousFile findings are received for Amazon ECS clusters running on Amazon EC2 instances, the remediation approach depends on the network mode of the ECS tasks in the cluster.

When tasks run in bridge or host network mode, remediation is relatively simple: attach a security group with no inbound or outbound rules to the underlying EC2 instance. Remediation becomes more complex when tasks run in awsvpc network mode. This blog shows how to use AWS Lambda and Amazon EventBridge to automatically isolate an infected ECS cluster running on EC2 instances in awsvpc network mode.
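For comparison, the simpler bridge/host remediation mentioned above could look roughly like the sketch below. It relies on the EC2 `modify_instance_attribute` call, whose `Groups` parameter replaces the instance's security groups. The function name `quarantine_instance`, the `RecordingEC2` stand-in class, and the IDs are assumptions made for illustration; in a real Lambda the `ec2` argument would be a `boto3.client('ec2')` created with the assumed-role credentials.

```python
class RecordingEC2:
    """Stand-in for boto3.client('ec2') so the sketch runs without AWS access."""

    def __init__(self):
        self.calls = []

    def modify_instance_attribute(self, **kwargs):
        # Record the request instead of calling AWS.
        self.calls.append(kwargs)
        return {}


def quarantine_instance(ec2, instance_id, quarantine_sg_id):
    """Replace the instance's security groups with the quarantine group only."""
    return ec2.modify_instance_attribute(
        InstanceId=instance_id,
        Groups=[quarantine_sg_id],
    )


# Hypothetical IDs, for illustration only.
fake_ec2 = RecordingEC2()
quarantine_instance(fake_ec2, "i-0123456789abcdef0", "sg-0123456789abcdef0")
print(fake_ec2.calls)
```

This works for bridge and host mode because the tasks' traffic flows through the instance's own network interface, which is not the case in awsvpc mode.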

Prerequisites 

  • Two AWS accounts in an AWS Organizations organization: one root account (referred to below as the Admin account) and one member account

  • GuardDuty enabled on both accounts, with the root account designated as the GuardDuty administrator account

  • GuardDuty Malware Protection enabled on both accounts

  • Two AWS CLI profiles, created on the machine where the steps in this blog will be carried out: one for the root account and one for the member account, each configured with a user that has the AdministratorAccess policy

  • NPM (version <=18) and Python installed

Limitations 

GuardDuty Malware Protection scans a given resource at most once every 24 hours, so automatic remediation can take up to 24 hours to trigger. This is not a near-real-time solution.

Target Architecture 

The GuardDuty-Tester project is used to simulate a malicious actor in the ECS cluster. The CloudFormation stack provided with that project sets up the following infrastructure in the member account.

  1. Amazon VPC with 1 private and 1 public subnet.

  2. ECS cluster running on EC2 instances with default networking mode in the private subnet and a bastion host in the public subnet. 

To run the ECS cluster in awsvpc network mode, edit guardduty-tester.template as follows.

  1. In the taskdefinition: resource (Type: 'AWS::ECS::TaskDefinition'), add the following NetworkMode configuration to its Properties.

    YAML
     
    NetworkMode: 'awsvpc'
    ExecutionRoleArn:
      Fn::GetAtt: ECSExecutionRole.Arn
    TaskRoleArn:
      Fn::GetAtt: TaskInstanceIAMRole.Arn
    RequiresCompatibilities:
      - EC2


  2. In the service: resource (Type: 'AWS::ECS::Service'), add the following NetworkConfiguration to its Properties.

    YAML
     
    NetworkConfiguration:
      AwsvpcConfiguration:
        SecurityGroups:
          - !Ref RedTeamSecurityGroup
        Subnets:
          - !Ref PrivateSubnet
        AssignPublicIp: DISABLED


  3. Add the code snippet to create the role ECSExecutionRole

    YAML
     
    ECSExecutionRole:
        Type: AWS::IAM::Role
        Properties:
          Path: /
          AssumeRolePolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: "Allow"
                Action: "sts:AssumeRole"
                Principal: { "Service": "ecs-tasks.amazonaws.com"}
          ManagedPolicyArns:
            - 'arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly'
            - 'arn:aws:iam::aws:policy/CloudWatchLogsFullAccess'


  4. Add the code snippet given below to create the role TaskInstanceIAMRole

    YAML
     
    TaskInstanceIAMRole:
        Type: AWS::IAM::Role
        Properties:
          Path: /
          AssumeRolePolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: "Allow"
                Action: "sts:AssumeRole"
                Principal: { "Service": "ecs-tasks.amazonaws.com"}
          ManagedPolicyArns:
            - 'arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly'
            - 'arn:aws:iam::aws:policy/CloudWatchLogsFullAccess'
          Policies:
            - PolicyName: ECSTaskRole
              PolicyDocument:
                Version: "2012-10-17"
                Statement:
                  - Effect: "Allow"
                    Action:
                     - "cloudformation:List*"
                     - "cloudformation:Describe*"
                     - "cloudformation:Get*"
                    Resource: "*"
                  - Effect: "Allow"
                    Action:
                     - "cloudwatch:PutMetricData"
                    Resource: "*"
                  - Effect: "Allow"
                    Action:
                     - "ecs:DescribeTaskDefinition"
                     - "ecs:DescribeTasks"
                    Resource: "*"
                  - Effect: "Allow"
                    Action:
                     - "ec2:DescribeSubnets"
                    Resource: "*" 


  5. Add the code snippet below to create the ECSCrossAccountRole, which will be assumed by the Remediation Lambda function to modify the security groups during remediation. 

    YAML
     
    ECSCrossAccountRole:
        Type: AWS::IAM::Role
        Properties:
          Path: /
          AssumeRolePolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: "Allow"
                Action: "sts:AssumeRole"
                Principal: { "AWS": "arn:aws:sts::<admin_account_no>:assumed-role/<remediation lambda role>/<remediation lambda name>"}
          Policies:
            - PolicyName: ECSCrossAccountPolicy
              PolicyDocument:
                Version: "2012-10-17"
                Statement:
                  - Effect: "Allow"
                    Action:
                     - "ecs:ListServices"
                     - "ecs:UpdateService"                
                    Resource: "*"
                  - Effect: "Allow"
                    Action:
                     - "ec2:CreateSecurityGroup"
                     - "ec2:ModifyNetworkInterfaceAttribute"
                     - "ec2:RevokeSecurityGroupEgress"
                     - "ec2:RevokeSecurityGroupIngress"
                     - "ec2:DescribeNetworkInterfaces"
                     - "ec2:DescribeSecurityGroupRules"
                     - "ec2:DeleteSecurityGroup"
                     - "ec2:DescribeSecurityGroups"
                    Resource: "*"


  6. The ECSCrossAccountRole created above is the cross-account role that the Lambda function running in the Admin account assumes.

As part of the remediation, the following components need to be created in the Admin account:

  1. An EventBridge rule that captures Execution:ECS/MaliciousFile findings and triggers the Lambda function.

  2. A Lambda function that assumes the cross-account role and isolates the infected instances.
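As a rough illustration of the first component, the event pattern for such a rule could look like the sketch below. The `source` and `detail-type` values are the ones GuardDuty uses when it publishes findings to EventBridge. The helper `matches_pattern` is a hypothetical, deliberately simplified local matcher (exact match against the listed values only), useful for sanity-checking the pattern; it is not EventBridge's actual matching engine, and the sample event contains only the fields needed for the check.

```python
import json

# Event pattern for an EventBridge rule that matches only the
# Execution:ECS/MaliciousFile finding type published by GuardDuty.
EVENT_PATTERN = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": {"type": ["Execution:ECS/MaliciousFile"]},
}


def matches_pattern(pattern, event):
    """Hypothetical simplified matcher: every key in the pattern must be
    present in the event, and the event value must be one of the listed
    values (nested dicts are matched recursively)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches_pattern(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# JSON string form of the pattern, as expected by the EventPattern
# parameter of the EventBridge PutRule API.
EVENT_PATTERN_JSON = json.dumps(EVENT_PATTERN)

sample_event = {
    "source": "aws.guardduty",
    "detail-type": "GuardDuty Finding",
    "detail": {"type": "Execution:ECS/MaliciousFile", "severity": 8},
}
print(matches_pattern(EVENT_PATTERN, sample_event))
```

With boto3, the JSON string could then be passed as `EventPattern` to the `put_rule` call of the `events` client, with the Remediation Lambda added as the rule's target.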

Unlike the other ECS network modes (host, which uses the host network directly, and bridge, which uses Docker's built-in virtual network), awsvpc network mode allocates each task its own elastic network interface (ENI) and a primary private IPv4 address. Because these ENIs are created and managed by AWS, the security groups associated with them cannot be changed directly. Hence the EC2 approach of quarantining the ECS cluster and its tasks does not work for this configuration. To quarantine these tasks, one has to iterate through the security groups associated with each ENI and explicitly remove their inbound and outbound rules. The section below describes the steps.
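The branching described above can be sketched as a small pure function. It takes one entry of the `NetworkInterfaces` list returned by the EC2 `describe_network_interfaces` call and decides which remediation applies. The function name `plan_eni_remediation`, the action labels, and the sample IDs are hypothetical and only illustrate the decision; they are not part of the Lambda code shown later.

```python
def plan_eni_remediation(eni, quarantine_sg_id):
    """Return (action, security_group_ids) for one network interface.

    eni is a dict shaped like an entry of NetworkInterfaces in the
    describe_network_interfaces response; only the RequesterManaged
    and Groups keys are read.
    """
    if not eni.get("RequesterManaged", False):
        # Ordinary ENI: its security groups can be swapped directly.
        return ("attach_quarantine_sg", [quarantine_sg_id])
    # ENI created by AWS for an awsvpc task: the attached groups cannot
    # be replaced, so the rules inside those groups are revoked instead.
    group_ids = [group["GroupId"] for group in eni.get("Groups", [])]
    return ("revoke_rules_in_groups", group_ids)


# Hypothetical sample data, for illustration only.
instance_eni = {"RequesterManaged": False, "Groups": [{"GroupId": "sg-0aaa"}]}
task_eni = {"RequesterManaged": True, "Groups": [{"GroupId": "sg-0bbb"}]}

print(plan_eni_remediation(instance_eni, "sg-0quarantine"))
print(plan_eni_remediation(task_eni, "sg-0quarantine"))
```

One consequence of the second branch is worth keeping in mind: revoking rules changes the security group itself, so any other resource that shares the same group loses those rules as well.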

Steps

  1. A simulated malicious actor logs into the bastion host and places simulated malicious files within the ECS cluster. Follow steps 1, 2, and 3 in the README.md file of the GuardDuty-Tester project to simulate this.

  2. If the prerequisite steps have been implemented successfully, the following happens automatically.

    1. GuardDuty Malware Protection scans the member account, discovers the malicious file, and reports it as an Execution:ECS/MaliciousFile finding. (Screenshot: Malicious File Discovery)

    2. The finding is pushed to GuardDuty in the Admin account.

    3. The GuardDuty finding in the Admin account generates an EventBridge (formerly CloudWatch Events) event.

    4. The event matches the rule, which invokes the Remediation Lambda.
  3. The Remediation Lambda performs the following steps.

    1. Assumes a role in the member account which has all the required permissions

      Python
       
      import os
      import boto3

      # Assume the cross-account role in the member account
      sts_connection = boto3.client('sts')
      account_no = os.getenv('CHILD_ACCOUNT')
      acct_b = sts_connection.assume_role(
        RoleArn=f"arn:aws:iam::{account_no}:role/<role-name>",
        RoleSessionName="cross_acct_lambda"
      )
      # Do not log acct_b: it contains the temporary secret credentials

      ACCESS_KEY = acct_b['Credentials']['AccessKeyId']
      SECRET_KEY = acct_b['Credentials']['SecretAccessKey']
      SESSION_TOKEN = acct_b['Credentials']['SessionToken']
        


    2. Gets the list of services running on the ECS Cluster.

      Python
       
      # event_dict is the GuardDuty finding event received by the Lambda handler
      resource = event_dict.get('detail').get('resource')
      cluster = resource.get('ecsClusterDetails').get('arn')
      print('cluster:', cluster)

      # ECS client that uses the assumed-role credentials
      ecs = boto3.client('ecs',
                         aws_access_key_id=ACCESS_KEY,
                         aws_secret_access_key=SECRET_KEY,
                         aws_session_token=SESSION_TOKEN)

      # Note: list_services is paginated and returns at most 10 services by default
      response = ecs.list_services(
          cluster=cluster,
          launchType='EC2',
          schedulingStrategy='REPLICA'
      )

      for service in response.get('serviceArns'):
        print('service:', service)

        # Read the VPC, subnet, and ENI of each network interface in the finding
        for networkInterface in resource.get('instanceDetails').get('networkInterfaces'):
          vpc_id = networkInterface.get('vpcId')
          subnet = networkInterface.get('subnetId')
          eni_id = networkInterface.get('networkInterfaceId')


    3. Creates a security group with no inbound or outbound rules.

      Python
       
          # Still inside the inner loop above: EC2 client using the assumed-role credentials
          ec2 = boto3.client(
            'ec2',
            aws_access_key_id=ACCESS_KEY,
            aws_secret_access_key=SECRET_KEY,
            aws_session_token=SESSION_TOKEN,
          )

          # Look for an existing quarantine security group in the VPC
          response = ec2.describe_security_groups(
            Filters=[{'Name': 'vpc-id', 'Values': [vpc_id]}]
          )

          security_group_id = ""
          for sg in response.get('SecurityGroups'):
            if sg.get('GroupName') == 'sg_quarantine':
              security_group_id = sg.get('GroupId')
              print('quarantine sg', security_group_id)

          # Create the group only if none was found
          if not security_group_id:
            response = ec2.create_security_group(GroupName='sg_quarantine',
                                                 Description='quarantine security group',
                                                 VpcId=vpc_id)
            security_group_id = response['GroupId']
            print('new quarantine sg', security_group_id)
            # A new security group has a default allow-all egress rule; revoke it
            remove_all_permission(security_group_id, ec2)


      Python
       
      def remove_all_permission(security_group_id, ec2):
        """Revoke every inbound and outbound rule of the given security group."""
        rules = ec2.describe_security_group_rules(
          Filters=[{'Name': 'group-id', 'Values': [security_group_id]}]
        )
        print('sg rules', rules)

        for rule in rules.get('SecurityGroupRules'):
          sg_ruleid = rule.get('SecurityGroupRuleId')

          if rule.get('IsEgress'):
            response = ec2.revoke_security_group_egress(
              GroupId=security_group_id,
              SecurityGroupRuleIds=[sg_ruleid]
            )
          else:
            response = ec2.revoke_security_group_ingress(
              GroupId=security_group_id,
              SecurityGroupRuleIds=[sg_ruleid]
            )

          print('rule response', response)


    4. Associates the security group that has no inbound or outbound rules with the ECS service, using the code block below.

      Python
       
      from botocore.exceptions import ClientError

      def update_sg_service(security_group_id, cluster, service, subnet, ecs):
        """Point the ECS service at the quarantine security group."""
        try:
          print('service mapping of sg to be changed', service, subnet)
          response = ecs.update_service(
            cluster=cluster,
            service=service,
            networkConfiguration={
              'awsvpcConfiguration': {
                'subnets': [subnet],
                'securityGroups': [security_group_id]
              }
            })
          print('response after service mapping change:', response)
        except ClientError as e:
          print('exception while service remediation', e)



    5. Gets the list of network interfaces associated with the tasks.

    6. Iterates through the list of network interfaces.

      1. If a network interface is not requester-managed, replaces its security groups with the quarantine security group. Otherwise, iterates through the list of security groups associated with the network interface.

        1. Removes all inbound and outbound rules associated with each security group.

          Python
           
          from botocore.exceptions import ClientError

          def update_sg_eni(security_group_id, ec2, eni_id):
              print('eni mapping to be changed:', eni_id)
              try:
                  response = ec2.describe_network_interfaces(
                      NetworkInterfaceIds=[eni_id]
                  )
                  eni = response.get('NetworkInterfaces')[0]
                  print('RequesterManaged', eni.get('RequesterManaged'))
                  if not eni.get('RequesterManaged'):
                      # Ordinary ENI: swap its security groups for the quarantine group
                      eni_response = ec2.modify_network_interface_attribute(
                          Groups=[security_group_id],
                          NetworkInterfaceId=eni_id
                      )
                      print('response after eni mapping change:', eni_response)
                  else:
                      # Requester-managed ENI: its groups cannot be replaced, so strip their rules
                      for group in eni.get('Groups'):
                          print('before removing all permissions')
                          remove_all_permission(group.get('GroupId'), ec2)
              except ClientError as e:
                  print('exception while eni remediation', e)


    7. The following screenshots, taken after the Remediation Lambda has run successfully, confirm that the ECS cluster has been completely quarantined.

      1. The first screenshot shows the ECS cluster created by the Tester project and highlights the VPC, subnet, and associated security group.

      2. Note that the security group ID in that screenshot is the ID of the sg_quarantine group, which was created with no inbound or outbound rules.

      3. Search for guardduty-tester-RedTeamSecurityGroup under EC2 -> Security Groups; it should show no inbound or outbound rules.
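The field lookups in step 3.2 chain `.get()` calls, which raise an `AttributeError` as soon as one level is missing from the finding. A more defensive variant is sketched below. The helper names `dig` and `extract_targets` are hypothetical, and the sample event contains only the keys that the code in this blog reads; it is not a complete GuardDuty finding.

```python
def dig(mapping, *keys, default=None):
    """Walk nested dicts, returning default if any level is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def extract_targets(event_dict):
    """Collect the cluster ARN and the per-ENI details read by the Lambda."""
    cluster = dig(event_dict, "detail", "resource", "ecsClusterDetails", "arn")
    interfaces = dig(
        event_dict, "detail", "resource", "instanceDetails",
        "networkInterfaces", default=[],
    )
    enis = [
        {
            "eni_id": ni.get("networkInterfaceId"),
            "vpc_id": ni.get("vpcId"),
            "subnet": ni.get("subnetId"),
        }
        for ni in interfaces
    ]
    return cluster, enis


# Hypothetical, heavily trimmed finding event for illustration.
sample = {
    "detail": {
        "resource": {
            "ecsClusterDetails": {"arn": "arn:aws:ecs:us-east-1:111122223333:cluster/demo"},
            "instanceDetails": {
                "networkInterfaces": [
                    {"networkInterfaceId": "eni-0abc", "vpcId": "vpc-0abc", "subnetId": "subnet-0abc"}
                ]
            },
        }
    }
}

cluster, enis = extract_targets(sample)
print(cluster, enis)
print(extract_targets({"detail": {}}))
```

Returning `None` and an empty list for an incomplete event lets the handler log the problem and exit cleanly instead of failing partway through remediation.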

Conclusion

The solution described in this blog helps quarantine the EC2 instance, and the ECS cluster running on it, in the event of a malware attack. Auto-remediation helps contain the spread of the malware within the network, and the quarantined instance can later be inspected in more detail or subjected to forensic analysis.


Opinions expressed by DZone contributors are their own.
