Auto Remediation of GuardDuty Findings for a Compromised ECS Cluster in AWSVPC Network Mode
The solution described in this blog quarantines an EC2 instance, and the ECS cluster running on it, in the event of a malware attack.
Summary
Enterprises must protect their IT workloads, whether they run on AWS or on other clouds, against a broad range of malware (including computer viruses, worms, spyware, botnet software, ransomware, etc.).
Amazon GuardDuty Malware Protection helps customers detect such malicious files without deploying agents. Once findings are received, customers need to automate the necessary remediation actions. When Execution:ECS/MaliciousFile findings are received for Amazon ECS clusters running on Amazon EC2 instances, there is more than one way to remediate, depending on the network mode of the ECS tasks in the cluster.
When tasks run in bridge or host network mode, remediation is relatively simple: attach a security group with no inbound and outbound rules to the underlying EC2 instance. Remediation becomes more complex when tasks run in awsvpc network mode. This blog shows how to use AWS Lambda and Amazon EventBridge to automatically isolate an infected ECS cluster running on EC2 instances in awsvpc network mode.
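For comparison, the simpler bridge or host case can be sketched in a few lines. This is a minimal sketch, not the article's code: it assumes a boto3 EC2 client is passed in and that a rule-free quarantine security group already exists. The function name `quarantine_instance` and the IDs in the usage note are hypothetical.

```python
def quarantine_instance(ec2, instance_id, quarantine_sg_id):
    """Replace the instance's security groups with the quarantine group.

    `ec2` is assumed to be a boto3 EC2 client (or any object exposing
    modify_instance_attribute with the same keyword arguments).
    """
    # Passing Groups replaces the security groups currently attached to the
    # instance, so only the quarantine group remains afterwards.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        Groups=[quarantine_sg_id],
    )
    return quarantine_sg_id
```

With real credentials this would be called as `quarantine_instance(boto3.client('ec2'), 'i-0123456789abcdef0', 'sg-0abc')`, where both IDs are placeholders.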
Prerequisites
Two AWS accounts in AWS Organizations, one as the root account and the other as a member account.
GuardDuty is enabled on both accounts, and the root account is assigned as the admin account for GuardDuty.
GuardDuty Malware Protection is enabled on both accounts.
Two AWS CLI profiles (created on the machine where the steps in this blog will be carried out), one for the root account and one for the member account, both configured with a user that has the AdministratorAccess policy.
Limitations
GuardDuty Malware Protection runs once every 24 hours, so automatic remediation can take up to 24 hours to trigger. This is not a near-real-time solution.
Target Architecture
The GuardDuty-Tester project is used to simulate a malicious actor in the ECS cluster. The CloudFormation stack provided with that project sets up the following infrastructure in the member account.
An Amazon VPC with one private and one public subnet.
An ECS cluster running on EC2 instances with the default networking mode in the private subnet, and a bastion host in the public subnet.
The following steps are required to run the ECS cluster in awsvpc networking mode.
Edit the guardduty-tester.template as per the instructions given below.
In the definition of the taskdefinition resource (Type: 'AWS::ECS::TaskDefinition'), add the following NetworkMode configuration.
YAML
    NetworkMode: 'awsvpc'
    ExecutionRoleArn:
      Fn::GetAtt: [ECSExecutionRole, Arn]
    TaskRoleArn:
      Fn::GetAtt: [TaskInstanceIAMRole, Arn]
    RequiresCompatibilities:
      - EC2
In the definition of the service resource (Type: 'AWS::ECS::Service'), add the following NetworkConfiguration.
YAML
    NetworkConfiguration:
      AwsvpcConfiguration:
        SecurityGroups:
          - !Ref RedTeamSecurityGroup
        Subnets:
          - !Ref PrivateSubnet
        AssignPublicIp: DISABLED
Add the code snippet below to create the role ECSExecutionRole.
YAML
    ECSExecutionRole:
      Type: AWS::IAM::Role
      Properties:
        Path: /
        AssumeRolePolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: "Allow"
              Action: "sts:AssumeRole"
              Principal: { "Service": "ecs-tasks.amazonaws.com" }
        ManagedPolicyArns:
          - 'arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly'
          - 'arn:aws:iam::aws:policy/CloudWatchLogsFullAccess'
Add the code snippet given below to create the role TaskInstanceIAMRole.
YAML
    TaskInstanceIAMRole:
      Type: AWS::IAM::Role
      Properties:
        Path: /
        AssumeRolePolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: "Allow"
              Action: "sts:AssumeRole"
              Principal: { "Service": "ecs-tasks.amazonaws.com" }
        ManagedPolicyArns:
          - 'arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly'
          - 'arn:aws:iam::aws:policy/CloudWatchLogsFullAccess'
        Policies:
          - PolicyName: ECSTaskRole
            PolicyDocument:
              Version: "2012-10-17"
              Statement:
                - Effect: "Allow"
                  Action:
                    - "cloudformation:List*"
                    - "cloudformation:Describe*"
                    - "cloudformation:Get*"
                  Resource: "*"
                - Effect: "Allow"
                  Action:
                    - "cloudwatch:PutMetricData"
                  Resource: "*"
                - Effect: "Allow"
                  Action:
                    - "ecs:DescribeTaskDefinition"
                    - "ecs:DescribeTasks"
                  Resource: "*"
                - Effect: "Allow"
                  Action:
                    - "ec2:DescribeSubnets"
                  Resource: "*"
Add the code snippet below to create the ECSCrossAccountRole, which will be assumed by the Remediation Lambda function to modify the security groups during remediation.
YAML
    ECSCrossAccountRole:
      Type: AWS::IAM::Role
      Properties:
        Path: /
        AssumeRolePolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: "Allow"
              Action: "sts:AssumeRole"
              Principal: { "AWS": "arn:aws:sts::<admin_account_no>:assumed-role/<remediation lambda role>/<remediation lambda name>" }
        Policies:
          - PolicyName: ECSCrossAccountPolicy
            PolicyDocument:
              Version: "2012-10-17"
              Statement:
                - Effect: "Allow"
                  Action:
                    - "ecs:ListServices"
                    - "ecs:UpdateService"
                  Resource: "*"
                - Effect: "Allow"
                  Action:
                    - "ec2:CreateSecurityGroup"
                    - "ec2:ModifyNetworkInterfaceAttribute"
                    - "ec2:RevokeSecurityGroupEgress"
                    - "ec2:RevokeSecurityGroupIngress"
                    - "ec2:DescribeNetworkInterfaces"
                    - "ec2:DescribeSecurityGroupRules"
                    - "ec2:DeleteSecurityGroup"
                    - "ec2:DescribeSecurityGroups"
                  Resource: "*"
The ECSCrossAccountRole is the cross-account role assumed by the Lambda function running in the Admin account.
As part of the remediation actions, the following components need to be created in the Admin account:
An EventBridge rule to capture “ECS/MaliciousFile” findings and trigger the Lambda function.
A Lambda function to assume the cross-account role and isolate the infected instances.
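The EventBridge rule described above could be created along the following lines. This is a hedged sketch rather than the article's setup: the helper names, the rule name, and the target ID `remediation-lambda` are hypothetical, and `events` is assumed to be a boto3 EventBridge (`events`) client. The event pattern matches GuardDuty findings whose type is Execution:ECS/MaliciousFile.

```python
import json

FINDING_TYPE = "Execution:ECS/MaliciousFile"


def build_event_pattern(finding_type=FINDING_TYPE):
    """Return an EventBridge pattern matching one GuardDuty finding type."""
    return {
        "source": ["aws.guardduty"],
        "detail-type": ["GuardDuty Finding"],
        "detail": {"type": [finding_type]},
    }


def create_remediation_rule(events, rule_name, lambda_arn):
    """Create the rule and point it at the remediation Lambda.

    `events` is assumed to be a boto3 EventBridge client.
    """
    events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_event_pattern()),
        State="ENABLED",
    )
    events.put_targets(
        Rule=rule_name,
        # The target Id is an arbitrary label chosen for this sketch.
        Targets=[{"Id": "remediation-lambda", "Arn": lambda_arn}],
    )
```

The Lambda function also needs a resource-based permission that lets EventBridge invoke it; that step is omitted from this sketch.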
Unlike other network modes for ECS tasks (e.g., host, where the host network is used, or bridge, where Docker's built-in virtual network is used), tasks running in awsvpc network mode are each allocated their own elastic network interface (ENI) and a primary private IPv4 address. Since these ENIs are created by AWS, the security groups associated with them cannot be changed directly. Hence the EC2 approach of quarantining the ECS cluster and its tasks does not work for this configuration. To quarantine these tasks, one has to iterate through the security groups associated with each ENI and explicitly remove their inbound and outbound rules. The section below describes the steps for achieving this.
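Because several task ENIs often share one security group, it can help to collect the distinct group IDs first so that each group is stripped only once. The sketch below assumes a response shaped like the output of the EC2 `describe_network_interfaces` call and follows the same RequesterManaged check that the Lambda code later in this article uses. The helper name is hypothetical.

```python
def security_groups_to_strip(describe_response):
    """Return the distinct security group IDs attached to requester-managed ENIs.

    `describe_response` is assumed to have the shape returned by the EC2
    describe_network_interfaces call.
    """
    group_ids = set()
    for eni in describe_response.get("NetworkInterfaces", []):
        # Only ENIs created on the caller's behalf by an AWS service are
        # handled by stripping rules; other ENIs can have their groups swapped.
        if eni.get("RequesterManaged"):
            for group in eni.get("Groups", []):
                group_ids.add(group["GroupId"])
    return sorted(group_ids)
```

The sorted list can then be passed, one ID at a time, to whatever routine removes the inbound and outbound rules.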
A simulated malicious actor logs into the bastion host and places malicious files within the ECS cluster. Follow Steps 1, 2, and 3 in the README.md file of the GuardDuty-Tester project to simulate this.
If the pre-requisite steps are successfully implemented, then the following steps will happen automatically.
GuardDuty Malware Protection scans the member account, discovers the presence of a malicious file, and reports it as an Execution:ECS/MaliciousFile finding.
The finding is pushed to GuardDuty in the Admin account.
GuardDuty findings in the Admin account trigger a CloudWatch Event.
The CloudWatch Event triggers a rule to invoke the Remediation Lambda.
The Remediation Lambda performs the following steps.
Assumes a role in the member account that has all the required permissions.
Python
    import os

    import boto3

    sts_connection = boto3.client('sts')
    account_no = os.getenv('CHILD_ACCOUNT')
    acct_b = sts_connection.assume_role(
        RoleArn=f"arn:aws:iam::{account_no}:role/<role-name>",
        RoleSessionName="cross_acct_lambda"
    )
    # Log only the assumed role ARN; the full response contains credentials
    print('assumed role:', acct_b['AssumedRoleUser']['Arn'])

    ACCESS_KEY = acct_b['Credentials']['AccessKeyId']
    SECRET_KEY = acct_b['Credentials']['SecretAccessKey']
    SESSION_TOKEN = acct_b['Credentials']['SessionToken']
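The three credential values are passed to every client created with the assumed role. One way to avoid repeating them is a small helper that turns the `assume_role` response into keyword arguments; this is a sketch, and the name `client_kwargs` is hypothetical.

```python
def client_kwargs(assume_role_response):
    """Map an STS assume_role response to boto3.client keyword arguments."""
    creds = assume_role_response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

A client would then be created as `boto3.client('ecs', **client_kwargs(acct_b))`.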
Gets the list of services running on the ECS Cluster.
Python
    # event_dict holds the GuardDuty finding event received by the Lambda function
    cluster = event_dict.get('detail').get('resource').get('ecsClusterDetails').get('arn')
    print('cluster:', cluster)

    ecs = boto3.client('ecs',
                       aws_access_key_id=ACCESS_KEY,
                       aws_secret_access_key=SECRET_KEY,
                       aws_session_token=SESSION_TOKEN)

    response = ecs.list_services(
        cluster=cluster,
        launchType='EC2',
        schedulingStrategy='REPLICA'
    )

    for service in response.get('serviceArns'):
        print('service:', service)

        for networkInterface in event_dict.get('detail').get('resource').get('instanceDetails').get('networkInterfaces'):
            vpc_id = networkInterface.get('vpcId')
            subnet = networkInterface.get('subnetId')
            eni_id = networkInterface.get('networkInterfaceId')
            # vpc_id, subnet, and eni_id feed the quarantine steps described below
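Note that `list_services` returns results in pages (up to 10 service ARNs per call unless `maxResults` is set), so a cluster with many services needs the `nextToken` to be followed. A minimal sketch, assuming `ecs` is a boto3 ECS client; the helper name `list_all_services` is hypothetical.

```python
def list_all_services(ecs, cluster):
    """Collect every EC2 REPLICA service ARN in the cluster, following nextToken."""
    arns = []
    kwargs = {
        "cluster": cluster,
        "launchType": "EC2",
        "schedulingStrategy": "REPLICA",
    }
    while True:
        page = ecs.list_services(**kwargs)
        arns.extend(page.get("serviceArns", []))
        token = page.get("nextToken")
        if not token:
            return arns
        # Ask for the next page on the following iteration.
        kwargs["nextToken"] = token
```

The returned list can replace `response.get('serviceArns')` in the loop over services.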
Creates a quarantine security group (sg_quarantine), intended to have no inbound or outbound rules, unless one already exists in the VPC.
Python
    # Create an EC2 client using the assumed-role credentials
    ec2 = boto3.client(
        'ec2',
        aws_access_key_id=ACCESS_KEY,
        aws_secret_access_key=SECRET_KEY,
        aws_session_token=SESSION_TOKEN,
    )

    response = ec2.describe_security_groups(
        Filters=[{'Name': 'vpc-id', 'Values': [vpc_id]}]
    )
    print('response sg', response)

    # Reuse the quarantine group if it already exists in this VPC
    security_group_id = ""
    for sg in response.get('SecurityGroups'):
        if sg.get('GroupName') == 'sg_quarantine':
            security_group_id = sg.get('GroupId')
            print('quarantine sg', security_group_id)

    if not security_group_id:
        response = ec2.create_security_group(
            GroupName='sg_quarantine',
            Description='quarantine security group',
            VpcId=vpc_id,
        )
        security_group_id = response['GroupId']
        print('new quarantine sg', security_group_id)
The helper below removes every inbound and outbound rule from a security group. A newly created security group still carries a default rule that allows all outbound traffic, so the quarantine group needs this step as well.
Python
    def remove_all_permission(security_group_id, ec2):
        # Fetch every rule that belongs to the security group
        rules_response = ec2.describe_security_group_rules(
            Filters=[
                {
                    'Name': 'group-id',
                    'Values': [
                        security_group_id,
                    ]
                },
            ],
            DryRun=False
        )

        print('sg rules', rules_response)

        for rule in rules_response.get('SecurityGroupRules'):
            sg_ruleid = rule.get('SecurityGroupRuleId')

            # Egress and ingress rules are revoked through different calls
            if rule.get('IsEgress'):
                response = ec2.revoke_security_group_egress(
                    DryRun=False,
                    GroupId=security_group_id,
                    SecurityGroupRuleIds=[
                        sg_ruleid,
                    ]
                )
            else:
                response = ec2.revoke_security_group_ingress(
                    DryRun=False,
                    GroupId=security_group_id,
                    SecurityGroupRuleIds=[
                        sg_ruleid,
                    ]
                )

            print('rule response', response)
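Since both revoke calls accept a list of rule IDs, the rules can also be revoked in two batched calls instead of one call per rule. The following is a sketch of that variant, not the article's implementation; `split_rule_ids` and `revoke_all_rules` are hypothetical names, `ec2` is assumed to be a boto3 EC2 client, and `rules` is the `SecurityGroupRules` list from `describe_security_group_rules`.

```python
def split_rule_ids(rules):
    """Partition security group rule IDs into (ingress_ids, egress_ids)."""
    ingress, egress = [], []
    for rule in rules:
        target = egress if rule.get("IsEgress") else ingress
        target.append(rule["SecurityGroupRuleId"])
    return ingress, egress


def revoke_all_rules(ec2, security_group_id, rules):
    """Revoke all given rules with at most one call per direction."""
    ingress, egress = split_rule_ids(rules)
    if ingress:
        ec2.revoke_security_group_ingress(
            GroupId=security_group_id, SecurityGroupRuleIds=ingress
        )
    if egress:
        ec2.revoke_security_group_egress(
            GroupId=security_group_id, SecurityGroupRuleIds=egress
        )
    return len(ingress), len(egress)
```

Skipping the call when a list is empty avoids sending a request with no rule IDs.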
-
Associates the security group with no inbound/outbound rule with the ECS service using the below code block.
Python
    from botocore.exceptions import ClientError

    def update_sg_service(security_group_id, cluster, service, subnet, ecs):
        try:
            print('service mapping of sg to be changed', service, subnet)
            # Point the service's awsvpc configuration at the quarantine group
            response = ecs.update_service(
                cluster=cluster,
                service=service,
                networkConfiguration={
                    'awsvpcConfiguration': {
                        'subnets': [
                            subnet
                        ],
                        'securityGroups': [
                            security_group_id
                        ]
                    }
                })
            print('response after service mapping change:', response)
        except ClientError as e:
            print('exception while service remediation', e)
Gets the list of network interfaces associated with the tasks.
Iterates through the list of network interfaces.
Iterates through the list of security groups associated with each network interface.
Removes all inbound and outbound permissions associated with the security group.
Python
    def update_sg_eni(security_group_id, ec2, eni_id):
        print('eni mapping to be changed:', eni_id)
        try:
            response = ec2.describe_network_interfaces(
                DryRun=False,
                NetworkInterfaceIds=[
                    eni_id,
                ]
            )
            eni = response.get('NetworkInterfaces')[0]
            print("RequesterManaged", eni.get('RequesterManaged'))
            if not eni.get('RequesterManaged'):
                # The ENI is not managed by AWS, so its groups can be swapped directly
                eni_response = ec2.modify_network_interface_attribute(
                    DryRun=False,
                    Groups=[
                        security_group_id
                    ],
                    NetworkInterfaceId=eni_id
                )
                print('response after eni mapping change:', eni_response)
            else:
                # AWS-managed ENI: strip the rules of each attached group instead
                for group in eni.get('Groups'):
                    print('before removing all permissions')
                    remove_all_permission(group.get('GroupId'), ec2)
        except ClientError as e:
            print('exception while eni remediation', e)
The following screenshots, taken after the Remediation Lambda has run successfully, validate that the ECS cluster has been completely quarantined.
1. The first screenshot shows the ECS cluster created by the Tester project and highlights the VPC, Subnet, and associated Security Group.
2. Note that the Security Group ID in the above screenshot is actually the ID of the sg_quarantine security group, created with no inbound or outbound rules.
3. Search for guardduty-tester-RedTeamSecurityGroup under EC2 -> Security Groups; it should show no inbound or outbound rules.
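The same check can be made programmatically instead of through the console. This is a small sketch, assuming `ec2` is a boto3 EC2 client created with credentials for the member account; the function name `is_quarantined` is hypothetical.

```python
def is_quarantined(ec2, security_group_id):
    """Return True when the security group has no inbound or outbound rules left."""
    response = ec2.describe_security_group_rules(
        Filters=[{"Name": "group-id", "Values": [security_group_id]}]
    )
    # Any remaining rule, ingress or egress, means the group is not fully stripped.
    return len(response.get("SecurityGroupRules", [])) == 0
```

Running it against both sg_quarantine and the RedTeamSecurityGroup ID should return True after a successful remediation.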
Conclusion
The solution described in this blog quarantines an EC2 instance, and the ECS cluster running on it, in the event of a malware attack. Auto-remediation helps contain the spread of malware within the network, and the quarantined instance can later be inspected in more detail or used for forensic analysis.