AWS Cloud

brought to you by AWS Developer Relations

AWS Cloud is built for developers to create, innovate, collaborate, and turn ideas into reality. It provides you with an environment that you can tailor based on your application requirements. The content and resources in this Partner Zone are custom-made to support your development and IT initiatives by allowing you to get a hands-on experience with cloud technologies. Leverage these content resources to inspire your own cloud-based designs, and ensure that your SaaS projects are set up for success.


DZone's Featured AWS Cloud Resources

Modernize and Gradually Migrate Your Data Model From SQL to NoSQL

By Rashmi Nambiar
This article was authored by AWS Sr. Specialist SA Alexander Schueren and published with permission.

We all like to build new systems, but there are too many business-critical systems that need to be improved. Constantly evolving system architecture therefore remains a major challenge for engineering teams. Decomposing the monolith is not a new topic; strategies and techniques like domain-driven design and the strangler pattern have shaped the industry practice of modernization. NoSQL databases have become popular for modernization projects: better performance, flexible schemas, and cost-effectiveness are key reasons for adoption. They scale better and are more resilient than traditional SQL databases, and using a managed solution that reduces operational overhead is a big plus. But moving data is different: it's messy, and there are many unknowns. How do you design the schema, keep the data consistent, handle failures, or roll back? In this article, we will discuss two strategies that can help you transition from SQL to NoSQL more smoothly: change data capture and dual writes.

Continuous Data Migration

With agile software development, we now ship small batches of features every week instead of having deployment events twice a year, followed by fragile hotfixes and rollbacks. With data migrations, however, there is a tendency to migrate all the data at once. Most data migrations are homogeneous (SQL to SQL), so the data structure remains compatible and many commercial tools can convert the schema and replicate the data. Migrating from SQL to NoSQL is different: it requires an in-depth analysis of the use case and the access patterns to design a new data model. Once we have it, the challenge is to migrate data continuously and to catch and recover from failures. What if we could migrate a single customer record, or ten customers from a specific region, or a specific product category? To avoid downtime, we can migrate the data continuously by applying the migration mechanism to a small subset of data. Over time we gain confidence, refine the mechanism, and expand to a larger dataset. This ensures stability, and we can also capitalize on the better performance or lower cost much earlier.

Change Data Capture

Change data capture (CDC) is a well-established and widely used method. Most relational database management systems (RDBMS) have an internal storage mechanism to collect data changes, often called transaction logs. Whenever you write, update, or delete data, the system captures this information. This is useful if you want to roll back to a previous state, move back in time, or replicate data. We can hook into the transaction log and forward the data changes to another system. When moving data from SQL databases to AWS database services, such as Amazon RDS, AWS Database Migration Service (AWS DMS) is a popular choice. In combination with the schema conversion tool, you can move from Microsoft SQL Server or Oracle to an open-source database, such as PostgreSQL or MySQL. But with DMS, we can also move from SQL to NoSQL targets, such as Amazon DynamoDB, S3, Neptune, Kinesis, Kafka, OpenSearch, Redis, and many others. Here is how it works:

1. Define the source and the target endpoints with the right set of permissions for read and write operations.
2. Create a task definition specifying the CDC migration process.
3. Add a table mapping with the rule type object-mapping to specify the partition key and attributes for your DynamoDB table.

To make these steps more concrete, a rough sketch of wiring them together with the AWS SDK follows; the mapping rule itself is shown right after it.
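The article does not prescribe any specific tooling for creating the task, so the following is only a minimal boto3 sketch of the CDC task creation step. The ARNs, identifiers, and the empty table mapping are placeholders, not values from the article.

Python
import json
import boto3

dms = boto3.client("dms")

# Placeholder ARNs: replace with your replication instance and the source/target
# endpoints you defined for the relational database and for DynamoDB.
table_mappings = {"rules": []}  # e.g., the object-mapping rule shown below

response = dms.create_replication_task(
    ReplicationTaskIdentifier="sql-to-dynamodb-cdc",
    SourceEndpointArn="arn:aws:dms:eu-central-1:111122223333:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:eu-central-1:111122223333:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:eu-central-1:111122223333:rep:INSTANCE",
    MigrationType="cdc",  # capture ongoing changes instead of a one-off full load
    TableMappings=json.dumps(table_mappings),
)
print(response["ReplicationTask"]["Status"])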
Here is an example of a mapping rule in AWS DMS:

{
  "rules": [
    {
      "rule-type": "object-mapping",
      "rule-id": "1",
      "rule-name": "TransformToDDB",
      "object-locator": {
        "schema-name": "source-schema",
        "table-name": "customer"
      },
      "rule-action": "map-record-to-record",
      "target-table-name": "customer",
      "mapping-parameters": [
        {
          "partition-key-name": "CustomerName",
          "attribute-type": "scalar",
          "attribute-sub-type": "string",
          "value": "${FIRST_NAME},${LAST_NAME}"
        },
        {
          "target-attribute-name": "ContactDetails",
          "attribute-type": "document",
          "attribute-sub-type": "dynamodb-map",
          "value": "..."
        }
      ]
    }
  ]
}

This mapping rule copies the data from the customer table, combines FIRST_NAME and LAST_NAME into a composite hash key, and adds a ContactDetails attribute with a DynamoDB map structure. For more information, see the other object-mapping examples in the documentation.

One of the major advantages of using CDC is that it allows for atomic data changes. All the changes made to a database, such as inserts, updates, and deletes, are captured as a single transaction. This keeps the replication consistent: if a transaction is rolled back, CDC propagates that change to the new system as well. Another advantage of CDC is that it does not require any application code changes. There might be situations when the engineering team can't change the legacy code easily, for example because of a slow release process or a lack of tests to ensure stability. Many database engines support CDC, including MySQL, Oracle, SQL Server, and more, which means you don't have to write a custom solution to read the transaction logs. Finally, with AWS DMS you can scale your replication instances to handle more data volume, again without additional code changes.

AWS DMS and CDC are useful for database replication and migration but have some drawbacks. The major concern is the higher complexity and cost of setting up and managing a replication system. You will spend some time fine-tuning the DMS configuration parameters to get the best performance. It also requires a good understanding of the underlying databases, and it is challenging to troubleshoot errors or performance issues, especially for those who are not familiar with the subtle details of the database engine, the replication mechanism, and transaction logs.

Dual Writes

Dual writes is another popular approach to migrate data continuously. The idea is to write the data to both systems in parallel in your application code. Once the data is fully replicated, we switch over to the new system entirely. This ensures that data is available in the new system before the cutover, and it also keeps the door open to fall back to the old system. With dual writes, we operate on the application level, as opposed to the database level with CDC; thus, we use more compute resources and need a robust delivery process to change and release code. Here is how it works:

1. Applications continue to write data to the existing SQL-based system as they would.
2. A separate process, often called a "dual-writer," gets a copy of the data that has been written to the SQL-based system and writes it to DynamoDB after the transaction.
3. The dual-writer ensures we write the data to both systems in the same format and with the same constraints, such as unique key constraints.
4. Once the dual-write process is complete, we switch over to read from and write to the DynamoDB system.

We can control the data migration and apply dual writes to only some of the data by using feature flags; a minimal sketch of such a dual-writer is shown below.
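The article leaves the dual-writer implementation open, so this is only an illustrative Python sketch under assumed names: the customer table, the flag check, and the SQL statement are all hypothetical. It shows the write-to-SQL-first, copy-to-DynamoDB-after-commit flow described above.

Python
import boto3

dynamodb = boto3.resource("dynamodb")
customer_table = dynamodb.Table("customer")  # hypothetical DynamoDB target table


def dual_write_enabled(customer_id: str) -> bool:
    # Placeholder for a feature-flag lookup; in practice this could come from
    # a flag service or configuration store, scoped to a region or customer subset.
    return customer_id.startswith("EU-")


def save_customer(sql_conn, customer: dict) -> None:
    # 1. Write to the existing SQL system inside its own transaction, as before.
    with sql_conn:
        sql_conn.execute(
            "INSERT INTO customer (id, first_name, last_name) VALUES (?, ?, ?)",
            (customer["id"], customer["first_name"], customer["last_name"]),
        )

    # 2. After the transaction commits, copy the record to DynamoDB, but only
    #    for the subset of data covered by the feature flag.
    if dual_write_enabled(customer["id"]):
        customer_table.put_item(
            Item={
                "CustomerName": f"{customer['first_name']},{customer['last_name']}",
                "id": customer["id"],
            }
        )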
For example, we can toggle the data replication on and off, or apply it only to a specific subset: a geographical region, a customer size, a product type, or a single customer. Because dual writes are instrumented at the application level, we don't run queries against the database directly; we work at the object level in our code. This allows us to add transformation, validation, or enrichment of the data. But there are also downsides: code complexity, consistency, and failure handling. Using feature flags helps to control the flow, but we still need to write code, add tests, deploy changes, and have a feature flag store. If you are already using feature flags, this overhead might be negligible; otherwise, it's a good opportunity to introduce feature flags to your system. Data consistency and failure handling are the primary beasts to tame. Because we copy data after the database transaction, a transaction that is later rolled back can be missed by the dual-writer. To counter that, you need to collect operational and business metrics to keep track of read and write operations to each system, which will increase confidence over time.

Conclusion

Modernization is unavoidable, and improving existing systems will become even more common in the future. Over the years, we have learned how to decouple monolithic systems, and with many NoSQL database solutions, we can improve products with better performance and lower costs. CDC and dual writes are solid mechanisms for migrating data models from SQL to NoSQL. While CDC is more database- and infrastructure-heavy, with dual writes we operate at the code level with more control over data segmentation, but with higher code complexity. It is therefore crucial to understand the use case and the requirements when deciding which mechanism to implement. Moving data continuously between systems should not be difficult, so we need to invest more and learn how to adjust our data models more easily and securely. Chances are high that this is not the last re-architecting initiative you will undertake, and building these capabilities will be useful in the future.

Do You Like Video Tutorials?

If you are more of a visual person who likes to follow the steps to build something with a nice recorded tutorial, then I would encourage you to subscribe to the Build On AWS YouTube channel. This is the place you should go to dive deep into code and architecture, with people from the AWS community sharing their expertise in a visual and entertaining manner.
Deploying MWAA Using AWS CDK

By Ricardo Sueiras
Introduction

In this quick how-to guide, I will show you how you can use a Python AWS CDK application to automate the deployment and configuration of your Apache Airflow environments using Managed Workflows for Apache Airflow (MWAA) on AWS.

What you will need:

- an AWS account with the right level of privileges
- a development environment with the AWS CDK configured and running (at the time of writing, you should be using AWS CDK v2)
- access to an AWS region where Managed Workflows for Apache Airflow is supported
- all code used in this how-to guide is provided in this GitHub repository

Some things to watch out for:

- If you are deploying this in an environment that already has VPCs, you may generate an error if you exceed the number of VPCs within your AWS account (by default, this is set to 5, but this is a soft limit which you can request an increase for).
- Make sure that the Amazon S3 bucket you define for your MWAA environment does not exist before running the CDK app.

Getting Started

Make sure we are running the correct version of the AWS CDK v2 tool (at least v2.2) and then check out the git repo.

Shell
cdk --version
> 2.28.1 (build d035432)

git clone https://github.com/094459/blogpost-cdk-mwaa.git

After checking out the repository, you will have the following files in your local developer environment.

Plain Text
├── app.py
├── cdk.json
├── dags
│   ├── sample-cdk-dag-od.py
│   └── sample-cdk-dag.py
├── mwaa_cdk
│   ├── mwaa_cdk_backend.py
│   └── mwaa_cdk_env.py
└── requirements.txt

The first thing we need to do is update our Python dependencies, which are documented in the requirements.txt file.

Note! If you are currently in the process of moving between AWS CDK v1 and v2, then you should check out this blog post to help you prepare, as the steps that follow may otherwise fail.

Shell
pip install -r requirements.txt

Exploring the CDK Stack

Our AWS CDK application consists of a number of files. The entry point to our application is the app.py file, where we define the structure and resources we are going to build. We then have two CDK stacks that deploy and configure AWS resources. Finally, we have resources that we deploy to our target Apache Airflow environment.

If we take a look at the app.py file, we can explore our CDK application in more detail. We are creating two stacks, one called mwaa_cdk_backend and the other called mwaa_cdk_env. The mwaa_cdk_backend will be used to set up the VPC network that the MWAA environment is going to use. The mwaa_cdk_env is the stack that will configure your MWAA environment. In order to do both, though, we first set up some configuration parameters so that we can maximise the re-use of this CDK application.

Python
import aws_cdk as cdk
from mwaa_cdk.mwaa_cdk_backend import MwaaCdkStackBackend
from mwaa_cdk.mwaa_cdk_env import MwaaCdkStackEnv

env_EU = cdk.Environment(region="{your-aws-region}", account="{your-aws-ac}")
mwaa_props = {'dagss3location': '{your-unique-s3-bucket}', 'mwaa_env': '{name-of-your-mwaa-env}'}

app = cdk.App()

mwaa_hybrid_backend = MwaaCdkStackBackend(
    scope=app,
    id="mwaa-hybrid-backend",
    env=env_EU,
    mwaa_props=mwaa_props
)

mwaa_hybrid_env = MwaaCdkStackEnv(
    scope=app,
    id="mwaa-hybrid-environment",
    vpc=mwaa_hybrid_backend.vpc,
    env=env_EU,
    mwaa_props=mwaa_props
)

app.synth()

We define configuration parameters in the env_EU and mwaa_props lines. This will allow you to re-use this stack to create multiple different environments; a hypothetical extension of these properties is sketched below.
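For illustration only, an extended mwaa_props might look like this; the extra keys are assumptions, not properties the repository's stacks currently read.

Python
# Hypothetical extension of mwaa_props: extra keys could be read inside the
# stacks (e.g., mwaa_props['airflow_version']) to drive further settings.
mwaa_props = {
    'dagss3location': '{your-unique-s3-bucket}',
    'mwaa_env': '{name-of-your-mwaa-env}',
    'airflow_version': '2.0.2',   # assumed key for the Apache Airflow version
    'log_level': 'INFO'           # assumed key for the logging configuration
}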
You can also add or change the variables in mwaa_props if you want to make other configuration options changeable via a configuration property (for example, logging verbosity or perhaps the version of Apache Airflow). After changing the values in the app.py file and saving, we are ready to deploy.

mwaa_cdk_backend

There is nothing particularly interesting about this stack other than that it creates the underlying network infrastructure that MWAA needs. There is nothing you need to do, but if you do want to experiment, then what I would say is that a) ensure you read and follow the networking guidance on the MWAA documentation site, as it provides details on what needs to be set up, and b) if you are trying to lock down the networking, try deploying just the backend stack, and then manually create an MWAA environment to see if it works or fails.

Python
from aws_cdk import (
    aws_iam as iam,
    aws_ec2 as ec2,
    Stack,
    CfnOutput
)
from constructs import Construct


class MwaaCdkStackBackend(Stack):

    def __init__(self, scope: Construct, id: str, mwaa_props, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        # Create VPC network
        self.vpc = ec2.Vpc(
            self,
            id="MWAA-Hybrid-ApacheAirflow-VPC",
            cidr="10.192.0.0/16",
            max_azs=2,
            nat_gateways=1,
            subnet_configuration=[
                ec2.SubnetConfiguration(
                    name="public",
                    cidr_mask=24,
                    reserved=False,
                    subnet_type=ec2.SubnetType.PUBLIC),
                ec2.SubnetConfiguration(
                    name="private",
                    cidr_mask=24,
                    reserved=False,
                    subnet_type=ec2.SubnetType.PRIVATE_WITH_NAT)
            ],
            enable_dns_hostnames=True,
            enable_dns_support=True
        )

        CfnOutput(
            self,
            id="VPCId",
            value=self.vpc.vpc_id,
            description="VPC ID",
            export_name=f"{self.region}:{self.account}:{self.stack_name}:vpc-id"
        )

Once this stack has deployed, it will output the VPC details to the console as well as to the AWS CloudFormation Outputs tab.

mwaa_cdk_env

The MWAA environment stack is a little more interesting, and I will break it down. The first part of the stack configures the Amazon S3 buckets that MWAA will use.

Python
from aws_cdk import (
    aws_iam as iam,
    aws_ec2 as ec2,
    aws_s3 as s3,
    aws_s3_deployment as s3deploy,
    aws_mwaa as mwaa,
    aws_kms as kms,
    Stack,
    CfnOutput,
    Tags
)
from constructs import Construct


class MwaaCdkStackEnv(Stack):

    def __init__(self, scope: Construct, id: str, vpc, mwaa_props, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        key_suffix = 'Key'

        # Create MWAA S3 Bucket and upload local dags
        s3_tags = {
            'env': f"{mwaa_props['mwaa_env']}",
            'service': 'MWAA Apache AirFlow'
        }

        dags_bucket = s3.Bucket(
            self,
            "mwaa-dags",
            bucket_name=f"{mwaa_props['dagss3location'].lower()}",
            versioned=True,
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL
        )

        for tag in s3_tags:
            Tags.of(dags_bucket).add(tag, s3_tags[tag])

        s3deploy.BucketDeployment(
            self,
            "DeployDAG",
            sources=[s3deploy.Source.asset("./dags")],
            destination_bucket=dags_bucket,
            destination_key_prefix="dags",
            prune=False,
            retain_on_delete=False
        )

        dags_bucket_arn = dags_bucket.bucket_arn

What this also does, however, is take all the files it finds in the local dags folder (in this particular example, and in the GitHub repo, this is two DAGs, sample-cdk-dag-od.py and sample-cdk-dag.py) and upload them as part of the deployment process. You can tweak this to your own requirements, and even comment it out or remove it if you do not need it. Next up, we have the code that creates the MWAA execution policy and the associated role that will be used by the MWAA worker nodes.
This is taken from the MWAA documentation, but you can adjust it as needed for your own environment. You might need to do this if you are integrating with other AWS services; the policy grants no additional access by default, so anything else you need will have to be added.

Python
mwaa_policy_document = iam.PolicyDocument(
    statements=[
        iam.PolicyStatement(
            actions=["airflow:PublishMetrics"],
            effect=iam.Effect.ALLOW,
            resources=[f"arn:aws:airflow:{self.region}:{self.account}:environment/{mwaa_props['mwaa_env']}"],
        ),
        iam.PolicyStatement(
            actions=["s3:ListAllMyBuckets"],
            effect=iam.Effect.DENY,
            resources=[
                f"{dags_bucket_arn}/*",
                f"{dags_bucket_arn}"
            ],
        ),
        iam.PolicyStatement(
            actions=["s3:*"],
            effect=iam.Effect.ALLOW,
            resources=[
                f"{dags_bucket_arn}/*",
                f"{dags_bucket_arn}"
            ],
        ),
        iam.PolicyStatement(
            actions=[
                "logs:CreateLogStream",
                "logs:CreateLogGroup",
                "logs:PutLogEvents",
                "logs:GetLogEvents",
                "logs:GetLogRecord",
                "logs:GetLogGroupFields",
                "logs:GetQueryResults",
                "logs:DescribeLogGroups"
            ],
            effect=iam.Effect.ALLOW,
            resources=[f"arn:aws:logs:{self.region}:{self.account}:log-group:airflow-{mwaa_props['mwaa_env']}-*"],
        ),
        iam.PolicyStatement(
            actions=["logs:DescribeLogGroups"],
            effect=iam.Effect.ALLOW,
            resources=["*"],
        ),
        iam.PolicyStatement(
            actions=[
                "sqs:ChangeMessageVisibility",
                "sqs:DeleteMessage",
                "sqs:GetQueueAttributes",
                "sqs:GetQueueUrl",
                "sqs:ReceiveMessage",
                "sqs:SendMessage"
            ],
            effect=iam.Effect.ALLOW,
            resources=[f"arn:aws:sqs:{self.region}:*:airflow-celery-*"],
        ),
        iam.PolicyStatement(
            actions=[
                "ecs:RunTask",
                "ecs:DescribeTasks",
                "ecs:RegisterTaskDefinition",
                "ecs:DescribeTaskDefinition",
                "ecs:ListTasks"
            ],
            effect=iam.Effect.ALLOW,
            resources=["*"],
        ),
        iam.PolicyStatement(
            actions=["iam:PassRole"],
            effect=iam.Effect.ALLOW,
            resources=["*"],
            conditions={
                "StringLike": {
                    "iam:PassedToService": "ecs-tasks.amazonaws.com"
                }
            },
        ),
        iam.PolicyStatement(
            actions=[
                "kms:Decrypt",
                "kms:DescribeKey",
                "kms:GenerateDataKey*",
                "kms:Encrypt",
                "kms:PutKeyPolicy"
            ],
            effect=iam.Effect.ALLOW,
            resources=["*"],
            conditions={
                "StringEquals": {
                    "kms:ViaService": [
                        f"sqs.{self.region}.amazonaws.com",
                        f"s3.{self.region}.amazonaws.com",
                    ]
                }
            },
        ),
    ]
)

mwaa_service_role = iam.Role(
    self,
    "mwaa-service-role",
    assumed_by=iam.CompositePrincipal(
        iam.ServicePrincipal("airflow.amazonaws.com"),
        iam.ServicePrincipal("airflow-env.amazonaws.com"),
        iam.ServicePrincipal("ecs-tasks.amazonaws.com"),
    ),
    inline_policies={"CDKmwaaPolicyDocument": mwaa_policy_document},
    path="/service-role/"
)

The next part configures the security group and subnets needed by MWAA.

Python
security_group = ec2.SecurityGroup(
    self,
    id="mwaa-sg",
    vpc=vpc,
    security_group_name="mwaa-sg"
)

security_group_id = security_group.security_group_id

security_group.connections.allow_internally(ec2.Port.all_traffic(), "MWAA")

subnets = [subnet.subnet_id for subnet in vpc.private_subnets]

network_configuration = mwaa.CfnEnvironment.NetworkConfigurationProperty(
    security_group_ids=[security_group_id],
    subnet_ids=subnets,
)

The final part is the most interesting from the MWAA perspective: setting up and then configuring the environment. I have commented some of the environment settings out, so feel free to adjust them for your own needs. The first thing we do is create a configuration for the MWAA logging. In this particular configuration, I have enabled everything with INFO-level logging, so feel free to enable/disable or change the logging level as you need.
Python
logging_configuration = mwaa.CfnEnvironment.LoggingConfigurationProperty(
    dag_processing_logs=mwaa.CfnEnvironment.ModuleLoggingConfigurationProperty(
        enabled=True,
        log_level="INFO"
    ),
    task_logs=mwaa.CfnEnvironment.ModuleLoggingConfigurationProperty(
        enabled=True,
        log_level="INFO"
    ),
    worker_logs=mwaa.CfnEnvironment.ModuleLoggingConfigurationProperty(
        enabled=True,
        log_level="INFO"
    ),
    scheduler_logs=mwaa.CfnEnvironment.ModuleLoggingConfigurationProperty(
        enabled=True,
        log_level="INFO"
    ),
    webserver_logs=mwaa.CfnEnvironment.ModuleLoggingConfigurationProperty(
        enabled=True,
        log_level="INFO"
    )
)

Next up, we define some MWAA Apache Airflow configuration parameters. If you use custom properties, this is where you will add them. Also, if you want to use tags for your MWAA environment, you can adjust these accordingly.

Python
options = {
    'core.load_default_connections': False,
    'core.load_examples': False,
    'webserver.dag_default_view': 'tree',
    'webserver.dag_orientation': 'TB'
}

tags = {
    'env': f"{mwaa_props['mwaa_env']}",
    'service': 'MWAA Apache AirFlow'
}

Next, we need to create some additional IAM policies and permissions, as well as an AWS KMS encryption key to keep everything encrypted. This part is optional if you decide not to configure KMS encryption for your MWAA environment, but I have included the info here.

Python
kms_mwaa_policy_document = iam.PolicyDocument(
    statements=[
        iam.PolicyStatement(
            actions=[
                "kms:Create*",
                "kms:Describe*",
                "kms:Enable*",
                "kms:List*",
                "kms:Put*",
                "kms:Decrypt*",
                "kms:Update*",
                "kms:Revoke*",
                "kms:Disable*",
                "kms:Get*",
                "kms:Delete*",
                "kms:ScheduleKeyDeletion",
                "kms:GenerateDataKey*",
                "kms:CancelKeyDeletion"
            ],
            principals=[
                iam.AccountRootPrincipal(),
                # Optional:
                # iam.ArnPrincipal(f"arn:aws:sts::{self.account}:assumed-role/AWSReservedSSO_rest_of_SSO_account"),
            ],
            resources=["*"]
        ),
        iam.PolicyStatement(
            actions=[
                "kms:Decrypt*",
                "kms:Describe*",
                "kms:GenerateDataKey*",
                "kms:Encrypt*",
                "kms:ReEncrypt*",
                "kms:PutKeyPolicy"
            ],
            effect=iam.Effect.ALLOW,
            resources=["*"],
            principals=[iam.ServicePrincipal("logs.amazonaws.com", region=f"{self.region}")],
            conditions={
                "ArnLike": {
                    "kms:EncryptionContext:aws:logs:arn": f"arn:aws:logs:{self.region}:{self.account}:*"
                }
            },
        ),
    ]
)

key = kms.Key(
    self,
    f"{mwaa_props['mwaa_env']}{key_suffix}",
    enable_key_rotation=True,
    policy=kms_mwaa_policy_document
)

key.add_alias(f"alias/{mwaa_props['mwaa_env']}{key_suffix}")

Now we come to actually creating the environment, using everything we have created or set up above. The following represents the typical configuration options for the core Apache Airflow settings within MWAA. You can change them to suit your own environment or parameterise them as mentioned above.
Python
managed_airflow = mwaa.CfnEnvironment(
    scope=self,
    id='airflow-test-environment',
    name=f"{mwaa_props['mwaa_env']}",
    airflow_configuration_options={'core.default_timezone': 'utc'},
    airflow_version='2.0.2',
    dag_s3_path="dags",
    environment_class='mw1.small',
    execution_role_arn=mwaa_service_role.role_arn,
    kms_key=key.key_arn,
    logging_configuration=logging_configuration,
    max_workers=5,
    network_configuration=network_configuration,
    # plugins_s3_object_version=None,
    # plugins_s3_path=None,
    # requirements_s3_object_version=None,
    # requirements_s3_path=None,
    source_bucket_arn=dags_bucket_arn,
    webserver_access_mode='PUBLIC_ONLY',
    # weekly_maintenance_window_start=None
)

managed_airflow.add_override('Properties.AirflowConfigurationOptions', options)
managed_airflow.add_override('Properties.Tags', tags)

CfnOutput(
    self,
    id="MWAASecurityGroup",
    value=security_group_id,
    description="Security Group name used by MWAA"
)

This stack also outputs the MWAA security group, but you could export other information as well.

Deploying Your CDK Application

Now that we have reviewed the app and modified it so that it contains your details (your AWS account, unique S3 bucket, etc.), you can run the app and deploy the CDK stacks. To do this we use the "cdk deploy" command. First of all, from the project directory, make sure everything is working OK. To do this we can use the "cdk ls" command. If everything is working, it should return the two stack ids defined in app.py:

Shell
cdk ls
> mwaa-hybrid-backend
> mwaa-hybrid-environment

We can now deploy them, either all together or one at a time. This CDK application needs the backend stack deployed first, as it contains the VPC networking that will be used by the environment stack, so we deploy that with:

Shell
cdk deploy mwaa-hybrid-backend

If it is working, the output should look similar to the following:

Plain Text
✨  Synthesis time: 7.09s

mwaa-hybrid-backend: deploying...
[0%] start: Publishing 2695cb7a9f601cf94a4151c65c9069787d9ec312084346f2f4359e3f55ff2310:704533066374-eu-central-1
[100%] success: Published 2695cb7a9f601cf94a4151c65c9069787d9ec312084346f2f4359e3f55ff2310:704533066374-eu-central-1
mwaa-hybrid-backend: creating CloudFormation changeset...

 ✅  mwaa-hybrid-backend

✨  Deployment time: 172.13s

Outputs:
mwaa-hybrid-backend.ExportsOutputRefMWAAHybridApacheAirflowVPC677B092EF6F2F587 = vpc-0bbdeee3652ef21ff
mwaa-hybrid-backend.ExportsOutputRefMWAAHybridApacheAirflowVPCprivateSubnet1Subnet2A6995DF7F8D3134 = subnet-01e48db64381efc7f
mwaa-hybrid-backend.ExportsOutputRefMWAAHybridApacheAirflowVPCprivateSubnet2SubnetA28659530C36370A = subnet-0321530b8154f9bd2
mwaa-hybrid-backend.VPCId = vpc-0bbdeee3652ef21ff

Stack ARN:
arn:aws:cloudformation:eu-central-1:704533066374:stack/mwaa-hybrid-backend/b05897d0-f087-11ec-b5f3-02db3f47a5ca

✨  Total time: 179.22s

You can then track and view what has been deployed by checking the CloudFormation stack via the AWS Console. We can now deploy the MWAA environment, which we do simply by typing:

Shell
cdk deploy mwaa-hybrid-environment

This time, it will pop up details about some of the security-related changes, in this case the IAM policies and security groups that I mentioned earlier. Answer "Y" to deploy these changes. This will kick off the deployment, which you can track by going to the CloudFormation console.
This will take approx 20-25 minutes, so it is a good time to grab a cup of tea and read some of my other blog posts perhaps :-) If it has been successful, you will see output similar to the following (your details will differ). Before deploying, cdk prints a security diff for approval: tables listing the IAM statement changes (the MWAA service role's S3, CloudWatch Logs, SQS, ECS, iam:PassRole, and KMS permissions, plus the bucket-deployment Lambda role), the managed policy changes, and the security group changes (all traffic allowed within mwaa-sg, all outbound allowed).

Plain Text
Including dependency stacks: mwaa-hybrid-backend
[Warning at /mwaa-hybrid-environment/mwaa-sg] Ignoring Egress rule since 'allowAllOutbound' is set to true; To add customize rules, set allowAllOutbound=false on the SecurityGroup

✨  Synthesis time: 12.37s

mwaa-hybrid-backend
mwaa-hybrid-backend: deploying...
[0%] start: Publishing 2695cb7a9f601cf94a4151c65c9069787d9ec312084346f2f4359e3f55ff2310:704533066374-eu-central-1
[100%] success: Published 2695cb7a9f601cf94a4151c65c9069787d9ec312084346f2f4359e3f55ff2310:704533066374-eu-central-1

 ✅  mwaa-hybrid-backend (no changes)

✨  Deployment time: 1.97s

Outputs:
mwaa-hybrid-backend.ExportsOutputRefMWAAHybridApacheAirflowVPC677B092EF6F2F587 = vpc-0bbdeee3652ef21ff
mwaa-hybrid-backend.ExportsOutputRefMWAAHybridApacheAirflowVPCprivateSubnet1Subnet2A6995DF7F8D3134 = subnet-01e48db64381efc7f
mwaa-hybrid-backend.ExportsOutputRefMWAAHybridApacheAirflowVPCprivateSubnet2SubnetA28659530C36370A = subnet-0321530b8154f9bd2
mwaa-hybrid-backend.VPCId = vpc-0bbdeee3652ef21ff

Stack ARN:
arn:aws:cloudformation:eu-central-1:704533066374:stack/mwaa-hybrid-backend/b05897d0-f087-11ec-b5f3-02db3f47a5ca

✨  Total time: 14.35s

mwaa-hybrid-environment
This deployment will make potentially sensitive changes according to your current security approval level (--require-approval broadening).
Please confirm you intend to make the following modifications:

[IAM Statement Changes, IAM Policy Changes, and Security Group Changes tables omitted for brevity]

(NOTE: There may be security-related changes not in this list. See https://github.com/aws/aws-cdk/issues/1299)

Do you wish to deploy these changes (y/n)? y
mwaa-hybrid-environment: deploying...
[0%] start: Publishing e9882ab123687399f934da0d45effe675ecc8ce13b40cb946f3e1d6141fe8d68:704533066374-eu-central-1
[0%] start: Publishing 983c442a2fe823a8b4ebb18d241a5150ae15103dacbf3f038c7c6343e565aa4c:704533066374-eu-central-1
[0%] start: Publishing 91ab667f7c88c3b87cf958b7ef4158ef85fb9ba8bd198e5e0e901bb7f904d560:704533066374-eu-central-1
[0%] start: Publishing f2a926ee3d8ca4bd02b0cf073eb2bbb682e94c021925bf971a9730045ef4fb02:704533066374-eu-central-1
[25%] success: Published 983c442a2fe823a8b4ebb18d241a5150ae15103dacbf3f038c7c6343e565aa4c:704533066374-eu-central-1
[50%] success: Published 91ab667f7c88c3b87cf958b7ef4158ef85fb9ba8bd198e5e0e901bb7f904d560:704533066374-eu-central-1
[75%] success: Published f2a926ee3d8ca4bd02b0cf073eb2bbb682e94c021925bf971a9730045ef4fb02:704533066374-eu-central-1
[100%] success: Published e9882ab123687399f934da0d45effe675ecc8ce13b40cb946f3e1d6141fe8d68:704533066374-eu-central-1
mwaa-hybrid-environment: creating CloudFormation changeset...

 ✅  mwaa-hybrid-environment

✨  Deployment time: 1412.35s

Outputs:
mwaa-hybrid-environment.MWAASecurityGroup = sg-0ea83e01caded2bb3

Stack ARN:
arn:aws:cloudformation:eu-central-1:704533066374:stack/mwaa-hybrid-environment/450337a0-f088-11ec-a169-06ba63bfdfb2

✨  Total time: 1424.72s

Testing the Environment

If we take a look at the Amazon S3 bucket, we can see that our MWAA bucket and dags folder have been created, and that our local DAGs have been uploaded. If we go to the MWAA console, we can see our environment. We can now grab the URL for this environment, either from the console or by using the AWS CLI. Just substitute the name of the MWAA environment and the AWS region, and it should give you the URL you can use in your browser (although you will have to append /home to it).

Note: I am using jq. If you do not have this in your environment, you can run the command without it; you just need to find the "WebserverUrl" entry in the output.

Shell
aws mwaa get-environment --name {name of the environment created} --region={region} | jq -r '.Environment | .WebserverUrl'

And as we can see, the two sample DAGs that were in the local folder are now available for us in the MWAA environment; a minimal sketch of what such a DAG might look like follows.
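The repository's actual DAG files are not reproduced in this article; purely as an illustration, a DAG similar to the samples could look roughly like this. The dag_id, schedule, and task are assumptions, not the contents of sample-cdk-dag.py.

Python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Minimal illustrative DAG, similar in spirit to the samples uploaded to the
# dags/ prefix; this is not the repository's actual sample-cdk-dag.py.
with DAG(
    dag_id="sample-cdk-dag",
    schedule_interval=None,          # trigger manually from the Airflow UI
    start_date=datetime(2022, 1, 1),
    catchup=False,
    tags=["cdk-demo"],
) as dag:
    say_hello = BashOperator(
        task_id="say_hello",
        bash_command="echo 'Hello from MWAA'",
    )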
Removing/Cleaning Up Our MWAA Environment

In order to remove everything we have deployed, all we need to do is:

Shell
cdk destroy mwaa-hybrid-environment

It will take 20-30 minutes to clean up the MWAA environment. One thing it will not do, however, is remove the Amazon S3 bucket we set up, so you will need to delete that manually via the console (or use the AWS CLI, which would be my approach). Once you have removed that S3 bucket, clean up the backend stack:

Shell
cdk destroy mwaa-hybrid-backend

This should be much quicker to clean up. Once it is finished, you are done.

What's Next?

That's all, folks. I hope this has been helpful. Please let me know if you find this how-to guide useful, and if you run into any issues, please log an issue on GitHub and I will take a look.
Improving Performance of Serverless Java Applications on AWS
By Rashmi Nambiar
Architect Data Analytics Engine for Your Business Using AWS and Serverless
By Rashmi Nambiar
Unleash Developer Productivity With Infrastructure From Code [Video]
By Rashmi Nambiar
Automate and Manage AWS Outposts Capacity Across Multi-Account AWS Setup [Video]

This is a recording of the breakout session led by AWS Hero Margaret Valtierra at AWS re:Invent 2022, Las Vegas. Posted with permission. Curious how, for mere dollars a month and minimal upkeep, you can centrally track and manage Outposts capacity across multiple AWS accounts? In this session, we'll show a unique solution implemented at Morningstar by the Cloud Services team to do just that. We'll walk through how we arrived at the architecture of the solution, which uses Lambda functions, DynamoDB, CloudWatch, S3, and a custom API to track capacity and block users from overspending their quota.

By Rashmi Nambiar
Competition of the Modern Workloads: Serverless vs Kubernetes on AWS [Video]

This is a recording of a breakout session from AWS Heroes at re:Invent 2022. Posted with permission. Both serverless and Kubernetes have benefits for your operational production environments, but how do you choose? In this video session, we will have a "battle" between the serverless and the Kubernetes approach, examining use cases and insights from each speaker's experience. After an overview of each architecture and the AWS services that are a part of it, like databases, queues, and more, we will compare: maintenance and compliance, scaling, developer experience, cost, monitoring and logging, and ecosystem. For each category, we will show the advantages and disadvantages of each architecture side by side, with the audience voting on who wins each round.

By Rashmi Nambiar
Bringing Software Engineering Rigor to Data [Video]

This is a recording of a breakout session from AWS Heroes at re:Invent 2022, presented by AWS Hero Zainab Maleki. Posted with permission. In software engineering, we've learned that building robust and stable applications has a direct correlation with overall organization performance. The data community is striving to incorporate the core concepts of engineering rigor found in software communities but still has further to go. This talk covers ways to leverage software engineering practices for data engineering and demonstrates how measuring key performance metrics could help build more robust and reliable data pipelines. This is achieved through practices like Infrastructure as Code for deployments, automated testing, application observability, and end-to-end application lifecycle ownership.

By Rashmi Nambiar
Detect and Resolve Biases in Artificial Intelligence [Video]

This is a recording of the on-demand session by AWS Hero Virginie Mathivet at AWS re:Invent 2022, Las Vegas. Posted with permission. Many applications get a bad buzz on the internet because of biases and discrimination in their models, but how can we avoid them? This talk presents the problem of biases, as well as how to detect and fight them with cloud-agnostic solutions. We will examine solutions for biases at both the dataset level, with statistical indicators, and at the model level, thanks to eXplainable AI (XAI) algorithms.

By Rashmi Nambiar
Decode User Requirements to Design Well-Architected Applications

This article was authored by Veliswa Boya & Jason Nicholls and published with permission.

In his book "War and Peace and IT," Enterprise Strategist at AWS Mark Schwartz says that it's time for the business-IT wall to come down. Old business models and stereotypes have long pitted "suits" against "nerds." He goes on to say that it's time to foster a space of collaboration and a shared mission - a space that puts technologists and business people on the same team. The question is: how do we ensure the success of this collaboration when the two don't even speak the same language? Business and technical professionals often do not understand each other's terminology; each discipline can have a different meaning for the same words. In this article, we will review how you as a technologist - here specifically referring to you, the architect - can listen to and understand your key stakeholders and collaborate with them to understand what is most important. We will discuss how to decode some of the business "speak" into a language that you can understand, helping you grasp what the business requires. We will also review how the AWS Well-Architected Framework can help with determining the most important architectural considerations for an application that is ultimately well-architected and meets the business requirements.

Misunderstood business requirements (domain concerns) are costly. What starts as a remark by the business can end up implemented in software that makes its way into production. How many of us have built a solution that turned out not to be what the customer wanted, all because we didn't understand the requirement in the first place? It may be because of tight timelines, but we didn't spend time on understanding what the requirements really are. Misunderstood business requirements result in the following:

- Building the wrong product
- Wasting time and money building to a misunderstood specification
- Safety issues
- Delivery delays
- Failure to meet expectations, which results in loss of trust

First, the Core Expectations of an Architect

Earlier we talked about architects listening to key stakeholders and understanding their requirements. What else is expected from an architect? We won't define the role of an architect, as this can be hard to do. We will, however, focus on eight core expectations of an architect. In their book "Fundamentals of Software Architecture: An Engineering Approach," co-authors Mark Richards and Neal Ford recommend that we focus on the following expectations as far as the role of an architect is concerned:

- Understand and navigate politics
- Possess interpersonal skills
- Have business domain experience
- Have diverse exposure and experience
- Ensure compliance with decisions
- Keep current with the latest trends
- Continually analyze the architecture
- Make architecture decisions

Architecture Characteristics

Architecture characteristics are concerns that are critical to the success of the architecture, and therefore critical to the success of the system as a whole. During business requirements definition, the business will specify domain functionality - commonly called "functional requirements" - together with the architecture characteristics. The functional requirements, for the most part, tend to be clear, as they define what the application must do: the business logic. They influence some structural aspects of the architecture and are critical to the success of the application.
Architecture characteristics are where the misunderstanding tends to come in. There are many more architecture characteristics than the ones listed below: there is no fixed list, although there is a standard, ISO 25000. Architecture characteristics are not functional; they can be explicit (expressed in the requirements document) or implicit (not expressed). We will focus on the following characteristics for the rest of the post:

- Auditability
- Scalability
- Availability
- Performance
- Security
- Legality

In fact, there is alignment between architecture characteristics and the pillars of the AWS Well-Architected Framework - but more on this later. So how do we translate domain concerns to architecture characteristics?

"Architects need to practice being architects, just as developers need a chance to practice being developers." Ted Neward, who started Architectural Katas, had the idea that architects need practice just as much as developers do - and so the Architectural Katas were started, inspired by Code Katas. And what is a kata? The idea comes from martial arts: a kata is an exercise in karate where you repeat a form many, many times, making little improvements each time. Who knows of the infamous FizzBuzz code kata? There is a GitHub project that has collected a list of kata exercises found on the Internet and in the GitHub community. The idea with both the Code Kata and the Architecture Kata is to create a safe environment in which to practice and fail over and over again, learn, and gain experience. Architecture katas are not designed to have you come up with a perfect architecture, but to train you in how to come up with solutions - and to provide you with a space to fail.

Architectural Katas

Katas are essentially group exercises where you work with peers (in this case, your project team) to arrive at the best architecture possible - you get feedback from each other, iterate, and improve on the initial architecture with each iteration. The project team meets for a while and discovers requirements that aren't in the original proposal by asking questions of the "customer," who is usually the moderator during an architecture kata. It is at this phase that the implicit requirements are uncovered; any other questions that are not already covered by the requirements can also be put to the moderator. The phases that form part of an architectural kata are:

- Preparation Phase
- Discussion Phase
- Peer Review Phase
- Voting Phase

Prepare

During the Preparation Phase, the project team is assembled, usually by the moderator, who will also facilitate the kata. The moderator is the customer or anyone who is best placed to answer the questions that the project team will have.

Discuss

During this phase, the project team figures out what they will be building. The team examines the requirements for the kata as given and works out a rough vision of what the project's architecture will look like. The team asks the moderator any questions they have about the project. It is also worth noting that at this phase any technology is fair game, as customers tend not to care, most of the time, what kind of technology is used. It is therefore not necessary during this phase to place too much focus on the technology that will be used to build the application.

Peer Review

During this phase, the project team presents the architecture, led by the architect. It is also at this phase that all questions from the rest of the project team (and especially the customer) are answered.
Vote

Lastly, the entire team votes on the architecture that has just been presented, and this vote determines whether there is another iteration or not.

Identifying Architecture Characteristics

What architecture characteristics can you derive from the requirements? What can you derive as far as system availability, security, and performance are concerned? Each part of a requirement might contribute to one or more aspects of the architecture (and some may not contribute at all). Here the architect looks for factors that influence or impact the design - particularly structural factors. First, separate the candidate architecture characteristics into explicit and implicit characteristics. One of the first details that should catch the architect's eye is the number of users - currently thousands, and perhaps one day, millions! This means that we need to design for scalability in order to handle a large number of concurrent users without serious performance degradation; this will be one of the top architecture characteristics. Notice that the problem statement didn't explicitly ask for scalability but rather expressed the requirement as an expected number of users. This is an example of architects decoding domain language (business requirements) into engineering equivalents! There are many other characteristics here: can you identify elasticity, the ability to handle bursts of customers during promotions and specials?

AWS Well-Architected Framework

Earlier on, we talked about how most of the architecture characteristics align with the pillars of the AWS Well-Architected Framework. Once you've iterated to arrive at the architecture characteristics and have designed the architecture, you would then put that design through a Well-Architected Framework review to ensure that it meets the architecture characteristics you uncovered during the Architecture Kata. We will now dive deeper into this alignment and why we are paying close attention to it.

The AWS Well-Architected Framework helps cloud architects build secure, high-performing, resilient, and efficient infrastructure for a variety of applications and workloads. Built around six pillars - operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability - AWS Well-Architected provides a consistent approach for customers and partners to evaluate architectures and implement scalable designs. The Framework describes key concepts, design principles, and architectural best practices for designing and running workloads in the cloud. By answering a few foundational questions, you learn how well your architecture aligns with cloud best practices and gain guidance for making improvements. The Framework helps you understand the pros and cons of the decisions you make while building systems on AWS, and it provides a way to consistently measure your architectures against best practices and identify areas for improvement. We believe that having well-architected systems greatly increases your security, reliability, and likelihood of business success.

Operational Excellence

The Operational Excellence pillar focuses on running and monitoring systems and continually improving processes and procedures.
Key topics include automating changes, responding to events, and defining standards to manage daily operations.

Security

The Security pillar focuses on protecting information and systems. Key topics include confidentiality and integrity of data, managing user permissions, and establishing controls to detect security events.

Reliability

The Reliability pillar focuses on workloads performing their intended functions and on recovering quickly from failure to meet demands. Key topics include distributed system design, recovery planning, and adapting to changing requirements.

Cost Optimization

The Cost Optimization pillar focuses on avoiding unnecessary costs. Key topics include understanding spending over time and controlling fund allocation, selecting resources of the right type and quantity, and scaling to meet business needs without overspending.

Performance Efficiency

The Performance Efficiency pillar focuses on the structured and streamlined allocation of IT and computing resources. Key topics include selecting resource types and sizes optimized for workload requirements, monitoring performance, and maintaining efficiency as business needs evolve.

Sustainability

The Sustainability pillar focuses on minimizing the environmental impacts of running cloud workloads. Key topics include a shared responsibility model for sustainability, understanding impact, and maximizing utilization to minimize required resources and reduce downstream impacts.

In line with the architecture characteristics prioritized earlier in this post - availability, scalability, and security - the architecture design will be evaluated against the following pillars:

- Availability ---> Reliability pillar
- Security ---> Security pillar
- Scalability ---> Performance Efficiency pillar

Over and above the architecture characteristics and the corresponding AWS Well-Architected Framework pillars, this architecture includes monitoring, provided by Amazon CloudWatch. This speaks to another pillar that we are not focusing on today: the Operational Excellence pillar. The architecture that the team arrived at is a serverless architecture. Follow the link to learn more about serverless architectures and the considerations that would have led the architect to opt for this over a non-serverless approach.

AWS Well-Architected Lenses

AWS Well-Architected Lenses extend the guidance offered by AWS Well-Architected to specific industry and technology domains, such as machine learning (ML), data analytics, serverless, high-performance computing (HPC), IoT, SAP, streaming media, the games industry, hybrid networking, and financial services. To fully evaluate workloads, use the applicable lenses together with the AWS Well-Architected Framework and its six pillars. Today there are 15 lenses, as well as the ability for you to build custom lenses.

Serverless Applications Lens

In this lens, we focus on how to design, deploy, and architect your serverless application workloads in the AWS Cloud. For brevity, we have only covered details from the Well-Architected Framework that are specific to serverless workloads. You should still consider best practices and questions that have not been included in this document when designing your architecture; it is recommended that you read the AWS Well-Architected Framework whitepaper. For serverless workloads, AWS provides multiple core components (serverless and non-serverless) that allow you to design robust architectures for your serverless applications.
Here we will present an overview of the services that we will evaluate, focusing on the services that are part of our architecture. There are eight areas that you should consider when building a serverless workload:

- Compute layer
- Data layer
- Messaging and streaming layer
- User management and identity layer
- Edge layer
- Systems monitoring and deployment
- Deployment approaches
- Lambda version control

Now let's dive deep into the compute, user management and identity, and edge layers and discuss how they meet the architecture characteristics we identified and the AWS Well-Architected Framework pillars we mapped them to.

Compute Layer

The compute layer of your workload manages requests from external systems, controlling access and verifying that requests are appropriately authorized. It also contains the runtime environment in which your business logic is deployed and run. Within the compute layer, we focus on three AWS services, two of which are part of the architecture we are working with here.

AWS Lambda: AWS Lambda is a serverless, event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers. You can trigger Lambda from over 200 AWS services and software-as-a-service (SaaS) applications and only pay for what you use.

Amazon API Gateway: Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. APIs act as the "front door" for applications to access data, business logic, or functionality from your backend services. Using API Gateway, you can create RESTful APIs and WebSocket APIs that enable real-time two-way communication applications. API Gateway supports containerized and serverless workloads, as well as web applications.

(It is worth noting that although we are only focusing on Reliability, Performance Efficiency, and Security as the Well-Architected Framework pillars, due to the architecture characteristics that we are prioritizing, it is still worth considering the other pillars when designing the architecture. As an example, for Sustainability, one should optimize the areas of code that consume the most time or resources, for instance by using Amazon CodeGuru to review code.)

User Management and Identity Layer

The user management and identity layer of your workload provides identity, authentication, and authorization for both external and internal customers of your workload's interfaces.

Edge Layer

The edge layer of your workload manages the presentation layer and connectivity to external customers. It provides an efficient delivery method to external customers residing in distinct geographical locations. Amazon CloudFront provides a CDN that securely delivers web application content and data with low latency and high transfer speeds.

In conclusion, to get started with architecting on AWS, there are reference architectures to help you do just that. Take a look at our AWS Architecture Center for guidance on the AWS Well-Architected Framework, how to establish your cloud foundation on AWS, and some helpful resources on getting started with architecting on AWS. Looking at our opening quote at the beginning of this post, do we see a future where the business-IT wall comes down?

By Rashmi Nambiar
Lifting and Shifting a Web Application to AWS Serverless

This article was authored by AWS Principal Developer Advocate, Marcia Villalba, and published with permission. This article provides a guide on how to migrate a MERN (Mongo, Express, React, and Node.js) web application to a serverless environment. It not only looks at the process of migrating a non-serverless web application to a serverless environment, but it also explores two issues that arise during the migration process. Lift and shift is often the fastest way to get a migrated application into production, and it then allows the development team to refactor the parts of the application that may benefit from serverless technologies. Before starting any migration, it is important to define the non-functional requirements that the new application needs to have. For this application, these requirements are:

- An environment that scales to zero
- Paying as little as possible for idle time
- Configuring as little infrastructure as possible
- Automatic high availability of the application
- Minimal changes to the original code

Application Overview

This blog post guides you on how to migrate a MERN application. The original application is hosted on two different servers: one contains the Mongo database and the other contains the Node.js/Express and ReactJS applications. This demo application simulates a swag e-commerce site. The database layer stores the products, users, and purchase history. The server layer takes care of the e-commerce business logic, hosting the product images, and user authentication and authorization. The web layer takes care of all the user interaction and communicates with the server layer using REST APIs.

Application Migration

The database layer of the application is migrated to MongoDB Atlas, a cloud-based database cluster that scales automatically and where you pay only for what you use. In this case, the migration is as simple as dumping the content of the database to a local folder and then restoring it in the cloud. The Node.js/Express backend is migrated to AWS Lambda using the AWS Lambda Web Adapter, an open-source project that allows you to build a web application and run it on Lambda. You can learn more about this migration in this video. The next step is to create an HTTP endpoint for the server application. This demo uses Lambda function URLs, as they are simple to configure, and one function URL forwards all routes to the Express server. You can learn more about Lambda function URLs in this video. The React web app is migrated to AWS Amplify, a fully managed service that provides features like hosting web applications and managing the CI/CD pipeline for the web app. You can see how to do the migration by following this video.

Migration Challenges

Up to this point, the application has been migrated from a traditional hosting environment to serverless infrastructure. However, during this migration, two issues arise: authentication and authorization, and storage.

Authentication and Authorization Migration

The original application handles authentication and authorization by itself. However, with the migrated application, every time you log in you are unexpectedly logged out of the application. This is because the server code is responsible for handling the authentication and authorization of users, and now our server is running in an AWS Lambda function, and functions are stateless.
This means that there is one function invocation per request (a request can load all the products on the landing page, get the details for a product, or log in to the site), and state set in one of these invocations is not shared with the others. To solve this, you must remove the authentication and authorization mechanisms from the function and use a service that can preserve state across multiple invocations of the function. That is why this migration uses Amazon Cognito to handle authentication and authorization. You can learn more about Amazon Cognito in this video. With this new architecture, the application calls Amazon Cognito APIs directly from the AWS Amplify application, minimizing the amount of code needed. To learn how this part of the migration was done, check this other video.

Storage Migration

In the original application, when a new product is created, a new image is uploaded to the Node.js/Express server. However, the application now resides in a Lambda function, and the code (and files) that are part of that function cannot change unless the function is redeployed. Consequently, you must separate the user storage from the server code. To solve this issue, the migration uses Amazon S3, an object storage service that provides scalability, data availability, security, and performance. Additionally, Amazon CloudFront can be used to accelerate the retrieval of images from the cloud. If you want to learn how this migration was done, you can check this video.

Conclusion

By following this guide, you can quickly and easily migrate your web application to a serverless environment and take advantage of the benefits of serverless, such as automatic scaling and paying only for what is used. This is a summary of the migration steps that this article suggests:

- Database migration: Migrate the database from on-premises to MongoDB Atlas.
- Backend migration: Migrate the Node.js/Express application from on-premises to AWS Lambda using the Lambda Web Adapter and Lambda function URLs.
- Web app migration: Migrate the React web app from on-premises to AWS Amplify.
- Authentication migration: Migrate the custom-built authentication to Amazon Cognito.
- Storage migration: Migrate the local storage of images to Amazon S3 and Amazon CloudFront.

The following image shows the proposed solution for the migrated application. If you want to read an extended article on how to do this, why every service was picked over others, and access the code for the original and migrated application, you can read "Lifting and shifting a web application to AWS Serverless: Part 1," and if you are more a visual person, the video below covers it as well.
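As a small supplement to the backend migration step above: the post exposes the Express server through a Lambda function URL. The post configures this through the console and tooling shown in the videos; purely as an illustrative sketch, the same configuration can be created with boto3. The function name below is an assumption, not the one used in the post.

Python
import boto3

lambda_client = boto3.client("lambda")
function_name = "mern-backend-function"  # illustrative name, not from the post

# Create a public function URL that forwards every route to the Express app
url_config = lambda_client.create_function_url_config(
    FunctionName=function_name,
    AuthType="NONE",
)

# Public (AuthType NONE) URLs also need a resource-based policy allowing invocation
lambda_client.add_permission(
    FunctionName=function_name,
    StatementId="AllowPublicFunctionUrlInvoke",
    Action="lambda:InvokeFunctionUrl",
    Principal="*",
    FunctionUrlAuthType="NONE",
)

print("Function URL:", url_config["FunctionUrl"])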

By Rashmi Nambiar
Build a Managed Analytics Platform for an E-Commerce Business on AWS (Part 1)

With the increase in popularity of online shopping, building an analytics platform for e-commerce is important for any organization, as it provides insights into the business, trends, and customer behavior. But more importantly, it can uncover hidden insights that can trigger revenue-generating business decisions and actions. In this blog, we will learn how to build a complete analytics platform in batch and real-time mode. The real-time analytics pipeline also shows how to detect distributed denial of service (DDoS) and bot attacks, which is a common requirement for such use cases.

Introduction

E-commerce analytics is the process of collecting data from all of the sources that affect a certain online business. Data analysts or business analysts can then utilize this information to deduce changes in customer behavior and online shopping patterns. E-commerce analytics spans the whole customer journey, starting from discovery through acquisition, conversion, and eventually retention and support. In this two-part blog series, we will build an e-commerce analytical platform that can help analyze the data in real time as well as in batch. We will use an e-commerce dataset from Kaggle to simulate the logs of user purchases, product views, cart history, and the user's journey on the online platform to create two analytical pipelines:

- Batch processing
- Online/real-time processing

You may like to refer to this session presented at AWS re:Invent 2022 for a video walk-through.

Batch Processing

The batch processing pipeline will involve data ingestion, lakehouse architecture, processing, and visualization using Amazon Kinesis, AWS Glue, Amazon S3, and Amazon QuickSight to draw insights regarding the following:

- Unique visitors per day
- Products that users add to their carts during a certain time window but don't end up buying
- Top categories per hour or weekday (i.e., to promote discounts based on trends)
- Which brands need more marketing

Online/Real-Time Processing

The real-time processing pipeline detects DDoS and bot attacks using AWS Lambda, Amazon DynamoDB, Amazon CloudWatch, and Amazon SNS. This is the first part of the blog series, where we will focus only on the online/real-time processing data pipeline. In the second part of the blog series, we will dive into batch processing.

Dataset

For this blog, we are going to use the e-commerce behavior data from a multi-category store. This file contains the behavior data for 7 months (from October 2019 to April 2020) from a large multi-category online store, where each row in the file represents an event. All events are related to products and users, and each event is like a many-to-many relationship between products and users.

Architecture

Real-Time Processing

We are going to build an end-to-end data engineering pipeline where we will start with this e-commerce behavior data from a multi-category store dataset as an input, which we will use to simulate a real-time e-commerce workload. This input raw stream of data will go into an Amazon Kinesis Data Stream (stream1), which will stream the data to Amazon Kinesis Data Analytics for analysis, where we will use an Apache Flink application to detect any DDoS attack, and the filtered data will be sent to another Amazon Kinesis Data Stream (stream2). We are going to use SQL to build the Apache Flink application on Amazon Kinesis Data Analytics and, hence, we need a metadata store, for which we are going to use the AWS Glue Data Catalog.
This stream2 will then trigger an AWS Lambda function, which sends an Amazon SNS notification to the stakeholders and stores the fraudulent transaction details in a DynamoDB table. The architecture looks like this:

Batch Processing

If we look at the architecture diagram above, we will see that we are not storing the raw incoming data anywhere. As the data enters through the Kinesis Data Stream (stream1), we pass it to Kinesis Data Analytics to analyze. Later on, we might discover a bug in our Apache Flink application; at that point, we can fix the bug and resume processing the data, but we cannot re-process the old data that was handled by the buggy Apache Flink application, because we have not stored the raw data anywhere that would let us re-process it later. That's why it's recommended to always keep a copy of the raw data in some storage (e.g., on Amazon S3) so that we can revisit the data if needed for reprocessing and/or batch processing. This is exactly what we are going to do. We will use the same incoming data stream from the Amazon Kinesis Data Stream (stream1) and pass it on to Kinesis Firehose, which can write the data to S3. Then we will use Glue to catalog that data and perform an ETL job using Glue ETL to process/clean that data so that we can run analytical queries on it with Athena. Finally, we will leverage QuickSight to build a dashboard for visualization.

Step-By-Step Walkthrough

Let's build this application step-by-step. I'm going to use an AWS Cloud9 instance for this project, but it is not mandatory. If you wish to spin up an AWS Cloud9 instance, you may like to follow the steps mentioned here and proceed further.

Download the Dataset and Clone the GitHub Repo

Clone the project and change to the project directory:

Shell
# Clone the project repository
git clone https://github.com/debnsuma/build-a-managed-analytics-platform-for-e-commerce-business.git
cd build-a-managed-analytics-platform-for-e-commerce-business/
# Create a folder to store the dataset
mkdir dataset

Download the dataset from here and move the downloaded file (2019-Nov.csv.zip) under the dataset folder. Now, let's unzip the file and create a sample version of the dataset by taking just the first 1000 records from the file.

Shell
cd dataset
unzip 2019-Nov.csv.zip
cat 2019-Nov.csv | head -n 1000 > 202019-Nov-sample.csv

Create an Amazon S3 Bucket

Now we can create an S3 bucket and upload this dataset. Name of the bucket: e-commerce-raw-us-east-1-dev (replace <BUCKET_NAME> with your own bucket name).

Shell
# Copy all the files in the S3 bucket
aws s3 cp 2019-Nov.csv.zip s3://<BUCKET_NAME>/ecomm_user_activity/p_year=2019/p_month=11/
aws s3 cp 202019-Nov-sample.csv s3://<BUCKET_NAME>/ecomm_user_activity_sample/202019-Nov-sample.csv
aws s3 cp 2019-Nov.csv s3://<BUCKET_NAME>/ecomm_user_activity_unconcompressed/p_year=2019/p_month=11/

Create the Kinesis Data Streams

Now, let's create the first Kinesis data stream (stream1 in our architecture diagram), which we will use as the incoming stream. Open the AWS Console and then:

- Go to Amazon Kinesis.
- Click on Create data stream and use e-commerce-raw-user-activity-stream-1 as the data stream name (the name referenced by the Flink application later in this post).

Let's also create the second Kinesis data stream, which we are going to use later on (stream2 in the architecture diagram). This time, use the data stream name e-commerce-raw-user-activity-stream-2. (If you prefer to create the streams programmatically, an equivalent sketch is shown below.)
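The following is a minimal boto3 sketch that creates both data streams with the same names used throughout this post. The shard count of 1 is an assumption for this small demo workload; the post itself does not specify one.

Python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Stream names as used throughout this post; ShardCount=1 is an assumption
for stream_name in (
    "e-commerce-raw-user-activity-stream-1",
    "e-commerce-raw-user-activity-stream-2",
):
    kinesis.create_stream(StreamName=stream_name, ShardCount=1)
    # Wait until the stream becomes ACTIVE before using it
    kinesis.get_waiter("stream_exists").wait(StreamName=stream_name)
    print(f"Created {stream_name}")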
Start the E-Commerce Traffic

We can now start the e-commerce traffic, as our Kinesis data streams are ready. The simulator we are going to use is a simple Python script that reads the data from the CSV file (202019-Nov-sample.csv, the dataset we downloaded earlier) line by line and sends it to the Kinesis data stream (stream1). Before you run the simulator, edit the stream-data-app-simulation.py script with the <BUCKET_NAME> where you have the dataset.

Python
# S3 bucket details (UPDATE THIS)
BUCKET_NAME = "e-commerce-raw-us-east-1-dev"

Once it's updated, we can run the simulator.

Shell
# Go back to the project root directory
cd ..
# Run simulator
pip install boto3
python code/ecomm-simulation-app/stream-data-app-simulation.py
HttpStatusCode: 200 , electronics.smartphone
HttpStatusCode: 200 , appliances.sewing_machine
HttpStatusCode: 200 ,
HttpStatusCode: 200 , appliances.kitchen.washer
HttpStatusCode: 200 , electronics.smartphone
HttpStatusCode: 200 , computers.notebook
HttpStatusCode: 200 , computers.notebook
HttpStatusCode: 200 ,
HttpStatusCode: 200 ,
HttpStatusCode: 200 , electronics.smartphone
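The full simulator lives in the repository under code/ecomm-simulation-app/stream-data-app-simulation.py. Purely as an illustration of the idea, here is a stripped-down sketch of such a producer loop; the local file path and the printed output format are assumptions, and the real script works with the dataset location you configured via BUCKET_NAME rather than a hard-coded local file.

Python
import csv
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")
STREAM_NAME = "e-commerce-raw-user-activity-stream-1"

# Assumed local path to the sample file created earlier
with open("dataset/202019-Nov-sample.csv") as f:
    for row in csv.DictReader(f):
        # Each CSV row becomes one JSON record on the stream; user_id is a
        # reasonable partition key since Flink aggregates per user later on.
        response = kinesis.put_record(
            StreamName=STREAM_NAME,
            Data=json.dumps(row).encode("utf-8"),
            PartitionKey=str(row["user_id"]),
        )
        print("HttpStatusCode:",
              response["ResponseMetadata"]["HTTPStatusCode"],
              ",", row.get("category_code", ""))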
Integration with Kinesis Data Analytics and Apache Flink

Now, we will create an Amazon Kinesis Data Analytics streaming application that analyzes this incoming stream for any DDoS or bot attack. Open the AWS Console and then:

- Go to Amazon Kinesis.
- Select Analytics applications.
- Click on Studio notebooks.
- Click on Create Studio notebook.
- Use ecomm-streaming-app-v1 as the Studio notebook name.
- Under the Permissions section, click on Create to create an AWS Glue database, and name the database my-db-ecomm.
- Select the same database, my-db-ecomm, from the dropdown.
- Click on Create Studio notebook.

Now, select the ecomm-streaming-app-v1 Studio notebook and click on Open in Apache Zeppelin. Once the Zeppelin dashboard comes up, click on Import note and import this notebook, then open the sql-flink-ecomm-notebook-1 notebook. The Flink interpreters supported by the Apache Zeppelin notebook are Python, IPython, stream SQL, and batch SQL; we are going to use SQL to write our code. There are many different ways to create a Flink application, but one of the easiest is to use a Zeppelin notebook. Let's look at this notebook and briefly discuss what we are doing here:

- First, we create a table for the incoming source of data (the e-commerce-raw-user-activity-stream-1 incoming stream).
- Next, we create another table for the filtered data (for the e-commerce-raw-user-activity-stream-2 outgoing stream).
- Finally, we add the logic to detect the simulated DDoS attack: we look at the last 10 seconds of data and group it by user_id. If we notice more than 5 records for the same user_id within those 10 seconds, Flink takes that user_id and the number of records within those 10 seconds and pushes that data to the e-commerce-raw-user-activity-stream-2 outgoing stream.

SQL
%flink.ssql

/* Option 'IF NOT EXISTS' can be used to protect the existing schema */
DROP TABLE IF EXISTS ecomm_user_activity_stream_1;
CREATE TABLE ecomm_user_activity_stream_1 (
  `event_time` VARCHAR(30),
  `event_type` VARCHAR(30),
  `product_id` BIGINT,
  `category_id` BIGINT,
  `category_code` VARCHAR(30),
  `brand` VARCHAR(30),
  `price` DOUBLE,
  `user_id` BIGINT,
  `user_session` VARCHAR(30),
  `txn_timestamp` TIMESTAMP(3),
  WATERMARK FOR txn_timestamp AS txn_timestamp - INTERVAL '10' SECOND
)
PARTITIONED BY (category_id)
WITH (
  'connector' = 'kinesis',
  'stream' = 'e-commerce-raw-user-activity-stream-1',
  'aws.region' = 'us-east-1',
  'scan.stream.initpos' = 'LATEST',
  'format' = 'json',
  'json.timestamp-format.standard' = 'ISO-8601'
);

/* Option 'IF NOT EXISTS' can be used to protect the existing schema */
DROP TABLE IF EXISTS ecomm_user_activity_stream_2;
CREATE TABLE ecomm_user_activity_stream_2 (
  `user_id` BIGINT,
  `num_actions_per_watermark` BIGINT
)
WITH (
  'connector' = 'kinesis',
  'stream' = 'e-commerce-raw-user-activity-stream-2',
  'aws.region' = 'us-east-1',
  'format' = 'json',
  'json.timestamp-format.standard' = 'ISO-8601'
);

/* Insert the aggregation into stream 2 */
INSERT INTO ecomm_user_activity_stream_2
SELECT user_id, COUNT(1) AS num_actions_per_watermark
FROM ecomm_user_activity_stream_1
GROUP BY TUMBLE(txn_timestamp, INTERVAL '10' SECOND), user_id
HAVING COUNT(1) > 5;

Create the Apache Flink Application

Now that we have our notebook imported, we can create the Flink application from the notebook directly. To do that:

- Click on Actions for ecomm-streaming-app-v1 in the top right corner.
- Click on Build sql-flink-ecomm-notebook-1 > Build and export. This compiles the code, creates a ZIP file, and stores the file on S3.
- Deploy the application by clicking on Actions for ecomm-streaming-app-v1 in the top right corner again, then Deploy sql-flink-ecomm-notebook-1 as Kinesis Analytics application > Deploy using AWS Console.
- Scroll down and click on Save changes.

This is the power of Kinesis Data Analytics: from a simple Zeppelin notebook, we can create a real-world application without any hindrance. Finally, we can start the application by clicking on Run. It might take a couple of minutes to start, so let's wait until we see the status as Running.

Alerting on a DDoS Attack

If we revisit our architecture, we will see that we are almost done with the real-time/online processing. The only thing pending is to create a Lambda function that will be triggered whenever a record lands in the e-commerce-raw-user-activity-stream-2 stream. The Lambda function performs the following:

- Writes that record into a DynamoDB table.
- Sends an SNS notification.
- Updates the CloudWatch metrics.

(A minimal sketch of such a handler is shown below; the actual code used in this post lives in the repository.)
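This sketch is illustrative only: the repository code additionally uses the aws_kinesis_agg package to de-aggregate records, which the sketch skips, and the environment variable names, item attributes, and metric names below are assumptions rather than the repo's exact values.

Python
import base64
import json
import os

import boto3

# Resource names come from environment variables; the variable names here
# are illustrative and may differ from the ones used in the repository.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ["DDB_TABLE_NAME"])
sns = boto3.client("sns")
cloudwatch = boto3.client("cloudwatch")


def lambda_handler(event, context):
    for record in event["Records"]:
        # Kinesis delivers the Flink output base64-encoded
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))

        # 1. Persist the suspicious activity in DynamoDB
        table.put_item(Item={
            "user_id": str(payload["user_id"]),
            "num_actions": payload["num_actions_per_watermark"],
        })

        # 2. Notify the stakeholders via SNS
        sns.publish(
            TopicArn=os.environ["SNS_TOPIC_ARN"],
            Subject="Possible DDoS/bot activity detected",
            Message=json.dumps(payload),
        )

        # 3. Emit a custom CloudWatch metric
        cloudwatch.put_metric_data(
            Namespace="EcommSecurity",
            MetricData=[{
                "MetricName": "HighEventVolumeUsers",
                "Value": 1,
                "Unit": "Count",
            }],
        )
    return {"statusCode": 200}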
Let's first build the code package for the Lambda function. The code is available under the code/serverless-app folder.

Shell
# Install the aws_kinesis_agg package
cd code/serverless-app/
pip install aws_kinesis_agg -t .
# Build the Lambda deployment package (zip file)
zip -r ../lambda-package.zip .
# Upload the zip to S3
cd ..
aws s3 cp lambda-package.zip s3://e-commerce-raw-us-east-1-dev/src/lambda/

Now, let's create the Lambda function:

- Open the AWS Lambda console.
- Click on the Create function button.
- Enter the function name ecomm-detect-high-event-volume.
- Select Python 3.7 as the runtime.
- Click on Create function.

Once the Lambda function is created, we need to upload the code that we stored in S3. Provide the location of the Lambda code and click on Save. We also need to provide adequate privileges to our Lambda function so that it can talk to Kinesis Data Streams, DynamoDB, CloudWatch, and SNS. To modify the IAM role:

- Go to the Configuration tab > Permissions tab on the left.
- Click on the role name.

Since this is just a demo, we are adding Full Access policies, but this is NOT recommended for a production environment; we should always follow the least privilege principle when granting access to any user or resource.

Let's create the SNS topic:

- Open the Amazon SNS console.
- Click on Create topic.
- Select the type Standard.
- Provide the name ecomm-user-high-severity-incidents.
- Click on Create topic.

Let's create a DynamoDB table:

- Open the Amazon DynamoDB console.
- Click on Create table.
- Create the table with the following details.

Now we can add the environment variables that are needed for the Lambda function. These environment variables are used in the Lambda function code. The following are the environment variables:

Show Time

Now we are all done with the implementation, and it's time to start generating the traffic using the Python script we created earlier and see everything in action!

Shell
cd build-a-managed-analytics-platform-for-e-commerce-business
python code/ecomm-simulation-app/stream-data-app-simulation.py
HttpStatusCode: 200 , electronics.smartphone
HttpStatusCode: 200 , appliances.sewing_machine
HttpStatusCode: 200 ,
HttpStatusCode: 200 , appliances.kitchen.washer
HttpStatusCode: 200 , electronics.smartphone
HttpStatusCode: 200 , computers.notebook
HttpStatusCode: 200 , computers.notebook
HttpStatusCode: 200 ,
HttpStatusCode: 200 ,
HttpStatusCode: 200 , electronics.smartphone
HttpStatusCode: 200 , furniture.living_room.sofa

We can also monitor this traffic using the Apache Flink dashboard:

- Open the Amazon Kinesis Application dashboard.
- Select the application ecomm-streaming-app-v1-sql-flink-ecomm-notebook-1-2HFDAA9HY.
- Click on Open Apache Flink dashboard.
- Once you are on the Apache Flink dashboard, click on Running Jobs > the job name that is running.

Finally, we can also see, in the DynamoDB table, the details of all the users that the Flink application classified as part of a DDoS attack. You can let the simulator run for the next 5-10 minutes and explore and monitor all the components we have built in this data pipeline.

Summary

In this blog post, we built an e-commerce analytical platform that can help analyze the data in real time. We used a Python script to simulate real traffic using the dataset and used Amazon Kinesis as the incoming stream of data. That data is analyzed by Amazon Kinesis Data Analytics using an Apache Flink application written in SQL, and suspicious activity flows into a pipeline that detects distributed denial-of-service (DDoS) and bot attacks using AWS Lambda, DynamoDB, CloudWatch, and Amazon SNS. In the second part of this blog series, we will dive deep into the batch processing pipeline and build a dashboard using Amazon QuickSight, which will help us get more insights about users: details like who visits the e-commerce website most frequently, which are the top- and bottom-selling products, which are the top brands, and so on.

By Suman Debnath
17 Open Source Projects at AWS Written in Rust

This article was authored by AWS Senior Software Developer Engineer, Tim McNamara, and published with permission. Lots of people have been investigating Rust recently. That raises an important question: “Is Rust actually useful for me?” While we can’t tell you whether it’s appropriate for your use case, we can share some examples of where it’s been useful for us. These projects serve as a representative sample of what we've created so far and provide a glimpse into our use of Rust. We hope that by inspecting the code, you can learn from our work and get inspired to experiment with Rust at your workplace. Keep in mind that this list is not exhaustive: there are more open-source projects to explore, particularly within the AWS and AWS Labs organizations in GitHub. We hope you find these projects informative and valuable! Systems Programming Rust rose to prominence as a systems programming language, offering memory safety benefits that are unavailable in its peer languages, such as C and C++. This field, specifically virtualization, is where AWS first utilized Rust for large-scale projects. Bottlerocket Bottlerocket is an operating system designed for hosting containers. It includes only the essential software required to run containers and ensures that the underlying software is always secure. For example, it’s impossible to SSH into a container running in Bottlerocket: running containers don’t even have a shell, let alone sshd. Firecracker Firecracker powers AWS Lambda and AWS Fargate. It runs workloads in lightweight virtual machines called micro VMs, which combine speed and flexibility (which we’re used to from containers) with security and isolation (which we’re used to from virtual machines). Web Services As Rust became increasingly commonplace within AWS, projects began to appear that were broader than the initial systems programming domain. It is now a language that has a strong user base developing web-facing services. Rust Runtime for AWS Lambda Serverless is becoming increasingly mainstream within the technology industry, and serverless Rust is an excellent way to make use of this new paradigm. The Rust Runtime for AWS Lambda provides a custom runtime for AWS Lambda that’s ergonomic to use and offers a performance boost versus other runtimes. smithy-rs Keeping servers and clients up-to-date as APIs change is a difficult task. The Smithy Interface Definition Language (IDL) simplifies this by delegating the bookkeeping to software. The Rust implementation is called smithy-rs. It can generate clients and servers in Rust while enabling business logic to be implemented within developers’ preferred languages such as Python. smithy-rs is an interesting project internally, as it is an example of using Kotlin and Rust within the same code base. smithy-rs is used to generate the open-source crates that belong to AWS SDK for Rust. AWS SDK for Rust The AWS SDK for Rust enables AWS services to be accessed programmatically from Rust programs. The whole SDK encompasses dozens of crates, each corresponding to an AWS service, all of which are available for inspection within the SDK’s GitHub repository. Testing While Rust’s type system provides many guarantees, it doesn’t prevent all bugs. We’ve created a few tools to expand the robustness of software written across the company and beyond. Kani Rust Verifier Kani Rust Verifier is part of a family of tools called “model checkers” to enable mathematical reasoning about software. Kani provides lightweight formal verification within Rust projects. 
In fact, Firecracker's security is formally verified with Kani. You can use Kani in your own programs to increase their robustness against errors that unit and integration tests are likely to miss.

Shuttle

Shuttle is a tool for testing concurrent code that works by controlling the scheduling of each thread and scheduling those threads randomly. By controlling the scheduling, Shuttle allows us to reproduce failing tests deterministically.

CLI Utilities

Developers at Amazon have also found that writing CLIs in Rust is very worthwhile. The type system prevents many tricky runtime errors during development, and CLIs written in Rust are easy to distribute, run very fast, and use very little memory.

Amazon Ion

Amazon Ion is a data format that comes with a CLI written in Rust. What's a data format? You've probably seen JSON around; that's an example of a data format. JSON is text-based, which is readable but can take up unnecessary space. It can also be unclear, when you receive a file, whether it contains the correct fields and data types. Unlike JSON, Ion provides both text and binary forms of its data model, making it easy to inspect data on the fly. The CLI also enables you to validate a file against a schema.

AWS CloudFormation Guard

AWS CloudFormation Guard validates CloudFormation specifications. For teams following an infrastructure-as-code methodology, including it as a pre-commit hook can prevent mistakes from entering production.

Nitro Enclaves Command Line Interface (Nitro CLI)

Nitro CLI is a tool for managing the lifecycle of Nitro Enclaves. Enclaves enable AWS customers to protect their most sensitive data by housing that data within an isolated, hardened, and highly constrained environment.

Other Utilities

Rust has also proven to be worthwhile in less prominent locations. coldsnap makes it easy to upload and download EBS snapshots from the command line, while dynein provides a CLI for Amazon DynamoDB. Flowgger can ingest, transform, and export logs from multiple sources. To provide error-bounded timestamps, ClockBound works with the chrony NTP server to enable disparate events to be ordered, independent of the geographic location of their source.

Libraries and Developer Tools

As experience is gained, it's common for people to share what they've learned. By inspecting the AWS and AWS Labs GitHub organizations, it's clear that the number of libraries written in Rust is growing.

s2n-quic

Cryptographic applications were another area of early experimentation with Rust. One of the downstream outcomes of that work is our open-source implementation of post-quantum key exchange for TLS, which is found in the s2n-quic QUIC implementation.

cargo-check-external-types

cargo-check-external-types is a Cargo plugin for Rust library authors. It helps to make sure that a library's API stays consistent even if a dependency changes. Essentially, it checks which types from other libraries can be part of the library's public API, so that a change to a dependency doesn't break the library.

DCV Color Primitives

To convert between color models in different applications, a common library makes a lot of sense. The DCV Color Primitives library can convert between multiple pixel formats, while also being easy to compile for multiple target architectures, including ARM (which covers the Graviton family of CPUs) and WebAssembly.

By Rashmi Nambiar
Deploying Go Applications to AWS App Runner: A Step-By-Step Guide

In this blog post, you will learn how to run a Go application on AWS App Runner using the Go platform runtime. You will start with an existing Go application on GitHub and deploy it to AWS App Runner. The application is based on the URL shortener application (with some changes) that persists data in DynamoDB.

Introduction

AWS App Runner is a robust and user-friendly service that simplifies the deployment process of web applications in the AWS Cloud. It offers developers an effortless and efficient way to deploy their source code or container image directly to a scalable and secure web application without requiring them to learn new technologies or choose the appropriate compute service. One of the significant benefits of using AWS App Runner is that it connects directly to the code or image repository, enabling an automatic integration and delivery pipeline. This eliminates the need for developers to go through the tedious process of manually integrating their code with AWS resources. For developers, AWS App Runner simplifies the process of deploying new versions of their code or image repository: they can simply push their code to the repository, and App Runner automatically takes care of the deployment. For operations teams, App Runner allows automatic deployments every time a new commit is pushed to the code repository or a new container image version is added to the image repository.

App Runner: Service Sources

With AWS App Runner, you can create and manage services based on two types of service sources:

- Source code (covered in this blog post)
- Source image

Source code is nothing but your application code that App Runner will build and deploy. All you need to do is point App Runner to a source code repository and choose a suitable runtime that corresponds to a programming platform version. App Runner provides platform-specific managed runtimes (for Python, Node.js, Java, Go, etc.). The AWS App Runner Go platform runtime makes it easy to build and run containers with web applications based on a Go version. You don't need to provide container configuration and build instructions such as a Dockerfile. When you use a Go runtime, App Runner starts with a managed Go runtime image, which is based on the Amazon Linux Docker image and contains the runtime package for a version of Go and some tools. App Runner uses this managed runtime image as a base image and adds your application code to build a Docker image. It then deploys this image to run your web service in a container.

Let's Get Started

Make sure you have an AWS account and have installed the AWS CLI.

1. Create a GitHub Repo for the URL Shortener Application

Clone this GitHub repo and then upload it to a GitHub repository in your account (keep the same repo name, i.e., apprunner-go-runtime-app):

git clone https://github.com/abhirockzz/apprunner-go-runtime-app

2. Create a DynamoDB Table To Store URL Information

Create a table named urls. Choose the following:

- Partition key named shortcode (data type String)
- On-Demand capacity mode

(A programmatic equivalent is sketched below.)
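If you would rather create the table programmatically than through the console, the following minimal boto3 sketch mirrors the settings above (table name, partition key, and on-demand capacity):

Python
import boto3

dynamodb = boto3.client("dynamodb")

# Same table name, key, and capacity mode as the console steps above
dynamodb.create_table(
    TableName="urls",
    AttributeDefinitions=[{"AttributeName": "shortcode", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "shortcode", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",  # on-demand capacity mode
)
dynamodb.get_waiter("table_exists").wait(TableName="urls")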
3. Create an IAM Role With DynamoDB-Specific Permissions

export IAM_ROLE_NAME=apprunner-dynamodb-role
aws iam create-role --role-name $IAM_ROLE_NAME --assume-role-policy-document file://apprunner-trust-policy.json

Before creating the policy, update the dynamodb-access-policy.json file to reflect the DynamoDB table ARN.

aws iam put-role-policy --role-name $IAM_ROLE_NAME --policy-name dynamodb-crud-policy --policy-document file://dynamodb-access-policy.json

Deploy the Application to AWS App Runner

If you have an existing AWS App Runner GitHub connection and want to use that, skip to the Repository selection step.

1. Create an AWS App Runner GitHub Connection

Open the App Runner console and choose Create service. On the Source and deployment page, in the Source section, for Repository type, choose Source code repository. Under Connect to GitHub, choose Add new, and then, if prompted, provide your GitHub credentials. In the Install AWS Connector for GitHub dialog box, if prompted, choose your GitHub account name. If prompted to authorize the AWS Connector for GitHub, choose Authorize AWS Connections. Choose Install. Your account name appears as the selected GitHub account/organization, and you can now choose a repository in your account.

2. Repository Selection

For Repository, choose the repository you created: apprunner-go-runtime-app. For Branch, choose the default branch name of your repository (for example, main). Configure your deployment: in the Deployment settings section, choose Automatic, and then choose Next.

3. Configure Application Build

On the Configure build page, for the Configuration file, choose Configure all settings here. Provide the following build settings:

- Runtime: Choose Go 1
- Build command: Enter go build main.go
- Start command: Enter ./main
- Port: Enter 8080

Choose Next.

4. Configure Your Service

Under Environment variables, add an environment variable. For Key, enter TABLE_NAME, and for Value, enter the name of the DynamoDB table (urls) that you created before. Under Security > Permissions, choose the IAM role that you created earlier (apprunner-dynamodb-role). Choose Next. On the Review and create page, verify all the details you've entered, and then choose Create and deploy. If the service is successfully created, the console shows the service dashboard, with a Service overview of the application.

Verify URL Shortener Functionality

The application exposes two endpoints:

- Create a short link for a URL
- Access the original URL via the short link

First, export the App Runner service endpoint as an environment variable:

export APP_URL=<enter App Runner service URL>
# example
export APP_URL=https://jt6jjprtyi.us-east-1.awsapprunner.com

1. Invoke It With a URL That You Want to Access via a Short Link

curl -i -X POST -d 'https://abhirockzz.github.io/' $APP_URL

# output
HTTP/1.1 200 OK
Date: Thu, 21 Jul 2022 11:03:40 GMT
Content-Length: 25
Content-Type: text/plain; charset=utf-8

{"ShortCode":"ae1e31a6"}

You should get a JSON response with a short code and see an item in the DynamoDB table as well. You can continue to test the application with other URLs that you want to shorten!

2. Access the URL Associated With the Short Code

Enter the following in your browser: http://<enter APP_URL>/<shortcode>. For example, when you enter https://jt6jjprtyi.us-east-1.awsapprunner.com/ae1e31a6, you will be redirected to the original URL. You can also use curl.
Here is an example: export APP_URL=https://jt6jjprtyi.us-east-1.awsapprunner.com curl -i $APP_URL/ae1e31a6 # output HTTP/1.1 302 Found Location: https://abhirockzz.github.io/ Date: Thu, 21 Jul 2022 11:07:58 GMT Content-Length: 0 Clean up Once you complete this tutorial, don’t forget to delete the following resources: DynamoDB table App Runner service Conclusion In this blog post, you learned how to go from a Go application in your GitHub repository to a complete URL shortener service deployed to AWS App Runner!

By Abhishek Gupta CORE
Getting Started With MSK Serverless and AWS Lambda Using Go

In this post, you will learn how to deploy a Go Lambda function and trigger it in response to events sent to a topic in an MSK Serverless cluster. The following topics are covered:

- How to use the franz-go Go Kafka client to connect to MSK Serverless using IAM authentication
- How to write a Go Lambda function to process data in the MSK topic
- How to create the infrastructure: VPC, subnets, MSK cluster, Cloud9, etc.
- How to configure Lambda and Cloud9 to access MSK using IAM roles and fine-grained permissions

MSK Serverless is a cluster type for Amazon MSK that makes it possible for you to run Apache Kafka without having to manage and scale cluster capacity. It automatically provisions and scales capacity while managing the partitions in your topic, so you can stream data without thinking about right-sizing or scaling clusters. Consider using a serverless cluster if your applications need on-demand streaming capacity that scales up and down automatically. - MSK Serverless Developer Guide

Prerequisites

You will need an AWS account. Install the AWS CLI as well as a recent version of Go (1.18 or above). Clone this GitHub repository and change to the project directory:

git clone https://github.com/abhirockzz/lambda-msk-serverless-trigger-golang
cd lambda-msk-serverless-trigger-golang

Infrastructure Setup

AWS CloudFormation is a service that helps you model and set up your AWS resources so that you can spend less time managing those resources and more time focusing on your applications that run in AWS. You create a template that describes all the AWS resources that you want (like Amazon EC2 instances or Amazon RDS DB instances), and CloudFormation takes care of provisioning and configuring those resources for you. You don't need to individually create and configure AWS resources and figure out what's dependent on what; CloudFormation handles that. - AWS CloudFormation User Guide

Create VPC and Other Resources

Use a CloudFormation template for this.

aws cloudformation create-stack --stack-name msk-vpc-stack --template-body file://template.yaml

Wait for the stack creation to complete before proceeding to other steps.

Create MSK Serverless Cluster

Use the AWS Console to create the cluster. Configure the VPC and private subnets created in the previous step.

Create an AWS Cloud9 Instance

Make sure it is in the same VPC as the MSK Serverless cluster and choose the public subnet that you created earlier.

Configure MSK Cluster Security Group

After the Cloud9 instance is created, edit the MSK cluster security group to allow access from the Cloud9 instance.

Configure Cloud9 To Send Data to MSK Serverless Cluster

The code that we run from Cloud9 is going to produce data to the MSK Serverless cluster, so we need to ensure that it has the right privileges. For this, we need to create an IAM role and attach the required permissions policy.

aws iam create-role --role-name Cloud9MSKRole --assume-role-policy-document file://ec2-trust-policy.json

Before creating the policy, update the msk-producer-policy.json file to reflect the required details, including the MSK cluster ARN.

aws iam put-role-policy --role-name Cloud9MSKRole --policy-name MSKProducerPolicy --policy-document file://msk-producer-policy.json

Attach the IAM role to the Cloud9 EC2 instance.

Send Data to MSK Serverless Using Producer Application

Log into the Cloud9 instance and run the producer application (it is a Docker image) from a terminal.
export MSK_BROKER=<enter the MSK Serverless endpoint> export MSK_TOPIC=test-topic docker run -p 8080:8080 -e MSK_BROKER=$MSK_BROKER -e MSK_TOPIC=$MSK_TOPIC public.ecr.aws/l0r2y6t0/msk-producer-app The application exposes a REST API endpoint using which you can send data to MSK. curl -i -X POST -d 'test event 1' http://localhost:8080 This will create the specified topic (since it was missing, to begin with) and also send the data to MSK. Now that the cluster and producer applications are ready, we can move on to the consumer. Instead of creating a traditional consumer, we will deploy a Lambda function that will be automatically invoked in response to data being sent to the topic in MSK. Configure and Deploy the Lambda Function Create Lambda Execution IAM Role and Attach the Policy A Lambda function's execution role is an AWS Identity and Access Management (IAM) role that grants the function permission to access AWS services and resources. When you invoke your function, Lambda automatically provides your function with temporary credentials by assuming this role. You don't have to call sts:AssumeRole in your function code. aws iam create-role --role-name LambdaMSKRole --assume-role-policy-document file://lambda-trust-policy.json aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaMSKExecutionRole --role-name LambdaMSKRole Before creating the policy, update the msk-consumer-policy.json file to reflect the required details including MSK cluster ARN etc. aws iam put-role-policy --role-name LambdaMSKRole --policy-name MSKConsumerPolicy --policy-document file://msk-consumer-policy.json Build and Deploy the Go Function and Create a Zip File Build and zip the function code: GOOS=linux go build -o app zip func.zip app Deploy to Lambda: export LAMBDA_ROLE_ARN=<enter the ARN of the LambdaMSKRole created above e.g. arn:aws:iam::<your AWS account ID>:role/LambdaMSKRole> aws lambda create-function \ --function-name msk-consumer-function \ --runtime go1.x \ --zip-file fileb://func.zip \ --handler app \ --role $LAMBDA_ROLE_ARN Lambda VPC Configuration Make sure you choose the same VPC and private subnets as the MSK cluster. Also, select the same security group ID as MSK (for convenience). If you select a different one, make sure to update the MSK security group to add an inbound rule (for port 9098), just like you did for the Cloud9 instance in an earlier step. Configure the MSK Trigger for the Function When Amazon MSK is used as an event source, Lambda internally polls for new messages from the event source and then synchronously invokes the target Lambda function. Lambda reads the messages in batches and provides these to your function as an event payload. The maximum batch size is configurable (the default is 100 messages). Lambda reads the messages sequentially for each partition. After Lambda processes each batch, it commits the offsets of the messages in that batch. If your function returns an error for any of the messages in a batch, Lambda retries the whole batch of messages until processing succeeds or the messages expire. Lambda sends the batch of messages in the event parameter when it invokes your function. The event payload contains an array of messages. Each array item contains details of the Amazon MSK topic and partition identifier, together with a timestamp and a base64-encoded message. Make sure to choose the right MSK Serverless cluster and enter the correct topic name. 
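For reference, the same MSK trigger can also be created programmatically as an event source mapping. A minimal boto3 sketch, assuming the function name used earlier in this post and a placeholder cluster ARN that you would replace with your own:

Python
import boto3

lambda_client = boto3.client("lambda")

# The cluster ARN below is a placeholder; use your MSK Serverless cluster's ARN
response = lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kafka:us-east-1:123456789012:cluster/msk-serverless-cluster/xxxx",
    FunctionName="msk-consumer-function",
    Topics=["test-topic"],
    StartingPosition="LATEST",
    BatchSize=100,  # matches the default batch size mentioned above
)
print(response["UUID"])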
Verify the Integration Go back to the Cloud9 terminal and send more data using the producer application. I used a handy JSON utility called jo (sudo yum install jo). APP_URL=http://localhost:8080 for i in {1..5}; do jo email=user${i}@foo.com name=user${i} | curl -i -X POST -d @- $APP_URL; done In the Lambda function logs, you should see the messages that you sent. Conclusion You were able to set up, configure and deploy a Go Lambda function and trigger it in response to events sent to a topic in an MSK Serverless cluster!

By Abhishek Gupta CORE
