Are You Well-Architected? A Look at A Robust AWS Architecture
The AWS Well-Architected Framework provides guidelines on what makes an optimal cloud infrastructure.
In an age that is exploding with Big Data and IoT innovations, the shift from an "on-premise" environment to a "cloud" environment offers tremendous opportunity for organizations in terms of increased agility, lower total cost of ownership and faster innovation. The organizations that are the most successful in making this major shift are those that start early in their journey to establish a well-defined strategy for approaching this new IT operating model.
Moving from the large upfront-investment model of data centers to the consumption-based model of AWS requires changes not only to tools and processes but also to the organization's mindset. Today, almost 52% of organizations around the world use some form of Platform-as-a-Service (PaaS), and 65% use Infrastructure-as-a-Service (IaaS).
Looking at PaaS/IaaS usage worldwide, Amazon Web Services (AWS) leads the pack with 94% of all access events. More than 50% of organizations also use AWS alongside Azure, typically as an official multi-cloud strategy.
In this context, the AWS Well-Architected (WA) program is a framework for measuring applications running in the cloud against a set of strategies and best practices. The framework has been compiled by working closely with thousands of customers for over ten years.
The purpose of this framework is to empower customers to make informed decisions about their cloud architecture and to help them understand the potential impact of those decisions. The questions that arise during a well-architected review do not have a simple "yes or no" answer; they depend on the business context, that is, on the business realities and the choices made for a given system at a given time.
The AWS WA Framework consists of a set of questions that span five pillars based on cloud-specific design principles. Creating technology solutions is a lot like constructing a building. If the foundation (read as "pillar") of the building isn't solid, structural problems may compromise the integrity and function of that building, which is detrimental to the business. A set of design principles facilitates good design in the cloud; these are divided into two categories: general and pillar-specific.
The AWS WA Framework offers three benefits to customers. First, it increases awareness of architectural best practices and helps customers understand the pros and cons of the decisions they make while building systems on AWS. Second, it addresses foundational areas that are often neglected when designing a cloud system. Third, it teaches a consistent approach to evaluating architectures and making corrections or improvements wherever needed. If implemented well, well-architected systems can increase the likelihood of success and provide the business with a competitive advantage.
The 5 Pillars Of The AWS Framework
The AWS Well-Architected Framework is an amalgamation of architectural best practices that empowers the customers to develop workloads that are secure, cost-optimized, reliable and high-performing in nature. Each pillar of the framework is supported by certain design principles, which, if used wisely, can help strengthen the pillars even further and ensure the sound working of the entire cloud system.
In the context of cloud architecture, a workload is a set of components that together deliver business value. It is usually the level of detail at which technology and business leaders communicate, for example analytics platforms or marketing websites. Workloads vary in architectural complexity: some are simple (a static website), while others are complex (microservices architectures with many linked components and multiple data stores).
The security pillar deals with protecting information, assets, and systems against advanced threats through continued risk assessments and mitigation strategies.
To strengthen this pillar, it is essential to implement a strong identity foundation by centralizing privilege management and reducing reliance on long-term credentials; apply security at all layers through a defense-in-depth approach that combines multiple security controls; protect data in transit and at rest by classifying data into sensitivity levels and using mechanisms such as encryption and access control where appropriate; automate security best practices to scale more rapidly by implementing controls that are defined and managed as code in version-controlled templates; enable traceability by monitoring and auditing actions and changes in the architecture in real time, and by integrating logs and metrics with systems that respond and take action automatically; prepare for major security events by having an incident management process in place that aligns with your organization's requirements; and keep people away from data by creating tools and mechanisms that reduce or eliminate the need for manual processing or direct access, which lowers the risk of loss or error when handling sensitive data.
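The idea of classifying data by sensitivity and attaching controls to each level can be sketched in a few lines of Python. This is only an illustration: the three sensitivity levels, the role names, and the `POLICY` table are assumptions made for the example, not part of the AWS framework or any AWS API.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical sensitivity levels; a real scheme would be organization-specific."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


@dataclass(frozen=True)
class Controls:
    encrypt_at_rest: bool
    encrypt_in_transit: bool
    allowed_roles: frozenset


# Assumed policy table: more sensitive data gets stricter controls.
POLICY = {
    Sensitivity.PUBLIC: Controls(False, True, frozenset({"anyone"})),
    Sensitivity.INTERNAL: Controls(True, True, frozenset({"employee", "admin"})),
    Sensitivity.CONFIDENTIAL: Controls(True, True, frozenset({"admin"})),
}


def can_read(role: str, level: Sensitivity) -> bool:
    """Deny by default: a role may read only if the policy lists it."""
    allowed = POLICY[level].allowed_roles
    return "anyone" in allowed or role in allowed
```

Keeping the policy in one table, under version control, is one way to centralize privilege decisions instead of scattering them across applications.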
Reliability deals with meeting customer and business demands by proactively preventing failures, such as misconfigurations or transient network issues, and quickly recovering from them. It rests on foundational elements, proven failure recovery processes, consistent change management, and the ability to acquire computing resources on demand. In a traditional on-premises environment, achieving reliability can be quite difficult because of single points of failure and the lack of elasticity and automation to mitigate disruptions.
To strengthen this pillar, it is necessary to test and validate recovery procedures by using automation to simulate various failures or recreate the scenarios that led to earlier failures, so that failure pathways are exposed and fixed before a real failure takes place; scale horizontally to increase aggregate system availability by replacing one large resource with multiple small resources, which reduces the impact of a single failure, and by distributing requests across the smaller resources; automatically recover from failure by monitoring the system for key performance indicators (measures of business value rather than of the technical aspects of the service's operation) and triggering automation when a threshold is breached; manage change in infrastructure via automation to reduce errors or breakdowns; and stop guessing capacity by monitoring demand and system utilization and automating the addition or removal of resources, so that capacity satisfies demand without over-provisioning or under-provisioning. Resource saturation is a common cause of failure in on-premises systems.
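Two of the principles above lend themselves to a small worked example. The sketch below assumes that replicas fail independently and that any single healthy replica can serve requests, which is a simplification; the function names and the proportional scaling rule are illustrative assumptions, not an AWS API.

```python
def aggregate_availability(single: float, replicas: int) -> float:
    """Availability of a group where any one healthy replica is enough.

    Assumes independent failures: the group is down only if every
    replica is down at the same time.
    """
    return 1.0 - (1.0 - single) ** replicas


def desired_capacity(current: int, utilization_pct: int, target_pct: int,
                     minimum: int = 1, maximum: int = 20) -> int:
    """Scale the instance count so utilization moves toward the target.

    Uses integer ceiling division to avoid floating-point rounding.
    """
    wanted = -(-current * utilization_pct // target_pct)
    return max(minimum, min(maximum, wanted))
```

Under these assumptions, two replicas that are each 99% available give roughly 99.99% aggregate availability, and four instances running at 90% utilization against a 60% target would be scaled out to six.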
The performance efficiency pillar involves using computing resources efficiently to meet demand and maintaining that efficiency as demand changes and technologies evolve. Achieving high and lasting performance can be challenging in a traditional on-premises environment.
To strengthen this pillar, it is necessary to apply mechanical sympathy by using the technology approach that aligns best with what you are trying to achieve; democratize advanced technologies by pushing the knowledge and complexity of technologies such as machine learning, media transcoding, and NoSQL databases into the cloud vendor's domain and letting your IT team consume them as a service; experiment more often, using virtual and automated resources to quickly carry out comparative testing of different configurations and instance types; use serverless architectures to remove the operational burden of running and managing servers (for example, storage services can serve static websites), which also lowers transactional costs; and go global in minutes by deploying the system in multiple regions around the world in a few clicks, giving customers lower latency and a better experience at minimal cost.
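Comparative testing of configurations can be as simple as collecting latency samples for each candidate and choosing by a summary statistic. The sketch below is a minimal illustration; the configuration names and the sample values are made up for the example, and a real comparison would also weigh cost and use many more samples.

```python
from statistics import median


def pick_configuration(samples_ms: dict[str, list[float]]) -> str:
    """Return the configuration name with the lowest median latency."""
    if not samples_ms:
        raise ValueError("no configurations to compare")
    return min(samples_ms, key=lambda name: median(samples_ms[name]))


# Hypothetical measurements in milliseconds, one list per candidate.
measurements = {
    "small-instance": [120.0, 135.0, 128.0],
    "large-instance": [80.0, 95.0, 88.0],
}
best = pick_configuration(measurements)
```

The median is used here instead of the mean so that a single slow outlier does not decide the comparison.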
Performance efficiency in the cloud is composed of four areas: selection, review, monitoring, and tradeoffs.
Operational excellence deals with running and monitoring systems to deliver business value and continually improving procedures and processes to enhance operational capabilities. In traditional on-premises environments, operations is often perceived as an isolated function, distinct from the development and business teams that it supports.
To strengthen this pillar, it is necessary to learn from all operational failures by driving continuous improvement through lessons learned from every operational event and sharing them across teams and the entire organization; make frequent, small, and reversible changes by designing workloads so that components can be updated regularly, which increases the flow of beneficial changes, and by making changes in small increments that can be reversed if they fail, which makes issues introduced to your environment easier to identify and resolve; perform operations as code by defining and updating the workload as code, scripting operations procedures, and automating their execution by triggering them in response to events; refine operations procedures frequently by setting up regular game days to review and validate that all procedures are effective; maintain annotated documentation by automating its creation after every build, so that it can be used by people and by systems as an input to operations code; and anticipate failure by performing exercises in a simulated environment to identify potential sources of failure so that they can be mitigated.
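"Operations as code" can be pictured as a runbook in which each procedure is a function and events are routed to the matching procedure. The sketch below is a toy illustration of that pattern; the event types, field names, and procedures are hypothetical, and a real procedure would call the relevant automation tooling rather than return a string.

```python
from typing import Callable

# Registry mapping an event type to the scripted procedure that handles it.
RUNBOOK: dict[str, Callable[[dict], str]] = {}


def procedure(event_type: str):
    """Decorator that registers a function as the procedure for an event type."""
    def register(func: Callable[[dict], str]) -> Callable[[dict], str]:
        RUNBOOK[event_type] = func
        return func
    return register


@procedure("disk_almost_full")
def rotate_logs(event: dict) -> str:
    return f"rotated logs on {event['host']}"


@procedure("health_check_failed")
def restart_service(event: dict) -> str:
    return f"restarted {event['service']} on {event['host']}"


def handle(event: dict) -> str:
    """Run the registered procedure, or escalate when none exists."""
    action = RUNBOOK.get(event["type"])
    if action is None:
        return f"no procedure for {event['type']}; escalate to on-call"
    return action(event)
```

Because the procedures live in code, they can be reviewed, versioned, and exercised on game days like any other change.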
To achieve leading results in its industry, an organization must have a sound strategy to which its operational capabilities align, and it must be able to execute that strategy more reliably and consistently than its competitors. This is what defines Operational Excellence. Being operationally excellent allows companies to have lower operating costs, lower operational risk, and higher revenues than their competitors, thereby enabling them to create value for customers and shareholders. Operational Excellence cannot be measured directly, but it can be observed through seven performance metrics, or value drivers: compliance, quality, yield, cost, safety, environmental impact, and productivity.
The cost optimization pillar deals with avoiding unneeded costs and ensuring that the value received from implementing best practices always exceeds the cost of implementing them. Cost optimization can be challenging in traditional on-premises solutions because one has to predict business needs and future capacity while navigating complex procurement processes. The goal is for usage and costs to move in line with demand, and for costs to fall over time, by using appropriate resources to deliver the required business outcomes and maximize return on investment.
To strengthen this pillar, it is necessary to measure overall efficiency, which helps in understanding the gains to be made from increasing output and reducing cost; stop spending money on data center operations, since AWS does the heavy lifting of racking, stacking, and powering servers so that one can focus on business projects and customers rather than on IT infrastructure; use managed services in the cloud to reduce the cost of ownership by removing the operational burden of maintaining servers for tasks such as managing databases and sending emails; analyze and attribute expenditure, since the cloud makes it easier to accurately identify the cost and usage of systems and to measure return on investment (ROI), giving system owners an opportunity to optimize their resources and reduce costs; and adopt a consumption model by paying only for the computing resources one consumes and adjusting usage to business requirements.
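The difference between provisioning for peak and paying for consumption, and the idea of attributing spend to owners, can be shown with a little arithmetic. The demand figures, the price, and the owner labels below are invented for illustration; amounts are kept in whole cents to avoid floating-point rounding.

```python
from collections import defaultdict


def fixed_cost_cents(hourly_demand: list[int], price_cents: int) -> int:
    """Cost when capacity for the peak hour is paid for in every hour.

    Expects a non-empty list of units demanded per hour.
    """
    return max(hourly_demand) * len(hourly_demand) * price_cents


def consumption_cost_cents(hourly_demand: list[int], price_cents: int) -> int:
    """Cost when only the units actually used in each hour are paid for."""
    return sum(hourly_demand) * price_cents


def spend_by_owner(line_items: list[tuple[str, int]]) -> dict[str, int]:
    """Attribute spend to owners from (owner, cents) line items."""
    totals: dict[str, int] = defaultdict(int)
    for owner, cents in line_items:
        totals[owner] += cents
    return dict(totals)


demand = [2, 2, 10, 4]   # hypothetical units needed in each of four hours
PRICE = 5                # hypothetical price in cents per unit-hour
```

With these made-up numbers, provisioning for the peak costs 200 cents while the consumption model costs 90 cents; the gap grows as demand becomes more uneven.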
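Cost optimization in the cloud is composed of four areas: cost-effective resources, matching supply and demand, expenditure awareness, and optimizing over time.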
The Benefits of Shifting to a Cloud Computing Environment
As organizations cope with increasingly complex business operations and with the disruption that advanced technologies bring to the way they work, they have to find new ways to manage their operations, keep pace with the times, and derive maximum value for all of their stakeholders.
In a cloud computing environment, advanced IT resources are only a click away, which provides several benefits over a traditional computing environment. Cloud computing represents a significant shift in how technology is obtained, used, and managed. It also represents a shift in how organizations plan their IT budgets and pay for technology services. Cloud computing enables organizations to benefit from massive economies of scale, increase business agility and flexibility, trade capital expense for variable expense, go global in minutes, and avoid spending money on running and maintaining data centers.
However, cloud adoption requires that the fundamental changes in architecture are discussed and considered across the entire organization, and that stakeholders across all organizational units (both within and outside IT) are informed about these changes and support them. Once the cloud architecture is sound, it enables teams to implement continuous integration and delivery practices across all development stages. In other words, teams can quickly compare and evaluate practices and architectures (such as microservices and serverless) against set benchmarks to determine which solution best fits the enterprise's needs.
As more companies shift their strategic focus to building a top-notch customer experience, competitive benchmarking becomes all the more necessary to stay one step ahead in the industry landscape.
Enterprise architecture guides your organization's information, process, business, and technology decisions, which allows it to execute its business strategy and meet constantly evolving customer needs. There are typically four domains of enterprise architecture: business, data (information), application, and technology.
The Internet of Things (IoT) gives organizations the ability to derive insights from context that was previously invisible to the business. However, before an organization can develop a strategy for IoT, it needs a platform that meets the foundational principles of an IoT solution. The AWS WA Framework gives organizations the freedom to bring the economic and organizational benefits of the cloud into their businesses and to support virtually any cloud workload.
Published at DZone with permission of Sagar Tambe.