Lessons From The Trenches: Cloud Modern Engineering
In one of the large programs that I handled recently, I had overall responsibility for 82 critical applications serving the most important business functions of a very large global organization and industry leader. I consider it an honor to have had an opportunity like this in my career, to be in the driver’s seat of a truly digital transformation program.
I respect the confidence and faith that my esteemed partners placed in me. Without going into the finer details, I would like to share some of the lessons we learned in this amazing program.
These are the "Lessons from the Trenches" and not from the ivory towers of governance, program management and oversight.
We Had It All
The 82 enterprise apps were categorized into RUN and GROW apps in almost a 60:40 ratio. RUN apps needed high-end maintenance, hotfixes, and 100% uptime, while GROW apps needed next-generation development and modernization.
We had new PaaS development on one side, plus server decommissioning and DC exit strategies, third-party SaaS integration (HCM and Payroll), VAs, Power BI, MI management... you name it!
Cloud Modern Engineering
After multiple discussions with the senior technical managers of the organization, we shortlisted the nine most important parameters against which the applications would be tracked and measured regularly. These were the modern engineering parameters for cloud applications.
Several aspects were considered before finalizing these parameters: application architecture, user metrics (including users with visual, physical, and cognitive disabilities), business process criticality, application size, infrastructure needs, longevity, and more.
I will try to touch upon some basic high-level aspects of these parameters below.
1. IaaS Migration
While cloud migration had already started in this group, many applications were still running on-premises. So, the first important parameter for us to drive and track was the IaaS migration of the on-premises applications. We aimed for lift-and-shift to gain the advantages of the cloud immediately. This also gave us an opportunity for “app rationalization”, where we decided to sunset a few applications by adding their features and functionality to other, larger applications. There were also a few applications that, for multiple reasons, we saw no benefit in moving to the cloud. IaaS Migration was the first important modern engineering parameter.
2. Get Current and Stay Current
Getting Current and Staying Current is one of the important aspects of ensuring the safety, security, and overall health of a hybrid ecosystem. This is more important when we are dealing with financial applications, PII data, or Health Records.
- Get Current: If the application was running on older versions of the servers, update them to the latest version.
- Stay Current: Ensure that the versions running are always the latest and greatest.
We had to ensure that we Get Current and Stay Current across three tiers: app servers, database servers, and OS servers.
Adhering to this parameter can be quite challenging, as it may require code rewrites, builds, and redeployments. But for a critical application, we don’t have a choice.
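Tracking this parameter comes down to comparing what each application runs against a target version per tier. A minimal sketch of such a check follows; the application name, version numbers, and the `LATEST` table are hypothetical placeholders and not data from this program.

```python
from dataclasses import dataclass

# Hypothetical target versions per tier. Real values would come from the
# vendors' support matrices, not from this article.
LATEST = {"app": (10, 0), "database": (16, 0), "os": (2022, 0)}


@dataclass
class Component:
    application: str
    tier: str        # "app", "database", or "os"
    version: tuple   # (major, minor)


def is_current(component: Component) -> bool:
    """A component is current if it is at or above the target for its tier."""
    return component.version >= LATEST[component.tier]


def stale_components(inventory):
    """Return the components that still need a Get Current upgrade."""
    return [c for c in inventory if not is_current(c)]


# Hypothetical inventory for one application across the three tiers.
inventory = [
    Component("payroll-portal", "app", (10, 0)),
    Component("payroll-portal", "database", (13, 0)),
    Component("payroll-portal", "os", (2016, 0)),
]

for c in stale_components(inventory):
    print(f"{c.application}: {c.tier} tier at {c.version}, target {LATEST[c.tier]}")
```

Running the same check on a schedule is one simple way to cover the Stay Current half, since a component drops out of compliance as soon as the target table is updated.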
3. Accessibility
Accessibility was one of the most important parameters that we had shortlisted. We had to ensure that every application with a user interface adhered to the important accessibility standards, so that its web pages and websites were accessible to all, especially to users with disabilities. That meant compliance with the prescribed web accessibility standards (WCAG 2.0, Section 508, DDA).
This required a good amount of code changes in the UI layer. It also required a biweekly cadence with the accessibility experts to get each application certified against the accessibility standards. This was an unusual and unforgettable learning experience for me: every two weeks I would present the applications to a large team of accessibility and UX experts and users with special abilities, and get their opinion, feedback, and sign-off.
I cannot forget the day when after one such meeting, one of my friends with 98% visual impairment used a mobile app to call a cab for both of us and she led me to the exact pickup point.
4. GIT Adoption and Build Automation
Adopting a standard configuration control process was one of the important things that we tracked in this program. We had to ensure a standard way of handling configuration control and code management across the applications. Some of the older and shadow applications did not have a strong process, and their code was lying on random servers, file shares, etc. Standardization was very much needed, and we tracked it extensively across the applications.
We also had to drive Build Automation, as we had a good number of applications where the build process was not automated. Build automation was also essential for driving DevOps adoption across the program.
With the broader aim of moving to a DevOps model across the portfolio, GIT Adoption and Build Automation were very important.
5. Agile Adoption
Yes — there were a few programs where the Agile methodology of development was not followed. This is typical of any large enterprise. There were a good number of application teams that were still using a sort of a customized waterfall model and an Agile model was not implemented.
Based on a few factors, we had to identify the programs where an Agile methodology needed to be implemented immediately. Tracking Agile implementation also became one of the key modern engineering parameters for this program.
As this initiative was completely sponsored and driven by the senior-most management, we did not face any problems or opposition in implementing Agile across the program.
6. Telemetry and Monitoring
Implementing telemetry and monitoring for cloud applications in an enterprise is a very important step. It is often neglected during the initial stages of a project, and the gap is only realized later. Having a telemetry implementation strategy was mandated for all new programs and was clearly one of the important gating criteria.
For older programs on the cloud that were already in production, we had to change the strategy. We made Azure Application Insights and Azure Monitor implementation a basic starting step.
Telemetry needs vary by application, as does the level of telemetry required, so we cannot follow a one-size-fits-all approach. However, there needs to be a starting point. Azure Application Insights and Azure Monitor were the starting points for apps in production.
7. Application Rationalization
In a large enterprise transformation program, we often need to ask whether an app will still be needed a few years from now, i.e. the longevity of the app. Often the business teams no longer see value in an app after two or five years.
In many cases, some of the smaller apps are already merged with larger applications. What I mean is the core business functionality of the smaller app is already available in some larger applications so maintaining this smaller app individually is not cost-effective for the organization.
In such a situation, we go for an application rationalization strategy where the smaller application is either put on a sunset path or merged functionally with the larger application. Either way, we definitely need a dedicated engineering effort to do that.
Retain / Merge / Sunset
For this program, we did exactly that. Every application in the portfolio was scrutinized for whether it would still be important after 2, 5, or 10 years. We also analyzed whether the core business functionality of the application could be merged into, or was already available within, a larger application inside the organization. If the core functionality was already available, or if it could be implemented quickly, then the app was definitely a candidate for a merger.
So, we devised a three-stage strategy for each application — Retain / Merge / Sunset.
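The decision described above can be sketched as a small function. This is only an illustration: the field names and the order in which the checks are applied are assumptions, and the real review involved business judgment that a few booleans cannot capture.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    RETAIN = "Retain"
    MERGE = "Merge"
    SUNSET = "Sunset"


@dataclass
class AppAssessment:
    name: str
    needed_at_horizon: bool       # business still sees value at the review horizon
    covered_by_larger_app: bool   # core functionality already exists elsewhere
    quick_to_reimplement: bool    # core functionality could be added elsewhere quickly


def rationalize(app: AppAssessment) -> Decision:
    # Assumed ordering: an app nobody will need goes on the sunset path first.
    if not app.needed_at_horizon:
        return Decision.SUNSET
    # Functionality that lives (or can quickly live) in a larger app is a merge candidate.
    if app.covered_by_larger_app or app.quick_to_reimplement:
        return Decision.MERGE
    return Decision.RETAIN


# Hypothetical portfolio entries for illustration only.
portfolio = [
    AppAssessment("leave-tracker", True, True, False),
    AppAssessment("legacy-report-viewer", False, False, False),
    AppAssessment("core-ledger", True, False, False),
]

for app in portfolio:
    print(f"{app.name}: {rationalize(app).value}")
```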
8. Server Optimization
We had a large number of servers in this program — On-Premises, Cloud VMs, etc. and there were many instances where VMs were created recklessly both in VM size and numbers. The management wanted a dedicated focused effort for server optimization and wanted to see the results quickly.
A dedicated team was formed for this consisting of infrastructure and application experts to focus on server and infra optimization strategies and drive it completely.
I was not a part of this group; however, I had to interact closely with this team to provide inputs about the applications and the servers from an architectural standpoint. The team had an awesome program manager, and I loved working with him and learning from him. The main elements of the strategy the team followed were:
- Containerization if applicable.
- Virtual Disks Optimization.
- VM right-sizing.
- Move SQL Server to either IaaS (SQL on VM), SQL Azure MI, or an Elastic Pool wherever possible.
- Storage Tiering.
- Performance Monitoring on critical apps.
- Auto Snoozing, VM Start/Stop Schedules.
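Two of the items above, VM right-sizing and start/stop schedules, lend themselves to a short illustration. The sketch below is not the team's actual tooling: the CPU thresholds, the business-hours window, and the VM names are all assumed values, and a real exercise would also weigh memory, disk, and network metrics.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class VmUsage:
    name: str
    vcpus: int
    avg_cpu_percent: float
    peak_cpu_percent: float


def recommend_vcpus(vm: VmUsage, low: float = 20.0, high: float = 80.0) -> int:
    """Suggest a vCPU count from CPU utilization alone (assumed thresholds)."""
    if vm.peak_cpu_percent < low and vm.vcpus > 1:
        return max(1, vm.vcpus // 2)   # consistently idle: halve the size
    if vm.avg_cpu_percent > high:
        return vm.vcpus * 2            # consistently saturated: double the size
    return vm.vcpus                    # within bounds: leave as is


def should_run(now: datetime, start: time = time(7, 0), stop: time = time(19, 0)) -> bool:
    """Start/stop schedule: run only on weekdays inside the assumed window."""
    is_weekday = now.weekday() < 5     # Monday is 0, Sunday is 6
    return is_weekday and start <= now.time() < stop


# Hypothetical usage data for illustration only.
fleet = [
    VmUsage("report-batch-vm", 8, 4.0, 12.0),
    VmUsage("ledger-api-vm", 4, 85.0, 99.0),
    VmUsage("intranet-vm", 2, 35.0, 60.0),
]

for vm in fleet:
    print(f"{vm.name}: {vm.vcpus} -> {recommend_vcpus(vm)} vCPUs")
```

A schedule check like `should_run` would typically apply only to non-production machines, since the RUN apps in this program needed 100% uptime.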
9. ARM Upgrades
Those of you who worked with Azure in its initial days will remember that we had to use the Classic Azure Portal and not the modern Azure portal that we are so used to nowadays. Around 2008-9 many of us were using the Azure classic portal for all our Azure needs, and we did not have the JSON templates then. We had ASM (Azure Service Management) and not the ARM (Azure Resource Manager) templates for IaC and other needs.
There are huge advantages to using ARM over ASM, and I will not go into those details. The bottom line is that having cloud resources created in the Azure Classic Portal using ASM is a pain.
Some applications in the portfolio were implemented on the first versions of Azure. They were created in the Azure Classic Portal and so they were dependent on ASM.
It was a very important step to migrate these resources or, in some cases, to recreate them entirely as modern ARM templates in JSON. We had to track this exercise at both the app and the server level on a weekly basis.
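For readers who have not seen one, an ARM template is a JSON document that declares resources. The sketch below builds a minimal template for a single storage account and prints it as JSON. The account name and location are hypothetical, and the `apiVersion` shown is an assumption that should be checked against the current Azure documentation before use.

```python
import json


def storage_account_template(account_name: str, location: str) -> dict:
    """Build a minimal ARM template declaring one storage account."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.Storage/storageAccounts",
                "apiVersion": "2022-09-01",  # assumed; verify before deploying
                "name": account_name,
                "location": location,
                "sku": {"name": "Standard_LRS"},
                "kind": "StorageV2",
            }
        ],
    }


# Hypothetical resource name and region for illustration only.
template = storage_account_template("legacyappstore01", "westeurope")
print(json.dumps(template, indent=2))
```

Because the whole definition is plain JSON, it can live in source control alongside the application code.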
Finally, some of the parameters that I have described above are very specific to this particular program and are definitely not applicable to all programs, especially a green-field cloud-native or PaaS development program. However, in programs where we are dealing with a combination of legacy and modern applications, I think most of these parameters are relevant.
I had the opportunity of working with some amazing Engineering and Technology Leaders in this program, which also led to a life-long friendship. Some of the PMs and tech leads reporting to me in this program are now entrepreneurs and great tech leaders.