Everybody involved in Java EE production support know this job can be difficult; 7/24 pager support, multiple incidents and bug fixes to deal with on a regular basis, pressure from the client and the management team to resolve production problems as fast as possible and prevent reoccurrences. On top of your day to day work, you also have to take care of multiple application deployments driven by multiple IT delivery teams. Sounds familiar?
As hard as it can be, the reward for your hard work can be significant. You may have noticed from my past articles that I’m quite passionate about Java EE production support, root cause analysis and performance related problems. This post is all about sharing a few tips and work principles I have applied over the last 10+ years working with multiple Java EE production support teams onshore & offshore.
This article will provide you with 8 ways to improve your production support skills which may help you better enjoy your IT support job and ultimately become a Java EE production support guru.
#1 – Partner with your clients and delivery teams
My first recommendation should not be a surprise to anybody. Regardless how good you are from a technical perspective, you will be unable to succeed as a great production support leader if you fail to partner with your clients and IT delivery teams.
You have to realize that you are providing a service to your client who is the owner and master of the IT production environment. You are expected to ensure the availability of the critical Java EE production systems and address known and future problems to come. Stay away from damaging attitudes such as a false impression that you are the actual owner or getting frustrated at your client for lack of understanding of a problem etc. Your job is to get all the facts right and provide good recommendations to your clients so they can make the right decisions. Over time, a solid trust will be established between you and your client with great benefits & opportunities.
Building a strong relationship with the IT delivery team is also very important. The delivery team, which includes IT architects, project managers and technical resources, is seen as the team of experts responsible to build and enhance the Java EE production environments via their established project delivery model. Over the years, I have seen several examples of friction between these 2 actors. The support team tends to be over critical of the delivery team work due to bad experience with failed deployments, surge of production incidents etc. I have also noticed examples where the delivery team tends to lack confidence in support team capabilities again due to bad experience in the context of failed deployments or lack of proper root cause analysis or technical knowledge etc.
As a production support individual, you have to build your credibility and stay away from negative and non-professional attitude. Building credibility means hard work, proper gathering of facts, technical & root cause analysis, showing interest in learning a new solution etc. This will increase the trust with the delivery team and allow you to gain significant exposure and experience in long term. Ultimately, you will be able to work and provide consultation for both teams.
Proper balance and professionalism between these 3 actors is key for any successful IT production environment.
#2 – Every production incident is a learning opportunity
One of the great things about Java EE production support is the multiple learning opportunities you are exposed to. You may have realized that after each production outage you achieved at least one the following goals:
- You gained new technical knowledge from a new problem type
- You increased your knowledge and experience on a known situation
- You increased your visibility and trust with your operation client
- You were able to share your existing knowledge with other team members allowing them to succeed and resolve the problem
Please note that it is also normal to face negative experiences from time to time. Again, you will also grow stronger from those and learn from your mistakes.
Recurring problems, incidents or preventive work still offer you opportunities to gather more technical facts, pinpoint the root cause or come up with recommendations to develop a permanent resolution.
The bottom line is that the more incidents you are involved with, the better. It is OK if you are not comfortable yet to take an active role in the incident recovery but please ensure that you are present so you can at least gain experience and knowledge from your other more experienced team members.
#3 – Don't fear change, embrace it
One common problem I have noticed across the Java EE support teams is a fear factor around production platform changes such as project deployment, infrastructure or network level changes etc. Below are a few reasons of this common fear:
- For many support team members, application “change” is synonym of production “instability”
- Lack of understanding of the project itself or scope of changes will automatically translate as fear
- Low comfort level of executing the requested application or middleware changes
Such fear factor is often a symptom of gaps in the current release management process between the 3 main actors or production platform problems such as:
- Lack of proper knowledge transfer between the IT delivery and support teams
- Already unstable production environment prior to new project deployment
- Lack of deep technical knowledge of Java EE or middleware
Fear can be a serious obstacle for your future growth and must be deal with seriously. My recommendation to you is that regardless of the existing gaps within your organization, simply embrace the changes but combine with proper due diligence such as asking for more KT, participating in project deployment strategy and risk assessments, performing code walkthroughs etc. This will allow you to eliminate that “fear” attitude, gain experience and credibility with your IT delivery team and client. This will also give you opportunities to build recommendations for future project deployments and infrastructure related improvements.
Finally, if you feel that you are lacking technical knowledge to implement the changes, simply say it and ask for another more experienced team member to shadow your work. This approach will reduce your fear level and allow you to gain experience with minimal risk level.
#4 – Learn how to read JVM Thread Dump and monitoring tools data
I’m sure you have noticed from my past articles and case studies that I use JVM Thread Dump a lot. This is for a reason. Thread Dump analysis is one of the most important and valuable skill to acquire for any successful Java EE production support individual. I analyzed my first Thread Dump 10 years ago when troubleshooting a Weblogic 6 problem running on JDK 1.3. 10 years and hundreds of Thread Dump snapshots later, I’m still learning new problem patterns…The good part with JVM and Thread Dump is that you will always find new patterns to identity and understand.
I can guarantee you that once you acquire this knowledge (along with JVM fundamentals), not only a lot of production incidents will be easier to pinpoint but also much more fun and self-rewarding. Given how easy, fast and non-intrusive it is these days to generate a JVM Thread Dump; there is simply no excuse not to learn this key troubleshooting technique.
My other recommendation is to learn how to use existing monitoring tools and interpret the data. Java EE monitoring tools are highly valuable weapons for any production support individual involved in day to day support. Depending of the product purchased or free tools used by your IT client, they will provide you with a performance view of your Java EE applications, middleware (Weblogic, JBoss, WAS…) and the JVM itself. This historical data is also critical when performing root cause analysis following a major production outage.
Proper knowledge and understanding of the data will allow you to understand the IT platform performance, capacity and give you opportunities to work with the IT capacity planning analysis & architect team which are accountable to ensure long term stability and scalability of the IT production environment.
#5 – Learn how to write code and perform code walkthroughs
My next recommendation is to improve your coding skills. One of the most important responsibilities as part of a Java EE production support team, on top of regular bug fixes, is to act as a “gate keeper” e.g. last line of defense before the implementation of a project. This risk assessment exercise involves not only project review, test results, performance test report etc. but also code walkthroughs. Unfortunately, this review is often not performed properly, if done at all. The goal of the exercise is to identify areas for improvement and detect potential harmful code defects for the production environment such as thread safe problems, lack of IO/Socket related timeouts etc. Your capability to perform such code assessment depends of your coding skills and overall knowledge of the Java EE design patterns & anti-patterns.
Improving your coding skills can be done by following a few strategies as per below:
- Explore opportunities within your IT organization to perform delivery work
- Jump on any opportunity to review officially or unofficially existing or new project code
- Create personal Java EE development projects pertinent for your day to day work and long term career
- Join Java/Java EE Open Source projects & communities (Apache, JBoss, Spring...)
#6 – Don’t pretend that you know everything about Java, JVM & Middleware
Another common problem I noticed for many Java EE production support individuals is a skill "plateau". This is especially problematic when working on static IT production environments with few changes and hardening improvements. In this context, you get used very quickly to your day to day work, technology used and known problems. You then become very comfortable with your tasks with a false impression of seniority. Then one day, your IT organization is faced with a re-org or you have to work for a new client. At this point you are shocked and struggling to overcome the new challenges. What happened?
- You reached a skill plateau within your small Java EE application list and middleware bubble
- You failed to invest time into yourself and outside your work IT bubble
- You failed to acknowledge your lack of deeper Java, Java EE & middleware knowledge e.g. false impression of knowing everything
- You failed to keep your eyes opened and explore the rest of the IT world and Java community
My main recommendation to you is that when you feel over confident or over qualified in your current role, it is time to move on and take on new challenges. This could mean a different role within your existing support team, moving to a project delivery team for a certain time or completely switching job and / or IT client.
Constantly seeking new challenges will lead to:
- Significant increase of knowledge due to a higher diversity of technologies such as JVM vendors (HotSpot, IBM JVM, Oracle JRockit…), middleware (Weblogic, JBoss, WAS…), databases, OS, infrastructure etc.
- Significant increase of knowledge due to a higher diversity of implementations and solutions (SOA, Web development / portals, middle-tier, legacy integration, mobile development etc.)
- Increased learning opportunities due to new types of production incidents
- Increased visibility within your IT organization and Java community
- Improved client skills and contacts
- Increased resistance to work under stress e.g. learn how to use stress and adrenaline at your advantage (typical boost you can get during a severe production outage)
#7 – Share your knowledge with your team and the Java community
Sharing your Java EE skills and production support experience is a great way to improve and maintain a strong relationship with your support team members. I also encourage you to participate and share your Java EE production problems with the Java community (Blogs, forums, Open Source groups etc.) since a lot of problems are common and I’m sure people can benefit from your experience.
That being said, one approach that I follow myself and highly recommend is to schedule planned (ideally weekly) internal training sessions. The topic is typically chosen via a simple voting system and presented by different members, when possible.
A good sharing mentality will naturally lead you to more research and reading, further increasing your skills in long term.
#8 – Rise to the Challenge
At this point you have acquired a solid knowledge foundation and key troubleshooting skills. You have been involved in many production incidents with good understanding of the root cause and resolution. You understand well your IT production environment and your client is starting to request your presence directly on critical incidents. You are also spending time every week to improve your coding skills and sharing with the Java community…but are you really up to the challenge?
A true hero can be defined by an individual with great capabilities to rise to the challenge and lead the others to victory. Obviously you are not expected to save the world but you can still be the “hero of the day” by rising to the challenge and leading your support team to the resolution of critical production outages.
A true successful and recognized Java EE production support person is not necessarily the strongest technical resource but one who has learned how to properly balance his technical knowledge and client skills along with a strong capability to rise to the challenge and take the lead when faced with difficult situations.
I really hope that these tips can help you in your day to day Java EE production support. Please share your experience and tips on how to improve your Java EE production support skills.