These best practices come from C2B2's experience building and supporting large-scale production middleware environments:
1. Don't get your Developers to Build your Production Configuration
Despite the term "devops", don't get the devs to build the ops environment. We are all in favour of the agility and mindset that come from embracing devops principles, such as continuous delivery (see our other best practices), but some devs assume devops means they deliver the production infrastructure. This is usually a bad thing, as most developers do not have the expertise to create best-practice middleware configurations tuned for HA, scalability, security and performance. Out of the box, most middleware products are configured for ease of development, so devs can get started quickly and easily on building cool stuff; they are not configured for production. Devs don't have the time (or often the inclination) to track feature updates in minor releases, patch sets, or critical security and bug fixes. It's the job of the ops team to track all these things and to define a standard operational build of your middleware infrastructure that devs can deploy to.
2. Put in place in-depth historical Middleware and JVM monitoring
Many of our new customers just don't have adequate historical monitoring of their middleware infrastructure. Sure, they have Nagios or some equivalent monitoring networks, CPU, disk usage, swap and so on, but they rarely have anything monitoring JVM stats, connection pool sizes, JMS queue backlogs, JVM thread usage and the like, all of which are critical in a Java middleware environment. Key metrics should be monitored, stored and made available for triage, diagnosis and capacity planning. Even better, put in place a full Application Performance Management tool to deep dive into the Java code and JDBC layers. Once this is in place, configure appropriate alerts so action can be taken before an outage. Without these things, your customers are likely to be the first people to notice an outage in the middleware.
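As a rough illustration of "monitored, stored and alerted on", here is a minimal sketch of a metric history with threshold alerts. Everything in it is an assumption made for illustration: the metric names, the thresholds and the rule that an alert fires only after several consecutive breaching samples. How the samples are collected (JMX, an agent, an APM tool) is deliberately left out, and a real deployment would persist the history rather than hold it in memory.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    metric: str           # hypothetical metric name, e.g. "jdbc.pool.in_use_pct"
    threshold: float      # a sample at or above this value counts as a breach
    consecutive: int = 3  # breaching samples in a row needed before alerting


class MetricHistory:
    """Keeps a bounded, in-memory history of (timestamp, value) samples per metric."""

    def __init__(self, max_samples=10_000):
        self._samples = defaultdict(lambda: deque(maxlen=max_samples))

    def record(self, metric, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        self._samples[metric].append((ts, value))

    def recent(self, metric, count):
        """Return up to `count` of the most recent samples, oldest first."""
        return list(self._samples[metric])[-count:]

    def breaches(self, rule):
        """True only if the last `rule.consecutive` samples all breach the threshold."""
        window = self.recent(rule.metric, rule.consecutive)
        return len(window) == rule.consecutive and all(
            value >= rule.threshold for _, value in window
        )


def evaluate(history, rules):
    """Return the names of the metrics whose alert rule is currently firing."""
    return [rule.metric for rule in rules if history.breaches(rule)]


# Illustrative usage with made-up connection pool readings.
history = MetricHistory()
for pct in (70, 92, 95, 97):
    history.record("jdbc.pool.in_use_pct", pct)

rules = [
    AlertRule("jdbc.pool.in_use_pct", 90.0),
    AlertRule("jms.queue.depth", 1000.0),  # no samples recorded, so it never fires
]
print(evaluate(history, rules))  # prints ['jdbc.pool.in_use_pct']
```

Requiring several consecutive breaches is one simple way to avoid alerting on a single spike, and the stored samples remain available afterwards for diagnosis and capacity planning.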
3. Ensure Scripted and versioned Installation, Configuration and Deployment
Always script the initial creation of your middleware environments and get the developers to script their application deployments. In fact, script all changes to your middleware, and make sure you also have rollback scripts to reverse the changes you just made. Put those scripts into configuration management, label them, and manage them as you would any software product. Once you've done this you can rapidly create new production and test environments, and you can force your developers to really get devops and build applications that can be deployed without hours of manual steps. Scripting removes human error, creates repeatability and aids agility; it is generally a good thing.
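The idea of pairing every scripted change with a rollback can be sketched as follows. This is only an illustration under stated assumptions: `Change`, `run_changes` and `set_config` are hypothetical names, and the "middleware configuration" is a plain dictionary standing in for whatever scripting interface your product actually provides.

```python
from dataclasses import dataclass
from typing import Callable

_MISSING = object()  # marks a key that did not exist before the change


@dataclass
class Change:
    name: str
    apply: Callable[[], None]
    rollback: Callable[[], None]


def run_changes(changes):
    """Apply changes in order; if one fails, roll back the earlier ones in reverse.

    The failing change is assumed to leave nothing behind, so only the
    changes that completed are rolled back.
    """
    applied = []
    for change in changes:
        try:
            change.apply()
        except Exception as exc:
            for done in reversed(applied):
                done.rollback()
            raise RuntimeError(
                f"change '{change.name}' failed; "
                f"rolled back {len(applied)} earlier change(s)"
            ) from exc
        applied.append(change)
    return [change.name for change in applied]


def set_config(config, key, value):
    """Build a Change that sets config[key] and can restore the previous state."""
    state = {}

    def apply():
        # Remember the old value at apply time so rollback restores it exactly.
        state["previous"] = config.get(key, _MISSING)
        config[key] = value

    def rollback():
        if state["previous"] is _MISSING:
            config.pop(key, None)
        else:
            config[key] = state["previous"]

    return Change(f"set {key}={value}", apply, rollback)


# Illustrative usage with made-up setting names.
config = {"pool.max": 10}
print(run_changes([set_config(config, "pool.max", 50)]))  # prints ['set pool.max=50']
print(config)  # prints {'pool.max': 50}
```

Kept in version control alongside the scripts that create the environment, change definitions like these give you a labelled, repeatable record of what was applied and how to reverse it.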
4. Smoke test new deployments with automated scripts
Now that you've scripted the deployment, make sure the devs give you a scripted post-deployment smoke test suite. This suite will form part of your continuous deployment methodology. The smoke tests should check that the application is working in the most basic sense. Assuming a typical web-style application, they should check that the application is accessible over HTTP, can talk to its database, can talk to any other services it depends on, can enqueue and dequeue messages, and fails over correctly. Ideally, run the smoke test scripts straight after your deployment scripts and before flipping the new deployment into production. We've seen many a potential operational disaster averted after a "successful deployment" through the use of smoke test scripts.
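A smoke test suite of the kind described above can be as simple as a list of named checks that either pass or raise. The sketch below assumes that shape; `http_check` and `run_smoke_tests` are hypothetical helper names, and the database, service, messaging and failover checks would be supplied by your developers as further callables.

```python
import urllib.request


def http_check(url, timeout=5.0):
    """Build a check that passes only if `url` answers with HTTP 200."""

    def check():
        # urlopen raises for connection failures and for 4xx/5xx responses.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            if response.status != 200:
                raise RuntimeError(f"unexpected HTTP status {response.status}")

    return check


def run_smoke_tests(checks):
    """Run every (name, check) pair and return a list of failure messages.

    All checks are run even after a failure, so one report shows
    everything that is broken in the new deployment.
    """
    failures = []
    for name, check in checks:
        try:
            check()
            print(f"PASS {name}")
        except Exception as exc:
            failures.append(f"{name}: {exc}")
            print(f"FAIL {name}: {exc}")
    return failures
```

A deployment script might call `run_smoke_tests([("http", http_check(health_url)), ("database", db_check), ("jms", jms_check)])`, where `health_url`, `db_check` and `jms_check` are placeholders for your own endpoint and checks, and refuse to flip the new deployment into production unless the returned list is empty.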
5. Get skilled third-party critical service backup for your ops team
By necessity, most operations teams are generalists covering many layers of a typical middleware stack: networks, databases, operating systems, web servers and so on. When there's a deep problem in the Java middleware tier, it helps to have third-party backup. Your third-party middleware experts will have seen problems from many customers and can bring this expertise to bear on yours. They can bring deep product expertise to any triage, diagnosis and critical support situation to get a service back up and running rapidly. They can liaise with the vendor's product support to identify product bugs with test cases and to get patches. Deep middleware experts spend their time developing best practices and tracking release features, bug fixes and critical patches; this is knowledge you can use to back up your operational team.