Software performance is one of the biggest challenges in building any web or mobile application. In today's digital world, users expect information to be delivered quickly, and companies are focused on building scalable, optimized software. How quickly software (web or mobile) responds to a user is one of the main criteria for judging its success in the market, which makes building scalable applications one of today's most in-demand skills. The performance of software depends on its architecture and on the technologies used to design and implement it; however, improving the performance of an already developed large-scale application can be a complex task.
This article consolidates the top three quick parameters to check during performance testing: the server, the database, and the source code.
1. Server: CPU and Memory Utilization (Linux)
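Start by determining the CPU utilization of the operating system on which the application servers are deployed. vmstat is a powerful command for checking IO, CPU, and memory utilization at run time. For example, vmstat 1 5 reports these figures every second, five times. The arguments and the output columns used below are: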
1: Refresh the values every second (the delay between samples).
5: Capture the values five times (the number of samples).
r: Number of runnable processes (tasks running or waiting in the run queue).
b: Number of processes in uninterruptible sleep (tasks waiting, typically for IO).
us: User time (time spent running application, non-kernel code).
sy: System time (time spent running kernel code).
id: Idle time.
wa: Time spent waiting for IO.
To identify the issue, check the CPU first: see whether us + sy + wa adds up to 100% (that is, id is close to 0). If so, CPU utilization is high and the CPU is running at full capacity. However, this is not a bottleneck unless it is combined with a high run queue.
To check the run queue, see whether its value (the r column) is greater than the number of CPUs on the server. If so, there is a CPU resource problem. The r column represents the number of processes that are running or waiting in the run queue. For example, if your server has 32 CPUs and the run queue value is 36, the CPU is overloaded.
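The two checks above can be sketched in Java as a small helper that reads one vmstat sample and applies them. This is a minimal sketch rather than a monitoring tool: it assumes the default vmstat column layout (the second header line, starting with r b), and the 95% cut-off for "close to 100%" and the sample values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch: applies the CPU-saturation and run-queue checks to one vmstat sample line.
class VmstatCheck {

    // Map each column name from the vmstat header (r, b, us, sy, id, wa, ...) to its value.
    static Map<String, Long> parse(String headerLine, String sampleLine) {
        String[] names = headerLine.trim().split("\\s+");
        String[] values = sampleLine.trim().split("\\s+");
        Map<String, Long> row = new HashMap<>();
        for (int i = 0; i < Math.min(names.length, values.length); i++) {
            row.put(names[i], Long.parseLong(values[i]));
        }
        return row;
    }

    // us + sy + wa near 100% means the CPU has (almost) no idle time left.
    static boolean cpuSaturated(Map<String, Long> row, long thresholdPercent) {
        return row.get("us") + row.get("sy") + row.get("wa") >= thresholdPercent;
    }

    // More runnable processes than CPUs means work is queuing for the CPU.
    static boolean runQueueOverloaded(Map<String, Long> row, int cpuCount) {
        return row.get("r") > cpuCount;
    }

    static String diagnose(String headerLine, String sampleLine, int cpuCount) {
        Map<String, Long> row = parse(headerLine, sampleLine);
        boolean saturated = cpuSaturated(row, 95); // 95% is an assumed "close to 100%" cut-off
        boolean queued = runQueueOverloaded(row, cpuCount);
        if (saturated && queued) {
            return "CPU bottleneck";
        }
        if (saturated) {
            return "CPU busy, run queue OK";
        }
        return "CPU OK";
    }

    public static void main(String[] args) {
        // Hypothetical sample from a 32-CPU server: r = 36 and us + sy + wa = 100.
        String header = "r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st";
        String sample = "36 0      0 812344  90212 3021456  0    0     3    12 5120 9800 80 15  0  5  0";
        System.out.println(diagnose(header, sample, 32)); // prints "CPU bottleneck"
    }
}
```

On the server itself, Runtime.getRuntime().availableProcessors() returns the CPU count visible to the JVM and can be passed as cpuCount. Keep in mind that the first data line vmstat prints reports averages since boot, so the later samples are the ones that reflect current load.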
Now for a solution to high resource utilization. Check the running processes with the top command, identify the processes consuming the most memory (and CPU), and balance the load on the server. For example, run heavy processes during off-peak hours, or check whether a program can be broken down into smaller processes that run independently.
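Running a heavy process during off-peak hours can be done from Java with a ScheduledExecutorService. This minimal sketch assumes 02:00 is the off-peak window (a hypothetical value); it only computes the delay and schedules the job, and the job itself is whatever heavy work you want to move.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Minimal sketch: defer a heavy job to an off-peak window instead of running it at peak load.
class OffPeakScheduler {

    // Time from 'now' until the next occurrence of 'offPeakStart' (later today, or tomorrow).
    // Uses LocalDateTime, so daylight-saving shifts are ignored.
    static Duration delayUntil(LocalDateTime now, LocalTime offPeakStart) {
        LocalDateTime next = now.toLocalDate().atTime(offPeakStart);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    // Schedule the job once, at the next off-peak start time.
    static ScheduledFuture<?> scheduleOffPeak(ScheduledExecutorService scheduler,
                                              Runnable job, LocalTime offPeakStart) {
        Duration delay = delayUntil(LocalDateTime.now(), offPeakStart);
        return scheduler.schedule(job, delay.toMillis(), TimeUnit.MILLISECONDS);
    }
}
```

For example, scheduleOffPeak(Executors.newSingleThreadScheduledExecutor(), reportJob, LocalTime.of(2, 0)) would run a hypothetical reportJob once at the next 02:00. An OS-level scheduler such as cron is an equally valid way to achieve the same thing.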
2. Database (Oracle)
The database is one of the most common causes of slow application performance, usually because queries are not tuned properly or database parameters are not set correctly. It also depends on how your application is structured: poorly written code can degrade performance, so DBAs, as well as developers, should understand the application's architecture and how its queries manipulate data. To find a database problem, capture an AWR (Automatic Workload Repository) report, which is the very first step in performance monitoring. A few checkpoints in the AWR report can help you identify the problem quickly.
DB Time and CPU Time
DB time is the time a process spends in database operations such as insert, delete, update, and create, including time spent waiting inside the database. CPU time is the time the process spends executing on the CPU, covering both its source-code processing and the CPU portion of its database work. If DB time is greater than CPU time, the program spends most of its time in database operations, and you should investigate further with database profiling. If DB time is less than CPU time, the program performs only a few database operations and most of its time probably goes to code processing; investigate further with Java code profiling for that program.
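As a minimal sketch, the DB time versus CPU time rule of thumb amounts to a single comparison. The timings in the example are hypothetical, and the tie case (equal times) is arbitrarily sent to code profiling.

```java
// Minimal sketch: decide which kind of profiling to do next from a program's DB time and CPU time.
class ProfilingTriage {

    static String nextStep(double dbTimeSeconds, double cpuTimeSeconds) {
        if (dbTimeSeconds > cpuTimeSeconds) {
            return "database profiling"; // most time goes to database operations
        }
        return "Java code profiling"; // few database operations; time is mostly code processing
    }

    public static void main(String[] args) {
        System.out.println(nextStep(540.0, 120.0)); // prints "database profiling"
        System.out.println(nextStep(30.0, 400.0));  // prints "Java code profiling"
    }
}
```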
Various wait events can slow down a program. Here, I will elaborate on a few top events that have a high probability of slowing down the application.
Log File Sync
If log file sync is the top event and consumes the most time, there is a lot of redo log activity. The database is committing frequently, and each committing session must wait for the log writer (LGWR) to write its redo to the online redo log files; when LGWR cannot write fast enough, these waits pile up. This is often accompanied by high CPU utilization.
Solution: If CPU utilization on the database server is high, check the database logs (alert log and trace files) for redo log size errors. If you find any, increase the redo log size, or reduce the number of commits the application issues by removing unnecessary and overly frequent commits from the source code.
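Reducing the number of commits usually means committing once per batch of rows rather than once per row. The following JDBC sketch shows the idea; the audit_log table, its message column, and the batch size are hypothetical, and the right batch size depends on how much uncommitted work your application can afford to lose on a rollback.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Minimal sketch: commit once per batch instead of once per row, reducing log file sync waits.
// The audit_log(message) table is hypothetical.
class BatchedInsert {

    // Inserts all messages, committing every batchSize rows; returns the number of commits issued.
    static int insertAll(Connection conn, List<String> messages, int batchSize) throws SQLException {
        conn.setAutoCommit(false); // with auto-commit on, every INSERT would be its own commit
        int commits = 0;
        int pending = 0;
        try (PreparedStatement ps = conn.prepareStatement("INSERT INTO audit_log (message) VALUES (?)")) {
            for (String message : messages) {
                ps.setString(1, message);
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();
                    conn.commit(); // one commit, and one redo flush to wait for, per batch
                    commits++;
                    pending = 0;
                }
            }
            if (pending > 0) { // commit the final, partial batch
                ps.executeBatch();
                conn.commit();
                commits++;
            }
        } catch (SQLException e) {
            conn.rollback(); // discard the uncommitted rows of the failed batch
            throw e;
        }
        return commits;
    }
}
```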
DB File Scattered Read
If this is the top wait event, then there are large full table scans.
Solution: For OLTP systems, identify the queries doing full table scans and tune them to optimize database performance.
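One way to find the queries doing full table scans is to look at their execution plans for TABLE ACCESS FULL steps. This sketch uses Oracle's EXPLAIN PLAN FOR statement and the DBMS_XPLAN.DISPLAY table function through JDBC; it assumes a PLAN_TABLE is available to the connecting user and that the caller passes trusted query text. EXPLAIN PLAN shows the optimizer's predicted plan, which can differ from the plan actually used at run time.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: fetch the predicted plan for a query and flag full table scans.
class FullScanFinder {

    // Runs EXPLAIN PLAN for the given (trusted) query text and returns the formatted plan lines.
    static List<String> explain(Connection conn, String sql) throws SQLException {
        List<String> plan = new ArrayList<>();
        try (Statement st = conn.createStatement()) {
            st.execute("EXPLAIN PLAN FOR " + sql);
            try (ResultSet rs = st.executeQuery("SELECT plan_table_output FROM TABLE(DBMS_XPLAN.DISPLAY())")) {
                while (rs.next()) {
                    plan.add(rs.getString(1));
                }
            }
        }
        return plan;
    }

    // Returns the plan lines that perform a full table scan.
    static List<String> fullScans(List<String> planLines) {
        List<String> hits = new ArrayList<>();
        for (String line : planLines) {
            if (line != null && line.contains("TABLE ACCESS FULL")) {
                hits.add(line.trim());
            }
        }
        return hits;
    }
}
```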
DB File Sequential Read
If this is one of the top events, the database is performing many single-block reads, typically index lookups, often caused by join operations in queries over large data sets.
Solution: Tune the queries that contain many inefficient joins and optimize their execution plans. Also check the indexes used by these joins and remove non-selective ones.
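One common way to judge whether an index is selective is to compare its number of distinct keys with the number of rows it covers; Oracle records both in the DISTINCT_KEYS and NUM_ROWS columns of USER_INDEXES once statistics are gathered. This sketch computes that ratio. The 1% cut-off and the index names in the test are assumptions, and the ratio ignores data skew, so a flagged index is a candidate to review rather than one to drop automatically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Minimal sketch: flag indexes whose DISTINCT_KEYS / NUM_ROWS ratio falls below a cut-off.
class IndexSelectivity {

    // 1.0 means every key is unique; values near 0 mean many rows share each key.
    static double selectivity(long distinctKeys, long numRows) {
        return numRows == 0 ? 0.0 : (double) distinctKeys / numRows;
    }

    // indexStats maps index name -> {DISTINCT_KEYS, NUM_ROWS}; indexes with no rows are skipped.
    static List<String> nonSelective(Map<String, long[]> indexStats, double threshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, long[]> e : indexStats.entrySet()) {
            long[] s = e.getValue();
            if (s[1] > 0 && selectivity(s[0], s[1]) < threshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```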
GC Buffer Busy Wait
This event is one of the top problems in RAC (Real Application Clusters). gc buffer busy acquire occurs when a local session waits for a block that another session on the same instance has already requested from a remote instance through the global cache. gc buffer busy release occurs when a local session waits for a block that is being sent to a remote instance. These wait events mean that the application is running under heavy load and multiple queries are working on the same data set, either updating or inserting into it.
Solution: Drill down further to identify the waiting queries. Query the v$session view (gv$session across RAC instances) for sessions whose event is gc buffer busy acquire or gc buffer busy release to confirm the exact wait event, and then drill down to identify the queries working on the same data set.
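To see which statements are behind gc buffer busy waits, one option is to group the waiting sessions by SQL_ID. This sketch assumes the connecting user can query gv$session (the cluster-wide version of v$session) and matches both gc buffer busy acquire and gc buffer busy release with a LIKE filter. It is a point-in-time snapshot, so it is worth running several times while the slowdown is happening.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch: count sessions waiting on gc buffer busy events, grouped by SQL_ID.
class GcBufferBusySessions {

    static final String QUERY =
        "SELECT inst_id, sid, sql_id, event FROM gv$session WHERE event LIKE 'gc buffer busy%'";

    static Map<String, Integer> waitingBySqlId(Connection conn) throws SQLException {
        Map<String, Integer> counts = new HashMap<>();
        try (PreparedStatement ps = conn.prepareStatement(QUERY);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                String sqlId = rs.getString("sql_id"); // null when the session has no current SQL
                counts.merge(sqlId == null ? "(none)" : sqlId, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```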
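ITL Waits (enq: TX - allocate ITL entry)
An ITL (Interested Transaction List) wait means that a transaction is waiting for an ITL slot held by another session. Every data block contains an ITL, a set of slots reserved for transactions performing DML on rows in that block. When more concurrent transactions try to update the same data block than there are free slots, and the block has no free space to add another slot, this wait occurs.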
Solution: Check the waiting queries and increase INITRANS (and, on releases before Oracle 10g, MAXTRANS) for the table and its indexes.
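As a sketch of the ITL fix, the helper below generates the DDL for a table and its indexes. The object names and the INITRANS value are hypothetical. ALTER TABLE ... INITRANS on its own only affects blocks formatted later, which is why the sketch moves the table (rebuilding its existing blocks, which leaves its indexes unusable) and then rebuilds each index. Moving a table is disruptive, so this is normally done in a maintenance window.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: build DDL that raises INITRANS for a table and its indexes.
class ItlDdl {

    static List<String> raiseInitrans(String table, List<String> indexes, int initrans) {
        List<String> ddl = new ArrayList<>();
        // MOVE rewrites existing blocks with the new INITRANS; indexes become UNUSABLE afterwards.
        ddl.add("ALTER TABLE " + table + " MOVE INITRANS " + initrans);
        for (String index : indexes) {
            // REBUILD makes each index usable again and applies the new INITRANS to it.
            ddl.add("ALTER INDEX " + index + " REBUILD INITRANS " + initrans);
        }
        return ddl;
    }
}
```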
Time Model Statistics
This section shows statistics for various operations (SQL execute elapsed time, hard parse elapsed time, parse time elapsed, etc.), their time, and the % of DB time they consume. Check the top three statistics consuming the highest % of DB time. Each statistic has its own solution, which I will explain in another section. For now, the top statistic I have faced in my career is SQL execute elapsed time consuming 95% or more of DB time, which means there are many queries with high elapsed times.
Solution: Check the SQL Statistics section and identify queries with high elapsed time and CPU time. Then check the execution plans of those queries and tune them.
SQL Statistics
Check SQL ordered by Elapsed Time. Identify queries with a low execution count but a high elapsed time: such a query is either reading a large amount of data or waiting on another query, or there is a deadlock (check the trace logs). Then check SQL ordered by CPU Time in the same way and identify queries that consume the most CPU despite a very low execution count.
Solution: Check the plan of such queries and then drill down further for any deadlocks in database trace files.
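Spotting queries with a low execution count but a high elapsed time is easier when elapsed time is divided by executions. This sketch ranks hypothetical rows copied from the SQL ordered by Elapsed Time list by elapsed time per execution; the same ranking works for the SQL ordered by CPU Time list if you pass CPU seconds instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Minimal sketch: rank AWR SQL statistics by elapsed time per execution.
class ElapsedPerExec {

    static final class SqlStat {
        final String sqlId;
        final double elapsedSeconds;
        final long executions;

        SqlStat(String sqlId, double elapsedSeconds, long executions) {
            this.sqlId = sqlId;
            this.elapsedSeconds = elapsedSeconds;
            this.executions = executions;
        }

        // AWR can report 0 executions for a statement still running at snapshot time.
        double perExecution() {
            return elapsedSeconds / Math.max(1, executions);
        }
    }

    // Returns the SQL_IDs of the topN statements with the highest elapsed time per execution.
    static List<String> slowestPerExecution(List<SqlStat> stats, int topN) {
        List<SqlStat> sorted = new ArrayList<>(stats);
        sorted.sort(Comparator.comparingDouble(SqlStat::perExecution).reversed());
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(topN, sorted.size()); i++) {
            ids.add(sorted.get(i).sqlId);
        }
        return ids;
    }
}
```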
Segment Statistics
This section shows hot segments. Segments by Physical Reads lists the segments whose data was read directly from disk because the queries did not find it in the buffer cache, which can degrade database performance. If the top wait events show high buffer busy waits, check the segments in this section to see what is causing them.
Solution: Identify the segments and check whether the queries against them use bind variables. If all the queries use bind variables, drill down further by profiling the program that executes so many queries reading directly from disk.
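In JDBC, the difference between literal SQL and bind variables is whether the value is concatenated into the SQL text or passed through a PreparedStatement placeholder. In this sketch the orders table, its columns, and the status value are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Minimal sketch contrasting literal SQL with a bind-variable query.
// The orders table and its columns are hypothetical.
class BindVariables {

    // Anti-pattern: every distinct orderId produces new SQL text that must be hard parsed.
    static String literalSql(long orderId) {
        return "SELECT status FROM orders WHERE order_id = " + orderId;
    }

    // Preferred: one shared SQL text; the value is supplied through the '?' placeholder.
    static final String BIND_SQL = "SELECT status FROM orders WHERE order_id = ?";

    // Returns the order's status, or null when no row matches.
    static String findStatus(Connection conn, long orderId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(BIND_SQL)) {
            ps.setLong(1, orderId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}
```

With bind variables, every call shares one SQL text, so Oracle can reuse the already parsed cursor instead of hard parsing a new statement for each value.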
3. Source Code (Java)
Once you have checked the processes consuming the most CPU and identified the problem queries, it is time to profile the Java code. An application can have any number of causes for slow performance, and they also depend on its architecture. Below are the two Java coding issues that I have most often found to be major causes of performance degradation during my career.
Use StringBuilder instead of the + operator. Large-scale Java applications perform a huge number of string appends, and if these are not written correctly, they can degrade the performance of the application. While profiling Java code, I have seen many times that string operations are among the top five functions consuming the most time.
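Solution: The Java compiler automatically converts + within a single expression into StringBuilder operations (or, since Java 9, an equivalent invokedynamic-based concatenation). However, this does not carry across loop iterations: += inside a loop still creates a new string on every pass, so use an explicit StringBuilder there. This is especially worth watching for when you are maintaining older applications that are facing performance issues.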
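The difference matters most inside loops. In this minimal sketch both methods return the same string, but the += version copies the whole accumulated string on every iteration, while the StringBuilder version appends into one growing buffer.

```java
import java.util.List;

// Minimal sketch: two ways to build one string from many parts.
class StringAppend {

    // Slow pattern: each += creates a new String and copies everything built so far,
    // so the total copying grows roughly with the square of the number of parts.
    static String withPlus(List<String> parts) {
        String result = "";
        for (String part : parts) {
            result += part + ",";
        }
        return result;
    }

    // Preferred pattern: one StringBuilder is reused and grows its buffer as needed.
    static String withBuilder(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part).append(',');
        }
        return sb.toString();
    }
}
```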
Too much log printing can be a major cause of performance degradation in large-scale Java applications. While profiling Java code, if you find the log.debug function among the top hotspots, log printing is the main cause of slow performance. Developers often use log.debug to debug their code and forget to remove those calls before moving the code to production. Formatting every message and writing it to log files consumes significant CPU.
Solution: Call log.debug only where it is really needed, and wrap it in an if (log.isDebugEnabled()) check. Use log.error, log.info, and log.warn for production logging. Check the log4j settings (if you use log4j) and set the root logger level to INFO, i.e., log4j.rootCategory=INFO.
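The guarded-logging pattern can be shown with java.util.logging, used here as a stand-in so the example runs without extra libraries; in log4j the same guard is written with log.isDebugEnabled(), and FINE is java.util.logging's rough equivalent of DEBUG. The messagesBuilt counter is only there to show when the expensive message is actually formatted.

```java
import java.util.Arrays;
import java.util.logging.Level;
import java.util.logging.Logger;

// Minimal sketch (java.util.logging as a stand-in for log4j): debug messages are built
// only when the logger is actually enabled for that level.
class GuardedLogging {

    private static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());

    static int messagesBuilt = 0; // counts how often the expensive message was formatted

    static String expensiveDump(int[] data) {
        messagesBuilt++;
        return Arrays.toString(data);
    }

    static void process(int[] data) {
        // Explicit guard, the equivalent of "if (log.isDebugEnabled())" in log4j.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Processing " + expensiveDump(data));
        }
        // Alternative: the Supplier overload defers building the message until it is needed.
        LOG.fine(() -> "Processed " + expensiveDump(data));
    }
}
```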