
Business Code Analysis Using Hadoop and ELK


Using Hadoop or another Big Data platform to generate insights for IT roles is a worthy alternative to consider.


Big Data is the shiny new thing in the computing world, with its promise of a way to deal with the ever-growing data production of the twenty-first century. More and more Big Data enthusiasts are emerging, and more companies are adopting Big Data platforms, hoping to come up with customer-centric solutions that will help them get ahead in a highly competitive market. Although Big Data solutions are most commonly used to derive analyses that target business revenues, we, being a bunch of developers at an IT firm, had a slightly different approach.

Here’s a use case for the Hadoop ecosystem to monitor, analyze, and improve the software components of an IT firm.

The Software Infrastructure and Organization

IBTech is the IT subsidiary of QNB Finansbank, one of the largest banks in Turkey. Banking transactions are logically divided into business modules such as customer, deposits, cash management, treasury, loans, etc. They are further divided physically into entry channels such as mobile, web, internet banking, call center, ATM, branches, etc.

As a result, our in-house developed CoreBanking ecosystem consists of a main back-end cluster, in which all the transactions take place, and several front-end applications running on different platforms and written with various frameworks. Our system has a service-oriented architecture, meaning that all the requests from clients and all the interaction among modules in the back-end are processed in the form of services.

Every request enters the server as a service call and performs several inner service calls before returning a response. All the services are dynamically called and routed by an in-memory service bus.
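For illustration only (this is not our actual implementation), a dynamic in-memory service bus of this kind can be sketched as a registry that resolves service names to handlers at call time; the handlers below are hypothetical stand-ins:

```python
# Minimal sketch of a dynamic in-memory service bus: services are
# registered by name and resolved only at call time, which is why
# call routes cannot be fully recovered by static code analysis.
class ServiceBus:
    def __init__(self):
        self._services = {}

    def register(self, name, handler):
        self._services[name] = handler

    def call(self, name, request):
        handler = self._services.get(name)
        if handler is None:
            raise KeyError(f"unknown service: {name}")
        return handler(self, request)

bus = ServiceBus()
bus.register("GET_CUSTOMER_INFORMATION", lambda bus, req: {"customer": req["id"]})
# An outer service dynamically calls an inner service through the bus.
bus.register("CREDIT_ISSUANCE_SERVICE",
             lambda bus, req: bus.call("GET_CUSTOMER_INFORMATION", req))

print(bus.call("CREDIT_ISSUANCE_SERVICE", {"id": 42}))  # {'customer': 42}
```

Because each hop goes through the bus, the only place the full call graph can be observed is at runtime, which is precisely what Insights exploits.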

Our Motivation

Prior to the adoption of Hadoop, our existing monitoring tools lacked the ability to capture custom, detailed information about our system's runtime. We had been using HP-BSM, which can monitor our service entry points and various external layers. This is a good starting point but falls short when more detailed or custom information is required for advanced analyses.

A custom implementation of a monitoring and analysis tool using traditional approaches, such as storing the data in a relational database or a file system and sequentially reading and analyzing it, also proves highly infeasible — especially when the system in question processes over 200 million transactions per day, each of which triggers around 20 inner service calls and 15 SQL statements.

Apart from the monitoring concerns, due to the dynamic nature of the service calls, we were unable to statically identify, correctly and in full, the possible routes and branches of service calls made from various endpoints, or the module dependencies. Impact analysis was done by examining the static code analysis reports generated by our existing tool, which required a great deal of human effort and yielded incomplete results, rendering the task practically impossible.

Since the IT teams employ various roles working with different technologies and different business domains, largely unaware of one another's work, it was not easy for a treasury back-end developer to realize the impact of their change in a certain service on a certain screen of the mobile banking application. Nor was it clear to a call center front-end developer what caused the performance degradation when they called a certain service on the push of a button on a certain page.

All of these concerns drove us to develop a custom monitoring and analysis tool for the IT audience using Hadoop and its supplementary components.

Implementation Details

We wrote a module named Milena, which collects all the runtime data passing through our core banking servers and sends it to Kafka. We configured Flume agents that read these streams from Kafka and persist them to HDFS. The previous day's data is analyzed at the end of each day by Spark batches (the source code of some of them is here), and the results are indexed in Elasticsearch and visualized in our Kibana dashboards. We call this entire system Insights.

Overview

Figure 1: An overview of the entire system.

Our runtime data includes identifying information about the transaction, information about the client and user, executed services, executed SQL statements and affected tables, return codes, and start and finish timestamps corresponding to every executable unit.

Our OLTP servers process about 200 million transactions, producing 3.3 TB of data on a daily basis. An example of the raw data stored is:

{
"header": {
      "aygKodu": "      ",
      "transactionState": 9,
      "entranceChannelCode": "999",
      "jvmID": "999",
      "pageInstanceId": "",
      "channelTrxName": "IssueCreditTransaction",
      "processTime": "075957",
      "processState": 9,
      "channelHostProcessCode": "0",
      "processDate": "20170328",
      "correlationId": "3e03bb4f-d175-44bc-904f-072b08116d4e",
      "channelCode": "999",
      "userCode": "XXXXXXXXX",
      "transactionID": 99999999999999999999999999,
      "processType": 0,
      "environment": "XXXX",
      "sessionId": "99999999999999999999999999",
      "clientIp": "999.999.999.999"
   },
   "services": [
  {
         "returnCode": 0,
         "channelId": "999",
         "parent": 0,
         "poms": [],
         "endTime": 1490677197467,
         "platformId": "9",
         "serviceName": "CREDIT_ISSUANCE_SERVICE",
         "startTime": 1490677197466,
         "level": 1,
         "environment": "XXXX",
         "order": 1,
         "additionalInfos": {},
         "queries": [],
         "referenceData": "CREDIT_ISSUANCE_OPERATION_PARAMETERS"
      },
  (...),
  {
         "returnCode": 0,
         "channelId": "999",
         "parent": 5,
         "poms": [],
         "endTime": 1490677197491,
         "platformId": "9",
         "serviceName": "GET_CUSTOMER_INFORMATION",
         "startTime": 1490677197491,
         "level": 6,
         "environment": "XXXX",
         "order": 18,
         "additionalInfos": {},
         "queries": [
            {
               "tables": "CUSTOMER_MAIN,CUSTOMER_EXT",
               "startTime": 1490677197491,
               "order": 1,
               "queryName": "SELECT_CUSTOMER_DATA",
               "isExecuted": true,
               "parent": 18,
               "type": 1,
               "endTime": 1490677197491
            }
         ],
         "referenceData": ""
      },
      {
         "returnCode": 0,
         "channelId": "999",
         "parent": 6,
         "poms": [],
         "endTime": 1490677197467,
         "platformId": "9",
         "serviceName": "GET_PRICING_POLICY",
         "startTime": 1490677197466,
         "level": 7,
         "environment": "XXXX",
         "order": 7,
         "additionalInfos": {},
         "queries": [],
         "referenceData": ""
      },
      {
         "returnCode": 0,
         "channelId": "999",
         "parent": 5,
         "poms": [],
         "endTime": 1490677197468,
         "platformId": "9",
         "serviceName": "CALCULATE_ISSUANCE_COMMISSIONS",
         "startTime": 1490677197466,
         "level": 6,
         "environment": "XXXX",
         "order": 6,
         "additionalInfos": {},
         "queries": [],
         "referenceData": ""
      },
      (...),
      {
         "returnCode": 0,
         "channelId": "999",
         "parent": 18,
         "poms": [],
         "endTime": 1490677197491,
         "platformId": "9",
         "serviceName": "CREDIT_ISSUANCE_DOCUMENT_SERVICE",
         "startTime": 1490677197491,
         "level": 9,
         "environment": "XXXX",
         "order": 19,
         "additionalInfos": {},
         "queries": [
            {
               "tables": "ISSUANCE_MAIN,CUSTOMER_MAIN",
               "startTime": 1490677197491,
               "order": 1,
               "queryName": "SELECT_CUSTOMER_ISSUANCE_INFORMATION",
               "isExecuted": true,
               "parent": 19,
               "type": 1,
               "endTime": 1490677197491
            }
         ],
         "referenceData": ""
      },
      (...)
   ]
}
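For illustration, the daily Spark batches over such records boil down to aggregations on these fields. A pure-Python sketch of the per-record logic (field names taken from the JSON above; the sample record is abbreviated and hypothetical) might look like:

```python
from collections import defaultdict

def service_stats(record):
    """Aggregate call counts and elapsed milliseconds per service
    from one transaction record shaped like the JSON above."""
    stats = defaultdict(lambda: {"calls": 0, "elapsed_ms": 0})
    for svc in record["services"]:
        entry = stats[svc["serviceName"]]
        entry["calls"] += 1
        entry["elapsed_ms"] += svc["endTime"] - svc["startTime"]
    return dict(stats)

record = {
    "services": [
        {"serviceName": "CREDIT_ISSUANCE_SERVICE",
         "startTime": 1490677197466, "endTime": 1490677197467},
        {"serviceName": "GET_CUSTOMER_INFORMATION",
         "startTime": 1490677197491, "endTime": 1490677197491},
        {"serviceName": "GET_CUSTOMER_INFORMATION",
         "startTime": 1490677197490, "endTime": 1490677197492},
    ]
}
print(service_stats(record))
```

In the real pipeline the same reduction runs distributed over a day's worth of records; summing these per-service totals across all transactions is what feeds rankings like the General-TOP dashboard below.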

We have various custom dashboards aimed at providing insights into the general performance of our core banking application, inter-module and intra-module dependencies, and usage statistics. We can effectively visualize the correlation between separate services. This enables us to foresee the impact of a change or outage in one service on other services and applications. Here are examples from some of the dashboards, with a hypothetical scenario involving multiple parties.

Suppose an IT monitoring specialist sets out to decrease the overall response time of the core banking system. They begin by examining the General-TOP dashboard of Insights to see the most time-consuming transactions. They quickly observe that the credit issuance transaction of the mobile banking application, IssueCreditTransaction, takes exceptionally long.

General-TOP

Figure 2: A general-TOP dashboard, in which the IssueCreditTransaction is observed to account for 16% of the total time elapsed in core banking servers.

They inform the owner of the transaction, the front-end development team of mobile banking, about the discovery. A front-end developer from the team goes on to analyze this particular transaction in terms of its dependencies.

They find that 60% of the time spent in this transaction takes place in the CREDIT_ISSUANCE_SERVICE, which in turn is dominated by the CREDIT_ISSUANCE_DOCUMENT_SERVICE. Hence, they can easily trace the poor performance to the CREDIT_ISSUANCE_DOCUMENT_SERVICE:

Dependency

Figure 3: The dependency tree of the transaction in the above JSON. The left-most diagram shows the call counts of the inner services in this transaction. The diagram in the middle shows the total time elapsed in the inner services, the largest slices being the main burdens on the response time of this transaction. The right-most diagram shows the tree structure of this transaction, with all the possible paths from the root to leaves illustrated.
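A tree like the right-most diagram can be reconstructed from the `order` and `parent` fields in the JSON: each call's `parent` holds the `order` of its caller, and `parent` 0 marks an entry service. A pure-Python sketch over a simplified, hypothetical subset of the calls above:

```python
def call_paths(services):
    """Rebuild the call tree of one transaction from the 'order' and
    'parent' fields and return every root-to-leaf path of service names."""
    by_order = {s["order"]: s for s in services}
    children = {}
    for s in services:
        children.setdefault(s["parent"], []).append(s["order"])

    paths = []
    def walk(order, prefix):
        path = prefix + [by_order[order]["serviceName"]]
        kids = children.get(order, [])
        if not kids:                      # leaf call: one complete path
            paths.append(path)
        for k in sorted(kids):
            walk(k, path)
    for root in sorted(children.get(0, [])):  # parent == 0 marks an entry service
        walk(root, [])
    return paths

services = [
    {"order": 1, "parent": 0, "serviceName": "CREDIT_ISSUANCE_SERVICE"},
    {"order": 6, "parent": 1, "serviceName": "CALCULATE_ISSUANCE_COMMISSIONS"},
    {"order": 7, "parent": 6, "serviceName": "GET_PRICING_POLICY"},
    {"order": 18, "parent": 1, "serviceName": "GET_CUSTOMER_INFORMATION"},
]
for p in call_paths(services):
    print(" -> ".join(p))
```

Merging these per-transaction paths across a day's records yields the "all possible paths" view shown in the figure.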

The mobile front-end development team asks the owners of the CREDIT_ISSUANCE_DOCUMENT_SERVICE, the loans back-end development team, to investigate the issue. A back-end developer starts examining the said service by searching for it in the Service Performance dashboard. The analysis unveils two separate issues: a great deal of time is spent in the code itself, and the SELECT_CUSTOMER_ISSUANCE_INFORMATION query is costly.

Service

Figure 4: The detailed service analysis of CREDIT_ISSUANCE_DOCUMENT_SERVICE in the example JSON data. The upper part shows the average durations of this particular service, including or excluding the time elapsed in the inner services it calls. The bottom-right diagram shows the layers in which this service spends its time.
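The "including or excluding inner services" distinction in Figure 4 is the usual inclusive-versus-exclusive split: a call's exclusive (self) time is its own duration minus the durations of the services it calls directly. A hedged pure-Python sketch using the same `order`/`parent` fields, on hypothetical timings:

```python
def self_times(services):
    """Exclusive (self) time per call: own duration minus the time
    spent in directly called child services."""
    child_time = {}
    for s in services:
        dur = s["endTime"] - s["startTime"]
        child_time[s["parent"]] = child_time.get(s["parent"], 0) + dur
    return {
        s["order"]: (s["endTime"] - s["startTime"]) - child_time.get(s["order"], 0)
        for s in services
    }

services = [
    {"order": 1, "parent": 0, "startTime": 0,  "endTime": 100},  # root call
    {"order": 2, "parent": 1, "startTime": 10, "endTime": 40},   # inner call
    {"order": 3, "parent": 1, "startTime": 50, "endTime": 90},   # inner call
]
print(self_times(services))  # {1: 30, 2: 30, 3: 40}
```

A large exclusive time points at the service's own code, while a large inclusive-only time points at its callees — exactly the distinction that let the developer separate the code problem from the query problem.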

For the performance problem of the code, the back-end developer makes some optimizations and requests a test before deploying the code to production. Given the changed service, the testers list the test cases: they look through the call tree analysis to see which entry points lead to this service being called, and through which path(s), then proceed to test those scenarios.

Service Call Tree

Figure 5: The call tree analysis of CREDIT_ISSUANCE_DOCUMENT_SERVICE. The diagram on the left illustrates all the possible paths leading to this service, including the one in the example JSON. The diagram on the right shows all the possible entry points and channels from which this service is reached.
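A sketch of how such a reverse view can be derived: for each record, walk up the `parent` links from every call of the target service to its root, and collect the distinct paths and entry transactions. Field names are as in the JSON above; the sample record is a hypothetical, abbreviated one:

```python
def paths_to_service(records, target):
    """Distinct root-to-target call paths and entry transactions
    across many transaction records."""
    paths, entries = set(), set()
    for rec in records:
        by_order = {s["order"]: s for s in rec["services"]}
        for s in rec["services"]:
            if s["serviceName"] != target:
                continue
            chain, cur = [], s
            while cur is not None:          # walk up the parent links to the root
                chain.append(cur["serviceName"])
                cur = by_order.get(cur["parent"])
            paths.add(" -> ".join(reversed(chain)))
            entries.add(rec["header"]["channelTrxName"])
    return paths, entries

records = [{
    "header": {"channelTrxName": "IssueCreditTransaction"},
    "services": [
        {"order": 1,  "parent": 0,  "serviceName": "CREDIT_ISSUANCE_SERVICE"},
        {"order": 18, "parent": 1,  "serviceName": "GET_CUSTOMER_INFORMATION"},
        {"order": 19, "parent": 18, "serviceName": "CREDIT_ISSUANCE_DOCUMENT_SERVICE"},
    ],
}]
paths, entries = paths_to_service(records, "CREDIT_ISSUANCE_DOCUMENT_SERVICE")
print(paths, entries)
```

Run over all transactions of a day, the distinct path set is the left diagram of Figure 5 and the distinct entry set is the right one.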

For the performance problem of the query, a data architect reviews the SQL statement and reaches the conclusion that a new index should be created on the table ISSUANCE_MAIN. They submit the index creation for the database administrators to approve. Upon receiving the request to create a new index on the ISSUANCE_MAIN table, the DBAs realize that the index creation would be costly and likely to cause the table to be locked for a while.

They inform the process management specialists about the transactions that will be affected by the service blockage during index creation and, by observing the call tree analysis of the table, plan the operation accordingly.

Table Call Tree

Figure 6: The call tree analysis of the ISSUANCE_MAIN table.
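The table-level view can be produced similarly: scan each record's executed queries for the table name and collect the transactions that touch it. A pure-Python sketch over the query fields shown in the JSON above (the sample record is abbreviated and hypothetical):

```python
def transactions_touching(records, table):
    """Names of transactions whose executed queries touch the given table."""
    hits = set()
    for rec in records:
        for svc in rec["services"]:
            for q in svc.get("queries", []):
                tables = [t.strip() for t in q.get("tables", "").split(",")]
                if q.get("isExecuted") and table in tables:
                    hits.add(rec["header"]["channelTrxName"])
    return hits

records = [{
    "header": {"channelTrxName": "IssueCreditTransaction"},
    "services": [{
        "serviceName": "CREDIT_ISSUANCE_DOCUMENT_SERVICE",
        "queries": [{"tables": "ISSUANCE_MAIN,CUSTOMER_MAIN",
                     "queryName": "SELECT_CUSTOMER_ISSUANCE_INFORMATION",
                     "isExecuted": True}],
    }],
}]
print(transactions_touching(records, "ISSUANCE_MAIN"))  # {'IssueCreditTransaction'}
```

This is the list the process management specialists need when scheduling a lock-heavy operation such as the index creation.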

Results

The first phase of the system was launched in production in October 2016 and continued to expand and improve based on user feedback and demands. By around January 2017, it had matured to its final state, and it has been in frequent use since then. Now that the project is live and stable, it has affected the project lifecycle considerably. The outcome appeals to a variety of roles from different backgrounds.

Business analysts and application architects are now reviewing the possible effects of a proposed change in the discovery phase and planning the change accordingly. Developers and testers are reviewing the components affected by a change and ensuring smooth integration through the necessary development and testing.

The number of production errors resulting from poor or missing impact analysis was reduced drastically after the project was put into use. Process management specialists are reviewing the results to see which specific transactions would be affected by the planned or unplanned outage of a certain feature, and taking measures accordingly.

Conclusion

It is vital for a modern technology company to explore Big Data options for analyzing data with concrete business revenues in mind. However, it should not be overlooked that once manual and trivial workloads are eliminated, highly competent IT staff can focus on more innovative aspects of their jobs.

In addition to this, the software infrastructure, being the core of a business's whole system at the lowest level, is the key component that enables or constrains the agility, performance, flexibility, and competence of the system. That is why well-targeted improvements to the software infrastructure, made possible by effective monitoring and analysis, have the potential to yield greater revenues. Using Hadoop or another Big Data platform to generate insights for IT roles is a worthy alternative to consider.
