Big Data in Real-Time: Challenges and Solutions

A real-time big data application must return computation and analysis results in real time even when the data volume is huge. This demand on database applications has emerged in recent years.

In the past, there was less data, the computation was simple, and there was little concurrent access, so the pressure on the database was not great. A high-end or mid-range database server or cluster could allocate enough resources to meet the demand. Moreover, to access current business data and historical data quickly and concurrently, users tended to run the query analysis system and the production system on the same database server. This lowered the database cost, streamlined data management, and ensured concurrency to some extent.

We are now in the prime of real-time database development.

In recent years, the data explosion and increasingly diverse and complex applications have changed the database system. The most obvious change is that data is growing at an accelerating pace and reaching ever higher volumes. Applications are becoming progressively more complex, and the number of concurrent accesses is no exception. In this era of big data, the database is under increasing pressure, which poses a serious challenge to real-time applications.

The first challenge is real-time response. Under a heavy workload, database performance drops dramatically, responses become sluggish, and the user experience deteriorates quickly. The normal operation of critical business systems is seriously affected. The real-time application effectively becomes a half-real-time one.

The second challenge is cost. To relieve the performance pressure, users have to upgrade the database. Database servers are expensive, and so are the storage media and user licenses. Most databases charge extra according to the number of CPUs, the number of cluster nodes, and the size of the storage space. Because data volume and the pressure on databases keep increasing, such upgrades have to be repeated at intervals.

The third challenge is application development. The increasing pressure on the database can seriously affect the core business application, so users have to off-load historical data from it. Two groups of database servers thus come into being: one group stores the historical data, and the other stores the core business data. The native cross-database query ability of most databases is quite weak, and its performance is very low. To deliver up-to-date analysis results on time, applications must query across both groups of databases themselves, and application programming becomes ever more complex.
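To make the programming burden concrete, here is a minimal sketch of an application merging results from two databases by hand. It assumes two in-memory SQLite databases stand in for the historical and core business servers; the `orders` table, its columns, and all function names are hypothetical and chosen only for illustration.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()
    return conn


def totals_by_customer(conn):
    """Run the aggregate against a single database."""
    cur = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    )
    return dict(cur.fetchall())


def cross_db_totals(history_conn, current_conn):
    """Merge the per-database aggregates in application code."""
    merged = totals_by_customer(history_conn)
    for customer, amount in totals_by_customer(current_conn).items():
        merged[customer] = merged.get(customer, 0.0) + amount
    return merged


# Assumed sample data: one database per server group.
history = make_db([("acme", 100.0), ("globex", 50.0)])
current = make_db([("acme", 25.0), ("initech", 10.0)])
print(cross_db_totals(history, current))
```

Even for a single sum, the application has to issue two queries and reconcile the results itself. Joins, rankings, or distinct counts across the two groups need considerably more merging logic.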

The fourth challenge is database management. To deliver up-to-date analysis results on time while avoiding complex and inefficient cross-database programming, most users accept higher management cost and difficulty: they regularly update the historical database with the latest data from the business database. Advanced database editions usually provide subscription and distribution or data replication functions for this purpose.
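The kind of synchronization job this implies can be sketched as an incremental copy driven by a high-water mark. This is only an illustration under stated assumptions, not any vendor's replication feature: SQLite in-memory databases stand in for the business and historical databases, and the `orders` schema and the `sync_new_rows` helper are hypothetical.

```python
import sqlite3

SCHEMA = "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"


def sync_new_rows(business, history, last_synced_id):
    """Copy rows with id > last_synced_id into the history database.

    Returns the new high-water mark to pass in on the next run.
    """
    rows = business.execute(
        "SELECT id, customer, amount FROM orders WHERE id > ? ORDER BY id",
        (last_synced_id,),
    ).fetchall()
    history.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    history.commit()
    return rows[-1][0] if rows else last_synced_id


business = sqlite3.connect(":memory:")
history = sqlite3.connect(":memory:")
business.execute(SCHEMA)
history.execute(SCHEMA)
business.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("acme", 100.0), ("globex", 50.0)],
)
business.commit()

mark = sync_new_rows(business, history, 0)  # first run copies both rows
business.execute("INSERT INTO orders (customer, amount) VALUES ('initech', 10.0)")
business.commit()
mark = sync_new_rows(business, history, mark)  # second run copies only the new row
```

The job has to be scheduled, monitored, and kept consistent with schema changes, which is the management overhead the paragraph above describes.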

Beset by these four challenges, real-time big data applications struggle to make progress.

How do you guarantee concurrency in a big data application? How do you reduce the database cost while keeping responses real-time? How do you implement cross-database queries easily? How do you reduce the management cost and difficulty? These are some of the hottest topics among CIOs and CTOs.

esProc is a good remedy for these stubborn headaches. It is a database middleware with complete computational capability, and it supports computing whether the data sits in external storage, spans several databases, or has to be processed in parallel. The combination of a database and esProc can deliver enough capability to address the four challenges facing big data applications.


esProc supports computation over files in external storage and in HDFS. That is to say, you can store a great volume of historical data on the cheap hard disks of ordinary PCs and leave it to esProc to handle, while the database stores and manages only the current core business data. This cuts cost and diverts computational load away from the database.

esProc supports parallel cluster computing, so the computational pressure can be spread across several cheap node machines when the workload is heavy and there are many concurrent or sudden access requests. Its real-time performance can match or even exceed that of a high-end database.
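The general idea of spreading one computation over several workers and merging the partial results can be sketched as follows. This is not esProc's API; it is a generic illustration in which threads from Python's standard library stand in for cluster nodes, and the function names and sample partitions are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_count(partition):
    """Work done on one node: return (sum, count) for its slice of the data."""
    return sum(partition), len(partition)


def parallel_average(partitions, workers=4):
    """Compute a global average by merging per-partition partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_count, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Assumed sample data: three partitions, as if held by three nodes.
print(parallel_average([[1, 2, 3], [4, 5], [6]]))
```

Each worker returns a sum and a count rather than a local average, because averages of unequal partitions cannot simply be averaged again. Choosing partial results that merge correctly is the core design decision in this style of computing.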

esProc offers complete computational capability, especially for complex data computing. Even on its own, it can handle applications that involve complex business logic. Better still, it works well alongside the database. It supports computation over data from multiple sources, including structured data, unstructured data, database data, local files, big data files in HDFS, and distributed databases, and it can provide a unified JDBC interface to the application layer above. This makes it easier to couple big data with traditional databases, removes the single-source limitation on reports, and lowers the difficulty of building real-time applications.

With seamless support for combined computation over files in external storage and data in the database, users no longer need complex and expensive data synchronization technology. The database focuses only on the current data and core business applications, while esProc enables users to access both the historical data in external storage and the current business data in the database. In this way, up-to-date analysis results can be delivered on time.
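A rough sketch of this kind of combined computation, written in plain Python rather than esProc's own scripting: historical rows are read from CSV text (standing in for a file on cheap disk), current rows come from an in-memory SQLite database, and one calculation runs over both without copying data between them. The data, schema, and function names are all hypothetical.

```python
import csv
import io
import itertools
import sqlite3

# Assumed sample of historical data, as it might appear in a file.
HISTORY_CSV = "customer,amount\nacme,100.0\nglobex,50.0\n"


def file_rows(text):
    """Yield (customer, amount) tuples from CSV text."""
    for row in csv.DictReader(io.StringIO(text)):
        yield row["customer"], float(row["amount"])


def db_rows(conn):
    """Yield (customer, amount) tuples from the current business database."""
    yield from conn.execute("SELECT customer, amount FROM orders")


def total_for(customer, *sources):
    """Sum amounts for one customer across any number of row sources."""
    return sum(a for c, a in itertools.chain(*sources) if c == customer)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.execute("INSERT INTO orders VALUES ('acme', 25.0)")
conn.commit()

print(total_for("acme", file_rows(HISTORY_CSV), db_rows(conn)))
```

Because both sources are exposed as the same kind of row stream, the calculation does not need to know where each row came from, and no synchronization job is required to keep a second database up to date.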

The cross-database and external-storage computation capabilities of esProc can keep queries real-time while relieving the pressure on the database. With the help of esProc, real-time big data applications can be implemented efficiently and at relatively low cost.
