Over a million developers have joined DZone.

Drive Java for Structured Data Computing with Grid style and Agile Syntax

DZone's Guide to

Drive Java for Structured Data Computing with Grid style and Agile Syntax

· Big Data Zone
Free Resource

Need to build an application around your data? Learn more about dataflow programming for rapid development and greater creativity. 

Java doesn't have any competitive advantages in data computing, in particular the massive structured data computing. For example, according to the order detail computation, we need to find out the sales persons whose sales growths are over 10% in 3 consecutive months.

Java doesn't have the related advanced function to implement this. So, it’s hard for Java to handle such computation only with its own capability. Java needs a large amount of time and effort to manually realize the details in computation. For example, firstly, define classes and represent every piece of data with objects; secondly, use List to store multi-pieces of data; thirdly, use the nested multi-level loops to compute. Except the sorting algorithm, almost all massive data processing algorithms involved in the computation require manual implementation, such as aggregating, filtering, and grouping. Such computations usually involve the set computation and relation computation among massive data, or computation on relative positions between objects and object attributes. It takes great efforts to implement the underlying logics for these computations.

That’s why we must improve the Java computational capability. We need a tool tailored for implementing the structured data computation easily!

How about SQL? Not all Java application allows for using database. In addition, there are many data in Txt/Excel, and sometimes, problems of computation across databases and code reuse may be encountered. Moreover, SQL is still not convenient for handling many computations. Taking the above-mentioned computation for example, SQL is by no means convenient to compose:


02  (SELECT salesMan,month, amount/lag(amount) 

03  OVER(PARTITION BY salesMan ORDER BY month)-1 rising_range 

04  FROM sales), 

05  B AS

06  (SELECT salesMan, 

07  CASE WHEN rising_range>=1.1 AND

08  lag(rising_range) OVER(PARTITION BY salesMan

09  ORDER BY month)>=1.1 AND

10  lag(rising_range,2) OVER(PARTITION BY salesMan

11  ORDER BY month)>=1.1 

12  THEN 1 ELSE 0 END is_three_consecutive_month 

13  FROM A) 

14 SELECT DISTINCT salesMan FROM B WHERE is_three_consecutive_month=1

In this case, esProc is the better choice.

esProc is a development tool for database computing, specializing in simplifying the complex computation and is quite convenient to integrate with Java. For esProc, the corresponding scripts are shown below:

esProc allows for the direct retrieval and computation across multiple databases, text files, and Excel sheets. Its grid style and agile syntax are especially designed for the massive structured data computation. It supports external parameters, and the result can be exported directly via JDBC. So, with esProc, the computational capability of Java is dramatically improved. In addition, by nature, esProc supports cross-database computation and the code reuse, with very perfect debugging functions. No wonder that the development productivity of esProc is also superior to that of SQL.

Check out the Exaptive data application Studio. Technology agnostic. No glue code. Use what you know and rely on the community for what you don't. Try the community version.


Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}