DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Related

  • MongoDB to Couchbase: An Introduction to Developers and Experts
  • Anatomy of a PostgreSQL Query Plan
  • 5 Key Postgres Advantages Over MySQL
  • Basic CRUD Operations Using Hasura GraphQL With Distributed SQL on GKE

Trending

  • Manual Sharding in PostgreSQL: A Step-by-Step Implementation Guide
  • Optimizing Integration Workflows With Spark Structured Streaming and Cloud Services
  • GitHub Copilot's New AI Coding Agent Saves Developers Time – And Requires Their Oversight
  • Start Coding With Google Cloud Workstations
  1. DZone
  2. Data Engineering
  3. Databases
  4. Open-Source SPL That Can Execute SQL Without RDB

Open-Source SPL That Can Execute SQL Without RDB

There is no RDB in many scenarios, such as encountering CSV/RESTful JSON /MongoDB and other data sources, or performing mixed calculations between these data sources.

By 
Jerry Zhang user avatar
Jerry Zhang
DZone Core CORE ·
Jan. 19, 22 · Opinion
Likes (4)
Comment
Save
Tweet
Share
5.0K Views

Join the DZone community and get the full member experience.

Join For Free

SQL syntax is close to natural language, with a low learning threshold and the bonus of first-mover advantage, it soon became popular between database manufacturers and users. After years of development, SQL has become the most widely used and most mature structured data computing language.

However, SQL must work based on RDB, and there is no RDB in many scenarios, such as encountering CSV \ restful JSON \ MongoDB and other data sources or performing mixed calculations between these data sources, such as CSV and XLS. In these scenarios, many people will choose to hard code algorithms in high-level languages such as Java or C#, etc., which requires writing lengthy underlying functions from scratch, and the execution efficiency is difficult to guarantee. 

It is easy to accumulate the "code shit mountain" that everyone hates. Some people load data into the database and then use SQL for calculation, but the loading process is very cumbersome and the real-time performance is also very poor. Sometimes you have to turn to ETL tools. The framework is aggravated, the risk is increased, and it is doubly troublesome to do mixed calculations.

Now, here comes esProc SPL, and these problems can be easily solved.

SPL is an open-source computing technology, which fully covers the computing power of SQL and supports a wide variety of data sources. SQL can now be used for structured data computing without RDB.

Perfect SQL Computing Power

SPL provides a syntax equivalent to the SQL92 standard and can perform rich and diverse data calculations, including filtering, calculating fields, selecting some columns, renaming, etc. You can directly execute SQL by using files such as text and XLS as data tables. Let's take the CSV file as the data source as an example:

1.  Filtering

Basic comparison operation:

$select * from d:/Orders.csv where Amount>=100


Like:

$select * from d:/Orders.csv where Client like '%bro%'


Null value judgment: 

$select * from d:/Orders.csv where Client is null


Logical operators such as and, or and not can combine comparison operations to realize combined filtering:

$select * from d:/Orders.csv where not Amount>=100 and Client like 'bro' or OrderDate is null


In:

$select * from d:/Orders.csv where Client in ('TAS','KBRO','PNS')


Multi-layer parentheses:

$select * from d:/Orders.csv where (OrderDate<date('2020-01-01') and Amount<=100) 
or (OrderDate>=date('2020-12-31') and Amount>100)


2.  Calculating Columns

 SPL has rich mathematical functions, string functions, and date functions:

$select round(Amount,2), price*quantity from d:/Orders.csv
$select left(Client,4) from d:/Orders.csv
$select year(OrderDate) from d:/Orders.csv


Case when:

$select case year(OrderDate) when 2021 then 'this year' when 2020 then 'last year' 
 else 'previous years' end from d:/Orders.csv


Coalesce:

$select coalesce(Client,'unknown') from d:/Orders.csv


3.  SELECT

$select OrderId, Amount, OrderDate from d:/Orders.csv


4.  ORDER BY

$select * from d:/Orders.csv order by Client, Amount desc


5.  DISTINCT

$select distinct Client ,Sellerid from d:/Orders.csv


6.  GROUP BY … HAVING

$select year(OrderDate),Client ,sum(Amount),count(1) from d:/Orders.csv 
group by year(OrderDate),Client 
having sum(Amount)<=100


Aggregation functions include sum, count, avg, max, and min. Aggregation can be directly done without grouping:

$select avg(Amount) from d:/Orders.csv


7.  JOIN

 Left join:

$select o.OrderId,o.Client,e.Name e.Dept,e.EId from d:/Orders.txt o 
left join d:/Employees.txt e on o.SellerId=e.Eid


Right join:

$select o.OrderId,o.Client,e.Name e.Dept,e.EId from d:/Employees.txt e 
right join d:/Orders.txt o on o.SellerId=e.Eid


Full join:

$select o.OrderId,o.Client,e.Name e.Dept,e.EId from d:/Employees.txt e 
full join d:/Orders.txt o on o.SellerId=e.EId


Inner join: 

$select o.OrderId,o.Client,e.Name e.Dept from d:/Orders.csv o 
inner join d:/Employees.csv e on o.SellerId=e.Eid


Inner join can also be written in the form of where:

$select o.OrderId,o.Client,e.Name e.Dept from d:/Orders.csv o ,d:/Employees.csv e 
where  o.SellerId=e.Eid

 

8.  Subquery

$select t.Client, t.s, ct.Name, ct.address from 
(select Client ,sum(amount) s from d:/Orders.csv group by Client) t 
left join ClientTable ct on t.Client=ct.Client


With:

$with t as (select Client ,sum(amount) s from d:/Orders.csv group by Client)
select t.Client, t.s, ct.Name, ct.address from t 
left join ClientTable ct on t.Client=ct.Client


Subquery within in:

$select * from d:/Orders.txt o  where o.sellerid in (select eid from d:/Employees.txt)


9.  AS

Use the as a keyword to rename fields, calculated columns, physical tables, and subqueries:

$select price*quantity as subtotal from d:/detail.csv


10.  Set Operations

Including union, union all, intersect, minus. Here is an example: 

Select * from Orders1.csv
Union all
Select * from Orders2.csv


11. into 

The query results can be written to the file with the into keyword: 

$select dept,count(1) c,sum(salary) s into deptResult.xlsx from employee.txt 
 group by dept having s>100000

 

Rich Data Sources Support

SPL supports various non-database data sources, including text in various non-standard formats. CSV has been shown in the previous examples. Tab-separated TXT can also be supported, and SPL will process it automatically according to the extension:

$select * from d:/Orders.txt where Amount>=100 and Client like 'bro' or OrderDate is null 


If the separator is not a comma or tab, you need to use the SPL extension function. For example, the separator is a colon:

$select * from {file("d:/Orders.txt").import@t (;":")} 
where Amount>=100 and Client like 'bro' or OrderDate is null


For files without title lines, column names can be represented by serial numbers:

$select * from {file("d:/Orders.txt").import()} where _4>=100 and _2 like 'bro' or _5 is null


Some strings in special formats should also be parsed with extension functions. For example, the date format is not standard yyyy-MM-dd:

$select year(OrderDate),sum(Amount) from 
{file("d:/Orders.txt").import@t(orderid,client,sellerid,amount,orderdate "dd-MM-yyyy")}
group by year(OrderDate)


SQL can also be executed on Excel files. For Excel with a standard format, you only need to directly reference the file name:

$select * from d:/Orders.xlsx where Amount>=100 and Client like 'bro' or OrderDate is null


You can also read the specified sheet: 

$select * from {file("D:/Orders.xlsx").xlsimport@t (;"sheet3")} 
where Amount>=100 and Client like 'bro' or OrderDate is null


CSV / XLS file downloaded from the remote website:

$select * from {httpfile("http://127.0.0.1:6868/Orders.csv).import@tc() } 
where Amount>=100 and Client like 'bro' or OrderDate is null
 


The HTTP protocol has many features, such as character set, port number, post parameter, header parameter, login authentication, etc. SPL extension functions can support all of them. The extension function can also grab the table data on the web page and support downloading files from the FTP server, which will not be elaborated here.

The JSON file will be read as a string before parsing:

$select * from {json(file("d:\\data.json").read())} 
where Amount>=100 and Client like 'bro' or OrderDate is null


There are few two-dimensional JSON, and multi-layer is the norm. The SPL extension function can convert multi-layer data into two-dimensional records and then calculate them with SQL. The details will not be explained here.

Restful JSON

$select * from {json(httpfile("http://127.0.0.1:6868/api/getData").read())}
where Amount>=100 and Client like 'bro' or OrderDate is null


If there are many and long extension functions, they can be written in step-by-step form:                 


A
1 =httpfile("http://127.0.0.1:6868/api/getData")
2 =A1.read()
3 =json(A2)
4 $select * from {A3} where Amount>=100 and Client like 'bro' or OrderDate is null

Similar to CSV / XLS, SPL can also read JSON / XML files on HTTP websites.

XML

$select * from {xml(file("d:/data.xml").read(),"xml/row")} where Amount&gt;=100 
 and Client like 'bro' or OrderDate is null

 Web Service

$select * from {ws_call(ws_client("http://.../entityWS.asmx?wsdl"),
 "entityWS ":" entityWSSoap":"getData")} where Amount>=100 and Client like'bro' or OrderDate is null

 

SPL can also support NoSQL. 

MongoDB

$select * from
{mongo_shell@x (mongo_open("mongodb://127.0.0.1:27017/mongo"),"main.find()")}
where Amount>=100 and Client like 'bro' or OrderDate is null

There are often multi-layer data in MongoDB, including restful and web service, and they all can be converted into two-dimensional data with SPL extension functions. 

 Salesforce

$select * from {sf_query(sf_open(),"/services/data/v51.0/query","Select Id,CaseNumber,
 Subject From Case where Status='New'")} where Amount>=100 and Client like 'bro' or OrderDate is null

Hadoop HDFS  csv/xls/json/xml:                                    


A
1 =hdfs_open(;"hdfs://192.168.0.8:9000")
2 =hdfs_file(A1,"/user/Orders.csv":"GBK")
3 =A2.import@t()
4 =hdfs_close(A1)
5 $select Client,sum(Amount) from {A3} group by Client

HBase:                                  


A
1 =hbase_open("hdfs://192.168.0.8",   "192.168.0.8")
2 =hbase_scan(A1,"Orders")
3 =hbase_close(A1)
4 $select Client,sum(Amount) from {A2} group by Client

HBase also has access methods such as filter and CMP, which can be supported by SPL.

Hive has a public JDBC interface, but its performance is poor. SPL provides a high-performance interface:                             


A
1 =hive_client("hdfs://192.168.0.8:9000","thrift://192.168.0.8:9083","hive","asus")
2 =hive_query(A1, "select * from table")
3 =hive_close()
4 $select Client,sum(Amount) from {A2} group by Client

Spark                                


A
1 =spark_client("hdfs://192.168.0.8:9000","thrift://192.168.0.8:9083","aa")
2 =spark_query(A1,"select   * from tablename")
3 =spark_close(A1)
4 $select Client,sum(Amount) from {A2} group by Client

 Alibaba cloud


A
1 =ali_open("http://test.ots.aliyuncs.com","LTAIXZNG5zzSPHTQ","sa","test")
2 =ali_query@x(A1,"test",["id1","id2"],[1,"10001"]:[10,"70001"],   ["id1","id2","f1","f2"],f1>=2000.0)
3 $select Client,sum(Amount) from {A2} group by Client

Cassandra        


A
1 =stax_open("127.0.0.1":9042,"mycasdb","cassandra":"cassandra")
2 =stax_query(A1,"select * from user where id=?",1)
3 =stax_close(A1)
4 $select Client,sum(Amount) from {A2} group by Client

ElasticSearch                                 


A
1 =es_open("localhost:9200","user":"un1234")
2 =es_get(A1,"/person/_mget","{\"ids\":[\"1\",\"2\",\"5\"]}")
3 =es_close(A1)
4 $select Client,sum(Amount) from {A2} group by Client

Redis                            


A
1 =redis_open()
2 =redis_hscan(A1, "runoobkey", "v*", 3)
3 =redis_close (A1)
4 $select key,value from {A2} where value>=2000  and value<3000

SAP BW                                


A
1 =sap_open("userName","passWord","192.168.0.188","00","000",”E")
2 =sap_cursor(A1, "Z_TEST1","IT_ROOM").fetch()
3 =sap_close(A1)
4 $select * from {A2} where Vendor like '%software%'

 InfluxDB                                  


A
1 =influx_open("http://127.0.0.1:8086", "mydb", "autogen", "admin", "admin")
2 =influx_query(A1, "SELECT * FROM Orders")
3 =influx_close(A1)
4 $select Client,sum(Amount) from {A2} group by Client

 Kafka                                  


A
1 =kafka_open("D://kafka.properties";"topic-test")
2 =kafka_poll(A1)
3 =kafka_close (A1)
4 $select Client,sum(Amount) from {A2} group by Client

 MDX multidimensional database                                  


A
1 =olap_open("http://192.168.0.178:8088/msmdpump.dll","CubeTest","Administrator","admin")
2 =olap_query(A1,"with member [Measures].[AnnualInterestRate] as'[Measures].[SalesAmount]/[Measures].[StandardCost]-1'select {[Measures].[SalesAmount],[Measures].[StandardCost],[Measures].[AnnualInterestRate]} on columns, {[Order Date].[Calendar Year].[Calendar Year]} on rows from [DataSourceMulti]")
3 =olap_close(A1)
4 $select * from {A2} where SalesAmount>10000


In addition to supporting a wide variety of data sources, SPL can also perform mixed calculations between data sources. For example, between CSV and RDB:

$select o.OrderId,o.Client,e.Name e.Dept from d:/Orders.csv o inner join d:/Employees.xls e on o.SellerId=e.Eid

Between MongoDB and database:                     


A B
1 =mongo_open("mongodb://127.0.0.1:27017/mongo")
2 =mongo_shell@x(A1,"detail.find()").fetch() =connect("orcl").query@x("select * from main")
3 $select d.title, m.path,sum(d.amount) from {A2} as d left join {B2} as m on d.cat=m.cat group by d.title, m.path

Mixed calculations can be performed between any data sources, and the SQL syntax is not affected by the data sources.

Deeper Computing Power 

In fact, the original meaning of SPL is Structure Process Language, which is a language specially used for structured data processing. In the previous examples, some of the syntaxes of SPL itself (those extension functions) have been shown. SQL is just a function provided by SPL, and SPL itself has more powerful and convenient computing power than SQL. Some calculation logic is complex, and it is difficult to code in SQL or even stored procedure, while SPL can complete the calculation with simpler code. 

For example, here is a task: calculate the longest consecutive rising days of stock. SQL uses multi-layer nested subqueries and window functions, and the code is lengthy and difficult to understand: 

select max(continuousDays)-1
from (select count(*) continuousDays
    from (select sum(changeSign) over(order by tradeDate) unRiseDays
        from (select tradeDate,
            case when price>lag(price) over(order by tradeDate) then 0 else 1 end changeSign
            from AAPL) )
        group by unRiseDays)

While SPL only needs two lines:                              


A B
1 =T("d:/AAPL.xlsx") Read Excel file, the first line is title
2 =a=0,A1.max(a=if(price>price[-1],a+1,0)) Get the max continuous rising days

For simple calculations, using basic SQL is very convenient, but when the calculation requirements become complex, SQL is not applicable. Even if more functions (such as window functions) are provided, the calculation cannot be simplified. In this case, we recommend that users directly use SPL with concise code instead of writing multi-layer nested complex SQL. For this reason, SQL in SPL only supports the SQL92 standard and does not provide more syntax including window functions. 

SQL does not advocate multi-step calculation. It is used to write a calculation task in a large statement, which will increase the difficulty of the task. SPL naturally supports multi-step calculation, and can easily split complex large calculation tasks into simple small tasks, which will greatly reduce the difficulty of coding. For example, find out the top n major customers whose cumulative sales account for half of the total sales, and rank them according to the sales from large to small:                                        


A B
1 = T("D:/data/sales.csv").sort(amount:-1)  Fetch data, sort in descending order
2 =A1.cumulate(amount) Calculate cumulative sequence
3 =A2.m(-1)/2 The last cumulative value is the sum
4 =A2.pselect(~>=A3) The required position (more than half)
5 =A1(to(A4)) Get values by position

Flexible Application Structure

How to use SPL?

For interactive calculation and analysis, SPL has a professional IDE, which not only has complete debugging functions but also can visually observe the intermediate calculation results of each step:

 

Flexible application structure

 

SPL also supports command line execution and supports any mainstream operating system:

D:\raqsoft64\esProc\bin>esprocx.exe -R select Client,sum(Amount) from d:/Orders.csv group by Client
Log level:INFO
ARO 899.0
BDR 4278.8
BON 2564.4
BSF 14394.0
CHO 1174.0
CHOP 1420.0
DYD 1242.0
…

For the calculation in the application, SPL provides a standard JDBC driver and can be easily integrated into Java:

...
Class.forName("com.esproc.jdbc.InternalDriver");
Connection conn =DriverManager.getConnection("jdbc:esproc:local://");
PrepareStatement st = conn.prepareStatement("$select * from employee.txt where SALARY >=? and SALARY<?");
st.setObject(1, 3000);
st.setObject(2, 5000);
ResultSet result=st.execute();
...


There will be frequent modifications or complex calculations in the application. SPL allows the code to be placed outside the Java program, which can significantly reduce the code coupling. For example, the above SPL code can be saved as a script file and then called in the form of stored procedure in Java: 

...
Class.forName("com.esproc.jdbc.InternalDriver");
Connection conn =DriverManager.getConnection("jdbc:esproc:local://");
Statement st = connection.();
CallableStatement st = conn.prepareCall("{call getQuery(?, ?)}");
st.setObject(1, 3000);
st.setObject(2, 5000); 
ResultSet result=st.execute();
...


With open-source SPL, you can easily use SQL without RDB.

Database sql Open source file IO Data (computing)

Opinions expressed by DZone contributors are their own.

Related

  • MongoDB to Couchbase: An Introduction to Developers and Experts
  • Anatomy of a PostgreSQL Query Plan
  • 5 Key Postgres Advantages Over MySQL
  • Basic CRUD Operations Using Hasura GraphQL With Distributed SQL on GKE

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!