What Kind of OLAP Do We Really Need?
As the concept of OLAP (online analytical processing) has evolved, it has increasingly been used in a narrower sense. This shouldn't be the case.
OLAP is an integral part of BI applications. As the name suggests, OLAP stands for online analytical processing: users, frontline employees to be precise, perform various kinds of data analysis themselves, online.
But the concept of OLAP tends to be used in a very narrow sense. It has almost become synonymous with multidimensional analysis. Working on a prebuilt data cube, multidimensional analysis summarizes data by specified dimensions and levels and presents the aggregate values as a table or a chart. It uses drill-down, aggregation, rotation, and slicing to change the dimensions, the levels, and the range being summarized. The idea behind multidimensional analysis is this:
Highly aggregated results are too coarse to give good insight into an issue; instead, the data needs to be sliced into smaller parts and drilled down to more detailed levels to make the analysis more valuable.
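To make these operations concrete, here is a minimal Python sketch of summarizing by a dimension, then slicing and drilling down. The fact rows, the field names, and the `summarize` helper are all hypothetical and only illustrate the idea; a real OLAP product works on a prebuilt cube rather than a list of tuples.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, city, product, cost)
FIELDS = ("region", "city", "product", "cost")
rows = [
    ("East", "Boston", "A", 120),
    ("East", "Boston", "B", 80),
    ("East", "Albany", "A", 40),
    ("West", "Denver", "A", 60),
    ("West", "Denver", "B", 30),
]

def summarize(rows, dims, where=None):
    """Sum cost grouped by the given dimensions, optionally within a slice."""
    totals = defaultdict(int)
    for row in rows:
        rec = dict(zip(FIELDS, row))
        # Slicing: skip rows that fall outside the requested slice.
        if where and any(rec[k] != v for k, v in where.items()):
            continue
        totals[tuple(rec[d] for d in dims)] += rec["cost"]
    return dict(totals)

# Aggregate at the coarsest level: total cost per region.
by_region = summarize(rows, ["region"])
# {("East",): 240, ("West",): 90}

# Slice to the East region and drill down to the city level.
east_by_city = summarize(rows, ["city"], where={"region": "East"})
# {("Boston",): 200, ("Albany",): 40}
```

The coarse view only says that the East costs more; the drill-down shows that Boston accounts for most of it.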
OLAP in the Broad Sense
Is online analytical processing all about multidimensional analysis?
In some data analysis scenarios, a person with a lot of experience in a field makes predictions about the business. For example:
- An equity analyst predicts that stocks meeting certain conditions are most likely to rise.
- A sales manager knows which types of sales representatives are better at dealing with difficult customers.
- A tutor knows how students who are very strong in some subjects and very weak in others tend to perform overall.
These guesses provide a basis for predictions. After a business has operated for some time, it will have generated a large amount of data that a system could use to verify these guesses. Verified guesses can be used as principles to guide future decisions. If a guess is proved wrong, a new guess is made.
It is guess verification that OLAP should focus on. The guess-and-verify work aims to find, in historical data, principles or facts that support a conclusion. An OLAP tool helps verify guesses through data manipulation.
Of course, guesses are made by people experienced in a certain field, not by the software. The analysis has to be online because, most of the time, guesses are made on the spot based on intermediate results. It is impossible and unnecessary to design a complete end-to-end path in advance, which means pre-modeling is infeasible. Because the work is ad hoc, IT resources are usually not available at the moment a guess needs verifying.
To address this technically, frontline workers must be able to query and compute on data flexibly and interactively. In the scenarios mentioned above, the possible computations are as follows:
- For stocks that rose for three consecutive days within a month, find the probability that they rose again on the fourth day.
- Find the customers whose last orders were more than half a year old but who placed an order after their sales representatives were changed.
- Get the rankings of the English scores of the students whose scores in both Chinese and math are in the top 10.
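As an illustration of the first computation, the sketch below estimates how often a three-day rise is followed by a fourth rising day. It assumes a plain list of daily closing prices for one stock; the function name `fourth_day_rise_rate` and the sample prices are hypothetical, and overlapping streaks are each counted, which is one possible reading of the question.

```python
def fourth_day_rise_rate(closes):
    """Share of three-day rising streaks that are followed by another rise.

    Every window of three consecutive daily rises that has a following day
    is counted, so a four-day streak contributes two windows. Returns None
    when the price series contains no such window.
    """
    # rises[i] is True when day i+1 closed higher than day i.
    rises = [later > earlier for earlier, later in zip(closes, closes[1:])]
    hits = total = 0
    for i in range(len(rises) - 3):
        if rises[i] and rises[i + 1] and rises[i + 2]:
            total += 1
            hits += rises[i + 3]
    return hits / total if total else None

# Hypothetical closing prices for one stock.
closes = [10, 11, 12, 13, 14, 13, 14, 15, 16, 15]
rate = fourth_day_rise_rate(closes)  # 1 of 3 qualifying windows, about 0.33
```

The point is less the formula than the shape of the work: the analyst needs ordered data and a condition that spans several adjacent rows, which does not reduce to summarizing by dimensions.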
Limitations of Multidimensional Analysis
Obviously, these computations can be handled based on historical data. But is a multidimensional analysis method helpful?
I’m afraid not!
Multidimensional analysis has two drawbacks:
- The data cube must be created in advance. Users have no way to reshape it on the fly, and each new analysis requires a new cube.
- The analytic operations available on a data cube are limited to drill-down, aggregation, slicing, and rotation. It is therefore difficult to handle complex multi-step computations.
The Agile BI products that have become popular in recent years can perform multidimensional analysis more fluently and with far more attractive interfaces than the early OLAP products, but their essential functionality remains unchanged; nothing has been added that lets them do more.
Multidimensional analysis does have value, for example in locating the exact source of a high cost. But it can't derive from data the kind of principle that is crucial for predicting and guiding future moves. In this sense, online analytical processing should be more than multidimensional analysis.
What Kind of OLAP Do We Need?
What functionality should OLAP software have in order to verify a speculation?
As mentioned previously, verifying a speculation is a process of data query and computation. It is vital that frontline workers can define the queries and computations without help from IT specialists. In the current application context, an OLAP platform needs the following two capabilities.
1. Associated Query
The first step in any analysis is acquiring data. Many organizations have data warehouses that non-IT employees can access and query. An important issue is that most OLAP software doesn’t provide convenient associated query functionality for frontline employees. Instead, IT specialists first need to create a model to handle the associated query (which is similar to creating a data cube for multidimensional analysis). Usually, not all real-life demands can be handled with this single model, so help from IT is still needed. This makes online analytical processing not online anymore.
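The customer example from earlier shows why associated queries matter: answering it means relating order records to representative-change records. The Python sketch below assumes that "associated query" refers to this kind of cross-table lookup and reads "half a year" as a gap of at least 182 days before the change; the tables, names, and dates are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical table: customer id -> date the new sales representative took over.
rep_changes = {
    "C1": date(2023, 9, 1),
    "C2": date(2023, 9, 1),
    "C3": date(2023, 9, 1),
}

# Hypothetical table: (customer id, order date).
orders = [
    ("C1", date(2023, 1, 10)),
    ("C1", date(2023, 9, 20)),
    ("C2", date(2023, 8, 15)),
    ("C2", date(2023, 9, 5)),
    ("C3", date(2022, 12, 1)),
]

HALF_YEAR = timedelta(days=182)  # assumed definition of "half a year"

def reactivated_customers(orders, rep_changes, gap=HALF_YEAR):
    """Customers dormant for at least `gap` before the rep change who ordered after it."""
    result = []
    for customer, changed_on in rep_changes.items():
        # Associate each customer with their own order dates.
        dates = [d for c, d in orders if c == customer]
        before = [d for d in dates if d < changed_on]
        after = [d for d in dates if d >= changed_on]
        if before and after and changed_on - max(before) >= gap:
            result.append(customer)
    return sorted(result)

reactivated = reactivated_customers(orders, rep_changes)  # ["C1"]
```

C2 ordered shortly before the change and C3 never ordered after it, so only C1 qualifies. If every such association has to be modeled by IT in advance, a question like this cannot be asked on the spot.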
2. Interactive Computation
After the data is collected, computation begins. The distinguishing characteristic of speculation-verifying computation is that it does not follow a ready-made program; each move is decided based on the result of the previous one. The process is highly interactive, much like working with a calculator, except that what is processed is batches of structured data rather than single numbers. The OLAP tool thus becomes a data calculator. Excel is interactive to some degree, which has made it the most popular desktop BI tool. But Excel doesn’t give sufficient support for multi-level data and regular operations, so it is unable to handle the speculation-verifying computations in the scenarios above.
In later articles, we’ll analyze the currently popular computing techniques to identify the problems they have in handling these two types of computation and suggest solutions.
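The student example can be worked through in this step-by-step style, with each intermediate result inspected before the next step is chosen. The following Python sketch uses a hypothetical score table and a top 3 instead of a top 10 to keep the data small; it ranks the selected students among themselves, which is one reading of the question, and it ignores ties.

```python
# Hypothetical score table: student -> subject scores.
scores = {
    "Ann":  {"chinese": 95, "math": 90, "english": 70},
    "Bob":  {"chinese": 92, "math": 96, "english": 88},
    "Cara": {"chinese": 60, "math": 99, "english": 93},
    "Dan":  {"chinese": 91, "math": 85, "english": 79},
}

def top_n(scores, subject, n):
    """Names of the n highest scorers in one subject (ties are not handled)."""
    ranked = sorted(scores, key=lambda name: scores[name][subject], reverse=True)
    return set(ranked[:n])

# Step 1: look at the top scorers in Chinese.
top_chinese = top_n(scores, "chinese", 3)   # Ann, Bob, Dan

# Step 2: look at the top scorers in math.
top_math = top_n(scores, "math", 3)         # Cara, Bob, Ann

# Step 3: keep the students who appear in both intermediate results.
both = top_chinese & top_math               # Ann, Bob

# Step 4: order those students by English score, highest first.
english_ranking = sorted(both, key=lambda name: scores[name]["english"], reverse=True)
# ["Bob", "Ann"]
```

Each step produces a set or list of records that feeds the next one, which is the "data calculator" behavior described above.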
Published at DZone with permission of Buxing Jiang.