Many BI practitioners have heard of OLAP, an important component of business intelligence. Today we will talk about what OLAP is actually needed for. What is real OLAP? What is instant OLAP for instant data analytics?
This three-part article was originally posted on my blog. I have gathered the parts together here to share them with more people, and I hope you enjoy it.
Taken literally, OLAP means online analytical processing, that is, users performing analytical operations on real-time business data.
Currently, however, the concept of OLAP has been seriously narrowed: it refers only to operations such as drilling, aggregating, pivoting and slicing on multi-dimensional data, namely, multi-dimensional interactive analysis.
To apply this kind of OLAP, a group of topic-specific data cubes must first be created in the OLAP tool. Users can then display the data as crosstabs or graphs and transform them interactively (pivoting and drilling), hoping to find during these transformations some pattern in the data, or evidence supporting a certain conclusion, and thereby achieve the aim of data analytics.
Do we need this kind of OLAP? To answer this question, we need to look carefully at how OLAP is actually applied, and thereby find out what technical problem OLAP tools really need to solve.
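To make these cube operations concrete, here is a minimal Python sketch of a pre-built cube that supports slicing, rolling up and pivoting. The fact records, the dimension names `region`, `product` and `month`, and the helper functions are all hypothetical; real OLAP engines store and query cubes very differently.

```python
from collections import defaultdict

# Hypothetical fact records: (region, product, month, amount). Illustrative data only.
FACTS = [
    ("East", "Tea", "2024-01", 120.0),
    ("East", "Coffee", "2024-01", 80.0),
    ("West", "Tea", "2024-01", 50.0),
    ("East", "Tea", "2024-02", 100.0),
    ("West", "Coffee", "2024-02", 70.0),
]
DIMS = ("region", "product", "month")


def build_cube(facts):
    """Pre-aggregate amounts for every (region, product, month) cell.
    This is the modeling step done in advance, before any user looks at the data."""
    cube = defaultdict(float)
    for region, product, month, amount in facts:
        cube[(region, product, month)] += amount
    return dict(cube)


def slice_cube(cube, dim, value):
    """Slice: keep only the cells where one dimension equals a fixed value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


def roll_up(cube, keep):
    """Aggregate (roll up): sum away every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(float)
    for key, v in cube.items():
        out[tuple(key[i] for i in idx)] += v
    return dict(out)


def pivot(cube, row_dim, col_dim):
    """Pivot: lay out a two-dimensional roll-up as a crosstab {row: {col: value}}."""
    table = defaultdict(dict)
    for (r, c), v in roll_up(cube, (row_dim, col_dim)).items():
        table[r][c] = v
    return dict(table)


cube = build_cube(FACTS)
print(roll_up(cube, ("region",)))  # {('East',): 300.0, ('West',): 120.0}
print(pivot(slice_cube(cube, "month", "2024-01"), "region", "product"))
```

Every operation here only rearranges cells that `build_cube` computed in advance. A question involving a dimension the cube was not built with (time of day, say) cannot be answered without going back to the raw records and building a new cube.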
Employees with years of experience in any industry generally have some educated guesses about their business, for example:
A stock analyst may guess that stocks meeting a certain condition are likely to go up.
An airline employee may guess which kinds of people tend to buy which kinds of flights.
A supermarket operator may guess what price for a commodity best suits the people who shop around the supermarket.
These guesses are the basis for forecasting. Once a business system has been running for a while, it accumulates large quantities of data against which the guesses can usually be checked: a guess confirmed by the data can be used for forecasting, and one refuted by it is replaced with a new guess.
Note that these guesses are made by the users themselves, not by the computer system! In OLAP, instant data analytics is initiated by a human. What the computer should do is help the user evaluate, against the existing data, whether a guess is true or false, that is, run on-line data queries (including some aggregation). This is how OLAP is actually applied. On-line analysis is needed because many queries only become necessary after the user has seen some intermediate result. Throughout this process, modeling in advance is both impossible and unnecessary (Raqsoft esProc was created to deal with exactly these issues).
We call the above process the evaluation process. Its purpose is to find, in historical data, patterns or evidence that support conclusions, and its means is interactive query computation on that historical data. The computations involved can be complex.
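As a rough illustration of the evaluation process, the sketch below checks a supermarket-style guess against historical data. The data, the 5.0 price threshold and the function name `evaluate_guess` are invented for illustration; the point is only that the computer filters and aggregates existing data, while the guess itself comes from the user.

```python
from statistics import mean

# Hypothetical sales history: (item, unit_price, units_sold_per_day). Illustrative only.
HISTORY = [
    ("milk", 3.5, 40), ("bread", 2.0, 55), ("cheese", 8.0, 12),
    ("juice", 4.5, 30), ("wine", 15.0, 5), ("yogurt", 1.5, 48),
]


def evaluate_guess(rows, price_threshold):
    """Check the user's guess 'items cheaper than price_threshold sell more per day'.
    The computer does not invent the guess; it only filters and aggregates the
    existing data so the user can see whether the guess holds."""
    cheap = [units for _, price, units in rows if price < price_threshold]
    dear = [units for _, price, units in rows if price >= price_threshold]
    cheap_avg, dear_avg = mean(cheap), mean(dear)
    return cheap_avg > dear_avg, cheap_avg, dear_avg


holds, cheap_avg, dear_avg = evaluate_guess(HISTORY, 5.0)
print(holds, cheap_avg, dear_avg)  # True 43.25 8.5
```

If the guess fails, the user simply re-runs the check with a different threshold or condition; nothing has to be modeled in advance.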
Here are a few examples of computations (or queries) that are actually required:
The top n customers whose combined purchases account for half of the company's sales in the current year;
Stocks that hit the daily limit-up on three consecutive trading days within one month;
Supermarket commodities that sold out by 5 p.m. three times within one month;
Commodities whose sales this month have decreased by more than 20% compared with the preceding month.
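Each of these questions is easy to state as a small procedure over raw records. The Python sketch below is one possible formulation under stated assumptions: the input shapes, the function names, and in particular the 10% limit-up threshold and the "sold out by 5 p.m." reading are hypothetical choices, not definitions from the original article.

```python
from datetime import time


def top_customers_for_half(sales):
    """Smallest set of top customers whose purchases reach half of total sales.
    `sales` maps customer -> purchase amount for the current year."""
    half = sum(sales.values()) / 2
    picked, running = [], 0.0
    for customer, amount in sorted(sales.items(), key=lambda kv: kv[1], reverse=True):
        if running >= half:
            break
        picked.append(customer)
        running += amount
    return picked


LIMIT_UP = 0.10  # assumption: a daily gain of at least 10% counts as limit-up


def has_limit_up_streak(daily_changes, days=3):
    """True if the stock hit the limit-up on `days` consecutive trading days.
    `daily_changes` holds one month of daily returns in trading-day order."""
    streak = 0
    for change in daily_changes:
        streak = streak + 1 if change >= LIMIT_UP else 0
        if streak >= days:
            return True
    return False


def frequently_sold_out(sellouts, cutoff=time(17, 0), times=3):
    """Items sold out by `cutoff` on at least `times` days in the month.
    `sellouts` lists (item, sell_out_time) pairs, one per day an item sold out."""
    counts = {}
    for item, sold_out_at in sellouts:
        if sold_out_at <= cutoff:
            counts[item] = counts.get(item, 0) + 1
    return sorted(item for item, n in counts.items() if n >= times)


def sharp_decliners(monthly, drop=0.20):
    """Items whose sales this month fell by more than `drop` versus last month.
    `monthly` maps item -> (last_month_sales, this_month_sales)."""
    return sorted(item for item, (prev, curr) in monthly.items()
                  if prev > 0 and (prev - curr) / prev > drop)


print(top_customers_for_half({"A": 40, "B": 30, "C": 20, "D": 10}))  # ['A', 'B']
print(has_limit_up_streak([0.02, 0.10, 0.10, 0.10, -0.03]))         # True
print(sharp_decliners({"tea": (100, 75), "coffee": (100, 85)}))      # ['tea']
```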
Computational demands like these are ubiquitous in business analysis, and all of them can be computed from the historical database.
Can the narrowed OLAP, then, carry out computations like these (as in marketing and sales data analysis)?
Of course NOT!
Current OLAP systems have two key disadvantages:
1. The multi-dimensional cube is prepared in advance by the application system, and users cannot design or restructure it on the fly; whenever a new analysis need arises, the cube has to be re-created.
2. The analysis actions a cube supports are monotonous. Only a handful are defined, such as drilling, aggregating, slicing and pivoting, so complicated analyses requiring multiple steps are hard to carry out.
Although current OLAP tools look splendid, they actually provide few on-line analysis capabilities that are powerful enough.
So what kind of OLAP, and what kind of OLAP tools, do we need?
The answer is simple: we need an on-line analytical system that supports the evaluation process, handling the kind of computations users now attempt with SQL or Excel.
Technically speaking, each step of the evaluation process can be regarded as a computation on data (a query can be understood as a filter computation). Users must be able to define these computations freely and to decide the next computation at any point based on the intermediate results so far, without modeling beforehand. Additionally, since the data source is generally a database system, such computations must handle large volumes of structured data well (as tools like esProc aim to), rather than just simple numeric calculation. This evaluation process is what businesses need, especially in marketing and sales data analysis.
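The sketch below mimics such a session in Python: each step is a filter or aggregation over raw records, and the second step is chosen only after looking at the first step's result. The records and the helpers `where` and `total_by` are hypothetical.

```python
from collections import defaultdict

# Hypothetical order records; field names and values are illustrative only.
ORDERS = [
    {"customer": "c1", "region": "East", "month": "2024-03", "amount": 500.0},
    {"customer": "c2", "region": "West", "month": "2024-03", "amount": 120.0},
    {"customer": "c3", "region": "West", "month": "2024-03", "amount": 60.0},
    {"customer": "c1", "region": "East", "month": "2024-02", "amount": 450.0},
    {"customer": "c2", "region": "West", "month": "2024-02", "amount": 300.0},
]


def where(rows, **conditions):
    """A query viewed as a filter computation: keep rows matching every condition."""
    return [r for r in rows if all(r[k] == v for k, v in conditions.items())]


def total_by(rows, key):
    """Group rows by one field and sum their amounts."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[key]] += r["amount"]
    return dict(totals)


# Step 1: the user starts from an ad hoc question about March.
march = where(ORDERS, month="2024-03")
march_by_region = total_by(march, "region")  # {'East': 500.0, 'West': 180.0}

# Step 2: having seen that West is weak, the user decides on the spot to compare
# West customers with the previous month. No cube was designed for this in advance.
west_feb = total_by(where(ORDERS, month="2024-02", region="West"), "customer")
west_mar = total_by(where(march, region="West"), "customer")
dropped = {c: west_feb[c] - west_mar.get(c, 0.0) for c in west_feb}
print(march_by_region, dropped)  # {'East': 500.0, 'West': 180.0} {'c2': 180.0}
```

The intermediate results (`march`, `march_by_region`) stay available as ordinary values, so the next step can build on whichever one the user finds interesting.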
Then, can SQL (or MDX) play this role?
SQL was indeed invented for this purpose: it has complete computational capability and a syntax that resembles natural language.
However, SQL's computational model is too basic, so complex computations, such as the problems listed above, are very difficult and cumbersome to express in SQL. Even professionally trained programmers find them hard, so ordinary users can use SQL only for the simplest queries and aggregations (filtering and summarizing a single table). As a result, the use of SQL has drifted far from its original intention, and it has become almost exclusively a programmer's skill.
We should follow SQL's line of thinking, study its specific shortcomings carefully, and find ways to overcome them, in order to develop a new generation of computation system that implements the evaluation process: the real OLAP, for instant data analytics.
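For a sense of what this looks like in practice, here is the first example question (the top customers accounting for half of sales) written in SQL and run through Python's built-in `sqlite3` module. The table and data are hypothetical, and the query assumes SQLite 3.25 or later for window function support. It is one way to write the query, not the only one; it simply shows that a one-sentence business question already calls for a CTE, a running total and a scalar subquery.

```python
import sqlite3

# Hypothetical table of current-year sales; names and figures are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("A", 40), ("B", 30), ("C", 20), ("D", 10)])

# Keep each customer whose predecessors (by descending total) have not yet
# reached half of all sales. Requires window functions (SQLite 3.25+).
QUERY = """
WITH per_customer AS (
    SELECT customer, SUM(amount) AS total
    FROM sales GROUP BY customer
),
ranked AS (
    SELECT customer, total,
           SUM(total) OVER (ORDER BY total DESC, customer) - total AS before_me
    FROM per_customer
)
SELECT customer FROM ranked
WHERE before_me < (SELECT SUM(total) FROM per_customer) / 2.0
ORDER BY total DESC, customer
"""
print([row[0] for row in conn.execute(QUERY)])  # ['A', 'B']
```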
Author: Jim King
BI technology consultant for Raqsoft
10+ years of experience in BI/OLAP applications, statistical computing and analytics