Understanding Preceding Loads
Understanding Preceding Loads
Preceding Loads allow you to load a table of data in one pass with several successive transformations. Simple as they are, they can make impressive improvements to code.
Join the DZone community and get the full member experience.Join For Free
Built by the engineers behind Netezza and the technology behind Amazon Redshift, AnzoGraph™ is a native, Massively Parallel Processing (MPP) distributed Graph OLAP (GOLAP) database that executes queries more than 100x faster than other vendors.
Some of the features of QlikView Training are not frequently blogged about. These features are used on a day-to-day basis, and developers often don't give them any thought. If you're unaware of these techniques, you should revisit them before reading this article. One feature that we will be talking about today is the Preceding Load.
What Is a Preceding Load?
Knowing exactly what a Preceding Load is will be important here. As the name implies, a Preceding Load occurs before, prior to, or in front of another load. Even if you're not aware of it, you've probably used a Preceding Load before. For a basic Preceding Load example, let's imagine that there is an SQL
SELECT statement while loading from an OLEDB or ODBC data source. As an option, the wizard could add a
LOAD section prior to
SELECT. Using one of these ahead of a database load is always recommended, as doing so opens up a complete range of syntax that is unavailable in the SQL statement.
Common Misunderstandings of Preceding Loads
It is essential to know the pattern of things that happen in your load script prior to getting stuck in Preceding Loads. The execution of the script is done from top to bottom, then from left to right in parallel with the tabs. (Note: There is no functional relevance to the tabs; they are only there to clean code.)
This execution plan will be altered by loops and subroutines by design, but this statement still holds true. The difference, however, is with LOAD blocks that can be shaped together and executed from the bottom up. See the simple SQL Preceding Load below:
DBPreceding LOAD UserName, Password ; SQL SELECT UserName, Password FROM MyServer.dbo.Credentials;
An execution of the SQL part is made and exchanged first and is then analyzed by the LOAD statement above. As we get into multiple coatings or tiers of Preceding Loads, this plan of loading from the bottom up holds true.
Why Apply a Preceding Load?
Simply put, a Preceding Load permits you to use values borrowed in one part of the load in the one above it. Suppose that from a text file, two dates are loaded. To know the distance between those two dates, we should change them from text values to numeric values. The code without a Preceding Load would look like this:
NoPreceding: LOAD Date(Date#(FromDate, 'YYYYMMDD'), 'DD MMM YYYY') as [From Date], Date(Date#(ToDate, 'YYYYMMDD'), 'DD MMM YYYY') as [To Date], Date#(ToDate, 'YYYYMMDD') - Date#(FromDate, 'YYYYMMDD') as Duration FROM ..\SourceData\Durations.csv (txt, codepage is 1252, embedded labels, delimiter is ',', msq);
You will see the code be cloned, as we change both of the dates twice (although I was recently informed by Henric Crinstrom that QlikView, with its intelligent cache, would not require calculating the values twice; cloning is the wrong thing to do in regards to code maintainability).
We can eliminate that cloning with a simple Preceding Load. This would make the code much cleaner and give it much more readability. It would appear like this:
SimplePreceding: LOAD * [To Date] - [From Data] as Duration ; LOAD Date(Date#(FromDate, 'YYYYMMDD'), 'DD MMM YYYY') as [From Date], Date(Date#(ToDate, 'YYYYMMDD'), 'DD MMM YYYY') as [To Date], FROM ..\SourceData\Durations.csv (txt, codepage is 1252, embedded labels, delimiter is ',', msq);
You will see that the fields in the lower part of the load are used in the one above. Note
* in the Preceding Load — that is important for dragging the fields up from the load below to what is truly loaded. This poses two possible difficulties: first in omitting the asterisk, after which the two date fields will not be in the final data model; second in cloning a field by using a field name in the following load that is dragged through with an asterisk that is then used again in the Preceding Load. These difficulties can be exhausting to spot. Moreover, the error message from the QlikView (Note: Use unique field names) does not always point you to the right part of the load script, so you need to be cautious.
Usually, I would recommend not using
* (especially when dragging fields from a database), but they are very useful in Preceding Loads. Note that you can also drag fields up by listing them if you only need to take some fields from your lower load to the preceding one.
Diving Deeper Into Preceding Loads
You don't need to stop there! You can sum up a Preceding Load if you needed to have another value calculated on fields derived from your Preceding Load. Taking the example of our date interval, we could then sum up another field based on whether the threshold has been breached upon.
Simply pile up another load on top of the one before, as shown below:
SimplePreceding LOAD *, [To Date] - [From Date] - as Duration ; LOAD Date(Date#(FromDate, 'YYYYMMDD'), 'DD MMM YYYY') as [From Date], Date(Date#(ToDate, 'YYYYMMDD'), 'DD MMM YYYY') as [To Date], FROM ..\SourceData\Durations.csv (txt, codepage is 1262, embedded labels, delimiter is ',', msq);
You can have a limitless number of levels of load. And although this looks like you are likely misinterpreting something (like a BI version of Inception), Preceding Loads help you unclutter and simplify a script. You can have expressions that use fields from any of the levels below in order to make new values.
Lastly, although they are not a part of a truly "back to basics" post, I would like to say that there are functions you can utilize in your Preceding Loads that you may previously have only linked with the first position of load.
GROUP BY are allowed to use values from the previous load. You should not use these features frequently, but you should know how and why they are sometimes used.
Your toolbox of QlikView load script with Preceding Loads enables you to create complex expressions while preserving your code simply by splitting things up into bite-sized chunks. Cloning can be eliminated and there can be an improvement in maintainability that is impressive for a simple, small technique.
Opinions expressed by DZone contributors are their own.