Speed up ASO SQL data loads by using multiple rules files

Just another quick post today about possibly speeding up data loads to an ASO database when loading from SQL. I got on a quick call with a former colleague who was looking to gain a little more performance on their load process to a massive ASO database, and the first thing that jumped out at me was that you can do parallel loads with some native MaxL syntax.

Here’s a quick example of the syntax:


import database $APPLICATION.$DATABASE data
connect as $SQL_USER identified by $SQL_PW
using multiple rules_file $RULE1, $RULE2, $RULE3, $RULE4, $RULE5
to load_buffer_block starting with buffer_id 100 on error write to "errors.txt";

Basically, you provide multiple rules files (configured for your SQL datasource, of course). The rules files might be nearly identical to each other, but you can also use them to partition the data in some logical way to try and speed things up even more.

For example, let’s say that in the code above, we are loading five years of data from a relational database. We might then make each rules file specific to a particular year by doing the following (a sketch of one such query follows this list):

  • Set the year in the data header
  • Remove that column from the list of SELECT columns
  • Put a filter/predicate in the WHERE clause on the query
  • Bonus points for using substitution variables in both the header definition and the where clause
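
To make that a bit more concrete, here’s a rough sketch of what the SQL behind one of those five otherwise-similar rules files might look like. The table, columns, and substitution variable below are made-up names purely for illustration; the year itself comes from the rule’s data load header rather than the SELECT list.

-- Hypothetical SQL for one of the year-specific rules files. The year is set
-- in the rule's data header; the WHERE clause (driven by a substitution
-- variable) restricts the slice of data this particular rule loads.
SELECT ENTITY, ACCOUNT, PERIOD, AMOUNT
FROM FACT_SALES
WHERE FISCAL_YEAR = '&LoadYear1'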

Performance in this particular use case went up substantially. It’s my understanding that data loads that were taking an hour are now cut down to 17 minutes. Your mileage may vary, of course.

Let’s Not Forget About Hybrid BSO

That said, I think this can be an effective strategy for squeezing performance out of an ASO cube that needs a smaller load window, when you don’t want to go changing a lot of the internals in play. If you’re doing new development, though, then I strongly, strongly recommend using hybrid BSO (or rather, BSO, with the cube configured properly so that you get the hybrid BSO performance benefits). I have been seeing hybrid BSO cubes absolutely killing it in performance, what with their ability to leverage ASO technology for aggregates and the massive calculation improvements owing to the smaller block sizes and indexes you get from having so many dynamic calc members in dimensions. Plus, you of course get all of the classic/rich/awesome BSO functionality out of the box, like dynamic time series, expense tagging, time balance, and more. These were never very strong areas for ASO and often required a lot of non-optimal workarounds to make users happy.

Understanding the Outline Extractor Relational Extraction Tables

I was going to do a nice in-depth post to follow up on my discussion of the relational cache outline extraction method/improvements on the Next Generation Outline Extractor, but someone already beat me to the punch. It turns out that the tool’s primary author, Tim Tow, blogged about the technique and the tables for ODTUG a couple of years back.

I’ll just add one thing that I wanted to highlight, though: the general technique behind most relational extractions from an Essbase outline is to generate a single table, with such common columns as PARENT, CHILD, ALIAS, UDA, STORAGE, CONSOLIDATION, and so on (a rough sketch of such a table follows the list below). If you think about it, this tends to imply a number of limitations that might make the technique infeasible for you:

  1. The table can only hold one extraction at a time
  2. You can only get the member name and the alias from one alias table at a time
  3. The columns in the table vary based on the attribute dimensions that may be associated with the base dimension
  4. More than one UDA: ¯\_(ツ)_/¯
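
For reference, here’s a rough sketch of what one of these single extraction tables typically looks like. The column names are the common ones mentioned above; the data types, lengths, and attribute columns are just placeholder assumptions.

-- Illustrative only: a typical single-table outline extraction target.
CREATE TABLE OUTLINE_EXTRACT (
    PARENT        VARCHAR(80),
    CHILD         VARCHAR(80),
    ALIAS         VARCHAR(80),   -- room for an alias from just one alias table
    UDA           VARCHAR(80),   -- likewise, room for just a single UDA
    STORAGE       VARCHAR(20),
    CONSOLIDATION CHAR(1)
    -- ...plus a variable set of columns for any associated attribute dimensions
);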

Continue Reading…

PBJ 1.0.4 – New password options and start of CLI

The PBJ library has been getting a lot of attention lately from various developers using it to integrate with their own software and projects. Francisco Amores did a great blog post about using PBJ to help with data loading in an FDMEE project. Probably the coolest thing about his efforts is that it’s a use case I never imagined: using PBJ in Jython to access PBCS.

One of the things that has been so great about collaborating with Francisco is getting targeted, useful, and practical comments on how he’s using the library and how it can be made better. And I have found time to make various improvements and enhancements and to fix bugs based on his feedback. This is one of the greatest things about open source software.

Continue Reading…

Essbase Renegade Members Revisited

For some reason the other day I was thinking “Whatever happened to that renegade members feature?” So I did some digging.

Renegade members, by the way, refers to this concept where instead of a data record being rejected, you can map it to some other member. Other names for this feature might have been “shovel members”, but renegade members sounds cooler. That said, it’s a feature with a cool name but an apparently terrible publicist.

Renegade members were blogged about as early as a few years ago, such as on Cameron’s blog (during the 2013 OpenWorld), in Russian (apparently), and even over at Rittman Mead’s blog (before Mark spent his days trying to get tea kettles to work with the internet, but I digress).

But there’s a curious lack of information on renegade members since then. There is, however, just enough information on the internet to piece this together. There’s a little bit about renegade members in the official documentation. Just as important (for my purposes), there are two methods relating to renegade members in the Essbase JAPI Javadoc.

Continue Reading…

Camshaft (Essbase MDX query tool) 1.0.2 released

Apparently I’m having quite the productive Friday, what with showing how easy it is to set up drill-through with Dodeca and announcing that I’m heading to Oracle Open World 2017 to contribute to a presentation on cool Essbase tools.

To these articles I’ll add that I just released a Camshaft point release. This release has a couple of fixes and enhancements. Thanks to André Märki and others for providing feedback.

This version of Camshaft fixes an issue where data with many digits after the decimal would be rendered in scientific notation. Along with this fix, I have added a new command-line switch, --maximum-fraction-digits (used like --maximum-fraction-digits=2), to set the maximum number of digits to render after the decimal.

Additionally, there was a bug with running a query from a file that is now fixed. You can now specify something like --query=somefile.mdx and Camshaft will look for the given file. If found, it’ll read its entire contents for an MDX query, then execute that. This option can help make command invocations with big gnarly MDX queries a little easier to manage.

Please keep that feedback coming and I’ll add enhancements/fixes to the best of my ability. I have some interesting Camshaft news coming in the near future that some people will really like!

As always the latest Camshaft documentation and download can be found linked from the Camshaft page.

Advanced integration with PBJ Java PBCS REST API library

This week’s blog posts are all about the upcoming Kscope16 conference and relate to the presentations I’m part of. This year I am co-presenting with Cameron Lackpour on on-premise Planning versus PBCS and talking about some different use cases. My particular focus for this presentation is how you might use the PBCS REST API with Java.

Over the last year I have put together a Java library that works with the PBCS REST API. It has the following characteristics:

  • Open Source (Apache Software License version 2.0)
  • Doesn’t depend on any Oracle libraries or code
  • High-quality, readable, fluent API

In my opinion, all of these goals have been met. Additionally, as of today, PBJ (the PBCS Java Client) is available in Maven Central. What this means is that if you or your team program in Java and use Maven for dependency management, you are just a few clicks away from being able to use this library.

For an overview of why you might want to use PBJ and how it compares to other scripting languages you might want to use instead (such as Groovy, Python, and more), come check out the presentation!

Today, however, I want to show how easy it is to incorporate PBJ into a Java program and do something quasi-practical. So the rest of this article will be oriented towards programmers, but for those of you that have employees or teammates that would be more likely to do the programming aspects of things, keep them in mind and send them a link.

First, let’s assume we’ve already created a new empty Maven project in Eclipse. Your experience will vary if you use IntelliJ IDEA or some other IDE. In this example I happen to be using Springsource Tool Suite (STS), which is pretty much the same as Eclipse.

[Screenshot: pbj-up-and-running-01-empty-project]

Next we need to open up our pom.xml (project definition file) to add a new dependency. You can do this manually by editing the XML file itself, but there’s a nice enough GUI in Eclipse that makes things even easier:

[Screenshot: pbj-up-and-running-02-pom-editor]

Next we need to go find the PBJ library. As I mentioned earlier, PBJ is available in the global Maven Central repository. If you have Maven set to update its index periodically or upon startup of your IDE, then you should be up to date and can find the PBJ library. As of right now the version of PBJ is just 1.0.1.

[Screenshot: pbj-up-and-running-03-pbj-maven-dependency]

Select the library, click OK, and then save the file. The PBJ library and its dependencies are added to your project.

[Screenshot: pbj-up-and-running-04-pbj-added]
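
As mentioned above, you can also skip the GUI and edit pom.xml by hand. The entry looks something like the snippet below; note that the groupId and artifactId here are placeholders (grab the real coordinates from your Maven Central search results for PBJ), and only the 1.0.1 version number comes from the text above.

<!-- Placeholder coordinates: substitute the actual PBJ groupId/artifactId
     from Maven Central; 1.0.1 is the current version mentioned above. -->
<dependency>
    <groupId>PBJ_GROUP_ID_GOES_HERE</groupId>
    <artifactId>PBJ_ARTIFACT_ID_GOES_HERE</artifactId>
    <version>1.0.1</version>
</dependency>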

Now we can create a new main class to test things out. At this point it’s just life as normal for the Java developer:

[Screenshot: pbj-up-and-running-05-new-main-class]

Just for good measure (and tradition!) let’s put in some simple code to print out “Hello world”, and run it. Note the output in the bottom middle pane:
[Screenshot: pbj-up-and-running-06-hello-world]

At this point we have the project set up, we have all of the necessary PBJ files (and some additional transitive dependencies), and we are ready to write some Java code that uses methods in the PBJ library. The code to write is shown below:

[Screenshot: pbj-up-and-running-07-refresh-cube]

In the code you can see the following happen (a rough sketch of the code follows this list):

  1. A connection details object is created
  2. A connection to PBCS is made
  3. The list of available apps is requested
  4. The list of available apps is printed out
  5. Get a reference to a particular app (“Vision”)
  6. Call the refreshCube method on the app reference to refresh the cube
  7. A message is printed after the cube refresh
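
In rough sketch form, that code looks something like the following. The class and package names here are purely illustrative assumptions (only refreshCube is explicitly part of the description above), so treat this as an outline of the steps rather than copy-and-paste PBJ code.

// Illustrative sketch only: class/package names are assumptions, not
// necessarily PBJ's exact API. The numbered comments match the steps above.
import com.jasonwjones.pbcs.*;   // placeholder package name

public class PbjRefreshCubeExample {

    public static void main(String[] args) {
        // 1. Create a connection details object (server, identity domain, credentials)
        PbcsConnection connection = new PbcsConnection("planning-test-mydomain.pbcs.us2.oraclecloud.com",
                "myIdentityDomain", "admin@example.com", "SuperSecretPassword");

        // 2. Make a connection to PBCS
        PbcsClient client = new PbcsClientImpl(connection);

        // 3 & 4. Ask for the list of available apps and print them out
        for (PbcsApplication app : client.getApplications()) {
            System.out.println("Found app: " + app.getName());
        }

        // 5. Get a reference to a particular app ("Vision")
        PbcsApplication vision = client.getApplication("Vision");

        // 6. Call refreshCube on the app reference to refresh the cube
        vision.refreshCube();

        // 7. Print a message after the cube refresh
        System.out.println("Cube refresh launched for Vision");
    }
}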

It’s hard to imagine this code being much simpler. It might look like Greek if you’re not familiar with programming or Java, but to a Java programmer, this will be readily comprehensible and its intent obvious. PBJ supports most of the REST API – importing data, metadata, business rules, and more.

To see a specific use-case and hear wry commentary from myself and Cameron, please swing by our presentation. We’ll cover an example of PBJ (don’t worry, it’s higher level than this!) but more generally some facets of administration of on-prem versus PBCS will be discussed as well. I think the presentation will really appeal to many different user groups.

Vess Updates Substitution Variables

A colleague of mine is running into an issue with substitution variables and was looking for a solution that he could use to sync values up. He thought maybe Vess would be a good fit. Vess, as I have blogged about before, is a “virtual” Essbase JDBC driver. Vess maps Essbase concepts and crams them into a typical database model. For example, Vess exposes tables that model substitution variables.

In the case of server-wide substitution variables, there is a “VARS” table that has two columns: NAME and VALUE. For each application, there is another table that contains four columns: NAME, VALUE, APPLICATION, DATABASE.

As a quick aside, it might seem a little odd to have separate tables. After all, this is notionally about the same as the Variables screen in EAS. Well, you have to flip your thinking a little. Don’t think of a variable as belonging only to either the server or a cube: think in terms of which variables are applicable to a cube. In other words, if you ask Essbase which variables are applicable to the whole server, you get the global variables only. If you ask Essbase which variables are applicable to a cube, then it’s the cube-, app-, AND server-specific variables. This is one of the reasons there are multiple tables to model the variables.

Getting back on track: given that we have these tables and can treat them just like normal SQL tables, we can do some interesting things. Let’s say we want to take a variable that exists in the global scope and create or update an app-specific version of it. We can do this in one line:


INSERT INTO SAMPLE.VARS
SELECT NAME, VALUE, 'Sample', 'Basic'
FROM VESS_SCHEMA.VARS WHERE NAME = 'Foo';

What’s going on here? In Vess, a schema named VESS_SCHEMA is presented for server-wide things (server wide variables are in the table VARS in this schema, as shown above). The server VARS table only has columns NAME and VALUE.

Each application on the Essbase server is modeled as its own schema. In this case, our favorite app – Sample – gets a schema named SAMPLE. This schema also contains a VARS table (containing columns NAME, VALUE, APPLICATION, and DATABASE).

Given these tables, it’s a simple matter of selecting the server variable with the name we want (in this case, a variable named ‘Foo’) and inserting it into the variables for the Sample app.

Of course, if we wanted to for some reason, we could alter the name using normal SQL (string truncation, substrings, etc.), or whatever. We can also delete variables, like this:


DELETE FROM SAMPLE.VARS WHERE DATABASE = 'Basic';

As I’ve said before, Vess continues to be an “interesting” proof of concept. As time permits I am filling out more and more functionality. At present, Vess models things like substitution variables, metadata you might see in MaxL or EAS (cube statistics, user sessions, etc), can load data to cubes, and can do certain outline related operations.

Vess is not available as a public download at this time but I have handed a few copies out to get feedback. I think Vess is just about good enough to be used in automation and other things. If you’re interested in using this in a production situation (automation or otherwise), please contact me to discuss support options.

More fun with Vess, the Essbase JDBC driver

All I can say about Vess is that it’s still… interesting. This is definitely one of the more complex things I’ve ever worked on, as well as one of the more elegant. I did a lot of work on the plumbing of how the driver operates, which is now enabling some interesting and rapid progress.

Here are some interesting things that you can do as of right now:

  1. Query list of applications and metadata
  2. Query list of cubes and metadata
  3. Query list of sessions (same as Sessions in EAS)
  4. Load data without a load rule by using a SQL INSERT statement
  5. List users
  6. List substitution variables

Some of the really fun things are actually the most subtle. Wherever possible, Vess tries to supply metadata for a modeled table that matches up with Essbase semantics or naming limits. For example, substitution variables are in a table where you have the variable name, value, application, and database. Each of these is a normal String column, but with a character limit that matches the corresponding Essbase maximum (320, 256, 30, and 30, respectively, taking Unicode into account).

Similarly for member names on the data load table. Wherever it makes sense, a primary key or nullability constraint is also set on a table. For example, substitution variables are assumed to be non-null, as are member names for an Essbase fact table, cube names, and more.
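
To make that a little more tangible, here’s a minimal sketch of querying one of these tables over plain JDBC. The schema, table, and columns follow what’s described in these posts, but the JDBC URL format and credentials below are hypothetical placeholders.

// Minimal sketch: query server-wide substitution variables through Vess.
// The JDBC URL format and credentials are made up for illustration.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class VessVarsExample {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:vess://essbase-server:1423";   // hypothetical URL format

        try (Connection conn = DriverManager.getConnection(url, "admin", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT NAME, VALUE FROM VESS_SCHEMA.VARS")) {

            // Print each server-wide substitution variable and its value
            while (rs.next()) {
                System.out.println(rs.getString("NAME") + " = " + rs.getString("VALUE"));
            }
        }
    }
}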

I have been using a free generic JDBC tool called Squirrel SQL to help facilitate testing. Squirrel supports any database for which you can supply a generic JDBC driver. Here are a few screenshots from testing out Vess when connected to Sample/Basic (list of cubes on server):

[Screenshot: viewing the list of cubes and metadata with Vess]

List of active sessions on server:

[Screenshot: viewing the list of active sessions on the Essbase server using Vess]

Some of the tables also support DELETE semantics. For example, deleting a row from the session table is equivalent to killing that session, and the same idea applies to substitution variables.

Lastly, all tables that are modeled in Vess are part of the SQL metadata store, so that tools that go in to ask for metadata (such as Squirrel) get it back. This is interesting/notable because theoretically you could already use Vess with ODI to perform data loads and some other things. I haven’t tried this yet, but theoretically I could add the drivers, reverse the metadata, and use one of the tables in an interface. So it’d be fairly trivial to update a substitution variable directly from SQL, perform a data load, and you could even kick off a calc script simply by invoking a procedure.

In short, things are exciting in the Vess world. While many of you want to load data to Essbase cubes without a load rule, just as many are interested in pulling database metadata (things you normally dump with MaxL and scrape out), and this tool might just fill that need in a clean way. More to come later; I have some presentations to get ready for Kscope15 (although sadly none on this little technological flight of fancy…)

Quadruple Backslashes

No, Quadruple Backslashes is not a college rock band (that I know of). But maybe it should be one. Anyway, ever see quadruple backslashes in a path in a batch file? Like this:

SET FILE=\\\\some-server\\path-to-app-folder\\app\\data.txt

What’s going on here? This is a UNC path, and the normal form is like this:

SET FILE=\\some-server\path-to-app-folder\app\data.txt

Let’s say we are doing a data import in a MaxL script:

import database Sample.Basic data
    from data_file "$FILE"
    using server rules_file "LdAll"
    on error write to "errors.txt";

In many scripting languages, backslashes are special. They are used to “escape” things inside of strings. What if you need a newline inside of a string? That’s \n. What if you want a tab? That’s \t. What if you want to print a backslash itself? You can’t just put a single backslash in, because then the interpreter thinks that the next character is something special. So to write out just a backslash, you escape the backslash itself (so it’s \\).

To further complicate things, many scripting languages have a notion of interpolated and non-interpolated strings. That means that strings contained inside of double quotes (“This is a double quoted string!”) have their contents parsed/scanned for special characters. Strings in single quotes don’t (‘This is a single quoted string!’). So in the single-quoted string, we can sometimes (not always) get away with doing whatever we want with backslashes. But if we use the single quoted string we can’t stick variables in it.

So now we need a double quoted string so we can put variables in it, but this makes putting in backslashes for our UNC path a little complicated. No worries – just escape the backslashes. Returning to our quadruple backslashed MaxL example that is surrounded with double quotes, what’s happening is that MaxL parses the string contents (the stuff between the double quotes) and the four backslashes become two backslashes in the final string. The double backslashes in the other parts of the UNC path get translated into a single backslash.

The final resulting string has the two backslashes to start the UNC path and any other path delimiters are just single backslashes.

Normal Paths

If you are just loading a file from a normal, non-networked path (or a mapped drive), and you’re on Windows, then I highly recommend just using forward slashes to delimit the path. For example, consider an input file at D:\Hyperion\data.txt. Your double quoted string would be D:\\Hyperion\\data.txt, which gets translated down to single backslashes. For the last decade or more, Windows has had awesome support for the more traditional forward-slash Unix path separator, meaning you can just use D:/Hyperion/data.txt. The forward slash isn’t special at all, so you don’t need to escape it. You can’t use double forward slashes to denote a UNC path, though, so //server-name/path/to/file.txt will not work, thus requiring some backslash kung fu to get it working.
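
In batch file terms, the two approaches from this post boil down to something like this (the paths are made up):

REM Local or mapped-drive path: forward slashes pass through MaxL's
REM double-quoted string parsing untouched, so no escaping is needed
SET FILE=D:/Hyperion/data.txt

REM UNC path: forward slashes won't work here, so each backslash has to be
REM escaped, giving four to start the path and two for each later delimiter
SET FILE=\\\\some-server\\path-to-app-folder\\app\\data.txt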

ODI Automation: Don’t Call Us, We’ll Call You

I have developed numerous jobs with Oracle Data Integrator. I have developed new jobs that connect systems together, and I have developed jobs in ODI that replace existing jobs. There are plenty of reasons for redeveloping a solution in ODI that I have talked about before. Generally you can significantly clean things up – sometimes a little, sometimes a lot.

Sometimes a very particular aspect (such as a certain section of script code) of an existing batch job will be targeted for replacement with something developed in ODI. Let’s say that there is an existing batch file that performs a series of six distinct steps (not counting boilerplate code like setting some variables and echoing out logging statements and such). For example, the steps might look like this:

  1. Copy a file from one place to another (such as from an “incoming” folder to a known working folder)
  2. Run a custom program to process the text file in some way, generating a new file
  3. Use a command-line SQL program to run a particular line of SQL, to truncate a particular table
  4. Run a custom program (or a BULK INSERT) to load the text file to a SQL table
  5. Use command-line program to run a line of SQL such as an UPDATE on the newly loaded data (UPDATE STAGING SET YEAR = '2015' WHERE YEAR IS NULL or something)
  6. Run a MaxL script that clears a cube, imports data via a load rule, and calculates the database

The Perfect Solution

Now, sitting from my architectural ODI ivory tower, I would make the case that this should all be done in an ODI job with an interface. For example, loading a text file to Microsoft SQL Server, defaulting some of the years data, and loading the data to a cube could all more or less be done in a single interface (BULK INSERT LKM, Hyperion Data Load IKM, use appropriate load rule, etc.).

The Intermediate Solution

But let’s say that just the data load and nothing else is to be performed by ODI because we are revamping some of the job, but not the whole job, for some reason (risk, time, money, incrementalism). This means that we are going to call an ODI job from an existing batch file. We will use our trusty friend startscen.bat in order to accomplish this. startscen.bat is good for running scenarios. We pass in the name of the scenario, the ID, and the context so that we can very specifically ID the exact job for the ODI server to run.

So now we’ll keep our existing batch file and most of its contents, but in place of part of the previous code, we’ll have a call to startscen.bat and then resume any necessary processing after that. So technically, we’ve improved the process a bit (hopefully) by using ODI for part of it, but we haven’t really done much to significantly improve things. This is now where I get to the point of the article: I say go all in with your ODI automation.

Think of this intermediate solution like this, in terms of script code (a rough sketch follows the list):

  1. Copy file from incoming folder
  2. Run custom program to transform file
  3. Call startscen.bat (invoke ODI scenario, load SQL data, load to Essbase)
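
As a bare-bones sketch, the batch file at this intermediate stage might look something like the following. The program paths, scenario name, version, and context code are all placeholders for whatever your environment actually uses.

REM 1. Copy file from incoming folder to a working location
COPY \\some-server\incoming\data.txt D:\automation\work\data.txt

REM 2. Run custom program to transform the file
D:\automation\bin\transform.exe D:\automation\work\data.txt

REM 3. Invoke the ODI scenario, which loads the SQL data and loads to Essbase
CALL D:\odi\agent\bin\startscen.bat LOAD_SALES_DATA 001 GLOBAL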

The benefits of this incremental approach should be obvious: we’ve ostensibly cleaned up part of the job. Here’s the thing though: I think we’ve actually added some undesirable technical debt by doing it this way.

Yes, the job technically works, but think about how this batch file is positioned from an administration point of view now: we have a batch file, some configuration settings for that batch file (server names, encryption keys, folder paths, etc.), and the batch file is likely outputting data or otherwise logging things to a text file. Then we’re calling the ODI job itself, thereby creating an entry in the ODI Operator (which is a good thing, but it will only show part of the overall job).

More often than not, when I am redeveloping an automation process for someone, the batch files have no error control and inconsistent amounts of logging. The logging that is done is oftentimes never looked at. Having easy visibility into our automation processes, however, is a very good thing. We can achieve this with ODI. While incrementalism has its merits, I say go all in with the ODI automation.

Go all in

What I mean by this is that instead of sticking with the batch file, instead of the batch file calling the automation steps and then calling an ODI step, use a package in ODI to call the necessary steps, your interfaces, and any other processing you need.

You get a lot of benefits from this approach. The ODI Operator will log each step so that you know how long each one took. You can get consistent and easy error handling/branching in an ODI package. You can easily log all of the output. You can schedule the job from the ODI scheduler. ODI has package steps that can be used to replace kludgey code in a batch file. For example, there is an ODI “wait for file” step that waits around for a file to show up. You can copy files around, pull files from FTP, and more – if you need to just call an arbitrary program or script you can do that with an ODI OS Command. Everything you need is there, and more.

You don’t even have to use the ODI scheduler. If you are already invested in another scheduler (Windows Scheduler, cron jobs, etc.) and want to keep everything in one place, then just call the ODI scenario from your scheduler, but that scenario will have been generated from a package that has all of the steps in it (instead of your batch file calling those).

This new world of pure ODI jobs is worth the extra effort to get to: you have status and metrics for all of your jobs (without tons of boilerplate logging code), you can manage your entire job in one place, and it’s easier to follow best (better) practices with development. While batch files and various scripts have served us well over the years, more and more I find myself abstracting away from the file system when it comes to automation, given the capabilities of ODI. It’s worth it.