Speed up ASO SQL data loads by using multiple rules files

Just another quick post today about possibly speeding up data loads to an ASO database when loading from SQL. I got on a quick call with a former colleague who was looking to gain a little more performance on their load process for a massive ASO database, and the first thing that jumped out at me was that I recalled you can do parallel loads with some native MaxL syntax.

Here’s a quick example of the syntax:


import database $APPLICATION.$DATABASE data
connect as $SQL_USER identified by $SQL_PW
using multiple rules_file $RULE1, $RULE2, $RULE3, $RULE4, $RULE5
to load_buffer_block starting with buffer_id 100 on error write to "errors.txt";

Basically, you provide multiple rules files (configured for your SQL data source, of course). The rules files are likely to be nearly identical to each other, but I suppose it’s possible you might want to partition the data in some logical way to try to speed things up even more.

For example, let’s say that in the code above, we are loading five years of data from a relational database. We might then dedicate each rule to a particular year by doing the following things:

  • Set the year in the data header
  • Remove that column from the list of SELECT columns
  • Put a filter/predicate in the WHERE clause on the query
  • Bonus points for using substitution variables in both the header definition and the WHERE clause (see the sketch just after this list)
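
As a rough sketch (the table and column names here are entirely made up for illustration), the query behind the FY2013 rules file might look something like this, with the year pinned down in the WHERE clause and the year member itself supplied by that rule’s data load header:

SELECT entity, product, period, amount
FROM fact_sales
WHERE fiscal_year = 2013

Each of the five rules would get its own year, or, if you go the bonus-points route, a substitution variable in place of the literal 2013.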

Performance in this particular use case went up substantially. It’s my understanding that data loads that were taking an hour are now cut down to 17 minutes. Your mileage may vary, of course.

Let’s Not Forget About Hybrid BSO

That said, I think this can be an effective strategy for squeezing performance out of an ASO cube that needs a smaller load window, when you don’t want to go changing a lot of the internals in play. If you’re doing new development, though, then I strongly, strongly recommend using hybrid BSO (or rather, BSO, with the cube configured properly so as to get the hybrid BSO performance benefits). I have been seeing hybrid BSO cubes absolutely killing it in performance, what with their ability to leverage ASO technology for aggregates, and massive calculation improvements owing to the smaller block sizes and indexes you get from having so many dynamic calc members in dimensions. Plus, you of course get all of the classic/rich/awesome BSO functionality out of the box, like dynamic time series, expense tagging, time balance, and more. These were never very strong areas for ASO and often required a lot of non-optimal workarounds to make users happy.

JDBC and JNDI connections compared (with a Dodeca example)

Have you ever wondered what the difference between a JDBC and a JNDI connection is? If you’re familiar with at least one of these, it’s likely that you’re familiar with JDBC (but probably not JNDI).

JDBC connections come up often in the Oracle world (for good reason). It’s a standard model/framework for designing drivers that interact with relational databases. What it means for us in the Hyperion, Dodeca (and even Drillbridge!) world is that we often define connections by specifying JDBC parameters. This typically means a driver class name (like com.mysql.jdbc.Driver for a MySQL driver), a JDBC URL (a URL specifying a server and optionally a database/schema and other parameters), and credentials (username/password). So if you’ve poked around in your infrastructure much at all, there’s a good chance that you’ve come across a JDBC connection.

You may have even come across something called JNDI and even vaguely known it was sort of an alternate way to configure a connection but never really had to bother with it. I’ll spare you the acronym details, but think of JNDI as a way of organizing database connections (and other objects actually, but we don’t need to worry about that at the moment) such that instead of our app/system having to know the server name and credentials, it just asks “Hello, can I have the resource that was defined for me with name XYZ?”
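
To make the difference concrete, here’s a minimal Java sketch of the two styles side by side. Everything in it (the driver class, URL, credentials, and the JNDI name jdbc/MyDataSource) is a made-up example rather than anything Dodeca-specific, and the JNDI version assumes you’re running inside a container that has that resource defined:

import java.sql.Connection;
import java.sql.DriverManager;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class ConnectionStyles {

    // Plain JDBC: the code itself knows the driver, URL, and credentials
    static Connection viaJdbc() throws Exception {
        Class.forName("com.mysql.jdbc.Driver");
        return DriverManager.getConnection(
                "jdbc:mysql://dbserver:3306/sales", "app_user", "app_password");
    }

    // JNDI: the code just asks for a resource by name; the server, credentials,
    // and pooling details live in the app server's configuration instead
    static Connection viaJndi() throws Exception {
        Context ctx = new InitialContext();
        DataSource ds = (DataSource) ctx.lookup("java:comp/env/jdbc/MyDataSource");
        return ds.getConnection();
    }
}

The nice part of the JNDI style is that the same code can move between environments without changing, since only the container’s resource definition changes.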


Using Excel AutoCorrect to help type special characters

A client of ours has a member name with a superscript two in it, and users are used to typing it in quickly themselves using the standard Windows Unicode keyboard shortcut, except the Windows shortcut seemed to stop working. This intrigued me a bit, so I did some digging.

As best I can tell, Excel has quietly dropped support for typing in Unicode characters using the Alt + digits shortcut. There seem to be some articles about how this is possible with Excel 2007, but I couldn’t get it to work with Excel 2010 at all. It’s possible that newer versions of Excel fixed/brought back the support.

I even found some articles about tweaking a registry key in Windows to specifically enable the Alt keyboard shortcut, but didn’t have any luck with that making it work in Excel.
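
(For reference, the registry tweak that seems to get cited most often is the EnableHexNumpad value, which turns on the Alt + '+' + hex-code flavor of the shortcut after a log off/log on. This may or may not be the exact key those articles mention, but it looks something like this:)

rem Enable hex Unicode entry via Alt + '+' + hex code (takes effect after logging off and back on)
reg add "HKCU\Control Panel\Input Method" /v EnableHexNumpad /t REG_SZ /d 1 /f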

That said, since the only special character in question (at the moment) seemed to just literally be the superscript two, as a quick workaround, I recommended setting up an AutoCorrect shortcut in Excel called (super2) that will put in the needed character. The configuration for this is pretty straightforward, and I used it to manually type in a seemingly new offering at The Beverage Company, called Energy²:

Excel’s AutoCorrect can be used to help type commonly used special characters

If anyone wants to confirm the behavior in Excel 2013/2016, I’d be curious to know what the official situation is when someone needs to type in special characters via their Unicode code points.

Performance nuances with MaxL data imports: local vs. server

Some time ago, I reviewed and revamped the MaxL automation for a client. One of the major performance gains I got was actually pretty simple to implement but resulted in a huge performance improvement.

Did you know that the MaxL import data command can be told whether the file to load is a local data file or a server data file? Check out the MaxL reference for a quick refresher. See that bold “local” after from? That’s the default, meaning if we omit the keyword altogether, the MaxL interpreter just assumes it’s a local file.

Imagine that you have an Essbase server, and then a separate machine with the MaxL interpreter. This could be your local workstation or a dedicated automation server. Let’s say that there is a text file on your workstation at C:/Essbase/data.txt. You would craft a MaxL import command to import the local data file named C:/Essbase/data.txt. That’s because the file is local to the MaxL interpreter.

Now imagine that the file we want to load is actually on the server itself and we have a drive mapped (such as the Y: drive) from our workstation to the server. We can still import the data file as a local file, but this time it’s Y:/data.txt (Assume that the drive is mapped directly to the folder containing the file).

In this scenario, MaxL reads the file over the network from the server to the client, then uploads that data right back to the server. This data flow is represented on the left side of the diagram below:

MaxL data loads: server vs. local

You might be thinking, “But wait, the file is on the server, shouldn’t it be faster?” Well, no. But there’s hope. Now consider server file loading. In this case we use the server keyword on the import statement and we specify the name of the file to load. Note that the file location is based on the database being loaded to. If you’re loading to Sample Basic, then Essbase will look in the ../app/Sample/Basic folder for the file. If you don’t want to put files in the database folder, you can actually cheat a little bit and specify a path such as ..\..\data.txt to load the file from a relative path. In this case, by specifying the ..\..\, Essbase will go up two folders (to the \app folder) and look for the file there.

You can fudge the paths a little, but the key is this: Essbase loads the file from its own file system, without the MaxL client incurring the performance penalty of two full trips of the data over the network. This is depicted on the right side of the diagram: the MaxL client simply issues a data load command to the server, which then reads the file directly, and we never pay the cost of shipping the data across the wire twice.
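
For reference, here’s roughly what the two flavors look like in MaxL; the app, database, and file names are just examples:

/* local: the file sits with the MaxL client and gets shipped up to the server */
import database Sample.Basic data
    from local text data_file 'C:/Essbase/data.txt'
    on error write to 'errors.txt';

/* server: Essbase reads the file from its own file system (the app\Sample\Basic folder by default) */
import database Sample.Basic data
    from server text data_file 'data.txt'
    on error write to 'errors.txt';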

In my case, the automation was written to load a file that was already on the server (in the \app folder), so I just changed the import to be a server-style import and immediately cut the data import time dramatically.

I wouldn’t be surprised if this “anti-pattern” is being used in other places – so take a look at your automation. Let me know if you find this in your environment and are able to get a performance boost!

 

Do this, not that: Current vs. Prior Year dynamic calc in Scenario

Here’s just a quickie I saw the other day. Imagine a normal cube with a Years dimension, a Scenario dimension, and any other normal dimensions. Years contains FY2012, FY2013, FY2014, and so on. Scenario contains Actual, Budget, and all the other normal stuff you’d expect to see.

Naturally, the Scenario dimension will contain all sorts of handy dynamic calcs, starting with our trusty Actual to Budget variance:

Actual vs. Budget: @VAR("Actual", "Budget");

So far so good.

How about a scenario that gives us the current year versus the prior year? Don’t do this:

@VAR("FY2014", "FY2013");

Or this (which I guess is slightly better, but still not great):

@VAR(&CurrentYear, &PriorYear);

Why shouldn’t you do this? One, it requires maintenance – the kind of maintenance that is easily forgotten about until a user calls up and says that something doesn’t look quite right.

Second and more importantly, it’s semantically wrong. Hard-coding the year effectively breaks the inter-dimensional promise that our cube is ostensibly making – which is that the Scenario value we’re looking at should be based on the current Year member – not some arbitrary member irrespective of the POV.

(This all being said, yes, there could be a legitimate design reason to code a dynamic calc in Scenario that is always the current year irrespective of the POV, but I digress).

A simple formula can get us the prior value:

@PRIOR("Actual", 1, @CHILDREN("Years"))

As well as the actual versus prior:

@VAR("Actual", @PRIOR("Actual", 1, @CHILDREN("Years")));

Note that this assumes there is nothing else in the Years dimension and that it’s got a typical “ascending” sort (2010, 2011, 2012, in that order). If you have a years dimension going in descending order you could put -1 in for the @PRIOR command or just switch to @NEXT.
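
For the descending case, the @NEXT flavor of the formula might look like this (same assumption that the dimension contains nothing but year members):

@VAR("Actual", @NEXT("Actual", 1, @CHILDREN("Years")));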

There you have it – a simple cleanup that saves maintenance, doesn’t rely on outline variables being updated, is intuitive, and, more importantly, doesn’t break the semantics of the cube.

A quick trick to avoid hard-coding folder names in batch files

I’m still surprised (although I guess I shouldn’t be) at how often I come across batch automation files whose first line sets the current folder – using a hard-coded folder name. For example, if the automation is located in D:\Essbase\Automation, then the first line of the script looks like this:

cd /d D:\Essbase\Automation

99% of the time, this folder is the same folder that contains the batch file itself. Instead of hard-coding it, you can actually just use a handy shortcut on Windows:

cd /d %~dp0

The %~dp0 token/variable gets expanded at runtime to the drive and folder containing the batch file being executed. The /d parameter simply tells the cd command to change drives if necessary (so that the change directory command works if it’s going from the C drive to the D drive, for example).
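
Put together, a script using this trick might start out something like the following (the script and folder names here are just placeholders):

@echo off
rem Jump to the folder containing this batch file, no matter where it was launched from
cd /d %~dp0

rem Everything below can now use paths relative to the automation folder
essmsh scripts\load_data.msh > logs\load_data.log 2>&1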

Using the above technique, you can write batch files that are more standard looking, portable, and more flexible. It’s a good thing (said in the voice of Martha Stewart).

Eclipse tips for you Java folks out there

I know there are a lot of you Java and Essbase folks out there in addition to myself. I’m always interested in ways to use an IDE more effectively. I came across this list of Eclipse annoyances and easy solutions for them earlier today. I am already using some of these, but some of them are new to me. So if you want to get your environment dialed in a little bit, check these out. And for you IntelliJ folks… just keep being your normal smug selves. ;-)

Oh, and I’m thinking about doing a small series on “Essential Java for Essbase Developers” or something along those lines. It’d be something like a tutorial series on how to navigate around all of the Java terminology, features, and ecosystem in order to get straight to developing the solutions you want to develop, just a little bit quicker. Let me know if you have any thoughts or suggestions!

 

 

Using the Unix paste command to join files together by column

I can’t believe I didn’t know about this command-line utility until very recently. I was doing a little research on some text processing utilities and came across the “paste” command. As a Mac user I have this installed already, and it appears to be a fairly common Linux/Unix tool as well. It’s part of a suite of text processing utilities that are fairly standard. Oddly, I am very familiar with the likes of sed, grep, awk, and so on, and yet had never stumbled upon this one.

Anyway, imagine the following files, starting with names.txt:

Jason
Cameron
Tim

And numbers.txt:

555-1234
555-9876
555-2468

Then we just run paste:

paste names.txt numbers.txt

And we get this:

Jason   555-1234
Cameron 555-9876
Tim     555-2468

Paste just marries the files up by column, reading a line at a time from each file and joining them with a tab. You can supply more than two files.
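
If you want something other than tab-separated output, the -d option lets you pick the delimiter, which gets you most of the way to a quick-and-dirty CSV:

paste -d, names.txt numbers.txt

Jason,555-1234
Cameron,555-9876
Tim,555-2468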

I don’t have an immediate need for this utility for processing Essbase data, but it just might come in handy someday, so I’m going to keep it in my back pocket. And for you Windows users out there, well, you know the deal: get cygwin or whatever the latest and greatest Unix-on-Windows environment is.

Pre-seeding Hyperion Planning User Preferences with values for a smoother user experience

Wow, I think I am actually writing an article on Hyperion Planning. I think pigs are flying right now. I have been helping out on a system upgrade for the last few months where we are in many ways “refactoring” a Planning deployment. I’m borrowing that term from the software world. In other words, we are changing how things work under the hood without the explicit intention of changing how things look to users. One of the changes we are making, however, is to introduce some variables for users to be able to change their Version and Scenario.

Just to be clear, the variables are the ones that are set in the preferences menu, and we’d like to provide some defaults so that the users have the most likely choices pre-selected. We can export the User Preferences option from LCM. The corresponding XML file has a section for each user’s variables. It looks like this:

<UserPreferences>
  <UserPreference UserName="jason">
    <!-- some stuff here -->
    <UserVariables>
      <Variable Name="Scen_UserVar" Value="Forecast"/>
      <Variable Name="Ver_UserVar" Value="Working"/>
    </UserVariables>
  </UserPreference>
</UserPreferences>

There’s, of course, a UserPreference section for each user. We can edit the variables here in this config file and then import it to the target system (or back into the current one) to fill in the values. A couple of notes to consider:

  • If you try to strip out the other stuff in the UserPreference section so that it doesn’t get touched, it’ll just load defaults for that user. You might not want to blow out the user’s settings that way.
  • The reason for trying to do the above bullet item would be if you’re just trying to copy and paste the same block of code for each user.
  • A user in the target system might not be in the User Preferences export – you can create an entry for them manually by copying and pasting another user’s block

It’s incredibly likely that there’s a better way to do this or some magical option I don’t know of somewhere that’ll take care of it, but I wasn’t aware of it and decided to brute-force it. The copying/pasting and editing was the “hardest” part, as I couldn’t think of a way to reasonably automate the procedure in UltraEdit or Notepad++ or something, so I just did it by hand.
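
That said, if anyone does want to script it, here’s a rough sketch of one way it might be done with Python’s standard XML library. The file name, variable names, and default values are just placeholders based on the example above:

# Hypothetical sketch: set the same default user variables for every user in an
# LCM User Preferences export (file/variable names and values are placeholders)
import xml.etree.ElementTree as ET

DEFAULTS = {"Scen_UserVar": "Forecast", "Ver_UserVar": "Working"}

tree = ET.parse("UserPreferences.xml")
for pref in tree.getroot().iter("UserPreference"):
    user_vars = pref.find("UserVariables")
    if user_vars is None:
        user_vars = ET.SubElement(pref, "UserVariables")
    for name, value in DEFAULTS.items():
        var = next((v for v in user_vars.iter("Variable") if v.get("Name") == name), None)
        if var is None:
            var = ET.SubElement(user_vars, "Variable", Name=name)
        var.set("Value", value)

tree.write("UserPreferences_updated.xml", xml_declaration=True, encoding="UTF-8")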

Hope this helps someone!

Flipping an ODI model to a different technology and kicking the interfaces

One of my recent ODI projects is a relatively complex transformation job. I am effectively building up a master/detail set of records from a single table. The single table isn’t really a single table in the source; it’s multiple tables. Within ODI I make several passes on it, dialing in the fields with interfaces and procedures. I opted to use the in-memory engine (MEMORY_ENGINE) because I thought the architecture would be a little cleaner, and the amounts of data being pushed through are not huge.

Everything was fine, until I hit a legitimate ODI bug. I actually found a relevant case in Oracle support for it: ODI-1228 “statement is not in batch mode”. There was even a patch! Unfortunately, the patch required a version of ODI higher than what I had available. So on a tight deadline my choices were to push through an ODI upgrade or to find some workaround.

I decided to see if I had a low-cost option of switching from using the memory engine to just using an Oracle schema as a stage (note that the package and interfaces themselves are all just moving data between various Oracle servers, nothing Hyperion related even). So I went into my model for the staging table that I was using, and just switched it from the In-memory Engine to Oracle (using the drop down). No complaints from ODI there.

Next I went into one of the interfaces that was previously set up with the usual source/staging/target flow. I went straight to the Flow tab, but ran into some issues and fun little NullPointerException errors, which is always a good time. The thing is, I had changed a technology on a model being used in various interfaces, but it’s not like any part of ODI went into those interfaces to say “Hey, this changed…” – in fact, when you change the technology of a model, ODI helpfully and plainly just says “Hey, this is likely to break stuff. Proceed at your own risk… THAR BE DRAGONS.” Or something like that.

Anyway, I found that I could sort of ‘kick’ the interface when I opened it, by checking the “Staging Area Other Than Target” option and then turning it off again (it was off in most of my interfaces). This forced the interface to recalculate and reset the flow, which took the updated technology of the model into account. There might be a better way to do this than my “fuzzy” method, but it worked and I didn’t have to redo the plethora of interfaces in this package.

Hopefully this helps someone else out someday!